CN111639082A

CN111639082A - Ceph-based object storage management method and system for a billion-node-scale knowledge graph

Info

Publication number: CN111639082A
Application number: CN202010514803.4A
Authority: CN
Inventors: 曹亮; 刘魁; 李超
Original assignee: Chengdu University of Information Technology
Current assignee: Chengdu University of Information Technology
Priority date: 2020-06-08
Filing date: 2020-06-08
Publication date: 2020-09-08
Anticipated expiration: 2040-06-08
Also published as: CN111639082B

Abstract

The invention discloses a Ceph-based object storage management method and system for a knowledge graph with billions of nodes. The method includes: building a graph storage architecture; acquiring entity data of multiple entities corresponding to a target business; generating and storing the knowledge graph corresponding to the target business from that entity data; using Ceph as the distributed resource storage while adding an external index backend; and using a distributed computing engine to decompose a large task into multiple subtasks, which are distributed to different machines for execution and aggregated after completion. This provides large-scale data processing capability and supports the OLAP requirements of users who analyze data on the basis of the knowledge graph. The invention also provides a Ceph-based object storage management system for a billion-node-scale knowledge graph. The solution introduces a distributed resource manager, which is scalable and highly available, and can store and express massive amounts of knowledge. It supports data volumes of billions of nodes and is reliable, easy to use, and efficient.

Description

Ceph-based object storage management method and system for a billion-node-scale knowledge graph

Technical field

The invention relates to the technical field of information processing, and in particular to a Ceph-based object storage management method and system for a billion-node-scale knowledge graph.

Background

A knowledge graph uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelationships among its parts. Knowledge graphs can extract knowledge hidden in large-scale data and build a graph-based data model. Popular information-processing technologies of recent years, such as data mining, big data, artificial intelligence, and machine learning, can be assisted by knowledge graphs. The ultimate purpose of these technologies is to collect and organize data into structured, reusable storage that supports reasoning, so that it can serve more usage scenarios, and the knowledge graph storage format matches these requirements almost perfectly. A knowledge graph describes the entities or concepts that exist in the real world and the associations between them. First, each entity is identified by a globally unique ID, just as every person has an ID card number. Second, attribute-value pairs describe the intrinsic characteristics of entities, and relationships connect two entities to describe the association between them.

The biggest defect of current graph storage systems is that they are not truly distributed. In the era of big data, more and more data is available, but the capacity of a single machine is limited, and once the data volume exceeds what a single machine can carry it becomes difficult to process. The underlying storage is far less efficient than block storage and object storage, graph queries and graph analysis are inefficient, and disaster tolerance and real-time performance are poor. At the scale of hundreds of millions of nodes, such systems are difficult to expand dynamically, and queries over node associations are inefficient.

Summary of the invention

The purpose of the present invention is to overcome the deficiencies of the prior art and to provide a Ceph-based object storage management method and system for a knowledge graph with one billion nodes. The method and system can process knowledge graph data at a scale of one billion nodes, support large-scale graph data storage with elastic and linear scaling, offer high availability and fault tolerance, provide OLTP and CRUD capabilities, and also support OLAP data analysis and external indexing.

The purpose of the invention is achieved through the following technical solutions:

A Ceph-based object storage management method for a billion-node-scale knowledge graph comprises the following steps:

S1: build the graph storage architecture. Acquire entity data of multiple entities corresponding to the target business, and generate and store the knowledge graph corresponding to the target business from the entity data. Ceph is used as the distributed resource storage with a Client/Server architecture: a Ceph cluster is built from a small cluster of multiple Monitors, and multiple OSDs under that Monitor cluster store the graph data;

S2: build the external index backend. Map the knowledge graph data into a fixed index data structure, use the Elasticsearch/Solr search engine as an external index plug-in to support non-equality queries, and combine this with an efficient indexing mechanism to build the external index backend;

S3: build the integrated distributed computing engine architecture. Use the Spark computing framework to build a distributed computing engine, and use the GraphX library to convert graph relationships into Spark operators. GraphX stores the graph data as RDDs distributed across the nodes of the Ceph cluster, with a vertex RDD storing the vertex set and an edge RDD storing the edge set;

S4: manage the graph storage architecture. On the basis of the graph storage architecture, the external index backend, and the integrated distributed computing engine, provide three-layer expansion queries, data writing, data reading, cluster expansion, metadata backup, metadata snapshots, online transaction processing, and online analytical processing operations to manage the graph data of the knowledge graph.

Specifically, the efficient indexing mechanism in step S2 includes a graph index and a vertex-centric index. The graph index is a global index structure over the entire knowledge graph; the vertex-centric index is a local index structure built for each vertex.

Specifically, step S3 further includes a partitioning operation comprising the following sub-steps:

S101: the vertex RDD is hash-partitioned by vertex ID, so that the vertex data is distributed across the cluster in multiple partitions;

S102: the edge RDD is partitioned according to a specified partitioning strategy, so that the edge data is distributed across the cluster in multiple partitions;

S103: a routing table recording how the vertices in each vertex RDD partition relate to all edge RDD partitions is stored in that vertex RDD partition; when an edge RDD partition needs vertex data, the vertex RDD sends the vertex data to it according to the routing table.
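Sub-steps S101 to S103 can be illustrated with a small, single-process Python sketch. It is not GraphX code: the partition count, the vertex hash (ID modulo partition count), and the edge strategy (hashing the (src, dst) pair) are assumptions chosen for illustration, and the function names are hypothetical. The routing table records, per vertex partition, which edge partitions reference each vertex, so vertex attributes are shipped only where they are needed.

```python
from collections import defaultdict

NUM_PARTS = 4  # assumed partition count, for illustration only


def vertex_partition(vid):
    """S101: hash-partition a vertex by its integer ID (here simply ID modulo partition count)."""
    return vid % NUM_PARTS


def edge_partition(src, dst):
    """S102: an assumed edge strategy that hashes the (src, dst) pair."""
    return hash((src, dst)) % NUM_PARTS


def partition_graph(vertices, edges):
    vparts = defaultdict(dict)  # vertex partition -> {vid: attr}
    eparts = defaultdict(list)  # edge partition -> [(src, dst, attr)]
    for vid, attr in vertices.items():
        vparts[vertex_partition(vid)][vid] = attr
    for src, dst, attr in edges:
        eparts[edge_partition(src, dst)].append((src, dst, attr))
    # S103: each vertex partition keeps a routing table: vid -> edge partitions that reference it
    routing = defaultdict(lambda: defaultdict(set))
    for epid, part_edges in eparts.items():
        for src, dst, _ in part_edges:
            for vid in (src, dst):
                routing[vertex_partition(vid)][vid].add(epid)
    return vparts, eparts, routing


def ship_vertices(vparts, routing):
    """Send vertex attributes only to the edge partitions listed in the routing table."""
    shipped = defaultdict(dict)  # edge partition -> {vid: attr}
    for vpid, table in routing.items():
        for vid, edge_pids in table.items():
            for epid in edge_pids:
                shipped[epid][vid] = vparts[vpid].get(vid)
    return shipped
```

A vertex with no edges never appears in any routing table, so it is never shipped; vertices shared by many edge partitions are shipped to each of them exactly once.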

Specifically, the data writing step in S4 includes the following sub-steps:

S201: the client connects to a Monitor, obtains the cluster map, and sends its request to the corresponding primary OSD data node;

S202: the primary OSD data node writes the data to the other two replica nodes at the same time and waits for the primary node and both replica nodes to finish writing. Once all three writes have succeeded, it returns a completion signal to the Client, and the data write is complete.
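The write path in S201 and S202 can be sketched as a toy in-memory model. It is not Ceph code: the placement function (a CRC32 of the object name modulo an assumed placement-group count, then three consecutive OSDs from the map) is a simplified stand-in for Ceph's CRUSH algorithm, and `OSD`, `place`, and `client_write` are hypothetical names. What it shows is the rule stated above: the client is acknowledged only after the primary and both replicas report success.

```python
import zlib

PG_NUM = 8    # assumed number of placement groups
REPLICAS = 3  # primary + two replicas, as in S202


class OSD:
    def __init__(self, osd_id, healthy=True):
        self.osd_id = osd_id
        self.healthy = healthy
        self.store = {}

    def write(self, name, data):
        if not self.healthy:
            return False
        self.store[name] = data
        return True


def place(name, cluster_map):
    """Simplified stand-in for CRUSH: object -> PG -> ordered OSD list (first entry is primary)."""
    pg = zlib.crc32(name.encode()) % PG_NUM
    start = pg % len(cluster_map)
    return [cluster_map[(start + i) % len(cluster_map)] for i in range(REPLICAS)]


def client_write(name, data, cluster_map):
    # S201: the client computes the acting set from the cluster map it got from the Monitor
    primary, *replicas = place(name, cluster_map)
    # S202: the primary writes locally and forwards to both replicas (all writes are attempted)
    results = [primary.write(name, data)] + [r.write(name, data) for r in replicas]
    # acknowledge the client only if all three copies were written
    return all(results)
```

Here the cluster map is reduced to a plain list of OSD objects; a failed replica makes the whole write unacknowledged, which is the behaviour the patent describes.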

Specifically, the cluster expansion step in S4 includes the following sub-steps:

S301: the Client connects to a Monitor to obtain the cluster map; the new primary node OSD1 sends a request to the Monitor so that node OSD2 takes over from OSD1 as the temporary primary node;

S302: the temporary primary node OSD2 synchronizes the full data set to the new primary node OSD1, while client read/write IO goes directly to the temporary primary node OSD2;

S303: when the temporary primary node OSD2 receives read/write IO, it writes the data to the other two replica nodes at the same time; after all three copies (on OSD2 and on the two replica nodes) are written successfully, it returns a signal to the Client, and the client IO is complete;

S304: once data synchronization to OSD1 is complete, the temporary primary node OSD2 sends a request to the Monitor to give up the primary role; OSD1 becomes the primary node and OSD2 becomes a replica node;

S305: at the graph data level, after the nodes have been expanded, the graph data is cut according to a graph cutting method and stored across multiple machines.
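Sub-steps S301 to S304 can be modelled as a small state machine for one placement group. This is a sketch under simplifying assumptions: the acting set is an ordered list with the primary first, the temporary primary is represented by keeping OSD2 at the head of that list (this resembles what Ceph handles internally with a temporary acting set, pg_temp), and `PlacementGroup`, `sync_to_new_osd`, `client_write`, and `finish` are hypothetical names rather than a Ceph API.

```python
class PlacementGroup:
    """Toy model of one placement group while a new OSD joins (S301-S304)."""

    def __init__(self, new_osd, old_acting, data):
        self.new_osd = new_osd          # OSD1: intended primary, holds no PG data yet
        self.acting = list(old_acting)  # OSD2 first, so OSD2 serves as temporary primary (S301)
        self.stores = {osd: dict(data) for osd in old_acting}
        self.stores[new_osd] = {}

    @property
    def primary(self):
        return self.acting[0]

    def sync_to_new_osd(self):
        # S302: the temporary primary copies its full data set to OSD1;
        # calling it again also picks up writes made while syncing
        self.stores[self.new_osd].update(self.stores[self.primary])

    def client_write(self, key, value):
        # S303: client IO goes to the temporary primary, which writes every acting replica
        for osd in self.acting:
            self.stores[osd][key] = value
        return all(self.stores[osd][key] == value for osd in self.acting)

    def finish(self):
        # S304: once OSD1 is in sync, the temporary primary hands the role back;
        # OSD1 becomes primary and OSD2 becomes a replica (three copies are kept)
        self.sync_to_new_osd()
        self.acting = [self.new_osd] + self.acting[:2]
```

The final sync in `finish` is the assumption that makes writes accepted during S302 and S303 reach OSD1 before it takes over; the patent only says the handover happens once synchronization is complete.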

Specifically, the graph cutting methods include the vertex cut and the edge cut. A vertex cut partitions the data at the vertices of the graph: the cut line passes through vertices, each edge is stored only once and appears on only one machine, and vertices with many neighbors are distributed to and stored on multiple machines. An edge cut partitions the data at the edges of the graph: the cut line passes only through the edges connecting vertices, each vertex is stored only once, and the cut edges are distributed to and stored on multiple machines.
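The trade-off between the two cutting methods shows up in what gets replicated. The sketch below assumes simple assignment rules that the text does not specify (round-robin edges for the vertex cut, vertex ID modulo machine count for the edge cut). It counts on how many machines each vertex must appear under a vertex cut, and how many edges are cut, and therefore stored twice, under an edge cut. GraphX itself partitions graphs with vertex cuts.

```python
from collections import defaultdict


def vertex_cut(edges, machines):
    """Assign each edge to exactly one machine; vertices are replicated where their edges land."""
    placement = defaultdict(set)  # vertex -> machines holding a copy
    for i, (src, dst) in enumerate(edges):
        m = i % machines  # assumed round-robin edge assignment
        placement[src].add(m)
        placement[dst].add(m)
    return {v: len(ms) for v, ms in placement.items()}


def edge_cut(vertices, edges, machines):
    """Assign each vertex to exactly one machine; edges crossing machines are cut."""
    owner = {v: v % machines for v in vertices}  # assumed hash-by-ID vertex assignment
    cut = [(s, d) for s, d in edges if owner[s] != owner[d]]
    stored_edges = len(edges) + len(cut)  # a cut edge is stored on both endpoint machines
    return len(cut), stored_edges
```

For a star graph whose hub 0 has six neighbours spread over three machines, the vertex cut replicates only the hub (on all three machines), while the edge cut stores every vertex once but cuts four of the six edges.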

Specifically, the metadata snapshot step in S4 includes: restoring the data to a previous state from the metadata, and restoring programs to a historical running state of the system; saving the system data at a specific point in time and generating a report for that point in time; and exporting snapshot data for offline work.
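The three snapshot functions (restore, point-in-time report, export for offline work) can be sketched over a flat key-value view of the metadata. This assumes full copies keyed by a time-point label; a production system would more likely use copy-on-write. `MetadataStore` and its methods are hypothetical names.

```python
import copy
import json


class MetadataStore:
    def __init__(self):
        self.current = {}
        self.snapshots = {}  # time point -> frozen copy of the metadata

    def put(self, key, value):
        self.current[key] = value

    def snapshot(self, time_point):
        # save the system metadata at a specific point in time
        self.snapshots[time_point] = copy.deepcopy(self.current)

    def restore(self, time_point):
        # roll the metadata back to an earlier state
        self.current = copy.deepcopy(self.snapshots[time_point])

    def report(self, time_point):
        # a minimal point-in-time report: which snapshot and how many entries it holds
        return {"time_point": time_point, "keys": len(self.snapshots[time_point])}

    def export(self, time_point):
        # export snapshot data for offline work (JSON with sorted keys for stable output)
        return json.dumps(self.snapshots[time_point], sort_keys=True)
```

Deep copies keep later writes from leaking into a saved snapshot, which is what makes `restore` return the exact earlier state.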

Specifically, the three-layer expansion query step in S4 includes the following sub-steps:

S401: set the user-given vertex set Vset as the base data of the first-layer expansion query, set the first-layer query filter to ConditionA, a filter on vertex labels/vertex properties, and perform the first-layer vertex expansion query;

S402: take the set of vertices that satisfy the first-layer filter as the base data of the second-layer expansion query, set the second-layer query filter to ConditionB, a filter on edge labels/edge properties, and perform the second-layer edge expansion query;

S403: take the set of edges that satisfy the second-layer filter as the base data of the third-layer expansion query, set the property query condition, perform the third-layer property expansion query, and output the results of the three-layer expansion query.
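The order of filtering in S401 to S403 can be sketched on a small in-memory property graph. The data layout (a dict of vertex properties and a list of labelled edges) and the predicate functions standing in for ConditionA, ConditionB, and the third-layer property condition are assumptions for illustration; `three_layer_query` is a hypothetical name. The third layer exists because the edge step yields only neighbour IDs, so their properties must be looked up before they can be filtered.

```python
def three_layer_query(vertices, edges, vset, condition_a, condition_b, condition_c):
    """vertices: {vid: {"label": ..., **props}}; edges: [(src, dst, {"label": ..., **props})]."""
    # Layer 1 (S401): keep the user-given vertices that satisfy ConditionA
    layer1 = {v for v in vset if v in vertices and condition_a(vertices[v])}
    # Layer 2 (S402): expand along outgoing edges of layer-1 vertices that satisfy ConditionB
    layer2 = [(s, d, props) for s, d, props in edges if s in layer1 and condition_b(props)]
    # Layer 3 (S403): edges only carry the neighbour's ID, so fetch its properties and filter
    result = []
    for s, d, _ in layer2:
        neighbour = vertices.get(d)
        if neighbour is not None and condition_c(neighbour):
            result.append((s, d, neighbour))
    return result
```

For example, starting from two users, keeping only the high-risk one, following its "uses" edges, and finally keeping only devices flagged as shared narrows the result to a single (user, device) pair.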

A Ceph-based object storage management system for a billion-node-scale knowledge graph comprises a graph data storage module, a distributed computing module, an index module, and a metadata management module. The graph data storage module stores the object data of the large-scale knowledge graph in a distributed manner and provides object storage, block device storage, and file system services;

The distributed computing module uses Spark RDD in-memory computing to decompose a large task into multiple subtasks, which are deployed to different machines for execution and aggregated after completion. This provides efficient large-scale data processing to support OLAP requirements, so that users can analyze data on the basis of the knowledge graph;

The index module maps knowledge data into a fixed index data structure and provides users with graph index, vertex-centric index, and external index functions;

The metadata management module handles metadata backup, metadata snapshots, program recovery, generation of point-in-time reports, and offline work on the system.

Beneficial effects of the present invention: the solution adds a big-data distributed architecture and introduces a distributed resource manager. Its main performance characteristics are scalability and high availability, reflected mainly in the distributed cluster, the external index, data reliability, and the distributed resource manager. It can store and express massive amounts of knowledge, supports data volumes of billions of nodes, and is reliable, easy to use, and efficient.

Description of the drawings

FIG. 1 is a flowchart of the method of the present invention.

FIG. 2 is a diagram of the overall distributed architecture of the present invention.

FIG. 3 is a diagram of the distributed resource management architecture of the present invention.

FIG. 4 is a diagram of the integrated distributed computing engine architecture of the present invention.

FIG. 5 is a diagram of the external index plug-in architecture of the present invention.

FIG. 6 is a flowchart of data writing in the present invention.

FIG. 7 is a flowchart of cluster expansion in the present invention.

FIG. 8 is a diagram of the functional modules of the system of the present invention.

Detailed description of the embodiments

To give a clearer understanding of the technical features, objects, and effects of the present invention, specific embodiments are now described with reference to the accompanying drawings.

In this embodiment, as shown in FIGS. 1 and 2, the Ceph-based object storage management method for a billion-node-scale knowledge graph comprises the following steps:

Step 1: build the graph storage architecture. Acquire entity data of multiple entities corresponding to the target business, and generate and store the knowledge graph corresponding to the target business from the entity data. As shown in FIG. 3, the overall distributed architecture uses Ceph as the distributed resource storage with a Client/Server architecture: a Ceph cluster is built from a small cluster of multiple Monitors, and multiple OSDs under that Monitor cluster store the graph data.
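Why the Monitors form a small cluster rather than a single node: Ceph Monitors agree on the cluster map through Paxos and need a strict majority to form a quorum. The arithmetic below shows how many Monitor failures a cluster of a given size can tolerate; the function names are hypothetical and the sketch covers only the majority rule, not Monitor election itself.

```python
def has_quorum(total_monitors, monitors_up):
    """A strict majority of Monitors must be up for the cluster map to be updated."""
    return monitors_up > total_monitors // 2


def tolerated_failures(total_monitors):
    """Number of Monitors that can fail while a majority still remains."""
    return (total_monitors - 1) // 2
```

Three Monitors tolerate one failure and five tolerate two, while four tolerate no more than three do, which is why odd Monitor counts are the usual choice.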

Step 2: first, map the knowledge graph data into a fixed index data structure. To be able to process knowledge data with billions of nodes, an external index backend mechanism is added, as shown in FIG. 5. The Elasticsearch/Solr search engine serves as an external index plug-in, so that an index can be used even for non-equality queries, and it is combined with an efficient indexing mechanism to build the external index backend. The external index backend and the index engine exchange data through an API.
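Why an external index helps with non-equality queries can be shown with an ordered secondary index: an equality-only index cannot answer a query such as "age between 30 and 39" without scanning, whereas an ordered one maps the range to a contiguous slice. The sketch uses Python's `bisect` on sorted (value, vertex ID) pairs as a stand-in for the range support that Elasticsearch or Solr provide; it does not use either engine's API, and `RangeIndex` is a hypothetical name.

```python
import bisect


class RangeIndex:
    """Ordered secondary index on one vertex property (illustrative stand-in for an external index)."""

    def __init__(self, vertices, prop):
        # (value, vid) pairs sorted by value, so any value range maps to a contiguous slice
        self.entries = sorted((attrs[prop], vid) for vid, attrs in vertices.items() if prop in attrs)
        self.keys = [value for value, _ in self.entries]

    def range(self, gte, lte):
        # similar in intent to an Elasticsearch `range` query with `gte`/`lte` bounds
        lo = bisect.bisect_left(self.keys, gte)
        hi = bisect.bisect_right(self.keys, lte)
        return [vid for _, vid in self.entries[lo:hi]]
```

Vertices that lack the indexed property are simply absent from the index, so a range query never returns them.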

Step 3: build the integrated distributed computing engine architecture. Use the Spark computing framework to build a distributed computing engine, and use the GraphX library to convert graph relationships into Spark operators. GraphX stores the graph data as RDDs distributed across the nodes of the Ceph cluster, with a vertex RDD storing the vertex set and an edge RDD storing the edge set.

Step 4: manage the graph storage architecture. On the basis of the graph storage architecture, the external index backend, and the integrated distributed computing engine, provide three-layer expansion queries, data writing, data reading, cluster expansion, metadata backup, metadata snapshots, online transaction processing, and online analytical processing operations to manage the graph data of the knowledge graph.

In this embodiment, the efficient indexing mechanism includes a graph index and a vertex-centric index. The graph index is a global index structure over the entire knowledge graph. It indexes the properties of entities or edges to obtain better selectivity and thereby speed up graph traversal, and it performs equality lookups on a fixed combination of one or more properties. The vertex-centric index is a local index structure built for each vertex. In a large graph, a vertex may have thousands of edges or more, and traversing such a vertex requires filtering its edges, which is inefficient without a local index. The vertex-centric index supports only leftmost matching.
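The two index types can be contrasted in a short sketch. The graph index is modelled as a global hash table keyed by a fixed tuple of property values (equality only, all keys required); the vertex-centric index is modelled as a per-vertex list of incident edges sorted by (label, sort value), which can only be searched by a leftmost prefix: the label must be fixed before the sort value can be range-restricted. Class names, method names, and the choice of a single sort key are assumptions for illustration.

```python
import bisect
from collections import defaultdict


class GraphIndex:
    """Global composite index: exact match on a fixed combination of properties."""

    def __init__(self, keys):
        self.keys = tuple(keys)
        self.table = defaultdict(set)

    def add(self, element_id, props):
        if all(k in props for k in self.keys):
            self.table[tuple(props[k] for k in self.keys)].add(element_id)

    def lookup(self, **values):
        # equality only, and every indexed key must be supplied
        return set(self.table.get(tuple(values[k] for k in self.keys), ()))


class VertexCentricIndex:
    """Per-vertex index over incident edges, sorted by (label, sort_value)."""

    def __init__(self):
        self.by_vertex = defaultdict(list)  # vid -> sorted [(label, sort_value, edge_id)]

    def add(self, vid, label, sort_value, edge_id):
        bisect.insort(self.by_vertex[vid], (label, sort_value, edge_id))

    def edges(self, vid, label, lo=None, hi=None):
        # leftmost match: fix the label first, then optionally bound the sort value
        entries = self.by_vertex[vid]
        start = bisect.bisect_left(entries, (label, lo) if lo is not None else (label,))
        out = []
        for lab, value, edge_id in entries[start:]:
            if lab != label or (hi is not None and value > hi):
                break  # sorted order means no later entry can match
            out.append(edge_id)
        return out
```

A query such as "transfer edges of vertex 1 with amount between 100 and 1000" touches only the matching slice of that vertex's edges instead of filtering all of them; a query on the amount alone, without the label, cannot use this index.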

For the index-based three-layer expansion query, the user-given vertex set Vset is first set as the base data of the first-layer expansion query, with ConditionA, a filter on vertex labels/vertex properties, as the first-layer query filter, and the first-layer vertex expansion query is performed. Next, the set of vertices that satisfy the first-layer filter becomes the base data of the second-layer expansion query, with ConditionB, a filter on edge labels/edge properties, as the second-layer query filter, and the second-layer edge expansion query is performed. This expansion returns only the edges that satisfy ConditionB; for the vertices attached to these edges only the vertex IDs are known, without any property information, so it is not yet known whether they satisfy ConditionA. A further property query is therefore needed: the set of edges that satisfy the second-layer filter becomes the base data of the third-layer expansion query, the property query condition is set, the third-layer property expansion query is performed, and the result of the three-layer expansion query is output. Implemented this way, the efficient indexes can be fully exploited.

In this embodiment, as shown in FIG. 4, to support OLAP requirements, a high-performance computing framework API supporting Spark is also provided. The GraphX library converts graph relationships into Spark operators and stores the graph data as RDDs distributed across the nodes of the cluster, using a vertex RDD (VertexRDD) and an edge RDD (EdgeRDD) to store the vertex set and the edge set. The vertex RDD is hash-partitioned by vertex ID, distributing the vertex data across the cluster in multiple partitions. The edge RDD is partitioned according to a specified partitioning strategy (PartitionStrategy), distributing the edge data across the cluster in multiple partitions. In addition, the vertex RDD holds routing information from vertices to edge RDD partitions, namely the routing table. The routing table is stored in the vertex RDD partitions and records the relationship between the vertices in each partition and all edge RDD partitions. When an edge RDD partition needs vertex data, the vertex RDD sends the vertex data to that partition according to the routing table. In this way, the graph data is stored as Spark RDDs.

At the bottom layer of Spark, executing an operator starts a SparkContext, which registers with the resource manager and requests Executor resources. The resource manager allocates Executor resources and starts StandaloneExecutorBackend (task scheduling), and the Executors report their status to the resource manager through heartbeats. The SparkContext builds a DAG, divides it into Stages, and sends each TaskSet to the TaskScheduler. Executors request Tasks from the SparkContext; the TaskScheduler dispatches Tasks to the Executors while the SparkContext sends the application code to them. The Tasks run on the Executors, and all resources are released when they finish. This enables efficient operations such as mapEdges, mapVertices, and aggregateMessages and allows data analysis requests to be answered quickly.
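GraphX's `aggregateMessages` operator, named above, applies a `sendMsg` function to each edge (through an `EdgeContext`) and combines the messages arriving at each vertex with a `mergeMsg` function; `mapVertices` transforms every vertex attribute. The sketch below imitates those semantics in plain, single-process Python so the data flow is visible. It is not the GraphX API, which is Scala and distributed, and the function signatures here are assumptions.

```python
def aggregate_messages(vertices, edges, send_msg, merge_msg):
    """Imitation of aggregateMessages.

    send_msg(src, dst, src_attr, dst_attr, edge_attr) returns (target_vid, message) pairs;
    messages addressed to the same vertex are merged pairwise with merge_msg.
    """
    inbox = {}
    for src, dst, edge_attr in edges:
        for target, msg in send_msg(src, dst, vertices[src], vertices[dst], edge_attr):
            inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
    return inbox  # only vertices that received at least one message appear


def map_vertices(vertices, fn):
    """Imitation of mapVertices: transform every vertex attribute, keeping the IDs."""
    return {vid: fn(vid, attr) for vid, attr in vertices.items()}
```

Computing in-degrees is the usual first example: every edge sends `1` to its destination and the messages are added.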

In this embodiment, as shown in FIG. 6, for data writing, the client (Client) connects to a Monitor, obtains the cluster map, and sends its request to the corresponding primary OSD data node. The primary OSD data node writes the data to the other two replica nodes at the same time and waits for the primary node and both replica nodes to finish writing. Once all three writes have succeeded, it returns to the Client, and the data write is complete. Data reading works in the same way as data writing.

In this example, for cluster expansion, the Client connects to a Monitor to obtain the cluster map. Because the new primary node OSD1 holds no PG (Placement Group) data, it proactively reports to the Monitor so that node OSD2 temporarily takes over as the primary node. The temporary primary OSD2 synchronizes the full data set to the new primary OSD1, and client read/write IO goes directly to OSD2. When OSD2 receives read/write IO, it writes to the other two replica nodes at the same time and waits until OSD2 and both replica nodes have written successfully; once all three copies are written, it returns a signal to the Client and the client IO is complete. When data synchronization to OSD1 is complete, the temporary primary OSD2 sends a request to the Monitor and hands over the primary role: OSD1 becomes the primary node and OSD2 becomes a replica node.

At the graph data level, after expansion the graph is cut, that is, the data is partitioned and stored on multiple machines. The first type is the vertex cut, in which the cut line passes through the vertices (Vertex) of the graph rather than the edges (Edge). Each edge is stored only once and appears on only one machine, and vertices with many neighbors are distributed to different machines. The second type is the edge cut, in which the cut line passes only through the edges (Edge) connecting vertices; each vertex is stored only once, and the cut edges are stored on multiple machines. This completes the cluster expansion.

In this embodiment, a Ceph-based object storage management system for a billion-node-scale knowledge graph is also provided. The system comprises a graph data storage module, a distributed computing module, an index module, and a metadata management module.

The graph data storage module stores the object data of the large-scale knowledge graph in a distributed manner and provides object storage, block device storage, and file system services.

The distributed computing module uses Spark RDD in-memory computing to decompose a large task into multiple subtasks, which are deployed to different machines for execution and aggregated after completion. This provides efficient large-scale data processing to support OLAP requirements, so that users can analyze data on the basis of the knowledge graph.

The index module maps knowledge data into a fixed index data structure and provides users with graph index, vertex-centric index, and external index functions.

In this embodiment, the complete implementation described above can be applied to anti-fraud detection. User information, device information, and social relationships are used to build a heterogeneous network, and this heterogeneous network graph is applied to user association analysis and anti-fraud detection. After the data is imported, the graph contains on the order of 1.1 billion nodes and about 50 billion relationships, forming a complex heterogeneous network with 11 node types and 13 edge types. A series of data analysis tasks is then performed: screening suspicious users by specific rules and viewing the users with specific associations to them; viewing the network and user characteristics of the subnetwork formed by all users with specific associations to a suspicious user; analyzing through which kinds of association specific users are connected; and analyzing associations up to 6 layers deep. On the graph of this patent, with about 1.1 billion nodes, graph traversal and query response times are 4 to 100 times faster than with existing graph storage systems. The technical solution compares with existing graph storage solutions as follows:
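The association analysis in this anti-fraud case, finding everything linked to a suspicious user within up to six layers of relationships, amounts to a depth-bounded breadth-first traversal. A minimal sketch, assuming an undirected adjacency list held in memory and treating the six-layer limit as the `max_depth` parameter; `associated_within` is a hypothetical name.

```python
from collections import deque


def associated_within(adjacency, start, max_depth=6):
    """Return {vertex: hop distance} for every vertex reachable from start within max_depth hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_depth:
            continue  # do not expand beyond the layer limit
        for n in adjacency.get(v, ()):
            if n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)
    del dist[start]
    return dist
```

For instance, a suspicious user linked to a device that another user also uses reaches that other user at distance 2, which is the kind of indirect association the screening rules look for.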

Table 1 Data loading time

This technical solution | NEO4J-OFFLINE | NEO4J-CYPHER
45375 s | Not completed within 24 hours | Not completed within 24 hours

Table 2 Data storage size

This technical solution | NEO4J-OFFLINE | NEO4J-CYPHER
609375 MB | 275950 MB | 1276175 MB

Table 3 Query performance (response time)

This technical solution | NEO4J-OFFLINE | NEO4J-CYPHER
7.5 ms | 55.0 ms | 34.1 ms

The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the embodiments and the description only illustrate the principles of the invention. Various changes and improvements can be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection is defined by the appended claims and their equivalents.

Claims

1. A Ceph-based object storage management method for a billion-level node-scale knowledge graph, characterized in that it comprises the following steps:

S1: construct the graph storage architecture: obtain entity data of multiple entities corresponding to the target business, and generate and store the knowledge graph corresponding to the target business from the entity data; use Ceph as the distributed resource storage in a Client/Server architecture, where the Ceph cluster consists of a small cluster of multiple Monitors, and multiple OSDs under this single Monitor cluster store the graph data;

S2: construct an external index backend: map the knowledge graph data into a fixed index data structure, use the Elasticsearch/Solr search engine as an external index plug-in to support non-equivalent (non-equality) queries, and combine it with an efficient indexing mechanism to build the external index backend;

S3: integrate a distributed computing engine: build the distributed computing engine on the Spark computing framework and use the GraphX library to convert graph relationships into Spark operators; GraphX stores the graph data as distributed RDDs on the nodes of the Ceph cluster, using a vertex RDD to store the vertex set and an edge RDD to store the edge set;

S4: manage the graph storage architecture: on the basis of the graph storage architecture, the external index backend and the integrated distributed computing engine, provide three-layer extension queries, data writing, data reading, cluster expansion, metadata backup, metadata snapshots, online transaction analysis and online analytical processing, thereby managing the graph data of the knowledge graph.
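Step S1 relies on Ceph's client/server design, in which a client that has obtained the cluster map from the Monitors computes for itself which OSDs hold an object, without consulting a central lookup table. The Python sketch below illustrates that two-step mapping (object to placement group, then placement group to an ordered OSD set). It is a simplification under stated assumptions: real Ceph hashes object names with rjenkins and maps placement groups to OSDs with the CRUSH algorithm, whereas here MD5 and rendezvous hashing stand in for both, and all function and object names are hypothetical.

```python
import hashlib


def _h(*parts):
    """Deterministic 64-bit hash of the joined string parts."""
    data = "/".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def object_to_pg(obj_name, pg_num):
    # Stand-in for Ceph's object-name hash folded onto pg_num placement groups.
    return _h(obj_name) % pg_num


def pg_to_osds(pg_id, osds, replicas=3):
    # Stand-in for CRUSH: rendezvous hashing ranks every OSD for this PG and
    # keeps the top `replicas`; the first OSD in the list acts as the primary.
    ranked = sorted(osds, key=lambda osd: _h(pg_id, osd), reverse=True)
    return ranked[:replicas]


osds = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
pg = object_to_pg("vertex:42", pg_num=128)
acting = pg_to_osds(pg, osds)
print(pg, acting)  # acting[0] is the primary OSD
```

Because the mapping is a pure function of the object name and the cluster map, every client computes the same placement, which is what lets the Monitors stay small while the OSD count grows.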

2. The Ceph-based object storage management method for a billion-level node-scale knowledge graph according to claim 1, characterized in that, in step S2, the efficient indexing mechanism comprises a graph index and a vertex-centric index, wherein the graph index is a global index structure over the entire knowledge graph, and the vertex-centric index is a local index structure built for each vertex.
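The two index types in claim 2 differ in scope. A minimal Python sketch, assuming in-memory structures and a hypothetical numeric edge sort key (for example a year or timestamp), shows a global graph index answering an equality lookup across the whole graph, and a vertex-centric index answering a range-filtered adjacency query for a single vertex. Class and method names are illustrative only and are not the API of any particular graph database.

```python
import bisect
from collections import defaultdict


class GraphIndex:
    """Global index over the whole graph: (property, value) -> vertex ids."""

    def __init__(self):
        self._idx = defaultdict(set)

    def add(self, vid, props):
        for key, value in props.items():
            self._idx[(key, value)].add(vid)

    def lookup(self, key, value):
        return self._idx.get((key, value), set())


class VertexCentricIndex:
    """Local index per vertex: incident edges kept sorted by a sort key, so a
    filtered scan of a high-degree vertex touches only the matching edges."""

    def __init__(self):
        self._keys = defaultdict(list)  # vid -> sorted edge sort keys
        self._nbrs = defaultdict(list)  # vid -> neighbors, aligned with _keys

    def add_edge(self, src, dst, sort_key):
        keys = self._keys[src]
        i = bisect.bisect_right(keys, sort_key)
        keys.insert(i, sort_key)
        self._nbrs[src].insert(i, dst)

    def neighbors_between(self, vid, low, high):
        keys = self._keys.get(vid, [])
        lo = bisect.bisect_left(keys, low)
        hi = bisect.bisect_right(keys, high)
        return self._nbrs.get(vid, [])[lo:hi]


gi = GraphIndex()
gi.add("u1", {"label": "user", "city": "Chengdu"})
gi.add("u2", {"label": "user"})

vci = VertexCentricIndex()
vci.add_edge("u1", "u2", 2019)
vci.add_edge("u1", "u3", 2021)
vci.add_edge("u1", "u4", 2020)
print(gi.lookup("label", "user"), vci.neighbors_between("u1", 2020, 2021))
```

The sorted per-vertex keys also show why an ordered structure supports non-equivalent (range) conditions, which a plain key-value lookup cannot.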

3. The Ceph-based object storage management method for a billion-level node-scale knowledge graph according to claim 1, characterized in that step S3 further comprises a partitioning operation, specifically comprising the following sub-steps:

S101: the vertex RDD hash-partitions the vertex data by vertex ID and distributes it over the cluster as multiple partitions;

S102: the edge RDD partitions the edge data according to a specified partitioning strategy and distributes it over the cluster as multiple partitions;

S103: a routing table that records, for the vertices in each vertex RDD partition, which edge RDD partitions reference them is stored in that vertex RDD partition; when an edge RDD partition needs vertex data, the vertex RDD sends the vertex data to that edge RDD partition according to the routing table.
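The partitioning in S101 to S103 can be illustrated with plain Python collections standing in for RDD partitions. This is a sketch under assumptions: vertex IDs are integers, `id % n` plays the role of the hash partitioner, and edges are partitioned by source ID, a hypothetical choice resembling GraphX's EdgePartition1D (the claim leaves the strategy open). All function names are hypothetical.

```python
from collections import defaultdict


def vertex_partition(vid, num_parts):
    # S101: hash partitioning by vertex ID (integer IDs; id % n as the hash).
    return vid % num_parts


def edge_partition(src, dst, num_parts):
    # S102: hypothetical strategy that partitions edges by source ID only.
    return src % num_parts


def build_routing_table(edges, num_vparts, num_eparts):
    # S103: per vertex partition, record which edge partitions use each vertex.
    routing = [defaultdict(set) for _ in range(num_vparts)]
    for src, dst in edges:
        ep = edge_partition(src, dst, num_eparts)
        for vid in (src, dst):
            routing[vertex_partition(vid, num_vparts)][vid].add(ep)
    return routing


def ship_vertices(routing, vertex_attrs):
    # S103: each vertex partition sends vertex data only where it is needed.
    outbox = defaultdict(dict)  # edge partition -> {vid: attribute}
    for part in routing:
        for vid, eparts in part.items():
            for ep in eparts:
                outbox[ep][vid] = vertex_attrs[vid]
    return outbox


edges = [(1, 2), (2, 3), (3, 4)]
routing = build_routing_table(edges, num_vparts=2, num_eparts=2)
outbox = ship_vertices(routing, {1: "a", 2: "b", 3: "c", 4: "d"})
print(outbox[0])  # {2: 'b', 3: 'c'}
```

Edge partition 0 holds only the edge (2, 3), so it receives the attributes of vertices 2 and 3 and nothing else; the routing table is what avoids broadcasting every vertex to every edge partition.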

4. The Ceph-based object storage management method for a billion-level node-scale knowledge graph according to claim 1, characterized in that the data writing step in step S4 comprises the following sub-steps:

S201: the client connects to the Monitor, obtains the cluster map, and sends its request to the corresponding primary OSD data node;

S202: the primary OSD data node simultaneously writes the data to the other two replica nodes and waits until the primary node and both replica nodes report that the write has completed, at which point the write is complete.
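The write path of S201 and S202 acknowledges a write only after all three copies succeed. The sketch below models OSDs as in-memory objects. The `OSD` class and `replicated_write` function are hypothetical, replica writes are issued sequentially rather than in parallel, and failure handling (retries, rollback, re-peering) is deliberately omitted.

```python
class OSD:
    """Hypothetical in-memory stand-in for an OSD data node."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.up = True

    def write(self, key, value):
        if not self.up:
            return False  # a down OSD cannot acknowledge the write
        self.store[key] = value
        return True


def replicated_write(primary, replicas, key, value):
    """S202: the primary writes its own copy and the replicas' copies; the
    write counts as complete only when every copy reports success."""
    acks = [primary.write(key, value)]
    acks.extend(replica.write(key, value) for replica in replicas)
    return all(acks)


osds = [OSD("osd.0"), OSD("osd.1"), OSD("osd.2")]
ok = replicated_write(osds[0], osds[1:], "vertex:42", b"payload")
osds[2].up = False
failed = replicated_write(osds[0], osds[1:], "vertex:43", b"payload")
print(ok, failed)  # True False
```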

5. The Ceph-based object storage management method for a billion-level node-scale knowledge graph according to claim 1, characterized in that the cluster expansion step in step S4 comprises the following sub-steps:

S301: the Client connects to the Monitor to obtain the cluster map; the new primary node OSD1 sends a request to the Monitor so that node OSD2 replaces OSD1 as the temporary primary node;

S302: the temporary primary node OSD2 synchronizes the full data set to the new primary node OSD1, while client I/O reads and writes go directly to the temporary primary node OSD2;

S303: the temporary primary node OSD2 receives the read/write I/O and simultaneously writes the data to the other two replica nodes; after all three copies (on OSD2 and the two replica nodes) have been written successfully, it returns a signal to the Client and the client I/O completes;

S304: once data synchronization to OSD1 is complete, the temporary primary node OSD2 sends a request to the Monitor and relinquishes the primary role; OSD1 becomes the primary node again and OSD2 becomes a replica node;

S305: meanwhile, at the graph data level, once the node expansion is complete, the graph data is partitioned according to the graph data cutting method and stored across multiple machines.
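Steps S301 to S304 describe a temporary-primary handover while a new OSD is being filled. The Python sketch below serializes what happens concurrently in a real cluster (client writes in S303 and the full synchronization in S302) and represents each OSD's store as a dict. The function name and data are hypothetical.

```python
def expand_with_temp_primary(osd1, osd2, replicas, client_writes):
    """Sketch of S301-S304; each OSD is a dict standing in for its object store.
    Returns the name of the primary after the expansion completes."""
    # S301: OSD1 is new and empty, so the Monitor makes OSD2 the temporary primary.
    primary = "osd2"
    # S303: client writes go to OSD2, which writes its own copy and both
    # replicas; the client is acknowledged once all three copies are written.
    for key, value in client_writes:
        for store in (osd2, *replicas):
            store[key] = value
    # S302: OSD2 synchronizes its full data set (including new writes) to OSD1.
    osd1.update(osd2)
    # S304: once OSD1 is in sync, OSD2 hands the primary role back to OSD1.
    if osd1 == osd2:
        primary = "osd1"
    return primary


osd1 = {}
osd2 = {"v:1": "a"}
r1, r2 = {"v:1": "a"}, {"v:1": "a"}
new_primary = expand_with_temp_primary(osd1, osd2, [r1, r2], [("v:2", "b")])
print(new_primary, osd1)  # osd1 {'v:1': 'a', 'v:2': 'b'}
```

The point of the temporary primary is that client I/O never waits on the empty new OSD; OSD1 takes over only after it holds a complete copy.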

6. The Ceph-based object storage management method for a billion-level node-scale knowledge graph according to claim 4, characterized in that the graph data cutting method comprises two cutting methods, cutting by vertex and cutting by edge. In cutting by vertex, the data is cut at the vertices of the graph: the cut line passes through vertices, each edge is stored only once and appears on only one machine, and a vertex with many neighboring vertices is distributed to multiple different machines. In cutting by edge, the data is cut along the edges of the graph: the cut line passes only through edges connecting vertices, each vertex is stored only once, and the cut edges are distributed to multiple different machines for storage.
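The difference between the two cutting methods of claim 6 is easiest to see on a star-shaped graph, where one vertex has many neighbors. In this Python sketch, edges are placed round-robin for the vertex cut and vertex placement is given explicitly for the edge cut; both placement rules and all names are hypothetical simplifications.

```python
def vertex_cut(edges, machines):
    """Cut by vertex: each edge is stored once on one machine; a vertex is
    copied to every machine that holds one of its edges."""
    edge_home = {}
    vertex_copies = {}
    for i, (u, v) in enumerate(edges):
        m = i % machines  # hypothetical round-robin edge placement
        edge_home[(u, v)] = m
        for x in (u, v):
            vertex_copies.setdefault(x, set()).add(m)
    return edge_home, vertex_copies


def edge_cut(edges, vertex_home):
    """Cut by edge: each vertex is stored once (vertex_home); an edge whose
    endpoints live on different machines is cut and stored on both."""
    edge_copies = {}
    for u, v in edges:
        edge_copies[(u, v)] = {vertex_home[u], vertex_home[v]}
    cut = [e for e, ms in edge_copies.items() if len(ms) > 1]
    return edge_copies, cut


# Star graph: hub "h" with four neighbors, split across 2 machines.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d")]
_, copies = vertex_cut(edges, machines=2)
print(copies["h"])  # {0, 1}: the high-degree hub is replicated
_, cut = edge_cut(edges, {"h": 0, "a": 0, "b": 1, "c": 0, "d": 1})
print(cut)  # [('h', 'b'), ('h', 'd')]
```

Vertex cutting pays with duplicated vertices, while edge cutting pays with duplicated edges; on graphs with a few very high-degree vertices, replicating those vertices is usually cheaper than cutting all of their edges.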

7. The Ceph-based object storage management method for a billion-level node-scale knowledge graph according to claim 1, characterized in that the metadata snapshot step in step S4 comprises: using the metadata information to restore the system effectively to a previous state, and also restoring the program to a historical state of system operation; saving the system data at a specific point in time and generating a report of the system for that point in time; and exporting snapshot data for offline work.
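The snapshot functions of claim 7 (restoring a previous state, reporting on a point in time, exporting for offline work) can be sketched with deep copies keyed by a time label. The `MetadataStore` class, its methods and the JSON export format are hypothetical; a production system would persist snapshots durably rather than keep them in memory.

```python
import copy
import json


class MetadataStore:
    """Hypothetical in-memory metadata store with point-in-time snapshots."""

    def __init__(self):
        self.current = {}
        self._snapshots = {}  # time label -> frozen copy of the metadata

    def set(self, key, value):
        self.current[key] = value

    def snapshot(self, label):
        # Save the metadata as it is at this point in time.
        self._snapshots[label] = copy.deepcopy(self.current)

    def restore(self, label):
        # Roll the live metadata back to a previously saved state.
        self.current = copy.deepcopy(self._snapshots[label])

    def report(self, label):
        # Point-in-time report: sorted (key, value) pairs of that snapshot.
        return sorted(self._snapshots[label].items())

    def export(self, label):
        # Serialized snapshot for offline work.
        return json.dumps(self._snapshots[label], sort_keys=True)


meta = MetadataStore()
meta.set("schema_version", 1)
meta.snapshot("2020-06-01T00:00")
meta.set("schema_version", 2)
meta.restore("2020-06-01T00:00")
print(meta.current)  # {'schema_version': 1}
```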

8. The Ceph-based object storage management method for a billion-level node-scale knowledge graph according to claim 1, characterized in that the three-layer extension query step in step S4 comprises the following sub-steps:

S401: take the vertex set Vset given by the user as the base data of the first-layer extension query, set the first-layer query filter condition to ConditionA on the vertex Label/vertex attributes, and perform the first-layer vertex extension query;

S402: take the set of vertices that satisfy the first-layer query filter condition as the base data of the second-layer extension query, set the second-layer query filter condition to ConditionB on the edge Label/edge attributes, and perform the second-layer edge extension query;

S403: take the set of edges that satisfy the second-layer query filter condition as the base data of the third-layer extension query, set attribute query conditions, perform the third-layer attribute extension query, and output the query result.
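The three layers of S401 to S403 can be read as: filter, then expand and filter, then filter and output. The following Python sketch follows that reading under assumptions the claim does not fix: edges are followed from source to destination only, conditions are plain predicates, and the third-layer attribute condition is applied to the vertex at the far end of each second-layer edge. The function name and data are hypothetical.

```python
def three_layer_query(vertices, edges, vset, condition_a, condition_b, condition_c):
    # S401: filter the user-given vertex set by vertex label/attributes.
    layer1 = {v for v in vset if condition_a(vertices[v])}
    # S402: expand to outgoing edges of layer-1 vertices, filtered by edge label/attributes.
    layer2 = [e for e in edges if e["src"] in layer1 and condition_b(e)]
    # S403: apply the attribute condition and output the matching (src, dst) pairs.
    return [(e["src"], e["dst"]) for e in layer2 if condition_c(vertices[e["dst"]])]


vertices = {
    "u1": {"label": "user", "risk": "high"},
    "u2": {"label": "user", "risk": "low"},
    "d1": {"label": "device", "os": "android"},
    "d2": {"label": "device", "os": "ios"},
}
edges = [
    {"src": "u1", "dst": "d1", "label": "uses"},
    {"src": "u1", "dst": "d2", "label": "uses"},
    {"src": "u2", "dst": "d1", "label": "uses"},
    {"src": "u1", "dst": "u2", "label": "follows"},
]
result = three_layer_query(
    vertices, edges, {"u1", "u2"},
    condition_a=lambda v: v["risk"] == "high",      # ConditionA (vertex)
    condition_b=lambda e: e["label"] == "uses",     # ConditionB (edge)
    condition_c=lambda v: v.get("os") == "android",  # attribute condition
)
print(result)  # [('u1', 'd1')]
```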

9. A Ceph-based object storage management system for a billion-level node-scale knowledge graph, characterized in that it comprises:

a graph data storage module, used for distributed storage of the object data of large-scale knowledge graphs and providing object storage, block device storage and file system services;

a distributed computing module, which uses Spark RDD in-memory computing to decompose a large task into multiple subtasks, distributes them to different machines for execution, and aggregates the results on completion, providing efficient large-scale data processing to support the OLAP requirements of users performing data analysis based on the knowledge graph;

an index module, used to map knowledge data into a fixed index data structure and provide users with graph index, vertex-centric index and external index functions; and

a metadata management module, used for metadata backup, metadata snapshots, program recovery, generation of point-in-time reports and offline system work.