
CN103729436A - Distributed metadata management method and system - Google Patents

Info

Publication number
CN103729436A
Authority
CN
China
Prior art keywords
node
metadata
data
replica
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310741599.XA
Other languages
Chinese (zh)
Inventor
王海平
王树鹏
张永铮
吴广君
周晓阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201310741599.XA
Publication of CN103729436A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a distributed metadata management method and system. The method specifically includes: a storage step, in which separate metadata nodes and user table nodes are set up to store metadata and user tables respectively, and multiple metadata nodes store multiple replicas of the metadata, forming a primary replica node and secondary replica nodes that store the same metadata; a verification step, in which data verification is performed on the primary replica node and the secondary replica nodes to ensure that the metadata they store is consistent; and a repair step, in which ZooKeeper is used to build a monitoring ring over the primary replica node and the secondary replica nodes, and when the ring detects that the primary replica node or a secondary replica node has gone down, it triggers a switch between the primary replica node and a secondary replica node, thereby repairing the failed node. The system corresponds one-to-one with the technical solution of the distributed metadata management method. The invention addresses problems in metadata management such as the single point of failure and consistency among multiple replicas.

Description

Distributed metadata management method and system
Technical field
The invention belongs to the field of mass data storage management and in particular relates to metadata management for big data storage; it is a distributed metadata management method and system.
Background
In recent years, with the development of the information society, more and more information has been digitized, and with the growth of the Internet in particular, data volumes have grown explosively. First, storage capacity has expanded sharply, which places greater demands on storage servers; second, data is kept for longer; finally, higher requirements are placed on the management of data storage. In particular, the diversity of data, its geographic dispersion, and the protection of important data all place higher requirements on data management. With the explosion of unstructured data, distributed databases have also entered a golden period of development and, from high-performance computing to data centers and from data sharing to Internet applications, have penetrated every aspect of data applications. Most distributed databases keep metadata and data separate, that is, they separate the control flow from the data flow, to obtain higher system scalability and I/O concurrency. The metadata management model is therefore critical: it directly affects system scalability, performance, reliability, and stability.
The growth of data storage capacity is unbounded, which also places higher requirements on metadata management. In distributed storage, many machines may read and write the metadata tables at the same time, so the metadata management system must provide a highly stable, high-performance metadata service. Existing metadata management strategies fall roughly into three classes: centralized metadata management, no metadata service, and distributed metadata management. The centralized approach provides one central metadata server that is responsible for storing metadata and answering client queries; it provides a unified namespace and handles access control functions such as location resolution and data locating. Its shortcomings are prominent, and the two most critical are the performance bottleneck and the single point of failure. The no-metadata-service approach uses an elastic hash algorithm and abandons the metadata service altogether, storing metadata together with the data. This makes data consistency more complicated, makes reads and writes inefficient, and lacks global monitoring and management. It also pushes more functionality onto the client, which increases client load and consumes considerable CPU and memory. The traditional distributed approach uses multiple servers that form a cluster and cooperatively provide the metadata service for the distributed database, which eliminates the performance bottleneck and single point of failure of the centralized model as well as the inefficiency and the difficulty of global monitoring of the no-metadata-service approach. The traditional distributed approach has its own defects, however, such as performance overhead and consistency among multiple replicas.
Therefore, to address the limitations of metadata management in the prior art, the present invention proposes a new distributed metadata management method and system.
Summary of the invention
The technical problem to be solved by the present invention is to provide a distributed metadata management method and system that address problems in existing metadata management such as the single point of failure and consistency among multiple replicas.
The technical solution by which the present invention solves the above technical problem is as follows: a distributed metadata management method, specifically comprising the following steps:
Storage step: separate metadata nodes and user table nodes are set up to store metadata and user tables respectively, and multiple metadata nodes store multiple replicas of the metadata, forming a primary replica node and secondary replica nodes that all store the same metadata;
Verification step: data verification is performed on the primary replica node and the secondary replica nodes to ensure that the metadata they store is consistent;
Repair step: ZooKeeper is used to build a monitoring ring over the primary replica node and the secondary replica nodes; when the monitoring ring detects that the primary replica node or a secondary replica node has gone down, it triggers a switch between the primary replica node and a secondary replica node, thereby repairing the failed node.
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, the storage step also includes expanding the metadata nodes in a dynamic mode or a static mode;
The dynamic mode specifically includes: adding an empty metadata node and, after the empty metadata node has been discovered through verification, transmitting metadata to it;
The static mode specifically includes: after all metadata nodes have been shut down, adding a new metadata node, and modifying its configuration when the newly added metadata node starts.
Further, the data verification of the primary replica node and the secondary replica nodes uses a lightweight data verification mode, which specifically includes: when a metadata node starts, it sends a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts differ, the data is inconsistent, so the data shard service on the non-conforming metadata node is shut down, the data of that data shard is deleted, and a replica repair operation is triggered.
Further, the data verification of the primary replica node and the secondary replica nodes uses a periodic data shard file verification mode, which specifically includes: each metadata node periodically checks whether the data files of the data shards it maintains have been lost; if a loss is found, it stops serving that data shard on the current node, deletes the data of that data shard, and immediately triggers a replica repair operation.
Further, the data verification of the primary replica node and the secondary replica nodes uses periodic verification between the different replicas of a data shard: the primary replica node obtains its own partitioning basis and sends the partitioning basis and a verification request to the secondary replica nodes; the primary replica node and the secondary replica nodes all compute md5 values according to this partitioning basis and store them in a check_map; each secondary replica node returns its check_map to the primary replica node, which compares each received check_map with its own; if the data of all secondary replica nodes is consistent, the data is considered consistent, and otherwise the data of the primary replica prevails.
Further, in the repair step, whether the primary replica node or a secondary replica node has gone down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.
Corresponding to the above distributed metadata management method, the technical solution of the present invention also includes a distributed metadata management system, which specifically comprises the following modules:
Storage module, configured to set up separate metadata nodes and user table nodes that store metadata and user tables respectively, and to store multiple replicas of the metadata on multiple metadata nodes, forming a primary replica node and secondary replica nodes that all store the same metadata;
Verification module, configured to perform data verification on the primary replica node and the secondary replica nodes to ensure that the metadata they store is consistent;
Repair module, configured to use ZooKeeper to build a monitoring ring over the primary replica node and the secondary replica nodes; when the monitoring ring detects that the primary replica node or a secondary replica node has gone down, it triggers a switch between the primary replica node and a secondary replica node, thereby repairing the failed node.
Further, the storage module is also configured to expand the metadata nodes in a dynamic mode or a static mode;
The dynamic mode specifically includes: adding an empty metadata node and, after the empty metadata node has been discovered through verification, transmitting metadata to it;
The static mode specifically includes: after all metadata nodes have been shut down, adding a new metadata node, and modifying its configuration when the newly added metadata node starts.
Further, the verification module comprises a lightweight data verification module, a periodic data shard file verification module, and a periodic data shard replica verification module;
The lightweight data verification module is configured so that, when a metadata node starts, it sends a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts differ, the data is inconsistent, so the data shard service on the non-conforming metadata node is shut down, the data of that data shard is deleted, and a replica repair operation is triggered.
The periodic data shard file verification module is configured so that each metadata node periodically checks whether the data files of the data shards it maintains have been lost; if a loss is found, it stops serving that data shard on the current node, deletes the data of that data shard, and immediately triggers a replica repair operation.
The periodic data shard replica verification module is configured so that the primary replica node obtains its own partitioning basis and sends the partitioning basis and a verification request to the secondary replica nodes; the primary replica node and the secondary replica nodes all compute md5 values according to this partitioning basis and store them in a check_map; each secondary replica node returns its check_map to the primary replica node, which compares each received check_map with its own; if the data of all secondary replica nodes is consistent, the data is considered consistent, and otherwise the data of the primary replica prevails.
Further, in the repair module, whether the primary replica node or a secondary replica node has gone down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.
The beneficial effects of the invention are as follows. The invention stores metadata on different nodes from the user tables, so a high load on the user table nodes does not affect metadata reads and writes, which improves the stability and efficiency of metadata access. The invention also supports dynamic expansion of metadata nodes and multi-replica storage of metadata, which reduces the risk posed by a metadata node going down. The invention includes a metadata verification stage that keeps the metadata stored on each replica node consistent and keeps the metadata cluster stable. In addition, because multiple replicas are stored, when an abnormal metadata node is killed, the other available metadata nodes can quickly complete promotion and repair, avoiding the single point of failure that readily occurs in metadata management.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the distributed metadata management method of the present invention;
Fig. 2 is a schematic diagram of dynamically adding a metadata node in an embodiment of the present invention;
Fig. 3 is a schematic diagram of a metadata node triggering replica repair in an embodiment of the present invention;
Fig. 4 is a schematic flow chart of data verification between the different replicas of a data shard, performed periodically, in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the monitoring-ring promotion and repair process in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the distributed metadata management system of the present invention.
Embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples serve only to explain the present invention and are not intended to limit its scope.
As shown in Fig. 1, this embodiment provides a distributed metadata management method, specifically comprising the following steps:
Storage step: separate metadata nodes and user table nodes are set up to store metadata and user tables respectively, and multiple metadata nodes store multiple replicas of the metadata, forming a primary replica node and secondary replica nodes that all store the same metadata;
Verification step: data verification is performed on the primary replica node and the secondary replica nodes to ensure that the metadata they store is consistent;
Repair step: ZooKeeper is used to build a monitoring ring over the primary replica node and the secondary replica nodes; when the monitoring ring detects that the primary replica node or a secondary replica node has gone down, it triggers a switch between the primary replica node and a secondary replica node, thereby repairing the failed node.
Based on these three steps, the implementation of this embodiment is divided into the following three parts.
I. Metadata storage and metadata node expansion
Metadata and user table data are stored separately, and metadata nodes support both dynamic and static expansion. Because metadata and user tables are stored on different nodes, a high load on the user table nodes does not affect metadata reads and writes, which improves the stability and efficiency of metadata access.
During metadata storage, multiple metadata nodes store multiple replicas of the metadata to provide fault tolerance. To reduce the workload of the Master node and at the same time increase the cluster scale of the mass storage system, a primary/secondary replica mechanism is introduced. In addition, metadata nodes may need to be expanded according to actual conditions; expansion can be dynamic or static. The dynamic mode specifically includes: adding an empty metadata node and, after the empty metadata node has been discovered through verification, transmitting metadata to it. The static mode specifically includes: after all metadata nodes have been shut down, adding a new metadata node, and modifying its configuration when the newly added metadata node starts.
Fig. 2 shows the processing flow when a metadata node is added dynamically. META_RS_01 and META_RS_02 denote metadata nodes that hold data, META_RS_03 denotes the newly added metadata node that holds no data, and Master is the management process responsible for all data nodes. When the newly added metadata node META_RS_03 starts, it actively registers with the Master node as a connection object, and the Master saves the data structure of this connection object. The Master periodically scans the connection-object data structures it has saved to determine whether any node has registered recently. If so, it performs the following operations: it takes out the connection object, registers the newly added metadata node with ZK (short for ZooKeeper), and updates the ring configuration of the metadata nodes maintained by ZK (that is, the monitoring ring built through ZooKeeper). As shown in Fig. 3, a periodically triggered thread of the Master takes out the saved connection-object data structure, triggers replica repair, and sends the connection-object data structure to the primary replica node. The primary replica node performs the replica repair operation: it transfers the data of the data shards of all its metadata tables to the newly added metadata node and starts the corresponding replica service, so that the newly added metadata node serves as a secondary replica node.
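The registration and repair flow of Figs. 2 and 3 can be sketched as a small in-memory simulation. This is only an illustration under assumptions: the class names `MetaNode` and `Master`, the `pending` list that stands in for the saved connection objects, and the `ring` list that stands in for the ring configuration kept in ZooKeeper are all hypothetical, and no real ZooKeeper client is involved.

```python
from dataclasses import dataclass, field


@dataclass
class MetaNode:
    """A metadata node; shards maps shard id -> records (hypothetical layout)."""
    name: str
    shards: dict = field(default_factory=dict)
    role: str = "empty"  # "primary", "secondary", or "empty"


@dataclass
class Master:
    """Management process that tracks registrations and the monitoring ring."""
    primary: MetaNode
    ring: list = field(default_factory=list)     # stand-in for the ring kept in ZK
    pending: list = field(default_factory=list)  # saved connection objects

    def register(self, node):
        # A newly started node registers itself; the Master only records it.
        self.pending.append(node)

    def scan_once(self):
        # One pass of the periodic scan: admit every recently registered node.
        while self.pending:
            node = self.pending.pop(0)
            self.ring.append(node.name)  # update the ring configuration
            self.repair_replica(node)    # trigger replica repair

    def repair_replica(self, node):
        # The primary copies all of its shards to the new node, which then
        # starts serving as a secondary replica.
        node.shards = {sid: list(rows) for sid, rows in self.primary.shards.items()}
        node.role = "secondary"


# Usage: META_RS_03 starts empty and joins a ring that already has two nodes.
primary = MetaNode("META_RS_01", {"meta#0": ["r1", "r2"]}, role="primary")
master = Master(primary=primary, ring=["META_RS_01", "META_RS_02"])
new_node = MetaNode("META_RS_03")
master.register(new_node)
master.scan_once()
```

After the scan, `new_node` holds a copy of the primary's shards and appears at the end of the ring. A real deployment would run the scan on a timer and transfer shard data over the network.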
Metadata nodes are therefore expanded to meet the need for multi-replica storage of metadata and to further prevent a metadata node from going down because it stores too much data. A newly added metadata node serves as a secondary replica node, and the node originally used to store the metadata serves as the primary replica node, which helps later in handling metadata node failures.
II. Data verification
After multi-replica storage is adopted, the consistency of the data on the primary replica node and the secondary replica nodes must be considered, so data verification is performed on them. While data verification is in progress, no read/write service is provided externally. The verification performed when a metadata node restarts is lightweight, and the periodic verification at run time is performed at the memory level.
This embodiment mainly uses three verification modes:
First, the lightweight data verification mode, which specifically includes: when a metadata node starts, it sends a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts differ, the data is inconsistent, so the data shard service on the non-conforming metadata node is shut down, the data of that data shard is deleted, and a replica repair operation is triggered. The lightweight verification on restart also serves to confirm the state after the restart.
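The record-count comparison might look like the sketch below. The text does not say how the non-conforming node is chosen when counts differ, so this sketch assumes the primary replica's counts are the reference; that choice, the function name `find_inconsistent_shards`, and the shape of the `counts` dictionary are illustrative assumptions.

```python
def find_inconsistent_shards(counts, primary):
    """Return (node, shard_id) pairs whose record count differs from the primary.

    counts maps node name -> {shard_id: record_count}, as gathered at start-up.
    Assumption: the primary replica's counts are treated as the reference.
    """
    reference = counts[primary]
    mismatches = []
    for node, shard_counts in counts.items():
        if node == primary:
            continue
        for shard_id, expected in reference.items():
            # A missing shard counts as a mismatch as well.
            if shard_counts.get(shard_id) != expected:
                mismatches.append((node, shard_id))
    return mismatches


# Usage: META_RS_03 is one record short on shard "meta#1".
counts = {
    "META_RS_01": {"meta#0": 120, "meta#1": 75},
    "META_RS_02": {"meta#0": 120, "meta#1": 75},
    "META_RS_03": {"meta#0": 120, "meta#1": 74},
}
mismatches = find_inconsistent_shards(counts, primary="META_RS_01")
```

For each pair returned, the node would stop serving that shard, delete its copy, and trigger replica repair. Comparing counts is cheap but cannot detect two replicas that hold the same number of different records, which is why the md5-based check exists as a separate mode.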
Second, the periodic data shard file verification mode, which specifically includes: each metadata node periodically checks whether the data files of the data shards it maintains have been lost; if a loss is found, it stops serving that data shard on the current node, deletes the data of that data shard, and immediately triggers a replica repair operation.
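A minimal sketch of the file check, assuming each shard is backed by one or more files on local disk. The mapping from shard id to file paths and the function name `find_missing_shards` are hypothetical; the text does not describe the on-disk layout.

```python
import os
import tempfile


def find_missing_shards(shard_files):
    """Return the ids of shards that have at least one missing data file.

    shard_files maps a shard id to the list of file paths that back it
    (a hypothetical layout chosen for this sketch).
    """
    return sorted(
        shard_id
        for shard_id, paths in shard_files.items()
        if not all(os.path.isfile(path) for path in paths)
    )


# Usage: one shard still has its file on disk, the other has lost it.
with tempfile.TemporaryDirectory() as data_dir:
    present = os.path.join(data_dir, "meta_0.dat")
    lost = os.path.join(data_dir, "meta_1.dat")
    with open(present, "w") as handle:
        handle.write("rows")
    missing = find_missing_shards({"meta#0": [present], "meta#1": [lost]})
# For each id in `missing`, the node would stop serving the shard, delete
# its remaining data, and trigger replica repair.
```

In a running node this function would be called from a timer; the check interval is not specified in the text.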
Third, data verification between the different replicas of a data shard, performed periodically. As shown in Fig. 4, there are three replica nodes: one primary replica node and two secondary replica nodes. The primary replica node obtains its own check_set, which stores the partitioning basis, and sends the check_set and a verification request to the secondary replica nodes. The primary replica node and the secondary replica nodes all compute md5 values according to this partitioning basis and store them in a check_map. Each secondary replica node returns its check_map to the primary replica node, which compares each received check_map with its own, completing the verification of the three replica nodes. If the data of all secondary replica nodes is consistent, the data is considered consistent; otherwise the data of the primary replica prevails. check_map is a variable with a mapping structure: its key identifies the current data shard, and its value is the md5 value of that data shard.
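The check_map exchange can be sketched with Python's standard `hashlib`. The text does not define what the partitioning basis in check_set contains, so this sketch assumes it is simply a list of shard ids and that a shard is a list of record strings; both are illustrative assumptions, as are the function names.

```python
import hashlib


def build_check_map(shards, check_set):
    """Compute shard id -> md5 hex digest for every shard named in check_set.

    shards maps a shard id to a list of record strings. Assumption: the
    partitioning basis is just the list of shard ids to verify.
    """
    check_map = {}
    for shard_id in check_set:
        digest = hashlib.md5()
        for record in shards.get(shard_id, []):
            digest.update(record.encode("utf-8"))
            digest.update(b"\n")  # separator so record boundaries matter
        check_map[shard_id] = digest.hexdigest()
    return check_map


def compare_check_maps(primary_map, secondary_maps):
    """Return secondary name -> shard ids whose md5 differs from the primary."""
    diffs = {}
    for name, secondary_map in secondary_maps.items():
        bad = [sid for sid, md5 in primary_map.items() if secondary_map.get(sid) != md5]
        if bad:
            diffs[name] = bad
    return diffs


# Usage: three replicas as in Fig. 4; META_RS_03 has diverged on "meta#1".
check_set = ["meta#0", "meta#1"]
primary_data = {"meta#0": ["a", "b"], "meta#1": ["c"]}
secondary_data = {
    "META_RS_02": {"meta#0": ["a", "b"], "meta#1": ["c"]},
    "META_RS_03": {"meta#0": ["a", "b"], "meta#1": ["x"]},
}
primary_map = build_check_map(primary_data, check_set)
diffs = compare_check_maps(
    primary_map,
    {name: build_check_map(data, check_set) for name, data in secondary_data.items()},
)
```

An empty `diffs` means the replicas agree. Otherwise the listed shards on each secondary would be overwritten from the primary, since the primary's data prevails.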
III. Repair and promotion
Once the data on the primary replica node and the secondary replica nodes is consistent, the primary and secondary replica nodes are used to handle metadata node failures.
This embodiment supports promotion and repair of secondary replica nodes. At start-up, a monitoring ring is built through ZooKeeper; when a metadata process dies abnormally, promotion and repair are triggered quickly according to the role and number of the dead processes. ZooKeeper is a reliable coordination system for large distributed systems. Whether the primary replica node or a secondary replica node has gone down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.
When the primary replica node of the metadata goes down, the first secondary replica node is selected to take over its work so that the external service of the metadata tables is not interrupted. ZK (short for ZooKeeper) is used to monitor the nodes and to trigger the switch from the metadata primary replica to a secondary replica. The main steps are as follows:
1) A directory structure of the form /root node/parent node/ephemeral node is created in ZK.
2) Each META replica establishes a session with ZK and creates an ephemeral node under the parent node, writing its own agent address into it; if the session expires, this ephemeral node disappears.
3) As an example, as shown in Fig. 5, META_RS_01 is the primary replica node, and META_RS_02, META_RS_03, and META_RS_04 are secondary replica nodes. META_RS_02 watches whether the ephemeral node of META_RS_01 exists; if the session of the primary replica node expires, the ephemeral node corresponding to the primary replica node disappears. At that point the first secondary replica node, META_RS_02, is promoted to primary replica node. After being promoted, META_RS_02 switches to watching the last secondary replica node, META_RS_04, as shown by the dashed line in the figure.
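The promotion logic in the steps above can be illustrated without a ZooKeeper server by modeling the ring as an ordered list in which each node watches its predecessor. This is a simplified simulation under assumptions: the function names are hypothetical, session expiry is passed in as the name of the dead node, and the rule that every node watches the previous node in the ring is inferred from the Fig. 5 example rather than stated in general.

```python
def watch_targets(ring):
    """Map each node to the node whose ephemeral node it watches.

    Assumption: every node watches its predecessor, and the first node
    wraps around to watch the last one, which closes the ring.
    """
    if len(ring) < 2:
        return {}
    return {node: ring[index - 1] for index, node in enumerate(ring)}


def handle_session_expiry(ring, primary, dead):
    """Drop the dead node from the ring and promote its successor if needed.

    Returns (surviving ring, current primary).
    """
    if dead not in ring:
        return list(ring), primary
    survivors = [node for node in ring if node != dead]
    if not survivors:
        raise RuntimeError("no metadata node left to take over")
    if dead == primary:
        # The first secondary after the failed primary takes over.
        successor = ring[(ring.index(dead) + 1) % len(ring)]
        return survivors, successor
    return survivors, primary


# Usage: the Fig. 5 scenario, where the primary META_RS_01 loses its session.
ring = ["META_RS_01", "META_RS_02", "META_RS_03", "META_RS_04"]
before = watch_targets(ring)
ring_after, new_primary = handle_session_expiry(ring, "META_RS_01", "META_RS_01")
after = watch_targets(ring_after)
```

In this run META_RS_02 watches META_RS_01 before the failure and META_RS_04 afterwards, matching the dashed line described for Fig. 5. With a real ZooKeeper ensemble, the watch would be set on the predecessor's ephemeral node and the expiry would arrive as a node-deleted event.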
The method of this embodiment is markedly effective. Without it, monitoring shows that after a metadata node is killed, loading cannot complete: it stops at about 89% and begins to report errors, and afterwards user data and metadata cannot be queried, inserted, modified, or deleted. With it, after a metadata node is killed, as long as one metadata node remains, loading pauses briefly (about 30 seconds in the example test) and then continues to 100%, and user data and metadata can subsequently be operated on normally.
As shown in Fig. 6, corresponding to the above distributed metadata management method, the technical solution of the present invention also includes a distributed metadata management system, which specifically comprises the following modules:
Storage module, configured to set up separate metadata nodes and user table nodes that store metadata and user tables respectively, and to store multiple replicas of the metadata on multiple metadata nodes, forming a primary replica node and secondary replica nodes that all store the same metadata;
Verification module, configured to perform data verification on the primary replica node and the secondary replica nodes to ensure that the metadata they store is consistent;
Repair module, configured to use ZooKeeper to build a monitoring ring over the primary replica node and the secondary replica nodes; when the monitoring ring detects that the primary replica node or a secondary replica node has gone down, it triggers a switch between the primary replica node and a secondary replica node, thereby repairing the failed node.
In this embodiment, the storage module is also configured to expand the metadata nodes in a dynamic mode or a static mode. The dynamic mode specifically includes: adding an empty metadata node and, after the empty metadata node has been discovered through verification, transmitting metadata to it. The static mode specifically includes: after all metadata nodes have been shut down, adding a new metadata node, and modifying its configuration when the newly added metadata node starts.
Also as shown in Fig. 6, the verification module comprises a lightweight data verification module, a periodic data shard file verification module, and a periodic data shard replica verification module;
The lightweight data verification module is configured so that, when a metadata node starts, it sends a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts differ, the data is inconsistent, so the data shard service on the non-conforming metadata node is shut down, the data of that data shard is deleted, and a replica repair operation is triggered.
The periodic data shard file verification module is configured so that each metadata node periodically checks whether the data files of the data shards it maintains have been lost; if a loss is found, it stops serving that data shard on the current node, deletes the data of that data shard, and immediately triggers a replica repair operation.
The periodic data shard replica verification module is configured so that the primary replica node obtains its own partitioning basis and sends the partitioning basis and a verification request to the secondary replica nodes; the primary replica node and the secondary replica nodes all compute md5 values according to this partitioning basis and store them in a check_map; each secondary replica node returns its check_map to the primary replica node, which compares each received check_map with its own; if the data of all secondary replica nodes is consistent, the data is considered consistent, and otherwise the data of the primary replica prevails.
In addition, in the repair module, whether the primary replica node or a secondary replica node has gone down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.
The implementation of the distributed metadata management system is consistent with that of the distributed metadata management method described above and is not repeated here.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A distributed metadata management method, characterized in that it comprises the following steps:
a storage step: dividing nodes into independent metadata nodes and user table nodes, which store metadata and user tables respectively, and using multiple metadata nodes to store multiple replicas of the metadata, thereby forming a master replica node and slave replica nodes that all store the same metadata;
a verification step: performing data verification on the master replica node and the slave replica nodes to ensure that the metadata stored on the master replica node and the slave replica nodes is consistent;
a repair step: using ZooKeeper to establish a monitoring ring based on the master replica node and the slave replica nodes; when the monitoring ring detects that a master replica node or a slave replica node is down, it triggers a switch between the master replica node and a slave replica node, thereby repairing the down node.

2. The distributed metadata management method according to claim 1, characterized in that the storage step further comprises expanding the metadata nodes in a dynamic manner or a static manner;
the dynamic manner comprises: adding empty metadata nodes and, after the empty metadata nodes are discovered through verification, transmitting metadata to the discovered empty metadata nodes;
the static manner comprises: adding a new metadata node after all metadata nodes have been shut down, and modifying the configuration of the newly added metadata node when it starts.

3. The distributed metadata management method according to claim 1, characterized in that the data verification of the master replica node and the slave replica nodes uses a lightweight data verification mode, which comprises: when a metadata node starts, sending a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts differ, which indicates that the data is inconsistent, shutting down the data shard service on the non-conforming metadata node, deleting the data of that data shard, and triggering a replica repair operation.

4. The distributed metadata management method according to claim 1, characterized in that the data verification of the master replica node and the slave replica nodes uses a periodic data shard file verification mode, which comprises: each metadata node periodically checks whether any data file of the data shards it maintains has been lost; if a loss is found, it stops the data service of that data shard on the current node, deletes the data of that data shard, and immediately triggers a replica repair operation.

5. The distributed metadata management method according to claim 1, characterized in that the data verification of the master replica node and the slave replica nodes uses a periodic verification mode between the different replicas of a data shard: the master replica node obtains its own chunking basis and sends the chunking basis together with a verification request to the slave replica nodes; the master replica node and the slave replica nodes each compute md5 values according to the chunking basis and store the md5 values in a check_map; each slave replica node returns its check_map to the master replica node; the master replica node receives the check_map of each slave replica and compares it with its own check_map; if the data of all slave replica nodes is consistent, the data is considered consistent; otherwise the data of the master replica prevails.

6. The distributed metadata management method according to claim 1, characterized in that, in the repair step, whether a master replica node or a slave replica node is down is determined by judging whether the session between the metadata node and ZooKeeper has expired; if the session has expired, the node is down; otherwise it is not down.

7. A distributed metadata management system, characterized in that it comprises the following modules:
a storage module, configured to divide nodes into independent metadata nodes and user table nodes, which store metadata and user tables respectively, and to use multiple metadata nodes to store multiple replicas of the metadata, thereby forming a master replica node and slave replica nodes that all store the same metadata;
a verification module, configured to perform data verification on the master replica node and the slave replica nodes to ensure that the metadata stored on the master replica node and the slave replica nodes is consistent;
a repair module, configured to use ZooKeeper to establish a monitoring ring based on the master replica node and the slave replica nodes; when the monitoring ring detects that a master replica node or a slave replica node is down, it triggers a switch between the master replica node and a slave replica node, thereby repairing the down node.

8. The distributed metadata management system according to claim 7, characterized in that the storage module is further configured to expand the metadata nodes in a dynamic manner or a static manner;
the dynamic manner comprises: adding empty metadata nodes and, after the empty metadata nodes are discovered through verification, transmitting metadata to the discovered empty metadata nodes;
the static manner comprises: adding a new metadata node after all metadata nodes have been shut down, and modifying the configuration of the newly added metadata node when it starts.

9. The distributed metadata management system according to claim 7, characterized in that the verification module comprises a lightweight data verification module, a periodic data shard file verification module, and a periodic data shard replica verification module;
the lightweight data verification module is configured to: when a metadata node starts, send a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts differ, which indicates that the data is inconsistent, shut down the data shard service on the non-conforming metadata node, delete the data of that data shard, and trigger a replica repair operation;
the periodic data shard file verification module is configured to: cause each metadata node to periodically check whether any data file of the data shards it maintains has been lost; if a loss is found, stop the data service of that data shard on the current node, delete the data of that data shard, and immediately trigger a replica repair operation;
the periodic data shard replica verification module is configured to: cause the master replica node to obtain its own chunking basis and send the chunking basis together with a verification request to the slave replica nodes; the master replica node and the slave replica nodes each compute md5 values according to the chunking basis and store the md5 values in a check_map; each slave replica node returns its check_map to the master replica node; the master replica node receives the check_map of each slave replica and compares it with its own check_map; if the data of all slave replica nodes is consistent, the data is considered consistent; otherwise the data of the master replica prevails.

10. The distributed metadata management system according to claim 7, characterized in that the repair module determines whether a master replica node or a slave replica node is down by judging whether the session between the metadata node and ZooKeeper has expired; if the session has expired, the node is down; otherwise it is not down.
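
For illustration only, the check_map comparison recited in claims 5 and 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the chunking basis is assumed to be a list of half-open integer key ranges, each replica is modeled as an in-memory dict, and the names build_check_map, find_inconsistent_chunks and verify_and_repair are hypothetical. In a real deployment each slave replica node would compute its own check_map and return it over the network.

```python
import hashlib


def build_check_map(records, chunk_ranges):
    """Compute one MD5 digest per chunk of a replica.

    chunk_ranges plays the role of the "chunking basis"; modeling it as
    half-open key ranges [start, end) is an assumption of this sketch.
    """
    check_map = {}
    for start, end in chunk_ranges:
        digest = hashlib.md5()
        # Hash records in sorted key order so equal data gives equal digests.
        for key in sorted(k for k in records if start <= k < end):
            digest.update(f"{key}={records[key]}\n".encode("utf-8"))
        check_map[(start, end)] = digest.hexdigest()
    return check_map


def find_inconsistent_chunks(master_map, slave_map):
    """Return the chunks whose MD5 on the slave differs from the master."""
    return [chunk for chunk, md5 in master_map.items()
            if slave_map.get(chunk) != md5]


def verify_and_repair(master, slaves, chunk_ranges):
    """Compare every slave with the master; the master's data prevails."""
    master_map = build_check_map(master, chunk_ranges)
    report = {}
    for name, slave in slaves.items():
        # Computed locally here; a slave node would return this map itself.
        slave_map = build_check_map(slave, chunk_ranges)
        bad_chunks = find_inconsistent_chunks(master_map, slave_map)
        for start, end in bad_chunks:
            # Drop the slave's copy of the chunk, then copy the master's.
            for key in [k for k in slave if start <= k < end]:
                del slave[key]
            slave.update({k: v for k, v in master.items()
                          if start <= k < end})
        report[name] = bad_chunks
    return report


if __name__ == "__main__":
    master = {1: "a", 2: "b", 11: "c"}
    slaves = {"slave1": dict(master), "slave2": {1: "a", 2: "X", 11: "c"}}
    print(verify_and_repair(master, slaves, [(0, 10), (10, 20)]))
```

Comparing per-chunk digests rather than whole replicas means that only the chunks that actually differ need to be overwritten from the master replica.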
CN201310741599.XA 2013-12-27 2013-12-27 Distributed metadata management method and system Pending CN103729436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310741599.XA CN103729436A (en) 2013-12-27 2013-12-27 Distributed metadata management method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310741599.XA CN103729436A (en) 2013-12-27 2013-12-27 Distributed metadata management method and system

Publications (1)

Publication Number Publication Date
CN103729436A true CN103729436A (en) 2014-04-16

Family

ID=50453510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310741599.XA Pending CN103729436A (en) 2013-12-27 2013-12-27 Distributed metadata management method and system

Country Status (1)

Country Link
CN (1) CN103729436A (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104468569A (en) * 2014-12-04 2015-03-25 北京国双科技有限公司 Integrity detection method and device of distributed data
CN105243125A (en) * 2015-09-29 2016-01-13 北京京东尚科信息技术有限公司 PrestoDB cluster running method and apparatus, cluster and data query method and apparatus
CN105550229A (en) * 2015-12-07 2016-05-04 北京奇虎科技有限公司 Method and device for repairing data of distributed storage system
CN105550230A (en) * 2015-12-07 2016-05-04 北京奇虎科技有限公司 Method and device for detecting failure of node of distributed storage system
CN105589887A (en) * 2014-10-24 2016-05-18 中兴通讯股份有限公司 Data processing method for distributed file system and distributed file system
CN105610903A (en) * 2015-12-17 2016-05-25 北京奇虎科技有限公司 Data node upgrading method and device for distributed system
CN105681401A (en) * 2015-12-31 2016-06-15 深圳前海微众银行股份有限公司 Distributed architecture
CN105892954A (en) * 2016-04-25 2016-08-24 乐视控股(北京)有限公司 Data storage method and device based on multiple copies
CN106293980A (en) * 2016-07-26 2017-01-04 乐视控股(北京)有限公司 Data recovery method and system for distributed storage cluster
CN106945691A (en) * 2017-04-10 2017-07-14 湖南中车时代通信信号有限公司 The real-time hot standby switch device of server multicenter of automatic train monitor
CN107219997A (en) * 2016-03-21 2017-09-29 阿里巴巴集团控股有限公司 A kind of method and device for being used to verify data consistency
WO2017219678A1 (en) * 2016-06-22 2017-12-28 杭州海康威视数字技术股份有限公司 Data recovery method and device, and cloud storage system
CN108259543A (en) * 2016-12-29 2018-07-06 广东中科遥感技术有限公司 Distributed cloud storage database and method for deploying same in multiple data centers
CN109407977A (en) * 2018-09-25 2019-03-01 佛山科学技术学院 A kind of big data distributed storage management method and system
CN109614037A (en) * 2018-11-16 2019-04-12 新华三技术有限公司成都分公司 Data routing inspection method, apparatus and distributed memory system
CN109614164A (en) * 2018-11-29 2019-04-12 深圳前海微众银行股份有限公司 Method, apparatus, device and readable storage medium for realizing plug-in configurability
CN109947730A (en) * 2017-07-25 2019-06-28 中兴通讯股份有限公司 Metadata restoration methods, device, distributed file system and readable storage medium storing program for executing
CN110471934A (en) * 2019-08-19 2019-11-19 泰康保险集团股份有限公司 Method of calibration, device, medium and the electronic equipment of business datum
CN111124301A (en) * 2019-12-18 2020-05-08 深圳供电局有限公司 Data consistency storage method and system of object storage device
CN111241011A (en) * 2019-12-31 2020-06-05 清华大学 A Global Address Space Management Method for Distributed Persistent Memory
CN111695018A (en) * 2019-03-13 2020-09-22 阿里巴巴集团控股有限公司 Data processing method and device, distributed network system and computer equipment
CN111949210A (en) * 2017-06-28 2020-11-17 华为技术有限公司 Metadata storage method, system and storage medium in distributed storage system
WO2020232859A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Distributed storage system, data writing method, device, and storage medium
CN112363675A (en) * 2020-11-18 2021-02-12 苏州元核云技术有限公司 Control method and system based on distributed storage system
CN112711376A (en) * 2019-10-25 2021-04-27 北京金山云网络技术有限公司 Method and device for determining object master copy file in object storage system
CN113239013A (en) * 2021-05-17 2021-08-10 北京青云科技股份有限公司 Distributed systems and storage media
CN113297173A (en) * 2021-05-24 2021-08-24 阿里巴巴新加坡控股有限公司 Distributed database cluster management method and device and electronic equipment
CN113391767A (en) * 2021-06-30 2021-09-14 北京百度网讯科技有限公司 Data consistency checking method and device, electronic equipment and readable storage medium
CN113704359A (en) * 2021-09-03 2021-11-26 优刻得科技股份有限公司 Synchronization method, system and server for multiple data copies of time sequence database
CN116348863A (en) * 2020-10-14 2023-06-27 甲骨文国际公司 System and method for extending transactional continuity across failures in a database
US12321368B2 (en) 2023-07-26 2025-06-03 Douyin Vision Co., Ltd. Method, device, and storage medium for scheduling a distributed database

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334797A (en) * 2008-08-04 2008-12-31 中兴通讯股份有限公司 A distributed file system and its data block consistency management method
CN102419766A (en) * 2011-11-01 2012-04-18 西安电子科技大学 Data redundancy and file operation method based on HDFS distributed file system
CN103383689A (en) * 2012-05-03 2013-11-06 阿里巴巴集团控股有限公司 Service process fault detection method, device and service node

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
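Li Kuan (李宽): "Research on a Distributed Namenode Node Model Based on HDFS" (基于HDFS的分布式Namenode节点模型的研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *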

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105589887A (en) * 2014-10-24 2016-05-18 中兴通讯股份有限公司 Data processing method for distributed file system and distributed file system
CN105589887B (en) * 2014-10-24 2020-04-03 中兴通讯股份有限公司 Data processing method of distributed file system and distributed file system
CN104468569B (en) * 2014-12-04 2017-12-22 北京国双科技有限公司 The integrality detection method and device of distributed data
CN104468569A (en) * 2014-12-04 2015-03-25 北京国双科技有限公司 Integrity detection method and device of distributed data
CN105243125A (en) * 2015-09-29 2016-01-13 北京京东尚科信息技术有限公司 PrestoDB cluster running method and apparatus, cluster and data query method and apparatus
CN105243125B (en) * 2015-09-29 2018-07-06 北京京东尚科信息技术有限公司 Operation method, device, cluster and the inquiry data method and device of PrestoDB clusters
CN105550229A (en) * 2015-12-07 2016-05-04 北京奇虎科技有限公司 Method and device for repairing data of distributed storage system
CN105550230A (en) * 2015-12-07 2016-05-04 北京奇虎科技有限公司 Method and device for detecting failure of node of distributed storage system
CN105550229B (en) * 2015-12-07 2019-05-03 北京奇虎科技有限公司 Method and device for data restoration in distributed storage system
CN105550230B (en) * 2015-12-07 2019-07-23 北京奇虎科技有限公司 The method for detecting and device of distributed memory system node failure
CN105610903A (en) * 2015-12-17 2016-05-25 北京奇虎科技有限公司 Data node upgrading method and device for distributed system
CN105681401A (en) * 2015-12-31 2016-06-15 深圳前海微众银行股份有限公司 Distributed architecture
CN107219997A (en) * 2016-03-21 2017-09-29 阿里巴巴集团控股有限公司 A kind of method and device for being used to verify data consistency
CN107219997B (en) * 2016-03-21 2020-08-18 阿里巴巴集团控股有限公司 Method and device for verifying data consistency
CN105892954A (en) * 2016-04-25 2016-08-24 乐视控股(北京)有限公司 Data storage method and device based on multiple copies
US10824372B2 (en) 2016-06-22 2020-11-03 Hangzhou Hikvision Digital Technology Co., Ltd. Data recovery method and device, and cloud storage system
CN107528872B (en) * 2016-06-22 2020-07-24 杭州海康威视数字技术股份有限公司 Data recovery method and device and cloud storage system
CN107528872A (en) * 2016-06-22 2017-12-29 杭州海康威视数字技术股份有限公司 A kind of data reconstruction method, device and cloud storage system
WO2017219678A1 (en) * 2016-06-22 2017-12-28 杭州海康威视数字技术股份有限公司 Data recovery method and device, and cloud storage system
CN106293980A (en) * 2016-07-26 2017-01-04 乐视控股(北京)有限公司 Data recovery method and system for distributed storage cluster
CN108259543A (en) * 2016-12-29 2018-07-06 广东中科遥感技术有限公司 Distributed cloud storage database and method for deploying same in multiple data centers
CN108259543B (en) * 2016-12-29 2021-07-06 广东中科遥感技术有限公司 Distributed cloud storage database and method for its deployment in multiple data centers
CN106945691B (en) * 2017-04-10 2019-06-21 湖南中车时代通信信号有限公司 The real-time hot standby switch device of the server multicenter of automatic train monitor
CN106945691A (en) * 2017-04-10 2017-07-14 湖南中车时代通信信号有限公司 The real-time hot standby switch device of server multicenter of automatic train monitor
CN111949210A (en) * 2017-06-28 2020-11-17 华为技术有限公司 Metadata storage method, system and storage medium in distributed storage system
CN109947730A (en) * 2017-07-25 2019-06-28 中兴通讯股份有限公司 Metadata restoration methods, device, distributed file system and readable storage medium storing program for executing
CN109947730B (en) * 2017-07-25 2024-02-02 中兴通讯股份有限公司 Metadata recovery method, device, distributed file system and readable storage medium
CN109407977A (en) * 2018-09-25 2019-03-01 佛山科学技术学院 A kind of big data distributed storage management method and system
CN109407977B (en) * 2018-09-25 2021-08-31 佛山科学技术学院 A method and system for distributed storage management of big data
CN109614037A (en) * 2018-11-16 2019-04-12 新华三技术有限公司成都分公司 Data routing inspection method, apparatus and distributed memory system
CN109614164A (en) * 2018-11-29 2019-04-12 深圳前海微众银行股份有限公司 Method, apparatus, device and readable storage medium for realizing plug-in configurability
CN111695018B (en) * 2019-03-13 2023-05-30 阿里云计算有限公司 Data processing method and device, distributed network system and computer equipment
CN111695018A (en) * 2019-03-13 2020-09-22 阿里巴巴集团控股有限公司 Data processing method and device, distributed network system and computer equipment
WO2020232859A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Distributed storage system, data writing method, device, and storage medium
CN110471934A (en) * 2019-08-19 2019-11-19 泰康保险集团股份有限公司 Method of calibration, device, medium and the electronic equipment of business datum
CN112711376A (en) * 2019-10-25 2021-04-27 北京金山云网络技术有限公司 Method and device for determining object master copy file in object storage system
CN111124301B (en) * 2019-12-18 2024-02-23 深圳供电局有限公司 Data consistency storage method and system of object storage device
CN111124301A (en) * 2019-12-18 2020-05-08 深圳供电局有限公司 Data consistency storage method and system of object storage device
CN111241011B (en) * 2019-12-31 2022-04-15 清华大学 Global address space management method of distributed persistent memory
CN111241011A (en) * 2019-12-31 2020-06-05 清华大学 A Global Address Space Management Method for Distributed Persistent Memory
CN116348863A (en) * 2020-10-14 2023-06-27 甲骨文国际公司 System and method for extending transactional continuity across failures in a database
CN112363675A (en) * 2020-11-18 2021-02-12 苏州元核云技术有限公司 Control method and system based on distributed storage system
CN113239013A (en) * 2021-05-17 2021-08-10 北京青云科技股份有限公司 Distributed systems and storage media
CN113239013B (en) * 2021-05-17 2024-04-09 北京青云科技股份有限公司 Distributed systems and storage media
CN113297173B (en) * 2021-05-24 2023-10-31 阿里巴巴新加坡控股有限公司 Distributed database cluster management method and device, electronic equipment
CN113297173A (en) * 2021-05-24 2021-08-24 阿里巴巴新加坡控股有限公司 Distributed database cluster management method and device and electronic equipment
CN113391767A (en) * 2021-06-30 2021-09-14 北京百度网讯科技有限公司 Data consistency checking method and device, electronic equipment and readable storage medium
CN113704359A (en) * 2021-09-03 2021-11-26 优刻得科技股份有限公司 Synchronization method, system and server for multiple data copies of time sequence database
CN113704359B (en) * 2021-09-03 2024-04-26 优刻得科技股份有限公司 Method, system and server for synchronizing multiple data copies of time sequence database
US12321368B2 (en) 2023-07-26 2025-06-03 Douyin Vision Co., Ltd. Method, device, and storage medium for scheduling a distributed database

Similar Documents

Publication Publication Date Title
CN103729436A (en) Distributed metadata management method and system
CN113010496B (en) Data migration method, device, equipment and storage medium
US10169169B1 (en) Highly available transaction logs for storing multi-tenant data sets on shared hybrid storage pools
RU2449358C1 (en) Distributed file system and data block consistency managing method thereof
CN105550229B (en) Method and device for data restoration in distributed storage system
CN104281506B (en) Data maintenance method and system for file system
US7653668B1 (en) Fault tolerant multi-stage data replication with relaxed coherency guarantees
CN102012933B (en) Distributed file system and method for storing data and providing services by utilizing same
US20150363319A1 (en) Fast warm-up of host flash cache after node failover
US9659078B2 (en) System and method for supporting failover during synchronization between clusters in a distributed data grid
CN105338028B (en) Main and subordinate node electoral machinery and device in a kind of distributed server cluster
WO2019154394A1 (en) Distributed database cluster system, data synchronization method and storage medium
US9367261B2 (en) Computer system, data management method and data management program
US10282256B1 (en) System and method to enable deduplication engine to sustain operational continuity
US7440977B2 (en) Recovery method using extendible hashing-based cluster logs in shared-nothing spatial database cluster
CN107623703B (en) Synchronization method, device and system for global transaction identifier GTID
GB2484086A (en) Reliability and performance modes in a distributed storage system
US11003550B2 (en) Methods and systems of operating a database management system DBMS in a strong consistency mode
CN104679611A (en) Data resource copying method and device
CN115098519A (en) Data storage method and device
CN105550230B (en) The method for detecting and device of distributed memory system node failure
US11500812B2 (en) Intermediate file processing method, client, server, and system
CN114860155B (en) Transfer of synchronous replication and asynchronous replication
CN105323271B (en) Cloud computing system and processing method and device thereof
CN107943615B (en) Data processing method and system based on distributed cluster

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140416