CN103729436A - Distributed metadata management method and system

Info

Publication number: CN103729436A
Application number: CN201310741599.XA
Authority: CN
Inventors: 王海平; 王树鹏; 张永铮; 吴广君; 周晓阳
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2013-12-27
Filing date: 2013-12-27
Publication date: 2014-04-16

Abstract

The present invention relates to a distributed metadata management method and system. The method comprises: a storage step of dividing independent metadata nodes and user table nodes, used to store metadata and user tables respectively, and using multiple metadata nodes to store multiple replicas of the metadata, forming a master replica node and slave replica nodes that all store the same metadata; a verification step of performing data verification on the master replica node and the slave replica nodes to ensure that the metadata they store is consistent; and a repair step of using ZooKeeper to establish a monitoring ring based on the master replica node and the slave replica nodes, such that when the monitoring ring detects that the master replica node or a slave replica node is down, it triggers a switch between the master replica node and the slave replica nodes to repair the downed node. The system corresponds one-to-one with the technical solution of the method. The invention addresses the single point of failure and the inconsistency among multiple replicas that exist in metadata management.

Description

Distributed metadata management method and system

Technical field

The invention belongs to the field of mass data storage management, and in particular relates to metadata management for large-scale data storage; it is a distributed metadata management method and system.

Background art

In recent years, with the development of the information society, more and more information has been digitized, and with the growth of the Internet in particular, data volumes have grown explosively. First, storage capacity has expanded sharply, creating greater demand for storage servers; second, data retention periods have lengthened; finally, higher demands are placed on the management of data storage. In particular, the diversity of data, its geographic dispersion, and the protection of important data all place higher demands on data management. With the explosion of unstructured data, distributed databases have entered a golden period of development and, from high-performance computing to data centers and from data sharing to Internet applications, have penetrated every aspect of data applications. Most distributed databases keep metadata separate from data, that is, they separate the control flow from the data flow, and thereby obtain higher system scalability and I/O concurrency. The metadata management model is therefore crucial: it directly affects the scalability, performance, reliability, and stability of the system.

The growth of data storage capacity has no end, and it places higher demands on metadata management as well. In distributed storage, many machines read and write the metadata tables at the same time, so the metadata management scheme must provide a highly stable, high-performance metadata service. Existing metadata management schemes fall roughly into three classes: centralized metadata management, metadata-service-free schemes, and distributed metadata management. A centralized metadata service provides a central metadata server that is responsible for storing metadata and answering client query requests; it offers a unified namespace and handles access-control functions such as location resolution and data location. Its drawbacks are prominent, the two most critical being the performance bottleneck and the single point of failure. A metadata-service-free scheme uses an elastic hash algorithm and abandons the metadata service altogether, storing metadata together with the data. This makes data consistency more complicated, makes read and write operations inefficient, and leaves the system without global monitoring and management. It also pushes more functions onto the client, increasing client load and consuming considerable CPU and memory. A traditional distributed metadata management scheme uses multiple servers, formed into a cluster, to cooperatively provide the metadata service for the distributed database, which removes the performance bottleneck and single point of failure of the centralized model as well as the inefficiency and lack of global monitoring of the metadata-service-free scheme. Traditional distributed metadata management has its own defects, however, such as performance overhead and consistency among multiple replicas.

Therefore, in view of the limitations of metadata management in the prior art, the present invention proposes a new distributed metadata management method and system.

Summary of the invention

The technical problem to be solved by the present invention is to provide a distributed metadata management method and system that address the problems of existing metadata management, such as the single point of failure and consistency among multiple replicas.

The technical solution by which the present invention solves the above technical problem is as follows: a distributed metadata management method, comprising the following steps:

Storage step: dividing independent metadata nodes and user table nodes, used to store metadata and user tables respectively, and using multiple metadata nodes to store multiple replicas of the metadata, forming a master replica node and slave replica nodes that all store the same metadata;

Verification step: performing data verification on the master replica node and the slave replica nodes to ensure the consistency of the metadata they store;

Repair step: using ZooKeeper to establish a monitoring ring based on the master replica node and the slave replica nodes; when the monitoring ring detects that the master replica node or a slave replica node is down, it triggers a switch between the master replica node and the slave replica nodes, thereby repairing the downed node.

On the basis of the above technical solution, the present invention can be further improved as follows.

Further, the storage step also comprises expanding the metadata nodes in a dynamic manner or a static manner;

The dynamic manner comprises: adding an empty metadata node and, after the empty metadata node is discovered through verification, transmitting metadata to the discovered empty metadata node;

The static manner comprises: shutting down all metadata nodes, then adding a new metadata node, and modifying its configuration when the newly added metadata node starts.

Further, the data verification of the master replica node and the slave replica nodes uses a lightweight data verification mode, which comprises: when a metadata node starts, sending a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts are inconsistent, the data is inconsistent, so the data shard service on the non-conforming metadata node is shut down, the data of that data shard is deleted, and a replica repair operation is triggered.

Further, the data verification of the master replica node and the slave replica nodes uses a periodic data shard file verification mode, which comprises: the metadata node periodically checks whether any data file of the data shards it maintains has been lost; if a loss is found, the data service of that data shard on the current node is stopped, the data of that data shard is deleted, and a replica repair operation is triggered immediately.

Further, the data verification of the master replica node and the slave replica nodes uses a periodic verification mode between the different replicas of a data shard: the master replica node obtains its own blocking basis and sends the blocking basis and a verification request to the slave replica nodes; the master replica node and the slave replica nodes each compute md5 values according to this blocking basis and store the md5 values in a check_map; each slave replica node returns its check_map to the master replica node, which compares the received check_map with its own; if the data of all slave replica nodes is consistent, the data is considered consistent; otherwise the master replica's data prevails.

Further, in the repair step, whether the master replica node or a slave replica node is down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.

Corresponding to the above distributed metadata management method, the technical solution of the present invention also includes a distributed metadata management system, comprising the following modules:

A storage module, used to divide independent metadata nodes and user table nodes so that they store metadata and user tables respectively, and to use multiple metadata nodes to store multiple replicas of the metadata, forming a master replica node and slave replica nodes that all store the same metadata;

A verification module, used to perform data verification on the master replica node and the slave replica nodes to ensure the consistency of the metadata they store;

A repair module, used to establish, with ZooKeeper, a monitoring ring based on the master replica node and the slave replica nodes; when the monitoring ring detects that the master replica node or a slave replica node is down, it triggers a switch between the master replica node and the slave replica nodes, thereby repairing the downed node.

Further, the storage module is also used to expand the metadata nodes in a dynamic manner or a static manner;

Further, the verification module comprises a lightweight data verification module, a periodic data shard file verification module, and a periodic data shard replica verification module;

The lightweight data verification module is used to: when a metadata node starts, send a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts are inconsistent, the data is inconsistent, so the data shard service on the non-conforming metadata node is shut down, the data of that data shard is deleted, and a replica repair operation is triggered.

The periodic data shard file verification module is used to: have the metadata node periodically check whether any data file of the data shards it maintains has been lost; if a loss is found, stop the data service of that data shard on the current node, delete the data of that data shard, and immediately trigger a replica repair operation.

The periodic data shard replica verification module is used to: have the master replica node obtain its own blocking basis and send the blocking basis and a verification request to the slave replica nodes; the master replica node and the slave replica nodes each compute md5 values according to this blocking basis and store the md5 values in a check_map; each slave replica node returns its check_map to the master replica node, which compares the received check_map with its own; if the data of all slave replica nodes is consistent, the data is considered consistent; otherwise the master replica's data prevails.

Further, in the repair module, whether the master replica node or a slave replica node is down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.

The beneficial effects of the present invention are as follows. The present invention stores metadata on nodes separate from the user tables, so that a high load on the user table nodes does not affect metadata reads and writes, which improves the stability and efficiency of metadata access. The present invention also supports dynamic expansion of the metadata nodes and multi-replica storage of metadata, which reduces the risk of metadata node downtime. The present invention includes a metadata verification stage that keeps the metadata stored on each replica node consistent and the performance of the metadata cluster stable. In addition, because multiple replicas are stored, when an abnormal metadata node is killed, the other available metadata nodes can quickly complete the upgrade and repair, avoiding the single point of failure that readily arises in metadata management.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the distributed metadata management method of the present invention;

Fig. 2 is a schematic diagram of dynamically adding a metadata node in an embodiment of the present invention;

Fig. 3 is a schematic diagram of a metadata node triggering replica repair in an embodiment of the present invention;

Fig. 4 is a schematic flow chart of the verification mode between the different replicas of a periodically checked data shard in an embodiment of the present invention;

Fig. 5 is a schematic diagram of the monitoring ring upgrade and repair process in an embodiment of the present invention;

Fig. 6 is a schematic diagram of the distributed metadata management system of the present invention.

Detailed description of the embodiments

The principles and features of the present invention are described below with reference to the accompanying drawings; the examples serve only to explain the present invention and are not intended to limit its scope.

As shown in Fig. 1, this embodiment provides a distributed metadata management method comprising the following steps: a storage step, a verification step, and a repair step, as set out in the summary above.

Based on these three steps, the implementation of this embodiment is divided into the following three parts.

1. Metadata storage and metadata node expansion

Metadata and user table data are stored separately, and the metadata nodes support both dynamic and static expansion. Because metadata and user tables are stored on different nodes, a high load on the user table nodes does not affect metadata reads and writes, which improves the stability and efficiency of metadata access.

During metadata storage, multiple metadata nodes store multiple replicas of the metadata to provide fault tolerance; a master-slave replica mechanism is introduced to lighten the workload of the Master node and to allow the mass storage system to scale to larger clusters. In addition, the metadata nodes may need to be expanded according to actual conditions; expansion can be done in a dynamic manner or a static manner. The dynamic manner comprises: adding an empty metadata node and, after the empty metadata node is discovered through verification, transmitting metadata to the discovered empty metadata node. The static manner comprises: shutting down all metadata nodes, then adding a new metadata node, and modifying its configuration when the newly added metadata node starts.

Fig. 2 shows the processing flow when a metadata node is added dynamically. META_RS_01 and META_RS_02 denote metadata nodes that already hold data, META_RS_03 denotes a newly added metadata node that holds no data, and Master is the management process responsible for all data nodes. When the new metadata node META_RS_03 starts, it actively registers with the Master node as a connection object, and the Master saves the data structure of this connection object. The Master periodically scans the connection-object data structures it has saved to determine whether any node has recently registered. If so, it performs the following operations: it takes out the connection object, registers the new metadata node with ZK (short for ZooKeeper), and updates the ring configuration of the metadata nodes maintained in ZK (that is, the monitoring ring established through ZooKeeper). As shown in Fig. 3, a periodically triggered thread in the Master takes out the saved connection-object data structure, triggers replica repair, and sends the connection-object data structure to the master replica node. The master replica node performs the replica repair operation: it transfers the data of the data shards of all its metadata tables to the new metadata node and starts the corresponding data replica service, so that the new metadata node becomes a slave replica node.
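The registration and scan flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the `Master` class, its `register` and `scan` methods, and the use of plain lists and dictionaries in place of ZooKeeper and network transfer are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical stand-in for the Master management process."""
    ring: list = field(default_factory=list)     # ring order that would be kept in ZooKeeper
    pending: list = field(default_factory=list)  # saved connection objects awaiting a scan

    def register(self, node_name):
        """Called when a newly started metadata node registers with the Master."""
        self.pending.append(node_name)

    def scan(self, master_shards):
        """One periodic scan: add each new node to the ring and copy the
        master replica's shards to it (the replica repair operation)."""
        repaired = {}
        while self.pending:
            node = self.pending.pop(0)
            self.ring.append(node)                # update the ring configuration
            repaired[node] = dict(master_shards)  # new node becomes a slave replica
        return repaired
```

In this sketch a node that registers between two scans is picked up by the next scan, which mirrors the periodic polling in the text; a scan with nothing pending does nothing.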

It follows that the metadata nodes are expanded in order to meet the needs of multi-replica storage of metadata and, further, to prevent a metadata node from going down because it stores too much data. The newly added metadata node serves as a slave replica node, and the node originally used to store the metadata serves as the master replica node, which helps with the subsequent handling of metadata node downtime.

2. Data verification

Once multi-replica storage is adopted, the consistency of the data on the master replica node and the slave replica nodes must be considered, so data verification must be performed on them. While data verification is in progress, read and write services are not provided externally. The verification performed when a metadata node restarts is a lightweight verification, and the periodic verification performed during operation is a memory-level verification.

This embodiment mainly uses three verification modes:

First, the lightweight data verification mode, which comprises: when a metadata node starts, sending a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts are inconsistent, the data is inconsistent, so the data shard service on the non-conforming metadata node is shut down, the data of that data shard is deleted, and a replica repair operation is triggered. The lightweight verification on restart also serves to confirm the state after the restart.
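A rough sketch of the record-count comparison follows. The text does not say how the non-conforming node is identified, so this example assumes the master replica's counts are the reference; the function name `nodes_to_repair` and the dictionary layout are likewise hypothetical.

```python
def nodes_to_repair(record_counts, master):
    """Compare per-shard record counts against the master replica.

    record_counts: {node_name: {shard_id: record_count}}
    master:        name of the master replica node (assumed to be the reference)
    Returns {node_name: [shard ids whose count differs from the master's]}.
    """
    reference = record_counts[master]
    to_repair = {}
    for node, shards in record_counts.items():
        if node == master:
            continue
        # A shard that is absent on the node also counts as inconsistent.
        bad = sorted(s for s, n in reference.items() if shards.get(s) != n)
        if bad:
            to_repair[node] = bad
    return to_repair
```

For each shard reported here, the node would then stop serving the shard, delete its local copy, and request replica repair, as the paragraph above describes.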

Second, the periodic data shard file verification mode, which comprises: the metadata node periodically checks whether any data file of the data shards it maintains has been lost; if a loss is found, the data service of that data shard on the current node is stopped, the data of that data shard is deleted, and a replica repair operation is triggered immediately.
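The file-loss check can be illustrated with a short sketch. The mapping from shard to file paths and the names `missing_shard_files` and `demo` are assumptions for the example; the text does not describe the on-disk layout.

```python
import tempfile
from pathlib import Path


def missing_shard_files(shard_files):
    """Return the sorted ids of shards that have at least one missing data file.

    shard_files: {shard_id: [paths of the data files the shard should have]}
    """
    return sorted(
        shard for shard, paths in shard_files.items()
        if any(not Path(p).is_file() for p in paths)
    )


def demo():
    """Create one shard file, leave another absent, and run the check."""
    with tempfile.TemporaryDirectory() as d:
        present = Path(d) / "shard0.dat"
        present.write_bytes(b"x")
        absent = Path(d) / "shard1.dat"  # never created, so it counts as lost
        return missing_shard_files({"shard0": [present], "shard1": [absent]})
```

A node would run such a check on a timer and, for every shard returned, stop its data service, delete what remains of the shard, and trigger replica repair.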

Third, the periodic verification mode between the different replicas of a data shard. As shown in Fig. 4, there are three replica nodes: one master replica node and two slave replica nodes. The master replica node obtains its own check_set, which stores the blocking basis, and sends the check_set and a verification request to the slave replica nodes. The master replica node and the slave replica nodes each compute md5 values according to this blocking basis and store them in a check_map. Each slave replica node returns its check_map to the master replica node, which compares the received check_map with its own, completing the verification of the three replica nodes. If the data of all slave replica nodes is consistent, the data is considered consistent; otherwise the master replica's data prevails. check_map is a variable with a mapping structure: its key identifies the current data shard, and its value is the md5 value of that data shard.
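The check_map comparison can be sketched as below. The text does not specify what the blocking basis in check_set contains, so this example assumes it is simply the list of shard identifiers; `build_check_map` and `compare_replicas` are hypothetical names, and shard contents are modelled as bytes.

```python
import hashlib


def build_check_map(shards, check_set):
    """Map each shard id in check_set to the md5 of its data (None if absent)."""
    return {
        sid: hashlib.md5(shards[sid]).hexdigest() if sid in shards else None
        for sid in check_set
    }


def compare_replicas(master_shards, slaves):
    """Return {slave_name: [shard ids whose md5 differs from the master's]}.

    master_shards: {shard_id: bytes} held by the master replica
    slaves:        {slave_name: {shard_id: bytes}}
    """
    check_set = sorted(master_shards)  # assumed blocking basis: the shard ids
    master_map = build_check_map(master_shards, check_set)
    mismatches = {}
    for name, shards in slaves.items():
        slave_map = build_check_map(shards, check_set)  # computed on the slave
        bad = [sid for sid in check_set if slave_map[sid] != master_map[sid]]
        if bad:
            mismatches[name] = bad
    return mismatches
```

An empty result means all replicas agree. For any shard listed, the master replica's data prevails, so the slave's copy would be replaced from the master.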

3. Repair and upgrade

Once the data on the master replica node and the slave replica nodes is consistent, the master and slave replica nodes can be used to handle metadata node downtime.

This embodiment supports upgrading and repairing slave replica nodes. At startup, a monitoring ring is established through ZooKeeper; when a metadata process dies abnormally, upgrade and repair are triggered quickly according to the role and number of the dead processes. ZooKeeper is a reliable coordination system for large-scale distributed systems. Whether the master replica node or a slave replica node is down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.
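The session-expiry rule can be illustrated with a small simulation. In a real deployment ZooKeeper itself tracks sessions and expires them when it has not heard from a client within the session timeout; the function below only imitates that decision with timestamps, and its name and parameters are assumptions for the example.

```python
def expired_nodes(last_heartbeat, now, session_timeout):
    """Return the sorted names of nodes whose simulated session has expired.

    last_heartbeat:  {node_name: time of the last heartbeat, in seconds}
    now:             current time, in seconds
    session_timeout: seconds of silence after which a session counts as expired
    """
    return sorted(
        node for node, seen in last_heartbeat.items()
        if now - seen > session_timeout
    )
```

Every node returned by this check is treated as down, which is the trigger for the upgrade and repair steps.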

When the master replica node of the metadata goes down, the first slave replica node is selected to take over the master replica node's work, so that the metadata tables continue to serve external requests without interruption. ZK (short for ZooKeeper) is used to monitor the replicas and to trigger the switch of the metadata master replica to a slave replica. The main steps are as follows:

1) In ZK, establish the directory structure /root node/parent node/ephemeral node.

2) Each META replica establishes a session with ZK and creates an ephemeral node under the parent node, writing its own agent address into it; if the session expires, this ephemeral node disappears.

3) As an example, as shown in Fig. 5, META_RS_01 is the master replica node, and META_RS_02, META_RS_03, and META_RS_04 are slave replica nodes. META_RS_02 monitors whether the ephemeral node of META_RS_01 exists; if the master replica node's session expires, the ephemeral node corresponding to the master replica node disappears. At that point the first slave replica node, META_RS_02, is upgraded to master replica node. After being upgraded, META_RS_02 switches to monitoring the last slave replica node, META_RS_04, as shown by the dashed line in the figure.
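The ring behaviour in step 3 can be sketched without a ZooKeeper server. The sketch assumes, based only on the META_RS_02 example, that each node watches its predecessor in the ring and that the first node (the master replica) watches the last one; the function names are hypothetical, and the ring is a plain list rather than ephemeral nodes in ZooKeeper.

```python
def watch_targets(ring):
    """Map each node to the node it monitors.

    Each node watches its predecessor; the first node (the master replica)
    wraps around and watches the last node. Assumes node names are unique.
    """
    return {node: ring[i - 1] for i, node in enumerate(ring)}


def handle_session_expired(ring, dead):
    """Drop the dead node from the ring and return (new_ring, new_master).

    The first surviving node is treated as the master replica, so when the
    old master dies the first slave replica is promoted.
    """
    survivors = [node for node in ring if node != dead]
    return survivors, (survivors[0] if survivors else None)
```

With the ring META_RS_01 to META_RS_04, removing META_RS_01 promotes META_RS_02, and recomputing the watch targets makes META_RS_02 monitor META_RS_04, matching the dashed line described for Fig. 5.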

The method of this embodiment is clearly effective. Without this scheme, monitoring shows that after a metadata node is killed, loading cannot complete: it stalls at about 89% and begins to report errors, and subsequent operations such as querying, inserting, modifying, and deleting user data and metadata are no longer possible. With this scheme, after a metadata node is killed, as long as at least one metadata node remains, loading resumes after a brief pause (about 30 seconds in the test example) and continues to 100%, and subsequent operations on user data and metadata proceed normally.

As shown in Fig. 6, corresponding to the above distributed metadata management method, the technical solution of the present invention also includes a distributed metadata management system, comprising the following modules: a storage module, a verification module, and a repair module, as set out in the summary above.

In this embodiment, the storage module is also used to expand the metadata nodes in a dynamic manner or a static manner. The dynamic manner comprises: adding an empty metadata node and, after the empty metadata node is discovered through verification, transmitting metadata to the discovered empty metadata node. The static manner comprises: shutting down all metadata nodes, then adding a new metadata node, and modifying its configuration when the newly added metadata node starts.

Also as shown in Fig. 6, the verification module comprises a lightweight data verification module, a periodic data shard file verification module, and a periodic data shard replica verification module;

In addition, in the repair module, whether the master replica node or a slave replica node is down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.

The specific implementation of the distributed metadata management system is consistent with that of the above distributed metadata management method and is not repeated here.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A distributed metadata management method, characterized by comprising the following steps:

Storage step: dividing independent metadata nodes and user table nodes, used to store metadata and user tables respectively, and using multiple metadata nodes to store multiple replicas of the metadata, forming a master replica node and slave replica nodes that all store the same metadata;

Verification step: performing data verification on the master replica node and the slave replica nodes to ensure the consistency of the metadata stored on the master replica node and the slave replica nodes;

Repair step: using ZooKeeper to establish a monitoring ring based on the master replica node and the slave replica nodes; when the monitoring ring detects that the master replica node or a slave replica node is down, it triggers a switch between the master replica node and the slave replica nodes, thereby repairing the downed node.

2. The distributed metadata management method according to claim 1, wherein the storing step further comprises expanding metadata nodes in a dynamic or static manner;

The dynamic method specifically includes: adding metadata empty nodes, and transmitting metadata to the found metadata empty nodes after the metadata empty nodes are found through verification;

The static method specifically includes: adding a new metadata node after all metadata nodes are shut down, and modifying the configuration of the newly added metadata node when it is started.

3. The distributed metadata management method according to claim 1, wherein the data verification of the master replica node and the slave replica nodes uses a lightweight data verification mode, comprising: when a metadata node starts, sending a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts are inconsistent, the data is inconsistent, so the data shard service on the non-conforming metadata node is shut down, the data of that data shard is deleted, and a replica repair operation is triggered.

4. The distributed metadata management method according to claim 1, wherein the data verification of the master replica node and the slave replica nodes uses a periodic data shard file verification mode, comprising: the metadata node periodically checks whether any data file of the data shards it maintains has been lost; if a loss is found, the data service of that data shard on the current node is stopped, the data of that data shard is deleted, and a replica repair operation is triggered immediately.

5. The distributed metadata management method according to claim 1, wherein the data verification of the master replica node and the slave replica nodes uses a periodic verification mode between the different replicas of a data shard: the master replica node obtains its own blocking basis and sends the blocking basis and a verification request to the slave replica nodes; the master replica node and the slave replica nodes each compute md5 values according to the blocking basis and store the md5 values in a check_map; each slave replica node returns its check_map to the master replica node; the master replica node receives the check_map of the slave replica and compares it with its own check_map; if the data of all slave replica nodes is consistent, the data is considered consistent; otherwise the master replica's data prevails.

6. The distributed metadata management method according to claim 1, wherein, in the repair step, whether the master replica node or a slave replica node is down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.

7. A distributed metadata management system, characterized in that it specifically includes the following modules:

A storage module, used to divide independent metadata nodes and user table nodes so that they store metadata and user tables respectively, and to use multiple metadata nodes to store multiple replicas of the metadata, forming a master replica node and slave replica nodes that all store the same metadata;

A verification module, used to perform data verification on the master replica node and the slave replica nodes to ensure the consistency of the metadata stored on the master replica node and the slave replica nodes;

A repair module, used to establish, with ZooKeeper, a monitoring ring based on the master replica node and the slave replica nodes; when the monitoring ring detects that the master replica node or a slave replica node is down, it triggers a switch between the master replica node and the slave replica nodes, thereby repairing the downed node.

8. The distributed metadata management system according to claim 7, wherein the storage module is also used to expand metadata nodes dynamically or statically;

9. The distributed metadata management system according to claim 7, wherein the verification module includes a lightweight data verification module, a regular data fragmentation file verification module and a regular data fragmentation copy verification module;

The lightweight data verification module is used to: when a metadata node starts, send a request to all metadata nodes to obtain the record count of each metadata table shard on each metadata node; if the record counts are inconsistent, the data is inconsistent, so the data shard service on the non-conforming metadata node is shut down, the data of that data shard is deleted, and a replica repair operation is triggered;

The periodic data shard file verification module is used to: have the metadata node periodically check whether any data file of the data shards it maintains has been lost; if a loss is found, stop the data service of that data shard on the current node, delete the data of that data shard, and immediately trigger a replica repair operation;

The periodic data shard replica verification module is used to: have the master replica node obtain its own blocking basis and send the blocking basis and a verification request to the slave replica nodes; the master replica node and the slave replica nodes each compute md5 values according to the blocking basis and store the md5 values in a check_map; each slave replica node returns its check_map to the master replica node; the master replica node receives the check_map of the slave replica and compares it with its own check_map; if the data of all slave replica nodes is consistent, the data is considered consistent; otherwise the master replica's data prevails.

10. The distributed metadata management system according to claim 7, wherein, in the repair module, whether the master replica node or a slave replica node is down is determined by checking whether the session between the metadata node and ZooKeeper has expired: if the session has expired, the node is down; otherwise it is not.