Summary of the invention
The technical problem to be solved by the present invention is to provide a distributed metadata management method and system that overcome problems in existing metadata management, such as single points of failure and inconsistency among multiple copies.
The technical solution by which the present invention solves the above technical problem is as follows: a distributed metadata management method, comprising the following steps:
Storage step: set up separate metadata nodes and user-table nodes, used respectively to store metadata and user tables, and store multiple copies of the metadata on a plurality of metadata nodes, forming a primary replica node and secondary replica nodes that all store the same metadata;
Verification step: perform data verification on the primary replica node and the secondary replica nodes to ensure that the metadata they store is consistent;
Repair step: use ZooKeeper to build a monitoring ring over the primary and secondary replica nodes; when the monitoring ring detects that the primary replica node or a secondary replica node has crashed, it triggers a switchover between the primary and secondary replica nodes, thereby repairing the crashed node.
On the basis of the above technical solution, the present invention may be further improved as follows.
Further, the storage step also comprises expanding the metadata nodes either dynamically or statically;
The dynamic mode specifically comprises: adding an empty metadata node and, once the empty metadata node is discovered, transferring data to it after verification;
The static mode specifically comprises: shutting down all metadata nodes, then adding the new metadata node, and modifying its configuration when the newly added metadata node starts.
Further, the data verification of the primary and secondary replica nodes uses a lightweight data check, which specifically comprises: when a metadata node starts, it sends a request to all metadata nodes to obtain the record count of each metadata-table shard on each node; if the record counts differ, the data are inconsistent, so the shard service on the non-conforming metadata node is shut down, the data of that shard is deleted, and a replica repair operation is triggered.
Further, the data verification of the primary and secondary replica nodes uses a periodic shard-file check, which specifically comprises: each metadata node periodically checks whether the data files of the shards it maintains have been lost; if a loss is found, the node stops serving that shard, deletes the shard's data, and immediately triggers a replica repair operation.
Further, the data verification of the primary and secondary replica nodes uses a periodic check between the different replicas of each shard: the primary replica node obtains its own chunking basis and sends it, together with a check request, to the secondary replica nodes; the primary and secondary replica nodes each compute MD5 values according to this chunking basis and store them in check_map; each secondary replica node returns its check_map to the primary replica node, which compares the received check_maps with its own. If the data of all secondary replica nodes match, the data are considered consistent; otherwise, the primary replica's data is taken as authoritative.
Further, in the repair step, whether the primary replica node or a secondary replica node has crashed is determined by checking whether the node's session with ZooKeeper has expired: if the session has expired, the node has crashed; otherwise it has not.
Corresponding to the above distributed metadata management method, the technical solution of the present invention further comprises a distributed metadata management system, which comprises the following modules:
Storage module, configured to set up separate metadata nodes and user-table nodes, used respectively to store metadata and user tables, and to store multiple copies of the metadata on a plurality of metadata nodes, forming a primary replica node and secondary replica nodes that all store the same metadata;
Verification module, configured to perform data verification on the primary and secondary replica nodes to ensure that the metadata they store is consistent;
Repair module, configured to use ZooKeeper to build a monitoring ring over the primary and secondary replica nodes; when the monitoring ring detects that the primary replica node or a secondary replica node has crashed, it triggers a switchover between the primary and secondary replica nodes, thereby repairing the crashed node.
Further, the storage module is also configured to expand the metadata nodes either dynamically or statically;
The dynamic mode specifically comprises: adding an empty metadata node and, once the empty metadata node is discovered, transferring data to it after verification;
The static mode specifically comprises: shutting down all metadata nodes, then adding the new metadata node, and modifying its configuration when the newly added metadata node starts.
Further, the verification module comprises a lightweight data check module, a periodic shard-file check module, and a periodic cross-replica shard check module;
The lightweight data check module is configured so that, when a metadata node starts, it sends a request to all metadata nodes to obtain the record count of each metadata-table shard on each node; if the record counts differ, the data are inconsistent, so the shard service on the non-conforming metadata node is shut down, the data of that shard is deleted, and a replica repair operation is triggered.
The periodic shard-file check module is configured so that each metadata node periodically checks whether the data files of the shards it maintains have been lost; if a loss is found, the node stops serving that shard, deletes the shard's data, and immediately triggers a replica repair operation.
The periodic cross-replica shard check module is configured so that the primary replica node obtains its own chunking basis and sends it, together with a check request, to the secondary replica nodes; the primary and secondary replica nodes each compute MD5 values according to this chunking basis and store them in check_map; each secondary replica node returns its check_map to the primary replica node, which compares the received check_maps with its own. If the data of all secondary replica nodes match, the data are considered consistent; otherwise, the primary replica's data is taken as authoritative.
Further, in the repair module, whether the primary replica node or a secondary replica node has crashed is determined by checking whether the node's session with ZooKeeper has expired: if the session has expired, the node has crashed; otherwise it has not.
The beneficial effects of the present invention are as follows. Metadata is stored on nodes separate from the user tables, so high load on the user-table nodes does not affect metadata reads and writes, which improves their stability and efficiency. The invention also supports dynamic expansion of metadata nodes and multi-copy storage of metadata, reducing the risk of metadata node crashes. It further provides a metadata verification stage that keeps the metadata stored on every replica node consistent, so that the metadata cluster runs stably. In addition, because multiple copies are stored, when an abnormal metadata node is killed, the remaining available metadata nodes can quickly complete promotion and repair, avoiding the single points of failure that commonly arise in metadata management.
Embodiment
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
As shown in Figure 1, this embodiment provides a distributed metadata management method comprising the following steps:
Storage step: set up separate metadata nodes and user-table nodes, used respectively to store metadata and user tables, and store multiple copies of the metadata on a plurality of metadata nodes, forming a primary replica node and secondary replica nodes that all store the same metadata;
Verification step: perform data verification on the primary replica node and the secondary replica nodes to ensure that the metadata they store is consistent;
Repair step: use ZooKeeper to build a monitoring ring over the primary and secondary replica nodes; when the monitoring ring detects that the primary replica node or a secondary replica node has crashed, it triggers a switchover between the primary and secondary replica nodes, thereby repairing the crashed node.
Based on these three steps, the implementation of this embodiment is described in the following three parts.
One, Metadata storage and metadata node expansion
Metadata and user-table data are stored separately, and metadata nodes support both dynamic and static expansion. Because metadata and user tables are stored on different nodes, high load on the user-table nodes does not affect metadata reads and writes, which improves their stability and efficiency.
During metadata storage, multiple copies of the metadata are stored on a plurality of metadata nodes to ensure fault tolerance; at the same time, a primary/secondary replica mechanism is introduced to reduce the workload of the Master node and to increase the cluster scale of the mass storage system. In addition, metadata nodes may need to be expanded according to actual conditions, and this expansion can be dynamic or static. The dynamic mode specifically comprises: adding an empty metadata node and, once the empty metadata node is discovered, transferring data to it after verification. The static mode specifically comprises: shutting down all metadata nodes, then adding the new metadata node, and modifying its configuration when the newly added metadata node starts.
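To make the storage layout concrete, the following is a minimal sketch, assuming a single metadata replica group and a CRC32-based placement of user tables. The class names `ReplicaGroup` and `Cluster`, the node names `USER_RS_01`/`USER_RS_02`, and the hashing policy are hypothetical illustrations, not taken from the specification. It only shows that metadata copies and user tables land on disjoint node sets.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ReplicaGroup:
    """Metadata nodes holding copies of the same metadata (hypothetical model)."""
    primary: str
    secondaries: list[str] = field(default_factory=list)

    def all_nodes(self) -> list[str]:
        # Primary first, then secondaries in order.
        return [self.primary, *self.secondaries]


@dataclass
class Cluster:
    """Metadata nodes and user-table nodes are kept as disjoint sets."""
    metadata: ReplicaGroup
    user_table_nodes: list[str]

    def nodes_for(self, table: str, is_metadata: bool) -> list[str]:
        if is_metadata:
            # Every copy of the metadata lives on the metadata replica group only,
            # so user-table load never lands on these nodes.
            return self.metadata.all_nodes()
        # User tables are spread over user-table nodes by a stable hash (assumed policy).
        idx = zlib.crc32(table.encode()) % len(self.user_table_nodes)
        return [self.user_table_nodes[idx]]


cluster = Cluster(ReplicaGroup("META_RS_01", ["META_RS_02"]), ["USER_RS_01", "USER_RS_02"])
print(cluster.nodes_for("meta_tables", True))  # both metadata replicas
print(cluster.nodes_for("orders", False))      # exactly one user-table node
```

Keeping the two node sets disjoint is what lets a busy user-table node leave metadata reads and writes unaffected.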
Figure 2 shows the processing flow for dynamically adding a metadata node. META_RS_01 and META_RS_02 denote metadata nodes that hold data, META_RS_03 denotes a newly added metadata node without data, and Master is the management process responsible for all data nodes. When the newly added metadata node META_RS_03 starts, it actively registers with the Master node as a connection object, and the Master saves the data structure of this connection object. The Master periodically scans the connection-object data structures it has saved to determine whether any node has registered recently; if so, it performs the following operations: it takes out the connection object, registers the newly added metadata node with ZK (ZK is the abbreviation of ZooKeeper), and updates the ring configuration of the metadata nodes maintained by ZK (that is, the monitoring ring established through ZooKeeper). As shown in Figure 3, the Master takes the saved connection-object data structure out in a periodically triggered thread, triggers replica repair, and sends the connection-object data structure to the primary replica node. The primary replica node performs the replica repair operation: it imports the data of the shards of all its metadata tables into the newly added metadata node and starts the corresponding data replica service, so that the newly added metadata node serves as a secondary replica node.
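The registration and periodic scan described above can be modelled in a few lines. This is a minimal sketch under stated assumptions: ZooKeeper registration and the primary's shard copy are simulated with plain lists, and the `Master` class with its `register`/`periodic_scan` methods is a hypothetical illustration rather than the actual Master implementation.

```python
from collections import deque


class Master:
    """Toy model of the Master's handling of newly started metadata nodes.

    ZooKeeper registration and the primary's data copy are simulated with
    plain lists; all names here are hypothetical.
    """

    def __init__(self, primary: str, secondaries: list[str]):
        self.primary = primary
        self.ring = [primary, *secondaries]  # ring configuration kept in ZK (simulated)
        self.pending = deque()               # saved connection objects
        self.repair_requests = []            # (primary, new node) pairs

    def register(self, node: str) -> None:
        """Called when a new, empty metadata node starts and registers itself."""
        self.pending.append(node)

    def periodic_scan(self) -> None:
        """Body of the periodically triggered thread."""
        while self.pending:
            node = self.pending.popleft()
            if node in self.ring:
                continue  # already part of the ring
            self.ring.append(node)  # register with ZK and update the ring config
            # Hand the connection object to the primary, which copies every shard of
            # every metadata table to `node`; `node` then serves as a secondary.
            self.repair_requests.append((self.primary, node))


master = Master("META_RS_01", ["META_RS_02"])
master.register("META_RS_03")
master.periodic_scan()
print(master.ring)
print(master.repair_requests)
```

Decoupling registration from the scan mirrors the text: a starting node only records itself, and all ring and repair work happens later in the Master's periodic thread.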
Thus, metadata nodes are expanded to meet the requirement of multi-copy metadata storage and to further prevent a metadata node from crashing because it stores too much data. The newly added metadata node serves as a secondary replica node, while the node that originally stored the metadata serves as the primary replica node, which makes it easier to handle metadata node crashes later.
Two, Data verification
Once multi-copy storage is adopted, the consistency of data between the primary replica node and the secondary replica nodes must be ensured, so data verification is performed on them. While data verification is in progress, no read/write service is provided externally. The check performed when a metadata node restarts is a lightweight verification, while the periodic checks performed during operation are memory-level verifications.
This embodiment mainly uses three verification modes:
First, the lightweight data check, which specifically comprises: when a metadata node starts, it sends a request to all metadata nodes to obtain the record count of each metadata-table shard on each node; if the record counts differ, the data are inconsistent, so the shard service on the non-conforming metadata node is shut down, the data of that shard is deleted, and a replica repair operation is triggered. The lightweight check at restart also serves to confirm the node's state after the restart.
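The record-count comparison can be sketched as follows. The text does not say which node's count is the reference when counts disagree; this sketch assumes the primary replica's counts are authoritative, consistent with the third check mode. The function name `lightweight_check` and the input layout are hypothetical.

```python
def lightweight_check(primary: str, counts: dict[str, dict[str, int]]) -> dict[str, list[str]]:
    """Compare per-shard record counts reported by every metadata node at startup.

    counts maps node -> {shard_id: record_count}. Using the primary's counts as
    the reference is an assumption. Returns shard_id -> nodes that must close the
    shard, delete its data, and trigger replica repair.
    """
    reference = counts[primary]
    to_repair: dict[str, list[str]] = {}
    for node, shard_counts in counts.items():
        if node == primary:
            continue
        for shard, expected in reference.items():
            # A missing shard is treated the same as a wrong record count.
            if shard_counts.get(shard) != expected:
                to_repair.setdefault(shard, []).append(node)
    return to_repair


reported = {
    "META_RS_01": {"s1": 10, "s2": 5},
    "META_RS_02": {"s1": 10, "s2": 4},
    "META_RS_03": {"s1": 9, "s2": 5},
}
print(lightweight_check("META_RS_01", reported))
```

Counting records is cheap compared with hashing contents, which is why a check of this kind suits node startup; it cannot detect two replicas with equal counts but different records, which the MD5 check covers.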
Second, the periodic shard-file check, which specifically comprises: each metadata node periodically checks whether the data files of the shards it maintains have been lost; if a loss is found, the node stops serving that shard, deletes the shard's data, and immediately triggers a replica repair operation.
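One round of the periodic file check might look like the sketch below. The mapping from shards to file paths, and the `stop_service`, `delete_data`, and `trigger_repair` callbacks, are hypothetical hooks standing in for the node's real services; a scheduler would call `periodic_file_check` at a fixed interval.

```python
import os
from typing import Callable


def find_lost_shards(shard_files: dict[str, list[str]],
                     exists: Callable[[str], bool] = os.path.exists) -> list[str]:
    """Return ids of shards with at least one missing data file."""
    return [shard for shard, paths in shard_files.items()
            if not all(exists(p) for p in paths)]


def periodic_file_check(shard_files: dict[str, list[str]],
                        stop_service: Callable[[str], None],
                        delete_data: Callable[[str], None],
                        trigger_repair: Callable[[str], None],
                        exists: Callable[[str], bool] = os.path.exists) -> list[str]:
    """One round of the periodic check; returns the shards found to be lost."""
    lost = find_lost_shards(shard_files, exists)
    for shard in lost:
        stop_service(shard)    # stop serving this shard on the current node
        delete_data(shard)     # drop the shard's remaining local data
        trigger_repair(shard)  # immediately request replica repair
    return lost
```

The `exists` parameter defaults to `os.path.exists` but can be replaced, which keeps the check testable without real files on disk.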
Third, the periodic check between the different replicas of each shard: as shown in Figure 4, there are three replica nodes, one primary replica node and two secondary replica nodes. The primary replica node obtains its own check_set, which stores the chunking basis, and sends the check_set together with a check request to the secondary replica nodes; the primary and secondary replica nodes each compute MD5 values according to this chunking basis and store them in check_map. Each secondary replica node returns its check_map to the primary replica node, which compares them with its own check_map, completing the verification of all three replica nodes. If the data of all secondary replica nodes match, the data are considered consistent; otherwise, the primary replica's data is taken as authoritative. check_map is a map-type variable whose key identifies a data shard and whose value is the MD5 value of that shard.
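The check_set/check_map exchange can be sketched with Python's standard `hashlib`. The specification does not define the chunking basis; this sketch assumes each shard in check_set is a half-open key range `[start, end)` and that each replica holds a simple key to bytes store. `build_check_map` and `compare_check_maps` are hypothetical names.

```python
import hashlib


def build_check_map(check_set: dict[str, tuple[str, str]],
                    store: dict[str, bytes]) -> dict[str, str]:
    """Compute check_map (shard_id -> MD5 hex digest) over this replica's local data.

    check_set is the chunking basis sent by the primary; modelling it as a
    half-open key range [start, end) per shard is an assumption.
    """
    check_map = {}
    for shard, (start, end) in check_set.items():
        h = hashlib.md5()
        for key in sorted(k for k in store if start <= k < end):
            kb, vb = key.encode(), store[key]
            # Length-prefix key and value so record boundaries are unambiguous.
            h.update(len(kb).to_bytes(4, "big") + kb)
            h.update(len(vb).to_bytes(4, "big") + vb)
        check_map[shard] = h.hexdigest()
    return check_map


def compare_check_maps(primary_map: dict[str, str],
                       secondary_maps: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return secondary -> shards whose digest differs from the primary's.

    An empty result means all replicas are consistent; otherwise the primary's
    data is authoritative for the listed shards.
    """
    diffs = {}
    for node, m in secondary_maps.items():
        bad = [s for s, digest in primary_map.items() if m.get(s) != digest]
        if bad:
            diffs[node] = bad
    return diffs
```

Because every replica hashes by the same basis and in sorted key order, only the small check_map travels back to the primary rather than the shard data itself.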
Three, Repair and promotion
Once the data on the primary and secondary replica nodes is consistent, the primary and secondary replica nodes are used to handle metadata node crashes.
This embodiment supports promotion and repair of secondary replica nodes. At startup, a monitoring ring is established through ZooKeeper; when a metadata process dies abnormally, fast promotion and repair are triggered according to the role and number of the dead processes. ZooKeeper is a reliable coordination service for large-scale distributed systems. Whether the primary replica node or a secondary replica node has crashed is determined by checking whether the node's session with ZooKeeper has expired: if the session has expired, the node has crashed; otherwise it has not.
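The crash rule above ("crashed if and only if the session has expired") can be illustrated with a small stand-in. In real ZooKeeper, session expiry is decided by the server when a client stops heartbeating within the session timeout; the `SessionTracker` class below is a hypothetical local simulation of that rule, not the ZooKeeper client API.

```python
import time
from typing import Callable


class SessionTracker:
    """Stand-in for ZooKeeper session expiry; not the ZooKeeper client API.

    A node's session stays alive while it heartbeats within timeout_s.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        self.last_seen[node] = self.clock()

    def is_down(self, node: str) -> bool:
        # Crashed <=> session expired. A node that never opened a session counts as down.
        last = self.last_seen.get(node)
        return last is None or self.clock() - last > self.timeout_s
```

Injecting the clock lets the expiry rule be exercised deterministically, for example by advancing a fake time past `timeout_s`.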
When the primary replica node of the metadata crashes, the first secondary replica node is selected to take over the work of the primary replica node, so that the metadata tables continue to be served without interruption. ZK (short for ZooKeeper) is used to monitor the nodes and trigger the switchover from the metadata primary replica to a secondary replica. The main steps are as follows:
1) Create the directory structure /root node/parent node/ephemeral node in ZK.
2) Each META replica establishes a session with ZK and creates an ephemeral node under the parent node, to which it writes its own agent address; if the session expires, this ephemeral node disappears.
3) For example, as shown in Figure 5, META_RS_01 is the primary replica node, and META_RS_02, META_RS_03, and META_RS_04 are secondary replica nodes. META_RS_02 monitors whether the ephemeral node of META_RS_01 exists; if the primary replica node's session expires, the ephemeral node corresponding to the primary replica node disappears. The first secondary replica node, META_RS_02, is then promoted to primary replica node. After its promotion, META_RS_02 switches to monitoring the last secondary replica node, META_RS_04, as shown by the dashed line in the figure.
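Steps 1) to 3) can be simulated without a ZooKeeper server. In this minimal sketch, assuming each node watches its predecessor's ephemeral node and the primary watches the last secondary (which is how Figure 5 closes the ring), the class `MonitorRing` and its methods are hypothetical; a real deployment would use a ZooKeeper client's ephemeral nodes and watches instead.

```python
class MonitorRing:
    """Simulation of the ZooKeeper monitoring ring in Figure 5 (no real ZooKeeper calls).

    ring[0] is the primary replica; each node watches its predecessor's ephemeral
    node, and the primary watches the last secondary, closing the ring.
    """

    def __init__(self, nodes: list[str]):
        self.ring = list(nodes)
        # Simulated /root/parent/<node> ephemeral nodes holding each agent address.
        self.ephemeral = {n: f"agent-address-of-{n}" for n in nodes}

    @property
    def primary(self) -> str:
        return self.ring[0]

    def watch_target(self, node: str) -> str:
        """Return the node whose ephemeral node `node` watches."""
        i = self.ring.index(node)
        return self.ring[i - 1]  # for the primary, i - 1 == -1 wraps to the last node

    def on_session_expired(self, node: str) -> tuple[str, bool]:
        """Handle expiry of `node`'s session; return (watcher, primary_was_lost)."""
        if len(self.ring) < 2:
            raise RuntimeError("no surviving replica to take over")
        self.ephemeral.pop(node)  # the ephemeral node vanishes with the session
        i = self.ring.index(node)
        watcher = self.ring[(i + 1) % len(self.ring)]  # the node that was watching it
        self.ring.remove(node)
        # If the primary was lost, its watcher (the first secondary) is now ring[0],
        # i.e. promoted, and by construction now watches the last secondary.
        return watcher, i == 0


ring = MonitorRing(["META_RS_01", "META_RS_02", "META_RS_03", "META_RS_04"])
print(ring.on_session_expired("META_RS_01"))  # META_RS_02 detects the loss and is promoted
print(ring.primary, ring.watch_target(ring.primary))
```

Because each node watches exactly one neighbour, an expiry notifies a single node rather than the whole group, and the promoted node's new watch target falls out of the ring order. Replica repair for the lost node is not modelled here.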
The method described in this embodiment is clearly effective. Without this strategy, monitoring showed that after a metadata node was killed, loading could not complete: it stalled at about 89% and began reporting errors, and user data and metadata could no longer be queried, inserted, modified, or deleted. With this strategy, after a metadata node is killed, as long as at least one metadata node remains, loading resumes after a brief pause (about 30 seconds in the test) and continues to 100%, after which user data and metadata can be operated on normally.
As shown in Figure 6, corresponding to the above distributed metadata management method, the technical solution of the present invention further comprises a distributed metadata management system, which comprises the following modules:
Storage module, configured to set up separate metadata nodes and user-table nodes, used respectively to store metadata and user tables, and to store multiple copies of the metadata on a plurality of metadata nodes, forming a primary replica node and secondary replica nodes that all store the same metadata;
Verification module, configured to perform data verification on the primary and secondary replica nodes to ensure that the metadata they store is consistent;
Repair module, configured to use ZooKeeper to build a monitoring ring over the primary and secondary replica nodes; when the monitoring ring detects that the primary replica node or a secondary replica node has crashed, it triggers a switchover between the primary and secondary replica nodes, thereby repairing the crashed node.
In this embodiment, the storage module is also configured to expand the metadata nodes either dynamically or statically. The dynamic mode specifically comprises: adding an empty metadata node and, once the empty metadata node is discovered, transferring data to it after verification. The static mode specifically comprises: shutting down all metadata nodes, then adding the new metadata node, and modifying its configuration when the newly added metadata node starts.
As also shown in Figure 6, the verification module comprises a lightweight data check module, a periodic shard-file check module, and a periodic cross-replica shard check module;
The lightweight data check module is configured so that, when a metadata node starts, it sends a request to all metadata nodes to obtain the record count of each metadata-table shard on each node; if the record counts differ, the data are inconsistent, so the shard service on the non-conforming metadata node is shut down, the data of that shard is deleted, and a replica repair operation is triggered.
The periodic shard-file check module is configured so that each metadata node periodically checks whether the data files of the shards it maintains have been lost; if a loss is found, the node stops serving that shard, deletes the shard's data, and immediately triggers a replica repair operation.
The periodic cross-replica shard check module is configured so that the primary replica node obtains its own chunking basis and sends it, together with a check request, to the secondary replica nodes; the primary and secondary replica nodes each compute MD5 values according to this chunking basis and store them in check_map; each secondary replica node returns its check_map to the primary replica node, which compares the received check_maps with its own. If the data of all secondary replica nodes match, the data are considered consistent; otherwise, the primary replica's data is taken as authoritative.
In addition, in the repair module, whether the primary replica node or a secondary replica node has crashed is determined by checking whether the node's session with ZooKeeper has expired: if the session has expired, the node has crashed; otherwise it has not.
The specific implementation of the distributed metadata management system is the same as that of the distributed metadata management method described above and is not repeated here.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.