CN102063438A - Recovery method and device for damaged files

Info

Publication number: CN102063438A
Application number: CN2009102247737A
Authority: CN
Inventors: 朱国云; 黄贵; 蔡景现
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2009-11-17
Filing date: 2009-11-17
Publication date: 2011-05-18

Abstract

The embodiment of the invention discloses a recovery method and a recovery device for damaged files. The method comprises the following steps of: when a data node fails, searching for names of all files on the failing data node and data node names of backup files of all the files; randomly selecting one of the data nodes indicated by the data node names, and reading the backup files from the selected data node; and simultaneously writing the read backup files into any other data node, except the failing data node and the data node of the backup files, in a distributed file system. The embodiment of the invention can shorten the recovery time of the damaged files.

Description

Recovery method and device for damaged files

Technical field

This application relates to the field of distributed file systems, and in particular to a recovery method and device for damaged files.

Background art

With the continuous development of the Internet, the volume of data on the Internet has grown explosively, producing massive data sets. Files carrying such data are easily damaged by system faults, failure of the disk holding the file, and similar causes, after which they can no longer be read normally. In a DFS (Distributed File System), every file exists as several backup files. For example, the system holds three files f1, f2 and f3 with identical content, and these three files are backups of one another: f2 and f3 can serve as backup files of f1, and f1 and f3 can serve as backup files of f2. In a DFS, one disk constitutes one data node, and each backup file is stored on a different data node, so the backup files are interleaved across the data nodes. For example, file f is stored on data nodes A, B and C, and file g is stored on data nodes C, D and E. Once a damaged file appears, the system restores it from its backup files to keep the data in the DFS safe.

An existing DFS recovers damaged files with the recovery mechanism of a stand-alone system. For example, when a DFS has 1000 disks, a certain number of physical disks are combined into one RAID (Redundant Array of Independent Disks) disk; with 5 physical disks per RAID disk, the 1000 physical disks in the DFS form 200 RAID disks. Once one disk of a RAID disk is detected as damaged, all files on that disk are damaged as well, and the data node corresponding to that disk fails. A new disk is then brought into service, the backup files of all damaged files on the damaged disk are read from the other 4 working disks of the RAID disk, and each backup file read is written to the new disk, which completes the recovery of the damaged files.

However, the inventors found in their research that the existing recovery of damaged files usually requires the disk to be replaced manually. Moreover, because a disk is one of the slowest devices to access in existing systems, the recovery takes a long time, generally more than one hour. The longer the recovery takes, the more likely it is that another disk fails during the recovery, which can cause further data loss.

Summary of the invention

To solve the above technical problem, the embodiments of this application provide a recovery method and device for damaged files that shorten the recovery time of damaged files.

The embodiments of this application disclose the following technical solutions:

A recovery method for damaged files comprises: when a data node fails, looking up the names of all files on the failed data node and the names of the data nodes holding the backup files of those files; for each file, selecting any one of the data nodes indicated by the data node names, and reading the backup file from the selected data node; and simultaneously writing the backup files read to any other data nodes in the distributed file system, excluding the failed data node and the data nodes holding the backup files.

A recovery method for damaged files comprises: when a file on a data node is damaged, obtaining the names of the data nodes holding the backup files of the damaged file; selecting any one of the data nodes indicated by the obtained data node names, and reading the backup file from the selected data node; and writing the backup file read to any region of the data node holding the damaged file other than the region occupied by the damaged file.

A recovery device for damaged files comprises: a search unit, configured to look up, when a data node fails, the names of all files on the failed data node and the names of the data nodes holding the backup files of those files; a first reading unit, configured to select, for each file, any one of the data nodes indicated by the data node names, and to read the backup file from the selected data node; and a first recovery unit, configured to simultaneously write the backup files read to any other data nodes in the distributed file system, excluding the failed data node and the data nodes holding the backup files.

A recovery device for damaged files comprises: an acquiring unit, configured to obtain, when a file on a data node is damaged, the names of the data nodes holding the backup files of the damaged file; a second reading unit, configured to select any one of the data nodes indicated by the obtained data node names, and to read the backup file from the selected data node; and a second recovery unit, configured to write the backup file read to any region of the data node holding the damaged file other than the region occupied by the damaged file.

As the above embodiments show, when an entire data node fails, the DFS of this application can write the backup files of the damaged files on the failed node to other data nodes simultaneously and in a distributed manner. Compared with the existing approach, in which all damaged files can only be written together to a single new disk or data node, this greatly increases the recovery speed and shortens the recovery time of damaged files.

In addition, this application provides a method for recovering a single damaged file on a data node. The prior art only provides a method for recovering the damaged files on a failed node; by comparison, a file can be recovered as soon as it alone is damaged, which reduces the probability of errors in the DFS as a whole.

Brief description of the drawings

To describe the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show merely some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of one embodiment of a recovery method for damaged files of this application;

Fig. 2 is a system architecture diagram of the distributed file system of this application;

Fig. 3 is a flowchart of one embodiment of a detection and recovery method for damaged files of this application;

Fig. 4 is a flowchart of another embodiment of a recovery method for damaged files of this application;

Fig. 5 is a flowchart of another embodiment of a detection and recovery method for damaged files of this application;

Fig. 6 is a structural diagram of one embodiment of a recovery device for damaged files of this application;

Fig. 7 is a structural diagram of another embodiment of a recovery device for damaged files of this application.

Detailed description

To make the above objects, features and advantages of this application clearer, the embodiments of this application are described in detail below with reference to the drawings.

Embodiment one

Refer to Fig. 1, a flowchart of one embodiment of a recovery method for damaged files of this application. The method comprises the following steps:

Step 101: When a data node fails, look up the names of all files on the failed data node and the names of the data nodes holding the backup files of those files;

In this embodiment, the recovery of damaged files may be carried out by the name space node of the DFS (Distributed File System) or by a functional entity dedicated to damaged-file recovery; this application does not limit this. The following description takes recovery by the name space node as an example. When the number of damaged files on a data node in the DFS reaches a certain proportion, for example 5‰, the data node is considered to have failed, and all files on it are identified as "damaged files"; that is, every file on the data node needs to be recovered. When a data node in the DFS fails, the name space node reads the file list of that data node from its own metadata to find the names of all files on the failed data node, and reads the list of data nodes holding each file to find the names of the data nodes holding the backup files of those files. If recovery is carried out by a dedicated functional entity instead, then once the name space node has found the file names and the names of the data nodes holding the backup files, the functional entity obtains these results from the name space node.

For example, the DFS has 26 data nodes, A to Z. When data node A fails, the name space node reads the file list of data node A from its own metadata: FileList={f1, f2, f3, f4}. This file list shows which files need to be recovered. In practice a data node may hold thousands of files; to keep the description of the recovery simple, this embodiment reduces the number of files on the data node to 4, so the damaged files to be recovered are f1, f2, f3 and f4. The name space node also reads the list of data nodes holding each file from its own metadata: Replicas={f1 (A, B, C), f2 (A, E, H), f3 (A, C, D), f4 (A, C, L)}. This list shows which data nodes hold the backup files from which each damaged file can be recovered. In this embodiment, damaged file f1 can be recovered from the backup file on data node B or C, f2 from the backup file on data node E or H, f3 from the backup file on data node C or D, and f4 from the backup file on data node C or L.
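The lookup in step 101 can be illustrated with a minimal Python sketch. The dictionaries below are hypothetical in-memory stand-ins for the name space node's metadata, and the function name is an assumption made for illustration; the application does not specify how the metadata is stored.

```python
# Hypothetical stand-in for the name space node's metadata (not from the application).
FILE_LIST = {"A": ["f1", "f2", "f3", "f4"]}   # data node -> files stored on it
REPLICAS = {                                   # file -> data nodes holding a copy
    "f1": ["A", "B", "C"],
    "f2": ["A", "E", "H"],
    "f3": ["A", "C", "D"],
    "f4": ["A", "C", "L"],
}

def backup_locations(failed_node, file_list, replicas):
    """Return {file: [data nodes still holding a backup]} for every file on failed_node."""
    return {
        name: [node for node in replicas[name] if node != failed_node]
        for name in file_list.get(failed_node, [])
    }

sources = backup_locations("A", FILE_LIST, REPLICAS)
for name, nodes in sources.items():
    print(name, "can be recovered from", " or ".join(nodes))
```

With the example data, each of f1 to f4 maps to the two data nodes other than A that hold a backup.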

Step 102: For each file, select any one of the data nodes indicated by the data node names, and read the backup file from the selected data node;

Continuing the example above, either data node B or C is selected and a backup file of damaged file f1 is read from it. At the same time, either data node E or H is selected and a backup file of f2 is read from it, either data node C or D is selected and a backup file of f3 is read from it, and either data node C or L is selected and a backup file of f4 is read from it. One backup file of every damaged file has then been read. For example, the name space node selects data nodes B, E, C and L and reads the backup files of damaged files f1, f2, f3 and f4 from them respectively.
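The arbitrary choice in step 102 could be sketched as a uniform random pick per file. This is only one way to read "select any one"; the function name and the use of Python's random module are assumptions for illustration.

```python
import random

def pick_sources(backup_nodes, rng=random):
    """Pick one backup-holding data node at random for each damaged file."""
    return {name: rng.choice(nodes) for name, nodes in backup_nodes.items()}

backup_nodes = {"f1": ["B", "C"], "f2": ["E", "H"], "f3": ["C", "D"], "f4": ["C", "L"]}
chosen = pick_sources(backup_nodes, random.Random(0))  # seeded only to make runs repeatable
print(chosen)
```

Because the choice is made independently per file, the reads are spread over several data nodes rather than concentrated on one.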

Step 103: Simultaneously write the backup files read to any other data nodes in the distributed file system DFS, excluding the failed data node and the data nodes holding the backup files.

Continuing the example, the name space node has selected data nodes B, E, C and L and read the backup files of damaged files f1, f2, f3 and f4 from them. The failed data node is A and the backup files of f1 are held on data nodes B and C, so the backup file of f1 can be written to any data node other than the failed data node A and the backup-holding data nodes B and C. Likewise, the backup file of f2 is written to any data node other than A, E and H; the backup file of f3 is written to any data node other than A, C and D; and the backup file of f4 is written to any data node other than A, C and L.
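The target selection and simultaneous writes of step 103 might look like the sketch below. It assumes a thread pool as the means of running the writes concurrently and uses a placeholder `copy_backup` function in place of a real network transfer; neither is specified by the application.

```python
import random
import string
from concurrent.futures import ThreadPoolExecutor

ALL_NODES = list(string.ascii_uppercase)  # data nodes A-Z, as in the example

def pick_target(failed_node, replica_nodes, all_nodes=ALL_NODES, rng=random):
    """Choose any data node other than the failed node and the nodes holding a copy."""
    excluded = set(replica_nodes) | {failed_node}
    return rng.choice([node for node in all_nodes if node not in excluded])

def copy_backup(name, source, target):
    """Placeholder for the real read-from-source / write-to-target transfer."""
    return (name, source, target)

replicas = {"f1": ["A", "B", "C"], "f2": ["A", "E", "H"],
            "f3": ["A", "C", "D"], "f4": ["A", "C", "L"]}
chosen_sources = {"f1": "B", "f2": "E", "f3": "C", "f4": "L"}

# One task per file, so the writes go to different data nodes at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [
        pool.submit(copy_backup, name, chosen_sources[name],
                    pick_target("A", replicas[name]))
        for name in replicas
    ]
    plan = [future.result() for future in futures]

for name, source, target in plan:
    print(f"{name}: {source} -> {target}")
```

Spreading the targets over many data nodes is what removes the single new disk as the bottleneck described in the background section.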

In addition, the embodiments of this application may further include self-detection performed by each data node: while its own load is below a load threshold, a data node checks the files of each region in turn, following the regions into which the data node is divided; as long as the data node has not detected that it has failed, an indication message sent by the data node is received, the indication message indicating that the data node has not failed; when the indication message is not received, the data node has failed.

For example, data node A performs self-detection. Data node A is divided into several regions, each containing a number of files; for instance, data node A holds 1000 files and is divided into 10 regions of 100 files each. Files are checked region by region only while the load of data node A is below the load threshold. When the load first falls below the load threshold, checking starts from the first region. If the load is no longer below the load threshold while the files of the third region are being checked, checking stops once the third region is finished. When the load falls below the load threshold a second time, checking resumes from the fourth region, and so on until every file in every region of data node A has been checked. While data node A has not detected that it has failed, it sends an indication message to the executive body of the recovery process, for example the name space node, to indicate that data node A has not failed; once data node A detects that it has failed, it stops sending the indication message. When the name space node does not receive the indication message, it learns that data node A has failed and carries out a recovery. The load threshold is determined as follows. First, the number of read and write accesses to data node A during the preceding period, that is, the I/O count, is obtained. Second, the current server load is obtained with the system-provided function getloadavg(). Finally, a weighted average of the I/O count and the current server load is computed, and the resulting value is the load threshold.
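The load-gated, resumable scan described above can be sketched as follows. The weights in `load_threshold`, the class name and the callback-based load probe are all assumptions for illustration; on a Unix system the probe could wrap `os.getloadavg()[0]`, but the sketch takes plain numbers so that it runs anywhere.

```python
def load_threshold(io_count, server_load, w_io=0.5, w_load=0.5):
    """Weighted average of the recent I/O count and the server load (weights are assumed)."""
    return (w_io * io_count + w_load * server_load) / (w_io + w_load)

class RegionScanner:
    """Checks regions in order, pausing when the node is busy and resuming later."""

    def __init__(self, regions):
        self.regions = regions   # list of regions, each a list of file names
        self.next_region = 0     # index of the first region not yet checked
        self.checked = []        # files checked so far

    def run(self, current_load, threshold):
        """Check whole regions while current_load() stays below threshold.

        Returns True once every region has been checked.
        """
        while self.next_region < len(self.regions) and current_load() < threshold:
            self.checked.extend(self.regions[self.next_region])
            self.next_region += 1
        return self.next_region == len(self.regions)

# 1000 files in 10 regions of 100 files, as in the example.
regions = [[f"file{r}_{i}" for i in range(100)] for r in range(10)]
scanner = RegionScanner(regions)

loads = iter([1, 1, 1, 9])                      # the load rises after three regions
done_first = scanner.run(lambda: next(loads), threshold=5)
resume_at = scanner.next_region                 # the scan will resume at the fourth region
done_second = scanner.run(lambda: 1, threshold=5)
```

The scanner only remembers the index of the next unchecked region, which is enough to resume from the fourth region when the load drops again.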

Data node A may detect its own failure by comparing a first CRC (Cyclic Redundancy Check code) in the meta-information with a second CRC calculated from the file content: it checks whether the number of its files whose first CRC and second CRC differ reaches a threshold, and if so, data node A has failed. The meta-information and the file content can be obtained by data node A from the name space node. For example, data node A holds 1000 files. For each file, it compares the CRC in the meta-information with the CRC calculated from the file content; a file whose two CRC values differ is a damaged file. It counts the damaged files, and when the count reaches the threshold, for example 5‰, that is, when data node A has counted 5 damaged files among its 1000 files, data node A is considered to have failed and the name space node carries out a recovery. The threshold for deciding whether a data node has failed depends on the importance of the files. For example, when the files store internal bank data, their importance is high and the threshold is set to a low value such as 1‰; when the files store ordinary users' photos, their importance is lower and the threshold is set to a higher value such as 5‰.
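A minimal sketch of the CRC comparison and the failure threshold follows. It assumes CRC-32 as computed by Python's `zlib.crc32`; the application does not say which CRC variant is used, and the function names are hypothetical.

```python
import zlib

def is_damaged(content: bytes, stored_crc: int) -> bool:
    """A file is damaged when the CRC of its content differs from the CRC in its meta-information."""
    return zlib.crc32(content) != stored_crc

def node_failed(files, threshold_permille=5):
    """files: list of (content, stored_crc) pairs.

    True when the share of damaged files reaches threshold_permille (parts per thousand).
    Integer arithmetic is used so that exactly 5 in 1000 counts as reaching 5 per mille.
    """
    damaged = sum(is_damaged(content, crc) for content, crc in files)
    return bool(files) and damaged * 1000 >= threshold_permille * len(files)

good = [(b"data %d" % i, zlib.crc32(b"data %d" % i)) for i in range(995)]
# Flipping one bit of the stored CRC guarantees a mismatch for the simulated damage.
bad = [(b"x", zlib.crc32(b"x") ^ 1)] * 5

print(node_failed(good + bad))       # 5 damaged files out of 1000
print(node_failed(good + bad[:4]))   # 4 damaged files out of 999
```

A stricter deployment, such as the bank data mentioned above, would pass `threshold_permille=1`.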

Detecting whether a data node has failed by comparing CRC values finds not only files damaged by bad disk tracks but also files damaged logically, for example by system errors.

As the above embodiment shows, when an entire data node fails, the DFS of this application can write the backup files of the damaged files on the failed node to other data nodes simultaneously and in a distributed manner. Compared with the existing approach, in which all damaged files can only be written together to a single new disk or data node, this greatly increases the recovery speed and shortens the recovery time of damaged files.

In addition, two methods of detecting a failed node are provided. Detection by comparing CRC values finds not only files damaged by bad disk tracks but also files damaged logically, for example by system errors. Moreover, detection runs only while the load on the data node is light and stops when the load is high, so the detection process does not affect external applications.

Embodiment two

A concrete application scenario is used below to describe the recovery method for damaged files in detail, in combination with one detection method. In the example, the data nodes in the DFS perform self-detection and the name space node recovers the damaged files on the failed data node. Refer to Fig. 2, a system architecture diagram of the distributed file system of this application. The DFS comprises one name space node and a large number of data nodes. The name space node is responsible for managing the data nodes in the DFS, managing metadata, locating data for access, distributing data, and so on; the data nodes are responsible for storing, managing, reading and writing files. The application client APP accesses the DFS. For example, the DFS has 26 data nodes, A to Z, each holding 4 files; in practice a data node may hold thousands of files, and the number is reduced to 4 here to keep the description of the recovery simple. Refer to Fig. 3, a flowchart of one embodiment of a detection and recovery method for damaged files of this application, which comprises the following steps:

Step 301: While its own load is below the load threshold, data node A reads the files in each region in turn, following the divided regions, and compares the CRC in the meta-information of each file with the CRC calculated from the file content; when the two CRCs of a file differ, the file is marked as a damaged file;

For example, data node A holds 1000 files and is divided into 10 regions of 100 files each. Files are checked region by region only while the load of data node A is below the load threshold. When the load first falls below the load threshold, checking starts from the first region. If the load is no longer below the load threshold while the files of the third region are being checked, checking stops once the third region is finished. When the load falls below the load threshold a second time, checking resumes from the fourth region, and so on until every file in every region of data node A has been checked.

The file meta-information comprises: file size, creation time, modification time and CRC.

Step 302: Data node A counts its own damaged files, and when the count reaches the threshold, data node A is determined to have failed;

The threshold for deciding whether data node A has failed may be 5‰. The other data nodes in the DFS determine whether they have failed by the method of steps 301-303.

Step 303: While data node A has not failed, data node A periodically sends an indication message to the name space node to indicate that data node A has not failed; when the name space node does not receive the indication message, it determines that data node A has failed;

Step 304: After determining that data node A has failed, the name space node reads from its own metadata the file list of data node A and the list of data nodes holding each file;

The name space node in the DFS reads the file list of data node A from its own metadata: FileList={f1, f2, f3, f4}. This file list shows which files need to be recovered; in this embodiment, the damaged files to be recovered are f1, f2, f3 and f4. The name space node also reads the list of data nodes holding each file from its own metadata: Replicas={f1 (A, B, C), f2 (A, E, H), f3 (A, C, D), f4 (A, C, L)}. This list shows which data nodes hold the backup files from which each damaged file can be recovered. In this embodiment, damaged file f1 can be recovered from the backup file on data node B or C, f2 from the backup file on data node E or H, f3 from the backup file on data node C or D, and f4 from the backup file on data node C or L.

Step 305: For each file, the name space node selects any one of the data nodes indicated by the list of data nodes holding the file, and reads the backup file from the selected data node;

For example, the name space node selects data nodes B, E, C and L and reads the backup files of damaged files f1, f2, f3 and f4 from them respectively.

Step 306: The name space node simultaneously writes the backup files read to any other data nodes in the DFS, excluding the failed data node A and the data nodes holding the backup files.

The name space node has selected data nodes B, E, C and L and read the backup files of damaged files f1, f2, f3 and f4 from them. The failed data node is A and the backup files of f1 are held on data nodes B and C, so the backup file of f1 can be written to any data node other than the failed data node A and the backup-holding data nodes B and C. Likewise, the backup file of f2 is written to any data node other than A, E and H; the backup file of f3 is written to any data node other than A, C and D; and the backup file of f4 is written to any data node other than A, C and L. This completes the recovery of the damaged files.
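The indication-message check of step 303, which triggers the recovery in steps 304-306, could be sketched on the name space node side as below. The timeout value, the class name and the use of explicit timestamps are assumptions; the application only says that the message is sent periodically and that its absence means the data node has failed.

```python
class LivenessTracker:
    """Records the last indication message per data node and reports silent nodes."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds   # assumed grace period, not given in the application
        self.last_seen = {}              # data node -> time of its last indication message

    def record_indication(self, node, now):
        """Called whenever an indication message from `node` arrives at time `now`."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Data nodes whose last indication message is older than the timeout."""
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)

tracker = LivenessTracker(timeout_seconds=30)
tracker.record_indication("A", now=0)
tracker.record_indication("B", now=0)
tracker.record_indication("B", now=40)   # B keeps reporting; A has gone silent
print(tracker.failed_nodes(now=50))
```

Passing the time in as a parameter keeps the sketch deterministic; a real node would read a monotonic clock.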

As the above embodiment shows, when an entire data node fails, the DFS of this application can write the backup files of the damaged files on the failed node to other data nodes simultaneously and in a distributed manner. Compared with the existing approach, in which all damaged files can only be written together to a single new disk or data node, this greatly increases the recovery speed and shortens the recovery time of damaged files.

Embodiment three

The embodiments of this application also provide a recovery method for damaged files. This embodiment differs from embodiment one as follows: embodiment one provides a method for recovering all files on a failed data node, whereas this embodiment provides a method for recovering a single damaged file on a data node that has not failed, for the case in which the number of damaged files on the data node has not reached the threshold. Refer to Fig. 4, a flowchart of another embodiment of a recovery method for damaged files of this application. The method comprises the following steps:

Step 401: When a file on a data node is damaged, obtain the names of the data nodes holding the backup files of the damaged file;

In this embodiment, the recovery of the damaged file may be carried out by a data node in the DFS or by a functional entity dedicated to damaged-file recovery; this application does not limit this. The following description takes recovery by the data node as an example. When a file on a data node in the DFS is damaged, for example file f on data node A, data node A obtains from the name space node the list of data nodes holding the damaged file f, and thereby the names of the data nodes holding the backup files of f.

For example, when the list of data nodes holding the damaged file f that data node A obtains from the name space node is Replicas={f (A, B, C)}, the list shows that the backup files of the damaged file f are held on data node B or C.

Step 402: Select any one of the data nodes indicated by the obtained data node names, and read the backup file from the selected data node;

Continuing the example above, either data node B or C is selected, and a backup file of the damaged file f is read from it. For example, data node A selects data node B and reads the backup file of the damaged file f from data node B.

Step 403: Write the backup file read to any region of the data node holding the damaged file other than the region occupied by the damaged file.

Each file on data node A occupies a certain storage region of data node A. After file f on data node A is damaged, the backup file of f can be written to any region other than the region occupied by the damaged file f.
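Step 403 can be sketched by modelling a data node as a list of regions. Treating a region holding `None` as free, and choosing among the free regions at random, are assumptions made for this illustration; the application only requires that the target differ from the region occupied by the damaged file.

```python
import random

def recover_in_place(node_regions, damaged_region, backup_content, rng=random):
    """Write backup_content into a free region other than damaged_region.

    node_regions is a list in which None marks a free region (an assumed model).
    Returns the index of the region that now holds the recovered file.
    """
    candidates = [index for index, content in enumerate(node_regions)
                  if index != damaged_region and content is None]
    if not candidates:
        raise RuntimeError("no free region available on this data node")
    target = rng.choice(candidates)
    node_regions[target] = backup_content
    return target

# Data node A: region 1 holds the damaged copy of f, regions 2 and 3 are free.
regions_on_a = [b"g", b"f-damaged", None, None]
new_region = recover_in_place(regions_on_a, damaged_region=1, backup_content=b"f-good")
print("recovered copy of f written to region", new_region)
```

Avoiding the damaged file's own region matters when the damage comes from a bad area of the disk, since rewriting in the same place could fail again.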

In addition, when detection of damaged files is performed by the APP or by the data node itself, the embodiments of this application may further include receiving a notification message containing the detection result from the APP or the data node. When detection is performed by the APP, this embodiment further comprises: when an application client APP, by comparing a first CRC in the meta-information with a second CRC calculated from the file content, detects that the first CRC and the second CRC of a file on a data node differ, receiving a notification message sent by the APP, the notification message indicating that the checked file on the checked data node is damaged.

When the data node performs self-detection, this embodiment further comprises: when a data node, while its own load is below the load threshold, checks the files of each region in turn following the regions into which the data node is divided and detects that it holds a damaged file, receiving a notification message sent by the data node, the notification message indicating that the checked file on the data node is damaged.

Checking the files of each region in turn while the load of the data node is below the load threshold comprises: the data node compares the first CRC in the meta-information with the second CRC calculated from the file content and checks, for each of its files, whether the first CRC and the second CRC differ; if they do, the checked file is damaged.

Because the detection process has been described in detail in embodiment one and embodiment two, it is not repeated here.

As the above embodiment shows, this application also provides a method for recovering a single damaged file on a data node. The prior art only provides a method for recovering the damaged files on a failed node; by comparison, a file can be recovered as soon as it alone is damaged, which reduces the probability of errors in the DFS as a whole.

Embodiment four

A concrete application scenario is used below to describe the recovery method for damaged files in detail, in combination with one detection method. In the example, the APP detects damaged files on the data nodes in the DFS and the name space node recovers the damaged file. Refer to Fig. 5, a flowchart of another embodiment of a detection and recovery method for damaged files of this application, which comprises the following steps:

Step 501: The application client APP sends data node A a request to read file f on data node A;

The DFS has 26 data nodes, and the APP can read files on the data nodes in the DFS according to the needs of the client. For example, when a user accesses file f on data node A through the APP, the APP is triggered to send data node A a request to read file f, and the subsequent detection of file f is carried out.

Step 502: Data node A returns the meta-information and the file content of file f to the APP;

Step 503: The APP compares the CRC in the meta-information of file f with the CRC calculated from the file content; when the two CRCs of file f differ, the file is marked as a damaged file;

For example, the CRC comparison detects that file f on data node A is damaged.

Step 504: The APP sends a notification message to data node A to indicate that file f on data node A is damaged;

Step 505: when back end A receives the impaired notification message of file f, obtain the back end tabulation at file f place from the name space node;

For example, (in the time of C), the back end that can obtain the backup file place of impaired file f from this back end tabulation is B or C to the tabulation of the back end at impaired file f place for A, B for Replicas={f.

Step 506: Data node A arbitrarily selects one data node from the data nodes indicated by the data node list for file f, and reads the backup file from the selected data node;

For example, data node A selects data node B and reads a backup file of file f from data node B.

Step 507: Data node A writes the backup file it has read into any region of the data node holding damaged file f other than the region occupied by the damaged file.
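Steps 502 to 507 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the `DataNode` class, the `replicas` dictionary standing in for the namespace node, the integer region identifiers and the use of CRC-32 from `zlib` are all hypothetical, since the description names neither a CRC variant nor any interface.

```python
import random
import zlib


def crc_of(content: bytes) -> int:
    """CRC-32 of the file content (the CRC variant is an assumption)."""
    return zlib.crc32(content) & 0xFFFFFFFF


class DataNode:
    """Toy in-memory data node, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.regions = {}   # region id -> (CRC stored in metadata, content)
        self.location = {}  # file name -> region id currently holding it

    def store(self, file_name, content, region):
        self.regions[region] = (crc_of(content), content)
        self.location[file_name] = region

    def read(self, file_name):
        # Step 502: return the metadata CRC and the file content.
        return self.regions[self.location[file_name]]


def is_damaged(stored_crc, content):
    # Step 503: the file is damaged when the two CRCs differ.
    return stored_crc != crc_of(content)


def recover_file(node, file_name, replicas, nodes, free_region):
    # Step 507 requires a region other than the one the damaged copy occupies.
    if free_region == node.location[file_name]:
        raise ValueError("target region is occupied by the damaged file")
    # Step 505: `replicas` plays the role of the list from the namespace node.
    candidates = [n for n in replicas[file_name] if n != node.name]
    # Step 506: pick any backup holder and read the backup file from it.
    source = nodes[random.choice(candidates)]
    _, content = source.read(file_name)
    # Step 507: write the backup into the free region.
    node.store(file_name, content, free_region)
```

In this sketch the damaged copy is left in its old region and the file's location is simply repointed to the new region; what happens to the old region afterwards is not covered here.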

As can be seen from the above embodiment, when a single file on a data node is damaged, the application also provides a method for recovering that single damaged file. Compared with the prior art, which can only recover the damaged files on a failed node, a file can be recovered as soon as it alone is damaged, which reduces the probability of errors in the DFS as a whole.

Embodiment five

Corresponding to the above recovery method for damaged files, the embodiment of the present application also provides a recovery device for damaged files. Referring to Fig. 6, which is a structural diagram of one embodiment of the recovery device for damaged files of the application, the device comprises a lookup unit 601, a first reading unit 602 and a first recovery unit 603. Its internal structure and connections are described further below in conjunction with the working principle of the device.

The lookup unit 601 is configured to, when a data node fails, look up the names of all files on the failed data node and the names of the data nodes holding the backup files of all those files;

The first reading unit 602 is configured to, for each file, arbitrarily select one data node from the data nodes indicated by the data node names, and to read the backup file from the selected data node;

The first recovery unit 603 is configured to simultaneously write the backup files read into any other data node in the distributed file system DFS other than the failed data node and the data nodes holding the backup files.
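The cooperation of the three units can be sketched as below. This is a hedged illustration rather than the device itself: the `replicas` mapping stands in for whatever records are looked up, and `read_backup` and `write_backup` are hypothetical I/O callbacks. The sketch also processes files one after another, whereas the description has the backup files written simultaneously.

```python
import random


def recover_failed_node(failed, replicas, all_nodes, read_backup, write_backup):
    """Re-replicate every file that had a copy on the failed data node.

    replicas:  file name -> names of the data nodes holding a copy (assumed).
    all_nodes: names of every data node in the DFS.
    Returns a plan: file name -> (source node, target node).
    """
    plan = {}
    for file_name, holders in replicas.items():
        # Lookup unit 601: files on the failed node and their backup holders.
        if failed not in holders:
            continue
        backups = [n for n in holders if n != failed]
        # Any node that is neither failed nor already holding a backup.
        targets = [n for n in all_nodes if n != failed and n not in backups]
        if not backups or not targets:
            continue  # nothing to read from, or nowhere to write to
        # First reading unit 602: choose any node that holds a backup.
        source = random.choice(backups)
        # First recovery unit 603: write the backup to one eligible node.
        target = random.choice(targets)
        write_backup(target, file_name, read_backup(source, file_name))
        plan[file_name] = (source, target)
    return plan
```

Because sources and targets are chosen arbitrarily per file, the reads and writes are spread over many data nodes rather than concentrated on one, which is consistent with the stated aim of shortening recovery time.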

It should be noted that the device further comprises a first receiving unit 604, configured to receive an indication message sent by a data node when the data node, with its own load below a load threshold, checks the files of each region in turn according to the regions into which the data node is divided and does not detect that it has failed. The indication message indicates that the data node has not failed; when the indication message is not received, the data node has failed.

In addition, two methods for detecting a failed node are provided. The detection method that determines whether a data node has failed by comparing CRC values can detect not only damaged files caused by bad disk tracks, but also damaged files caused by logical damage such as system errors.
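The failure detection described above can be illustrated with the following sketch. It rests on several assumptions: the rule that a node counts as failed once the number of CRC-mismatched files reaches a threshold is taken from claim 3, while the `FailureMonitor` class, its `timeout` grace period and the explicit `now` timestamps are hypothetical, because the text only says that a node has failed when its indication message is not received.

```python
import zlib


def node_has_failed(files, mismatch_threshold):
    """Data node side: files is an iterable of (stored_crc, content) pairs."""
    mismatches = sum(
        1 for stored_crc, content in files
        if stored_crc != (zlib.crc32(content) & 0xFFFFFFFF)
    )
    # Failed once the count of mismatched files reaches the threshold.
    return mismatches >= mismatch_threshold


class FailureMonitor:
    """Receiving side: a node whose indication message is overdue is failed."""

    def __init__(self, timeout):
        self.timeout = timeout  # assumed grace period, in seconds
        self.last_seen = {}     # node name -> time of last indication message

    def on_indication(self, node, now):
        # The indication message says the node has not failed.
        self.last_seen[node] = now

    def failed_nodes(self, now):
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

A data node would send its indication message only while `node_has_failed` returns `False`, so both a crashed node and a node with too many damaged files end up in `failed_nodes`.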

Embodiment six

The embodiment of the present application also provides another recovery device for damaged files. Referring to Fig. 7, which is a structural diagram of another embodiment of the recovery device for damaged files of the application, the device comprises an acquisition unit 701, a second reading unit 702 and a second recovery unit 703. Its internal structure and connections are described further below in conjunction with the working principle of the device.

The acquisition unit 701 is configured to, when a file on a data node is damaged, obtain the names of the data nodes holding the backup files of the damaged file;

The second reading unit 702 is configured to arbitrarily select one data node from the data nodes indicated by the obtained data node names, and to read the backup file from the selected data node;

The second recovery unit 703 is configured to write the backup file read into any region of the data node holding the damaged file other than the region occupied by the damaged file.

It should be noted that the device further comprises a second receiving unit 704, configured to receive a notification message sent by an application client APP when the APP, by comparing a first CRC in the metadata with a second CRC calculated from the file content, detects that the first CRC and the second CRC of a file on a data node differ. The notification message indicates that the checked file on the checked data node is damaged.

The second receiving unit 704 described above may be replaced by a third receiving unit, configured to receive a notification message sent by a data node when the data node, with its own load below a load threshold, checks the files of each region in turn according to the regions into which the data node is divided and detects that it holds a damaged file. The notification message indicates that the checked file on the data node is damaged.
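The self-check that produces these notification messages can be sketched as follows. The layout of `regions`, the `current_load` callable and the choice to stop scanning once the load reaches the threshold are assumptions made for illustration; the text only states that regions are checked in turn while the node's load is below the load threshold.

```python
import zlib


def scan_regions(regions, current_load, load_threshold):
    """Yield (region id, file name) for each damaged file found.

    regions:      region id -> {file name: (stored_crc, content)} (assumed).
    current_load: callable returning the node's present load (hypothetical).
    """
    for region_id in sorted(regions):
        # Only scan while the node's own load is below the threshold.
        if current_load() >= load_threshold:
            return
        for name, (stored_crc, content) in sorted(regions[region_id].items()):
            # Damaged when the metadata CRC and the computed CRC differ.
            if stored_crc != (zlib.crc32(content) & 0xFFFFFFFF):
                yield region_id, name
```

Each yielded pair corresponds to one notification message; checking the load before every region keeps the scan from competing with normal read and write traffic.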

It should be noted that a person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The recovery method and device for damaged files provided by the application have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the application, and the description of the above embodiments is intended only to help in understanding the method of the application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the application, make changes to the specific embodiments and to the scope of application. In summary, the content of this description should not be construed as limiting the application.

Claims

1. A recovery method for damaged files, characterized by comprising:

when a data node fails, looking up the names of all files on the failed data node and the names of the data nodes holding the backup files of all those files;

for each file, arbitrarily selecting one data node from the data nodes, indicated by the data node names, that hold its backup files, and reading the backup file from the selected data node;

simultaneously writing the backup files read into any other data node in the distributed file system other than the failed data node and the data nodes holding the backup files.

2. The method according to claim 1, characterized in that the method further comprises:

when a data node, with its own load below a load threshold, checks the files of each region in turn according to the regions into which the data node is divided and does not detect that it has failed, receiving an indication message sent by the data node, the indication message indicating that the data node has not failed; when the indication message is not received, the data node has failed.

3. The method according to claim 2, characterized in that the data node, with its own load below the load threshold, checking the files of each region in turn according to the regions into which the data node is divided comprises:

the data node comparing a first cyclic redundancy check code (CRC) in the metadata with a second CRC calculated from the file content, and detecting whether, among all of its own files, the number of files whose first CRC and second CRC differ reaches a threshold; if so, the data node has failed.

4. A recovery method for damaged files, characterized by comprising:

when a file on a data node is damaged, obtaining the names of the data nodes holding the backup files of the damaged file;

arbitrarily selecting one data node from the data nodes indicated by the obtained data node names, and reading the backup file from the selected data node;

writing the backup file read into any region of the data node holding the damaged file other than the region occupied by the damaged file.

5. The method according to claim 4, characterized in that the method further comprises:

when an application client, by comparing a first CRC in the metadata with a second CRC calculated from the file content, detects that the first CRC and the second CRC of a file on a data node differ, receiving a notification message sent by the application client, the notification message indicating that the checked file on the checked data node is damaged.

6. The method according to claim 4, characterized in that the method further comprises:

when a data node, with its own load below a load threshold, checks the files of each region in turn according to the regions into which the data node is divided and detects that it holds a damaged file, receiving a notification message sent by the data node, the notification message indicating that the checked file on the data node is damaged.

7. The method according to claim 6, characterized in that the data node, with its own load below the load threshold, checking the files of each region in turn according to the regions into which the data node is divided comprises:

the data node comparing a first CRC in the metadata with a second CRC calculated from the file content, and detecting whether the first CRC and the second CRC of each of its own files differ; if so, the checked file is damaged.

8. A recovery device for damaged files, characterized by comprising:

a lookup unit, configured to, when a data node fails, look up the names of all files on the failed data node and the names of the data nodes holding the backup files of all those files;

a first reading unit, configured to, for each file, arbitrarily select one data node from the data nodes indicated by the data node names, and to read the backup file from the selected data node;

a first recovery unit, configured to simultaneously write the backup files read into any other data node in the distributed file system other than the failed data node and the data nodes holding the backup files.

9. The device according to claim 8, characterized in that the device further comprises:

a first receiving unit, configured to receive an indication message sent by a data node when the data node, with its own load below a load threshold, checks the files of each region in turn according to the regions into which the data node is divided and does not detect that it has failed, the indication message indicating that the data node has not failed; when the indication message is not received, the data node has failed.

10. A recovery device for damaged files, characterized by comprising:

an acquisition unit, configured to, when a file on a data node is damaged, obtain the names of the data nodes holding the backup files of the damaged file;

a second reading unit, configured to arbitrarily select one data node from the data nodes indicated by the obtained data node names, and to read the backup file from the selected data node;

a second recovery unit, configured to write the backup file read into any region of the data node holding the damaged file other than the region occupied by the damaged file.

11. The device according to claim 10, characterized in that the device further comprises:

a second receiving unit, configured to receive a notification message sent by an application client when the application client, by comparing a first CRC in the metadata with a second CRC calculated from the file content, detects that the first CRC and the second CRC of a file on a data node differ, the notification message indicating that the checked file on the checked data node is damaged.

12. The device according to claim 10, characterized in that the device further comprises:

a third receiving unit, configured to receive a notification message sent by a data node when the data node, with its own load below a load threshold, checks the files of each region in turn according to the regions into which the data node is divided and detects that it holds a damaged file, the notification message indicating that the checked file on the data node is damaged.