A file data backup method based on Delta increments
Technical field
The present invention relates to a file incremental backup technology, generally used in file-based backup systems, that aims to reduce the storage capacity consumed in a storage system. By adopting the "Delta" file increment technology, the stored data can be reduced to 1/10 of its original size, freeing more backup space. This not only allows backup data on disk to be retained longer, but also saves much of the bandwidth required for offline storage.
Background technology
Insufficient storage space has always been a headache for IT personnel: it means not only purchasing more storage devices, but also facing all the configuration work that follows an adjustment of the storage architecture. Besides being complicated and tedious, expanding storage capacity may require downtime, which can seriously affect the normal operation of the enterprise.
To protect their data, enterprises must back it up regularly, which is one reason data accumulates so quickly. In particular, some enterprises now back up first to faster disks and then, in turn, to devices such as tape. For an enterprise that must complete a large volume of backups between the end of one working day and the start of the next, disk backup is a good method: backup is fast, and so is recovery. However, disk backup undoubtedly accelerates the consumption of disk space.
Enterprise applications generally involve a large number of files and e-mails. If every backup were a full backup of all files and data, a very large amount of storage space would be needed. For this reason the industry generally adopts incremental backup and differential backup. A differential backup does not clear the archive bit after the backup completes, whereas an incremental backup does clear it, which avoids backing up some files again unnecessarily. The archive bit also lets the user see which files actually need to be backed up.
Both incremental backup and differential backup, however, run into the same problem. Files such as Outlook PST files and database files are large and change frequently, so under either scheme the file is regarded as changed and is therefore backed up in full every time.
How to provide a method that, when a large file changes, backs up only the changed part of the file rather than the whole file is therefore a challenge posed by today's rapid growth of data.
Summary of the invention
The purpose of the present invention is to provide a file incremental backup technology, generally used in file-based backup systems, that aims to reduce the storage capacity consumed in a storage system.
The purpose of the invention is achieved as follows. The "Delta" file increment technology is adopted to reduce the storage space occupied by data, freeing more backup space on disk, so that backup data on disk can be retained longer and much of the bandwidth required for offline storage is saved. The system architecture comprises a Delta Sequence module, a Delta read module, a Delta merge module, a Delta comparison module, and a Delta Processor module. The module functions and file backup steps are as follows:
Delta Sequence module: determining whether two files are identical requires comparison by file, byte, and hash. This module therefore orders the file sequence, byte sequence, and hash sequence and provides a comparison filtering function, so that whether two sequences are identical can be checked rapidly through operations on the sequences;
Delta read module: after a file has been modified, the parts that differ from the original file are extracted by Delta comparison and form a separate Delta increment file; that is, each comparison captures the differences between the latest file and the source file;
Delta merge module: after repeated comparisons a file accumulates many Delta versions, and as the number of modifications grows, the Delta increment between the new file and the source file also grows. The Delta merge module then merges the most recent Delta increment data into the source file and takes the newly merged file as the source file for this file. The next time the file is modified, Delta comparison is performed against this latest file as the baseline, which curbs the continual growth of the Delta increment file;
Delta comparison module: the differing parts of the files are found by reading the files line by line or byte by byte. The changed parts are extracted separately and the positions of the data are recorded through an index, forming a Delta increment file. Combining this increment file with the source file reconstructs the file as modified at that time;
Delta Processor module: provides a multitask Delta data comparison mechanism, so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency.
In the architecture, a uniquely identifying hash is generated for each line of each file, and the differing part of that line's byte stream is compared, to ensure that the differences between two files are obtained correctly.
In the architecture, in the processing step of the Delta merge module, each Delta increment file is merged with the preceding source file to form the file as modified at that time; that is, each Delta increment file corresponds to one file version.
In the architecture, a file sequence, a hash sequence, and a byte sequence are defined for each file.
The beneficial effects of the invention are as follows. Suppose that, in an enterprise application, a 1 GB file is modified 100 times and each modification changes only 10 KB of data. A traditional backup method must back up the whole file each time, which requires 100 GB of space. With this Delta-based file incremental backup technology, only 10 KB needs to be backed up per change, so 100 changes require only 1000 KB of backup data, roughly 100,000 times less than 100 GB. For network-based backup in particular, this significantly reduces network bandwidth usage and greatly improves backup efficiency.
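The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation; the variable names are illustrative, binary units (1 KB = 1024 bytes) are assumed, and the one-time copy of the source file that a Delta scheme also has to keep is deliberately left out, as it is in the text.

```python
# Assumed binary units; the text does not say which convention it uses.
KB = 1024
GB = 1024 ** 3

file_size = 1 * GB        # size of the file being protected
change_size = 10 * KB     # data changed by each modification
modifications = 100       # number of times the file is modified

# Traditional backup: the whole file is copied after every modification.
full_backup_total = file_size * modifications

# Delta backup: only the changed data is stored for every modification.
delta_backup_total = change_size * modifications

ratio = full_backup_total / delta_backup_total

print(f"full backups : {full_backup_total / GB:.0f} GB")
print(f"delta backups: {delta_backup_total / KB:.0f} KB")
print(f"ratio        : about {ratio:,.0f} to 1")
```

Under these assumptions the ratio is on the order of 10^5.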
Applying this technology in a data backup system therefore increases disk backup utilization and saves backup space, addressing the challenge posed by the rapid growth of data.
Description of drawings
Figure 1 is a structural diagram of file backup based on the Delta technology;
Figure 2 is a diagram of the first file increment change based on the Delta technology;
Figure 3 is a diagram of the second file increment change based on the Delta technology;
Figure 4 is a diagram of file merging/archiving based on the Delta technology.
Embodiment
The method of the present invention is described in detail below with reference to the accompanying drawings.
By adopting the "Delta" file increment technology, the stored data can be reduced to 1/10 of its original size, freeing more backup space. This not only allows backup data on disk to be retained longer, but also saves much of the bandwidth required for offline storage. The system architecture comprises a Delta Sequence module, a Delta read module, a Delta merge module, a Delta comparison module, and a Delta Processor module, as shown in the figure, wherein:
Delta Sequence module: determining whether two files are identical requires comparison by file, byte, and hash. This module therefore orders the file sequence, byte sequence, and hash sequence and provides a comparison filtering function, so that whether two sequences are identical can be checked rapidly through operations on the sequences.
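The idea of comparing hash sequences rather than raw content can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, SHA-256 is an assumed choice of hash (the text does not name one), and lines are compared position by position.

```python
import hashlib
from typing import List


def hash_sequence(data: bytes) -> List[str]:
    """Return one hash per line of the file content (assumed: SHA-256)."""
    return [hashlib.sha256(line).hexdigest()
            for line in data.splitlines(keepends=True)]


def files_identical(old: bytes, new: bytes) -> bool:
    """Coarse filter: compare whole-file hashes before looking at lines."""
    return hashlib.sha256(old).digest() == hashlib.sha256(new).digest()


def changed_lines(old: bytes, new: bytes) -> List[int]:
    """Return the indices of lines whose hashes differ or that exist in only one file."""
    if files_identical(old, new):
        return []
    old_hashes, new_hashes = hash_sequence(old), hash_sequence(new)
    longest = max(len(old_hashes), len(new_hashes))
    return [i for i in range(longest)
            if i >= len(old_hashes) or i >= len(new_hashes)
            or old_hashes[i] != new_hashes[i]]


old_content = b"aaa\nbbb\nccc\n"
new_content = b"aaa\nbXb\nccc\n"
print(changed_lines(old_content, new_content))
```

Only the lines flagged here would need a byte-level comparison; all other lines are filtered out by their hashes.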
Delta read module: after a file has been modified, the parts that differ from the original file are extracted by Delta comparison and form a separate Delta increment file. That is, each comparison captures the differences between the latest file and the source file.
Delta merge module: after repeated comparisons a file accumulates many Delta versions, and as the number of modifications grows, the Delta increment between the new file and the source file may also grow. The Delta merge module can then be used to merge the most recent Delta increment data into the source file and take the newly merged file as the source file for this file. The next time the file is modified, Delta comparison is performed against this latest file as the baseline, which curbs the continual growth of the Delta increment file.
Delta comparison module: the differing parts of the files are found by reading the files line by line or byte by byte. The changed parts are extracted separately and the positions of the data are recorded through an index, forming a Delta increment file. Combining this increment file with the source file reconstructs the file as modified at that time.
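A line-by-line comparison that records changed data together with its index, and then rebuilds the modified file from the source file plus the increment, might look like the sketch below. The delta layout (a dictionary with a line count and a map from line index to new content) and the function names are assumptions made for illustration. Because lines are matched by position, an inserted line marks every following line as changed; a production tool would more likely use a proper diff algorithm.

```python
from typing import Dict


def make_delta(source: bytes, modified: bytes) -> Dict:
    """Record only the lines of `modified` that differ from `source`, keyed by line index."""
    src_lines = source.splitlines(keepends=True)
    mod_lines = modified.splitlines(keepends=True)
    changes = {}
    for index, line in enumerate(mod_lines):
        if index >= len(src_lines) or src_lines[index] != line:
            changes[index] = line
    # The line count lets the reader of the delta handle files that grew or shrank.
    return {"line_count": len(mod_lines), "changes": changes}


def apply_delta(source: bytes, delta: Dict) -> bytes:
    """Rebuild the modified file from the source file and the delta."""
    src_lines = source.splitlines(keepends=True)
    rebuilt = []
    for index in range(delta["line_count"]):
        if index in delta["changes"]:
            rebuilt.append(delta["changes"][index])
        else:
            rebuilt.append(src_lines[index])
    return b"".join(rebuilt)


source_file = b"line one\nline two\nline three\n"
modified_file = b"line one\nline 2\nline three\nline four\n"

delta = make_delta(source_file, modified_file)
print(sorted(delta["changes"]))
print(apply_delta(source_file, delta) == modified_file)
```

Only the delta needs to be stored for each version; the source file is stored once.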
Delta Processor module: provides a multitask Delta data comparison mechanism, so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency.
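One possible way to run several comparisons at once is a worker pool, sketched here with Python's standard `concurrent.futures` module. The text does not say how multitasking is implemented, so the thread pool, the worker count, and the function names are all assumptions. For CPU-bound comparisons in CPython a process pool may be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def changed_line_numbers(pair: Tuple[bytes, bytes]) -> List[int]:
    """Compare one (source, modified) pair and return indices of differing lines."""
    source, modified = pair
    src_lines = source.splitlines()
    mod_lines = modified.splitlines()
    longest = max(len(src_lines), len(mod_lines))
    return [i for i in range(longest)
            if i >= len(src_lines) or i >= len(mod_lines)
            or src_lines[i] != mod_lines[i]]


def compare_many(pairs: Sequence[Tuple[bytes, bytes]], workers: int = 4) -> List[List[int]]:
    """Compare many file pairs concurrently; results keep the order of `pairs`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(changed_line_numbers, pairs))


file_pairs = [
    (b"a\nb\n", b"a\nc\n"),   # second line changed
    (b"x\n", b"x\n"),         # unchanged
    (b"1\n", b"1\n2\n"),      # one line appended
]
results = compare_many(file_pairs)
print(results)
```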
Example
The process of implementing this architecture is described below with a concrete example.
Figure 2 depicts a file being modified and a Delta increment file finally being generated by the Delta increment technique. Relative to the source file, the middle bytes of the first line, the first byte of the second line, and the last two bytes of the third line have changed. The Delta comparison module analyzes the modified file against the source file and separates out the changed bytes, producing the final form shown in the figure. Only the changed bytes are extracted; bytes that have not changed are left as they are.
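The byte-level extraction described for this figure can be illustrated with a small sketch. The file contents below are invented to mirror the description (middle bytes of the first line, first byte of the second line, last two bytes of the third line), and the `(offset, bytes)` run format and the function name are assumptions. The sketch only handles in-place changes, where the file length stays the same.

```python
from typing import List, Tuple


def byte_delta(source: bytes, modified: bytes) -> List[Tuple[int, bytes]]:
    """Return (offset, new_bytes) runs covering every byte that differs from the source."""
    if len(source) != len(modified):
        raise ValueError("this sketch only handles in-place changes of equal length")
    runs = []
    i, size = 0, len(source)
    while i < size:
        if source[i] != modified[i]:
            start = i
            # Extend the run while consecutive bytes keep differing.
            while i < size and source[i] != modified[i]:
                i += 1
            runs.append((start, modified[start:i]))
        else:
            i += 1
    return runs


# Hypothetical three-line file, changed as in the description above.
source_file = b"abcdef\nghijkl\nmnopqr\n"
modified_file = b"abXXef\nGhijkl\nmnopYZ\n"

runs = byte_delta(source_file, modified_file)
print(runs)
```

In this example, five changed bytes and their offsets are all that the increment has to carry for a 21-byte file.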
Figure 3 is similar to the first Delta comparison: both comparisons are made against the source file rather than against the previous result. The main reason is that the first comparison leaves behind a relatively fixed hash sequence and byte sequence for the source file, which do not need to be regenerated for the next comparison, improving the efficiency of both comparisons. The second comparison, like the first, generates a Delta increment file.
Figure 4 depicts a source file being merged with a Delta increment file; the merged file becomes a new source file. Subsequent modifications of the file are all compared against this new source file rather than against the initial source file.
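The merge step can be sketched as follows: a Delta increment is applied to the source file, and the result becomes the baseline for later comparisons. The `(offset, bytes)` delta format, the function name, and the sample data are assumptions for illustration only.

```python
from typing import List, Tuple


def merge_delta(source: bytes, delta: List[Tuple[int, bytes]]) -> bytes:
    """Apply (offset, new_bytes) runs to the source and return the merged file."""
    merged = bytearray(source)
    for offset, data in delta:
        merged[offset:offset + len(data)] = data
    return bytes(merged)


source_file = b"abcdef\nghijkl\n"

# Two increments, both taken against the original source file.
# The second one repeats the first change, so the increment keeps growing.
delta_1 = [(2, b"XX")]
delta_2 = [(2, b"XX"), (7, b"G")]

# Merging the latest increment produces the new source file.
new_source = merge_delta(source_file, delta_2)

# A later change is now recorded against the new source file,
# so its increment contains only the newest modification.
delta_3 = [(12, b"Z")]
latest_file = merge_delta(new_source, delta_3)

print(new_source)
print(latest_file)
```

After the merge, the earlier increments are no longer needed to reconstruct the current file, which is what keeps the increment from growing without bound.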
This completes the whole process of file data backup based on the Delta technology. The technology differs from traditional incremental backup in that it introduces the Delta increment change between file versions, rather than considering only whether a file has changed.
Applying this technology in a data backup system therefore increases disk backup utilization and saves backup space, addressing the challenge posed by the rapid growth of data.