CN101650677A - File data backup method based on Delta increment - Google Patents

Info

Publication number
CN101650677A
Authority
CN
China
Prior art keywords
delta
file
module
backup
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910017342A
Other languages
Chinese (zh)
Inventor
刘正伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IEIT Systems Co Ltd
Original Assignee
Langchao Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Langchao Electronic Information Industry Co Ltd
Priority to CN200910017342A
Publication of CN101650677A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a file data backup method based on Delta increments. It uses "Delta file increment" technology to reduce the amount of data stored, freeing more backup space on disk so that backup data can be kept on disk for longer, and saving the large amount of bandwidth otherwise required for offline storage. The system architecture comprises a Delta Sequence module, a Delta reading module, a Delta merging module, a Delta comparison module and a Delta Processor module. The module functions and file backup steps are as follows. Delta Sequence module: determining whether two files are identical requires comparing them by file, by byte and by hash, so this module orders the file sequence, byte sequence and hash sequence and provides a comparison filter, so that two sequences can be checked quickly for equality by traversing them. The advantage is that, applied in a data backup system, the method increases disk backup utilization, saves backup space, and helps meet the challenge of rapidly growing data.

Description

A file data backup method based on Delta increments
Technical field
The present invention is a file incremental backup technique, generally used in file-based backup systems, intended to reduce the storage capacity consumed in a storage system. Using "Delta file increment" technology, the stored data can be reduced to 1/10 of its original size, freeing more backup space; this not only allows backup data to be kept on disk for longer, but also saves the large amount of bandwidth required for offline storage.
Background
Insufficient storage space has always been a headache for IT staff: it means not only buying more storage devices, but also facing all the configuration work that follows once the storage architecture is adjusted. Quite apart from how complicated and tedious this work is, expanding storage capacity may require downtime, which can seriously affect the normal operation of the enterprise.
To protect its data, an enterprise must back it up regularly, and this is one of the reasons data accumulates so quickly. In particular, some enterprises now back up first to faster disks and then gradually to devices such as tape. For an enterprise that must complete a large volume of backups between the end of one working day and the start of the next, disk backup is a good approach: backup is fast, and so is recovery. But disk backup undoubtedly accelerates the consumption of disk space.
Enterprise applications generally involve a large number of files and e-mails. If every backup were a full backup of all files and data, a very large amount of storage would be needed. For this reason the industry generally uses incremental backup and differential backup. A differential backup does not clear the archive bit after the backup completes, whereas an incremental backup does, which prevents some files from being backed up again unnecessarily. The archive bit also lets users see which files actually need to be backed up.
Both incremental and differential backup run into the same problem, however. Files such as Outlook PST files and database files are large and change frequently, so both schemes treat the file as changed and back it up in full.
The challenge posed by today's rapid data growth is therefore to provide a method that, when a large file changes, backs up only the changed part of the file rather than the whole file.
Summary of the invention
The purpose of the present invention is to provide a file incremental backup technique, generally used in file-based backup systems, intended to reduce the storage capacity consumed in a storage system.
The purpose of the invention is achieved as follows. "Delta file increment" technology is used to reduce the amount of data stored, freeing more backup space on disk, so that backup data can be kept on disk for longer and the large amount of bandwidth required for offline storage is saved. The system architecture comprises a Delta Sequence module, a Delta reading module, a Delta merging module, a Delta comparison module and a Delta Processor module, whose functions and file backup steps are as follows:
Delta Sequence module: determining whether two files are identical requires comparing them by file, by byte and by hash. This module therefore orders the file sequence, byte sequence and hash sequence and provides a comparison filter, so that two sequences can be checked quickly for equality by traversing them;
Delta reading module: after a file has been modified, the parts that differ from the original are extracted by Delta comparison and stored separately as a Delta increment file; that is, each comparison yields the difference between the latest file and the source file;
Delta merging module: repeated comparisons produce many Delta versions of a file, and as the number of modifications grows, the Delta increment between the new file and the source file also grows. The Delta merging module then merges the most recent Delta increment data into the source file and takes the newly merged file as the source file. The next time the file is modified, Delta comparison is performed against this latest file, which keeps the Delta increment file from growing without bound;
Delta comparison module: reads the files line by line or byte by byte and compares them to find the parts that differ. The changed parts are extracted separately and their positions are recorded in an index, forming a Delta increment file. Combining this increment file with the source file reconstructs the modified file;
Delta Processor module: provides a multitask Delta data comparison mechanism, so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency.
In this architecture, a hash that uniquely identifies the line is generated for each line of each file, and the differing part of that line's byte stream is then compared, to ensure that the differences between two different files are obtained correctly.
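As an illustration of the per-line hashing idea, the following is a minimal sketch rather than the patented implementation. The function names `line_hashes` and `changed_lines` are hypothetical, SHA-256 is an assumed choice (the text does not name a hash function), and the sketch assumes edits stay line-aligned, so inserted or deleted lines in the middle of a file are not handled.

```python
import hashlib


def line_hashes(data: bytes) -> list[str]:
    """Return one SHA-256 digest per line: a stand-in for the file's hash sequence."""
    return [hashlib.sha256(line).hexdigest() for line in data.splitlines(keepends=True)]


def changed_lines(source: bytes, modified: bytes) -> list[int]:
    """Return indices of lines whose hashes differ (assumes line-aligned edits)."""
    old, new = line_hashes(source), line_hashes(modified)
    common = min(len(old), len(new))
    changed = [i for i in range(common) if old[i] != new[i]]
    # Lines that exist in only one of the two files also count as changed.
    changed.extend(range(common, max(len(old), len(new))))
    return changed


print(changed_lines(b"a\nb\nc\n", b"a\nX\nc\nd\n"))  # [1, 3]
```

Only the lines flagged here would then need a byte-level comparison. A cryptographic hash makes collisions very unlikely, but it is not strictly unique.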
In this architecture, in the Delta merging step, each Delta increment file is merged with the previous source file to form the file as modified at that point; that is, each Delta increment file corresponds to one file version.
In this architecture, a file sequence, a hash sequence and a byte sequence are defined for each file.
The beneficial effects of the invention are as follows. Suppose that in an enterprise application a 1 GB file is modified 100 times and each modification changes only 10 KB of data. A traditional backup method must back up the whole file each time, requiring 100 GB of space. With this Delta-based file incremental backup technique, only 10 KB needs to be backed up per change, so 100 changes require only 1000 KB of backup data, roughly 100,000 times less than 100 GB. For network-based backup in particular, this greatly reduces network bandwidth usage and thus greatly improves backup efficiency.
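The arithmetic behind this example can be checked with a short calculation. The variable names are illustrative, and binary units (1 KB = 1024 bytes) are an assumption. The comparison counts only the data written per change; a baseline copy of the source file, which a delta scheme would presumably also keep, is listed separately.

```python
KB = 1024
GB = 1024 ** 3

file_size = 1 * GB      # size of the file being backed up
changes = 100           # number of modifications
delta_size = 10 * KB    # data changed by each modification

full_backup_total = file_size * changes    # full copy on every change
delta_backup_total = delta_size * changes  # only the changed data on every change
ratio = full_backup_total / delta_backup_total

print(full_backup_total // GB)   # 100 (GB)
print(delta_backup_total // KB)  # 1000 (KB)
print(ratio)                     # 104857.6

# Assumption: the delta scheme also stores one baseline copy of the source file.
delta_total_with_baseline = file_size + delta_backup_total
```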
Applied in a data backup system, this technique therefore increases disk backup utilization and saves backup space, helping to meet the challenge of rapidly growing data.
Description of the drawings
Figure 1 is a structural diagram of file backup based on Delta technology;
Figure 2 shows the first incremental change of a file based on Delta technology;
Figure 3 shows the second incremental change of a file based on Delta technology;
Figure 4 shows file merging/archiving based on Delta technology.
Detailed description
The method of the present invention is described in detail below with reference to the drawings.
Using "Delta file increment" technology, the stored data can be reduced to 1/10 of its original size, freeing more backup space; this not only allows backup data to be kept on disk for longer, but also saves the large amount of bandwidth required for offline storage. The system architecture comprises a Delta Sequence module, a Delta reading module, a Delta merging module, a Delta comparison module and a Delta Processor module, as shown in the figure, where:
Delta Sequence module: determining whether two files are identical requires comparing them by file, by byte and by hash. This module therefore orders the file sequence, byte sequence and hash sequence and provides a comparison filter, so that two sequences can be checked quickly for equality by traversing them.
Delta reading module: after a file has been modified, the parts that differ from the original are extracted by Delta comparison and stored separately as a Delta increment file. That is, each comparison yields the difference between the latest file and the source file.
Delta merging module: repeated comparisons produce many Delta versions of a file, and as the number of modifications grows, the Delta increment between the new file and the source file may also grow. The Delta merging module can then be used to merge the most recent Delta increment data into the source file and take the newly merged file as the source file. The next time the file is modified, Delta comparison is performed against this latest file, which keeps the Delta increment file from growing without bound.
Delta comparison module: reads the files line by line or byte by byte and compares them to find the parts that differ. The changed parts are extracted separately and their positions are recorded in an index, forming a Delta increment file. Combining this increment file with the source file reconstructs the modified file.
Delta Processor module: provides a multitask Delta data comparison mechanism, so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency.
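The multitask comparison attributed to the Delta Processor module might look like the sketch below. The patent does not describe a concurrency mechanism, so the thread pool and the names `count_changed_bytes` and `compare_many` are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def count_changed_bytes(source: bytes, modified: bytes) -> int:
    """Count positions that differ, plus any difference in length."""
    changed = sum(a != b for a, b in zip(source, modified))
    return changed + abs(len(source) - len(modified))


def compare_many(jobs: dict[str, tuple[bytes, bytes]], max_workers: int = 4) -> dict[str, int]:
    """Compare several (source, modified) pairs concurrently, keyed by file name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(count_changed_bytes, *pair) for name, pair in jobs.items()}
        return {name: future.result() for name, future in futures.items()}


result = compare_many({
    "report.txt": (b"abc", b"abd"),
    "notes.txt": (b"xy", b"xyz"),
})
print(result)  # {'report.txt': 1, 'notes.txt': 1}
```

In CPython, threads mainly help when the work is dominated by file I/O; a purely CPU-bound comparison would likely need a process pool or native code to see a real speed-up.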
Example
The following describes the process of implementing this architecture with a concrete example.
Figure 2 shows how a modified file is turned into a Delta increment file by the Delta increment technique. Relative to the source file, the middle byte of the first row, the first byte of the second row and the last two bytes of the third row have changed. The Delta comparison module analyzes the modified file against the source file and separates out the changed bytes, producing the final form shown in the figure: only the changed bytes are extracted, and unchanged bytes are left as they are.
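The change pattern described for Figure 2 can be reproduced with a small byte-level sketch. The delta format used here, a list of `(offset, new_bytes)` runs, and the names `make_delta` and `apply_delta` are assumptions; the patent only states that changed data and its positions are recorded. The sketch handles in-place modifications of equal-length files only.

```python
def make_delta(source: bytes, modified: bytes) -> list[tuple[int, bytes]]:
    """Collect runs of changed bytes as (offset, new_bytes) pairs."""
    if len(source) != len(modified):
        raise ValueError("this sketch only handles in-place modifications")
    delta: list[tuple[int, bytes]] = []
    run_start = None
    for i, (a, b) in enumerate(zip(source, modified)):
        if a != b and run_start is None:
            run_start = i                      # a changed run begins here
        elif a == b and run_start is not None:
            delta.append((run_start, modified[run_start:i]))
            run_start = None                   # the changed run has ended
    if run_start is not None:
        delta.append((run_start, modified[run_start:]))
    return delta


def apply_delta(source: bytes, delta: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the modified file from the source file and the delta."""
    out = bytearray(source)
    for offset, chunk in delta:
        out[offset:offset + len(chunk)] = chunk
    return bytes(out)


source = b"AAAAA\nBBBBB\nCCCCC\n"
modified = b"AAxAA\nyBBBB\nCCCzz\n"  # middle of row 1, start of row 2, end of row 3
delta = make_delta(source, modified)
print(delta)  # [(2, b'x'), (6, b'y'), (15, b'zz')]
assert apply_delta(source, delta) == modified
```

Here the delta holds 4 changed bytes plus their offsets, while the 14 unchanged bytes are never copied.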
Figure 3 is similar to the first Delta comparison: both comparisons are made against the source file, not against the previous version. The main reason is that the first comparison produces relatively stable hash and byte sequences for the source file, which do not need to be regenerated for the next comparison, improving the efficiency of both comparisons. Like the first, the second comparison also produces a Delta increment file.
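One way to picture the reuse of the source file's hash sequence is the sketch below. The class `SourceBaseline`, its method `delta_against`, the line-level granularity and the use of SHA-256 are all assumptions for illustration; the sketch also assumes that no lines are removed.

```python
import hashlib


class SourceBaseline:
    """Holds the source file's line hashes so they are computed only once."""

    def __init__(self, data: bytes):
        self.lines = data.splitlines(keepends=True)
        # Hash sequence of the source file, reused by every later comparison.
        self.hashes = [hashlib.sha256(line).digest() for line in self.lines]

    def delta_against(self, modified: bytes) -> dict[int, bytes]:
        """Map line index -> new content for lines that differ from the source."""
        delta = {}
        for i, line in enumerate(modified.splitlines(keepends=True)):
            if i >= len(self.hashes) or hashlib.sha256(line).digest() != self.hashes[i]:
                delta[i] = line
        return delta


baseline = SourceBaseline(b"a\nb\nc\n")
first = baseline.delta_against(b"a\nB\nc\n")   # first modification
second = baseline.delta_against(b"a\nB\nC\n")  # second modification, same baseline
print(first)   # {1: b'B\n'}
print(second)  # {1: b'B\n', 2: b'C\n'}
```

Because both deltas are relative to the same source file, the second one also carries the first change. This is the growth that merging is meant to limit.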
Figure 4 shows a source file being merged with a Delta increment file; the merged file becomes the new source file. Subsequent modifications are all compared against this new source file rather than against the original source file.
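The effect of merging can be sketched as follows, again at line granularity. The helpers `line_delta` and `merge` are hypothetical names and assume that lines are only changed or appended, never removed.

```python
def line_delta(base: list[bytes], new: list[bytes]) -> dict[int, bytes]:
    """Map line index -> new content for lines that differ from the base."""
    return {i: line for i, line in enumerate(new) if i >= len(base) or base[i] != line}


def merge(source_lines: list[bytes], delta: dict[int, bytes]) -> list[bytes]:
    """Fold a delta into the source to produce the new baseline."""
    merged = list(source_lines)
    for index, line in sorted(delta.items()):
        if index < len(merged):
            merged[index] = line   # changed line
        else:
            merged.append(line)    # appended line
    return merged


v1 = [b"a\n", b"b\n", b"c\n"]
v2 = [b"a\n", b"B\n", b"c\n"]
v3 = [b"a\n", b"B\n", b"C\n"]

new_source = merge(v1, line_delta(v1, v2))  # v2 becomes the new baseline
print(line_delta(v1, v3))          # {1: b'B\n', 2: b'C\n'}  against the original source
print(line_delta(new_source, v3))  # {2: b'C\n'}             against the merged source
```

After the merge, the next delta contains only the latest change instead of every change since the original source file.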
This completes the whole Delta-based file data backup process. The technique differs from traditional incremental backup in that it takes into account the Delta increment changes within a file, rather than only whether the file has changed.
Applied in a data backup system, this technique therefore increases disk backup utilization and saves backup space, helping to meet the challenge of rapidly growing data.

Claims (4)

1. A file data backup method based on Delta increments, characterized in that "Delta file increment" technology is used to reduce the amount of data stored, freeing more backup space on disk, so that backup data can be kept on disk for longer and the large amount of bandwidth required for offline storage is saved; the system architecture comprises a Delta Sequence module, a Delta reading module, a Delta merging module, a Delta comparison module and a Delta Processor module, whose functions and file backup steps are as follows:
Delta Sequence module: determining whether two files are identical requires comparing them by file, by byte and by hash; this module therefore orders the file sequence, byte sequence and hash sequence and provides a comparison filter, so that two sequences can be checked quickly for equality by traversing them;
Delta reading module: after a file has been modified, the parts that differ from the original are extracted by Delta comparison and stored separately as a Delta increment file; that is, each comparison yields the difference between the latest file and the source file;
Delta merging module: repeated comparisons produce many Delta versions of a file, and as the number of modifications grows, the Delta increment between the new file and the source file also grows; the Delta merging module then merges the most recent Delta increment data into the source file and takes the newly merged file as the source file; the next time the file is modified, Delta comparison is performed against this latest file, which keeps the Delta increment file from growing without bound;
Delta comparison module: reads the files line by line or byte by byte and compares them to find the parts that differ; the changed parts are extracted separately and their positions are recorded in an index, forming a Delta increment file; combining this increment file with the source file reconstructs the modified file;
Delta Processor module: provides a multitask Delta data comparison mechanism, so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency.
2. The method according to claim 1, characterized in that, in the architecture, a hash that uniquely identifies the line is generated for each line of each file, and the differing part of that line's byte stream is compared, to ensure that the differences between two different files are obtained correctly.
3. The method according to claim 1, characterized in that, in the architecture, in the Delta merging step, each Delta increment file is merged with the previous source file to form the file as modified at that point; that is, each Delta increment file corresponds to one file version.
4. The Delta Sequence module according to claim 1, characterized in that, in the architecture, a file sequence, a hash sequence and a byte sequence are defined for each file.
CN200910017342A 2009-07-27 2009-07-27 File data backup method based on Delta increment Pending CN101650677A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910017342A CN101650677A (en) 2009-07-27 2009-07-27 File data backup method based on Delta increment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910017342A CN101650677A (en) 2009-07-27 2009-07-27 File data backup method based on Delta increment

Publications (1)

Publication Number Publication Date
CN101650677A true CN101650677A (en) 2010-02-17

Family

ID=41672916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910017342A Pending CN101650677A (en) 2009-07-27 2009-07-27 File data backup method based on Delta increment

Country Status (1)

Country Link
CN (1) CN101650677A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101848274A (en) * 2010-03-12 2010-09-29 深圳市同洲电子股份有限公司 Methods and devices for backup and recovery of records in mobile terminal
CN102737098B (en) * 2011-03-29 2017-11-10 日本电气株式会社 Distributed file system
CN102737098A (en) * 2011-03-29 2012-10-17 日本电气株式会社 Distributed file system
CN103544075A (en) * 2011-12-31 2014-01-29 华为数字技术(成都)有限公司 Data processing method and system
CN103309847A (en) * 2012-03-06 2013-09-18 百度在线网络技术(北京)有限公司 Method and equipment for realizing file comparison
CN103312743A (en) * 2012-03-09 2013-09-18 盛乐信息技术(上海)有限公司 Data synchronization device and method
CN103377208A (en) * 2012-04-19 2013-10-30 北京智慧风云科技有限公司 Method for updating files in cloud service file management system
CN103379150A (en) * 2012-04-19 2013-10-30 北京智慧风云科技有限公司 Cloud service file management system
CN102682127B (en) * 2012-05-16 2014-12-03 北京像素软件科技股份有限公司 Data version control method
CN102682127A (en) * 2012-05-16 2012-09-19 北京像素软件科技股份有限公司 Data version control method
CN103793182A (en) * 2012-09-04 2014-05-14 Lsi公司 Scalable storage protection
US10191676B2 (en) 2012-09-04 2019-01-29 Seagate Technology Llc Scalable storage protection
US9613656B2 (en) 2012-09-04 2017-04-04 Seagate Technology Llc Scalable storage protection
US10664548B2 (en) 2013-07-12 2020-05-26 Trading Technologies International, Inc. Tailored messaging
CN105474250A (en) * 2013-07-12 2016-04-06 贸易技术国际公司 Tailored messaging
CN104794143A (en) * 2014-07-30 2015-07-22 北京中科同向信息技术有限公司 Agent-free backup technology
CN105404562A (en) * 2014-08-18 2016-03-16 北京云巢动脉科技有限公司 Method and system for realizing efficient backup of mirror file of operating system
US11119863B2 (en) 2015-09-25 2021-09-14 Huawei Technologies Co., Ltd. Data backup method and data processing system
US11132260B2 (en) 2015-09-25 2021-09-28 Huawei Technologies Co., Ltd. Data processing method and apparatus
CN105516349A (en) * 2016-01-04 2016-04-20 陈华锋 File transmission method and system
CN107111534A (en) * 2016-06-28 2017-08-29 华为技术有限公司 A kind of method and apparatus of data processing
CN109597828A (en) * 2018-09-29 2019-04-09 阿里巴巴集团控股有限公司 A kind of off-line data checking method, device and server

Similar Documents

Publication Publication Date Title
CN101650677A (en) File data backup method based on Delta increment
US11575746B2 (en) System and method for real-time cloud data synchronization using a database binary log
CN101604268A (en) A filtering method for monitoring directory change events
TWI476608B (en) A distributed computing data merging method, system and device thereof
CN102508880B (en) Method for joining files and method for splitting files
EP2052337B1 (en) Retro-fitting synthetic full copies of data
Aly et al. M3: Stream processing on main-memory mapreduce
WO2017080431A1 (en) Log analysis-based database replication method and device
US20130198148A1 (en) Estimating data reduction in storage systems
CN102339321A (en) Network file system with version control and method using same
Waas Beyond Conventional Data Warehousing—Massively Parallel Data Processing with Greenplum Database: (Invited Talk)
CN105955843A (en) Method and device used for database recovery
CN111611440B (en) Method for rapidly improving OFD signature, signature and verification
CN112416654A (en) Database log replay method, device, equipment and storage medium
CN1523523A (en) System and method of distributing replication commands
CN112037003A (en) File account checking processing method and device
CN106528344A (en) Log management method for storage system
CN108121793A (en) A kind of DB Backup dispositions method and device
US8818943B1 (en) Mirror resynchronization of fixed page length tables for better repair time to high availability in databases
CN104461931B (en) The trace log output processing method of multinuclear storage device and multi-core environment
US20120254105A1 (en) Synchronizing Records Between Databases
CN104142943A (en) Database expansion method and database
WO2018019310A1 (en) Big data system data backup and recovery methods and devices, and computer storage medium
CN104133876A (en) Affair-based incremental management cluster configuration file method
US10514988B2 (en) Method and system of migrating applications to a cloud-computing environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20100217