CN100547555C - A Fingerprint-Based Data Backup System
- Publication number: CN100547555C (application CN200710168715A)
- Authority: CN (China)
- Prior art keywords: backup, job, file, server, data
- Legal status: Expired - Fee Related (as listed by Google Patents; not a legal conclusion)
- Landscapes: Information Retrieval, DB Structures and FS Structures Therefor (AREA)
Abstract
A fingerprint-based data backup system in the technical field of computer storage backup. It aims to reduce the management, storage and network overhead of data backup and to improve backup performance. The invention comprises a backup server, backup agents, a storage server and a Web server, which communicate with one another over a network to perform data backup and recovery. The invention uses anchor-based file chunking to identify redundant data in backup files. The chunking is stable under modification and has low computational overhead. Data blocks are stored on the storage server's disk array indexed by their fingerprints, which eliminates backups of redundant data and saves disk space. Once stored, data blocks are never erased and can be appended contiguously to the disk, which eliminates disk fragmentation. An effective backup buffering strategy reduces the network overhead of backup, increases backup speed, and reduces the impact of backup on application servers.
Description
Technical Field
The invention belongs to the field of computer storage backup and relates in particular to a data backup system.
Background Art
In today's information age, data is a precious resource for enterprises and individuals alike. At best, data loss disrupts an enterprise's business continuity and costs it a temporary competitive advantage. At worst, it can drive the enterprise into bankruptcy. Data is lost for many reasons, including system software and hardware failures, human error or sabotage, and force majeure (natural disasters, war). To protect data against such events, the traditional method is to copy it periodically to removable media such as tapes or optical discs. The media are then transported offline to a relatively safe place so that the data can be restored when necessary.

This traditional method has some obvious drawbacks. (1) Removable media such as tapes and optical discs wear out or become damaged over time. Their storage reliability drops, which makes them unsuitable for long-term data storage. (2) Tape is the usual medium for backing up large volumes of data, and it is often slow to read and write. Because it is a sequential-access device, restoring data usually involves frequent mechanical rewinding. If the backup spans several tapes, time-consuming loading and unloading is also required. Backup and recovery on tape is therefore very time-consuming. (3) Dedicated staff must be hired to transport backup data to a remote site and to keep the data secure during transport and storage. Traditional backup thus requires manual intervention for many tasks and is costly and tedious.

To make backup and recovery more efficient and to overcome these drawbacks, well-known IT companies and research institutions have developed a variety of data backup systems over the past two decades. These include IBM's TotalStorage; HP's OpenView storage mirroring software, CASA, XPCA and EVACA; EMC's SRDF and MirrorView; and VERITAS NetBackup. These commercial systems have no deduplication. To store the large amount of redundant data that backups generate, they often rely on disk-to-tape (D2T) technology. High-speed disks serve as a backup buffer to speed up online backup. The buffered backup data is then migrated in the background to slow, high-capacity media such as tape libraries or optical disc libraries. The back-end storage devices therefore still demand considerable manpower and resources for routine maintenance.

Disk storage is easier to manage and faster to access than tape. As disk technology advances, backup systems that store data on disk are therefore receiving growing attention. Current disk technology makes it easy to build disk storage systems of terabyte or even petabyte scale. The ever-falling price per bit of disk storage also makes permanent archiving of data on disk practical. For a disk-based backup system, keeping backup data on disk permanently, without ever erasing it, has several advantages. First, data can be written to the disk contiguously, with no fragmentation caused by space reclamation. Second, the user's data history is preserved in full, so any historical version of a file can be browsed easily. Third, the user's backup data is protected against accidental deletion of important data.

The biggest challenge for a permanent-storage, disk-based backup system, however, is the ever-growing volume of backup data. Enterprise data is usually highly redundant. Many duplicate data and files are stored in the system, and multiple edited versions of a file share much of their content. Widely used file-based backup techniques cannot identify redundant data across files, so more and more duplicate data is backed up. This lowers disk space utilization in the backup system. It also sends large amounts of redundant data across the network needlessly, which increases the network overhead of backup and prolongs backup time.
It is therefore worthwhile to develop a permanent-storage, disk-based backup system that uses new backup techniques to eliminate redundant backup data and improve storage efficiency.
Summary of the Invention
The invention provides a fingerprint-based data backup system. The system stores backup data permanently on disk and uses fingerprint-based backup techniques to remove redundant data from backups. The aim is to reduce the management, storage and network overhead of data backup and to improve backup performance.
The fingerprint-based data backup system of the invention comprises a backup server, backup agents, a storage server and a Web server, which communicate with one another over a network to perform data backup and recovery. It is characterized in that:
The backup server holds a configuration file and a directory database. The configuration file records user-defined job objects. A job object contains attributes that specify how the system runs the job, and the backup server controls the entire backup and recovery process through job objects. The directory database stores job records, which hold the management information for each run of a job object;
A backup agent is installed on every host in the network whose data needs to be backed up. During backup, the agent reads the files to be backed up from the host's file system. It splits each file into blocks using anchor-based chunking and computes a fingerprint for each block. It then sends the fingerprints, together with only those block data that are actually needed, to the storage server over the network. During recovery, the agent receives file data from the storage server over the network and writes it to the specified directory in the host's file system. The backup agent performs anchor-based chunking as follows:
(1) Take the first 48 bytes of the file, $b_1, b_2, \ldots, b_{48}$, as a window. Compute the hash of this first window as $H_1 = (b_1 p^{47} + b_2 p^{46} + \cdots + b_{48}) \bmod M$, where $p = 17$ and $M = 2^{32}$, and store it in the variable $H_1$;
(2) Slide the window forward by one byte. Compute the hash of the second window $b_2, b_3, \ldots, b_{49}$ as $H_2 = (p H_1 + b_{49} - b_1 p^{48}) \bmod M$ and store it in the variable $H_2$;
(3) Continue in the same way to compute the hashes of all windows of the file; in general, $H_{k+1} = (p H_k + b_{k+48} - b_k p^{48}) \bmod M$;
(4) For each window hash, take its low 13 bits as a binary number. If this number equals 61, the corresponding window is an anchor. Using the anchors as boundaries, divide the file into data blocks of varying size;
The storage server is equipped with a high-capacity disk array, which is the destination of data backup. During backup, it receives fingerprints or data blocks from the corresponding backup agent over the network, stores the data blocks on disk, and builds an index for each file. During recovery, it reconstructs files from the disk array according to their file indexes and sends the file data to the corresponding backup agent over the network;
The Web server provides the system's browser/server (B/S) web management interface. After logging in to the Web server, users can have the system run interactive backup or restore jobs and can monitor automatically scheduled jobs. They can also modify the backup server's configuration file, customize job objects, and manage devices.
In the fingerprint-based data backup system described above, the backup server comprises a backup server initialization module, a command listening module, a command processing module, a job processing module and a network communication module;
The backup server initialization module performs initialization. It reads the configuration file, builds resource linked lists in memory, and checks the state of the directory database. It ensures that the configuration file and the directory database are consistent and complete. It opens the command listening port to accept user commands from the Web server, initializes the job queue and the user command queue, loads job objects into the job queue, and starts the job and network monitoring services;
The command listening module is a network listening thread created by the system. It authenticates connection requests from Web servers so that only Web servers authorized by the system can connect. It then listens for command requests sent by authenticated Web servers. When a command request arrives, it is added to the user command queue to await processing;
The command processing module comprises a user command queue and N command worker threads. When the user command queue is full, the command listening module goes to sleep. The command worker threads continuously take commands from the user command queue and execute them, performing different functions depending on the command. Suppose the command listening module adds a command to the queue while no command worker thread is idle. If fewer than N command worker threads are active, a new command worker thread is created. Each time a command worker thread takes a command from the queue, it checks the state of the command listening module and wakes it if it is asleep;
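The sleep/wake handshake between the listener and the worker threads maps naturally onto a bounded blocking queue. The Python sketch below is a hypothetical illustration: the class `CommandPool`, the `capacity` parameter and the `None` shutdown sentinel do not come from the source. `queue.Queue.put` blocks while the queue is full, which stands in for the listener going to sleep. Each `get` by a worker frees a slot and wakes the listener. Workers are created lazily, only when none is idle and fewer than N exist.

```python
import queue
import threading


class CommandPool:
    """Bounded command queue served by at most `max_workers` lazily created threads.

    Hypothetical sketch: put() blocks while the queue is full (the listener
    "sleeps"); every get() by a worker frees a slot and wakes a blocked put().
    """

    def __init__(self, handler, max_workers: int, capacity: int):
        self._handler = handler              # executes one command
        self._max_workers = max_workers      # N in the text
        self._queue = queue.Queue(maxsize=capacity)
        self._lock = threading.Lock()
        self._workers = []
        self._idle = 0                       # workers currently waiting on get()

    def submit(self, command):
        """Called by the listener thread for each incoming command."""
        with self._lock:
            # Create a worker only if none is idle and fewer than N exist.
            if self._idle == 0 and len(self._workers) < self._max_workers:
                worker = threading.Thread(target=self._run, daemon=True)
                self._workers.append(worker)
                worker.start()
        self._queue.put(command)             # blocks (listener sleeps) when full

    def _run(self):
        while True:
            with self._lock:
                self._idle += 1
            command = self._queue.get()      # frees a slot, waking the listener
            with self._lock:
                self._idle -= 1
            try:
                if command is None:          # shutdown sentinel
                    return
                self._handler(command)
            finally:
                self._queue.task_done()

    def join(self):
        """Wait until every submitted command has been executed."""
        self._queue.join()

    def shutdown(self):
        """Send one sentinel per worker so that every worker exits."""
        for _ in self._workers:
            self._queue.put(None)

    @property
    def worker_count(self) -> int:
        return len(self._workers)
```

The standard library queue already handles the wake-up on `get`. This avoids a hand-rolled sleep flag that a worker would have to inspect and clear.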
The job processing module comprises a job queue, L job worker threads and a job queue loading thread. When the job queue is full, the job queue loading thread goes to sleep. The job worker threads continuously take job objects from the job queue and execute them, invoking different resources and performing different functions according to the job object's attributes. The job queue loading thread schedules jobs. It checks the scheduling policy attribute of every job object in the job resource chain and adds the job objects that are due to run to the job queue. If no job worker thread is idle and fewer than L job worker threads are active, a new job worker thread is created. Each time a job worker thread takes a job object from the queue, it checks the state of the job queue loading thread and wakes it if it is asleep;
The network communication module wraps the standard network programming interface and provides a network communication interface to the command worker threads and job worker threads. This interface implements the data transfer protocol among the backup server, the backup agents and the storage server.
In the fingerprint-based data backup system described above, the backup agent comprises a backup agent initialization module, a request listening module, a job processing module, a file chunking module and a network communication module;
The backup agent initialization module performs initialization. It reads the backup agent configuration file, builds in-memory resource linked lists, initializes the job queue, and starts the module that listens for backup server requests;
The request listening module listens for connection requests from backup servers on the network and authenticates each connecting backup server. Once authentication succeeds, it creates a network socket for communicating with that backup server and adds the socket to the job queue;
The job processing module comprises a job queue and M job worker threads. When the job queue is full, the request listening module goes to sleep. When a job worker thread takes a network socket from the job queue, it first creates a job control record for the job and links the socket into a member variable of that record. It then interacts with the backup server through this socket. It converts the relevant attributes of the backup server's job object and assigns them to the corresponding member variables of the job control record. Next, it uses the job ticket obtained from the backup server to connect to the corresponding storage server. This creates a network socket for communicating with the storage server, which is linked into a member variable of the job control record. Suppose the request listening module adds a socket to the job queue while no job worker thread is idle. If fewer than M job worker threads are active, a new job worker thread is created. Each time a job worker thread takes a socket from the job queue, it checks the state of the request listening module and wakes it if it is asleep;
The file chunking module carries out the file chunking tasks of backup jobs on command from the job worker threads of the job processing module. It opens each file in the file set on the client file system, splits the file into blocks using anchor-based chunking, and computes the block fingerprints. It then runs the backup algorithm of the first backup process in coordination with the corresponding storage server;
The network communication module consists of the jobs' network sockets. Each backup agent job owns two sockets: one for communicating with the corresponding backup server job and one for communicating with the corresponding storage server job.
In the fingerprint-based data backup system described above, the storage server comprises a storage server initialization module, a connection monitoring module, a job ticket table, a job processing module and a network communication module. It also comprises an index buffer, a block buffer, a block hash table and a disk log;
The storage server initialization module performs initialization. It parses the storage server configuration file, builds in-memory resource linked lists, and starts the related service threads;
The connection monitoring module monitors connection requests from backup servers and backup agents. A connecting backup server is authenticated first. Once authentication succeeds, a network socket is created for communicating with it and added to the job queue. A connecting backup agent is authenticated by checking the job ticket it presents against the job ticket table. Once authentication succeeds, a network socket is created for communicating with the agent and linked into a member variable of the corresponding job control record;
The job ticket table stores the tickets used to authenticate backup agent jobs;
The job processing module comprises a job queue and W job worker threads. When the job queue is full, the connection monitoring module switches to a "reject backup server connection requests" state. When a job worker thread takes a network socket from the job queue, it first creates a job control record for the job and links the socket into a member variable of that record. It then interacts with the backup server through this socket. It converts the relevant attributes of the backup server's job object and assigns them to the corresponding member variables of the job control record. Finally, it generates a random job ticket, registers the ticket in the job ticket table, and sends it to the backup server's job object. Suppose the connection monitoring module adds a socket to the job queue while no job worker thread is idle. If fewer than W job worker threads are active, a new job worker thread is created. Each time a job worker thread takes a socket from the job queue, it checks the state of the connection monitoring module. If the module is in the "reject backup server connection requests" state, the worker clears that state so that the module accepts backup server connection requests again;
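The job ticket handshake works in three steps. The storage server issues a random ticket, the backup server passes it to the backup agent, and the agent presents it when connecting to the storage server. This can be sketched with Python's `secrets` module. The `JobTicketTable` class and its method names are assumptions made for illustration. So are the 128-bit ticket length and the `revoke` step at job end. The source says only that the ticket is randomly generated and registered in the job ticket table.

```python
import secrets
import threading


class JobTicketTable:
    """Hypothetical job ticket table on the storage server."""

    def __init__(self):
        self._lock = threading.Lock()        # shared by the job worker threads
        self._tickets = {}                   # ticket -> job control record

    def issue(self, job_record) -> str:
        """Register a fresh random ticket for a job and return it."""
        ticket = secrets.token_hex(16)       # 128 random bits as 32 hex chars
        with self._lock:
            self._tickets[ticket] = job_record
        return ticket

    def authenticate(self, ticket: str):
        """Return the job record for a presented ticket, or None if unknown."""
        with self._lock:
            return self._tickets.get(ticket)

    def revoke(self, ticket: str) -> None:
        """Drop a ticket, e.g. when its job finishes (assumed policy)."""
        with self._lock:
            self._tickets.pop(ticket, None)
```

A successful `authenticate` hands back the job control record. The agent's new socket can then be linked into that record, as the connection monitoring module requires.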
The network communication module consists of the jobs' network sockets. Each storage server job owns two sockets: one for communicating with the corresponding backup server job and one for communicating with the corresponding backup agent job;
The index buffer is infrastructure that storage server jobs use to run the first and second backup processes. It is implemented as an in-memory hash table. It stores all fingerprints contained in $Job_x(t_{n-1})$, the instance that precedes the current job instance $Job_x(t_n)$ in its job chain. It also stores the fingerprints newly generated while the current job runs;
The block buffer is infrastructure that storage server jobs use to run the first and second backup processes. It is implemented as a separate disk array. It temporarily stores the data blocks whose fingerprints were not found in the index buffer during the first backup process;
The block hash table is infrastructure that storage server jobs use to run the second backup process. It is implemented on a separate disk array and maps each block fingerprint to that block's storage address in the disk log;
The disk log is infrastructure that storage server jobs use to run the second backup process. It is implemented as a separate disk array that stores data blocks, together with file indexes that are themselves stored as blocks.
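Below is a minimal in-memory stand-in for the disk log and the block hash table. Several simplifications are assumptions rather than the source's design. SHA-1 serves as the fingerprint function. A Python `bytearray` plays the append-only disk log, and a dict plays the block hash table. Each file gets a one-level index (a single index block listing data-block fingerprints) rather than a full index tree. As in the source, index blocks and root blocks are stored in the log as ordinary blocks, so identical blocks are shared across files and versions.

```python
import hashlib
import json


class DiskLog:
    """In-memory stand-in for the disk log plus the block hash table."""

    def __init__(self):
        self.log = bytearray()       # append-only "disk log"
        self.block_table = {}        # fingerprint -> (offset, length) in log

    def put(self, block: bytes) -> str:
        """Store a block once, keyed by its fingerprint; return the fingerprint."""
        fp = hashlib.sha1(block).hexdigest()    # fingerprint function assumed
        if fp not in self.block_table:           # duplicates are not stored again
            self.block_table[fp] = (len(self.log), len(block))
            self.log.extend(block)               # append only; nothing is overwritten
        return fp

    def get(self, fp: str) -> bytes:
        offset, length = self.block_table[fp]
        return bytes(self.log[offset:offset + length])

    def store_file(self, name: str, blocks) -> str:
        """Store data blocks, one index block and a root block; return the root fp."""
        index_block = json.dumps([self.put(b) for b in blocks]).encode()
        root_block = json.dumps({"name": name, "index": self.put(index_block)}).encode()
        return self.put(root_block)

    def restore_file(self, root_fp: str) -> bytes:
        """Rebuild a file from its root block via its index block."""
        root = json.loads(self.get(root_fp))
        fingerprints = json.loads(self.get(root["index"]))
        return b"".join(self.get(fp) for fp in fingerprints)
```

Storing a second version of a file that differs in one block appends only three things: the changed block, a new index block and a new root block. Everything else is shared, and the earlier version remains restorable because nothing is overwritten.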
The advantages of the invention are:
1. Anchor-based file chunking divides files into variable-size blocks to identify redundant data within and across files. It is stable under modification: a change to a file affects only the blocks adjacent to the modified region, and the boundaries of all other blocks stay in place. When a file is backed up incrementally, only the few modified blocks need to be backed up, and the remaining blocks can be shared with earlier backups. Because the hash is computed with a sliding window, the computational overhead is low.
2. Data blocks are stored on the storage server's disk array indexed by their fingerprints. This ties each block's storage address to its content, unlike the traditional approach, which keeps address and content separate. It eliminates backups of redundant data and saves disk space;
3. Once stored, data blocks are never erased and can be appended contiguously to the disk, which eliminates disk fragmentation. The user's data history is preserved in full, so any historical version of a file can be browsed easily, and important data cannot be deleted through user error.
4. An effective backup buffering strategy reduces the network overhead of backup, increases backup speed, and reduces the impact of backup on application servers.
Description of Drawings
Fig. 1 is a schematic diagram of the structure of the invention;
Fig. 2 is a schematic diagram of the backup server structure;
Fig. 3 is a schematic diagram of the backup agent structure;
Fig. 4 is a schematic diagram of the storage server structure;
Fig. 5 is a schematic diagram of how a file is stored in the disk log;
Fig. 6 is a schematic diagram of multiple files sharing data blocks/index blocks in the disk log;
Fig. 7 shows the structure of the index buffer of the invention;
Fig. 8 is a schematic diagram of file chunking with the anchor-based chunking technique.
Detailed Description of the Embodiments
The invention is described in further detail below with reference to the drawings and embodiments.
1. Overall System Structure
Fig. 1 is a schematic diagram of the system architecture. The invention comprises a backup server, backup agents, a storage server and a Web server, which communicate with one another over a network to perform data backup and recovery.
Fig. 2 is a schematic diagram of the backup server structure. The backup server comprises a backup server initialization module, a command listening module, a command processing module, a job processing module and a network communication module. It also holds the configuration file and the directory database.
The backup server is the command center of the entire network backup system; it controls the whole backup and recovery process through job objects. Job objects give users a way to customize backup and restore jobs. A job object has many attributes that specify how the system runs the job. For example, the backup agent attribute specifies the host from which the job backs up or restores data. The file set attribute specifies the directories to back up or restore, and the scheduling policy attribute specifies the policy by which the system schedules the job. Denote a job object by $Job_x$. When it is scheduled to run at time $t$, it produces a run instance $Job_x(t)$. The chronological sequence of run instances $Job_x(t_0), Job_x(t_1), \ldots, Job_x(t_n)$, with $t_0 < t_1 < \cdots < t_n$, forms the job chain of this job object, denoted $Job_x(t_0, t_1, \ldots, t_n)$. The backup server also maintains a directory database that records the management information of each $Job_x(t)$. Specifically, this information is stored in the job's record $Job_x(t).Record$ in the directory database.
Directory database: stores the management information of job runs, i.e. $Job_x(t).Record$. Each $Job_x(t).Record$ mainly stores the root blocks of the files included in the job, the job's fingerprint file $Job_x(t).FF$, and so on. Every completed job $Job_x(t)$ keeps its fingerprint file $Job_x(t).FF$ in the directory database, and this file stores all fingerprints contained in $Job_x(t)$. $Job_x(t_n).FF$ is used to initialize the index buffer of the job $Job_x(t_{n+1})$.
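A few Python dataclasses can model how a job object, its chain of run instances and the fingerprint files in the directory database relate to one another. The names `JobRecord`, `DirectoryDB` and `seed_index_buffer` are hypothetical. The record keeps only the two fields this paragraph names: the root blocks and the fingerprint file.

```python
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    """Job_x(t).Record: only the fields named in the text (others omitted)."""
    t: float                       # time at which this run instance was scheduled
    root_blocks: list              # root-block fingerprints of the job's files
    ff: frozenset                  # Job_x(t).FF: all fingerprints in this run


@dataclass
class DirectoryDB:
    """Directory database: job object name -> job chain of completed records."""
    chains: dict = field(default_factory=dict)

    def record(self, job: str, rec: JobRecord) -> None:
        chain = self.chains.setdefault(job, [])
        if chain and rec.t <= chain[-1].t:     # chains are ordered t0 < t1 < ...
            raise ValueError("run instances must be recorded in time order")
        chain.append(rec)

    def seed_index_buffer(self, job: str) -> set:
        """The index buffer for Job_x(t_{n+1}) starts as Job_x(t_n).FF."""
        chain = self.chains.get(job)
        return set(chain[-1].ff) if chain else set()
```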
Fig. 3 is a schematic diagram of the backup agent structure. The backup agent comprises a backup agent initialization module, a request listening module, a job processing module, a file chunking module and a network communication module.
Fig. 4 is a schematic diagram of the storage server structure. The storage server comprises a storage server initialization module, a connection monitoring module, a job ticket table, a job processing module and a network communication module. It also comprises an index buffer, a block buffer, a block hash table and a disk log.
The storage server manages a large-capacity disk array (RAID) that stores data chunks, each indexed by its fingerprint. Once a chunk has been written to disk it is never erased, so the whole disk array behaves like a log: chunks are appended contiguously, with no gaps, which eliminates disk-storage fragmentation. The disks that store chunks are therefore called the disk log. The storage server uses a separate, dedicated disk array to hold the chunk hash table, which maps each chunk fingerprint to the chunk's storage address in the disk log. All data blocks of a backed-up file are indexed by index blocks, and all index blocks of a file form an index tree. In addition, every file has exactly one chunk called its root block, which stores the index entry for the root of the file's index tree together with the file's metadata and some management information. A file's root block and index blocks are themselves stored as chunks in the disk log.
The storage server uses a backup buffering strategy to speed up data backup:
(1) An in-memory index buffer holds all fingerprints of Job_x(t_{n-1}), the instance that precedes the current instance Job_x(t_n) in the job chain, plus the fingerprints newly generated while the current job runs.
(2) A dedicated disk array serves as a chunk buffer that temporarily stores the chunks whose fingerprints were not found in the index buffer during backup.
(3) Each job is backed up in two phases, called the first backup process and the second backup process. In the first backup process, the backup agent and the storage server interact to back up the file chunks: fingerprints are looked up in the index buffer, and chunks whose fingerprints are not found there are stored in the chunk buffer. From the backup agent's point of view, the job's backup is finished once the first backup process completes. Because this phase queries only the in-memory index buffer and avoids time-consuming chunk hash table lookups, it is fast. The second backup process is run by the storage server when the system is relatively idle. It moves the chunks temporarily held in the chunk buffer to the disk log, looking up fingerprints in the chunk hash table, and at the same time builds each file's index tree on the disk log. Because the second backup process runs in the background on the storage server alone, it does not affect the application server that runs the backup agent.
When a file is restored, the storage server reconstructs it from its file index and sends the file data over the network to the corresponding backup agent.
Web server: the invention uses a browser/server (B/S) architecture to provide a web user interface. From anywhere, users can log in to the system's management interface with a web browser to run interactive backup or restore jobs, monitor the automatically scheduled jobs, define jobs, configure the backup server, and manage devices.
2. Storage server disk log
In the invention, backed-up chunks are stored in the storage server's disk log, each indexed by its fingerprint. This guarantees that no two identical chunks are stored on disk at the same time, which eliminates redundant backup data. Once stored, a chunk is never erased, so chunks can be appended contiguously to the disk log, which eliminates disk-storage fragmentation. The data blocks belonging to a backed-up file are indexed by index blocks, which are also stored in the disk log.
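The append-only, fingerprint-addressed layout can be illustrated with a small in-memory model. This is a sketch only: the class name DiskLog is hypothetical, SHA-1 is assumed as the fingerprint function (the text fixes fingerprints at 20 bytes but does not name the algorithm), and a Python dict stands in for the chunk hash table.

```python
import hashlib


class DiskLog:
    """Hypothetical in-memory model of the append-only disk log.

    Chunks are addressed by a 20-byte fingerprint (SHA-1 is an assumption).
    A chunk is written at most once and is never erased or overwritten.
    """

    def __init__(self) -> None:
        self._log = bytearray()      # the append-only "disk"
        self._index = {}             # fingerprint -> offset (stands in for the chunk hash table)

    def store(self, chunk: bytes) -> tuple:
        """Store chunk if it is new; return (fingerprint, offset, newly_written)."""
        fp = hashlib.sha1(chunk).digest()
        if fp in self._index:        # duplicate: reuse the copy already on disk
            return fp, self._index[fp], False
        offset = len(self._log)      # next free position at the tail of the log
        self._log += chunk           # append only, never overwrite
        self._index[fp] = offset
        return fp, offset, True

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self._log[offset:offset + size])

    @property
    def size(self) -> int:
        return len(self._log)


log = DiskLog()
log.store(b"hello")
log.store(b"world")
log.store(b"hello")                  # duplicate: not written a second time
print(log.size)                      # 10
```

Because nothing is ever overwritten, the only free space is at the tail, so every write is a sequential append.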
2.1 Chunk header
To simplify management, each chunk is preceded by a header. The header provides the information needed for system management, including integrity checking, file indexing, and reconstruction of the chunk hash table. The header is 39 bytes long and consists of the following fields:
magic: a 6-character header marker;
fingerprint: the fingerprint of this chunk, 20 bytes;
type: the chunk type. There are three types of chunk, namely data blocks, index blocks, and file root blocks, denoted dc, ic, and rc respectively;
size: the size of this chunk, excluding the header. Index blocks may not exceed 16 KB;
offset: the storage address of this chunk in the disk log.
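The header can be sketched as a pack/unpack pair. The text fixes the total length (39 bytes) and the sizes of magic (6 bytes) and fingerprint (20 bytes), but not the other fields, so the split used here (2-byte type, 3-byte size, 8-byte offset, big-endian) and the magic value FPBLK1 are assumptions chosen only because they add up to 39 bytes; the 8-byte offset matches the 64-bit offset used in the index buffer. SHA-1 is again assumed for the integrity check.

```python
import hashlib

MAGIC = b"FPBLK1"                      # hypothetical 6-byte marker value
CHUNK_TYPES = (b"dc", b"ic", b"rc")    # data block, index block, root block
HEADER_SIZE = 39
MAX_INDEX_CHUNK = 16 * 1024            # index blocks may not exceed 16 KB


def pack_header(fingerprint: bytes, ctype: bytes, size: int, offset: int) -> bytes:
    """Assumed layout: magic(6) | fingerprint(20) | type(2) | size(3) | offset(8)."""
    if len(fingerprint) != 20:
        raise ValueError("fingerprint must be 20 bytes")
    if ctype not in CHUNK_TYPES:
        raise ValueError("unknown chunk type")
    if ctype == b"ic" and size > MAX_INDEX_CHUNK:
        raise ValueError("index block larger than 16 KB")
    return (MAGIC + fingerprint + ctype
            + size.to_bytes(3, "big") + offset.to_bytes(8, "big"))


def unpack_header(raw: bytes) -> tuple:
    """Return (fingerprint, type, size, offset); reject data without the marker."""
    if len(raw) < HEADER_SIZE or raw[:6] != MAGIC:
        raise ValueError("not a chunk header")
    return (raw[6:26], raw[26:28],
            int.from_bytes(raw[28:31], "big"), int.from_bytes(raw[31:39], "big"))


def verify(raw_header: bytes, payload: bytes) -> bool:
    """Integrity check: the payload must match the stored size and fingerprint."""
    fingerprint, _, size, _ = unpack_header(raw_header)
    return len(payload) == size and hashlib.sha1(payload).digest() == fingerprint
```

Since every header carries the chunk's own fingerprint, type, and offset, scanning the disk log from the start and unpacking each header is enough to rebuild the chunk hash table.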
2.2 File index
Figure 5 shows how a file is stored in the disk log. The data blocks of a file are indexed by index blocks, which are also stored in the disk log; all index blocks of a file form an index tree. Each file has exactly one root block in the disk log, which stores the index entry for the root of the file's index tree as well as the file's metadata and some management information. After the file has been backed up, its root block is also stored, as job management information, in the job record in the directory database. In Figure 5, F_0 denotes a file, D_i a data block, and I_i an index block. An index block consists of index entries. P(X) denotes an index entry, a triple <H(X), offset, type>, where X is the indexed chunk, H(X) is the fingerprint of X, offset is the storage address of X in the disk log, and type is the type of X; X may be either an index block I_i or a data block D_i. The arrows show which index entry refers to which indexed chunk. M(F_0) denotes the metadata and some management information of file F_0. Index blocks I_0, I_1, and I_2 form the index tree of F_0, with I_0 as its root. R_0 is the root block of F_0; it consists of M(F_0) and an index entry P(I_0) pointing to the root I_0 of the file's index tree. All data blocks and index blocks in the disk log can be shared by different files. Figure 6 shows different files sharing data blocks and index blocks; the notation is the same as in Figure 5.
3. Storage server chunk hash table
The storage server's chunk hash table maps each chunk fingerprint to the chunk's storage address in the disk log. The table consists of buckets of equal size. The number of buckets is determined by the size of the disk log: the larger the disk log, the more buckets the table has, which lowers the probability of bucket collisions. The system takes the first n bits of a fingerprint, with n determined by the number of buckets, as the bucket number and maps the fingerprint to that bucket. Each fingerprint is stored in its bucket as a triple <fingerprint, offset, type>, where fingerprint is the chunk's fingerprint, offset is the storage address of the corresponding chunk in the disk log, and type is that chunk's type. If a bucket collision occurs, the fingerprint's triple is stored in an adjacent bucket.
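A minimal in-memory model of the chunk hash table follows. The bucket capacity (slots_per_bucket) and the probing order used for "adjacent bucket" (bucket+1, bucket+2, ... with wrap-around) are assumptions; the text specifies only equal-sized buckets, bucket numbers taken from the first n fingerprint bits, and overflow into an adjacent bucket. ChunkHashTable and its methods are illustrative names.

```python
class ChunkHashTable:
    """Hypothetical in-memory model of the chunk hash table.

    Assumed: each bucket holds at most `slots_per_bucket` triples, and an
    "adjacent bucket" means probing bucket+1, bucket+2, ... (wrapping around).
    """

    def __init__(self, prefix_bits: int, slots_per_bucket: int = 4) -> None:
        self.prefix_bits = prefix_bits
        self.slots = slots_per_bucket
        # 2**n buckets, each a list of <fingerprint, offset, type> triples
        self.buckets = [[] for _ in range(1 << prefix_bits)]

    def home(self, fp: bytes) -> int:
        """Bucket number = the first n bits of the fingerprint."""
        return int.from_bytes(fp, "big") >> (len(fp) * 8 - self.prefix_bits)

    def _probe(self, fp: bytes):
        start, n = self.home(fp), len(self.buckets)
        for step in range(n):
            yield (start + step) % n

    def lookup(self, fp: bytes):
        """Return (offset, type) for fp, or None if it is not stored."""
        for b in self._probe(fp):
            bucket = self.buckets[b]
            for entry_fp, offset, ctype in bucket:
                if entry_fp == fp:
                    return offset, ctype
            if len(bucket) < self.slots:
                # Entries are never deleted, so a non-full bucket ends the chain.
                return None
        return None

    def insert(self, fp: bytes, offset: int, ctype: str) -> int:
        """Store a new triple; return the bucket it landed in."""
        for b in self._probe(fp):
            if len(self.buckets[b]) < self.slots:
                self.buckets[b].append((fp, offset, ctype))
                return b
        raise RuntimeError("chunk hash table is full")
```

Because chunks are never erased, the table only grows, which is what allows a lookup to stop at the first bucket that still has free space.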
4. Storage server index buffer
Figure 7 shows the structure of the index buffer. The index buffer is an in-memory hash table consisting of a bucket array and many linked lists. The bucket array has 1024*1024 (2^20) buckets, numbered 00000H to FFFFFH. A bucket may be empty; a non-empty bucket holds a pointer to its linked list, whose entries store the fingerprint information hashed into that bucket. To hash a fingerprint, its first 20 bits are used as the bucket number, and the fingerprint is placed in the linked list of that bucket.
Each linked-list entry has the following structure:
tag: a 4-bit identifier indicating the state of the fingerprint during the first and second backup processes;
fingerprintTail: the last 140 bits of the chunk's fingerprint. The first 20 bits are implied by the bucket number, so only the remaining 140 bits need to be stored;
offset: a 64-bit storage address; if non-null, it is the address of the corresponding chunk in the disk log;
next: a 32-bit pointer to the next entry.
The "one fingerprint" example in Figure 7 shows fingerprint 7E54F36A4EC62…3B being hashed into the index buffer. Step (1) uses the first 20 bits of the fingerprint, "7E54F", as the bucket number (bucketNo) and locates bucket 7E54FH. Step (2) searches that bucket's linked list for an entry whose fingerprintTail is "36A4EC62…3B". If one is found, fingerprint 7E54F36A4EC62…3B is already in the index buffer; otherwise a new entry is created to store it.
The tag of an index buffer entry takes one of three values, with the following meanings:
0000: the fingerprint comes from the previous job's fingerprint file and has not been hit during the current backup;
1000: the fingerprint comes from the previous job's fingerprint file and has been hit during the current backup;
1100: the fingerprint was newly generated during the current backup.
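The two-step lookup of Figure 7 can be sketched as below, assuming 160-bit (20-byte) fingerprints as stated in the header description. Two simplifications are deliberate and hypothetical: a dict holds only the non-empty buckets instead of a 1024*1024-slot array, and a Python list replaces each linked list, so the 32-bit next pointer disappears. IndexCache, split, and the TAG_* constants are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TAG_OLD_UNHIT = 0b0000   # from the previous job's .FF, not hit in this run
TAG_OLD_HIT = 0b1000     # from the previous job's .FF, hit in this run
TAG_NEW = 0b1100         # first seen in this run

BUCKET_BITS = 20                 # 2**20 buckets, 00000H..FFFFFH
TAIL_BITS = 160 - BUCKET_BITS    # 140-bit fingerprintTail


@dataclass
class Entry:
    tag: int
    tail: int                    # fingerprintTail: the low 140 bits
    offset: Optional[int]        # disk-log address, None until known


def split(fp: bytes) -> Tuple[int, int]:
    """Split a 20-byte fingerprint into (bucket number, 140-bit tail)."""
    value = int.from_bytes(fp, "big")
    return value >> TAIL_BITS, value & ((1 << TAIL_BITS) - 1)


class IndexCache:
    def __init__(self) -> None:
        # Sparse dict instead of a 2**20-slot bucket array; a Python list
        # stands in for each bucket's linked list.
        self.buckets: Dict[int, List[Entry]] = {}

    def find(self, fp: bytes) -> Optional[Entry]:
        bucket_no, tail = split(fp)                    # step (1): pick the bucket
        for entry in self.buckets.get(bucket_no, []):  # step (2): scan its list
            if entry.tail == tail:
                return entry
        return None

    def add(self, fp: bytes, tag: int, offset: Optional[int] = None) -> Entry:
        bucket_no, tail = split(fp)
        entry = Entry(tag, tail, offset)
        self.buckets.setdefault(bucket_no, []).append(entry)
        return entry
```

Storing only the 140-bit tail is lossless because the bucket number supplies the missing 20 bits.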
After a backup job Job_x(t_{n-1}) completes, all fingerprints it contains are saved in the file Job_x(t_{n-1}).FF as pairs <fingerprint, offset>, where fingerprint is a chunk's fingerprint and offset is the chunk's storage address in the disk log. Job_x(t_{n-1}).FF is stored in the job record Job_x(t_{n-1}).Record in the directory database and is used to initialize the index buffer of job Job_x(t_n). Because adjacent jobs in the same job chain usually share many files or much data, initializing the index buffer of Job_x(t_n) from Job_x(t_{n-1}).FF raises the buffer's fingerprint hit rate.
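The hand-off from one job instance to the next through the .FF file can be sketched as below. The 28-byte record layout (a 20-byte fingerprint followed by an 8-byte big-endian offset) is an assumption, and the index cache is simplified to a flat dict fingerprint -> [tag, offset]. The filter in pairs_for_next_ff mirrors step (20) of the second backup process; all function names are hypothetical.

```python
import struct
from typing import Dict, Iterable, List, Tuple

TAG_OLD_UNHIT, TAG_OLD_HIT, TAG_NEW = 0b0000, 0b1000, 0b1100

# Assumed on-disk record: 20-byte fingerprint + 8-byte big-endian offset.
RECORD = struct.Struct(">20sQ")


def write_ff(pairs: Iterable[Tuple[bytes, int]]) -> bytes:
    """Serialise <fingerprint, offset> pairs into the contents of a .FF file."""
    return b"".join(RECORD.pack(fp, off) for fp, off in pairs)


def load_index_cache(ff: bytes) -> Dict[bytes, list]:
    """Initialise Job_x(t_n)'s cache from Job_x(t_{n-1}).FF; every tag starts at 0000."""
    return {fp: [TAG_OLD_UNHIT, off] for fp, off in RECORD.iter_unpack(ff)}


def pairs_for_next_ff(cache: Dict[bytes, list]) -> List[Tuple[bytes, int]]:
    """Keep only fingerprints the current job referenced (tag 1000 or 1100)."""
    return [(fp, off) for fp, (tag, off) in cache.items()
            if tag in (TAG_OLD_HIT, TAG_NEW)]
```

Dropping the entries still tagged 0000 means each .FF describes only the data its own job referenced, so the cache does not accumulate fingerprints of files that have since disappeared from the file set.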
5. Backup process
For convenience, the following notation is defined:
BS: backup server job worker thread;
BA: backup agent job worker thread;
SS: storage server job worker thread;
F: a file;
H: a fingerprint;
M(F): the metadata of file F;
R(F): the root block of file F;
H(D): the fingerprint of chunk D;
D(H): the data block or index block corresponding to fingerprint H;
F.Index: the in-memory buffer in which the index tree of file F is built;
index cache: the index buffer;
chunk cache: the chunk buffer;
hash table: the chunk hash table;
Job_x(t_n).FileSet: the file set of job Job_x(t_n);
I(F, level): the set of index blocks at layer level of index tree F.Index. The leaves of the index tree are at layer 0, their parents at layer 1, and so on;
I_w(F, level): the working node of I(F, level), i.e., the node currently used to store triples <H, offset, type>;
<H, offset, type>: a triple, where H is a fingerprint, offset is the storage address of chunk D(H) in the disk log, and type is the type of chunk D(H).
5.1 First backup process
The first backup process is carried out mainly by the backup agent job worker thread and the storage server job worker thread working together. Its steps are:
(1) SS: initialize the index cache from Job_x(t_{n-1}).FF;
(2) BA: if (Job_x(t_n).FileSet is empty) go to (19); else read a file F_i from Job_x(t_n).FileSet;
(3) BA: send M(F_i) to SS;
(4) SS: cache M(F_i) in the chunk cache;
(5) BA: perform anchor-based chunking of F_i;
(6) BA: compute the fingerprint of each chunk and send the resulting set of fingerprints to SS;
(7) SS: if (the fingerprint set is empty) go to (17); else take a fingerprint H_j from the set and look it up in the index cache;
(8) SS: if (H_j is found in the index cache) {
(9) SS: if (tag == 0000) { tag = 1000; cache <H_j, offset> in the chunk cache; }
(10) SS: else if (tag == 1000) cache <H_j, offset> in the chunk cache;
(11) SS: else if (tag == 1100) cache <H_j, null> in the chunk cache; }
(12) SS: else { add H_j to the index cache with tag = 1100 and offset = null;
(13) SS: request D(H_j) from BA;
(14) BA: send D(H_j) to SS;
(15) SS: cache <H_j, D(H_j)> in the chunk cache; }
(16) SS: return to step (7);
(17) SS: notify BA to back up the next file;
(18) BA: return to step (2);
(19) BA: report the end status of job Job_x(t_n) to BS and SS, then exit;
(20) SS: on receiving the job-end signal from BA, finish the first backup process and proceed to the second backup process;
(21) BS: on receiving the job-end signal from BA, disconnect from BA and wait for SS to run the second backup process.
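The storage-server side of steps (4) to (15) can be modelled in a few lines. This is a sketch under stated assumptions: networking is omitted (the agent's chunks are simply available in a dict), SHA-1 stands in for the fingerprint function, the index cache is a flat dict fingerprint -> [tag, offset] rather than the bucketed structure of section 4, and first_phase and the chunk-cache labels "meta", "ref", and "data" are hypothetical.

```python
import hashlib

TAG_OLD_UNHIT, TAG_OLD_HIT, TAG_NEW = 0b0000, 0b1000, 0b1100


def first_phase(files, index_cache):
    """Model of steps (4)-(15), single process, no networking.

    files: file name -> (metadata, list of chunk bytes) produced by the agent.
    index_cache: fingerprint -> [tag, offset], pre-loaded from the previous .FF.
    Returns the chunk cache (a list of tuples) and how many chunks were sent.
    """
    chunk_cache, transferred = [], 0
    for name, (meta, chunks) in files.items():
        chunk_cache.append(("meta", name, meta))                   # step (4)
        fingerprints = [hashlib.sha1(c).digest() for c in chunks]  # step (6), agent side
        held_by_agent = dict(zip(fingerprints, chunks))            # sent only on request
        for h in fingerprints:                                     # step (7)
            entry = index_cache.get(h)
            if entry is not None:                                  # steps (8)-(11)
                tag, offset = entry
                if tag == TAG_OLD_UNHIT:
                    entry[0] = TAG_OLD_HIT                         # mark as referenced
                    chunk_cache.append(("ref", name, h, offset))
                elif tag == TAG_OLD_HIT:
                    chunk_cache.append(("ref", name, h, offset))
                else:  # TAG_NEW: offset unknown until the second phase
                    chunk_cache.append(("ref", name, h, None))
            else:                                                  # steps (12)-(15)
                index_cache[h] = [TAG_NEW, None]
                chunk_cache.append(("data", name, h, held_by_agent[h]))
                transferred += 1
    return chunk_cache, transferred
```

The TAG_NEW branch is what keeps a chunk that repeats within the same job from crossing the network twice: the second occurrence is recorded as <H_j, null> and resolved later from the index cache.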
5.1.1 Anchor-based file chunking
In step (5) of the first backup process, anchor-based chunking is performed by the backup agent job worker thread, which calls the backup agent's file chunking module. The steps are:
(1) Take the first 48 bytes of the file, $b_1, b_2, \ldots, b_{48}$, as a window and compute the hash of this first window as $H_1 = (b_1 p^{47} + b_2 p^{46} + \cdots + b_{48}) \bmod M$, where $p$ is a prime (for example 17) and $M$ is a constant (for example $2^{32}$). The hash is stored in the variable $H_1$.
(2) Slide the window forward by one byte and compute the hash of the second window $b_2, b_3, \ldots, b_{49}$ as $H_2 = (p H_1 + b_{49} - b_1 p^{48}) \bmod M$, storing it in the variable $H_2$.
(3) Continue in the same way to compute the hashes of all windows of the file.
(4) For each window's hash, take its low 13 bits as a binary number; if this number equals a predetermined value (for example 61), the window is an anchor. Using the anchors as boundaries, the file is divided into chunks of varying size.
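The sketch below checks that the one-byte recurrence of step (2) reproduces the direct formula of step (1) for every window, using the example parameters p = 17, M = 2^32, a 48-byte window, and the anchor test "low 13 bits equal 61". The function names are illustrative. If window hashes were uniformly distributed, a window would match with probability 1/2^13, giving roughly one anchor per 8 KB; that figure is an estimate, not a statement from the text.

```python
P = 17                      # example prime from the text
M = 1 << 32                 # example modulus 2**32
W = 48                      # window size in bytes
MASK = (1 << 13) - 1        # low 13 bits
MAGIC_VALUE = 61            # example anchor value


def window_hash(window: bytes) -> int:
    """Direct evaluation of H = (b1*p^47 + ... + b48) mod M via Horner's rule."""
    h = 0
    for b in window:
        h = (h * P + b) % M
    return h


def rolling_hashes(data: bytes):
    """Yield (end_index, hash) for every W-byte window, updating in O(1) per byte:
    H_next = (p*H + incoming - outgoing*p^W) mod M."""
    if len(data) < W:
        return
    p_w = pow(P, W, M)                        # p^48 mod M, computed once
    h = window_hash(data[:W])
    yield W, h
    for i in range(W, len(data)):
        h = (P * h + data[i] - data[i - W] * p_w) % M
        yield i + 1, h


def anchors(data: bytes) -> list:
    """End positions of windows whose low 13 bits equal MAGIC_VALUE."""
    return [end for end, h in rolling_hashes(data) if h & MASK == MAGIC_VALUE]
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction of the outgoing byte needs no extra correction.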
Anchor-based chunking follows three conventions: a) if the file is smaller than 48 bytes, the chunking algorithm is skipped and the whole file forms a single chunk; b) if a stretch of bytes contains too many anchors, some anchors are discarded so that no chunk is smaller than 2 KB (the last chunk of a file is the only one that may be smaller than 2 KB); c) if no anchor occurs within 64 KB of consecutive bytes, those 64 KB are taken as one chunk.
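A complete chunker combining the rolling hash with conventions a) to c) might look like the following sketch. It uses the example parameters from the text and assumes that a chunk boundary falls at the end of an anchor window; chunk and the constant names are hypothetical. Convention b) is implemented by ignoring any anchor that would close a chunk shorter than 2 KB, and convention c) by forcing a cut when a chunk reaches 64 KB.

```python
W, P, M = 48, 17, 1 << 32            # window size, prime, modulus (example values)
MASK, MAGIC_VALUE = (1 << 13) - 1, 61
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024


def chunk(data: bytes) -> list:
    """Split data at anchors, honouring conventions a), b) and c)."""
    if len(data) < W:                        # a) too short: the whole file is one chunk
        return [data]
    p_w = pow(P, W, M)                       # p^48 mod M, used to drop the outgoing byte
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (P * h + b) % M                  # shift in the incoming byte
        if i >= W:
            h = (h - data[i - W] * p_w) % M  # remove the byte that left the window
        end = i + 1
        if end < W:
            continue                         # first full window not reached yet
        size = end - start
        is_anchor = (h & MASK) == MAGIC_VALUE
        # b) anchors that would leave a chunk under 2 KB are ignored;
        # c) a chunk is cut unconditionally once it reaches 64 KB.
        if (is_anchor and size >= MIN_CHUNK) or size >= MAX_CHUNK:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])          # the tail may be shorter than 2 KB
    return chunks
```

Because each boundary decision depends only on the bytes in the current window and the distance from the previous cut, appending to a file leaves every earlier boundary in place, and an edit in the middle typically disturbs only the chunks around it, as Figure 8 illustrates.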
Anchor-based chunking in the invention has two properties. (1) Modification stability: a modification to a file affects only the chunks adjacent to the modified region, and the boundaries of the other chunks do not move. Consequently, when a file is backed up incrementally, only the few modified chunks need to be backed up, and the remaining chunks can be shared with earlier backups. Modification stability also ensures that similar data within and across files is not missed merely because it has been shifted, so duplicate data is detected as fully as possible. (2) Low computational cost: the hash of each window is easily derived from the hash of the previous window, so chunking has low computational overhead; the overall time complexity is O(n), where n is the number of bytes in the file.
Figure 8 shows how the chunks of a file change when the file is edited after it has been chunked. As the figure shows, anchor-based chunking is modification-stable: an edit affects only the chunks adjacent to the modified region, and the boundaries of the other chunks do not move. Row a shows a file divided by anchors into 8 chunks of varying sizes, B_1 to B_8; the toothed section at the boundary of each chunk is a 48-byte anchor. Rows b, c, and d show the chunks after the first, second, and third modification of the file; shaded regions are the modified parts. Row b: the first modification falls inside B_4. It produces no additional chunk; B_4 simply becomes B_9 and all other chunks are unchanged, so the backup only needs to send B_9 in place of B_4. Row c: the second modification falls inside B_5 and creates a new anchor that splits B_5 into B_10 and B_11; all other chunks are unchanged, so the backup only needs to send B_10 and B_11 in place of B_5. Row d: the third modification falls on the boundary between B_2 and B_3 and destroys the anchor between them, merging the two into a single chunk B_12; the backup only needs to send B_12 in place of B_2 and B_3.
5.2 Second backup process
The second backup process is carried out mainly by the storage server job worker thread while the system is relatively idle. Its steps are:
(1) SS: if (Job_x(t_n).FileSet is empty) go to (19); else take a file name F_i from Job_x(t_n).FileSet;
(2) SS: create an in-memory buffer F_i.Index for file F_i, create R(F_i) in F_i.Index, and copy M(F_i) from the chunk cache into R(F_i);
(3) SS: if (the chunk cache holds no more tuples for F_i) go to (14); else read the next tuple for F_i from the chunk cache;
(4) SS: if (it is <H_j, offset>) go to step (12);
(5) SS: else if (it is <H_j, D(H_j)>) {
(6) SS: look up H_j in the hash table;
(7) SS: if (found) write the "offset" value into the index cache entry for H_j and go to step (12);
(8) SS: else { append D(H_j) to the disk log and update the hash table;
(9) SS: write the "offset" value into the index cache entry for H_j and go to step (12); } }
(10) SS: else if (it is <H_j, null>)
(11) SS: read the "offset" value from the index cache entry for H_j;
(12) SS: insert(<H_j, offset, dc>, 0, F_i.Index);
(13) SS: return to step (3);
(14) SS: storeRemain(F_i.Index, R(F_i));
(15) SS: append R(F_i) to the disk log and update the hash table;
(16) SS: send R(F_i) to BS;
(17) BS: send R(F_i) to the directory database and store it in Job_x(t_n).Record;
(18) SS: return to step (1);
(19) SS: create the file Job_x(t_n).FF;
(20) SS: scan the index cache and, for every entry with (tag == 1000 or tag == 1100), write <H, offset> to Job_x(t_n).FF;
(21) SS: send Job_x(t_n).FF to BS;
(22) BS: send Job_x(t_n).FF to the directory database and store it in Job_x(t_n).Record;
(23) SS: report the end status of job Job_x(t_n) to BS;
(24) BS: disconnect from SS, write the end status of job Job_x(t_n) into Job_x(t_n).Record in the directory database, and end the run of job Job_x(t_n).
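The offset resolution of steps (3) to (12) and the .FF selection of step (20) can be sketched as follows. Assumptions: the chunk-cache tuples use the hypothetical labels "meta", "ref", and "data"; the index cache and chunk hash table are flat dicts; the disk log is a bytearray; and step (12) is reduced to collecting each file's leaf triples instead of building the index tree. second_phase is an illustrative name. Because the chunk cache is processed in the order it was filled, a <H_j, null> entry is always read after the <H_j, D(H_j)> entry that gave H_j its offset.

```python
TAG_OLD_UNHIT, TAG_OLD_HIT, TAG_NEW = 0b0000, 0b1000, 0b1100


def second_phase(chunk_cache, index_cache, hash_table, disk_log):
    """index_cache: fingerprint -> [tag, offset]; hash_table: fingerprint -> offset;
    disk_log: bytearray, appended in place. Returns (leaf triples per file,
    <fingerprint, offset> pairs for Job_x(t_n).FF)."""
    leaves = {}
    for item in chunk_cache:
        kind, name = item[0], item[1]
        if kind == "meta":                          # step (2): start a new file
            leaves[name] = []
            continue
        h = item[2]
        if kind == "data":                          # steps (5)-(9)
            offset = hash_table.get(h)              # step (6)
            if offset is None:                      # step (8): genuinely new chunk
                offset = len(disk_log)
                disk_log += item[3]                 # append-only write
                hash_table[h] = offset
            index_cache[h][1] = offset              # steps (7)/(9)
        else:
            offset = item[3]                        # <H_j, offset> ...
            if offset is None:                      # ... or <H_j, null>: step (11)
                offset = index_cache[h][1]
        leaves[name].append((h, offset, "dc"))      # step (12), simplified
    ff = [(h, off) for h, (tag, off) in index_cache.items()
          if tag in (TAG_OLD_HIT, TAG_NEW)]         # step (20)
    return leaves, ff
```

Step (6) matters even for chunks that missed the index cache: a chunk may already be in the disk log from a different job chain, in which case it is only referenced, not written again.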
In the algorithm above, the two functions used in steps (12) and (14) work as follows:
Algorithm for step (12)
insert(<H, offset, type>, level, F.Index)
{ // Store the triple <H, offset, type> in F.Index.
  // level: the layer, in index tree F.Index, of the index node that stores <H, offset, type>.
  if (I_w(F, level) does not exist)
    { create I_w(F, level); store <H, offset, type> in I_w(F, level); return; }
  else if (I_w(F, level) is not full)
    { store <H, offset, type> in I_w(F, level); return; }
  else if (I_w(F, level) is full)
    { compute H(I_w(F, level));
      look up H(I_w(F, level)) in the hash table;
      if not found
        append I_w(F, level) to the disk log and update the hash table;
      // in the next line, offset is the disk-log address of I_w(F, level)
      insert(<H(I_w(F, level)), offset, ic>, level+1, F.Index);
      create a new index node I_w(F, level);
      store <H, offset, type> in I_w(F, level); return;
    }
}
Algorithm for step (14)
storeRemain(F.Index, R(F))
{ // Store the working index node of every layer of F.Index in the disk log.
  int level := 0;
  loop: compute H(I_w(F, level));
    look up H(I_w(F, level)) in the hash table;
    if not found
      append I_w(F, level) to the disk log and update the hash table;
    if (|I(F, level)| = 1)
      { store <H(I_w(F, level)), offset, ic> in R(F); return; }
    else
      { insert(<H(I_w(F, level)), offset, ic>, level+1, F.Index);
        level := level+1; goto loop;
      }
}
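The two functions can be sketched in Python as a small builder class. Assumptions: an index node holds at most fanout triples (the patent instead caps an index block at 16 KB), nodes are serialised as text and fingerprinted with SHA-1, and the chunk hash table and disk log are a dict and a bytearray. IndexTreeBuilder, fanout, and the serialisation format are hypothetical; insert and store_remain follow the control flow of steps (12) and (14).

```python
import hashlib

ENTRY = "{}:{}:{};"   # hypothetical text serialisation of one <H, offset, type> triple


class IndexTreeBuilder:
    """Builds F.Index bottom-up, as in insert() and storeRemain()."""

    def __init__(self, fanout, hash_table, disk_log):
        self.fanout = fanout            # assumed node capacity, in triples
        self.hash_table = hash_table    # fingerprint -> offset
        self.disk_log = disk_log        # bytearray, appended in place
        self.work = []                  # work[level]  = I_w(F, level)
        self.count = []                 # count[level] = |I(F, level)|

    def _flush(self, node):
        """Fingerprint a node and append it to the log only if it is new."""
        raw = "".join(ENTRY.format(h, off, t) for h, off, t in node).encode()
        h = hashlib.sha1(raw).hexdigest()
        off = self.hash_table.get(h)
        if off is None:
            off = len(self.disk_log)
            self.disk_log += raw
            self.hash_table[h] = off
        return (h, off, "ic")           # index entry pointing at this node

    def insert(self, triple, level=0):
        if level == len(self.work):                  # I_w(F, level) does not exist
            self.work.append([triple])
            self.count.append(1)
        elif len(self.work[level]) < self.fanout:    # not full
            self.work[level].append(triple)
        else:                                        # full: push it up one layer
            self.insert(self._flush(self.work[level]), level + 1)
            self.work[level] = [triple]              # new working node
            self.count[level] += 1

    def store_remain(self):
        """Flush every layer's working node; return the root entry for R(F).
        Precondition: at least one triple has been inserted."""
        level = 0
        while True:
            ref = self._flush(self.work[level])
            if self.count[level] == 1:               # |I(F, level)| = 1: the root
                return ref
            self.insert(ref, level + 1)
            level += 1
```

With fanout=2 and the leaf sequence a, b, a, b, c, the two identical leaf-level nodes [a, b] produce the same fingerprint, so the second copy is found in the hash table and never written, which is how index blocks come to be shared between files (Figure 6).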
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2007101687158A CN100547555C (en) | 2007-12-10 | 2007-12-10 | A Data Backup System Based on Fingerprint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101183323A CN101183323A (en) | 2008-05-21 |
CN100547555C true CN100547555C (en) | 2009-10-07 |
Family
ID=39448610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2007101687158A Expired - Fee Related CN100547555C (en) | 2007-12-10 | 2007-12-10 | A Data Backup System Based on Fingerprint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100547555C (en) |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101610281B (en) * | 2008-06-19 | 2012-11-21 | 华为技术有限公司 | Method and device for storing data fingerprints |
CN101599079B (en) * | 2009-07-22 | 2011-08-31 | 中国科学院计算技术研究所 | Backup data centralized storage management method |
CN101814045B (en) * | 2010-04-22 | 2011-09-14 | 华中科技大学 | Data organization method for backup services |
CN101887388B (en) * | 2010-06-18 | 2014-03-12 | 中兴通讯股份有限公司 | Data backup system and method based on memory database |
US8600944B2 (en) * | 2010-09-24 | 2013-12-03 | Hitachi Data Systems Corporation | System and method for managing integrity in a distributed database |
CN102456059A (en) * | 2010-10-21 | 2012-05-16 | 英业达股份有限公司 | Data de-duplication processing system |
US8433979B2 (en) | 2011-02-28 | 2013-04-30 | International Business Machines Corporation | Nested multiple erasure correcting codes for storage arrays |
US9058291B2 (en) | 2011-02-28 | 2015-06-16 | International Business Machines Corporation | Multiple erasure correcting codes for storage arrays |
CN102169453A (en) * | 2011-03-08 | 2011-08-31 | 杭州电子科技大学 | File online backup method |
WO2011107045A2 (en) | 2011-04-19 | 2011-09-09 | 华为终端有限公司 | Method for backuping and recovering data of mobile terminal and mobile terminal thereof |
CN102436408B (en) * | 2011-10-10 | 2014-02-19 | 上海交通大学 | Data storage cloudification and cloud backup method based on Map/Dedup |
CN102510340A (en) * | 2011-10-11 | 2012-06-20 | 浪潮电子信息产业股份有限公司 | Method for realizing remote rapid backup by utilizing common Internet network |
US8874995B2 (en) | 2012-02-02 | 2014-10-28 | International Business Machines Corporation | Partial-maximum distance separable (PMDS) erasure correcting codes for storage arrays |
CN102915325A (en) * | 2012-08-11 | 2013-02-06 | 深圳市极限网络科技有限公司 | Md5 Hash list-based file decomposing and combining technique |
EP2915079A4 (en) * | 2012-10-31 | 2016-10-26 | Hewlett Packard Entpr Dev Lp | Cataloging backup data |
CN103873506A (en) * | 2012-12-12 | 2014-06-18 | 鸿富锦精密工业(深圳)有限公司 | Data block duplication removing system in storage cluster and method thereof |
WO2014107845A1 (en) * | 2013-01-09 | 2014-07-17 | 华为技术有限公司 | Data processing method and device |
CN103200169A (en) * | 2013-01-30 | 2013-07-10 | 中国科学院自动化研究所 | Method and system of user data protection based on proxy |
CN103384270B (en) * | 2013-06-28 | 2015-01-28 | 环境保护部华南环境科学研究所 | Method and system for data backup of internal and external network penetrating remote data transmission |
CN103677973A (en) * | 2013-09-01 | 2014-03-26 | 西安重装渭南光电科技有限公司 | Distributed multi-task scheduling management system |
CN103500120A (en) * | 2013-09-17 | 2014-01-08 | 北京思特奇信息技术股份有限公司 | Distributed cache high-availability processing method and system based on multithreading asynchronous double writing |
CN103870362B (en) * | 2014-03-21 | 2017-08-04 | 华为技术有限公司 | A data recovery method, device and backup system |
EP3180696B1 (en) * | 2014-08-14 | 2018-12-19 | Veeam Software Ag | System, method, and computer program product for low impact backup |
CN104331525B (en) * | 2014-12-01 | 2018-01-16 | 国家计算机网络与信息安全管理中心 | Sharing method based on data de-duplication |
CN104408141B (en) * | 2014-12-01 | 2018-04-17 | 国家计算机网络与信息安全管理中心 | Deduplicating file system and data deployment method thereof |
CN107959658B (en) * | 2016-10-17 | 2019-04-26 | 视联动力信息技术股份有限公司 | Web conference data synchronization method and system |
US10706038B2 (en) * | 2017-07-27 | 2020-07-07 | Cisco Technology, Inc. | System and method for state object data store |
US11113153B2 (en) * | 2017-07-27 | 2021-09-07 | EMC IP Holding Company LLC | Method and system for sharing pre-calculated fingerprints and data chunks amongst storage systems on a cloud local area network |
CN109347899B (en) * | 2018-08-22 | 2022-03-25 | 北京百度网讯科技有限公司 | Method for writing log data in distributed storage system |
CN111382012B (en) * | 2020-03-03 | 2020-12-29 | 广州鼎甲计算机科技有限公司 | Backup method and device for MySQL cloud database, computer equipment and storage medium |
CN114157674B (en) * | 2020-08-17 | 2024-08-09 | 中移(上海)信息通信科技有限公司 | Wireless communication method, device, system, server and medium |
CN116566973B (en) * | 2023-06-20 | 2023-11-07 | 北京中宏立达科技发展有限公司 | File transmission system based on peer-to-peer network |
- 2007-12-10: CN application CNB2007101687158A filed (patent CN100547555C/en), Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN101183323A (en) | 2008-05-21 |
Similar Documents
Publication | Title |
---|---|
CN100547555C (en) | A Data Backup System Based on Fingerprint |
US11907168B2 (en) | Data object store and server for a cloud storage environment, including data deduplication and data management across multiple cloud storage sites |
US12229148B2 (en) | Data connector component for implementing integrity checking, anomaly detection, and file system metadata analysis |
US11755415B2 (en) | Variable data replication for storage implementing data backup |
US7418464B2 (en) | Method, system, and program for storing data for retrieval and transfer |
US9633065B2 (en) | Efficient data rehydration |
EP2972772B1 (en) | In place snapshots and garbage collection therefor |
EP3416060B1 (en) | Fast crash recovery for distributed database systems |
CN101216791B (en) | File Backup Method Based on Fingerprint |
Tan et al. | CABdedupe: A causality-based deduplication performance booster for cloud backup services |
AU2017204760A1 (en) | Log record management |
JP2017216010A (en) | Avoiding system-wide checkpoints in distributed database systems |
CN110109778A (en) | Backup and restoration methods for large numbers of small data files |
US10803012B1 (en) | Variable data replication for storage systems implementing quorum-based durability schemes |
US9779035B1 (en) | Log-based data storage on sequentially written media |
CN112800019A (en) | Data backup method and system based on Hadoop distributed file system |
US8195612B1 (en) | Method and apparatus for providing a catalog to optimize stream-based data restoration |
US11163447B2 (en) | Dedupe file system for bulk data migration to cloud platform |
CN118626432A (en) | Data processing method, storage system, network interface device and storage medium |
CN110297609A (en) | A Data Storage Method for Accurate Information System |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2009-10-07; termination date: 2020-12-10 |