CN101814045A - Data organization method for backup services - Google Patents
- Publication number
- CN101814045A (application number CN201010152397A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a data organization method for the storage server side of backup service software, intended to improve the efficiency of data organization and data management on the storage server. The method comprises: ① initializing the storage server's storage space into a metadata area (comprising the master record, the index header and the data index) and a data area; ② receiving a user command and determining its type: a backup operation proceeds to step ③, a restore operation goes to step ④, and a delete operation goes to step ⑤; ③ processing the backup operation by backing up user data to the data area of the storage server while using deduplication to avoid backing up duplicate data, then returning to step ②; ④ processing the restore operation by locating the user-specified list of data to restore in the data area of the storage server and transmitting it to the client, then returning to step ②; ⑤ processing the delete operation by finding the data the user asked to delete and handling it according to the reference counts of the corresponding backup data blocks in the data area, then returning to step ②. The method improves storage server utilization, manageability and system scalability, saves network bandwidth, and improves backup efficiency.
Description
Technical field
The invention relates to computer data storage and backup, and in particular to a data organization method for backup services that implements block-level deduplication.
Background
With the rapid development of the information society, everything from daily life to business operations has come to depend on increasingly pervasive information systems. In industries such as finance, communications, transportation and insurance in particular, the loss or corruption of key data can cause immeasurable losses to individuals and enterprises.
The backup service discussed here is essentially a backup and recovery software system with some disaster recovery capability. It provides individual and enterprise users with comprehensive data backup, recovery and related management functions, and lets them customize backup strategies to their actual needs. The backup service is also a software delivery model: the provider builds all the network infrastructure and the software and hardware platforms that an enterprise needs for its information systems, and is responsible for the initial implementation, later maintenance and related services. The enterprise can use the information system over the Internet without purchasing software or hardware, building a machine room, or recruiting IT staff. Just as turning on a tap delivers water, the enterprise rents software services according to its actual needs.
Data backup and recovery are important measures for ensuring information security. As data grows in importance, the data on storage systems must be protected effectively and comprehensively. With the emergence and development of technologies such as high-speed networking, communications and mass storage, basic storage resources have changed dramatically. The growing number of information system applications has also caused the volume of data worth protecting to grow geometrically. All of this places higher demands on the development of data backup and recovery software and on related research.
Users of a backup service typically have the following requirements for storage space and data: they can increase or decrease their use of storage space on demand; they can use existing space without obstruction, meaning that any backup job executes correctly as long as there is free space and the network is reachable; and they can restore backed-up data at any time. Meeting these requirements calls for a degree of logical independence between user space and data, so user space management and backup data organization methods need to be studied. An efficient space allocation and reclamation mechanism is also needed, one that fully exploits the coupling in the storage space of duplicate data while preserving the logical independence of user data and supporting efficient data lookup and access.
In a typical backup software architecture, a storage server is a physical medium that has been authenticated by the management console of the data backup software. It can be disk space on a server, a storage device attached to a server, or a disk mapped over the network. Multiple storage servers can be configured through the management console; under the unified management of the backup server, backup clients back up data to the appropriate storage server.
Previous designs for the storage server side of backup software have mostly used file-level backup. In file-level backup, the backup software is aware only of the file layer and copies all files on the source disk to another destination medium. File-level backup software therefore either relies on the file system interface provided by the operating system to back up files, or implements file system functionality itself so that it can interpret file system metadata. In short, file-level backup software reads data out file by file and stores the files it has read on another medium. This is clearly a performance bottleneck for PB-scale storage systems. Because the unit of data managed on the storage server is the file, large amounts of duplicate data are inevitably backed up, which also makes the storage server much harder to manage. Backing up data with block-level deduplication can solve these problems to a large extent.
In addition, in previous backup software the storage capacity that the storage server initially allocates to a user is usually fixed, which greatly reduces the scalability of the system. In practice the system cannot predict how much storage each user will eventually use (although there may be a maximum capacity limit based on the user's permissions and type). Allocating too much is likely to waste storage space and reduce utilization, while allocating too little may severely restrict the user.
EMC recently acquired Avamar, whose patented deduplication and global single-instance storage technology ensures that each backup data segment is stored only once globally. This can reduce the amount of data moved and restored by a factor of 300 while still allowing daily full backups and fast restores. For each 24 KB data segment, Avamar generates a unique 20-byte ID using the SHA-1 hash algorithm. This ID is the fingerprint of the data segment, and Avamar's software uses it to determine whether a segment has been stored before. However, SHA-1 is computationally expensive and consumes considerable CPU. Moreover, because the data segments are small, backing up a large amount of user data consumes a large amount of fingerprint space, and there are also scalability problems.
Summary of the invention
The object of the invention is to provide a data organization method for backup services that implements block-level deduplication and improves the efficiency of data organization and management.
The invention provides a data organization method for backup services, comprising the following steps:
(1) Initialization:
Initialize the metadata information, which includes assigning initial values to the index header information, the data index information and the data area metadata information in the metadata area;
Pre-allocate one data space in the data area in preparation for user backup requests;
(2) Receive a user command and determine its type:
If the command is a backup operation, go to step (3); if it is a restore operation, go to step (4); if it is a delete operation, go to step (5);
(3) Perform the backup as follows:
(3.1) Split the file to be backed up into data blocks, then hash the content of each data block with the MD5 algorithm to obtain a fingerprint that uniquely identifies it. Data blocks are indexed by fingerprint in the index header and data index of the storage server's metadata;
(3.2) The backup client sends the fingerprint of the data block to the storage server, which uses the fingerprint to check whether the data block already exists;
(3.3) If the fingerprint does not exist, the data block is a new backup data block: the backup client sends it to the storage server, which dynamically allocates storage space and writes the new backup data block. If the fingerprint exists, only the index information for that data block on the storage server is updated, incrementing its reference count by one;
(3.4) Go to step (2);
(4) To restore, the backup server looks up the Hash list of the file to be restored and uses it to access the storage space metadata and locate the logical position of each block in the corresponding data space. The data of the file is then read from the storage server into a memory buffer, block by block, sent over a socket to the backup client, and assembled into the required file set. Then go to step (2);
(5) Delete a backed-up file as follows:
(5.1) The backup server in the backup system software looks up the Hash list of the file to be deleted;
(5.2) For each Hash value, search the index header and data index mapping table in the metadata area of the storage server. If the Hash value passed in does not exist, return false immediately;
(5.3) Otherwise, decrement the reference count in the object metadata corresponding to the Hash value by 1 and return true;
(5.4) Go to step (2).
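The delete path in step (5) can be sketched as follows. This is a minimal illustration, assuming a plain in-memory dictionary stands in for the index header and data index; the names `IndexEntry`, `delete_block` and `delete_file` are hypothetical. The steps above do not say what happens when a count reaches zero, so the sketch leaves such entries in place.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexEntry:
    """Hypothetical stand-in for one object's metadata in the data index."""
    ref_count: int


def delete_block(index: Dict[bytes, IndexEntry], fingerprint: bytes) -> bool:
    """Steps (5.2)-(5.3): False if the Hash is unknown, else decrement and True."""
    entry = index.get(fingerprint)
    if entry is None:
        return False
    entry.ref_count -= 1
    return True


def delete_file(index: Dict[bytes, IndexEntry], hash_list: List[bytes]) -> List[bool]:
    """Step (5.1) supplies hash_list; apply the block-level delete to each Hash."""
    return [delete_block(index, fingerprint) for fingerprint in hash_list]
```

Because only the count changes, deleting one user's file never removes a block that another backup still references.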
Enterprise data today is not only varied and fast-growing but also highly redundant. Many identical files or pieces of data are stored within and across systems, and edited files also carry a great deal of redundancy with their earlier versions. Traditional backup software backs up this redundant data again and again, amplifying the redundancy. The better solution at present is deduplication, which not only achieves high compression ratios and frees storage space but also reduces the cost of disk-based backup and of data management. The invention is a data organization and management method for the storage server side based on block-level deduplication. It transfers backup and restore data to and from clients efficiently, and manages local storage space and organizes data according to the backup server's policies. It achieves global deduplication without affecting users' main backup and restore operations, and the benefit of deduplication becomes more pronounced as the number of users and the volume of backup data grow. It can greatly reduce the amount of data a user needs to back up, saving backup time, network bandwidth and the storage space required for backups.
Description of drawings
Fig. 1 shows the storage data organization used by the method and the process of locating a data item;
Fig. 2 is a flow diagram of the method;
Fig. 3 is a flowchart of dynamic storage space allocation in the invention;
Fig. 4 is a flowchart of the write operation within the backup operation.
Detailed description
The invention is described in more detail below by way of an embodiment. The embodiment is illustrative only and does not limit the scope of protection of the invention.
The backup service system has a three-party architecture consisting of a backup server, storage servers and backup clients. The backup client accepts user-defined data backup policies, restore requests and other data management requests. The backup server connects backup clients and storage servers and is the control center of the whole data backup software; it is responsible for user permission control, global job scheduling and global storage management. When a backup client starts a backup or restore job, the backup server directs it to connect to the designated storage server and begin execution. The backup server also monitors the computing, transmission and storage load on each storage server and applies a load balancing policy. User information, storage server status and the other basic metadata that support the backup server are to be stored in a database. The storage server transfers backup and restore data to and from clients, and manages local storage space and organizes data according to the backup server's policies.
This embodiment uses four data structures: the master record area, the index header, the data index and the data area. Their structure is shown in Fig. 1.
Master record area: describes the storage space as a whole. It holds the index header information, the data index information and the data area metadata information.
Index header: an object hash table that maps an object ID (a 128-bit Hash value generated from the data content) to the data index table. An object here is the basic unit of data storage in the storage system. Unlike files and blocks, the basic components of traditional storage systems, an object combines application data with attributes that define its storage (metadata); it contains the data plus enough additional information for the data to be autonomous and self-managing. The Hash value serves as the object ID in object-based storage: data is stored according to its content, and the Hash index establishes the mapping between content and object. Because the Hash value is globally unique in a statistical sense, objects share a globally unique namespace, which improves the manageability of shared storage. The system uses the mature MD5 algorithm, which transforms data of any length into a 128-bit (16-byte) integer, the object ID.
Data index: an array of size N, where N is the number of entries in the data index table and ranges from 2^20 to 2^30. Each element of the array is the metadata structure of one object, containing: the object ID; the starting address of the object in the data area (I is the data space number, J is the logical data block number within that data space, and K is the offset within that logical data block); the size of the object's data; the number of copies of the object's data content; and the position in the object table of the next object that maps to the same slot of the object hash table. This last field links the objects that map to the same slot of the object hash table into a linked list.
Data area: stores the object data, which consists of the object ID, the length of the data content and the data content itself. To simplify storage space management, the data area is divided into several contiguous data spaces, each represented by a separate data file, and each data space consists of several logical data blocks.
Data block: when the backup service system performs a backup or restore, it splits the data being processed into fixed-length blocks; each one is a data block.
Backup data block: when a user backs up specified files and folders with the backup client, the client first splits the data into fixed-length blocks (4 MB in the deployed backup service software); each one is a backup data block.
Logical data block: on the storage server, each data space is divided into several sub-storage units to simplify management and use storage space efficiently; each sub-storage unit is a logical data block (1 GB in the deployed backup service software).
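The relationship between the index header and the data index can be illustrated with a small chained hash table. This is a sketch under stated assumptions, not the patented layout: the class and field names are hypothetical, `M_BITS` is shrunk to a toy value, and position 0 of the entry array is reserved so that a zeroed header slot can mean "no entry".

```python
from dataclasses import dataclass
from typing import List, Optional

M_BITS = 8   # toy value for the sketch; the text gives m between 20 and 30
EMPTY = 0    # a zeroed header slot is treated as "no entry" (assumption)


@dataclass
class DataIndexEntry:
    object_id: bytes          # MD5 fingerprint, 16 bytes
    space: int                # I: data space number
    block: int                # J: logical data block number within the data space
    offset: int               # K: offset within the logical data block
    size: int                 # size of the object's data
    copies: int = 1           # number of copies of the data content
    next_entry: int = EMPTY   # next entry that maps to the same header slot


class ObjectIndex:
    def __init__(self) -> None:
        self.header: List[int] = [EMPTY] * (1 << M_BITS)
        # Assumption: position 0 is reserved so that 0 can mean "no entry".
        self.entries: List[Optional[DataIndexEntry]] = [None]

    @staticmethod
    def slot(object_id: bytes) -> int:
        """Use the first M_BITS bits of the fingerprint as the header slot."""
        return int.from_bytes(object_id, "big") >> (len(object_id) * 8 - M_BITS)

    def insert(self, entry: DataIndexEntry) -> int:
        s = self.slot(entry.object_id)
        entry.next_entry = self.header[s]   # chain in front of any colliding entry
        self.entries.append(entry)
        self.header[s] = len(self.entries) - 1
        return self.header[s]

    def find(self, object_id: bytes) -> Optional[DataIndexEntry]:
        pos = self.header[self.slot(object_id)]
        while pos != EMPTY:
            entry = self.entries[pos]
            if entry.object_id == object_id:
                return entry
            pos = entry.next_entry
        return None
```

Two fingerprints that share their leading bits land in the same header slot and are then told apart by walking the `next_entry` chain.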
The implementation of this embodiment is described further below with reference to the drawings.
As shown in Fig. 2, the method comprises the following steps:
(1) Initialization:
Stored data is divided into two parts, a metadata area and a data area. The data that users actually back up is stored in the data area, and the information describing that data is stored in the metadata area. Initializing the metadata area mainly means assigning initial values to its index header information, data index information and data area metadata information. The index header (the object hash table) is set entirely to 0 to indicate that every slot is available, and every element of the data index array is likewise set to zero to indicate that no data has been written yet. One data space is pre-allocated in the data area in preparation for user backup requests. The total number of data spaces is S, which does not exceed 1000. The number of data spaces currently in use is V, with V <= S. The number of logical data blocks pre-allocated in each data space is P, which does not exceed 10. The maximum number of logical data blocks in each data space is W, which does not exceed 1024. In the current deployment of the backup service software, each data space consists of 1024 logical data blocks of 1 GB each, so a data space is at most 1 TB.
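The limits and the initial state described in step (1) can be written down as a short sketch. The names `StoreLimits`, `StoreState` and `initialize` are hypothetical, the toy slot counts in the usage are arbitrary, and treating "1 GB" as 2**30 bytes is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

GIB = 1 << 30  # assumption: a 1 GB logical data block is 2**30 bytes


@dataclass
class StoreLimits:
    S: int = 1000            # upper bound on the number of data spaces
    P: int = 10              # logical data blocks pre-allocated per data space
    W: int = 1024            # maximum logical data blocks per data space
    block_bytes: int = GIB   # size of one logical data block

    def max_space_bytes(self) -> int:
        return self.W * self.block_bytes

    def max_total_bytes(self) -> int:
        return self.S * self.max_space_bytes()


@dataclass
class StoreState:
    limits: StoreLimits
    index_header: List[int]
    data_index: List[int]
    # One element per data space in use (so V == len(blocks_per_space)).
    blocks_per_space: List[int] = field(default_factory=list)


def initialize(limits: StoreLimits, header_slots: int, index_slots: int) -> StoreState:
    """Zero the metadata and pre-allocate a single data space, as in step (1)."""
    state = StoreState(limits, [0] * header_slots, [0] * index_slots)
    state.blocks_per_space.append(limits.P)
    return state
```

With the deployment values, `StoreLimits().max_space_bytes()` is 1024 blocks of 1 GB, which matches the 1 TB ceiling per data space given in the text.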
(2) Receive a user command and determine its type:
If the command is a backup operation, go to step (3); if it is a restore operation, go to step (4); if it is a delete operation, go to step (5);
(3) Perform the backup as follows:
(3.1) Split the file to be backed up into data blocks of size b, where b ranges from 1 MB to 4 MB (4 MB in the deployed backup service software), then hash the content of each data block with the MD5 algorithm to obtain a fingerprint that uniquely identifies it. Data blocks are indexed by fingerprint in the index header and data index of the storage server's metadata;
(3.2) The backup client sends the fingerprint of the data block to the storage server, which uses the fingerprint to check whether the data block already exists;
(3.3) If the fingerprint does not exist, the backup client sends the data block to the storage server, which dynamically allocates storage space and writes the data block. If the fingerprint exists, no data needs to be sent; only the index information for that data block on the storage server is updated, incrementing its reference count by one;
(3.4) Go to step (2);
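The backup steps (3.1) to (3.3) can be sketched as follows. This is a minimal illustration, assuming an in-memory `StorageServer` class (hypothetical, as are its method names) in place of the real network protocol and on-disk layout; only the fixed-size chunking, the MD5 fingerprint and the reference count come from the text.

```python
import hashlib
from typing import Dict, Iterator, List

CHUNK_SIZE = 4 * 1024 * 1024  # b = 4 MB, the deployment value given in the text


def split_fixed(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Step (3.1): cut the data into fixed-length blocks (the last may be shorter)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


class StorageServer:
    """Hypothetical in-memory stand-in for the storage server."""

    def __init__(self) -> None:
        self.blocks: Dict[bytes, bytes] = {}
        self.ref_counts: Dict[bytes, int] = {}

    def has(self, fingerprint: bytes) -> bool:
        return fingerprint in self.blocks

    def add_reference(self, fingerprint: bytes) -> None:
        self.ref_counts[fingerprint] += 1

    def store(self, fingerprint: bytes, chunk: bytes) -> None:
        self.blocks[fingerprint] = chunk
        self.ref_counts[fingerprint] = 1


def backup(server: StorageServer, data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Return the file's Hash list; send a block only if its fingerprint is new."""
    hash_list = []
    for chunk in split_fixed(data, size):
        fingerprint = hashlib.md5(chunk).digest()   # step (3.1)
        if server.has(fingerprint):                 # step (3.2)
            server.add_reference(fingerprint)       # step (3.3), duplicate block
        else:
            server.store(fingerprint, chunk)        # step (3.3), new block
        hash_list.append(fingerprint)
    return hash_list
```

The returned Hash list is what a later restore or delete of the same file would be driven by.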
In step (3.3) above, storage space can be dynamically allocated by the process shown in Fig. 3:
(a1) Check whether the P logical data blocks of the current data space have enough free space for a backup data block of the specified size. If so, go to step (a5); otherwise go to step (a2);
(a2) Check whether P < W. If so, go to step (a6); otherwise go to step (a3);
(a3) Check whether any other data space in the storage server's master record has enough free space for a backup data block of the specified size. If so, go to step (a5); otherwise go to step (a4);
(a4) Check whether V < S. If so, add a data space on the storage server, allocate a data index for the backup data block to be written in the new data space, and go to step (a8); otherwise go to step (a7);
(a5) Allocate a data index for the backup data block to be written in the free space found, then go to step (a8);
(a6) Grow the data space on the storage server by the size of one backup data block, allocate a data index for the backup data block, then go to step (a8);
(a7) No free space can hold a backup data block of the specified size, so the dynamic allocation fails;
(a8) End the dynamic allocation process.
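The decision order of steps (a1) to (a8) can be sketched as a small allocator. Several details here are assumptions rather than part of the text: free space is tracked as a used-byte counter per logical data block, blocks are searched first-fit, step (a6) is interpreted as appending one logical data block to the current data space, and data index allocation is reduced to returning an (I, J, K) address. The class name `Allocator` is hypothetical.

```python
from typing import List, Optional, Tuple

Address = Tuple[int, int, int]  # (I: data space, J: logical block, K: offset)


class Allocator:
    def __init__(self, block_size: int, P: int, W: int, S: int) -> None:
        self.block_size, self.P, self.W, self.S = block_size, P, W, S
        # spaces[i][j] is the number of bytes used in logical block j of space i.
        self.spaces: List[List[int]] = [[0] * P]
        self.current = 0

    def _fit(self, space_no: int, size: int) -> Optional[Address]:
        """First logical block in the space with enough room, or None."""
        for block_no, used in enumerate(self.spaces[space_no]):
            if self.block_size - used >= size:
                self.spaces[space_no][block_no] = used + size
                return space_no, block_no, used
        return None

    def allocate(self, size: int) -> Optional[Address]:
        if size > self.block_size:
            return None                                   # can never fit
        hit = self._fit(self.current, size)               # (a1) -> (a5)
        if hit is not None:
            return hit
        if len(self.spaces[self.current]) < self.W:       # (a2) -> (a6)
            self.spaces[self.current].append(0)
            return self._fit(self.current, size)
        for space_no in range(len(self.spaces)):          # (a3) -> (a5)
            if space_no != self.current:
                hit = self._fit(space_no, size)
                if hit is not None:
                    return hit
        if len(self.spaces) < self.S:                     # (a4): V < S
            self.spaces.append([0] * self.P)
            self.current = len(self.spaces) - 1
            return self._fit(self.current, size)
        return None                                       # (a7): allocation fails
```

Growing the current data space before looking at other data spaces keeps consecutive blocks of one backup close together, which is the order the steps prescribe.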
The write operation can be completed by the process shown in Fig. 4:
(b1) Search dynamically for available storage space on the storage server, looking for a logical data block that meets the requirements;
(b2) If no logical data block is available, return failure;
(b3) If there is available storage space large enough for the new backup data block, create a new data index, write the new backup data block to the corresponding location on the storage server, and then write the corresponding index header and data index metadata to the master record area.
(4) To restore, the backup server in the backup system software looks up the Hash list of the file to be restored and uses it to access the storage space metadata and locate the logical position of each block in the corresponding data space. The data of the file is then read from the storage server into a memory buffer, block by block, sent over a socket to the backup client, and assembled into the required file set. Then go to step (2).
As shown in Fig. 2, the logical position in the corresponding data space is located from the Hash list and the storage space metadata as follows:
(4.1) Let m be the preset number of bits used to address the index header. The first m bits of the data's Hash value index into the index header, and the content of the index header entry is the data index number.
Typically each index header entry occupies two bytes and there are 2^m entries in total; m usually ranges from 20 to 30.
(4.2)通过索引头对数据索引寻址。而数据索引则对一次操作的数据项进行寻址,具体包括三部分内容:(4.2) Address the data index through the index header. The data index addresses the data items of an operation, which specifically includes three parts:
(4.2.1)通过数据索引的结构体成员I(I为数据空间编号)找到具体的数据空间号;(4.2.1) Find the specific data space number by the structure member I of the data index (I is the data space number);
(4.2.2)通过数据索引的结构体成员J(J为数据空间内的逻辑数据块编号)找到数据空间中的块号;(4.2.2) Find the block number in the data space through the structure member J of the data index (J is the logical data block number in the data space);
(4.2.4)通过数据索引的结构体成员K(K为数据空间内的逻辑数据块内的偏移地址)找到数据项在逻辑数据块中的偏移地址,相当于三级寻址。由此可以定位数据项的数据头和数据实体。(4.2.4) Find the offset address of the data item in the logical data block through the structure member K of the data index (K is the offset address in the logical data block in the data space), which is equivalent to three-level addressing. The data header and data entity of the data item can thus be located.
(4.4)获得上面三级逻辑数据块地址信息,就可以定位到相应的数据区读取数据。(4.4) After obtaining the above three-level logical data block address information, the corresponding data area can be located to read data.
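The lookup chain described above (Hash prefix, index header, data index, then I/J/K) can be sketched as follows. The dictionary-based tables and the function names are assumptions for illustration; the sketch also ignores how two Hash values sharing the same first m bits would be distinguished, which this passage does not describe.

```python
from typing import NamedTuple


class DataIndex(NamedTuple):
    I: int  # data space number
    J: int  # logical data block number inside the data space
    K: int  # byte offset inside the logical data block


def header_slot(digest: bytes, m: int) -> int:
    """Return the first m bits of the hash as an integer slot number."""
    total_bits = len(digest) * 8
    return int.from_bytes(digest, "big") >> (total_bits - m)


def locate(digest, m, index_headers, data_indexes):
    """Resolve hash -> index header -> data index -> (I, J, K)."""
    slot = header_slot(digest, m)
    index_no = index_headers.get(slot)   # header content = data index number
    if index_no is None:
        return None                      # hash not present
    return data_indexes[index_no]


def header_table_bytes(m: int, header_size: int = 2) -> int:
    """Size in bytes of a full index-header table with 2**m entries."""
    return (2 ** m) * header_size
```

With two-byte headers, `header_table_bytes(20)` is 2 MiB and `header_table_bytes(30)` is 2 GiB, which shows how the choice of m trades table size against prefix resolution.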
(5) Delete a backed-up file by the following procedure:
(5.1) The backup server in the backup system software looks up the Hash list of the backup file to be deleted;
(5.2) Look up the index header and data index mapping table in the metadata area on the storage server side according to the Hash value; if the Hash value passed in as the entry parameter does not exist, return immediately with the return value false;
(5.3) Otherwise, decrement by 1 the reference count of the object metadata corresponding to the Hash value and return true;
(5.4) Return to step (2).
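Steps (5.2) and (5.3) amount to a reference-count decrement per Hash value, which can be sketched as follows. The function names and the dictionary layout of the metadata are hypothetical.

```python
def release_block(h, metadata):
    """Steps (5.2)-(5.3) for a single Hash value.

    metadata maps hash -> {"refcount": int} (hypothetical layout).
    """
    entry = metadata.get(h)
    if entry is None:
        return False                 # (5.2) Hash value not found
    entry["refcount"] -= 1           # (5.3) drop one reference
    return True


def delete_backup(hash_list, metadata):
    """Apply release_block to every Hash value of the file to be deleted."""
    return [release_block(h, metadata) for h in hash_list]
```

The sketch only decrements the count. What happens to a block whose count reaches zero is not stated in this passage, so no space reclamation is shown.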
Because what is provided is an online backup service, the backup server and the storage server always run in the background as daemon processes; they therefore never terminate and always wait for user operation requests. The backup client is the interface through which the user operates the online backup service: the user can log in to the backup client at any time to perform operations such as backup, recovery, and deletion.
Example:
The runtime support environment of the backup service system is as follows:
1. Hardware environment and supporting environment
The backup client requires a host with at least 512 MB of memory and network throughput of at least 10 Mbps.
The scheduling server requires a host with at least 2 GB of memory and network throughput of at least 1000 Mbps.
The storage server requires a host with at least 4 GB of memory, TB-level external storage capacity, and network throughput of at least 1000 Mbps.
The hosts running the scheduling server and storage server software must have gigabit-class network switching capability between them, and the hosts running the client and server software must have network connectivity between them. The environment housing the server hosts must provide redundant power supply, redundant communication links, a temperature control system, a fire protection system, and the other basic conditions needed to keep the hosts running normally.
2. Software operating environment
The backup client program runs on Windows XP and later operating systems, or on operating system platforms based on the Linux 2.6 kernel.
The scheduling server and the storage server run on the Windows Server 2003 operating system platform.
In the online backup service system that has been implemented and is running normally, each data space on the storage server side is 1 TB in size, and there are at most 20 data spaces. Each data space is divided into 1024 logical data blocks, and each logical data block is 1 GB in size.
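The sizes in this example are consistent with each other, as a short calculation shows. The sketch assumes binary units (1 GB = 1024^3 bytes) and, for the hypothetical `flat_offset` helper, that data spaces and blocks are numbered from 0 and laid out contiguously; neither assumption is stated in the text.

```python
GiB = 1024 ** 3
TiB = 1024 ** 4

BLOCK_SIZE = 1 * GiB        # one logical data block
BLOCKS_PER_SPACE = 1024     # logical data blocks per data space
MAX_SPACES = 20             # maximum number of data spaces

space_size = BLOCK_SIZE * BLOCKS_PER_SPACE   # size of one data space
max_capacity = space_size * MAX_SPACES       # upper bound on managed storage


def flat_offset(i: int, j: int, k: int) -> int:
    """Map a three-level address (I, J, K) to one absolute byte offset."""
    if not (0 <= i < MAX_SPACES and 0 <= j < BLOCKS_PER_SPACE and 0 <= k < BLOCK_SIZE):
        raise ValueError("address out of range")
    return (i * BLOCKS_PER_SPACE + j) * BLOCK_SIZE + k
```

Under these assumptions, 1024 blocks of 1 GB fill exactly one 1 TB data space, and 20 data spaces give at most 20 TB.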
The above is only a preferred embodiment of the present invention, and the present invention should not be limited to the content disclosed in this embodiment and the accompanying drawings. Any equivalent or modification completed without departing from the spirit disclosed by the present invention therefore falls within the scope of protection of the present invention.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010101523978A CN101814045B (en) | 2010-04-22 | 2010-04-22 | Data organization method for backup services |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101814045A (en) | 2010-08-25 |
CN101814045B (en) | 2011-09-14 |
Family
ID=42621306
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010101523978A Expired - Fee Related CN101814045B (en) | 2010-04-22 | 2010-04-22 | Data organization method for backup services |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101814045B (en) |
Cited By (63)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101986276A (en) * | 2010-10-21 | 2011-03-16 | 成都市华为赛门铁克科技有限公司 | Methods and systems for storing and recovering files and server |
CN101989929A (en) * | 2010-11-17 | 2011-03-23 | 中兴通讯股份有限公司 | Disaster recovery data backup method and system |
CN102004769A (en) * | 2010-11-12 | 2011-04-06 | 成都市华为赛门铁克科技有限公司 | File management method, equipment and memory system |
CN102012846A (en) * | 2010-12-12 | 2011-04-13 | 成都东方盛行电子有限责任公司 | Integrity check method for large video file |
CN102364474A (en) * | 2011-11-17 | 2012-02-29 | 中国科学院计算技术研究所 | Metadata storage system and management method for cluster file system |
CN102385554A (en) * | 2011-10-28 | 2012-03-21 | 华中科技大学 | Method for optimizing duplicated data deletion system |
CN102436408A (en) * | 2011-10-10 | 2012-05-02 | 上海交通大学 | Data storage clouding and cloud backup method based on Map/Dedup |
CN102456059A (en) * | 2010-10-21 | 2012-05-16 | 英业达股份有限公司 | Data de-duplication processing system |
CN102467528A (en) * | 2010-11-02 | 2012-05-23 | 英业达股份有限公司 | deduplication operating system |
CN102469142A (en) * | 2010-11-16 | 2012-05-23 | 英业达股份有限公司 | Data transfer methods for deduplicators |
CN102479245A (en) * | 2010-11-30 | 2012-05-30 | 英业达集团(天津)电子技术有限公司 | Data block segmentation method |
CN102647399A (en) * | 2011-02-17 | 2012-08-22 | 腾讯科技(深圳)有限公司 | Software backup method and software backup system |
CN102799659A (en) * | 2012-07-05 | 2012-11-28 | 广州鼎鼎信息科技有限公司 | Overall repeating data deleting system and method based on non-centre distribution system |
CN102810108A (en) * | 2011-06-02 | 2012-12-05 | 英业达股份有限公司 | How to deal with duplicate data |
CN102810107A (en) * | 2011-06-01 | 2012-12-05 | 英业达股份有限公司 | Processing method of repeated data |
CN102833298A (en) * | 2011-06-17 | 2012-12-19 | 英业达集团(天津)电子技术有限公司 | Distributed repeated data deleting system and processing method thereof |
CN102890721A (en) * | 2012-10-16 | 2013-01-23 | 苏州迈科网络安全技术股份有限公司 | Database establishment method and database establishment system based on column storage technology |
CN102915325A (en) * | 2012-08-11 | 2013-02-06 | 深圳市极限网络科技有限公司 | Md5 Hash list-based file decomposing and combining technique |
CN103139300A (en) * | 2013-02-05 | 2013-06-05 | 杭州电子科技大学 | Virtual machine image management optimization method based on data de-duplication |
CN103164431A (en) * | 2011-12-13 | 2013-06-19 | 北京神州泰岳软件股份有限公司 | Data storage method of relational database and storage system |
WO2013107295A1 (en) * | 2012-01-20 | 2013-07-25 | 腾讯科技(深圳)有限公司 | Method for recovering hard drive data, server and distributed storage system |
CN103238140A (en) * | 2010-09-03 | 2013-08-07 | 赛门铁克公司 | System and method for scalable reference management in a deduplication based storage system |
CN103309873A (en) * | 2012-03-09 | 2013-09-18 | 阿里巴巴集团控股有限公司 | Method and device for processing data, and system |
CN103348334A (en) * | 2010-10-11 | 2013-10-09 | Est软件公司 | Cloud system and file compression and transmission method in a cloud system |
WO2013163813A1 (en) * | 2012-05-04 | 2013-11-07 | 华为技术有限公司 | Data deduplication method and device |
CN103412929A (en) * | 2013-08-16 | 2013-11-27 | 蓝盾信息安全技术股份有限公司 | Mass data storage method |
TWI420306B (en) * | 2010-12-22 | 2013-12-21 | Inventec Corp | A searching method of the blocks of the data deduplication |
CN103530201A (en) * | 2013-07-17 | 2014-01-22 | 华中科技大学 | Safety data repetition removing method and system applicable to backup system |
CN103559143A (en) * | 2013-11-08 | 2014-02-05 | 华为技术有限公司 | Data copying management device and data copying method of data copying management device |
CN103873503A (en) * | 2012-12-12 | 2014-06-18 | 鸿富锦精密工业(深圳)有限公司 | Data block backup system and method |
CN103944969A (en) * | 2014-03-31 | 2014-07-23 | 中国电子科技集团公司第三十研究所 | Secure transmission optimization method and device for narrow-band network |
CN104166607A (en) * | 2014-09-04 | 2014-11-26 | 北京国双科技有限公司 | Data processing method and device for backup database |
CN104317735A (en) * | 2014-09-24 | 2015-01-28 | 北京云巢动脉科技有限公司 | High-capacity cache and method for data storage and readout as well as memory allocation and recovery |
CN104317676A (en) * | 2014-11-21 | 2015-01-28 | 四川智诚天逸科技有限公司 | Data backup disaster tolerance method |
CN104537112A (en) * | 2015-01-20 | 2015-04-22 | 成都携恩科技有限公司 | Method for safe cloud computing |
CN104536849A (en) * | 2015-01-20 | 2015-04-22 | 成都携恩科技有限公司 | Data backup method based on cloud computing |
CN104778095A (en) * | 2015-01-20 | 2015-07-15 | 成都携恩科技有限公司 | Cloud platform data management method |
CN104965772A (en) * | 2015-07-29 | 2015-10-07 | 浪潮(北京)电子信息产业有限公司 | Method and device for recovering files |
CN105183400A (en) * | 2015-10-23 | 2015-12-23 | 浪潮(北京)电子信息产业有限公司 | Object storage method and system based on content addressing |
CN105302675A (en) * | 2015-11-25 | 2016-02-03 | 上海爱数信息技术股份有限公司 | Method and device for data backup |
CN106203154A (en) * | 2016-06-27 | 2016-12-07 | 联想(北京)有限公司 | A kind of file memory method and electronic equipment |
CN106326397A (en) * | 2016-08-19 | 2017-01-11 | 东软集团股份有限公司 | Method and device for generating index file |
CN106372170A (en) * | 2016-08-30 | 2017-02-01 | 上海爱数信息技术股份有限公司 | Database table recovery method and system and server with system |
CN103810297B (en) * | 2014-03-07 | 2017-02-01 | 华为技术有限公司 | Writing method, reading method, writing device and reading device on basis of re-deleting technology |
CN106877998A (en) * | 2017-01-11 | 2017-06-20 | 裘羽 | electronic evidence management method and system |
CN107066352A (en) * | 2017-03-02 | 2017-08-18 | 陈辉 | With delete again and remote functionality portable intelligent device backup devices and methods therefor |
CN107111460A (en) * | 2015-03-30 | 2017-08-29 | 西部数据技术公司 | Deduplication Using Chunk Files |
CN107340971A (en) * | 2016-04-28 | 2017-11-10 | 上海优刻得信息科技有限公司 | A kind of data storage is with recovering framework and method |
CN109254786A (en) * | 2018-09-30 | 2019-01-22 | 湖北华联博远科技有限公司 | A kind of software backup restoring method and system |
CN109271461A (en) * | 2018-09-30 | 2019-01-25 | 广州鼎甲计算机科技有限公司 | The increment synthesized backup method and device of SQL Server database |
CN110471793A (en) * | 2019-07-18 | 2019-11-19 | 维沃移动通信有限公司 | Data back up method, data reconstruction method, first terminal and second terminal |
CN111435331A (en) * | 2019-01-14 | 2020-07-21 | 杭州宏杉科技股份有限公司 | Data writing method and device for storage volume, electronic equipment and machine-readable storage medium |
CN111694848A (en) * | 2019-03-15 | 2020-09-22 | 阿里巴巴集团控股有限公司 | Method and apparatus for updating data buffer using reference count |
CN112000523A (en) * | 2020-08-25 | 2020-11-27 | 浪潮云信息技术股份公司 | Cloud backup system and method |
CN112256194A (en) * | 2020-09-30 | 2021-01-22 | 新华三技术有限公司成都分公司 | Storage space distribution method and storage server |
CN112328435A (en) * | 2020-12-07 | 2021-02-05 | 武汉绿色网络信息服务有限责任公司 | Method, device, equipment and storage medium for backing up and recovering target data |
CN112394873A (en) * | 2019-08-12 | 2021-02-23 | 深信服科技股份有限公司 | Data management method, system, electronic equipment and storage medium |
CN113111043A (en) * | 2021-04-21 | 2021-07-13 | 北京大学 | Method, device and system for processing source data file of middle station and storage medium |
CN113422789A (en) * | 2020-03-26 | 2021-09-21 | 山东管理学院 | Service deployment method and system in network computing environment |
CN114064361A (en) * | 2021-11-16 | 2022-02-18 | 阿里巴巴(中国)有限公司 | Data writing method executed in backup related operation and backup gateway system |
CN114528148A (en) * | 2020-10-30 | 2022-05-24 | 伊姆西Ip控股有限责任公司 | Method, electronic device and computer program product for storage management |
CN114816228A (en) * | 2021-01-29 | 2022-07-29 | 中移(苏州)软件技术有限公司 | A data processing method, device, server and storage medium |
CN118503207A (en) * | 2024-07-17 | 2024-08-16 | 青岛诺亚信息技术有限公司 | Scientific research whole process-oriented data management and archiving method and integrated platform |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005050386A2 (en) * | 2003-11-13 | 2005-06-02 | Commvault Systems, Inc. | System and method for performing a snapshot and for restoring data |
CN101183323A (en) * | 2007-12-10 | 2008-05-21 | 华中科技大学 | A Data Backup System Based on Fingerprint |
Also Published As
Publication number | Publication date |
---|---|
CN101814045B (en) | 2011-09-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee | | Granted publication date: 20110914 |