CN104714858A - Data backup method, data recovery method and device

Info

Publication number: CN104714858A
Application number: CN201310685278.2A
Authority: CN
Inventors: 秦平
Original assignee: China Mobile Communications Group Co Ltd
Current assignee: China Mobile Communications Group Co Ltd
Priority date: 2013-12-13
Filing date: 2013-12-13
Publication date: 2015-06-17

Abstract

The invention relates to the technical field of data backup, and in particular to a data backup method and device and a data recovery method and device. They address the problem in the prior art that backing up and recovering data with the Export and Import tools of Hbase requires comparatively long backup and recovery windows. The data backup method comprises: a backup node creates a snapshot for a backup object through Hbase according to an instruction from a control node; the backup node backs up the data in the created snapshot to a remote storage node through the Hadoop Distributed File System (HDFS). The data recovery method comprises: a recovery node reads data stored in a remote storage node according to an instruction from the control node; the recovery node organizes the read data into snapshot-format data and writes the organized data, through an HDFS interface, into a data recovery system that provides a data access service.

Description

Data backup method and device, data recovery method and device

Technical field

The invention relates to the technical field of data backup, and in particular to a data backup method and device and a data recovery method and device.

Background

The billing and detailed-statement system of the mobile Business and Operation Support System (BOSS) has been in service for many years. It handles basic functions such as rating of raw call records, bill generation, and bill and detailed-statement queries, and it supplies the source data for statistical analysis. As the number of users and the volume of business keep growing, the massive data held by this system has caused problems such as insufficient storage space, degraded query performance, bottlenecks in statistical analysis, and difficulty in modifying the file library. To address these problems, the column-oriented distributed database Hbase, which is suited to storing massive data, was introduced and brought an overall improvement in the performance of the billing and detailed-statement system.

In the cloud solution for the BOSS billing and detailed-statement system, Hbase stores a massive amount of detailed-statement data, and data backup plays a very important role in areas such as data security management. The existing Hbase-based backup solution uses the Export tool of Hbase to back up data and the Import tool to recover it. The specific steps are: use the Export tool to export a specified range of data in Hbase, at table granularity, into files in HDFS; back up the files in HDFS to a remote storage node for safekeeping; when recovering data, first restore the data from the remote storage node to HDFS, and then use the Import tool to load the files in HDFS into Hbase.

When the amount of data to be backed up is large, backing up with the Hbase Export tool requires a long backup time window, which seriously reduces backup efficiency. Likewise, when the amount of data to be recovered is large, recovering with the Import tool requires a long recovery window, which seriously reduces recovery efficiency.

Summary of the invention

An embodiment of the present invention provides a data backup method and device to solve the prior-art problem that backing up data with the Hbase Export tool requires a long backup time window.

An embodiment of the present invention also provides a data recovery method and device to solve the prior-art problem that recovering data with the Hbase Import tool requires a long recovery window.

A data backup method provided by an embodiment of the present invention includes:

a backup node creates a snapshot for a backup object through the distributed data storage system Hbase according to an instruction from a control node;

the backup node backs up the data in the created snapshot to a remote storage node through the distributed file system HDFS, wherein the data in the snapshot is the data added to or modified in the backup object after that snapshot is created and before the next snapshot is created for the backup object.

Optionally, the backup node creating a snapshot for the backup object includes:

the backup node creates a snapshot for the backup object indicated by the control node according to the incremental backup time interval indicated by the control node, wherein backup objects that have an association relationship share the same incremental backup time interval.

A data backup method provided by another embodiment of the present invention includes:

a control node generates a backup policy according to backup instruction information input by a user;

the control node instructs multiple backup nodes, according to the backup policy, to execute data backup tasks in parallel, the data backup tasks including: creating a snapshot for a backup object, and backing up the data in the created snapshot to a remote storage node through the distributed file system HDFS.

Optionally, the backup policy includes: the backup objects, the relationships between backup objects, and the incremental backup time intervals, wherein backup objects that have an association relationship share the same incremental backup time interval.

A data recovery method provided by an embodiment of the present invention includes:

a recovery node reads data stored in a remote storage node according to an instruction from a control node;

the recovery node organizes the read data into snapshot-format data and writes the organized data, through the interface of the distributed file system HDFS, into a data recovery system that provides a data access service.

Optionally, the recovery node organizing the read data into snapshot-format data includes:

the recovery node creates, in HDFS, a snapshot directory structure for the read data according to the snapshot directory structure that existed before the backup.

A data recovery method provided by another embodiment of the present invention includes:

a control node generates a recovery policy according to recovery instruction information input by a user;

the control node instructs multiple recovery nodes, according to the recovery policy, to execute data recovery tasks in parallel, the data recovery tasks including: reading data stored in a remote storage node, organizing the read data into snapshot-format data, and writing the organized data, through the interface of the distributed file system HDFS, into a data recovery system that provides a data access service.

Optionally, the recovery policy includes: the recovery object and the recovery time period.

A data backup device provided by an embodiment of the present invention includes:

a creation module, configured to create a snapshot for a backup object through the distributed data storage system Hbase according to an instruction from a control node;

a backup module, configured to back up the data in the snapshot created by the creation module to a remote storage node through the distributed file system HDFS, wherein the data in the snapshot is the data added to or modified in the backup object after that snapshot is created and before the next snapshot is created for the backup object.

A data backup device provided by another embodiment of the present invention includes:

a generation module, configured to generate a backup policy according to backup instruction information input by a user;

an instruction module, configured to instruct multiple backup nodes, according to the backup policy generated by the generation module, to execute data backup tasks in parallel, the data backup tasks including: creating a snapshot for a backup object, and backing up the data in the created snapshot to a remote storage node through the distributed file system HDFS.

A data recovery device provided by an embodiment of the present invention includes:

a reading module, configured to read data stored in a remote storage node according to an instruction from a control node;

a writing module, configured to organize the data read by the reading module into snapshot-format data and to write the organized data, through the interface of the distributed file system HDFS, into a data recovery system that provides a data access service.

A data recovery device provided by another embodiment of the present invention includes:

a generation module, configured to generate a recovery policy according to recovery instruction information input by a user;

an instruction module, configured to instruct multiple recovery nodes, according to the recovery policy generated by the generation module, to execute data recovery tasks in parallel, the data recovery tasks including: reading data stored in a remote storage node, organizing the read data into snapshot-format data, and writing the organized data, through the interface of the distributed file system HDFS, into a data recovery system that provides a data access service.

In the embodiments of the present invention, the backup node creates a snapshot for the backup object through Hbase and, when a backup is required, backs up the data in the snapshot to a remote storage node through HDFS. The Export tool is therefore no longer needed to export the data, and only the incremental part of the data has to be backed up, which greatly shortens the backup window and improves backup efficiency.

Brief description of the drawings

FIG. 1 is a flowchart of the data backup method provided by Embodiment 1 of the present invention;

FIG. 2 is a flowchart of the data backup method provided by Embodiment 2 of the present invention;

FIG. 3 is a flowchart of the data recovery method provided by Embodiment 1 of the present invention;

FIG. 4 is a flowchart of the data recovery method provided by Embodiment 2 of the present invention;

FIG. 5 is a schematic structural diagram of the data backup and recovery system provided by an embodiment of the present invention;

FIG. 6 is a flowchart of the data backup method provided by Embodiment 3 of the present invention;

FIG. 6a is a flowchart of the method by which a backup node backs up data;

FIG. 7 is a flowchart of the data recovery method provided by Embodiment 3 of the present invention;

FIG. 8 is a schematic structural diagram of the data backup device provided by Embodiment 1 of the present invention;

FIG. 9 is a schematic structural diagram of the data backup device provided by Embodiment 2 of the present invention;

FIG. 10 is a schematic structural diagram of the data recovery device provided by Embodiment 1 of the present invention;

FIG. 11 is a schematic structural diagram of the data recovery device provided by Embodiment 2 of the present invention.

Detailed description

The embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

FIG. 1 is a flowchart of the data backup method provided by Embodiment 1 of the present invention, which includes the following steps:

S101: the backup node creates a snapshot for the backup object through the distributed data storage system Hbase according to an instruction from the control node;

S102: the backup node backs up the data in the created snapshot to a remote storage node through the distributed file system HDFS, wherein the data in the snapshot is the data added to or modified in the backup object after that snapshot is created and before the next snapshot is created for the backup object.

In this embodiment, incremental backup of the data is carried out through the snapshot (Snapshot) function of Hbase, and the backup operation itself is moved to the Hadoop Distributed File System (HDFS). With this snapshot approach, the Export tool is no longer needed to export the data, and only the incremental part of the data has to be backed up, which greatly shortens the backup window and improves backup efficiency.

In a specific implementation, the control node can generate a backup policy according to the backup instruction information input by the user and instruct the backup node to perform the backup. The backup policy can include the backup objects, the relationships between backup objects, the incremental backup time intervals, the backup mode, and so on.

A backup object can be the name of a data table that needs to be backed up. The data tables that are backed up may be the raw data tables; other data that can be derived by processing the raw data tables need not be backed up.

There are two kinds of relationship between backup objects: associated and not associated. If multiple data tables are associated and must be backed up at the same time, the relationship between them can be expressed with AND; if multiple data tables are not associated, the relationship can be expressed with OR.

The incremental backup time interval is the time difference between two consecutive snapshots created for the same backup object. To keep the logical relationship between data tables accurate, data tables in an AND relationship share the same incremental backup time interval, whereas data tables in an OR relationship may have different intervals.

In addition, the user can choose the backup mode, for example full backup or incremental backup. To improve backup efficiency, incremental backup is preferred in the embodiments of the present invention.
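The policy fields described above can be sketched as a small data structure. This is an illustrative sketch only: the class name `BackupPolicy`, its field names, and the table names are assumptions and do not come from the patent, which does not specify a concrete representation.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BackupPolicy:
    # Hypothetical representation: each inner list is a group of tables
    # joined by AND (backed up together); separate groups are related by OR.
    and_groups: list[list[str]]
    # Incremental backup time interval per table.
    intervals: dict[str, timedelta]
    # Backup mode: "incremental" (preferred in the text) or "full".
    mode: str = "incremental"

    def validate(self) -> None:
        """Raise ValueError if tables in one AND group have different intervals."""
        for group in self.and_groups:
            distinct = {self.intervals[table] for table in group}
            if len(distinct) > 1:
                raise ValueError(
                    f"AND-related tables {group} must share one interval"
                )


policy = BackupPolicy(
    and_groups=[["table1", "table2"], ["table3"]],
    intervals={
        "table1": timedelta(days=1),
        "table2": timedelta(days=1),
        "table3": timedelta(weeks=1),
    },
)
policy.validate()  # passes: table1 and table2 share a daily interval
```

Validating the interval constraint when the policy is built, rather than when a backup runs, is a design choice of this sketch; it surfaces a misconfigured AND group before any snapshot is taken.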

Corresponding to the data backup flow of Embodiment 1 above, an embodiment of the present invention also provides the following backup method on the control node side. Its implementation is similar to that of the above embodiment, and the common parts are not repeated.

FIG. 2 is a flowchart of the data backup method provided by Embodiment 2 of the present invention, which includes:

S201: the control node generates a backup policy according to the backup instruction information input by the user;

Here, the backup instruction information is essentially backup information in a user-readable form that the user (an administrator) specifies through the graphical interface of the control node. From this information the control node generates a concrete, machine-readable backup policy, which is used to instruct the backup nodes to execute backup tasks. In a specific implementation, some default backup information can be set so that the user does not have to enter every detail. For example, the user can specify that data table 1 is backed up once a week, and the backup policy that the control node generates from this instruction defaults to backing up data table 1 at 0:00 every Sunday.
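The default-filling behaviour in the example above can be sketched as follows. The class and function names (`BackupInstruction`, `ScheduledBackup`, `to_policy`) are hypothetical; only the defaults themselves (Sunday, 0:00 for a weekly backup) are taken from the text.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class BackupInstruction:
    """What the user enters: only the table and the frequency are required."""
    table: str
    frequency: str                  # "daily" or "weekly"
    weekday: Optional[int] = None   # 0 = Monday ... 6 = Sunday
    at: Optional[time] = None


@dataclass
class ScheduledBackup:
    """Machine-readable schedule produced by the control node."""
    table: str
    frequency: str
    weekday: Optional[int]
    at: time


def to_policy(instr: BackupInstruction) -> ScheduledBackup:
    """Fill unspecified fields with defaults: 00:00, and Sunday for weekly jobs."""
    at = instr.at if instr.at is not None else time(0, 0)
    weekday = instr.weekday
    if instr.frequency == "weekly" and weekday is None:
        weekday = 6  # Sunday
    return ScheduledBackup(instr.table, instr.frequency, weekday, at)


weekly = to_policy(BackupInstruction("table1", "weekly"))
```

Here `weekly` carries `weekday == 6` and `at == time(0, 0)`, matching the "every Sunday at 0:00" default in the text.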

S202: the control node instructs multiple backup nodes, according to the backup policy, to execute data backup tasks in parallel, the data backup tasks including: creating a snapshot for a backup object, and backing up the data in the created snapshot to a remote storage node through the distributed file system HDFS.

In this embodiment, the control node instructs multiple backup nodes to execute backup tasks in parallel. The backup nodes cooperate on the backup and run on top of a parallel computing architecture, which effectively improves backup efficiency.
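As a rough illustration of a control node fanning tasks out to several backup nodes at once, the sketch below uses a thread pool on a single machine. This is a stand-in for the parallel computing architecture mentioned in the text, not the patent's mechanism; `run_backup_task`, `dispatch`, and the node names are hypothetical, and the task body is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def run_backup_task(node: str, table: str) -> str:
    """Placeholder for the work of one backup node (snapshot, then copy)."""
    return f"{node} backed up {table}"


def dispatch(assignments: dict[str, str]) -> list[str]:
    """Run one task per backup node concurrently and collect the results
    in the order the tasks were submitted."""
    workers = max(1, len(assignments))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_backup_task, node, table)
            for node, table in assignments.items()
        ]
        return [future.result() for future in futures]


results = dispatch({"backup-node-1": "table1", "backup-node-2": "table2"})
```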

It should be noted that in Embodiments 1 and 2 of the present invention, the control node and the backup node can be deployed on different hardware devices, such as separate computers, or on the same hardware device as two functional modules of that device.

After data has been backed up, obtaining the backed-up data involves data recovery. For this purpose, the embodiments of the present invention also provide the following data recovery method:

FIG. 3 is a flowchart of the data recovery method provided by Embodiment 1 of the present invention, which includes:

S301: the recovery node reads the data stored in the remote storage node according to an instruction from the control node;

S302: the recovery node organizes the read data into snapshot-format data and writes the organized data, through the HDFS interface, into a data recovery system that provides a data access service.

In this embodiment, the recovery node reads the data stored in the remote storage node according to an instruction from the control node, organizes the read data into snapshot-format data, and writes the organized data, through the HDFS interface, into the data recovery system that provides a data access service. As with the data backup process above, recovering data in this snapshot-based way greatly shortens the recovery window.
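One way to picture "organizing the read data into snapshot format" is as recreating, under a target root, the relative directory layout the files had before the backup. The sketch below only computes target paths and performs no HDFS I/O. The function name `restore_target` and all paths are hypothetical and are not taken from the patent.

```python
from pathlib import PurePosixPath


def restore_target(backup_path: str, backup_root: str, snapshot_root: str) -> str:
    """Map a file under the backup root to the same relative path
    under the snapshot root."""
    relative = PurePosixPath(backup_path).relative_to(backup_root)
    return str(PurePosixPath(snapshot_root) / relative)


# Hypothetical example paths, chosen only for illustration.
target = restore_target(
    "/backup/table1/region1/cf/hfile1",
    backup_root="/backup",
    snapshot_root="/restore/snapshot",
)
```

With these inputs `target` is `/restore/snapshot/table1/region1/cf/hfile1`, so the table, region, column family, and file levels are preserved.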

In a specific implementation, the control node can instruct the recovery node to recover data according to the generated recovery policy. The recovery policy can include the recovery target, the recovery time period, the recovery destination address, and so on. The recovery target can be the name of the data table to be recovered. The recovery time period is the period whose data is to be recovered, that is, the data added or modified within that period. The recovery destination address identifies the system to which the data is to be restored. In the embodiments of the present invention, the systems that use the recovered data to provide services externally are collectively referred to as the data recovery system.
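The three policy fields can be sketched together with a helper that picks which backed-up increments fall inside the recovery time period. The names `BackupRecord`, `RecoveryPolicy`, and `select_backups` are hypothetical, and keying each increment by its snapshot creation time is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BackupRecord:
    table: str
    taken_at: datetime   # assumed key: when the incremental snapshot was created
    location: str        # where the data sits on the storage node


@dataclass
class RecoveryPolicy:
    table: str           # recovery target
    start: datetime      # recovery time period, inclusive on both ends
    end: datetime
    destination: str     # system that will serve the restored data


def select_backups(policy: RecoveryPolicy, records: list[BackupRecord]) -> list[BackupRecord]:
    """Return the backups of the target table inside the period, oldest first."""
    hits = [
        r for r in records
        if r.table == policy.table and policy.start <= r.taken_at <= policy.end
    ]
    return sorted(hits, key=lambda r: r.taken_at)


records = [
    BackupRecord("table1", datetime(2013, 12, 3), "store/a"),
    BackupRecord("table1", datetime(2013, 12, 1), "store/b"),
    BackupRecord("table2", datetime(2013, 12, 2), "store/c"),
    BackupRecord("table1", datetime(2013, 12, 9), "store/d"),
]
policy = RecoveryPolicy("table1", datetime(2013, 12, 1), datetime(2013, 12, 5), "recovery-system")
selected = select_backups(policy, records)
```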

Corresponding to the data recovery flow above, an embodiment of the present invention also provides the following data recovery method on the control node side:

FIG. 4 is a flowchart of the data recovery method provided by Embodiment 2 of the present invention, which includes:

S401: the control node generates a recovery policy according to the recovery instruction information input by the user;

S402: the control node instructs multiple recovery nodes, according to the recovery policy, to execute data recovery tasks in parallel, the data recovery tasks including: reading data stored in a remote storage node, organizing the read data into snapshot-format data, and writing the organized data, through the interface of the distributed file system HDFS, into a data recovery system that provides a data access service.

In this embodiment, the control node instructs multiple recovery nodes to execute recovery tasks in parallel. The recovery nodes run on top of a parallel computing architecture, which effectively improves data recovery efficiency.

In step S401, the recovery instruction information is essentially recovery information in a user-readable form that the user (an administrator) specifies through the graphical interface of the control node. From this information the control node generates a concrete, machine-readable recovery policy, which is used to instruct the recovery nodes to execute recovery tasks. In a specific implementation, some default recovery information can be set so that the user does not have to enter every detail. For example, the user can omit the destination address of the recovery; after receiving the recovery instruction information, the control node then specifies the data recovery system as the destination address in the generated recovery policy.

It should be noted that in this embodiment the control node and the recovery node can be deployed on different hardware devices, such as separate computers, or on the same hardware device as two functional modules of that device. Moreover, this embodiment can be combined with the data backup method embodiments above: the control node, the backup node, and the recovery node can be deployed on different hardware devices, or on the same hardware device as different functional modules of that device.

The data backup and recovery methods provided by the embodiments of the present invention can greatly shorten the backup and recovery windows and reduce the impact on production tasks. Specifically, the backup and recovery operations are carried out mainly at the HDFS layer, so they have little impact on the Hbase instance that production tasks depend on. Furthermore, compared with the Export and Import tools, the snapshot approach can back up compressed data directly, or restore compressed data directly to the data recovery system, which greatly reduces the amount of data to be backed up and recovered without affecting data integrity. In addition, because the read bandwidth of the HDFS layer is 6 to 8 times that of the Hbase layer, the backup and recovery methods of the embodiments can greatly improve the efficiency of data backup and recovery.

To better illustrate the data backup and recovery flows of the embodiments of the present invention, specific embodiments are described in detail below.

FIG. 5 is a schematic structural diagram of the data backup and recovery system provided by an embodiment of the present invention. The system that implements the data backup and recovery functions mainly includes control nodes, backup nodes, recovery nodes, and storage nodes.

The control node can provide a graphical interface through which backup administrators define backup policies and recovery policies. It can also display the backup progress record, which records the progress of backup tasks. The control node controls the work of the backup nodes and recovery nodes, for example the start and end of backup and recovery.

The backup nodes execute the actual backup tasks. In a specific implementation, multiple backup nodes run on top of a parallel computing architecture and can back up faster by taking into account the data node (DataNode) on which each file to be backed up resides.

Correspondingly, the recovery nodes execute the actual recovery tasks. In a specific implementation, multiple recovery nodes run on top of a parallel computing architecture, which increases the parallelism of data recovery and improves recovery efficiency.

There can be multiple storage nodes, which store the massive backup data. The index node provides an index for the backed-up data so that the backup data can be located quickly during data recovery.

The production system and the data recovery system in the figure provide data access services externally: the production system serves the data as it exists before backup, and the data recovery system serves the data recovered from the storage nodes.
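The role of the index node can be pictured with a small in-memory lookup structure. The patent does not describe the index format, so the class `BackupIndex`, its methods, and the choice of (table, date) as the lookup key are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


class BackupIndex:
    """In-memory stand-in for the index node: table -> [(date, location)]."""

    def __init__(self) -> None:
        self._entries = defaultdict(list)

    def add(self, table: str, day: date, location: str) -> None:
        """Record where the backup of a table for a given day was stored."""
        self._entries[table].append((day, location))

    def locate(self, table: str, day: date) -> list[str]:
        """Return the storage locations recorded for a table on a given day."""
        return [loc for d, loc in self._entries.get(table, []) if d == day]


index = BackupIndex()
index.add("table1", date(2013, 12, 15), "storage-node-1/table1/20131215")
index.add("table1", date(2013, 12, 16), "storage-node-2/table1/20131216")
found = index.locate("table1", date(2013, 12, 16))
```

A recovery node would consult such an index first and then read only the listed locations, instead of scanning every storage node.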

FIG. 6 is a flowchart of the data backup method provided by Embodiment 3 of the present invention, which includes:

S601: the control node generates a backup policy;

The backup policy includes: a) the backup objects: table 1 (table1), table 2 (table2), and table 3 (table3); b) the relationship between the backup objects: table1 AND table2 OR table3; c) the incremental backup time intervals: table1 and table2 are incrementally backed up once a day, and table3 is incrementally backed up once a week.

S602: the control node determines whether the current time is 0:00; if so, it proceeds to step S603; otherwise, it returns to step S602;

S603: the control node determines whether it is 0:00 on a Sunday; if so, it proceeds to step S604; otherwise, it proceeds to step S605;

S604: the backup node creates snapshots for table1, table2, and table3 according to an instruction from the control node;

S605: the backup node creates snapshots for table1 and table2 according to an instruction from the control node;

S606: the backup node backs up the data to the storage node through HDFS, and generates index data and stores it on the index node;

S607: the backup node records the backup progress and then deletes the snapshot.
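The decision made in steps S602 to S605 can be written as a single function of the current time. This is a minimal sketch: the function name `tables_to_snapshot` is hypothetical, and the polling loop, snapshot creation, and backup itself are left out.

```python
from datetime import datetime


def tables_to_snapshot(now: datetime) -> list[str]:
    """Mirror steps S602-S605: nothing unless it is 0:00, all three tables
    at 0:00 on Sunday, and only table1 and table2 at 0:00 on other days."""
    if (now.hour, now.minute) != (0, 0):
        return []                                  # S602: not 0:00, keep waiting
    if now.weekday() == 6:                         # S603: Monday is 0, Sunday is 6
        return ["table1", "table2", "table3"]      # S604
    return ["table1", "table2"]                    # S605


sunday = tables_to_snapshot(datetime(2013, 12, 15, 0, 0))   # a Sunday
monday = tables_to_snapshot(datetime(2013, 12, 16, 0, 0))   # a Monday
```

Here `sunday` contains all three tables and `monday` contains only table1 and table2, which matches the daily and weekly intervals in the policy of step S601.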

Step S606 above is further explained below. After the snapshot is created, the directory structure on HDFS is as follows:

FIG. 6a is a flowchart of the method by which a backup node backs up data, which includes:

S6a: the backup node first obtains, from /hbase/.snapshots/completed/regionname/[columnfamily name]/[hfile name], the Hfile files involved in this increment;

S6b: the backup node analyzes these files and determines the data nodes on which each of them resides, forming a relation list such as the following:

table1/region1/hfile1 10G datanode1, datanode2, datanode3

table/region1/hfile2 50G datanode4, datanode2, datanode3

table1/region1/hfile3 80G datanode4, datanode5, datanode6

......

table2/region1/hfile1 30G datanode7, datanode8, datanode9

table2/region1/hfile2 100G datanode1, datanode3, datanode9

table2/region3/hfile1 56G datanode2, datanode8, datanode4

......

table3/region1/hfile1 38G datanode5, datanode8, datanode9

table3/region1/hfile2 29G datanode1, datanode3, datanode10

table3/region3/hfile1 55G datanode2, datanode8, datanode3

......

S6c: The backup node generates MapReduce tasks, using the location and size of each file as the task allocation factors. In this way, each backup node reads only local files as far as possible while the parallelism of the system is increased, so that the data backup completes quickly and the backup window is shortened.

Here, MapReduce is a programming model for parallel computation over large-scale data sets (larger than 1 TB), in which Map denotes mapping and Reduce denotes reduction.
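
The source does not specify how location and size are combined when tasks are allocated. One possible reading is a greedy assignment, sketched below in plain Python rather than with the Hadoop API: files are handled from largest to smallest, and each file goes to whichever data node already holds a replica of it and currently has the least work. The function name `assign_backup_tasks` and the tuple layout are assumptions made for illustration; the sample entries are taken from the relation list of step S6b.

```python
def assign_backup_tasks(files):
    """Assign each file to one of the data nodes that holds a replica of it.

    `files` is a list of (path, size_in_gb, replica_nodes) tuples.
    Returns (assignment, load): the chosen node per file and the
    total size in GB assigned to each node.
    """
    load = {}
    assignment = {}
    # Place the largest files first so that small files can even out the load.
    for path, size_gb, replicas in sorted(files, key=lambda f: f[1], reverse=True):
        # Only nodes holding a replica are candidates, so every read stays local.
        # Ties are broken by node name to keep the result deterministic.
        node = min(replicas, key=lambda n: (load.get(n, 0), n))
        assignment[path] = node
        load[node] = load.get(node, 0) + size_gb
    return assignment, load


if __name__ == "__main__":
    relation_list = [
        ("table1/region1/hfile1", 10, ["datanode1", "datanode2", "datanode3"]),
        ("table1/region1/hfile3", 80, ["datanode4", "datanode5", "datanode6"]),
        ("table2/region1/hfile1", 30, ["datanode7", "datanode8", "datanode9"]),
        ("table2/region1/hfile2", 100, ["datanode1", "datanode3", "datanode9"]),
    ]
    assignment, load = assign_backup_tasks(relation_list)
    for path, node in sorted(assignment.items()):
        print(path, "->", node)
```

Restricting the candidates to replica holders reflects the locality goal, and balancing by accumulated size reflects the parallelism goal. A real deployment would rely on the MapReduce scheduler and input splits instead of a hand-written loop.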

FIG. 7 is a flow chart of the data recovery method provided by Embodiment 3 of the present invention, including:

S701: The control node generates a recovery policy;

The recovery policy includes: a) recovery object: table3; b) recovery time period: recover the data in the snapshot created on August 9, 2013; c) recovery destination: the data recovery system.

S702: The control node sends a data recovery command to the recovery node according to the recovery policy;

S703: After receiving the recovery command, the recovery node reads the corresponding data from the storage node according to the index, organizes the data into the snapshot format, and writes the data into the data recovery system through the HDFS interface.
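
Step S703 can be pictured as an index lookup followed by a path mapping. The sketch below only illustrates that idea: the `IndexEntry` fields, the function `plan_restore`, and all example paths are hypothetical, since the source does not describe the index format, and no HDFS call is made.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical index record kept by the index node for one backed-up file."""
    table: str
    snapshot_date: date
    relative_path: str      # location of the file inside the snapshot before backup
    storage_location: str   # where the storage node keeps the backed-up copy


def plan_restore(index, table, snapshot_date, dest_root):
    """Return (source, destination) pairs for one table and one snapshot date.

    The destination keeps the relative path the file had before backup, so the
    snapshot directory structure is recreated under `dest_root`.
    """
    plan = []
    for entry in index:
        if entry.table == table and entry.snapshot_date == snapshot_date:
            destination = PurePosixPath(dest_root) / entry.relative_path
            plan.append((entry.storage_location, str(destination)))
    return plan


if __name__ == "__main__":
    index = [
        IndexEntry("table3", date(2013, 8, 9), "table3/region1/cf1/hfile1", "store1:/backup/0001"),
        IndexEntry("table3", date(2013, 8, 2), "table3/region1/cf1/hfile0", "store1:/backup/0000"),
        IndexEntry("table1", date(2013, 8, 9), "table1/region1/cf1/hfile1", "store2:/backup/0002"),
    ]
    for source, destination in plan_restore(index, "table3", date(2013, 8, 9), "/restore/snapshots"):
        print(source, "->", destination)
```

Each pair in the resulting plan would then be copied by a recovery node, and several recovery nodes can work through disjoint parts of the plan in parallel.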

Based on the same inventive concept, embodiments of the present invention further provide a data backup device corresponding to the above data backup method and a data recovery device corresponding to the above data recovery method. Because the principle by which these devices solve the problem is similar to that of the above data backup method and data recovery method, the implementation of the devices may refer to the implementation of the methods, and repeated details are omitted.

FIG. 8 is a schematic structural diagram of a data backup device provided by Embodiment 1 of the present invention. The device includes:

A creation module 81, configured to create a snapshot for the backup object through the distributed data storage system Hbase according to the instruction of the control node;

A backup module 82, configured to back up the data in the snapshot created by the creation module 81 to a remote storage node through the distributed file system HDFS, wherein the data in the snapshot is the data added to or modified in the backup object after the snapshot is created and before a snapshot is next created for the backup object.

Optionally, the creation module 81 is specifically configured to:

create a snapshot for the backup object indicated by the control node according to the incremental backup time interval indicated by the control node, wherein backup objects having an association relationship have the same incremental backup time interval.
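
The constraint that associated backup objects share one incremental backup time interval can be checked before any snapshot is scheduled. The sketch below assumes that associations are given as groups of table names (for example, reading "table1 AND table2 OR table3" as table1 and table2 being associated and table3 standing alone); that reading, and the function name `find_interval_conflicts`, are assumptions made for illustration.

```python
def find_interval_conflicts(groups, intervals):
    """Return the association groups whose members do not share one interval.

    `groups` is a list of lists of table names that are associated with each other.
    `intervals` maps each table name to its incremental backup interval.
    """
    conflicts = []
    for group in groups:
        # Collect the distinct intervals used inside this group.
        distinct = {intervals[table] for table in group}
        if len(distinct) > 1:
            conflicts.append(sorted(group))
    return conflicts


if __name__ == "__main__":
    groups = [["table1", "table2"], ["table3"]]
    intervals = {"table1": "daily", "table2": "daily", "table3": "weekly"}
    # An empty list means the policy satisfies the constraint.
    print(find_interval_conflicts(groups, intervals))
```

Keeping associated tables on the same interval means their snapshots are taken at the same points in time, so they can be restored together to a consistent point.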

FIG. 9 is a schematic structural diagram of a data backup device provided by Embodiment 2 of the present invention. The device includes:

A generation module 91, configured to generate a backup policy according to the backup instruction information input by the user;

An instruction module 92, configured to instruct multiple backup nodes to execute data backup tasks in parallel according to the backup policy, wherein the data backup tasks include: creating a snapshot for the backup object, and backing up the data in the created snapshot to a remote storage node through the distributed file system HDFS.

FIG. 10 is a schematic structural diagram of a data recovery device provided by Embodiment 1 of the present invention. The device includes:

A reading module 101, configured to read the data stored in the remote storage node according to the instruction of the control node;

A writing module 102, configured to organize the data read by the reading module 101 into data in a snapshot format, and to write the organized data, through the distributed file system HDFS interface, into the data recovery system used to provide data access services.

Optionally, the writing module is specifically configured to:

create, in the HDFS, the snapshot directory structure of the read data according to the snapshot directory structure before backup.

FIG. 11 is a schematic structural diagram of a data recovery device provided by Embodiment 2 of the present invention. The device includes:

A generation module 111, configured to generate a recovery policy according to the recovery instruction information input by the user;

An instruction module 112, configured to instruct multiple recovery nodes to execute data recovery tasks in parallel according to the recovery policy, wherein the data recovery tasks include: reading the data stored in the remote storage node, organizing the read data into data in a snapshot format, and writing the organized data, through the distributed file system HDFS interface, into the data recovery system used to provide data access services.

Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A data backup method, characterized in that the method comprises:

The backup node creates a snapshot for the backup object through the distributed data storage system Hbase according to the instructions of the control node;

The backup node backs up the data in the created snapshot to a remote storage node through the distributed file system HDFS, wherein the data in the snapshot is the data added to or modified in the backup object after the snapshot is created and before a snapshot is next created for the backup object.

2. The method according to claim 1, wherein the backup node creates a snapshot for the backup object, comprising:

The backup node creates a snapshot for the backup object indicated by the control node according to the incremental backup time interval indicated by the control node; wherein, the incremental backup time intervals between the backup objects having an association relationship are the same.

3. A data backup method, characterized in that the method comprises:

The control node generates a backup policy according to the backup instruction information input by the user;

The control node instructs multiple backup nodes to execute data backup tasks in parallel according to the backup policy, wherein the data backup tasks include: creating a snapshot for the backup object, and backing up the data in the created snapshot to a remote storage node through the distributed file system HDFS.

4. The method according to claim 3, wherein the backup policy comprises: backup objects, relationships between backup objects, and incremental backup time intervals, wherein backup objects having an association relationship have the same incremental backup time interval.

5. A data recovery method, characterized in that the method comprises:

The recovery node reads the data stored in the remote storage node according to the instructions of the control node;

The recovery node organizes the read data into data in a snapshot format, and writes the organized data into a data recovery system for providing data access services through a distributed file system HDFS interface.

6. The method according to claim 5, wherein the recovery node organizing the read data into data in a snapshot format comprises:

The recovery node creates a snapshot directory structure of the read data in the HDFS according to the snapshot directory structure before backup.

7. A data recovery method, characterized in that the method comprises:

The control node generates a recovery policy according to the recovery instruction information input by the user;

The control node instructs multiple recovery nodes to execute data recovery tasks in parallel according to the recovery policy, wherein the data recovery tasks include: reading the data stored in the remote storage node, organizing the read data into data in a snapshot format, and writing the organized data, through the distributed file system HDFS interface, into the data recovery system used to provide data access services.

8. The method according to claim 7, wherein the recovery policy comprises: a recovery object and a recovery time period.

9. A data backup device, characterized in that the device comprises:

A creation module, configured to create a snapshot for the backup object through the distributed data storage system Hbase according to the instructions of the control node;

A backup module, configured to back up the data in the snapshot created by the creation module to a remote storage node through the distributed file system HDFS, wherein the data in the snapshot is the data added to or modified in the backup object after the snapshot is created and before a snapshot is next created for the backup object.

10. The device according to claim 9, wherein the creation module is specifically configured to:

create a snapshot for the backup object indicated by the control node according to the incremental backup time interval indicated by the control node, wherein backup objects having an association relationship have the same incremental backup time interval.

11. A data backup device, characterized in that the device comprises:

A generation module, configured to generate a backup policy according to the backup instruction information input by the user;

An instruction module, configured to instruct multiple backup nodes to execute data backup tasks in parallel according to the backup policy generated by the generation module, wherein the data backup tasks include: creating a snapshot for the backup object, and backing up the data in the created snapshot to a remote storage node through the distributed file system HDFS.

12. The device according to claim 11, wherein the backup policy comprises: backup objects, relationships between backup objects, and incremental backup time intervals, wherein backup objects having an association relationship have the same incremental backup time interval.

13. A data recovery device, characterized in that the device comprises:

A reading module, configured to read the data stored in the remote storage node according to the instruction of the control node;

A writing module, configured to organize the data read by the reading module into data in a snapshot format, and to write the organized data, through the distributed file system HDFS interface, into the data recovery system used to provide data access services.

14. The device according to claim 13, wherein the writing module is specifically configured to:

create, in the HDFS, the snapshot directory structure of the read data according to the snapshot directory structure before backup.

15. A data recovery device, characterized in that the device comprises:

A generation module, configured to generate a recovery policy according to the recovery instruction information input by the user;

An instruction module, configured to instruct multiple recovery nodes to execute data recovery tasks in parallel according to the recovery policy generated by the generation module, wherein the data recovery tasks include: reading the data stored in the remote storage node, organizing the read data into data in a snapshot format, and writing the organized data, through the distributed file system HDFS interface, into the data recovery system used to provide data access services.

16. The device according to claim 15, wherein the recovery policy comprises: a recovery object and a recovery time period.