CN113064755B

CN113064755B - Data recovery method, device, equipment, medium and program product

Info

Publication number: CN113064755B
Application number: CN202110286501.0A
Authority: CN
Inventors: 石慧兴; 邓治国; 刘岩
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2024-09-20
Anticipated expiration: 2041-03-17
Also published as: CN113064755A

Abstract

The application provides a data recovery method, a device, equipment, a medium and a program product. In response to a recovery configuration instruction, address information and recovery mode information of a target recovery node are determined; target recovery data corresponding to the recovery mode information is determined; and the target recovery data and a node operation tool are copied to the target recovery node according to the address information, so that the target recovery node can complete data recovery. This solves the technical problem of how to quickly recover the shared storage data of a distributed system. The method supports efficient, highly reusable automatic data recovery across multiple recovery modes and recovery scenarios, saves manpower and material resources, and avoids the larger economic losses caused when a system cannot be restored to service for a long time.

Description

Data recovery method, device, equipment, medium and program product

Technical Field

The present application relates to the field of computer data sharing storage, and in particular, to a data recovery method, apparatus, device, medium, and program product.

Background

With the continuous development of internet technology, multi-node distributed systems have become the main solution in the field of internet backend services. One problem that all distributed systems must face, however, is data sharing among multiple nodes, for example sharing the identity information of individual nodes, or the task-order information used to coordinate work among nodes. Distributed systems typically solve this problem in one of two ways: by synchronizing information through a reliable search engine (e.g., Elasticsearch), or by relying on or integrating a reliable shared storage service.

Therefore, the security of shared storage has become a consideration in guaranteeing the security of a distributed system. The prior art generally guarantees it by providing standby shared storage, for example by introducing an Elasticsearch search engine and establishing a shared storage service.

However, when the prior art faces a concentrated burst of large data volume and heavy traffic, or a deliberate attack by a hacker or a virus, loss of the shared key-value storage is still difficult to avoid. Once the key-value data is lost, the whole distributed system is paralyzed or breaks down, and background maintenance personnel must rebuild the system connections. For a small system the impact is limited, but for a large, complex system the impact is severe and can cause huge economic losses. How to quickly recover the shared storage data of a distributed system has therefore become a technical problem to be solved.

Disclosure of Invention

The application provides a data recovery method, a device, equipment, a medium and a program product, which are used for solving the technical problem of how to quickly recover shared storage data of a distributed system.

In a first aspect, the present application provides a data recovery method applied to a shared storage service cluster, where the shared storage service cluster includes at least one shared storage node, the method including:

Determining address information and recovery mode information of a target recovery node in response to a recovery configuration instruction, wherein the target recovery node comprises part or all of the shared storage nodes in the shared storage service cluster, and the recovery mode information comprises at least one recovery mode;

determining target recovery data corresponding to the recovery mode information;

and copying the target recovery data and the node operation tool to the target recovery node according to the address information so as to enable the target recovery node to complete data recovery.

In one possible design, the recovery mode information includes a first recovery mode, and the target recovery data corresponding to the first recovery mode includes: backup data backed up at a first preset time.

Optionally, the backup data includes: the first shared data automatically backed up at a preset time, and the second shared data manually backed up by the user at any time.

In one possible design, the recovery mode information includes a second recovery mode, and the target recovery data corresponding to the second recovery mode includes: normal operation data stored in the shared storage service cluster at a second preset time.

Optionally, the second preset time includes: the last moment at which the shared storage service cluster operated normally.

Optionally, the normal operation data includes: shared data stored in real time by at least one of the shared storage nodes during normal operation.

In one possible design, the shared storage service cluster includes a first cluster that has failed, and the determining, in response to the recovery configuration instruction, address information and recovery mode information of the target recovery node includes:

judging whether the configuration instruction indicates to perform data recovery on the first cluster;

if yes, taking at least one node to be restored on the first cluster as the target restoring node, and determining the address code and the corresponding user information of the target restoring node, wherein the node to be restored is a shared storage node which has failed;

determining directory information of the target recovery data according to the recovery mode information indicated by the configuration instruction;

The address information includes: the address code and the user information, and the recovery mode information includes: the recovery mode and the directory information.

In one possible design, the shared storage service cluster further includes a second cluster that has not failed, and after the judging whether the configuration instruction indicates to perform data recovery on the first cluster, the method further includes:

if not, at least one shared storage node on the second cluster is used as the target recovery node, and the address code and the corresponding user information of the target recovery node are determined.

Optionally, before the taking the at least one shared storage node on the second cluster as the target recovery node, the method further includes:

judging whether the second cluster is constructed or not;

if not, creating the second cluster through a preset management tool;

if so, taking at least one shared storage node on the second cluster as the target recovery node, and determining the address code of the target recovery node and the corresponding user information.

Optionally, the preset management tool includes a containerized management tool, and the creating the second cluster through the preset management tool includes:

and configuring the connection information and the resource information of the containerized management tool so that the containerized management tool builds and starts the second cluster according to the connection information and the resource information.

In one possible design, the recovery mode information includes a first recovery mode and a second recovery mode, where the first recovery mode recovers data from backup data and the second recovery mode recovers data from normal operation data stored in the shared storage service cluster, and the determining directory information of the target recovery data according to the recovery mode information indicated by the configuration instruction includes:

if the recovery mode is the first recovery mode, determining first directory information of the backup data;

and if the recovery mode is a second recovery mode, determining second directory information of the normal operation data.

In one possible design, if the recovery mode is the first recovery mode, the copying the target recovery data and the node operation tool to the target recovery node according to the address information, so that the target recovery node completes data recovery, includes:

copying the backup data and the node operation tool to each target recovery node according to each address information of each target recovery node and the first directory information corresponding to each target recovery node;

Executing a node restoration task according to the backup data by using the node operation tool on each target restoration node;

and restarting each target recovery node to complete data recovery.

In one possible design, if the recovery mode is the second recovery mode, the copying the target recovery data and the node operation tool to the target recovery node according to the address information, so that the target recovery node completes data recovery, includes:

copying the second directory information and the node operation tool to a target node according to the address information, wherein the target recovery node comprises the target node;

starting the target node, and executing a node update task on the target node by using the node operation tool according to the second directory information, so that the target node becomes the original node of the recovery;

and repeatedly executing a new-node task by using the node operation tool according to the second directory information, each time starting a new shared storage node to execute the node update task, until the data recovery of the shared storage service cluster is completed.
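As a non-limiting illustration, the loop of the second recovery mode can be sketched in Python as follows. This is a minimal simulation under stated assumptions: the `Cluster` class, the function name, the log wording, the node addresses and the data directory are all hypothetical, and no real node is started or contacted.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """In-memory stand-in for a shared storage service cluster (hypothetical)."""
    members: list = field(default_factory=list)


def recover_from_running_data(first_node, other_nodes, data_dir):
    """Simulate the second recovery mode: rebuild a cluster one member at a time."""
    cluster = Cluster()
    log = []

    # Node update task: the first node is started from the retained data
    # directory and becomes the original node of the recovered cluster.
    cluster.members.append(first_node)
    log.append(f"update {first_node} from {data_dir}")

    # New-node task, repeated: register one new member, then start it so that
    # it executes its own update task, until every node has joined.
    for node in other_nodes:
        log.append(f"add {node} via {first_node}")
        cluster.members.append(node)
        log.append(f"update {node}")

    return cluster, log


cluster, log = recover_from_running_data(
    "10.0.0.1", ["10.0.0.2", "10.0.0.3"], "/var/lib/etcd"
)
print(cluster.members)
for entry in log:
    print(entry)
```

The sketch only records the order of the tasks; in a real deployment each log entry would correspond to an invocation of the node operation tool on the respective node.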

In a second aspect, the present application provides a data recovery apparatus comprising:

the acquisition module is used for acquiring the recovery configuration instruction;

The configuration module is used for responding to the recovery configuration instruction and determining the address information and the recovery mode information of a target recovery node, wherein the target recovery node comprises part or all of shared storage nodes in a shared storage service cluster, and the recovery mode information comprises at least one recovery mode;

the recovery module is used for determining target recovery data corresponding to the recovery mode information;

And the recovery module is further used for copying the target recovery data and the node operation tool to the target recovery node according to the address information so as to enable the target recovery node to complete data recovery.

In one possible design, the shared storage service cluster includes: the first cluster that has failed, the configuration module is specifically configured to:

In one possible design, the shared storage service cluster further includes a second cluster that has not failed, and the configuration module is further configured to:

If not, taking at least one shared storage node on the second cluster as the target recovery node, and determining the address code and the corresponding user information of the target recovery node;

Optionally, the configuration module is further configured to:

judging whether the second cluster is constructed or not;

if not, creating the second cluster through a preset management tool;

Optionally, the preset management tool includes: the containerization management tool, the configuration module is further configured to:

In one possible design, the recovery mode information includes a first recovery mode and a second recovery mode, where the first recovery mode recovers data from backup data and the second recovery mode recovers data from normal operation data stored in the shared storage service cluster, and the recovery module is specifically configured to:

In one possible design, if the recovery mode is the first recovery mode, the recovery module is further configured to:

and restarting each target recovery node to complete data recovery.

In one possible design, if the recovery mode is the second recovery mode, the recovery module is further configured to:

In a third aspect, the present application provides an electronic device comprising:

a memory for storing program instructions;

And a processor for calling and executing program instructions in the memory to perform any one of the possible data recovery methods provided in the first aspect.

In a fourth aspect, the present application provides a storage medium having stored therein a computer program for performing any one of the possible data recovery methods provided in the first aspect.

In a fifth aspect, the application also provides a computer program product comprising a computer program which, when executed by a processor, implements any one of the possible data recovery methods provided in the first aspect.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

Fig. 1 is a schematic diagram of an application scenario of an ETCD shared storage cluster according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of a data recovery method according to an embodiment of the present application;

Fig. 3 is a schematic flow chart of another data recovery method according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a data recovery device according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an electronic device according to the present application.

Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, including but not limited to combinations of embodiments, which are within the scope of the application, can be made by one of ordinary skill in the art without inventive effort based on the embodiments of the application.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Distributed systems have become the dominant architecture of current internet service systems, but one problem that all distributed systems must face is data sharing among multiple nodes, for example sharing the identity information of individual nodes, or the task-order information used to coordinate work among nodes. Distributed systems typically solve this problem in one of two ways: by synchronizing information through a reliable search engine (e.g., Elasticsearch), or by relying on or integrating a reliable shared storage service.

As the online cluster environment of a distributed system becomes more complicated and the volume of access requests grows, the system server is easily overwhelmed by large amounts of highly concurrent data access, or is affected by adverse events such as server hardware faults, machine room power failures and network disconnections. The security of the shared storage of the distributed system therefore receives more and more attention, and many complicated operating procedures have been developed to guide operation and maintenance personnel in recovering data when the shared storage service fails.

However, the inventors have found that this also increases system construction and operational maintenance costs, because operation and maintenance personnel must repair the shared storage service manually, that is, perform the data recovery by hand. Different distributed systems call for different recovery modes, and the same distributed system calls for different recovery modes when facing different faults, so operation and maintenance personnel cannot complete data recovery quickly and accurately. Each recovery requires them to analyze the situation and manually edit recovery parameters and recovery configuration files, which is time-consuming and laborious and highly prone to misoperation. Such mistakes cause logic errors that can paralyze the system a second time or set off a chain reaction, leading to larger economic losses.

In order to solve the above technical problems, the inventive concept of the application is as follows:

Operation and maintenance personnel simply select or input the required recovery mode and recovery parameters on a data recovery interface, and the system automatically configures each recovery file and matches the recovery data. By providing a system service that supports multiple recovery modes, data can be recovered quickly. This removes the high time cost of manual recovery under fault conditions and avoids the risk of operating errors caused by the complicated steps of manual operation. A single tool covers the multiple recovery modes needed in different scenarios, so personnel no longer need to consult separate step-by-step references for each situation; operation becomes more unified and simpler, and recovery is executed efficiently and stably.

The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

For ease of understanding, the following embodiments are described based on an ETCD shared storage service cluster. However, the data recovery method provided by the present application can be used not only for ETCD but also for similar systems such as ZooKeeper and Consul. Those skilled in the art may select a specific carrier or implementation of the method according to the specific application scenario, and the present application is not limited in this respect.

A basic description of ETCD is provided below.

ETCD is a distributed, highly available and strongly consistent key-value shared storage database. It is implemented in the Go language and is mainly used for shared configuration and service discovery.

As projects such as CoreOS and Kubernetes have grown popular in open source communities, the ETCD component used in both, which serves as a highly available, strongly consistent service discovery repository, has attracted increasing attention from developers. In the cloud computing era, how to let services join a computing cluster quickly and transparently, how to let all machines in the cluster quickly discover shared configuration information and, more importantly, how to build a service cluster that is highly available, secure, easy to deploy and quick to respond have become urgent problems, and ETCD offers a solution to them.

ETCD is particularly useful for application scenarios such as service discovery, message publishing and subscribing, load balancing, distributed notification and coordination, distributed locks, distributed queues, cluster monitoring and leader election.

In all of these scenarios, the core role played by ETCD is the shared storage of key-value data.

Fig. 1 is a schematic diagram of an application scenario of an ETCD shared storage cluster according to an embodiment of the present application. As shown in Fig. 1, the distributed system includes a first service portion 20 and a second service portion 30, which need the ETCD shared storage service cluster 10 to perform shared storage and configuration of service-related parameters when executing certain services. The ETCD shared storage service cluster 10 includes a plurality of shared storage nodes, which provide shared storage services, such as key-value data storage services, for different services or for different sub-services of the same service.

The data recovery method provided by the application can recover various forms of data for any one or more shared storage nodes in the ETCD shared storage service cluster 10.

Fig. 2 is a flow chart of a data recovery method according to an embodiment of the present application. As shown in fig. 2, the specific steps of the data recovery method include:

S201, in response to the recovery configuration instruction, determining address information and recovery mode information of the target recovery node.

In this step, the target recovery node includes some or all of the shared storage nodes in the shared storage service cluster, and the recovery mode information includes at least one recovery mode.

In this embodiment, the user inputs or selects a restoration mode, such as backup file restoration or local data restoration, on the control interface, and sets a part of relevant configuration parameters, so that the distributed system automatically complements the remaining configuration parameters to form a configuration file meeting the preset requirements, i.e. a restoration configuration instruction.

The distributed system obtains the recovery configuration instruction and extracts the address information, such as the IP address, of the ETCD node needing data recovery, namely the target recovery node. And simultaneously, determining a recovery mode designated by the user according to the recovery configuration instruction.

It should be noted that the data recovery method provided in this embodiment is applied to a shared storage service cluster, where the shared storage service cluster includes at least one shared storage node. Optionally, the data stored in the shared storage service cluster is data in key-value form.
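By way of illustration only, step S201 can be sketched as parsing a partially filled configuration, completing it with defaults, and extracting the address information and the recovery mode. The JSON layout, the field names `nodes` and `modes`, the default values and the mode names `backup` and `local` are assumptions made for this sketch and are not specified by the application.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryInstruction:
    addresses: tuple   # address information of the target recovery nodes
    modes: tuple       # at least one recovery mode


# Values the system fills in when the user leaves a field blank (assumed).
DEFAULTS = {"modes": ["backup"]}
KNOWN_MODES = {"backup", "local"}


def parse_instruction(text):
    """Complete a partial user configuration and extract the two outputs of S201."""
    raw = {**DEFAULTS, **json.loads(text)}
    addresses = tuple(raw.get("nodes", []))
    modes = tuple(raw["modes"])
    if not addresses:
        raise ValueError("at least one target recovery node is required")
    if not modes:
        raise ValueError("at least one recovery mode is required")
    unknown = set(modes) - KNOWN_MODES
    if unknown:
        raise ValueError(f"unsupported recovery modes: {sorted(unknown)}")
    return RecoveryInstruction(addresses, modes)


instruction = parse_instruction('{"nodes": ["10.0.0.1", "10.0.0.2"]}')
print(instruction.addresses)
print(instruction.modes)
```

Here the user supplies only the node addresses, and the recovery mode is completed automatically from the assumed defaults.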

S202, determining target recovery data corresponding to the recovery mode information.

In this step, different recovery modes correspond to different recovery data. For example, when recovery is performed from a backup file, the target recovery data is the backup data file that the system backed up in advance; when recovery is performed from the local data of a shared storage node, the target recovery data is the data recorded while that shared storage node was operating normally.

In one possible design, the recovery mode information includes: a first recovery mode; the target recovery data corresponding to the first recovery mode includes: backup data backed up at a first preset time.

The first preset time includes: periodic moments at a fixed time interval, and moments at non-fixed time intervals. A periodic moment means that the distributed system in which the shared storage service cluster is located periodically backs up the data of all the shared storage nodes. A moment at a non-fixed time interval means that the distributed system presets a backup condition, and a backup is triggered when the operating state of the distributed system reaches that condition. For example, the distributed system may back up once every hour, or may trigger an automatic backup when the traffic access load rate reaches a preset threshold, such as 70%.
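The two kinds of trigger in the example above can be sketched as a single predicate. The one-hour interval and the 70% threshold come from the example; the function name, the parameter names and the representation of the load rate as a fraction are assumptions of this sketch.

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(hours=1)   # fixed-interval backup, as in the example
LOAD_THRESHOLD = 0.70                  # load rate that triggers a backup


def should_back_up(now, last_backup, load_rate):
    """Return True when either the periodic or the conditional trigger fires."""
    periodic_due = now - last_backup >= BACKUP_INTERVAL
    condition_met = load_rate >= LOAD_THRESHOLD
    return periodic_due or condition_met


last = datetime(2021, 3, 17, 12, 0)
print(should_back_up(datetime(2021, 3, 17, 12, 30), last, 0.40))  # False
print(should_back_up(datetime(2021, 3, 17, 13, 0), last, 0.40))   # True
print(should_back_up(datetime(2021, 3, 17, 12, 30), last, 0.75))  # True
```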

It will be appreciated that the user may also manually back up some or all of the shared storage nodes of the shared storage service cluster in the distributed system at any time, as desired.

In another possible design, the recovery mode information further includes a second recovery mode, and the target recovery data corresponding to the second recovery mode includes: normal operation data stored in the shared storage service cluster at a second preset time.

In this embodiment, the normal operation data includes: shared data stored in real time by at least one shared storage node during normal operation.

It should be noted that, alternatively, the second preset time may be the same as or different from the first preset time, and in this embodiment, the second preset time and the first preset time have no constraint relationship. Of course, it can be understood that, according to the actual scene requirement, a person skilled in the art can also establish the corresponding relationship between the first preset time and the second preset time, and the specific corresponding manner is not limited by the present application.

In one possible embodiment, the second preset time includes: the last moment at which the shared storage service cluster operated normally. When the shared storage service cluster cannot work normally because of a fault in the distributed system, the data in use at the last moment of normal operation of the cluster is automatically stored as a backup.

In one possible embodiment, the second preset time includes: any time between two backup cycle times, such as the midpoint between them.

S203, copying the target recovery data and the node operation tool to the target recovery node according to the address information, so that the target recovery node completes data recovery.

In this embodiment, the node operation tool includes: an operating program file and a program configuration file.

The number of target recovery nodes required differs between data recovery modes, and the same data recovery mode may also use different numbers of target recovery nodes to recover the data.

For example, a shared storage service cluster includes three shared storage nodes, and when the distributed system fails for some reason, one node may lose data, two nodes may lose data, or all three nodes may lose data.

At this time, if the user chooses to restore data in the first recovery mode, that is, from the backup file, one possible design is to restore every shared storage node regardless of whether it has lost data. The backup data is copied to each of the three shared storage nodes, and the operating program file then invokes the program configuration file to start the restore command, that is, to execute the node restoration task. The three shared storage nodes are thereby restored to the state that the shared storage service cluster, here the ETCD cluster, had at the first preset time corresponding to the backup data.
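For illustration, the copy, restore and restart sequence of the first recovery mode can be sketched as a plan of commands per node. The sketch only builds the command lists and executes nothing. The use of `scp`, `ssh` and `systemctl restart etcd`, the `/tmp` staging directory, the file paths and the function name are assumptions; `snapshot restore` with `--data-dir` is the restore subcommand of `etcdctl` in etcd v3 (newer releases provide the same operation through `etcdutl`), and the member-specific flags that a real multi-member restore needs are omitted here.

```python
import posixpath


def backup_restore_plan(nodes, snapshot, tool, data_dir, staging="/tmp"):
    """Return, per node, the commands for copy, restore and restart.

    nodes is a list of (user, ip) pairs. Nothing is executed here.
    """
    remote_snapshot = posixpath.join(staging, posixpath.basename(snapshot))
    remote_tool = posixpath.join(staging, posixpath.basename(tool))
    plan = {}
    for user, ip in nodes:
        target = f"{user}@{ip}"
        plan[ip] = [
            # 1. Copy the backup data and the node operation tool to the node.
            ["scp", snapshot, tool, f"{target}:{staging}/"],
            # 2. Node restoration task: rebuild the data directory from the backup.
            ["ssh", target,
             f"{remote_tool} snapshot restore {remote_snapshot} --data-dir {data_dir}"],
            # 3. Restart the node so that it serves the restored data
            #    (assumes a systemd unit named "etcd").
            ["ssh", target, "systemctl restart etcd"],
        ]
    return plan


plan = backup_restore_plan(
    [("etcd", "10.0.0.1"), ("etcd", "10.0.0.2"), ("etcd", "10.0.0.3")],
    snapshot="backups/snapshot.db",
    tool="tools/etcdctl",
    data_dir="/var/lib/etcd",
)
for command in plan["10.0.0.1"]:
    print(" ".join(command))
```

Returning a plan instead of running the commands keeps the sketch free of side effects and lets the sequence be reviewed before it is applied to all three nodes.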

In another possible design, it may first be checked whether each shared storage node has lost data or contains data errors, and only the abnormal shared storage nodes are repaired. It will be appreciated that, in this case, one or more shared storage nodes may be unable to complete recovery in a short period of time, for example because of a machine room outage or a server hardware failure. The nodes that cannot be recovered can then be replaced by executing the new-node task on the shared storage nodes that are still normal: the backup data is copied to the newly added nodes, and the newly added nodes are started to complete the data recovery. Alternatively, one or more abnormal shared storage nodes are repaired first, and after the repaired nodes are started, the new-node task is executed to replace the nodes that cannot be recovered, thereby completing data recovery. In this way the distributed system completes data recovery as soon as possible, resumes service, and reduces or avoids the economic loss caused by system paralysis.
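The selective design can be sketched as a planning step that separates nodes to be repaired in place from nodes to be replaced by newly added ones. The state labels `ok`, `abnormal` and `unreachable`, the notion of a list of spare nodes and the function name are hypothetical and serve only this sketch.

```python
def plan_selective_recovery(node_states, spare_nodes):
    """Split nodes into those to repair in place and those to replace.

    node_states maps a node address to "ok", "abnormal" or "unreachable".
    """
    repair = [n for n, state in node_states.items() if state == "abnormal"]
    unreachable = [n for n, state in node_states.items() if state == "unreachable"]
    if len(spare_nodes) < len(unreachable):
        raise ValueError("not enough spare nodes to replace unreachable ones")
    # Pair each node that cannot be recovered soon with a newly added node.
    replace = dict(zip(unreachable, spare_nodes))
    return repair, replace


repair, replace = plan_selective_recovery(
    {"10.0.0.1": "ok", "10.0.0.2": "abnormal", "10.0.0.3": "unreachable"},
    spare_nodes=["10.0.0.4"],
)
print(repair)
print(replace)
```

In this example the healthy node is left untouched, one node is repaired from the backup data, and the node in the powered-down machine room is replaced by a newly added node.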

After the data recovery is completed, the state of the whole shared storage service cluster can be checked in a loop. Once it is confirmed that every shared storage node of the cluster is free of problems and that the recovery has not caused data conflicts, the recovery is confirmed as successful and a recovery success log is returned; otherwise, a recovery failure and the error log of the related steps are returned.
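A minimal sketch of such a loop check is given below, assuming a caller-supplied health probe. The function name, the `max_rounds` limit and the shape of the returned log are assumptions; the probe used in the example is a stand-in that reports one node as healthy only on its second check.

```python
def verify_recovery(nodes, is_healthy, max_rounds=3):
    """Poll every node until all are healthy or the rounds run out."""
    pending = list(nodes)
    for round_no in range(1, max_rounds + 1):
        # Keep only the nodes that still fail the health probe.
        pending = [node for node in pending if not is_healthy(node)]
        if not pending:
            return {"status": "success", "rounds": round_no}
    return {"status": "failure", "failed_nodes": pending}


calls = {}


def fake_check(node):
    """Stand-in probe: node 10.0.0.2 becomes healthy on its second check."""
    calls[node] = calls.get(node, 0) + 1
    return node != "10.0.0.2" or calls[node] >= 2


result = verify_recovery(["10.0.0.1", "10.0.0.2"], fake_check)
print(result)  # {'status': 'success', 'rounds': 2}
```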

This embodiment provides a data recovery method which, in response to a recovery configuration instruction, determines address information and recovery mode information of a target recovery node, determines target recovery data corresponding to the recovery mode information, and copies the target recovery data and a node operation tool to the target recovery node according to the address information, so that the target recovery node completes data recovery. This solves the technical problem of how to quickly recover the shared storage data of a distributed system. The method supports efficient, highly reusable automatic data recovery across multiple recovery modes and recovery scenarios, saves manpower and material resources, and avoids the larger economic losses caused when a system cannot be restored to service for a long time.

Fig. 3 is a flow chart of another data recovery method according to the embodiment of the present application. As shown in fig. 3, the specific steps of the data recovery method include:

S301, judging whether the acquired configuration instruction indicates to perform data recovery on the first cluster.

The data recovery method provided by the embodiment is applied to a shared storage service cluster, wherein the shared storage service cluster comprises at least one shared storage node.

In this embodiment, the shared storage service cluster includes: the first cluster that has failed.

The user inputs or selects the recovery mode on the control interface and sets the configuration parameters that must be set or selected manually. The distributed system then automatically completes the remaining configuration parameters to form a configuration file that meets the preset requirements, namely the recovery configuration instruction.

If the user chooses to perform data recovery on the failed cluster, that is, perform data recovery on the first cluster, step S302 is performed, otherwise step S304 is performed.

S302, taking at least one node to be restored on the first cluster as a target restoring node, and determining the address code and the corresponding user information of the target restoring node.

In this step, the node to be restored is a failed shared storage node, and the user information includes: account name (or account number) and account password, address encoding includes: MAC address or IP address.

Specifically, for example, in the current application scenario, if the user chooses to restore on the nodes of the original cluster (i.e., the first cluster), the node IPs of the original cluster and the related user names and passwords are written into the configuration file, either selected by the user or filled in automatically by the distributed system.

In one possible design, the target recovery node includes all of the nodes to be recovered, or only some of them. This is because, in some cases, certain nodes to be restored cannot complete data restoration in a short time for objective reasons such as a machine room power outage, a network disconnection, or server hardware damage, so those shared storage nodes cannot be restored immediately.

S303, determining the directory information of the target recovery data according to the recovery mode information indicated by the configuration instruction.

In this embodiment, the recovery mode information includes: a first recovery mode and a second recovery mode. The first recovery mode is to recover the data by using the backup data, and the second recovery mode is to recover the data by using the normal operation data stored in the shared storage service cluster.

In this step, if the restoration mode is the first restoration mode, determining the first directory information of the backup data; and if the recovery mode is a second recovery mode, determining second directory information of the normal operation data.
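As an illustration of this step, the mapping from recovery mode to directory information could be sketched as follows. The enum and function names are hypothetical, and the embodiment does not prescribe any particular implementation.

```python
from enum import Enum


class RecoveryMode(Enum):
    BACKUP = "first"        # first recovery mode: recover from backup data
    LOCAL_DATA = "second"   # second recovery mode: recover from normal operation data


def resolve_data_directory(mode: RecoveryMode, backup_dir: str, data_dir: str) -> str:
    """Return the directory that holds the target recovery data for the mode."""
    if mode is RecoveryMode.BACKUP:
        return backup_dir       # first directory information
    if mode is RecoveryMode.LOCAL_DATA:
        return data_dir         # second directory information
    raise ValueError(f"unsupported recovery mode: {mode}")
```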

In one possible design, after determining the directory information of the target recovery data, further comprising:

A checking tool performs a correctness check on each recovery parameter determined from the configuration instruction. If the recovery parameters are correct, that is, they contain no logic errors or value errors, step S307 is executed to start data recovery. If a recovery parameter is erroneous, an error identifier and the related error step or error position are collected into an error report log, which is returned to the distributed system for analysis and handling by the user.
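A correctness check of this kind might look like the sketch below, which collects every problem it finds instead of stopping at the first one, so that the result can serve as an error report log. The configuration keys (`recovery_mode`, `nodes`, `ip`, `user`, `data_dir`) are assumptions made for illustration only.

```python
import ipaddress
from typing import Any, Dict, List


def check_recovery_parameters(config: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means the check passed."""
    errors: List[str] = []
    if config.get("recovery_mode") not in ("first", "second"):
        errors.append("recovery_mode: must be 'first' or 'second'")
    nodes = config.get("nodes") or []
    if not nodes:
        errors.append("nodes: at least one target recovery node is required")
    for index, node in enumerate(nodes):
        try:
            ipaddress.ip_address(node.get("ip", ""))
        except ValueError:
            errors.append(f"nodes[{index}].ip: not a valid IP address")
        if not node.get("user"):
            errors.append(f"nodes[{index}].user: must not be empty")
    if not config.get("data_dir"):
        errors.append("data_dir: directory of the target recovery data is missing")
    return errors
```

Each message names the position of the faulty parameter, which corresponds to the error position mentioned above.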

S304, judging whether a second cluster is constructed or not;

In this embodiment, the shared storage service cluster further includes a second cluster that has not failed.

If yes, step S305 is executed, and if no, step S306 is executed.

It should be noted that the second cluster may be a standby cluster, a newly built cluster, or another running cluster that is temporarily borrowed to stand in for the failed first cluster; service is switched back to the first cluster after it has been restored.

S305, taking at least one shared storage node on the second cluster as a target recovery node, and determining the address code and corresponding user information of the target recovery node.

In this step, when the required new shared storage cluster, i.e. the second cluster, already exists, at least one shared storage node on the second cluster is selected as the target recovery node.

Optionally, when the number of selected target recovery nodes equals the number of nodes to be recovered, each node to be recovered can be replicated to a target recovery node one-to-one, which enables fast switchover recovery.

Optionally, when the number of selected target recovery nodes is smaller than the number of nodes to be recovered, the shared storage nodes with the highest recovery priority may be selected. For example, the data of the leader (master) node in the cluster is recovered first, and the leader node then restores the remaining nodes to be recovered or replaces them by automatically generating new nodes. The advantage is that the cluster's master-slave relationship is preserved and the recovery procedure stays simple: once the initial node is recovered, the distributed system's own recovery mechanism can complete all remaining data recovery without an external recovery service.
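The leader-first selection described above can be sketched as a stable sort followed by truncation. The `is_leader` flag and the dictionary layout of a node are hypothetical; the embodiment only states that the leader node is recovered with priority.

```python
from typing import Any, Dict, List


def pick_nodes_to_restore(nodes: List[Dict[str, Any]],
                          target_count: int) -> List[Dict[str, Any]]:
    """Choose which nodes to restore first, placing leader nodes at the front."""
    # sorted() is stable, so non-leader nodes keep their original relative order.
    ordered = sorted(nodes, key=lambda node: not node.get("is_leader", False))
    return ordered[:target_count]
```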

After this step is completed, step S303 is continued.

S306, creating a second cluster through a preset management tool.

In this step, the preset management tool includes a containerization management tool.

Specifically, the connection information and the resource information of the containerized management tool are configured, so that the containerized management tool builds a second cluster according to the connection information and the resource information and starts the second cluster.

In this embodiment, the containerization management tool includes Kubernetes. Kubernetes is invoked to bring up the new shared storage cluster by configuring the Kubernetes-related connection information and resource information.
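To make the resource information concrete, the sketch below builds a Kubernetes StatefulSet manifest as a plain dictionary. It covers only the resource side and does not contact any API server; the choice of a StatefulSet, the container name, and all parameter values are assumptions for illustration, since the embodiment does not state which Kubernetes objects are used.

```python
from typing import Any, Dict


def build_etcd_statefulset(name: str, namespace: str, replicas: int,
                           image: str, cpu: str, memory: str) -> Dict[str, Any]:
    """Build a StatefulSet manifest describing the nodes of the second cluster."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "serviceName": name,
            "replicas": replicas,          # number of shared storage nodes
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "etcd",
                        "image": image,    # placeholder image reference
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }],
                },
            },
        },
    }
```

A manifest built this way could be serialized with `json.dumps` and submitted with `kubectl apply -f -`, with the connection information supplied through the usual kubeconfig.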

After this step is completed, step S305 is continued to be executed.
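Steps S301 to S306 together decide where the target recovery nodes come from. The branching can be summarized in a short sketch; the function and parameter names are hypothetical, and `create_second_cluster` stands in for whatever the preset management tool does in S306.

```python
from typing import Callable, List, Optional


def select_target_nodes(recover_first_cluster: bool,
                        failed_nodes: List[str],
                        second_cluster_nodes: Optional[List[str]],
                        create_second_cluster: Callable[[], List[str]]) -> List[str]:
    """Follow S301 to S306 to decide which nodes become target recovery nodes."""
    if recover_first_cluster:
        # S301 -> S302: recover in place on the failed first cluster.
        return list(failed_nodes)
    if second_cluster_nodes is None:
        # S304 -> S306: the second cluster is not built yet, so create it.
        second_cluster_nodes = create_second_cluster()
    # S305: use nodes of the second cluster as target recovery nodes.
    return list(second_cluster_nodes)
```

In every branch the flow then continues with S303, where the directory information of the target recovery data is determined.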

S307, executing a corresponding data recovery step according to the recovery mode information indicated by the configuration instruction.

In this embodiment, the recovery mode information includes at least: a first recovery mode and a second recovery mode. The first recovery mode is to recover the data by using the backup data, and the second recovery mode is to recover the data by using the normal operation data stored in the shared storage service cluster.

If the data recovery is performed in the first recovery mode, executing steps S308-S310;

if the data recovery is performed in the second recovery mode, steps S311-S314 are performed.

It should be noted that, in one possible manner, in order to ensure that the target recovery node is a usable node when performing data recovery, before performing this step, the method further includes:

utilizing a connectivity verification tool to perform connectivity verification on the target recovery node according to the address information;

if the target recovery node is reachable, this step continues, that is, the recovery mode information in the configuration instruction is evaluated and identified;

If the target recovery node is unreachable, an error identifier and the related error step or error position are collected into an error report log, which is returned to the distributed system for analysis and handling by the user.
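One simple form of connectivity verification is to try opening a TCP connection to the target recovery node. The sketch below assumes that reachability of a single port (port 22 by default, as would be used for SSH-based copying) is a sufficient check; the embodiment does not name the verification tool or the protocol.

```python
import socket


def is_reachable(host: str, port: int = 22, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```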

S308, copying the backup data and the node operation tool to each target recovery node according to each address information of each target recovery node and the first directory information corresponding to each target recovery node.

In this step, the node operation tool includes: an operation program file and a program configuration file, and the address information comprises: IP address and/or MAC address.

For ease of understanding, this embodiment will be described by taking an ETCD shared storage service cluster as an example.

In this embodiment, the node operation tool includes: ETCD service and ETCD configuration files.

Specifically, first, a new ETCD configuration file is generated in the local or distributed system, and then, backup data, related ETCD binary files (i.e., ETCD service programs) and ETCD configuration files are copied to each target node through an IP address and/or a MAC address.
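The copy step could be realized with any remote copy mechanism. As a sketch, the function below only builds one `scp` command per target node and executes nothing; the use of `scp`, the file names in any caller, and the remote directory are assumptions rather than details given in this embodiment.

```python
from typing import List


def build_copy_commands(node_ips: List[str], user: str, files: List[str],
                        remote_dir: str) -> List[List[str]]:
    """Build one scp command per target recovery node (commands are not run)."""
    return [["scp", *files, f"{user}@{ip}:{remote_dir}"] for ip in node_ips]
```

Each returned list can be passed to `subprocess.run(command, check=True)` once the backup data, the ETCD binary file, and the ETCD configuration file are available locally.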

S309, executing a node restoration task according to the backup data by using a node operation tool on each target restoration node.

In this embodiment, specifically, the snapshot restore command is executed on each node according to the backup data through the ETCD binary file and the ETCD configuration file, that is, the restore task is executed.
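For illustration, the snapshot restore command for one node can be assembled as shown below, using flags of `etcdctl snapshot restore`. The function name, the member names, and the paths are hypothetical; note also that newer etcd releases provide this subcommand through `etcdutl`, so the executable name may need to change with the installed version.

```python
from typing import Dict, List


def build_restore_command(snapshot: str, name: str, peers: Dict[str, str],
                          data_dir: str, token: str) -> List[str]:
    """Build the snapshot restore command for the member called `name`."""
    # peers maps each member name to its peer URL, e.g. "http://10.0.0.1:2380".
    initial_cluster = ",".join(f"{member}={url}" for member, url in peers.items())
    return [
        "etcdctl", "snapshot", "restore", snapshot,
        "--name", name,
        "--initial-cluster", initial_cluster,
        "--initial-cluster-token", token,
        "--initial-advertise-peer-urls", peers[name],
        "--data-dir", data_dir,
    ]
```

The same snapshot file is restored on every target recovery node, with only `--name` and `--initial-advertise-peer-urls` differing per node.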

S310, restarting each target recovery node to complete data recovery.

In this embodiment, after the restoration task is executed, the ETCD service of each target restoration node is started, that is, the data restoration is completed, so that the node restores to the normal service state.

S311, copying the second directory information and the operation tool to a target node according to the address information of the target recovery node.

In this step, the target recovery node includes the target node.

S312, starting the target node, and executing a node update task on the target node according to the second directory information by using a node operation tool so as to update the target node to be the original recovery node.

S313, executing the newly added node task according to the second directory information by using the node operation tool.

S314, starting the newly added shared storage node to execute the node update task.

It should be noted that S313-S314 are executed in a loop until the data recovery of the shared storage service cluster is completed, i.e. the number of newly built nodes reaches the preset number of nodes. Optionally, the value range of the preset node number is 1 to the total number of nodes of the failed shared storage service cluster. Optionally, the preset node number is a fault node number or a node number to be recovered.

In this embodiment, specifically, a new ETCD configuration file is first generated locally (i.e., in the distributed system). The original data directory file (i.e., the second directory information), the related ETCD binary file, and the ETCD configuration file are then copied to one node (i.e., the target node), and the copied data are used to start the ETCD service program on that single node. Next, the metadata information of the single node is changed through member update (i.e., the node update task) so that it differs from that of the original failed node (i.e., the node to be restored). New node information is then added through member add (i.e., the newly added node task), and the new node is started. If more nodes are required, member add and the start of the newly added ETCD node are repeated in sequence.
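The sequence of membership changes in the second recovery mode can be sketched as a list of commands, assuming that the update and add operations correspond to the `member update` and `member add` subcommands of `etcdctl`. The function name and arguments are hypothetical, and the sketch only builds the commands without running them.

```python
from typing import Dict, List


def build_membership_commands(first_member_id: str, first_peer_url: str,
                              extra_members: Dict[str, str]) -> List[List[str]]:
    """Build commands that update the first member, then add the other nodes."""
    # Node update task: change the peer URL recorded for the single started node.
    commands = [["etcdctl", "member", "update", first_member_id,
                 f"--peer-urls={first_peer_url}"]]
    for name, peer_url in extra_members.items():
        # Newly added node task: each added node is started before the next add.
        commands.append(["etcdctl", "member", "add", name,
                         f"--peer-urls={peer_url}"])
    return commands
```

The length of `extra_members` plays the role of the preset node number minus one, which bounds how many times the add-and-start cycle of S313 and S314 repeats.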

It should be further noted that, after step S310 or step S314 is completed, a cluster state self-check is performed before the distributed system resumes service, in order to confirm that the data recovery took effect and introduced no logic errors or logic conflicts, and thereby to avoid a secondary system crash. Optionally, the check is repeated at intervals until the recovery is confirmed to be complete. If the recovery succeeds, a success log is returned; if it fails, a failure result and the error log of the related steps are returned.
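The interval-based self-check amounts to polling a health probe until it passes or a retry budget is exhausted. The sketch below is generic: `check` is a caller-supplied probe, and the interval and attempt limit are hypothetical values, since the embodiment specifies neither.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], interval_s: float = 5.0,
                       max_attempts: int = 12,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() at a fixed interval; True once it passes, False on give-up."""
    for attempt in range(max_attempts):
        if check():
            return True
        if attempt < max_attempts - 1:
            sleep(interval_s)   # wait before the next check, but not after the last
    return False
```

The boolean result maps directly onto the two outcomes described above: a success log when it is `True`, and a failure result with the error log of the related steps when it is `False`.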

Fig. 4 is a schematic structural diagram of a data recovery device according to an embodiment of the present application. The data recovery apparatus 400 may be implemented in software, hardware, or a combination of both.

As shown in fig. 4, the data recovery apparatus 400 includes:

an obtaining module 401, configured to obtain a recovery configuration instruction;

a configuration module 402, configured to determine, in response to the restoration configuration instruction, address information of a target restoration node and restoration mode information, where the target restoration node includes some or all of the shared storage nodes in the shared storage service cluster, and the restoration mode information includes at least one restoration mode;

A recovery module 403, configured to determine target recovery data corresponding to the recovery mode information;

The recovery module 403 is further configured to copy the target recovery data and a node operation tool to the target recovery node according to the address information, so that the target recovery node completes data recovery.

In one possible design, the shared storage service cluster includes: the first cluster that has failed, the configuration module 402 is specifically configured to:

In one possible design, the shared storage service cluster further comprises: a second cluster that is not malfunctioning, the configuration module 402 is further configured to:

optionally, the configuration module 402 is further configured to:

judging whether the second cluster is constructed or not;

if not, creating the second cluster through a preset management tool;

Optionally, the preset management tool includes: a containerization management tool, the configuration module 402 is further configured to:

In one possible design, the recovery mode information includes: a first recovery mode and a second recovery mode, where the first recovery mode is to perform data recovery with backup data, the second recovery mode is to perform data recovery with normal operation data stored in the shared storage service cluster, and the recovery module 403 is specifically configured to:

In one possible design, if the recovery mode is the first recovery mode, the recovery module 403 is further configured to:

and restarting each target recovery node to complete data recovery.

In one possible design, if the recovery mode is the second recovery mode, the recovery module 403 is further configured to:

It should be noted that, the apparatus provided in the embodiment shown in fig. 4 may perform the method provided in any of the above method embodiments, and the specific implementation principles, technical features, explanation of terms, and technical effects are similar, and are not repeated herein.

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device 500 may include: at least one processor 501 and a memory 502. Fig. 5 shows an electronic device with one processor as an example.

A memory 502 for storing a program. In particular, the program may include program code including computer-operating instructions.

The memory 502 may comprise high-speed RAM and may further comprise non-volatile memory, such as at least one disk memory.

The processor 501 is configured to execute computer-executable instructions stored in the memory 502 to implement the methods described in the method embodiments above.

The processor 501 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.

Alternatively, the memory 502 may be separate or integrated with the processor 501. When the memory 502 is a device separate from the processor 501, the electronic device 500 may further include:

A bus 503 for connecting the processor 501 and the memory 502. The bus may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on, but this does not mean that there is only one bus or one type of bus.

Alternatively, in a specific implementation, if the memory 502 and the processor 501 are integrated on a chip, the memory 502 and the processor 501 may complete communication through an internal interface.

Embodiments of the present application also provide a computer-readable storage medium, which may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. Specifically, the computer-readable storage medium stores program instructions for the methods in the above method embodiments.

The embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the method of the above-described method embodiments.

Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims

1. A data recovery method applied to a shared storage service cluster, the shared storage service cluster including at least one shared storage node, the method comprising:

Determining address information and restoration mode information of a target restoration node in response to a restoration configuration instruction, wherein the target restoration node comprises part or all of the shared storage nodes in the shared storage service cluster, the restoration mode information comprises at least one restoration mode, the restoration configuration instruction is that a user inputs or selects the restoration mode at a control interface, and after setting partial related configuration parameters, a distributed system automatically complements the remaining configuration parameters to form a configuration file meeting preset requirements, and the restoration mode comprises backup file restoration or local data restoration;

2. The data recovery method according to claim 1, wherein the recovery mode information includes a first recovery mode, and the target recovery data corresponding to the first recovery mode includes: backup data backed up at a first preset time.

3. The data recovery method of claim 2, wherein the backup data comprises: the first shared data automatically backed up at a preset time, and the second shared data manually backed up by the user at any time.

4. A data recovery method according to any one of claims 1 to 3, wherein the recovery mode information includes a second recovery mode, and the target recovery data corresponding to the second recovery mode includes: normal operation data stored in the shared storage service cluster at a second preset moment.

5. The method of claim 4, wherein the second preset moment comprises: the last moment of normal operation of the shared storage service cluster.

6. The data recovery method of claim 4, wherein the normal operation data comprises: shared data stored in real time by at least one of the shared storage nodes during normal operation.

7. The data recovery method of claim 4, wherein the shared storage service cluster comprises: the first cluster having failed, in response to the recovery configuration instruction, determining address information and recovery mode information of the target recovery node, including:

8. The data recovery method of claim 7, wherein the shared storage service cluster further comprises: and after the judging whether the configuration instruction indicates to perform data recovery on the first cluster, the method further comprises:

9. The data recovery method of claim 8, further comprising, prior to said taking at least one shared storage node on said second cluster as said target recovery node:

judging whether the second cluster is constructed or not;

if not, creating the second cluster through a preset management tool;

10. The data recovery method according to claim 9, wherein the preset management tool includes a containerization management tool, and the creating the second cluster through a preset management tool comprises:

11. The data recovery method according to claim 7, wherein the recovery mode information includes: a first recovery mode and a second recovery mode, where the first recovery mode is to recover data with backup data, the second recovery mode is to recover data with normal operation data stored in the shared storage service cluster, and the determining directory information of the target recovery data according to the recovery mode information indicated by the configuration instruction includes:

12. The method of claim 11, wherein if the recovery mode is the first recovery mode, copying the target recovery data and the node operation tool to the target recovery node according to the address information, so that the target recovery node completes data recovery, comprising:

and restarting each target recovery node to complete data recovery.

13. The method of claim 11, wherein if the recovery mode is the second recovery mode, copying the target recovery data and the node operation tool to the target recovery node according to the address information, so that the target recovery node completes data recovery, comprising:

14. A data recovery apparatus, comprising:

The recovery module is further configured to copy the target recovery data and the node operation tool to the target recovery node according to the address information, so that the target recovery node completes data recovery, the recovery configuration instruction is that after a user inputs or selects a recovery mode at a control interface and sets a part of relevant configuration parameters, the distributed system automatically complements the remaining configuration parameters to form a configuration file meeting preset requirements, and the recovery mode includes backup file recovery or local data recovery.

15. An electronic device, comprising: a processor and a memory; wherein,

The memory is used for storing a computer program of the processor;

the processor is configured to perform the data recovery method of any one of claims 1 to 13 via execution of the computer program.

16. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the data recovery method of any one of claims 1 to 13.

17. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the data recovery method of any one of claims 1 to 13.