
CN111158949A - Disaster-tolerant architecture configuration method, switching method, device, device, and storage medium - Google Patents

Disaster-tolerant architecture configuration method, switching method, device, device, and storage medium

Info

Publication number
CN111158949A
Authority
CN
China
Prior art keywords
cloud
disaster recovery
ecs
availability zone
production
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811318053.2A
Other languages
Chinese (zh)
Inventor
秦可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Chongqing Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Chongqing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Chongqing Co Ltd
Priority to CN201811318053.2A
Publication of CN111158949A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1479: Generic software techniques for error detection or fault masking
    • G06F11/1489: Generic software techniques for error detection or fault masking through recovery blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Stored Programmes (AREA)

Abstract

本发明公开了一种容灾架构的配置方法、切换方法及装置、设备和存储介质。该配置方法包括:根据用户需求规格,在云上容灾架构的生产可用区和容灾可用区中分别创建规格相同的云主机;在生产可用区的云主机ECS、和容灾可用区的对应云主机ECS’中分别绑定相同的业务地址,并在ECS中绑定管理地址IP、ECS’中绑定管理地址IP’;分别在ECS和ECS’中加载用于应用数据同步的第一智能引擎;将ECS的业务地址和IP设置为激活状态,将ECS’的业务地址设置为非激活状态,将IP’设置为激活状态。利用本发明可在容灾环境为用户创建与生产环境一致的云主机,并且在上述对应的云主机中分别绑定相同的业务地址,可以无需修改IP地址相关代码或配置文件,实现容灾的自动化无缝切换。


The invention discloses a configuration method, a switching method, an apparatus, a device, and a storage medium for a disaster tolerance architecture. The configuration method includes: creating cloud hosts with the same specification in the production availability zone and the disaster recovery availability zone of an on-cloud disaster recovery architecture according to the user requirement specification; binding the same service address to the cloud host ECS in the production availability zone and to the corresponding cloud host ECS' in the disaster recovery availability zone, and binding a management address IP to the ECS and a management address IP' to the ECS'; loading a first intelligent engine for application data synchronization in the ECS and the ECS' respectively; and setting the service address and IP of the ECS to the active state, the service address of the ECS' to the inactive state, and IP' to the active state. The invention can create, in the disaster recovery environment, cloud hosts consistent with the production environment for the user and bind the same service address to the corresponding cloud hosts, so that automated, seamless disaster recovery switching is achieved without modifying IP-address-related code or configuration files.


Description

Configuration method, switching method and device of disaster recovery architecture, equipment and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a configuration method of a disaster recovery architecture, a disaster recovery switching method and apparatus, a device, and a storage medium.
Background
At present, disaster tolerance for cloud service systems is mainly achieved with a dual-active solution based on global server load balancing (GSLB). Disaster tolerance means protecting users' applications and data from faults and disasters so that they remain continuously usable. The solution binds cloud hosts (ECSs) in different available areas under one GSLB instance and deploys the service system applications across those ECSs in a distributed manner, so that the failure of a single available area does not make the external service unavailable.
In addition, data disaster recovery in the same city is realized through database services spanning multiple available areas, wherein the available areas refer to one or more data centers isolated from each other by infrastructures such as power and networks in the same area.
However, a business system can adopt this disaster recovery architecture only if the following preconditions hold: (1) the public or private cloud provides global load balancing; (2) a database service is available that can synchronize data between different available areas; (3) the business system on the cloud adopts a distributed + stateless architecture design.
It should be noted that, although the distributed + stateless architecture is an excellent high-availability design for Information Technology (IT) systems, many IT systems still adopt a traditional architecture that is non-distributed or stateful. When such business systems are deployed in a public or private cloud (especially a public cloud), it is difficult to protect them with a dual-active solution based on global load balancing.
In addition, many service systems depend on Internet Protocol (IP) addresses for their internal communication. For example, when data is transferred between servers using the File Transfer Protocol (FTP), the IP address of at least one of the two communicating parties must be fixed; otherwise, the code or configuration files that reference the IP address must be modified.
However, a dual-active disaster recovery system based on global load balancing often needs to deploy a service system in two available areas. Due to the inconsistency of the gateways of the two available areas, it is difficult to realize that the services of different available areas can use the same IP address when a disaster occurs, and at this time, the relevant codes or configurations of the IP addresses are often required to be changed to realize the normal operation of the service system.
In summary, a method is required for realizing automatic seamless switching of disaster tolerance and improving disaster tolerance switching efficiency without modifying IP address related codes or configuration files.
Disclosure of Invention
The embodiment of the invention provides a configuration method of a disaster recovery architecture, a disaster recovery switching method, a device thereof, equipment thereof and a storage medium, which can realize the deployment of a production environment and a disaster recovery environment and realize the synchronization of data of the production environment to the disaster recovery environment.
In a first aspect, an embodiment of the present invention provides a method for configuring a disaster recovery architecture, where the method includes:
according to the specification of user requirements, respectively creating cloud hosts with the same specification in a production available area and a disaster recovery available area of a cloud disaster recovery architecture;
respectively binding the same service address in a cloud host ECS of the production available area and the corresponding cloud host ECS' of the disaster recovery available area, and binding a management address IP in the ECS and a management address IP' in the ECS';
loading a first smart engine for application data synchronization in the ECS and ECS', respectively;
and setting the service address and the IP of the ECS to be in an activated state, setting the service address of the ECS 'to be in an inactivated state, and setting the IP' to be in an activated state.
The configuration method of the disaster recovery architecture according to the present invention further comprises:
respectively creating cloud databases with the same specification in a production available area and a disaster recovery available area;
and respectively loading a second intelligent engine for data synchronization in the cloud database RDS of the production available area and the cloud database RDS' of the disaster tolerance available area.
The configuration method of the disaster recovery architecture on the cloud according to the present invention further includes:
and creating load balancing instances with the same specification in the production available area and the disaster recovery available area, setting the load balancing instance of the production available area to an activated state, and setting the load balancing instance of the disaster recovery available area to an inactivated state.
The configuration method of the disaster recovery architecture on the cloud according to the present invention further includes:
creating an elastic block storage EBS in a production available area, creating an elastic block storage EBS 'with the same specification in a disaster tolerance available area, completing storage mounting of the ECS and the RDS in the EBS, and completing storage mounting of the ECS' and the RDS 'in the EBS'.
According to the configuration method of the disaster recovery architecture on the cloud of the present invention, before the cloud hosts with the same specification are respectively created in the production available area and the disaster recovery available area of the disaster recovery architecture on the cloud according to the specification of the user requirement, the method further includes:
and formulating, according to the disaster recovery requirement of the service system to be migrated to the cloud, a cloud service deployment scheme for that service system covering the production available area and the disaster recovery available area.
According to the configuration method of the disaster recovery architecture on the cloud of the present invention, after the cloud hosts with the same specification are respectively created in the production available area and the disaster recovery available area of the disaster recovery architecture on the cloud according to the specification of the user requirement, the method further includes:
and the configuration data is synchronously issued in the production available area and the disaster recovery available area to complete the synchronization of the configuration data between the ECS and the ECS'.
According to the configuration method of the disaster recovery architecture on the cloud of the present invention, after the synchronization of the configuration data between the ECS and the ECS' is completed by the manner of synchronously issuing the configuration data in the production available area and the disaster recovery available area, the method further includes:
adding a transmitting end mark and a receiving end management address in a first intelligent engine of the ECS, and adding a receiving end mark and a transmitting end management address in a first intelligent engine of the ECS';
a first intelligent engine of the ECS rewrites a data operation command and an Application Program Interface (API) of an operating system, generates a data operation log according to the data operation command and the API, and sends a data synchronization request to the ECS';
and the ECS 'verifies whether the originating address is an authorized address, establishes connection between the ECS and the ECS' after the verification is passed, and completes synchronization of the data operation log.
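The three steps above (marking the ends, logging data operations, and verifying the originating address before synchronizing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented engine: the class `SyncEngine`, its field names, and the example addresses are all hypothetical, and "authorized" is simplified to mean "equal to the configured peer management address".

```python
from dataclasses import dataclass, field


@dataclass
class SyncEngine:
    """Hypothetical stand-in for the first intelligent engine on one cloud host."""
    role: str        # "sender" or "receiver" (the end mark)
    own_addr: str    # this host's management address
    peer_addr: str   # management address of the other end
    log: list = field(default_factory=list)

    def record(self, command):
        """Sender side: append a captured data operation to the operation log."""
        if self.role != "sender":
            raise RuntimeError("only the sending end records operations")
        self.log.append(command)

    def accept(self, origin_addr, entries):
        """Receiver side: verify the originating address, then take over new log entries."""
        if self.role != "receiver":
            raise RuntimeError("only the receiving end accepts sync requests")
        if origin_addr != self.peer_addr:
            return False  # unauthorized origin: no connection, no sync
        # Assumes the receiver's log is a prefix of the sender's log.
        self.log.extend(entries[len(self.log):])
        return True


# ECS marked as sender, ECS' marked as receiver (addresses are made up).
sender = SyncEngine("sender", "192.168.1.10", "192.168.2.10")
receiver = SyncEngine("receiver", "192.168.2.10", "192.168.1.10")

sender.record("write /data/a.txt")
sender.record("delete /data/b.txt")

ok = receiver.accept(sender.own_addr, sender.log)
rejected = receiver.accept("203.0.113.7", ["write /data/x.txt"])
```

In this sketch a request from an unknown address is simply refused, which mirrors the verification step; a real implementation would also need transport, ordering, and retry handling that the text does not specify.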
According to the configuration method of the disaster recovery architecture on the cloud of the present invention, after the cloud databases with the same specification are respectively created in the production available area and the disaster recovery available area, the method further includes:
adding a transmitting end mark and a receiving end management address in a second intelligent engine of the RDS, and adding a receiving end mark and a transmitting end management address in a second intelligent engine of the RDS';
a second intelligent engine of the RDS duplicates the data operation commands and application program interface (API) calls of the database software and generates a data execution log from them, and the RDS sends a data synchronization request to the RDS';
the RDS 'verifies whether the originating address is an authorized address, establishes connection between the RDS and the RDS' after the verification is passed, and completes synchronization of the data execution log.
According to the configuration method of the disaster recovery architecture on the cloud, after the load balancing instances with the same specification are created in the production available area and the disaster recovery available area, the method further includes:
completing load balancing strategy configuration on the load balancing instance of the production available area, and reporting the load balancing strategy configuration information to a Software Defined Network (SDN) controller of the production available area;
the SDN controller of the production available area generates a load balancing configuration metadata message according to the load balancing strategy configuration information, and synchronizes the metadata message to the SDN controller of the disaster tolerance available area;
and the SDN controller of the disaster recovery available area issues the metadata message to the load balancing instance of the disaster recovery available area, completing the synchronization of production data between the load balancing instances of the production available area and the disaster recovery available area.
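The controller-to-controller flow described above can be illustrated with a small sketch. It assumes the metadata message is a JSON document and that the two controllers hold a direct reference to each other; the classes `LoadBalancer` and `SdnController`, the message fields, and the example policy are hypothetical and are not taken from the source.

```python
import json


class LoadBalancer:
    """Hypothetical load balancing instance in one available area."""
    def __init__(self, zone, active):
        self.zone = zone
        self.active = active
        self.policy = {}


class SdnController:
    """Hypothetical SDN controller that relays load balancing configuration."""
    def __init__(self, zone, load_balancer):
        self.zone = zone
        self.lb = load_balancer
        self.peer = None  # controller of the other available area

    def report_policy(self, policy):
        # The local instance has been configured; build a metadata message
        # and synchronize it to the peer controller.
        self.lb.policy = dict(policy)
        message = json.dumps({"source_zone": self.zone, "policy": policy}, sort_keys=True)
        if self.peer is not None:
            self.peer.receive_metadata(message)

    def receive_metadata(self, message):
        # Issue the received configuration to the local load balancing instance.
        self.lb.policy = json.loads(message)["policy"]


prod_lb = LoadBalancer("production", active=True)
dr_lb = LoadBalancer("disaster-recovery", active=False)
prod_ctl = SdnController("production", prod_lb)
dr_ctl = SdnController("disaster-recovery", dr_lb)
prod_ctl.peer = dr_ctl

prod_ctl.report_policy({"algorithm": "round_robin", "listen_port": 443})
```

Note that the disaster recovery instance receives the same policy while staying inactive, so it is ready to take over without reconfiguration.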
In a second aspect, an embodiment of the present invention provides a disaster recovery switching method, for a cloud disaster recovery architecture configured according to the method described above, including:
when the production available area is judged to be unavailable, the production environment identification of the production available area is cancelled, and the production environment identification is added to the disaster recovery available area;
adding a sending end mark and a receiving end management address in a second intelligent engine of a cloud database of a disaster recovery available area;
adding a sending end mark and a receiving end management address in a first intelligent engine of a cloud host of a disaster tolerance available area, informing a Software Defined Network (SDN) controller of the disaster tolerance available area, and activating a service address of the cloud host of the disaster tolerance available area;
and informing an SDN controller of the disaster tolerance available area, and activating a load balancing instance of the disaster tolerance available area.
According to the disaster recovery switching method, after notifying the SDN controller of the disaster recovery available area to activate the load balancing instance of the disaster recovery available area, the method further includes:
when the production available area is recovered to be normal, informing an SDN controller of the production available area, setting a service address and a load balancing instance of a cloud host of the production available area to be in an inactive state, and adding a receiving end mark and a sending end management address in a second intelligent engine of a cloud database of the production available area and a first intelligent engine of the cloud host to realize data synchronization with the disaster recovery available area.
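The switching sequence and the recovery step described above can be sketched as a small state model. This is only an illustration under assumptions: the `Zone` class and its boolean fields are hypothetical, notifications to the SDN controller are reduced to direct field assignments, and how unavailability is detected is left out because the text does not specify it.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical state of one available area."""
    name: str
    is_production: bool        # production environment identification
    db_engine_role: str        # role of the second intelligent engine (cloud database)
    host_engine_role: str      # role of the first intelligent engine (cloud host)
    service_addr_active: bool  # state of the cloud host service address
    lb_active: bool            # state of the load balancing instance
    available: bool = True


def switch_over(prod, dr):
    """Promote the disaster recovery area when the production area is unavailable."""
    if prod.available:
        return False
    prod.is_production = False
    dr.is_production = True
    dr.db_engine_role = "sender"
    dr.host_engine_role = "sender"
    dr.service_addr_active = True   # same service address, now active in the DR area
    dr.lb_active = True
    return True


def on_production_recovered(prod):
    """Bring the recovered area back as the receiving end of synchronization."""
    prod.available = True
    prod.service_addr_active = False
    prod.lb_active = False
    prod.db_engine_role = "receiver"
    prod.host_engine_role = "receiver"


prod = Zone("AZ-1", True, "sender", "sender", True, True)
dr = Zone("AZ-2", False, "receiver", "receiver", False, False)

no_switch = switch_over(prod, dr)   # production still available
prod.available = False
switched = switch_over(prod, dr)
on_production_recovered(prod)
```

Because both areas bind the same service address, the switch only flips activation flags and engine roles; nothing in the application's own IP configuration changes.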
In a third aspect, an embodiment of the present invention provides a configuration device for a disaster recovery architecture on a cloud, where the configuration device includes:
the first establishing module is used for respectively creating cloud hosts with the same specification in a production available area and a disaster recovery available area of a disaster recovery architecture on the cloud according to a user requirement specification;
the binding module is used for binding the same service address in the cloud host ECS of the production available area and the corresponding cloud host ECS' of the disaster recovery available area respectively, and binding the management address IP in the ECS and the management address IP' in the ECS';
the first loading module is used for loading a first intelligent engine for application data synchronization in the ECS and the ECS' respectively;
the setting module is used for setting the service address and the IP of the ECS to be in an activated state, setting the service address of the ECS 'to be in an inactivated state and setting the IP' to be in an activated state.
In a fourth aspect, an embodiment of the present invention provides a disaster recovery switching device, where the switching device includes:
the judging module is used for canceling the production environment identification of the production available area and adding the production environment identification in the disaster recovery available area when the production available area is judged to be unavailable;
the first adding module is used for adding a sending end mark and a receiving end management address in a second intelligent engine of a cloud database of a disaster recovery available area;
the second adding module is used for adding a sending end mark and a receiving end management address in a first intelligent engine of a cloud host of the disaster tolerance available area, informing a Software Defined Network (SDN) controller of the disaster tolerance available area and activating a service address of the cloud host of the disaster tolerance available area;
and the activation module is used for notifying an SDN controller of the disaster tolerance available area and activating a load balancing instance of the disaster tolerance available area.
In a fifth aspect, an embodiment of the present invention provides a device in a cloud disaster recovery architecture, where the device includes: at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the method of the first aspect of the embodiments described above or the method of the second aspect of the embodiments described above.
In a sixth aspect, embodiments of the present invention provide a computer-readable storage medium, on which computer program instructions are stored, which, when executed by a processor, implement the method of the first aspect in the above-mentioned implementation mode or the method of the second aspect in the above-mentioned implementation mode.
According to the scheme provided by the invention, when the user selects the disaster tolerance capability provided by the cloud platform, the cloud host consistent with the production environment is automatically created for the user in the disaster tolerance environment, the same service addresses are respectively bound in the cloud host of the production available area and the corresponding cloud host of the disaster tolerance available area, and the intelligent engines are respectively loaded in the two available areas, so that when subsequent disaster tolerance switching is carried out, the related codes or configuration files of IP addresses do not need to be modified, the automatic seamless switching of disaster tolerance can be realized, and the disaster tolerance switching efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a configuration method of a disaster recovery architecture on a cloud based on a software solution according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a dual-active disaster recovery system based on global load balancing according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a cloud disaster recovery system based on a software solution according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a cloud disaster recovery function architecture based on a software solution according to an embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating a load balancing data synchronization mechanism according to an embodiment of the invention;
Fig. 6 is a schematic diagram illustrating an application data synchronization mechanism of a cloud host according to an embodiment of the invention;
Fig. 7 is a schematic diagram illustrating a data synchronization mechanism of a cloud database according to an embodiment of the invention;
Fig. 8 is a flowchart illustrating a disaster recovery switching method according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a configuration device of a disaster recovery architecture based on a software solution according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a disaster recovery switching device according to an embodiment of the present invention;
Fig. 11 is a hardware configuration diagram of the apparatus according to the embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The invention aims to automatically create, when a user selects the disaster recovery capability provided by a cloud platform, a cloud host in the disaster recovery environment that is consistent with the one in the production environment. The same service address is bound in the cloud host of the production available area and in the corresponding cloud host of the disaster recovery available area, and an intelligent engine is loaded in each of the two available areas. As a result, subsequent disaster recovery switching requires no change to IP-address-related code or configuration files, switching is automatic and seamless, and switching efficiency is improved. Various aspects of the invention are described in detail below.
< public cloud >
A public cloud generally refers to a shared resource service such as computing power, storage power, network power, database power, etc., which is provided by a third-party provider for unspecified users and can be directly accessed through the internet.
< private cloud >
A private cloud is a proprietary resource of computing, storage, networks, databases, etc. that is built for individual use by a client.
< elastic computing Server >
An Elastic Computing Server (ECS) (sometimes also referred to as a cloud host) is a simple and efficient computing Service with elastically scalable processing capability, helps a user to quickly construct a more stable and secure application, improves operation and maintenance efficiency, reduces Information Technology (IT) cost, and enables the user to concentrate on core business innovation.
< Global load Balancing >
Global Load Balancing (GSLB) distributes traffic among servers in different regions over a wide area network (including the Internet) and ensures that each client is served by the optimal server nearest to it, thereby ensuring access quality.
< Server load Balancing >
Server Load Balance (SLB) distributes traffic across multiple ECSs to improve the service capability of an application system, and has long served as the entry point of key service systems.
< relational database service >
A Relational Database Service (RDS; also Relational Database, RDB) is an online database service that is ready to use, stable, reliable, and flexible. It has multiple security protection measures and a complete performance monitoring system, and provides professional database backup, recovery, and optimization schemes.
< elastic Block storage >
Elastic Block Store (EBS) is a Block level data Store provided for cloud server instances.
< software-defined network controller >
A Software Defined Network (SDN) controller is an application in a software-defined network that is responsible for flow control, enabling an intelligent network.
Based on the above, an embodiment of the present invention may provide a method for configuring a cloud disaster recovery architecture based on a software scheme, and referring to fig. 1, fig. 1 shows a schematic flow chart of a method 100 for configuring a cloud disaster recovery architecture based on a software scheme according to an embodiment of the present invention, where the method includes:
S110: respectively creating cloud hosts with the same specification in a production available area and a disaster recovery available area of a disaster recovery architecture on the cloud according to the specification required by a user;
S120: respectively binding the same service address in the cloud host ECS of the production available area and the corresponding cloud host ECS' of the disaster recovery available area, and binding a management address IP in the ECS and a management address IP' in the ECS';
S130: loading a first intelligent engine for application data synchronization in the ECS and the ECS' respectively;
S140: setting the service address and the IP of the ECS to an activated state, setting the service address of the ECS' to an inactivated state, and setting the IP' to an activated state.
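Steps S110 to S140 can be made concrete with a short sketch. It is a minimal model under assumptions, not the cloud platform's API: the classes `Address` and `CloudHost`, the function `configure_pair`, the engine label, the specification string, and the IP addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    ip: str
    active: bool = False


@dataclass
class CloudHost:
    zone: str
    spec: str
    service_addr: Address
    mgmt_addr: Address
    engines: list = field(default_factory=list)


def configure_pair(spec, service_ip, mgmt_ip, mgmt_ip_dr):
    # S110: create hosts with the same specification in both available areas.
    # S120: bind the same service address, but distinct management addresses.
    ecs = CloudHost("production", spec, Address(service_ip), Address(mgmt_ip))
    ecs_dr = CloudHost("disaster-recovery", spec, Address(service_ip), Address(mgmt_ip_dr))

    # S130: load the engine used for application data synchronization on both hosts.
    for host in (ecs, ecs_dr):
        host.engines.append("app-data-sync")

    # S140: only the production service address is active; both management
    # addresses are active so the two hosts can reach each other.
    ecs.service_addr.active = True
    ecs.mgmt_addr.active = True
    ecs_dr.service_addr.active = False
    ecs_dr.mgmt_addr.active = True
    return ecs, ecs_dr


ecs, ecs_dr = configure_pair("4c8g", "10.0.0.10", "192.168.1.10", "192.168.2.10")
```

Keeping the duplicate service address inactive on ECS' is what lets both hosts hold the same address without conflict until a switch occurs.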
By utilizing the scheme provided by the invention, when the user selects the disaster tolerance capability provided by the cloud platform, the cloud host consistent with the production environment can be automatically created for the user in the disaster tolerance environment, the same service addresses are respectively bound in the cloud host of the production available area and the corresponding cloud host of the disaster tolerance available area, and the intelligent engines are respectively loaded in the two available areas, so that the automatic seamless switching of disaster tolerance can be realized without modifying IP address related codes or configuration files during the subsequent disaster tolerance switching, and the disaster tolerance switching efficiency is improved.
The following describes, by way of specific examples, alternative specific processes of embodiments of the present invention. It should be noted that the scheme of the present invention does not depend on a specific software scheme, and in practical applications, any known or unknown software, algorithm, program, or any combination thereof may be used to implement the scheme of the present invention, and the scheme of the present invention is within the protection scope of the present invention as long as the essential idea of the scheme of the present invention is adopted.
Referring to fig. 2, fig. 2 is a schematic structural diagram illustrating a dual-active disaster recovery system based on global load balancing according to an embodiment of the present invention.
At present, disaster recovery for cloud service systems is mainly achieved with a dual-active solution based on Global Server Load Balance (GSLB).
In the dual-active solution, two data centers simultaneously provide production services to the outside. The two data centers are peers, with no master and slave, and services can be deployed in both at the same time, which can greatly improve resource utilization as well as the working efficiency and performance of the system.
As an example, referring to fig. 2, ECSs (Elastic Computing Servers) in different available areas are bound under a global load balancing instance, and the service system applications are deployed across the ECSs in a distributed manner, so as to avoid unavailability of external services due to the failure of a single available area. Data disaster recovery within the same city is realized through a Relational Database Service (RDS) spanning multiple available areas.
An available area (Availability Zone) refers to one or more data centers in the same region whose infrastructures, such as power and network, are isolated from one another.
It should be noted that, in the above dual-active solution, the precondition for the service system to adopt the disaster-tolerant architecture includes the following items: first, a public cloud or a private cloud can provide global load balancing; second, public or private clouds can provide RDS that can enable data synchronization between different available zones; third, the cloud-based business system must adopt a distributed + stateless architecture design.
The public cloud refers to shared resource services such as computing power, storage power, network power, database power and the like which are provided by a third-party provider for unspecified users and can be directly accessed through the Internet. And the private cloud refers to a proprietary resource service such as computing capacity, storage capacity, network capacity, database capacity and the like which is constructed for being used by one customer alone.
In addition, whether the cloud is public or private, there is a common application scenario: providing disaster tolerance capability for the service systems deployed on the cloud, so as to improve their service continuity when a disaster occurs.
As shown in fig. 2, the dual-active disaster recovery system based on global load balancing provides a good disaster recovery scheme for service systems with a distributed + stateless architecture that are migrated to the cloud, but the scheme currently has the following technical defects:
First, the distributed + stateless architecture is a good high availability design for Information Technology (IT) systems. However, a large number of IT systems still adopt a non-distributed or stateful traditional architecture. When such a traditional architecture is deployed in a public cloud or a private cloud, especially in a public cloud, it is difficult to protect the service system with a dual-active disaster recovery system based on global load balancing.
A stateless service is a service that keeps no request-specific state. The Web server processes every request uniformly and without distinction: it is only responsible for processing the data submitted with each user request and returning the processing result. The Web server itself does not store any data related to the request (such as an IP address, an account number, or a password); such data is stored in a background database server or cluster. When any one Web server fails, the other servers can obtain the information from the background database server or cluster, so the request continues to be served without interruption.
Stateful refers to the retention of previously requested information in the server for processing of the current request. Therefore, when any one Web server fails, the other Web servers need to process the request from the beginning because they do not have any data associated with the request.
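The difference between the two designs under a server failure can be illustrated with a minimal sketch. All class and function names below are hypothetical and do not come from the embodiment; the shared store simply stands in for the background database server or cluster, and the per-user request counter stands in for request-related data.

```python
class SessionStore:
    """Stands in for the background database server or cluster."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, 0)

    def put(self, session_id, value):
        self._data[session_id] = value


class StatelessServer:
    """Keeps no request data locally; everything lives in the shared store."""

    def __init__(self, store):
        self.store = store

    def handle(self, session_id):
        count = self.store.get(session_id) + 1
        self.store.put(session_id, count)
        return count


class StatefulServer:
    """Keeps request data in its own memory; the data is lost if it fails."""

    def __init__(self):
        self.local = {}

    def handle(self, session_id):
        self.local[session_id] = self.local.get(session_id, 0) + 1
        return self.local[session_id]


def demo():
    # Stateless: server a handles two requests, then "fails"; b takes over.
    store = SessionStore()
    a, b = StatelessServer(store), StatelessServer(store)
    a.handle("u1")
    a.handle("u1")
    stateless_after_failover = b.handle("u1")

    # Stateful: server c handles two requests, then "fails"; d starts over.
    c, d = StatefulServer(), StatefulServer()
    c.handle("u1")
    c.handle("u1")
    stateful_after_failover = d.handle("u1")
    return stateless_after_failover, stateful_after_failover
```

In this sketch the stateless replacement server continues the count from the shared store, while the stateful replacement server has to start from the beginning, which is the behavior described above.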
A distributed system has more than one server and is a software system built on top of a network. Because of the nature of software, distributed systems have a high degree of cohesion and transparency. Cohesion means that each database distribution node is highly autonomous and has a local database management system; transparency means that each database distribution node is transparent to the user's application, which cannot tell whether a node is local or remote. In a distributed database system, a user does not feel that data is distributed, i.e., the user does not need to know whether a relationship is split, whether there is a copy, where the data resides, on which site a transaction is executed, etc.
In summary, the stateless + distributed architecture is a very good high availability design solution for IT systems. However, in current practice, apart from some large internet companies that adopt the stateless architecture, a great number of IT systems still adopt the stateful traditional architecture, so the applicability of stateless designs is limited to a certain extent.
Second, a large number of service systems depend on Internet Protocol (IP) addresses for internal communication. For example, when data is transmitted across servers by means of the File Transfer Protocol (FTP), the IP address of at least one of the two communicating parties must be fixed; otherwise, the IP address related code or configuration file needs to be modified.
However, a dual-active disaster recovery system based on global load balancing usually needs to deploy the service system in two available areas, which can be simply understood as two different machine rooms. Because the gateways of the two available areas are inconsistent, it is difficult for the services in different available areas to use the same IP address when a disaster occurs, and the IP address related code or configuration often has to be changed before the service system can operate normally.
In summary, to solve the above technical problems, embodiments of the present invention provide a method and an apparatus for implementing cloud disaster tolerance based on a Software Defined Network (SDN). The core of the scheme is that the network is redefined in software, the disaster tolerance capability of the service system is managed in a unified manner by a cloud disaster tolerance intelligent management and control device (hereinafter referred to as the intelligent management and control device), and data synchronization is realized by an intelligent engine. When a user selects the disaster tolerance capability provided by the cloud platform, the intelligent management and control device automatically creates for the user, in the disaster tolerance environment, services that are completely consistent with the production environment, including cloud hosts (with the same Central Processing Unit (CPU), memory size, operating system, IP address, and other system configurations), block storage, databases, and load balancing, and at the same time starts the intelligent engine to synchronize data to the disaster tolerance side. When disaster tolerance is triggered, the system automatically activates the disaster tolerance environment, which inherits the IP addresses and other configuration of the original production environment, so that the service continuity of the service system is ensured without changing the system configuration.
Referring to fig. 3, fig. 3 is a schematic structural diagram illustrating a cloud disaster recovery system based on a software scheme according to an embodiment of the present invention.
In fig. 3, the cloud disaster recovery system according to the embodiment of the present invention includes an available area a and an available area B, where the available area a serves as a production environment; the available area B is used as disaster recovery environment and is not activated.
As an example, in the disaster recovery architecture shown in FIG. 3, the available area A includes the cloud products SLB, ECS-1, ECS-2, RDS, and EBS. In the available area A, the service system distributes requests to ECS-1 and ECS-2 through the load balancing SLB, and ECS-1, ECS-2, and RDS all use EBS to store data, including various types of state data and service data, so that the available area A can be used as the production environment. Meanwhile, the same cloud services as in the available area A, including SLB, ECS-1', ECS-2', RDS, and EBS, are automatically created in the available area B, so that the available area B can be used as the disaster recovery environment. The data of the available area B is kept consistent with that of the available area A, while the available area B remains in a to-be-activated state. When the intelligent management and control device detects that the disaster recovery switching condition is met, it activates the services of the available area B and provides them to the outside, thereby guaranteeing service continuity. In other words, the available area B stays in the to-be-activated state until the intelligent management and control device detects that the disaster recovery switching condition is met.
The disaster recovery architecture provided by the embodiment of the invention can be offered as a capability of a public cloud or a private cloud, and can provide a general disaster recovery solution for a service system of any architecture that is migrated to the cloud, thereby supplementing the dual-active disaster recovery system.
Referring to fig. 4, fig. 4 is a schematic structural diagram illustrating a cloud disaster recovery functional architecture based on a software scheme according to an embodiment of the present invention.
In fig. 4, a cloud disaster recovery function architecture according to an embodiment of the present invention includes a cloud management platform, an intelligent management and control device, an available area A, and an available area B. The intelligent management and control device comprises a disaster tolerance function management and scheduling module, a cloud management platform scheduling module, and an intelligent engine management and scheduling module. The available area A is used as a production environment, and the available area B is used as a disaster recovery environment. Each available area includes an SDN controller, a computing resource pool, a network resource pool, a database resource pool, and a storage resource pool. A resource pool refers to a collection of the various hardware and software involved in the cloud computing data center. The computing resource pool may include a plurality of ECSs, each of which includes an intelligent engine; for example, the computing resource pool of the available area A may include ECS-1 and ECS-2, and the computing resource pool of the available area B may include ECS-1' and ECS-2'. The network resource pool may include an SLB; the database resource pool may include a plurality of RDSs, each of which includes an intelligent engine; and the storage resource pool may include EBSs.
The core functions of the cloud disaster recovery architecture shown in fig. 4 are mainly implemented by an intelligent management and control device, an SDN controller, and an intelligent engine.
The intelligent management and control device mainly realizes application layer requirement analysis, control layer instruction issuing and disaster tolerance intelligent management; the SDN controller mainly realizes automatic configuration of network functions according to instructions of the intelligent management and control device; the intelligent engine mainly realizes the production data modularization synchronization function.
< automated deployment of cloud disaster tolerance environment >
The cloud service requirement analysis and logic control according to the embodiment of the present invention will be described with reference to fig. 4.
First, a user submits a Cloud service requirement through a Cloud Management Platform (CMP), and the CMP distributes the Cloud service requirement to an intelligent Management and control device. Among them, the CMP is a product providing integrated management of public cloud, private cloud, and hybrid cloud.
Secondly, when the intelligent management and control device receives the cloud service requirement, the disaster recovery function management and scheduling module analyzes the cloud service requirement, and notifies the formulated deployment scheme to the cloud management platform for execution through the cloud management platform scheduling module. Specific deployment scenarios are discussed below.
In the first step, a deployment scheme of the available areas is formulated. As an example, the "disaster recovery function management and scheduling module" determines the deployment scenario of the available areas according to the disaster recovery requirement. In an embodiment of the present invention, a dual-available-area deployment scheme is adopted: the "cloud management platform scheduling module" notifies the CMP to select an appropriate available area A and an appropriate available area B. A special identifier is added to the available area A to indicate that it currently serves as the production environment; no identifier is added to the available area B, which indicates that it currently serves as the disaster recovery environment.
In the second step, a deployment scheme of the cloud hosts is formulated.
As an example, first, according to the user requirement specification (which may include, for example, the Central Processing Unit (CPU) and memory), the "cloud management platform scheduling module" notifies the CMP to create two cloud hosts, ECS-1 and ECS-2, in the available area A, and to create two cloud hosts of the same specification, ECS-1' and ECS-2', in the available area B, as shown in FIG. 4.
Second, the intelligent management and control device notifies the CMP to allocate two production IP addresses (e.g., service IP1 and service IP2) and four built-in management IP addresses (e.g., management IP1, management IP2, management IP1', and management IP2'). The intelligent management and control device then notifies the SDN controller of the available area A and the SDN controller of the available area B to complete the IP address binding in their respective available areas.
For example, ECS-1 binds service IP1 and management IP1, and ECS-2 binds service IP2 and management IP2, with both the service IPs and the management IPs activated; ECS-1' binds service IP1 and management IP1', and ECS-2' binds service IP2 and management IP2', with the service IPs not activated and the management IPs activated.
Third, the cloud management platform scheduling module notifies the ECSs of each available area to load the intelligent engine, which completes the deployment of the ECSs.
In the third step, a deployment scheme of the load balancing instance SLB (which can be simply called load balancing) is formulated. As an example, the "cloud management platform scheduling module" notifies the CMP to create a load balancing instance in each of the two available areas. The load balancing instance of the available area A is activated, and the load balancing instance of the available area B is not activated.
In the fourth step, a deployment scheme of the cloud database RDS is formulated. The "cloud management platform scheduling module" notifies the CMP to create a database in the available area A and to create a database of the same specification in the available area B. The intelligent engine is loaded in each of the two databases.
In the fifth step, a deployment scheme of the elastic block storage EBS is formulated. The "cloud management platform scheduling module" notifies the CMP to create an elastic block store in the available area A and to create an elastic block store of the same specification in the available area B. Each elastic block store is then mounted to the cloud hosts and the cloud database of the available area to which it belongs.
Based on the above examples, it can be understood that, with the above scheme provided by the present invention, when the user selects the disaster tolerance capability provided by the cloud platform, services such as cloud hosts, a cloud database, and a load balancing instance that are consistent with the production environment can be automatically created for the user in the disaster tolerance environment, and the intelligent engine is started at the same time. Therefore, in a subsequent disaster tolerance switch, the IP address related code or configuration files do not need to be modified, so that automatic, seamless disaster tolerance switching can be realized and the switching efficiency is improved.
In addition, the scheme provided by the invention has no limitation and requirement on the cloud system architecture, and can provide disaster recovery solutions for the cloud systems of various architectures. And the deployment of the production environment and the disaster recovery environment can be automatically realized in the selected two data centers based on the software scheme.
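The resulting deployment state of the two available areas can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names, and the example IP addresses and management address labels are hypothetical and are not part of the embodiment; only the pairing rules (same service IP, different management IPs, standby service IPs and standby SLB inactive) follow the steps described above.

```python
from dataclasses import dataclass, field


@dataclass
class CloudHost:
    name: str
    service_ip: str
    management_ip: str
    service_ip_active: bool
    management_ip_active: bool = True   # management IPs are active on both sides
    engine_loaded: bool = False


@dataclass
class Zone:
    name: str
    is_production: bool                 # models the production environment identifier
    hosts: list = field(default_factory=list)
    slb_active: bool = False
    has_rds: bool = False
    has_ebs: bool = False


def build_deployment(service_ips):
    """Create a production zone A and a standby zone B with mirrored resources."""
    zone_a = Zone("A", is_production=True)
    zone_b = Zone("B", is_production=False)
    for i, ip in enumerate(service_ips, start=1):
        # Same service IP on both sides; the management IPs differ per host.
        zone_a.hosts.append(
            CloudHost(f"ECS-{i}", ip, f"mgmt-{i}", service_ip_active=True))
        zone_b.hosts.append(
            CloudHost(f"ECS-{i}'", ip, f"mgmt-{i}'", service_ip_active=False))
    for zone in (zone_a, zone_b):
        for host in zone.hosts:
            host.engine_loaded = True        # intelligent engine loaded on every ECS
        zone.slb_active = zone.is_production  # only the production SLB is active
        zone.has_rds = True
        zone.has_ebs = True
    return zone_a, zone_b
```

For instance, `build_deployment(["10.0.0.1", "10.0.0.2"])` yields two zones whose hosts carry identical service IPs, which is what later allows switching without touching IP address related code or configuration.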
< data synchronization mechanism for load balancing >
Referring to fig. 5, fig. 5 is a schematic diagram illustrating a data synchronization mechanism of a load balancing example according to an embodiment of the present invention.
Firstly, a user completes the load balancing policy configuration on the load balancing instance of the available area A, and the related configuration information is automatically reported to the SDN controller of the available area A.
Secondly, the SDN controller of the available area A automatically generates a load balancing configuration metadata message according to a standard configuration template and synchronizes the message to the SDN controller of the available area B.
Thirdly, the SDN controller of the available area B issues the received load balancing configuration metadata message to the load balancing instance of the available area B, which completes the synchronization of the load balancing production data.
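The three steps above can be sketched as follows. The message layout, class names, and policy fields are assumptions made for illustration; the embodiment does not specify the format of the configuration metadata message or the interface of the SDN controller, and JSON is used here only as a convenient stand-in for the standard configuration template.

```python
import json


class LoadBalancer:
    def __init__(self, active):
        self.active = active
        self.policy = {}


class SdnController:
    def __init__(self, zone, slb):
        self.zone = zone
        self.slb = slb
        self.peer = None   # SDN controller of the other available area

    def report_config(self, policy):
        """Steps 1 and 2: build a metadata message from the reported policy
        and synchronize it to the peer controller."""
        message = json.dumps(
            {"type": "slb-config", "source_zone": self.zone, "policy": policy},
            sort_keys=True,
        )
        self.peer.receive(message)
        return message

    def receive(self, message):
        """Step 3: issue the received configuration to the local SLB."""
        payload = json.loads(message)
        if payload["type"] == "slb-config":
            self.slb.policy = payload["policy"]


def sync_demo():
    slb_a, slb_b = LoadBalancer(active=True), LoadBalancer(active=False)
    ctrl_a, ctrl_b = SdnController("A", slb_a), SdnController("B", slb_b)
    ctrl_a.peer = ctrl_b
    # The user configures the policy on the production SLB.
    slb_a.policy = {"algorithm": "round_robin",
                    "backends": ["10.0.0.1", "10.0.0.2"]}
    ctrl_a.report_config(slb_a.policy)
    return slb_a, slb_b
```

Note that synchronizing the policy does not activate the standby load balancing instance; activation is a separate action taken only during disaster recovery switching.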
< data synchronization mechanism of cloud host >
In the embodiment of the invention, the data synchronization between the cloud host of the production available area and the corresponding cloud host of the disaster recovery available area comprises configuration data synchronization and application data synchronization.
Firstly, the configuration data of the cloud hosts is synchronized between the corresponding cloud hosts of the two available areas mainly by having the SDN controllers of the two available areas issue the configuration data synchronously. The configuration data includes the CPU and memory specifications, the operating system type and version, the network configuration, and the like.
And secondly, the application data between the cloud host of the production available area and the corresponding cloud host of the disaster recovery available area are synchronized mainly through an intelligent engine module. The application data includes operating system data and the like.
Referring to fig. 6, fig. 6 is a schematic diagram illustrating an application data synchronization mechanism of a cloud host according to an embodiment of the present invention.
As an example, as shown in fig. 6, the ECS of each available area has an application software layer, an intelligent engine layer, and an operating system layer. The intelligent engine layer comprises a file system management Software Development Kit (SDK), a log cache module, and a data synchronization message queue. The application software layer of ECS-1 is bound to service IP1, and its intelligent engine layer is bound to management IP1; the application software layer of ECS-1' is bound to service IP1, and its intelligent engine layer is bound to management IP1'.
The following describes the mechanism for synchronizing application data by taking ECS-1 in available zone a and ECS-1' in available zone B as an example. And, the ECS-2 of the available zone a and the ECS-2' of the available zone B implement application data synchronization by the same mechanism, and will not be described in detail herein.
In the first step, according to whether each available area is in the production environment state or the disaster tolerance environment state, the intelligent management and control device, through the intelligent engine management and scheduling module, adds a sending-end identifier and the receiving-end management address (such as management IP1') to the intelligent engine of ECS-1 in the available area A, and adds a receiving-end identifier and the sending-end management address (such as management IP1) to the intelligent engine of ECS-1' in the available area B.
In the second step, in the sending-end ECS-1 of the available area A, when the intelligent engine is loaded, the "file system management SDK" in the intelligent engine rewrites (overrides) the various data operation commands and Application Program Interfaces (APIs) of the operating system. When the application software performs data operations (e.g., add, delete, change) on ECS-1 through the commands or APIs provided by the operating system, it is actually calling the file system management commands and APIs provided by the intelligent engine.
In the third step, in the sending-end ECS-1 of the available area A, the intelligent engine sends the call records of the operation commands and APIs to the log cache module through the file system management SDK. The log cache module generates a data operation log and calls the data synchronization message queue, which sends a data synchronization request to the address (management IP1') of the receiving-end ECS-1' of the available area B. If the receiving-end ECS-1' does not respond, the failure is reported to the intelligent engine management and scheduling module of the intelligent management and control device, and the data operation log is cached in the sending-end ECS-1 in the form of a file. After the intelligent engine management and scheduling module detects that the connection between the sending-end ECS-1 and the receiving-end ECS-1' has returned to normal, it notifies the sending end to reconnect, and once the connection succeeds, the data operation logs are synchronized between the sending-end ECS-1 and the receiving-end ECS-1'.
In the fourth step, when receiving the data synchronization request, the receiving-end ECS-1' of the available area B first checks whether the sending-end address is an authorized address (i.e., determines whether it is management IP1), establishes a connection and performs data synchronization after the check is passed, and closes the connection after the synchronization is completed.
In the fifth step, the receiving-end ECS-1' of the available area B replays the operations recorded in the received data operation log one by one, in order, so that the application data (for example, operating system data) of the corresponding cloud hosts in the two available areas (that is, the available area A and the available area B) is synchronized.
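The log-and-replay mechanism described in these steps can be sketched in a simplified form. Everything below is an illustrative assumption rather than the embodiment's implementation: the file system is modelled as a dictionary, the pending log is kept in memory instead of being cached as a file, the network is replaced by a direct method call, and the names `Sender`, `Receiver`, and `apply_op` are hypothetical.

```python
def apply_op(files, op, path, data):
    """Apply one logged file operation to a file system modelled as a dict."""
    if op == "write":
        files[path] = data
    elif op == "append":
        files[path] = files.get(path, "") + data
    elif op == "delete":
        files.pop(path, None)


class Receiver:
    """Receiving-end engine: verifies the sender, then replays the log in order."""

    def __init__(self, authorized_sender):
        self.authorized_sender = authorized_sender
        self.files = {}
        self.reachable = True
        self.last_seq = 0

    def sync(self, sender_address, entries):
        if not self.reachable:
            raise ConnectionError("receiving end not responding")
        if sender_address != self.authorized_sender:
            raise PermissionError("unauthorized sending-end address")
        for seq, op, path, data in sorted(entries, key=lambda e: e[0]):
            if seq > self.last_seq:          # skip entries already replayed
                apply_op(self.files, op, path, data)
                self.last_seq = seq


class Sender:
    """Sending-end engine: intercepts operations, logs them, and forwards the log."""

    def __init__(self, address, receiver):
        self.address = address
        self.receiver = receiver
        self.files = {}
        self.pending = []   # log entries not yet confirmed by the receiving end
        self.seq = 0

    def operate(self, op, path, data=None):
        apply_op(self.files, op, path, data)             # local execution
        self.seq += 1
        self.pending.append((self.seq, op, path, data))  # data operation log
        self.flush()

    def flush(self):
        """Try to synchronize; keep the log buffered if the peer is unreachable."""
        try:
            self.receiver.sync(self.address, list(self.pending))
        except ConnectionError:
            return False
        self.pending.clear()
        return True
```

In this sketch, calling `flush()` again after the receiving end becomes reachable plays the role of the reconnection notice issued by the intelligent engine management and scheduling module; the sequence numbers keep the replay ordered and prevent an entry from being applied twice.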
< data synchronization mechanism of cloud database >
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a data synchronization mechanism of a cloud database according to an embodiment of the present invention. The data synchronization mechanism of the cloud database is similar to the application data synchronization mechanism of the cloud host, and is introduced below with reference to fig. 7:
As an example, as shown in fig. 7, the cloud database (i.e., RDS) of each available area has an intelligent engine layer and a database software layer. The intelligent engine layer comprises a database management Software Development Kit (SDK), a log cache module, and a data synchronization message queue. The intelligent engine layer of the cloud database is bound to a management IP.
In the first step, according to whether each available area is in the production environment state or the disaster tolerance environment state, the intelligent engine management and scheduling module adds a sending-end identifier and the receiving-end management address to the intelligent engine of the cloud database of the available area A, and adds a receiving-end identifier and the sending-end management address to the intelligent engine of the cloud database of the available area B.
In the second step, in the sending-end cloud database of the available area A, when the intelligent engine is loaded, the "database management SDK" rewrites (overrides) the various operation commands and APIs of the database software. When the front end performs data operations (for example, add, delete, change) through the commands or APIs provided by the database software, it is actually calling the database system management commands and APIs provided by the intelligent engine of the cloud database.
In the third step, in the sending-end cloud database of the available area A, the intelligent engine sends the database operation records to the log cache module through the database management SDK. The log cache module generates an operation execution log and calls the data synchronization message queue, which sends a data synchronization request to the receiving-end cloud database of the available area B. If the receiving-end cloud database does not respond, the failure is reported to the intelligent engine management and scheduling module of the intelligent management and control device, and the log information is cached in the node of the sending-end cloud database in the form of a file. After the intelligent engine management and scheduling module detects that the connection between the sending-end cloud database and the receiving-end cloud database has returned to normal, it notifies the sending end to reconnect, and once the connection succeeds, the operation execution logs are synchronized between the cloud database of the available area A and the corresponding cloud database of the available area B.
In the fourth step, when the receiving-end cloud database of the available area B receives the data synchronization request, it first verifies whether the sending-end address is an authorized address, establishes a connection and performs synchronization after the verification is passed, and closes the connection after the synchronization is completed.
In the fifth step, the receiving-end cloud database of the available area B replays the received operation execution logs one by one, in order, so that data synchronization between the corresponding cloud databases of the two available areas is realized.
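A minimal sketch of the same idea for a database is given below. SQLite, accessed through Python's standard `sqlite3` module, is used purely as a stand-in for the database software; the wrapper class `LoggedDatabase`, the `replay` function, and the choice to log whole statements together with their parameters are assumptions made for illustration and are not specified by the embodiment.

```python
import sqlite3


class LoggedDatabase:
    """Wraps a connection so that every modifying statement is also logged."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.log = []   # ordered operation execution log: (seq, sql, params)

    def execute(self, sql, params=()):
        cursor = self.conn.execute(sql, params)
        rows = cursor.fetchall()
        self.conn.commit()
        # Queries do not change data, so only non-SELECT statements are logged.
        if not sql.lstrip().upper().startswith("SELECT"):
            self.log.append((len(self.log) + 1, sql, tuple(params)))
        return rows


def replay(replica_conn, log, applied_up_to=0):
    """Replay logged statements one by one, in order, on the receiving end."""
    for seq, sql, params in sorted(log, key=lambda entry: entry[0]):
        if seq > applied_up_to:
            replica_conn.execute(sql, params)
            applied_up_to = seq
    replica_conn.commit()
    return applied_up_to


def db_demo():
    primary = LoggedDatabase()
    primary.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    primary.execute("INSERT INTO t (name) VALUES (?)", ("alice",))
    primary.execute("INSERT INTO t (name) VALUES (?)", ("bob",))
    primary.execute("UPDATE t SET name = ? WHERE name = ?", ("carol", "alice"))
    primary.execute("DELETE FROM t WHERE name = ?", ("bob",))

    replica = sqlite3.connect(":memory:")
    last_seq = replay(replica, primary.log)
    primary_rows = primary.execute("SELECT id, name FROM t ORDER BY id")
    replica_rows = replica.execute("SELECT id, name FROM t ORDER BY id").fetchall()
    return last_seq, primary_rows, replica_rows
```

Returning the last applied sequence number from `replay` lets a later call resume from that point, which mirrors the reconnect-and-continue behavior described in the third step.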
< mechanism for data synchronization of elastic Block storage >
The data in the elastic block storage does not need to be synchronized directly. Instead, the consistency of the elastic block storage data in the different available areas is ensured through the data synchronization mechanism of the cloud hosts and the data synchronization mechanism of the cloud databases in those available areas. Both mechanisms have been introduced in the foregoing embodiments and are not described again here.
Based on the above example, it can be understood that the data consistency can be guaranteed across hosts by using the cloud host data remote synchronization scheme provided by the invention; by utilizing the database data remote synchronization scheme provided by the invention, the data consistency can be guaranteed across database nodes.
In summary, based on the data synchronization method provided by the present invention, the data of the production environment can be automatically synchronized to the disaster recovery environment based on the software scheme. In addition, the implementation process of the embodiment of the invention has no limitation or requirement on the cloud system architecture, and can provide disaster recovery solutions for the cloud systems of various architectures.
Referring to fig. 8, the present invention further provides a disaster recovery switching method 800, which performs disaster recovery switching for a cloud disaster recovery architecture configured by the method shown in fig. 1, where the disaster recovery switching method includes:
S810, when it is judged that the production available area is unavailable, cancelling the production environment identifier of the production available area and adding the production environment identifier to the disaster recovery available area;
S820, adding a sending-end identifier and a receiving-end management address to a second intelligent engine of the cloud database of the disaster recovery available area;
S830, adding a sending-end identifier and a receiving-end management address to a first intelligent engine of the cloud host of the disaster recovery available area, and notifying a Software Defined Network (SDN) controller of the disaster recovery available area to activate the service address of the cloud host of the disaster recovery available area;
S840, notifying the SDN controller of the disaster recovery available area to activate the load balancing instance of the disaster recovery available area.
With the scheme provided by the invention, which is based on a software scheme, the system can automatically activate the disaster recovery environment when disaster recovery is triggered. The disaster recovery environment inherits the IP addresses and other configuration of the original production environment, so automatic switching is realized without any system configuration change, which improves the disaster recovery switching efficiency.
How to implement the disaster recovery automatic switching of four cloud services, i.e., load balancing, cloud hosts, elastic block storage, and cloud database, is described below with reference to fig. 4:
In the first step, when the disaster recovery function management and scheduling module of the intelligent management and control device judges that the available area A is unavailable and a switch is needed, the cloud management platform scheduling module notifies the cloud management platform to cancel the production environment identifier of the available area A and to add the production environment identifier to the available area B.
In the second step, the intelligent engine management and scheduling module issues an instruction to add a sending-end identifier and a receiving-end management address to the intelligent engine of the cloud database (namely RDS) of the available area B.
In the third step, the intelligent engine management and scheduling module issues an instruction to add a sending-end identifier and a receiving-end management address to the intelligent engines of the cloud hosts in the available area B, and the SDN controller of the available area B is notified to activate the service IPs of the two cloud hosts (ECS-1' and ECS-2') of the available area B.
In the fourth step, the cloud management platform scheduling module notifies the SDN controller of the available area B to activate the load balancing of the available area B. Because the EBS is always mounted to the cloud hosts and the cloud database, it does not need to be activated or switched separately. At this point the service system has automatically completed the disaster recovery switch, and external services are restored in the available area B.
In the fifth step, the available area B now serves as the production available area, but no corresponding disaster recovery available area has been deployed for it, so the service continuity of the service system on the cloud could not be ensured if another disaster occurred. Therefore, when the available area A returns to normal, the data of the available area B needs to be synchronized to the available area A, so that the available area A can serve as the disaster recovery available area. The specific steps are as follows:
When the disaster recovery function management and scheduling module detects that the available area A has returned to normal, the cloud management platform scheduling module notifies the SDN controller of the available area A to set the service IPs of the two cloud hosts and the load balancing of the available area A to the inactive state. The intelligent engine management and scheduling module then adds a receiving-end identifier and the sending-end management address to the intelligent engines of the cloud database and the cloud hosts of the available area A, so that data synchronization with the available area B is realized.
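The switching sequence and the subsequent role reversal can be summarized as a small state model. This is a sketch under stated assumptions: the `ZoneState` fields and the two function names are hypothetical, the availability flag stands in for whatever monitoring the disaster recovery function management and scheduling module performs, and the calls to the cloud management platform and the SDN controllers are reduced to plain attribute updates.

```python
from dataclasses import dataclass


@dataclass
class ZoneState:
    name: str
    production: bool          # carries the production environment identifier
    engine_role: str          # "sender" or "receiver"
    service_ips_active: bool
    slb_active: bool
    available: bool = True


def switch_over(prod, dr):
    """Apply steps S810 to S840 when the production zone is unavailable."""
    if prod.available:
        return False
    prod.production, dr.production = False, True   # S810: move the identifier
    dr.engine_role = "sender"                      # S820 (database), S830 (hosts)
    dr.service_ips_active = True                   # S830: activate service IPs
    dr.slb_active = True                           # S840: activate load balancing
    return True


def rejoin_as_standby(recovered, current_prod):
    """Demote a recovered zone so that it becomes the new disaster recovery zone."""
    if not recovered.available or not current_prod.production:
        return False
    recovered.service_ips_active = False
    recovered.slb_active = False
    recovered.engine_role = "receiver"             # now receives data from the peer
    return True
```

As a usage example, with zone A initially in production and zone B on standby, marking A unavailable and calling `switch_over(a, b)` leaves B active with the same service IPs; after A recovers, `rejoin_as_standby(a, b)` turns A into the receiving end, so the pair is protected again with the roles reversed.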
Corresponding to the configuration method of the cloud disaster recovery architecture based on the software scheme, the invention also provides a configuration device, equipment and a computer storage medium of the cloud disaster recovery architecture based on the software scheme.
Referring to fig. 9, fig. 9 is a schematic structural diagram illustrating a configuration apparatus 900 of a cloud disaster recovery architecture based on a software scheme according to an embodiment of the present invention, where the configuration apparatus of the cloud disaster recovery architecture based on the software scheme includes:
a first creating module 910, configured to create cloud hosts with the same specification in a production available area and a disaster recovery available area of a disaster recovery architecture on a cloud according to a user requirement specification;
a binding module 920, configured to bind the same service address in the cloud host ECS in the production available area and the corresponding cloud host ECS' in the disaster recovery available area, and to bind the management address IP in the ECS and the management address IP' in the ECS';
a first loading module 930 for loading a first smart engine for application data synchronization in the ECS and ECS', respectively;
a setting module 940, configured to set the service address and the IP of the ECS to an active state, set the service address of the ECS 'to an inactive state, and set the IP' to an active state.
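The four modules above can be read as a short provisioning sequence. The Python sketch below is illustrative only: the `CloudHost` class, the `configure_pair` function, the specification string, and all addresses are hypothetical and are not part of the patented implementation or of any real cloud SDK.

```python
from dataclasses import dataclass


@dataclass
class CloudHost:
    """Hypothetical model of one cloud host (ECS or ECS')."""
    zone: str
    spec: str
    service_address: str = ""
    management_address: str = ""
    service_active: bool = False
    management_active: bool = False
    engine_loaded: bool = False


def configure_pair(spec, service_address, mgmt_ip, mgmt_ip_prime):
    # Creating module 910: same specification in both available areas.
    ecs = CloudHost(zone="production", spec=spec)
    ecs_prime = CloudHost(zone="disaster-recovery", spec=spec)

    # Binding module 920: identical service address, distinct management addresses.
    ecs.service_address = service_address
    ecs_prime.service_address = service_address
    ecs.management_address = mgmt_ip
    ecs_prime.management_address = mgmt_ip_prime

    # Loading module 930: first intelligent engine on both hosts.
    ecs.engine_loaded = True
    ecs_prime.engine_loaded = True

    # Setting module 940: only the production service address is active;
    # both management addresses are active.
    ecs.service_active = True
    ecs.management_active = True
    ecs_prime.service_active = False
    ecs_prime.management_active = True
    return ecs, ecs_prime


ecs, ecs_prime = configure_pair("4c8g", "10.0.0.10", "192.168.1.10", "192.168.2.10")
```

Because the standby host already holds the same service address in the inactive state, a later switchover only has to flip `service_active` rather than rebind or reconfigure addresses.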
The above-described apparatus and computer storage medium for a cloud disaster recovery architecture are described in detail below.
With the configuration device, equipment, and computer storage medium of the on-cloud disaster recovery architecture provided by the invention, when a user selects the disaster recovery capability offered by the cloud platform, a cloud host consistent with the production environment is automatically created for the user in the disaster recovery environment. The same service address is bound to the cloud host in the production available area and to the corresponding cloud host in the disaster recovery available area, and the intelligent engines are loaded in both available areas. As a result, a subsequent disaster recovery switchover requires no changes to IP-address-related code or configuration files, so the switchover can be automatic and seamless, which improves disaster recovery switching efficiency.
Corresponding to the disaster recovery switching method in the embodiment of the invention, the invention also provides a disaster recovery switching device, equipment and a computer storage medium.
Referring to fig. 10, fig. 10 shows a schematic structural diagram of a disaster recovery switching device 1000 according to an embodiment of the present invention, where the disaster recovery switching device 1000 includes:
a judging module 1010, configured to cancel the production environment identifier of the production available area and add the production environment identifier to the disaster recovery available area when it is determined that the production available area is unavailable;
a first adding module 1020, configured to add a sending end flag and a receiving end management address to a second intelligent engine of a cloud database in the disaster recovery available area;
a second adding module 1030, configured to add a sending end flag and a receiving end management address to a first intelligent engine of a cloud host in the disaster recovery available area, notify a software defined network SDN controller of the disaster recovery available area, and activate a service address of the cloud host in the disaster recovery available area;
an activating module 1040, configured to notify the SDN controller of the disaster recovery available area and activate a load balancing instance of the disaster recovery available area.
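The order in which the four modules act can be sketched as a single function. This is a minimal illustration, assuming a simple in-memory model: the `Zone` class, the `switch_over` function, the `engine_role` mapping, and the addresses are hypothetical, and the SDN controller notification is reduced to setting a flag.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Hypothetical model of one available area."""
    name: str
    management_address: str
    is_production: bool = False
    service_address_active: bool = False
    load_balancer_active: bool = False
    engine_role: dict = field(default_factory=dict)  # component -> (flag, peer address)


def switch_over(production: Zone, dr: Zone, production_available: bool) -> list:
    """Run the four module steps in order and return a log of what was done."""
    log = []
    if production_available:
        return log  # nothing to do while production is healthy

    # Judging module 1010: move the production environment identifier.
    production.is_production = False
    dr.is_production = True
    log.append("identifier moved")

    # First adding module 1020: the cloud database engine becomes the sending end.
    dr.engine_role["database"] = ("sender", production.management_address)
    log.append("database engine set to sender")

    # Second adding module 1030: the cloud host engine becomes the sending end,
    # then the service address is activated (via the SDN controller in the text).
    dr.engine_role["host"] = ("sender", production.management_address)
    dr.service_address_active = True
    log.append("host engine set to sender, service address activated")

    # Activating module 1040: bring up load balancing in the disaster recovery area.
    dr.load_balancer_active = True
    log.append("load balancer activated")
    return log


zone_a = Zone("A", "192.168.1.10", is_production=True,
              service_address_active=True, load_balancer_active=True)
zone_b = Zone("B", "192.168.2.10")
steps = switch_over(zone_a, zone_b, production_available=False)
```

Note that nothing in the sequence edits an application's address configuration; the switchover only changes flags and activation states that were prepared when the architecture was configured.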
The above-described apparatus and computer storage medium for a cloud disaster recovery architecture are described in detail below.
With the switching device, equipment, and computer storage medium of the software-based on-cloud disaster recovery architecture provided by the invention, when the disaster recovery feature is triggered, the system automatically activates the disaster recovery environment and inherits configuration such as the IP addresses of the original production environment. Switchover can therefore be automated without changing the system configuration, which improves disaster recovery switching efficiency.
The equipment of the disaster recovery architecture on the cloud comprises:
a memory for storing a program;
a processor, configured to run the program stored in the memory, so as to execute the configuration method of the disaster recovery architecture on the cloud or each step in the disaster recovery switching method according to the embodiment of the present invention.
Fig. 11 is a block diagram illustrating an exemplary hardware architecture capable of implementing the method and apparatus according to the embodiments of the present invention, for example, an apparatus of a disaster recovery architecture based on a software scheme according to the embodiments of the present invention. Wherein computing device 1100 includes input device 1101, input interface 1102, processor 1103, memory 1104, output interface 1105, and output device 1106.
The input interface 1102, the processor 1103, the memory 1104, and the output interface 1105 are connected to each other via a bus 1110, and the input device 1101 and the output device 1106 are connected to the bus 1110 via the input interface 1102 and the output interface 1105, respectively, and further connected to other components of the computing device 1100.
Specifically, the input device 1101 receives input information from the outside and transmits the input information to the processor 1103 through the input interface 1102; the processor 1103 processes the input information based on the computer-executable instructions stored in the memory 1104 to generate output information, stores the output information temporarily or permanently in the memory 1104, and then transmits the output information to the output device 1106 through the output interface 1105; the output device 1106 outputs the output information to the outside of the computing device 1100 for use by a user.
The computing device 1100 may perform the steps of the above-described configuration method or switching method of the present invention.
The processor 1103 may be one or more Central Processing Units (CPUs). In the case where the processor 1103 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The memory 1104 may be, but is not limited to, one or more of Random Access Memory (RAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), compact disc read only memory (CD-ROM), a hard disk, and the like. The memory 1104 is used for storing program code.
It is understood that the functions of any or all of the modules provided in the embodiments of the present invention may be implemented by the central processing unit 1103 shown in fig. 11.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part as a computer program product that includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The parts of this specification are described in a progressive manner; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the description of the method embodiments.

Claims (15)

1. A configuration method for an on-cloud disaster recovery architecture, comprising:
according to user requirement specifications, creating cloud hosts with the same specifications in a production availability zone and a disaster recovery availability zone of the on-cloud disaster recovery architecture, respectively;
binding the same service address to a cloud host ECS in the production availability zone and to the corresponding cloud host ECS' in the disaster recovery availability zone, binding a management address IP in the ECS, and binding a management address IP' in the ECS';
loading a first intelligent engine for application data synchronization in the ECS and the ECS', respectively;
setting the service address and the IP of the ECS to an active state, setting the service address of the ECS' to an inactive state, and setting the IP' to an active state.

2. The method according to claim 1, wherein the method further comprises:
creating cloud databases with the same specifications in the production availability zone and the disaster recovery availability zone, respectively;
loading a second intelligent engine for data synchronization in the cloud database RDS of the production availability zone and the cloud database RDS' of the disaster recovery availability zone, respectively.

3. The method according to claim 1, wherein the method further comprises:
creating load balancing instances with the same specifications in the production availability zone and the disaster recovery availability zone, setting the load balancing instance of the production availability zone to an active state, and setting the load balancing instance of the disaster recovery availability zone to an inactive state.

4. The method according to claim 2, wherein the method further comprises:
creating an elastic block storage EBS in the production availability zone, creating an elastic block storage EBS' with the same specifications in the disaster recovery availability zone, completing the storage mounting of the ECS and the RDS in the EBS, and completing the storage mounting of the ECS' and the RDS' in the EBS'.

5. The method according to claim 1, wherein before the creating, according to user requirement specifications, of cloud hosts with the same specifications in the production availability zone and the disaster recovery availability zone of the on-cloud disaster recovery architecture, the method further comprises:
according to the disaster recovery requirements of a service system to be migrated to the cloud, formulating a cloud service deployment scheme in the cloud for the production availability zone and the disaster recovery availability zone of the service system to be migrated to the cloud.

6. The method according to claim 1, wherein after the creating, according to user requirement specifications, of cloud hosts with the same specifications in the production availability zone and the disaster recovery availability zone of the on-cloud disaster recovery architecture, the method further comprises:
completing the synchronization of configuration data between the ECS and the ECS' by delivering the configuration data to the production availability zone and the disaster recovery availability zone synchronously.

7. The method according to claim 6, wherein after the completing of the synchronization of configuration data between the ECS and the ECS' by delivering the configuration data to the production availability zone and the disaster recovery availability zone synchronously, the method further comprises:
adding a sending end flag and a receiving end management address to the first intelligent engine of the ECS, and adding a receiving end flag and a sending end management address to the first intelligent engine of the ECS';
the first intelligent engine of the ECS rewriting the data operation commands and application program interfaces (APIs) of the operating system and generating a data operation log according to the data operation commands and APIs, and the ECS sending a data synchronization request to the ECS';
the ECS' verifying whether the sending end address is an authorized address, and after the verification passes, establishing a connection between the ECS and the ECS' and completing the synchronization of the data operation log.

8. The method according to claim 2, wherein after the creating of cloud databases with the same specifications in the production availability zone and the disaster recovery availability zone, respectively, the method further comprises:
adding a sending end flag and a receiving end management address to the second intelligent engine of the RDS, and adding a receiving end flag and a sending end management address to the second intelligent engine of the RDS';
the second intelligent engine of the RDS rewriting the data operation commands and application program interfaces (APIs) of the database software and generating a data execution log according to the data operation commands and APIs, and the RDS sending a data synchronization request to the RDS';
the RDS' verifying whether the sending end address is an authorized address, and after the verification passes, establishing a connection between the RDS and the RDS' and completing the synchronization of the data execution log.

9. The method according to claim 3, wherein after the creating of load balancing instances with the same specifications in the production availability zone and the disaster recovery availability zone, the method further comprises:
completing load balancing policy configuration on the load balancing instance of the production availability zone, and reporting the load balancing policy configuration information to the software defined network SDN controller of the production availability zone;
the SDN controller of the production availability zone generating a load balancing configuration metadata message according to the load balancing policy configuration information, and synchronizing the metadata message to the SDN controller of the disaster recovery availability zone;
the SDN controller of the disaster recovery availability zone delivering the metadata message to the load balancing instance of the disaster recovery availability zone, thereby completing the synchronization of production data between the load balancing instances of the production availability zone and the disaster recovery availability zone.

10. A disaster recovery switching method for an on-cloud disaster recovery architecture configured by the method according to any one of claims 1-9, comprising:
when it is determined that the production availability zone is unavailable, cancelling the production environment identifier of the production availability zone and adding the production environment identifier to the disaster recovery availability zone;
adding a sending end flag and a receiving end management address to the second intelligent engine of the cloud database in the disaster recovery availability zone;
adding a sending end flag and a receiving end management address to the first intelligent engine of the cloud host in the disaster recovery availability zone, notifying the software defined network SDN controller of the disaster recovery availability zone, and activating the service address of the cloud host in the disaster recovery availability zone;
notifying the SDN controller of the disaster recovery availability zone, and activating the load balancing instance of the disaster recovery availability zone.

11. The disaster recovery switching method according to claim 10, wherein after the notifying of the SDN controller of the disaster recovery availability zone and the activating of the load balancing instance of the disaster recovery availability zone, the method further comprises:
when the production availability zone returns to normal, notifying the SDN controller of the production availability zone, setting the service address of the cloud host and the load balancing instance of the production availability zone to an inactive state, and adding a receiving end flag and a sending end management address to the second intelligent engine of the cloud database and the first intelligent engine of the cloud host in the production availability zone, so as to realize data synchronization with the disaster recovery availability zone.

12. A configuration apparatus for an on-cloud disaster recovery architecture, wherein the apparatus comprises:
a first creating module, configured to create, according to user requirement specifications, cloud hosts with the same specifications in a production availability zone and a disaster recovery availability zone of the on-cloud disaster recovery architecture, respectively;
a binding module, configured to bind the same service address to a cloud host ECS in the production availability zone and to the corresponding cloud host ECS' in the disaster recovery availability zone, bind a management address IP in the ECS, and bind a management address IP' in the ECS';
a first loading module, configured to load a first intelligent engine for application data synchronization in the ECS and the ECS', respectively;
a setting module, configured to set the service address and the IP of the ECS to an active state, set the service address of the ECS' to an inactive state, and set the IP' to an active state.

13. A disaster recovery switching apparatus, wherein the apparatus comprises:
a judging module, configured to, when it is determined that the production availability zone is unavailable, cancel the production environment identifier of the production availability zone and add the production environment identifier to the disaster recovery availability zone;
a first adding module, configured to add a sending end flag and a receiving end management address to the second intelligent engine of the cloud database in the disaster recovery availability zone;
a second adding module, configured to add a sending end flag and a receiving end management address to the first intelligent engine of the cloud host in the disaster recovery availability zone, notify the software defined network SDN controller of the disaster recovery availability zone, and activate the service address of the cloud host in the disaster recovery availability zone;
an activating module, configured to notify the SDN controller of the disaster recovery availability zone and activate the load balancing instance of the disaster recovery availability zone.

14. A device for an on-cloud disaster recovery architecture, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, wherein the computer program instructions, when executed by the processor, implement the method according to any one of claims 1-9 or the method according to any one of claims 10-11.

15. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-9 or the method according to any one of claims 10-11.
CN201811318053.2A 2018-11-07 2018-11-07 Disaster-tolerant architecture configuration method, switching method, device, device, and storage medium Pending CN111158949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811318053.2A CN111158949A (en) 2018-11-07 2018-11-07 Disaster-tolerant architecture configuration method, switching method, device, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811318053.2A CN111158949A (en) 2018-11-07 2018-11-07 Disaster-tolerant architecture configuration method, switching method, device, device, and storage medium

Publications (1)

Publication Number Publication Date
CN111158949A true CN111158949A (en) 2020-05-15

Family

ID=70555073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811318053.2A Pending CN111158949A (en) 2018-11-07 2018-11-07 Disaster-tolerant architecture configuration method, switching method, device, device, and storage medium

Country Status (1)

Country Link
CN (1) CN111158949A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102055605A (en) * 2009-11-11 2011-05-11 中兴通讯股份有限公司 Disaster tolerance system and method applied to AAA (authentication, authorization and accounting) server
CN104717083A (en) * 2013-12-13 2015-06-17 中国移动通信集团上海有限公司 Disaster tolerant switching system, method and device for A-SBC equipment
CN107241430A (en) * 2017-07-03 2017-10-10 国家电网公司 A kind of enterprise-level disaster tolerance system and disaster tolerant control method based on distributed storage
CN108512693A (en) * 2018-02-24 2018-09-07 国家计算机网络与信息安全管理中心 A kind of trans-regional disaster recovery method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴礼乐: "基于双活容灾存储技术的云计算数据中心的设计及应用", 《电子设计工程》 *
吴礼乐: "基于双活容灾存储技术的云计算数据中心的设计及应用", 《电子设计工程》, vol. 23, no. 06, 20 March 2015 (2015-03-20), pages 190 - 192 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111683139A (en) * 2020-06-05 2020-09-18 北京百度网讯科技有限公司 Method and apparatus for load balancing
WO2022012310A1 (en) * 2020-07-13 2022-01-20 华为技术有限公司 Communication method and apparatus
CN113301089A (en) * 2020-07-28 2021-08-24 阿里巴巴集团控股有限公司 Cloud service node deployment method and device
CN114285832A (en) * 2021-05-11 2022-04-05 鸬鹚科技(深圳)有限公司 Disaster recovery system, method, computer device and medium for multiple data centers
CN113986610A (en) * 2021-09-28 2022-01-28 新华三大数据技术有限公司 Disaster recovery system, method for realizing business equipment disaster recovery, and management device
CN113987066A (en) * 2021-09-29 2022-01-28 平凯星辰(北京)科技有限公司 Disaster recovery method, device, electronic device and storage medium for dual availability zones
CN114090333A (en) * 2021-10-20 2022-02-25 中核核电运行管理有限公司 Disaster tolerance switching management system and method for production management platform
CN114153655A (en) * 2021-10-29 2022-03-08 郑州云海信息技术有限公司 Disaster tolerance system creating method, disaster tolerance method, device, equipment and medium
CN114153655B (en) * 2021-10-29 2024-10-29 郑州云海信息技术有限公司 Disaster recovery system creation method, disaster recovery method, device, equipment and medium
CN114697198A (en) * 2022-04-18 2022-07-01 北京嗨学网教育科技股份有限公司 Cloud disaster backup server implementation method, cloud disaster backup server starting method and cloud disaster backup server starting device
CN116233138A (en) * 2023-03-12 2023-06-06 天翼云科技有限公司 Method and device for realizing cloud firewall high-availability cluster
CN118550712A (en) * 2024-07-29 2024-08-27 济南浪潮数据技术有限公司 Cloud platform disaster recovery method, device, equipment, medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200515