CN119512818A

CN119512818A - Multi-control cluster SAN switching and switch-back method and device

Info

Publication number: CN119512818A
Application number: CN202411560354.1A
Authority: CN
Inventors: 夏威; 周浩; 程勇; 高利娟; 孙涛
Original assignee: CETC 52 Research Institute
Current assignee: CETC 52 Research Institute
Priority date: 2024-11-04
Filing date: 2024-11-04
Publication date: 2025-02-25

Abstract

The invention discloses a multi-control cluster SAN switching and switch-back method, applied to data reading and writing between a multi-control cluster system and a host. The data reading and writing between the multi-control cluster system and the host comprises a SAN normal state, a SAN switching state and a SAN switch-back state, and the multi-control cluster system comprises n controllers. When at least one controller in the multi-control cluster system fails or shuts down, the method and device let the remaining controllers take over the service of the failed or shut-down controller through SAN switching, so that service between the multi-control cluster system and the host continues normally. After the failed controller recovers, SAN switch-back lets all controllers in the multi-control cluster system take over service normally again. The internal and external SAN mappings used by the method and device are not restricted to IP-SAN or FC-SAN; both are compatible, and SAN switching and switch-back can be achieved with either.

Description

Multi-control cluster SAN switching and switch-back method and device

Technical Field

The invention belongs to the technical field of SAN switching, and particularly relates to a multi-control cluster SAN switching and switch-back method and device.

Background

A SAN (Storage Area Network) is a high-speed, dedicated network for storage operations, typically independent of the computer local area network. The SAN connects hosts and storage devices and can provide a dedicated communication channel between any host and any storage device on it. The SAN decouples storage devices from servers, enabling storage resources to be shared at the server level. By introducing channel technology and network technology into the storage environment, the SAN provides a network storage solution that can simultaneously meet requirements for throughput, availability, reliability, scalability and manageability. The SAN itself is a storage network that carries the data storage tasks. By establishing a mapping among a LUN group, a network group and a host group, a host in the host group can transfer data to and from a LUN in the LUN group over the IP or FC (Fibre Channel) links in the network group. In a SAN, all data transfers take place over a high-speed, high-bandwidth network.

The prior patent CN106888111A discloses a switching method for a dual-machine cluster FC-SAN. In that method, user service data is delivered to each controller of the storage device through the SAN service, and the dual-machine cluster achieves high availability of the storage device through service switching and management between the two controllers, so that user service is not interrupted during controller switching or path switching and the original service can be taken over again after the controller recovers. However, in a dual-controller cluster both controllers may fail, in which case the upper-layer SAN stops immediately and continued normal operation of the service cannot be guaranteed. In addition, that method modifies the driver of the fibre channel card and controls the program in the driver to make switching flexible and to keep server-side IO continuous during node failover. This solution depends on modifying a specific fibre channel card and therefore lacks generality, since SANs include IP-SAN as well as FC-SAN.

Disclosure of Invention

The invention aims to solve the problems described in the background art and provides a multi-control cluster SAN switching and switch-back method.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

The invention provides a multi-control cluster SAN switching and switch-back method, which is applied to data reading and writing between a multi-control cluster system and a host, wherein the data reading and writing between the multi-control cluster system and the host comprises a SAN normal state, a SAN switching state and a SAN switch-back state, and the multi-control cluster system comprises n controllers, wherein:

When all controllers are normal, namely in the SAN normal state, a storage pool is created on one of the n controllers, a logical volume is created on the storage pool, and that controller is defined as the configuration controller;

Creating a first virtual volume on the logical volume, then creating a first internal SAN mapping on the first virtual volume; the remaining n-1 non-configuration controllers all log in to the first internal SAN mapping, whereupon a corresponding first internal mapping volume is automatically generated on each of them; a second virtual volume is then created on the first internal mapping volume of each of the remaining n-1 non-configuration controllers, thereby establishing a mapping relationship between each second virtual volume and the first virtual volume;

Establishing an external SAN mapping between each virtual volume and the host, so that n data read-write links exist between the host and the multi-control cluster system;

When the configuration controller fails or shuts down during data reading and writing, namely in the SAN switching state, the second virtual volumes of the remaining n-1 non-configuration controllers are pointed to null and these controllers log out of the first internal SAN mapping; a preset controller is selected from the remaining n-1 non-configuration controllers, the storage pool and the logical volume of the configuration controller are imported to that controller, and that controller is defined as the active controller;

The second virtual volume on the active controller points to the imported logical volume, and a new second internal SAN mapping is created on that second virtual volume; the remaining n-2 non-configuration controllers all log in to the second internal SAN mapping, whereupon corresponding second internal mapping volumes are automatically generated; the second virtual volumes on the remaining n-2 non-configuration controllers point to the corresponding second internal mapping volumes, thereby establishing a mapping relationship between the second virtual volumes on the remaining n-2 non-configuration controllers and the second virtual volume on the active controller; at this point n-1 data read-write links exist between the host and the multi-control cluster system;

When the configuration controller returns to normal during data reading and writing, namely in the SAN switch-back state, the second virtual volumes of the remaining n-1 non-configuration controllers are pointed to null, the storage pool and the logical volume on the active controller are imported to the recovered configuration controller, the second internal SAN mapping on the second virtual volume of the active controller is deleted, and the remaining n-2 non-configuration controllers then all log out of the second internal SAN mapping;

Creating a third virtual volume on the logical volume of the recovered configuration controller, then creating on the third virtual volume an internal SAN mapping identical to the one on the first virtual volume and naming it the third internal SAN mapping; the remaining n-1 non-configuration controllers all log in to the third internal SAN mapping, whereupon corresponding third internal mapping volumes are automatically generated; the second virtual volumes on the remaining n-1 non-configuration controllers point to the corresponding third internal mapping volumes, thereby establishing a mapping relationship between the second virtual volumes on the remaining n-1 non-configuration controllers and the third virtual volume; an external SAN mapping to the host is then created for the third virtual volume, so that n data read-write links exist between the host and the multi-control cluster system.

Preferably, in the SAN switching state, the preset controller is a controller with minimum resource occupation.

Preferably, when at least two controllers fail or shut down during data reading and writing: if a failed or shut-down controller is a non-configuration controller, it does not need to log in to the corresponding internal SAN mapping; if a failed or shut-down controller is the configuration controller, the operations of the SAN switching state are executed.

Preferably, the multi-control cluster system further comprises a detection module, wherein the detection module detects whether each controller fails or shuts down in the data reading and writing process.

Preferably, in the SAN switch-back state, in the process of importing the storage pool and the logical volume on the active controller to the configuration controller after the recovery, the storage pool and the logical volume on the active controller are exported first, and then the exported storage pool and logical volume are imported to the configuration controller after the recovery.

The invention also provides a multi-control cluster SAN switching and switch-back device, which comprises a processor and a memory storing a plurality of computer instructions, wherein the computer instructions, when executed by the processor, implement the steps of the multi-control cluster SAN switching and switch-back method.

Compared with the prior art, the invention has the beneficial effects that:

1. In the multi-control cluster system, when at least one controller fails or is shut down, the rest controllers can take over the service of the failed or shut down controller through SAN switching, so that the service between the multi-control cluster system and the host can be normally carried out;

2. The internal SAN mapping and the external SAN mapping in the multi-control cluster SAN switching and switch-back method and device are not restricted to IP-SAN or FC-SAN; both are compatible, and the SAN switching or switch-back state can be achieved with either.

Drawings

FIG. 1 is a block diagram of the multi-control cluster SAN switching and switch-back method and device according to the present invention;

FIG. 2 is a flow chart of the SAN switching state of the present invention;

FIG. 3 is a flow chart of the SAN switch-back state of the present invention.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

As shown in FIGS. 1-3, a multi-control cluster SAN switching and switch-back method is provided, applied to data reading and writing between a multi-control cluster system and a host. The data reading and writing between the multi-control cluster system and the host includes a SAN normal state, a SAN switching state and a SAN switch-back state. The multi-control cluster system includes n controllers and further includes physical hard disks (their number and type are not limited and can be set according to actual needs), which provide storage space for the multi-control cluster system; every physical hard disk can be identified by every controller.

The application is implemented with application tools that control the SAN and is not limited to a specific storage management software architecture; either a kernel-mode or a user-mode storage software architecture may be used. The application is implemented based on the SPDK (Storage Performance Development Kit) software architecture. SPDK is a set of software libraries for improving storage performance, aimed at giving storage applications lower latency and higher throughput; it achieves more efficient data access by eliminating operating system kernel overhead, and it provides a rich set of APIs that developers can readily integrate into existing storage solutions.

The multi-control cluster system also comprises a detection module, wherein the detection module detects whether each controller fails or shuts down in the data reading and writing process;

It should be noted that the detection module checks the operating condition of each controller. It determines whether the configuration controller or a non-configuration controller has shut down or failed according to, for example, whether the internal network is reachable and whether the controller's health status reports a failure (that is, whether the controller has an abnormal error), and thereby determines whether to perform the SAN switching process.
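The decision made by the detection module can be illustrated with a minimal sketch. The `Probe` record, its field names and the two helper functions are hypothetical names introduced only for illustration; the application does not specify how the connectivity or health checks are carried out, so they are modelled here as plain booleans.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical detection result for one controller."""
    name: str
    is_config: bool          # True for the configuration controller
    network_reachable: bool  # result of the internal-network connectivity check
    health_ok: bool          # health-status query reported no abnormal error


def is_down(p: Probe) -> bool:
    # A controller counts as failed or shut down if either check fails.
    return not (p.network_reachable and p.health_ok)


def needs_san_switch(probes: list) -> bool:
    # SAN switching is only required when the configuration controller is down.
    return any(p.is_config and is_down(p) for p in probes)
```

Under these assumptions, a failed non-configuration controller alone does not trigger SAN switching, while a failed configuration controller does.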

The following describes the processes of the SAN normal state, the SAN switch state, and the SAN switch back state by the embodiments respectively:

Example 1

In this embodiment, when all controllers are normal, i.e., in the SAN normal state (no controller in the multi-control cluster system has failed or shut down), a storage pool is created on one of the n controllers (the choice of controller is not limited and may be random), a logical volume is created on the storage pool, and that controller is defined as the configuration controller (it should be noted that creating the storage pool and the logical volume on the selected controller divides the physical hard disks, at the software layer, into storage spaces of the required size, and that the storage pool and the logical volume exist only on the one configuration controller);

Creating a first virtual volume on the logical volume (the first virtual volume is used to suspend and retry host IO during the subsequent SAN switching and SAN switch-back states, and the switching of the underlying link is completed during the retry), then creating a first internal SAN mapping on the first virtual volume (the internal SAN mapping is created over the internal network among the controllers; this internal network may be a TCP/IP network, in which case an IP-SAN internal mapping is created, or an FC network, in which case an FC-SAN internal mapping is created); the remaining n-1 non-configuration controllers all log in to the first internal SAN mapping, and after login a corresponding first internal mapping volume is automatically generated (that is, each of the remaining n-1 non-configuration controllers automatically generates its own first internal mapping volume); a second virtual volume is then created on the first internal mapping volume of each of the remaining n-1 non-configuration controllers (the second virtual volume is likewise used to suspend and retry host IO during the subsequent SAN switching and SAN switch-back states, and the switching of the underlying link is completed during the retry), thereby establishing a mapping relationship between each second virtual volume and the first virtual volume;

Establishing an external SAN mapping between each virtual volume and the host (that is, an external SAN mapping to the host is established for the first virtual volume and for each of the n-1 second virtual volumes; the network between each virtual volume and the host may be a TCP/IP network, in which case an IP-SAN external mapping is established, or an FC network, in which case an FC-SAN external mapping is established), so that n data read-write links exist between the host and the multi-control cluster system (that is, the host can discover the n links and combine them into a multipath block device for reading and writing data).
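The normal-state topology described above can be modelled with a small sketch. It tracks only what each controller's virtual volume points to and whether it has an external mapping; the `Controller` class, the string labels and the function names are hypothetical and stand in for real storage objects, which the application leaves to the underlying storage software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Controller:
    name: str
    target: Optional[str] = None        # what this controller's virtual volume points to
    has_external_mapping: bool = False  # external SAN mapping to the host


def build_normal_state(names, config_name):
    """Set up the SAN normal state and return the controllers keyed by name."""
    controllers = {n: Controller(n) for n in names}
    config = controllers[config_name]
    # The storage pool and logical volume exist only on the configuration
    # controller; its first virtual volume sits directly on the logical volume.
    config.target = "logical_volume"
    for c in controllers.values():
        if c is not config:
            # Logging in to the first internal SAN mapping yields an internal
            # mapping volume, and the second virtual volume points at it.
            c.target = f"internal_mapping_volume@{config_name}"
        # Every virtual volume gets an external SAN mapping to the host.
        c.has_external_mapping = True
    return controllers


def link_count(controllers):
    """A host link is usable when the virtual volume is mapped and not null."""
    return sum(1 for c in controllers.values()
               if c.has_external_mapping and c.target is not None)
```

With four controllers and one of them chosen as the configuration controller, this model yields four usable links, matching the n links stated in the text.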

It should be noted that the number of storage pools created on the configuration controller is not limited, nor is the number of logical volumes created on one storage pool; both may be set according to actual needs. First virtual volumes correspond one-to-one with logical volumes (i.e., one first virtual volume per logical volume), the first internal mapping volumes of a non-configuration controller correspond one-to-one with first virtual volumes, and second virtual volumes correspond one-to-one with first internal mapping volumes.

The host side also has mapped volumes, and the external SAN mapping between each virtual volume and the host is established with these host-side mapped volumes.
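The suspend-and-retry role that Example 1 assigns to the virtual volumes can be sketched as follows. This is one possible reading rather than the application's implementation: the `VirtualVolume` class, the polling loop, and the `retries` and `delay` parameters are all assumptions, and a target of `None` models a virtual volume that points to null.

```python
import time
from typing import Callable, Optional


class VirtualVolume:
    """Forwards IO to a backing target, retrying while the target is null."""

    def __init__(self, target: Optional[Callable[[bytes], int]] = None):
        self.target = target  # None models "pointing to null"

    def write(self, data: bytes, retries: int = 5, delay: float = 0.01) -> int:
        for _ in range(retries):
            backend = self.target
            if backend is not None:
                # The underlying link has been switched; issue the IO.
                return backend(data)
            # IO stays suspended and is retried after a short wait.
            time.sleep(delay)
        raise TimeoutError("virtual volume still points to null")
```

In this sketch, repointing `target` while a write is waiting lets the next retry succeed, which mirrors the idea that the underlying link is switched during the retry.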

Example 2

Based on embodiment 1, this embodiment further proposes the following. When the configuration controller fails or shuts down during data reading and writing, namely in the SAN switching state (the failed or shut-down configuration controller is disconnected from the multi-control cluster system, and the first virtual volume and the external SAN mapping on the configuration controller are lost; because the links between the host and all controllers ultimately deliver IO to the first virtual volume of the configuration controller, no host IO can be issued normally to the logical volume of the multi-control cluster system), the second virtual volumes of the remaining n-1 non-configuration controllers are pointed to null (host IO is suspended) and these controllers log out of the first internal SAN mapping. A preset controller is selected from the remaining n-1 non-configuration controllers, the storage pool and the logical volume of the configuration controller are imported to that controller, and that controller is defined as the active controller.

The second virtual volume on the active controller points to the imported logical volume (the second virtual volume on the active controller corresponds one-to-one with the logical volume; because the external SAN mapping of the second virtual volume on the active controller still exists and functions normally, the logical volume is mapped to the host again through that second virtual volume, and the link from the host to the active controller is restored, that is, the host sees one link over which IO can be issued normally), and a new second internal SAN mapping is created on that second virtual volume. The remaining n-2 non-configuration controllers (that is, the non-configuration controllers other than the active controller) all log in to the second internal SAN mapping, and after login corresponding second internal mapping volumes are automatically generated (that is, each of the remaining n-2 non-configuration controllers automatically generates its own second internal mapping volume, which corresponds to the second virtual volume on the active controller). The second virtual volumes on the remaining n-2 non-configuration controllers then point to the corresponding second internal mapping volumes (that is, each points to the second internal mapping volume on its own controller), establishing a mapping relationship between the second virtual volumes on the remaining n-2 non-configuration controllers and the second virtual volume on the active controller. At this point n-1 data read-write links exist between the host and the multi-control cluster system, and the SAN switching process is complete, so that data reading and writing between the multi-control cluster system and the host can proceed normally.
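The SAN switching steps of this embodiment can be traced with a short sketch. The dictionaries, the string labels and the `load` figures are hypothetical; choosing the least-loaded survivor as the preset controller follows the preferred option stated in the disclosure, and the real import and login operations are reduced here to changes in what each virtual volume points to.

```python
def san_switch(targets, config, load):
    """Return (new_targets, active) after the configuration controller goes down.

    targets: dict mapping controller name -> what its virtual volume points to
    config:  name of the failed configuration controller
    load:    dict mapping controller name -> hypothetical resource-usage figure
    """
    survivors = [c for c in targets if c != config]
    # Step 1: point every surviving second virtual volume to null (host IO is
    # suspended) and log out of the first internal SAN mapping.
    new_targets = {c: None for c in survivors}
    # Step 2: pick the preset controller; here, the least-loaded survivor.
    active = min(survivors, key=lambda c: load[c])
    # Step 3: import the pool and logical volume to the active controller and
    # point its second virtual volume at the imported logical volume.
    new_targets[active] = "logical_volume"
    # Step 4: the other survivors log in to the second internal SAN mapping and
    # point their second virtual volumes at the resulting mapping volumes.
    for c in survivors:
        if c != active:
            new_targets[c] = f"internal_mapping_volume@{active}"
    return new_targets, active
```

Starting from four controllers, the result contains three entries, none of them null, which corresponds to the n-1 links that remain after switching.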

Example 3

Based on embodiment 1 and embodiment 2, this embodiment further proposes the following. When the configuration controller returns to normal during data reading and writing, namely in the SAN switch-back state, the second virtual volumes of the remaining n-1 non-configuration controllers are pointed to null (host IO is suspended), the storage pool and the logical volume on the active controller are imported to the recovered configuration controller, the second internal SAN mapping on the second virtual volume of the active controller is deleted, and the remaining n-2 non-configuration controllers then all log out of the second internal SAN mapping. In the SAN switch-back state, importing the storage pool and the logical volume from the active controller to the recovered configuration controller means first exporting them from the active controller and then importing the exported storage pool and logical volume to the recovered configuration controller.

A third virtual volume is created on the logical volume of the recovered configuration controller (because the configuration controller failed or shut down, the first virtual volume is lost or abnormal and a virtual volume must be created again; the third virtual volume corresponds one-to-one with the logical volume). An internal SAN mapping identical to the one on the first virtual volume is then created on the third virtual volume and named the third internal SAN mapping. The remaining n-1 non-configuration controllers all log in to the third internal SAN mapping, and corresponding third internal mapping volumes are automatically generated (that is, each of the remaining n-1 non-configuration controllers automatically generates its own third internal mapping volume, which corresponds to the third virtual volume). The second virtual volumes on the remaining n-1 non-configuration controllers point to the corresponding third internal mapping volumes (that is, each points to the third internal mapping volume on its own controller), establishing a mapping relationship between the second virtual volumes on the remaining n-1 non-configuration controllers and the third virtual volume. An external SAN mapping to the host is then created for the third virtual volume, so that n data read-write links again exist between the host and the multi-control cluster system and data reading and writing can proceed normally.
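The switch-back sequence can be traced in the same simplified way. The function name, the step labels and the string targets are hypothetical; the sketch only records the order of operations described in this embodiment (in particular, export before import) and the final pointing of each virtual volume.

```python
def san_switch_back(targets, config, active):
    """Return (new_targets, steps) once the configuration controller recovers.

    targets maps each controller that stayed up to what its second virtual
    volume currently points to; config and active are controller names.
    """
    if active not in targets or config in targets:
        raise ValueError("expected active in targets and config absent")
    steps = []
    # Point all n-1 second virtual volumes to null; host IO is suspended.
    new_targets = {c: None for c in targets}
    steps.append("suspend host IO")
    # Export from the active controller first, then import on the recovered one.
    steps += [f"export pool+lv from {active}", f"import pool+lv to {config}"]
    # Tear down the second internal SAN mapping and its logins.
    steps += ["delete second internal mapping", "log out of second internal mapping"]
    # The third virtual volume sits directly on the logical volume.
    new_targets[config] = "logical_volume"
    steps.append("create third virtual volume and third internal mapping")
    # All n-1 non-configuration controllers log in and repoint.
    for c in targets:
        new_targets[c] = f"internal_mapping_volume@{config}"
    steps.append("map third virtual volume to host")
    return new_targets, steps
```

Applied to the three-controller state left by a switch, the sketch returns four non-null entries, corresponding to the n links restored by switch-back.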

During data reading and writing, at least two controllers may fail or shut down. When the failed or shut-down controllers are all non-configuration controllers, they simply do not need to log in to the corresponding internal SAN mapping; when one of the failed or shut-down controllers is the configuration controller, the operations of the SAN switching state are executed, so that the controllers in the multi-control cluster system can still take over the data reading and writing tasks normally.
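The handling of multiple simultaneous failures can be summarised in a minimal sketch. The function and key names are hypothetical, and raising an error when no controller survives is an assumption added for completeness; the text does not describe that case.

```python
def plan_after_failures(controllers, config, failed):
    """Decide what to do when several controllers are down at once."""
    failed = set(failed)
    survivors = [c for c in controllers if c not in failed]
    if not survivors:
        # Assumption: with no controller left, host IO cannot be served.
        raise RuntimeError("no controller left to serve host IO")
    return {
        # SAN switching runs only if the configuration controller is down.
        "run_san_switch": config in failed,
        # Failed non-configuration controllers skip the internal SAN login.
        "skip_login": sorted(failed - {config}),
        "survivors": survivors,
    }
```

For example, if only non-configuration controllers are down, the plan skips their logins and leaves the existing configuration controller in place.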

Example 4

On the basis of embodiment 1, embodiment 2 and embodiment 3, this embodiment further discloses a multi-control cluster SAN switching and switch-back device, which includes a processor and a memory storing a plurality of computer instructions, where the computer instructions, when executed by the processor, implement the steps of the method of any one of embodiments 1 to 3. For specific limitations of the device, refer to the limitations of the multi-control cluster SAN switching and switch-back method described above, which are not repeated here.

It should be understood that, although the steps in the flowcharts of FIGS. 2-3 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages, which need not be completed at the same moment and may be performed at different times; nor need these sub-steps or stages be performed sequentially, as they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

With the multi-control cluster SAN switching and switch-back method and device, when at least one controller in the multi-control cluster system fails or shuts down, the remaining controllers take over the service of the failed or shut-down controller through SAN switching, so that service between the multi-control cluster system and the host continues normally. After the failed controller recovers, SAN switch-back lets all controllers in the multi-control cluster system take over service normally again. The internal and external SAN mappings used by the method and device are not restricted to IP-SAN or FC-SAN; both are compatible, and SAN switching and switch-back can be achieved with either.

The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A multi-controller cluster SAN switchback method, characterized in that: the multi-controller cluster SAN switchback method is applied to data reading and writing between a multi-controller cluster system and a host, and the data reading and writing between the multi-controller cluster system and the host includes a SAN normal state, a SAN switch state and a SAN switchback state, and the multi-controller cluster system includes n controllers, wherein:

When all controllers are normal, that is, the SAN is in the normal state, a storage pool is created on one of the n controllers, a logical volume is created on the storage pool, and that controller is defined as the configuration controller;

A first virtual volume is created on the logical volume, and then a first internal SAN mapping is created on the first virtual volume. The remaining n-1 non-configuration controllers all log in to the first internal SAN mapping, automatically generate corresponding first internal mapping volumes, and then second virtual volumes are respectively created for the first internal mapping volumes of the remaining n-1 non-configuration controllers, so as to establish a mapping relationship between each second virtual volume and the first virtual volume.

An external SAN mapping is established between each virtual volume and the host, so that n data read and write links exist between the host and the multi-controller cluster system;

When the configuration controller fails or shuts down during data reading and writing, that is, in the SAN switch state, the second virtual volumes of the remaining n-1 non-configuration controllers are set to point to null and these controllers log out of the first internal SAN mapping; a preset controller is selected from the remaining n-1 non-configuration controllers, the storage pool and logical volume on the configuration controller are imported to that controller, and that controller is defined as the active controller;

The second virtual volume on the active controller points to the imported logical volume, and a new second internal SAN mapping is created on the second virtual volume. The remaining n-2 non-configuration controllers all log in to the second internal SAN mapping, automatically generate corresponding second internal mapping volumes, and the second virtual volumes on the remaining n-2 non-configuration controllers point to corresponding second internal mapping volumes, respectively, so as to establish a mapping relationship between the second virtual volumes on the remaining n-2 non-configuration controllers and the second virtual volume on the active controller. At this time, there are n-1 data reading and writing links between the host and the multi-controller cluster system.

When the configuration controller returns to normal during the data reading and writing process, that is, the SAN is switched back, the pointers of the second virtual volumes of the remaining n-1 non-configuration controllers are set to null, the storage pool and logical volume on the active controller are imported to the restored configuration controller, the second internal SAN mapping on the second virtual volume of the active controller is deleted, and then the remaining n-2 non-configuration controllers all log out of the second internal SAN mapping;

A third virtual volume is created on the logical volume of the configuration controller after recovery, and then the same internal SAN mapping as that on the first virtual volume is created on the third virtual volume and named the third internal SAN mapping. The remaining n-1 non-configuration controllers all log in to the third internal SAN mapping, and the corresponding third internal mapping volumes are automatically generated. The second virtual volumes on the remaining n-1 non-configuration controllers point to the corresponding third internal mapping volumes respectively, so as to establish a mapping relationship between the second virtual volumes on the remaining n-1 non-configuration controllers and the third virtual volume, and then an external SAN mapping is established between the third virtual volume and the host, so that n data reading and writing links exist between the host and the multi-controller cluster system.

2. The multi-controller cluster SAN switchback method as described in claim 1 is characterized in that: in the SAN switch state, the preset controller is the controller with the least resource occupation.

3. The multi-controller cluster SAN switchback method as described in claim 1 is characterized in that: when at least two controllers fail or shut down during data reading and writing, if a failed or shut-down controller is a non-configuration controller, it does not need to log in to the corresponding internal SAN mapping, and if a failed or shut-down controller is the configuration controller, the SAN switch state operation is performed.

4. The multi-controller cluster SAN switchback method as described in claim 1 is characterized in that: the multi-controller cluster system also includes a detection module, and the detection module detects whether each controller is faulty or shut down during the data reading and writing process.

5. The multi-controller cluster SAN switchback method as described in claim 1 is characterized in that: in the SAN switchback state, when importing the storage pool and logical volume on the active controller to the configuration controller after restoration, the storage pool and logical volume on the active controller are exported first, and then the exported storage pool and logical volume are imported to the configuration controller after restoration.

6. A multi-controller cluster SAN switchback device, comprising a processor and a memory storing a plurality of computer instructions, wherein the computer instructions, when executed by the processor, implement the steps of the method described in any one of claims 1 to 5.