
CN101651559A - Failover method of storage service in double controller storage system - Google Patents

Failover method of storage service in double controller storage system

Info

Publication number: CN101651559A
Authority: CN (China)
Prior art keywords: controller, iscsi, iscsi target, controllers, storage
Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN200910016770A
Other languages: Chinese (zh)
Other versions: CN101651559B (en)
Inventor: 施培任
Current Assignee: IEIT Systems Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Langchao Electronic Information Industry Co Ltd
Priority date: 2009-07-13 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2009-07-13
Publication date: 2010-02-17

Application filed by Langchao Electronic Information Industry Co Ltd
Priority to CN2009100167704A
Publication of CN101651559A
Application granted
Publication of CN101651559B
Legal status: Active

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention provides a failover method for the iSCSI Target storage service in a dual-controller storage system. The failover steps are as follows: cutting off the power supply of the failed controller through an electronic switch; taking over the iSCSI Target nodes and logical storage units of the failed controller; and binding the failed controller's IP address and performing an ARP broadcast. Failover of the iSCSI Target service ensures that the dual-controller storage system remains continuously available when a single controller fails.

Description

A method for failover of storage services in a dual-controller storage system

Technical field

The invention relates to computer network storage technology, and in particular to a method for iSCSI Target service failover on dual-controller IP-SAN storage systems and devices.

Technical background

SCSI (Small Computer System Interface) is a high-performance interface for computer peripheral devices. Through this interface, all external devices connected to a PC can transfer and distribute data among themselves, independently of the host, by way of an HBA (Host Bus Adapter). The main function of the current SCSI protocol is to transfer commands, status, and block data between a host and a storage device. The SCSI architecture defines a client/server relationship between the initiator (the host) and the target (for example, a disk or tape). The SCSI-3 application client resides on the host and describes the I/O requests of high-level applications, the file system, and the operating system. The SCSI-3 device server resides in the target device and is responsible for responding to requests. Client/server requests and responses are exchanged over an underlying transport and are managed by an appropriate SCSI-3 service delivery protocol, such as FCP or iSCSI over gigabit serial links.

iSCSI (Internet Small Computer Systems Interface) is a protocol that maps the SCSI architecture onto TCP/IP network transport. An iSCSI initiator logs in to an iSCSI target over a TCP/IP network and exchanges SCSI commands, data, and status with it. Multiple clients can thereby access multiple block devices, mainly disks, on the same target over the TCP/IP network, and client applications can use these block devices as if they were local block devices, which achieves centralized storage.

As network technology develops, the bandwidth and quality of deployed TCP/IP networks keep rising, iSCSI storage performance improves, and iSCSI is used more and more widely. With this comes a growing demand for reliable storage services and data security. If a storage system that provides iSCSI storage services has only one storage controller, a failure of that controller interrupts the iSCSI storage application, which may interrupt business or even leave data incomplete. If two controllers are used and each can take over the other's services, the availability of the storage service improves greatly, because the probability that both controllers fail at the same time is much smaller than the probability that one of them fails. Combined with RAID disk arrays, a dual-controller storage system provides redundant backup. The method by which the iSCSI Target storage service switches over when a controller fails is the key to such a system.
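The probability argument above can be made concrete with a small sketch. It assumes, beyond what the text states, that the two controllers fail independently and that takeover always succeeds and takes no time; the function name and the sample availability figures are illustrative only.

```python
def dual_controller_availability(single_availability: float) -> float:
    """Availability of a controller pair that is up if at least one controller is up.

    Assumptions (not from the source): failures are independent and
    takeover is ideal, so the pair is down only when both are down.
    """
    if not 0.0 <= single_availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    unavailable = 1.0 - single_availability
    return 1.0 - unavailable * unavailable


if __name__ == "__main__":
    for a in (0.9, 0.99, 0.999):
        print(f"single={a}  dual={dual_controller_availability(a):.6f}")
```

Under these assumptions, a controller that is unavailable 1% of the time yields a pair that is unavailable about 0.01% of the time. Real systems fall short of this figure because takeover itself takes time and failures can be correlated.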

Summary of the invention

The purpose of the invention is to overcome a problem of single-controller storage systems: when the controller suffers a hardware or software failure, the storage service is interrupted, and if an application client is reading or writing data at that moment, the application is interrupted and data may become inconsistent. In a dual-controller system, service monitoring software runs on the controllers. When it detects that the other controller has failed, the surviving controller takes over the failed controller's storage services (IP-SAN storage based on IP networks, FC-SAN storage based on Fibre Channel switching, and so on) within a sufficiently short time (within 60 seconds), so that the application client's data reads and writes continue normally after a brief interruption.

The method of the invention is implemented as follows. When a single controller fails in a dual-controller storage system, the iSCSI Target service is switched to the other controller. In the dual-controller storage system, both controllers are connected to the same disk group and can access all disks at the same time, as shown in Figure 1. In terms of operating system and software, each controller has iSCSI Target service software installed. The iSCSI Target software and service have the following characteristics: they can create two or more iSCSI Target Nodes; they can present block devices such as plain disks, RAID arrays, and logical volumes (LVs) as SCSI logical units (LUs) and assign each LU a specified logical unit number (LUN); and they can assign a SCSI ID to each LU. The SCSI ID is the identifier of a logical unit and is a unique value that denotes it. Concretely, a SCSI ID is specified and saved when an LU is added; when an iSCSI initiator issues an INQUIRY command with the EVPD bit set and a PAGE CODE of 0x83, the target builds its response from the SCSI ID to return the LU's identity. At startup and during normal operation, the dual-controller storage system keeps the configuration on the two controllers identical. The two controllers are identified as controller 0 and controller 1. In normal operation mode, the iSCSI Target service on the dual-controller system is configured as follows:

(1) Both controllers run the iSCSI Target service, and each creates a different iSCSI Target Node; one of the differences is the iSCSI Target Name. For ease of reference, the iSCSI Target Node of controller 0 is called iSCSITargetNode0 and that of controller 1 is called iSCSITargetNode1.

(2) The disks, RAID arrays, or LVs provided by the shared disk group are divided into two groups, which the two iSCSI Target Nodes on the two controllers use as SCSI LUs (that is, iSCSI LUs). Each LU is assigned a LUN and a SCSI ID that are unique across the whole dual-controller system. For ease of reference, the two groups are called LunGroup0 and LunGroup1, used by iSCSITargetNode0 and iSCSITargetNode1 respectively.

(3) The two controllers use different external network IP addresses. For ease of reference, controller 0 uses Ip0 and controller 1 uses Ip1.

(4) The iSCSI Initiator application client must be connected to both storage controllers over the network.
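The normal-mode configuration in items (1) to (4) can be modelled as plain data with a check for the system-wide uniqueness that item (2) requires. This is a minimal sketch, not the patent's implementation: the class names, the `find_conflicts` helper, and any device paths or addresses a caller passes in are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class LogicalUnit:
    device: str   # backing block device (disk, RAID array, or LV)
    lun: int      # logical unit number
    scsi_id: str  # identifier returned for INQUIRY page 0x83


@dataclass
class TargetNode:
    name: str        # iSCSI Target Name
    service_ip: str  # external IP address of the owning controller
    lun_group: List[LogicalUnit] = field(default_factory=list)


def find_conflicts(nodes: List[TargetNode]) -> List[str]:
    """Return a description of every value used more than once across all nodes."""
    seen: Dict[str, Set[object]] = {
        "name": set(), "ip": set(), "device": set(), "LUN": set(), "SCSI ID": set(),
    }
    conflicts: List[str] = []

    def claim(kind: str, value: object) -> None:
        if value in seen[kind]:
            conflicts.append(f"duplicate {kind}: {value}")
        seen[kind].add(value)

    for node in nodes:
        claim("name", node.name)
        claim("ip", node.service_ip)
        for lu in node.lun_group:
            claim("device", lu.device)
            claim("LUN", lu.lun)
            claim("SCSI ID", lu.scsi_id)
    return conflicts
```

Because LUNs and SCSI IDs never collide between the two groups, either controller can later host both target nodes without renumbering anything.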

The distribution of iSCSI Target service elements during normal operation is shown in Table 1, and the application structure of the storage system is shown in Figure 1.

When an iSCSI Initiator application client is accessing (reading or writing) an LU on a controller and a hardware or software failure of that controller leaves the client's iSCSI packets unanswered, the iSCSI Target service of the failed controller is switched to the other controller.

The running state of the iSCSI Target service is monitored during operation as follows. First, check whether the iSCSI Target service process exists; if it does not, the service has stopped. Second, try to connect through the loopback address 127.0.0.1 to the TCP port on which the iSCSI Target service listens; if the connection fails, the service is running abnormally. Third, try to establish a session according to the iSCSI protocol and issue a REPORT LUNS request for the LUN list; if repeated attempts fail, the service is running abnormally, otherwise it is considered to be running normally. For this check to work, the iSCSI Target service must listen on the loopback address 127.0.0.1 when it starts.
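The three-stage check described above might be sketched as follows. The process check and the iSCSI login plus REPORT LUNS step are passed in as callables because the source does not say how they are performed; port 3260 is the registered iSCSI port and is an assumption here, as are the function names and the retry count of three.

```python
import socket
from typing import Callable, Optional

ISCSI_PORT = 3260  # assumption: default iSCSI port; the source names no port


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def target_service_healthy(
    process_exists: Callable[[], bool],
    report_luns_ok: Callable[[], bool],
    port_open: Optional[Callable[[], bool]] = None,
    attempts: int = 3,
) -> bool:
    """Process check, then loopback TCP connect, then REPORT LUNS with retries."""
    if port_open is None:
        port_open = lambda: tcp_port_open("127.0.0.1", ISCSI_PORT)
    if not process_exists():
        return False  # service has stopped
    if not port_open():
        return False  # service is not accepting connections
    # Healthy as soon as one simulated initiator request succeeds.
    return any(report_luns_ok() for _ in range(attempts))
```

The stages run from cheapest to most expensive, so a dead process is reported without waiting for network timeouts.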

The iSCSI Target service switchover proceeds as follows (taking a failure of controller 0 as an example):

(1) The application client's block read and write requests block, and the iSCSI Initiator attempts network retransmission. If the TCP connection times out, the initiator drops the connection and tries to re-establish the session until its own timeout expires. During this period the dual-controller storage system completes the takeover of the iSCSI Target service.

(2) Controller 1 cuts off the power supply of controller 0 through an electronic switch, which completely isolates controller 0 from the storage resources and stops it from responding to external service requests.

(3) The iSCSI Target service already running on controller 1 creates the iSCSI Target node iSCSITargetNode0, adds the logical unit resources of LunGroup0 to it, and sets the LUN and SCSI ID of each LU according to controller 0's original configuration.

(4) Controller 1 binds Ip0, the external IP address of the original controller 0, and performs an ARP broadcast.

(5) The application client receives the ARP broadcast and updates the MAC address for Ip0. The iSCSI Initiator reconnects through Ip0 to iSCSITargetNode0, now on controller 1, establishes a session, obtains the list of iSCSI storage logical units, and queries the SCSI ID of each LU with the INQUIRY command. From this point all suspended iSCSI LUs are available again and data reads and writes resume.
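Steps (2) to (4), the part carried out by the surviving controller, can be sketched as an ordered sequence of injected actions. Everything named here is hypothetical: `PeerConfig`, `take_over_peer`, and the five callables stand in for the power switch, the target software, and the network stack, none of which the source specifies. Aborting when the power-off fails is also an assumption; the source does not describe that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PeerConfig:
    node: str                        # e.g. "iSCSITargetNode0"
    ip: str                          # e.g. the address referred to as Ip0
    lus: List[Tuple[str, int, str]]  # (block device, LUN, SCSI ID) per LU


def take_over_peer(
    peer: PeerConfig,
    power_off_peer: Callable[[], bool],
    create_target_node: Callable[[str], None],
    add_logical_unit: Callable[[str, str, int, str], None],
    bind_ip: Callable[[str], None],
    arp_broadcast: Callable[[str], None],
) -> None:
    """Fence the failed peer, recreate its target node and LUs, then move its IP."""
    # Step (2): isolate the peer first so it can no longer touch shared disks.
    if not power_off_peer():
        raise RuntimeError("peer not powered off; refusing to take over")
    # Step (3): recreate the node with the peer's original LUNs and SCSI IDs.
    create_target_node(peer.node)
    for device, lun, scsi_id in peer.lus:
        add_logical_unit(peer.node, device, lun, scsi_id)
    # Step (4): only now attract client traffic to this controller.
    bind_ip(peer.ip)
    arp_broadcast(peer.ip)
```

The ordering matters: the IP address moves last, so a reconnecting initiator never reaches a controller that does not yet export the expected logical units.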

The distribution of the elements after the iSCSI Target service takeover is shown in Table 2.

The beneficial effects of the invention are as follows. When a single controller fails in a dual-controller storage system, the iSCSI Target service is switched to the other controller, which avoids the data incompleteness caused by an interruption of iSCSI storage applications. With two controllers that can take over each other's services, the availability of the storage service improves greatly, because the probability that both controllers fail at the same time is much smaller. Combined with RAID disk arrays, the dual-controller storage system provides redundant backup.

Description of drawings

Figure 1 is a schematic diagram of the dual-controller failover structure.

Figure 2 is a table of the distribution of iSCSI Target service elements during normal operation.

Figure 3 is a table of the distribution of the elements after the iSCSI Target service takeover.

Detailed description

The switchover method of the invention is described in detail below with reference to the accompanying drawings.

When a single controller fails in a dual-controller storage system, the iSCSI Target service is switched to the other controller. In the dual-controller storage system, both controllers are connected to the same disk group and can access all disks at the same time. In terms of operating system and software, each controller has iSCSI Target service software installed. The iSCSI Target software and service have the following characteristics: they can create two or more iSCSI Target Nodes; they can present block devices such as plain disks, RAID arrays, and logical volumes (LVs) as SCSI logical units (LUs) and assign each LU a specified logical unit number (LUN); and they can assign a SCSI ID to each LU. The SCSI ID is the identifier of a logical unit and is a unique value that denotes it. A SCSI ID is specified and saved when an LU is added; when an iSCSI initiator issues an INQUIRY command with the EVPD bit set and a PAGE CODE of 0x83, the target builds its response from the SCSI ID to return the LU's identity. At startup and during normal operation, the dual-controller storage system keeps the configuration on the two controllers identical. The two controllers are identified as controller 0 and controller 1. In normal operation mode, the iSCSI Target service on the dual-controller system is configured as follows:

(1) Both controllers run the iSCSI Target service, and each creates a different iSCSI Target Node; one of the differences is the iSCSI Target Name. For ease of reference, the iSCSI Target Node of controller 0 is called iSCSITargetNode0 and that of controller 1 is called iSCSITargetNode1;

(2) The disks, RAID arrays, or LVs provided by the shared disk group are divided into two groups, which the two iSCSI Target Nodes on the two controllers use as SCSI LUs (that is, iSCSI LUs). Each LU is assigned a LUN and a SCSI ID that are unique across the whole dual-controller system. For ease of reference, the two groups are called LunGroup0 and LunGroup1, used by iSCSITargetNode0 and iSCSITargetNode1 respectively;

(3) The two controllers use different external network IP addresses. For ease of reference, controller 0 uses Ip0 and controller 1 uses Ip1;

(4) The iSCSI Initiator application client must be connected to both storage controllers over the network.
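The identification query mentioned in this section, an INQUIRY command with EVPD set and page code 0x83, has a fixed 6-byte command descriptor block. The sketch below builds that block following the standard SCSI INQUIRY layout (opcode 0x12, EVPD in bit 0 of byte 1, page code in byte 2, allocation length in bytes 3 and 4). The function name and the default allocation length of 255 are illustrative choices, not taken from the source.

```python
import struct

INQUIRY_OPCODE = 0x12
DEVICE_IDENTIFICATION_PAGE = 0x83


def build_inquiry_vpd_cdb(
    page_code: int = DEVICE_IDENTIFICATION_PAGE,
    allocation_length: int = 255,
) -> bytes:
    """Build a 6-byte INQUIRY CDB that requests a vital product data page."""
    if not 0 <= page_code <= 0xFF:
        raise ValueError("page code must fit in one byte")
    if not 0 <= allocation_length <= 0xFFFF:
        raise ValueError("allocation length must fit in two bytes")
    evpd = 0x01    # bit 0 of byte 1: return a VPD page, not standard data
    control = 0x00
    # Big-endian: opcode, flags, page code, 16-bit allocation length, control.
    return struct.pack(
        ">BBBHB", INQUIRY_OPCODE, evpd, page_code, allocation_length, control
    )


if __name__ == "__main__":
    print(build_inquiry_vpd_cdb().hex())
```

Since the target answers this query from the stored SCSI ID, an initiator sees the same device identity for an LU before and after takeover, provided the surviving controller restores the original SCSI ID.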

Both controllers are connected to the same disk array. In normal operation mode, both controllers run the iSCSI Target service, but each provides a different iSCSI Target Node and controls different disk resources and a different external IP address. Both controllers run monitoring software that monitors the state of the controller's iSCSI Target service, and the two controllers communicate with each other. When one controller fails, the application client's access to the iSCSI Target Node provided by the failed controller fails. Under the iSCSI protocol and the client's timeout-and-retry mechanism for data reads and writes, the client keeps waiting and retrying for a period of time, and during this period the other controller takes over the iSCSI Target service of the failed controller. Let the two controllers be controller 0 and controller 1, with one providing iSCSITargetNode0 bound to IP0 and the other providing iSCSITargetNode1 bound to IP1. Assuming that controller 0 fails and controller 1 takes over, the takeover steps are as follows:

(S1) Controller 1 cuts off the power supply of controller 0 through an electronic switch, which completely isolates controller 0 from the storage resources and stops it from responding to external service requests;

(S2) The iSCSI Target service already running on controller 1 creates iSCSITargetNode0 and adds to it, in the form of logical units (LUNs), the storage resources originally served by controller 0;

(S3) Controller 1 binds IP0, the external IP address of the original controller 0, and performs an ARP broadcast;

(S4) The application client reconnects to iSCSITargetNode0 through IP0, re-establishes the session, obtains the iSCSI storage logical units, and retransmits its data read and write requests.
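The ARP broadcast in step (S3) is commonly realized as a gratuitous ARP, in which the sender announces its own MAC address for the IP address it has just bound. The source says only "ARP broadcast", so treating it as a gratuitous ARP request is an assumption, as are the function name and the sample addresses. The sketch only builds the 42-byte Ethernet frame; actually sending it needs a raw socket and elevated privileges, which are left out.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800
ARP_REQUEST = 1


def build_gratuitous_arp(mac: bytes, ip: str) -> bytes:
    """Build an Ethernet broadcast frame announcing that `ip` is at `mac`."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip_bytes = socket.inet_aton(ip)  # 4-byte IPv4 address
    broadcast = b"\xff" * 6
    ethernet_header = broadcast + mac + struct.pack(">H", ETHERTYPE_ARP)
    arp_header = struct.pack(
        ">HHBBH",
        1,               # hardware type: Ethernet
        ETHERTYPE_IPV4,  # protocol type
        6,               # hardware address length
        4,               # protocol address length
        ARP_REQUEST,
    )
    # Sender and target protocol addresses are both the taken-over IP.
    arp_body = mac + ip_bytes + b"\x00" * 6 + ip_bytes
    return ethernet_header + arp_header + arp_body


if __name__ == "__main__":
    frame = build_gratuitous_arp(bytes.fromhex("020000000001"), "192.0.2.10")
    print(len(frame), frame.hex())
```

Hosts that already hold a cache entry for the address replace the old MAC with the new one, which is how clients come to reach the surviving controller without any reconfiguration.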

Whether the iSCSI Target service on a system is running normally is checked as follows. First, check whether the iSCSI Target service process exists. Second, emulate an iSCSI Initiator: log in to the iSCSI Target Node on the local controller and issue a REPORT LUNS request for the list of logical units, which simulates access by an application client. If both steps succeed, the iSCSI Target service on this system is considered to be running normally; otherwise it is considered abnormal.

When a controller takes over, the elements it adds must match those of the failed controller. These elements are the iSCSI Target Node Name, the list of storage devices with their corresponding LUNs and SCSI IDs, and the external service IP address.
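The consistency requirement above can be expressed as a comparison between the failed controller's saved configuration and what the surviving controller has recreated. This is a minimal sketch: the dictionary keys and the function name are hypothetical, and the source does not say how the saved configuration is stored or exchanged.

```python
from typing import Dict, List

# The three elements the text says must match after takeover.
TAKEOVER_ELEMENTS = ("target_node_name", "logical_units", "service_ip")


def takeover_mismatches(
    original: Dict[str, object], recreated: Dict[str, object]
) -> List[str]:
    """Return the names of elements that differ between the two configurations."""
    return [
        key for key in TAKEOVER_ELEMENTS
        if original.get(key) != recreated.get(key)
    ]


if __name__ == "__main__":
    saved = {
        "target_node_name": "iSCSITargetNode0",
        # device -> (LUN, SCSI ID); the values are made up for illustration
        "logical_units": {"/dev/sdb": (0, "id-0")},
        "service_ip": "192.0.2.10",
    }
    rebuilt = dict(saved, logical_units={"/dev/sdb": (1, "id-0")})
    print(takeover_mismatches(saved, rebuilt))
```

An empty result means an initiator that reconnects will find the same target name, the same LUNs with the same identifiers, and the same address it used before the failure.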

Claims (4)

1. A method for failover of the iSCSI Target storage service in a dual-controller storage system, characterized in that, when a single controller fails in the dual-controller storage system, the iSCSI Target service is switched to the other controller; in the dual-controller storage system, both controllers are connected to the same disk group and can access all disks at the same time; in terms of operating system and software, each controller has iSCSI Target service software installed, and the iSCSI Target software and service have the following characteristics: they can create two or more iSCSI Target Nodes; they can present block devices such as plain disks, RAID arrays, and logical volumes (LVs) as SCSI logical units (LUs) and assign each LU a specified logical unit number (LUN); and they can assign a SCSI ID to each LU, where the SCSI ID is the identifier of a logical unit and is a unique value that denotes it, a SCSI ID is specified and saved when an LU is added, and when an iSCSI initiator issues an INQUIRY command with the EVPD bit set and a PAGE CODE of 0x83, the target builds its response from the SCSI ID to return the LU's identity; at startup and during normal operation, the dual-controller storage system keeps the configuration on the two controllers identical, the two controllers being identified as controller 0 and controller 1; and in normal operation mode, the iSCSI Target service on the dual-controller system is configured as follows:

(1) Both controllers run the iSCSI Target service, and each creates a different iSCSI Target Node; one of the differences is the iSCSI Target Name. For ease of reference, the iSCSI Target Node of controller 0 is called iSCSITargetNode0 and that of controller 1 is called iSCSITargetNode1;

(2) The disks, RAID arrays, or LVs provided by the shared disk group are divided into two groups, which the two iSCSI Target Nodes on the two controllers use as SCSI LUs (that is, iSCSI LUs). Each LU is assigned a LUN and a SCSI ID that are unique across the whole dual-controller system. For ease of reference, the two groups are called LunGroup0 and LunGroup1, used by iSCSITargetNode0 and iSCSITargetNode1 respectively;

(3) The two controllers use different external network IP addresses. For ease of reference, controller 0 uses Ip0 and controller 1 uses Ip1;

(4) The iSCSI Initiator application client must be connected to both storage controllers over the network.

2. The failover method according to claim 1, characterized in that both controllers are connected to the same disk array; in normal operation mode, both controllers run the iSCSI Target service, but each provides a different iSCSI Target Node and controls different disk resources and a different external IP address; both controllers run monitoring software that monitors the state of the controller's iSCSI Target service, and the two controllers communicate with each other; when one controller fails, the application client's access to the iSCSI Target Node provided by the failed controller fails, and under the iSCSI protocol and the client's timeout-and-retry mechanism for data reads and writes, the client keeps waiting and retrying for a period of time, during which the other controller takes over the iSCSI Target service of the failed controller; the two controllers being controller 0 and controller 1, with one providing iSCSITargetNode0 bound to IP0 and the other providing iSCSITargetNode1 bound to IP1, and assuming that controller 0 fails and controller 1 takes over, the takeover steps are as follows:

(S1) Controller 1 cuts off the power supply of controller 0 through an electronic switch, which completely isolates controller 0 from the storage resources and stops it from responding to external service requests;

(S2) The iSCSI Target service already running on controller 1 creates iSCSITargetNode0 and adds to it, in the form of logical units (LUNs), the storage resources originally served by controller 0;

(S3) Controller 1 binds IP0, the external IP address of the original controller 0, and performs an ARP broadcast;

(S4) The application client reconnects to iSCSITargetNode0 through IP0, re-establishes the session, obtains the iSCSI storage logical units, and retransmits its data read and write requests.

3. The failover method according to claim 1, characterized in that whether the iSCSI Target service on a system is running normally is checked by first checking whether the iSCSI Target service process exists, and second emulating an iSCSI Initiator that logs in to the iSCSI Target Node on the local controller and issues a REPORT LUNS request for the list of logical units, which simulates access by an application client; if both steps succeed, the iSCSI Target service on this system is considered to be running normally, and otherwise it is considered abnormal.

4. The failover method according to claim 1, characterized in that, when a controller takes over, the elements it adds match those of the failed controller, these elements being the iSCSI Target Node Name, the list of storage devices with their corresponding LUNs and SCSI IDs, and the external service IP address.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100167704A CN101651559B (en) 2009-07-13 2009-07-13 A method for failover of storage services in a dual-controller storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100167704A CN101651559B (en) 2009-07-13 2009-07-13 A method for failover of storage services in a dual-controller storage system

Publications (2)

Publication Number Publication Date
CN101651559A true CN101651559A (en) 2010-02-17
CN101651559B CN101651559B (en) 2011-07-06

Family

ID=41673688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100167704A Active CN101651559B (en) 2009-07-13 2009-07-13 A method for failover of storage services in a dual-controller storage system

Country Status (1)

Country Link
CN (1) CN101651559B (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102318275A (en) * 2011-08-02 2012-01-11 华为技术有限公司 Method, device, and system for processing messages based on CC-NUMA
CN102868754A (en) * 2012-09-26 2013-01-09 北京联创信安科技有限公司 High-availability method, node device and system for achieving cluster storage
CN102868727A (en) * 2012-08-23 2013-01-09 广东电子工业研究院有限公司 Method for realizing high availability of logical volume
CN103124282A (en) * 2011-11-17 2013-05-29 英业达股份有限公司 Failure transfer processing system and method using Ethernet optical fiber network
CN103259688A (en) * 2013-06-04 2013-08-21 北京搜狐新媒体信息技术有限公司 Failure diagnosis method and device of distributed storage system
CN103383689A (en) * 2012-05-03 2013-11-06 阿里巴巴集团控股有限公司 Service process fault detection method, device and service node
CN104718536A (en) * 2012-06-25 2015-06-17 Netapp股份有限公司 Non-disruptive controller replacement in network storage systems
CN104952476A (en) * 2015-07-08 2015-09-30 山东中孚信息产业股份有限公司 Reliable mobile storage medium and implementing method
CN105159846A (en) * 2015-07-02 2015-12-16 浙江宇视科技有限公司 Method for supporting dual-control switching of virtualized disk and storage system
CN105337762A (en) * 2015-09-28 2016-02-17 浪潮(北京)电子信息产业有限公司 File sharing method supporting automatic failover
CN105516252A (en) * 2015-11-26 2016-04-20 华为技术有限公司 TCP (Transmission Control Protocol) connection switching method, apparatus and system
CN106294031A (en) * 2016-07-29 2017-01-04 杭州宏杉科技有限公司 A kind of business management method and storage control
CN106354436A (en) * 2016-09-20 2017-01-25 郑州云海信息技术有限公司 Storage system based on distributed IPSAN
WO2017032147A1 (en) * 2015-08-25 2017-03-02 中兴通讯股份有限公司 Method and apparatus for switching internet small computer system interface session link
CN107423167A (en) * 2017-07-31 2017-12-01 郑州云海信息技术有限公司 iSCSI target redundancy control method and system based on dual-controller storage
CN107678891A (en) * 2017-10-13 2018-02-09 郑州云海信息技术有限公司 A dual-control method, device and readable storage medium for a storage system
CN107911238A (en) * 2017-11-13 2018-04-13 郑州云海信息技术有限公司 Two-node standby method and system based on IPSAN servers
CN108519940A (en) * 2018-04-12 2018-09-11 郑州云海信息技术有限公司 A storage device alarm method, system, and computer-readable storage medium
CN108897644A (en) * 2018-06-22 2018-11-27 山东超越数控电子股份有限公司 Dual-controller fault handling method and system
CN108959137A (en) * 2018-09-21 2018-12-07 郑州云海信息技术有限公司 Data transmission method, device, equipment and readable storage medium
CN109783280A (en) * 2019-01-15 2019-05-21 上海海得控制系统股份有限公司 Shared memory systems and shared storage method
CN110213065A (en) * 2018-02-28 2019-09-06 杭州宏杉科技股份有限公司 Method and device for switching paths
WO2020113875A1 (en) * 2018-12-07 2020-06-11 华为技术有限公司 Control device switching method, control device and storage system
CN111290702A (en) * 2018-12-07 2020-06-16 华为技术有限公司 A switching method of a control device, a control device, and a storage system
CN111581034A (en) * 2020-04-30 2020-08-25 新华三信息安全技术有限公司 RAID card fault processing method and device
CN112015159A (en) * 2019-05-31 2020-12-01 中车株洲电力机车研究所有限公司 Fault record storage method based on dual-core MCU and computer system
CN114741239A (en) * 2022-04-30 2022-07-12 苏州浪潮智能科技有限公司 Enhanced system, method, apparatus and storage medium for storage device management stability

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040078521A1 (en) * 2002-10-17 2004-04-22 International Business Machines Corporation Method, apparatus and computer program product for emulating an iSCSI device on a logical volume manager
CN100452795C (en) * 2004-01-16 2009-01-14 英业达股份有限公司 Method for accessing logical device by using iSCSI protocol
CN101123485B (en) * 2007-09-20 2011-05-11 杭州华三通信技术有限公司 iSCSI packet processing method and device, error recovery method and device

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102318275B (en) * 2011-08-02 2015-01-07 华为技术有限公司 Method, device, and system for processing messages based on CC-NUMA
CN102318275A (en) * 2011-08-02 2012-01-11 华为技术有限公司 Method, device, and system for processing messages based on CC-NUMA
CN103124282A (en) * 2011-11-17 2013-05-29 英业达股份有限公司 Failure transfer processing system and method using Ethernet optical fiber network
CN103383689A (en) * 2012-05-03 2013-11-06 阿里巴巴集团控股有限公司 Service process fault detection method, device and service node
CN104718536A (en) * 2012-06-25 2015-06-17 Netapp股份有限公司 Non-disruptive controller replacement in network storage systems
CN104718536B (en) * 2012-06-25 2018-04-13 Netapp股份有限公司 Non-disruptive controller replacement in network storage systems
CN102868727A (en) * 2012-08-23 2013-01-09 广东电子工业研究院有限公司 Method for realizing high availability of logical volume
CN102868727B (en) * 2012-08-23 2015-06-17 广东电子工业研究院有限公司 Method for realizing high availability of logical volume
CN102868754B (en) * 2012-09-26 2016-08-03 北京联创信安科技股份有限公司 High-availability method, node device and system for achieving cluster storage
CN102868754A (en) * 2012-09-26 2013-01-09 北京联创信安科技有限公司 High-availability method, node device and system for achieving cluster storage
CN103259688A (en) * 2013-06-04 2013-08-21 北京搜狐新媒体信息技术有限公司 Failure diagnosis method and device of distributed storage system
CN105159846A (en) * 2015-07-02 2015-12-16 浙江宇视科技有限公司 Method for supporting dual-control switching of virtualized disk and storage system
CN104952476A (en) * 2015-07-08 2015-09-30 山东中孚信息产业股份有限公司 Reliable mobile storage medium and implementing method
WO2017032147A1 (en) * 2015-08-25 2017-03-02 中兴通讯股份有限公司 Method and apparatus for switching internet small computer system interface session link
CN105337762A (en) * 2015-09-28 2016-02-17 浪潮(北京)电子信息产业有限公司 File sharing method supporting automatic failover
CN105516252A (en) * 2015-11-26 2016-04-20 华为技术有限公司 TCP (Transmission Control Protocol) connection switching method, apparatus and system
CN106294031A (en) * 2016-07-29 2017-01-04 杭州宏杉科技有限公司 Service management method and storage controller
CN106294031B (en) * 2016-07-29 2019-07-12 杭州宏杉科技股份有限公司 Service management method and storage controller
CN106354436A (en) * 2016-09-20 2017-01-25 郑州云海信息技术有限公司 Storage system based on distributed IPSAN
CN107423167A (en) * 2017-07-31 2017-12-01 郑州云海信息技术有限公司 iSCSI target redundancy control method and system based on dual-controller storage
CN107678891A (en) * 2017-10-13 2018-02-09 郑州云海信息技术有限公司 A dual-control method, device and readable storage medium for a storage system
CN107678891B (en) * 2017-10-13 2021-06-29 郑州云海信息技术有限公司 A dual-control method, device and readable storage medium for a storage system
CN107911238A (en) * 2017-11-13 2018-04-13 郑州云海信息技术有限公司 Two-node standby method and system based on IPSAN servers
CN110213065A (en) * 2018-02-28 2019-09-06 杭州宏杉科技股份有限公司 Method and device for switching paths
CN110213065B (en) * 2018-02-28 2022-11-25 杭州宏杉科技股份有限公司 Method and device for switching paths
CN108519940A (en) * 2018-04-12 2018-09-11 郑州云海信息技术有限公司 A storage device alarm method, system, and computer-readable storage medium
CN108897644A (en) * 2018-06-22 2018-11-27 山东超越数控电子股份有限公司 Dual-controller fault handling method and system
CN108959137A (en) * 2018-09-21 2018-12-07 郑州云海信息技术有限公司 Data transmission method, device, equipment and readable storage medium
WO2020113875A1 (en) * 2018-12-07 2020-06-11 华为技术有限公司 Control device switching method, control device and storage system
CN111290702A (en) * 2018-12-07 2020-06-16 华为技术有限公司 A switching method of a control device, a control device, and a storage system
CN109783280A (en) * 2019-01-15 2019-05-21 上海海得控制系统股份有限公司 Shared memory systems and shared storage method
CN112015159A (en) * 2019-05-31 2020-12-01 中车株洲电力机车研究所有限公司 Fault record storage method based on dual-core MCU and computer system
CN111581034A (en) * 2020-04-30 2020-08-25 新华三信息安全技术有限公司 RAID card fault processing method and device
CN114741239A (en) * 2022-04-30 2022-07-12 苏州浪潮智能科技有限公司 Enhanced system, method, apparatus and storage medium for storage device management stability

Also Published As

Publication number Publication date
CN101651559B (en) 2011-07-06

Similar Documents

Publication Publication Date Title
CN101651559A (en) Failover method of storage service in double controller storage system
JP6317856B2 (en) Smooth controller change in redundant configuration between clusters
US10606715B2 (en) Efficient high availability for a SCSI target over a fibre channel
JP6955466B2 (en) SSD data replication system and method
JP6476348B2 (en) Implementing automatic switchover
US6609213B1 (en) Cluster-based system and method of recovery from server failures
US12335088B2 (en) Implementing switchover operations between computing nodes
US10719419B2 (en) Service processor traps for communicating storage controller failure
US8335899B1 (en) Active/active remote synchronous mirroring
US6883065B1 (en) System and method for a redundant communication channel via storage area network back-end
US7043663B1 (en) System and method to monitor and isolate faults in a storage area network
CN100544342C (en) Storage system
US7127633B1 (en) System and method to failover storage area network targets from one interface to another
US9098466B2 (en) Switching between mirrored volumes
US20030135782A1 (en) Fail-over storage system
US20050283641A1 (en) Apparatus, system, and method for verified fencing of a rogue node within a cluster
US20050193238A1 (en) System and method for providing automatic data restoration after a storage device failure
US20030131068A1 (en) Distributed storage system, storage device and method of copying data
US20070022314A1 (en) Architecture and method for configuring a simplified cluster over a network with fencing and quorum
JP2007072571A (en) Computer system, management computer, and access path management method
US20100275219A1 (en) Scsi persistent reserve management
US20230409227A1 (en) Resilient implementation of client file operations and replication
US7711978B1 (en) Proactive utilization of fabric events in a network virtualization environment
US7650463B2 (en) System and method for RAID recovery arbitration in shared disk applications
US9384151B1 (en) Unified SCSI target management for managing a crashed service daemon in a deduplication appliance

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant