
CN104796276B - A kind of link switch-over method and system - Google Patents


Info

Publication number
CN104796276B
Authority
CN
China
Prior art keywords
link
switch module
subnet
switch
standby
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410026793.4A
Other languages
Chinese (zh)
Other versions
CN104796276A (en)
Inventor
李�远
裴照华
郭强
赵泽
李明
崔洪涛
彭庆军
邵保华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Henan Co Ltd
Original Assignee
China Mobile Group Henan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Henan Co Ltd
Priority to CN201410026793.4A
Publication of CN104796276A
Application granted
Publication of CN104796276B
Legal status: Active
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a link switching method and system that solve a problem in the prior art: a blade host does not actively switch links when the link it is currently using fails, so data transmission is interrupted. In this solution, the blade host contains a main IB Switch module and a standby IB Switch module, and a link exists between each IB Switch module and every IB Switch in the IB subnet where the blade host is located. When the blade host determines, from the predetermined network topology of its IB subnet, that the first link currently in use has failed, it determines a second link, where both the first link and the second link are connected to the main IB Switch module in the blade host. The blade host then switches from the first link to the second link so that it can transmit data over the second link.

Description

A link switching method and system

Technical field

The present invention relates to the technical field of information and service support, and in particular to a link switching method and system.

Background

InfiniBand (IB) is an open-standard, high-bandwidth, high-speed network interconnect technology. It is not intended for general-purpose network connections; it was designed mainly to address server-side connectivity. InfiniBand is therefore mainly used for communication between servers, between servers and storage devices (such as a storage area network (SAN) or direct-attached storage), and between servers and networks (such as a local area network (LAN), a wide area network (WAN), or the Internet).

Figure 1 shows the topology of an existing IB network. The two network cards in the host are connected to two different switches (that is, there is only one link between the host and each switch), and the two switches are cascaded. The network cards are bonded on the host side in active-standby mode: when the active link fails, the standby link takes over from it and changes from standby to active, so that a single link failure does not interrupt data transmission. Referring to Figure 1, when the link or the network card corresponding to 9000 fails, data transmission switches to the link and network card corresponding to 9002.

With this topology, a traditional host that uses the Peripheral Component Interconnect (PCI) standard is notified of a network failure through the uplink detection mechanism of the IP protocol as soon as the link currently in use fails. Therefore, when the main switch fails, data transmission can switch to the standby switch, and when the main network card or its link fails, data transmission can switch to the standby network card and its link.

However, consider a host that uses a blade architecture (multiple card-type server units installed in a standard-height rack chassis to achieve high availability and high density), referred to here as a blade host. If a blade host is deployed in the traditional topology above, the uplink detection mechanism of the IP protocol would likewise signal a network failure when the link in use fails. But the internal hardware architecture of a blade host differs from that of a traditional host: a host channel adapter (HCA) card is integrated on the blade and is connected to an IB Switch module inside the blade host, and the IB Switch module has no uplink detection function. As a result, whether or not the link in use has failed, the port connecting the HCA card to the IB Switch module is always in the up state, so the blade host cannot learn of the network failure. The blade host then keeps sending data over the link of the main IB Switch module, network service cannot switch from the main IB Switch module to the standby IB Switch module, and data transmission is interrupted.

Summary of the invention

Embodiments of the present invention provide a link switching method and system to solve the problem in the prior art that a blade host does not actively switch links when the link currently in use fails, which interrupts data transmission.

Embodiments of the present invention adopt the following technical solutions:

A link switching method, in which a blade host contains a main InfiniBand switch (IB Switch) module and a standby IB Switch module, and a link exists between each IB Switch module and every IB Switch in the IB subnet where the blade host is located, the method comprising:

when the blade host determines, according to the predetermined network topology of the IB subnet where it is located, that the first link currently in use has failed, determining a second link, wherein both the first link and the second link are connected to the main IB Switch module in the blade host; and

switching from the first link to the second link, so that the blade host can transmit data over the second link.

The method further comprises:

when the blade host determines that the main IB Switch module connected to the first link has failed, determining a link connected to the standby IB Switch module in the blade host; and

switching from the first link to the link connected to the standby IB Switch module, so that the blade host can transmit data over the link connected to the standby IB Switch module.

Where at least two links are connected to the standby IB Switch module,

switching from the first link to the link connected to the standby IB Switch module, so that the blade host can transmit data over that link, specifically comprises:

determining the weight of each link connected to the standby IB Switch module; and

switching from the first link to the link with the largest weight among those connected to the standby IB Switch module, so that the blade host can transmit data over that link.

The network topology is determined as follows:

The subnet manager (SM) in the IB subnet probes the nodes in the IB subnet by sending subnet probe packets into the IB subnet, and generates the network topology of the IB subnet from the probe results.

The IB Switches in the IB subnet are cascaded, and for each IB Switch module in the blade host:

the links between that IB Switch module and every IB Switch in the subnet where the blade host is located are bonded into one virtual link.

A link switching system, applied in an InfiniBand (IB) subnet, the system comprising:

at least two IB Switches and at least one blade host, wherein the blade host contains a main IB Switch module and a standby IB Switch module, and a link exists between each IB Switch module and every IB Switch, wherein:

the blade host is configured to determine a second link when it determines, according to the predetermined network topology of the IB subnet, that the first link currently in use has failed, and to switch from the first link to the second link so that the blade host can transmit data over the second link, wherein both the first link and the second link are connected to the main IB Switch module in the blade host.

The blade host is further configured to:

determine a link connected to the standby IB Switch module in the blade host when it determines that the main IB Switch module connected to the first link has failed, and switch from the first link to the link connected to the standby IB Switch module, so that the blade host can transmit data over that link.

Where at least two links are connected to the standby IB Switch module,

the blade host is specifically configured to:

determine the weight of each link connected to the standby IB Switch module, and switch from the first link to the link with the largest weight among them, so that the blade host can transmit data over that link.

The system further comprises:

a subnet manager (SM), configured to probe the nodes in the IB subnet by sending subnet probe packets into the IB subnet, and to generate the network topology of the IB subnet from the probe results.

The IB Switches in the IB subnet are cascaded, and for each IB Switch module in the blade host:

the links between that IB Switch module and every IB Switch in the subnet where the blade host is located are bonded into one virtual link.

The beneficial effects of the embodiments of the present invention are as follows:

In the embodiments of the present invention, when the first link that the blade host is currently using, which is connected to the main IB Switch module, fails, the blade host can switch to the second link connected to the main IB Switch module and continue transmitting data without switching between the main and standby IB Switch modules. Adding redundant links to the IB subnet in this way solves the prior-art problem that a blade host does not actively switch links when the link currently in use fails, which interrupts data transmission.

Brief description of the drawings

Figure 1 is a topology diagram of an existing IB network;

Figure 2 is an implementation flowchart of a link switching method provided by an embodiment of the present invention;

Figure 3 is a schematic diagram of the link switching method provided by an embodiment of the present invention as implemented in a practical application;

Figure 4 is a schematic structural diagram of a link switching system provided by an embodiment of the present invention.

Detailed description

To solve the prior-art problem that a blade host does not actively switch links when the link currently in use fails, which interrupts data transmission, embodiments of the present invention provide a link switching method and system. In this technical solution, when the first link that the blade host is currently using, which is connected to the main IB Switch module, fails, the blade host can switch to the second link connected to the main IB Switch module and continue transmitting data without switching between the main and standby IB Switch modules. Adding redundant links to the IB subnet in this way solves the problem.

Embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it. Where no conflict arises, the embodiments in this description and the features of those embodiments may be combined with one another.

An embodiment of the present invention provides a link switching method; Figure 2 shows its implementation flowchart. The blade host contains a main IB Switch module and a standby IB Switch module, and a link exists between each IB Switch module and every IB Switch in the IB subnet where the blade host is located. The method comprises the following steps:

Step 21: when the blade host determines, according to the predetermined network topology of the IB subnet where it is located, that the first link currently in use has failed, determine a second link. Both the first link and the second link are connected to the main IB Switch module in the blade host.

In the embodiments of the present invention, the network topology of the IB subnet where the blade host is located can be determined as follows:

The master subnet manager (SM) in the IB subnet probes the nodes in the IB subnet by sending subnet probe packets into the IB subnet, and generates the network topology of the IB subnet from the probe results.

The SM is a common function in IB networks and provides centralized management of the IB subnet. The most commonly used subnet probe packet sent by the SM is the directed-route packet. A directed-route packet contains an initial path field (init_path), which specifies the path the packet takes; a hop count field (hop_count), which gives the number of hops the packet traverses; and a return path field (return_path), which records the path for the response packet. With this information, subnet probe packets can carry out all topology discovery and initialization. The entries recorded in init_path and return_path are in fact port numbers, which tell the current node which port to send the packet out of.
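The role of the three fields can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not the InfiniBand wire format: the class `DirectedRoutePacket` and the function `next_egress_port` are hypothetical names, and only the field names `init_path`, `hop_count`, and `return_path` come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirectedRoutePacket:
    """Toy model of a directed-route probe (hypothetical, not the wire format)."""
    init_path: List[int]                  # egress port to use at each hop
    hop_count: int = 0                    # hops traversed so far
    return_path: List[int] = field(default_factory=list)  # ingress ports, for the reply


def next_egress_port(pkt: DirectedRoutePacket, ingress_port: int) -> Optional[int]:
    """Process the packet at one node.

    Records the port the packet arrived on (so a reply can retrace the route)
    and returns the port it should leave from, or None once init_path is
    exhausted, meaning this node is the destination.
    """
    pkt.return_path.append(ingress_port)
    if pkt.hop_count >= len(pkt.init_path):
        return None
    egress = pkt.init_path[pkt.hop_count]
    pkt.hop_count += 1
    return egress
```

Each node only needs the packet itself to decide where to forward it, which is why such probes can work before any forwarding tables have been configured.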

The SM's discovery flow is mainly as follows. The SM first sends a Get(NodeInfo) subnet management packet (SMP) to the local node on which it resides. After receiving the response, the SM checks whether it has already probed this node. If it has not, it adds the node to the topology database. If it has, the SM obtains the node's port state information through Get(PortInfo) SMPs. If a port is a channel adapter (CA) port, the SM adds it to the port list; if it is a switch management port, the SM configures the port, sets its parameters, and then adds the discovered port to the port object table. Next, the SM uses the port state information it has obtained to determine whether the other end of the port is connected to another device. If it is, the SM sends another Get(NodeInfo) SMP to probe that device. If it is not, the SM checks whether discovery is complete: if not, it continues probing; if so, the discovery flow ends.
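The sweep can be sketched as a breadth-first traversal. The sketch below is a simplification under stated assumptions: the fabric is a plain dictionary, dictionary lookups stand in for the Get(NodeInfo) and Get(PortInfo) exchanges, and the names `discover` and `fabric` are hypothetical. Port configuration and the CA/switch distinction are omitted.

```python
from collections import deque


def discover(fabric, start):
    """Build a topology database by sweeping outward from the SM's local node.

    fabric: {node: {port: (peer_node, peer_port) or None}}
    Returns {node: {port: peer_node}} for every reachable node.
    """
    topology = {}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in topology:
            continue                      # already probed, skip
        topology[node] = {}               # stands in for Get(NodeInfo)
        for port, peer in sorted(fabric[node].items()):  # stands in for Get(PortInfo)
            if peer is None:
                continue                  # nothing attached to this port
            peer_node, _peer_port = peer
            topology[node][port] = peer_node
            if peer_node not in topology:
                queue.append(peer_node)   # probe the device at the far end
    return topology


# Hypothetical three-node fabric: one host and two cascaded switches.
fabric = {
    "host": {1: ("sw1", 1), 2: ("sw2", 1)},
    "sw1": {1: ("host", 1), 2: ("sw2", 2)},
    "sw2": {1: ("host", 2), 2: ("sw1", 2), 3: None},
}
topology = discover(fabric, "host")
```

Rerunning `discover` after removing a link from `fabric` yields a topology without that link, which is the information a host needs to notice that its current link has gone away.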

In general, an SM can reside on any port of any CA, switch, or router. To avoid a single point of failure, an IB subnet can contain multiple SMs: one master SM and several backup SMs. When the backup SMs detect that the master SM has died, one backup SM is selected to become the master SM and takes over management of the subnet.
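The text does not say how the new master is chosen. As an illustration only, the sketch below assumes that the surviving SM with the highest priority wins and that ties go to the lowest GUID; the function `elect_master` and the dictionary keys are hypothetical.

```python
def elect_master(sms):
    """Pick a master among live subnet managers.

    sms: list of dicts with keys 'guid', 'priority', 'alive'.
    Assumed rule (not from the text): highest priority first, then lowest GUID.
    Returns the chosen dict, or None if no SM is alive.
    """
    candidates = [sm for sm in sms if sm["alive"]]
    if not candidates:
        return None
    return min(candidates, key=lambda sm: (-sm["priority"], sm["guid"]))
```

Because the rule is deterministic, every backup SM that evaluates it over the same set of live SMs reaches the same answer.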

Step 22: switch from the first link to the second link, so that the blade host can transmit data over the second link.

Because both the first link and the second link are connected to the main IB Switch module in the blade host, the IB Switch module is not actually switched. The first link and the second link can also be bonded in advance into one virtual link using dual-link bonding, so that data is transmitted over the bonded virtual link regardless of whether the first link or the second link carries it.

In the embodiments of the present invention, when the first link that the blade host is currently using, which is connected to the main IB Switch module, fails, the blade host can switch to the second link connected to the main IB Switch module and continue transmitting data without switching between the main and standby IB Switch modules. Adding redundant links to the IB subnet in this way solves the prior-art problem that a blade host does not actively switch links when the link currently in use fails, which interrupts data transmission.

Figure 3 is a schematic diagram of the above link switching method as implemented in a practical application. Blade host 1 contains two IB Switch modules: one main IB Switch module and one standby IB Switch module. Compared with the prior art, a link 9001 is added between IB Switch1 and the standby IB Switch module of blade host 1, and a link 9003 is added between IB Switch2 and the main IB Switch module of blade host 1. IB Switch1 and IB Switch2 are cascaded.

Referring to Figure 3, when the IB subnet is initialized, the SM performs node detection across the whole IB subnet and generates the network topology. Inside blade host 1, dual-link bonding is used to bond links 9000 and 9003 into one virtual link, and links 9001 and 9002 into another virtual link.

When link 9000 between IB Switch1 and the main IB Switch module of blade host 1 fails, the main and standby IB Switch modules do not switch over, because the IB Switch module has no uplink detection mechanism. Instead, the SM repeats node detection following the network discovery flow and regenerates the network topology according to the preset routing algorithm. Link 9003, which is connected to the main IB Switch module, is thereby identified, and traffic switches to link 9003 so that data can be transmitted from the main IB Switch module of blade host 1 to IB Switch2.
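The selection of a second link on the same module can be sketched in Python using the link numbers from Figure 3. The table `FIG3_LINKS` and the function `second_link` are hypothetical names. The endpoints of links 9000, 9001, and 9003 are stated in the text; the entry for link 9002 (standby module to IB Switch2) is inferred from the bonding of 9001 with 9002 and should be read as an assumption.

```python
# link id -> (IB Switch module inside the blade host, external IB Switch)
FIG3_LINKS = {
    9000: ("main", "IB Switch1"),
    9003: ("main", "IB Switch2"),
    9001: ("standby", "IB Switch1"),
    9002: ("standby", "IB Switch2"),   # inferred, not stated explicitly
}


def second_link(current, failed, links=FIG3_LINKS):
    """Return a healthy link on the same module as `current`, or None.

    current: id of the link in use.
    failed:  set of link ids missing from the regenerated topology.
    """
    module = links[current][0]
    for link_id, (mod, _switch) in sorted(links.items()):
        if mod == module and link_id != current and link_id not in failed:
            return link_id
    return None
```

With `failed = {9000}`, `second_link(9000, failed)` returns 9003, so traffic stays on the main IB Switch module. If both 9000 and 9003 are in `failed`, the function returns None, which is the point at which a switch to the standby module would be needed.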

Therefore, in this IB subnet, the blade host's network service is not interrupted when the link currently in use goes down, which ensures safe and stable data transmission.

Further, under this IB network architecture, when the blade host determines that the main IB Switch module connecting links 9000 and 9003 has failed, it switches to a link connected to the standby IB Switch module, for example moving data from link 9000 to link 9002, so that data transmission is not interrupted.

Because two links are connected to the standby IB Switch module, the blade host can first determine the weight of each link connected to the standby IB Switch module and then switch to the one with the largest weight to transmit data.
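Choosing the standby-side link by weight reduces to a maximum over the candidate links. In the sketch below the function `pick_standby_link` is a hypothetical name and the weight values are made up; the text does not say how weights are computed.

```python
def pick_standby_link(weights):
    """Return the id of the link with the largest weight, or None if empty.

    weights: {link_id: weight} for the healthy links on the standby module.
    On a tie, the link that appears first in the dictionary is returned.
    """
    if not weights:
        return None
    return max(weights, key=weights.get)


# Illustrative weights only.
chosen = pick_standby_link({9001: 10, 9002: 40})
```

Here `chosen` is 9002, which matches the example in the text of moving data from link 9000 to link 9002.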

Correspondingly, an embodiment of the present invention also provides a link switching system; Figure 4 is a schematic structural diagram of the system. The system is applied in an InfiniBand (IB) subnet and comprises:

at least two IB Switches 41 and at least one blade host 42, wherein the blade host 42 contains a main IB Switch module and a standby IB Switch module, and a link exists between each IB Switch module and every IB Switch 41, wherein:

the blade host 42 is configured to determine a second link when it determines, according to the predetermined network topology of the IB subnet, that the first link currently in use has failed, and to switch from the first link to the second link so that the blade host can transmit data over the second link, wherein both the first link and the second link are connected to the main IB Switch module in the blade host.

The blade host 42 is further configured to:

determine a link connected to the standby IB Switch module in the blade host when it determines that the main IB Switch module connected to the first link has failed, and switch from the first link to the link connected to the standby IB Switch module, so that the blade host can transmit data over that link.

When at least two links are connected to the standby IB Switch module, the blade host 42 is specifically configured to:

determine the weight of each link connected to the standby IB Switch module, and switch from the first link to the link with the largest weight among them, so that the blade host can transmit data over that link.

Further, the system may also comprise:

a subnet manager (SM), configured to probe the nodes in the IB subnet by sending subnet probe packets into the IB subnet, and to generate the network topology of the IB subnet from the probe results.

The IB Switches 41 in the IB subnet are cascaded, and for each IB Switch module in the blade host 42:

the links between that IB Switch module and every IB Switch 41 in the subnet where the blade host 42 is located are bonded into one virtual link.

本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and combinations of procedures and/or blocks in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a Means for realizing the functions specified in one or more steps of the flowchart and/or one or more blocks of the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instructions The device realizes the function specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, thereby The instructions provide steps for implementing the functions specified in the flow chart flow or flows and/or block diagram block or blocks.

尽管已描述了本发明的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本发明范围的所有变更和修改。While preferred embodiments of the invention have been described, additional changes and modifications to these embodiments can be made by those skilled in the art once the basic inventive concept is appreciated. Therefore, it is intended that the appended claims be construed to cover the preferred embodiment as well as all changes and modifications which fall within the scope of the invention.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (6)

1. A link switching method, characterized in that a blade host contains a primary interconnect fabric switch (IB Switch) module and a standby IB Switch module, and a link exists between each IB Switch module and every IB Switch in the IB subnet in which the blade host is located, the method comprising:

when the blade host determines, according to a predetermined network topology of the IB subnet in which it is located, that a first link currently in use has failed, determining a second link and switching from the first link to the second link, so that the blade host can transmit data over the second link; wherein the first link and the second link are both connected to the primary IB Switch module in the blade host;

when the blade host determines that the primary IB Switch module connected to the first link has failed, determining a link connected to the standby IB Switch module in the blade host and switching from the first link to the link connected to the standby IB Switch module, so that the blade host can transmit data over the link connected to the standby IB Switch module;

wherein the network topology is determined as follows: a subnet manager (SM) in the IB subnet probes the nodes in the IB subnet by sending subnet probe packets to the IB subnet, and generates the network topology of the IB subnet according to the probe results.

2. The method according to claim 1, characterized in that at least two links are connected to the standby IB Switch module; and

switching from the first link to the link connected to the standby IB Switch module, so that the blade host can transmit data over the link connected to the standby IB Switch module, specifically comprises:

determining the weight of each link connected to the standby IB Switch module;

switching from the first link to the link with the greatest weight among the links connected to the standby IB Switch module, so that the blade host can transmit data over that link.

3. The method according to claim 1 or 2, characterized in that the IB Switches in the IB subnet are cascaded, and for each IB Switch module in the blade host:

the links between the current IB Switch module and each IB Switch in the subnet in which the blade host is located are bound into one virtual link.

4. A link switching system, characterized in that it is applied in an interconnect fabric (IB) subnet, the system comprising at least two interconnect fabric switches (IB Switches) and at least one blade host, the blade host containing a primary IB Switch module and a standby IB Switch module, with a link between each IB Switch module and every IB Switch, wherein:

the blade host is configured to, when it determines according to a predetermined network topology of the IB subnet that a first link currently in use has failed, determine a second link and switch from the first link to the second link, so that the blade host can transmit data over the second link; wherein the first link and the second link are both connected to the primary IB Switch module in the blade host;

the blade host is further configured to, when it determines that the primary IB Switch module connected to the first link has failed, determine a link connected to the standby IB Switch module in the blade host and switch from the first link to the link connected to the standby IB Switch module, so that the blade host can transmit data over the link connected to the standby IB Switch module;

the system further comprises a subnet manager (SM), configured to probe the nodes in the IB subnet by sending subnet probe packets to the IB subnet, and to generate the network topology of the IB subnet according to the probe results.

5. The system according to claim 4, characterized in that at least two links are connected to the standby IB Switch module; and

the blade host is specifically configured to:

determine the weight of each link connected to the standby IB Switch module; and switch from the first link to the link with the greatest weight among the links connected to the standby IB Switch module, so that the blade host can transmit data over that link.

6. The system according to claim 4 or 5, characterized in that the IB Switches in the IB subnet are cascaded, and for each IB Switch module in the blade host:

the links between the current IB Switch module and each IB Switch in the subnet in which the blade host is located are bound into one virtual link.
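
The link selection recited in claims 1, 2, 4 and 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Link` record, the `select_link` function, the `"primary"`/`"standby"` labels, and the integer `weight` field are all hypothetical names introduced for illustration, and the claims do not state how link weights are computed or what happens when no usable link remains (the sketch simply returns `None` in that case).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Link:
    """One link from an IB Switch module in the blade host to an IB Switch (hypothetical model)."""
    name: str             # hypothetical identifier for the link
    module: str           # "primary" or "standby" IB Switch module
    weight: int           # hypothetical weight; its derivation is not given in the claims
    healthy: bool = True  # assumed to be derived from the SM-generated topology


def select_link(links: Sequence[Link], current: Link,
                primary_module_ok: bool) -> Optional[Link]:
    """Return the link the blade host should use next, or None if none is usable."""
    if primary_module_ok:
        if current.healthy:
            return current  # nothing has failed, keep the first link
        # First link failed: pick a second link on the same primary module.
        for link in links:
            if link.module == "primary" and link.healthy and link.name != current.name:
                return link
        return None  # case not covered by the claims
    # Primary module failed: pick the standby-module link with the greatest weight.
    candidates = [l for l in links if l.module == "standby" and l.healthy]
    return max(candidates, key=lambda l: l.weight, default=None)


links = [
    Link("p1", "primary", 1, healthy=False),  # the failed first link
    Link("p2", "primary", 1),
    Link("s1", "standby", 5),
    Link("s2", "standby", 9),
]
same_module = select_link(links, links[0], primary_module_ok=True)
other_module = select_link(links, links[0], primary_module_ok=False)
print(same_module.name, other_module.name)
```

In this sketch a failed link with a working primary module yields another primary-module link, while a failed primary module yields the heaviest healthy standby-module link.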

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410026793.4A CN104796276B (en) 2014-01-21 2014-01-21 A kind of link switch-over method and system

Publications (2)

Publication Number Publication Date
CN104796276A (en) 2015-07-22
CN104796276B (en) 2018-11-23

Family

ID=53560805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410026793.4A Active CN104796276B (en) 2014-01-21 2014-01-21 A kind of link switch-over method and system

Country Status (1)

Country Link
CN (1) CN104796276B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114793211B (en) * 2022-06-21 2022-11-01 新华三信息技术有限公司 Service multilink backup method and device based on blade service
CN116248585B (en) * 2023-04-21 2025-06-17 济南浪潮数据技术有限公司 A communication method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101088257A (en) * 2004-09-30 2007-12-12 艾利森电话股份有限公司 Carrier class resilience solution for switched Ethernet LAN
CN101369914A (en) * 2007-08-15 2009-02-18 华为技术有限公司 Method and system for service link switching
CN101436963A (en) * 2008-12-04 2009-05-20 中兴通讯股份有限公司 Method for switching veneer network card, distributed system and veneer
CN101594235A (en) * 2009-06-02 2009-12-02 浪潮电子信息产业股份有限公司 A method of managing blade server based on SMBUS bus
CN101877631A (en) * 2010-06-28 2010-11-03 中兴通讯股份有限公司 Server and business switching method thereof
CN102023878A (en) * 2010-11-04 2011-04-20 天津曙光计算机产业有限公司 Method for realizing Infiniband network on Loongson blade server
CN102542524A (en) * 2011-12-31 2012-07-04 曙光信息产业股份有限公司 Cluster workstation and method for realizing same

Also Published As

Publication number Publication date
CN104796276A (en) 2015-07-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant