CN102843435A - Access and response method and access and response system of storing medium in cluster system - Google Patents
- Publication number
- CN102843435A (application number CN201210333560X)
- Authority
- CN (China)
- Prior art keywords
- client
- read
- storage system
- packet
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides storage medium access and response methods and systems for a cluster system. In the access system, the cluster system includes a plurality of clients, each of which accesses a storage system in the cluster system over an InfiniBand network. Each client includes: an encapsulation device, configured to, upon receiving a Small Computer System Interface (SCSI) read/write command from the client for a directory in a storage space on the storage system, encapsulate the read/write command into a first packet of the SCSI RDMA Protocol (SRP); a sending device, configured to send the first packet to the storage system; a decapsulation device, configured to, upon receiving from the storage system an SRP-encapsulated second packet in response to the first packet, decapsulate the information in the second packet into SCSI protocol information; and an output device, configured to output the SCSI protocol information.
Description
Technical Field
The invention relates to the field of network communication, and in particular to storage medium access and response methods and systems for a cluster system.
Background Art
The IBTA (InfiniBand Trade Association) was founded in 1999, led by seven companies (Compaq, Hewlett-Packard, IBM, Dell, Intel, Microsoft and Sun), to jointly research and develop a high-speed, advanced I/O standard. Originally named System I/O, it was officially renamed InfiniBand (IB) in October 1999. InfiniBand is a cabled interconnect with high-speed, low-latency transmission characteristics, and it achieves higher transmission efficiency than other network protocols such as TCP/IP. The reason is that many network protocols can retransmit lost packets, but the constant acknowledgement and retransmission slow communication down and significantly degrade performance. TCP is a widely used transport protocol, found on everything from refrigerators to supercomputers, but it comes at a high price: TCP is extremely complex, has a very large code base full of special cases, and is difficult to offload (that is, to run without consuming host CPU time). In contrast, InfiniBand uses a credit-based flow-control mechanism to ensure the integrity of the connection, so packets are rarely lost. With InfiniBand, data is not transmitted until the receive buffer is confirmed to have sufficient space; after the data has been transferred, the receiver returns credits to indicate that buffer space is available again. In this way, InfiniBand eliminates the retransmission delay caused by packet loss, thereby improving efficiency and overall performance.
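The credit mechanism described above can be illustrated with a small simulation. This is a toy model under stated assumptions, not the InfiniBand link-layer algorithm: one credit stands for one whole receive buffer, there is a single sender and a single lane, and the class and method names are hypothetical.

```python
from collections import deque


class CreditLink:
    """Toy credit-based link (hypothetical): one credit = one free receive buffer."""

    def __init__(self, rx_buffers: int) -> None:
        self.credits = rx_buffers       # credits advertised by the receiver up front
        self.rx_queue: deque = deque()  # packets currently occupying receiver buffers

    def try_send(self, packet: bytes) -> bool:
        """Sender side: transmit only when the receiver has a free buffer."""
        if self.credits == 0:
            return False                # packet stays with the sender, it is never dropped
        self.credits -= 1
        self.rx_queue.append(packet)
        return True

    def receiver_consume(self) -> bytes:
        """Receiver side: drain one buffer and return a credit to the sender."""
        packet = self.rx_queue.popleft()
        self.credits += 1
        return packet


link = CreditLink(rx_buffers=2)
pending = deque([b"p0", b"p1", b"p2", b"p3"])
delivered = []
while pending:
    if link.try_send(pending[0]):
        pending.popleft()
    else:
        # Out of credits: the sender waits until the receiver frees a buffer.
        delivered.append(link.receiver_consume())
while link.rx_queue:
    delivered.append(link.receiver_consume())
print(delivered)
```

Because the sender never transmits without a credit, the receiver never has to discard a packet, which is why no retransmission path is needed in this model.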
In terms of bandwidth, InfiniBand also has an advantage over Fibre Channel (FC): InfiniBand bandwidth is 40 Gb/s, whereas FC offers only 16 Gb/s. An IB network is an open interconnect standard based on channels and switching, characterized by high bandwidth and low latency; its maximum theoretical bandwidth reaches 120 Gb/s. The IB products in common use today are QDR devices, with a single-port unidirectional bandwidth of 40 Gb/s and a minimum latency below 1 μs. They support multiple transport services and Remote Direct Memory Access (RDMA, including RDMA read and RDMA write), which lets data bypass the operating-system kernel and achieves "zero-copy" transfer.
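The bandwidth figures above follow from simple per-lane arithmetic. The sketch assumes the QDR signaling rate of 10 Gb/s per lane and its 8b/10b line coding: the 40 Gb/s and 120 Gb/s figures are the signaling rates of 4x and 12x links, and the usable data rate is 80% of the signaling rate.

```python
# Assumption: InfiniBand QDR signals at 10 Gb/s per lane.
QDR_LANE_GBPS = 10


def link_rates(lanes: int) -> tuple:
    """Return (signaling rate, usable data rate) in Gb/s for a QDR link of the given width."""
    signaling = lanes * QDR_LANE_GBPS
    data = signaling * 8 / 10  # 8b/10b coding: 8 payload bits per 10 line bits
    return signaling, data


for width in (1, 4, 12):
    raw, data = link_rates(width)
    print(f"{width}x QDR: {raw} Gb/s signaling, {data:.0f} Gb/s data")
```

So a 4x QDR port carries about 32 Gb/s of payload in each direction, and a 12x link about 96 Gb/s.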
In high-performance computing, as more and more compute nodes are interconnected with InfiniBand, it becomes increasingly important that SAN (Storage Area Network) storage devices can connect seamlessly to an InfiniBand network.
Summary of the Invention
The present invention provides storage medium access and response methods and systems for a cluster system; the technical problem to be solved
To solve the above technical problem, the present invention provides the following technical solutions:
An access system for a storage medium in a cluster system, wherein the cluster system includes a plurality of clients, each of which accesses a storage system in the cluster system over an InfiniBand network, and each client includes:
an encapsulation device, configured to, upon receiving a Small Computer System Interface (SCSI) read/write command from the client for a directory in a storage space on the storage system, encapsulate the read/write command into a first packet of the SCSI RDMA Protocol (SRP), wherein the first packet carries identification information of the client and the storage location on the storage system that corresponds to the directory to be read or written;
a sending device, connected to the encapsulation device, configured to send the first packet to the storage system;
a decapsulation device, configured to, upon receiving from the storage system an SRP-encapsulated second packet in response to the first packet, decapsulate the information in the second packet into SCSI protocol information;
an output device, connected to the decapsulation device, configured to output the SCSI protocol information.
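The client-side devices above can be sketched as follows. The patent does not specify a wire format, so the packet layouts (`FIRST_PKT`, `SECOND_HDR`), the function names and the example GUID are hypothetical; only the READ(10)/WRITE(10) CDB layout is standard SCSI. Note also that in standard SRP the initiator's identity is established once at login rather than repeated in every command; it is carried per packet here only to mirror the description.

```python
import struct

READ_10, WRITE_10 = 0x28, 0x2A  # SCSI operation codes for READ(10) and WRITE(10)

# Hypothetical "first packet" layout: client HCA GUID, LUN, CDB padded to 16 bytes.
FIRST_PKT = struct.Struct(">QQ16s")
# Hypothetical "second packet" header: SCSI status byte, then payload length.
SECOND_HDR = struct.Struct(">BI")


def build_cdb10(opcode: int, lba: int, blocks: int) -> bytes:
    """10-byte CDB: opcode, flags, 4-byte LBA, group, 2-byte transfer length, control."""
    return struct.pack(">BBIBHB", opcode, 0, lba, 0, blocks, 0)


def encapsulate(client_guid: int, lun: int, cdb: bytes) -> bytes:
    """Encapsulation device: wrap the SCSI command into the first packet."""
    return FIRST_PKT.pack(client_guid, lun, cdb)  # "16s" zero-pads the CDB


def decapsulate(second_packet: bytes) -> tuple:
    """Decapsulation device: recover the SCSI status and data from the second packet."""
    status, length = SECOND_HDR.unpack_from(second_packet)
    payload = second_packet[SECOND_HDR.size:SECOND_HDR.size + length]
    return status, payload


# Usage: a READ(10) of 8 blocks at LBA 2048 from a made-up client GUID.
cdb = build_cdb10(READ_10, lba=2048, blocks=8)
first_packet = encapsulate(0x0002C90300A1B2C3, lun=0, cdb=cdb)
reply = SECOND_HDR.pack(0x00, 4) + b"DATA"  # what a target might send back
status, data = decapsulate(reply)
```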
Preferably, the system further includes:
an application device, configured to request storage space for the client from the storage system when the system is initialized;
an obtaining device, connected to the application device, configured to obtain the storage space allocated to the client by the storage system;
a configuration device, connected to the obtaining device and the encapsulation device, configured to configure, according to the allocated storage space, the mapping between the client's directories and the allocated storage space.
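One possible shape for the configuration device is a table from client directories to the granted storage space. This is a hypothetical sketch: the `Allocation` fields, the 512-byte block size and the longest-prefix lookup are assumptions, and byte offsets are taken relative to the start of the directory's region rather than resolved through a real file system.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumed logical block size in bytes


@dataclass(frozen=True)
class Allocation:
    """Storage space granted by the storage system (hypothetical fields)."""
    lun: int
    first_lba: int
    num_blocks: int


class DirectoryMap:
    """Configuration device (hypothetical): maps client directories onto allocated space."""

    def __init__(self) -> None:
        self._map: dict = {}

    def configure(self, directory: str, alloc: Allocation) -> None:
        self._map[directory.rstrip("/")] = alloc

    def locate(self, path: str, byte_offset: int) -> tuple:
        """Return (LUN, LBA) for a byte offset into the region backing path's directory."""
        # The longest matching directory prefix wins.
        for directory in sorted(self._map, key=len, reverse=True):
            if path == directory or path.startswith(directory + "/"):
                alloc = self._map[directory]
                block = byte_offset // BLOCK_SIZE
                if block >= alloc.num_blocks:
                    raise ValueError("offset lies outside the allocated storage space")
                return alloc.lun, alloc.first_lba + block
        raise KeyError(f"{path} is not mapped to remote storage")


dmap = DirectoryMap()
dmap.configure("/data", Allocation(lun=3, first_lba=1000, num_blocks=2048))
dmap.configure("/data/scratch", Allocation(lun=4, first_lba=0, num_blocks=64))
print(dmap.locate("/data/results", 4096))   # (3, 1008)
print(dmap.locate("/data/scratch/tmp", 0))  # (4, 0)
```

The resulting (LUN, LBA) pair is the "storage location information" that the encapsulation device would place in the first packet.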
Preferably, the identification information of the client is the identification information of the InfiniBand network adapter on the client.
A response system for a storage medium in a cluster system, wherein the cluster system includes a plurality of clients, each of which accesses a storage system in the cluster system over an InfiniBand network, and the storage system includes:
a decapsulation device, configured to, upon receiving a first SRP packet over the InfiniBand network, decapsulate the information in the first packet to obtain a SCSI read/write command, wherein the read/write command includes a user's read/write command for a directory in a storage space;
an output device, connected to the decapsulation device, configured to notify the storage system to process the read/write command;
an encapsulation device, configured to, upon receiving the SCSI information returned by the storage system in response to the first packet, encapsulate the SCSI information into a second SRP packet;
a sending device, connected to the encapsulation device, configured to send the second packet.
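On the storage side, the four devices form one request/response pipeline. This sketch uses hypothetical packet layouts and names, keeps each LUN in memory, and passes write data inline as an assumption; in real SRP the target instead pulls write data with an RDMA read and pushes read data with an RDMA write, as the read and write flows later in this description show.

```python
import struct

READ_10, WRITE_10 = 0x28, 0x2A        # SCSI READ(10) / WRITE(10) opcodes
GOOD, CHECK_CONDITION = 0x00, 0x02    # SCSI status codes
BLOCK_SIZE = 512                      # assumed logical block size
FIRST_PKT = struct.Struct(">QQ16s")   # hypothetical: client GUID, LUN, CDB
SECOND_HDR = struct.Struct(">BI")     # hypothetical: SCSI status, payload length
CDB10 = struct.Struct(">BBIBHB")      # READ(10)/WRITE(10) CDB fields


def handle_first_packet(packet: bytes, disks: dict, write_data: bytes = b"") -> bytes:
    # Decapsulation device: recover the SCSI command from the first packet.
    _guid, lun, cdb = FIRST_PKT.unpack(packet)
    opcode, _flags, lba, _group, blocks, _control = CDB10.unpack_from(cdb)
    disk = disks.get(lun)
    start, end = lba * BLOCK_SIZE, (lba + blocks) * BLOCK_SIZE
    # Output device: hand the command to the storage back end.
    if disk is None or end > len(disk) or opcode not in (READ_10, WRITE_10):
        status, payload = CHECK_CONDITION, b""
    elif opcode == READ_10:
        status, payload = GOOD, bytes(disk[start:end])
    else:
        # write_data stands in for data that SRP would fetch with an RDMA read.
        disk[start:end] = write_data[:end - start].ljust(end - start, b"\x00")
        status, payload = GOOD, b""
    # Encapsulation device: wrap the SCSI result into the second packet.
    return SECOND_HDR.pack(status, len(payload)) + payload


# Usage: write one block at LBA 2 of LUN 0, then read it back.
disks = {0: bytearray(16 * BLOCK_SIZE)}
write_pkt = FIRST_PKT.pack(0x1234, 0, CDB10.pack(WRITE_10, 0, 2, 0, 1, 0))
read_pkt = FIRST_PKT.pack(0x1234, 0, CDB10.pack(READ_10, 0, 2, 0, 1, 0))
write_reply = handle_first_packet(write_pkt, disks, write_data=b"hello")
read_reply = handle_first_packet(read_pkt, disks)
```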
Preferably, the system further includes:
an allocation device, configured to, upon receiving a request from a client for storage space, allocate to the client the storage space that the client can use;
a notification device, connected to the allocation device, configured to notify the client of the storage space that the client can use.
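The allocation and notification devices could be as simple as a bump allocator that hands each client a contiguous block range. The class names, the single-LUN pool and the never-free policy are assumptions made for illustration only.

```python
class PoolExhausted(Exception):
    """Raised when no more blocks are left to hand out."""


class SpaceAllocator:
    """Allocation device (hypothetical): grants each client a contiguous LBA range on one LUN."""

    def __init__(self, lun: int, total_blocks: int) -> None:
        self.lun = lun
        self.total_blocks = total_blocks
        self.next_free = 0   # simple bump allocator; space is never freed
        self.grants: dict = {}  # client GUID -> (LUN, first LBA, blocks)

    def allocate(self, client_guid: int, blocks: int) -> tuple:
        """Return the client's grant; the notification device would send this tuple back."""
        if client_guid in self.grants:  # a repeated request gets the same space
            return self.grants[client_guid]
        if self.next_free + blocks > self.total_blocks:
            raise PoolExhausted(f"cannot allocate {blocks} blocks")
        grant = (self.lun, self.next_free, blocks)
        self.next_free += blocks
        self.grants[client_guid] = grant
        return grant


allocator = SpaceAllocator(lun=0, total_blocks=1000)
first = allocator.allocate(0xA, 400)   # (0, 0, 400)
second = allocator.allocate(0xB, 400)  # (0, 400, 400)
```

Keying grants by client GUID makes a re-request after a client restart idempotent, which fits the "request at initialization" step on the client side.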
A method for accessing a storage medium in a cluster system, wherein the cluster system includes a plurality of clients, each of which accesses a storage system in the cluster system over an InfiniBand network, and the client performs the following steps:
upon receiving a SCSI read/write command from the client for a directory in a storage space on the storage system, encapsulating the read/write command into a first SRP packet, wherein the first packet carries identification information of the client and the storage location on the storage system that corresponds to the directory to be read or written;
sending the first packet to the storage system;
upon receiving from the storage system an SRP-encapsulated second packet in response to the first packet, decapsulating the information in the second packet into SCSI protocol information;
outputting the SCSI protocol information.
Preferably, the method further includes:
when the system is initialized, requesting storage space for the client from the storage system;
obtaining the storage space allocated to the client by the storage system;
configuring, according to the allocated storage space, the mapping between the client's directories and the allocated storage space.
Preferably, the identification information of the client is the identification information of the InfiniBand network adapter on the client.
A method for responding to accesses to a storage medium in a cluster system, wherein the cluster system includes a plurality of clients, each of which accesses a storage system in the cluster system over an InfiniBand network, and the storage system performs the following steps:
upon receiving a first SRP packet over the InfiniBand network, decapsulating the information in the first packet to obtain a SCSI read/write command, wherein the read/write command includes a user's read/write command for a directory in a storage space;
notifying the storage system to process the read/write command;
upon receiving the SCSI information returned by the storage system in response to the first packet, encapsulating the SCSI information into a second SRP packet;
sending the second packet.
Preferably, the method further includes:
upon receiving a request from a client for storage space, allocating to the client the storage space that the client can use;
notifying the client of the storage space that the client can use.
The technical solution provided by the present invention transmits data over InfiniBand and thereby achieves "zero-copy" transfer.
Brief Description of the Drawings
FIG. 1 is a schematic flow diagram of an embodiment of the storage medium access system in a cluster system provided by the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of the storage medium response system in a cluster system provided by the present invention;
FIG. 3 is a schematic flowchart of an embodiment of the storage medium access method in a cluster system provided by the present invention;
FIG. 4 is a schematic flowchart of an embodiment of the storage medium response method in a cluster system provided by the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another in any way.
FIG. 1 is a schematic flow diagram of an embodiment of the storage medium access system in a cluster system provided by the present invention. The cluster system includes a plurality of clients, each of which accesses a storage system in the cluster system over an InfiniBand network, and each client includes:
an encapsulation device 101, configured to, upon receiving a SCSI read/write command from the client for a directory in a storage space on the storage system, encapsulate the read/write command into a first SRP packet, wherein the first packet carries identification information of the client and the storage location on the storage system that corresponds to the directory to be read or written;
a sending device 102, connected to the encapsulation device 101, configured to send the first packet to the storage system;
a decapsulation device 103, configured to, upon receiving from the storage system an SRP-encapsulated second packet in response to the first packet, decapsulate the information in the second packet into SCSI protocol information;
an output device 104, connected to the decapsulation device 103, configured to output the SCSI protocol information.
Of course, the storage space that each client device can use on the storage system may be preconfigured, or it may be obtained by the client through a request. For example, the system further includes:
an application device, configured to request storage space for the client from the storage system when the system is initialized;
an obtaining device, connected to the application device, configured to obtain the storage space allocated to the client by the storage system;
a configuration device, connected to the obtaining device and the encapsulation device, configured to configure, according to the allocated storage space, the mapping between the client's directories and the allocated storage space.
The identification information of the client is the identification information of the InfiniBand network adapter on the client.
As can be seen from the above, because the storage space accessed by the client is not local but is reached over the IB network, the client sends a request to the storage system during initialization and obtains the storage space it may use. When a user issues a read/write command on the client, the client's internal read/write command is encapsulated into a first packet that can be transmitted over the IB network, so that the command reaches the storage system over the IB network. After the second packet returned by the storage system is received, it is decapsulated to obtain data the client can parse, which completes the access to the storage system.
FIG. 2 is a schematic structural diagram of an embodiment of the storage medium response system in a cluster system provided by the present invention. In the system embodiment shown in FIG. 2, the cluster system includes a plurality of clients, each of which accesses a storage system in the cluster system over an InfiniBand network, and the storage system includes:
a decapsulation device 201, configured to, upon receiving a first SRP packet over the InfiniBand network, decapsulate the information in the first packet to obtain a SCSI read/write command, wherein the read/write command includes a user's read/write command for a directory in a storage space;
an output device 202, connected to the decapsulation device 201, configured to notify the storage system to process the read/write command;
an encapsulation device 203, configured to, upon receiving the SCSI information returned by the storage system in response to the first packet, encapsulate the SCSI information into a second SRP packet;
a sending device 204, connected to the encapsulation device 203, configured to send the second packet.
Optionally, the system further includes:
an allocation device, configured to, upon receiving a request from a client for storage space, allocate to the client the storage space that the client can use;
a notification device, connected to the allocation device, configured to notify the client of the storage space that the client can use.
Similarly, because the storage space accessed by the client is not co-located with the client in use but is reached over the IB network, the storage system allocates to the client, during client initialization and in response to the client's request, the storage space the client may use. When a first packet carrying a read/write command is received from the IB network, the first packet is decapsulated into information the storage system can parse, so that the storage system can process the read/write command. Once processing is complete, the result is encapsulated into a second packet that can be transmitted over the IB network and sent back to the client, which completes the response to the client's access.
The response system provided by the present invention is further described below:
Disk arrays are of great importance for storing and processing massive amounts of information and for data availability and performance; the target provides device mapping and a device access interface for hosts. Herein, an SRP target is designed and implemented. SRP (SCSI RDMA Protocol), also known as the SCSI Remote Protocol, carries SCSI commands and data over an InfiniBand network using RDMA, and it operates on top of RDMA communication services. These services provide communication between pairs of consumers, using messages to convey control information and RDMA read/write operations to transfer data.
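To make the message/RDMA split concrete, the sketch below packs an SRP_CMD information unit for a read: the SCSI CDB travels in the message, while a direct buffer descriptor (virtual address, rkey, length) tells the target where to RDMA-write the data. The field offsets follow the Linux `struct srp_cmd` and `struct srp_direct_buf` definitions in `include/scsi/srp.h` and are worth verifying against the T10 SRP standard before relying on them; the helper function name and the tag, address and key values are hypothetical.

```python
import struct

SRP_CMD = 0x02            # information-unit type code for SRP_CMD
SRP_DATA_DESC_DIRECT = 1  # buffer-descriptor format: one direct descriptor

# 48-byte SRP_CMD header: opcode, sol_not, 3 reserved, buf_fmt, data-out count,
# data-in count, tag, 4 reserved, 8-byte LUN, reserved, task-attribute byte,
# reserved, add_cdb_len, 16-byte CDB.
SRP_CMD_HDR = struct.Struct(">BB3xBBBQ4x8sxBxB16s")
# Direct data buffer descriptor: initiator virtual address, rkey, length.
SRP_DIRECT_BUF = struct.Struct(">QII")


def pack_srp_read_cmd(tag: int, lun: int, cdb: bytes,
                      va: int, rkey: int, length: int) -> bytes:
    """Hypothetical helper: SRP_CMD asking the target to RDMA-write into the initiator buffer."""
    lun_field = bytes([0, lun]) + bytes(6)  # single-level LUN below 256
    header = SRP_CMD_HDR.pack(
        SRP_CMD,
        0,                     # sol_not: left at 0 in this sketch
        SRP_DATA_DESC_DIRECT,  # buf_fmt: low nibble = data-in format (a read)
        0, 0,                  # descriptor counts: left at 0 for a direct descriptor
        tag,                   # opaque tag; the target echoes it in its SRP_RSP
        lun_field,
        0,                     # task attribute 0 = SIMPLE
        0,                     # add_cdb_len: no additional CDB bytes
        cdb,                   # "16s" zero-pads a 10-byte CDB
    )
    return header + SRP_DIRECT_BUF.pack(va, rkey, length)


read10 = struct.pack(">BBIBHB", 0x28, 0, 2048, 0, 8, 0)  # READ(10): 8 blocks at LBA 2048
iu = pack_srp_read_cmd(tag=7, lun=1, cdb=read10, va=0x7F0000001000, rkey=0x1234, length=8 * 512)
```

Only the 64-byte IU crosses the wire as a message; the 4 KiB of data would move separately by RDMA write into the region described by `va`, `rkey` and `length`.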
Authorized access to the storage system's logical resources is controlled by the GUID (Globally Unique Identifier) of the host's HCA card, here a Mellanox ConnectX IB InfiniBand host channel adapter (HCA) card. The target's handling of read/write requests is described below:
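GUID-based authorization amounts to an access-control list keyed by the initiator's HCA GUID. The ACL class, the accepted string formats and the example GUID are hypothetical; a real target would take the GUID from the connection or login request rather than from a string.

```python
class GuidAcl:
    """Hypothetical access-control list keyed by initiator HCA GUID."""

    def __init__(self) -> None:
        self._allowed: dict = {}  # GUID -> set of permitted LUNs

    def grant(self, guid: int, lun: int) -> None:
        self._allowed.setdefault(guid, set()).add(lun)

    def permits(self, guid: int, lun: int) -> bool:
        return lun in self._allowed.get(guid, set())


def parse_guid(text: str) -> int:
    """Accept '0x0002c90300a1b2c3' or '0002:c903:00a1:b2c3' style GUID strings."""
    value = int(text.replace(":", ""), 16)  # int(..., 16) also accepts a 0x prefix
    if not 0 <= value < 1 << 64:
        raise ValueError(f"not a 64-bit GUID: {text}")
    return value


acl = GuidAcl()
acl.grant(parse_guid("0002:c903:00a1:b2c3"), lun=0)
```

A command whose initiator GUID is not in the list, or that names a LUN not granted to it, would be rejected before it reaches the back end.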
The response system above can be implemented as an SRP target (srpt), which combines the InfiniBand protocol stack with the SRP protocol. The SRP storage protocol exploits the high bandwidth and low latency of the InfiniBand network to carry I/O traffic from the client to the disk array, making SAN storage devices compatible with InfiniBand networks.
The following describes the implementation from a software perspective:
The system architecture comprises an InfiniBand protocol stack, an SRP protocol module and a SCSI command parsing module, where:
the InfiniBand protocol stack provides the hardware drivers and services for the InfiniBand HCA card, on which the SRP protocol relies to establish InfiniBand connections and to read and write data;
the SRP protocol module provides connection management for InfiniBand clients and implements the SRP read/write functions;
the SCSI command parsing module parses SCSI commands.
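The SCSI command parsing module's job can be illustrated by decoding a few common CDBs. The opcode values and field positions are standard SCSI (INQUIRY's two-byte allocation length is the SPC-3 and later form); the function name and dictionary keys are hypothetical.

```python
import struct

# A handful of SCSI operation codes a block-storage target commonly sees.
OPCODE_NAMES = {
    0x00: "TEST UNIT READY",
    0x12: "INQUIRY",
    0x25: "READ CAPACITY(10)",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
}


def parse_cdb(cdb: bytes) -> dict:
    """Decode the fields the target needs before dispatching a command."""
    opcode = cdb[0]
    info = {"opcode": opcode, "name": OPCODE_NAMES.get(opcode, f"UNSUPPORTED(0x{opcode:02x})")}
    if opcode in (0x28, 0x2A):
        info["lba"] = struct.unpack_from(">I", cdb, 2)[0]     # bytes 2-5
        info["blocks"] = struct.unpack_from(">H", cdb, 7)[0]  # bytes 7-8
        info["direction"] = "to initiator" if opcode == 0x28 else "from initiator"
    elif opcode == 0x12:
        info["allocation_length"] = struct.unpack_from(">H", cdb, 3)[0]  # bytes 3-4
    return info


parsed = parse_cdb(bytes([0x28, 0, 0, 0, 0x08, 0, 0, 0, 0x10, 0]))
print(parsed)
```

The direction field is what decides, in the flows below, whether the target answers with an RDMA write (read) or first pulls data with an RDMA read (write).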
In this architecture, the SRP target has three interfaces: one to the back-end logical volume manager, for reading IO data from or writing it to back-end devices; one to the plug-and-play manager; and one for data transfer with the initiator. Through the logical volume manager interface, received data is written to the back-end disks, or requested data is read from them and transferred to the initiator by RDMA operations.
The SRP target relies on the InfiniBand access layer to establish InfiniBand connections. It first sets up a listening interface to accept connection requests from initiators, then receives information units (IUs) from each initiator, interprets and processes them, and sends the corresponding responses back to the initiator. Its main function is to convert received IUs into SCSI requests and to encapsulate the processing results into SRP IUs sent back to the client. It can serve several initiators at the same time and process I/O for each of them concurrently.
The third interface is to the plug-and-play manager. When srpt starts, it immediately registers with the plug-and-play manager and supplies an entry function for notification callbacks, so that it is notified whenever a new IB event occurs.
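The plug-and-play registration is an ordinary callback pattern. This is a stand-in, not the InfiniBand access layer's API: `PnpManager`, `SrpTarget` and the event strings are hypothetical, loosely echoing port-up and port-error events.

```python
from typing import Callable

# Handler signature (hypothetical): (event name, port number) -> None.
EventHandler = Callable[[str, int], None]


class PnpManager:
    """Stand-in plug-and-play manager that fans IB events out to registered clients."""

    def __init__(self) -> None:
        self._handlers: list = []

    def register(self, handler: EventHandler) -> None:
        self._handlers.append(handler)

    def raise_event(self, event: str, port: int) -> None:
        for handler in self._handlers:
            handler(event, port)


class SrpTarget:
    """Registers its callback entry point as soon as it starts."""

    def __init__(self, pnp: PnpManager) -> None:
        self.active_ports: set = set()
        pnp.register(self.on_ib_event)

    def on_ib_event(self, event: str, port: int) -> None:
        if event == "PORT_ACTIVE":
            self.active_ports.add(port)      # port usable: accept logins on it
        elif event == "PORT_ERR":
            self.active_ports.discard(port)  # port lost: stop serving it


pnp = PnpManager()
target = SrpTarget(pnp)
pnp.raise_event("PORT_ACTIVE", 1)
pnp.raise_event("PORT_ACTIVE", 2)
pnp.raise_event("PORT_ERR", 1)
```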
The process of implementing this architecture is described below with a specific example.
As described above, the architecture of the present invention mainly comprises an InfiniBand protocol stack (1), an SRP protocol module (2) and a SCSI command parsing module (3).
The SRP target driver is event-driven: every action it takes is a response to some event. Common events are RDMA message sends, RDMA message receives, RDMA read operations and RDMA write operations.
First, the implementation of the read operation is explained:
A read operation is the process by which the SRP initiator server reads information from the SRP target server. It includes the SRP initiator obtaining SCSI storage device information from the SRP target server, querying the capacity of a storage device, and reading data from a storage device. The client read process is as follows:
1) The InfiniBand completion-queue callback is invoked and takes completed work requests off the queue. If a work request completed with an error status, the failed SCSI command is aborted and error handling is performed; otherwise, go to step 2.
2) If the operation type is a message receive, determine the type of the received SRP command, set the RDMA triplet information, obtain the SCSI command attributes, interpret the SCSI command, allocate a structure for processing the SCSI command, and set the command's tag.
3) Allocate the resources the SCSI command requires, determine the SCSI operation, allocate a data buffer, and perform the SCSI read to read the data from the device.
4) Obtain an RDMA channel, set up and allocate an RDMA buffer, and perform an RDMA write.
5) Build the RSP response packet and send it with an RDMA SEND operation.
6) When the RDMA SEND completes, release the RDMA buffer and post a new work request.
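The six read steps can be traced with a simulated completion queue. Everything here is hypothetical (the `WorkCompletion` fields, opcode strings and buffer bookkeeping): the RDMA write is modeled as a plain copy into a dictionary standing in for initiator memory, and the SEND is modeled as posted right behind it on the same connection, so no separate completion is awaited between steps 4 and 5.

```python
from collections import deque
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumed logical block size


@dataclass
class WorkCompletion:
    """Simplified work completion; fields and opcode strings are hypothetical."""
    opcode: str  # "RECV" or "SEND"
    ok: bool
    tag: int
    lba: int = 0
    blocks: int = 0


class ReadPath:
    def __init__(self, disk: bytes) -> None:
        self.disk = disk                # back-end device contents
        self.cq: deque = deque()        # completion queue
        self.remote_buffers: dict = {}  # stands in for the initiator's registered memory
        self.rdma_buffers: dict = {}    # target-side buffers in use, by tag
        self.responses: list = []       # (tag, status) responses sent
        self.aborted: list = []
        self.posted_recvs = 0

    def on_completion(self, wc: WorkCompletion) -> None:
        if not wc.ok:                                  # step 1: abort the failed command
            self.aborted.append(wc.tag)
            self.rdma_buffers.pop(wc.tag, None)
            return
        if wc.opcode == "RECV":
            # Steps 2-3: interpret the command, allocate a buffer, read from the device.
            start = wc.lba * BLOCK_SIZE
            data = self.disk[start:start + wc.blocks * BLOCK_SIZE]
            self.rdma_buffers[wc.tag] = data
            # Step 4: RDMA write the data into the initiator's buffer.
            self.remote_buffers[wc.tag] = data
            # Step 5: build the response and post the SEND behind the RDMA write.
            self.responses.append((wc.tag, "GOOD"))
            self.cq.append(WorkCompletion("SEND", True, wc.tag))
        elif wc.opcode == "SEND":
            # Step 6: release the RDMA buffer and post a new receive.
            del self.rdma_buffers[wc.tag]
            self.posted_recvs += 1

    def poll(self) -> None:
        """Completion-queue callback: drain every completed work request."""
        while self.cq:
            self.on_completion(self.cq.popleft())


disk = bytes(range(256)) * 8  # 2048 bytes = 4 blocks
rp = ReadPath(disk)
rp.cq.append(WorkCompletion("RECV", True, tag=1, lba=1, blocks=2))
rp.cq.append(WorkCompletion("RECV", False, tag=2))
rp.poll()
```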
The write operation is implemented as follows.
A write operation is the process by which the SRP initiator writes data to the SRP target. A client's write is processed as follows:
1) The InfiniBand completion queue callback is invoked and removes the completed work request from the queue. If the work request completed with an error status, abort the failed SCSI command and perform error handling; otherwise, go to step 2.
2) If the operation type is a received message, determine the type of the received SRP command, set the RDMA triplet information, obtain the SCSI command attributes, parse the SCSI command, allocate the structure that will process it, and set the command's tag.
3) Allocate the resources the SCSI command needs, determine the SCSI operation, allocate a data buffer, set the RDMA channel to read mode, and start the RDMA read.
4) When the RDMA read completion callback runs, set the SCSI command state to "data pending" and write the data that was read in to the back-end disk.
5) Build the RSP response packet and send it with an RDMA SEND operation.
6) When the RDMA SEND completes, release the RDMA buffer and post a new work request.
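The write path above (steps 1 to 6) can be sketched as a Python simulation. The names `SrpWrite`, `SimWriteTarget` and `on_completion` are hypothetical, the 512-byte block size is an assumption, the `wc_ok` flag stands in for the status of the completed work request, and the RDMA read is modelled as copying out of a `bytearray` that represents the initiator's registered buffer. It shows the ordering of the steps, not a real verbs-based target.

```python
from dataclasses import dataclass

BLOCK = 512                         # assumed logical block size in bytes
GOOD, CHECK_CONDITION = 0x00, 0x02  # SCSI status codes

@dataclass
class SrpWrite:
    """Hypothetical, simplified SRP write command with its RDMA triplet."""
    tag: int
    lba: int
    blocks: int
    rkey: int
    raddr: int
    rlen: int

class SimWriteTarget:
    """Simulated SRP target write path; the RDMA read is modelled as a bytearray copy."""

    def __init__(self, disk, initiator_regions):
        self.disk = disk
        self.regions = initiator_regions  # rkey -> bytearray holding the initiator's data
        self.state = {}                   # tag -> SCSI command state
        self.sent = []                    # (tag, status) responses "sent" via RDMA SEND
        self.posted_recvs = 0

    def on_completion(self, wc_ok, cmd):
        # Step 1: a work completion with an error status aborts the SCSI command.
        if not wc_ok:
            self.state[cmd.tag] = "aborted"
            return
        # Steps 2-3: parse the command, allocate a buffer, start the "RDMA read".
        nbytes = cmd.blocks * BLOCK
        start = cmd.lba * BLOCK
        region = self.regions.get(cmd.rkey)
        if (region is None or nbytes > cmd.rlen
                or cmd.raddr + nbytes > len(region)
                or start + nbytes > len(self.disk)):
            status = CHECK_CONDITION
        else:
            data = bytes(region[cmd.raddr:cmd.raddr + nbytes])
            # Step 4: RDMA read done -> mark data pending, then write to the back-end disk.
            self.state[cmd.tag] = "data_pending"
            self.disk[start:start + nbytes] = data
            self.state[cmd.tag] = "written"
            status = GOOD
        # Step 5: "send" the RSP response; step 6: post a new receive work request.
        self.sent.append((cmd.tag, status))
        self.posted_recvs += 1
```

Checking the buffer and disk bounds before the slice assignment matters in Python: assigning a shorter slice to a `bytearray` would silently shrink the simulated disk.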
This completes the read/write processing of the SRP target. With it, a disk array can transfer data over an InfiniBand network.
Using this technique, the target makes full use of the InfiniBand network's performance, and the system's overall bandwidth approaches the theoretical InfiniBand limit.
FIG. 3 is a schematic flowchart of an embodiment of the method for accessing a storage medium in a cluster system provided by the present invention. In the method embodiment shown in FIG. 3, the cluster system includes multiple clients, each of which accesses a storage system in the cluster system over an InfiniBand network, and the client performs the following steps:
Step 301: upon receiving a Small Computer System Interface (SCSI) read/write command from the client for a directory in a storage space on the storage system, encapsulate the read/write command into a first data packet of the SCSI RDMA Protocol (SRP). The first data packet carries the client's identification information and the storage location, on the storage system, of the directory that the command reads or writes;
Step 302: send the first data packet to the storage system;
Step 303: upon receiving the SRP-encapsulated second data packet that the storage system returns in response to the first data packet, decapsulate the information in the second data packet into SCSI protocol information;
Step 304: output the SCSI protocol information.
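Steps 301 to 304 can be illustrated with Python's `struct` module. The 10-byte READ(10)/WRITE(10) CDB layout is standard SCSI; the packet layouts `CMD_FMT` and `RSP_FMT` are hypothetical simplifications for this sketch, not the real SRP_CMD/SRP_RSP formats, which carry more fields. The 64-bit client ID stands in for the client identification information of step 301, and the LBA stands in for the storage location information.

```python
import struct

READ_10, WRITE_10 = 0x28, 0x2A  # SCSI READ(10) / WRITE(10) opcodes

def build_cdb10(opcode, lba, blocks):
    """10-byte SCSI CDB: opcode, flags, 4-byte LBA, group, 2-byte transfer length, control."""
    return struct.pack(">BBIBHB", opcode, 0, lba, 0, blocks, 0)

# Hypothetical first-packet layout: client ID, command tag, CDB zero-padded to 16 bytes.
CMD_FMT = ">QQ16s"
# Hypothetical second-packet header: command tag, SCSI status; any read data follows.
RSP_FMT = ">QB"

def encapsulate(client_id, tag, opcode, lba, blocks):
    """Step 301: wrap a SCSI read/write command into the first data packet."""
    return struct.pack(CMD_FMT, client_id, tag, build_cdb10(opcode, lba, blocks))

def decapsulate(packet):
    """Step 303: recover (tag, SCSI status, data) from the second data packet."""
    tag, status = struct.unpack_from(RSP_FMT, packet)
    return tag, status, packet[struct.calcsize(RSP_FMT):]
```

For example, `encapsulate(0x1122, 7, READ_10, 100, 8)` yields a 32-byte packet whose CDB begins at offset 16; the `(tag, status, data)` tuple from `decapsulate` is what step 304 would output.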
Optionally, the method further includes:
at system initialization, requesting storage space for the client from the storage system;
obtaining the storage space that the storage system allocates to the client;
configuring, according to the allocated storage space, the mapping between the client's directory and the allocated storage space.
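A minimal sketch of these optional initialization steps, assuming the storage system grants one contiguous block range per request. `ClientVolumeMap`, `request_space` and the `(start_lba, num_blocks)` tuple are hypothetical; the source does not specify how the request is transported or what the mapping table looks like.

```python
class ClientVolumeMap:
    """Hypothetical client-side table: directory -> (start_lba, num_blocks) on the storage system."""

    def __init__(self, client_id, request_space):
        self.client_id = client_id
        # Hypothetical callable standing in for the request sent to the storage system;
        # it returns the (start_lba, num_blocks) range the storage system allocated.
        self.request_space = request_space
        self.mapping = {}

    def init_directory(self, directory, num_blocks):
        """Request space at initialization and record the directory-to-space mapping."""
        self.mapping[directory] = self.request_space(self.client_id, num_blocks)

    def locate(self, directory, block_offset):
        """Translate a directory-relative block into a storage-system LBA."""
        start_lba, granted = self.mapping[directory]
        if not 0 <= block_offset < granted:
            raise ValueError("offset outside the space allocated to this directory")
        return start_lba + block_offset
```

The LBA returned by `locate` is the kind of storage location information that step 301 places in the first data packet.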
Optionally, the client's identification information is the identifier of the InfiniBand network adapter on the client.
As the above shows, the storage space that the client accesses is not local but is reached over the IB network. When the client initializes, it sends a request to the storage system and obtains the storage space it may use. When a user issues a read/write command on the client, the client's internal read/write command is encapsulated into a first data packet that can be carried on the IB network, so that the command reaches the storage system over the IB network. When the second data packet returned by the storage system arrives, the client decapsulates it to obtain data it can parse, completing the access to the storage system.
FIG. 4 is a schematic flowchart of an embodiment of the method for responding to storage medium accesses in a cluster system provided by the present invention. In the method embodiment shown in FIG. 4, the cluster system includes multiple clients, each of which accesses a storage system in the cluster system over an InfiniBand network, and the storage system performs the following steps:
Step 401: upon receiving a first data packet of the SRP protocol over the InfiniBand network, decapsulate the information in the first data packet to obtain a SCSI read/write command, namely a user's read/write command for a directory in a storage space;
Step 402: notify the storage system to process the read/write command;
Step 403: upon receiving the SCSI information that the storage system returns for the first data packet, encapsulate the SCSI information into a second data packet of the SRP protocol;
Step 404: send the second data packet.
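Steps 401 to 404 can be sketched on the storage side as a single function over an in-memory disk. The packet layouts (`CMD_FMT`, `RSP_FMT`) are hypothetical simplifications, not the real SRP formats, and the block size of 512 bytes is assumed. To keep the sketch short, write data is appended inline after the command header; the write path described earlier in this document moves write data with an RDMA read instead.

```python
import struct

BLOCK = 512                         # assumed logical block size in bytes
CMD_FMT = ">QQ16s"                  # hypothetical: client ID, tag, CDB padded to 16 bytes
RSP_FMT = ">QB"                     # hypothetical: tag, SCSI status; read data follows
READ_10, WRITE_10 = 0x28, 0x2A      # SCSI opcodes
GOOD, CHECK_CONDITION = 0x00, 0x02  # SCSI status codes

def respond(packet, disk):
    """Steps 401-404 against an in-memory disk; write data travels inline in this sketch."""
    # Step 401: decapsulate the first packet into a SCSI command.
    hdr = struct.calcsize(CMD_FMT)
    _client_id, tag, cdb = struct.unpack_from(CMD_FMT, packet)
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb[:10])
    start, end = lba * BLOCK, (lba + blocks) * BLOCK
    status, data = CHECK_CONDITION, b""
    # Step 402: have the storage back end process the command.
    if end <= len(disk):
        if opcode == READ_10:
            status, data = GOOD, bytes(disk[start:end])
        elif opcode == WRITE_10:
            payload = packet[hdr:]
            if len(payload) == end - start:   # refuse short writes instead of resizing
                disk[start:end] = payload
                status = GOOD
    # Steps 403-404: encapsulate the result into the second packet to be sent back.
    return struct.pack(RSP_FMT, tag, status) + data
```

Any opcode other than READ(10)/WRITE(10), or any out-of-range request, is answered with `CHECK_CONDITION` here; that policy is an assumption of the sketch.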
Optionally, the method further includes:
after receiving a client's request for storage space, allocating to that client the storage space it can use.
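The allocation step can be sketched as a simple bump allocator that hands each client one contiguous, non-overlapping block range keyed by its identifier (for example, the InfiniBand adapter identifier). `StorageAllocator` and `OutOfSpace` are hypothetical names, and the policy (no freeing, same range returned on repeated requests) is an assumption; the source does not describe how space is allocated.

```python
class OutOfSpace(Exception):
    """Raised when the storage system cannot satisfy an allocation request."""

class StorageAllocator:
    """Hypothetical storage-side bump allocator: one contiguous block range per client ID."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.next_free = 0
        self.allocations = {}  # client ID -> (start_lba, num_blocks)

    def allocate(self, client_id, num_blocks):
        # A client that re-initializes gets its existing range back.
        if client_id in self.allocations:
            return self.allocations[client_id]
        if num_blocks <= 0:
            raise ValueError("num_blocks must be positive")
        if self.next_free + num_blocks > self.total_blocks:
            raise OutOfSpace(f"cannot allocate {num_blocks} blocks for {client_id!r}")
        extent = (self.next_free, num_blocks)
        self.next_free += num_blocks
        self.allocations[client_id] = extent
        return extent
```

A bump allocator keeps the sketch short; a real storage system would also need to reclaim space and persist the allocation table.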
Because the storage space the client accesses is not co-located with the client the user works on but is reached over the IB network, at client initialization the storage system allocates to the client, in response to its request, the storage space the client can use. When a first data packet carrying a read/write command arrives from the IB network, the storage system decapsulates it into information it can parse so that it can process the command. When processing finishes, the processing result is encapsulated into a second data packet that can travel over the IB network and is returned to the client, completing the response to the client's access.
The above is only a specific embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201210333560XA CN102843435A (en) | 2012-09-10 | 2012-09-10 | Access and response method and access and response system of storing medium in cluster system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN102843435A true CN102843435A (en) | 2012-12-26 |
Family ID: 47370488
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201210333560XA Pending CN102843435A (en) | 2012-09-10 | 2012-09-10 | Access and response method and access and response system of storing medium in cluster system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN102843435A (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1691669A (en) * | 2004-04-21 | 2005-11-02 | 国际商业机器公司 | Method, system, and program for executing data transfer requests |
| CN1997033A (en) * | 2006-12-28 | 2007-07-11 | 华中科技大学 | A protocol for network storage and its system |
| CN101119374A (en) * | 2007-09-10 | 2008-02-06 | 杭州华三通信技术有限公司 | iSCSI communication method and corresponding initiation equipment and objective equipment |
| CN202406147U (en) * | 2011-12-31 | 2012-08-29 | 曙光信息产业股份有限公司 | Computer trunking system |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103257941A (en) * | 2013-04-17 | 2013-08-21 | 浪潮(北京)电子信息产业有限公司 | Multi-protocol storage controller and system |
| CN103257941B (en) * | 2013-04-17 | 2015-09-23 | 浪潮(北京)电子信息产业有限公司 | Multi-protocol storage controller and system |
| CN105243166A (en) * | 2015-11-10 | 2016-01-13 | 浪潮(北京)电子信息产业有限公司 | Data management device and system, and data writing and reading methods |
| CN106776430A (en) * | 2016-12-12 | 2017-05-31 | 英业达科技有限公司 | Server system |
| CN107479833A (en) * | 2017-08-21 | 2017-12-15 | 中国人民解放军国防科技大学 | A remote non-volatile memory access and management method for key-value storage |
| CN107479833B (en) * | 2017-08-21 | 2020-04-17 | 中国人民解放军国防科技大学 | Key value storage-oriented remote nonvolatile memory access and management method |
| CN107391049A (en) * | 2017-09-08 | 2017-11-24 | 南宁磁动电子科技有限公司 | Storage connection equipment and storage system |
| CN107391049B (en) * | 2017-09-08 | 2023-05-26 | 南宁磁动电子科技有限公司 | Storage connection device and storage system |
| CN111857580A (en) * | 2020-07-06 | 2020-10-30 | 苏州浪潮智能科技有限公司 | Identical writing method and device for distributed storage system |
| CN115080497A (en) * | 2022-06-17 | 2022-09-20 | 苏州浪潮智能科技有限公司 | Method, device, system and medium for identifying RDMA type node |
| CN115080497B (en) * | 2022-06-17 | 2024-10-15 | 苏州浪潮智能科技有限公司 | A method, device, system and medium for identifying RDMA type nodes |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C12 | Rejection of a patent application after its publication | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20121226 |
