
CN118606079B - A communication method and system based on socket interface - Google Patents


Info

Publication number
CN118606079B
Authority
CN
China
Prior art keywords
remote
registration information
local
address registration
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411062523.9A
Other languages
Chinese (zh)
Other versions
CN118606079A (en)
Inventor
施威
李靖轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202411062523.9A
Publication of CN118606079A
Application granted
Publication of CN118606079B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/545 Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/547 Remote procedure calls [RPC]; Web services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/549 Remote execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The application provides a communication method and system based on a socket interface. The method comprises: when a local node and a remote node both support the Shared Memory Communication (SMC) protocol and a target link is reachable, registering a local memory address allocated in user space with the kernel to obtain local address registration information, wherein the target link corresponds to the type of the underlying hardware device used by SMC; sending the local address registration information to the remote node and obtaining the remote address registration information of the remote node; and, after user space writes target data into the space at the local memory address, reading the target data from user space by direct memory access and transferring it to the space at the remote memory address corresponding to the remote address registration information. This addresses the technical problem in the related art that SMC-based communication requires the CPU to take part in memory copying and semantic conversion between kernel space and user space, which reduces communication performance.

Description

A communication method and system based on a socket interface

Technical Field

The present application relates to the field of communication technology, and in particular to a communication method and system based on a socket interface.

Background Art

Large-scale data analysis and transaction processing, high-frequency trading systems, real-time data synchronization and backup, large parallel computing jobs and artificial intelligence model training, and network optimization inside cloud and data center environments all place strict requirements on host-to-host communication performance.

User-space remote direct memory access (RDMA) verbs are the first choice for the high-performance network communication scenarios above. They allow a user-space process to bypass the kernel and the central processing unit (CPU) and operate on data directly on the network hardware, which significantly reduces latency and CPU load and provides higher data throughput. However, developing RDMA applications on top of verbs requires developers and operations teams not only to have specialized network and hardware knowledge, but also to keep investing time and resources in technology updates and system tuning so that the system stays efficient and stable.

Shared Memory Communication (SMC) is a high-performance network protocol proposed by IBM. It transparently replaces TCP communication with RDMA communication without any modification to the application, which solves the problem that programming against the verbs interface is overly complex. It also confines the maintenance cost to the kernel, greatly reducing the cost for developers and maintenance teams. However, to remain fully compatible with standard TCP applications, the SMC implementation introduces instruction overhead from switching between user space and the kernel, as well as copy overhead. The related art has not yet proposed an effective solution to this technical problem.

Summary of the Invention

The embodiments of the present application provide a communication method and system based on a socket interface to solve one or more of the above technical problems.

In a first aspect, an embodiment of the present application provides a communication method based on a socket interface, comprising:

when a local node and a remote node both support the Shared Memory Communication (SMC) protocol and a target link is reachable, registering a local memory address allocated in user space with the kernel to obtain local address registration information, wherein the target link corresponds to the underlying hardware device type of the SMC;

sending the local address registration information to the remote node, and receiving remote address registration information sent by the remote node;

after user space writes target data into the space at the local memory address, reading the target data from user space by direct memory access and transferring the target data to the space at the remote memory address corresponding to the remote address registration information.

In a second aspect, an embodiment of the present application provides a socket interface, comprising:

a first sub-interface, configured to: when a local node and a remote node both support the Shared Memory Communication (SMC) protocol and a target link is reachable, register a local memory address allocated in user space with the kernel to obtain local address registration information, wherein the target link corresponds to the underlying hardware device type of the SMC; send the local address registration information to the remote node; and receive remote address registration information sent by the remote node;

a second sub-interface, configured to: after user space writes target data into the space at the local memory address, read the target data from user space by direct memory access and transfer the target data to the space at the remote memory address corresponding to the remote address registration information.

In a third aspect, an embodiment of the present application provides a communication system based on a socket interface, comprising:

a local node, configured to: when the local node and a remote node both support the Shared Memory Communication (SMC) protocol and a target link is reachable, register a local memory address allocated in user space with the kernel to obtain local address registration information, wherein the target link corresponds to the underlying hardware device type of the SMC; send the local address registration information to the remote node, and receive remote address registration information sent by the remote node; and, after user space writes target data into the space at the local memory address, read the target data from user space by direct memory access and transfer the target data to the space at the remote memory address corresponding to the remote address registration information;

a remote node, configured to determine the remote address registration information, send the remote address registration information to the local node, and, after receiving the target data, write the target data into the space at the remote memory address corresponding to the remote address registration information.

In a fourth aspect, an embodiment of the present application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory, wherein the processor implements any of the methods described above when executing the computer program.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program implements any of the methods described above when executed by a processor.

In a sixth aspect, an embodiment of the present application provides a computer program product comprising computer instructions, wherein the computer instructions implement any of the methods described above when executed by a processor.

Compared with the related art, the present application has the following advantages:

The embodiments of the present application extend the socket interface. When the local node and the remote node both support the SMC protocol and the target link is reachable, the extended socket interface registers a local memory address allocated in user space with the kernel to obtain local address registration information, wherein the target link corresponds to the underlying hardware device type of the SMC; sends the local address registration information to the remote node and obtains the remote address registration information of the remote node; and, after user space writes target data into the space at the local memory address, reads the target data from user space by direct memory access and transfers it to the space at the remote memory address corresponding to the remote address registration information. In other words, the extended socket interface lets an application communicate directly from a user-space memory region without the CPU performing any operation on the communication data path. This solves the technical problem in the related art that SMC-based communication requires the CPU to take part in memory copying and semantic conversion between kernel space and user space, which reduces communication performance, and thereby achieves the technical effect of improving communication performance.

The above is only an overview of the technical solution of the present application. So that the technical means of the application can be understood more clearly and implemented according to the specification, and so that the above and other purposes, features, and advantages of the application are easier to follow, specific embodiments of the application are set out below.

Brief Description of the Drawings

In the accompanying drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout. The drawings are not necessarily drawn to scale. It should be understood that the drawings depict only some embodiments of the present application and should not be regarded as limiting its scope.

FIG. 1 shows a flowchart of a communication method based on a socket interface provided in an embodiment of the present application;

FIG. 2 shows a schematic diagram of a communication architecture based on a socket interface provided in an embodiment of the present application;

FIG. 3 shows a flowchart of another communication method based on a socket interface provided in an embodiment of the present application;

FIG. 4 shows a structural block diagram of a socket interface provided in an embodiment of the present application;

FIG. 5 shows a structural block diagram of a communication apparatus based on a socket interface provided in an embodiment of the present application;

FIG. 6 shows a structural block diagram of a communication system based on a socket interface provided in an embodiment of the present application;

FIG. 7 shows a block diagram of an electronic device used to implement an embodiment of the present application.

Detailed Description

Only certain exemplary embodiments are briefly described below. As those skilled in the art will recognize, the described embodiments may be modified in various ways without departing from the concept or scope of the present application. The drawings and description are therefore to be regarded as illustrative in nature and not restrictive.

To make the technical solutions of the embodiments easier to understand, the related technologies are described below. These related technologies may be combined in any way, as optional solutions, with the technical solutions of the embodiments, and all such combinations fall within the protection scope of the embodiments of the present application.

Explanation of Terms

TCP: The Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport layer communication protocol defined by IETF RFC 793. In the simplified OSI model of computer networks, it performs the functions assigned to the fourth layer, the transport layer.

Socket: A socket is a software structure inside a computer network node that serves as an endpoint for sending and receiving data over the network, or as a software endpoint for communication between processes within the node.

RDMA: Remote direct memory access (RDMA) is a technology for accessing data in a remote host's memory while bypassing that host's operating system kernel. Because the operating system is not involved, it saves a large amount of CPU resources, increases system throughput, and reduces network communication latency. It is widely used in large-scale parallel computer clusters.

ISM: Internal shared memory (ISM) is a virtual device that provides a shared memory space to multiple logical partitions (LPARs) or virtual machines (VMs) running on the same physical server. Each partition or virtual machine can exchange data efficiently by accessing this shared memory region directly, without going through the network protocol stack, which reduces latency and improves communication performance.

SMC/SMC-D/SMC-R: Shared Memory Communication (SMC) is a high-performance network protocol proposed by IBM. The SMC protocol transparently carries the stream data of a TCP socket by means of shared memory access, providing a high-throughput, low-latency, low-overhead network. When the shared memory operations in SMC are implemented on IBM ISM devices, the protocol is called Shared Memory Communication over DMA (SMC-D); when they are implemented on RDMA devices, it is called Shared Memory Communication over RDMA (SMC-R).

verbs: Verbs are a class of application programming interfaces (APIs) or command sets, a programming abstraction used to perform specific operations. In RDMA, ibverbs is the name of the user-space programming library for operating InfiniBand (IB) hardware. IB is a high-performance computer network communication standard used mainly for high-speed data transfer and widely deployed in data centers, high-performance computing, and enterprise storage; it is designed to support very high data rates with reliable, low-latency delivery. IB hardware refers to the physical devices and components that implement this standard. The library defines a set of operations for managing RDMA resources such as protection domains, memory regions, and queue pairs. These operations typically include creating, modifying, and destroying resources and initiating data transfers.

Developing RDMA applications on top of verbs faces several challenges: 1) developers need a deep understanding of RDMA technical details, in particular the learning curve and programming complexity introduced by the abstractions over various network adapter hardware resources; 2) there is no unified monitoring and operations management capability provided by the operating system (OS), so each application that uses RDMA has to implement it itself; 3) maintaining the large RDMA software stack carries a significant cost. All of this requires developers and operations teams not only to have specialized network and hardware knowledge, but also to keep investing time and resources in technology updates and system tuning so that the system stays efficient and stable.

In one related technology prior to this application, the RDMA Communication Manager (RDMA-CM) is a set of APIs for managing connections of RDMA devices, providing a mechanism to establish, listen for, disconnect, and destroy RDMA connections. RDMA-CM is built on the InfiniBand Trade Association (IBTA) specification and RDMA over Converged Ethernet (RoCE) technology. It aims to simplify RDMA programming so that applications can more easily use RDMA for high-performance, low-latency network communication. Its main disadvantages are: 1) it does not reduce the significant cost of maintaining the RDMA software stack dependencies, so developers and maintenance teams still need specialized network and hardware knowledge and must keep investing time and resources in technology updates and system tuning; 2) development cost with RDMA-CM is still high compared with the socket interface, and developers may need to implement different code logic for different scenarios.

Another related technology prior to this application is an RDMA-based memory system designed on a distributed computing platform for large-scale, memory-intensive computing scenarios. It exposes the memory of the machines in a cluster as a shared address space, and applications can use transactions to allocate, read, write, and free objects in that address space. One of its important goals is likewise to simplify programming so that applications can more easily use RDMA for high-performance, low-latency network communication. Its main disadvantages are: 1) the cost of maintaining the RDMA software stack dependencies in user space remains, which requires developers and maintenance teams to have a certain level of specialized network and hardware knowledge; 2) development and learning costs are still high compared with the socket interface, and, as an RDMA-based distributed memory system, it offers limited coding flexibility in general-purpose scenarios, so developers may need to write additional code for different scenarios. For example, when RDMA is unavailable, developers may have to switch back to the socket model themselves.

Later, the related art proposed a new kernel network protocol stack, SMC-R, which transparently replaces TCP communication with RDMA communication without any modification to the application. This solves the problem that programming against the verbs interface is overly complex and confines the maintenance cost to the kernel, greatly reducing the cost for developers and maintenance teams. However, to remain fully compatible with standard TCP applications, the SMC implementation introduces instruction overhead from switching between user space and the kernel, as well as copy overhead. In addition, compared with the thin data path of user-space verbs, the data path of SMC, which emphasizes generality, is more complex. As a result, there is a large performance gap between SMC-based communication and communication based on user-space verbs.

In view of this, the embodiments of the present application provide a communication method based on a socket interface to solve the above technical problems in whole or in part. Application scenarios of the embodiments include, but are not limited to: latency-sensitive data query and processing (for example, high-performance query and processing with the in-memory database Redis, the distributed in-memory object caching system Memcached, or the relational database PostgreSQL) and high-throughput data transfer.

As shown in FIG. 1, the above communication method based on the socket interface includes:

S102: When the local node and the remote node both support the Shared Memory Communication (SMC) protocol and the target link is reachable, register a local memory address allocated in user space with the kernel to obtain local address registration information, wherein the target link corresponds to the underlying hardware device type of the SMC.

It should be noted that the local node and the remote node may be a local host and a remote host, each of which comprises user space, kernel space, and underlying hardware devices. Before the local host and the remote host establish SMC communication, the local host's protocol stack may first establish a TCP connection with the remote host in the kernel, use a special TCP option during the handshake to indicate that it supports SMC, and confirm that the remote host also supports SMC. In addition, the SMC-R protocol stacks of the local host and the remote host create new RDMA resources or reuse existing ones to establish a usable RDMA RC link, so that network transmission is carried out over the RDMA network; that is, the target link is reachable. The underlying hardware devices may include internal shared memory (ISM) devices and remote direct memory access (RDMA) devices. Correspondingly, the target link may be a link implemented with RDMA technology or a link implemented with direct memory access (DMA) technology, and the kernel module may be the SMC-R module or the SMC-D module. The local address registration information may include a local address registration ID.
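The precondition in S102 and the correspondence between device type, link type, and kernel module can be sketched as a small decision function. This is an illustrative model only: the names `Device`, `Node`, and `select_path` are hypothetical, the preference for ISM over RDMA when both are usable is an assumption, and returning plain TCP when the precondition fails is also an assumption rather than something the text specifies.

```python
from dataclasses import dataclass
from enum import Enum


class Device(Enum):
    ISM = "ism"    # internal shared memory device
    RDMA = "rdma"  # RDMA-capable device


# Device type -> (link technology, kernel module), following the pairing in the text.
LINK_FOR_DEVICE = {
    Device.ISM: ("DMA", "SMC-D"),
    Device.RDMA: ("RDMA", "SMC-R"),
}


@dataclass(frozen=True)
class Node:
    supports_smc: bool   # what the node would advertise during the TCP handshake
    devices: frozenset   # Device values available on this host


def select_path(local: Node, remote: Node, reachable: set) -> str:
    """Return the kernel module to use, or "TCP" if the S102 precondition fails."""
    if not (local.supports_smc and remote.supports_smc):
        return "TCP"  # assumed fallback, not stated in the source
    # Assumed preference order: try the ISM link first, then RDMA.
    for device in (Device.ISM, Device.RDMA):
        if device in local.devices and device in remote.devices and device in reachable:
            return LINK_FOR_DEVICE[device][1]
    return "TCP"
```

For two hosts that both advertise SMC support and share a reachable RDMA link, `select_path` returns `"SMC-R"`; if either side lacks SMC support or no link is reachable, registration in S102 would not be attempted.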

S104: Send the local address registration information to the remote node, and receive the remote address registration information sent by the remote node.

Optionally, in the embodiments of the present application, the ways of sending the local address registration information to the remote node include, but are not limited to, the following. Method 1: copy the local address registration information into a kernel buffer, read it from the kernel buffer by direct memory access, and send it to the remote node; that is, send the local address registration information over the general SMC data path. Method 2: establish a TCP connection and send the local address registration information over it. Correspondingly, after receiving the local address registration information from the local node, the remote node sends its own remote address registration information to the local node in the same way the sender used.
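As a rough illustration of Method 2, the sketch below exchanges registration records over an already connected stream socket. The wire format is hypothetical: the source only says the registration information may include a registration ID, so the address and length fields, the `REG_FORMAT` layout, and the helper names are assumptions made for the example.

```python
import socket
import struct

# Hypothetical record: registration ID (u32), buffer address (u64), length (u32),
# packed in network byte order without padding.
REG_FORMAT = "!IQI"
REG_SIZE = struct.calcsize(REG_FORMAT)


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes from a stream socket."""
    data = bytearray()
    while len(data) < size:
        part = sock.recv(size - len(data))
        if not part:
            raise ConnectionError("peer closed before sending a full record")
        data += part
    return bytes(data)


def exchange_registration(sock: socket.socket, local_reg: tuple) -> tuple:
    """Send the local (id, address, length) record and return the peer's record."""
    sock.sendall(struct.pack(REG_FORMAT, *local_reg))
    return struct.unpack(REG_FORMAT, recv_exact(sock, REG_SIZE))
```

Each side would call `exchange_registration` on its end of the connection. A fixed-size record keeps framing trivial; a real implementation would likely also need a version field and a way to report registration failure.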

S106: After user space writes target data into the space at the local memory address, read the target data from user space by direct memory access and transfer the target data to the space at the remote memory address corresponding to the remote address registration information. It should be noted that the target data may include application data, user-space custom control data, and the like.

As can be seen from the above, this embodiment extends the socket interface. When the local node and the remote node both support the Shared Memory Communications (SMC) protocol and the target link is reachable, the extended socket interface registers the local memory address allocated in user space with the kernel to obtain local address registration information, where the target link corresponds to the type of the underlying hardware device used by SMC; sends the local address registration information to the remote node and obtains the remote node's remote address registration information; and, after user space has written the target data into the space at the local memory address, reads the target data from user space by direct memory access and transfers it into the space at the remote memory address corresponding to the remote address registration information. In other words, the extended socket interface allows an application to communicate directly from a user-space memory region without the CPU performing any operation on the data path. This solves the technical problem in the related art that SMC-based communication requires the CPU to perform memory copies and semantic conversion between kernel space and user space, which lowers communication performance, and thereby achieves the technical effect of improving communication performance.

In a possible implementation, reading the target data from user space by direct memory access and transferring it into the space at the remote memory address corresponding to the remote address registration information includes: S11, mapping the target data into the kernel to obtain a data mapping relationship; and S12, using the data mapping relationship to let the underlying hardware device read the target data directly from user space, and transferring the target data, over the network corresponding to the underlying hardware device, into the space at the remote memory address corresponding to the remote address registration information. Optionally, in this embodiment, S11 may be implemented with mmap memory mapping. Because the local memory address allocated in user space has already been registered with the kernel to obtain the local address registration information, the target data can be mapped into the space corresponding to that registration information, which yields the mapping relationship between the target data and that space. Optionally, the mapping relationship may include the start address of the mapped data, the length of the mapped data, and the like. In standard SMC, the underlying hardware uses DMA to read data from a kernel buffer; because this embodiment establishes the mapping relationship above, the hardware can instead access the target data in user space directly, bypassing the CPU. Through steps S11 and S12, the scheme remains compatible with the SMC protocol while allowing the application to communicate directly from a user-space memory region. This avoids memory copies and semantic conversion between kernel space and user space, so the CPU does not need to operate on the data anywhere on the data path, which greatly improves communication performance.
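The idea of a mapping relationship described by a start address and a length can be illustrated in user space with Python's `mmap` module. This is only an analogy under stated assumptions: the anonymous mapping `region` stands in for the registered local memory, `Mapping`, `map_target`, and `device_read` are hypothetical names, and a zero-copy `memoryview` slice stands in for the device's DMA access. It does not show the kernel-side mapping itself.

```python
import mmap
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    start: int   # start offset of the mapped data within the registered region
    length: int  # number of mapped bytes


# Anonymous memory standing in for the registered user-space region.
region = mmap.mmap(-1, mmap.PAGESIZE)


def map_target(offset, length):
    """S11 (sketch): record where the target data lives in the region."""
    if offset < 0 or length < 0 or offset + length > len(region):
        raise ValueError("mapping falls outside the registered region")
    return Mapping(offset, length)


def device_read(view, m):
    """S12 (sketch): a zero-copy window onto the mapped bytes."""
    return view[m.start:m.start + m.length]


payload = b"hello smc"
region[0:len(payload)] = payload          # application writes in place
m = map_target(0, len(payload))
window = device_read(memoryview(region), m)
seen_before = window.tobytes()

region[0:5] = b"HELLO"                    # application rewrites in place
seen_after = window.tobytes()             # the window sees it without a re-copy
```

Because the window is a view rather than a copy, later writes by the application are visible through it, which is the property the mapping relationship is meant to provide to the hardware.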

Optionally, sending the local address registration information to the remote node may include: S21, copying the local address registration information into a kernel buffer; and S22, reading the local address registration information from the kernel buffer by direct memory access and sending it to the remote node. That is, in this embodiment, the SMC general data path may be used to transmit the local address registration information and the remote address registration information. It should be noted that, on the SMC general data path, the application (APP) on the local node copies the data to be sent, through the socket interface, into the ring buffer that the local SMC protocol stack has allocated for sending, and the SMC protocol stack then writes the data directly and efficiently into the ring buffer of the remote node with an RDMA write operation.
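The general data path described above (copy into a sender-side ring buffer, then an RDMA write into the receiver-side ring buffer) can be modelled in a few lines. This is a toy model: `RingBuffer` and `rdma_write` are hypothetical names, and an in-process copy between two buffers stands in for the RDMA write. Real SMC tracks producer and consumer cursors through control messages, which this sketch omits.

```python
class RingBuffer:
    """Fixed-size byte ring with wrap-around cursors."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0  # next write position
        self.tail = 0  # next read position
        self.used = 0  # bytes currently stored

    def write(self, data):
        """Copy as much of data as fits; return the number of bytes accepted."""
        n = min(len(data), self.capacity - self.used)
        for i in range(n):
            self.buf[(self.head + i) % self.capacity] = data[i]
        self.head = (self.head + n) % self.capacity
        self.used += n
        return n

    def read(self, n):
        """Remove and return up to n bytes."""
        n = min(n, self.used)
        out = bytes(self.buf[(self.tail + i) % self.capacity] for i in range(n))
        self.tail = (self.tail + n) % self.capacity
        self.used -= n
        return out


def rdma_write(src, dst):
    """Stand-in for the RDMA write: move what fits from src ring to dst ring."""
    data = src.read(dst.capacity - dst.used)
    return dst.write(data)


send_buf, recv_buf = RingBuffer(8), RingBuffer(8)
accepted = send_buf.write(b"abcdef")   # socket send: copy into the send ring
moved = rdma_write(send_buf, recv_buf)
first = recv_buf.read(4)               # socket recv: copy out of the receive ring
send_buf.write(b"ghijkl")              # this write wraps around the ring
rdma_write(send_buf, recv_buf)
rest = recv_buf.read(8)
```

Note the two application-visible copies (into `send_buf` and out of `recv_buf`); these are the copies that the extended interface avoids for bulk data, while still using this path for small items such as registration information.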

Besides exchanging application data directly in user-space memory, this embodiment can also deliver notifications. Specifically, the target data may include application data and user-space custom control data, where the user-space custom control data includes at least one of the following: remote address registration information, write offset information, write length information, read offset information, and read length information. Optionally, the application data and the user-space custom control data may be read from user space by direct memory access and transferred into the space at the remote memory address corresponding to the remote address registration information. That is, in addition to exchanging application data directly, the extended socket interface provided by this embodiment allows users to exchange custom control messages, which further improves performance and greatly increases the flexibility of user code.
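A custom control message carrying the fields listed above could be encoded as a fixed-size record. The layout below (`CTRL_FMT`: a 64-bit registration ID followed by 32-bit offset and length, network byte order) and the name `WriteNotice` are assumptions for illustration; the source leaves the format to the user.

```python
import struct
from typing import NamedTuple

CTRL_FMT = "!QII"  # assumed layout: remote_id (u64), offset (u32), length (u32)
CTRL_SIZE = struct.calcsize(CTRL_FMT)


class WriteNotice(NamedTuple):
    """Tells the peer which registered region was written, and where."""
    remote_id: int  # remote address registration information
    offset: int     # write offset within the registered region
    length: int     # number of bytes written

    def pack(self):
        return struct.pack(CTRL_FMT, *self)

    @classmethod
    def unpack(cls, raw):
        return cls(*struct.unpack(CTRL_FMT, raw))


notice = WriteNotice(remote_id=0x2002, offset=128, length=64)
raw = notice.pack()
decoded = WriteNotice.unpack(raw)
```

A read notification (read offset, read length) could use the same layout with a different message type; that extension is left out here to keep the sketch small.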

The embodiments of the present application are illustrated below with a specific example. It should be noted that this example is described in terms of SMC-R.

Figure 2 shows the overall design framework of this example, which comprises user space, kernel space, and hardware (including an RDMA device). SMC-R is the kernel module that implements the SMC-R protocol. It runs in kernel space; upward, it supports the network behaviour that user-space programs (APP) express through the socket interface, and downward, it uses the IB verbs interface to carry out RDMA network transmission. Send_buf is the ring buffer that the SMC-R kernel protocol stack allocates for sending data on a connection; the receive buffer is omitted from Figure 2. The extended socket interface provided in this example allows the user to create a user-space memory region mapping and hand it to SMC for direct data exchange, so data transfer no longer depends on copies between kernel space and user space. On this basis, the extended socket interface also allows users to exchange custom control messages during data exchange, which further improves performance and greatly increases the flexibility of user code. In Figure 2, the left side shows the data transmission path proposed in this example, and the right side shows the SMC-R data transmission path in the related art.

Under the above design framework, taking one complete host-to-host communication as an example, the method includes the following stages (see S301 to S311 in Figure 3).

Handshake and connection establishment: create an SMC connection using the standard socket interface.

Resource preparation and registration: the sender allocates a block of memory, local_addr, in its local address space; uses the extended socket interface to register local_addr with the SMC connection and obtains the local address registration information local_id; sends local_id to the remote end over the SMC-R general data path; and waits for an event notification to obtain the remote memory address registration information remote_id.

Sending data: in user space, the sender writes the application data directly into the local memory address local_addr, then uses the extended socket interface to move the data at local_addr into the remote address space identified by remote_id. If the peer needs to be notified of this write, the sender uses the SMC-R general data path to notify it, passing the details of the write, including the address information (remote_id), the write offset, the write length, and so on.

Receiving data: the receiver waits for an event notification and obtains the peer's write event information, such as the local address information (local_id), the write offset, and the write length, then directly reads and processes the data at the local address local_addr corresponding to local_id.

This example is programmed against a socket-based interface, which greatly reduces learning and development costs. The creation and destruction of RDMA resources is handled entirely by SMC in the kernel, so the user process no longer needs to deal with complex RDMA verbs operations. The extended socket interface lets users bypass the data path through the SMC ring buffer and exchange data and notifications directly in user-defined user-space memory. This removes the memory copies and semantic conversion between kernel space and user space and greatly improves communication performance over native SMC.
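The walkthrough above can be simulated end to end in a single process. This is a sketch under explicit assumptions: `Endpoint` and its methods `register`, `write_remote`, and `notify` are hypothetical names, not the extended socket interface itself (whose actual calls the source does not name); bytearrays stand in for user-space memory, a slice assignment stands in for the DMA/RDMA transfer, and a list stands in for event notification over the general data path.

```python
import itertools


class Endpoint:
    """One side of the connection with its registered user-space regions."""

    _ids = itertools.count(1)  # process-wide source of registration IDs

    def __init__(self):
        self.regions = {}  # registration ID -> bytearray (user-space memory)
        self.events = []   # write notifications delivered by the peer
        self.peer = None

    def register(self, region):
        """Register a region with the connection and return its ID."""
        reg_id = next(Endpoint._ids)
        self.regions[reg_id] = region
        return reg_id

    def write_remote(self, local_id, remote_id, offset, length):
        """Move bytes [offset, offset + length) from local to remote region."""
        src = self.regions[local_id]
        dst = self.peer.regions[remote_id]
        if offset < 0 or length < 0 or offset + length > min(len(src), len(dst)):
            raise ValueError("write falls outside a registered region")
        dst[offset:offset + length] = memoryview(src)[offset:offset + length]

    def notify(self, remote_id, offset, length):
        """Tell the peer what was written (general data path in the text)."""
        self.peer.events.append((remote_id, offset, length))


sender, receiver = Endpoint(), Endpoint()
sender.peer, receiver.peer = receiver, sender

# Resource preparation and registration on both sides.
local_addr = bytearray(64)
local_id = sender.register(local_addr)
remote_id = receiver.register(bytearray(64))  # ID learned via the ID exchange

# Sending data: write in place, move it, then notify the peer.
msg = b"payload"
local_addr[8:8 + len(msg)] = msg
sender.write_remote(local_id, remote_id, 8, len(msg))
sender.notify(remote_id, 8, len(msg))

# Receiving data: read straight out of the registered region.
reg_id, offset, length = receiver.events.pop(0)
received = bytes(receiver.regions[reg_id][offset:offset + length])
```

From the receiver's point of view, the `remote_id` in the notification is its own local registration ID, which matches the description of the receive stage.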

In summary, this embodiment builds on the kernel protocol stack and confines the use of RDMA to the kernel SMC module, which greatly reduces developers' development costs and the cost of managing and maintaining a large RDMA software stack. In most scenarios it achieves latency and throughput close to those of user-space verbs. It thus avoids the cost of using user-space RDMA verbs directly while also addressing the sizeable performance gap between SMC and ibverbs in high-performance scenarios. In addition, because the scheme uses the SMC protocol, the control plane remains compatible with the socket interface and TCP-based applications need no modification. The data plane adds a very simple interface on top of the SMC socket, so the learning cost is low. Because the implementation resides in the kernel, it is highly reusable, and different code logic does not have to be written for different scenarios.

It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in this application are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and a corresponding operation entry is provided so that the user can choose to grant or refuse authorization.

The technical solution of the present application, and how it solves the technical problems described above, is explained in detail below through specific embodiments. The specific embodiments listed may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present application are described in detail below with reference to the accompanying drawings.

Corresponding to the application scenarios and the method provided in the embodiments of the present application, an embodiment of the present application further provides a socket interface. Figure 4 is a structural block diagram of the socket interface according to an embodiment of the present application. The interface may include:

a first sub-interface 42, configured to, when the local node and the remote node both support the Shared Memory Communications (SMC) protocol and the target link is reachable, register the local memory address allocated in user space with the kernel to obtain local address registration information, where the target link corresponds to the type of the underlying hardware device used by SMC; send the local address registration information to the remote node; and receive the remote address registration information sent by the remote node; and

a second sub-interface 44, configured to, after user space has written the target data into the space at the local memory address, read the target data from user space by direct memory access and transfer it into the space at the remote memory address corresponding to the remote address registration information.

With the extended socket interface shown in Figure 4, when the local node and the remote node both support the SMC protocol and the target link is reachable, the local memory address allocated in user space is registered with the kernel to obtain local address registration information, where the target link corresponds to the type of the underlying hardware device used by SMC; the local address registration information is sent to the remote node and the remote node's remote address registration information is obtained; and, after user space has written the target data into the space at the local memory address, the target data is read from user space by direct memory access and transferred into the space at the remote memory address corresponding to the remote address registration information. In other words, the extended socket interface allows an application to communicate directly from a user-space memory region without the CPU performing any operation on the data path. This solves the technical problem in the related art that SMC-based communication requires the CPU to perform memory copies and semantic conversion between kernel space and user space, which lowers communication performance, and thereby achieves the technical effect of improving communication performance.

Optionally, the target data includes application data and user-space custom control data, where the user-space custom control data includes at least one of the following: remote address registration information, write offset information, write length information, read offset information, and read length information.

Corresponding to the application scenarios and the method provided in the embodiments of the present application, an embodiment of the present application further provides a communication apparatus based on a socket interface. Figure 5 is a structural block diagram of the communication apparatus based on a socket interface according to an embodiment of the present application. The apparatus may include:

a registration module 52, configured to, when the local node and the remote node both support the Shared Memory Communications (SMC) protocol and the target link is reachable, register the local memory address allocated in user space with the kernel to obtain local address registration information, where the target link corresponds to the type of the underlying hardware device used by SMC. It should be noted that the local node and the remote node may be a local host and a remote host, each of which comprises user space, kernel space, and underlying hardware devices. Before SMC communication is established, the protocol stack of the local host may first establish a TCP connection with the remote host in the kernel, use a special TCP option during the handshake to indicate that it supports SMC, and confirm that the remote host supports SMC as well. In addition, the SMC-R protocol stacks of the local host and the remote host create new RDMA resources or reuse existing ones to establish a usable RDMA RC link, so that network transmission is carried out over the RDMA network; that is, the target link is reachable. The underlying hardware devices may include an internal shared memory (ISM) device and a remote direct memory access (RDMA) device. Correspondingly, the target link may be a link implemented with RDMA or a link implemented with direct memory access (DMA), and the kernel module may be an SMC-R module or an SMC-D module. The local address registration information may include a local address registration ID;

a processing module 54, configured to send the local address registration information to the remote node and receive the remote address registration information sent by the remote node. Optionally, in this embodiment, the ways of sending the local address registration information to the remote node include, but are not limited to, the following. Method 1: copy the local address registration information into a kernel buffer, read it from the kernel buffer by direct memory access, and send it to the remote node; that is, send it over the SMC general data path. Method 2: establish a TCP connection and send the local address registration information over it. Correspondingly, after receiving the local address registration information from the local node, the remote node sends its own remote address registration information to the local node in the same way as the sender; and

a communication module 56, configured to, after user space has written the target data into the space at the local memory address, read the target data from user space by direct memory access and transfer it into the space at the remote memory address corresponding to the remote address registration information. It should be noted that the target data may include application data, user-space custom control data, and the like.

The apparatus shown in Figure 5 extends the socket interface. When the local node and the remote node both support the SMC protocol and the target link is reachable, the extended socket interface registers the local memory address allocated in user space with the kernel to obtain local address registration information, where the target link corresponds to the type of the underlying hardware device used by SMC; sends the local address registration information to the remote node and obtains the remote node's remote address registration information; and, after user space has written the target data into the space at the local memory address, reads the target data from user space by direct memory access and transfers it into the space at the remote memory address corresponding to the remote address registration information. In other words, the extended socket interface allows an application to communicate directly from a user-space memory region without the CPU performing any operation on the data path. This solves the technical problem in the related art that SMC-based communication requires the CPU to perform memory copies and semantic conversion between kernel space and user space, which lowers communication performance, and thereby achieves the technical effect of improving communication performance.

In a possible implementation, the communication module 56 includes: a mapping unit, configured to map the target data into the kernel to obtain a data mapping relationship; and a communication unit, configured to use the data mapping relationship to let the underlying hardware device read the target data directly from user space, and to transfer the target data, over the network corresponding to the underlying hardware device, into the space at the remote memory address corresponding to the remote address registration information. Optionally, in this embodiment, the mapping of S11 above, which the mapping unit performs, may be implemented with mmap memory mapping. Because the local memory address allocated in user space has already been registered with the kernel to obtain the local address registration information, the target data can be mapped into the space corresponding to that registration information, which yields the mapping relationship between the target data and that space. Optionally, the mapping relationship may include the start address of the mapped data, the length of the mapped data, and the like. In standard SMC, the underlying hardware uses DMA to read data from a kernel buffer; because this embodiment establishes the mapping relationship above, the hardware can instead access the target data in user space directly, bypassing the CPU. Through steps S11 and S12 above, the scheme remains compatible with the SMC protocol while allowing the application to communicate directly from a user-space memory region. This avoids memory copies and semantic conversion between kernel space and user space, so the CPU does not need to operate on the data anywhere on the data path, which greatly improves communication performance.

The processing module 54 includes: a copy unit, configured to copy the local address registration information into a kernel buffer; and a processing unit, configured to read the local address registration information from the kernel buffer by direct memory access and send it to the remote node. That is, in this embodiment, the SMC general data path may be used to transmit the local address registration information and the remote address registration information. It should be noted that, on the SMC general data path, the application (APP) on the local node copies the data to be sent, through the socket interface, into the ring buffer that the local SMC protocol stack has allocated for sending, and the SMC protocol stack then writes the data directly and efficiently into the ring buffer of the remote node with an RDMA write operation.

Besides exchanging application data directly in user-space memory, the above apparatus can also deliver notifications. Specifically, the target data may include application data and user-space custom control data, where the user-space custom control data includes at least one of the following: remote address registration information, write offset information, write length information, read offset information, and read length information. Optionally, the application data and the user-space custom control data may be read from user space by direct memory access and transferred into the space at the remote memory address corresponding to the remote address registration information. That is, in addition to exchanging application data directly, the extended socket interface provided by this embodiment allows users to exchange custom control messages, which further improves performance and greatly increases the flexibility of user code.

For the functions of the modules in the apparatuses of the embodiments of the present application, refer to the corresponding description of the method above. The modules provide the corresponding beneficial effects, which are not repeated here.

Corresponding to the application scenarios and the method provided in the embodiments of the present application, an embodiment of the present application further provides a communication system based on a socket interface. Figure 6 is a structural block diagram of the communication system based on a socket interface according to an embodiment of the present application. The system may include:

A local node 62, configured to: when the local node and a remote node both support the Shared Memory Communications (SMC) protocol and a target link is reachable, register a local memory address allocated in user space with the kernel to obtain local address registration information, where the target link corresponds to the type of the underlying hardware device of the SMC; send the local address registration information to the remote node and receive remote address registration information sent by the remote node; and, after user space writes target data into the space at the local memory address, read the target data from user space by direct memory access and transmit it into the space at the remote memory address corresponding to the remote address registration information.

It should be noted that the local node and the remote node may be a local host and a remote host, whose operating systems include a user mode, a kernel mode, and underlying hardware devices. Before the local host and the remote host establish SMC communication, the protocol stack of the local host may first establish a TCP connection with the remote host in the kernel, use a special TCP option during the handshake to indicate that it supports SMC, and confirm that the remote host also supports SMC. In addition, the SMC-R protocol stacks of the local host and the remote host create new RDMA resources or reuse existing ones to establish a usable RDMA RC link, so that network transmission is carried out over the RDMA network; that is, the target link is reachable. The underlying hardware devices may include an Internal Shared Memory (ISM) device and a Remote Direct Memory Access (RDMA) device. Correspondingly, the target link may be a link implemented with RDMA technology or a link implemented with direct memory access (DMA) technology, and the kernel module may be an SMC-R module or an SMC-D module. The local address registration information may include a local address registration ID.

Optionally, in the embodiments of the present application, the ways of sending the local address registration information to the remote node include, but are not limited to, the following. Method 1: copy the local address registration information into a kernel buffer, read it from the kernel buffer by direct memory access, and send it to the remote node; that is, send the local address registration information over the general SMC data path. Method 2: establish a TCP connection and send the local address registration information over it. Correspondingly, after receiving the local address registration information from the local node, the remote node sends its own remote address registration information to the local node in the same way the sender used.
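The register, exchange, and transfer sequence performed by the local node can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions: the `Node` class, its `registry` table, and the `dma_write` function are hypothetical names invented for illustration, the "kernel" is a Python dictionary, and the "direct memory access" is a buffer-to-buffer copy. It is not the kernel interface described in the present application.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """Toy stand-in for a host; `registry` plays the role of the kernel's registration table."""
    name: str
    registry: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def register(self, buf: bytearray) -> int:
        """Register a user-space buffer and return its (hypothetical) registration ID."""
        reg_id = next(self._ids)
        self.registry[reg_id] = memoryview(buf)
        return reg_id


def dma_write(src: Node, src_id: int, dst: Node, dst_id: int, length: int) -> None:
    """Simulated device transfer: copy straight between registered buffers,
    with no intermediate kernel buffer on the data path."""
    dst.registry[dst_id][:length] = src.registry[src_id][:length]


local, remote = Node("local"), Node("remote")
local_buf, remote_buf = bytearray(16), bytearray(16)

# Step 1: each side registers its user-space buffer.
local_id = local.register(local_buf)
remote_id = remote.register(remote_buf)

# Step 2: the IDs would be exchanged over the SMC data path or a TCP connection;
# here the exchange is simulated by simply sharing the two integers.

# Step 3: user space writes the target data, then the "device" moves it.
local_buf[:5] = b"hello"
dma_write(local, local_id, remote, remote_id, 5)
print(bytes(remote_buf[:5]))
```

In this toy model the application only touches `local_buf`; the copy into `remote_buf` is keyed entirely by the two registration IDs, which mirrors the role the registration information plays in the description above.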

A remote node 64, configured to determine remote address registration information, send the remote address registration information to the local node, and, after receiving the target data, write the target data into the remote memory address space corresponding to the remote address registration information.
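How the remote side might place incoming data can be sketched as follows. The claims of the present application list remote address registration information, write offset information, and write length information among the user-mode custom control data; the fixed binary layout below (`CTRL`), the `pack_ctrl` and `apply_write` helpers, and the `regions` dictionary are assumptions made purely for illustration and are not specified by the application.

```python
import struct

# Hypothetical control header: registration ID (u32), write offset (u64),
# write length (u32), in network byte order. The patent does not define a layout.
CTRL = struct.Struct("!IQI")


def pack_ctrl(reg_id: int, offset: int, length: int) -> bytes:
    """Build the control header that precedes the application data."""
    return CTRL.pack(reg_id, offset, length)


def apply_write(regions: dict, message: bytes) -> None:
    """Write the payload into the registered region named by the header."""
    reg_id, offset, length = CTRL.unpack_from(message)
    payload = message[CTRL.size:CTRL.size + length]
    if len(payload) != length:
        raise ValueError("payload shorter than declared write length")
    region = regions[reg_id]
    if offset + length > len(region):
        raise ValueError("write would run past the registered region")
    region[offset:offset + length] = payload


# Region 7 stands for the memory behind one remote address registration ID.
regions = {7: bytearray(8)}
apply_write(regions, pack_ctrl(7, 2, 3) + b"abc")
print(regions[7])
```

The bounds checks reflect a design choice of this sketch: a receiver that trusts an offset and length supplied by its peer should validate them against the size of the registered region before writing.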

With the system shown in FIG. 6, when the local node 62 and the remote node 64 both support the Shared Memory Communications (SMC) protocol and the target link is reachable, a local memory address allocated in user space is registered with the kernel to obtain local address registration information, where the target link corresponds to the type of the underlying hardware device of the SMC; the local address registration information is sent to the remote node 64, and the remote address registration information of the remote node 64 is obtained; and, after user space writes the target data into the space at the local memory address, the target data is read from user space by direct memory access and transmitted into the space at the remote memory address corresponding to the remote address registration information. In other words, the embodiments of the present application extend the socket interface so that an application can communicate data directly from a user-space memory region without the CPU performing any operation on the communication data path. This solves the technical problem in the related art that, when communicating over the SMC protocol, the CPU must take part in memory copies and semantic conversion between kernel mode and user mode, which degrades communication performance, and thereby achieves the technical effect of improving communication performance.
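The preconditions and the device-to-link correspondence stated above can be summarized in a few lines. This is an illustrative sketch only: the `Device` enum, the `LINKS` table, and `select_path` are hypothetical names, and pairing the ISM device with SMC-D and the RDMA device with SMC-R follows the general definition of those SMC variants rather than an explicit statement in this passage.

```python
from enum import Enum
from typing import Optional, Tuple


class Device(Enum):
    ISM = "ISM"    # internal shared memory device
    RDMA = "RDMA"  # remote direct memory access device


# Underlying device type -> (kernel module, target link type).
LINKS = {
    Device.ISM: ("SMC-D", "DMA link"),
    Device.RDMA: ("SMC-R", "RDMA link"),
}


def select_path(local_smc: bool, remote_smc: bool,
                device: Optional[Device],
                link_reachable: bool) -> Optional[Tuple[str, str]]:
    """Return (module, link) when both nodes support SMC and the target
    link is reachable; return None when the preconditions do not hold."""
    if not (local_smc and remote_smc and link_reachable) or device is None:
        return None
    return LINKS[device]


print(select_path(True, True, Device.RDMA, True))
print(select_path(True, False, Device.ISM, True))
```

What a caller does when `select_path` returns `None` is outside the scope of this passage and is deliberately left open here.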

For the functions of the modules in the systems of the embodiments of the present application, refer to the corresponding descriptions in the above method; the modules provide the corresponding beneficial effects, which are not repeated here.

FIG. 7 is a block diagram of an electronic device for implementing the embodiments of the present application. As shown in FIG. 7, the electronic device includes a memory 701 and a processor 702, where the memory 701 stores a computer program that can run on the processor 702. When the processor 702 executes the computer program, the method in the above embodiments is implemented. There may be one or more memories 701 and one or more processors 702.

The electronic device further includes:

A communication interface 703, configured to communicate with external devices and exchange data with them.

If the memory 701, the processor 702 and the communication interface 703 are implemented independently, they may be connected to one another and communicate through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.

Optionally, in a specific implementation, if the memory 701, the processor 702 and the communication interface 703 are integrated on one chip, they may communicate with one another through an internal interface.

An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method provided in the embodiments of the present application.

An embodiment of the present application further provides a chip. The chip includes a processor configured to call and run instructions stored in a memory, so that a communication device on which the chip is installed executes the method provided in the embodiments of the present application.

An embodiment of the present application further provides a chip including an input interface, an output interface, a processor and a memory, which are connected through an internal connection path. The processor is configured to execute code in the memory; when the code is executed, the processor performs the method provided in the embodiments of the present application.

It should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. It is worth noting that the processor may be a processor that supports the Advanced RISC Machines (ARM) architecture.

Further, optionally, the memory may include a read-only memory and a random access memory. The memory may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may include a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may include a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, for example static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another.

In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples.

In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means two or more, unless otherwise expressly and specifically defined.

Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present application also includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order, depending on the functions involved.

The logic and/or steps described in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them), or for use in combination with such an instruction execution system, apparatus or device.

It should be understood that the parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. All or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the present application may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above are merely exemplary embodiments of the present application, and the protection scope of the present application is not limited to them. Any variation or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A socket interface-based communication method, comprising:
registering, when a local node and a remote node support the Shared Memory Communications (SMC) protocol and a target link is reachable, a local memory address allocated in user space with a kernel to obtain local address registration information, wherein the target link corresponds to the type of the underlying hardware device of the SMC;
sending the local address registration information to the remote node, and receiving remote address registration information sent by the remote node; and
after user space writes target data into the space at the local memory address, reading the target data from user space by direct memory access and transmitting the target data into the space at the remote memory address corresponding to the remote address registration information.
2. The method of claim 1, wherein reading the target data from user space by direct memory access and transmitting the target data into the space at the remote memory address corresponding to the remote address registration information comprises:
mapping the target data into the kernel to obtain a data mapping relationship; and
reading, by the underlying hardware device, the target data directly from user space through the data mapping relationship, and transmitting the target data into the space at the remote memory address corresponding to the remote address registration information over the network corresponding to the underlying hardware device.
3. The method of claim 2, wherein mapping the target data into the kernel to obtain a data mapping relationship comprises:
acquiring the local address registration information; and
mapping the target data into the space corresponding to the local address registration information to obtain a mapping relationship between the target data and the space corresponding to the local address registration information.
4. The method of claim 1, wherein sending the local address registration information to the remote node comprises:
copying the local address registration information into a kernel buffer; and
reading the local address registration information from the kernel buffer by direct memory access, and sending the local address registration information to the remote node.
5. The method of claim 1, wherein the target data comprises: application data and user-mode custom control data, wherein the user-mode custom control data comprises at least one of the following: remote address registration information, write offset information, write length information, read offset information, read length information.
6. The method of any of claims 1 to 5, wherein the underlying hardware device comprises an internal shared memory ISM device or a remote direct memory access RDMA device.
7. The method of claim 6, wherein the target link is a direct memory access, DMA, link when the underlying hardware device is the ISM device and an RDMA link when the underlying hardware device is the RDMA device.
8. A socket interface, comprising:
a first sub-interface, configured to: when a local node and a remote node support the Shared Memory Communications (SMC) protocol and a target link is reachable, register a local memory address allocated in user space with a kernel to obtain local address registration information, wherein the target link corresponds to the type of the underlying hardware device of the SMC; send the local address registration information to the remote node; and receive remote address registration information sent by the remote node; and
a second sub-interface, configured to: after user space writes target data into the space at the local memory address, read the target data from user space by direct memory access and transmit the target data into the space at the remote memory address corresponding to the remote address registration information.
9. The socket interface of claim 8, wherein the target data comprises: application data and user-mode custom control data, wherein the user-mode custom control data comprises at least one of the following: remote address registration information, write offset information, write length information, read offset information, read length information.
10. A socket interface-based communication system, comprising:
a local node, configured to: when the local node and a remote node support the Shared Memory Communications (SMC) protocol and a target link is reachable, register a local memory address allocated in user space with a kernel to obtain local address registration information, wherein the target link corresponds to the type of the underlying hardware device of the SMC; send the local address registration information to the remote node, and receive remote address registration information sent by the remote node; and, after user space writes target data into the space at the local memory address, read the target data from user space by direct memory access and transmit the target data into the space at the remote memory address corresponding to the remote address registration information; and
the remote node, configured to determine the remote address registration information, send the remote address registration information to the local node, and, after receiving the target data, write the target data into the remote memory address space corresponding to the remote address registration information.
11. The system of claim 10, wherein the underlying hardware device comprises an internal shared memory ISM device or a remote direct memory access RDMA device, the target link being a direct memory access DMA link when the underlying hardware device is the ISM device, and an RDMA link when the underlying hardware device is the RDMA device.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory, the processor implementing the method of any one of claims 1-7 when the computer program is executed.
13. A computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-7.
14. A computer program product comprising computer instructions which, when executed by a processor, implement the method of any of claims 1-7.
CN202411062523.9A 2024-08-05 2024-08-05 A communication method and system based on socket interface Active CN118606079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411062523.9A CN118606079B (en) 2024-08-05 2024-08-05 A communication method and system based on socket interface

Publications (2)

Publication Number Publication Date
CN118606079A CN118606079A (en) 2024-09-06
CN118606079B true CN118606079B (en) 2024-10-11

Family

ID=92561357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411062523.9A Active CN118606079B (en) 2024-08-05 2024-08-05 A communication method and system based on socket interface

Country Status (1)

Country Link
CN (1) CN118606079B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119248538A (en) * 2024-12-04 2025-01-03 浙江智臾科技有限公司 A method, system and medium for implementing RDMA service

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062253A (en) * 2017-12-11 2018-05-22 北京奇虎科技有限公司 The communication means of a kind of kernel state and User space, device and terminal
CN113448897A (en) * 2021-07-12 2021-09-28 上海交通大学 Array structure and optimization method suitable for pure user mode remote direct memory access

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356598A (en) * 2021-12-29 2022-04-15 山东浪潮科学研究院有限公司 Data interaction method and device for Linux kernel mode and user mode
CN117667369A (en) * 2022-08-26 2024-03-08 华为云计算技术有限公司 Memory management method, electronic device, chip system and readable storage medium
CN117453582A (en) * 2023-09-21 2024-01-26 杭州阿里云飞天信息技术有限公司 Data transmission methods, equipment and storage media

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant