CN116932430A - Memory access control system for RDMA network cards - Google Patents

Info

Publication number
CN116932430A
Authority
CN
China
Prior art keywords
memory
access control
tree
network card
memory access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310924977.1A
Other languages
Chinese (zh)
Other versions
CN116932430B (en)
Inventor
余锋
邢钱舰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202310924977.1A
Publication of CN116932430A
Application granted
Publication of CN116932430B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/14 Protection against unauthorised use of memory or access to memory
    • G06F12/1408 Protection against unauthorised use of memory or access to memory by using cryptography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1009 Address translation using page tables, e.g. page table structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a memory access control system for an RDMA network card, comprising a memory access control software module and a memory access control logic module. The memory access control software module is arranged in the host RDMA network card driver and is used to register and deregister memory regions and memory windows and to map the virtual address space to the physical address space. The memory access control logic module is arranged in the RDMA network card and is used to determine whether the access permission of an RDMA request is legal and, when it is, to translate the virtual address into a physical address, thereby providing information for the DMA operations of the RDMA network card. The beneficial effect of the system is that the lookup logic for virtual-to-physical address translation and memory access permission control is offloaded to the RDMA network card hardware, which exploits the high-speed transmission capability of the network card, improves its transmission bandwidth, reduces the latency of RDMA read and write operations, and reduces storage space usage.

Description

Memory access control system for RDMA network cards

Technical field

The invention belongs to the field of network technology and specifically relates to a memory access control system for RDMA network cards.

Background

RDMA (Remote Direct Memory Access) networks, with their low latency, low CPU overhead and high bandwidth, have become the mainstream of data center networks in recent years; they include InfiniBand networks and RoCE networks. RDMA allows data to be transferred directly from the memory of one computer to the memory of another without involving the operating system of either side. RDMA eliminates the overhead of copying packets between user space and kernel space and of context switching, freeing memory bandwidth and CPU cycles to improve application performance. Most RDMA networks use dedicated smart network cards that offload network services to hardware, further reducing CPU load.

RDMA provides a set of software transport interfaces through which an application can create a work request (Work Request, WR for short). The WR describes what the application wants to exchange with the peer, such as the memory address, length and key. The memory address in a WR is a virtual address; a peer's RDMA request is allowed only if that virtual memory has been registered, has remote access permission, and the key is correct. Taking an RDMA Write request as an example, the initiator places the start address of the peer memory to be written, the write length and the key into the WR. The responder's network card parses the WR and must verify that the memory has been registered, that it has remote write permission and that the key is correct; it then translates the virtual address to a physical address and uses that physical address to initiate a DMA access to host memory.

Some RDMA network cards obtain the physical address and the memory access permissions by passing the memory read/write request and the virtual address to the driver. This approach is flexible, but because it involves interaction with the driver, the software overhead is large and the latency is high, so it is unsuitable for RDMA networks with demanding bandwidth and latency requirements. Other RDMA network cards keep a copy of the host address mapping table and store and look up addresses using on-chip CAM (Content-Addressable Memory) resources. This approach requires the cache in the network card to be updated continually, and because on-chip CAM resources are limited, a large number of lookup misses may occur and reduce lookup efficiency. Therefore, to improve the lookup efficiency of virtual-to-physical address translation, reduce the storage resources occupied in the network card, increase the parallelism of memory access permission control and address translation, and improve the generality of the system, memory access control schemes need further study.

Summary of the invention

The present invention provides a memory access control system for RDMA network cards to solve the technical problems mentioned above, and specifically adopts the following technical solution:

A memory access control system for RDMA network cards, comprising a memory access control software module and a memory access control logic module;

The memory access control software module is arranged in the host RDMA network card driver and is used to register and deregister memory regions and memory windows and to map the virtual address space to the physical address space;

The memory access control logic module is arranged in the RDMA network card and is used to determine whether the access permission of an RDMA request is legal and, if it is, to translate the virtual address to a physical address, providing information for the DMA operations of the network card.

Further, the memory access control software module manages a memory registration list for storing information about registered memory regions and memory windows;

The memory access control software module also manages a B+ tree data structure for storing the mapping from virtual addresses to physical addresses;

The memory access control logic module uses storage resources in the RDMA network card hardware to hold copies of the memory registration list and the B+ tree data structure, and the memory access control software module actively initiates updates to the memory registration list and the B+ tree data structure in the RDMA network card hardware, ensuring that the software and hardware copies remain consistent;

The memory access control logic module determines whether the access permission of an RDMA request is legal by accessing the memory registration list, and obtains the virtual-to-physical address mapping by querying the B+ tree data structure.

Further, the memory access control software module includes:

A virtual-physical address mapping unit, used to map the virtual address requested by the application into a unified IO virtual address space and then perform virtual-to-physical address translation, mapping one contiguous virtual address range to one or more contiguous physical address segments;

A B+ tree management unit, used to save into the B+ tree data structure the registered virtual address as the key and the registered memory length and physical segment information as the value;

A memory region/memory window management unit, used to register and deregister memory regions and memory windows, generate access keys comprising a local access key and a remote access key, and store the valid flag, access keys, registered address and length of the virtual address space, address type, access permissions and other information in the memory registration list.

Further, the memory access control software module stores the memory registration list in system memory as an array;

Each element of the memory registration list array contains: a valid flag, a local access key, a remote access key, the registered address and length of the virtual address space, an address type and access permissions.

Further, the valid flag indicates whether the array element is valid: it is 1 when a memory region/memory window has been registered successfully and 0 when the memory region/memory window has been deregistered;

The local access key and the remote access key are valid when the memory region/memory window has local access permission and remote access permission, respectively. Both are 32-bit integers, in which the upper 24 bits are the lookup index into the memory registration list and the lower 8 bits are a globally unique key;

The registered address and length of the virtual address space are the start address and total registered length of the virtual address space specified when the memory region or memory window is registered;

There are two address types, ordinary address and zero-based address, represented by the numbers 0 and 1 respectively;

The access permissions are local write, remote read, remote write and remote atomic operation, represented by a 4-bit mask.
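
As an illustration of the element layout described above, the following Python sketch packs the 24-bit list index and the 8-bit key into a 32-bit access key and models one registration list element. The bit positions chosen for the four permissions, the field names and the helper names (`make_key`, `key_index`, `key_tag`, `RegistrationEntry`) are assumptions made for this example; the patent specifies only the field list, the 24/8 split and the use of a 4-bit mask.

```python
from dataclasses import dataclass
from enum import IntFlag


class Access(IntFlag):
    """4-bit permission mask. The bit assignment is an assumption."""
    LOCAL_WRITE = 0b0001
    REMOTE_READ = 0b0010
    REMOTE_WRITE = 0b0100
    REMOTE_ATOMIC = 0b1000


def make_key(index: int, tag: int) -> int:
    """Pack a 24-bit list index (upper bits) and an 8-bit key (lower bits)."""
    if not 0 <= index < (1 << 24):
        raise ValueError("index must fit in 24 bits")
    if not 0 <= tag < (1 << 8):
        raise ValueError("tag must fit in 8 bits")
    return (index << 8) | tag


def key_index(key: int) -> int:
    """Upper 24 bits: lookup index into the memory registration list."""
    return (key >> 8) & 0xFFFFFF


def key_tag(key: int) -> int:
    """Lower 8 bits: the key value itself."""
    return key & 0xFF


@dataclass
class RegistrationEntry:
    valid: int        # 1 = registered, 0 = deregistered
    l_key: int        # local access key
    r_key: int        # remote access key
    va: int           # registered start virtual address
    length: int       # total registered length in bytes
    addr_type: int    # 0 = ordinary address, 1 = zero-based address
    access: Access    # 4-bit permission mask


# Hypothetical entry stored at list index 5.
entry = RegistrationEntry(
    valid=1,
    l_key=make_key(5, 0xA7),
    r_key=make_key(5, 0x3C),
    va=0x7F00_0000_0000,
    length=1 << 20,
    addr_type=0,
    access=Access.LOCAL_WRITE | Access.REMOTE_READ,
)
```

Because the index sits in the upper bits, the hardware can locate the list element with a single shift and needs no search.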

Further, the memory access control software module stores the B+ tree data structure in system memory as one tree-structure node array and one content node array;

The tree-structure node array stores the internal nodes and leaf nodes of the B+ tree: an internal node element contains a key and pointers to its left and right subtree nodes, and a leaf node element contains a key and a pointer to a content array element. The content node array stores the complete key-value pairs; content node elements correspond to leaf node elements, and each content node element holds a pointer to its right sibling element to facilitate range queries;

The maximum number of elements in each B+ tree node is determined when the network card is initialized;

The maximum number of physical segments in each B+ tree element is determined when the network card is initialized;

If, during one memory mapping, the number of physical segments produced by mapping the virtual address space exceeds this maximum, the virtual address and physical segment information are split across multiple B+ tree elements for storage;

The B+ tree can be created, updated and deleted only by the memory access control software module, which synchronizes the results to the memory access control logic module so that the B+ tree virtual-to-physical address translation table in the RDMA network card has the same content as the B+ tree data structure in system memory.

Further, the memory access control logic module includes a memory access permission matching logic unit, a virtual-to-physical address translation logic unit, a memory registration list storage unit and a B+ tree storage unit;

The memory access permission matching logic unit takes the virtual address, length and key of the RDMA request as input and uses the upper 24 bits of the key as an index into the memory registration list to obtain the memory registration information. It then matches the RDMA request against that information: the permission match succeeds when the memory range in the RDMA request lies within the registered memory range, the request key is exactly equal to the registered key, and the requested read/write permission is within the registered permissions; otherwise the permission match fails;

The virtual-to-physical address translation logic unit uses the virtual address in the RDMA request as the key to query the B+ tree virtual-to-physical address translation table. The B+ tree query is a range query: it succeeds when the virtual address falls within the memory range defined by some B+ tree element's key and the length stored in its value;

The memory registration list storage unit is used to store a copy of the memory registration list;

The B+ tree storage unit is used to store a copy of the B+ tree data structure.
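
The permission matching rule described above can be sketched in software as follows. This is only a behavioural model under stated assumptions, not the hardware logic of the patent: the `Entry` record, the permission bit positions and the function name `check_remote_access` are hypothetical, and the bounds check on the index and the test of the valid flag are defensive additions.

```python
from dataclasses import dataclass
from typing import List, Optional

REMOTE_READ = 0b0010   # assumed bit position
REMOTE_WRITE = 0b0100  # assumed bit position


@dataclass
class Entry:
    valid: int    # 1 = registered, 0 = deregistered
    r_key: int    # 32-bit remote access key
    va: int       # registered start virtual address
    length: int   # registered length in bytes
    access: int   # 4-bit permission mask


def check_remote_access(table: List[Optional[Entry]], va: int, length: int,
                        key: int, wanted: int) -> bool:
    """Return True only if the request matches the registration."""
    index = key >> 8                      # upper 24 bits select the element
    if index >= len(table):
        return False
    entry = table[index]
    if entry is None or entry.valid != 1:
        return False
    if key != entry.r_key:                # key must match exactly
        return False
    if va < entry.va or va + length > entry.va + entry.length:
        return False                      # request leaves the registered range
    return (wanted & entry.access) == wanted


# Hypothetical list with one registration at index 1.
table = [None, Entry(valid=1, r_key=(1 << 8) | 0x3C,
                     va=0x1000, length=0x2000, access=REMOTE_WRITE)]

print(check_remote_access(table, 0x1800, 0x100, 0x13C, REMOTE_WRITE))  # True
print(check_remote_access(table, 0x1800, 0x100, 0x13D, REMOTE_WRITE))  # False
```

The second call fails because the lower 8 bits of the key differ from the registered key, even though the index, range and permission all match.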

Further, the memory access control system for RDMA network cards is applicable to RDMA network card devices based on InfiniBand or RoCE networks.

One benefit of the present invention is that the provided memory access control system for RDMA network cards offloads the lookup logic for virtual-to-physical address translation and memory access permission control to the RDMA network card hardware, which not only helps exploit the high-speed transmission capability of the network card, improving its transmission bandwidth and reducing the latency of RDMA read and write operations, but also reduces storage space usage.

Another benefit of the present invention is that the provided memory access control system for RDMA network cards stores the virtual-to-physical address translation table in a B+ tree data structure, which allows fast updates and queries; when handling an RDMA request whose transfer starts at an arbitrary position in the registered memory space, the virtual address can be translated to a physical address quickly.

A further benefit of the present invention is that the provided memory access control system for RDMA network cards uses the network card driver software to manage the B+ tree virtual-to-physical address translation table and the memory registration list, and stores copies of the B+ tree and the memory registration list in the RDMA network card hardware, which is only allowed to perform queries; this simplifies the logic design of the network card hardware and reduces query latency.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a schematic diagram of the memory access control system for RDMA network cards of the present invention;

Figure 2 is a schematic diagram of the internal structure of the memory access control software module of the present invention;

Figure 3a is a schematic diagram of the key-value pair content stored in a B+ tree, provided by an exemplary embodiment of the present invention;

Figure 3b is a schematic diagram of the key-value pair content stored in a B+ tree, provided by another exemplary embodiment of the present invention;

Figure 3c is a schematic diagram of the B+ tree data structure provided by an exemplary embodiment of the present invention;

Figure 4 is a schematic diagram of the memory registration list array provided by an exemplary embodiment of the present invention;

Figure 5 is a schematic diagram of the internal structure of the memory access control logic module of the present invention.

Detailed description

Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present application; they should not be construed as limiting it.

Figure 1 shows a memory access control system for RDMA network cards according to the present application, comprising a memory access control software module 10 and a memory access control logic module 20.

The memory access control software module 10 is arranged in the host RDMA network card driver. It is used to register and deregister memory regions and memory windows and to map the virtual address space to the physical address space.

The memory access control logic module 20 is arranged in the RDMA network card, which is based on an ASIC or an FPGA. The memory access control logic module 20 is used to determine whether the access permission of an RDMA request is legal and, if it is, to translate the virtual address to a physical address, providing information for the DMA operations of the network card.

In the embodiment of the present application, the memory access control software module 10 manages a memory registration list for storing information about registered memory regions and memory windows.

The memory access control software module 10 also manages a B+ tree data structure for storing the mapping from virtual addresses to physical addresses.

The memory access control logic module 20 uses storage resources in the RDMA network card hardware to hold copies of the memory registration list and the B+ tree data structure, and the memory access control software module 10 actively initiates updates to the memory registration list and the B+ tree data structure in the RDMA network card hardware, ensuring that the software and hardware copies remain consistent. The memory registration list and B+ tree data structure managed by the memory access control software module 10 have the same storage format and content as those held by the memory access control logic module 20. The memory access control logic module 20 determines whether the access permission of an RDMA request is legal by accessing the memory registration list, and obtains the virtual-to-physical address mapping by querying the B+ tree data structure.

In the embodiment of the present application, the memory access control software module 10 includes a virtual-physical address mapping unit 11, a B+ tree management unit 12 and a memory region/memory window management unit 13.

The virtual-physical address mapping unit 11 is used to map the virtual address requested by the application into a unified IO virtual address space and then perform virtual-to-physical address translation, mapping one contiguous virtual address range to one or more contiguous physical address segments. Specifically, the start virtual address and memory length requested by the application are denoted <UVA, L>, and the start virtual address and length after mapping into the unified IO virtual address space are denoted <IOVA, L>. After address translation, one contiguous virtual address range is mapped to N (N a positive integer) contiguous physical address segments. The i-th physical segment is denoted <PA_i, Offset_i>, and the set of all physical segments is {<PA_i, Offset_i> | 0 <= i < N}, where PA_i is the start physical address of the i-th segment and Offset_i is the offset of the start of the i-th segment within the whole registered memory region.
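
One way the segment list {<PA_i, Offset_i>} might be derived is by merging physically adjacent pages, sketched below. The patent does not describe how segments are computed, so the approach is an assumption, as are the 4 KiB page size, the page-aligned start of the registered region, and the function name `build_segments`.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def build_segments(page_pas, page_size=PAGE_SIZE):
    """Merge physically adjacent pages into (PA_i, Offset_i) segments.

    page_pas lists the physical address of each page backing a contiguous,
    page-aligned virtual range, in virtual address order.
    """
    segments = []
    for i, pa in enumerate(page_pas):
        if i > 0 and pa == page_pas[i - 1] + page_size:
            continue  # this page extends the current segment
        # A new segment starts here; its offset is measured from the
        # start of the registered region.
        segments.append((pa, i * page_size))
    return segments


# Six pages that fall into N = 3 physically contiguous runs.
pages = [0x10000, 0x11000, 0x30000, 0x50000, 0x51000, 0x52000]
print([(hex(pa), hex(off)) for pa, off in build_segments(pages)])
# [('0x10000', '0x0'), ('0x30000', '0x2000'), ('0x50000', '0x3000')]
```

Only the start of each segment is recorded; the length of a segment follows from the offset of the next one, or from the total length L for the last segment.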

The B+ tree management unit 12 is used, on receiving a memory registration request, to save into the B+ tree data structure the registered virtual address as the key and the registered memory length and physical segment information as the value.

The B+ tree management unit 12 is also used, on receiving a memory deregistration request, to use the virtual address as the B+ tree key and delete the corresponding B+ tree node.

In the embodiment of the present application, the memory access control software module 10 stores the B+ tree data structure as one tree-structure node array and one content node array. Preferably, the memory access control software module 10 keeps the B+ tree data structure in system memory.

Specifically, the B+ tree management unit 12 uses the virtual address IOVA produced by the virtual-physical address mapping unit 11 as the B+ tree key, and saves the length L and the physical segment information {<PA_i, Offset_i> | 0 <= i < N} into the B+ tree as the value. One B+ tree element is configured to hold K (K a positive integer) physical segments; memory with more than K physical segments is split across multiple B+ tree elements. The B+ tree is stored as fixed-length arrays: one tree-structure node array and one content node array. The tree-structure node array stores the internal nodes and leaf nodes of the B+ tree, where an internal node element contains a key and pointers to its left and right subtree nodes, and a leaf node element contains a key and a pointer to a content array element. The content node array stores the complete key-value pairs; content node elements correspond to leaf node elements, and each content node element holds a pointer to its right sibling element to facilitate range queries.

The maximum number of elements in each B+ tree node is determined when the network card is initialized, as is the maximum number of physical segments in each B+ tree element. If, during one memory mapping, the number of physical segments produced by mapping the virtual address space exceeds this maximum, the virtual address and physical segment information are split across multiple B+ tree elements for storage. The B+ tree can be created, updated and deleted only by the memory access control software module 10, which synchronizes the results to the memory access control logic module 20 so that the B+ tree virtual-to-physical address translation table in the RDMA network card has the same content as the B+ tree data structure in system memory.

Figure 3a shows one embodiment of a B+ tree storing a key-value pair. In this embodiment, one B+ tree element can hold K=4 physical segments and the memory space registered by the application is mapped to N=3 physical segments, so all physical segments fit in a single B+ tree element whose key is IOVA and whose value is {L, PA0, Offset0, PA1, Offset1, PA2, Offset2}.

Figure 3b shows another embodiment of a B+ tree storing key-value pairs. In this embodiment, one B+ tree element can hold K=4 physical segments and the memory space registered by the application is mapped to N=6 physical segments, which are split across two B+ tree elements: the first element has key IOVA0 and value {L0, PA0, Offset0, PA1, Offset1, PA2, Offset2, PA3, Offset3}, and the second element has key IOVA1 and value {L1, PA4, Offset4, PA5, Offset5}.
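
The split in the second embodiment can be reproduced with the short sketch below. The patent does not state how IOVA1 and L1 are derived, so two assumptions are made and should be read as such: the key of each element is the region's IOVA plus the offset of the element's first segment, and the element length is the number of bytes its segments cover. The function name `split_into_elements` and the example addresses are hypothetical.

```python
def split_into_elements(iova, total_len, segments, k):
    """Split (PA, Offset) segments into B+ tree elements of at most k segments.

    Returns a list of (key, (length, segments)) pairs. Assumed derivation:
    key = iova + offset of the first segment in the chunk,
    length = bytes covered by the chunk.
    """
    elements = []
    for start in range(0, len(segments), k):
        chunk = segments[start:start + k]
        first_off = chunk[0][1]
        end = start + k
        # The chunk ends where the next chunk begins, or at the region end.
        next_off = segments[end][1] if end < len(segments) else total_len
        elements.append((iova + first_off, (next_off - first_off, chunk)))
    return elements


# N = 6 segments of one 4 KiB page each, K = 4 segments per element.
segs = [(0xA000, 0x0000), (0xC000, 0x1000), (0xE000, 0x2000),
        (0x20000, 0x3000), (0x30000, 0x4000), (0x40000, 0x5000)]
elems = split_into_elements(0x100000, 0x6000, segs, k=4)
for key, (length, chunk) in elems:
    print(hex(key), hex(length), len(chunk))
# 0x100000 0x4000 4
# 0x104000 0x2000 2
```

With K=4 and N=3 the same function returns a single element, matching the first embodiment.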

Figure 3c is a schematic diagram of a B+ tree data structure provided by an exemplary embodiment of the present application. In this example, a B+ tree node contains at most R=3 elements and each element holds at most K=4 physical segments; Figure 3c shows a three-level B+ tree. Note that because tree-structure nodes and content nodes are stored in different arrays, a leaf node of the B+ tree holds each element's key and the index of the corresponding content node element, while the other levels of tree-structure nodes hold each element's key and pointers to the left and right child nodes for that key.
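
The range query and the final address calculation over the content node elements can be sketched as follows. For brevity the sketch replaces the B+ tree descent with a binary search over the sorted element keys (Python's `bisect`), which finds the same element but is not the tree traversal the patent describes. The relationship between an element's key and its segment offsets, the function name `translate` and the example addresses are assumptions.

```python
from bisect import bisect_right


def translate(keys, values, va):
    """Translate a virtual address to a physical address, or return None.

    keys: sorted element keys (start virtual address of each element).
    values[i]: (length, [(pa, offset), ...]) for the element with keys[i].
    """
    i = bisect_right(keys, va) - 1        # last element with key <= va
    if i < 0:
        return None
    length, segs = values[i]
    if va >= keys[i] + length:            # va lies past this element's range
        return None
    # Offset of va within the whole registered region (assumes the element
    # key corresponds to the offset of its first segment).
    region_off = segs[0][1] + (va - keys[i])
    offsets = [off for _, off in segs]
    j = bisect_right(offsets, region_off) - 1   # segment containing va
    pa, off = segs[j]
    return pa + (region_off - off)


# Two hypothetical elements covering 0x100000..0x105FFF.
keys = [0x100000, 0x104000]
values = [
    (0x4000, [(0xA000, 0x0000), (0xC000, 0x1000),
              (0xE000, 0x2000), (0x20000, 0x3000)]),
    (0x2000, [(0x30000, 0x4000), (0x40000, 0x5000)]),
]
print(hex(translate(keys, values, 0x101234)))  # 0xc234
print(hex(translate(keys, values, 0x104800)))  # 0x30800
print(translate(keys, values, 0x106000))       # None
```

Because the lookup matches any address inside an element's range rather than only its start address, a request that begins in the middle of a registered region is translated in a single query.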

需要注意的是,树结构节点数组和内容节点数组可容纳的节点个数在RDMA网卡驱动初始化时就已经确定,在一些场景下,如果B+树节点过多,预留的数组存储空间不足时,需要B+树管理单元12重新向系统申请更大的数组存储空间,并将原数组中的元素全部拷贝到新的存储空间,最后释放原数组存储空间。每个树结构节点或内容节点可容纳的元素个数在RDMA网卡驱动初始化时就确定,并且不会中途改变。It should be noted that the number of nodes that the tree structure node array and content node array can accommodate has been determined when the RDMA network card driver is initialized. In some scenarios, if there are too many B+ tree nodes and the reserved array storage space is insufficient, The B+ tree management unit 12 needs to re-apply to the system for a larger array storage space, copy all elements in the original array to the new storage space, and finally release the original array storage space. The number of elements that each tree structure node or content node can accommodate is determined when the RDMA network card driver is initialized, and will not change midway.

The memory area/memory window management unit 13 registers and deregisters memory areas and memory windows, generates access keys comprising a local access key (L_Key) and a remote access key (R_Key), and stores the access keys, the registered address and length of the virtual address space, the address type, the access permissions and related information in the memory registration list.

All operations by which the memory access control software module 10 adds, deletes or updates B+ tree nodes are synchronized to the RDMA network card, as are all of its add, delete and update operations on the memory registration list.

In the embodiment of the present application, the memory access control software module 10 stores the memory registration list as an array, preferably in system memory.

Each element of the memory registration list array contains: a valid flag, a local access key, a remote access key, the registered address and length of the virtual address space, the address type, and the access permissions.

In the embodiment of the present application, the valid flag indicates whether the array element is valid: it is 1 once the memory area/memory window has been registered successfully and 0 once the memory area/memory window has been deregistered.

The local access key and the remote access key are valid when the memory area/memory window has local access permission and remote access permission, respectively. Both are 32-bit integers in which the upper 24 bits are the lookup index into the memory registration list and the lower 8 bits are a globally unique key.
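The 24-bit index / 8-bit key layout can be expressed as a pair of helper functions. This is a minimal sketch; the names `make_key` and `split_key` are hypothetical and not taken from the application.

```python
INDEX_BITS = 24  # upper bits: index into the memory registration list
KEY_BITS = 8     # lower bits: the key portion

def make_key(index, key8):
    """Pack a registration-list index and an 8-bit key into one 32-bit access key."""
    if not (0 <= index < (1 << INDEX_BITS) and 0 <= key8 < (1 << KEY_BITS)):
        raise ValueError("index or key out of range")
    return (index << KEY_BITS) | key8

def split_key(key32):
    """Return (index, key8) for a 32-bit access key."""
    return key32 >> KEY_BITS, key32 & ((1 << KEY_BITS) - 1)
```

For a registration stored at array index 0 with key portion 0xA3, the packed key is 0x0A3; the same key portion stored at index 1 would pack to 0x1A3.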

The registered address and length of the virtual address space are the starting address and total registered length of the virtual address space specified when the memory area or memory window was registered.

There are two address types, ordinary addresses and zero-based addresses, represented by 0 and 1 respectively. The access permissions are local write, remote read, remote write and remote atomic operation, represented by a 4-bit mask. Note that the address types and access permission types may be extended as the RDMA network protocols evolve, and those covered by the embodiments of the present invention should be extended accordingly as the protocol version is upgraded.
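A 4-bit permission mask of this kind might be modeled as below. The assignment of permissions to bit positions is an assumption made for illustration, since the text does not specify it, and the names `Access` and `is_permitted` are hypothetical.

```python
from enum import IntFlag

class Access(IntFlag):
    # Bit positions are assumed; the application only states that a 4-bit mask is used.
    LOCAL_WRITE = 0b0001
    REMOTE_READ = 0b0010
    REMOTE_WRITE = 0b0100
    REMOTE_ATOMIC = 0b1000

def is_permitted(requested, registered):
    """True when every requested permission bit is also set in the registered mask."""
    return (requested & registered) == requested
```

Adding a permission type in a later protocol version would then only require widening the mask and adding a member to the enumeration.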

Figure 4 is a schematic diagram of an example memory registration list array of this application. The element at array index 0 describes a valid memory area/memory window registration: the registered starting virtual address is 0x141200, the registered length is 10000 bytes, the address type is ordinary IO address, the access permissions are local write, remote write and remote read, and L_Key and R_Key are both 0x0A3. The element at array index 1 describes an invalid memory area/memory window registration: although the element contains complete registration information, its valid bit is 0, indicating that the information is no longer in effect.

In the embodiment of the present application, the memory access control logic module 20 includes a memory access permission matching logic unit 21, a virtual-physical address conversion logic unit 22, a memory registration list storage unit 23 and a B+ tree storage unit 24. The memory registration list storage unit 23 stores a copy of the memory registration list, and the B+ tree storage unit 24 stores a copy of the B+ tree data structure.

The memory access permission matching logic unit 21 takes the virtual address, length and key of the RDMA request as input and uses the upper 24 bits of the key as the index into the memory registration list to obtain the memory registration information. It then matches the RDMA request against that information: the match succeeds only when the memory range of the RDMA request lies within the registered memory range, the request key exactly equals the registered key, and the requested read/write permissions are within the registered permissions; otherwise the match fails.

As a preferred implementation, the memory access permission matching logic unit 21 operates in the following steps:

S211: Take the virtual address (IOVA_in), length (L_in), key (L_Key_in or R_Key_in) and access permission request (Access_in) of the RDMA request as input, use the upper 24 bits of the key as the index into the memory registration list, and access the memory registration list storage unit 23 to obtain one memory registration entry, comprising the valid flag (V), the access keys (L_Key, R_Key), the registered address (IOVA) and length (L) of the virtual address space, the address type (AType), the access permissions (Access) and related information;

S212: Check the valid flag V. If V is 1, proceed to step S213; otherwise stop and return request failure;

S213: Check whether the input key (L_Key_in or R_Key_in) exactly equals the key (L_Key or R_Key) in the memory registration entry, and whether the input access permission request (Access_in) is entirely contained in the access permissions (Access) of the memory registration entry. Only when the keys are exactly equal and the requested permissions are entirely contained, proceed to step S214; otherwise stop and return request failure;

S214: Check whether the memory range to be accessed by the RDMA request lies within the registered range. The memory access permission match succeeds, and request success is returned, only when ((IOVA_in>=IOVA)&&(IOVA_in+L_in<=IOVA+L)) holds; otherwise request failure is returned.

It can be understood that steps S212, S213 and S214 may be executed in parallel, with their three results combined by a logical AND: if the combined result is true, request success is returned; if it is false, request failure is returned.
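Steps S211 to S214 can be sketched in software as follows, with the three checks evaluated independently and combined by a logical AND as in the parallel variant. The names `Registration` and `match_access`, the `remote` parameter that selects between L_Key and R_Key, and the treatment of an out-of-range index as a failure are assumptions; the sketch does not model the zero-based address type.

```python
from dataclasses import dataclass

@dataclass
class Registration:
    valid: int   # V
    l_key: int   # L_Key
    r_key: int   # R_Key
    iova: int    # IOVA, registered starting virtual address
    length: int  # L, registered length in bytes
    atype: int   # AType: 0 = ordinary address, 1 = zero-based address
    access: int  # Access, 4-bit permission mask

def match_access(table, iova_in, l_in, key_in, access_in, remote):
    # S211: the upper 24 bits of the key index the memory registration list.
    index = key_in >> 8
    if index >= len(table):
        return False  # assumption: an index outside the list is a failed request
    reg = table[index]
    # S212: the entry must be valid.
    valid_ok = reg.valid == 1
    # S213: the key must match exactly and the requested bits must be a subset.
    key_ok = key_in == (reg.r_key if remote else reg.l_key)
    perm_ok = (access_in & reg.access) == access_in
    # S214: the requested range must lie inside the registered range.
    range_ok = iova_in >= reg.iova and iova_in + l_in <= reg.iova + reg.length
    return valid_ok and key_ok and perm_ok and range_ok
```

In hardware the four comparisons would be independent combinational checks; the sequential form with early exit described in S212 to S214 gives the same result.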

In a specific embodiment, processing by the virtual-physical address conversion logic unit 22 starts only after the memory access permission matching logic unit 21 returns request success. If the memory access permission matching logic unit 21 returns request failure, the memory access control logic module 20 returns failure immediately and the virtual-physical address conversion logic unit 22 is not started.

Optionally, the memory registration list storage unit 23 and the B+ tree storage unit 24 may use on-chip storage (SRAM) or off-chip storage (DDR). On-chip storage is preferred in order to reduce data access latency.

The virtual-physical address conversion logic unit 22 uses the virtual address of the RDMA request as the key to query the B+ tree virtual-physical address conversion table. The B+ tree query is a range query: it succeeds when the virtual address falls within the memory range defined by the key of some B+ tree element and the length information in that element's value.

The virtual-physical address conversion logic unit 22 operates in the following steps:

S221: Take the virtual address (IOVA_in) and length (L_in) of the RDMA request as input. First fetch the root node from the B+ tree structure node array and compare IOVA_in with the keys of the root node's elements to obtain the index of the child node in the next level;

S222: Following the usual B+ tree search, descend level by level to a leaf node, find the last element whose key is less than or equal to IOVA_in, and obtain the content node element index for that element;

S223: Using the content node element index as the address, read the content node array element to obtain the complete key-value pair, that is, key=IOVA and value={L,PA0,Offset0,PA1,Offset1,PA2,Offset2,PA3,Offset3}, together with the right sibling pointer RightP;

S224: Using the offset of each physical segment, compute the starting address and length of each physical segment within the memory range of length L_in that starts at virtual address IOVA_in. If the expression (IOVA_in+L_in<=IOVA+L) holds, the query ends and these physical segment starting addresses and lengths are output. If the expression (IOVA_in+L_in>IOVA+L) holds, the right sibling element must also be queried: use the right sibling pointer RightP as the content node element index and repeat step S223.
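Steps S221 to S224 can be sketched as follows, assuming the two-array layout described for Figure 3c. The names `TreeNode`, `ContentElement` and `translate` are hypothetical. The sketch further assumes that each Offset is relative to the element key, that a segment ends where the next one begins, that internal nodes hold one more child index than keys, and that each separator key equals the smallest key in its right subtree.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    keys: list      # element keys held by this node
    children: list  # internal: indices into the tree array; leaf: content indices
    leaf: bool

@dataclass
class ContentElement:
    iova: int             # key
    length: int           # L
    segments: list        # [(PA, Offset), ...], ascending Offset
    right: Optional[int]  # RightP: content index of the right sibling, or None

def translate(tree, root, content, iova_in, l_in):
    """Return [(physical_address, length), ...] for the range, or None on failure."""
    node = tree[root]
    while not node.leaf:                                   # S221, S222: descend
        node = tree[node.children[bisect.bisect_right(node.keys, iova_in)]]
    j = bisect.bisect_right(node.keys, iova_in) - 1        # last key <= IOVA_in
    if j < 0:
        return None
    out, cur, end = [], iova_in, iova_in + l_in
    idx = node.children[j]
    while idx is not None:
        e = content[idx]                                   # S223
        if not (e.iova <= cur < e.iova + e.length):
            return None                                    # address not mapped
        for n, (pa, off) in enumerate(e.segments):         # S224
            last = n + 1 == len(e.segments)
            seg_start = e.iova + off
            seg_end = e.iova + (e.length if last else e.segments[n + 1][1])
            lo, hi = max(cur, seg_start), min(end, seg_end)
            if lo < hi:
                out.append((pa + (lo - seg_start), hi - lo))
        if end <= e.iova + e.length:
            return out
        cur, idx = e.iova + e.length, e.right              # continue in right sibling
    return None
```

A request that spans two elements is served by following `right` once, which is why each content node element keeps a right sibling pointer.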

It can be understood that, in the embodiments of the present application, the memory access control system for an RDMA network card described above is applicable to RDMA network card devices based on InfiniBand or RoCE networks.

The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the above embodiments do not limit the present invention in any way, and any technical solution obtained by equivalent substitution or equivalent transformation falls within the protection scope of the present invention.

Claims (8)

1. A memory access control system for an RDMA network card, comprising a memory access control software module and a memory access control logic module;
the memory access control software module is arranged in the host RDMA network card driver and is used for registering and deregistering a memory area and a memory window and mapping a virtual address space to a physical address space;
the memory access control logic module is arranged in the RDMA network card and is used for judging whether the access authority of the RDMA request is legal or not, and converting the virtual address into the physical address under the condition that the access authority is legal, so as to provide information for DMA operation of the network card.
2. The memory access control system for an RDMA network card of claim 1,
the memory access control software module manages a memory registration list for storing registered memory areas and related information of memory windows;
the memory access control software module also manages a B+ tree data structure for storing a mapping of virtual addresses to physical addresses;
the memory access control logic module uses a storage resource in the RDMA network card hardware to store a memory registration list and a copy of a B+ tree data structure, and the memory access control software module actively initiates updating of the memory registration list and the B+ tree data structure in the RDMA network card hardware to ensure consistency of software and hardware storage;
the memory access control logic module judges whether the access authority of the RDMA request is legal or not by accessing the memory registration list, and obtains the mapping from the virtual address to the physical address by inquiring the B+ tree data structure.
3. The memory access control system for an RDMA network card of claim 2, wherein,
the memory access control software module includes:
the virtual-physical address mapping unit is used for mapping the virtual address requested by the application program into a unified IO virtual address space, performing virtual-physical address conversion, and mapping a section of continuous virtual address space into one or more sections of continuous physical address segments;
the B+ tree management unit is used for taking the registered virtual address as a key of the B+ tree, taking the registered memory length and the physical segmentation information as values of the B+ tree, and storing the values into a B+ tree data structure;
the memory area/memory window management unit is used for executing registration and cancellation of the memory area and the memory window, generating an access key comprising a local access key and a remote access key, and storing information such as effective identification, the access key, the registration address and length of the virtual address space, the address type, the access authority and the like into a memory registration list.
4. The memory access control system for an RDMA network card of claim 3,
the memory access control software module adopts an array form to store a memory registration list in a system memory;
the elements of the array of the memory registration list include: valid identification, local access key, remote access key, registered address and length of virtual address space, address type and access rights.
5. The memory access control system for an RDMA network card of claim 4, wherein,
the valid identification indicates whether the array element is valid: when the memory area/memory window is registered successfully the valid identification is 1, and when the memory area/memory window is deregistered the valid identification is 0;
the local access key and the remote access key are valid when the memory area/memory window has local access authority and remote access authority respectively, and both are 32-bit integers, wherein the upper 24 bits are the query index of the memory registration list and the lower 8 bits are a globally unique key;
the registered address and length of the virtual address space are the initial address and total registered length of the virtual address space appointed when the memory area or the memory window is registered;
the address type comprises two types of common addresses and zero base addresses, which are respectively represented by numerals 0 and 1;
the access rights are divided into local write, remote read, remote write and remote atomic operations, using a 4-bit mask representation.
6. The memory access control system for an RDMA network card of claim 4, wherein,
the memory access control software module adopts a tree structure node array and a content node array to store a B+ tree data structure in a system memory;
the tree structure node array stores intermediate nodes and leaf nodes of the B+ tree, the intermediate node elements comprise keys and pointers to left and right subtree nodes, the leaf node elements comprise keys and pointers to content array elements, the content node array stores complete key-value pairs, the content node elements correspond to the leaf node elements, and each content node element stores a pointer to its right sibling node element to facilitate range queries;
the maximum value of the number of elements contained in each B+ tree node is determined when the network card is initialized;
the maximum value of the number of the physical segments contained in each B+ tree element is determined when the network card is initialized;
in a memory mapping process, if the number of physical segments mapped by the virtual address space exceeds the maximum value, splitting the virtual addresses and the physical segment information into a plurality of B+ tree elements for storage;
and B+ tree creation, updating and deletion can only be processed by the memory access control software module, and the processing result is synchronized to the memory access control logic module, so that the B+ tree virtual-physical address conversion table in the RDMA network card is consistent with the B+ tree data structure content in the system memory.
7. The memory access control system for an RDMA network card of claim 6, wherein,
the memory access control logic module comprises a memory access right matching logic unit, a virtual-physical address conversion logic unit, a memory registration list storage unit and a B+ tree storage unit;
the memory access permission matching logic unit takes a virtual address, a length and a key in an RDMA request as input, takes the upper 24 bits of the key as an index of a memory registration list to obtain memory registration information, and matches the RDMA request with information in the memory registration list; if the memory area in the RDMA request is within the memory registration range, the request key is exactly equal to the registered key, and the requested read-write permission is within the registered permission range, the memory access permission is successfully matched, otherwise the permission matching fails;
the virtual-physical address conversion logic unit takes a virtual address in an RDMA request as a key and queries the B+ tree virtual-physical address conversion table, wherein the B+ tree query is a range query, and the query is successful if the virtual address is found to lie within the memory range represented by the key of a certain B+ tree element and the length information in its value;
the memory registration list storage unit is used for storing copies of the memory registration list;
the B+ tree storage unit is used for storing a copy of the B+ tree data structure.
8. The memory access control system for an RDMA network card of claim 1,
the memory access control system for the RDMA network card is suitable for RDMA network card equipment based on an Infiniband network or a RoCE network.
CN202310924977.1A 2023-07-26 2023-07-26 Memory access control system for RDMA network cards Active CN116932430B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310924977.1A CN116932430B (en) 2023-07-26 2023-07-26 Memory access control system for RDMA network cards

Publications (2)

Publication Number Publication Date
CN116932430A true CN116932430A (en) 2023-10-24
CN116932430B CN116932430B (en) 2025-03-18

Family

ID=88389394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310924977.1A Active CN116932430B (en) 2023-07-26 2023-07-26 Memory access control system for RDMA network cards

Country Status (1)

Country Link
CN (1) CN116932430B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7756943B1 (en) * 2006-01-26 2010-07-13 Symantec Operating Corporation Efficient data transfer between computers in a virtual NUMA system using RDMA
US20150254126A1 (en) * 2014-03-07 2015-09-10 Conrad N. Wood Systems and Methods for Storage of Data in a Virtual Storage Device
US20190018785A1 (en) * 2017-07-14 2019-01-17 Arm Limited Memory system for a data processing network
CN110869913A (en) * 2017-07-14 2020-03-06 Arm有限公司 Memory system for data processing network
CN107463447A (en) * 2017-08-21 2017-12-12 中国人民解放军国防科技大学 B + tree management method based on remote direct nonvolatile memory access
CN114756388A (en) * 2022-03-28 2022-07-15 北京航空航天大学 RDMA (remote direct memory Access) -based method for sharing memory among cluster system nodes as required

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙志卓;张全新;李元章;谭毓安;刘靖宇;马忠梅;: "连续数据存储中面向RAID5的写操作优化设计", 计算机研究与发展, no. 08, 15 August 2013 (2013-08-15), pages 1604 - 1612 *
曹政;王达伟;刘新春;孙凝晖;: "曙光5000高性能计算机Barrier网络的设计", 计算机学报, no. 10, 15 October 2008 (2008-10-15), pages 1727 - 1736 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025145544A1 (en) * 2024-01-02 2025-07-10 上海交通大学 Smart network interface controller unloading-based remote memory system
CN118113638A (en) * 2024-03-25 2024-05-31 浙江大学 RDMA data transmission method and device
CN118158088A (en) * 2024-03-25 2024-06-07 浙江大学 Control layer data core bypass system for RDMA network card
CN119311631A (en) * 2024-12-17 2025-01-14 珠海星云智联科技有限公司 Remote Direct Memory Access Verification Systems, Devices, and Clusters
CN119311631B (en) * 2024-12-17 2025-04-08 珠海星云智联科技有限公司 Remote direct memory access verification system, device and cluster

Also Published As

Publication number Publication date
CN116932430B (en) 2025-03-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant