CN114415985A

CN114415985A - Storage data processing unit based on a data-control separation architecture

Info

Publication number: CN114415985A
Application number: CN202210333980.1A
Authority: CN
Inventors: 张雪庆
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2022-03-31
Filing date: 2022-03-31
Publication date: 2022-04-29

Abstract

The invention discloses a storage data processing unit based on a data-control separation architecture, comprising a processor and a hardware acceleration engine. The hardware acceleration engine performs protocol offload operations and the data processing operations of the data plane. The processor performs the software processing operations of the control plane and the data-plane operations other than the data processing operations. In this scheme, the storage data processing unit uses the data-control separation architecture to separate the data flow from the control flow in hardware, which prevents the two flows from interfering with each other and reduces the impact on storage performance. In addition, a hardware acceleration engine dedicated to data processing operations is split out within the storage data processing unit, so that data processing is performed more efficiently and the processing performance of the storage data processing unit in the data storage field is improved.

Description

A Storage Data Processing Unit Based on a Data-Control Separation Architecture

Technical Field

The invention relates to the technical field of data storage, and more particularly to a storage data processing unit based on a data-control separation architecture.

Background

Current market demand is driving the global volume of stored data toward the zettabyte (ZB, 10^21 bytes) scale. The performance of individual storage drives, the memory-access bandwidth of the CPUs (central processing units) inside storage systems, and the bandwidth of the network interfaces used for storage have all increased significantly, and customers now demand more I/O (input/output) performance from storage systems: higher bandwidth and IOPS (input/output operations per second), lower latency, and so on. In the post-Moore era, however, progress in semiconductor process technology has slowed and growth in single-core performance has stagnated (from 52% to 3.5%), which makes further performance gains a major challenge for storage system design.

Mainstream storage systems today use a compute-centric architecture built around the CPU, which suits traditional storage usage scenarios. Front-end interface cards (such as network cards and FC (Fibre Channel) cards), graphics processing units (GPUs), memory, and other compute, storage, and communication devices are attached to the CPU over high-speed buses. Every computation and control action is initiated by the CPU, which holds the central control role. With the arrival of the post-Moore era, however, single-core CPU performance has stagnated, and the CPU has become the bottleneck for improving storage system performance.

How to improve the processing performance of processors in the data storage field is therefore a problem that those skilled in the art need to solve.

Summary of the Invention

The purpose of the present invention is to provide a storage data processing unit based on a data-control separation architecture, so as to improve the processing performance of the processor in the data storage field.

To achieve the above object, the present invention provides a storage data processing unit based on a data-control separation architecture, comprising:

a processor and a hardware acceleration engine;

wherein the hardware acceleration engine is configured to perform protocol offload operations and the data processing operations of the data plane, and the processor is configured to perform the software processing operations of the control plane and the processing operations in the data plane other than the data processing operations.

The hardware acceleration engine includes a protocol acceleration engine, which is configured to perform protocol processing operations and hardware-accelerated data consistency operations.

The protocol acceleration engine is specifically configured to perform TCP protocol processing and NVMe over Fabric protocol processing.

The hardware acceleration engine includes a data stream acceleration engine, which is configured to perform at least one of: data movement, data encoding, data transcoding, memory comparison, data query, and data insertion operations.

The processor is an ARM processor.

The software processing operations performed by the ARM processor include at least one of: storage configuration service, chassis management, log collection, exception handling, firmware upgrade, user security, and production diagnostic operations.

The processing operations performed by the ARM processor in the data plane include at least one of: NVMe-oF Target management, I/O multipath management, cache management, disk management, and protocol processing operations.

The processor uses different processor cores to perform the software processing operations of the control plane and the data-plane operations other than the data processing operations.

The user space of the storage data processing unit includes the data plane, the control plane, and a common module.

The common module is configured to perform task scheduling management, memory management, and driver management operations.

As can be seen from the above, an embodiment of the present invention provides a storage data processing unit based on a data-control separation architecture, comprising a processor and a hardware acceleration engine. The hardware acceleration engine performs protocol offload operations and the data processing operations of the data plane; the processor performs the software processing operations of the control plane and the data-plane operations other than the data processing operations. The storage data processing unit therefore separates the data flow from the control flow in hardware, which prevents the two flows from interfering with each other and reduces the impact on storage performance. In addition, a hardware acceleration engine dedicated to data processing operations is split out within the storage data processing unit, so that data processing is performed more efficiently and the processing performance of the storage data processing unit in the data storage field is improved.

Description of Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a storage data processing unit based on a data-control separation architecture according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a system according to an embodiment of the present invention;

FIG. 3a is a schematic diagram of the control flow of the control plane according to an embodiment of the present invention;

FIG. 3b is a schematic diagram of the data flow of the data plane according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention discloses a storage data processing unit based on a data-control separation architecture, to improve the processing performance of the processor in the data storage field.

Referring to FIG. 1, a storage data processing unit based on a data-control separation architecture provided by an embodiment of the present invention includes a processor 11 and a hardware acceleration engine 12.

The hardware acceleration engine 12 performs protocol offload operations and the data processing operations of the data plane; the processor 11 performs the software processing operations of the control plane and the data-plane operations other than the data processing operations.

The storage data processing unit in this scheme is an SPU (Storage Processing Unit) used in a new data-centric storage architecture. The SPU includes a processor and hardware acceleration engines and is designed around a data-control separation architecture, implementing a variety of I/O hardware acceleration techniques inside the SPU. Combined with CPUs handling the software control flow, GPUs handling AI (artificial intelligence) and graphics workloads, and SSDs handling storage, this allows storage system performance to keep growing linearly.

In this embodiment, an ARM processor performs the software processing operations of the control plane and the data-plane operations other than the data processing operations, because ARM processors are widely used as CPUs, are mature in application, and offer low power consumption. The scheme is not limited to ARM processors, however; other processors can be chosen as needed in practice. The hardware acceleration engines are IP cores inside the storage data processing unit.

In this embodiment, the hardware acceleration engine is hardware dedicated to accelerating data processing in storage scenarios. When a traditional CPU performs data processing operations, it must do so on its own cores, which is inefficient. Once the hardware acceleration engine is split out, dedicated hardware can perform the data processing operations more efficiently, which improves the processing performance of the storage data processing unit in the data storage field.

Furthermore, in traditional schemes both the data flow and the control flow are computed by the CPU; that is, the same CPU cores must process both flows at once. If a hardware fault disrupts the control flow, those cores may be unable to process the data flow normally, which degrades overall storage performance. In this application, the storage data processing unit is based on a data-control separation architecture: the processor uses different processor cores to perform the software processing operations of the control plane and the data-plane operations other than the data processing operations. Separating data and control in this way distinguishes the data flow from the control flow in hardware, which prevents the two flows from interfering with each other and reduces the impact on storage performance.
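The core-level isolation described above can be sketched in Python, with one worker thread standing in for each group of cores. This is an illustration under stated assumptions, not the SPU firmware: the `PlaneWorker` class, the task names, and the use of threads instead of pinned cores are all hypothetical (on Linux, real pinning might use `os.sched_setaffinity`, which this sketch does not do). It shows that a stalled control-plane task does not hold up data-plane I/O when each plane has its own executor.

```python
import queue
import threading


class PlaneWorker:
    """Hypothetical stand-in for a core (or core group) reserved for one plane."""

    def __init__(self, name: str):
        self.name = name
        self.tasks = queue.Queue()
        self.done = []
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            task = self.tasks.get()
            if task is None:  # sentinel: shut the worker down
                return
            self.done.append(task())

    def submit(self, task):
        self.tasks.put(task)

    def stop(self):
        self.tasks.put(None)
        self._thread.join()


# The control plane and the data plane each get their own executor ("cores").
control_plane = PlaneWorker("control")
data_plane = PlaneWorker("data")

# A control-plane task that stalls, e.g. a firmware upgrade waiting on a device.
release = threading.Event()
control_plane.submit(lambda: release.wait(timeout=5) and "firmware-upgrade")

# Data-plane I/O keeps completing while the control plane is stuck.
for i in range(3):
    data_plane.submit(lambda i=i: f"io-{i}")
data_plane.stop()  # joins after all queued I/O tasks have run
io_done_while_control_stalled = list(data_plane.done)
control_done_before_release = list(control_plane.done)

release.set()  # unblock the control-plane task and shut down
control_plane.stop()
```

If both kinds of task shared one worker, the three I/O tasks would wait behind the stalled control task; that shared-core situation is the one the paragraph above attributes to traditional schemes.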

In summary, the storage data processing unit in this scheme uses the data-control separation architecture to separate the data flow from the control flow in hardware. This prevents the two from interfering with each other, reduces the impact on storage performance, and optimizes the internal I/O performance of the storage data processing unit as far as possible. The scheme also splits out a hardware acceleration engine dedicated to data processing operations so that they are performed more efficiently. Under the heavy I/O demands of the storage field, the separated architecture and the hardware acceleration engine keep data moving quickly and allow many concurrent I/Os to be processed in parallel. This improves the processing performance and I/O efficiency of the storage data processing unit in the data storage field and reduces equipment cost.

In this embodiment, the hardware acceleration engine includes a protocol acceleration engine, which performs protocol processing operations and hardware-accelerated data consistency operations; specifically, it performs TCP protocol processing and NVMe over Fabric protocol processing.

Note that the network card already provides communication functions. In this embodiment, the protocol acceleration engine is placed in the network card and supports protocol offload features, such as offloading TCP (Transmission Control Protocol) processing, offloading NVMe over Fabric processing, and offloading T10-DIF (Data Integrity Field) data-integrity protection to hardware. In other words, processing that traditional schemes perform in software is moved to hardware (the protocol acceleration engine), saving software compute resources and increasing processing speed. For example, a TCP packet carries a header, checksum data, and so on. A traditional scheme processes these in software on the CPU (stripping the header, verifying data consistency, etc.), whereas in this application the protocol acceleration engine in the network card does this work directly. This amounts to offloading protocol processing to hardware, which frees processor resources.
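To make the offloaded work concrete, the sketch below shows what the CPU would otherwise do in software for each TCP segment: verify the RFC 1071 checksum over the IPv4 pseudo-header plus the segment, then strip the header using the data-offset field. The function names and the demo segment (ports, addresses, payload) are assumptions for illustration; the patent does not describe the engine's internals.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: bytes, dst_ip: bytes, tcp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol 6 (TCP), TCP length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 6, tcp_len)


def verify_and_strip(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bytes:
    """Software version of the offloaded work: checksum check, then header removal."""
    if internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) != 0:
        raise ValueError("TCP checksum mismatch")
    header_len = (segment[12] >> 4) * 4  # data offset field, counted in 32-bit words
    return segment[header_len:]


def build_segment(src_ip: bytes, dst_ip: bytes, payload: bytes) -> bytes:
    """Build a minimal TCP segment (20-byte header) with a valid checksum; demo only."""
    # src port, dst port, seq, ack, data offset, flags (PSH|ACK), window, checksum, urgent
    header = struct.pack("!HHIIBBHHH", 50000, 40000, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
    segment = header + payload
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return segment[:16] + struct.pack("!H", csum) + segment[18:]  # checksum at bytes 16-17


src, dst = bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2])
seg = build_segment(src, dst, b"hello nvme")
payload = verify_and_strip(src, dst, seg)
```

A segment with a correct checksum sums to 0xFFFF in one's-complement arithmetic, so the verification step only has to test for a zero result.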

Further, the hardware acceleration engine in this embodiment also includes a data stream acceleration engine, which performs at least one of: data movement, data encoding, data transcoding, memory comparison, data query, and data insertion operations.

Specifically, when data stream acceleration engines implement these six operations, each engine can be assigned a different operation, which improves data processing efficiency and thus data storage performance. Data movement accelerates data copying: instead of the CPU performing memcpy, the data stream acceleration engine moves data blocks of a given size, and the supported media include memory, non-volatile memory, and so on. Data encoding includes erasure coding, data encryption, data compression, and so on. Implementing erasure coding in the data stream acceleration engine replaces the original architecture's CPU-based computation. Because a high-speed interconnect shared bus is used, very little data has to move between modules during erasure coding, the overhead is extremely low, and the I/O path is fast. When the engine provides both encryption and compression, the two can be merged on the data path: if data must be both encrypted and compressed, the I/O paths can be combined, giving a shorter and better path than the previous separate processing logic.
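As an illustration of the erasure-coding operation, the sketch below uses single-parity XOR, the simplest erasure code (the same idea as RAID 5). The patent does not say which code the data stream acceleration engine implements. Production engines commonly use Reed-Solomon codes that tolerate more than one failure, so treat this as a stand-in for the technique rather than the engine's algorithm. The helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_blocks):
    """Return the k data blocks followed by one XOR parity block (k + 1 total)."""
    size = len(data_blocks[0])
    if any(len(b) != size for b in data_blocks):
        raise ValueError("all blocks in a stripe must be the same size")
    parity = reduce(xor_blocks, data_blocks)
    return list(data_blocks) + [parity]


def recover(stripe, lost_index):
    """Rebuild any single lost block (data or parity) by XOR-ing the survivors."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index and b is not None]
    return reduce(xor_blocks, survivors)


stripe = encode([b"AAAA", b"BBBB", b"CCCC"])
damaged = list(stripe)
damaged[1] = None  # simulate losing the drive that held block 1
rebuilt = recover(damaged, 1)
```

Because XOR is its own inverse, the same routine serves both encoding and reconstruction, which is one reason this kind of arithmetic maps well onto a streaming hardware engine.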
Memory comparison accelerates data comparison: instead of the CPU performing memcmp, the data stream acceleration engine compares data blocks of a given size. Supported comparisons include all-zero detection, all-ones detection, and computing the differences between memory regions; comparison operations also cover data deduplication, data consistency protection, and so on.
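The comparison operations listed above can be sketched in software as follows: all-zero and all-ones detection, reporting the byte ranges where two buffers differ, and content-based deduplication by fingerprint. These are the `memcmp`-style CPU loops the engine would replace. The function names are hypothetical, and using SHA-256 as the deduplication fingerprint is an assumption, not something the source specifies.

```python
import hashlib


def is_all_zero(buf: bytes) -> bool:
    return buf == bytes(len(buf))


def is_all_ones(buf: bytes) -> bool:
    return buf == b"\xff" * len(buf)


def diff_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) byte ranges where two equal-length buffers differ."""
    if len(a) != len(b):
        raise ValueError("buffers must be the same length")
    ranges, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i  # a differing run begins
        elif x == y and start is not None:
            ranges.append((start, i))  # the differing run ends
            start = None
    if start is not None:
        ranges.append((start, len(a)))
    return ranges


def dedup(blocks):
    """Store one copy per distinct block content; return the store and per-block refs."""
    store, refs = {}, []
    for blk in blocks:
        fp = hashlib.sha256(blk).hexdigest()
        store.setdefault(fp, blk)
        refs.append(fp)
    return store, refs
```

For example, `diff_ranges(b"abcdef", b"abXYeZ")` reports the runs at offsets 2 to 4 and 5 to 6, and deduplicating two identical blocks yields a single stored copy referenced twice.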

In summary, by providing independent data stream acceleration engines, this scheme performs data movement, data encoding, data transcoding, memory comparison, data query, and data insertion operations efficiently and reduces I/O path latency to the microsecond level. Having the data stream acceleration engines handle the data flow exclusively also reduces the impact of hardware resource usage on the control flow in the control plane, which improves the processing performance of the storage data processing unit in the data storage field.

FIG. 2 is a schematic structural diagram of the system provided by an embodiment of the present invention. As the figure shows, the user space of the storage data processing unit in this embodiment includes a control plane 21, a data plane 22, and a common module 23. The control plane 21 contains the software processing operations implemented by the ARM processor: storage configuration service, chassis management, log collection, exception handling, firmware upgrade, user security, and production diagnostics. The data plane 22 contains the six-stage accelerated cooperative processing operations implemented by the data stream acceleration engine, plus the NVMe-oF Target management, I/O multipath management, cache management, disk management, and protocol processing operations implemented by the ARM processor. The common module 23 performs task scheduling management, memory management, and driver management operations.
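The split in FIG. 2 can be summarized as a routing table from each named operation to its plane and executor. The identifiers below are hypothetical labels for the operations listed above. Where the text does not say which unit runs an operation (the common module), the executor is left as `None` rather than guessed.

```python
from enum import Enum
from typing import Optional, Tuple


class Plane(Enum):
    CONTROL = 21  # control plane 21 in FIG. 2
    DATA = 22     # data plane 22
    COMMON = 23   # common module 23


ARM = "arm_core"
ENGINE = "data_stream_engine"

_CONTROL_OPS = ["storage_config_service", "chassis_management", "log_collection",
                "exception_handling", "firmware_upgrade", "user_security",
                "production_diagnostics"]
_ENGINE_OPS = ["data_movement", "data_encoding", "data_transcoding",
               "memory_comparison", "data_query", "data_insertion"]
_ARM_DATA_OPS = ["nvmeof_target_mgmt", "io_multipath_mgmt", "cache_mgmt",
                 "disk_mgmt", "protocol_processing"]
_COMMON_OPS = ["task_scheduling_mgmt", "memory_mgmt", "driver_mgmt"]

ROUTES = {}
ROUTES.update({op: (Plane.CONTROL, ARM) for op in _CONTROL_OPS})
ROUTES.update({op: (Plane.DATA, ENGINE) for op in _ENGINE_OPS})
ROUTES.update({op: (Plane.DATA, ARM) for op in _ARM_DATA_OPS})
ROUTES.update({op: (Plane.COMMON, None) for op in _COMMON_OPS})  # executor not specified


def route(op: str) -> Tuple[Plane, Optional[str]]:
    """Look up which plane and executor handle a named operation."""
    try:
        return ROUTES[op]
    except KeyError:
        raise ValueError(f"unknown operation: {op}") from None


def ops_on(plane: Plane, executor: Optional[str]):
    """List the operations assigned to a given plane/executor pair."""
    return sorted(op for op, (p, e) in ROUTES.items() if p is plane and e == executor)
```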

Specifically, the cores of the ARM processor in the storage data processing unit mainly handle control-plane tasks, using their computing power to perform relatively complex storage operations. Storage configuration service covers complex logic such as the human-machine user interface and reading and writing configuration files. Chassis management covers the management, exception handling, and alarm reporting of some or all of the shared or non-shared hardware devices in the chassis. Log collection covers collecting and dumping the FW dump (firmware dump) inside the SPU and collecting the log files of the SPU's internal OS (operating system) and of the hardware accelerators. Exception handling covers responding to the various software and hardware exception scenarios. Firmware upgrade is responsible for upgrading the FW inside the SPU. User security covers the security policies and mechanisms of the hardware-software co-design. Production diagnostics is responsible for the software diagnostic system used during production.

In the data plane 22, the six-stage accelerated cooperative processing operations of the data stream acceleration engine are data movement, data encoding, data transcoding, memory comparison, data query, and data insertion. NVMe-oF Target management means that, at the target (receiving node) of the NVMe over Fabric protocol, NVMe (Non-Volatile Memory Express) protocol packets are parsed and processed. I/O multipath management means that, during data transfer, received data is sent to the corresponding storage medium over the corresponding I/O path. Cache management means that, to improve storage performance, frequently read data is loaded from the underlying storage medium into memory, so that upper-layer applications can read it directly from memory instead of from the underlying medium. Disk management means managing the storage media. Protocol processing refers to the initiator side of the NVMe/SAS/SATA protocols, i.e., the data sender: it packages and encapsulates data according to the protocol type, writes it through the driver layer to the kernel driver, and finally to the hardware medium.
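Two of the data-plane functions above, I/O multipath management and cache management, can be sketched with simple policies. The source does not specify either policy. Round-robin path selection with failover and least-recently-used (LRU) eviction are assumptions chosen because they are common, and the class names are hypothetical.

```python
from collections import OrderedDict


class MultipathSelector:
    """Round-robin over healthy I/O paths; failed paths are skipped (assumed policy)."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.healthy = set(self.paths)
        self._next = 0

    def fail(self, path):
        self.healthy.discard(path)

    def pick(self):
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path in self.healthy:
                return path
        raise RuntimeError("no healthy I/O path")


class ReadCache:
    """LRU cache of blocks read from the underlying medium (assumed policy)."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing  # dict of LBA -> bytes, standing in for the disk
        self.blocks = OrderedDict()
        self.misses = 0

    def read(self, lba):
        if lba in self.blocks:
            self.blocks.move_to_end(lba)  # mark as most recently used
            return self.blocks[lba]
        self.misses += 1
        data = self.backing[lba]  # slow path: go to the medium
        self.blocks[lba] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data
```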

In the common module 23, task scheduling management manages task scheduling policies such as task conflict management and task priority management. Memory management allocates memory into user space when the chip initializes; memory resource pools, memory sharing, zero-copy data transfer, and similar operations are all implemented through it. Driver management manages the user-space drivers, i.e., the dedicated drivers for all the hardware in the chip, which drive the hardware acceleration engines, the network card, and so on to perform data reads, writes, and control functions. In traditional schemes, task scheduling management, memory management, and driver management are implemented in kernel space; in this scheme, the common module is implemented in user space to make the storage data processing unit more efficient. For example, because traditional drivers run in kernel space, they must communicate between kernel space and user space through interrupt handling, which is inefficient. In this scheme the drivers run in user space, so data can be processed by non-blocking, interrupt-free polling, which completes tasks efficiently.
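The user-space model described above, with memory reserved once at initialization and drivers that poll instead of waiting for interrupts, can be sketched as follows. This is a toy model, not a driver: `BufferPool`, `QueuePair`, and `device_tick` (which simulates the hardware completing requests) are hypothetical. The pool hands out `memoryview` slices of one preallocated buffer, so writes land in place without copying, and the main loop polls the completion queue without blocking.

```python
from collections import deque


class BufferPool:
    """Memory carved out once at init and handed out as zero-copy memoryview slices."""

    def __init__(self, n_buffers: int, buf_size: int):
        self.backing = bytearray(n_buffers * buf_size)
        view = memoryview(self.backing)
        self._free = deque(view[i * buf_size:(i + 1) * buf_size] for i in range(n_buffers))

    def get(self):
        return self._free.popleft() if self._free else None

    def put(self, buf):
        self._free.append(buf)


class QueuePair:
    """Toy submission/completion queues shared with a simulated device."""

    def __init__(self):
        self.sq, self.cq = deque(), deque()

    def submit(self, request):
        self.sq.append(request)

    def device_tick(self):
        # Simulates the hardware: completes at most one request per tick.
        if self.sq:
            self.cq.append(self.sq.popleft())

    def poll(self, budget: int = 32):
        """Non-blocking: return whatever completions exist right now, possibly none."""
        done = []
        while self.cq and len(done) < budget:
            done.append(self.cq.popleft())
        return done


pool = BufferPool(n_buffers=4, buf_size=512)
qp = QueuePair()
buf = pool.get()
buf[:5] = b"hello"  # writes land directly in the pool's backing memory
qp.submit(("write", buf))
qp.submit(("write", pool.get()))

completed = []
while len(completed) < 2:  # busy-poll loop instead of sleeping on an interrupt
    qp.device_tick()
    completed.extend(qp.poll())
for _, b in completed:
    pool.put(b)  # return buffers to the pool for reuse
```

The trade-off of this design is that a polling core stays busy even when no I/O is pending, in exchange for avoiding interrupt and kernel/user context-switch overhead on every completion.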

FIG. 3a is a schematic diagram of the control flow of the control plane, and FIG. 3b is a schematic diagram of the data flow of the data plane, both provided by an embodiment of the present invention. In FIG. 3a and FIG. 3b, RNIC (RDMA-aware network interface controller) is a network interface controller that supports RDMA (Remote Direct Memory Access). The protocol acceleration engine covers TCP offload, NVMe over Fabric offload, and hardware acceleration of T10-DIF (Data Integrity Field) data-integrity protection. Core is an ARM processor core, L3 Cache is the level-3 cache, and DRAM is dynamic random access memory. The data stream acceleration engine performs data movement, data encoding, data transcoding, memory comparison, data query, and data insertion operations. The SATA (Serial ATA), PCIe (Peripheral Component Interconnect Express, a high-speed serial expansion bus standard), and SAS (Serial Attached SCSI) interfaces connect to storage media such as SSDs (solid-state drives) and HDDs (hard disk drives).
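T10-DIF protection, listed above among the operations handled by the protocol acceleration engine, appends 8 bytes of protection information to each logical block: a 16-bit guard tag (a CRC-16 with generator polynomial 0x8BB7), a 16-bit application tag, and a 32-bit reference tag. The sketch below computes and checks that field in software, which is the work the protocol acceleration engine takes over. It assumes 512-byte blocks and Type 1 protection, where the reference tag is the low 32 bits of the LBA; the function names are hypothetical.

```python
import struct

T10DIF_POLY = 0x8BB7  # CRC-16/T10-DIF generator polynomial


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: initial value 0, MSB-first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ T10DIF_POLY) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def make_pi(block: bytes, lba: int, app_tag: int = 0) -> bytes:
    """8-byte protection information: guard tag, application tag, reference tag."""
    if len(block) != 512:
        raise ValueError("this sketch assumes 512-byte logical blocks")
    return struct.pack(">HHI", crc16_t10dif(block), app_tag, lba & 0xFFFFFFFF)


def check_pi(block: bytes, pi: bytes, lba: int) -> bool:
    """True if both the guard CRC and the Type 1 reference tag match."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(block) and ref_tag == (lba & 0xFFFFFFFF)
```

The guard tag catches corrupted data, while the reference tag catches a correct block delivered to or read from the wrong LBA.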

In FIG. 3a, the control plane configures and operates the relevant modules (cache, memory, network card, etc.) mainly through the Cores. For example, to configure the network card, software running on a Core directs the network card to perform initialization or configuration, and this involves memory accesses; this is the first control-flow line in FIG. 3a. In the second control-flow line in FIG. 3a, initializing or configuring the acceleration engines, or configuring the storage media controllers, is likewise handled by a Core and also involves memory accesses. In the data flow shown in FIG. 3b, the network card receives a data packet from the front end; the protocol acceleration engine performs protocol processing on the packet to extract the data and stores it in memory; the data stream acceleration engine performs data processing operations on the data; and the processed data is written to the storage medium through the corresponding interface, completing the write. Reading data follows the same path in reverse and is not described again here.
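The FIG. 3b write path (network card and protocol acceleration engine, then memory, then data stream acceleration engine, then storage interface) and its reverse read path can be traced with a few stub functions. Every name here is hypothetical. Compression is used only as an example of a data stream acceleration engine transform; the patent does not say which processing a given write receives.

```python
import zlib


def protocol_engine_rx(packet: dict) -> bytes:
    """Hypothetical NIC + protocol acceleration engine: strip framing, return payload."""
    return packet["payload"]


def data_stream_engine_write(data: bytes) -> bytes:
    """Hypothetical engine transform on the write path (compression as an example)."""
    return zlib.compress(data)


def data_stream_engine_read(stored: bytes) -> bytes:
    """Inverse transform on the read path."""
    return zlib.decompress(stored)


class StorageInterface:
    """Stands in for a SATA/PCIe/SAS-attached SSD or HDD."""

    def __init__(self):
        self.media = {}

    def write(self, lba: int, data: bytes):
        self.media[lba] = data

    def read(self, lba: int) -> bytes:
        return self.media[lba]


def write_path(packet: dict, memory: dict, disk: StorageInterface):
    lba = packet["lba"]
    memory[lba] = protocol_engine_rx(packet)              # payload staged in DRAM
    disk.write(lba, data_stream_engine_write(memory[lba]))  # processed, then persisted


def read_path(lba: int, disk: StorageInterface) -> bytes:
    return data_stream_engine_read(disk.read(lba))        # same stages in reverse


memory, disk = {}, StorageInterface()
write_path({"lba": 42, "payload": b"block" * 100}, memory, disk)
restored = read_path(42, disk)
```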

In summary, the storage data processing unit in this solution separates the data flow from the control flow in hardware, based on the data-control separation architecture. This prevents the control flow and the data flow from interfering with each other and reduces the impact on storage performance, so that the internal I/O performance of the storage data processing unit is optimized to the greatest extent. The hardware acceleration engine performs data processing operations efficiently; under the heavy I/O demands of the storage field, the separated architecture together with the hardware acceleration engine enables fast data movement and supports parallel processing of many concurrent I/Os. This improves the processing performance of the storage data processing unit in the data storage field, raises I/O processing efficiency, and saves equipment cost. Moreover, implementing the common module in user mode further improves the processing efficiency of the storage data processing unit.
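The isolation claimed above can be illustrated by giving control-plane tasks and data-plane I/Os separate queues and separate workers, so that a slow management task (log collection, a firmware check) never sits in front of pending I/Os. The sketch below is hypothetical: `serve`, the queue layout, and the `upper()` placeholder transform are assumptions for illustration. Python threads share the interpreter lock, so this models queue separation rather than true parallel speedup; a real deployment would run the two planes on distinct processor cores (for example, separate processes pinned with `os.sched_setaffinity` on Linux).

```python
import queue
import threading
from typing import Callable, List


def serve(data_ios: List[bytes], control_tasks: List[Callable[[], None]],
          n_data_workers: int = 4) -> List[bytes]:
    """Run control-plane tasks and data-plane I/Os on separate queues/workers."""
    data_q: queue.Queue = queue.Queue()  # data-plane I/O requests
    ctrl_q: queue.Queue = queue.Queue()  # control-plane tasks (config, logs, ...)
    completed: List[bytes] = []
    lock = threading.Lock()

    def data_worker() -> None:
        # Reads only the data queue, so a slow control task cannot block it.
        while (io := data_q.get()) is not None:
            result = io.upper()  # placeholder for the offloaded processing step
            with lock:
                completed.append(result)

    def control_worker() -> None:
        while (task := ctrl_q.get()) is not None:
            task()

    workers = [threading.Thread(target=data_worker) for _ in range(n_data_workers)]
    controller = threading.Thread(target=control_worker)
    for t in [controller, *workers]:
        t.start()

    for task in control_tasks:
        ctrl_q.put(task)
    ctrl_q.put(None)            # sentinel: stop the control worker
    for io in data_ios:
        data_q.put(io)
    for _ in workers:
        data_q.put(None)        # one sentinel per data worker

    for t in [controller, *workers]:
        t.join()
    return completed
```

Using one sentinel per worker lets every data worker exit cleanly after the queue drains, without any coordination with the control worker.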

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.

The above description of the disclosed embodiments enables any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A storage data processing unit based on a data-control separation architecture, characterized by comprising:

a processor and a hardware acceleration engine;

wherein the hardware acceleration engine is configured to implement protocol offloading operations and data processing operations of the data plane, and the processor is configured to implement software processing operations of the control plane and processing operations in the data plane other than the data processing operations.

2. The storage data processing unit according to claim 1, wherein the hardware acceleration engine comprises a protocol acceleration engine, and the protocol acceleration engine is configured to implement protocol processing operations and hardware acceleration operations for data consistency protection.

3. The storage data processing unit according to claim 2, wherein the protocol acceleration engine is specifically configured to implement TCP protocol processing operations and NVMe over Fabrics protocol processing operations.

4. The storage data processing unit according to claim 3, wherein the hardware acceleration engine comprises a data stream acceleration engine, and the data stream acceleration engine is configured to implement at least one of a data movement operation, a data encoding operation, a data transcoding operation, a memory comparison operation, a data query operation, and a data insertion operation.

5. The storage data processing unit according to claim 1, wherein the processor is an ARM processor.

6. The storage data processing unit according to claim 5, wherein the software processing operations implemented by the ARM processor include at least one of: storage configuration service operations, chassis management operations, log collection operations, exception handling operations, firmware upgrade operations, user security operations, and production diagnostic operations.

7. The storage data processing unit according to claim 6, wherein the processing operations implemented by the ARM processor in the data plane include at least one of: NVMe-oF target management operations, I/O multipath management operations, cache management operations, disk management operations, and protocol processing operations.

8. The storage data processing unit according to claim 1, wherein the processor implements, through different processor cores respectively, the software processing operations of the control plane and the processing operations in the data plane other than the data processing operations.

9. The storage data processing unit according to claim 1, wherein the user mode of the storage data processing unit includes the data plane, the control plane, and a common module.

10. The storage data processing unit according to claim 9, wherein the common module is configured to implement task scheduling management operations, memory management operations, and driver management operations.