Background
With the development of computer technology, more and more computer devices apply computer interface expansion technology to expand computer interfaces. PCIE is a common computer interface extension technology.
PCIE, namely PCI Express (Peripheral Component Interconnect Express), is a standard interface technology proposed by the PCI-SIG for connecting the internal components of a computer. The PCIE architecture comprises PCIE devices of types such as the root component (root complex, RC), the switch, and the endpoint device (EP). The switch is used to expand the PCIE interface so that the host or the root component can be connected to more endpoint devices.
PCIE interface technology is widely used for data storage in data centers. For example, each virtual machine may correspond to one physical solid state disk (NVMe SSD), which provides an independent data storage function for that virtual machine. With the development of storage technology, the storage capacity of a single solid state disk has become very large, and the performance of a single solid state disk also meets the requirements of PCIE Gen5×4. However, the number of PCIE slots supported by a host is limited. As the number of virtual machines keeps increasing, the prior art virtualizes one physical solid state disk into multiple smaller virtual hard disks and provides them to multiple virtual machines respectively, and this virtualization is generally implemented in software. Software-implemented virtualization, however, may increase the data transmission delay and may also reduce the performance of the solid state disk.
The invention patent application with publication number CN115840620A discloses a data path construction method. In that method, an FPGA card detects a read-write operation request initiated by a client virtual machine, determines a shunt mark according to whether the request corresponds to an instruction read-write operation or a data read-write operation, adds the shunt mark to the commit queue element corresponding to the read-write operation request, and writes the commit queue element to the commit queue. The solid state disk then initiates a direct memory access operation according to the address information in the commit queue element in order to execute the read-write operation request. The FPGA card responds to the initiated direct memory access operation and, according to the shunt mark, selectively shunts it to the memory of the FPGA card or to the host. When the direct memory access operation is shunted to the host, a data path is constructed between the solid state disk and the host, through which the solid state disk reads and writes the memory of the given client virtual machine. In this way an NVMe virtualized data path is realized, the delay is reduced, the bandwidth is improved, and the native NVMe driver can be used.
However, in the data path constructed by this method, the direct memory access operations of the solid state disk are shunted to the host through the FPGA card, so the data transmission delay of the data channel increases and the maximum performance of the data channel is limited by the interface performance of the FPGA card. To attach more solid state disks to one card (usually at most 4), the FPGA card has to be a full-size card. Since one host can generally support 24 solid state disks, 6 FPGA cards are needed, which makes the system structure complex, and the high production cost of FPGA cards makes the production cost of the whole system too high.
Disclosure of Invention
The first object of the present invention is to provide a method for implementing virtualization of a data channel of a solid state disk with low production cost and low data transmission delay.
The second object of the present invention is to provide a device for implementing the method for implementing virtualization of the data channel of the solid state disk.
In order to achieve the first object, the method for realizing virtualization of a data channel of a solid state disk provided by the present invention is applied to a hardware acceleration control chip provided with a queue management module. The method comprises the following steps: a commit queue element processing module of the queue management module obtains the currently pending commit queue element from a host and modifies the commit queue element recorded in the queue management module according to the mapping relation between the commit queue of a virtual hard disk and the commit queue of a physical hard disk; the solid state disk reads the commit queue element recorded in the queue management module, parses it to obtain the information of the physical area page list corresponding to the host, exchanges data with the host according to the commit queue element and the information of the physical area page list, and sends a completion queue element to the queue management module; and the queue management module updates the completion queue element according to the queue mapping relation between the virtual hard disk and the physical hard disk and sends it to the host.
According to this scheme, a hardware acceleration control chip is provided, the commit queue elements and completion queue elements are managed by its queue management module, and they are updated according to the queue mapping relation between the virtual hard disk and the physical hard disk. The invention thus realizes virtualization of the solid state disk not in software but through a physical hardware acceleration control chip. Because the solid state disk exchanges data directly with the host, the data transmission delay is low and the data transmission efficiency can be improved.
In addition, the hardware acceleration control chip is not wired directly to the solid state disk; the two communicate through a root component, a PCIE switch, or the like. The presence of the hardware acceleration control chip therefore does not limit the bandwidth of data transmission between the solid state disk and the host, the board carrying the hardware acceleration control chip can be made small, and one chip can simultaneously support the virtualization requirements of all solid state disks under the same root component. Moreover, because multiple solid state disks can be attached without providing multiple FPGAs, the production cost of the system is low.
In a further scheme, before the commit queue element processing module obtains the currently pending commit queue element from the host, the host updates the commit queue doorbell state of the virtual hard disk and thereby triggers a virtual hard disk queue scheduler, and the virtual hard disk queue scheduler sends a scheduling result to the commit queue element processing module.
Therefore, the host triggers the virtual hard disk queue scheduler by updating the commit queue doorbell state of the virtual hard disk, so that the host can actively acquire the data of the commit queue and the completion queue of each virtual hard disk.
After the queue management module modifies the commit queue element recorded in the queue management module, the commit queue element processing module updates, in a point-to-point manner, the commit queue doorbell state recorded in the doorbell register of the solid state disk.
Therefore, once the commit queue element has been recorded in the queue management module, the commit queue doorbell state recorded in the doorbell register of the solid state disk is updated promptly, which triggers the solid state disk to fetch the new commit queue element.
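For a standard NVMe controller, the doorbell registers sit in BAR0 starting at offset 0x1000, and their spacing depends on the CAP.DSTRD field. The following is a minimal sketch of how the address targeted by such a point-to-point doorbell write could be computed. The function names and the BAR0 base address are assumptions made for illustration; they do not come from the invention.

```python
DOORBELL_BASE = 0x1000  # start of the doorbell region in NVMe BAR0


def sq_tail_doorbell(bar0: int, qid: int, dstrd: int = 0) -> int:
    """Address of the submission (commit) queue tail doorbell of queue `qid`."""
    stride = 4 << dstrd  # doorbell stride in bytes
    return bar0 + DOORBELL_BASE + (2 * qid) * stride


def cq_head_doorbell(bar0: int, qid: int, dstrd: int = 0) -> int:
    """Address of the completion queue head doorbell of queue `qid`."""
    stride = 4 << dstrd
    return bar0 + DOORBELL_BASE + (2 * qid + 1) * stride


SSD_BAR0 = 0xFE00_0000  # hypothetical BAR0 address of the physical disk
sq1 = sq_tail_doorbell(SSD_BAR0, 1)
cq1 = cq_head_doorbell(SSD_BAR0, 1)
```

With a stride of 4 bytes (DSTRD = 0), queue 1 has its commit queue doorbell 8 bytes into the doorbell region and its completion queue doorbell 4 bytes after that.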
After the queue management module updates the completion queue element, the queue management module also updates the completion queue doorbell state of the solid state disk in a point-to-point mode.
Therefore, after the queue management module updates the completion queue element, updating the completion queue doorbell state of the solid state disk triggers the solid state disk to update its completion queue.
After the queue management module updates the completion queue doorbell state of the solid state disk, if the host receives interrupt request information or polls the completion queue and determines that the current IO operation is completed, the completion queue doorbell state recorded by the queue management module is updated.
Therefore, through the operation, IO operation initiated by the host can be completed, the whole operation process does not involve hardware such as FPGA, the host and the solid state disk can directly carry out data transmission, the delay time of the data transmission is very short, and the bandwidth of the data communication is not limited by the performance of the hardware.
The method further comprises initializing the virtual device. After the host is started, it enumerates all interface devices and loads a physical driver for the solid state disk, and a commit queue and a completion queue are created in the memory of the host. When the commit queue and the completion queue are created, their base addresses are configured as the base address register space of the computer interface device virtualized by the hardware acceleration control chip. A virtual driver is then loaded for the solid state disk. The computer interface transmits the read-write accesses to all base address register space, except that of the IO queues, to a controller of the hardware acceleration control chip for processing; the controller returns completion TLP information according to the base address register information, and this information is transmitted to the host through the computer interface.
Therefore, before the host initiates the IO operation, the initialization operation of the virtual device is performed first, that is, the virtual driver is loaded into the solid state disk in advance, so that the solid state disk can virtualize a plurality of virtual hard disks.
In a further scheme, when the host enumerates all interface devices, the computer interface of the hardware acceleration control chip transmits the request information of the configuration TLP to the controller; the controller encapsulates the stored configuration space information into TLP information and returns it to the computer interface, which returns it to the host.
In order to achieve the second object, the device for realizing virtualization of a data channel of a solid state disk provided by the present invention comprises a host, a solid state disk, and a hardware acceleration control chip provided with a queue management module. The queue management module is provided with a commit queue element processing module, which is used for obtaining the currently pending commit queue element from the host and modifying the commit queue element recorded in the queue management module according to the mapping relation between the commit queue of the virtual hard disk and the commit queue of the physical hard disk. The solid state disk is used for reading the commit queue element recorded in the queue management module, parsing it to obtain the information of the physical area page list corresponding to the host, exchanging data with the host according to the commit queue element and the information of the physical area page list, and sending a completion queue element to the queue management module. The queue management module updates the completion queue element according to the queue mapping relation between the virtual hard disk and the physical hard disk and sends it to the host.
In the hardware acceleration control chip provided by the invention, the queue management module manages the commit queue elements and completion queue elements and updates them according to the queue mapping relation between the virtual hard disk and the physical hard disk. The invention realizes virtualization of the solid state disk not in software but through a physical hardware acceleration control chip, and the solid state disk exchanges data directly with the host, so the data transmission delay is effectively reduced and the data transmission efficiency is improved.
In a preferred scheme, the hardware acceleration control chip further comprises a computer interface and an embedded controller, wherein the computer interface is connected with the host and also exchanges data with the controller.
According to this scheme, the hardware acceleration control chip communicates with the host through the computer interface: the computer interface is connected to the host on one side and to the embedded controller on the other, and the queue management module also communicates with the host through the computer interface, so that the data transmission requirements are met.
Further, the host includes two processor cores, and the computer interfaces are connected with both processor cores.
Therefore, one hardware acceleration control chip can exchange data with two processor cores, the use requirement of a host with two processor cores can be met, the two processor cores can share one hardware acceleration control chip, the resources of the hardware acceleration control chip can be fully utilized to the maximum, and the cost of the whole system is reduced.
Detailed Description
The method for realizing the virtualization of the data channel of the solid state disk is used for virtualizing the solid state disk, so that one physical solid state disk is virtualized into a plurality of virtual solid state disks, and a storage space of a corresponding virtual disk is provided for each virtual machine, thereby meeting the use requirement of increasing number of virtual machines.
An embodiment of a device for realizing virtualization of a data channel of a solid state disk is provided:
Referring to fig. 1, in this embodiment, on the premise of not changing the existing host structure, a hardware acceleration control chip 30 is provided, so that each solid state disk can still remain on the original slot, and the hardware acceleration control chip 30 and each solid state disk are connected to the root components RC (root complex) of the same group.
FIG. 1 illustrates a typical virtualization system that supports peer-to-peer NVMe (Non-Volatile Memory Host Controller Interface Specification) access. The host is provided with two mutually independent processor cores, CPU10 and CPU11, each of which is connected to the PCIE bus. In this embodiment, each processor core is connected to two solid state disks: the processor core CPU10 is connected to the solid state disks 21 and 22, and the processor core CPU11 is connected to the solid state disks 23 and 24. In practical applications, each processor core may of course be connected to more solid state disks.
The hardware acceleration control chip 30 includes a computer interface 31, a queue management module 32 and an embedded controller 33, where the computer interface 31 is preferably a PCIE interface, the computer interface 31 is used to connect with a host and the controller 33, and the computer interface 31 needs to support two paths of communications, where one path is used to communicate with the controller 33 and is used to implement configuration of virtual devices and management of queues, and the other path needs to exchange data with the queue management module 32, transmit queue data between the queue management module 32 and the host, and also transmit queue data between the queue management module 32 and each solid state disk.
In addition, the computer interface 31 and the controller 33 of the hardware acceleration control chip 30 also need to handle the PCIE enumeration, the base address register space, and the management queue of the virtual device, and to present the virtual device to the host, for example as a plurality of virtual downstream ports and downstream devices. The computer interface 31 and the controller 33 also virtualize a PCIE device and map the IO queue space of each solid state disk into its base address register space, so that each solid state disk accesses the IO queue space located on the hardware acceleration control chip 30 in a PCIE point-to-point (peer-to-peer) manner instead of accessing the memory of the host, thereby realizing the virtualization of each solid state disk.
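The mapping of IO queue space into the base address register space can be pictured as carving one BAR window into per-disk, per-queue slots. The sketch below is only an illustration under stated assumptions: the slot layout, the function and class names, the BAR base address, and the queue depth are hypothetical, while the 64-byte and 16-byte entry sizes are the standard NVMe sizes.

```python
from dataclasses import dataclass

PAGE = 4096
SQE_SIZE = 64  # bytes per NVMe submission (commit) queue entry
CQE_SIZE = 16  # bytes per NVMe completion queue entry


@dataclass(frozen=True)
class QueueWindow:
    sq_base: int  # address the disk uses as commit queue base
    cq_base: int  # address the disk uses as completion queue base


def align_up(n: int, alignment: int) -> int:
    return (n + alignment - 1) // alignment * alignment


def layout_bar(bar_base: int, num_ssds: int, queues_per_ssd: int, depth: int):
    """Assign every (ssd, queue) pair a page-aligned window inside the BAR."""
    sq_bytes = align_up(depth * SQE_SIZE, PAGE)
    cq_bytes = align_up(depth * CQE_SIZE, PAGE)
    slot = sq_bytes + cq_bytes
    windows = {}
    for ssd in range(num_ssds):
        for q in range(queues_per_ssd):
            base = bar_base + (ssd * queues_per_ssd + q) * slot
            windows[(ssd, q)] = QueueWindow(base, base + sq_bytes)
    return windows


# Hypothetical BAR address of the virtual PCIE device.
w = layout_bar(0xD000_0000, num_ssds=4, queues_per_ssd=2, depth=64)
```

Each window is kept page-aligned because NVMe queue base addresses are normally page-aligned; a real chip could of course choose a different layout.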
Referring to fig. 2, the management queue handling of the virtual device involves both the virtual NVMe driver and the physical NVMe driver, and is implemented jointly by the hardware acceleration control chip 30 and the solid state disks. Fig. 2 takes the solid state disk 21 as an example; the other solid state disks work on the same principle.
Before the host initiates an IO operation, the virtual device needs to be initialized, which includes creating the IO queues. First, when the host is started, all PCIE devices need to be enumerated. The BIOS (Basic Input/Output System) of the host initiates read-write operations on the configuration space of the hardware acceleration control chip 30; the computer interface 31 of the hardware acceleration control chip 30 transmits the request information of each configuration TLP (Transaction Layer Packet) to the embedded controller 33, and the controller 33 processes the configuration TLP. After encapsulating the saved configuration space information into TLP information, the controller 33 sends it to the computer interface 31, which returns it to the BIOS of the host. By executing these operations multiple times, the host completes the enumeration of the virtual devices, including the virtual NVMe devices and the virtual PCIE device, and the virtual PCIE device provides the buffering for the IO queues of the physical hard disk.
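As a rough illustration of how a controller could answer configuration reads from saved configuration space, the sketch below keeps a 256-byte configuration space in memory and returns the requested dword. It models only the payload of the completion, not a real TLP. The vendor and device IDs are placeholders, and the function names are hypothetical; the class code 0x010802 is the standard PCI class code of an NVMe controller.

```python
import struct

NVME_CLASS_CODE = 0x010802  # mass storage / non-volatile memory / NVMe


def make_config_space(vendor_id: int, device_id: int, class_code: int) -> bytearray:
    """Build a minimal saved configuration space for one virtual function."""
    cfg = bytearray(256)
    struct.pack_into("<HH", cfg, 0x00, vendor_id, device_id)
    # Offset 0x08 holds the revision ID (low byte) and the 24-bit class code.
    struct.pack_into("<I", cfg, 0x08, class_code << 8)
    return cfg


def handle_config_read(cfg: bytearray, offset: int) -> int:
    """Return the dword a configuration read completion would carry."""
    if offset % 4 != 0 or not 0 <= offset < len(cfg):
        raise ValueError("configuration reads must be dword-aligned and in range")
    return struct.unpack_from("<I", cfg, offset)[0]


cfg = make_config_space(0x1234, 0x5678, NVME_CLASS_CODE)  # placeholder IDs
id_dword = handle_config_read(cfg, 0x00)
class_dword = handle_config_read(cfg, 0x08)
```

The host repeats such reads for every virtual function it discovers, which corresponds to the repeated operations by which enumeration is completed.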
After the host is started, the physical NVMe driver is loaded for the solid state disk 21; preferably, the physical NVMe driver is stored on the host in advance. The physical NVMe driver handles the management queue of the solid state disk 21 in the same way as a conventional driver, and the commit queue SQ and the completion queue CQ are both created in the memory of the host. The commit queue SQ comprises one or more commit queue elements SQE, and the completion queue CQ comprises one or more completion queue elements CQE. In addition, when the IO queues are created, the base addresses of the commit queue and the completion queue are configured as addresses within the base address register space of the PCIE device virtualized by the hardware acceleration control chip 30.
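In standard NVMe, IO queues are created with the admin commands Create I/O Completion Queue (opcode 05h) and Create I/O Submission Queue (opcode 01h), whose PRP1 field carries the queue base address. The sketch below shows, under assumptions, how a physical driver might build the submission queue variant so that the base address points into the BAR window of the virtual PCIE device. The helper name, BAR address, and queue parameters are made up for illustration.

```python
import struct

ADMIN_CREATE_IO_SQ = 0x01


def create_io_sq_cmd(cid: int, qid: int, depth: int, cqid: int, base_addr: int) -> bytes:
    """Encode a 64-byte Create I/O Submission Queue admin command."""
    dw0 = ADMIN_CREATE_IO_SQ | (cid << 16)   # opcode in bits 7:0, CID in 31:16
    cdw10 = qid | ((depth - 1) << 16)        # QSIZE is a zero-based value
    cdw11 = 0x1 | (cqid << 16)               # PC = 1 (physically contiguous)
    return struct.pack(
        "<IIIIQQQIIIIII",
        dw0, 0, 0, 0,        # DW0, NSID, DW2, DW3
        0, base_addr, 0,     # MPTR, PRP1 (queue base), PRP2
        cdw10, cdw11, 0, 0, 0, 0,
    )


VIRTUAL_BAR_SQ_BASE = 0xD000_0000  # hypothetical address inside the chip's BAR
cmd = create_io_sq_cmd(cid=1, qid=1, depth=64, cqid=1, base_addr=VIRTUAL_BAR_SQ_BASE)
```

Because the base address lies in the chip's BAR rather than in host memory, the disk later fetches commit queue elements from the chip by point-to-point reads.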
After the initialization of the solid state disk 21 is completed, the virtual NVMe driver is loaded for the solid state disk; preferably, the virtual NVMe driver is also stored on the host in advance, and it is a standard OS driver. While the virtual NVMe driver is being loaded, the computer interface 31 transmits all read-write accesses to the base address register space, except those to the IO queues, to the controller 33 for processing, and the controller 33 returns completion TLP information according to the cached base address register information and transmits it to the host through the computer interface 31. In addition, the management queues are all created in the memory of the host, and the commit queue, the completion queue, and the direct memory access (DMA) operations on the data are all controlled by the software of the controller 33. After the base address register accesses and the direct memory access interaction between the host and the controller 33, the initialization of the virtual NVMe device is completed.
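The split between accesses handled by the controller and accesses belonging to the IO queues can be sketched as a simple address decoder. This is a hedged illustration only: it assumes that IO queue accesses go to the queue management module, that the doorbell stride is 4 bytes (CAP.DSTRD = 0) so the first IO queue doorbell sits at offset 0x1008, and the class name and cached register values are hypothetical.

```python
NVME_REG_VS = 0x08           # version register offset in NVMe BAR0
NVME_REG_CSTS = 0x1C         # controller status register offset
IO_DOORBELL_START = 0x1008   # first IO queue doorbell when CAP.DSTRD == 0


class BarRouter:
    """Routes BAR accesses either to the controller or to the queue logic."""

    def __init__(self, cached_regs):
        self.cached_regs = dict(cached_regs)  # register cache kept by the controller
        self.io_doorbells = {}                # state kept by the queue management side

    def read(self, offset: int):
        if offset >= IO_DOORBELL_START:
            return ("queue_management", self.io_doorbells.get(offset, 0))
        return ("controller", self.cached_regs.get(offset, 0))

    def write(self, offset: int, value: int) -> str:
        if offset >= IO_DOORBELL_START:
            self.io_doorbells[offset] = value
            return "queue_management"
        self.cached_regs[offset] = value
        return "controller"


# 0x00010400 encodes NVMe version 1.4; CSTS = 1 means the controller is ready.
router = BarRouter({NVME_REG_VS: 0x0001_0400, NVME_REG_CSTS: 0x1})
target = router.write(0x1008, 5)
```

Reads of registers such as the version register are answered from the controller's cache, while the doorbell write above lands on the queue management side.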
So far, the initialization of the virtual device is completed, and the host can initiate IO operation. The process by which the host performs an IO operation is described below in connection with FIGS. 3-5.
The embodiment of the method for realizing the virtualization of the data channel of the solid state disk comprises the following steps:
Referring to FIG. 3, an IO operation initiated by the host involves the processing of doorbell registers, commit queue elements (SQE), completion queue elements (CQE), physical region pages (PRP), and data, and interrupts are also needed to respond to the data. In this embodiment, the hardware acceleration control chip 30 handles the doorbells, commit queue elements, completion queue elements, and interrupts, while the physical region pages and the data are handled by direct memory access of the solid state disk 21.
The queue management module 32 is provided with a virtual doorbell register 41, a virtual hard disk queue scheduler 42, a commit queue element processing module 43, and a completion queue element processing module 44, and stores a cache of commit queue elements. The virtual doorbell register 41 may store a doorbell state of a commit queue and a doorbell state of a completion queue.
In connection with fig. 4 and 5, when the host initiates an IO operation, step S11 is executed first: the host updates the commit queue doorbell state of the virtual hard disk through the virtual NVMe driver, for example by updating the commit queue doorbell state stored in the virtual doorbell register 41, thereby triggering the virtual hard disk queue scheduler 42 to operate.
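A doorbell update of this kind amounts to the host writing a new tail index for a circular queue. The sketch below models a virtual doorbell register for one commit queue and shows how the number of pending elements follows from the tail and head indices. The class and method names are hypothetical, and the head pointer kept by the chip is an assumption of this sketch.

```python
class VirtualDoorbell:
    """Tail written by the host, head advanced as the chip consumes entries."""

    def __init__(self, depth: int):
        self.depth = depth
        self.tail = 0
        self.head = 0

    def ring(self, new_tail: int) -> None:
        """Host side of step S11: record the new tail index."""
        if not 0 <= new_tail < self.depth:
            raise ValueError("tail index out of range")
        self.tail = new_tail

    def pending(self) -> int:
        """Entries written by the host but not yet fetched (wraps around)."""
        return (self.tail - self.head) % self.depth

    def consume(self) -> int:
        """Return the slot of the next pending entry and advance the head."""
        if self.pending() == 0:
            raise IndexError("no pending commit queue element")
        slot = self.head
        self.head = (self.head + 1) % self.depth
        return slot


db = VirtualDoorbell(depth=8)
db.ring(3)
first_slot = db.consume()
```

A scheduler would be triggered whenever `pending()` becomes non-zero after a doorbell write.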
Then, step S12 is executed: the virtual hard disk queue scheduler 42 sends the scheduling result to the commit queue element processing module 43, which then executes step S13 to read the current commit queue element SQE from the host. As can be seen from fig. 3, the commit queue elements SQE are stored in the memory of the host; in step S13, the commit queue element processing module 43 reads the currently pending commit queue element from the memory of the host through the computer interface 31.
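The text does not state which policy the virtual hard disk queue scheduler uses. Purely as an assumption, the sketch below uses round-robin scheduling over the virtual hard disks that have pending commit queue elements and returns the order in which their elements would be fetched. The function name and input format are hypothetical.

```python
from collections import deque


def schedule_round_robin(pending: dict) -> list:
    """pending maps virtual disk id -> number of pending commit queue elements.

    Returns the virtual disk ids in the order their elements are served,
    taking one element per disk per turn (round-robin is an assumed policy).
    """
    remaining = dict(pending)
    ring = deque(sorted(vd for vd, count in remaining.items() if count > 0))
    order = []
    while ring:
        vd = ring.popleft()
        order.append(vd)
        remaining[vd] -= 1
        if remaining[vd] > 0:
            ring.append(vd)  # still has work: go to the back of the ring
    return order


order = schedule_round_robin({0: 2, 1: 1, 2: 0})
```

Each entry of the result would correspond to one scheduling result sent to the commit queue element processing module.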
Then, step S14 is executed, and the commit queue element processing module 43 modifies the commit queue element recorded by the queue management module 32 according to the mapping relationship between the IO queue of the virtual hard disk and the IO queue of the physical hard disk, and writes the commit queue element into the cache of the commit queue element. Next, step S15 is executed, where the commit queue element processing module 43 updates the doorbell state of the commit queue of the solid state disk 21 in a PCIE point-to-point manner, that is, modifies the commit queue doorbell register of the solid state disk 21, so as to notify the solid state disk 21 that the current commit queue element changes.
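The text does not say which fields of the commit queue element are modified. As one possible illustration, the sketch below assumes that the command identifier is replaced by one that is unique on the physical queue and that the starting LBA of a read or write is shifted by the offset of the virtual hard disk on the physical disk. The field positions follow the standard NVMe 64-byte layout; the function name and all values are hypothetical.

```python
import struct


def rewrite_sqe(sqe: bytes, phys_cid: int, lba_offset: int) -> bytes:
    """Return a copy of a 64-byte read/write SQE remapped to the physical disk."""
    if len(sqe) != 64:
        raise ValueError("an NVMe submission queue entry is 64 bytes")
    if not 0 <= phys_cid <= 0xFFFF:
        raise ValueError("command identifier must fit in 16 bits")
    out = bytearray(sqe)
    (dw0,) = struct.unpack_from("<I", out, 0)
    dw0 = (dw0 & 0x0000_FFFF) | (phys_cid << 16)       # CID lives in bits 31:16
    struct.pack_into("<I", out, 0, dw0)
    (slba,) = struct.unpack_from("<Q", out, 40)         # CDW10-11: starting LBA
    struct.pack_into("<Q", out, 40, slba + lba_offset)  # shift into the disk's slice
    return bytes(out)


# Hypothetical virtual-disk read: opcode 0x02, CID 7, PRP1 in host memory, SLBA 100.
virt = bytearray(64)
struct.pack_into("<I", virt, 0, 0x02 | (7 << 16))
struct.pack_into("<Q", virt, 24, 0x1_2345_6000)
struct.pack_into("<Q", virt, 40, 100)
phys = rewrite_sqe(bytes(virt), phys_cid=0x21, lba_offset=1_000_000)
```

Note that the PRP fields are left untouched in this sketch, which matches the idea that the disk later transfers data directly to and from host memory.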
After the solid state disk 21 detects the update of the commit queue doorbell state, step S16 is executed: the solid state disk 21 reads the cache of commit queue elements in the queue management module 32 in a PCIE point-to-point manner. Step S17 is then executed: the solid state disk 21 parses the commit queue element, obtains the address of the physical area page list (PRP List) in the host, and reads the information of the corresponding physical area page list from the memory of the host.
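How a disk turns the PRP fields of a commit queue element into a set of host memory pages follows the standard NVMe rules: PRP1 addresses the first page, and PRP2 is either the second page or a pointer to a PRP list. The sketch below illustrates this, assuming a 4 KiB page size and a PRP list that fits in one page without chaining. The host memory dictionary and the helper names are stand-ins for real DMA reads.

```python
import struct

PAGE = 4096  # assumed memory page size


def prp_pages(prp1: int, prp2: int, length: int, read_prp_list) -> list:
    """Return the host page addresses touched by a transfer of `length` bytes."""
    first_chunk = PAGE - (prp1 % PAGE)        # bytes available in the first page
    if length <= first_chunk:
        return [prp1]
    more_pages = -(-(length - first_chunk) // PAGE)   # ceiling division
    if more_pages == 1:
        return [prp1, prp2]                   # PRP2 is the second data page
    return [prp1] + read_prp_list(prp2, more_pages)   # PRP2 points to a PRP list


# Hypothetical host memory holding one PRP list at address 0x9000.
host_mem = {0x9000: struct.pack("<3Q", 0x2000, 0x3000, 0x4000)}


def read_prp_list(addr: int, count: int) -> list:
    """Stand-in for the disk reading `count` 8-byte entries from host memory."""
    return list(struct.unpack_from(f"<{count}Q", host_mem[addr]))


pages = prp_pages(0x1000, 0x9000, 4 * PAGE, read_prp_list)
```

Only the third case requires the extra read of the PRP list from host memory that step S17 describes.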
Then, the solid state disk 21 executes step S18, and the solid state disk 21 exchanges data with the memory of the host according to the information of the commit queue element and the corresponding physical area page list information, that is, executes corresponding instructions, such as reading data from the memory of the host or writing data into the memory of the host, according to the information of the commit queue element.
After the solid state disk 21 completes the data exchange with the host, it executes step S19 and sends the completion queue element to the queue management module 32 in a PCIE point-to-point manner. The queue management module 32 then executes step S20: the completion queue element processing module 44 modifies the recorded completion queue element according to the mapping relationship between the IO queue of the virtual hard disk and the IO queue of the physical hard disk and writes the completion queue element into the memory of the host. Preferably, the completion queue element processing module 44 sends an interrupt to the host before sending the completion queue element.
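The modification of a completion queue element can be pictured as the reverse of the commit-side mapping. The sketch below assumes that the submission queue identifier, the submission queue head pointer, the command identifier, and the phase tag are rewritten to the values the virtual hard disk's driver expects, while the status is preserved. The field positions follow the standard NVMe 16-byte layout; which fields the chip actually rewrites is an assumption, and the function name is hypothetical.

```python
import struct


def rewrite_cqe(cqe: bytes, virt_sqid: int, virt_cid: int,
                virt_sq_head: int, phase: int) -> bytes:
    """Translate a physical-disk CQE into one for the virtual hard disk."""
    if len(cqe) != 16:
        raise ValueError("an NVMe completion queue entry is 16 bytes")
    dw0, dw1, _dw2, dw3 = struct.unpack("<4I", cqe)
    status = dw3 >> 17                                   # status field, bits 31:17
    new_dw2 = (virt_sq_head & 0xFFFF) | ((virt_sqid & 0xFFFF) << 16)
    new_dw3 = (virt_cid & 0xFFFF) | ((phase & 1) << 16) | (status << 17)
    return struct.pack("<4I", dw0, dw1, new_dw2, new_dw3)


# Hypothetical physical CQE: SQ head 5, SQID 3, CID 0x21, phase 1, status 0x2.
phys_cqe = struct.pack("<4I", 0, 0, 5 | (3 << 16), 0x21 | (1 << 16) | (0x2 << 17))
virt_cqe = rewrite_cqe(phys_cqe, virt_sqid=1, virt_cid=7, virt_sq_head=2, phase=0)
```

The phase tag is set independently because the completion queue of the virtual hard disk wraps at its own pace, not at that of the physical queue.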
Finally, step S21 is executed, where the completion queue element processing module 44 updates the doorbell state of the completion queue element of the solid state disk 21 in a PCIE point-to-point manner, for example, updates the doorbell state of the completion queue element recorded in the doorbell register of the solid state disk 21. At this time, step S22 is executed, after the virtual NVMe driver on the host receives the interrupt request or polls the completion queue, if it is confirmed that the current IO operation is completed, the doorbell register of the virtual hard disk is updated, that is, the doorbell state of the completion queue element of the corresponding virtual hard disk is updated. Thus, the IO operation initiated by the host is completed.
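On the host side, polling a completion queue in standard NVMe relies on the phase tag: an entry is new when its phase bit equals the phase the driver currently expects, and the expected phase flips each time the queue wraps. The sketch below is a simplified model in which each entry is a (command identifier, phase bit) tuple; the function name and this representation are assumptions for illustration.

```python
def poll_completions(cq: list, head: int, phase: int):
    """Collect new completions starting at `head`.

    cq is a list of (cid, phase_bit) tuples. Returns the completed command
    identifiers, the new head index, and the phase expected next.
    """
    done = []
    while cq[head][1] == phase:
        done.append(cq[head][0])
        head += 1
        if head == len(cq):   # wrapped around: expected phase flips
            head = 0
            phase ^= 1
    return done, head, phase


# Two new entries with phase 1, followed by stale slots with phase 0.
cq = [(10, 1), (11, 1), (0, 0), (0, 0)]
done, new_head, new_phase = poll_completions(cq, head=0, phase=1)
```

After such a poll, the driver would write `new_head` to the completion queue doorbell of the virtual hard disk, which corresponds to step S22.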
It can be seen that the main steps of this embodiment are that the queue management module 32 of the hardware acceleration control chip 30 receives the doorbell signal, reads the commit queue element in the commit queue of the queue management module 32, stores the commit queue element in the cache, maps the cache to the base register space of the virtual device, and then updates the commit queue doorbell state of the solid state disk 21.
After the solid state disk receives the doorbell state update information of the commit queue, the commit queue element stored by the acceleration control chip 30 is read in a point-to-point manner, and the memory of the host is directly accessed, that is, the solid state disk does not access the acceleration control chip 30 in the process of processing data, so that the solid state disk 21 can directly access the memory of the host, and the problem of data transmission delay caused by the limitation of hardware is avoided.
After the data processing is completed by the solid state disk, the completion queue elements of the acceleration control chip 30 are accessed in a point-to-point manner, the queue management module 32 can receive the completion queue elements of the solid state disk 21 and convert the completion queue elements into the completion queue elements of the virtual device, write the completion queue elements into a completion queue of the host, and update the completion queue doorbell state information of the solid state disk 21 in a point-to-point manner.
Therefore, the method realizes the virtualization of the solid state disk while letting the solid state disk exchange data directly with the memory of the host, which avoids the additional data transmission delay of an intermediate device. In addition, because the data exchanged between the solid state disk and the host does not pass through the hardware acceleration control chip, the chip does not need to match the performance of the solid state disk, and the data transmission bandwidth between the solid state disk and the host is not limited by the hardware performance of the chip. The chip itself needs very little data transmission bandwidth and can be made very small, one chip can simultaneously support the virtualization requirements of all solid state disks under the same root component, and the production cost is low.
Finally, it should be emphasized that the foregoing is merely a preferred embodiment of the present invention, and is not intended to limit the invention, but rather that various changes and modifications can be made by those skilled in the art without departing from the spirit and principles of the invention, and any modifications, equivalent substitutions, improvements, etc. are intended to be included within the scope of the present invention.