CN118838859B - Data transmission method, product, equipment and medium - Google Patents
- Publication number
- CN118838859B CN118838859B CN202411329392.6A CN202411329392A CN118838859B CN 118838859 B CN118838859 B CN 118838859B CN 202411329392 A CN202411329392 A CN 202411329392A CN 118838859 B CN118838859 B CN 118838859B
- Authority
- CN
- China
- Prior art keywords
- configuration information
- memory access
- direct memory
- accelerator
- microcontroller
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/28—DMA
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bus Control (AREA)
Abstract
The invention discloses a data transmission method, product, device, and medium, and relates to the field of computer technology. A microcontroller is arranged at the accelerator side to receive the DMA configuration information sent by the host side and forward it to the processor at the accelerator side. With this out-of-band configuration, multiple pieces of DMA configuration information can be received at once, so the host no longer has to wait for one configuration transfer to complete before initiating the next, which effectively improves data transmission efficiency. At the same time, because the microcontroller controls DMA transfers at the accelerator side, the processor is responsible only for data transmission and computation, which reduces the difficulty of developing the processor logic. In addition, the scheme transmits the DMA configuration information over a first communication bus and the corresponding target data over a second communication bus, so DMA operations can be executed in real time and the real-time performance of data transmission is improved.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data transmission method, a product, a device, and a medium.
Background
In order to meet the increasing computing-power demands of technologies such as big data and cloud computing, data centers typically both improve the computing power of servers and expand them with accelerators. Direct memory access (Direct Memory Access, DMA) is the fastest data transmission mode over the high-speed peripheral bus (Peripheral Component Interconnect Express, PCIE) interface: it effectively frees the central processing unit (Central Processing Unit, CPU) and improves transmission efficiency.
DMA transfer modes include block DMA and chained DMA. In a block DMA transfer, once the CPU sets the source address, destination address, and length, the DMA controller transfers the data and notifies the host through an interrupt when the transfer completes. However, block DMA can transfer only one contiguous physical memory block at a time, so its efficiency on discrete memory blocks is low. Chained DMA records the transfer information of the discrete memory blocks in descriptors; the device side performs DMA data transfers according to the descriptors and, after each transfer completes, continues with the next transfer according to the address information in the descriptor, until all descriptors have been executed. Although chained DMA solves the problem of transferring multiple data blocks, the number of blocks per transfer is limited, and the next transfer can be initiated only after the previous one completes, which is inefficient.
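The chained-DMA descriptor walk described above can be sketched as follows; the `Descriptor` fields and the `copy` callback are illustrative assumptions for exposition, not part of any claimed implementation. The sketch makes the serialization problem visible: each block transfer must finish before the next descriptor is fetched.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Descriptor:
    """Transfer information for one discrete memory block (fields assumed)."""
    src: int                            # source physical address
    dst: int                            # destination physical address
    length: int                         # bytes to transfer
    next_desc: Optional["Descriptor"]   # next descriptor in the chain, or None

def run_chained_dma(head: Descriptor, copy) -> int:
    """Walk the descriptor chain, transferring each block in turn.

    The controller must complete one descriptor before reading the next,
    which is exactly the serialization the invention aims to avoid."""
    blocks = 0
    desc = head
    while desc is not None:
        copy(desc.src, desc.dst, desc.length)  # one block DMA transfer
        blocks += 1
        desc = desc.next_desc                  # fetch the next descriptor
    return blocks
```

A host that needs more blocks than one chain allows must wait for the walk to finish, reconfigure the descriptors, and start again.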
In view of the above, how to avoid the current DMA transmission modes' need to wait for one transfer to complete before initiating the next, and the resulting low data transmission efficiency, is a problem to be solved by those skilled in the art.
Disclosure of Invention
The object of the invention is to provide a data transmission method, product, device, and medium that solve the problem that current DMA transmission modes must wait for one transfer to complete before the next can be initiated, which makes data transmission inefficient.
In order to solve the technical problems, the invention provides a data transmission method which is applied to a host side, wherein the method comprises the following steps:
Acquiring each piece of direct memory access configuration information, wherein the direct memory access configuration information comprises a source address, a destination address and a transmission length;
Transmitting each piece of direct memory access configuration information to a microcontroller at an accelerator side through a first communication bus so that the microcontroller at the accelerator side can buffer each piece of direct memory access configuration information, and transmitting each piece of direct memory access configuration information to a processor at the accelerator side, wherein the processor at the accelerator side calls a direct memory access mover based on each piece of direct memory access configuration information and a direct memory access controller, and acquires target data corresponding to each piece of direct memory access configuration information from a memory at the accelerator side through the direct memory access mover;
and receiving each target data transmitted by a processor of the accelerator side through a second communication bus.
In one aspect, before the obtaining each direct memory access configuration information, the method further includes:
applying for a preset number of caches based on the memory;
Setting a data sending buffer and a data receiving buffer based on all the buffers;
and respectively recording the buffer memory size, the head address and the use state of each buffer memory.
In another aspect, the obtaining each direct memory access configuration information includes:
monitoring a direct memory access configuration information sending request;
When the direct memory access configuration information sending request is received, copying each direct memory access configuration information corresponding to the direct memory access configuration information sending request into each data sending buffer;
And respectively acquiring the direct memory access configuration information in each data transmission buffer according to the head address of each data transmission buffer.
In another aspect, the receiving, through the second communication bus, each of the target data transmitted by the processor of the accelerator side includes:
monitoring a target data receiving request;
when the target data receiving request is received, determining the size of each target data according to the target data receiving request;
determining a target data receiving buffer memory in all the data receiving buffer memories according to the size of each target data, the buffer memory size of each data receiving buffer memory and the use state;
And acquiring the head address of each target data receiving buffer, and storing each target data into the corresponding target data receiving buffer according to that head address.
In another aspect, the micro controller of the accelerator side caches each direct memory access configuration information, including:
The microcontroller at the accelerator end stores each piece of direct memory access configuration information into a corresponding cache queue respectively;
And the microcontroller at the accelerator end sets a state flag of the cache queue storing the direct memory access configuration information to indicate that the cache queue is occupied.
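The cache-queue bookkeeping in this aspect can be sketched as a minimal Python model; the queue count, list representation, and boolean state flags are illustrative choices, since the patent does not fix a concrete data structure.

```python
class ConfigQueueBank:
    """Fixed bank of cache queues, one DMA configuration entry per queue.

    A set flag marks a queue as occupied; clearing it frees the slot.
    The flag encoding and per-queue lists are illustrative assumptions."""

    def __init__(self, num_queues: int):
        self.queues = [[] for _ in range(num_queues)]
        self.flags = [False] * num_queues

    def store(self, index: int, config: dict) -> None:
        self.queues[index].append(config)
        self.flags[index] = True       # set state flag: queue occupied

    def release(self, index: int) -> None:
        self.flags[index] = False      # reset the state flag
        self.queues[index].clear()     # empty all data in the queue

    def all_released(self) -> bool:
        return not any(self.flags)
```

Storing an entry and setting its flag in one step keeps the occupancy information consistent with the queue contents.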
In another aspect, the microcontroller at the accelerator side sends each direct memory access configuration information to the processor at the accelerator side, including:
The microcontroller at the accelerator end obtains each piece of direct memory access configuration information based on each piece of cache queue;
and the microcontroller at the accelerator end sends each piece of direct memory access configuration information to the processor at the accelerator end through a third communication bus.
On the other hand, after the microcontroller at the accelerator side sends each direct memory access configuration information to the processor at the accelerator side through the third communication bus, the method further includes:
The microcontroller at the accelerator end monitors transmission completion information transmitted by the processor at the accelerator end through a third communication bus, wherein the transmission completion information characterizes the completion of the target data transmission corresponding to the direct memory access configuration information;
And when receiving the transmission completion information, the microcontroller at the accelerator end resets the state mark of the buffer queue corresponding to the transmission completion information and empties all data in the buffer queue with the state mark reset.
On the other hand, after the microcontroller at the accelerator side sends each direct memory access configuration information to the processor at the accelerator side through the third communication bus, the method further includes:
the microcontroller at the accelerator end monitors the state marks of the cache queues and judges whether the state marks of all the cache queues are reset;
And if the microcontroller at the accelerator side confirms that all the state marks of the cache queues are reset, the microcontroller at the accelerator side sets a transmission completion address register.
On the other hand, if the microcontroller at the accelerator end confirms that the status flag of the target cache queue is not reset, the method further comprises:
The microcontroller at the accelerator end judges whether the duration time of setting the state mark of the target cache queue is larger than a preset time threshold value;
If the microcontroller at the accelerator end confirms that the duration of setting of the state flag of the target cache queue is greater than a preset time threshold, outputting alarm information representing failure of transmission of the target data corresponding to the target cache queue;
deleting the direct memory access configuration information in the target cache queue, and resetting the state mark of the target cache queue.
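The monitoring logic of the two aspects above — the timeout alarm with queue cleanup, and the all-flags-reset check that gates the transmission completion address register — can be sketched as one polling pass; the list representation and all names are hypothetical.

```python
def poll_queue_flags(flags, queues, set_time, now, timeout_s, alarm):
    """One monitoring pass over the cache-queue state flags.

    flags:    list[bool], state flag per queue (True = set/occupied)
    queues:   list[list], the buffered DMA configuration entries
    set_time: list[float], time at which each flag was last set
    Queues whose flag has stayed set longer than `timeout_s` are reported
    through `alarm`, emptied, and reset.  Returns True once every flag is
    reset, i.e. when the transmission completion address register would
    be set."""
    for i, occupied in enumerate(flags):
        if occupied and now - set_time[i] > timeout_s:
            alarm(f"transfer of target data for cache queue {i} failed")
            queues[i].clear()          # delete the DMA config info in the queue
            flags[i] = False           # reset the state flag
    return not any(flags)              # all reset -> set completion register
```

The host side only has to watch the single completion register instead of every queue, which keeps the in-band traffic minimal.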
In another aspect, the method further comprises:
monitoring a transmission completion address register of the microcontroller at the accelerator end, and judging whether the transmission completion address register of the microcontroller at the accelerator end is set or not;
and if the transmission completion address register of the microcontroller at the accelerator side is confirmed to be set, confirming that all the current target data transmission is completed, and recycling all the caches.
In another aspect, the method further comprises:
generating a direct memory access data transmission log according to each direct memory access configuration information and the corresponding target data;
and uploading the direct memory access data transmission log to a server.
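A minimal sketch of generating and serializing such a transmission log before uploading it; the JSON layout and field names are assumptions, as the patent does not specify a log format.

```python
import json
import time

def build_dma_log(configs, targets) -> str:
    """Pair each DMA configuration entry with its received target data.

    `configs` are dicts with src/dst/length; `targets` are the byte
    payloads received for them.  Returns one JSON document suitable for
    uploading to a server (format is an illustrative choice)."""
    entries = []
    for cfg, data in zip(configs, targets):
        entries.append({
            "src": hex(cfg["src"]),
            "dst": hex(cfg["dst"]),
            "length": cfg["length"],
            "bytes_received": len(data),   # sanity check against `length`
            "timestamp": time.time(),
        })
    return json.dumps(entries)
```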
In order to solve the technical problem, the invention also provides another data transmission method which is applied to the accelerator terminal, wherein the method comprises the following steps:
Receiving each piece of direct memory access configuration information sent by a microcontroller, wherein the direct memory access configuration information comprises a source address, a destination address and a transmission length;
Calling a direct memory access mover based on each direct memory access configuration information and the direct memory access controller, and acquiring target data corresponding to each direct memory access configuration information from a memory through the direct memory access mover;
and sending each target data to the host side through a second communication bus.
To solve the above technical problem, the present invention also provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the data transmission method described above.
In order to solve the above technical problem, the present invention further provides a data transmission device, including:
A memory for storing a computer program;
and the processor is used for realizing the steps of the data transmission method when executing the computer program.
To solve the above technical problem, the present invention further provides a computer readable storage medium on which a computer program is stored which, when executed by a processor, implements the steps of the data transmission method described above.
The data transmission method provided by the invention acquires each piece of direct memory access configuration information, where the configuration information includes a source address, a destination address, and a transmission length. Each piece of configuration information is sent to the microcontroller at the accelerator side through a first communication bus, so that the microcontroller caches it and forwards it to the processor at the accelerator side. The processor calls a direct memory access mover based on the configuration information and a direct memory access controller, acquires the target data corresponding to each piece of configuration information from the memory at the accelerator side through the mover, and transmits each piece of target data back through a second communication bus.
The advantage of the invention is that a microcontroller is arranged at the accelerator side to receive the DMA configuration information sent by the host side and forward it to the processor at the accelerator side. With this out-of-band configuration, multiple pieces of DMA configuration information can be received at once, so there is no need to wait for one configuration transfer to complete before initiating the next, which effectively improves data transmission efficiency. At the same time, because the accelerator side is controlled by the microcontroller, the processor is responsible only for data transmission and computation, which reduces the difficulty of developing the processor logic. In addition, the scheme transmits the DMA configuration information over the first communication bus and the corresponding target data over the second communication bus, so DMA operations can be executed in real time and the real-time performance of data transmission is improved.
Furthermore, by pre-applying for memory, the host side applies for the data sending caches and data receiving caches before DMA data transmission begins, which avoids frequent memory applications during transmission and improves system stability. The microcontroller at the accelerator side stores each piece of direct memory access configuration information in a corresponding cache queue and sets a state flag on that queue to indicate that it is occupied, so the DMA configuration information is stored in an orderly way. The microcontroller also monitors, over the third communication bus, the transmission completion information sent by the processor at the accelerator side; when it receives that information, it resets the state flag of the corresponding cache queue and empties all data in that queue, thereby saving the microcontroller's storage space.
In addition, the invention also provides a computer program product, data transmission equipment and a medium, and the effects are the same as the above.
Drawings
For a clearer description of the embodiments of the present invention, the drawings required by the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present invention; other drawings may be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of a data transmission system according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data transmission method applied to a host according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a cache queue according to an embodiment of the present invention;
fig. 4 is a flowchart of a data transmission method applied to an accelerator according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a data transmission device applied to a host according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a data transmission device applied to an accelerator according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of a data transmission device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without making any inventive effort are within the scope of the present invention.
The core of the invention is to provide a data transmission method, a product, equipment and a medium, so as to solve the problem that the current DMA transmission mode needs to wait for completion of one transmission to initiate the next transmission and has low data transmission efficiency.
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description.
With the development of big data and cloud computing technology, data centers face ever-increasing computing demands. To increase computing power, data centers not only improve the performance of the servers themselves but also extend artificial intelligence (Artificial Intelligence, AI) accelerators, such as graphics processors (Graphics Processing Unit, GPU), general-purpose graphics processors (General-Purpose Graphics Processing Unit, GPGPU), and field programmable gate arrays (Field Programmable Gate Array, FPGA), through PCIE interfaces. In PCIE interfaces, DMA is the fastest data transmission mode: the CPU performs only transmission control and does not participate in the actual transfer, while the DMA controller copies the data between the two address spaces, effectively freeing the CPU and improving transmission efficiency.
Currently, there are two common DMA modes, block DMA and chained DMA. In a block DMA transfer, the CPU sets three elements, the source address, the destination address, and the length; the source and destination addresses correspond to a double data rate (Double Data Rate, DDR) memory address of the host and a DDR address of the device, respectively. After address translation by the input/output memory management unit (Input/Output Memory Management Unit, IOMMU) module, the DMA controller transfers the data between host and device according to the configuration information. After the transfer completes, the host is notified through a legacy interrupt (INTx) or a message signaled interrupt (Message Signaled Interrupt, MSI). However, block DMA can transfer only one contiguous physical memory block at a time, so its efficiency on discrete memory blocks is low. Chained DMA fills the transfer information of the discrete memory blocks into descriptors; the device side first reads the descriptors through DMA and then transfers the data according to the information in them. After one transfer completes, the next descriptor is read from the address recorded in the current descriptor and executed, until all descriptors have been executed. The host side then reconfigures the descriptors and initiates the next transmission. Although chained DMA solves the problem of transferring multiple data blocks, the number of blocks per transfer is limited, and the next chained DMA transfer cannot be initiated until the previous one completes. Accordingly, to solve the above problems, the present invention provides a data transmission method.
Fig. 1 is a schematic diagram of a data transmission system according to an embodiment of the present invention. It should be noted that the method provided by the present invention is applied to the host side as shown in fig. 1. The host side and the accelerator side are in communication connection through a first communication bus and a second communication bus. The accelerator end is provided with a microcontroller and a processor, the host end transmits DMA configuration information to the microcontroller through a first communication bus, and the microcontroller receives and manages the DMA configuration information and transmits the DMA configuration information to the processor. The processor completes DMA operation according to the DMA configuration information, acquires target data from the memory, and transmits the target data to the host side through the second communication bus, thereby completing DMA data transmission.
It should be noted that, in this embodiment, the specific type of the accelerator is not limited, for example, a GPU accelerator, a GPGPU accelerator, or an FPGA accelerator, and the processor in the accelerator is not limited, for example, a GPU, a GPGPU, or an FPGA may be correspondingly used, which depends on the specific implementation.
In addition, this embodiment does not limit the specific types of the first and second communication buses. For example, since the host and the accelerator are generally connected through a PCIE bus, and a PCIE interface contains a system management bus (System Management Bus, SMBus) interface in addition to the PCIE signals, the first communication bus may be the SMBus and the second communication bus the PCIE bus. The solution is described in detail below with reference to the specific method steps.
Fig. 2 is a flowchart of a data transmission method applied to a host according to an embodiment of the present invention. As shown in fig. 2, the method includes:
s10, acquiring each piece of direct memory access configuration information.
The direct memory access configuration information comprises a source address, a destination address and a transmission length.
In a specific implementation, when DMA data transmission is performed, first, each DMA configuration information is acquired at the host side. It will be appreciated that the DMA configuration information specifically includes a source address, a destination address, and a transfer length. It should be noted that, in this embodiment, the specific acquisition manner of the DMA configuration information is not limited, and depends on the specific implementation situation.
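The three elements of one piece of DMA configuration information (S10) can be modeled as a small record; the 8-byte little-endian wire encoding in `pack` is purely an illustrative assumption for transmitting the entry over the first communication bus, not a format the patent specifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DmaConfig:
    """One DMA configuration entry: the three elements set by the host."""
    src: int     # source address
    dst: int     # destination address
    length: int  # transmission length in bytes

    def pack(self) -> bytes:
        # Hypothetical wire format: three 64-bit little-endian fields.
        return b"".join(v.to_bytes(8, "little")
                        for v in (self.src, self.dst, self.length))

def unpack(raw: bytes) -> DmaConfig:
    """Inverse of pack(), as the accelerator-side microcontroller might use."""
    fields = [int.from_bytes(raw[i:i + 8], "little") for i in (0, 8, 16)]
    return DmaConfig(*fields)
```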
And S11, sending each direct memory access configuration information to a microcontroller at the accelerator side through a first communication bus so that the microcontroller at the accelerator side can buffer each direct memory access configuration information, sending each direct memory access configuration information to a processor at the accelerator side, calling a direct memory access mover by the processor at the accelerator side based on each direct memory access configuration information and the direct memory access controller, and acquiring target data corresponding to each direct memory access configuration information from a memory at the accelerator side through the direct memory access mover.
Further, the host side sends each DMA configuration information to the microcontroller of the accelerator side through the first communication bus. After receiving the DMA configuration information, the microcontroller at the accelerator side caches the DMA configuration information. In this embodiment, the specific manner of caching the DMA configuration information by the microcontroller is not limited, for example, all DMA configuration information may be stored in a unified manner in one storage space, and each DMA configuration information may also be stored separately and independently, depending on the specific implementation.
The microcontroller at the accelerator side further sends the DMA configuration information to the processor at the accelerator side. It will be appreciated that the processor at the accelerator side includes a DMA controller and a DMA mover. The DMA controller acts as a hardware module and assumes the responsibility of managing and controlling DMA transfers. It has a transfer control function capable of starting and stopping DMA transfer according to configuration parameters such as source address, destination address, and transfer length. At the same time, the DMA controller is also responsible for address management, ensuring proper access to memory by maintaining source and destination address counters. Interrupt management is also one of the important functions, and can generate an interrupt when transmission is completed or an error occurs, and notify the CPU to perform processing. In addition, the DMA controller can directly exchange data with the internal memory and the peripheral through the system bus, and the data transmission is realized without the participation of a CPU.
The DMA mover, as a functional module of the DMA controller, focuses on data handling. It copies data from a source address to a destination address and supports memory-to-memory, memory-to-peripheral, and peripheral-to-memory transfers. The DMA mover supports several data transfer modes, including single transfer, block transfer, and circular-buffer transfer, and different data widths, such as 8, 16, or 32 bits. To improve transmission efficiency, the DMA mover generally contains a first-in first-out (First In First Out, FIFO) buffer for temporarily storing data. Through these functions, the DMA mover achieves efficient data handling and reduces the burden on the CPU.
The processor at the accelerator terminal specifically calls the DMA mover based on the DMA configuration information and the DMA controller, and obtains target data corresponding to the DMA configuration information from the memory at the accelerator terminal through the DMA mover, and executes DMA operation.
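The out-of-band path of S11 — buffer every configuration entry, forward the whole batch at once, then let the mover fetch each target block — can be sketched in a few lines; `forward` stands in for the third-communication-bus send and `memory` for the accelerator-side memory, both hypothetical names.

```python
def transfer_batch(configs, forward, memory):
    """Sketch of the S11 flow at the accelerator side.

    configs: iterable of dicts with src/dst/length (the DMA config info)
    forward: callable standing in for the send to the processor
    memory:  bytes-like standing in for the accelerator-side memory
    Returns the target data corresponding to each entry."""
    cache = []
    for cfg in configs:
        cache.append(dict(cfg))        # microcontroller buffers each entry
    forward(list(cache))               # whole batch forwarded in one shot
    # Processor side: the DMA mover reads `length` bytes at each source
    # address; modeled here as a memory slice per entry.
    return [bytes(memory[c["src"]: c["src"] + c["length"]]) for c in cache]
```

Because the batch is cached first and forwarded once, no entry has to wait for the previous entry's configuration transfer to finish.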
And S12, receiving each target data transmitted by the processor of the accelerator side through the second communication bus.
Finally, after the DMA mover of the accelerator-side processor obtains the target data corresponding to each piece of DMA configuration information from the memory, each piece of target data is transmitted to the host side through the second communication bus, completing the DMA data transmission. Note that this embodiment does not limit the specific manner in which the host side receives the target data; it depends on the specific implementation.
In this embodiment, a microcontroller is arranged at the accelerator side to receive the DMA configuration information sent by the host side and forward it to the processor at the accelerator side. With this out-of-band configuration, multiple pieces of DMA configuration information can be received at once, so there is no need to wait for one configuration transfer to complete before initiating the next, which effectively improves data transmission efficiency. At the same time, because the accelerator side is controlled by the microcontroller and the processor is responsible only for data transmission and computation, the difficulty of developing the processor logic is reduced. In addition, the scheme transmits the DMA configuration information over the first communication bus and the corresponding target data over the second communication bus, so DMA operations can be executed in real time and the real-time performance of data transmission is improved.
In order to improve the stability of the host system, in some embodiments, before acquiring each direct memory access configuration information, the method further includes:
s13, applying for a preset number of caches based on the memory;
S14, setting a data sending buffer and a data receiving buffer based on all buffers;
s15, respectively recording the buffer size, the head address and the use state of each buffer.
In a specific implementation, before performing DMA data transmission, applying for a preset number of caches based on a memory of a host, and dividing all caches into two parts, wherein one part is a data transmission cache and the other part is a data receiving cache.
The data transmission buffer is a buffer for storing data to be transmitted, and the data reception buffer is a buffer for storing received data. In this embodiment, the preset number of caches is not limited, for example, a 2N-block cache may be set, and the cache is equally divided into two parts, so as to obtain an N-block data sending cache and an N-block data receiving cache.
In addition, after the data sending caches and data receiving caches are set, the cache size, head address, and use state of each cache are recorded so that the caches can be referenced during data transmission.
In this embodiment, by pre-applying for memory, the host applies for the data sending caches and data receiving caches before DMA data transmission is performed, which avoids frequent memory applications during data transmission and improves the stability of the system.
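The pre-application step can be sketched in C as follows. This is a minimal illustration only: the patent requires recording each cache's size, head address, and use state, but the struct layout, names, and the choice of 2N = 8 caches of 4096 bytes are assumptions for the example.

```c
#include <stdlib.h>

#define NUM_BUFFERS 8          /* 2N caches: N for sending, N for receiving (assumed) */
#define BUF_SIZE    4096       /* assumed cache size in bytes */

/* Bookkeeping recorded per cache: head address, size, use state. */
typedef struct {
    void  *head;               /* head (base) address of the cache */
    size_t size;               /* cache size in bytes */
    int    in_use;             /* use state: 0 = free, 1 = occupied */
} dma_buf_t;

static dma_buf_t send_bufs[NUM_BUFFERS / 2];
static dma_buf_t recv_bufs[NUM_BUFFERS / 2];

/* Apply for all caches up front, before any DMA transfer starts.
 * Returns 0 on success, -1 if an allocation fails. */
int dma_pool_init(void)
{
    for (int i = 0; i < NUM_BUFFERS / 2; i++) {
        send_bufs[i].head = malloc(BUF_SIZE);
        recv_bufs[i].head = malloc(BUF_SIZE);
        if (!send_bufs[i].head || !recv_bufs[i].head)
            return -1;
        send_bufs[i].size = recv_bufs[i].size = BUF_SIZE;
        send_bufs[i].in_use = recv_bufs[i].in_use = 0;
    }
    return 0;
}
```

Allocating once at startup means the transfer path never calls the allocator, which is the stability property the embodiment claims.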
Based on the foregoing embodiments, in some embodiments, acquiring each piece of direct memory access configuration information includes:
S101, monitoring for a direct memory access configuration information sending request;
S102, when a direct memory access configuration information sending request is received, copying each piece of direct memory access configuration information corresponding to the request into a data sending cache;
S103, respectively acquiring the direct memory access configuration information in each data sending cache according to the head address of each data sending cache.
Because the data sending caches are set up in advance of DMA data transmission, when acquiring DMA configuration information the host specifically monitors for a DMA configuration information sending request. When the host receives such a request, it copies each piece of DMA configuration information corresponding to the request into a data sending cache. It will be appreciated that the data sending caches used here should be unoccupied. When the DMA configuration information needs to be sent, the host reads it from each data sending cache according to that cache's head address, so that it can conveniently be sent to the microcontroller at the accelerator side.
Thus, acquisition of the DMA configuration information is achieved. By storing each piece of DMA configuration information in a pre-applied data sending cache, frequent memory applications during data transmission are avoided and the stability of the system is improved.
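Steps S101-S103 can be sketched as follows, under the same assumptions as before (struct names and fields are illustrative; the patent does not specify a configuration layout beyond source address, destination address, and transmission length).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative layouts only; the patent specifies the fields, not the struct. */
typedef struct { uint64_t src; uint64_t dst; uint32_t len; } dma_cfg_t;
typedef struct { void *head; size_t size; int in_use; } send_buf_t;

/* On a config sending request, copy the config into the first unoccupied
 * send cache and return its index, or -1 if all caches are occupied. */
int stage_config(send_buf_t *bufs, int n, const dma_cfg_t *cfg)
{
    for (int i = 0; i < n; i++) {
        if (!bufs[i].in_use) {
            memcpy(bufs[i].head, cfg, sizeof *cfg);  /* write at head address */
            bufs[i].in_use = 1;
            return i;
        }
    }
    return -1;  /* caller must wait for a cache to free up */
}
```

Reading the staged configuration back for transmission is then just a matter of dereferencing the recorded head address of each occupied cache.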
On the basis of the foregoing embodiments, in some embodiments, receiving, through the second communication bus, each piece of target data transmitted by the processor at the accelerator side includes:
S121, monitoring for a target data receiving request;
S122, when a target data receiving request is received, determining the size of each piece of target data according to the request;
S123, determining target data receiving caches among all data receiving caches according to the size of each piece of target data and the cache size and use state of each data receiving cache;
S124, acquiring the head address of each target data receiving cache, and storing each piece of target data into its corresponding target data receiving cache according to that head address.
Because the data receiving caches are set up in advance of DMA data transmission, when receiving target data transmitted by the accelerator side the host specifically monitors for a target data receiving request, and on receiving one it determines the size of each piece of target data from the request. To select a suitable target data receiving cache for each piece of target data, the target data receiving caches are chosen among all data receiving caches according to the size of each piece of target data and the cache size and use state of each data receiving cache. It will be appreciated that a target data receiving cache should be unused according to its use state, and its cache size should be no smaller than the size of the corresponding target data. Finally, the head address of each target data receiving cache is acquired, and each piece of target data is stored into its corresponding target data receiving cache according to that head address.
Thus, reception and storage of the target data by the host side are achieved. By storing each piece of target data in a pre-applied target data receiving cache, frequent memory applications during data transmission are avoided and the stability of the system is improved.
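The selection rule of step S123 — an unused cache whose size is at least the data size — can be sketched as a simple first-fit scan. This is one plausible policy; the patent does not mandate first-fit, and the names here are illustrative.

```c
#include <stddef.h>

typedef struct { void *head; size_t size; int in_use; } recv_buf_t;

/* Pick a target data receiving cache: it must be unused (use state) and at
 * least as large as the incoming data. Returns the index, or -1 if none fits. */
int pick_recv_buf(const recv_buf_t *bufs, int n, size_t data_size)
{
    for (int i = 0; i < n; i++)
        if (!bufs[i].in_use && bufs[i].size >= data_size)
            return i;
    return -1;
}
```

A best-fit scan (smallest adequate cache) would reduce internal fragmentation at the cost of a full pass; with equal-sized caches, as in the 2N example above, the two policies coincide.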
Based on the above embodiments, in some embodiments, caching, by the microcontroller at the accelerator side, each piece of direct memory access configuration information includes:
S131, the microcontroller at the accelerator side stores each piece of direct memory access configuration information into a corresponding cache queue;
S132, the microcontroller at the accelerator side sets the status flag of each cache queue storing direct memory access configuration information to indicate that the queue is occupied.
To better cache the DMA configuration information, in a specific implementation the microcontroller at the accelerator side stores each piece of DMA configuration information into a corresponding cache queue. Fig. 3 is a schematic diagram of the cache queues according to an embodiment of the present invention. As shown in Fig. 3, the microcontroller includes a plurality of cache queues, and each cache queue stores one piece of DMA configuration information, namely a source address, a destination address, and a transmission length. In addition, each cache queue has a corresponding status flag indicating whether it is occupied: when the status flag is set, the queue is confirmed to be occupied, and when the status flag is reset, the queue is confirmed to be idle. Therefore, after the microcontroller at the accelerator side stores each piece of DMA configuration information into a corresponding cache queue, it further sets the status flag of that queue to indicate that the queue is occupied.
In this embodiment, the microcontroller at the accelerator side stores each piece of direct memory access configuration information into a corresponding cache queue and sets the status flag of each such queue to indicate that it is occupied, so that the DMA configuration information is better stored.
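A cache queue entry as described by Fig. 3 — source address, destination address, transmission length, plus an occupancy status flag — can be sketched as below; the struct and function names are assumptions for illustration.

```c
#include <stdint.h>

/* One cache queue entry per Fig. 3: a DMA config plus a status flag. */
typedef struct {
    uint64_t src;    /* source address */
    uint64_t dst;    /* destination address */
    uint32_t len;    /* transmission length */
    int      flag;   /* status flag: 1 = set (occupied), 0 = reset (idle) */
} cache_queue_t;

/* Store a config into the first idle queue and set its status flag (S131 +
 * S132 combined). Returns the queue index, or -1 if every queue is occupied. */
int enqueue_cfg(cache_queue_t *q, int n, uint64_t src, uint64_t dst, uint32_t len)
{
    for (int i = 0; i < n; i++) {
        if (!q[i].flag) {
            q[i].src = src; q[i].dst = dst; q[i].len = len;
            q[i].flag = 1;        /* mark occupied */
            return i;
        }
    }
    return -1;
}
```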
Based on the foregoing embodiments, in some embodiments, sending, by the microcontroller at the accelerator side, each piece of direct memory access configuration information to the processor at the accelerator side includes:
S133, the microcontroller at the accelerator side acquires each piece of direct memory access configuration information from the cache queues;
S134, the microcontroller at the accelerator side sends each piece of direct memory access configuration information to the processor at the accelerator side through the third communication bus.
In order to send the DMA configuration information to the processor, in a specific implementation the microcontroller at the accelerator side acquires each piece of DMA configuration information from the cache queues and sends it to the processor at the accelerator side through the third communication bus.
It should be noted that the third communication bus in this embodiment is not limited and depends on the specific implementation; it may be, for example, a PCIe bus or a Serial Peripheral Interface (SPI) bus. In this way, transmission of the DMA configuration information from the microcontroller to the processor is achieved.
In order to save the storage space of the microcontroller, in some embodiments, after the microcontroller at the accelerator side sends each piece of direct memory access configuration information to the processor at the accelerator side through the third communication bus, the method further includes:
S135, the microcontroller at the accelerator side monitors, through the third communication bus, the transmission completion information transmitted by the processor at the accelerator side;
the transmission completion information indicates that transmission of the target data corresponding to the direct memory access configuration information is complete;
S136, when the microcontroller at the accelerator side receives the transmission completion information, resetting the status flag of the cache queue corresponding to the transmission completion information and clearing all data in that cache queue.
In a specific implementation, after the microcontroller at the accelerator side sends each DMA configuration information to the processor at the accelerator side through the third communication bus, the transmission completion information transmitted by the processor at the accelerator side may be further monitored through the third communication bus.
It should be noted that the transmission completion information indicates that the target data corresponding to a piece of DMA configuration information has finished its transfer from the processor at the accelerator side to the host side, i.e., the DMA data transmission is complete. Therefore, when the microcontroller at the accelerator side receives the transmission completion information, the corresponding target data is considered fully transmitted and the DMA configuration information in the corresponding cache queue is no longer needed; the status flag of that cache queue can be reset and all data in it cleared, releasing the queue's storage space and saving the storage space of the microcontroller.
In this embodiment, the microcontroller at the accelerator side monitors, through the third communication bus, the transmission completion information transmitted by the processor at the accelerator side; when such information is received, it resets the status flag of the corresponding cache queue and clears all data in that queue, thereby saving the storage space of the microcontroller.
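Step S136 — reset the flag and clear the queue on completion — is a one-line operation if the flag lives inside the queue entry, since zeroing the entry resets the flag and empties the data at once. The layout is the illustrative one used above, not a layout the patent prescribes.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t src, dst; uint32_t len; int flag; } cache_queue_t;

/* On receipt of transmission-completion info for queue i: reset the status
 * flag and clear all data, releasing the queue for the next transfer. */
void complete_transfer(cache_queue_t *q, int i)
{
    memset(&q[i], 0, sizeof q[i]);   /* zeroes src/dst/len and resets flag */
}
```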
To better track the state of DMA data transmission, in some embodiments, after the microcontroller at the accelerator side sends each piece of direct memory access configuration information to the processor at the accelerator side through the third communication bus, the method further includes:
S137, the microcontroller at the accelerator side monitors the status flags of the cache queues and judges whether the status flags of all cache queues have been reset; if so, step S138 is entered;
S138, the microcontroller at the accelerator side sets the transmission completion address register.
In a specific implementation, the microcontroller at the accelerator end also monitors the status flags of the cache queues and determines whether the status flags of all the cache queues are reset.
If the microcontroller at the accelerator side confirms that the status flags of all cache queues have been reset, it confirms that the target data corresponding to the DMA configuration information in all cache queues has been sent and that no DMA data is currently in transit; the microcontroller at the accelerator side then sets the transmission completion address register, indicating that the current round of DMA data transmission has ended.
If the microcontroller at the accelerator side confirms that the status flag of a target cache queue has not been reset, the target data corresponding to the DMA configuration information in that queue is considered not yet fully sent. In order to prevent a DMA transmission error from occupying microcontroller resources for a long time, if the microcontroller at the accelerator side confirms that the status flag of a target cache queue has not been reset, the method further includes:
S139, the microcontroller at the accelerator side judges whether the duration for which the status flag of the target cache queue has been set is greater than a preset time threshold; if so, step S140 is entered;
S140, outputting alarm information indicating that transmission of the target data corresponding to the target cache queue has failed;
S141, deleting the direct memory access configuration information in the target cache queue and resetting the status flag of the target cache queue.
Specifically, if the microcontroller at the accelerator side confirms that the status flag of a target cache queue has not been reset, it judges whether the duration for which that flag has been set exceeds a preset time threshold. The preset time threshold is not limited in this embodiment and depends on the specific implementation.
If the duration does not exceed the threshold, monitoring continues. If it does, transmission of the target data corresponding to the target cache queue is deemed to have failed, and alarm information indicating the failure is output so as to prompt the user to check the transmission of that target data in time; meanwhile, the DMA configuration information in the target cache queue is deleted and its status flag is reset, releasing the queue's storage space for the next DMA data transmission.
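The timeout check of step S139 can be sketched as a stateless predicate over a queue's flag and the tick at which the flag was set. The tick source and the threshold value are assumptions; the patent leaves both to the implementation.

```c
#include <stdint.h>

typedef struct {
    int      flag;        /* 1 = occupied (status flag set) */
    uint64_t set_time;    /* tick (e.g. ms) recorded when the flag was set */
} queue_state_t;

/* Returns 1 if the queue's flag has stayed set longer than the threshold,
 * in which case the transfer is treated as failed: an alarm is raised and
 * the queue is cleared (steps S140/S141). */
int transfer_timed_out(const queue_state_t *q, uint64_t now, uint64_t threshold_ms)
{
    return q->flag && (now - q->set_time) > threshold_ms;
}
```

Recording the set-time alongside the flag keeps the watchdog O(1) per queue; the microcontroller's monitor loop simply evaluates this predicate for each occupied queue on every tick.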
On the basis of the foregoing embodiments, in order to recover the DMA-related resources of the host, in some embodiments the method further includes:
S16, monitoring the transmission completion address register of the microcontroller at the accelerator side and judging whether it has been set; if so, step S17 is entered;
S17, confirming that all current target data transmission is complete, and reclaiming all caches.
In a specific implementation, the host continuously monitors the transmission completion address register of the microcontroller at the accelerator side and judges whether it has been set. It will be appreciated that the register being set indicates that all current DMA data transfers have ended.
If the register is confirmed to be unset, target data transmission is still in progress and the host continues monitoring the register. If the register is confirmed to be set, all current target data transmission is confirmed complete and all caches are reclaimed. It can be understood that the reclaimed caches are the data sending caches and data receiving caches applied for in advance from the host memory. By reclaiming all caches, the DMA-related resources of the host are recovered and host resources are saved.
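The host-side poll-and-reclaim loop can be sketched as below. This is a hypothetical sketch: the completion register is mocked as a plain memory word (in a real system it would be a mapped device register), and the cache bookkeeping reuses the illustrative layout from earlier.

```c
#include <stdlib.h>

typedef struct { void *head; int in_use; } host_buf_t;

/* Poll the transmission completion address register; once it reads as set,
 * free every pre-applied cache. Returns 1 when the caches were reclaimed,
 * 0 while transfers are still in flight. */
int reclaim_if_done(const volatile unsigned *done_reg, host_buf_t *bufs, int n)
{
    if (!*done_reg)
        return 0;                 /* register unset: keep monitoring */
    for (int i = 0; i < n; i++) {
        free(bufs[i].head);       /* return the cache to the system */
        bufs[i].head = NULL;
        bufs[i].in_use = 0;
    }
    return 1;
}
```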
In order to make the user better grasp the whole process of DMA data transfer, in some embodiments, the method further includes:
S18, generating a direct memory access data transmission log according to each piece of direct memory access configuration information and its corresponding target data;
S19, uploading the direct memory access data transmission log to a server.
Specifically, a DMA data transmission log is generated from each piece of DMA configuration information and its corresponding target data. To let a user grasp the whole DMA transmission process, the log should contain: the configuration information, such as the source address, destination address, and transmission length, defining the start point, end point, and amount of data transferred; the data transmission direction, that is, the direction of the data between the memory and the peripheral; the DMA transmission state, including state changes such as start time, end time, completion, error, or interruption; DMA transmission performance indicators such as speed, time, and efficiency; the DMA controller model, version, and related configuration parameters; and records of interrupts, exceptions, or other related events generated during the transmission. Such information helps system administrators and developers analyze and debug the DMA transmission process, optimize system performance, and resolve potential problems.
Finally, uploading the DMA data transmission logs to a server enables centralized storage and management of the logs. This allows a system administrator to monitor the DMA transmissions of multiple devices in a unified way, ensures data security, and makes it convenient to retrieve historical logs for analysis or troubleshooting. Developers or technical support personnel can analyze and debug remotely, improving work efficiency; trends in system performance can be evaluated and optimized by comparing logs from different periods or devices; the server can analyze the logs in real time to identify potential faults or anomalies and give timely warnings; compliance requirements can be met, with the logs serving as an audit basis; and log sharing and collaboration among team members are promoted. All of this helps to improve the reliability, performance, and safety of the system.
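A minimal log record covering the fields the text lists can be sketched as follows. The struct layout, field names, and textual format are illustrative assumptions; the patent specifies the information to record, not its encoding.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* One DMA log entry: addresses, length, direction, state, and timing. */
typedef struct {
    uint64_t src, dst;       /* start and end points of the transfer */
    uint32_t len;            /* amount of data transferred */
    const char *direction;   /* e.g. "mem->peripheral" */
    const char *state;       /* e.g. "complete", "error", "interrupted" */
    uint64_t start_ms, end_ms;
} dma_log_t;

/* Render one entry as a line of text suitable for upload to a log server. */
int format_log(char *out, size_t cap, const dma_log_t *e)
{
    return snprintf(out, cap,
        "src=0x%llx dst=0x%llx len=%u dir=%s state=%s dur=%llums",
        (unsigned long long)e->src, (unsigned long long)e->dst, e->len,
        e->direction, e->state,
        (unsigned long long)(e->end_ms - e->start_ms));
}
```

A flat, line-oriented format like this keeps server-side parsing and period-over-period comparison trivial, which suits the centralized-analysis uses the text describes.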
Fig. 4 is a flowchart of a data transmission method applied to an accelerator according to an embodiment of the present invention. As shown in fig. 4, the method includes:
S20, receiving each piece of direct memory access configuration information sent by the microcontroller.
The direct memory access configuration information comprises a source address, a destination address and a transmission length, and is acquired by a host end, sent to the microcontroller through a first communication bus and cached by the microcontroller.
S21, calling a direct memory access mover based on each direct memory access configuration information and the direct memory access controller, and acquiring target data corresponding to each direct memory access configuration information from a memory through the direct memory access mover.
S22, sending each target data to the host side through the second communication bus.
In this embodiment, a microcontroller is arranged at the accelerator side to receive the DMA configuration information sent by the host side and forward it to the processor at the accelerator side. With this out-of-band configuration mode, multiple pieces of DMA configuration information can be received at once, so there is no need to wait for one transfer of configuration information to complete before initiating the next, which effectively improves data transmission efficiency. Meanwhile, the accelerator side is controlled by the microcontroller to perform DMA transfers, and the processor is responsible only for data transmission and computation, which reduces the difficulty of developing the processor logic. In addition, the scheme transmits the DMA configuration information over the first communication bus while the target data corresponding to the DMA configuration information is transmitted over the second communication bus, so DMA operations can be executed in real time and the real-time performance of data transmission is improved.
In the above embodiments, the data transmission method is described in detail, and the present invention further provides corresponding embodiments of the data transmission device.
Fig. 5 is a schematic diagram of a data transmission device applied to a host according to an embodiment of the present invention. As shown in fig. 5, the apparatus includes:
A first obtaining module 10, configured to obtain each piece of direct memory access configuration information, where the direct memory access configuration information includes a source address, a destination address, and a transmission length;
The first sending module 11 is configured to send each piece of direct memory access configuration information to the microcontroller at the accelerator side through the first communication bus, so that the microcontroller at the accelerator side caches each piece of direct memory access configuration information and sends it to the processor at the accelerator side, and the processor at the accelerator side calls a direct memory access mover based on each piece of direct memory access configuration information and the direct memory access controller to acquire, through the direct memory access mover, the target data corresponding to each piece of direct memory access configuration information from the memory at the accelerator side;
A first receiving module 12, configured to receive, through a second communication bus, each of the target data transmitted by the processor of the accelerator side.
Since the embodiments of the apparatus portion and the embodiments of the method portion correspond to each other, the embodiments of the apparatus portion are referred to the description of the embodiments of the method portion, and are not repeated herein.
Fig. 6 is a schematic diagram of a data transmission device applied to an accelerator according to an embodiment of the present invention. As shown in fig. 6, the apparatus includes:
The second receiving module 13 is configured to receive each piece of direct memory access configuration information sent by the microcontroller, where the direct memory access configuration information includes a source address, a destination address and a transmission length;
A second obtaining module 14, configured to invoke a direct memory access mover based on each of the direct memory access configuration information and the direct memory access controller, and obtain, from the memory, target data corresponding to each of the direct memory access configuration information through the direct memory access mover;
and a second sending module 15, configured to send each of the target data to the host through a second communication bus.
Since the embodiments of the apparatus portion and the embodiments of the method portion correspond to each other, the embodiments of the apparatus portion are referred to the description of the embodiments of the method portion, and are not repeated herein.
Furthermore, the invention provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the data transmission method described above.
Fig. 7 is a schematic diagram of a data transmission device according to an embodiment of the present invention. As shown in fig. 7, the data transmission apparatus includes:
a memory 20 for storing a computer program;
a processor 21 for implementing the steps of the data transmission method as mentioned in the above embodiments when executing a computer program.
The data transmission device provided in this embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like.
Processor 21 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 21 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA). The processor 21 may also include a main processor for processing data in the awake state, also referred to as a Central Processing Unit (CPU), and a coprocessor, a low-power processor for processing data in the standby state. In some embodiments, the processor 21 may integrate a Graphics Processing Unit (GPU) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 21 may also include an Artificial Intelligence (AI) processor for handling computing operations related to machine learning.
Memory 20 may include one or more computer-readable storage media, which may be non-transitory. Memory 20 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 20 is at least used for storing a computer program 201, which, when loaded and executed by the processor 21, is capable of implementing the relevant steps of the data transmission method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 20 may further include an operating system 202, data 203, and the like, where the storage manner may be transient storage or permanent storage. Operating system 202 may include Windows, unix, linux, among other things. The data 203 may include, but is not limited to, data related to a data transmission method.
In some embodiments, the data transmission device may further include a display 22, an input/output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.
Those skilled in the art will appreciate that the structure shown in fig. 7 does not constitute a limitation of the data transmission device and may include more or fewer components than shown.
Finally, the invention also provides a corresponding embodiment of the computer readable storage medium. The computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps described in the above-described method embodiments (the method may be a method corresponding to the host side, a method corresponding to the accelerator side, or a method corresponding to the host side and the accelerator side).
It will be appreciated that the methods of the above embodiments, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored on a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium for performing all or part of the steps of the methods according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The data transmission method, the product, the equipment and the medium provided by the invention are described in detail. In the description, each embodiment is described in a progressive manner, and each embodiment is mainly described by the differences from other embodiments, so that the same similar parts among the embodiments are mutually referred. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section. It should be noted that it will be apparent to those skilled in the art that the present invention may be modified and practiced without departing from the spirit of the present invention.
It should also be noted that in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises that element.
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411329392.6A CN118838859B (en) | 2024-09-24 | 2024-09-24 | Data transmission method, product, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118838859A CN118838859A (en) | 2024-10-25 |
CN118838859B true CN118838859B (en) | 2024-12-10 |
Family
ID=93142969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411329392.6A Active CN118838859B (en) | 2024-09-24 | 2024-09-24 | Data transmission method, product, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118838859B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113742269A (en) * | 2021-11-03 | 2021-12-03 | 浙江国利信安科技有限公司 | Data transmission method, processing device and medium for EPA device |
CN115373810A (en) * | 2021-12-20 | 2022-11-22 | 比科奇微电子(杭州)有限公司 | Processing method and device of accelerator, storage medium and processor |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7159048B2 (en) * | 2001-12-10 | 2007-01-02 | Emulex Design & Manufacturing Corporation | Direct memory access (DMA) transfer buffer processor |
KR100958685B1 (en) * | 2005-04-01 | 2010-05-20 | 후지쯔 가부시끼가이샤 | Computer-readable recording medium recording a DMA controller, a node, a data transmission control method and a program |
CN117056258A (en) * | 2023-08-18 | 2023-11-14 | 上海思朗万维计算技术有限责任公司 | Data transmission method, device, equipment and storage medium |
CN117149070A (en) * | 2023-08-30 | 2023-12-01 | 山东云海国创云计算装备产业创新中心有限公司 | Data transmission method and solid state disk system |
Also Published As
Publication number | Publication date |
---|---|
CN118838859A (en) | 2024-10-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111813713B (en) | Data acceleration operation processing method, device, and computer-readable storage medium | |
CN106502935A (en) | FPGA heterogeneous acceleration system, data transmission method and FPGA | |
CN111061587A (en) | Communication control method, device, equipment and storage medium of I2C bus | |
CN104750554B (en) | The method and apparatus of Data Migration between a kind of virtual machine | |
JP2532191B2 (en) | Method of managing data transfer for use in a computing system having a dual-bus architecture | |
WO2023103296A1 (en) | Write data cache method and system, device, and storage medium | |
WO2017193964A1 (en) | Component upgrade method, apparatus and system | |
CN108123851A (en) | The lifetime detection method and device of main and subordinate node synchronization link in distributed system | |
CN108924008A (en) | A kind of dual controller data communications method, device, equipment and readable storage medium storing program for executing | |
EP4220375A1 (en) | Systems, methods, and devices for queue management with a coherent interface | |
CN117311896A (en) | Method and device for testing direct memory access based on virtual kernel environment | |
CN107547593B (en) | Method, device and distributed system for realizing log synchronization | |
CN118838859B (en) | Data transmission method, product, equipment and medium | |
KR20120134918A (en) | Electronic apparatus including a plurality of processors | |
CN105718396A (en) | I2C bus device and communication method for high-volume data transmission by a master device | |
US7552269B2 (en) | Synchronizing a plurality of processors | |
CN110365839B (en) | Shutdown method, shutdown device, shutdown medium and electronic equipment | |
CN118132009A (en) | Host command processing method and device, electronic equipment and storage medium | |
CN102508738B (en) | Backup method, kernel and backup kernel for service information of a multi-core processor | |
CN112084099B (en) | Method, device, equipment and storage medium for acquiring alarm state value based on host | |
CN113961489A (en) | Data access method, apparatus, device and storage medium | |
CN104932947A (en) | Barrier synchronization method and device | |
CN112003860B (en) | Memory management method, system and medium suitable for remote direct memory access | |
CN118503053B (en) | Hardware information transmission method, product, equipment and medium | |
CN109947572B (en) | Communication control method, device, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||