
CN117312202A - System on chip and data transmission method for system on chip - Google Patents

System on chip and data transmission method for system on chip

Info

Publication number
CN117312202A
CN117312202A (Application No. CN202311618762.3A)
Authority
CN
China
Prior art keywords
dma
data
processor system
programmable logic
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311618762.3A
Other languages
Chinese (zh)
Other versions
CN117312202B (en)
Inventor
朱宗志
车京运
高棋兴
陈建飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Guoli Xin'an Technology Co ltd
Original Assignee
Zhejiang Guoli Xin'an Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Guoli Xin'an Technology Co ltd filed Critical Zhejiang Guoli Xin'an Technology Co ltd
Priority to CN202311618762.3A
Publication of CN117312202A
Application granted
Publication of CN117312202B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14: Handling requests for interconnection or transfer
    • G06F13/20: Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/32: Handling requests for interconnection or transfer for access to input/output bus using combination of interrupt and burst mode transfer
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4812: Task transfer initiation or dispatching by interrupt, e.g. masked
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/544: Buffers; Shared memory; Pipes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00: Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0038: System on Chip
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)

Abstract

The present disclosure provides a system-on-chip and a data transmission method for the system-on-chip. The system-on-chip includes a processor system and a programmable logic system, and the method comprises: determining, by a DMA kernel thread running in the processor system, whether there is user data in a work queue of the processor system to be sent to the programmable logic system; in response to determining that there is user data in the work queue to be sent to the programmable logic system, initiating, by the DMA kernel thread, a DMA transfer to the programmable logic system and then suspending the DMA kernel thread; and sending, by an AXI DMA core running in the programmable logic system, the user data from a transmit buffer of the processor system to the programmable logic system, and initiating a hard-interrupt notification to a main processor of the processor system after the user data transfer completes.

Description

System on chip and data transmission method for system on chip
Technical Field
The present disclosure relates to the field of industrial control, and more particularly, to a system-on-chip and a data transmission method for the system-on-chip.
Background
Currently, embedded systems are widely used in the field of industrial control to enable control and data transmission between users and industrial equipment. In such embedded systems, programmable systems-on-chip comprising a processor and programmable logic devices are often used to give the system flexibility and high performance. For example, the ZYNQ-7000 chip manufactured by Xilinx is a programmable system-on-chip that provides the processing power and computing capability required for high-end embedded applications such as video surveillance, high-speed data acquisition, and factory automation. Such a system-on-chip consists of two parts: a processor system (PS, Processing System) and a programmable logic (PL, Programmable Logic) system. The PS and PL communicate via an interface bus.
However, because the PS and PL operate concurrently and at mismatched rates, the PL typically produces data faster than the PS can consume it, creating a data transfer bottleneck between the two parts of the system-on-chip and potentially even causing data loss or overflow.
Disclosure of Invention
In view of the above problems, the present disclosure provides a system-on-chip and a data transmission method for the same that improve data transmission efficiency between the PS and PL of the system-on-chip by configuring dedicated data buffers and descriptor tables in the system-on-chip, and by maintaining transmit and receive descriptors through a kernel work queue and a tasklet queue, respectively.
According to one aspect of the present disclosure, a data transmission method for a system-on-chip is provided, the system-on-chip including a processor system and a programmable logic system. The method comprises: determining, by a DMA kernel thread running in the processor system, whether there is user data in a work queue of the processor system to be sent to the programmable logic system; in response to determining that there is user data in the work queue to be sent to the programmable logic system, initiating, by the DMA kernel thread, a DMA transfer to the programmable logic system and then suspending the DMA kernel thread; and sending, by an AXI DMA core running in the programmable logic system, the user data from a transmit buffer of the processor system to the programmable logic system, and initiating a hard-interrupt notification to a main processor of the processor system after the user data transfer completes.
In some implementations, the data transmission method further includes: releasing, by the DMA kernel thread, occupancy of the main processor in response to determining that there is no user data in the work queue of the processor system to be sent to the programmable logic system; and/or, in response to the hard interrupt, issuing, by the main processor, a soft interrupt to reclaim the transmit buffer block corresponding to the user data in the transmit buffer of the processor system and the transmit descriptor corresponding to that transmit buffer block.
In some implementations, the data transmission method further includes: invoking the DMA kernel thread by a user process to determine whether a free transmit descriptor exists in a transmit descriptor table of the processor system; in response to the DMA kernel thread determining that a free transmit descriptor exists in the transmit descriptor table, returning, by the DMA kernel thread, information of the transmit buffer block corresponding to the free transmit descriptor to the user process, and copying, by the user process, the user data to that transmit buffer block; and suspending the user process in response to the DMA kernel thread determining that no free transmit descriptor exists in the transmit descriptor table of the processor system.
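As a rough illustration of this lookup (the descriptor layout and all names below are our own assumptions, not taken from this disclosure), the DMA kernel thread's scan for a free transmit descriptor could be sketched as:

```c
#include <stddef.h>

/* Hypothetical descriptor states; the disclosure only says a descriptor
 * records whether it is free. */
enum { DESC_FREE = 0, DESC_BUSY = 1 };

/* Scan the transmit descriptor table for a free entry. Returns its index
 * (so the caller can hand the matching buffer block's information to the
 * user process), or -1 if none is free, in which case the user process
 * would be suspended as described above. */
int find_free_tx_desc(const int *state, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (state[i] == DESC_FREE)
            return (int)i;
    return -1;
}
```

A real implementation would also mark the returned descriptor busy under a lock before handing it out; that bookkeeping is omitted here.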
In some implementations, the transmit buffer is contiguous physical memory allocated by the kernel CMA (Contiguous Memory Allocator), and a transmit descriptor table that maps one-to-one with the multiple transmit buffer blocks of the transmit buffer is built in the processor system.
In some implementations, the data transmission method further includes: packing the user data in a transmit buffer block of the transmit buffer into one data frame, wherein sending the user data from the transmit buffer of the processor system to the programmable logic system by the AXI DMA core running in the programmable logic system comprises: sending the data frame to the programmable logic system by the AXI DMA core.
In some implementations, the data transmission method further includes: dividing the user data to be transmitted into a plurality of data slices according to the data length of a data frame; copying one or more of the data slices to one or more transmit buffer blocks of the transmit buffer according to the transmit buffer block size; packing the data slice in each of the one or more transmit buffer blocks into one data frame; and treating the data frames composed of the slices belonging to the same user data as a group of data frames. In these implementations, sending the user data from the transmit buffer of the processor system to the programmable logic system by the AXI DMA core running in the programmable logic system comprises: sending the group of data frames to the programmable logic system by the AXI DMA core.
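The slicing step can be made concrete with a small sketch (our own illustration with hypothetical names; the frame data length and buffer block size are parameters):

```c
#include <stddef.h>

/* Split a payload of total_len bytes into slices of at most slice_len
 * bytes (the data length of one frame). Each slice would be copied into
 * one transmit buffer block and packed into one data frame; together the
 * frames form one group. Writes each slice length to lens[] and returns
 * the slice count, or 0 if more than max slices would be needed. */
size_t make_slices(size_t total_len, size_t slice_len,
                   size_t *lens, size_t max)
{
    size_t n = 0;
    while (total_len > 0) {
        if (n == max)
            return 0;                 /* payload does not fit */
        size_t l = total_len < slice_len ? total_len : slice_len;
        lens[n++] = l;                /* length of this frame's payload */
        total_len -= l;
    }
    return n;
}
```

For example, a 1000-byte payload with 256-byte frames yields four slices, the last one partial.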
According to another aspect of the present disclosure, there is provided a system-on-chip comprising a processor system and a programmable logic system, wherein a transmit buffer and a transmit descriptor table mapped one-to-one with the transmit buffer are allocated in the processor system. The processor system is configured to: determine, via a DMA kernel thread, whether there is user data in a work queue of the processor system to be sent to the programmable logic system; and, in response to determining that there is such user data, initiate a DMA transfer to the programmable logic system and suspend the DMA kernel thread. The programmable logic system is configured to: after the DMA transfer is initiated, send the user data from the transmit buffer to the programmable logic system through an AXI DMA core, and after the user data is sent, initiate a hard-interrupt notification to a main processor of the processor system.
In some implementations, the processor system is further configured to: release, by the DMA kernel thread, occupancy of the main processor in response to determining that there is no user data in the work queue of the processor system to be sent to the programmable logic system.
In some implementations, the processor system is further configured to: in response to a user process invoking the DMA kernel thread and the DMA kernel thread determining that a free transmit descriptor exists in the transmit descriptor table of the processor system, return, by the DMA kernel thread, information of the transmit buffer block corresponding to the free transmit descriptor to the user process so that the user process copies the user data to that transmit buffer block; and suspend the user process in response to the DMA kernel thread determining that no free transmit descriptor exists in the transmit descriptor table of the processor system.
According to yet another aspect of the present disclosure, a data transmission method for a system-on-chip is provided. The system-on-chip includes a processor system and a programmable logic system, and the method comprises: invoking, by a main processor of the processor system, a tasklet queue to determine whether a free receive descriptor exists in a receive descriptor table of the processor system; in response to determining that a free receive descriptor exists in the receive descriptor table, initiating, by a tasklet queue thread running in the processor system, a DMA receive to the programmable logic system and then immediately releasing occupancy of the main processor; and carrying, by an AXI DMA core running in the programmable logic system, device data into the receive buffer block corresponding to the free receive descriptor in a receive buffer of the processor system, and after the carrying completes, sending a notification to the main processor through a hard interrupt to report that reception of the device data is complete.
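A toy model of this receive flow (state names and the two-step completion are our own simplification): the tasklet claims the next free descriptor and starts a receive, and the hard-interrupt handler does minimal work, only marking the descriptor filled so a soft interrupt can publish it later:

```c
#include <stddef.h>

/* Illustrative receive-descriptor states. */
enum { RX_FREE = 0, RX_PENDING = 1, RX_FILLED = 2 };

/* Tasklet side: pick the next free descriptor and "program" a DMA
 * receive into its buffer block. Returns the descriptor index used, or
 * -1 if none is free (the tasklet then simply releases the CPU). */
int rx_initiate(int *state, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (state[i] == RX_FREE) {
            state[i] = RX_PENDING;   /* DMA receive in flight */
            return (int)i;
        }
    return -1;
}

/* Hard-interrupt side: the AXI DMA core finished carrying device data
 * into block idx; just record that, deferring status publication. */
void rx_complete(int *state, int idx)
{
    state[idx] = RX_FILLED;
}
```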
In some implementations, the data transmission method further includes: releasing, by the DMA kernel thread, occupancy of the main processor in response to determining that no free receive descriptor exists in the receive descriptor table of the processor system; and/or, in response to the hard interrupt, issuing, by the main processor, a soft interrupt to update the status of the free receive descriptor.
In some implementations, the data transmission method further includes: invoking the DMA kernel thread by a user process to determine whether there are available receive descriptors in a receive descriptor table of the processor system; responsive to the DMA kernel thread determining that there are available receive descriptors in a receive descriptor table of the processor system, the DMA kernel thread returning information of the available receive descriptors to the user process; the user process processes data of a receiving buffer block corresponding to the available receiving descriptor in the receiving buffer area based on the information of the available receiving descriptor; and suspending the user process in response to the DMA kernel thread determining that no receive descriptor is available in a receive descriptor table of the processor system.
In some implementations, the data transmission method further includes: packing the device data in the receive buffer block into a data frame, wherein the user process processing the data of the receive buffer block corresponding to the available receive descriptor based on the information of the available receive descriptor comprises: receiving the data frame from the receive buffer block.
In some implementations, the data transmission method further includes: parsing the received data frames, splitting a group of received data frames into individual data frames according to the preamble and/or postamble of each frame, and merging the valid data of each data frame into a user buffer for the user device to retrieve.
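To illustrate, assume a hypothetical frame layout with a one-byte 0xA5 preamble and a one-byte payload-length field (the actual frame format is device-specific and not specified here); splitting a group of frames and merging the valid data could then look like:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk a buffer holding a group of frames, split it at each preamble,
 * and concatenate the payloads into `out` (the user buffer). Returns
 * the number of merged payload bytes; stops at the first malformed or
 * truncated frame, or when `out` is full. */
size_t merge_frames(const uint8_t *buf, size_t buf_len,
                    uint8_t *out, size_t out_cap)
{
    size_t in = 0, n = 0;
    while (in + 2 <= buf_len && buf[in] == 0xA5) {
        size_t plen = buf[in + 1];
        if (in + 2 + plen > buf_len || n + plen > out_cap)
            break;
        memcpy(out + n, buf + in + 2, plen);  /* keep only the valid data */
        n  += plen;
        in += 2 + plen;
    }
    return n;
}
```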
According to yet another aspect of the present disclosure, there is provided a system-on-chip comprising a processor system and a programmable logic system, wherein a receive buffer and a receive descriptor table mapped one-to-one with the receive buffer are allocated in the processor system. The processor system is configured to: invoke, by a main processor, a tasklet queue to determine whether a free receive descriptor exists in a receive descriptor table of the processor system; and, in response to determining that a free receive descriptor exists in the receive descriptor table, initiate, by a tasklet queue thread running in the processor system, a DMA receive to the programmable logic system and then immediately release occupancy of the main processor. The programmable logic system is configured to: carry device data into the receive buffer block corresponding to the free receive descriptor in the receive buffer of the processor system through an AXI DMA core, and after the carrying completes, send a notification to the main processor through a hard interrupt to report that reception of the device data is complete.
In some implementations, the processor system is further configured to: release, by the DMA kernel thread, occupancy of the main processor in response to determining that no free receive descriptor exists in the receive descriptor table of the processor system; and/or issue, by the main processor, a soft interrupt in response to the hard interrupt to update the status of the free receive descriptor.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Drawings
The above and other objects, structures and features of the present disclosure will become more apparent upon reading the following detailed description with reference to the accompanying drawings. In the accompanying drawings, several embodiments of the present disclosure are shown by way of example and not by way of limitation. For clarity, the various features of the drawings are not drawn to scale.
Fig. 1 shows a schematic diagram of a system-on-chip according to an embodiment of the present disclosure.
Fig. 2A and 2B illustrate specific diagrams of memory of a processor system according to some embodiments of the invention.
Fig. 3 shows an exemplary flow chart of a data transmission method for a system-on-chip according to an embodiment of the invention.
Fig. 4 shows an exemplary flow chart of a data transmission method for a system-on-chip according to an embodiment of the invention.
Fig. 5 illustrates a schematic diagram of a data frame for a general mode of DMA transfer according to some embodiments of the present invention.
Fig. 6 illustrates a schematic diagram of a data frame for an efficient mode of DMA transfer according to some embodiments of the invention.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure. In some or all cases, any of the embodiments described below may be practiced without the specific design details described. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing one or more embodiments.
In describing embodiments of the present disclosure, the expression "comprising" and the like should be understood to be an open-ended inclusion, i.e., including, but not limited to. The expression "based on" should be understood as "based at least in part on". The expression "an embodiment" or "this embodiment" should be understood as "at least one embodiment". The expressions "first", "second", etc. may refer to different or the same objects. Other explicit and implicit definitions are also possible below.
Fig. 1 shows a schematic diagram of a system-on-chip 100 according to an embodiment of the present disclosure. As shown in fig. 1, the system-on-chip 100 may include a processor system 110 and a programmable logic system 120. The processor system 110 and the programmable logic system 120 are connected by an interface bus 130. The processor system 110 may include a main processor (CPU) 112 and a memory 114. Note that the components of the system-on-chip 100 relating to the embodiment of the present invention are only schematically shown in fig. 1, and the components of the system-on-chip 100 are not limited to what is shown in fig. 1.
In addition, the system on chip 100 may be connected to a user device 140 and an industrial device 150. The user device 140 may be controlled by a user to provide various control instructions, configuration data, etc. (collectively referred to herein as user data) to the system-on-chip 100 and further to the industrial device 150, and the industrial device 150 may transmit various detected data or calculation results thereof, etc. (collectively referred to herein as device data) to the system-on-chip 100 and further to the user device 140. More specifically, the user device 140 may communicate directly with the processor system 110 of the system-on-chip 100, and the industrial device 150 may communicate directly with the programmable logic system 120, for example.
In one specific example, the system-on-chip 100 may be a programmable system-on-chip such as ZYNQ-7000 from Xilinx. In this case, the processor system 110 is an ARM processor, such as a dual-core ARM Cortex-A9 processor, the programmable logic system 120 is a Field Programmable Gate Array (FPGA), such as a Xilinx 7-series FPGA, and the interface bus 130 may be a high-speed AXI (Advanced eXtensible Interface) bus. The processor system 110 and the programmable logic system 120 may perform bi-directional data transfer via the AXI bus: the processor system 110 may send control instructions, configuration data, etc. (collectively referred to herein as user data) from the user device 140 to the programmable logic system 120 and read status information in the programmable logic system 120, while the programmable logic system 120 may send calculation results, sensor data, or other processing results (collectively referred to herein as device data) back to the processor system 110.
However, in the above-described systems-on-chip, high-speed data transfer between the ARM processor and the FPGA has long been a challenge, especially in applications requiring a large amount of data exchange. For high-speed data transfer, ZYNQ provides two DMA (Direct Memory Access) modes. One is the hard-core DMA integrated into the processor system 110, which does not occupy resources of the programmable logic system 120 but requires programming DMA instructions, adding to software complexity; moreover, the processor system 110 and the programmable logic system 120 are connected through the AXI GP interface, which supports at most a 32-bit width, limiting the transfer rate in this mode. The other is to use soft-core DMA in the programmable logic system 120, where the processor system 110 and the programmable logic system 120 are connected through an AXI_HP interface, trading programmable-logic resources for higher transfer rates.
The ZYNQ-7000 series chip provides three different soft-core DMA IPs: AXI DMA, AXI CDMA (Central DMA), and AXI VDMA (Video DMA). The AXI DMA IP core (hereinafter, the AXI DMA core) is a general-purpose data transfer engine supporting a read channel (MM2S) and a write channel (S2MM), which provide high-bandwidth direct memory access between memory and AXI4-Stream peripherals. It mainly comprises a Memory-Map interface and a Stream interface; the former connects to the processor system 110, and the latter connects to IP cores in the programmable logic system 120 that have a Stream interface.
AXI DMA provides three data transfer modes: Direct Register Access, Scatter/Gather, and Cyclic DMA. The embodiments described herein consider only Direct Register Access mode, which communicates with and controls other system components through directly accessed registers. This avoids the use of FIFO (First In, First Out) buffers, so data transfer latency is low and high throughput can be achieved; in terms of resource usage it requires no additional buffers to store data, saving FPGA resources. Compared with the other transfer modes, Direct Register Access mode allows flexible configuration and control of the DMA controller, so a designer can optimize and adapt it to specific requirements; it is also simpler to implement, requires less code, and is the most widely used mode among designers. Direct Register Access mode lets an application define a single transaction between the DMA and the industrial device: the application must set the buffer address and transfer length to initiate a transfer on the corresponding channel; preset (queued) transfer instructions are not supported, and the next transfer can only be initiated after the current one ends.
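As a hedged illustration of this mode, initiating a single MM2S (memory-to-stream) transfer might be sketched as follows. The register offsets in the comments follow the Xilinx AXI DMA register map (control at 0x00, status at 0x04, source address at 0x18, length at 0x28), but the struct below is a userspace stand-in rather than real memory-mapped I/O, and all names are our own:

```c
#include <stdint.h>

/* Userspace model of the AXI DMA MM2S register block in direct register
 * mode. In this mode the driver sets the run bit, writes the buffer
 * address, then writes the length; the length write starts the transfer. */
struct axi_dma_mm2s {
    uint32_t dmacr;        /* 0x00 control: bit 0 = run/stop */
    uint32_t dmasr;        /* 0x04 status */
    uint32_t reserved[4];  /* 0x08..0x14 */
    uint32_t sa;           /* 0x18 source address (low 32 bits) */
    uint32_t reserved2[3]; /* 0x1C..0x24 */
    uint32_t length;       /* 0x28 transfer length in bytes */
};

void mm2s_start(struct axi_dma_mm2s *r, uint32_t buf_addr, uint32_t len)
{
    r->dmacr |= 1u;        /* set the run bit */
    r->sa     = buf_addr;  /* physical address of the transmit block */
    r->length = len;       /* this write kicks off the DMA */
}
```

In a real driver these would be volatile MMIO accesses with the required ordering barriers; the model only shows the programming sequence.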
Xilinx officially provides bare-metal SDK (software development kit) code for the AXI DMA core's Direct Register Access mode. The operational flow is roughly: a block of memory is allocated; the processor system 110 reads data from the programmable logic system 120 into the block, or writes data from the block to the programmable logic system 120; the user directly invokes library functions to configure the start address and transfer length so that the DMA starts the transfer on the corresponding channel; and after the DMA transfer finishes, the IP core notifies the processor system 110 through an interrupt that the data transfer has ended. Each transfer stops after completion and the next must be reconfigured, so the user has to configure the next transfer in the interrupt service routine or the main program; when the processor system 110 and the programmable logic system 120 communicate frequently, this greatly degrades overall performance and transfer efficiency.
Xilinx provides a Linux driver for the AXI DMA controller that mainly: initializes the DMA controller and configures its parameters, such as transfer mode and data width; starts and stops data transfers of the DMA controller; and registers an interrupt service function while providing a callback interface for upper-layer applications. The Linux kernel manages and controls DMA devices through the DMAEngine subsystem, which provides a general interface and abstraction layer divided into two roles, provider and client: the provider is the AXI DMA controller driver, which directly accesses registers and provides DMA channels; the client is the transmission method described in the present invention, which requests a DMA channel and combines interrupts with data buffer management to implement DMA data transfer.
Thus, in the description herein, unless specifically stated otherwise, terms from the Linux operating system are used in describing various processes and threads; those skilled in the art will appreciate that the meaning of these terms is not limited to a particular operating system, and that new or additional meanings may be given by their use in the present invention.
Because of the concurrency and rate mismatch between the processor system 110 and the programmable logic system 120, the programmable logic system 120 generates data at rates higher than the processor system 110 can handle, potentially causing data transmission bottlenecks and even data loss or overflow. To avoid this, the method of the present invention adds data buffers dedicated to data transmission and reception, together with corresponding descriptor tables, between the processor system 110 and the programmable logic system 120, balancing the difference between data generation and processing rates to ensure stable transmission and smooth processing of data.
In addition, since a DMA transfer stops after completion and the next transfer must be reconfigured, the user needs to configure the next transfer in the interrupt service routine or the main program; when the processor system 110 and the programmable logic system 120 communicate frequently, this greatly degrades overall performance and transfer efficiency.
Fig. 2A and 2B illustrate specific diagrams of the memory 114 of the processor system 110 according to some embodiments of the invention.
In practical applications, it is often necessary to encapsulate data coming in from a user to meet specific transmission requirements or protocols. This typically involves multiple data copy operations, which can affect performance. For example, when a user sends data, the data input by the user resides in user space, and transmission usually requires copying it from user space to kernel space (a first copy, or more); within kernel space, further copies may be required by the specific transmission requirements or protocol, such as copying data from one kernel buffer to another, or reorganizing and reordering it (a second copy, or more). For large data transfers, these repeated memory accesses and copy operations can seriously impact system performance. In the present invention, the kernel CMA (Contiguous Memory Allocator) is configured to allocate contiguous physical memory for high-bandwidth DMA transfers, creating a zero-copy memory space that reduces this performance overhead and improves the performance and efficiency of the system.
As shown in fig. 2A, in some embodiments according to the invention, dedicated transmit buffers 10 and receive buffers 20 may be allocated in memory 114 for data transmission between processor system 110 and programmable logic system 120, and a transmit descriptor table 30 and a receive descriptor table 40 are maintained for the transmit buffer 10 and the receive buffer 20, respectively. A dedicated transmit descriptor 32 may be maintained in the transmit descriptor table 30 for each transmit buffer block 12 in the transmit buffer 10, and a dedicated receive descriptor 42 may be maintained in the receive descriptor table 40 for each receive buffer block 22 in the receive buffer 20. The DMA kernel thread may maintain the correspondence between each transmit buffer block 12 and its transmit descriptor 32 and between each receive buffer block 22 and its receive descriptor 42. Each transmit buffer block 12 and receive buffer block 22 may be a physically contiguous memory space, and the corresponding transmit descriptor 32 or receive descriptor 42 may include the starting physical address of that memory space, the size of the buffer block, a sequence number, a status (free or not), and the like.
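A minimal sketch of such a descriptor table, carrying the fields listed above (field and function names are illustrative, not the patent's), with the one-to-one mapping that descriptor i describes block i of the contiguous region:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative descriptor: start physical address, block size,
 * sequence number, and a free flag. */
struct dma_desc {
    uint32_t phys_addr;
    uint32_t size;
    uint32_t seq;
    int      free;
};

/* Build the one-to-one mapping between buffer blocks and descriptors:
 * descriptor i describes block i of a contiguous region starting at
 * `base`, each block being `block_size` bytes. All descriptors start
 * out free. */
void init_desc_table(struct dma_desc *table, size_t n,
                     uint32_t base, uint32_t block_size)
{
    for (size_t i = 0; i < n; i++) {
        table[i].phys_addr = base + (uint32_t)i * block_size;
        table[i].size      = block_size;
        table[i].seq       = (uint32_t)i;
        table[i].free      = 1;
    }
}
```

The single-table variant of fig. 2B would only add a role field (transmit or receive) to the same structure.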
In other embodiments according to the present invention, one buffer 50 may be used instead of the transmit buffer 10 and the receive buffer 20 shown in fig. 2A. As shown in fig. 2B, a dedicated buffer 50 may be allocated in memory 114 for data transfer between processor system 110 and programmable logic system 120, and accordingly a descriptor table 60 is maintained for the buffer 50, with a dedicated descriptor 62 maintained in the descriptor table 60 for each buffer block 52 in the buffer 50. The DMA kernel thread may maintain a correspondence between each buffer block 52 and the descriptor 62. Each buffer block 52 may be a physically contiguous piece of memory space, and the descriptor 62 corresponding to the buffer block 52 may include a starting physical address of the memory space, a size of the buffer block, a sequence number, a status (whether free), a role of the buffer block 52 (for transmitting or receiving), and so on.
More specifically, during initialization of the system-on-chip 100, or more generally, prior to the initiation of the data transfer method described below, the processor system 110 may effectively manage data buffers, including memory allocation, data copying, buffer reuse, etc., to reduce data transfer latency and improve overall performance.
In some embodiments, contiguous physical memory is first allocated in the memory 114 of the processor system 110 by configuring the CMA of the kernel, to serve as buffers dedicated to DMA transfers (e.g., the single buffer 50, or the two buffers, namely the transmit buffer 10 and receive buffer 20, described above).
The allocated buffer can be used as a zero-copy memory space for DMA mapping, and large data copying between the Linux kernel space and the user space can be avoided.
Then, the dedicated buffer is divided into a plurality of buffer blocks (e.g., the buffer blocks 52, or the transmit buffer blocks 12 and receive buffer blocks 22, described above) according to a specified size. Each buffer block is a physically contiguous memory space of a fixed size, which facilitates DMA transfer.
In addition, a plurality of buffer blocks may constitute a ring buffer (RingBuffer) to facilitate cyclic writing and reading of data in the ring buffer.
Next, a descriptor table (e.g., the descriptor table 60 or the transmit descriptor table 30 and the receive descriptor table 40 described above) is built in the DMA kernel space.
The descriptor table built in the DMA kernel space is used to manage the descriptors corresponding to the buffer blocks of the buffer dedicated to DMA transfers described above. Each descriptor may contain, for example, the Start Address (starting physical address), Size, Index (sequence number) and Status of the corresponding buffer block. In addition, each descriptor may also record the Valid Length of the data currently stored in the block. The descriptor table may be organized as an array or a linked list.
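The descriptor layout described above can be sketched in C. This is an illustrative user-space model, not the actual driver's definition: the field and type names (`dma_desc`, `desc_table`, `NUM_BLOCKS`, etc.) are assumptions chosen to mirror the fields the text lists (Start Address, Size, Index, Status, Valid Length), and the array organization is one of the two options the text mentions.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BLOCKS 8
#define BLOCK_SIZE (64 * 1024)  /* 64 KB buffer block, as in the example above */

enum blk_status { BLK_FREE = 0, BLK_BUSY = 1 };

/* One DMA descriptor: mirrors the fields listed in the text. */
struct dma_desc {
    uint64_t start_addr;   /* starting physical address of the buffer block */
    uint32_t size;         /* size of the buffer block */
    uint32_t index;        /* sequence number */
    uint32_t status;       /* BLK_FREE / BLK_BUSY */
    uint32_t valid_len;    /* valid data length currently stored */
};

/* Descriptor table organized as a plain array; the text notes that a
 * linked list is equally possible. */
struct desc_table {
    struct dma_desc desc[NUM_BLOCKS];
};

/* Build the one-to-one mapping: descriptor i <-> buffer block i, where
 * 'base' stands in for the starting physical address of the CMA region. */
static void desc_table_init(struct desc_table *t, uint64_t base)
{
    for (uint32_t i = 0; i < NUM_BLOCKS; i++) {
        t->desc[i] = (struct dma_desc){
            .start_addr = base + (uint64_t)i * BLOCK_SIZE,
            .size = BLOCK_SIZE,
            .index = i,
            .status = BLK_FREE,
            .valid_len = 0,
        };
    }
}

/* Look up a buffer block purely by descriptor sequence number. */
static uint64_t block_addr(const struct desc_table *t, uint32_t index)
{
    return t->desc[index].start_addr;
}
```

Because the blocks are equal-sized slices of one contiguous region, the sequence number alone identifies the DMA-mapped memory area, which is the mapping property the next paragraphs rely on.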
Finally, in the DMA kernel space, a mapping relationship between each descriptor and each buffer block (i.e. DMA mapped memory) may be established.
By associating each descriptor with the corresponding DMA-mapped memory space in the DMA kernel, the corresponding buffer block, i.e., the DMA-mapped memory area, can be found simply from the sequence number of the descriptor. In addition, this one-to-one mapping can easily be maintained and updated in the DMA kernel space, ensuring that the correct memory area is accessed during DMA transfers.
In addition, in embodiments of the invention, after a read or write operation is performed on a buffer block, the descriptor corresponding to that buffer block can be updated and maintained.
Specifically, when there is user data to be written into the buffer (i.e., when there is user data to be transmitted), an interface may be invoked to determine whether a writable buffer block exists in the buffer, so that the user data can be written into that block and the information of the corresponding DMA descriptor and the position of the write pointer updated accordingly. In the case of a ring of buffer blocks, if the write pointer reaches the end of the buffer, it wraps back to the start of the buffer. Each transmission writes one buffer block corresponding to one DMA descriptor, after which the next descriptor is found to continue the transmission. By maintaining read and write pointers, data can be written to and read from the ring buffer cyclically, avoiding the cost of multiple data copies and improving the efficiency and performance of data transmission.
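The write-pointer wrap-around and descriptor update described above can be modelled with a small user-space ring of buffer blocks. This is a minimal sketch with hypothetical names and tiny sizes, not the driver's implementation; real blocks would be DMA-mapped CMA memory rather than an ordinary array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBLK 4
#define BLKSZ 16  /* tiny block size, just for illustration */

/* Minimal ring of buffer blocks with per-block state. */
struct ring {
    uint8_t  data[NBLK][BLKSZ];
    uint32_t valid_len[NBLK];
    int      busy[NBLK];   /* 0 = free, 1 = holds data awaiting DMA */
    uint32_t wr;           /* write pointer (block index) */
    uint32_t rd;           /* read pointer  (block index) */
};

/* Try to write one chunk of user data into the block under the write
 * pointer; returns 0 on success, -1 if no writable block is available. */
static int ring_write(struct ring *r, const void *src, uint32_t len)
{
    if (len > BLKSZ || r->busy[r->wr])
        return -1;                       /* would overrun unread data */
    memcpy(r->data[r->wr], src, len);
    r->valid_len[r->wr] = len;           /* update the descriptor info */
    r->busy[r->wr] = 1;
    r->wr = (r->wr + 1) % NBLK;          /* wrap back to the start */
    return 0;
}

/* Consume one block (e.g. after DMA completion) and free it. */
static int ring_read(struct ring *r, void *dst, uint32_t *len)
{
    if (!r->busy[r->rd])
        return -1;                       /* nothing to read */
    *len = r->valid_len[r->rd];
    memcpy(dst, r->data[r->rd], *len);
    r->busy[r->rd] = 0;
    r->rd = (r->rd + 1) % NBLK;
    return 0;
}
```

The modulo arithmetic in `ring_write` and `ring_read` is the wrap-around behavior the text describes: once the pointer passes the last block it loops back to block 0, so the same fixed set of blocks is reused indefinitely without further allocation or copying.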
In the following description, user data transmission and device data reception are described from the viewpoint of the processor system 110. For convenience, the transmit buffer 10/transmit buffer blocks 12, receive buffer 20/receive buffer blocks 22, transmit descriptor table 30/transmit descriptors 32, and receive descriptor table 40/receive descriptors 42 shown in fig. 2A are used as an example. Those skilled in the art will appreciate that they may be equivalently replaced with the buffer 50/buffer blocks 52 and descriptor table 60/descriptors 62 shown in fig. 2B without affecting the inventive concept.
Fig. 3 illustrates an exemplary flow chart of a data transmission method 300 for the system-on-chip 100 according to an embodiment of the invention. More specifically, the data transmission method 300 is a process in which the system-on-chip 100 transmits user data from the processor system 110 to the programmable logic system 120.
The data transmission method 300 may include, in time order, a process 310 in which the user device 140 transfers data to the transmit buffer 10 and a process 320 in which the processor system 110 transmits the data to the programmable logic system 120. The processes 310 and 320 are linked by the transmit buffer 10 and the corresponding transmit descriptor table 30 and may operate asynchronously, i.e., they are executed independently by different processes or threads.
Here, to make the AXI DMA core convenient for the user to use and control, the present invention provides a corresponding DMA device driver. The driver is responsible for managing the initialization, configuration and scheduling of DMA transfers, and provides a concise interface through which user-space programs implement data communication by operating on the device node.
Specifically, the DMA device driver (i.e., the DMA kernel driver) may pre-create a kernel work queue (workqueue) thread and a kernel tasklet thread for user data transmission and device data reception, respectively, between the processor system 110 and the programmable logic system 120.
In this case, a hard interrupt notification is generated after the DMA client initiates a DMA send or receive transfer. The DMA interrupt is taken over by the DMA device driver: the interrupt service function is responsible only for the necessary register operations, while the real processing is deferred to the soft-interrupt tasklet.
As shown in fig. 3, process 310 may include a block 312 in which a DMA kernel interface is invoked by a user process to determine by a DMA kernel thread whether there is an idle transmit descriptor in transmit descriptor table 30 of processor system 110. For example, the user device 140 may invoke a "get idle descriptor" interface of the DMA kernel to put the user process into kernel mode operation. In kernel mode, the DMA kernel thread may examine the transmit descriptor table 30 to determine the transmit descriptor 32 in which the state is "idle".
If at block 312 the DMA kernel thread determines that there is an idle transmit descriptor 32 in the transmit descriptor table 30, at block 314 the DMA kernel thread may return information of the transmit buffer block 12 corresponding to the idle transmit descriptor 32 to the user process. Specifically, the DMA kernel may return at least the starting physical address and size of the transmit buffer block 12 corresponding to the free transmit descriptor 32 to the user process.
Then, at block 316, the user process may copy the user data to the transmit buffer block 12 corresponding to the free transmit descriptor 32.
On the other hand, if at block 312 the DMA kernel thread determines that there are no free transmit descriptors 32 in the transmit descriptor table 30, then at block 318 the user process may be suspended by the operating system and enter a sleep state. Thereafter, once an idle transmit descriptor 32 becomes available, the user process will be woken up again by the operating system.
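The descriptor lifecycle behind blocks 312 to 318 (and the later reclaim at block 326) can be sketched as a small state machine. All names here are hypothetical illustrations of the flow; the real "get idle descriptor" interface is a kernel call, and sleeping/waking is done by the scheduler rather than by a return code.

```c
#include <assert.h>
#include <stdint.h>

#define NDESC 2

enum st { ST_FREE, ST_FILLED };  /* descriptor status field */

struct tx_desc { uint64_t addr; uint32_t size; enum st status; };

/* "Get idle descriptor" (blocks 312/314): scan the transmit descriptor
 * table for a FREE entry and hand its index back to the user process;
 * -1 models the case where the caller must sleep (block 318). */
static int get_idle_desc(struct tx_desc *tbl, int n)
{
    for (int i = 0; i < n; i++)
        if (tbl[i].status == ST_FREE)
            return i;
    return -1;
}

/* After the user process copied its data (block 316), mark the block
 * as holding data so the kernel thread will DMA it out. */
static void mark_filled(struct tx_desc *tbl, int i) { tbl[i].status = ST_FILLED; }

/* Soft-interrupt reclaim after the hard interrupt (block 326):
 * the block and its descriptor become reusable. */
static void reclaim(struct tx_desc *tbl, int i) { tbl[i].status = ST_FREE; }
```

Running the flow end to end shows why the user process only ever blocks when every descriptor is in flight, and why the soft-interrupt reclaim is what wakes the pipeline up again.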
The process 320 may include a block 321 in which a DMA kernel thread running in the processor system 110 determines whether there is user data in a work queue (work queue) of the processor system 110 to send to the programmable logic system 120.
If it is determined at block 321 that there is user data in the work queue of the processor system 110 to be sent to the programmable logic system 120, at block 322 a DMA transfer to the programmable logic system 120 is initiated by the DMA kernel thread, and the DMA kernel thread is then suspended. That is, immediately after determining that there is data to be sent (the transmit buffer 10 is not empty) and initiating the DMA transfer, the DMA kernel thread blocks to wait for the transfer to complete.
The AXI DMA core running in programmable logic system 120 then transmits the user data from transmit buffer 10 of processor system 110 to programmable logic system 120 at block 323. That is, direct Memory Access (DMA) of the programmable logic system 120 to the transmit buffer 10 is implemented.
After user data transmission is complete, an AXI DMA core running in programmable logic system 120 may initiate a hard interrupt notification to CPU 112 of processor system 110 at block 324. Thus, the CPU 112 can know that this data transfer is completed.
On the other hand, if it is determined at block 321 that there is no user data in the work queue of the processor system 110 to be sent to the programmable logic system 120, then at block 325 the occupation of the CPU 112 is released by the DMA kernel thread.
Further, after the AXI DMA core running in programmable logic system 120 initiates a hard interrupt to CPU 112 of processor system 110 in block 324, CPU 112 may further issue a soft interrupt in block 326 to reclaim the transmit buffer block 12 corresponding to the user data in the transmit buffer 10 of the processor system 110 and the transmit descriptor 32 corresponding to that transmit buffer block 12.
In this way, the DMA kernel uses a work queue thread together with a transmit buffer queue: it initiates a DMA transfer if the transmit buffer is not empty, and blocks if the transmit buffer is empty.
Fig. 4 illustrates an exemplary flow chart of a data transmission method 400 for the system-on-chip 100 according to an embodiment of the invention. More specifically, the data transmission method 400 is a process in which the processor system 110 of the system-on-chip 100 receives device data from the programmable logic system 120.
The data transmission method 400 may include, in time order, a process 410 in which the processor system 110 obtains device data from the programmable logic system 120 and a process 420 in which the user device 140 obtains the device data from the receive buffer 20. The processes 410 and 420 are linked by the receive buffer 20 and the corresponding receive descriptor table 40 and may operate asynchronously, i.e., they are executed independently by different processes or threads.
As shown in fig. 4, process 410 may include a block 411 in which CPU 112 invokes a DMA kernel interface to determine whether there is an idle receive descriptor 42 in receive descriptor table 40 of processor system 110.
If it is determined at block 411 that there is a free receive descriptor 42 in the receive descriptor table 40 of the processor system 110, at block 412, based on the information of the free receive descriptor 42 (e.g., the starting physical address and size of the receive buffer block 22 corresponding to that receive descriptor 42), a DMA receive may be initiated to the programmable logic system (PL) by a tasklet queue thread running in the processor system 110, which then immediately releases the CPU 112 of the processor system 110.
The AXI DMA core running in programmable logic system 120 may then, at block 413, carry the device data to the receive buffer block 22 in the receive buffer 20 of the processor system 110 corresponding to the free receive descriptor 42.
After the transfer is completed, the AXI DMA core may send a notification to the CPU 112 through a hard interrupt to report that the reception of the device data is completed.
On the other hand, if it is determined at block 411 that there are no free receive descriptors 42 in the receive descriptor table 40 of the processor system 110, at block 414, the occupation of the CPU 112 may be released by the DMA kernel thread.
Further, after the AXI DMA core running in programmable logic system 120 sends the hard interrupt to CPU 112 of processor system 110 following block 413, CPU 112 may issue a soft interrupt in block 415 to update the status of the idle receive descriptor 42. For example, the state of the idle receive descriptor 42 may be modified from "free" to "non-free".
The process 420 may include a block 422 in which the DMA kernel interface may be invoked by a user process to determine, by the DMA kernel thread, whether there is an available receive descriptor 42 in the receive descriptor table 40 of the processor system 110. Here, an available receive descriptor 42 means that there is data to be received in the corresponding receive buffer block 22, i.e., the state of the receive descriptor 42 is "non-idle".
If the DMA kernel thread determines at block 422 that there is an available receive descriptor 42 in the receive descriptor table 40 of the processor system 110, the DMA kernel thread may return information of the available receive descriptor to the user process at block 424. Specifically, the DMA kernel may return at least the starting physical address and size of the receive buffer block 22 corresponding to the available receive descriptor 42 to the user process.
At block 426, the user process may process the data of the receive buffer block 22 in the receive buffer 20 corresponding to the available receive descriptor 42, based on the information of that available receive descriptor 42.
On the other hand, if the DMA kernel thread determines at block 422 that there are no receive descriptors 42 available in the receive descriptor table 40 of the processor system 110, then at block 428 the user process may be suspended.
In this way, the DMA kernel uses a kernel tasklet thread together with a receive buffer queue for reception: the kernel tasklet queue thread first initiates a DMA receive transfer and then immediately exits to wait for the completion interrupt. When a receive-completion interrupt is generated, the kernel tasklet queue thread is scheduled after the interrupt to configure the DMA information for the next reception and initiate the next receive transfer.
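The receive-side descriptor lifecycle across blocks 411 to 428 can be sketched the same way as the transmit side. This is an illustrative model under assumed names; in the real driver the transitions are driven by the tasklet, the hard/soft interrupts, and the user process, not by plain function calls.

```c
#include <assert.h>

#define NRX 2

enum rx_st { RX_FREE, RX_NONFREE };  /* "idle" vs data-pending, as in the text */

struct rx_desc { enum rx_st status; };

/* Block 411: is there an idle descriptor whose block the tasklet can
 * hand to the AXI DMA core for the next receive? */
static int find_idle(struct rx_desc *t, int n)
{
    for (int i = 0; i < n; i++)
        if (t[i].status == RX_FREE) return i;
    return -1;
}

/* Block 415: the soft interrupt after DMA completion marks the
 * descriptor non-idle, i.e. its block now holds device data. */
static void dma_complete(struct rx_desc *t, int i) { t[i].status = RX_NONFREE; }

/* Block 422: is there an "available" descriptor (data to be read)? */
static int find_available(struct rx_desc *t, int n)
{
    for (int i = 0; i < n; i++)
        if (t[i].status == RX_NONFREE) return i;
    return -1;
}

/* Block 426: after the user process consumes the block, it is idle again. */
static void consume(struct rx_desc *t, int i) { t[i].status = RX_FREE; }
```

The two scans make the asynchrony concrete: process 410 only looks for idle descriptors, process 420 only looks for non-idle ones, so the two sides never contend for the same descriptor state.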
The method 300, by which user data from the user device 140 is transmitted from the processor system 110 to the programmable logic system 120 by DMA transfer in the system-on-chip 100, and the method 400, by which the processor system 110 receives device data from the industrial device 150 via the programmable logic system 120, are described above in connection with fig. 3 and 4, respectively. Prior to a DMA transfer, the processor system 110 may package the user data or device data to be transferred into data frames for the DMA transfer. Depending on the type of user data and device data, the DMA transfer may have different execution modes, and the corresponding execution mode may be selected by configuring the user device 140 and/or the industrial device 150.
In some cases, the amount of user data and/or device data is relatively small, or the real-time requirements are relatively high, for example in the case of instructions or configuration information. In this case, process 310 may send only one data frame to the transmit buffer 10 each time the user process invokes the DMA kernel interface, i.e., each transmit buffer block 12 stores only one data frame; and/or process 420 may receive only one data frame from the receive buffer 20 each time the user process invokes the DMA kernel interface, i.e., each receive buffer block 22 stores only one data frame. Each DMA transfer thus carries a single data frame, whose format is shown, for example, in fig. 5. This DMA transfer mode may also be referred to herein as the general mode.
Fig. 5 illustrates a schematic diagram of a data frame for a general mode of DMA transfer according to some embodiments of the present invention.
As shown in fig. 5, in addition to the valid data (user data or device data), the data frame may include a preamble field at the beginning of the frame and/or a tail field at the end of the frame, which may be predefined to specific values serving as delimiters between different data frames. The preamble and/or the tail code may further comprise a check code (e.g., a cyclic check code) for checking the data frame.
The data frame may further comprise a frame header portion, which may comprise at least a data length (an effective data length or a total length of the data frame). The total length of one data frame shown in fig. 5 is at most the buffer block size (e.g., 64 KB).
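A general-mode frame as described above (preamble, header with length, valid data, check code, tail code) can be packed and parsed as follows. The concrete byte layout, delimiter values and the additive checksum here are all assumptions for illustration; fig. 5 fixes only which fields exist, not their sizes or values, and the patent's check code may be a cyclic code rather than a sum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical delimiter values; the text only requires predefined
 * specific values, not these particular constants. */
#define PREAMBLE 0xA5A5A5A5u
#define TAILCODE 0x5A5A5A5Au

/* Frame layout (assumed): [preamble][len][payload...][checksum][tail] */
static size_t frame_pack(uint8_t *out, const uint8_t *data, uint32_t len)
{
    uint32_t sum = 0;
    memcpy(out, &(uint32_t){PREAMBLE}, 4);
    memcpy(out + 4, &len, 4);                           /* header: data length */
    memcpy(out + 8, data, len);                         /* valid data */
    for (uint32_t i = 0; i < len; i++) sum += data[i];  /* simple checksum */
    memcpy(out + 8 + len, &sum, 4);
    memcpy(out + 12 + len, &(uint32_t){TAILCODE}, 4);
    return 16 + len;
}

/* Parse and verify one frame; returns payload length, or -1 on error. */
static int frame_parse(const uint8_t *in, uint8_t *payload)
{
    uint32_t pre, len, sum, tail, calc = 0;
    memcpy(&pre, in, 4);
    if (pre != PREAMBLE) return -1;
    memcpy(&len, in + 4, 4);
    memcpy(payload, in + 8, len);
    memcpy(&sum, in + 8 + len, 4);
    memcpy(&tail, in + 12 + len, 4);
    for (uint32_t i = 0; i < len; i++) calc += payload[i];
    if (calc != sum || tail != TAILCODE) return -1;
    return (int)len;
}
```

Since the whole frame must fit in one buffer block, a real implementation would also reject any `len` larger than the block size (e.g. 64 KB) minus the fixed overhead.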
More specifically, in this case, for the method 300 shown in fig. 3, after the user device 140 copies the user data to be transmitted to the transmit buffer block 12 corresponding to the free transmit descriptor 32 in block 316, and before the DMA kernel thread initiates the DMA transfer in block 322, the processor system 110 may package the user data in the transmit buffer block 12 into a data frame (also referred to as a user data frame) as shown in fig. 5, and the AXI DMA core transmits the data frame to the programmable logic system 120 in block 323.
For the method 400 shown in fig. 4, after the AXI DMA core has carried the device data to the receive buffer block 22 in block 413, and before the user process receives the device data from the receive buffer block 22 in block 426, the processor system 110 may package the device data in the receive buffer block 22 into a data frame (also referred to as a device data frame) as shown in fig. 5, and the user device may receive the data frame from the receive buffer block 22 in block 426.
In other cases, the amount of user data and/or device data is relatively large, such as in the case of network or image data. In this case, process 310 may send multiple data frames to the transmit buffer 10 each time the user process invokes the DMA kernel interface, i.e., each transmit buffer block 12 may store multiple data frames; and/or process 420 may receive multiple data frames from the receive buffer 20 each time the user process invokes the DMA kernel interface, i.e., each receive buffer block 22 may store multiple data frames. A single DMA transfer can thus carry a plurality of data frames, whose format is shown, for example, in fig. 6. This DMA transfer mode may also be referred to herein as the efficient mode.
Fig. 6 illustrates a schematic diagram of a data frame for an efficient mode of DMA transfer according to some embodiments of the invention. The data frame shown in fig. 6 is different from the data frame shown in fig. 5 mainly in that the frame header portion of the data frame shown in fig. 6 further includes at least an index field indicating the number of the current data frame and a total length field indicating the total length of the current user data or device data.
More specifically, in this case, for the method 300 shown in fig. 3, in block 316 the user device 140 may divide the user data to be transmitted into a plurality of data slices according to the data length (valid data length) of a data frame, and copy one or more data slices into one or more transmit buffer blocks 12 of the transmit buffer 10 according to the size of the transmit buffer block 12. Before the DMA kernel thread initiates the DMA transfer in block 322, the processor system 110 may package each data slice (as valid data) in the transmit buffer blocks 12 into a data frame (also referred to as a user data frame) as shown in fig. 6, and treat the data frames composed of data slices belonging to the same user data as a set of data frames. Thereafter, in the DMA transfer of block 323, the set of data frames is sent to the programmable logic system 120 through the AXI DMA core. The programmable logic system 120 may parse each data frame according to the format shown in fig. 6 and send it to the industrial device 150 (not shown in the figures) via the ethernet interface.
For the method 400 shown in fig. 4, the device data may be a set of data frames (which may also be referred to as device data frames) as shown in fig. 6. In this case, after the AXI DMA core has carried the device data frames to the receive buffer blocks 22 in block 413, and before the user device receives the device data from the receive buffer blocks 22 in block 426, the processor system 110 may parse the received data frames, split the received set of data frames into multiple independent data frames according to the preambles and/or tail codes, and merge the valid data in each data frame into a user buffer (not shown) for retrieval by the user device 140.
In this efficient mode of DMA transfer, combining a plurality of data frames into one set of DMA data frames for transfer utilizes the DMA engine more effectively, reduces the number of DMA start and stop operations, improves the efficiency of data transfer, and reduces the load on the system. Combining and transmitting multiple data frames also reduces the number of interrupts triggered, thereby reducing the interrupt-handling overhead of the system. Because there is a certain interval between data frames in network transmission, and each data frame needs a certain transmission time, combining multiple data frames utilizes the network bandwidth more effectively, reduces transmission intervals, and improves bandwidth utilization.
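The efficient-mode split on the transmit side and merge on the receive side can be sketched together. The per-frame header here carries the index and total-length fields fig. 6 adds over fig. 5; the struct layout, the `SLICE` size and the function names are assumptions for illustration, and the preamble/tail/check fields are omitted to keep the sketch focused on slicing and reassembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLICE 4  /* tiny payload-per-frame size, just for illustration */

/* Per-frame header for the efficient mode: frame index plus the total
 * length of the whole user datum (field names are assumptions). */
struct eff_hdr { uint32_t index; uint32_t total; uint32_t len; };

/* Split 'total' bytes into SLICE-sized frames; returns the frame count.
 * Each frame = header + payload, written back to back into 'out'. */
static int split(const uint8_t *data, uint32_t total, uint8_t *out)
{
    int nframes = 0;
    for (uint32_t off = 0; off < total; off += SLICE, nframes++) {
        uint32_t len = total - off < SLICE ? total - off : SLICE;
        struct eff_hdr h = { (uint32_t)nframes, total, len };
        memcpy(out, &h, sizeof h);
        memcpy(out + sizeof h, data + off, len);
        out += sizeof h + len;
    }
    return nframes;
}

/* Merge a set of frames back into a contiguous user buffer, as the
 * receive path does before handing data to user space. */
static uint32_t merge(const uint8_t *in, int nframes, uint8_t *user_buf)
{
    uint32_t total = 0;
    for (int i = 0; i < nframes; i++) {
        struct eff_hdr h;
        memcpy(&h, in, sizeof h);
        memcpy(user_buf + h.index * SLICE, in + sizeof h, h.len);
        total = h.total;
        in += sizeof h + h.len;
    }
    return total;
}
```

Because each frame carries its own index and the overall total length, the receiver can place every slice at the right offset and knows when the whole datum is complete, which is what allows a full set of frames to move in one DMA transfer.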
With embodiments of the present disclosure, dedicated data buffers and descriptor tables are configured in the system-on-chip, and transmit and receive descriptors are maintained by the kernel work queue thread and the tasklet queue thread, respectively. Because each descriptor contains information such as the starting physical address and size of its memory space, no additional data copy is needed between the DMA-mapped memory space and the kernel space, reducing data copying between the processor system and the programmable logic system of the system-on-chip. In addition, the flow and order of data transmission can conveniently be controlled through management of the descriptor table, ensuring correct transmission and processing of the data. Further, such a design can improve the efficiency of data transfer and overall system performance, owing to the reduced data copying and increased parallelism of data transfer. Furthermore, by providing both a general mode and an efficient mode of DMA transfer, the method of the present invention can be applied to different data amounts or data types, providing higher flexibility.
Further, the present disclosure provides various example embodiments, as described and as shown in the accompanying drawings. However, the present disclosure is not limited to the embodiments described and illustrated herein, but may be extended to other embodiments as would be known or would be apparent to one of ordinary skill in the art. Reference in the specification to "one embodiment," "the embodiment," "these embodiments," or "some embodiments" means that a particular feature, structure, or characteristic described is included in at least one embodiment, and that the appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment.
Finally, although the various embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed subject matter.

Claims (16)

1. A data transmission method for a system-on-chip, the system-on-chip comprising a processor system and a programmable logic system, the method comprising:
determining, by a DMA kernel thread running in the processor system, whether there is user data in a work queue of the processor system to be sent to the programmable logic system;
initiating, by the DMA kernel thread, a DMA transfer to the programmable logic system and then suspending the DMA kernel thread, in response to determining that there is user data in the work queue of the processor system to be sent to the programmable logic system; and
transmitting, by an AXI DMA core running in the programmable logic system, the user data from a transmit buffer of the processor system to the programmable logic system, and initiating a hard interrupt notification to a main processor of the processor system after the user data transmission is completed.
2. The data transmission method according to claim 1, further comprising:
releasing, by the DMA kernel thread, occupancy of the main processor in response to determining that there is no user data in the work queue of the processor system to be sent to the programmable logic system; and/or
in response to the hard interrupt, issuing, by the main processor, a soft interrupt to reclaim a transmit buffer block corresponding to the user data in a transmit buffer of the processor system and a transmit descriptor corresponding to the transmit buffer block.
3. The data transmission method according to claim 1, further comprising:
invoking a DMA kernel interface through a user process to determine, through the DMA kernel thread, whether an idle transmit descriptor exists in a transmit descriptor table of the processor system;
responsive to the DMA kernel thread determining that there is an idle transmit descriptor in a transmit descriptor table of the processor system, the DMA kernel thread returning information of a transmit buffer block corresponding to the idle transmit descriptor to the user process, and the user process copying the user data to the transmit buffer block; and
and suspending the user process in response to the DMA kernel thread determining that no free transmit descriptor exists in a transmit descriptor table of the processor system.
4. The data transmission method of claim 1, wherein the transmit buffer is a contiguous physical memory allocated by a core CMA, and a transmit descriptor table is built in the processor system that is one-to-one mapped with a plurality of transmit buffer blocks of the transmit buffer.
5. The data transmission method according to claim 1, further comprising:
packaging user data in a transmit buffer block of the transmit buffer into a data frame, and
transmitting, by an AXI DMA core running in the programmable logic system, the user data from a transmit buffer of the processor system to the programmable logic system includes:
The data frame is sent to the programmable logic system by an AXI DMA core running in the programmable logic system.
6. The data transmission method according to claim 1, further comprising:
dividing user data to be transmitted into a plurality of data slices according to the data length of a data frame, and copying one or more data slices of the plurality of data slices to one or more transmission buffer blocks of the transmission buffer zone respectively according to the size of the transmission buffer block of the transmission buffer zone, and
packing the data slice in each of the one or more transmit buffer blocks into a data frame, and taking the data frames composed of the data slices belonging to the user data as a set of data frames,
wherein transmitting, by an AXI DMA core running in the programmable logic system, the user data from a transmit buffer of the processor system to the programmable logic system comprises:
the set of data frames is sent to the programmable logic system by an AXI DMA core running in the programmable logic system.
7. A system-on-chip includes a processor system and a programmable logic system, wherein a transmit buffer and a transmit descriptor table mapped one-to-one with the transmit buffer are allocated in the processor system,
Wherein the processor system is configured to:
determining, via a DMA kernel thread, whether there is user data in a work queue of the processor system to be sent to the programmable logic system;
in response to determining that there is user data in a work queue of the processor system to be sent to the programmable logic system, initiating a DMA transfer to the programmable logic system and suspending the DMA kernel thread, and
wherein the programmable logic system is configured to:
after the DMA transfer is initiated, send the user data from a transmit buffer of the processor system to the programmable logic system through an AXI DMA core, and initiate a hard interrupt notification to a main processor of the processor system after the user data transmission is completed.
8. The system-on-chip of claim 7, wherein the processor system is further configured to:
responsive to determining that there is no user data in the work queue of the processor system to be sent to the programmable logic system, releasing, by the DMA kernel thread, occupancy of the main processor.
9. The system-on-chip of claim 7, wherein the processor system is further configured to:
responsive to the DMA kernel thread determining that there is an idle transmit descriptor in a transmit descriptor table of the processor system, the DMA kernel thread returning information of a transmit buffer block corresponding to the idle transmit descriptor to a user process to cause the user process to copy the user data to the transmit buffer block; and
and suspending the user process in response to the DMA kernel thread determining that no free transmit descriptor exists in a transmit descriptor table of the processor system.
10. A data transmission method for a system-on-chip, the system-on-chip comprising a processor system and a programmable logic system, the method comprising:
invoking, by a main processor of the processor system, a DMA kernel interface to determine, by a DMA kernel thread, whether an idle receive descriptor exists in a receive descriptor table of the processor system;
responsive to determining that there is an idle receive descriptor in the receive descriptor table of the processor system, initiating, by a tasklet queue thread running in the processor system, a DMA receive to the programmable logic system and then immediately releasing occupancy of the main processor of the processor system;
carrying, by an AXI DMA core running in the programmable logic system, device data into a receive buffer block corresponding to the idle receive descriptor in a receive buffer of the processor system, and sending, after the carrying is completed, a notification to the main processor through a hard interrupt to report that reception of the device data is completed.
11. The data transmission method of claim 10, further comprising:
releasing, by the DMA kernel thread, occupancy of the main processor in response to determining that no idle receive descriptor exists in the receive descriptor table of the processor system; and/or
In response to the hard interrupt, the host processor issues a soft interrupt to update the status of the idle receive descriptor.
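The hard-interrupt/soft-interrupt split in claim 11 matches the familiar top-half / bottom-half pattern: the hard interrupt does the minimum (record which descriptor completed), and the deferred soft interrupt updates descriptor status. A minimal sketch, with all function names assumed for illustration:

```python
# Top-half / bottom-half sketch of claim 11's interrupt handling (illustrative).
pending = []                         # descriptors completed but not yet marked
rx_table = ["dma_in_progress", "idle"]

def hard_irq_handler(desc_index):
    """Top half: keep it short, just note which descriptor completed.
    Real code would also schedule the soft interrupt here."""
    pending.append(desc_index)

def soft_irq_handler():
    """Bottom half: update descriptor status outside hard-interrupt context."""
    while pending:
        idx = pending.pop(0)
        rx_table[idx] = "filled"     # descriptor now holds received device data

hard_irq_handler(0)                  # programmable logic signals DMA completion
soft_irq_handler()                   # deferred work runs later
print(rx_table[0])                   # prints filled
```

Deferring the status update keeps the hard-interrupt handler short, which reduces the time interrupts are blocked on the host processor.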
12. The data transmission method of claim 10, further comprising:
invoking, by a user process, the DMA kernel interface to determine whether an available receive descriptor exists in the receive descriptor table of the processor system;
in response to the DMA kernel thread determining that an available receive descriptor exists in the receive descriptor table of the processor system, returning, by the DMA kernel thread, information of the available receive descriptor to the user process;
processing, by the user process, data of a receive buffer block corresponding to the available receive descriptor in the receive buffer based on the information of the available receive descriptor; and
suspending the user process in response to the DMA kernel thread determining that no available receive descriptor exists in the receive descriptor table of the processor system.
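The user-side receive path of claim 12 can be sketched as follows: the user process asks the DMA kernel interface for an available (filled) descriptor, processes the matching buffer block, and recycles the descriptor to the idle state. All names here are illustrative assumptions, not the patent's identifiers.

```python
# Illustrative sketch of the claim-12 user receive path.
AVAILABLE, IDLE = "filled", "idle"

def get_available_rx(rx_table):
    """DMA kernel thread: return index of a filled descriptor, or None (suspend)."""
    for i, state in enumerate(rx_table):
        if state == AVAILABLE:
            return i
    return None

def user_receive(rx_table, rx_buffers):
    idx = get_available_rx(rx_table)
    if idx is None:
        return None                  # kernel would suspend the user process here
    data = bytes(rx_buffers[idx])    # process the corresponding receive buffer block
    rx_table[idx] = IDLE             # descriptor becomes reusable for the next DMA
    return data

rx_table = [IDLE, AVAILABLE]
rx_buffers = [bytearray(b""), bytearray(b"device-data")]
print(user_receive(rx_table, rx_buffers))   # prints b'device-data'
print(user_receive(rx_table, rx_buffers))   # prints None (no filled descriptor left)
```

Recycling the descriptor after processing is what closes the loop with claim 10: the idle descriptor can then be picked up again by the DMA kernel thread for the next receive.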
13. The data transmission method of claim 12, further comprising:
packaging the device data in the receive buffer block into a data frame,
wherein the processing, by the user process, of data of the receive buffer block corresponding to the available receive descriptor in the receive buffer based on the information of the available receive descriptor comprises:
receiving the data frame from the receive buffer block.
14. The data transmission method of claim 12, further comprising:
parsing the received data frames, splitting a group of received data frames into a plurality of independent data frames according to preambles and/or tail codes of the data frames, and merging the valid data of each data frame into a user buffer for retrieval by the user equipment.
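The frame splitting of claim 14 can be illustrated with a simple marker-based parser: frames are delimited by a preamble and a tail code, and the payloads are concatenated into one user buffer. The two-byte marker values below are made up for the example and are not defined by the patent; a real framing scheme would also have to handle marker bytes appearing inside payloads (e.g. by escaping), which this sketch ignores.

```python
# Illustrative parser for claim 14: split a received byte stream into independent
# frames by preamble/tail code, then merge the valid data into a user buffer.
PREAMBLE = b"\xAA\x55"   # assumed preamble marker (hypothetical value)
TAIL = b"\x55\xAA"       # assumed tail-code marker (hypothetical value)

def split_frames(raw):
    """Split a received byte stream into independent frames by preamble/tail code."""
    frames, pos = [], 0
    while True:
        start = raw.find(PREAMBLE, pos)
        if start < 0:
            break
        end = raw.find(TAIL, start + len(PREAMBLE))
        if end < 0:
            break
        # the valid data is the payload between the two markers
        frames.append(raw[start + len(PREAMBLE):end])
        pos = end + len(TAIL)
    return frames

def merge_to_user_buffer(raw):
    """Merge the valid data of every frame into one user buffer."""
    return b"".join(split_frames(raw))

stream = PREAMBLE + b"abc" + TAIL + PREAMBLE + b"def" + TAIL
print(split_frames(stream))          # prints [bytearray-free list: b'abc', b'def']
print(merge_to_user_buffer(stream))  # prints b'abcdef'
```

The merged buffer is what the user equipment would retrieve, per the claim.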
15. A system-on-chip comprising a processor system and a programmable logic system, wherein a receive buffer, and a receive descriptor table mapped one-to-one to the receive buffer, are allocated in the processor system,
Wherein the processor system is configured to:
invoking, by a host processor, a DMA kernel interface to determine, through a DMA kernel thread, whether an idle receive descriptor exists in a receive descriptor table of the processor system; and in response to determining that an idle receive descriptor exists in the receive descriptor table of the processor system, initiating, by a tasklet queue thread running in the processor system, a DMA receive to the programmable logic system and then immediately releasing occupancy of the host processor of the processor system, and wherein the programmable logic system is configured to:
transfer device data, through an AXI DMA core, into a receive buffer block corresponding to the idle receive descriptor in a receive buffer of the processor system, and, after the transfer is complete, send a notification to the host processor through a hard interrupt to report that reception of the device data is complete.
16. The system-on-chip of claim 15, wherein the processor system is further configured to:
releasing, by the DMA kernel thread, occupancy of the host processor in response to determining that no idle receive descriptor exists in the receive descriptor table of the processor system; and/or
in response to the hard interrupt, issuing, by the host processor, a soft interrupt to update the status of the idle receive descriptor.
CN202311618762.3A 2023-11-30 2023-11-30 System on chip and data transmission method for system on chip Active CN117312202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311618762.3A CN117312202B (en) 2023-11-30 2023-11-30 System on chip and data transmission method for system on chip

Publications (2)

Publication Number Publication Date
CN117312202A true CN117312202A (en) 2023-12-29
CN117312202B CN117312202B (en) 2024-03-01

Family

ID=89250348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311618762.3A Active CN117312202B (en) 2023-11-30 2023-11-30 System on chip and data transmission method for system on chip

Country Status (1)

Country Link
CN (1) CN117312202B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070073923A1 (en) * 2005-09-29 2007-03-29 Kiran Vemula DMA descriptor management mechanism
CN108132897A (en) * 2017-12-13 2018-06-08 天津津航计算技术研究所 A kind of SRIO controllers based on the soft core of ZYNQ platforms
CN108388528A (en) * 2017-02-03 2018-08-10 英特尔公司 Hardware based virtual machine communication
CN110083571A (en) * 2019-03-27 2019-08-02 中国计量大学上虞高等研究院有限公司 A kind of distribution real-time storage device and its data transmission method
CN110958189A (en) * 2019-12-05 2020-04-03 中国电子科技集团公司第五十四研究所 Multi-core FPGA network processor
CN113741987A (en) * 2021-08-24 2021-12-03 重庆金美通信有限责任公司 A low-latency receiving method of FPGA data under Linux system
CN115665277A (en) * 2022-10-18 2023-01-31 山东云海国创云计算装备产业创新中心有限公司 Data receiving and transmitting method, system, storage medium and equipment based on ZYNQ platform
CN116429259A (en) * 2023-03-29 2023-07-14 北京信息科技大学 A Fiber Bragg Grating Spectrum Processing System
CN116471242A (en) * 2023-05-23 2023-07-21 江苏华创微系统有限公司 RDMA-based transmitting end, RDMA-based receiving end, data transmission system and data transmission method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
吴汶泰; 詹璨铭: "Optimization of AXI-bus data transmission software based on Zynq" (基于Zynq的AXI总线数据传输软件优化), 通信技术 (Communications Technology), no. 07, 10 July 2017 (2017-07-10)
谭景甲; 何乐生; 王俊; 朱绪东: "Design of AXI DMA data transmission and display based on the Zedboard platform" (基于Zedboard平台AXI DMA数据传输与显示的设计), 电视技术 (Video Engineering), no. 06, 5 June 2018 (2018-06-05)

Also Published As

Publication number Publication date
CN117312202B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
US7124207B1 (en) I2O command and status batching
CN103793342B (en) Multichannel direct memory access (DMA) controller
US10331595B2 (en) Collaborative hardware interaction by multiple entities using a shared queue
US8521934B1 (en) Multi-port context-based host controller
US7797445B2 (en) Dynamic network link selection for transmitting a message between compute nodes of a parallel computer
CN113468084B (en) A Multi-mode DMA Data Transmission System
US20090248934A1 (en) Interrupt dispatching method in multi-core environment and multi-core processor
CN102841871B (en) Pipeline read-write method of direct memory access (DMA) structure based on high-speed serial bus
CZ9400549A3 (en) Computer system having several media
KR100309189B1 (en) System input/output interface design for scaling
US9390036B2 (en) Processing data packets from a receive queue in a remote direct memory access device
US11341087B2 (en) Single-chip multi-processor communication
CN102841870B (en) General direct memory access (DMA) structure based on high-speed serial bus and pre-read method
CN112131176B (en) FPGA (field programmable Gate array) quick local reconstruction method based on PCIE (peripheral component interface express)
CN104820582A (en) Realization method of multicore embedded DSP (Digital Signal Processor) parallel programming model based on Navigator
CN114662136A (en) A high-speed encryption and decryption system and method of multi-algorithm IP core based on PCIE channel
US5923852A (en) Method and system for fast data transmissions in a processing system utilizing interrupts
CN111290983A (en) USB transmission equipment and transmission method
US7216186B2 (en) Controlling data flow between processor systems
US10545697B1 (en) Reverse order request queueing by para-virtual device drivers
US20240233066A1 (en) Kernel optimization and delayed execution
CN117312202B (en) System on chip and data transmission method for system on chip
CN117931481A (en) Method for quickly exchanging data between real-time system and time-sharing system
KR102260819B1 (en) Multiprocessor interrupt signal processing device
CN116974736A (en) Equipment virtualization method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: System on Chip and Data Transmission Methods for System on Chip

Granted publication date: 20240301

Pledgee: Hangzhou Bank Co.,Ltd. Zhejiang Free Trade Zone Hangzhou Jiangnan Branch

Pledgor: ZHEJIANG GUOLI XIN'AN TECHNOLOGY Co.,Ltd.

Registration number: Y2024330001630