
CN116347094B - Computing system, video encoding method and data processing module - Google Patents


Info

Publication number
CN116347094B
CN116347094B (granted publication of application CN202310315385.XA)
Authority
CN
China
Prior art keywords: video, CPU, encoding, encoded, hardware
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310315385.XA
Other languages
Chinese (zh)
Other versions
CN116347094A (en)
Inventor
张献涛
任晋奎
张振祥
文敢
闵洪波
赵政辉
刘翔
徐达维
江海涛
田巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202310315385.XA priority Critical patent/CN116347094B/en
Publication of CN116347094A publication Critical patent/CN116347094A/en
Application granted granted Critical
Publication of CN116347094B publication Critical patent/CN116347094B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application provide a computing system, a video encoding method, and a data processing module. In these embodiments, the video encoding function is offloaded from the CPU of the host to the data processing module, which reduces the computational load on the host CPU and thus the probability that the host CPU reaches a performance bottleneck. In addition, the hardware processing unit in the data processing module accesses the memory of the host through a DMA engine, which speeds up retrieval of the video to be encoded and thereby improves the efficiency of the subsequent video encoding.

Description

Computing system, video encoding method and data processing module
Technical Field
The present application relates to the field of computer technologies, and in particular, to a computing system, a video encoding method, and a data processing module.
Background
Video encoding converts a file in an original video format into a file in another video format by means of compression. This reduces redundant information in the video and hence the bandwidth required to transmit it. Video encoding algorithms are complex: if the Central Processing Unit (CPU) of a computing device performs the encoding directly in software, it consumes substantial CPU and memory resources, which easily drives the CPU of the computing device to a performance bottleneck.
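The redundancy-removal idea behind video compression can be illustrated with a toy inter-frame delta codec. This sketch is purely illustrative and is not the codec used by the patent; real encoders (H.264, H.265, etc.) use motion-compensated prediction and entropy coding, but the principle of storing a key frame plus per-frame residuals is the same.

```python
# Illustrative only (not the patent's codec): inter-frame delta coding
# removes temporal redundancy, the core idea behind video compression.

def delta_encode(frames):
    """Encode a frame sequence as the first frame plus per-frame deltas."""
    if not frames:
        return []
    encoded = [list(frames[0])]          # key frame stored as-is
    for prev, cur in zip(frames, frames[1:]):
        encoded.append([c - p for p, c in zip(prev, cur)])  # residual frame
    return encoded

def delta_decode(encoded):
    """Reconstruct the original frames from the key frame plus deltas."""
    if not encoded:
        return []
    frames = [list(encoded[0])]
    for delta in encoded[1:]:
        frames.append([p + d for p, d in zip(frames[-1], delta)])
    return frames

frames = [[10, 10, 10], [10, 11, 10], [10, 11, 12]]
assert delta_decode(delta_encode(frames)) == frames
```

Even in this toy form, the encode loop touches every pixel of every frame, which hints at why pure software encoding on the host CPU consumes so many resources.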
Disclosure of Invention
Aspects of the present application provide a computing system, a video encoding method, and a data processing module for offloading video encoding from the host processor, reducing the probability that the CPU of the computing device reaches a performance bottleneck.
In a first aspect, an embodiment of the present application provides a computing system including a host and a data processing module, where the host is communicatively connected to the data processing module; the data processing module includes a hardware processing unit and a central processing unit (CPU); the hardware processing unit includes a direct memory access (DMA) engine and a control plane transmission channel, and is further provided with a hardware encoder;
The host is configured to issue a video encoding request to the hardware processing unit;
the hardware processing unit is configured to pass the video encoding request to the CPU through the control plane transmission channel;
the CPU runs the firmware of the hardware encoder and, based on the video encoding request, drives the hardware encoder to act;
the hardware encoder is configured, driven by the CPU, to obtain the video to be encoded corresponding to the video encoding request from the memory of the host by using the DMA engine, and to encode the video to be encoded to obtain the corresponding video encoded data.
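The first-aspect flow can be sketched as a small object model. All class and method names below are hypothetical, chosen only to mirror the roles named in the claims (host, control plane pass-through, firmware-driven encoder, DMA fetch); this is a sketch of the control/data split, not the patent's implementation.

```python
# Hypothetical sketch of the first-aspect offload flow; names are
# illustrative, not from the patent.

class HostMemory:
    """Stands in for the host memory 102 holding raw frames."""
    def __init__(self):
        self.frames = {}

class DMAEngine:
    """Reads host memory directly, with no host-CPU copy loop."""
    def read(self, host_mem, addr):
        return host_mem.frames[addr]

class HardwareEncoder:
    """Fetches a frame via DMA and emits a stand-in elementary stream."""
    def __init__(self, dma):
        self.dma = dma
    def encode(self, host_mem, addr):
        raw = self.dma.read(host_mem, addr)
        return f"ES({raw})"

class DPUCpu:
    """Runs the encoder firmware and drives the encoder on each request."""
    def handle_request(self, encoder, host_mem, req):
        return encoder.encode(host_mem, req["addr"])

class HardwareProcessingUnit:
    """Passes requests through the control plane channel to the CPU."""
    def __init__(self, cpu, encoder):
        self.cpu, self.encoder = cpu, encoder
    def submit(self, host_mem, req):
        return self.cpu.handle_request(self.encoder, host_mem, req)

host_mem = HostMemory()
host_mem.frames[0x1000] = "frame0"
hpu = HardwareProcessingUnit(DPUCpu(), HardwareEncoder(DMAEngine()))
assert hpu.submit(host_mem, {"addr": 0x1000}) == "ES(frame0)"
```

Note how the host CPU never touches the frame data after issuing the request: the only path to the pixels is the DMA read, which is the point of the offload.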
In a second aspect, an embodiment of the present application further provides a computing system including a host and a data processing module, where the host is communicatively connected to the data processing module; the data processing module includes a hardware processing unit and a central processing unit (CPU), and the hardware processing unit includes a direct memory access (DMA) engine and a control plane transmission channel;
The host is configured to issue a video encoding request to the hardware processing unit;
the hardware processing unit is configured to pass the video encoding request to the CPU through the control plane transmission channel;
the CPU is configured, based on the video encoding request, to control the hardware processing unit to obtain the video to be encoded from the memory of the host by using the DMA engine, store the video to be encoded into the memory of the CPU by using the DMA engine, read the video to be encoded from the memory of the CPU, and encode it to obtain the corresponding video encoded data.
In a third aspect, an embodiment of the present application further provides a video encoding method applicable to a data processing module, where the data processing module is connected to a host and includes a hardware processing unit and a CPU, and the hardware processing unit includes a direct memory access (DMA) engine and a control plane transmission channel;
The method includes:
acquiring a video encoding request issued by the host;
passing the video encoding request to the CPU through the control plane transmission channel;
the CPU running the firmware of the hardware encoder and, based on the video encoding request, driving the hardware encoder to act;
the hardware encoder, driven by the CPU, obtaining the video to be encoded corresponding to the video encoding request from the memory of the host by using the DMA engine, and encoding the video to be encoded to obtain the corresponding video encoded data.
In a fourth aspect, an embodiment of the present application further provides a video encoding method applicable to a data processing module, where the data processing module is connected to a host and includes a hardware processing unit and a CPU, and the hardware processing unit includes a DMA engine and a control plane transmission channel; the method includes:
acquiring a video encoding request issued by the host;
passing the video encoding request to the CPU through the control plane transmission channel;
the CPU, based on the video encoding request, controlling the hardware processing unit to obtain the video to be encoded corresponding to the video encoding request from the memory of the host by using the DMA engine, and to store the video to be encoded into the memory of the CPU by using the DMA engine;
the CPU encoding the video to be encoded in its memory to obtain the corresponding video encoded data.
In a fifth aspect, an embodiment of the present application further provides a data processing module including a hardware processing unit and a CPU that are communicatively connected, where the hardware processing unit includes a direct memory access (DMA) engine and a control plane transmission channel and is further provided with a hardware encoder; the CPU is configured, when the data processing module is communicatively connected to a host, to execute the steps performed by the CPU in the video encoding method provided in the third aspect;
the hardware encoder is configured to execute the steps performed by the hardware encoder in the video encoding method provided in the third aspect.
In a sixth aspect, an embodiment of the present application further provides a data processing module including a hardware processing unit and a CPU that are communicatively connected, where the hardware processing unit includes a direct memory access (DMA) engine and a control plane transmission channel; the CPU is configured, when the data processing module is communicatively connected to a host, to execute the steps performed by the CPU in the video encoding method provided in the fourth aspect;
the hardware processing unit is configured to execute the steps performed by the hardware processing unit in the video encoding method provided in the fourth aspect.
In a seventh aspect, embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the video encoding method described above.
In the embodiments of the present application, the video encoding function is offloaded from the CPU of the host to the data processing module, which reduces the computational load on the host CPU and thus the probability that the host CPU reaches a performance bottleneck. In addition, the hardware processing unit in the data processing module accesses the memory of the host through the DMA engine, which speeds up retrieval of the video to be encoded and thereby improves the efficiency of the subsequent video encoding.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIGS. 1-3 are schematic diagrams illustrating a computing system according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of video encoding by using a pipelined task execution mode in a computing system according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an interactive flow of video encoding by a computing system according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another computing system according to an embodiment of the present application;
FIGS. 7a and 7b are schematic flow diagrams of a video encoding method according to an embodiment of the present application;
FIGS. 8a and 8b are schematic structural diagrams of a data processing module according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In some embodiments of the present application, the video encoding function is offloaded from the CPU of the host to the data processing module, reducing the computational load on the host CPU and the probability that it reaches a performance bottleneck. In addition, the hardware processing unit in the data processing module accesses the memory of the host through the DMA engine, which speeds up retrieval of the video to be encoded and thereby improves the efficiency of the subsequent video encoding.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
It should be noted that like reference numerals refer to like objects in the following figures and embodiments, and thus once an object is defined in one figure or embodiment, no further discussion thereof is required in the subsequent figures and embodiments.
Fig. 1-3 are schematic structural diagrams of a computing system according to an embodiment of the present application. With reference to fig. 1-3, the computing system includes a host 10 and a data processing module 20.
In this embodiment, the host 10 is any computer device having computing, storage, and communication capabilities. For example, the host 10 may be a server, a computer, a mobile phone, or the like. The host may include one or more general-purpose processing units; the number of general-purpose processing units is not limited, and each may be a single-core or a multi-core processing unit.
In this embodiment, the general-purpose processing unit is typically a processing chip disposed on the motherboard of the host 10, such as the host's Central Processing Unit (CPU) 101, and cannot be expanded within a single machine. A general-purpose processing unit may be any processing device with computing capability, and may be a serial or a parallel processing unit. For example, it may be a general-purpose processor such as a CPU. A parallel processing unit is a processing device capable of parallel computation, such as a Graphics Processing Unit (GPU) or a Field-Programmable Gate Array (FPGA). Optionally, the memory of the general-purpose processing unit is larger than that of the parallel processing unit. Fig. 1 illustrates the general-purpose processing unit as a CPU, but the embodiment is not limited thereto.
The data processing module 20 is any programmable hardware device having computing and communication capabilities. The data processing module 20 may include a hardware processing unit 201 and a CPU 202. The CPU 202 may be a stand-alone processor or a processor within a System on Chip (SoC).
The hardware processing unit 201 may be a hardware processor built from electronic devices, or a hardware processor described in a Hardware Description Language (HDL). The hardware description language may be the Very High-Speed Integrated Circuit Hardware Description Language (VHDL), Verilog HDL, SystemVerilog, or SystemC. The hardware processing unit 201 may be an FPGA, a Programmable Array Logic device (PAL), a Generic Array Logic device (GAL), a Complex Programmable Logic Device (CPLD), or the like. Alternatively, the hardware processing unit 201 may be an Application-Specific Integrated Circuit (ASIC) or the like.
In this embodiment, the data processing module 20 is communicatively connected to the host 10, and more precisely to the CPU 101 of the host 10. In particular, the data processing module 20 and the host 10 may be connected through a bus interface. The bus interface may be a serial bus interface such as a Peripheral Component Interconnect Express (PCIe) bus interface, a Peripheral Component Interconnect (PCI) bus interface, an Ultra Path Interconnect (UPI) bus interface, a Universal Serial Bus (USB) interface, an RS485 interface, or an RS232 interface. Preferably, the bus interface is a PCIe interface, which increases the data transfer rate between the data processing module 20 and the host 10.
The bus interfaces of the host 10 may be extended according to its specification; typically, the host 10 has a plurality of communication interfaces. In the embodiments of the present application, "a plurality" means more than one, i.e., two or more. When the data processing module 20 is communicatively connected to the host 10 through a bus interface, there may likewise be a plurality of data processing modules 20, enabling expansion of the data processing modules.
In some embodiments, the data processing module 20 and the host 10 may be disposed on different physical machines and connected through a network. For example, the host 10 and the data processing module may be disposed in different cloud servers and communicate over a network. Figs. 1-3 show the host 10 and the data processing module 20 on the same physical machine only by way of illustration, and the embodiment is not limited thereto.
In this embodiment, in order to relieve the CPU load of the virtual machines in the host 10 and reduce the probability that the CPU reaches a performance bottleneck, a virtual machine in the host 10 may map the data processing module 20 to a virtual device of the virtual machine through virtualization technology and offload the data processing function to the data processing module 20 through the mapped virtual device.
The embodiment of the present application does not limit the specific way in which the virtual machine maps the data processing module 20 to a virtual device of the host 10. In some embodiments, the virtual machine may map the data processing module 20 to a virtual device through a paravirtualization technique. Paravirtualization stands in contrast to full virtualization. In a fully virtualized solution, for a guest Virtual Machine (VM) to use the underlying host resources, a Virtual Machine Monitor (VMM) must intercept all request instructions and then simulate their behavior, which often incurs considerable performance overhead. Paravirtualization relies on hardware assistance from the underlying platform: part of the virtualized instructions are implemented in hardware, and the VMM is responsible only for the remainder. To achieve this, the guest VM must cooperate by providing front-end drivers for the different devices, while the VMM provides the corresponding back-end drivers; through a defined interaction mechanism, the two sides together realize an efficient virtualization process.
In some embodiments, the paravirtualization technique may be VirtIO (Virtual Input/Output). VirtIO is an I/O paravirtualization solution that mediates communication between the guest and the host: it provides a set of common frameworks and standard interfaces or protocols for completing the interaction between guest and host, which largely solves the adaptation problem between the many device drivers and the different virtualization solutions. VirtIO provides a communication framework and programming interfaces between upper-layer applications and VMM virtualization platforms (e.g., the Kernel-based Virtual Machine (KVM), Xen, VMware, etc.). In general, VirtIO can be divided into four layers: the various driver modules in the front-end guest, the handler modules on the back-end VMM, and, in between, the VirtIO layer and the VirtIO Ring layer used for front-end/back-end communication. The VirtIO layer implements the virtual queue interface and can be regarded as the bridge for front-end/back-end communication; the VirtIO Ring layer is the concrete realization of that bridge, implementing two ring buffers that store the information produced by the front-end driver and by the back-end handler, respectively.
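The two-ring structure of a virtqueue can be sketched in a few lines. This is a heavily simplified model for illustration only: a real vring has a descriptor table, available/used index fields, and memory barriers, all omitted here, and the method names are invented.

```python
# Simplified sketch of a VirtIO virtqueue: the front-end driver posts
# buffers on an "available" ring, the back-end device consumes them and
# reports completions on a "used" ring. Names are illustrative.
from collections import deque

class VirtQueue:
    def __init__(self):
        self.avail = deque()   # driver -> device
        self.used = deque()    # device -> driver

    # front-end (guest driver) side
    def driver_post(self, buf):
        self.avail.append(buf)

    def driver_reap(self):
        return self.used.popleft() if self.used else None

    # back-end (VMM / device) side
    def device_poll(self, handler):
        while self.avail:
            self.used.append(handler(self.avail.popleft()))

vq = VirtQueue()
vq.driver_post({"type": "VIRTIO_VIDEO_ENCODE", "frame": "f0"})
vq.device_poll(lambda req: {"status": "OK", "frame": req["frame"]})
assert vq.driver_reap() == {"status": "OK", "frame": "f0"}
```

The asymmetry matters: the driver only ever appends to one ring and pops from the other, and the device does the reverse, which is what lets the two sides run without locking each other in the real split-ring layout.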
Accordingly, as shown in FIG. 2, the data processing module 20 may support Single Root I/O Virtualization (SR-IOV). With SR-IOV, one data processing module can be virtualized into a plurality of lightweight virtual data processing modules, which can be distributed to the multiple virtual machines deployed on the host. The SR-IOV protocol introduces two types of functions: Physical Functions (PFs) and Virtual Functions (VFs). A PF is a PCI function that supports the SR-IOV capability, as defined in the SR-IOV specification, and includes an SR-IOV capability structure for managing the SR-IOV functionality. A PF is a full-featured PCIe function that can be discovered, managed, and operated on like any PCIe device, and it has fully configured resources that can be used to configure or control the PCIe device. A VF is a function associated with a PF: it is a lightweight PCIe function that can share one or more physical resources with the physical function and with other VFs associated with the same physical function. A VF is only allowed to own the configuration resources for its own behavior.
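The PF/VF relationship described above can be modeled as a fixed pool of lightweight functions managed by one full-featured function. The classes and field names below are illustrative only, not taken from the SR-IOV specification.

```python
# Hypothetical sketch of SR-IOV resource ownership: one PF configures a
# pool of lightweight VFs that can be handed out to guest VMs.

class VirtualFunction:
    """A VF owns only the configuration resources for its own behavior."""
    def __init__(self, vf_id):
        self.vf_id = vf_id
        self.owner = None          # the VM this VF is assigned to, if any

class PhysicalFunction:
    """The PF has full resources and manages the SR-IOV capability."""
    def __init__(self, num_vfs):
        self.vfs = [VirtualFunction(i) for i in range(num_vfs)]

    def assign(self, vm_name):
        """Hand the first free VF to a VM; the pool size is fixed."""
        for vf in self.vfs:
            if vf.owner is None:
                vf.owner = vm_name
                return vf
        raise RuntimeError("no free VF")

pf = PhysicalFunction(num_vfs=4)
vf = pf.assign("vm-0")
assert vf.owner == "vm-0" and vf.vf_id == 0
```

In the patent's setting, each VF would appear to its guest VM as one lightweight virtual data processing module, while the PF stays under the control of the host or the DPU firmware.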
Based on the paravirtualization principles described above, the data processing module 20 may be virtualized into a plurality of virtual video devices and distributed to the plurality of virtual machines through SR-IOV. Specifically, a virtual machine may initialize the data processing module 20 and, during initialization, configure its identity so as to virtualize the data processing module 20 as a virtual device of the virtual machine, i.e., a virtual video module. The identity of the data processing module 20 is an identity that uniquely identifies a virtual device on a bus of the host. For embodiments in which the data processing module 20 is connected to the bus of the host 10 through a PCIe bus interface and communicates with the host 10 over it, the identity of the data processing module 20 may be represented by a Bus Number, a Device Number, and a Function Number (BDF for short).
The BDF is the unique identifier of each function in a PCIe device. A PCIe device may have a single function or multiple functions, up to eight. Regardless of how many functions a PCIe device has, each function has its own independent configuration space. Through this space, called the PCIe configuration space, the host 10 can obtain information about the PCIe device or configure it. PCIe configuration software (e.g., the PCIe Root Complex) can thereby identify the topology of the PCIe bus system: each bus, each PCIe device, and each function of each device, i.e., each BDF.
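The BDF triple is conventionally packed into a 16-bit value: 8 bits of bus, 5 bits of device, 3 bits of function (hence the limit of eight functions per device noted above). A minimal sketch of that packing:

```python
# BDF packing as conventionally used in PCIe configuration addressing:
# 8-bit bus number, 5-bit device number, 3-bit function number.

def pack_bdf(bus, device, function):
    assert 0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8
    return (bus << 8) | (device << 3) | function

def unpack_bdf(bdf):
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7

bdf = pack_bdf(0x03, 0x1C, 0x5)
assert unpack_bdf(bdf) == (0x03, 0x1C, 0x5)
```

This is the value that tools render as, e.g., `03:1c.5`, and it is the key under which each function's independent configuration space is addressed.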
In an embodiment of the present application, in order to map the data processing module 20 to a virtual Video device of the virtual machine, the virtual machine may configure the function number within the identity of the data processing module 20 to carry the identity of a Video device. For the host 10 to recognize PCIe devices, device enumeration is also required. A PCIe architecture typically includes a Root Complex, Switches, and various PCIe devices; the PCIe devices are the Endpoints of the architecture. With so many devices present, after the CPU of the virtual machine starts, PCIe device enumeration is needed to identify them. The Root Complex typically walks every possible branch path using a specific algorithm, such as depth-first search, until no further descent is possible, visiting each PCIe device exactly once. This process is called PCIe device enumeration.
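The depth-first walk described above can be sketched over a toy device tree. The tree shape and device names below are invented for illustration; real enumeration probes configuration space bus by bus, but the visit order is the same depth-first pattern.

```python
# Sketch of depth-first PCIe enumeration: the root complex descends each
# branch fully before backtracking, visiting every endpoint exactly once.
# The tree below is a made-up topology for illustration.

def enumerate_devices(node, visited=None):
    visited = visited if visited is not None else []
    visited.append(node["name"])                 # probe this device
    for child in node.get("children", []):       # descend each branch fully
        enumerate_devices(child, visited)
    return visited

tree = {"name": "root-complex", "children": [
    {"name": "switch0", "children": [
        {"name": "endpoint:dpu"},                # the data processing module
        {"name": "endpoint:nic"}]},
    {"name": "endpoint:nvme"}]}

assert enumerate_devices(tree) == [
    "root-complex", "switch0", "endpoint:dpu",
    "endpoint:nic", "endpoint:nvme"]
```

During this walk the host reads each function's identity registers, which is exactly the point at which it would discover the data processing module's Video function number in the embodiment above.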
In this embodiment, the host 10 may obtain the identity of the data processing module 20 from a register of the virtual device during device enumeration and determine that the function of the virtual device is a video device; the host 10 may further be provided with a driver matching that function. From the host's point of view, the data processing module 20 is then a video device of the virtual machine; that is, the data processing module 20 has been virtualized into a virtual video device of the virtual machine. For the VirtIO technique, the virtual device driver may be a virtio-video driver; correspondingly, the communication protocol between the hardware processing unit and the host may be the VirtIO-video (paravirtualized video) protocol.
In the embodiment of the present application, in order to offload the video encoding function, a hardware encoder 201a may be added to the hardware processing unit 201 to provide hardware video encoding. The hardware encoder 201a may be a video encoder built from electronic devices, or a video encoder implemented in a hardware description language.
In this embodiment, the hardware processing unit 201 in the data processing module 20 includes a Direct Memory Access (DMA) engine 201b and a control plane transmission channel 201c. The control plane transmission channel is a channel for transmitting control plane information (such as control plane messages or packets); channel 201c is used to pass data between the host 10 and the CPU 202. The DMA engine 201b can access the host memory, as well as the memory of the CPU 202, in DMA fashion.
Because the data processing module 20 already provides a DMA engine and a control plane transmission channel, the hardware encoder 201a added to the data processing module 20 does not require its own DMA engine or control plane channel to be developed; its data transfers and DMA operations can multiplex the existing channels in the data processing module 20, which reduces the development cost of video encoding.
In this embodiment, the data processing module 20 may be implemented as any form of offload card. In some embodiments, it may be implemented as a Data Processing Unit (DPU). A DPU is a special-purpose electronic circuit with hardware acceleration, used for data-centric computing, in which data is ingested and transmitted in the form of multiplexed packets. A DPU typically includes a CPU, a network card, and a DMA engine, giving it the generality and programmability of a CPU while operating efficiently on network packets, storage requests, or analysis requests. The network card is the piece of computer hardware that provides network communication.
Virtualization of the encoder mainly involves three parts: device enumeration, delivery of VirtIO-video messages, and Direct Memory Access (DMA) of data. Since the DPU already offloads at least network and/or storage functions and supports virtualization of the corresponding devices, the hardware encoder 201a can directly multiplex these frameworks. For example, for SR-IOV device enumeration, the DPU has already implemented simulation of the device space in software and hardware, so the hardware encoder can be virtualized as a VirtIO-video device at low cost. Delivery of VirtIO-video messages, as well as the DMA engine, can multiplex existing paths in the DPU, and may even directly multiplex the network and/or storage virtual queues.
As shown in Fig. 2, the CPU 202 may also integrate the Firmware corresponding to the hardware encoder 201a. In this embodiment, the firmware corresponding to the hardware encoder is the driver of the hardware encoder; through it, the CPU 202 can drive the hardware encoder 201a to perform operations such as video encoding.
Based on the hardware processing unit 201 with the video encoding capability described above, the host 10 may offload the video encoding function to the data processing module 20. In this embodiment, to preserve the generality and programmable flexibility of the CPU, the control plane program may be deployed on the CPU 202, which makes it easy to update. The CPU 202 may also have its own memory, i.e., memory 203 in Fig. 3, connected to the CPU 202 through a communication interface. The memory is typically Dynamic Random Access Memory (DRAM); accordingly, the communication interface may be implemented as an interface supporting DRAM, a DRAM interface for short. The memory 203 may be used to store the control plane program of the CPU 202.
The hardware processing unit 201 performs the video encoding and the CPU 202 controls the hardware processing unit 201, realizing separation of the data plane and the control plane. In the embodiment of the present application, the control plane program includes, but is not limited to, programs (such as the firmware) for parsing messages and controlling the hardware processing unit 201.
In this embodiment, when the host 10 has a video encoding requirement, it can load the video to be encoded into the host memory 102 and, as shown in Figs. 1-3, issue a video encoding request to the data processing module 20. The video encoding request may ask the data processing module 20 to allocate encoder resources and perform encoding on the corresponding video.
The hardware processing unit 201 in the data processing module 20 receives the video encoding request and transparently transmits it to the CPU 202 through the control plane transmission channel 201c. For the virtio technique, the video encoding request follows the virtio-video protocol.
Accordingly, the CPU 202 may run the firmware of the hardware encoder 201a and drive the hardware encoder 201a to act (corresponding to "control data reading and video encoding" in fig. 1) based on the video encoding request.
Under the driving of the CPU 202, the hardware encoder 201a may use the DMA engine 201b to obtain the video to be encoded from the memory 102 of the host. Further, the hardware encoder 201a may encode the video to be encoded to obtain video encoded data, such as video bitstream data, corresponding to the video to be encoded. The video encoded data may be an elementary stream (Elementary Stream, ES), i.e., the encoded video data stream directly output by the hardware encoder 201a.
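The data path described above can be summarized as: DMA fetch of raw frames from host memory, then encoding into an elementary stream. A minimal sketch follows; all names (`dma_read`, `encode_to_es`), the address format, and the byte-level "encoding" are illustrative stand-ins, not the actual hardware interface.

```python
# Hypothetical sketch of the offloaded data path: the DMA engine (201b in the
# text) copies raw video out of host memory, and the hardware encoder (201a)
# produces an elementary stream (ES) bitstream from it.

def dma_read(host_memory: dict, address: str) -> bytes:
    """Stand-in for the DMA engine fetching raw video from host memory."""
    return host_memory[address]

def encode_to_es(raw_frames: bytes) -> bytes:
    """Stand-in for the hardware encoder: raw frames in, ES bitstream out.
    A real encoder would run prediction/transform/entropy coding here; we
    just tag the payload so the flow is visible."""
    return b"ES:" + raw_frames

host_memory = {"0x1000": b"frame-data"}   # host memory 102, illustrative
raw = dma_read(host_memory, "0x1000")
bitstream = encode_to_es(raw)             # b"ES:frame-data"
```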
In this embodiment, the video encoding function is offloaded from the CPU of the host to the data processing module, so that the calculation pressure of the CPU of the host is reduced, and the probability that the CPU of the host reaches the performance bottleneck can be reduced. On the other hand, the hardware encoder in the hardware processing unit accesses the memory of the host by using the DMA engine, so that the speed of the hardware encoder for acquiring the video to be encoded can be improved, and further the subsequent video encoding efficiency can be improved. In addition, the video coding is unloaded to a hardware coder for processing, so that the hardware acceleration of the video coding can be realized, and the video coding efficiency can be improved.
In addition, because of the flexibility of the CPU software update, the control plane program is executed by the CPU in the data processing module in this embodiment, so that the control plane program update is facilitated.
For the embodiment of the data processing module which is integrated with the DMA engine and the control surface transmission channel in advance, the hardware encoder additionally arranged in the hardware processing unit can directly multiplex the DMA engine and the control surface transmission channel of the data processing module, so that the development cost of the hardware encoder is reduced.
In some embodiments, in conjunction with FIGS. 2 and 3, the hardware processing unit 201 may also include a memory controller 201d and a memory 201e. The memory controller 201d is mainly used for accessing the memory 201e. The memory 201e may be a memory integrated in the hardware processing unit 201 itself, or may be an external memory of the hardware processing unit 201. Accordingly, the memory 201e may be connected to the hardware processing unit 201 through a communication interface. Memory 201e may be DRAM, and the communication interface may accordingly be implemented as a DRAM interface. In some embodiments, memory 201e may be implemented as double data rate synchronous dynamic random access memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR SDRAM). Accordingly, the memory controller 201d may be a DDR memory controller.
In this embodiment, the memory controller 201d may write the video to be encoded read by the DMA engine 201b into the memory 201e, and the hardware encoder 201a may read the video to be encoded from the memory 201e and encode the video to be encoded.
In the embodiment of the present application, the specific algorithm used by the hardware encoder 201a to encode the video to be encoded is not limited. In some embodiments, the video encoding algorithm employed by the hardware encoder 201a may be, but is not limited to, the H.264 algorithm, the H.265 algorithm, the H.266 algorithm, the Motion JPEG (Motion Joint Photographic Experts Group, MJPEG) frame-by-frame compression algorithm, a discrete cosine transform (Discrete Cosine Transform, DCT) based compression algorithm, or the like.
In some embodiments of the present application, in conjunction with fig. 2 and 3, the host 10 is deployed with a virtual machine 30. The number of virtual machines 30 may be 1 or more. The plural means 2 or more than 2. Resources of virtual machine 30 may include the virtual machine's CPU and memory, etc. The CPU and the memory of the virtual machine are connected through a communication interface. The memory may also be DRAM. Accordingly, the communication interface may be implemented as a DRAM interface. Fig. 3 illustrates an example in which the communication interface is a DRAM interface, but the present application is not limited thereto.
The CPU of the Virtual machine may be referred to as a Virtual CPU (vCPU). Virtual machine 30 may issue video encoding requests to data processing module 20 according to video encoding requirements. Specifically, the virtual machine 30 deploys an Application (APP) or the like. Virtual machine 30 may issue video encoding requests to data processing module 20 according to video encoding requirements of the application.
The user of the virtual machine 30 can purchase the corresponding encoding capacity according to actual demand. The scheduling node corresponding to the host 10 may allocate encoding capacity to the virtual machine 30 according to the encoding capacity requirement of the user of the virtual machine 30. The encoding capacity reflects the encoding capability of the resources allocated to the virtual machine and limits the amount of video data that the hardware encoder 201a may encode for it. Generally, the encoding capacity is equal to the product of resolution and frame rate. The Frame rate refers to the number of Frames Per Second (FPS) processed. The virtual machine 30 may not use more encoding capacity than is allocated to it.
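Since the text defines encoding capacity as the product of resolution and frame rate, it can be expressed in pixels per second. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def encoding_capacity(width: int, height: int, fps: int) -> int:
    """Encoding capacity = resolution (pixels per frame) x frame rate,
    per the definition in the text. Units: pixels per second."""
    return width * height * fps

# e.g. a 1080p / 30 FPS allocation:
cap_1080p30 = encoding_capacity(1920, 1080, 30)  # 62,208,000 pixels/s
```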
In this way, when the hardware encoder 201a is driven to operate based on the video encoding request, the CPU 202 can determine whether to accept the video encoding request based on the encoding capacity corresponding to the virtual machine (i.e., the encoding capacity allocated to the virtual machine) and the video encoding request.
Specifically, the CPU 202 may parse the video encoding request (corresponding to "request parsing" in fig. 2). Optionally, the CPU 202 may acquire the encoding capacity requirement information of the virtual machine 30 based on the video encoding request. Specifically, the CPU 202 may obtain the resolution and frame rate of the video frames of the video to be encoded from the video encoding request, and calculate the encoding capacity requirement information of the virtual machine accordingly. For example, the product of the resolution and frame rate of the video frames of the video to be encoded may be calculated to obtain the encoding capacity requirement information of the virtual machine.
In practical applications, a single virtual machine may send a single video encoding request, or may send multiple video encoding requests concurrently. The plural means 2 or more than 2. For an embodiment in which a single virtual machine concurrently sends multiple video encoding requests, the CPU 202 may obtain encoding capacity requirement information corresponding to each video encoding request based on the multiple video encoding requests, and calculate the encoding capacity requirement information of the virtual machine according to the encoding capacity requirement information corresponding to each of the multiple video encoding requests. For example, the sum of the coding capacity requirement information corresponding to each of the plurality of video coding requests may be calculated to obtain the coding capacity requirement information of the virtual machine.
Further, the CPU 202 may determine whether the remaining encoding capacity of the virtual machine 30 meets the requirement of the encoding capacity requirement information, according to the encoding capacity corresponding to the virtual machine (i.e., the encoding capacity allocated to the virtual machine) and the encoding capacity currently consumed by the virtual machine 30. If the remaining encoding capacity of the virtual machine 30 is greater than or equal to the required encoding capacity, it is determined that the requirement is met; if the remaining encoding capacity is smaller than the required encoding capacity, it is determined that the requirement is not met.
Further, in the case where the remaining encoding capacity of the virtual machine 30 satisfies the requirement of the encoding capacity demand information, it is determined to accept the video encoding request. Further, the CPU 202 may return a request acceptance message to the virtual machine 30. The hardware processing unit 201 may transparently transmit the request acceptance message to the virtual machine 30 through the control plane transmission channel 201 c. Of course, if the remaining encoding capacity of the virtual machine 30 does not meet the requirement of the encoding capacity requirement information, a request rejection message may be returned to the virtual machine 30. The hardware processing unit 201 may transparently transmit the request rejection message to the virtual machine 30 through the control plane transmission channel 201 c.
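The admission decision described above — sum the demand of the (possibly concurrent) requests, compare against remaining capacity, accept or reject — can be sketched as follows. Function names and the request tuple layout are assumptions for illustration:

```python
def required_capacity(requests) -> int:
    """Total demand of a VM's concurrent encoding requests, where each
    request is a (width, height, fps) tuple and demand = width*height*fps,
    per the capacity definition in the text."""
    return sum(w * h * fps for (w, h, fps) in requests)

def admit(allocated: int, consumed: int, requests) -> bool:
    """Accept the request(s) only if the VM's remaining encoding capacity
    covers the new demand; otherwise a rejection message is returned."""
    remaining = allocated - consumed
    return remaining >= required_capacity(requests)

# A VM allocated 1080p/60 capacity, already consuming 1080p/30:
allocated = 1920 * 1080 * 60
consumed = 1920 * 1080 * 30
ok = admit(allocated, consumed, [(1920, 1080, 30)])        # accepted
too_much = admit(allocated, consumed, [(1920, 1080, 30),
                                       (1920, 1080, 30)])  # rejected
```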
For embodiments that return a request acceptance message, virtual machine 30 may determine a video frame to be encoded from the video to be encoded in response to the request acceptance message and provide metadata for the video frame to be encoded to CPU 202. In particular, virtual machine 30 may provide metadata for the video frames to be encoded to data processing module 20. The hardware processing unit 201 in the data processing module 20 transparently passes the metadata of the video frame to be encoded to the CPU 202 through the control plane transmission channel 201c. The metadata of a video frame to be encoded is information describing the attributes of the video frame, and may include the storage location of the video frame, the identification and resolution of the video frame, and the like. In the virtio technique, metadata of video frames to be encoded may be encapsulated in packets following the virtio-video protocol and passed through to CPU 202 in the form of packets.
The CPU 202 may obtain metadata for the video frame to be encoded. Specifically, the CPU 202 may parse a data packet containing metadata of a video frame to be encoded, to obtain metadata of the video frame to be encoded carried by the data packet.
Further, the CPU 202 may drive the hardware encoder 201a to act based on the storage location information in the metadata of the video frame to be encoded. Accordingly, the hardware encoder 201a obtains the video frame to be encoded from the memory 102 of the virtual machine by using the DMA engine 201b under the driving of the CPU 202, and encodes the video frame to be encoded.
In order to prevent the virtual machine from using more encoding resources than allocated, the CPU 202 may also, in the case of determining to accept the video encoding request of the virtual machine, monitor the actual usage of encoding capacity by the virtual machine 30 in real time during the encoding of the video to be encoded. If the actual usage exceeds the encoding capacity allocated to the virtual machine 30, the frame rate of video encoding by the hardware processing unit 201 may be reduced so that the actual usage of encoding capacity by the virtual machine 30 does not exceed its allocation.
In some embodiments, the actual usage of the encoding capacity by the virtual machine 30 may be made equal to the encoding capacity allocated by the virtual machine by reducing the frame rate of video encoding by the hardware encoder 201 a. Or the frame rate of video encoding by the hardware encoder 201a may be reduced so that the actual usage of the encoding capacity by the virtual machine 30 is smaller than the encoding capacity allocated by the virtual machine, or the like.
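Because capacity is resolution times frame rate, throttling reduces to dividing the remaining pixel budget by the per-frame pixel count. A hedged sketch of that calculation (the function name and parameters are illustrative, not part of the described system):

```python
def throttled_fps(allocated: int, consumed_other: int,
                  width: int, height: int, requested_fps: int) -> int:
    """Largest frame rate for this stream that keeps the VM's total usage
    at or below its allocated encoding capacity (pixels per second)."""
    budget = allocated - consumed_other        # capacity left for this stream
    max_fps = budget // (width * height)       # fps the budget can sustain
    return min(requested_fps, max(max_fps, 0))

# A 1080p stream requesting 60 FPS against a 1080p/30 allocation is
# throttled down to 30 FPS:
fps = throttled_fps(1920 * 1080 * 30, 0, 1920, 1080, 60)  # 30
```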
For the above embodiment in which a single virtual machine concurrently transmits a plurality of video encoding requests, in the case where the CPU 202 determines to accept the single virtual machine to concurrently transmit a plurality of video encoding requests, the CPU 202 may drive the hardware encoder 201a to perform encoding tasks corresponding to the plurality of video encoding requests in parallel in a time-division multiplexing manner (corresponding to "schedule" in fig. 2). Accordingly, the hardware encoder 201a performs the encoding tasks corresponding to the plurality of video encoding requests in parallel in a time-division multiplexing manner under the driving of the CPU 202.
Specifically, the CPU 202 may allocate 1/N of processing time for each video encoding request based on a fair scheduling policy. N is the number of the video coding requests, N is more than or equal to 2 and is an integer, and the hardware encoder 201a is controlled to execute the coding task of the video coding request corresponding to the current 1/N processing time. For example, the CPU 202 may receive metadata of a video frame to be encoded of a video encoding request corresponding to a current processing time of 1/N transmitted by the virtual machine 30. Further, the CPU 202 may drive the hardware encoder 201a to act based on metadata of the video frame to be encoded. Accordingly, the hardware encoder 201a obtains metadata of the video frame to be encoded of the video encoding request corresponding to the current processing time of 1/N from the memory of the virtual machine 30 by using the DMA engine 201b, and encodes the video frame to be encoded read by the DMA engine 201b until the processing time of 1/N arrives.
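The 1/N fair split and the resulting round-robin service order can be sketched as below; both function names and the slot-based model are illustrative simplifications of the time-division multiplexing described above:

```python
def fair_slices(total_time: float, request_ids) -> dict:
    """Give each of the N concurrent requests from one VM a 1/N share of
    the encoder's processing time, per the fair scheduling policy."""
    n = len(request_ids)
    return {rid: total_time / n for rid in request_ids}

def round_robin_schedule(request_ids, num_slots: int) -> list:
    """Order in which the single hardware encoder serves the requests,
    one time slice per slot."""
    return [request_ids[i % len(request_ids)] for i in range(num_slots)]

slices = fair_slices(1.0, ["req1", "req2"])        # 0.5 each
order = round_robin_schedule(["req1", "req2"], 4)  # req1, req2, req1, req2
```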
For embodiments where the host 10 is deployed with multiple virtual machines 30, there may be at least two of the multiple virtual machines 30, and the video encoding request may be sent concurrently to the data processing module 20.
Although the hardware encoder 201a can be virtualized into a plurality of virtual video devices and allocated to the plurality of virtual machines 30 by the above-described paravirtualization technique, the physical hardware encoder 201a is limited, and thus, it is necessary to schedule resources for video encoding requests transmitted concurrently by at least two virtual machines.
For the embodiment in which at least two virtual machines concurrently transmit video encoding requests, the CPU 202 may also determine whether to accept the video encoding request of each virtual machine according to the encoding capacity corresponding to each virtual machine and the video encoding request transmitted by the virtual machine. For any virtual machine A in at least two virtual machines which concurrently send video coding requests, whether to accept the video coding requests of the virtual machine A can be judged according to the coding capacity of the virtual machine A and the video coding requests sent by the virtual machine A. For the specific implementation of determining whether to accept the video encoding request of the virtual machine a, please refer to the related content of the above embodiment, and the description is omitted herein.
In the case of determining to accept the video encoding request concurrently transmitted by the at least two virtual machines, the CPU 202 may schedule the encoding tasks of the at least two virtual machines based on the set encoding scheduling policy, and control the hardware processing unit 201 to execute the encoding task of the currently scheduled virtual machine (corresponding to "scheduling" in fig. 2).
The set encoding scheduling policy refers to a scheduling policy for concurrent video encoding requests, and the policy may determine which virtual machine is selected from the concurrent video encoding requests to send the video encoding request for processing.
In the embodiment of the application, the specific implementation form of the coding scheduling strategy is not limited. In some embodiments, users corresponding to the virtual machines have different priorities, and the coding scheduling policy may be implemented as a priority scheduling policy. Accordingly, the CPU 202 may control the hardware encoder 201a to preferentially perform the encoding task of the virtual machine with the highest user priority according to the user priorities corresponding to the at least two virtual machines providing the concurrent video encoding requests.
In other embodiments, the coded scheduling policy may be implemented as a time division multiplexed scheduling policy. Accordingly, the CPU 202 may schedule the encoding tasks of at least two virtual machines that provide concurrent video encoding requests based on a time-division multiplexing scheduling policy, and drive the hardware encoder 201a to perform the encoding tasks of at least two virtual machines that provide concurrent video encoding requests in a time-division multiplexing manner. Accordingly, the hardware encoder 201a performs the encoding tasks of at least two virtual machines providing concurrent video encoding requests in a time-division multiplexing manner under the driving of the CPU 202, so as to implement time-division multiplexing of the hardware encoder between the encoding tasks of different virtual machines, thereby implementing parallel execution of the encoding tasks of multiple virtual machines.
Alternatively, the time division multiplexing scheduling policy may be a fair scheduling policy, i.e. users of the virtual machine have the same priority. Accordingly, the CPU 202 may schedule the encoding tasks of at least two virtual machines that provide concurrent video encoding requests based on a fair scheduling policy, so as to implement time-division multiplexing of the hardware processing units between the encoding tasks of different virtual machines, thereby implementing parallel execution of the encoding tasks of multiple virtual machines, which may be referred to as time-parallel.
Specifically, the CPU 202 may allocate 1/M of the processing time to each of the at least two virtual machines, and drive the hardware encoder 201a to execute the encoding task of the target virtual machine corresponding to the current 1/M processing time. Where M is the number of the at least two virtual machines providing concurrent video encoding requests, M is greater than or equal to 2 and is an integer. Accordingly, the hardware encoder 201a, driven by the CPU 202, executes the encoding task of the target virtual machine corresponding to the current 1/M processing time until that processing time is reached.
Specifically, the CPU 202 may receive metadata of a video frame to be encoded sent by the target virtual machine, and drive the hardware encoder 201a to act based on the storage location information in the metadata of the video frame to be encoded. Under the driving of the CPU 202, the hardware encoder 201a uses the DMA engine 201b to read the video frame to be encoded from the memory of the target virtual machine based on its storage location information, and performs video encoding on it to obtain the video encoded data of the video frame to be encoded.
In some embodiments, the fair scheduling policy may be a Completely Fair Scheduler (CFS) policy. The CFS policy is a scheduling algorithm based on the weighted fair queuing concept. The CFS policy introduces the concept of weight; the weight represents the user priority of the virtual machine, and the video encoding task of each virtual machine is allocated processing time in proportion to its weight. The higher the priority, the larger the weight of the virtual machine. For example, suppose virtual machine A has a weight of 1024 and virtual machine B has a weight of 2048. The proportion of processing time obtained by the encoding task of virtual machine A is 1024/(1024+2048) = 33.3%, and the proportion obtained by virtual machine B is 2048/(1024+2048) = 66.7%.
Accordingly, the CPU 202 may determine weights of the at least two virtual machines according to the user priorities of the at least two virtual machines providing the concurrent video encoding request, determine processing times of encoding tasks of the at least two virtual machines according to the weights of the at least two virtual machines, and drive the hardware encoder 201a to execute the encoding task of the target virtual machine corresponding to the processing times according to the processing times of the encoding tasks of the at least two virtual machines. Accordingly, the hardware encoder 201a executes the encoding task of the target virtual machine corresponding to the processing time under the driving of the CPU 202 at the corresponding processing time, so as to implement time-division multiplexing of the hardware encoder between the encoding tasks of different virtual machines, thereby implementing parallel execution of the encoding tasks of multiple virtual machines.
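The weight-proportional time split used by the CFS-style policy above is a one-line calculation. A sketch reproducing the 1024/2048 example from the text (the function name is an assumption):

```python
def cfs_time_shares(weights: dict) -> dict:
    """CFS-style split: each VM's share of encoder processing time is its
    weight divided by the total weight of all VMs with pending tasks."""
    total = sum(weights.values())
    return {vm: w / total for vm, w in weights.items()}

shares = cfs_time_shares({"A": 1024, "B": 2048})
# A gets 1024/3072 (about 33.3%), B gets 2048/3072 (about 66.7%),
# matching the worked example in the text.
```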
The scheduling method of the encoding task of the virtual machine shown in the above embodiment is merely illustrative, and not limiting.
In addition to implementing temporal parallelism, embodiments of the present application may also implement parallel processing in video coding space. Specifically, the hardware encoder 201a may be configured as a multi-stage encoding unit according to a video encoding flow of a video encoding algorithm. Multistage refers to 2 stages or more. In fig. 4, only the R-level coding unit is shown as an example. R is the number of stages of the coding unit, R is more than or equal to 2, and is an integer.
The multi-stage coding units may be connected according to the logical relationship between the stages of the video coding flow of the video coding algorithm. A user may write a hardware description language implementation of the multi-stage coding units through a development system corresponding to the hardware processing unit 201, and burn it into the hardware processing unit 201 to obtain the hardware encoder 201a. Alternatively, the multi-stage coding units may be hardware modules built from electronic devices according to the video coding flow of the video coding algorithm.
The video coding algorithms and the video coding flows are different, and accordingly the number and the functions of the multi-stage coding units are different. For example, for the H.264 algorithm, the video coding flow may include 5 flows of inter and intra prediction (Estimation), transform (Transform) and inverse Transform, quantization (Quantization) and inverse Quantization, loop Filter (Loop Filter), and entropy coding (Entropy Coding). Accordingly, the multi-level coding unit may include inter and intra prediction units, transform and inverse transform units, quantization and inverse quantization units, loop filtering units, and entropy coding units.
Based on the multi-stage encoding unit, the CPU 202 may drive the multi-stage encoding unit to perform encoding tasks of different video frames in the video to be encoded in a pipelined task execution manner, so as to encode the different video frames. Accordingly, the multi-stage encoding unit performs encoding tasks of different video frames in the video to be encoded in a pipelined task execution manner under the drive of the CPU 202, so that the encoding tasks are spatially parallel, and the throughput rate of the hardware processing unit 201 is improved. That is, the encoding tasks of the multi-stage encoding units are controlled to be performed simultaneously, and the encoding tasks of different video frames are processed respectively, so that the time of the hardware encoder 201a in an idle waiting state can be reduced, and the video encoding efficiency can be improved.
Specifically, as shown in fig. 4, the video encoding process of each video frame may be divided into a plurality of sub-processes, and the encoding task of each sub-process matches the function of the corresponding encoding unit. Correspondingly, for any adjacent coding unit A and coding unit B in the multi-stage coding units, where coding unit B handles the flow following coding unit A, coding unit A can, after executing the encoding task corresponding to the Kth video frame, transmit the processed data corresponding to the Kth video frame to coding unit B. Coding unit A can then execute the encoding task corresponding to the (K+1)th video frame, without waiting for the other coding units to finish processing the Kth video frame, so that the idle waiting time of each stage in the multi-stage coding units can be reduced and the video encoding efficiency improved. Wherein K is a positive integer, 1 ≤ K < Q, and Q is the total number of frames of the video to be encoded.
For example, for H.264 algorithm, the inter-frame and intra-frame prediction unit may transmit the inter-frame and intra-frame prediction result of the Kth video frame to the transform and inverse transform unit after the inter-frame and intra-frame prediction is completed on the Kth video frame, and then the inter-frame and intra-frame prediction unit may then perform inter-frame and intra-frame prediction on the (K+1) th video frame without waiting for the completion of encoding the Kth video frame by other encoding units, and so on, to realize pipelined video encoding, realize parallel processing of multiple video frames on the multi-stage encoding unit, and help to improve video encoding efficiency.
Based on the pipeline task execution mode, in the case that the number of video frames in the video to be encoded, which is not yet encoded, is greater than or equal to the number of stages of the encoding unit, the hardware encoder 201a may simultaneously encode R video frames in parallel, which is helpful for improving the encoding efficiency. Fig. 4 illustrates the current encoded video frame as the 1 st to R video frames in the video to be encoded, but is not limited thereto.
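The throughput benefit of the pipeline can be quantified with a standard pipeline-latency model: with R stages and Q frames, a frame enters stage 1 every cycle, so the last frame leaves the last stage after Q + R - 1 cycles, versus R x Q cycles if each frame had to clear all stages before the next started. This is a generic pipeline arithmetic sketch, not the actual timing of the hardware encoder:

```python
def pipelined_cycles(num_stages: int, num_frames: int) -> int:
    """Cycles for a full pipeline: frame k enters stage 1 at cycle k, so
    the last of Q frames leaves the last of R stages at cycle Q + R - 1."""
    return num_frames + num_stages - 1

def serial_cycles(num_stages: int, num_frames: int) -> int:
    """Cycles if each frame must clear all R stages before the next frame
    may start (no pipelining)."""
    return num_frames * num_stages

# e.g. the 5 H.264-style stages from the text and a 100-frame video:
# pipelined takes 104 cycles, serial takes 500.
pipelined = pipelined_cycles(5, 100)
serial = serial_cycles(5, 100)
```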
In the process of executing the encoding tasks of different video frames in the video to be encoded by the multi-stage encoding units in the pipelined task execution manner, the CPU 202 may drive the hardware encoder 201a to read the (k+1) -th video frame from the memory of the virtual machine by using the DMA engine after the encoding task of the kth video frame is executed by the first-stage encoding unit in the multi-stage encoding units, and drive the first-stage encoding unit in the hardware encoder 201a to execute the encoding task of the (k+1) -th video frame.
With reference to fig. 4 and fig. 5, after each video frame is encoded, the CPU 202 may further drive the hardware encoder 201a to write the video encoded data corresponding to the video frame (such as the video encoded data of video frames 1-Q in fig. 5) into the memory of the virtual machine 30 by using the DMA engine, so as to implement DMA access to the memory and improve data transmission efficiency. For the above embodiment of video encoding in a pipelined task execution manner, the CPU 202 drives the hardware encoder 201a to alternately perform the reading of video frames and the transmission of video encoded data.
In some embodiments, host 10 may also offload networks to data processing module 20. The data processing module 20 may include a network card (not shown in the figures). Accordingly, after each video frame is encoded, the video encoding data corresponding to the video frame can be sent to the destination end of the video request through the network card, and the like.
As shown in fig. 4, CPU 202 may also send an encoding success message to virtual machine 30 after each video frame has been encoded. The message may include metadata of the successfully encoded video frame, such as identification information, numbering information, etc. of the video frame. The hardware processing unit 201 may transparently transmit the encoding success message to the virtual machine 30 through the control plane transmission channel 201c. Virtual machine 30 may determine which video frame of the video to be encoded was successfully encoded based on the encoding success message. Accordingly, the virtual machine 30 can determine whether the video to be encoded has been fully encoded according to the metadata of the video frames in the encoding success messages; if the virtual machine 30 receives the encoding success messages corresponding to all frames in the video to be encoded, it determines that encoding of the video to be encoded is complete.
Further, as shown in fig. 5, the virtual machine 30 may send an encoder destruction request to the data processing module 20 after the video to be encoded has been encoded. The hardware processing unit 201 may transmit the encoder destruction request to the CPU 202 through the control plane transmission channel 201c. The CPU 202 may reclaim the encoder resources corresponding to the virtual machine in response to the encoder destruction request. After reclaiming the encoder resources corresponding to the virtual machine, the CPU 202 may return an encoder destruction success message to the virtual machine 30, and the hardware processing unit 201 may transmit the encoder destruction success message to the virtual machine 30 through the control plane transmission channel 201c, so that the virtual machine 30 can learn of the encoder resource destruction. For the specific implementation of the virtual machine 30 sending the video encoding request and the CPU 202 determining whether to accept or reject the video encoding request in fig. 5, reference may be made to the related content of the above embodiments, which is not repeated here.
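The encoder lifecycle described in the last few paragraphs — per-frame success messages, completion detection once every frame is acknowledged, then a destruction request that reclaims resources — can be modeled as a small state machine. All class, method, and message names below are illustrative; the actual messages follow the virtio-video protocol described earlier:

```python
class EncoderSession:
    """Toy model of the encode/acknowledge/destroy exchange between the
    virtual machine and the data processing module."""

    def __init__(self, total_frames: int):
        self.total_frames = total_frames
        self.encoded = set()
        self.destroyed = False

    def encode_frame(self, frame_id: int) -> str:
        """Encode one frame and return the per-frame success message."""
        self.encoded.add(frame_id)
        return f"encode-success:{frame_id}"

    def all_frames_encoded(self) -> bool:
        """The VM decides the video is done once every frame is acked."""
        return self.encoded == set(range(self.total_frames))

    def destroy(self) -> str:
        """Reclaim encoder resources in response to a destruction request."""
        self.destroyed = True
        return "destroy-success"

session = EncoderSession(total_frames=3)
acks = [session.encode_frame(i) for i in range(3)]
done = session.all_frames_encoded()   # True once all 3 frames are acked
reply = session.destroy()             # "destroy-success"
```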
In the embodiment of the present application, the specific implementation form of the data processing module 20 is not limited. In some embodiments, the data processing module 20 may be implemented as a DPU, a hardware encoder may be added to the DPU, and the firmware of the hardware encoder may be integrated on the CPU of the DPU, enabling multiplexing of the CPU of the DPU without the hardware encoder requiring a separately deployed CPU. Because the SR-IOV software and hardware framework, the DMA framework, and the hot upgrade and live migration frameworks of existing DPUs are mature technologies, integrating the hardware encoder on the DPU allows these frameworks to be reused, realizing the fusion of the hardware encoder and the DPU and the reuse of DPU resources.
Because the DPU generally has communication components such as a network card, in some cloud service scenarios, such as a cloud desktop, a cloud application, a cloud game, or a live broadcast, the coded data of the picture to be transmitted can be sent to the client through the network card of the DPU, so as to realize the offloading of the network protocol of the host side.
Because the DPU realizes the offloading of the control client, the control of the hardware encoder can be completed directly in the DPU, and the control client of the DPU can perform operation and maintenance operations such as hot upgrades on the encoder firmware. Meanwhile, the hardware encoder can multiplex the live migration channel of the DPU's network device and/or storage device, copy the context of the virtual encoder to the destination end, and thereby realize live migration of the virtual encoder, and the like.
The above embodiments describe the video encoding process by taking the offloading of video encoding to a hardware encoder for hardware acceleration as an example. Of course, in some embodiments, the video encoding process of the host may also be offloaded to the CPU 202 of the data processing module 20, so as to implement software offloading of video encoding. An exemplary description of the software offloading process for video encoding follows.
Fig. 6 is a schematic structural diagram of a computing system for software offloading of video coding according to an embodiment of the present application. As shown in fig. 6, the computing system includes a host 10 and a data processing module 20. For description of the implementation and connection manner of the host 10 and the data processing module 20, reference may be made to the related content of the above embodiment, and details are not repeated here. As shown in fig. 6, the data processing module 20 may include a hardware processing unit 201 and a CPU 202. For the description of the implementation forms of the hardware processing unit 201 and the CPU 202, reference may be made to the relevant content of the above embodiment, and the description is omitted here.
In this embodiment, in order to relieve the CPU of the host 10 and reduce the probability of that CPU reaching a performance bottleneck, the host 10 may map the data processing module 20 to a virtual device of the host 10 through virtualization technology, and offload the data processing function to the data processing module 20 through the mapped virtual device. For a specific implementation of the virtualization of the data processing module 20, reference may be made to the above-mentioned related content of the embodiments of figs. 1-3, which is not described herein.
In an embodiment of the present application, in order to implement offloading of the video encoding function, a software module, a plug-in, a container, or the like having the video encoding function may be configured in the CPU 202.
In this embodiment, the host 10 can load the video to be encoded into the memory 102 of the host when it has video encoding requirements. Further, as shown in fig. 6, the host 10 issues a video encoding request to the data processing module 20. The hardware processing unit 201 in the data processing module 20 receives the video encoding request and transparently transmits it to the CPU 202 through the control plane transmission channel 201c. For the virtio technique, the video encoding request follows the virtio-video protocol.
Accordingly, the CPU 202 runs the firmware of the hardware processing unit 201, and drives the hardware processing unit 201 to act based on the video encoding request. Accordingly, the hardware processing unit 201 obtains the video to be encoded from the memory 102 of the host by using the DMA engine 201b under the driving of the CPU 202, and stores the video to be encoded into the memory 203 of the CPU 202 by using the DMA engine 201 b. Further, the CPU 202 may encode the video to be encoded in the memory 203 to obtain video encoded data, such as video bitstream data, corresponding to the video to be encoded.
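As an informal illustration of the software-offload path just described, the sketch below models the DMA copy of the video from host memory into the data processing module's memory, followed by CPU-side encoding. All names (`dma_copy`, `encode`, `handle_video_encoding_request`) are hypothetical, and zlib compression merely stands in for a real video encoder:

```python
import zlib

def dma_copy(host_memory, offset, length):
    # Stand-in for the DMA engine 201b: copy bytes from host memory
    # into a buffer owned by the data processing module's CPU.
    return bytes(host_memory[offset:offset + length])

def encode(frame_bytes):
    # Stand-in for the CPU-side software encoder; a real encoder would
    # produce video bitstream data (e.g. H.264 NAL units).
    return zlib.compress(frame_bytes)

def handle_video_encoding_request(host_memory, offset, length):
    # DMA the video to be encoded into local memory, then encode it there.
    local_buffer = dma_copy(host_memory, offset, length)
    return encode(local_buffer)
```
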
In the embodiment of the present application, the specific algorithm for encoding the video to be encoded by the CPU 202 may be referred to the relevant content of the above embodiment, and will not be described herein.
In this embodiment, the video encoding function is offloaded from the CPU of the host to the data processing module, so that the calculation pressure of the CPU of the host is reduced, and the probability that the CPU of the host reaches the performance bottleneck can be reduced. On the other hand, the hardware processing unit accesses the host memory in a DMA mode, so that the speed of the hardware processing unit for acquiring the video to be encoded can be improved, and further the subsequent video encoding efficiency can be improved. In addition, because of the flexibility of the update of the CPU software, the CPU in the data processing module executes the video coding operation in the embodiment, so that the update of the program corresponding to the video coding operation is facilitated.
In some embodiments of the present application, the host 10 is deployed with a virtual machine 30. The number of virtual machines 30 may be 1 or more. The plural means 2 or more than 2. Virtual machine 30 may issue video encoding requests to data processing module 20 according to video encoding requirements. The hardware processing unit 201 may pass the video encoding request through to the CPU 202.
When the CPU 202 controls the hardware processing unit 201 to acquire the video to be encoded from the memory of the host 10 by using the DMA method based on the video encoding request, it can determine whether to accept the video encoding request according to the encoding capacity corresponding to the virtual machine (i.e., the encoding capacity allocated to the virtual machine) and the video encoding request. For a specific implementation manner of determining whether to accept the video encoding request, reference may be made to the relevant content of the above embodiment, which is not described herein.
If it is determined to accept the video encoding request, the CPU 202 may return a request acceptance message to the virtual machine 30. The hardware processing unit 201 may transparently transmit the request acceptance message to the virtual machine 30 through the control plane transmission channel 201 c. Of course, if the remaining encoding capacity of the virtual machine 30 does not meet the requirement of the encoding capacity requirement information, a request rejection message may be returned to the virtual machine 30. The hardware processing unit 201 may transparently transmit the request rejection message to the virtual machine 30 through the control plane transmission channel 201 c.
For embodiments that return a request acceptance message, virtual machine 30 may determine a video frame to be encoded from the video to be encoded in response to the request acceptance message and provide metadata for the video frame to be encoded to CPU 202. The metadata of the video frame to be encoded is information describing the attribute of the video frame, and may include a storage location of the video frame, an identification and a resolution of the video frame, and the like.
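The per-frame metadata described above can be illustrated with a minimal sketch; the field names are assumptions, since the embodiment only lists the kinds of attributes (storage location, identification, resolution) the metadata may carry:

```python
from dataclasses import dataclass

@dataclass
class FrameMetadata:
    storage_address: int   # where the frame sits in the virtual machine's memory
    frame_id: int          # identification of the video frame
    width: int             # resolution: horizontal pixels
    height: int            # resolution: vertical pixels

# Hypothetical example: frame 7 of a 1080p video.
meta = FrameMetadata(storage_address=0x1000, frame_id=7, width=1920, height=1080)
```
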
The CPU 202 may drive the hardware processing unit 201 to act based on the storage location information in the metadata of the video frame to be encoded. Driven by the CPU 202, the hardware processing unit 201 obtains the video frame to be encoded from the memory 102 of the virtual machine using the DMA engine 201b, and writes the video frame to be encoded into the memory 203 corresponding to the CPU 202 using the DMA engine 201b. The CPU 202 may then encode the video frames to be encoded in the memory 203, for example by starting at least one process or at least one thread to execute the encoding.
For the embodiment in which a single virtual machine sends multiple video encoding requests concurrently, in the case that the CPU 202 determines to accept those requests, the CPU 202 may start multiple processes or multiple threads to execute the encoding tasks corresponding to the multiple video encoding requests in parallel in a time-division multiplexing manner.
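One possible reading of this parallel execution is a thread per concurrent request; the names below are illustrative, and the string result merely stands in for real bitstream data:

```python
import threading

def encode_request(request_id, results, lock):
    # Placeholder for real encoding work done by one process/thread.
    encoded = f"bitstream-for-request-{request_id}"
    with lock:
        results[request_id] = encoded

def run_concurrent_requests(request_ids):
    # Start one thread per accepted request, then wait for all of them.
    results, lock = {}, threading.Lock()
    threads = [threading.Thread(target=encode_request, args=(rid, results, lock))
               for rid in request_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```
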
In other embodiments, the coded scheduling policy may be implemented as a time division multiplexed scheduling policy. Accordingly, the CPU 202 may schedule the encoding tasks of at least two virtual machines that provide concurrent video encoding requests based on a time-division multiplexing scheduling policy, so as to implement time-division multiplexing between encoding tasks of different virtual machines, thereby implementing parallel execution of encoding tasks of multiple virtual machines.
Specifically, the CPU 202 may receive metadata of a video frame to be encoded transmitted by the target virtual machine, and drive the hardware processing unit 201 based on the storage location information in that metadata. Driven by the CPU 202, the hardware processing unit 201 may read the video frame to be encoded from the memory of the target virtual machine using the DMA engine 201b based on the storage location information, and write the video frame into the memory 203 of the CPU 202. The CPU 202 may then perform video encoding on the video frame in the memory 203 to obtain its video encoded data.
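A round-robin interpretation of this time-division multiplexing scheduling policy can be sketched as follows; the queue layout is an assumption, since the embodiment does not fix a data structure:

```python
from collections import deque

def time_division_schedule(vm_queues):
    """vm_queues maps vm_id -> deque of pending frame ids; returns the
    interleaved (vm_id, frame_id) order in which frames would be encoded."""
    order = []
    ring = deque(vm_queues.keys())
    while ring:
        vm = ring.popleft()
        if vm_queues[vm]:
            # This VM gets the current time slot, then rejoins the ring.
            order.append((vm, vm_queues[vm].popleft()))
            ring.append(vm)
        # VMs with empty queues simply drop out of the ring.
    return order
```
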
In addition to implementing temporal parallelism, embodiments of the present application may also implement parallel processing in video coding space. Specifically, the CPU 202 may launch multiple processes or multiple threads. The plural means 2 or more than 2.
Based on the above processes or threads, the CPU 202 may control the processes or threads to execute the encoding tasks of different video frames of the video to be encoded in a pipelined task execution manner, so that different video frames are encoded concurrently, thereby realizing spatial parallelism of the encoding tasks. By controlling the encoding tasks of different video frames to proceed simultaneously, the time that the CPU's processes or threads spend in an idle waiting state can be reduced, improving video encoding efficiency.
Based on the pipelined task execution manner, when the number of not-yet-encoded video frames in the video to be encoded is greater than or equal to the number of stages of the encoding unit, the CPU 202 may encode R video frames in parallel at the same time, where R is the number of stages, which helps improve encoding efficiency.
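The pipelined schedule described above can be modeled informally: with an R-stage pipeline, once the pipeline is full, each cycle has R different frames occupying the R stages. The function below is a hypothetical illustration, not part of the embodiment:

```python
def pipeline_schedule(num_frames, stages):
    """Return, per clock cycle, the frame index occupying each stage
    (None = stage idle, during pipeline fill and drain)."""
    cycles = []
    for t in range(num_frames + stages - 1):
        # Frame f enters stage s at cycle t = f + s.
        cycles.append([t - s if 0 <= t - s < num_frames else None
                       for s in range(stages)])
    return cycles
```

With 4 frames and a 3-stage pipeline, the middle cycles show 3 frames being processed in parallel, matching the "R frames in parallel" claim above.
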
In the process of executing the encoding tasks of different video frames in the video to be encoded by the multi-stage encoding unit in a pipelined task execution manner, the CPU 202 may drive the hardware processing unit 201 to act after the encoding task of the kth video frame is executed. The hardware processing unit 201 may use the DMA engine 201b to read the (k+1) -th video frame from the memory of the virtual machine and write the video frame into the memory 203. CPU 202 may perform the encoding task for the (k+1) th video frame.
For descriptions of the CPU 202 returning the video encoding data of the encoded video frame to the virtual machine 30 and recovering the encoding resources corresponding to the virtual machine 30 after the encoding of the encoded video frame is completed, reference may be made to the related contents of the above embodiments, and details are not repeated here.
In some embodiments, the CPU 202 may further drive the network card of the data processing module 20 to provide the video encoded data of the encoded video frame to the destination end requesting the video, so as to achieve network offloading on the host side.
In addition to the above system embodiments, the embodiments of the present application further provide a video encoding method, and the video encoding method provided by the embodiments of the present application is described below as an example.
Fig. 7a is a flowchart of a video encoding method according to an embodiment of the present application. The method is suitable for a data processing module communicatively connected to a host. The data processing module comprises a hardware processing unit and a CPU, wherein the hardware processing unit comprises a DMA engine and a control plane transmission channel, and a hardware encoder is additionally arranged on the hardware processing unit. As shown in fig. 7a, the method mainly includes:
701. Obtain a video encoding request issued by the host.
702. Transmit the video encoding request to the CPU in the data processing module through the control plane transmission channel.
703. The CPU runs the firmware of the hardware encoder and drives the hardware encoder to act based on the video encoding request.
704. Driven by the CPU, the hardware encoder uses the DMA engine to obtain the video to be encoded corresponding to the video encoding request from the memory of the host.
705. The hardware encoder encodes the video to be encoded to obtain video encoded data corresponding to the video to be encoded.
In the embodiment of the application, in order to realize the offloading of the video encoding function, video encoding can be offloaded to the data processing module. In this embodiment, when the host has a video encoding requirement, the video to be encoded can be loaded into the memory of the host. Further, the host may issue a video encoding request to the data processing module. In step 701, the hardware processing unit in the data processing module receives the video encoding request, and in step 702, the video encoding request is transmitted to the CPU in the data processing module through the control plane transmission channel.
Accordingly, for the CPU of the data processing module, the firmware of the hardware encoder may be run and the hardware encoder action driven based on the video encoding request in step 703. Correspondingly, in step 704, the hardware encoder may acquire the video to be encoded from the memory of the host by using the DMA engine under the driving of the CPU, and in step 705, encode the video to be encoded to obtain video encoded data, such as video bitstream data, corresponding to the video to be encoded.
In this embodiment, the video encoding function is offloaded from the CPU of the host to the data processing module, so that the calculation pressure of the CPU of the host is reduced, and the probability that the CPU of the host reaches the performance bottleneck can be reduced. On the other hand, the hardware encoder in the hardware processing unit accesses the memory of the host by using the DMA engine, so that the speed of the hardware encoder for acquiring the video to be encoded can be improved, and further the subsequent video encoding efficiency can be improved. In addition, the video coding is unloaded to a hardware coder for processing, so that the hardware acceleration of the video coding can be realized, and the video coding efficiency can be improved.
In addition, because of the flexibility of the CPU software update, the control plane program is executed by the CPU in the data processing module in this embodiment, so that the control plane program update is facilitated.
For the embodiment in which the data processing module is pre-integrated with the DMA engine and the control plane transmission channel, the hardware encoder added to the hardware processing unit can directly multiplex the DMA engine and the control plane transmission channel of the data processing module, thereby reducing the development cost of the hardware encoder.
In some embodiments of the application, the host is deployed with a virtual machine. The number of virtual machines may be 1 or more. The plural means 2 or more than 2. The virtual machine can issue a video coding request to the data processing module according to video coding requirements.
The user of the virtual machine can purchase a corresponding encoding capacity according to actual demand, and the scheduling node corresponding to the host can allocate encoding capacity to the virtual machine according to the encoding capacity requirement of the virtual machine's user. Based on this, when the CPU drives the hardware encoder to act based on the video encoding request, it may first judge whether to accept the video encoding request according to the encoding capacity corresponding to the virtual machine (i.e., the encoding capacity allocated to the virtual machine) and the video encoding request.
Specifically, the encoding capacity requirement information of the virtual machine may be acquired based on the video encoding request. For example, the resolution and frame rate of the video frames of the video to be encoded can be obtained from the video encoding request, and the encoding capacity requirement information of the virtual machine can be calculated from them; for instance, the product of the resolution and the frame rate of the video frames may be calculated to obtain the encoding capacity requirement information of the virtual machine.
In practical applications, a single virtual machine may send a single video encoding request, or may send multiple video encoding requests concurrently. The plural means 2 or more than 2. For the embodiment of the single virtual machine transmitting multiple video coding requests concurrently, the coding capacity requirement information corresponding to each video coding request can be obtained based on the multiple video coding requests, and the coding capacity requirement information of the virtual machine is calculated according to the coding capacity requirement information corresponding to each video coding request. For example, the sum of the coding capacity requirement information corresponding to each of the plurality of video coding requests may be calculated to obtain the coding capacity requirement information of the virtual machine.
Further, whether the remaining encoding capacity of the virtual machine meets the requirement of the encoding capacity requirement information can be judged according to the encoding capacity corresponding to the virtual machine (i.e., the encoding capacity allocated to the virtual machine) and the encoding capacity currently consumed by the virtual machine. If the remaining encoding capacity of the virtual machine is greater than or equal to the required encoding capacity, it is determined that the remaining encoding capacity meets the requirement; if the remaining encoding capacity is smaller than the required encoding capacity, it is determined that the requirement is not met.
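Putting the capacity calculation and the acceptance judgment together, a hedged sketch of the admission decision might look as follows; modeling the demand as resolution × frame rate follows the example above, while the exact units are an assumption:

```python
def capacity_demand(width, height, frame_rate):
    # Demand of one request: pixels per second to be encoded.
    return width * height * frame_rate

def admit(requests, allocated_capacity, consumed_capacity):
    """requests: list of (width, height, frame_rate) tuples from one
    virtual machine (possibly several concurrent requests).
    Returns True to accept, False to reject."""
    total_demand = sum(capacity_demand(w, h, fps) for (w, h, fps) in requests)
    remaining = allocated_capacity - consumed_capacity
    return remaining >= total_demand
```
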
Further, in the case where the remaining encoding capacity of the virtual machine satisfies the requirement of the encoding capacity demand information, it is determined to accept the video encoding request. Further, a request acceptance message may be returned to the virtual machine. The hardware processing unit may transparently pass the request acceptance message to the virtual machine through the control plane transmission channel. Of course, if the remaining encoding capacity of the virtual machine does not meet the requirement of the encoding capacity requirement information, a request rejection message may be returned to the virtual machine. The hardware processing unit may pass the request rejection message through to the virtual machine.
For embodiments that return a request acceptance message, the virtual machine may determine a video frame to be encoded from the video to be encoded in response to the request acceptance message and provide metadata of the video frame to be encoded to the CPU. The metadata of the video frame to be encoded is information describing the attribute of the video frame, and may include a storage location of the video frame, an identification and a resolution of the video frame, and the like.
Accordingly, the CPU may drive the hardware encoder action based on the storage location information in the metadata of the video frame to be encoded. Correspondingly, the hardware encoder can acquire the video frame to be encoded from the memory of the virtual machine by utilizing the DMA engine based on the storage position information of the video frame to be encoded under the drive of the CPU, and encode the video frame to be encoded.
In order to prevent the virtual machine from using encoding resources beyond its quota, the CPU can monitor the actual usage of encoding capacity by the virtual machine in real time during the encoding of the video to be encoded, once it has determined to accept the virtual machine's video encoding request. If the actual usage is larger than the encoding capacity allocated to the virtual machine, the frame rate at which the hardware processing unit or the CPU encodes the video can be reduced, so that the virtual machine's actual usage of encoding capacity does not exceed its allocation.
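The over-quota guard can be illustrated with a small sketch; the proportional scaling rule is an assumption, since the embodiment only states that the frame rate is reduced:

```python
def throttled_frame_rate(current_fps, actual_usage, allocated_capacity):
    # Within budget: leave the frame rate unchanged.
    if actual_usage <= allocated_capacity:
        return current_fps
    # Over budget: scale the frame rate down so projected usage
    # falls back to the allocated capacity (never below 1 fps).
    return max(1, int(current_fps * allocated_capacity / actual_usage))
```
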
For the embodiment in which a single virtual machine sends multiple video encoding requests concurrently, in the case that the CPU determines to accept those requests, the CPU may drive the hardware encoder to execute the encoding tasks corresponding to the multiple video encoding requests in parallel in a time-division multiplexing manner.
For embodiments in which the host is deployed with multiple virtual machines, there may be at least two virtual machines among them that concurrently send video encoding requests to the data processing module.
For the embodiment that at least two virtual machines send video coding requests concurrently, the CPU may also determine whether to accept the video coding request of each virtual machine according to the coding capacity corresponding to each virtual machine and the video coding request sent by the virtual machine. For any virtual machine A in at least two virtual machines which concurrently send video coding requests, whether to accept the video coding requests of the virtual machine A can be judged according to the coding capacity of the virtual machine A and the video coding requests sent by the virtual machine A. For the specific implementation of determining whether to accept the video encoding request of the virtual machine a, please refer to the related content of the above embodiment, and the description is omitted herein.
When it is determined to accept the video encoding requests sent concurrently by the at least two virtual machines, the CPU can schedule the encoding tasks of the at least two virtual machines based on a set encoding scheduling policy and drive the hardware encoder to execute the encoding task of the currently scheduled virtual machine. Accordingly, the hardware encoder executes, under the drive of the CPU, the encoding task of the virtual machine currently scheduled by the CPU.
The set encoding scheduling policy refers to a scheduling policy for concurrent video encoding requests, and the policy may determine which virtual machine is selected from the concurrent video encoding requests to send the video encoding request for processing.
In the embodiment of the application, the specific implementation form of the coding scheduling strategy is not limited. In some embodiments, users corresponding to the virtual machines have different priorities, and the coding scheduling policy may be implemented as a priority scheduling policy. Accordingly, the CPU may drive the hardware processing unit to preferentially execute the encoding task of the virtual machine with the highest user priority according to the user priorities corresponding to the at least two virtual machines that provide the concurrent video encoding request.
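A minimal reading of the priority scheduling policy is that, among virtual machines with pending encoding tasks, the one whose user has the highest priority is scheduled first; the numeric priority encoding (larger = higher) below is an assumption:

```python
def pick_next_vm(pending_vms, user_priority):
    """pending_vms: iterable of vm ids with pending encoding tasks;
    user_priority: mapping vm id -> user priority (larger = higher).
    Returns the vm whose encoding task is executed preferentially."""
    return max(pending_vms, key=lambda vm: user_priority[vm])
```
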
In other embodiments, the coded scheduling policy may be implemented as a time division multiplexed scheduling policy. Accordingly, the CPU may schedule the encoding tasks of at least two virtual machines providing concurrent video encoding requests based on a time-division multiplexing scheduling policy, and drive the hardware encoder to execute the encoding tasks of the currently scheduled target virtual machine. Correspondingly, the hardware encoder can execute the coding tasks of the target virtual machine which is currently scheduled under the drive of the CPU, so that the time-sharing multiplexing of the hardware encoder among the coding tasks of different virtual machines is realized, and the parallel execution of the coding tasks of a plurality of virtual machines is realized.
The scheduling method of the encoding task of the virtual machine shown in the above embodiment is merely illustrative, and not limiting.
In addition to implementing temporal parallelism, embodiments of the present application may also implement parallel processing in video coding space. Specifically, a hardware encoder in the hardware processing unit may be configured as a multi-stage encoding unit according to a video encoding flow of a video encoding algorithm. Multistage refers to 2 stages or more. The multi-stage coding units may be connected according to a logical relationship between video coding flows of the video coding algorithm.
Based on the multi-stage coding unit, the CPU can drive the multi-stage coding unit to act. The multistage coding unit is driven by the CPU to execute coding tasks of different video frames in the video to be coded in a pipelining task execution mode so as to code the different video frames, thereby realizing the spatial parallelism of the coding tasks and being beneficial to improving the throughput rate of the hardware processing unit. The coding tasks of the multi-stage coding units are controlled to be carried out simultaneously, and the coding tasks of different video frames are processed respectively, so that the time of the hardware encoder in an idle waiting state can be reduced, and the video coding efficiency is improved.
The CPU may also send an encoding success message to the virtual machine after each video frame has been encoded. The message may include metadata of the successfully encoded video frame, such as identification information or numbering information of the video frame. The hardware processing unit can transmit the encoding success message to the virtual machine through the control plane transmission channel. The virtual machine may determine which video frame of the video to be encoded was successfully encoded based on the encoding success message. Correspondingly, the virtual machine can judge whether encoding of the video to be encoded is complete according to the metadata of the video frames in the encoding success messages: if the virtual machine receives encoding success messages corresponding to all frames of the video to be encoded, it determines that the video to be encoded has been fully encoded.
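On the virtual machine side, the completion judgment described above might be tracked as follows; the message shape (a bare frame identifier) is an assumption:

```python
class CompletionTracker:
    def __init__(self, total_frames):
        # Frame ids the virtual machine expects success messages for.
        self.expected = set(range(total_frames))
        self.done = set()

    def on_success_message(self, frame_id):
        # Record the frame id carried by one encoding success message.
        self.done.add(frame_id)

    def video_fully_encoded(self):
        # True once a success message has arrived for every frame.
        return self.done >= self.expected
```
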
Further, the virtual machine may send an encoder destruction request to the data processing module after encoding of the video to be encoded is complete. The hardware processing unit can transmit the encoder destruction request to the CPU through the control plane transmission channel. The CPU can receive the encoder destruction request transmitted by the hardware processing unit, and reclaim the encoder resources corresponding to the virtual machine in response to the request. After reclaiming the encoder resources corresponding to the virtual machine, the CPU can return an encoder destruction success message to the virtual machine, and the hardware processing unit can transmit this message to the virtual machine so that the virtual machine can learn of the encoder resource destruction status.
The above embodiments describe the video encoding process by taking the offloading of video encoding to a hardware encoder for hardware acceleration as an example. Of course, in some embodiments, the video encoding process of the host may also be offloaded to the CPU of the data processing module, so as to implement software offloading of video encoding. An exemplary description of the software offloading process for video encoding follows.
Fig. 7b is a flowchart of another video encoding method according to an embodiment of the present application. The method is suitable for the data processing module. Wherein the data processing module is communicatively connectable to the host. The data processing module comprises a hardware processing unit and a CPU. The hardware processing unit comprises a DMA engine and a control plane transmission channel. Accordingly, as shown in fig. 7b, the video encoding method includes:
71. Obtain a video encoding request issued by the host.
72. Transmit the video encoding request to the CPU of the data processing module through the control plane transmission channel.
73. The CPU runs the firmware of the hardware processing unit and drives the hardware processing unit to act based on the video encoding request.
74. Driven by the CPU, the hardware processing unit uses the DMA engine to obtain the video to be encoded corresponding to the video encoding request from the memory of the host, and stores the video to be encoded into the memory of the CPU of the data processing module using the DMA engine.
75. The CPU of the data processing module encodes the video to be encoded in the memory to obtain video encoded data corresponding to the video to be encoded.
In this embodiment, in order to relieve the CPU of the host, the host may map the data processing module to a virtual device of the host through virtualization technology, and offload the data processing function to the data processing module through the mapped virtual device.
In the embodiment of the application, in order to realize the unloading of the video coding function, a software module, a plug-in or a container with the video coding function and the like can be configured in the CPU of the data processing module.
In this embodiment, when the host has a video encoding requirement, the video to be encoded can be loaded into the memory of the host. Further, the host computer issues a video encoding request to the data processing module. In step 71, the hardware processing unit in the data processing module receives the video encoding request, and in step 72, the video encoding request is transmitted to the CPU of the data processing module through the control plane transmission channel.
Accordingly, in step 73, the CPU in the data processing module runs the firmware of the hardware processing unit and drives the hardware processing unit to act based on the video encoding request. Accordingly, in step 74, the hardware processing unit obtains the video to be encoded from the memory of the host by using the DMA engine under the driving of the CPU, and stores the video to be encoded into the memory of the CPU of the data processing module using the DMA engine. Further, in step 75, the CPU of the data processing module may encode the video to be encoded in its memory to obtain video encoded data, such as video bitstream data, corresponding to the video to be encoded.
In the embodiment of the present application, the specific algorithm for encoding the video to be encoded by the CPU of the data processing module may be referred to the relevant content of the above embodiment, which is not described herein again.
In this embodiment, the video encoding function is offloaded from the CPU of the host to the data processing module, so that the calculation pressure of the CPU of the host is reduced, and the probability that the CPU of the host reaches the performance bottleneck can be reduced. On the other hand, the hardware processing unit accesses the host memory in a DMA mode, so that the speed of the hardware processing unit for acquiring the video to be encoded can be improved, and further the subsequent video encoding efficiency can be improved. In addition, because of the flexibility of the update of the CPU software, the CPU in the data processing module executes the video coding operation in the embodiment, so that the update of the program corresponding to the video coding operation is facilitated.
In some embodiments of the application, the host is deployed with a virtual machine. The number of virtual machines 30 may be one or more, where "multiple" means two or more. A virtual machine can issue a video encoding request to the data processing module according to its video encoding requirements, and the hardware processing unit may pass the video encoding request through to the CPU of the data processing module.
When driving the hardware processing unit to act based on the video encoding request, the CPU of the data processing module may determine whether to accept the request according to the encoding capacity corresponding to the virtual machine (that is, the encoding capacity allocated to it) and the video encoding request itself. For a specific implementation of this determination, reference may be made to the relevant content of the above embodiments, which is not repeated here.
If it is determined to accept the video encoding request, the CPU of the data processing module may return a request acceptance message to the virtual machine. The hardware processing unit may transparently pass the request acceptance message to the virtual machine through the control plane transmission channel. Of course, if the remaining encoding capacity of the virtual machine 30 does not meet the requirement of the encoding capacity requirement information, a request rejection message may be returned to the virtual machine. The hardware processing unit may transparently pass the request rejection message to the virtual machine through the control plane transmission channel.
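The accept/reject decision described above reduces to a remaining-capacity check. The sketch below assumes capacity is measured as a single scalar (e.g., frames per second of a reference resolution); the patent does not fix the unit, so the function name and parameters are illustrative:

```python
# Hypothetical admission check: accept a VM's encoding request only if its
# remaining allocated capacity covers the request's demand. The scalar
# capacity unit and all names here are assumptions for illustration.

def admit(allocated, consumed, demand):
    """Return True if the VM's remaining encoding capacity meets the demand."""
    remaining = allocated - consumed
    return remaining >= demand

# VM allocated 60 units of encoding capacity, currently consuming 40.
assert admit(allocated=60, consumed=40, demand=15) is True    # -> accept message
assert admit(allocated=60, consumed=40, demand=30) is False   # -> reject message
print("admission check ok")
```

A `True` result corresponds to the request acceptance message being passed back through the control plane channel; `False` corresponds to the rejection message.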
For embodiments that return a request acceptance message, the virtual machine may, in response to the message, determine a video frame to be encoded from the video to be encoded and provide metadata of that video frame to the CPU. The metadata of the video frame to be encoded is information describing attributes of the frame, and may include the storage location of the frame, an identifier of the frame, its resolution, and the like.
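The per-frame metadata can be pictured as a small record. The field names below are assumptions; the patent only lists a storage location, a frame identification, and a resolution:

```python
# A minimal sketch of the per-frame metadata described above. Field names are
# illustrative; the storage address is what the CPU hands to the DMA engine.
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameMetadata:
    storage_addr: int      # where the raw frame sits in VM memory (DMA source)
    frame_id: int          # identification of the frame within the video
    width: int             # resolution
    height: int

meta = FrameMetadata(storage_addr=0x7F00_0000, frame_id=0, width=1920, height=1080)
print(meta.width * meta.height)   # 2073600 raw pixels per frame
```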
The CPU of the data processing module may drive the hardware processing unit to act based on the storage location information in the metadata of the video frame to be encoded. The hardware processing unit acquires the video frame to be encoded from the memory of the virtual machine by using a DMA engine in a DMA mode under the drive of the CPU, and writes the video frame to be encoded into the memory corresponding to the CPU of the data processing module by using the DMA engine in the DMA mode. The CPU of the data processing module may encode the video frames to be encoded in its memory.
For the embodiment in which a single virtual machine concurrently sends multiple video encoding requests, if the CPU of the data processing module determines to accept them, it may start multiple processes or multiple threads to execute the encoding tasks corresponding to those requests in parallel in a time-division multiplexing manner.
In other embodiments, the encoding scheduling policy may be implemented as a time-division multiplexing scheduling policy. Accordingly, the CPU of the data processing module can schedule the encoding tasks of at least two virtual machines that issue concurrent video encoding requests based on this policy, thereby time-sharing the CPU among the encoding tasks of different virtual machines and executing the encoding tasks of multiple virtual machines in parallel.
Specifically, the CPU of the data processing module may receive metadata of the video frame to be encoded sent by the target virtual machine, and drive the hardware processing unit based on storage location information in the metadata of the video frame to be encoded. The hardware processing unit can read the video frame to be encoded from the memory of the target virtual machine by utilizing the DMA engine based on the storage position information of the video frame to be encoded under the drive of the CPU, and write the video frame to be encoded into the memory of the CPU of the data processing module. The CPU of the data processing module can perform video coding on the video frames to be coded in the memory of the CPU to obtain video coding data of the video frames to be coded.
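Time-division multiplexing across virtual machines can be sketched as a plain round-robin over each VM's pending frame queue. This is a simplification: the patent does not mandate round-robin, and a real scheduler would likely weight time slices by each VM's allocated capacity:

```python
# A sketch of time-division multiplexing across VMs' encoding tasks: one
# frame per VM per time slice, round-robin, until all queues drain.
# The policy and all names are illustrative assumptions.
from collections import deque

def tdm_schedule(vm_queues):
    """Interleave one frame per VM per time slice until all queues drain."""
    order = []
    queues = {vm: deque(frames) for vm, frames in vm_queues.items()}
    while any(queues.values()):
        for vm, q in queues.items():
            if q:
                order.append((vm, q.popleft()))
    return order

slices = tdm_schedule({"vmA": ["a0", "a1", "a2"], "vmB": ["b0", "b1"]})
print(slices)
# [('vmA', 'a0'), ('vmB', 'b0'), ('vmA', 'a1'), ('vmB', 'b1'), ('vmA', 'a2')]
```

Each scheduled entry corresponds to one iteration of the loop in the paragraph above: read the target VM's frame by DMA, encode it, then move to the next VM.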
In addition to temporal parallelism, embodiments of the present application may also implement parallel processing in the video coding space. Specifically, the CPU of the data processing module may launch multiple processes or multiple threads, where "multiple" means two or more.
Using these processes or threads, the CPU of the data processing module can execute the encoding tasks of different video frames in the video to be encoded in a pipelined manner, so that different video frames are encoded concurrently and spatial parallelism of the encoding tasks is achieved. Because the encoding tasks of the multi-stage encoding units proceed simultaneously, each handling a different video frame, the time that a CPU process or thread spends idle waiting is reduced, which improves video encoding efficiency.
Based on the pipelined task execution mode, when the number of video frames not yet encoded in the video to be encoded is greater than or equal to the number of stages R of the encoding unit, R video frames can be encoded simultaneously in the CPU of the data processing module, which helps improve encoding efficiency.
In the process of executing the encoding tasks of different video frames in the video to be encoded by the multi-stage encoding unit in a pipelined task execution mode, the CPU of the data processing module can drive the hardware processing unit to act after the encoding task of the Kth video frame is executed. The hardware processing unit can read the (K+1) th video frame from the memory of the virtual machine by using a DMA engine in a DMA mode, and write the video frame into the memory of the CPU of the data processing module. The CPU of the data processing module may perform the encoding task of the (k+1) th video frame.
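The pipeline behavior, including the overlap of the (K+1)-th frame's DMA fetch with the K-th frame's encoding, can be sketched by listing which frame occupies which stage at each tick. The stage names are assumptions; the patent only says the encoder has multiple stages:

```python
# A sketch of the pipeline described above: with an R-stage pipeline, up to
# R frames are in flight at once; while frame K is being encoded, frame K+1
# can already be fetched by DMA. Stage names are illustrative assumptions.

def pipeline_schedule(num_frames, stages=("dma_fetch", "analyze", "encode")):
    """Return, per clock tick, the (frame, stage) pairs that run concurrently."""
    R = len(stages)
    ticks = []
    for t in range(num_frames + R - 1):
        active = []
        for s, name in enumerate(stages):
            k = t - s                      # frame occupying stage s at tick t
            if 0 <= k < num_frames:
                active.append((k, name))
        ticks.append(active)
    return ticks

ticks = pipeline_schedule(4)
# At steady state, R frames (here 3, the number of stages) run simultaneously.
print(max(len(active) for active in ticks))   # 3
```

This matches the statement above: as long as at least R frames remain unencoded, R frames are processed at once.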
The description of the CPU of the data processing module that returns the video encoding data of the encoded video frame to the virtual machine and recovers the encoding resources corresponding to the virtual machine after the encoding of the encoded video is completed can be referred to the relevant content of the above embodiment, which is not described herein.
In some embodiments, the CPU of the data processing module may further drive the network card of the data processing module to provide the video encoded data of the encoded video frame to the destination end that requests the video, so as to implement network offloading of the host side.
It should be noted that the user information (including but not limited to user equipment information and user personal information) and data (including but not limited to data for analysis, stored data, and displayed data) involved in the present application are information and data authorized by the user or fully authorized by all parties; the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to authorize or refuse.
It should be noted that the execution subjects of the steps of the method provided in the above embodiments may be the same device, or the method may be executed by different devices. For example, the execution subject of both steps 71 and 72 may be device A; alternatively, the execution subject of step 71 may be device A and that of step 72 may be device B; and so on.
In addition, some of the flows described in the above embodiments and drawings include multiple operations that appear in a specific order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or in parallel. Sequence numbers such as 71 and 72 are merely used to distinguish the operations and do not by themselves represent any order of execution. The flows may also include more or fewer operations, and those operations may be performed sequentially or in parallel.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the video encoding methods described above.
Fig. 8a and fig. 8b are schematic structural diagrams of a data processing module according to an embodiment of the present application. As shown in fig. 8a and 8b, the data processing module includes a hardware processing unit 801 and a CPU 802. The hardware processing unit 801 is communicatively connected to a CPU 802. The hardware processing unit 801 may include a DMA engine 801b and a control plane transmission channel 801c.
In some embodiments, as shown in fig. 8a, a hardware encoder 801a is added to the hardware processing unit 801. When the data processing module is in communication connection with the host, the hardware processing unit 801 may be configured to obtain a video encoding request issued by the host, and pass the video encoding request through the control plane transmission channel to the CPU 802. The CPU 802 may run the firmware of the hardware encoder 801a and drive the hardware encoder 801a to act based on the video encoding request.
Under the drive of the CPU 802, the hardware encoder 801a obtains the video to be encoded corresponding to the video encoding request from the memory of the host by using the DMA engine 801b, and encodes the video to be encoded to obtain video encoding data corresponding to the video to be encoded.
In some embodiments, the host is deployed with a virtual machine, and the virtual machine sends a video encoding request to the data processing module. When driving the hardware encoder to act based on the video encoding request, the CPU 802 is specifically configured to determine whether to accept the request according to the request and the encoding capacity allocated to the virtual machine. If so, it returns a request acceptance message, which the hardware processing unit passes through the control plane transmission channel to the virtual machine. In response, the virtual machine determines a video frame to be encoded from the video to be encoded and provides metadata of that frame to the data processing module. Accordingly, the CPU 802 may drive the hardware encoder to act based on the metadata of the video frame to be encoded returned by the virtual machine.
Accordingly, when the hardware encoder 801a obtains the video to be encoded corresponding to the video encoding request from the memory of the host by using the DMA engine 801b under the driving of the CPU 802, the hardware encoder 801a is specifically configured to obtain the video frame to be encoded from the memory of the host by using the DMA engine 801b under the driving of the CPU 802. Further, when the hardware encoder 801a encodes the video to be encoded, the video frame to be encoded may be encoded to obtain video encoding data of the video frame to be encoded.
In some embodiments, when determining whether to accept the video encoding request according to the encoding capacity allocated to the virtual machine and the request itself, the CPU 802 is specifically configured to: obtain the encoding capacity demand information of the virtual machine from the request; determine, according to the encoding capacity allocated to the virtual machine and the encoding capacity it currently consumes, whether the remaining encoding capacity of the virtual machine meets that demand; and, if so, determine to accept the video encoding request.
In some embodiments of the present application, the CPU 802 is further configured to monitor an actual usage of the encoding capacity by the virtual machine if it is determined to accept the video encoding request, and reduce a frame rate of video encoding by the hardware encoder if it is monitored that the actual usage is greater than the encoding capacity allocated by the virtual machine, so that the actual usage of the encoding capacity by the virtual machine does not exceed the encoding capacity allocated by the virtual machine.
In some embodiments, the hardware encoder comprises a multi-stage encoding unit, and the hardware encoder is specifically configured to process the encoding tasks of multiple video frames in the video to be encoded in a pipelined task execution manner under the driving of the CPU 802.
In some embodiments, the host deploys multiple virtual machines, at least two of which concurrently send video encoding requests. When driving the hardware encoder to act based on a video encoding request, the CPU 802 is specifically configured to, if the concurrently sent requests are accepted, schedule the encoding tasks of the at least two virtual machines based on a time-division multiplexing scheduling policy and drive the hardware encoder 801a to video-encode the video to be encoded of the currently scheduled virtual machine.
Accordingly, the hardware encoder 801a performs video encoding on the video to be encoded of the virtual machine currently scheduled by the CPU under the driving of the CPU 802.
In this embodiment, the data processing module may support SR-IOV. Through SR-IOV, the data processing module is virtualized into multiple virtual video devices, which are assigned to the multiple virtual machines.
In some embodiments, the hardware encoder 801a, under the driving of the CPU 802, uses the DMA engine 801b to write the video encoded data corresponding to the video frame to be encoded into the memory of the virtual machine.
In other embodiments, the hardware processing unit 801 may further receive an encoder destruction request sent by the virtual machine, where the encoder destruction request is sent by the virtual machine after encoding the video to be encoded is completed, and send the encoder destruction request to the CPU 802 through the control plane transmission channel. The CPU 802 reclaims the encoder resources corresponding to the virtual machine in response to the encoder destruction request.
In some embodiments of the present application, as shown in FIG. 8b, the hardware processing unit 801 has no hardware encoder. Correspondingly, the hardware processing unit 801 can acquire a video coding request issued by the host, and the video coding request is transmitted to the CPU 802 through the control plane transmission channel 801 c.
The CPU 802 runs the firmware of the hardware processing unit and drives the hardware processing unit 801 to operate based on the video encoding request. The hardware processing unit 801 obtains the video to be encoded corresponding to the video encoding request from the memory of the host by using the DMA engine 801b under the driving of the CPU 802, and stores the video to be encoded to the memory 803 of the CPU 802 by using the DMA engine 801 b.
The CPU 802 may encode the video to be encoded in the memory 803 to obtain video encoding data corresponding to the video to be encoded.
Optionally, when encoding the video to be encoded in its memory, the CPU 802 is specifically configured to start multiple processes or multiple threads and use them to process the encoding tasks of multiple video frames in the video to be encoded in a pipelined task execution manner.
In some embodiments, the host deploys multiple virtual machines, at least two of which concurrently send video encoding requests. When encoding the video to be encoded in its memory, the CPU 802 is specifically configured to, if the concurrently sent video encoding requests are accepted, start multiple processes or multiple threads, use them to schedule the encoding tasks of the at least two virtual machines based on a time-division multiplexing scheduling policy, and video-encode the video to be encoded of the currently scheduled virtual machine.
The data processing module provided by the embodiment can unload the video coding function from the CPU of the host to the data processing module when in communication connection with the host, so that the calculation pressure of the CPU of the host is reduced, and the probability that the CPU of the host reaches the performance bottleneck can be reduced. On the other hand, the hardware processing unit accesses the host memory in a DMA mode, so that the speed of the hardware processing unit for acquiring the video to be encoded can be improved, and further the subsequent video encoding efficiency can be improved.
It should be noted that, in addition to the components described above, the computing system may further include optional components such as a memory, a communication component, a power supply component, a display component, and an audio component. Only some components are shown schematically in the embodiments of the present application; this does not mean that the computing system must contain all of the components shown, nor that it can contain only those components.
In an embodiment of the present application, the memory is used to store a computer program and may be configured to store various other data to support operations on the device on which it resides. The processor may execute the computer program stored in the memory to implement the corresponding control logic. The memory may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
In an embodiment of the present application, the processor may be any hardware processing device that can execute the above-described method logic. Alternatively, the processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Microcontroller Unit (MCU), a Field-Programmable Gate Array (FPGA), a Programmable Array Logic device (PAL), a General Array Logic device (GAL), a Complex Programmable Logic Device (CPLD), an Application-Specific Integrated Circuit (ASIC) chip, an Advanced RISC Machines (ARM) processor, a System on Chip (SoC), or the like, but is not limited thereto.
In an embodiment of the application, the communication component is configured to facilitate wired or wireless communication between the device in which it is located and other devices. That device may access a wireless network based on a communication standard, such as Wireless Fidelity (WiFi), 2G, 3G, 4G, 5G, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component may also be implemented based on Near Field Communication (NFC) technology, Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.
In an embodiment of the present application, the display assembly may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display assembly includes a touch panel, it may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation.
In an embodiment of the application, the power supply assembly is configured to provide power to the various components of the device in which it is located. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the devices in which the power components are located.
In embodiments of the application, the audio component may be configured to output and/or input audio signals. For example, the audio component includes a Microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may be further stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals. For example, for a device with language interaction functionality, voice interaction with a user, etc., may be accomplished through an audio component.
It should be noted that, the descriptions of "first" and "second" herein are used to distinguish different messages, devices, modules, etc., and do not represent a sequence, and are not limited to the "first" and the "second" being different types.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, CD-ROM (Compact Disc Read-Only Memory), optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (or systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing module to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing module, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (e.g., CPUs, etc.), input/output interfaces, network interfaces, and memory.
The Memory may include volatile Memory, random-Access Memory (RAM), and/or nonvolatile Memory in a computer-readable medium, such as Read Only Memory (ROM) or Flash Memory (Flash RAM). Memory is an example of computer-readable media.
The storage medium of the computer is a readable storage medium, which may also be referred to as a readable medium. Readable storage media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, Phase-Change Memory (PRAM), Static Random-Access Memory (SRAM), Dynamic Random-Access Memory (DRAM), other types of Random-Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (17)

1.一种计算系统,其特征在于,包括:主机和数据处理模块;所述主机和所述数据处理模块通信连接;所述数据处理模块包括:硬件处理单元和中央处理器CPU;所述硬件处理单元包括:直接内存访问DMA引擎和控制面传输通道;所述硬件处理单元增设有硬件编码器;1. A computing system, characterized in that it comprises: a host and a data processing module; the host and the data processing module are in communication connection; the data processing module comprises: a hardware processing unit and a central processing unit CPU; the hardware processing unit comprises: a direct memory access DMA engine and a control plane transmission channel; the hardware processing unit is additionally provided with a hardware encoder; 所述主机用于向所述硬件处理单元下发视频编码请求;The host is used to send a video encoding request to the hardware processing unit; 所述硬件处理单元,用于通过所述控制面传输通道将所述视频编码请求透传至所述CPU;The hardware processing unit is used to transparently transmit the video encoding request to the CPU through the control plane transmission channel; 所述CPU,运行所述硬件编码器的固件,并基于所述视频编码请求,驱动所述硬件编码器动作;The CPU runs the firmware of the hardware encoder and drives the hardware encoder to operate based on the video encoding request; 所述硬件编码器,用于在所述CPU的驱动下,利用所述DMA引擎从所述主机的内存中获取所述视频编码请求对应的待编码视频;并对所述待编码视频进行编码,以得到所述待编码视频对应的视频编码数据。The hardware encoder is used to obtain the video to be encoded corresponding to the video encoding request from the memory of the host using the DMA engine under the drive of the CPU; and encode the video to be encoded to obtain video encoding data corresponding to the video to be encoded. 2.根据权利要求1所述的系统,其特征在于,所述主机部署多个虚拟机;所述数据处理模块支持单根输入输出虚拟化SR-IOV;2. The system according to claim 1, characterized in that the host deploys multiple virtual machines; the data processing module supports single root input and output virtualization SR-IOV; 所述多个虚拟机通过SR-IOV技术,将所述数据处理模块虚拟为多个虚拟视频设备,并分配至所述多个虚拟机。The multiple virtual machines virtualize the data processing module into multiple virtual video devices through SR-IOV technology, and distribute the data processing module to the multiple virtual machines. 
3.一种计算系统,其特征在于,包括:主机和数据处理模块;所述主机和所述数据处理模块通信连接;所述数据处理模块包括:硬件处理单元和中央处理器CPU;所述硬件处理单元包括:直接内存访问DMA引擎和控制面传输通道;3. A computing system, characterized in that it comprises: a host and a data processing module; the host and the data processing module are in communication connection; the data processing module comprises: a hardware processing unit and a central processing unit CPU; the hardware processing unit comprises: a direct memory access DMA engine and a control plane transmission channel; 所述主机用于向所述硬件处理单元下发视频编码请求;The host is used to send a video encoding request to the hardware processing unit; 所述硬件处理单元,用于通过所述控制面传输通道通过将所述视频编码请求透传至所述CPU;The hardware processing unit is used to transparently transmit the video encoding request to the CPU through the control plane transmission channel; 所述CPU,基于所述视频编码请求,控制所述硬件处理单元利用所述DMA引擎,从所述主机的内存中获取待编码视频,并利用所述DMA引擎将所述待编码视频存储至所述CPU的内存;从所述CPU的内存中读取所述待编码视频,并对所述待编码视频进行编码,以得到所述待编码视频对应的视频编码数据。The CPU, based on the video encoding request, controls the hardware processing unit to use the DMA engine to obtain the video to be encoded from the memory of the host, and uses the DMA engine to store the video to be encoded in the memory of the CPU; reads the video to be encoded from the memory of the CPU, and encodes the video to be encoded to obtain video encoding data corresponding to the video to be encoded. 4.根据权利要求3所述的系统,其特征在于,所述主机部署多个虚拟机;所述数据处理模块支持单根输入输出虚拟化SR-IOV;4. The system according to claim 3, characterized in that the host deploys multiple virtual machines; the data processing module supports single root input and output virtualization SR-IOV; 所述多个虚拟机通过SR-IOV技术,将所述数据处理模块虚拟为多个虚拟视频设备,并分配至所述多个虚拟机。The multiple virtual machines virtualize the data processing module into multiple virtual video devices through SR-IOV technology, and distribute the data processing module to the multiple virtual machines. 5.一种视频编码方法,其特征在于,适用于数据处理模块,所述数据处理模块与主机连接;所述数据处理模块包括:硬件处理单元和CPU;所述硬件处理单元包括:直接内存访问DMA引擎和控制面传输通道;所述硬件处理单元增设硬件编码器;5. 
A video encoding method, characterized in that it is applicable to a data processing module connected to a host; the data processing module comprises a hardware processing unit and a CPU; the hardware processing unit comprises a direct memory access (DMA) engine and a control plane transmission channel, and is additionally provided with a hardware encoder; the method comprises:
obtaining a video encoding request sent by the host;
transparently passing the video encoding request through to the CPU via the control plane transmission channel;
the CPU running the firmware of the hardware encoder and, based on the video encoding request, driving the hardware encoder to operate; and
the hardware encoder, driven by the CPU, obtaining the to-be-encoded video corresponding to the video encoding request from the memory of the host using the DMA engine, and encoding the to-be-encoded video to obtain video encoding data corresponding to the to-be-encoded video.

6.
The method according to claim 5, characterized in that the host is deployed with a virtual machine, and the virtual machine sends the video encoding request to the data processing module;
the driving the hardware encoder to operate based on the video encoding request comprises:
the CPU determining, according to the video encoding request and the encoding capacity allocated to the virtual machine, whether to accept the video encoding request, and, if the determination result is yes, returning a request acceptance message to the virtual machine;
transparently passing the request acceptance message through to the virtual machine via the control plane transmission channel, so that the virtual machine, in response to the request acceptance message, determines a to-be-encoded video frame from the to-be-encoded video and provides metadata of the to-be-encoded video frame to the data processing module; and
the CPU driving the hardware encoder to operate based on the metadata of the to-be-encoded video frame returned by the virtual machine.

7. The method according to claim 6, characterized in that the hardware encoder, driven by the CPU, obtaining the to-be-encoded video corresponding to the video encoding request from the memory of the host using the DMA engine comprises:
the hardware encoder, driven by the CPU, obtaining the to-be-encoded video frame from the memory of the host using the DMA engine;
and the encoding the to-be-encoded video comprises: the hardware encoder encoding the to-be-encoded video frame to obtain video encoding data of the to-be-encoded video frame.
8. The method according to claim 6, characterized in that the CPU determining, according to the encoding capacity allocated to the virtual machine and the video encoding request, whether to accept the video encoding request comprises:
obtaining encoding capacity demand information of the virtual machine based on the video encoding request;
determining, according to the encoding capacity allocated to the virtual machine and the encoding capacity currently consumed by the virtual machine, whether the remaining encoding capacity of the virtual machine meets the requirement of the encoding capacity demand information; and
if the determination result is yes, determining to accept the video encoding request.

9. The method according to claim 6, characterized by further comprising:
the CPU, having determined to accept the video encoding request, monitoring the actual usage of encoding capacity by the virtual machine; and
upon detecting that the actual usage is greater than the encoding capacity allocated to the virtual machine, reducing the frame rate of the video encoding performed by the hardware encoder, so that the actual usage of encoding capacity by the virtual machine does not exceed the encoding capacity allocated to the virtual machine.

10. The method according to claim 5, characterized in that the hardware encoder comprises multi-level encoding units, and the encoding the to-be-encoded video comprises:
the multi-level encoding units, driven by the CPU, processing the encoding tasks of multiple video frames in the to-be-encoded video in a pipelined task execution manner.
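As an illustrative aid only (not part of the claimed subject matter), the capacity logic of claims 8 and 9 — admit a request only when the virtual machine's remaining allocated capacity covers the demand, and step the encode frame rate down while measured usage exceeds the allocation — can be sketched in Python. The capacity units, the throttling step size, and all function names are hypothetical assumptions:

```python
# Hypothetical sketch of the claim 8/9 capacity checks. Capacity is modeled
# as a single abstract number; real systems would track e.g. macroblocks/s
# or pixels/s per virtual function.

def accept_request(allocated, consumed, demanded):
    """Claim 8 style admission: remaining capacity must cover the demand."""
    remaining = allocated - consumed
    return remaining >= demanded

def throttle_frame_rate(frame_rate, actual_usage, allocated, step=5):
    """Claim 9 style throttling: lower the frame rate (never below 1 fps)
    whenever measured usage exceeds the VM's allocation."""
    if actual_usage > allocated:
        return max(1, frame_rate - step)
    return frame_rate
```

For example, a VM allocated 100 units that already consumes 60 can be admitted for a 30-unit request but not for a 50-unit one, and a 30 fps session observed to overrun its allocation would be stepped down to 25 fps.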
11. The method according to claim 6, characterized in that the host deploys multiple virtual machines, at least two of which concurrently send video encoding requests;
the driving the hardware encoder to operate based on the video encoding request comprises:
the CPU, when the video encoding requests concurrently sent by the at least two virtual machines are accepted, scheduling the encoding tasks of the at least two virtual machines based on a time-division multiplexing scheduling strategy, and driving the hardware encoder to encode the to-be-encoded video of the currently scheduled virtual machine; and
the hardware encoder, driven by the CPU, encoding the to-be-encoded video of the virtual machine currently scheduled by the CPU.

12. The method according to claim 11, characterized in that the data processing module supports SR-IOV; the data processing module is virtualized into multiple virtual video devices through SR-IOV technology and the virtual video devices are allocated to the multiple virtual machines.

13. The method according to claim 6, characterized by further comprising:
the hardware encoder, driven by the CPU, writing the video encoding data corresponding to the to-be-encoded video frame into the memory of the virtual machine using the DMA engine.

14.
The method according to claim 6, characterized by further comprising:
the hardware processing unit receiving an encoder destruction request sent by the virtual machine, the encoder destruction request being sent by the virtual machine after encoding of the to-be-encoded video is completed;
transparently passing the encoder destruction request through to the CPU via the control plane transmission channel; and
the CPU, in response to the encoder destruction request, reclaiming the encoder resources corresponding to the virtual machine.

15. A video encoding method, characterized in that it is applicable to a data processing module connected to a host; the data processing module comprises a hardware processing unit and a CPU; the hardware processing unit comprises a direct memory access (DMA) engine and a control plane transmission channel; the method comprises:
obtaining a video encoding request sent by the host;
transparently passing the video encoding request through to the CPU via the control plane transmission channel;
the CPU running the firmware of the hardware processing unit and, based on the video encoding request, driving the hardware processing unit to operate;
the hardware processing unit, driven by the CPU, obtaining the to-be-encoded video corresponding to the video encoding request from the memory of the host using the DMA engine, and storing the to-be-encoded video into the memory of the CPU using the DMA engine; and
the CPU encoding the to-be-encoded video in the memory to obtain video encoding data corresponding to the to-be-encoded video.
16. A data processing module, characterized in that it comprises a hardware processing unit and a CPU communicatively connected to each other; the hardware processing unit comprises a direct memory access (DMA) engine and a control plane transmission channel, and is additionally provided with a hardware encoder; when the data processing module is communicatively connected to a host, the CPU is configured to execute the steps performed by the CPU in the method of any one of claims 5-14, and the hardware encoder is configured to execute the steps performed by the hardware encoder in the method of any one of claims 5-14.

17. A data processing module, characterized in that it comprises a hardware processing unit and a CPU communicatively connected to each other; the hardware processing unit comprises a direct memory access (DMA) engine and a control plane transmission channel; when the data processing module is communicatively connected to a host, the CPU is configured to execute the steps performed by the CPU in the method of claim 15, and the hardware processing unit is configured to execute the steps performed by the hardware processing unit in the method of claim 15.
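As an illustrative aid only (not part of the claimed subject matter), the time-division multiplexing scheduling of claim 11 can be sketched as a round-robin interleaving of per-VM frame queues. Round-robin with a fixed slice is one assumed concrete instance of a "time-division multiplexing scheduling strategy"; the function name and slice policy are hypothetical:

```python
# Hypothetical sketch of claim 11 scheduling: when several VMs have accepted
# encode requests, the on-card CPU interleaves their frame jobs in fixed
# time slices so the single hardware encoder is shared fairly.
from collections import deque

def schedule_time_division(vm_queues, slice_frames=1):
    """Interleave per-VM frame queues, taking `slice_frames` jobs per turn.

    vm_queues maps a VM id to its ordered list of pending frames; the
    return value is the order in which the encoder would process them.
    """
    order = []
    ready = deque(vm_queues.items())
    while ready:
        vm, frames = ready.popleft()
        for _ in range(min(slice_frames, len(frames))):
            order.append((vm, frames.pop(0)))
        if frames:
            ready.append((vm, frames))  # VM still has work: requeue it
    return order

# Two VMs with two frames each are served alternately, one frame per slice.
order = schedule_time_division({"vm0": ["f0", "f1"], "vm1": ["g0", "g1"]})
```

Under this policy neither VM can monopolize the encoder: with one frame per slice the example yields vm0, vm1, vm0, vm1.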
CN202310315385.XA 2023-03-28 2023-03-28 Computing system, video encoding method and data processing module Active CN116347094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310315385.XA CN116347094B (en) 2023-03-28 2023-03-28 Computing system, video encoding method and data processing module

Publications (2)

Publication Number Publication Date
CN116347094A 2023-06-27
CN116347094B (en) 2025-05-06

Family

ID=86878477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310315385.XA Active CN116347094B (en) 2023-03-28 2023-03-28 Computing system, video encoding method and data processing module

Country Status (1)

Country Link
CN (1) CN116347094B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117135361A (en) * 2023-08-03 2023-11-28 杭州阿里巴巴飞天信息技术有限公司 Video coding method, device, system and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN107277538A (en) * 2017-08-11 2017-10-20 西安万像电子科技有限公司 Method for encoding images and system
CN114501027A (en) * 2022-01-25 2022-05-13 深圳市游联云科技有限公司 On-card video coding acceleration system capable of mixing software and hardware video coding

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US5598514A (en) * 1993-08-09 1997-01-28 C-Cube Microsystems Structure and method for a multistandard video encoder/decoder
US10303497B2 (en) * 2017-06-22 2019-05-28 Vmware, Inc. Hybrid software and GPU encoding for UI remoting
CN115048227B (en) * 2022-08-15 2022-12-09 阿里巴巴(中国)有限公司 Data processing method, system and storage medium

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN107277538A (en) * 2017-08-11 2017-10-20 西安万像电子科技有限公司 Method for encoding images and system
CN114501027A (en) * 2022-01-25 2022-05-13 深圳市游联云科技有限公司 On-card video coding acceleration system capable of mixing software and hardware video coding

Also Published As

Publication number Publication date
CN116347094A (en) 2023-06-27

Similar Documents

Publication Publication Date Title
US10924783B2 (en) Video coding method, system and server
RU2745343C1 (en) Cloud desktop system and method of coding with compression of image sequences, and corresponding data storage
CN114008588A (en) Sharing multimedia physical functions in a virtualized environment of processing units
US9286082B1 (en) Method and system for image sequence transfer scheduling
CN113542757B (en) Image transmission method, device, server and storage medium for cloud applications
KR101713009B1 (en) Scalable compute fabric
TW201019263A (en) Integrated GPU, NIC and compression hardware for hosted graphics
CN103631634A (en) Graphics processor virtualization achieving method and device
WO2021139173A1 (en) Ai video processing method and apparatus
CN115600676A (en) Deep learning model reasoning method, device, equipment and storage medium
WO2021062111A1 (en) First-in first-out function for segmented data stream processing
CN116347094B (en) Computing system, video encoding method and data processing module
US10437601B2 (en) Centralized memory management for multiple device streams
CN117135361A (en) Video coding method, device, system and storage medium
CN115065684A (en) Data processing method, device, equipment and medium
EP4148568A1 (en) Method for realizing live migration, chip, board, and storage medium
CN115686839A (en) Resource management method, device, system and storage medium
CN114595060A (en) Task processing method and system
US10656956B2 (en) Virtual desktop server for supporting high-quality graphics processing and method for processing high-quality graphics using the same
CN113992493A (en) Video processing method, system, device and storage medium
CN103562869B (en) Audio processing method and device in virtualized environment
CN119172448B (en) Method and device for scheduling multichannel data codes and multi-core coding system
CN113491877B (en) Trigger signal generation method and device
KR20250133974A (en) Hardware mechanism for universal synchronization of communications in SOCs
CN120602711A (en) Audio and video playback method, device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant