Acceleration system and method for multi-channel video coding and decoding
Technical Field
The invention relates to the technical field of server video coding and decoding, and particularly provides a system and a method for accelerating multi-channel video coding and decoding.
Background
In recent years, the continuous development of network technology has enriched daily life. With the network as a convenient carrier, multimedia technology has advanced rapidly, and the multimedia video codec, the core of multimedia technology, has made great progress in both technology and application. The main function of video coding is to compress video pixel data, such as RGB or YUV, into a video code stream, thereby reducing the data volume of the video.
Application demand for video codecs keeps growing. Current customers rely on a CPU with a video codec function, such as the Intel E3, together with an NVIDIA P4 card. The maximum processing capacity of the codec chip integrated in the current E3 is 12 channels of 1080P. Moreover, only 2 chips in the P4 card are used for video encoding and decoding; if the P4 card is used solely for encoding and decoding, its remaining computing performance is wasted, and the P4 card is relatively expensive. Many customers run services such as short-video processing, security surveillance and live broadcast, which place higher demands on the number of codec channels and on cost.
Disclosure of Invention
In view of the above disadvantages, embodiments of the present invention provide a system and a method for accelerating encoding and decoding of multiple channels of video, which reduce the cost and improve the encoding and decoding speed of multiple channels of video.
The embodiment of the invention provides an acceleration system for multi-channel video coding and decoding, which comprises a Camera, a Server and an accelerator card, wherein the accelerator card comprises a BMC control module, a chip processing module and a power supply module;
the BMC control module comprises a BMC, a watchdog, a fan, a temperature sensor and an EEPROM; the BMC is connected with the watchdog through a UART; the BMC is respectively connected with the fan, the temperature sensor and the EEPROM through I2C;
the chip processing module comprises 2, 3 or 4 FPGA chips; each FPGA chip is connected with two DDR4 memory banks, a JTAG circuit, an SPI Flash and an EMMC; the FPGA chips are interconnected by Chiplink and are connected to the Switch through UART;
the power supply module comprises a power regulator; the power regulator is connected with an external +12V power supply and is simultaneously connected with the BMC control module and the chip processing module;
the power supply module supplies power to the BMC control module and the chip processing module; the BMC control module is used for out-of-band monitoring management; the chip processing module is used for carrying out multi-channel video coding and decoding.
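The composition described above can be summarised as a small data model. This is a minimal sketch in Python for illustration only; the class and field names (AcceleratorCard, FpgaChip, BmcControlModule, build_card) are hypothetical and do not come from the invention.

```python
from dataclasses import dataclass, field
from typing import List

# The chip processing module comprises 2, 3 or 4 FPGA chips.
ALLOWED_FPGA_COUNTS = (2, 3, 4)


@dataclass
class FpgaChip:
    name: str
    # Peripherals attached to every FPGA chip, as listed in the text.
    peripherals: tuple = ("DDR4 x2", "JTAG", "SPI Flash", "EMMC")


@dataclass
class BmcControlModule:
    # Bus the BMC uses to reach each monitored device.
    links: dict = field(default_factory=lambda: {
        "watchdog": "UART",
        "fan": "I2C",
        "temperature sensor": "I2C",
        "EEPROM": "I2C",
    })


@dataclass
class AcceleratorCard:
    fpgas: List[FpgaChip]
    bmc: BmcControlModule = field(default_factory=BmcControlModule)
    supply_voltage: float = 12.0  # external +12V feeding the power regulator

    def __post_init__(self):
        # Reject configurations the text does not describe.
        if len(self.fpgas) not in ALLOWED_FPGA_COUNTS:
            raise ValueError("chip processing module needs 2, 3 or 4 FPGA chips")


def build_card(n_fpgas: int) -> AcceleratorCard:
    """Hypothetical helper that builds a card with n_fpgas chips."""
    return AcceleratorCard([FpgaChip(f"FPGA{i}") for i in range(n_fpgas)])
```

Calling build_card(4) yields the four-chip configuration, while build_card(5) raises a ValueError because only 2, 3 or 4 chips are described.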
Furthermore, the accelerator card also comprises a programmable clock chip, an indicator light key, a reset key, a test point key and an external interface;
the programmable clock chip is used for keeping clock synchronization, displaying and recording time;
the indicator light key is used for indicating a fault or the presence of the accelerator card;
the reset key is used for restarting the accelerator card without powering it off when the card fails;
the test point key is used for pin input and output during FPGA chip testing;
the external interface comprises a USB and a HUB.
Furthermore, the FPGA chip adopts a ZU7EV chip of Xilinx.
Further, the capacity of the DDR4 memory bank is 4 GB.
Furthermore, the FPGA chips are interconnected by Chiplink; the port specification of the Chiplink interconnection between the FPGA chips is Serdes x4, with a line rate of 10 Gbps.
A multi-channel video coding and decoding acceleration method is realized based on an acceleration system of multi-channel video coding and decoding, and comprises the following steps:
S1: the Server transmits the H.264/H.265 coded data sent by the network cameras (Camera) to the chip processing module in the accelerator card;
S2: the chip processing module in the accelerator card evenly distributes the H.264/H.265 coded data, stream by stream, to the FPGA chips, and each FPGA chip first decodes the received code stream data and then performs CNN inference acceleration and retrieval acceleration;
S3: the accelerator card transmits the code stream data processed by each FPGA chip in the chip processing module to the Server memory.
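The even distribution in step S2 can be illustrated with a short sketch. The text only states that streams are distributed evenly; the round-robin policy and the function name distribute_streams below are assumptions made for illustration.

```python
from typing import Dict, List


def distribute_streams(streams: List[str], n_chips: int) -> Dict[int, List[str]]:
    """Assign streams to chips round-robin so per-chip counts differ by at most one.

    Round-robin is an assumed policy; the source only requires an even split.
    """
    if n_chips < 1:
        raise ValueError("need at least one FPGA chip")
    assignment: Dict[int, List[str]] = {chip: [] for chip in range(n_chips)}
    for index, stream in enumerate(streams):
        assignment[index % n_chips].append(stream)
    return assignment


streams = [f"Camera{i}" for i in range(32)]
per_chip = distribute_streams(streams, 4)
# Each of the 4 chips ends up with 8 of the 32 streams.
print({chip: len(assigned) for chip, assigned in per_chip.items()})
```

With 32 streams and 4 chips, every chip receives 8 streams, which matches the per-chip capacity stated for the ZU7EV.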
Further, step S1 includes:
the NIC in the Server writes the H.264/H.265 coded data transmitted by the network cameras into a first memory space of the App process in the Server memory;
the accelerator card driver is called, a second memory space required by the accelerator card driver is applied for in the accelerator card memory, and the H.264/H.265 coded data transmitted by the network cameras is copied or mapped to the second memory space.
Further, step S2 includes:
the Server writes a PCIE accelerator card register in the MMCFG space; the FPGA reads the H.264/H.265 coded data transmitted by the network cameras by DMA (direct memory access), carrying it from the second memory space to the chip processing module, where the Switch distributes it evenly, stream by stream, to the FPGA chips;
each FPGA chip performs H.264/H.265 decoding on the code stream data distributed to it and performs CNN inference acceleration and retrieval acceleration on the decoded data.
Further, step S3 includes:
after the FPGA chip completes CNN inference acceleration and retrieval acceleration, it initiates an MSI interrupt to the Server; the Server writes a PCIE accelerator card register in the MMCFG space, and the FPGA copies the resulting data from the FPGA to the second memory space by a DMA write operation;
the Server copies or maps the resulting data from the second memory space to the first memory space of the App process;
the accelerator card driver issues a callback.
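The ordering of the register write, DMA transfers, MSI interrupt and callback in steps S2 and S3 can be pictured as a small state machine. This is a sketch under assumptions: the stage and event names are hypothetical labels chosen for illustration, not identifiers from any real driver.

```python
from enum import Enum, auto


class Stage(Enum):
    IDLE = auto()
    DMA_READ = auto()     # FPGA pulls coded data from the second memory space
    PROCESSING = auto()   # decode, then CNN inference and retrieval acceleration
    MSI_RAISED = auto()   # FPGA has signalled completion to the Server
    DMA_WRITE = auto()    # FPGA pushes results back to the second memory space
    HOST_COPY = auto()    # Server copies or maps results to the first memory space
    DONE = auto()         # driver callback has fired


# (current stage, event) -> next stage; all event names are hypothetical.
TRANSITIONS = {
    (Stage.IDLE, "server_writes_register"): Stage.DMA_READ,
    (Stage.DMA_READ, "dma_read_complete"): Stage.PROCESSING,
    (Stage.PROCESSING, "processing_complete"): Stage.MSI_RAISED,
    (Stage.MSI_RAISED, "server_writes_register"): Stage.DMA_WRITE,
    (Stage.DMA_WRITE, "dma_write_complete"): Stage.HOST_COPY,
    (Stage.HOST_COPY, "driver_callback"): Stage.DONE,
}


def step(stage: Stage, event: str) -> Stage:
    """Advance one stage, rejecting events that arrive out of order."""
    try:
        return TRANSITIONS[(stage, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in stage {stage.name}") from None


def run(events) -> Stage:
    """Feed a sequence of events through the state machine from IDLE."""
    stage = Stage.IDLE
    for event in events:
        stage = step(stage, event)
    return stage
```

Feeding the six events in the order given by the text ends in Stage.DONE; any event that arrives out of order raises a ValueError.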
The effects described in this summary are only those of the embodiments, not all effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
the embodiment of the invention provides an acceleration system for multi-channel video coding and decoding, which comprises a Camera, a Server and an accelerator card. The accelerator card comprises a BMC control module, a chip processing module and a power supply module; the BMC control module is used for out-of-band monitoring management, the chip processing module is used for carrying out multi-channel video coding and decoding, and the power supply module supplies power to the BMC control module and the chip processing module. The BMC control module comprises a BMC, a watchdog, a fan, a temperature sensor and an EEPROM; the BMC is connected with the watchdog through a UART and with the fan, the temperature sensor and the EEPROM through I2C. The chip processing module comprises 2, 3 or 4 FPGA chips; each FPGA chip is connected with two DDR4 memory banks, a JTAG circuit, an SPI Flash and an EMMC, and the FPGA chips are interconnected by Chiplink and connected to the Switch. The power supply module comprises a power regulator, which is connected with an external +12V power supply and with both the BMC control module and the chip processing module. Based on this acceleration system, a method for accelerating multi-channel video coding and decoding is also provided. The invention is implemented with FPGA chips and does not use a CPU with video codec functions, such as the Intel E3, paired with an NVIDIA P4 card, thereby greatly reducing the cost. In addition, 2, 3 or 4 FPGA chips can be used, so that the video coding and decoding speed can be increased according to actual needs.
The FPGA chip adopts the ZU7EV chip of Xilinx. Each ZU7EV chip can perform 8-channel video coding and decoding, so 2 FPGA chips can handle 16 channels, 3 FPGA chips 24 channels, and 4 FPGA chips 32 channels, which improves the speed of video coding and decoding.
Drawings
Fig. 1 is a schematic structural connection diagram of an accelerator card in an acceleration system for multi-channel video encoding and decoding according to embodiment 1 of the present invention;
Fig. 2 is an interconnection topology diagram of 4 FPGA chips of an accelerator card in an acceleration system based on multi-channel video encoding and decoding according to embodiment 1 of the present invention;
Fig. 3 is an overall data flow chart of a method for accelerating multi-channel video encoding and decoding according to embodiment 1 of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
Example 1
The embodiment 1 of the invention provides an acceleration system for multi-channel video coding and decoding, which comprises a Camera, a Server and an accelerator card.
The Camera provides video data; the Server receives the video data and controls and manages the codec data of the accelerator card.
Fig. 1 is a schematic diagram illustrating a structural connection of an accelerator card in an acceleration system for multi-channel video encoding and decoding according to embodiment 1 of the present invention. The accelerator card comprises a BMC control module, a chip processing module and a power supply module;
the BMC control module is used for out-of-band monitoring management, the chip processing module is used for carrying out multi-channel video coding and decoding, and the power supply module supplies power to the BMC control module and the chip processing module.
The BMC control module comprises BMC, a watchdog, a fan, a temperature sensor and an EEPROM, the BMC is connected with the watchdog through a UART, and the BMC is further connected with the fan, the temperature sensor and the EEPROM through I2C respectively.
The chip processing module comprises 2, 3 or 4 FPGA chips, each connected with two DDR4 memory banks, a JTAG circuit, an SPI Flash and an EMMC. The two DDR4 memory banks have a capacity of 4 GB, with ECC, and a frequency of 2400 MHz.
The FPGA chips are interconnected by Chiplink. Fig. 2 is a topology diagram of the interconnection of the 4 FPGA chips of an accelerator card in an acceleration system based on multi-channel video encoding and decoding according to embodiment 1 of the present invention; the port specification of each FPGA chip is Serdes (GTH) x4, with a line rate of 10 Gbps.
Each FPGA chip is connected to the Switch through UART. The accelerator card communicates with the Server through a PCIe gold finger; the Server provides a x8 link, which the PCIe Switch converts into four x8 links that are routed to the FPGA chips.
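The link figures above lend themselves to a quick back-of-the-envelope check. The sketch below assumes that the 10 Gbps rate applies per Serdes lane and that the PCIe Switch exposes one x8 port per FPGA chip; both readings are interpretations of the text, and the function names are hypothetical. Encoding and protocol overhead are ignored.

```python
CHIPLINK_LANES = 4             # Serdes (GTH) x4 port between FPGA chips
LANE_RATE_GBPS = 10            # assumed to be the rate of a single lane

UPSTREAM_PCIE_LANES = 8        # x8 link provided by the Server
DOWNSTREAM_PORTS = 4           # assumed: one x8 port per FPGA chip
DOWNSTREAM_LANES_PER_PORT = 8


def chiplink_raw_gbps(lanes: int = CHIPLINK_LANES,
                      lane_rate: float = LANE_RATE_GBPS) -> float:
    """Raw line rate of one Chiplink port in Gbps, before any encoding overhead."""
    return lanes * lane_rate


def pcie_oversubscription(ports: int = DOWNSTREAM_PORTS,
                          lanes_per_port: int = DOWNSTREAM_LANES_PER_PORT,
                          upstream_lanes: int = UPSTREAM_PCIE_LANES) -> float:
    """Ratio of downstream lanes behind the switch to upstream lanes to the Server."""
    return ports * lanes_per_port / upstream_lanes


print(chiplink_raw_gbps())
print(pcie_oversubscription())
```

Under these assumptions one Chiplink port carries 40 Gbps raw, and the four downstream x8 ports share the single upstream x8 link at a 4:1 ratio, so the host link rather than the per-chip links is the shared resource.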
The FPGA chip adopts ZU7EV chips of Xilinx, and each ZU7EV chip can carry out 8-channel video coding and decoding, 2 FPGA chips can carry out 16-channel video coding and decoding, 3 FPGA chips can carry out 24-channel video coding and decoding, and 4 FPGA chips can carry out 32-channel video coding and decoding. In embodiment 1 of the present invention, 4 FPGA chips are used as an illustration, and the scope of the present invention is not limited to embodiment 1.
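The channel arithmetic above is simple enough to state as code. This minimal sketch uses the 8-channels-per-chip figure from the text; the helper names total_channels and chips_needed are hypothetical.

```python
CHANNELS_PER_ZU7EV = 8  # each ZU7EV chip handles 8 video channels, per the text


def total_channels(n_chips: int) -> int:
    """Channels one accelerator card can handle with n_chips FPGA chips."""
    if n_chips not in (2, 3, 4):
        raise ValueError("the chip processing module comprises 2, 3 or 4 FPGA chips")
    return n_chips * CHANNELS_PER_ZU7EV


def chips_needed(n_streams: int) -> int:
    """Smallest supported chip count whose capacity covers n_streams channels."""
    for n_chips in (2, 3, 4):
        if total_channels(n_chips) >= n_streams:
            return n_chips
    raise ValueError("more than 32 channels exceeds one accelerator card")
```

For example, a deployment with 20 camera streams would need the 3-chip configuration, since 2 chips cover only 16 channels.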
The power supply module comprises a power regulator, and the power regulator is connected with an external +12V power supply and is simultaneously connected with the BMC control module and the chip processing module.
The accelerator card also comprises a programmable clock chip, an indicator light key, a reset key, a test point key and an external interface;
the programmable clock chip is used for keeping clock synchronization, displaying and recording time; the programmable clock chip is respectively connected with the ZU7EV0 chip, the ZU7EV1 chip, the ZU7EV2 chip and the ZU7EV3 chip.
The indicator light key is used for indicating a fault or the presence of the accelerator card;
the reset key is used for restarting the accelerator card without powering it off when the card fails;
the test point keys are used for pin input and output during FPGA chip testing;
the external interface comprises a USB and a HUB.
The accelerator card further comprises a DisplayPort high-definition digital display interface, which is connected with the ZU7EV0 chip, the ZU7EV1 chip, the ZU7EV2 chip and the ZU7EV3 chip.
Based on the system for accelerating the multi-channel video coding and decoding provided by the embodiment 1 of the invention, a method for accelerating the multi-channel video coding and decoding is also provided.
Before the acceleration method of multi-channel video coding and decoding is executed, the 32 channels of coded camera video data, Camera0, Camera1 … Camera31, are first transmitted through the Switch to the NIC of the Server.
Then step S1 is executed: the Server transmits the 32 channels of H.264/H.265 coded data sent by the network cameras to the chip processing module in the accelerator card;
S2: the chip processing module in the accelerator card distributes the 32 channels of H.264/H.265 coded data so that each FPGA chip receives 8 channels of code stream data, and each FPGA chip first decodes the received 8 channels of code stream data and then performs CNN inference acceleration and retrieval acceleration;
S3: the accelerator card transmits the code stream data processed by each FPGA chip in the chip processing module to the Server memory.
Fig. 3 is a general data flow chart of an accelerating method for multi-channel video encoding and decoding according to embodiment 1 of the present invention.
Flow 1: the NIC in the Server writes the 32 channels of H.264/H.265 coded data transmitted by the network cameras into a first memory space of the App process in the Server memory.
Flow 2: the accelerator card driver is called, a second memory space required by the accelerator card driver is applied for in the accelerator card memory, and the 32 channels of H.264/H.265 coded data are copied or mapped to the second memory space.
Flow 3: the Server writes a PCIE accelerator card register in the MMCFG space; the FPGA reads the 32 channels of H.264/H.265 coded data by DMA (direct memory access), carrying them from the second memory space to the chip processing module, where the Switch distributes 8 channels of code stream data to each FPGA chip.
Flow 4: each FPGA chip performs H.264/H.265 decoding on the 8 channels of code stream data distributed to it.
Flow 5: each FPGA chip performs CNN inference acceleration and retrieval acceleration on the decoded data.
Flow 6: after the FPGA chip completes CNN inference acceleration and retrieval acceleration, it initiates an MSI interrupt to the Server; the Server writes a PCIE accelerator card register in the MMCFG space, and the FPGA copies the resulting data from the FPGA chip to the second memory space by a DMA write operation.
Flow 7: the Server copies or maps the resulting data from the second memory space to the first memory space of the App process.
Flow 8: the accelerator card driver issues a callback.
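The eight-step data flow of Fig. 3 can be traced end to end with a toy simulation. Everything here is a stand-in: memory spaces are plain dictionaries, DMA is a dictionary copy, fake_process is a placeholder for decoding plus CNN inference and retrieval, and the contiguous 8-stream blocks per chip are an assumed distribution order. None of the names come from a real driver.

```python
from typing import Callable, Dict, List

N_CHIPS = 4
CHANNELS_PER_CHIP = 8


def fake_process(payload: bytes) -> bytes:
    """Placeholder for decode + CNN inference + retrieval on one stream."""
    return b"result:" + payload


def run_flows(coded: Dict[str, bytes],
              on_done: Callable[[Dict[str, bytes]], None]) -> List[str]:
    trace: List[str] = []

    # Step 1: NIC writes the coded data into the App process's first memory space.
    first_space = dict(coded)
    trace.append("flow1")

    # Step 2: the driver obtains a second memory space and copies the data there.
    second_space = dict(first_space)
    trace.append("flow2")

    # Step 3: DMA read from the second space; each chip gets a block of 8 streams.
    names = list(second_space)
    per_chip = [names[i * CHANNELS_PER_CHIP:(i + 1) * CHANNELS_PER_CHIP]
                for i in range(N_CHIPS)]
    trace.append("flow3")

    # Steps 4 and 5: each chip decodes, then runs inference and retrieval.
    results = {name: fake_process(second_space[name])
               for chip in per_chip for name in chip}
    trace.extend(["flow4", "flow5"])

    # Step 6: MSI interrupt, then DMA write of the results into the second space.
    second_space = results
    trace.append("flow6")

    # Step 7: the Server copies the results back into the first memory space.
    first_space = dict(second_space)
    trace.append("flow7")

    # Step 8: the driver callback hands the results to the application.
    on_done(first_space)
    trace.append("flow8")
    return trace
```

A caller would pass a dictionary of 32 coded payloads keyed Camera0 to Camera31 and a callback; the returned trace lists the steps in the order they ran, and the callback receives one result per camera.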
While the invention has been described in detail in the specification and drawings and with reference to specific embodiments thereof, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted; all technical solutions and modifications thereof which do not depart from the spirit and scope of the present invention are intended to be covered by the scope of the present invention.