CN113962842A - Dynamic stepless despinning system and method based on large-scale integrated circuit high-level synthesis

Info

Publication number: CN113962842A
Application number: CN202111223132.7A
Authority: CN
Inventors: 张弘; 宋剑波; 杨一帆; 邢万里; 袁丁; 李旭亮
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-10-20
Filing date: 2021-10-20
Publication date: 2022-01-21
Anticipated expiration: 2041-10-20
Also published as: CN113962842B

Abstract

The present invention relates to a dynamic stepless despinning system and method based on high-level synthesis of large-scale integrated circuits. The system comprises a video acquisition module, a video decoding module, a video storage module, a data communication module, a video coding module, a dynamic stepless despinning module, and a pixel merging module (the four-in-one module) designed in this invention to reduce algorithm delay and improve bus bandwidth utilization. The invention uses high-level synthesis to implement the dynamic stepless despinning function and can despin captured video images in real time within an electro-optical platform. Compared with existing despinning technology, the invention makes full use of FPGA parallel acceleration and pipeline optimization, and offers high video resolution, a large despinning range, high despinning accuracy, clear processed images free of jagged edges, low output delay, strong system stability, ease of manufacture, low power consumption and small size.

Description

Dynamic stepless despinning system and method based on large-scale integrated circuit high-level synthesis

Technical Field

The invention relates to the field of intelligent embedded video processing, in particular to a dynamic stepless despinning system and method based on high-level synthesis of a large-scale integrated circuit.

Background

During video recording and aiming with an airborne pod television, the outer frame structure of the television inevitably rolls, which moves the optical system relative to the airborne television and rotates the image. Likewise, during flight a fighter airframe often rolls through large angles (up to 360 degrees), so the television picture rotates through large angles and seriously impairs the operator's view. To eliminate image rotation caused by aircraft attitude changes in optical aiming devices and electro-optical pod systems, the original video image acquired by the television system must be counter-rotated, that is, despun, so that the image stays upright and stable for the operator and for subsequent target detection, identification and tracking. In current engineering practice there are three common despinning modes: electronic, optical and physical. Optical despinning is the most widely used at present. It corrects the image by rotating a despinning prism in the imaging optical path; although it has low delay and fast response, its manufacturing process is complex, its despinning angle precision is low, and the system is large and power-hungry. With the rapid development of large-scale integrated circuits and digital signal processing, electronic despinning implemented by real-time video image processing algorithms has become the mainstream research direction; it overcomes the shortcomings of optical despinning systems and is increasingly widely applied.

With the continuing development of computer vision and the steady improvement of processing chips, electronic despinning based on video image processing has become the mainstream research direction among despinning technologies, and using it to eliminate image rotation caused by aircraft attitude changes has become the first choice in current engineering applications.

Disclosure of Invention

Problem solved by the invention: to overcome the defects of the prior art, the invention provides a dynamic stepless despinning system and method based on high-level synthesis of large-scale integrated circuits. Based on high-level synthesis, it uses FPGA parallel acceleration and pipeline optimization to achieve stepless despinning with high precision, large range, high real-time performance and high output image quality. The precision reaches 0.001 degrees, so extremely small angles can be despun; the despinning range is 0 to 360 degrees, so any angle can be despun; the processing time for one frame is less than 12 ms, so despinning runs in real time; and bilinear interpolation is used, so the image is smooth, free of jagged edges and of high output quality. In the prior art, real-time performance, precision, range and image quality conflict with one another, so existing approaches achieve only one or some of these indexes and cannot meet all of them simultaneously; the invention therefore has high engineering application value.

The technical solution of the invention is as follows: a dynamic stepless despinning system designed with a high-level synthesis method for large-scale integrated circuits. Taken as a whole, its core innovations are: 1) high-level synthesis, that is, using C++ and other high-level languages for FPGA algorithm design optimization and resource scheduling; 2) pipeline acceleration of the algorithm, which raises data throughput, greatly reduces delay and improves the real-time performance of image despinning; 3) high-bandwidth real-time parallel optimization over multiple AXI buses, which improves data read/write efficiency and algorithm real-time performance; 4) a four-in-one module for four-pixel merging, which combines the four 8-bit pixels needed for bilinear interpolation into one 32-bit word so that all four pixels can later be read at once, greatly reducing the high delay caused by repeated data reads.

The system comprises a video acquisition module, a video decoding module, a core processing module and a video coding module; the core processing module is a heterogeneous system on chip with an FPGA + ARM architecture, namely a Zynq UltraScale+ MPSoC 15EG chip; the FPGA contains a dynamic stepless despinning module, a video-to-AXI-bus video stream module, an AXI video stream DDR read-write module, and a pixel merging module (the four-in-one module) designed to reduce algorithm delay and improve bus bandwidth utilization; the ARM side contains the video storage module (DDR) and an RS422 serial communication module, and data communication between the FPGA and the ARM uses the AXI control bus;

the video acquisition module uses a camera to acquire the original video image, which is the data to be despun; the acquired original video image then enters the video decoding module;

the video decoding module converts the serial video acquired by the camera into parallel video data and derives a set of explicit video synchronization signals; the decoded parallel video data and synchronization signals are sent to the FPGA;

in the FPGA, the video-to-AXI-bus video stream module first converts the video data into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration. The AXI video stream data then flows into the four-in-one module. Because the subsequent despinning uses bilinear interpolation, the four pixels adjacent to each sampling position must be read from the DDR, and reading them one by one incurs considerable delay. The four-in-one module therefore buffers the data stream in on-chip cache each time two rows have flowed in and merges the four 8-bit pixels around each pixel into one 32-bit word. When the four pixels adjacent to a given pixel are needed later, the merged 32-bit word is read only once and split into four independent 8-bit values, so four pixels are read in a single access. This makes full use of the AXI bus bandwidth and reduces the read delay to one quarter of the original. The merged 32-bit video stream data is then cached into the DDR on the ARM side through the AXI video stream DDR read-write module;

the dynamic stepless despinning module dynamically performs stepless despinning on the video data cached in the DDR, according to the despinning instruction and despinning angle sent by the host computer through the RS422 serial communication module. During despinning it works with the four-in-one module: each 32-bit word read from the DDR is split into four 8-bit values for bilinear interpolation, and the processed video image is stored back in the DDR. The AXI video stream DDR read-write module then reads the cached despun video image from the DDR back into an AXI video stream, the AXI-bus-video-stream-to-video module converts the AXI video stream into parallel video data with explicit synchronization signals, and the parallel video data is sent to the video coding module for encoding and output to a display or capture card for real-time display.
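The merge performed by the four-in-one module and the later split in the despinning module can be sketched in a few lines of Python. This is an illustration only, not the HLS implementation: the function names `pack_quad` and `unpack_quad` are hypothetical, and the byte order (first pixel in the low byte) is an assumption, since the text does not specify it.

```python
def pack_quad(p00, p10, p01, p11):
    """Merge four 8-bit pixels into one 32-bit word (p00 in the low byte, by assumption)."""
    for p in (p00, p10, p01, p11):
        if not 0 <= p <= 0xFF:
            raise ValueError("pixels must be 8-bit values")
    return p00 | (p10 << 8) | (p01 << 16) | (p11 << 24)


def unpack_quad(word):
    """Split a 32-bit word back into its four 8-bit pixels, in packing order."""
    return tuple((word >> shift) & 0xFF for shift in (0, 8, 16, 24))


word = pack_quad(10, 20, 30, 40)
print(hex(word))          # 0x281e140a
print(unpack_quad(word))  # (10, 20, 30, 40)
```

One 32-bit read followed by `unpack_quad` replaces four separate 8-bit reads, which is where the quoted reduction of the read delay to one quarter comes from.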

The bilinear-interpolation-based electronic image despinning algorithm used is as follows:

(1) According to the despinning angle sent by the host computer, solve for the coordinates (x, y) in the image before despinning that correspond to each pixel (x', y') of the image after despinning. The formula is as follows:

where θ is the rotation angle and the matrix in the formula is the rotation matrix.

Generally the image center (x₀, y₀) is taken as the center of rotation, so the above formula should be rewritten as:

Writing the above formula in scalar form:
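The formulas of step (1) are not reproduced in this text. As a sketch only, assuming the conventional two-dimensional rotation matrix (the sign convention of the original figures cannot be recovered here and may be the opposite), the mapping from an output pixel (x', y') to the source coordinate (x, y), first about the origin, then about the image center (x₀, y₀), and finally in scalar form, can be written as:

```latex
\begin{aligned}
\begin{bmatrix} x \\ y \end{bmatrix}
&= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
   \begin{bmatrix} x' \\ y' \end{bmatrix} \\[4pt]
\begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix}
&= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
   \begin{bmatrix} x' - x_0 \\ y' - y_0 \end{bmatrix} \\[4pt]
x &= (x' - x_0)\cos\theta - (y' - y_0)\sin\theta + x_0 \\
y &= (x' - x_0)\sin\theta + (y' - y_0)\cos\theta + y_0
\end{aligned}
```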

(2) Perform pixel mapping by bilinear interpolation. Because the coordinates (x, y) in the original image computed in step (1) are usually not integers, pixels cannot be mapped one-to-one directly. Non-integer pixel coordinates arising in the mapping are generally handled by resampling.

According to image reconstruction theory, three interpolation methods are commonly used for image mapping: nearest neighbor, bilinear and cubic interpolation. Nearest neighbor interpolation performs poorly: the despun image shows obvious jagged edges and burrs. Bilinear and cubic interpolation both give good results, with continuous gray levels and no jagged edges. Cubic interpolation, however, is algorithmically complex and slow to compute, which makes real-time requirements hard to meet in practical engineering. Weighing despinning precision against real-time performance, the invention therefore adopts the image despinning algorithm based on bilinear interpolation.

The principle of the electronic despinning algorithm based on bilinear interpolation is shown in fig. 2. For a non-integer sampling point, the method interpolates linearly in the x direction and the y direction from the gray values of the 4 integer-coordinate points surrounding it. In fig. 2, (x, y) is the position of the sampling point, expressed as fractional offsets within the unit cell formed by its 4 neighbors, f(x, y) is the interpolated gray value at (x, y), and f(0,0), f(1,0), f(0,1) and f(1,1) are the gray values of the 4 points surrounding (x, y). The bilinear interpolation formula is then:

f(x,y) = [f(1,0)-f(0,0)]x + [f(0,1)-f(0,0)]y + [f(1,1)+f(0,0)-f(1,0)-f(0,1)]xy + f(0,0)
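A minimal Python sketch of this interpolation step follows, using the standard bilinear form whose xy coefficient is f(1,1) + f(0,0) - f(1,0) - f(0,1). The function name `bilinear` and the argument order are illustrative assumptions; x and y are the fractional offsets of the sampling point inside the unit cell.

```python
def bilinear(f00, f10, f01, f11, x, y):
    """Interpolate inside the unit cell; x and y are fractional offsets in [0, 1]."""
    return ((f10 - f00) * x
            + (f01 - f00) * y
            + (f11 + f00 - f10 - f01) * x * y
            + f00)


# At the four corners the formula returns the corner values themselves,
# and at the cell center it returns their mean.
print(bilinear(0, 100, 50, 200, 0.0, 0.0))  # 0.0
print(bilinear(0, 100, 50, 200, 1.0, 1.0))  # 200.0
print(bilinear(0, 100, 50, 200, 0.5, 0.5))  # 87.5
```

The corner checks are a quick way to confirm the sign of each term in the xy coefficient.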

(3) Determine the image boundary after despinning. The size of the image after rotation generally differs from that before rotation, so the image boundary must be determined again. With (xᵢ, yᵢ), i = 1, ..., 4, denoting the coordinates of the four image corners after rotation, the left, right, top and bottom boundary positions are calculated as:

left = min(x₁,x₂,x₃,x₄)

right = max(x₁,x₂,x₃,x₄)

top = max(y₁,y₂,y₃,y₄)

bottom = min(y₁,y₂,y₃,y₄)

(4) Fix the image resolution. In practical engineering the output resolution is usually fixed, but despinning the original video image by different angles changes the image size. The invention therefore crops the despun image about the image center so that the output resolution is fixed, that is, the output image keeps the same size.
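Steps (3) and (4) can be illustrated with a short Python sketch, under stated assumptions: the four corners are rotated about the image center with the conventional rotation matrix, the bounds are returned as minimum and maximum x and y (which avoids committing to an axis direction for "top" and "bottom"), and the function names are hypothetical.

```python
import math


def rotated_bounds(width, height, theta_deg):
    """Axis-aligned bounds (x_min, x_max, y_min, y_max) of the rotated image corners."""
    t = math.radians(theta_deg)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    xs, ys = [], []
    for px, py in ((0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)):
        dx, dy = px - cx, py - cy
        xs.append(dx * math.cos(t) - dy * math.sin(t) + cx)
        ys.append(dx * math.sin(t) + dy * math.cos(t) + cy)
    return min(xs), max(xs), min(ys), max(ys)


def center_crop_origin(x_min, x_max, y_min, y_max, out_w, out_h):
    """Top-left corner of an out_w x out_h window centered in the rotated bounds."""
    mid_x = (x_min + x_max) / 2.0
    mid_y = (y_min + y_max) / 2.0
    return mid_x - (out_w - 1) / 2.0, mid_y - (out_h - 1) / 2.0


bounds = rotated_bounds(1920, 1080, 37.5)
print(bounds)                                    # larger than the 1920 x 1080 frame
print(center_crop_origin(*bounds, 1920, 1080))   # approximately (0, 0) at any angle
```

Because the rotation is about the image center, the centered crop window always coincides with the original frame position, which is why the output size can stay fixed for every despinning angle.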

The invention focuses on implementing dynamic stepless despinning with high-level synthesis of large-scale integrated circuits, which is an important guarantee of real-time performance for a high-resolution system and the most important innovation of the invention.

Compared with the prior art, the invention has the advantages that:

(1) The invention introduces a four-in-one module that makes full use of high-bandwidth data flow. Each time two rows have flowed in, the data stream is buffered in on-chip cache, the four 8-bit pixels around each pixel are merged into one 32-bit word, and the words are streamed into the DDR (double data rate memory). When a pixel is later despun by bilinear interpolation, the 32-bit word is fetched and split into four 8-bit pixels, which are exactly the four pixels the interpolation needs. Four pixels are thus read in one access, reducing the algorithm delay to one quarter of the original. The processing delay equals that of nearest neighbor despinning, while the result is much better.

(2) Pipeline acceleration of the algorithm. Compared with a general embedded system, a large advantage of the FPGA is that an algorithm can be optimized as a data pipeline, so the algorithm is written in pipelined form. During development in the Vivado HLS tool, the PIPELINE pragma (the pipeline optimization directive) is applied, and the program is written to follow the one-in, one-use, one-out pipelining principle: each datum is input only once, used only once, and must finally be output exactly once. This prevents the data flow from stalling and allows the algorithm to be pipelined at the cost of additional hardware logic resources.

Specifically, pipelining lets operations execute concurrently: an execution step does not have to wait for all operations of the previous step to finish before the next one starts. Pipelining applies to both functions and loops. Taking loop pipelining as an example, each iteration involves three operations: read, compute and write. Without pipelining, the three operations execute serially, an input is read once every 3 clock cycles, and the value is output after 2 clock cycles. With pipelining, a read is issued every clock cycle and several groups of data are processed concurrently. The delays before and after pipeline optimization are shown in fig. 3: without pipelining, 3 clock cycles separate two reads and the last write is issued after 8 clock cycles; with pipelining, 1 clock cycle separates two reads and the last write is issued after 4 clock cycles. Pipelining the algorithm thus raises data throughput, greatly reduces delay and improves the real-time performance of image despinning.
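The cycle counts quoted above follow from a simple timing model, sketched here in Python. The model and the function name are illustrative; the assumption that the loop runs for three iterations is not stated in the text and is chosen because it reproduces the figures of 8 and 4 clock cycles.

```python
def last_write_cycle(iterations, depth, interval):
    """Cycle index (counting from 0) in which the final write is issued.

    depth    -- stages per iteration (read, compute, write gives 3)
    interval -- cycles between consecutive reads (the initiation interval)
    """
    return (iterations - 1) * interval + (depth - 1)


# Assumption: three loop iterations, matching the numbers quoted above.
print(last_write_cycle(3, depth=3, interval=3))  # 8 (no pipelining)
print(last_write_cycle(3, depth=3, interval=1))  # 4 (pipelined)
```

For a long loop the total time is dominated by `iterations * interval`, so lowering the interval from 3 to 1 approaches a threefold gain in throughput for this three-stage example.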

(3) Real-time parallel optimization over multiple high-bandwidth AXI buses. The invention targets real-time despinning of high-resolution images. The on-chip cache (BRAM) of the FPGA is too small to hold a whole high-resolution frame, so a 64-bit 128 MB DDR is attached externally at the ARM embedded side for image caching. Unlike caching directly in BRAM, the FPGA must read and write this DDR on the ARM side through the AXI bus. Analysis and measurement of the delay show that, because the algorithm has already been pipelined as described in (2) and the delay of the despinning algorithm itself is low, the delay comes mainly from reading and writing DDR data over the AXI bus. The FPGA + ARM chip used by the invention, the Zynq UltraScale+ MPSoC 15EG, has abundant AXI bus resources (seven 128-bit AXI buses), so the invention uses several high-bandwidth AXI buses in parallel to read, write and process multiple pixels simultaneously, greatly reducing delay, increasing data throughput and improving real-time performance. In the final design, two 128-bit buses and one 64-bit bus are used in parallel. For 1080p grayscale images, the overall delay of the bilinear interpolation despinning algorithm over the full 360-degree range is 12 ms, so for both 30 fps and 60 fps video the despinning completes within one frame time, which achieves real-time despinning of high-resolution images. Moreover, this 1080p despinning occupies only 36% of the bus resources, so the resolution that can be despun in real time can be raised further by using more of the buses.
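The bus and timing figures in this item can be checked with a little arithmetic, shown here as a Python sketch. Reading the 36% figure as the share of aggregate bus width in use is an assumption; the text does not say how the percentage was computed, although the numbers agree.

```python
TOTAL_BUS_BITS = 7 * 128           # seven 128-bit AXI buses on the chip
USED_BUS_BITS = 2 * 128 + 1 * 64   # two 128-bit buses plus one 64-bit bus

utilisation = USED_BUS_BITS / TOTAL_BUS_BITS
print(f"bus width in use: {utilisation:.1%}")  # 35.7%

LATENCY_MS = 12.0                  # overall despinning delay quoted in the text
for fps in (30, 60):
    frame_ms = 1000.0 / fps
    print(f"{fps} fps: frame period {frame_ms:.2f} ms, "
          f"fits in one frame: {LATENCY_MS < frame_ms}")
```

At 60 fps the frame period is about 16.67 ms, so a 12 ms delay leaves roughly 4.7 ms of margin per frame.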

(4) Algorithm design optimization and resource scheduling through high-level synthesis. The Zynq UltraScale+ MPSoC 15EG chip used by the invention is a heterogeneous embedded chip from Xilinx, developed with the Vivado development suite, which includes the high-level synthesis tool Vivado HLS. Within the HLS framework, algorithms can be developed and optimized in a high-level language (C/C++/SystemC) according to specific coding rules, and the HLS tool then converts the high-level program into a hardware description language (Verilog HDL/VHDL). Developing with a high-level synthesis tool makes algorithm design optimization and dynamic scheduling of logic resources convenient, greatly improves development efficiency, fully exploits the parallel multi-AXI-bus and multi-pipeline acceleration of the FPGA + ARM architecture, and markedly improves despinning performance. The invention balances logic resource usage, delay and throughput; because the chip has ample logic resources, logic resource usage is traded for lower algorithm delay and higher data throughput. The invention exploits the strengths of HLS to improve the despinning algorithm through data type optimization and data throughput optimization. For data type optimization, 20-bit data is used in many places, but standard C data types have widths that are multiples of 8 bits; using 32-bit integers directly would waste logic resources and fail to exploit the FPGA's performance and parallelism. The invention therefore defines a 20-bit data type using the arbitrary-bit-width types provided by the HLS tool, which greatly reduces logic resource usage. For data throughput optimization, loops are pipelined and unrolled following the principle of trading area for speed, raising algorithm throughput and performance at the cost of logic resources.
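The behavior of a 20-bit signed field can be emulated in Python to show why such a width can suffice. This is an illustration only: the text does not say which quantities use the 20-bit type, so encoding the despinning angle as a count of 0.001-degree steps is a hypothetical example, and `wrap_signed` and `angle_to_code` are made-up names rather than HLS tool functions.

```python
def wrap_signed(value, bits=20):
    """Wrap an integer to a signed two's-complement field of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


ANGLE_STEP_DEG = 0.001  # angular resolution quoted in the text


def angle_to_code(angle_deg):
    """Encode an angle as a count of 0.001-degree steps (hypothetical encoding)."""
    return round(angle_deg / ANGLE_STEP_DEG)


code = angle_to_code(359.999)
print(code, code.bit_length())    # 359999 19
print(wrap_signed(code) == code)  # True: the value fits in 20 signed bits
print(wrap_signed(1 << 19))       # -524288: one past the largest positive value
```

Under this assumed encoding a 20-bit field covers the whole 0 to 360 degree range, while a 32-bit integer would leave 12 bits per value unused.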

(5) Practical tests show that real-time despinning is achieved for 1920 × 1080 visible-light images, with a despinning range of 0 to 360 degrees, a delay of less than 12 ms, a despinning angle accuracy of 0.001 degrees and a maximum pixel error of less than 1 pixel. The whole system offers high video resolution, a large despinning range, high despinning accuracy, clear processed images free of jagged edges, low output delay, strong system stability, ease of manufacture, low power consumption and small size.
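To give a sense of scale for the angular figures, the following Python sketch computes how far an image corner moves for a given rotation about the image center. It is a geometric illustration, not the test procedure behind the reported results, and the function name is hypothetical.

```python
import math


def corner_shift_px(width, height, angle_deg):
    """Distance moved by an image corner when rotating about the image center."""
    radius = math.hypot(width / 2.0, height / 2.0)
    # Chord length subtended by the rotation angle at the corner radius.
    return 2.0 * radius * math.sin(math.radians(angle_deg) / 2.0)


print(f"{corner_shift_px(1920, 1080, 0.001):.4f}")  # 0.0192
print(f"{corner_shift_px(1920, 1080, 0.625):.1f}")  # 12.0
```

For a 1920 × 1080 frame, one 0.001-degree step therefore moves even the farthest pixel by about 0.02 pixel, well below the 1-pixel error bound quoted above.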

Drawings

FIG. 1 is a schematic block diagram of the dynamic stepless despinning system based on high-level synthesis of large-scale integrated circuits;

FIG. 2 is a schematic diagram of the principle of the image despinning algorithm based on bilinear interpolation;

FIG. 3 is a diagram of the effect of pipeline optimization on delay;

FIG. 4 is a flow chart of the dynamic stepless despinning processing module;

FIG. 5 demonstrates the effect of the dynamic stepless despinning system, wherein (a) is before despinning and (b) is after despinning.

Detailed Description

The following further describes the embodiments of the present invention with reference to the drawings.

As shown in fig. 1, the despinning system of the present invention includes a video acquisition module, a video decoding module, a core processing module and a video coding module; the core processing module is a heterogeneous system on chip with an FPGA + ARM architecture; the FPGA contains a dynamic stepless despinning module, a video-to-AXI-bus video stream module, an AXI video stream DDR read-write module, and a pixel merging module (the four-in-one module) designed to reduce algorithm delay and improve bus bandwidth utilization; the ARM side contains the video storage module (DDR) and an RS422 serial communication module, and data communication between the FPGA and the ARM uses the AXI control bus.

The video acquisition module is an industrial camera with a resolution of 1920 × 1080 and a frame rate of 30 Hz or 60 Hz; the video output format is not restricted. The video decoding module is a video decoding chip that converts the input serial video signal into parallel video together with a data enable signal DE, a line synchronization signal HSYNC and a field synchronization signal VSYNC, and passes the data, enable and synchronization signals to the FPGA for subsequent processing. The video storage module combines four 16-bit 128 MB DDR4 devices into a 64-bit 128 MB DDR. Despinning requires whole-frame buffering, and the on-chip cache of the FPGA is too small to store a whole frame, so external memory is needed; the invention places the DDR on the ARM side of the Zynq chip, which is more convenient for subsequent operations. The data communication module has two parts. One is the communication between the electronic despinning system and the host computer's main controller, which is based on RS422; this stable low-speed protocol is sufficient for transmitting the despinning angle. The other is the communication between the FPGA side and the ARM side inside the Zynq chip, which uses the AXI bus protocol supported by Xilinx devices to transfer instruction and image information. The video coding module is a video coding chip that converts the parallel video data, the data enable signal DE, the line synchronization signal HSYNC and the field synchronization signal VSYNC into a serial video signal, which is output to a display or capture card for real-time display. The core processing module is a Zynq UltraScale+ MPSoC 15EG; a Zynq-architecture chip can fully exploit both the parallel acceleration of the FPGA side and the master control and scheduling of the ARM side, and is one of the mainstream heterogeneous system-on-chip devices. The core of the invention is the four-in-one module and the dynamic stepless despinning module; the despinning algorithm is deployed on the FPGA side, while memory scheduling and communication with the host computer are handled on the ARM side.

The invention specifically comprises the following steps:

Step one: video acquisition and decoding

The invention uses an industrial camera to acquire video images and decodes them with a decoding chip to obtain parallel video, a data enable signal DE, a line synchronization signal HSYNC and a field synchronization signal VSYNC. Because the design is based on FPGA AXI data streams, the decoded signals are sent to the video-to-AXI-bus video stream module, which converts the parallel video data into AXI bus video stream data so that pipeline acceleration can be applied efficiently in later stages.

Step two: adjacent pixel merging

In the four-in-one module, the data stream is buffered in on-chip cache each time two rows have flowed in, and the four 8-bit pixels adjacent to each pixel are merged into one 32-bit word. When a pixel is later despun by bilinear interpolation, the 32-bit word is fetched and split into four 8-bit pixels, which are the four pixels the interpolation needs. Four pixels are thus read in one access, so the algorithm delay is reduced to one quarter of the original.

Step three: video data storage

The 32-bit video stream data merged in step two is cached into the DDR on the ARM side through the AXI video stream DDR read-write module.

Step four: real-time dynamic stepless despinning of video data

The flow chart of the dynamic stepless despinning processing module is shown in fig. 4. The invention designs the dynamic stepless despinning algorithm with Vivado high-level synthesis and packages it as an IP core. The IP core defines two m_axi (AXI master) ports, one for reading and one for writing the DDR4. The m_axi read port reads the original pixel data from one frame buffer of the DDR4 over the AXI bus; after the despinning algorithm has processed it, the m_axi write port writes the processed pixel data to another frame buffer of the DDR, which completes the image despinning.
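The data flow of steps two to four can be modeled in software as follows. This Python sketch is not the HLS IP core: it uses nested lists in place of DDR frame buffers, assumes the conventional rotation matrix about the image center (the sign convention is an assumption), clamps neighbors at the image edges, and leaves pixels that map outside the source frame black. The names `merge` and `despin` are hypothetical.

```python
import math


def merge(frame):
    """Pack each pixel's 2x2 neighborhood into a 32-bit word (edges clamped)."""
    h, w = len(frame), len(frame[0])
    return [[frame[y][x]
             | frame[y][min(x + 1, w - 1)] << 8
             | frame[min(y + 1, h - 1)][x] << 16
             | frame[min(y + 1, h - 1)][min(x + 1, w - 1)] << 24
             for x in range(w)] for y in range(h)]


def despin(merged, theta_deg):
    """Rotate the merged frame about its center; the output has the same size."""
    h, w = len(merged), len(merged[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    out = [[0] * w for _ in range(h)]  # stands in for the second frame buffer
    for yo in range(h):
        for xo in range(w):
            # Inverse mapping: output pixel -> source coordinate.
            xs = (xo - cx) * c - (yo - cy) * s + cx
            ys = (xo - cx) * s + (yo - cy) * c + cy
            if xs < 0 or ys < 0 or xs > w - 1 or ys > h - 1:
                continue  # outside the source frame: leave black
            x0, y0 = int(xs), int(ys)
            fx, fy = xs - x0, ys - y0
            word = merged[y0][x0]  # one read returns all four neighbors
            f00, f10, f01, f11 = ((word >> k) & 0xFF for k in (0, 8, 16, 24))
            value = ((f10 - f00) * fx + (f01 - f00) * fy
                     + (f11 + f00 - f10 - f01) * fx * fy + f00)
            out[yo][xo] = int(round(value))
    return out


frame = [[16 * r + col for col in range(4)] for r in range(4)]
print(despin(merge(frame), 0.0) == frame)  # True: a zero angle leaves the frame unchanged
```

Each output pixel needs exactly one lookup into `merged`, mirroring the single 32-bit read per pixel over the m_axi read port described above.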

Step five: video encoding and output display

After the despinning in step four, the despun image is held in a buffer area of the DDR. The AXI video stream DDR read-write module reads the cached despun video image from the DDR back into an AXI video stream, the AXI-bus-video-stream-to-video module converts the AXI video stream into parallel video data with explicit synchronization signals, and the parallel video data is sent to the video coding chip for encoding and output to a monitor or capture card, which displays the despun result in real time.

Following these steps, the host computer can specify any despinning angle and the system outputs the despun result in real time. For example, when the host computer issues a clockwise despinning angle of 0.625°, the images before and after processing by the despinning system are as shown in fig. 5. Fig. 5 (a) is the original image before despinning: the image is tilted from the horizontal, that is, the optical axis is not level and there is a counterclockwise rotation, measured by the host computer as 0.625°. The host computer therefore issues a despinning angle of 0.625° to the despinning system. As fig. 5 (b) shows, the despun image is level in the horizontal direction and free of jagged edges; the despinning angle accuracy reaches 0.001°, and the processing time for the frame is less than 12 ms, giving high real-time performance.

Details not described in the present specification are prior art known to those skilled in the art.

The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims

1. A dynamic stepless despinning system based on large-scale integrated circuit high-level synthesis is characterized in that: the system comprises a video acquisition module, a video decoding module, a core processing module and a video coding module; the core processing module adopts a heterogeneous system on chip with an FPGA + ARM architecture; the FPGA comprises a dynamic non-polar despun module, a video-to-AXI bus video stream module, an AXI video stream DDR read-write module and a pixel merging module which is an all-in-one module and is used for reducing algorithm delay and improving the bus bandwidth utilization rate and is innovatively designed; the ARM comprises a video storage module DDR and an RS422 serial port communication module, and data communication between the FPGA and the ARM is carried out by adopting an AXI control bus;

in the FPGA, the video-to-AXI bus video stream module first converts the video data into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration; the data in AXI bus video stream format then flows into the four-in-one module, which caches every two incoming lines of the data stream in an on-chip cache and merges the four 8-bit pixels around each pixel into one 32-bit data word; when the four pixels adjacent to a given pixel are subsequently needed, the merged 32-bit word is read only once and split into four independent 8-bit values, so that four pixels are read in a single access; this processing makes use of the AXI bus bandwidth and reduces the delay to one quarter of the original; the merged 32-bit video stream data is cached into the DDR of the ARM through the AXI video stream DDR read-write module;

the dynamic stepless despinning module dynamically performs stepless despinning on the video data cached in the DDR according to the despinning instruction and despinning angle sent by the upper computer through the RS422 serial port communication module; during despinning it works together with the four-in-one module, splitting each 32-bit word read from the DDR into four 8-bit values for bilinear interpolation, and the processed video image is stored back in the DDR; the AXI video stream DDR read-write module then reads the cached despun video image from the DDR back into an AXI video stream, the AXI bus video stream-to-video module converts the AXI video stream into parallel video data with explicit synchronization signals, and the parallel video data is sent to the video coding module for coding and output to a display or an acquisition card for real-time display.
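The merging and splitting performed by the four-in-one module can be illustrated with a minimal C++ sketch. The helper names `pack4` and `lane` and the lane order (top-left pixel in the low byte) are assumptions made for illustration only; the source does not specify the byte layout.

```cpp
#include <cassert>
#include <cstdint>

// Pack a 2x2 neighbourhood into one 32-bit word.
// Lane order (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right)
// is an assumption of this sketch, not taken from the source.
inline uint32_t pack4(uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br) {
    return static_cast<uint32_t>(tl)
         | (static_cast<uint32_t>(tr) << 8)
         | (static_cast<uint32_t>(bl) << 16)
         | (static_cast<uint32_t>(br) << 24);
}

// Extract one 8-bit pixel (lane 0..3) from a merged 32-bit word.
inline uint8_t lane(uint32_t word, int i) {
    assert(i >= 0 && i < 4);
    return static_cast<uint8_t>((word >> (8 * i)) & 0xFFu);
}
```

With this layout, one 32-bit read followed by four calls to `lane` yields the four neighbours needed for bilinear interpolation, in place of four separate 8-bit reads.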

2. The dynamic stepless despinning system based on large-scale integrated circuit high-level synthesis according to claim 1, characterized in that: the four-in-one module and the dynamic stepless despinning module are developed with the high-level synthesis tool Vivado HLS, and the algorithm is pipeline-optimized with the pipeline pragma, namely the pipeline optimization directive; the code is written so that each data item is input once, used once and output once, which is the condition required for pipelining, so that data that originally required 8 clock cycles to process is processed in 4 clock cycles.
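As a hedged illustration of the pipelining condition, the following C++ sketch merges two image rows into packed 2x2 words while reading each input pixel exactly once. The function name `merge_two_rows`, the right-edge replication, the byte layout and the initiation interval `II=1` are all assumptions of this sketch rather than details from the source; an ordinary C++ compiler ignores the HLS pragma, so the function can also be checked in software.

```cpp
#include <cassert>
#include <cstdint>

// Merge two image rows into packed 2x2 words, one output per loop iteration.
// out[c] holds (row0[c], row0[c+1], row1[c], row1[c+1]) from low to high byte;
// the last column is replicated at the right edge (assumption of this sketch).
void merge_two_rows(const uint8_t* row0, const uint8_t* row1,
                    uint32_t* out, int width) {
    assert(width >= 1);
    // Registers holding the previous column, so that each input pixel is
    // read from its row buffer exactly once.
    uint8_t prev0 = row0[0];
    uint8_t prev1 = row1[0];
    for (int c = 1; c <= width; ++c) {
#pragma HLS PIPELINE II=1
        // At the right edge, reuse the last column instead of reading past it.
        const uint8_t cur0 = (c < width) ? row0[c] : prev0;
        const uint8_t cur1 = (c < width) ? row1[c] : prev1;
        out[c - 1] = static_cast<uint32_t>(prev0)
                   | (static_cast<uint32_t>(cur0) << 8)
                   | (static_cast<uint32_t>(prev1) << 16)
                   | (static_cast<uint32_t>(cur1) << 24);
        prev0 = cur0;
        prev1 = cur1;
    }
}
```

Carrying the previous column in registers is what lets a pixel that belongs to two neighbouring 2x2 groups be fetched only once, which is the kind of access pattern a pipelined loop needs.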

3. The dynamic stepless despinning system based on large-scale integrated circuit high-level synthesis according to claim 1, characterized in that: the system further improves the performance of the despinning algorithm through data type optimization, namely custom bit-width data types, and through data throughput optimization; multiple AXI high-bandwidth buses are used for real-time parallel optimization, and multiple pixels are read, written and processed simultaneously by parallel computation.
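The idea behind custom bit-width data types can be sketched in plain C++ with a fixed-point format standing in for an HLS arbitrary-precision type. The 16-bit fraction width and the helper names `to_fixed`, `to_double` and `fx_mul` are assumptions chosen for illustration; the source does not state which widths are used.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int FRAC_BITS = 16;  // assumed fraction width (Q15.16 in an int32_t)

// Convert a real value to fixed point; the value must fit in the integer part.
inline int32_t to_fixed(double v) {
    assert(std::fabs(v) < 32768.0);
    return static_cast<int32_t>(std::lround(v * (1 << FRAC_BITS)));
}

// Convert a fixed-point value back to double, for checking only.
inline double to_double(int32_t v) {
    return static_cast<double>(v) / (1 << FRAC_BITS);
}

// Fixed-point multiply: widen to 64 bits, multiply, then drop the extra
// fraction bits. Assumes an arithmetic right shift for negative products.
inline int32_t fx_mul(int32_t a, int32_t b) {
    return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> FRAC_BITS);
}
```

For example, multiplying a coordinate offset of 100 pixels by cos(0.625°) in this format differs from the floating-point result by well under a thousandth of a pixel, while using only integer multipliers.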

4. The dynamic stepless despinning system based on large-scale integrated circuit high-level synthesis according to claim 1, characterized in that: the dynamic stepless despinning module performs real-time despinning with an image electronic despinning algorithm based on bilinear interpolation, specifically comprising the following steps:

(1) according to the despinning angle sent by the upper computer, for each pixel (x', y') of the despun video image, solving for the coordinates (x, y) of the corresponding pixel in the video image before despinning

where θ denotes the despinning angle, and x₀ and y₀ denote the horizontal and vertical coordinates of the image center, respectively;

(2) pixel mapping using bilinear interpolation

f(x, y) = [f(1, 0) - f(0, 0)]x + [f(0, 1) - f(0, 0)]y + [f(1, 1) - f(1, 0) - f(0, 1) + f(0, 0)]xy + f(0, 0)

where x and y are the horizontal and vertical offsets, each in the range [0, 1], of the source coordinate obtained in (1) from the integer pixel position obtained by rounding it down; f(0, 0), f(1, 0), f(0, 1) and f(1, 1) are the gray values of the 4 pixels surrounding (x, y); and f(x, y) is the gray value at (x, y) after bilinear interpolation;

(3) determining the boundary of the despun image: the size of the rotated image generally differs from that before rotation, so the boundary of the video image must be determined again; the four boundary positions of the video image, namely the left, right, top and bottom boundaries, are calculated according to the following formulas:

left = max(x₁, x₂, x₃, x₄)

right = min(x₁, x₂, x₃, x₄)

top = max(y₁, y₂, y₃, y₄)

bottom = min(y₁, y₂, y₃, y₄)

(4) fixing the image resolution: the rotated video image is cropped about its center so that the output image resolution is fixed, that is, the output image keeps the same size.
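Steps (1), (2) and (4) can be sketched in software as follows. This is a minimal floating-point illustration, not the FPGA implementation: the coordinate formula of step (1) is not reproduced in the text, so the sketch assumes the standard rotation about the image center, and the sign convention of θ, the zero fill for pixels that map outside the input, and the names `bilinear` and `despin` are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bilinear interpolation in the unit cell, as in step (2):
// dx, dy in [0, 1] are the offsets from the neighbour f00.
inline double bilinear(double f00, double f10, double f01, double f11,
                       double dx, double dy) {
    return (f10 - f00) * dx + (f01 - f00) * dy
         + (f11 - f10 - f01 + f00) * dx * dy + f00;
}

// Despin a grayscale image by theta (radians) about its centre.
// The output keeps the input width x height (step (4)); output pixels whose
// source position falls outside the input are left at 0 (assumption).
std::vector<uint8_t> despin(const std::vector<uint8_t>& src,
                            int width, int height, double theta) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const double x0 = (width - 1) / 2.0, y0 = (height - 1) / 2.0;
    const double c = std::cos(theta), s = std::sin(theta);
    std::vector<uint8_t> dst(src.size(), 0);
    for (int yo = 0; yo < height; ++yo) {
        for (int xo = 0; xo < width; ++xo) {
            // Step (1): map the output pixel back to the source image
            // (standard rotation about the centre; sign convention assumed).
            const double x = (xo - x0) * c + (yo - y0) * s + x0;
            const double y = -(xo - x0) * s + (yo - y0) * c + y0;
            const int xi = static_cast<int>(std::floor(x));
            const int yi = static_cast<int>(std::floor(y));
            if (xi < 0 || yi < 0 || xi >= width || yi >= height) continue;
            // Clamp the far neighbours so the last row and column stay valid.
            const int x1 = std::min(xi + 1, width - 1);
            const int y1 = std::min(yi + 1, height - 1);
            const double v = bilinear(src[yi * width + xi], src[yi * width + x1],
                                      src[y1 * width + xi], src[y1 * width + x1],
                                      x - xi, y - yi);
            dst[yo * width + xo] = static_cast<uint8_t>(std::lround(v));
        }
    }
    return dst;
}
```

Iterating over output pixels and mapping each one back to the source avoids holes in the result, and keeping the output buffer the same size as the input corresponds to the fixed-resolution cropping of step (4).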

5. The dynamic stepless despinning system based on large-scale integrated circuit high-level synthesis according to claim 1, characterized in that: the heterogeneous system on chip with the FPGA + ARM architecture adopted by the core processing module is a Zynq UltraScale+ MPSoC 15EG chip.

6. A dynamic stepless despinning method based on large-scale integrated circuit high-level synthesis, characterized by comprising the following implementation steps:

(1) converting the serial video collected by a camera into parallel video data, obtaining a set of explicit video synchronization signals, and sending the parallel video data and the decoded synchronization signals to an FPGA;

(2) in the FPGA, converting the video data, through a video-to-AXI bus video stream module, into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration;

(3) passing the data in AXI bus video stream format into a four-in-one module; because the subsequent despinning uses bilinear interpolation, processing each pixel requires reading its four adjacent pixels from the DDR; the four-in-one module caches every two incoming lines of the data stream in an on-chip cache and merges the four 8-bit pixels around each pixel into one 32-bit data word; when the four pixels adjacent to a given pixel are subsequently needed, the merged 32-bit word is read only once and split into four independent 8-bit values, so that four pixels are read in a single access; this processing makes full use of the AXI bus bandwidth and reduces the delay to one quarter of the original;

(4) caching the merged 32-bit video stream data into the DDR of an ARM through an AXI video stream DDR read-write module;

(5) performing, by the dynamic stepless despinning module, dynamic stepless despinning on the video data cached in the DDR according to the despinning instruction and despinning angle sent by an upper computer through the RS422 serial port communication module; during despinning the module works together with the four-in-one module, splitting each 32-bit word read from the DDR into four 8-bit values for bilinear interpolation, and the processed video image is stored back in the DDR;

(6) reading the cached despun video image from the DDR back into an AXI video stream with the AXI video stream DDR read-write module, converting the AXI video stream into parallel video data with explicit synchronization signals with the AXI bus video stream-to-video module, and sending the parallel video data to the video coding module for coding and output to a display or an acquisition card for real-time display;

in steps (3) and (5), the four-in-one module and the dynamic stepless despinning module are developed with the high-level synthesis tool Vivado HLS, and the algorithm is pipeline-optimized with the pipeline pragma; the code is written so that each data item is input once, used once and output once, which is the condition required for pipelining, so that data that originally required 8 clock cycles to process is processed in 4 clock cycles; in addition, the performance of the despinning algorithm is improved through data type optimization and data throughput optimization; meanwhile, multiple AXI high-bandwidth buses are used for real-time parallel optimization, and multiple pixels are read, written and processed simultaneously by parallel computation.