CN104063847A

CN104063847A - FPGA based guide filter and achieving method thereof

Info

Publication number: CN104063847A
Application number: CN201410272948.2A
Authority: CN
Inventors: 朴燕; 任伟杰; 刘硕; 孙荣春; 王宇
Original assignee: Changchun University of Science and Technology
Current assignee: Changchun University of Science and Technology
Priority date: 2014-06-18
Filing date: 2014-06-18
Publication date: 2014-09-24

Abstract

The invention discloses an FPGA-based guided filter and a method for implementing it. The filter comprises a signal controller, a mean filtering module, a parameter calculation module, an addressing and value-fetching module, a state discriminator, a data buffer and a comprehensive operation module. The implementation method targets an FPGA chip and exploits the ease with which an FPGA performs large-scale parallel operations, increasing operation speed and reducing the complexity of the guided filter hardware design while preserving the filtering effect. The filtering process is organized as a parallel structure overall, which increases filtering speed. Binary division is replaced by an addressing and value-fetching (lookup) scheme, which simplifies hardware implementation and shortens system running time. Binary subtraction is improved by means of the state discriminator, which improves system reliability.

Description

A Guided Filter Based on FPGA and Its Realization Method

Technical field

The invention belongs to the technical field of guided filters, and in particular relates to an FPGA-based guided filter and a method for implementing it.

Background

The guided filter is an edge smoothing filter that supports image edge smoothing, detail enhancement, and image fusion and denoising, which makes it a powerful filter. It works by filtering an input image under the guidance of a guide image: the output retains the overall characteristics of the input image while picking up the fine variations of the guide image. Guided filters have given good results in image denoising, detail smoothing and enhancement, matting and feathering, and are widely used.

However, the guided filtering algorithm is relatively complex and computationally heavy, so its real-time performance is poor, which hinders practical application. Because the parallel modules inside the algorithm are independent of one another, hardware implementation is an effective way to improve real-time performance. At present the vast majority of the guided filtering literature concerns theory and software simulation; very little addresses hardware implementation.

An FPGA is a programmable device based on a look-up table structure. It has large-capacity Block RAM resources, and its logic functions can be changed by in-system reconfiguration, which makes hardware design and implementation of a guided filter feasible. Using an FPGA to realize the guided filter in hardware, so that guided filtering runs fast enough for real-time use in image processing and related fields, is therefore of considerable significance.

Summary of the invention

The purpose of the embodiments of the present invention is to provide an FPGA-based guided filter and a method for implementing it, in order to solve the problem that the guided filtering algorithm is relatively complex and computationally heavy, which results in poor real-time performance.

An embodiment of the present invention is realized as an FPGA-based guided filter comprising: a signal controller, a mean filtering module, a parameter calculation module, an addressing and value-fetching module, a state discriminator, a data buffer and a comprehensive operation module;

The signal controller consists of row and column counters and comparators. It monitors the data in real time, analyzes the data state from the input system control signals, and marks and issues the line signal and field signal working instructions in time, so that data are processed correctly according to those instructions;

The mean filtering module is connected to the signal controller and performs mean filtering on the data. It consists of a local window generation module and a mean calculation module. The local window generation module is built from system-generated FIFOs and shift registers and converts the serial data stream into parallel outputs. The mean calculation module is built from adders and a multiplier: using the improved binary division algorithm, it sums the data that arrive in parallel and then passes the sum to the multiplier to obtain the mean of the parallel input data;

The parameter calculation module is connected to the mean filtering module and consists of several adders, subtractors, multipliers, comparators and registers. It collects data and completes preliminary data processing; during calculation it works together with the addressing and value-fetching module to obtain the variance and the local linear coefficients;

The addressing and value-fetching module is connected to the mean filtering module and the parameter calculation module and implements the improved binary division algorithm. It exploits the FPGA's large-capacity Block RAM, which can store large amounts of data and serve as a fast lookup table, to replace each division with one register addressing step and one multiplication, and it supplies the required data to the parameter calculation module and the mean filtering module;

The state discriminator is connected to the parameter calculation module and consists of comparators, adders, subtractors and registers. It is provided because signed binary numbers that arise during calculation are hard to distinguish and process. It compares its input data to generate a state enable signal, which drives the parameter calculation module and the comprehensive operation module to process data according to that state and so reduces computational complexity;

The data buffer is connected to the comprehensive operation module and is built from system-generated FIFOs and shift registers. It aligns data timing so that data are processed synchronously;

The comprehensive operation module is connected to the mean filtering module, the state discriminator and the data buffer. It combines the data from the data buffer and the parameter calculation module and outputs the image data filtered by the guided filter.

Another object of the embodiments of the present invention is to provide a method for implementing an FPGA-based guided filter, comprising the following steps:

Step 1: Signal controller:

Following a pipelined design, once the local window has been generated it moves step by step to the right as data keep arriving. A counter built from an adder, under control of the synchronous clock, marks and evaluates the system input control signals and sends out status information;

Step 2: Mean filtering:

The local window in the mean filtering module is 3*3. The input image P_in and the guide image I_in are fed row by row, as data streams driven by the synchronous clock, into the local window generation modules. After passing through two FIFOs and six registers, the modules output data group 1: P1, P2……P9 and data group 2: I1, I2……I9. Multipliers then produce data group 3: IP1, IP2……IP9 and data group 4: II1, II2……II9. The four data groups are sent to four mean calculation modules, which compute the group means ave_P, ave_I, ave_IP and ave_II;

Step 3: Variable calculation:

First, parameter calculation:

Once the four means ave_P, ave_I, ave_IP and ave_II are available, (a_k, b_k) is calculated from them;

Second, addressing and value fetching: in the FPGA design of the improved binary division algorithm, the dividend is multiplied by the reciprocal of the divisor, turning division into multiplication to simplify the calculation;

Third, state discriminator:

The calculations in steps 3 and 6 involve binary subtraction and can produce negative numbers; these signed binary numbers are handled by the state discriminator;

Step 4: Mean filtering:

The (a_k, b_k) obtained in step 3 are each mean filtered, using the method of step 2. Local window generation modules C and D produce data group 5: a_k1, a_k2……a_k9 and data group 6: b_k1, b_k2……b_k9 for the mean calculation modules, which output the group means ave_a and ave_b;

Step 5: Data buffer:

Steps 2 and 4 each invoke local window generation, so the line buffer design needs four FIFOs, whose depth matches that of the FIFOs in the local window generation modules. Because some of the formula calculations introduce latency, the data buffer also needs a register bank. I_in is output as I_in_delay after passing through the data buffer;

Step 6: Comprehensive operation:

As shown in the formula (not reproduced in this text), the ave_a and ave_b obtained in step 4 are combined with the I_in_delay output by step 5; the result is the final image data output by the guided filter. The state discriminator is used in the design to select the output.
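The formulas this section refers to were images in the original publication and are not reproduced in this text. For orientation, the standard guided filter equations from the literature can be written with the signal names used here as shown below. That the patent's own numbered formulas agree with these term by term is an assumption, and the symbols $q_i$ and $I_i$ (the filtered output and the delayed guide pixel I_in_delay) are introduced only for illustration.

```latex
\begin{aligned}
a_k &= \frac{\mathrm{ave\_IP} - \mathrm{ave\_I}\cdot\mathrm{ave\_P}}{\sigma_k^2 + \varepsilon}, \\
b_k &= \mathrm{ave\_P} - a_k \cdot \mathrm{ave\_I}, \\
q_i &= \mathrm{ave\_a}\cdot I_i + \mathrm{ave\_b}.
\end{aligned}
```

The first two lines correspond to step 3, and the last line corresponds to the comprehensive operation of step 6.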

Further, the specific sub-steps of step 2 are as follows:

First, window generation:

Under control of the FIFO controller, the P_in and I_in data are written into and read out of the FIFOs. Driven by the synchronous clock, row (i-2) is first stored in order in FIFO1; once FIFO1 is full, its data pass in time order into FIFO2 while row (i-1) is stored in FIFO1. When FIFO1 and FIFO2 are both full and row i arrives, the pipeline principle is applied: two registers are provided per window row to buffer data with matching column coordinates, and once the window is filled its 9 values are output in parallel. This yields data group 1: P1, P2……P9 and data group 2: I1, I2……I9, from which data group 3: IP1, IP2……IP9 and data group 4: II1, II2……II9 are derived. The four data groups are sent to the mean calculation modules;

Second, mean calculation:

The mean calculation consists of two stages, summation and division, as shown in formulas (7) and (8):

$\mathrm{ave} = \mathrm{Sum}/N \quad (7)$

$\mathrm{Sum} = \sum_{i=1}^{N} I_i \quad (8)$

where ave is the calculated mean, Sum is the sum of all pixel values in the sliding template, and N is the number of pixels in the template; here N is 9;

a. Summation: data groups 1 to 4 are each summed according to formula (8), giving Sum1, Sum2, Sum3 and Sum4;

b. Averaging: this is where the FPGA performs the binary division. The mean calculation yields the means of the four data groups: ave_P, ave_I, ave_IP and ave_II.
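The window generation and summation sub-steps above can be modelled in software. The following is a minimal Python sketch under stated assumptions, not the patent's hardware design: the names `windows_3x3` and `window_sum` are hypothetical, the two deques stand in for FIFO1 and FIFO2 with a depth of one image row, and each window row is modelled as three taps (the incoming value plus the two registers described in the text).

```python
from collections import deque


def windows_3x3(pixels, width):
    """Yield (row, col, window) for each full 3x3 window of a raster-scanned stream.

    pixels: iterable of pixel values in row-major order.
    width:  pixels per row, which is also the depth of each line buffer.
    (row, col) is the position of the window centre.
    """
    fifo1 = deque([0] * width)  # delays the stream by one row (row i-1)
    fifo2 = deque([0] * width)  # delays it by a second row (row i-2)
    taps = [[0, 0, 0] for _ in range(3)]  # per window row, newest value first
    for n, px in enumerate(pixels):
        row, col = divmod(n, width)
        mid = fifo1.popleft()  # same column, previous row
        top = fifo2.popleft()  # same column, two rows back
        fifo1.append(px)
        fifo2.append(mid)
        for r, value in enumerate((top, mid, px)):
            taps[r] = [value] + taps[r][:2]  # shift-register behaviour
        if row >= 2 and col >= 2:
            # Reverse each tap row so the window reads left to right.
            yield row - 1, col - 1, [t[::-1] for t in taps]


def window_sum(window):
    """Sum of the 9 window values, as in formula (8)."""
    return sum(sum(r) for r in window)
```

For a 3-pixel-wide stream holding the values 1 to 9, the generator produces a single window whose sum is 45; dividing by N = 9 as in formula (7) gives the mean.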

Further, in the first part of step 3, (a_k, b_k) is calculated as follows:

1) According to the formula for a_k (not reproduced in this text), the calculation of a_k is decomposed into parallel parts as follows:

a. The numerator, ave_IP-ave_I*ave_P, is obtained directly with a multiplier and a subtractor. Because the numerator contains a binary subtraction, the state discriminator is used to optimize it;

b. The denominator is σ_k²+ε, where ε is a constant and σ_k² is the variance of the guide image within the local window. By the mathematical definition of variance, variance and mean are related as in formula (9):

$\sigma_k^2 = E(X^2) - E^2(X) \quad (9)$

that is, ave_II-ave_I*ave_I. This calculation needs one subtractor, one multiplier and one adder;

c. The parts are combined using the principle of the addressing and value-fetching sub-step (the second part of step 3) to obtain the value of a_k within the local window;

2) According to the formula for b_k (not reproduced in this text), the calculation of b_k depends on a_k. Because computing a_k takes several cycles, three registers are needed to delay the data and keep the timing aligned; calculating b_k further requires one multiplier and one subtractor.
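The decomposition above can be checked with a few lines of floating-point Python. This is a sketch under stated assumptions: the function name `linear_coefficients` is hypothetical, and the expression used for b_k (ave_P minus a_k times ave_I) is taken from the standard guided filter formulation, which is consistent with the one multiplier and one subtractor the text mentions but is not spelled out in it.

```python
def linear_coefficients(ave_p, ave_i, ave_ip, ave_ii, eps):
    """Return (a_k, b_k) for one local window from its four means."""
    numerator = ave_ip - ave_i * ave_p   # part a: one multiplier, one subtractor
    variance = ave_ii - ave_i * ave_i    # formula (9): E(X^2) - E(X)^2
    a_k = numerator / (variance + eps)   # part c: division (done by lookup in hardware)
    b_k = ave_p - a_k * ave_i            # assumed standard form of the b_k formula
    return a_k, b_k
```

Two limiting cases are useful sanity checks. When the input equals the guide and eps is 0, the numerator equals the variance, so a_k is 1 and b_k is 0. In a perfectly flat window the numerator and variance are both 0, so a_k is 0 and b_k equals ave_P.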

Further, in the second part of step 3, addressing and value fetching works as follows:

Given a divisor, its reciprocal is obtained by calculation. The reciprocal is a fraction no greater than 1, so the upper 16 bits of the binary expansion of its fractional part are taken, together with the lowest bit of the integer part, to form a 17-bit binary word that is stored as the reciprocal of the divisor. Division is then performed by multiplication.
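The reciprocal scheme can be sketched in Python as follows. This is an illustration under assumptions, not the patent's circuit: all names (`reciprocal_word`, `RECIPROCAL_ROM`, `divide_q16`, `truncate_q16`, `round_q16`) are hypothetical, the table's address range of 1 to 1023 is arbitrary, and the text does not say how the fractional bits of the product are finally discarded, so both truncation and rounding are shown.

```python
FRAC_BITS = 16  # fraction bits kept from the binary expansion of 1/divisor


def reciprocal_word(divisor):
    """Return 1/divisor as a 17-bit word: 1 integer bit + 16 truncated fraction bits."""
    if divisor < 1:
        raise ValueError("divisor must be a positive integer")
    return (1 << FRAC_BITS) // divisor


# Hypothetical lookup-table contents; the address range is an assumption.
RECIPROCAL_ROM = {d: reciprocal_word(d) for d in range(1, 1024)}


def divide_q16(dividend, divisor):
    """One table lookup plus one multiplication; the result has 16 fraction bits."""
    return dividend * RECIPROCAL_ROM[divisor]


def truncate_q16(value):
    """Drop the 16 fraction bits."""
    return value >> FRAC_BITS


def round_q16(value):
    """Round to the nearest integer before dropping the 16 fraction bits."""
    return (value + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
```

Because the stored reciprocal is truncated, the product slightly underestimates the true quotient: for 45 divided by 9, truncating the product gives 4 while rounding gives 5. A design that needs exact results for exact multiples would have to round or keep more fraction bits; which choice the patent makes is not stated.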

Further, in the third part of step 3, the state discriminator handles signed binary numbers as follows:

Calculate n=A-B and m=n+k, where k is a constant and A and B are input values;

1) Numerical calculation and state discrimination: the operands A and B of the subtraction are compared. If A is greater than or equal to B the state is recorded as S1, otherwise as S2. At the same time n1=A-B and n2=B-A are calculated; in state S1, n=n1, and in state S2, n=n2;

2) State retention and subsequent processing: m1=n+k and m2=k-n are calculated; in state S1, m=m1, otherwise m=m2.
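The two-stage procedure above keeps every intermediate value as an unsigned magnitude and carries the sign separately as a state flag. A minimal Python sketch of that behaviour follows; the function name `state_discriminator` and the 16-bit default word width are assumptions made for illustration, and the masking imitates unsigned hardware arithmetic in which the unselected difference wraps around.

```python
def state_discriminator(a, b, k, width=16):
    """Compute m = (a - b) + k using only unsigned arithmetic.

    Returns (state, n, m), where n is |a - b| and state is "S1" or "S2".
    """
    mask = (1 << width) - 1
    in_s1 = a >= b            # comparator decides the state
    n1 = (a - b) & mask       # both differences are formed in parallel;
    n2 = (b - a) & mask       # the one not selected has wrapped around
    n = n1 if in_s1 else n2   # unsigned magnitude |a - b|
    m1 = (n + k) & mask       # used when a - b is non-negative
    m2 = (k - n) & mask       # used when a - b is negative
    m = m1 if in_s1 else m2
    return ("S1" if in_s1 else "S2"), n, m
```

For example, A = 3, B = 10 and k = 20 select state S2 with n = 7 and m = 13, which equals (3 - 10) + 20. The text does not say what happens in state S2 when k is smaller than n; in this sketch m2 would wrap around, so a complete design would need a further state for that case.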

The FPGA-based guided filter and implementation method provided by the present invention exploit the ease with which an FPGA performs large-scale parallel operations; they increase operation speed while preserving the filtering effect and reduce the complexity of the guided filter hardware design. The invention gives an overall parallel structure for the filtering process, which increases filtering speed; it improves binary division through addressing and value fetching, which simplifies hardware implementation and shortens system running time; and it improves binary subtraction through the state discriminator, which improves system reliability. In addition, because the guided filter is implemented in FPGA hardware, it can be applied to denoising, enhancement, detail smoothing and other areas of image processing.

Description of drawings

Fig. 1 is a schematic structural diagram of the FPGA-based guided filter provided by an embodiment of the present invention;

Fig. 2 is a flow chart of the method for implementing the FPGA-based guided filter provided by an embodiment of the present invention;

In the figures: 1. signal controller; 2. mean filtering module; 3. parameter calculation module; 4. addressing and value-fetching module; 5. state discriminator; 6. data buffer; 7. comprehensive operation module;

Fig. 3 is a schematic diagram of the local window generation method provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of the addressing and value-fetching method provided by an embodiment of the present invention;

Fig. 5 is a working flow chart of the state discriminator provided by an embodiment of the present invention;

Fig. 6 is a diagram of the working principle of the data buffer provided by an embodiment of the present invention.

Detailed description

To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. The specific embodiments described here serve only to explain the invention and do not limit it.

The application principle of the present invention is further described below with reference to the accompanying drawings and specific embodiments.

As shown in Fig. 1, the FPGA-based guided filter of the embodiment mainly consists of: signal controller 1, mean filtering module 2, parameter calculation module 3, addressing and value-fetching module 4, state discriminator 5, data buffer 6 and comprehensive operation module 7;

Signal controller 1 consists of row and column counters and comparators. It monitors the data in real time, analyzes the data state from the input system control signals, and marks and issues working instructions such as the line signal and the field signal in time, so that data are processed correctly according to those instructions;

Mean filtering module 2 is connected to signal controller 1 and performs mean filtering on the data. It mainly consists of a local window generation module and a mean calculation module. The local window generation module is built from system-generated FIFOs and shift registers and converts the serial data stream into parallel outputs. The mean calculation module is built from adders and a multiplier: using the improved binary division algorithm, it sums the data that arrive in parallel and then passes the sum to the multiplier to obtain the mean of the parallel input data;

Parameter calculation module 3 is connected to mean filtering module 2 and consists of several adders, subtractors, multipliers, comparators and registers. It collects data and completes preliminary data processing; during calculation it works together with addressing and value-fetching module 4 to obtain the variance and the local linear coefficients;

Addressing and value-fetching module 4 is connected to mean filtering module 2 and parameter calculation module 3. It implements the improved binary division algorithm and increases system speed. It exploits the FPGA's large-capacity Block RAM, which can store large amounts of data and serve as a fast lookup table, to replace each division with one register addressing step and one multiplication, and it supplies the required data to parameter calculation module 3 and mean filtering module 2;

State discriminator 5 is connected to parameter calculation module 3 and consists of comparators, adders, subtractors and registers. It is provided because signed binary numbers that arise during calculation are hard to distinguish and process. It compares its input data to generate a state enable signal, which drives parameter calculation module 3 and comprehensive operation module 7 to process data according to that state and so reduces computational complexity;

Data buffer 6 is connected to comprehensive operation module 7 and is built from system-generated FIFOs and shift registers. It aligns data timing so that data are processed synchronously;

Comprehensive operation module 7 is connected to mean filtering module 2, state discriminator 5 and data buffer 6. It combines the data from data buffer 6 and parameter calculation module 3 and outputs the image data filtered by the guided filter.

As shown in Fig. 2, the method for implementing the FPGA-based guided filter of the embodiment comprises the following steps:

S201: Following a pipelined design, once the local window has been generated it moves step by step to the right as data keep arriving. A counter built from an adder, under control of the synchronous clock, marks and evaluates the system input control signals and sends out status information;

S202: The input image and the guide image are fed row by row, as data streams driven by the synchronous clock, into the local window generation modules. After passing through two FIFOs and six registers, the modules output data group 1: P1, P2……P9 and data group 2: I1, I2……I9. Multipliers then produce data group 3: IP1, IP2……IP9 and data group 4: II1, II2……II9. The four data groups are sent to four mean calculation modules, which compute the group means ave_P, ave_I, ave_IP and ave_II;

S203: From the four means ave_P, ave_I, ave_IP and ave_II, (a_k, b_k) is calculated. Addressing and value fetching multiplies the dividend by the reciprocal of the divisor, turning division into multiplication to simplify the calculation. Negative numbers that arise during calculation are handled by the state discriminator;

S204: Mean filtering: the (a_k, b_k) obtained are each mean filtered. Local window generation modules C and D produce data group 5: a_k1, a_k2……a_k9 and data group 6: b_k1, b_k2……b_k9 for the mean calculation modules, which output the group means ave_a and ave_b;

S205: The guide image I_in is output as I_in_delay after passing through the data buffer;

S206: Comprehensive operation: ave_a and ave_b are combined with I_in_delay; the result is the final image data output by the guided filter.
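To see how steps S202 to S206 fit together, the following pure-Python reference model runs the whole flow in floating point on small images stored as lists of rows. It is a software sketch for checking results, not the hardware design: the function names are hypothetical, the edge-replicating border handling is an assumption (the text does not describe border pixels), exact division replaces the lookup scheme, the formulas for b_k and the final output follow the standard guided filter formulation, and no delay buffer is needed because software has no pipeline latency.

```python
def box_mean(img):
    """3x3 box mean; border pixels replicate the nearest edge (an assumption)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def combine(f, *imgs):
    """Apply f element-wise across images of equal size."""
    return [[f(*px) for px in zip(*rows)] for rows in zip(*imgs)]


def guided_filter(i_in, p_in, eps):
    """Filter p_in under the guidance of i_in; eps must be positive."""
    # S202: the four window means
    ave_i = box_mean(i_in)
    ave_p = box_mean(p_in)
    ave_ip = box_mean(combine(lambda i, p: i * p, i_in, p_in))
    ave_ii = box_mean(combine(lambda i: i * i, i_in))
    # S203: local linear coefficients (a_k, b_k)
    a = combine(lambda ip, i, p, ii: (ip - i * p) / (ii - i * i + eps),
                ave_ip, ave_i, ave_p, ave_ii)
    b = combine(lambda p, a_k, i: p - a_k * i, ave_p, a, ave_i)
    # S204: mean-filter the coefficients
    ave_a = box_mean(a)
    ave_b = box_mean(b)
    # S205/S206: combine with the guide image (no delay needed in software)
    return combine(lambda a_, i, b_: a_ * i + b_, ave_a, i_in, ave_b)
```

A constant image is a convenient check: every window has zero variance, so a_k is 0, b_k equals the pixel value, and the output reproduces the input.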

Step S202 specifically comprises:

Step 1, window generation:

Under control of the FIFO controller, the data of the input image and the guide image are written into and read out of the FIFOs. Driven by the synchronous clock, row (i-2) is first stored in order in FIFO1; once FIFO1 is full, its data pass in time order into FIFO2 while row (i-1) is stored in FIFO1. When FIFO1 and FIFO2 are both full and row i arrives, the pipeline principle is applied: two registers are provided per window row to buffer data with matching column coordinates, so that once the window is filled its 9 values are output in parallel. This yields data group 1: P1, P2……P9 and data group 2: I1, I2……I9. As step 2 requires, multipliers are also needed to obtain data group 3: IP1, IP2……IP9 and data group 4: II1, II2……II9. The four data groups are sent to the mean calculation modules;

Step 2, mean calculation:

$\mathrm{ave} = \mathrm{Sum}/N \quad (7)$

$\mathrm{Sum} = \sum_{i=1}^{N} I_i \quad (8)$

where ave is the calculated mean, Sum is the sum of all pixel values in the sliding template, and N is the number of pixels in the template; in the present invention N is 9;

In step S203, (a_k, b_k) is calculated as follows:

1) According to formula (3), the calculation of a_k is decomposed into parallel parts as follows:

a. The numerator, ave_IP-ave_I*ave_P, can be obtained directly with a multiplier and a subtractor. Because the numerator contains a binary subtraction, the state discriminator is used to optimize it; for details see step 8;

$\sigma_k^2 = E(X^2) - E^2(X) \quad (9)$

2) According to formula (3), the calculation of b_k depends on a_k. Because computing a_k takes several cycles, three registers are needed to delay the data and keep the timing aligned; calculating b_k further requires one multiplier and one subtractor.

In step S203, addressing and value fetching multiplies the dividend by the reciprocal of the divisor, turning division into multiplication to simplify the calculation. The specific method is as follows:

Given a divisor, its reciprocal can be obtained by calculation. The reciprocal is a fraction no greater than 1, so the upper 16 bits of the binary expansion of its fractional part are taken, together with the lowest bit of the integer part, to form a 17-bit binary word that is stored as the reciprocal of the divisor. Division can then be performed by multiplication.
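As a worked illustration of the 17-bit reciprocal word, take the window size N = 9 used by the mean calculation and an example window sum of 45 (both the example sum and the decision to truncate rather than round the fraction are assumptions made here for illustration):

```latex
\begin{aligned}
\tfrac{1}{9} &= 0.000111\,000111\,000111\ldots_2 \\
\text{17-bit word} &= 0.0001\,1100\,0111\,0001_2 = 7281 \cdot 2^{-16} \\
45 \times 7281 &= 327645, \qquad 327645 \cdot 2^{-16} \approx 4.9995 \quad (\text{exact: } 45/9 = 5)
\end{aligned}
```

The leading 0 is the single integer bit and the 16 bits after the point are the truncated fraction. The small shortfall in the product comes from that truncation, so the way the final result is rounded affects whether exact quotients are reproduced.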

In step S203, a state discriminator is designed as an improvement on signed binary numbers. The specific method is as follows:

Take the calculation of "n = A-B" and "m = n+k" as an example, where k is a constant and A and B are input values;

1) Numerical calculation and state discrimination: the data A and B involved in the subtraction are compared. If A is greater than or equal to B, the state is recorded as S1; otherwise it is recorded as S2. At the same time, "n1 = A-B" and "n2 = B-A" are calculated; in state S1 "n = n1", and in state S2 "n = n2";

2) State retention and subsequent processing: "m1 = n+k" and "m2 = k-n" are calculated; in state S1 "m = m1", otherwise "m = m2".
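The two steps of the state discriminator can be illustrated with a small software model. This is only a sketch, not the patented circuit: the 16-bit word width and the names `WIDTH`, `MASK` and `state_discriminator` are assumptions made for the example, and it assumes k is at least as large as n in state S2 so that m stays non-negative.

```python
WIDTH = 16                 # assumed register width; not specified in the text
MASK = (1 << WIDTH) - 1    # keeps every intermediate value unsigned


def state_discriminator(a, b, k):
    """Model n = A-B and m = n+k using unsigned values plus a state flag."""
    s1 = a >= b                    # state S1 if A >= B, otherwise state S2
    n1 = (a - b) & MASK            # both differences are formed in parallel
    n2 = (b - a) & MASK
    n = n1 if s1 else n2           # n always holds the magnitude |A-B|
    m1 = (n + k) & MASK            # the state is kept for the follow-up step
    m2 = (k - n) & MASK
    m = m1 if s1 else m2
    return ("S1" if s1 else "S2"), n, m


print(state_discriminator(200, 50, 10))   # A >= B: m = (A-B) + k
print(state_discriminator(50, 200, 300))  # A <  B: m = k - (B-A)
```

Only one of the two candidate results is ever selected, so no value on the data path needs a sign bit; the sign is carried by the state flag instead.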

The present invention specifically comprises the following:

Step 1: Signal controller:

Following the pipeline design method, once the local window has been generated it moves step by step to the right as data continue to arrive. In the present invention, an adder is used to build a counter which, under the control of the synchronous clock, marks and evaluates the system's input control signals and sends out status information;

Step 2: Mean filtering:

In the present invention, the local window in the mean filtering module is 3*3. The input image is P_in and the guide image is I_in; both enter the local window generation module row by row as data streams under the synchronous clock. After passing through two FIFOs and 6 registers, the module outputs data group 1: P1, P2 ... P9 and data group 2: I1, I2 ... I9. Multipliers are then used to obtain data group 3: IP1, IP2 ... IP9 and data group 4: II1, II2 ... II9. The four data groups are sent to 4 mean calculation modules, which compute the group means ave_P, ave_I, ave_IP and ave_II. The detailed steps are as follows:

1 Window generation:

As shown in Figure 3, the 3*3 local window generation module in the present invention requires 2 FIFOs and 6 registers. Because each FIFO must store one row of data, the FIFO depth in the present invention is 1024 and the data width is 8 bits, with first-in-first-out behavior. Under the FIFO controller, P_in and I_in data are written into and read out of the FIFOs. Under the synchronous clock, the data of row (i-2) are first stored in order in FIFO1 and, once FIFO1 is full, are passed in time order into FIFO2, while the data of row (i-1) are stored in FIFO1. When FIFO1 and FIFO2 are full and the data of row i arrive, the pipeline design principle is applied: each row of the window is given two registers that buffer data with the same column coordinates, so that once the window is filled, 9 data values are output in parallel. This yields data group 1: P1, P2 ... P9 and data group 2: I1, I2 ... I9. As required by Step 2, multipliers are also designed to obtain data group 3: IP1, IP2 ... IP9 and data group 4: II1, II2 ... II9. The resulting four data groups are sent to the mean calculation module;
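The line-buffer arrangement described above can be modeled in software to check the data ordering. The sketch below is a behavioral illustration only: the function name `windows_3x3`, the use of Python deques as FIFO1 and FIFO2, and the zero initialization are assumptions, and the FIFO depth is set to the image width rather than the 1024 entries used in the hardware.

```python
from collections import deque


def windows_3x3(pixels, width):
    """Yield (row, col, window) for every full 3x3 window of a raster stream.

    fifo1 delays the stream by one image row and fifo2 by a second row,
    mirroring the two FIFOs; each window row is modeled as two registers
    plus the value currently arriving.
    """
    fifo1 = deque([0] * width, maxlen=width)   # holds row i-1
    fifo2 = deque([0] * width, maxlen=width)   # holds row i-2
    rows = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]   # top, middle, bottom taps
    for idx, px in enumerate(pixels):
        r, c = divmod(idx, width)
        mid = fifo1[0]            # pixel one row above the incoming pixel
        top = fifo2[0]            # pixel two rows above the incoming pixel
        fifo2.append(mid)         # data leaving FIFO1 enters FIFO2
        fifo1.append(px)
        for taps, new in zip(rows, (top, mid, px)):
            taps.pop(0)           # shift the register pair by one column
            taps.append(new)
        if r >= 2 and c >= 2:     # window is full: emit 9 values in parallel
            yield r - 1, c - 1, [v for taps in rows for v in taps]


for row, col, win in windows_3x3(range(16), width=4):
    print(row, col, win)
```

The first two rows and the first two columns of each row only fill the buffers, which corresponds to the row delay that the data buffer of Step 5 has to compensate.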

2 Mean calculation:

$ave = Sum / N \quad (7)$

$Sum = \sum_{i=1}^{N} I_i \quad (8)$

b. Averaging: averaging is the stage in which the FPGA performs binary division. The present invention proposes an improved algorithm for binary division, described in detail under "2 Address lookup" in Step 3. The mean calculation yields the means of the 4 data groups, namely ave_P, ave_I, ave_IP and ave_II;

Step 3: Variable calculation module:

1 Parameter calculation:

Step 2 yields the four sets of data ave_P, ave_I, ave_IP and ave_II. After analysis and rearrangement, (a_k, b_k) are calculated as follows:

$\sigma_k^2 = E(X^2) - E^2(X) \quad (9)$

2) From formula (3), the calculation of b_k depends on a_k. Because the calculation of a_k introduces a computation delay, 3 registers are needed to buffer the data and keep the timing aligned; the calculation of b_k additionally uses 1 multiplier and 1 subtractor;
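To make the data flow of the parameter calculation concrete, the following sketch computes (a_k, b_k) from the four window means using formulas (3), (4) and (9). It uses floating-point arithmetic for readability, whereas the hardware works on fixed-point values; the function name `linear_coefficients` and the parameter name `eps` (for ε) are assumptions made for the example.

```python
def linear_coefficients(ave_p, ave_i, ave_ip, ave_ii, eps):
    """Compute (a_k, b_k) for one window from the four group means."""
    numerator = ave_ip - ave_i * ave_p     # one multiplier, one subtractor
    variance = ave_ii - ave_i * ave_i      # formula (9): E(X^2) - E^2(X)
    a_k = numerator / (variance + eps)     # formula (3)
    b_k = ave_p - a_k * ave_i              # formula (4): needs a_k first
    return a_k, b_k


# Window in which the guide image equals the input image (values 1, 2, 3).
print(linear_coefficients(2.0, 2.0, 14 / 3, 14 / 3, 0.0))
# Flat window: the variance is zero, so a_k is 0 and b_k is the mean of p.
print(linear_coefficients(5.0, 5.0, 25.0, 25.0, 0.1))
```

Because b_k cannot be formed until a_k is available, the mean values ave_P and ave_I have to be held back for the same number of clock cycles, which is the role of the 3 buffering registers mentioned above.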

2 Address lookup:

As shown in Figure 4, address lookup is an FPGA-oriented improvement of binary division: the dividend is multiplied by the reciprocal of the divisor, turning division into multiplication and simplifying the calculation. The design idea is as follows:

Once the divisor is given, its reciprocal can be computed. Because the reciprocal is a fraction whose value is no greater than 1, the upper 16 bits of its binary fractional part are taken together with the last bit of its integer part to form a 17-bit binary value, which is recorded as the reciprocal of the divisor. Division can then be carried out by multiplication;

The design and implementation of the FPGA-based guided filter hardware system of the present invention involves a large number of division operations, which add to the complexity of the design. Although the divisor in the denominator is not a fixed value, its range always stays between 0 and 255, so it is feasible to implement division by obtaining the reciprocal of the divisor and multiplying it by the dividend. The following definition can be made: the divisor serves as a register address in the FPGA, and its reciprocal is the content stored at that address. Computing the reciprocal of the divisor can therefore be regarded as a single register addressing operation, as illustrated in Figure 4. The design requires a storage space 17 bits wide and 256 entries deep to hold the reciprocals of the divisors;
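A software model of the 17-bit, 256-entry reciprocal table may help illustrate the idea. This is a sketch under stated assumptions: the names `FRAC_BITS`, `build_reciprocal_table`, `RECIP` and `divide` are hypothetical, the fractional part is truncated to 16 bits as the text describes, and the entry at address 0 is simply left at zero because the text does not say how a zero divisor is handled.

```python
FRAC_BITS = 16   # 16 fractional bits plus 1 integer bit gives 17-bit entries


def build_reciprocal_table(depth=256):
    """Table indexed by the divisor; each entry holds floor(2**16 / divisor)."""
    table = [0] * depth                  # address 0 is a placeholder (assumption)
    for divisor in range(1, depth):
        table[divisor] = (1 << FRAC_BITS) // divisor
    return table


RECIP = build_reciprocal_table()


def divide(dividend, divisor):
    """Approximate dividend / divisor with one lookup and one multiplication."""
    return (dividend * RECIP[divisor]) >> FRAC_BITS


# Mean of a 3x3 window: the divisor N = 9 is used as the table address.
window_sum = 1800
print(divide(window_sum, 9), window_sum // 9)
```

Because the stored reciprocal is truncated, the result can fall one below the exact integer quotient; a real design would need to decide whether that error is acceptable or whether rounding should be applied when the table is generated.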

3 State discriminator:

The calculations in Step 3 and Step 6 involve binary subtraction, which can produce negative numbers. A state discriminator is designed as an improvement on signed binary numbers, as shown in Figure 5. The design idea is as follows:

2) State retention and subsequent processing: "m1 = n+k" and "m2 = k-n" are calculated; in state S1 "m = m1", otherwise "m = m2";

Step 4: Mean filtering:

This step applies mean filtering separately to the (a_k, b_k) obtained in Step 3; see Step 2 for the design method. Local window generation modules C and D generate data group 5: a_k1, a_k2 ... a_k9 and data group 6: b_k1, b_k2 ... b_k9, respectively, and pass them to the mean calculation module, which outputs the group means ave_a and ave_b;

Step 5: Data buffer:

Because the local window generation stage introduces row delays, and Step 2 and Step 4 each invoke the local window generation module, the line buffer design requires 4 FIFOs whose depth matches that of the FIFOs used in local window generation. Because some formula calculations also introduce delay, the data buffer additionally needs a register bank. The design principle is shown in Figure 6; I_in is output as I_in_delay after passing through the data buffer;

Step 6: Comprehensive operation:

As shown in formula (6), the data ave_a and ave_b obtained in Step 4 are combined with the I_in_delay output by Step 5, and the result is output. The resulting data are the final image data produced by the guided filter. The state discriminator is used to select the output during this calculation; for the selection method, see "3 State discriminator" in Step 3.

Steps 1 to 6 describe in detail the design idea and design process of the hardware design and implementation of the FPGA-based guided filter of the present invention. In a specific embodiment, the local window can be modified according to the user's requirements for the processing result. After the design is completed, the system is synthesized and implemented with the Quartus II 11.0 tool, generating a downloadable stream file for configuring the FPGA.

In the design and implementation of the FPGA-based guided filter of the present invention, the FPGA chip used is the Altera Cyclone IV EP4CE115F29, which has 114480 logic elements, 3888 Kbits of embedded memory and 4 phase-locked loops. The registers, multipliers, adders, subtractors and buffers in the system are all system IP cores;

Algorithmic principle of the design and implementation of the FPGA-based guided filter hardware system of the present invention: the key assumption of the guided filter is a local linear model between the guide image I and the filter output q. It is assumed that, within a square window w_k centered on pixel k, q is a linear transform of I, as shown in formula (1):

$q_i = a_k I_i + b_k, \quad \forall i \in w_k \quad (1)$

Within the local square window w_k of radius r, (a_k, b_k) are constant linear coefficients, so ∇q = a∇I. To minimize the difference between the output image q and the input image p, the cost function within the square window is minimized, as shown in formula (2):

$E(a_k, b_k) = \sum_{i \in w_k} \left( (a_k I_i + b_k - p_i)^2 + \epsilon a_k^2 \right) \quad (2)$

Here ε is a regularization parameter that prevents a_k from becoming too large. The solution of formula (2) can be expressed in the form of linear regression:

$a_k = \frac{\frac{1}{|w|} \sum_{i \in w_k} I_i p_i - u_k \bar{p}_k}{\sigma_k^2 + \epsilon} \quad (3)$

$b_k = \bar{p}_k - a_k u_k \quad (4)$

Here u_k and σ_k² are the mean and variance of I within the square window, |w| is the number of pixels in the square window, and $\bar{p}_k$ is the mean of p within the square window. The linear model is applied to all local windows of the entire image, and the output is finally refined by formula (5):

$q_i = \frac{1}{|w|} \sum_{k : i \in w_k} (a_k I_i + b_k) \quad (5)$

$q_i = \bar{a}_i I_i + \bar{b}_i \quad (6)$

where $\bar{a}_i = \frac{1}{|w|} \sum_{k : i \in w_k} a_k$ and $\bar{b}_i = \frac{1}{|w|} \sum_{k : i \in w_k} b_k$;

Because the linear coefficients $(\bar{a}_i, \bar{b}_i)$ vary from one location to another, ∇q is no longer a simple multiple of ∇I after this modification. On the other hand, $(\bar{a}_i, \bar{b}_i)$ are the output of a mean filter, so their gradients are much smaller than the gradient of the guide image I near strong edges. Consequently ∇q ≈ ā∇I still holds approximately, that is, most abrupt changes in pixel brightness in the guide image I are preserved in the output image q.
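As a cross-check of formulas (1) to (6), the whole algorithm can be written as a short floating-point reference model. This is a sketch for comparison purposes, not the hardware design: the function names `box_mean` and `guided_filter` are hypothetical, images are plain nested lists, and border pixels are handled by clamping coordinates to the image edge, which is an assumption the text does not address.

```python
def box_mean(img, r=1):
    """Mean over a (2r+1) x (2r+1) window, clamping coordinates at the border."""
    h, w = len(img), len(img[0])
    n = (2 * r + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / n
    return out


def guided_filter(I, p, eps, r=1):
    """Reference guided filter: I is the guide image, p the input image."""
    h, w = len(I), len(I[0])

    def mul(A, B):
        return [[A[y][x] * B[y][x] for x in range(w)] for y in range(h)]

    ave_I, ave_p = box_mean(I, r), box_mean(p, r)
    ave_Ip, ave_II = box_mean(mul(I, p), r), box_mean(mul(I, I), r)
    # Formulas (3) and (4), with the variance taken from formula (9).
    a = [[(ave_Ip[y][x] - ave_I[y][x] * ave_p[y][x])
          / (ave_II[y][x] - ave_I[y][x] ** 2 + eps)
          for x in range(w)] for y in range(h)]
    b = [[ave_p[y][x] - a[y][x] * ave_I[y][x]
          for x in range(w)] for y in range(h)]
    # Formula (6): average the coefficients, then apply them to the guide.
    ave_a, ave_b = box_mean(a, r), box_mean(b, r)
    return [[ave_a[y][x] * I[y][x] + ave_b[y][x]
             for x in range(w)] for y in range(h)]


flat = [[7.0] * 4 for _ in range(4)]
print(guided_filter(flat, flat, eps=0.01)[0])
```

The six calls to `box_mean` correspond to the four mean calculation modules of Step 2 and the two of Step 4, which is why the hardware design can run them in parallel.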

The present invention provides an overall parallel structure for the filtering process, which increases the filtering speed; it improves binary division by means of address lookup, which simplifies hardware implementation and shortens the system running time; it improves binary subtraction by means of the discriminator, which increases system reliability; and the guided filter implemented on FPGA hardware can be applied to image denoising, enhancement, detail smoothing and related fields of image processing.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. An FPGA-based guided filter, characterized in that the FPGA-based guided filter comprises: a signal controller, a mean filtering module, a parameter calculation module, an address lookup module, a state discriminator, a data buffer and a comprehensive operation module;

the signal controller consists of a line counter and a comparator and is used to monitor the data in real time; it analyzes the data state from the input system control signals, marks the timing state and issues working instructions for the line signal and the field signal, ensuring that the data are processed correctly according to the instructions;

the mean filtering module is connected to the signal controller and performs mean filtering on the data; it consists of a local window generation module and a mean calculation module, wherein the local window generation module is composed of FIFOs and shift registers quickly generated by the system and converts serial data into parallel output, and the mean calculation module is composed of adders and a multiplier and is used for data processing: on the basis of the improved binary division algorithm, the data input in parallel to the mean calculation module are summed, and the sum then enters the multiplier to calculate the mean of the parallel input data;

the parameter calculation module is connected to the mean filtering module and consists of a plurality of adders, subtractors, multipliers, comparators and registers; it collects data and completes preliminary data processing, it and the address lookup module call each other during calculation, and the variance value and the local linear coefficient values are obtained by calculation;

the address lookup module is connected to the mean filtering module and the parameter calculation module and implements the improvement of the binary division algorithm; it makes full use of the large-capacity Block RAM resources of the FPGA, which can store large amounts of data and implement fast lookup-table functions, to convert division into a register addressing process and a multiplication process; the address lookup module generated by this improvement provides the required data to the parameter calculation module and the mean filtering module;

the state discriminator is connected to the parameter calculation module and consists of comparators, adders, subtractors and registers; it is provided because signed binary numbers are difficult to discriminate and process during calculation; it compares the input data to generate a state enable signal, which drives the parameter calculation module and the comprehensive operation module to process the data according to the enabled state, reducing the computational complexity of the system;

the data buffer is connected to the comprehensive operation module and is composed of FIFOs and shift registers quickly generated by the system; it coordinates data timing to ensure that the data can be processed synchronously;

the comprehensive operation module is connected to the mean filtering module, the state discriminator and the data buffer; it performs the combined processing of the data from the data buffer and the parameter calculation module, and finally outputs the image data filtered by the guided filter.

2. A method for implementing an FPGA-based guided filter, characterized in that the method comprises the following steps:

Step 1: Signal controller:

Following the pipeline design method, once the local window has been generated it moves step by step to the right as data continue to arrive; an adder is used to build a counter which, under the control of the synchronous clock, marks and evaluates the system's input control signals and sends out status information;

Step 2: Mean filtering:

In the mean filtering module, the local window is 3*3, the input image is P_in and the guide image is I_in; both enter the local window generation module row by row as data streams under the synchronous clock; after two FIFOs and 6 registers, data group 1: P1, P2 ... P9 and data group 2: I1, I2 ... I9 are output; multipliers are used to obtain data group 3: IP1, IP2 ... IP9 and data group 4: II1, II2 ... II9; the four data groups are sent to 4 mean calculation modules, which compute the group means ave_P, ave_I, ave_IP and ave_II;

Step 3: Variable calculation:

First, parameter calculation:

With the four sets of data ave_P, ave_I, ave_IP and ave_II obtained, (a_k, b_k) are calculated after analysis and rearrangement;

Second, address lookup: an FPGA-oriented improvement of binary division, in which the dividend is multiplied by the reciprocal of the divisor, turning division into multiplication and simplifying the calculation;

Third, state discriminator:

The calculations in Step 3 and Step 6 involve binary subtraction, which can produce negative numbers; these are handled by the state discriminator based on signed binary numbers;

Step 4: Mean filtering:

Mean filtering is applied separately to the (a_k, b_k) obtained in Step 3, using the method of Step 2; local window generation modules C and D generate data group 5: a_k1, a_k2 ... a_k9 and data group 6: b_k1, b_k2 ... b_k9, respectively, and pass them to the mean calculation module, which outputs the group means ave_a and ave_b;

Step 5: Data buffer:

Step 2 and Step 4 each invoke the local window generation module, so the line buffer design requires 4 FIFOs whose depth matches that of the FIFOs used in local window generation; because some formula calculations introduce delay, the data buffer additionally needs a register bank; I_in is output as I_in_delay after passing through the data buffer;

Step 6: Comprehensive operation:

As shown in formula (6), the data ave_a and ave_b obtained in Step 4 are combined with the I_in_delay output by Step 5, and the result is output; the resulting data are the final image data produced by the guided filter, and the state discriminator is used to select the output.

3. The method for implementing an FPGA-based guided filter according to claim 2, characterized in that the specific steps of Step 2 are as follows:

First, window generation:

Under the FIFO controller, P_in and I_in data are written into and read out of the FIFOs; under the synchronous clock, the data of row (i-2) are first stored in order in FIFO1 and, once FIFO1 is full, are passed in time order into FIFO2, while the data of row (i-1) are stored in FIFO1; when FIFO1 and FIFO2 are full and the data of row i arrive, the pipeline design principle is applied: each row of the window is given two registers that buffer data with the same column coordinates, and once the window is filled, 9 data values are output in parallel, yielding data group 1: P1, P2 ... P9 and data group 2: I1, I2 ... I9; data group 3: IP1, IP2 ... IP9 and data group 4: II1, II2 ... II9 are then obtained, and the resulting four data groups are sent to the mean calculation module;

Second, mean calculation:

The mean calculation is divided into two processes, namely summation and division, as shown in formulas (7) and (8):

$ave = Sum / N \quad (7)$

$Sum = \sum_{i=1}^{N} I_i \quad (8)$

where ave is the computed mean, Sum is the sum of all pixel values within the sliding template, N is the total number of pixels in the template, and N is 9;

a. Summation: according to formula (8), data groups 1 to 4 are summed separately to obtain Sum1, Sum2, Sum3 and Sum4;

b. Averaging: averaging is the stage in which the FPGA performs binary division; the mean calculation yields the means of the 4 data groups, namely ave_P, ave_I, ave_IP and ave_II.

4. The method for implementing an FPGA-based guided filter according to claim 2, characterized in that, in the first part of Step 3, (a_k, b_k) are calculated as follows:

1) According to formula (3), the calculation of a_k is decomposed into parallel parts as follows:

a. the numerator, ave_IP-ave_I*ave_P, is obtained directly with a multiplier and a subtractor; the numerator contains a binary subtraction, which is optimized with the state discriminator;

b. the denominator is σ_k² + ε, where ε is a fixed value and σ_k² is the variance of the guide image within the local window; from the mathematical definition of variance, the relationship between variance and mean is as shown in formula (9):

$\sigma_k^2 = E(X^2) - E^2(X) \quad (9)$

that is, ave_II-ave_I*ave_I; the calculation requires 1 subtractor, 1 multiplier and 1 adder;

c. the formula is combined using the principle of "2 Address lookup" to obtain the value of a_k within the local window;

2) From formula (3), the calculation of b_k depends on a_k; because the calculation of a_k introduces a computation delay, 3 registers are needed to buffer the data and keep the timing aligned, and the calculation of b_k additionally uses 1 multiplier and 1 subtractor.

5. The method for implementing an FPGA-based guided filter according to claim 2, characterized in that, in the second part of Step 3, the specific method of address lookup is as follows:

Once the divisor is given, its reciprocal is computed; because the reciprocal is a fraction whose value is no greater than 1, the upper 16 bits of its binary fractional part are taken together with the last bit of its integer part to form a 17-bit binary value, which is recorded as the reciprocal of the divisor, and division is carried out by multiplication.

6. The method for implementing an FPGA-based guided filter according to claim 2, characterized in that, in the third part of Step 3, the specific method of the state discriminator based on signed binary numbers is as follows:

n = A-B and m = n+k are calculated, where k is a constant and A and B are input values;

1) Numerical calculation and state discrimination: the data A and B involved in the subtraction are compared; if A is greater than or equal to B, the state is recorded as S1, otherwise it is recorded as S2; at the same time, n1 = A-B and n2 = B-A are calculated; in state S1, n = n1, and in state S2, n = n2;

2) State retention and subsequent processing: m1 = n+k and m2 = k-n are calculated; in state S1, m = m1, otherwise m = m2.