CN107230196B - Infrared and visible light image fusion method based on non-subsampled contourlet and target reliability - Google Patents
Infrared and visible light image fusion method based on non-subsampled contourlet and target reliability
- Publication number
- CN107230196B (application CN201710249290.7A)
- Authority
- CN
- China
- Prior art keywords
- fusion
- low
- layer
- image
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/40—Analysis of texture
- G06T7/41—Analysis of texture based on statistical description of texture
- G06T7/42—Analysis of texture based on statistical description of texture using transform domain methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses an infrared and visible light image fusion method based on the non-subsampled contourlet transform (NSCT) and target reliability, which mainly addresses the problem that targets are not clear enough in fused infrared and visible light images. The implementation steps are: 1) apply the NSCT to the two images to be fused, decomposing each into a low-frequency sub-band and high-frequency sub-bands; 2) fuse the high-frequency sub-band coefficients, which carry the detail information, with a strategy that selects the NSCT coefficient of larger absolute value; 3) fuse the NSCT low-frequency sub-band coefficients with an adaptive hybrid fusion strategy based on target reliability; 4) apply the inverse NSCT to the fused high- and low-frequency coefficients to obtain the fused image. The invention fully extracts the target information of the infrared image, effectively preserves the details of the visible light image, improves the visual effect, and greatly improves the quality of the fused image compared with traditional fusion methods.
Description
Technical Field

The invention relates to an infrared and visible light image fusion method based on the non-subsampled contourlet transform and target reliability. It is a fusion method in the field of image processing technology and is widely used in military surveillance.
Background Art

The fusion of infrared and visible light images is of great importance in surveillance and military applications. Infrared images record the thermal infrared radiation of the imaged scene, but their resolution is usually low. Visible light images excel at conveying detailed texture information but can hardly express thermal infrared radiation. Given this complementarity, a new image obtained by fusing an infrared image and a visible light image of the same scene provides both thermal-radiation target localization and high resolution.

Image fusion methods can generally be divided into two categories: fusion in the spatial domain and fusion in the transform domain. The former suffers from spatial distortion, whereas the latter avoids this problem. The choice of multi-scale decomposition tool plays a crucial role in transform-domain fusion, and multi-scale decomposition techniques have a long history. In 1989, Mallat first proposed the mathematical model of the discrete wavelet transform (DWT), which decomposes an image into one low-frequency sub-band and high-frequency sub-bands in three directions. Compared with direct analysis in the spatial domain, the DWT captures texture and edge information with the high-frequency sub-band coefficients of these three directions. However, the DWT introduces aliasing, and because its transform involves down-sampling it is not shift-invariant. To overcome some of these shortcomings, Rockinger et al. proposed the stationary wavelet transform (SWT), which is shift-invariant but, like the DWT, has high-frequency sub-bands in only three directions. Minh N. Do et al. proposed the contourlet transform (CT); compared with the DWT and SWT, the CT offers more directional sub-bands in the high-frequency part, but it lacks shift invariance. Cunha et al. proposed the non-subsampled contourlet transform (NSCT), which, unlike the CT, is shift-invariant.

Another important factor in transform-domain image fusion is the design of the fusion rule, i.e. finding a suitable strategy for combining the corresponding sub-band coefficients so as to obtain the best fused image. Designing a fusion rule involves choosing a fusion strategy and an activity measure. Commonly used strategies include maximum-selection fusion and weighted fusion. The maximum-selection strategy picks, as the fused coefficient, the sub-band coefficient whose activity measure is larger. The weighted fusion strategy computes the weights of the sub-band coefficients to be fused from the activity measure and a weight formula.

The quality of the activity measure depends on effective feature selection; the chosen features should reflect the essence of the image. Features are usually divided into hand-crafted features and data-driven features. Hand-crafted features are computed with formulas designed by experts, such as Shannon entropy. Data-driven features are typically extracted from the data by unsupervised feature-learning tools; such features include tensors, sparse representation (SR) and stacked sparse autoencoders (SSAE). Liang et al. introduced tensors into the image-fusion framework: the image is first decomposed with the higher-order singular value decomposition (HOSVD), and the absolute values of the resulting coefficients are then used as the activity measure to construct the fusion rule. Yang and Li et al. introduced SR into image fusion: the image is first divided into equal-sized patches with an overlapping sliding window, then decomposed over an over-complete dictionary to obtain sparse coefficients, and finally the resulting coefficients are used as the activity measure to construct the fusion rule. Given the large difference between infrared and visible light images, it is difficult to learn a single dictionary that fully captures their essential characteristics. Deep learning models have received increasing attention for their excellent ability to learn from complex data. Compared with traditional machine learning methods, the multi-layer network structure of a deep learning model can effectively extract features at multiple levels of abstraction from the data. Among these models, the stacked sparse autoencoder (SSAE), a branch of deep learning, has developed rapidly in recent years. Because labeled training data are scarce in image-fusion applications, the unsupervised SSAE is better suited to image fusion than methods belonging to the supervised-learning category.
Summary of the Invention

The purpose of the present invention is to address the deficiencies of the above-mentioned prior art by proposing an infrared and visible light image fusion method based on the non-subsampled contourlet transform and target reliability, so as to preserve the targets and details of the images, enhance image contrast and contour edges, improve the visual effect, and raise the quality of image fusion. The specific technical scheme of the present invention is as follows:

1) First, the infrared image IR and the visible light image VI to be fused are each decomposed with the NSCT into low-frequency sub-band coefficients $L^{IR}$, $L^{VI}$ and high-frequency sub-band coefficients $H^{IR}$, $H^{VI}$.
2) The high-frequency sub-band coefficients, which carry the detail information, are fused with a fusion strategy that selects the NSCT coefficient of larger absolute value:

$$H^{F}(x,y)=\begin{cases}H^{IR}(x,y), & \left|H^{IR}(x,y)\right|\ge\left|H^{VI}(x,y)\right|\\ H^{VI}(x,y), & \text{otherwise}\end{cases}$$

where $H^{IR}(x,y)$, $H^{VI}(x,y)$ and $H^{F}(x,y)$ denote the high-frequency coefficients of the source images IR and VI and the fused image F at point (x, y), respectively.
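As an illustration of this absolute-maximum rule, the sketch below is a minimal Python/NumPy rendering (the patent's experiments use MATLAB; the function and array names here are assumptions introduced for illustration):

```python
import numpy as np

def fuse_highpass_abs_max(h_ir: np.ndarray, h_vi: np.ndarray) -> np.ndarray:
    """Absolute-maximum selection on corresponding high-frequency sub-band
    coefficients: keep the IR coefficient where |IR| >= |VI|, else the VI one."""
    mask = np.abs(h_ir) >= np.abs(h_vi)
    return np.where(mask, h_ir, h_vi)

# Toy usage with random stand-ins for one directional sub-band of each source.
rng = np.random.default_rng(0)
h_ir = rng.normal(size=(256, 256))
h_vi = rng.normal(size=(256, 256))
h_fused = fuse_highpass_abs_max(h_ir, h_vi)
```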
3) The NSCT low-frequency sub-band coefficients are fused with an adaptive hybrid fusion strategy based on target reliability;

3.1) A Butterworth high-pass filter is used to sharpen the low-frequency sub-band; the sharpened low-frequency sub-band is decomposed into sub-blocks, and the encoder part of an SSAE formed by cascading two sparse autoencoders is used to obtain the deep code of each sub-block.

First, the low-frequency sub-band is sharpened with the Butterworth high-pass filter to obtain the sharpened low-frequency sub-band. The sharpened sub-band is divided with the sliding-window technique into num blocks of size $w_m \times w_n$, and every block is stretched into a $(w_m \times w_n) \times 1$ column vector $x^{(i)}$, where $1 \le i \le$ num. These vectors are used as the input data to train the first-layer encoder of a two-layer stacked sparse autoencoder (SSAE), solving for the optimal $W_{1,1}$ and $b_{1,1}$.
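A minimal sketch of this pre-processing step is given below, assuming a frequency-domain Butterworth high-pass response and a non-overlapping sliding window; the cutoff d0, the filter order and the 8×8 block size are illustrative values, not parameters fixed by this description:

```python
import numpy as np

def butterworth_highpass_sharpen(lowband: np.ndarray, d0: float = 10.0, order: int = 2) -> np.ndarray:
    """Sharpen a low-frequency sub-band by adding back its Butterworth high-pass component."""
    rows, cols = lowband.shape
    u = np.arange(rows) - rows / 2
    v = np.arange(cols) - cols / 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)           # distance from the spectrum centre
    H = 1.0 / (1.0 + (d0 / (D + 1e-12)) ** (2 * order))      # Butterworth high-pass response
    F = np.fft.fftshift(np.fft.fft2(lowband))
    high = np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
    return lowband + high                                     # sharpened sub-band

def to_column_blocks(img: np.ndarray, wm: int = 8, wn: int = 8) -> np.ndarray:
    """Cut the sub-band into wm x wn blocks and stretch each block into a column vector x(i)."""
    rows, cols = img.shape
    blocks = [img[r:r + wm, c:c + wn].reshape(-1, 1)
              for r in range(0, rows - wm + 1, wm)
              for c in range(0, cols - wn + 1, wn)]
    return np.hstack(blocks)                                  # shape (wm*wn, num)
```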
In a sparse autoencoder, the structure from the input layer to the hidden layer is called the encoder, whose mathematical expression is:

$$a^{(i)} = \mathrm{sigmoid}\left(W_{1,1}x^{(i)} + b_{1,1}\right)$$

The structure from the hidden layer to the output layer is called the decoder, whose mathematical expression is:

$$h_{W,b}\left(x^{(i)}\right) = W_{2,1}a^{(i)} + b_{2,1}$$

where $a^{(i)}$ is the hidden-layer activation for the i-th input, $x^{(i)}$ is the input (equal to the theoretical output $y^{(i)}$), and $h_{W,b}(x^{(i)})$ is the actual network output, with $1 \le i \le m$, m being the number of input samples. $W_{1,l}$, $b_{1,l}$, $W_{2,l}$, $b_{2,l}$ and $a^{(i),l}$ denote, respectively, the encoder weight matrix, the encoder bias, the decoder weight matrix, the decoder bias of the l-th autoencoder, and the hidden-layer activation of the i-th input in that autoencoder; $s_l$ is the number of hidden-layer neurons of the l-th sparse autoencoder, and in particular $s_0$ is the number of input-layer neurons of the first sparse autoencoder. A two-layer SSAE is adopted here, so $1 \le l \le 2$.

The encoder converts $x^{(i)}$ into a code and the decoder reconstructs the original data from that code; the error between $x^{(i)}$ and $h_{W,b}(x^{(i)})$ is called the reconstruction error.

The cost function of the sparse autoencoder is:

$$J(W,b)=\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|h_{W,b}\left(x^{(i)}\right)-x^{(i)}\right\|^{2}+\frac{\lambda}{2}\sum_{l=1}^{n_{l}-1}\sum_{i=1}^{s_{l}}\sum_{j=1}^{s_{l+1}}\left(W_{ji}^{(l)}\right)^{2}+\beta\sum_{j=1}^{s_{l}}\mathrm{KL}\left(\rho\,\|\,\hat{\rho}_{j}\right)$$

where the first term of J(W, b) is the mean-square-error term, describing the difference between the actual value $x^{(i)}$ and the network output $h_{W,b}(x^{(i)})$; the second term is the weight-decay term used to prevent over-fitting, with $\lambda$ the weight-decay parameter, $n_l$ the number of layers of the autoencoder structure, $s_l$ the number of neurons in layer l, and $W_{ji}^{(l)}$ the connection weight from the i-th neuron in layer l-1 to the j-th neuron in layer l; in the third term, $\mathrm{KL}(\rho\,\|\,\hat{\rho}_{j})$ is the KL divergence between the average activation $\hat{\rho}_{j}$ of the j-th hidden-layer neuron and the sparsity parameter $\rho$, and $\beta$ is the coefficient of the sparsity penalty term, controlling the weight of the sparsity penalty factor.
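The encoder, decoder and three-term cost above can be written out as in the following sketch, under the standard sparse-autoencoder conventions (averaging the reconstruction error over the m samples and taking $\hat{\rho}_j$ as the mean hidden activation are assumptions consistent with the description):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_forward(X, W1, b1, W2, b2):
    """Encoder a = sigmoid(W1 x + b1), linear decoder h = W2 a + b2.
    X holds one input sample per column."""
    A = sigmoid(W1 @ X + b1)
    H = W2 @ A + b2
    return A, H

def sae_cost(X, W1, b1, W2, b2, lam=1e-4, beta=3.0, rho=0.05):
    """Mean-square reconstruction error + weight decay + KL sparsity penalty."""
    m = X.shape[1]
    A, H = sae_forward(X, W1, b1, W2, b2)
    mse = 0.5 * np.sum((H - X) ** 2) / m
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    rho_hat = A.mean(axis=1)                       # average activation of each hidden neuron
    kl = np.sum(rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return mse + decay + beta * kl
```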
The autoencoder is trained by solving for $W_{1,1}$ and $b_{1,1}$ with the gradient descent method, as follows:

Step 1: initialize $W_1 := 0$, $b_1 := 0$, $\Delta W_1 := 0$, $\Delta b_1 := 0$;

Step 2: compute the reconstruction error $J(W_1, b_1)$ of the encoder;

Step 3:

while $J(W_1, b_1) > 10^{-6}$:

    for i = 1 to max_Iter:

        update $W_1$ and $b_1$;

where $\Delta W_1$ and $\Delta b_1$ are the increments of $W_1$ and $b_1$, $\alpha$ is the update rate, and max_Iter is the maximum number of iterations.
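A compact sketch of this training loop is given below. It follows Steps 1-3 in structure, but uses a small random initialisation instead of the all-zero initialisation of Step 1 (an implementation choice made here only to break the symmetry of the hidden units) and computes the gradients by back-propagation; the stopping threshold, update rate α and maximum iteration count follow the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sae(X, hidden, alpha=0.1, lam=1e-4, beta=3.0, rho=0.05,
              max_iter=1000, tol=1e-6, seed=0):
    """One sparse autoencoder trained by batch gradient descent.
    X: (n_in, m) data matrix, one column per sample. Returns W1, b1."""
    n_in, m = X.shape
    rng = np.random.default_rng(seed)
    # Small random init (assumption, to break symmetry) instead of the all-zero init of Step 1.
    W1 = rng.normal(scale=0.01, size=(hidden, n_in)); b1 = np.zeros((hidden, 1))
    W2 = rng.normal(scale=0.01, size=(n_in, hidden)); b2 = np.zeros((n_in, 1))
    for _ in range(max_iter):
        A = sigmoid(W1 @ X + b1)                     # encoder
        H = W2 @ A + b2                              # linear decoder
        err = H - X
        if 0.5 * np.sum(err ** 2) / m < tol:         # stopping criterion J < 1e-6
            break
        rho_hat = A.mean(axis=1, keepdims=True)
        d_out = err / m
        dA = W2.T @ d_out + (beta / m) * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
        dZ1 = dA * A * (1.0 - A)
        # Gradient-descent updates (weight decay applied to the weight matrices only).
        W2 -= alpha * (d_out @ A.T + lam * W2); b2 -= alpha * d_out.sum(axis=1, keepdims=True)
        W1 -= alpha * (dZ1 @ X.T + lam * W1);   b1 -= alpha * dZ1.sum(axis=1, keepdims=True)
    return W1, b1
```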
After the optimal $W_1$ and $b_1$ have been solved, all input data are encoded with the encoder expression and the hidden-layer activations $a^{(i),1}$ are computed; $a^{(i),1}$ is then used as the input to train the next-layer autoencoder, solving for the optimal $W_{1,2}$ and $b_{1,2}$.

When the SSAE has been trained, all input data are encoded with $a^{(i),2} = \mathrm{sigmoid}(W_{1,2}a^{(i),1} + b_{1,2})$ to extract deep sparse features. Here $a^{(i),l}$ is the hidden-layer activation of the i-th input at the l-th sparse autoencoder, i.e. the SSAE code of that input. The SSAE deep code of the i-th low-frequency sub-block is therefore $a^{(i),2}$.
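The resulting two-layer encoding of the sub-block vectors could look like the following sketch; the greedy layer-wise usage shown in the comments (with a train_sae routine such as the one sketched above) and the hidden-layer sizes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ssae_deep_codes(X, W11, b11, W12, b12):
    """Two-layer SSAE encoder: a1 = sigmoid(W11 x + b11), a2 = sigmoid(W12 a1 + b12).
    Returns the deep codes a(i),2, one column per low-frequency sub-block."""
    A1 = sigmoid(W11 @ X + b11)
    return sigmoid(W12 @ A1 + b12)

# Illustrative greedy layer-wise use with a train_sae routine like the previous sketch:
#   W11, b11 = train_sae(X, hidden=s1)
#   A1 = sigmoid(W11 @ X + b11)
#   W12, b12 = train_sae(A1, hidden=s2)
#   A2 = ssae_deep_codes(X, W11, b11, W12, b12)
```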
3.2) A target reliability function is constructed from the deep codes of the low-frequency sub-blocks, and the target reliability of each sub-block is used as the weight in the adaptive hybrid fusion strategy, achieving the fusion of the low-frequency sub-bands.

The low-frequency sub-band is divided with the sliding-window technique into num sub-blocks of size $w_m \times w_n$, $1 \le i \le$ num. From the SSAE deep code $a^{(i),2}$ of the i-th low-frequency sub-block, its infrared target reliability $OR^{(i)}$ is computed as follows:

the ReLu_tan function is defined (its curve is shown in Fig. 3 and its parameters are given in the detailed description), and $OR^{(i)} = \mathrm{ReLu\_tan}(En^{(i)})$, where $En^{(i)}$ is the sum of the elements of the deep code $a^{(i),2}$.

The low-frequency sub-bands are fused with the OR-based adaptive hybrid fusion strategy: when $En^{(i)}$ is less than or equal to the threshold t, the maximum-selection strategy is applied; otherwise, the weighted-average strategy is applied, where the weights of the infrared and visible-light low-frequency sub-blocks are derived from the target reliability $OR^{(i)}$.

Finally, all fused low-frequency sub-blocks are converted back into the fused low-frequency sub-band image using the inverse of the sliding-window transform.
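To make the low-frequency rule concrete, one possible sketch of the adaptive hybrid strategy is given below. The exact ReLu_tan expression and the mapping from $OR^{(i)}$ to the two weights are not reproduced here, so the relu_tan stand-in and the weight assignment (using the reliability directly as the infrared weight, and block energy for the maximum-selection branch) are illustrative assumptions rather than the patent's exact formulas:

```python
import numpy as np

def relu_tan(en: np.ndarray, t: float, u: float = 4.0) -> np.ndarray:
    """Illustrative stand-in for the ReLu_tan reliability curve: zero up to the
    threshold t, then a tanh-shaped rise whose steepness is controlled by u."""
    return np.where(en <= t, 0.0, np.tanh(u * (en - t)))

def fuse_low_blocks(blk_ir: np.ndarray, blk_vi: np.ndarray, code_sums: np.ndarray, t: float):
    """blk_ir, blk_vi: (wm*wn, num) column-vectorised low-frequency sub-blocks of IR and VI.
    code_sums: En(i), the per-block sums of the SSAE deep codes."""
    orr = relu_tan(code_sums, t)                     # target reliability OR(i) per block
    fused = np.empty_like(blk_ir)
    for i in range(blk_ir.shape[1]):
        if code_sums[i] <= t:
            # Maximum-selection branch for low-reliability blocks
            # (assumption: keep the block with the larger energy).
            keep_ir = np.abs(blk_ir[:, i]).sum() >= np.abs(blk_vi[:, i]).sum()
            fused[:, i] = blk_ir[:, i] if keep_ir else blk_vi[:, i]
        else:
            # Weighted-average branch, weights derived from the target reliability.
            w_ir = orr[i]
            fused[:, i] = w_ir * blk_ir[:, i] + (1.0 - w_ir) * blk_vi[:, i]
    return fused
```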
4) The fused coefficients obtained in steps 2) and 3) are subjected to the inverse NSCT to obtain the fused image.
Compared with existing infrared and visible light image fusion methods, the present invention has the following advantages:

1. The present invention adopts the non-subsampled contourlet transform (NSCT) as the multi-scale decomposition tool. Compared with the discrete wavelet transform (DWT), the NSCT captures more directional information and eliminates the pseudo-Gibbs phenomenon; compared with the stationary wavelet transform (SWT), the NSCT yields high-frequency sub-bands in more directions; and whereas the contourlet transform (CT) lacks shift invariance, the NSCT involves no down-sampling operation and is therefore shift-invariant.

2. The present invention applies a stacked sparse autoencoder (SSAE) to the low-frequency sub-blocks and uses the sum of each sub-block's sparse code as the sub-block feature for subsequent infrared target discrimination. This feature is data-driven, learned from the input images through a deep network; it is more representative than hand-crafted features and better suited to representing image data.

3. The present invention constructs the ReLu_tan function and combines it with the sum of each sub-block's sparse code to compute the target reliability of the low-frequency sub-blocks, which is then used to compute the weights in the low-frequency fusion rule, so that the transition between background and target in the fused image obtained by the present invention is more natural.
Description of the Drawings:

Fig. 1 is the overall fusion framework of the present invention.

Fig. 2 is a schematic diagram of the fusion rule for the low-frequency sub-band coefficients.

Fig. 3 is a schematic diagram of the ReLu_tan function curve.

Figs. 4(a) and (b) are the infrared and visible light images to be fused in the first embodiment of the present invention; (c) is the fused image of the weighted-average fusion algorithm (AVG); (d) is the fused image based on the discrete wavelet transform (DWT); (e) is the fused image based on the pulse-coupled neural network (PCNN); (f) is the fused image based on the structure tensor and wavelet transform (STR-DWT); (g) is the fused image based on NSCT and PCNN (P-NSCT); (h) is the fused image based on guided filtering (GFF); (i) is the fused image of the method of the present invention.

Figs. 5(a)-(g) are partial enlargements of the target in Figs. 4(c)-(i).

Figs. 6(a) and (b) are the infrared and visible light images to be fused in the second embodiment of the present invention; (c) is the fused image of the weighted-average fusion algorithm (AVG); (d) is the fused image based on the discrete wavelet transform (DWT); (e) is the fused image based on the pulse-coupled neural network (PCNN); (f) is the fused image based on the structure tensor and wavelet transform (STR-DWT); (g) is the fused image based on NSCT and PCNN (P-NSCT); (h) is the fused image based on guided filtering (GFF); (i) is the fused image of the method of the present invention.

Figs. 7(a)-(g) are partial enlargements of the target in Figs. 6(c)-(i).
Detailed Description of the Embodiments:

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. The embodiments are carried out on the premise of the technical solution of the present invention; as shown in Fig. 1, the detailed implementation and specific operation steps are as follows:

1) The two source images to be fused are decomposed with the NSCT, yielding low-frequency sub-band and high-frequency sub-band coefficients; the scale-decomposition (Laplacian pyramid) filter is set to "maxflat", the directional filter bank to "pmaxflat", and the directional decomposition parameters to [3, 4, 5];
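The overall flow of steps 1) to 4) of this embodiment can be outlined as in the sketch below. There is no standard Python NSCT package, so nsct_decompose and nsct_reconstruct are hypothetical wrappers standing in for NSCT toolbox calls with the "maxflat"/"pmaxflat" filters and [3, 4, 5] directional levels named above, and fuse_low/fuse_high stand for the fusion rules of steps 2) and 3):

```python
import numpy as np

def fuse_ir_vi(ir: np.ndarray, vi: np.ndarray, nsct_decompose, nsct_reconstruct,
               fuse_low, fuse_high):
    """End-to-end sketch of the proposed pipeline.
    nsct_decompose(img) -> (low, [high_1, ..., high_K]) and nsct_reconstruct are
    hypothetical NSCT wrappers ('maxflat' pyramid, 'pmaxflat' DFB, levels [3, 4, 5])."""
    low_ir, highs_ir = nsct_decompose(ir)
    low_vi, highs_vi = nsct_decompose(vi)
    # Step 2: absolute-maximum selection on every directional high-frequency sub-band.
    highs_f = [fuse_high(h_ir, h_vi) for h_ir, h_vi in zip(highs_ir, highs_vi)]
    # Step 3: target-reliability based adaptive hybrid fusion of the low-frequency sub-bands.
    low_f = fuse_low(low_ir, low_vi)
    # Step 4: inverse NSCT of the fused coefficients.
    return nsct_reconstruct(low_f, highs_f)
```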
2) The high-frequency sub-band coefficients, which carry the detail information, are fused with a fusion strategy that selects the NSCT coefficient of larger absolute value:

$$H^{F}(x,y)=\begin{cases}H^{IR}(x,y), & \left|H^{IR}(x,y)\right|\ge\left|H^{VI}(x,y)\right|\\ H^{VI}(x,y), & \text{otherwise}\end{cases}$$

where $H^{IR}(x,y)$, $H^{VI}(x,y)$ and $H^{F}(x,y)$ denote the high-frequency coefficients of the source images IR and VI and the fused image F at point (x, y), respectively.
3) The NSCT low-frequency sub-band coefficients are fused with an adaptive hybrid fusion strategy based on target reliability; the specific implementation flow is shown in Fig. 2;

3.1) A Butterworth high-pass filter is used to sharpen the low-frequency sub-band; the sharpened low-frequency sub-band is decomposed into sub-blocks, and the encoder part of an SSAE formed by cascading two sparse autoencoders is used to obtain the deep code of each sub-block.

First, the low-frequency sub-band is sharpened with the Butterworth high-pass filter to obtain the sharpened low-frequency sub-band. The sharpened sub-band is divided with the sliding-window technique into num blocks of size $w_m \times w_n$, and every block is stretched into a $(w_m \times w_n) \times 1$ column vector $x^{(i)}$, where $1 \le i \le$ num. These vectors are used as the input data to train the first-layer encoder of a two-layer stacked sparse autoencoder (SSAE), solving for the optimal $W_{1,1}$ and $b_{1,1}$.
In a sparse autoencoder, the structure from the input layer to the hidden layer is called the encoder, whose mathematical expression is:

$$a^{(i)} = \mathrm{sigmoid}\left(W_{1,1}x^{(i)} + b_{1,1}\right)$$

The structure from the hidden layer to the output layer is called the decoder, whose mathematical expression is:

$$h_{W,b}\left(x^{(i)}\right) = W_{2,1}a^{(i)} + b_{2,1}$$

where $a^{(i)}$ is the hidden-layer activation for the i-th input, $x^{(i)}$ is the input (equal to the theoretical output $y^{(i)}$), and $h_{W,b}(x^{(i)})$ is the actual network output, with $1 \le i \le m$, m being the number of input samples. $W_{1,l}$, $b_{1,l}$, $W_{2,l}$, $b_{2,l}$ and $a^{(i),l}$ denote, respectively, the encoder weight matrix, the encoder bias, the decoder weight matrix, the decoder bias of the l-th autoencoder, and the hidden-layer activation of the i-th input in that autoencoder; $s_l$ is the number of hidden-layer neurons of the l-th sparse autoencoder, and in particular $s_0$ is the number of input-layer neurons of the first sparse autoencoder. A two-layer SSAE is adopted here, so $1 \le l \le 2$.

The encoder converts $x^{(i)}$ into a code and the decoder reconstructs the original data from that code; the error between $x^{(i)}$ and $h_{W,b}(x^{(i)})$ is called the reconstruction error.

The cost function of the sparse autoencoder is:

$$J(W,b)=\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|h_{W,b}\left(x^{(i)}\right)-x^{(i)}\right\|^{2}+\frac{\lambda}{2}\sum_{l=1}^{n_{l}-1}\sum_{i=1}^{s_{l}}\sum_{j=1}^{s_{l+1}}\left(W_{ji}^{(l)}\right)^{2}+\beta\sum_{j=1}^{s_{l}}\mathrm{KL}\left(\rho\,\|\,\hat{\rho}_{j}\right)$$

where the first term of J(W, b) is the mean-square-error term, describing the difference between the actual value $x^{(i)}$ and the network output $h_{W,b}(x^{(i)})$; the second term is the weight-decay term used to prevent over-fitting, with $\lambda$ the weight-decay parameter, $n_l$ the number of layers of the autoencoder structure, $s_l$ the number of neurons in layer l, and $W_{ji}^{(l)}$ the connection weight from the i-th neuron in layer l-1 to the j-th neuron in layer l; in the third term, $\mathrm{KL}(\rho\,\|\,\hat{\rho}_{j})$ is the KL divergence between the average activation $\hat{\rho}_{j}$ of the j-th hidden-layer neuron and the sparsity parameter $\rho$, and $\beta$ is the coefficient of the sparsity penalty term, controlling the weight of the sparsity penalty factor.
The autoencoder is trained by solving for $W_{1,1}$ and $b_{1,1}$ with the gradient descent method, as follows:

Step 1: initialize $W_1 := 0$, $b_1 := 0$, $\Delta W_1 := 0$, $\Delta b_1 := 0$;

Step 2: compute the reconstruction error $J(W_1, b_1)$ of the encoder;

Step 3:

while $J(W_1, b_1) > 10^{-6}$:

    for i = 1 to max_Iter:

        update $W_1$ and $b_1$;

where $\Delta W_1$ and $\Delta b_1$ are the increments of $W_1$ and $b_1$, $\alpha$ is the update rate, and max_Iter is the maximum number of iterations.
After the optimal $W_1$ and $b_1$ have been solved, all input data are encoded with the encoder expression and the hidden-layer activations $a^{(i),1}$ are computed; $a^{(i),1}$ is then used as the input to train the next-layer autoencoder, solving for the optimal $W_{1,2}$ and $b_{1,2}$.

When the SSAE has been trained, all input data are encoded with $a^{(i),2} = \mathrm{sigmoid}(W_{1,2}a^{(i),1} + b_{1,2})$ to extract deep sparse features. Here $a^{(i),l}$ is the hidden-layer activation of the i-th input at the l-th sparse autoencoder, i.e. the SSAE code of that input. The SSAE deep code of the i-th low-frequency sub-block is therefore $a^{(i),2}$.
3.2) A target reliability function is constructed from the deep codes of the low-frequency sub-blocks, and the target reliability of each sub-block is used as the weight in the adaptive hybrid fusion strategy, achieving the fusion of the low-frequency sub-bands.

The low-frequency sub-band is divided with the sliding-window technique into num sub-blocks of size $w_m \times w_n$, $1 \le i \le$ num. From the SSAE deep code $a^{(i),2}$ of the i-th low-frequency sub-block, its infrared target reliability $OR^{(i)}$ is computed as follows:

the ReLu_tan function is defined, with the curve shown in Fig. 3, where u controls the steepness of the curve (u = 4 in this patent) and t is the threshold of the reliability function; $OR^{(i)} = \mathrm{ReLu\_tan}(En^{(i)})$, with $En^{(i)}$ the sum of the elements of the deep code $a^{(i),2}$.
The low-frequency sub-bands are fused with the OR-based adaptive hybrid fusion strategy. As the function curve in Fig. 3 shows, when $En^{(i)}$ is less than or equal to the threshold t, the maximum-selection strategy is applied; otherwise, the weighted-average strategy is applied, where the weights of the infrared and visible-light low-frequency sub-blocks are derived from the target reliability $OR^{(i)}$.

Finally, all fused low-frequency sub-blocks are converted back into the fused low-frequency sub-band image using the inverse of the sliding-window transform.

4) The fused coefficients obtained in steps 2) and 3) are subjected to the inverse NSCT to obtain the fused image.
Experimental Conditions and Methods:

Hardware platform: Intel(R) processor, 1.80 GHz CPU, 1.0 GB memory;

Software platform: MATLAB R2016a. Two pairs of registered infrared and visible light images are used in the experiments, all of size 256×256 in tif format. The first pair of infrared and visible light images is shown in Figs. 4(a) and 4(b), and the second pair in Figs. 6(a) and 6(b).
Simulation Experiments:

To verify the feasibility and effectiveness of the present invention, two pairs of infrared-visible light images were tested; the fusion results are shown in Figs. 4, 5, 6 and 7.

Simulation 1: Following the technical solution of the present invention, the first pair of infrared and visible light images (Figs. 4(a) and 4(b)) is fused. Analysis of Figs. 4(c)-4(i) shows that, in the fused image produced by the proposed method, the target person is the most prominent, the most detail is preserved in the foliage, and the overall sharpness is the highest. Fig. 5 gives partial enlargements of each fused image around the target person; the comparison shows that with the proposed method the edges around the person are clear and the contrast is high, giving the best fusion effect.

Simulation 2: Following the technical solution of the present invention, the second pair of infrared and visible light images (Figs. 6(a) and 6(b)) is fused. Analysis of Figs. 6(c)-6(i) shows that Figs. 6(c), (d) and (g) suffer from low overall brightness, an indistinct gun outline and an unclear face; Figs. 6(e) and (h) are brighter than Figs. 6(c), (d) and (g) but contain too much noise, which severely disturbs the gun outline; Fig. 6(f) is less noisy, but the gun outline is still indistinct. The fusion result of the proposed method, Fig. 6(i), has high overall brightness, little noise and a clear gun outline. Fig. 7 supports these conclusions.
Tables 1 and 2 give the objective evaluation indices of the experimental results of the various fusion methods on the two data sets, with the optimal value of each index shown in bold. AVG denotes image fusion by averaging spatial pixel values; DWT, image fusion based on the discrete wavelet decomposition; PCNN, image fusion based on the pulse-coupled neural network; STR-DWT, image fusion based on the structure tensor and discrete wavelet decomposition; P-NSCT, image fusion based on the non-subsampled contourlet transform and the pulse-coupled neural network; GFF, image fusion based on guided filtering; and NSCT-SSAE, the image fusion method based on NSCT and SSAE proposed by the present invention. Information entropy (EN), average gradient (AG), edge transfer rate (Qabf), edge intensity (EI), mutual information (MI), standard deviation (SD) and spatial frequency (SF) are used as objective evaluation indices.

The data in Tables 1 and 2 show that the fused images obtained by the method of the present invention outperform the other fusion methods on objective indices such as information entropy, average gradient, edge intensity, standard deviation and spatial frequency. Information entropy reflects the amount of information carried by an image; the larger its value, the more information the fused image contains and the better the fusion effect. The average gradient reflects image sharpness; the larger its value, the better the visual effect. The edge transfer rate reflects the degree to which edge information of the source images is transferred into the fused image; the closer its value is to 1, the better the visual effect. Edge intensity measures the richness of edge detail; the larger its value, the better the subjective effect. Mutual information reflects the degree of correlation between the source images and the fused image; the larger its value, the better the visual effect. The standard deviation reflects how dispersed the grey levels are around the mean grey level; the larger its value, the more dispersed the grey levels and the better the visual effect. Spatial frequency reflects the degree of grey-level variation in the fused image; the larger its value, the better the detail of the fused image.
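For reference, several of the objective indices listed above can be computed as in the following sketch, using their common definitions (Qabf, MI and EI are omitted for brevity; the exact formulas used in the experiments are not restated here):

```python
import numpy as np

def entropy(img: np.ndarray) -> float:
    """Information entropy (EN) of an 8-bit grey-level image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 255), density=True)
    p = hist[hist > 0]
    return float(-np.sum(p * np.log2(p)))

def average_gradient(img: np.ndarray) -> float:
    """Average gradient (AG), reflecting image sharpness."""
    gx = np.diff(img.astype(float), axis=1)[:-1, :]
    gy = np.diff(img.astype(float), axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def spatial_frequency(img: np.ndarray) -> float:
    """Spatial frequency (SF), reflecting overall grey-level variation."""
    img = img.astype(float)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))

def standard_deviation(img: np.ndarray) -> float:
    """Standard deviation (SD) of the grey levels."""
    return float(np.std(img.astype(float)))
```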
The fusion results of the simulation experiments show that the fused image of the present invention is globally clear, with a well-defined target and rich information. The effectiveness of the present invention is demonstrated both by subjective human visual perception and by objective evaluation indices.
Table 1. Objective evaluation indices of the fusion results for the first pair of infrared and visible light images

Table 2. Objective evaluation indices of the fusion results for the second pair of infrared and visible light images
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710249290.7A CN107230196B (en) | 2017-04-17 | 2017-04-17 | Infrared and visible light image fusion method based on non-subsampled contourlet and target reliability |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710249290.7A CN107230196B (en) | 2017-04-17 | 2017-04-17 | Infrared and visible light image fusion method based on non-subsampled contourlet and target reliability |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107230196A CN107230196A (en) | 2017-10-03 |
CN107230196B true CN107230196B (en) | 2020-08-28 |
Family
ID=59933466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710249290.7A Active CN107230196B (en) | 2017-04-17 | 2017-04-17 | Infrared and visible light image fusion method based on non-subsampled contourlet and target reliability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107230196B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107633240B (en) * | 2017-10-19 | 2021-08-03 | 京东方科技集团股份有限公司 | Sight tracking method and device and intelligent glasses |
CN107886488A (en) * | 2017-12-04 | 2018-04-06 | 国网山东省电力公司电力科学研究院 | Based on AUV image interfusion methods, processor and the system for improving PCNN compensation |
CN108399645B (en) * | 2018-02-13 | 2022-01-25 | 中国传媒大学 | Image coding method and device based on contourlet transformation |
CN108961180B (en) * | 2018-06-22 | 2020-09-25 | 理光软件研究所(北京)有限公司 | Infrared image enhancement method and system |
CN109241908A (en) * | 2018-09-04 | 2019-01-18 | 深圳市宇墨科技有限公司 | Face identification method and relevant apparatus |
CN109584192B (en) * | 2018-10-19 | 2021-05-25 | 中国人民解放军海军工程大学 | Target feature enhancement method, device and electronic device based on multispectral fusion |
CN109886908B (en) * | 2019-02-14 | 2022-02-11 | 西安理工大学 | Infrared image and visible light image fusion method |
CN110110786B (en) * | 2019-05-06 | 2023-04-14 | 电子科技大学 | A Fusion Method of Infrared and Visible Light Images Based on NSCT and DWT |
CN110111290B (en) * | 2019-05-07 | 2023-08-25 | 电子科技大学 | Infrared and visible light image fusion method based on NSCT and structure tensor |
CN110796632B (en) * | 2019-07-30 | 2023-08-11 | 重庆渝通合数字科技有限公司 | Pig counting device |
CN111680752B (en) * | 2020-06-09 | 2022-07-22 | 重庆工商大学 | Infrared and visible light image fusion method based on Framelet framework |
CN111784619B (en) * | 2020-07-03 | 2023-04-28 | 电子科技大学 | Fusion method of infrared and visible light images |
CN112669249A (en) * | 2021-01-15 | 2021-04-16 | 西安中科立德红外科技有限公司 | Infrared and visible light image fusion method combining improved NSCT (non-subsampled Contourlet transform) transformation and deep learning |
CN113052779A (en) * | 2021-03-26 | 2021-06-29 | 重庆邮电大学 | Automobile anti-halation method based on improved NSCT (non-subsampled Contourlet transform) |
CN113674319B (en) * | 2021-08-23 | 2024-06-21 | 浙江大华技术股份有限公司 | Target tracking method, system, equipment and computer storage medium |
CN116182091A (en) * | 2023-03-07 | 2023-05-30 | 北京环境特性研究所 | A visible-infrared dual-light fusion monitoring method for oil pipeline leakage prevention |
CN116091882B (en) * | 2023-03-28 | 2023-12-22 | 中国科学院光电技术研究所 | Polarization image fusion method based on self-adaptive double-channel PCNN |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1822046A (en) * | 2006-03-30 | 2006-08-23 | 上海电力学院 | Infrared and visible light image fusion method based on regional feature fuzzy |
CN104200452A (en) * | 2014-09-05 | 2014-12-10 | 西安电子科技大学 | Method and device for fusing infrared and visible light images based on spectral wavelet transformation |
CN104282007A (en) * | 2014-10-22 | 2015-01-14 | 长春理工大学 | Contourlet transformation-adaptive medical image fusion method based on non-sampling |
CN105976346A (en) * | 2016-04-28 | 2016-09-28 | 电子科技大学 | Infrared and visible light image fusion method based on robust principal component sparse decomposition |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7406184B2 (en) * | 2002-07-03 | 2008-07-29 | Equinox Corporation | Method and apparatus for using thermal infrared for face recognition |
US7602942B2 (en) * | 2004-11-12 | 2009-10-13 | Honeywell International Inc. | Infrared and visible fusion face recognition system |
- 2017
- 2017-04-17 CN CN201710249290.7A patent/CN107230196B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1822046A (en) * | 2006-03-30 | 2006-08-23 | 上海电力学院 | Infrared and visible light image fusion method based on regional feature fuzzy |
CN104200452A (en) * | 2014-09-05 | 2014-12-10 | 西安电子科技大学 | Method and device for fusing infrared and visible light images based on spectral wavelet transformation |
CN104282007A (en) * | 2014-10-22 | 2015-01-14 | 长春理工大学 | Contourlet transformation-adaptive medical image fusion method based on non-sampling |
CN105976346A (en) * | 2016-04-28 | 2016-09-28 | 电子科技大学 | Infrared and visible light image fusion method based on robust principal component sparse decomposition |
Non-Patent Citations (2)
Title |
---|
Fusing heterogeneous features from stacked sparse autoencoder for histopathological image analysis; Xiaofan Zhang et al.; IEEE Journal of Biomedical and Health Informatics; 2016-09-30; vol. 20, no. 5; full text *
Research on infrared and visible light image fusion methods based on NSCT; Chen Musheng et al.; Laser & Optoelectronics Progress; 2015-12-31; full text *
Also Published As
Publication number | Publication date |
---|---|
CN107230196A (en) | 2017-10-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107230196B (en) | Infrared and visible light image fusion method based on non-subsampled contourlet and target reliability | |
CN108765319B (en) | Image denoising method based on generation countermeasure network | |
CN110097528B (en) | Image fusion method based on joint convolution self-coding network | |
CN101303764B (en) | Multi-sensor image adaptive fusion method based on non-subsampled contourlet | |
CN110533683B (en) | A radiomics analysis method integrating traditional features and deep features | |
CN108648197A (en) | A kind of object candidate area extracting method based on image background mask | |
CN104050507B (en) | Hyperspectral image classification method based on multilayer neural network | |
CN106897987A (en) | Image interfusion method based on translation invariant shearing wave and stack own coding | |
CN108875935A (en) | Based on the natural image target materials visual signature mapping method for generating confrontation network | |
CN116206214B (en) | A method, system, device and medium for automatically identifying landslides based on lightweight convolutional neural network and dual attention | |
CN109389171A (en) | Medical image classification method based on more granularity convolution noise reduction autocoder technologies | |
CN109598676A (en) | A kind of single image super-resolution method based on Hadamard transform | |
CN109308689A (en) | The unsupervised image repair method of confrontation network migration study is generated based on mask | |
CN112669249A (en) | Infrared and visible light image fusion method combining improved NSCT (non-subsampled Contourlet transform) transformation and deep learning | |
CN118229572A (en) | A method for infrared image denoising based on convolutional transposed self-attention | |
CN107480723A (en) | Texture Recognition based on partial binary threshold learning network | |
CN117828333A (en) | A cable partial discharge feature extraction method based on signal mixing enhancement and CNN | |
Zhang et al. | Infrared and visible image fusion with entropy-based adaptive fusion module and mask-guided convolutional neural network | |
CN114821259A (en) | Zero-learning medical image fusion method based on twin convolutional neural network | |
CN109919921B (en) | Environmental impact degree modeling method based on generation countermeasure network | |
CN114764754A (en) | Occlusion face repairing method based on geometric perception prior guidance | |
CN113343861A (en) | Neural network model-based remote sensing image water body region extraction method | |
CN116452455A (en) | Hyperspectral image denoising method based on low-rank sparse coding model | |
CN112785539A (en) | Multi-focus image fusion method based on image adaptive decomposition and parameter adaptive | |
CN116563144A (en) | Dynamic attention-based intensive LSTM residual network denoising method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |