CN110852316A - Image tampering detection and positioning method adopting convolution network with dense structure - Google Patents

Image tampering detection and positioning method adopting convolution network with dense structure

Info

Publication number
CN110852316A
Authority
CN
China
Prior art keywords: image, layer, network, tampering, dense
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911081464.9A
Other languages
Chinese (zh)
Other versions
CN110852316B (en)
Inventor
张榕瑜
倪江群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN201911081464.9A
Publication of CN110852316A
Application granted
Publication of CN110852316B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention provides an image tampering detection and localization method using a densely structured convolutional network. The method comprises: inputting an image to be tested and preprocessing it with spatial rich model (SRM) convolution to obtain a preprocessed image; constructing a densely connected convolutional network to extract tampering features from the preprocessed image, obtaining the binary classification information of the image to be tested and thereby completing the detection of image tampering; constructing a deconvolutional network whose structure is symmetric to the convolutional network, which takes the binary classification information as input; and, from the resulting tampered region, producing the localized image with the deconvolutional network. The method applies deep learning to image tampering detection and localization, suits a variety of tampering techniques, and offers good robustness and practicality. It provides a unified framework for detection and localization: it not only predicts whether an image has been tampered with, but also predicts the tampered region, giving accurate pixel-by-pixel annotation and detailed object contour boundaries.

Description

An Image Tampering Detection and Localization Method Using a Densely Structured Convolutional Network

Technical Field

The present invention relates to the technical field of blind image forensics, and more particularly to an image tampering detection and localization method using a densely structured convolutional network.

Background Art

In the information age, images are one of the main means of disseminating information; because they represent things intuitively and guide thinking, this form of information dissemination has become fully integrated into human life. However, image tampering technology has also advanced by leaps and bounds, and the major threat it poses to multimedia content security cannot be ignored. Current image tampering identification techniques fall mainly into two categories: methods based on handcrafted feature extraction and methods based on deep learning.

Methods based on handcrafted feature extraction apply various transforms to the image, extract features, and classify them with thresholds or machine learning. Such methods depend on the researcher's modeling of image features and are usually applicable to only one type of image tampering: although they may perform well against one tampering technique, they generalize poorly to others and have weak robustness and scalability. Deep-learning-based methods usually focus on only one of detection and localization; although they can achieve high detection accuracy, they fail to exploit the superior performance of deep learning in object detection and do not make full use of the connection between detection and localization.

Summary of the Invention

To overcome the technical defect that existing image tampering identification techniques cannot simultaneously detect and localize image tampering, the present invention provides an image tampering detection and localization method using a densely structured convolutional network.

To solve the above technical problem, the technical solution of the present invention is as follows:

An image tampering detection and localization method using a densely structured convolutional network comprises the following steps:

S1: Input the image to be tested and preprocess it with spatial rich model (SRM) convolution to obtain the preprocessed image;

S2: Construct a densely connected convolutional network to extract tampering features from the preprocessed image, obtain the binary classification information of the image to be tested, and complete the detection of image tampering;

S3: Construct a deconvolutional network whose structure is symmetric to the convolutional network, take the binary classification information of the image to be tested as input, and localize the tampered region;

S4: From the obtained tampered region, produce the localized image with the deconvolutional network, completing the localization of the image tampering.

In step S2, the densely connected convolutional network comprises a pooling layer, dense layers, transition layers, a global average pooling layer, and a fully connected layer, wherein:

the pooling layer applies one convolution and one max-pooling operation to the preprocessed image and feeds the result into the first dense layer;

multiple dense layers and transition layers are provided; the output of each dense layer is fed into its corresponding transition layer, and the tampering feature map obtained from the last transition layer is fed into the global average pooling layer;

the global average pooling layer average-pools the tampering feature map, and the fully connected layer computes and outputs two probability values, representing the probabilities of tampered and non-tampered respectively, yielding the binary classification information of the image to be tested.

The dense layer comprises multiple basic structure layers, each composed of two consecutive convolutional layers; the input of each basic structure layer is formed by concatenating the outputs of the preceding layers, making it a locally dense variant of the residual structure.

The densely connected convolutional network is provided with four dense layers, containing 5, 10, 20, and 12 basic structure layers respectively.

The transition layer comprises one convolutional layer: it first convolves the feature map received from the dense layer once, then applies average pooling to reduce the image size.
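
As a concrete illustration, the following is a minimal TensorFlow/Keras sketch of one basic structure layer (two consecutive convolutions, each followed by batch normalization and ReLU as the patent describes for every convolutional layer), a dense layer built from such layers, and a transition layer. The kernel sizes and channel counts are illustrative assumptions; the patent does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel_size):
    """Convolution followed by batch normalization and ReLU, as used throughout."""
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(y)

def basic_structure_layer(x, growth=32):
    """Two consecutive convolutional layers (assumed 3x3)."""
    y = conv_bn_relu(x, growth, 3)
    return conv_bn_relu(y, growth, 3)

def dense_layer(x, num_basic):
    """Each basic structure layer's input is the concatenation of all earlier outputs."""
    features = [x]
    for _ in range(num_basic):
        inp = features[0] if len(features) == 1 else layers.Concatenate()(features)
        features.append(basic_structure_layer(inp))
    return layers.Concatenate()(features)

def transition_layer(x, out_channels):
    """One 1x1 convolution to reduce depth, then 2x2 average pooling to halve the size."""
    y = conv_bn_relu(x, out_channels, 1)
    return layers.AveragePooling2D(pool_size=2)(y)
```

One stage of the encoder is then a dense_layer followed by a transition_layer, repeated with 5, 10, 20, and 12 basic structure layers per stage.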

The fully connected layer outputs the two probability values through the softmax function and is trained with a class-weighted cross-entropy; a formula consistent with the symbol definitions below is:

$L = -\sum_{i} a_i \, y_i \log \frac{e^{\hat{y}_i}}{\sum_{j} e^{\hat{y}_j}}$

where $i$ indexes the two classes (tampered/non-tampered), $\hat{y}_i$ is the network's output value for class $i$, $y_i$ is the sample's true value for class $i$, and $a_i$ is the weight of class $i$; the softmax factor $e^{\hat{y}_i} / \sum_{j} e^{\hat{y}_j}$ supplies the two probability values.
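
A minimal TensorFlow sketch of the class-weighted softmax cross-entropy above (itself reconstructed from the symbol definitions); the weight values a_i are placeholders, since the patent does not state them.

```python
import tensorflow as tf

def weighted_softmax_cross_entropy(logits, one_hot_labels, class_weights=(1.0, 1.0)):
    """L = -sum_i a_i * y_i * log softmax(y_hat)_i over the two classes."""
    probs = tf.nn.softmax(logits, axis=-1)              # the two probability values
    a = tf.constant(class_weights, dtype=probs.dtype)   # placeholder class weights a_i
    return -tf.reduce_sum(a * one_hot_labels * tf.math.log(probs + 1e-12), axis=-1)
```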

In the above scheme, to better capture the tampering-noise characteristics of the image, one SRM convolution is applied to the three RGB channels of the input image. The convolution kernels are initialized with the normalized SRM filters, with all three channels of each kernel assigned from the same model, yielding 30 filters; the convolved output is then concatenated with the three RGB channels.
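
A sketch of this preprocessing step is given below. The 30 normalized SRM kernels come from the steganalysis literature and are not reproduced in the patent, so a placeholder array stands in for them here; also, the patent uses the SRM values to initialize a trainable kernel, whereas this sketch applies them as a fixed convolution, and the 5x5 kernel size is an assumption.

```python
import numpy as np
import tensorflow as tf

# Placeholder: in practice, load the 30 normalized SRM filters here (assumed 5x5).
srm_filters = np.random.randn(5, 5, 1, 30).astype("float32")
# All three channels of each kernel are assigned from the same model, per the patent.
srm_kernel = tf.constant(np.repeat(srm_filters, 3, axis=2))  # shape (5, 5, 3, 30)

def srm_preprocess(rgb_batch):
    """Convolve the RGB input with the SRM bank and concatenate with the RGB channels."""
    noise = tf.nn.conv2d(rgb_batch, srm_kernel, strides=1, padding="SAME")
    return tf.concat([rgb_batch, noise], axis=-1)  # 3 + 30 = 33 channels
```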

In the above scheme, after the pooling layer performs its pooling operation, the dense layers and transition layers of the densely connected convolutional network are used to build a deep network that facilitates extracting the features of tampered images. Two consecutive convolutional layers form one basic structure layer; a dense layer may contain multiple basic structure layers, and within a dense layer the input of each basic structure layer is the concatenation of the outputs of all preceding layers. Such a structure is a locally dense variant of the residual structure and helps train deeper networks without overfitting. One configuration of the convolutional network uses four dense layers containing 5, 10, 20, and 12 basic structure layers respectively. The transition layer is a convolutional layer that first convolves the input feature map once to reduce its depth and then applies average pooling to reduce its size. All pooling in the network is 2x2 pooling, and the feature map after the last dense layer should be one thirty-second of the original size. The global average pooling layer averages the feature maps, retaining only the depth; after the fully connected layer, two values are output and converted by the softmax function into probability values representing the probabilities of tampered and non-tampered respectively, and the larger one is taken as the final decision.

In the above scheme, each convolutional layer is followed by batch normalization and a ReLU activation layer, which prevents gradient explosion or vanishing and introduces nonlinearity into the model.

In step S3, the deconvolutional network comprises a fully connected layer, dense layers, and corresponding deconvolution transition layers. First, the fully connected layer computes point by point over the tampering feature map; then the dense layers and corresponding deconvolution transition layers restore the image layer by layer, localizing the tampered region.

In step S4, according to the tampered region, the deconvolutional network outputs a binary image of the localized tampering for the image to be tested, completing the localization of the image tampering.

In the above scheme, the present invention builds the deconvolutional network with a structure as symmetric as possible to the convolutional network. First, the global pooling layer of the convolutional network is removed so that the complete feature map is retained; the corresponding fully connected layer can then operate point by point over the feature map, which is equivalent to a 1x1 convolution. This is followed by three dense layers containing 12, 6, and 3 basic structure layers and by deconvolution transition layers. A deconvolution transition layer is a modified transition layer in which the average pooling is replaced by a 2x deconvolution layer, doubling the size of the feature map.
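
A sketch of the deconvolution transition layer, assuming the same 1x1 convolution as the forward transition layer with the average pooling swapped for a stride-2 transposed convolution:

```python
from tensorflow.keras import layers

def deconv_transition_layer(x, out_channels):
    """A transition layer whose 2x2 average pooling is replaced by a 2x
    transposed convolution, doubling the feature-map size."""
    y = layers.Conv2D(out_channels, 1, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    return layers.Conv2DTranspose(out_channels, 2, strides=2, padding="same")(y)
```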

In the above scheme, to better restore the details of the output image, the present invention feeds the outputs of the earlier convolutional network into the layers of the later deconvolutional network through direct connection, 2x deconvolution, and 4x deconvolution operations, forming a multi-scale feature concatenation. By merging the resulting feature maps in series, multi-scale context information can be studied, which helps the network learn to predict the boundary, contour, and size of the tampered region precisely. In addition, the present invention applies stage-by-stage 2x deconvolution to the output of the fully connected layer and connects it to later layers, raising its importance in the network's decision. The present invention regards the output of the fully connected layer as effective spatial decision information, because in the detection task the fully connected layer is trained for the tampered/non-tampered binary classification, so this information is exploited further.
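
The multi-scale skip connection might be sketched as follows, where three encoder feature maps are brought to a common resolution by a direct connection, a 2x transposed convolution, and a 4x transposed convolution before being concatenated; the channel count is an illustrative assumption.

```python
from tensorflow.keras import layers

def multi_scale_fusion(direct, half_res, quarter_res, channels=64):
    """Concatenate encoder outputs after upsampling them to a common resolution."""
    up2 = layers.Conv2DTranspose(channels, 2, strides=2, padding="same")(half_res)
    up4 = layers.Conv2DTranspose(channels, 4, strides=4, padding="same")(quarter_res)
    return layers.Concatenate()([direct, up2, up4])
```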

The training process of the densely connected convolutional network and the deconvolutional network is as follows:

collect training image data and preprocess it;

divide the preprocessed image data into a training set and a test set;

pre-train on 128x128 images with the training set, computing gradients to update parameters;

train on full-size images according to the gradient updates to obtain the weights of the densely connected convolutional network;

pre-train the deconvolutional network on 128x128 images from the weights of the densely connected convolutional network, computing gradients to update parameters;

train on full-size images according to the computed gradient updates, completing the training of the deconvolutional network;

evaluate and tune the deconvolutional network with the test set, finally outputting the densely connected convolutional network and the deconvolutional network with the corresponding weights.

During training and tuning of the densely connected convolutional network and the deconvolutional network, five-fold cross-validation is used: each time, one fifth of the preprocessed image data is taken as the test set and four fifths as the training set, and after five rounds of training and evaluation the averaged result is taken as the final result.
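
A sketch of this five-fold protocol; build_and_train and evaluate are hypothetical helpers standing in for the training and evaluation procedures above.

```python
import numpy as np
from sklearn.model_selection import KFold

def five_fold_result(images, labels, build_and_train, evaluate):
    """One fifth as test set, four fifths as training set, averaged over five folds."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True).split(images):
        model = build_and_train(images[train_idx], labels[train_idx])
        scores.append(evaluate(model, images[test_idx], labels[test_idx]))
    return float(np.mean(scores))
```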

In the above scheme, the present invention slides a small 128x128 window over each image, saves windows containing tampered regions as new images, and filters scientifically reasonable samples from the windows with a strategy based on the size of the tampered region. First, so that the tampered region is not too large, only windows in which the tampered region occupies no more than 40% are kept; second, to avoid tampered regions that are too small, windows whose tampered area is below 150 pixels are discarded. This prevents samples with unreasonable tampered-region areas and helps the network learn image tampering detection. Meanwhile, data augmentation rotates the images at multiple angles to strengthen the model's rotation invariance.
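
The window-selection rule can be sketched as follows, assuming a binary tamper mask aligned with the image; the stride is an illustrative assumption.

```python
import numpy as np

def extract_windows(image, mask, win=128, stride=64):
    """Keep 128x128 windows whose tampered area is at least 150 pixels
    and at most 40% of the window."""
    samples = []
    h, w = mask.shape[:2]
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            m = mask[top:top + win, left:left + win]
            area = int(m.sum())
            if 150 <= area <= 0.40 * win * win:
                samples.append((image[top:top + win, left:left + win], m))
    return samples
```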

In the above scheme, training proceeds detection-first, localization-second: the binary classification convolutional network for detecting whether an image has been tampered with is trained first; its weights are then retained, and training continues with localizing the tampered region as the objective, updating the convolutional and deconvolutional networks using only the tampered training samples.

In the above scheme, training first uses small 128x128 images to compute gradient updates, making full use of GPU memory so that one forward pass can compute the gradients of multiple samples. When training on the full-size dataset, the images vary in size, so one forward pass can compute the gradient of only one sample. To make the loss decrease smoothly, the present invention programs a gradient accumulator that averages gradients over several passes before each parameter update.
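
A sketch of such a gradient accumulator in TensorFlow, where each forward pass handles one variably sized sample and the parameters are updated once per accum_steps averaged gradients; accum_steps is an assumption, not a value from the patent.

```python
import tensorflow as tf

def train_full_size(model, optimizer, loss_fn, dataset, accum_steps=8):
    """Average gradients over several single-sample passes, then update once."""
    accum = [tf.zeros_like(v) for v in model.trainable_variables]
    step = 0
    for x, y in dataset:  # one variably sized sample per element
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + (g if g is not None else tf.zeros_like(a))
                 for a, g in zip(accum, grads)]
        step += 1
        if step % accum_steps == 0:
            optimizer.apply_gradients(
                [(a / accum_steps, v)
                 for a, v in zip(accum, model.trainable_variables)])
            accum = [tf.zeros_like(v) for v in model.trainable_variables]
```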

Compared with the prior art, the beneficial effects of the technical solution of the present invention are:

The image tampering detection and localization method using a densely structured convolutional network provided by the present invention applies deep learning to image tampering detection and localization, training the network to learn the features of tampered images. It copes with a variety of tampering techniques and can continue updating its parameters on new datasets to improve performance, giving good robustness and practicality. Compared with other deep learning methods, the present invention realizes a unified framework for detection and localization: it not only predicts whether an image has been tampered with, but also predicts the tampered region, giving accurate pixel-by-pixel annotation and detailed object contour boundaries.

Description of Drawings

Fig. 1 is a flow chart of the steps of the method of the present invention;

Fig. 2 is a schematic diagram of the structures of the convolutional network and the deconvolutional network;

Fig. 3 is a flow chart of the training of the convolutional network and the deconvolutional network;

Fig. 4 is a schematic diagram of the results on localization test samples.

Detailed Description

The accompanying drawings are for illustration only and should not be construed as limiting this patent;

to better illustrate the embodiments, some parts of the drawings may be omitted, enlarged, or reduced and do not represent the dimensions of the actual product;

those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.

The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.

Embodiment 1

As shown in Fig. 1, an image tampering detection and localization method using a densely structured convolutional network comprises the following steps:

S1: Input the image to be tested and preprocess it with spatial rich model (SRM) convolution to obtain the preprocessed image;

S2: Construct a densely connected convolutional network to extract tampering features from the preprocessed image, obtain the binary classification information of the image to be tested, and complete the detection of image tampering;

S3: Construct a deconvolutional network whose structure is symmetric to the convolutional network, take the binary classification information of the image to be tested as input, and localize the tampered region;

S4: From the obtained tampered region, produce the localized image with the deconvolutional network, completing the localization of the image tampering.

More specifically, as shown in Fig. 2, in step S2 the densely connected convolutional network comprises a pooling layer, dense layers, transition layers, a global average pooling layer, and a fully connected layer, wherein:

the pooling layer applies one convolution and one max-pooling operation to the preprocessed image and feeds the result into the first dense layer;

multiple dense layers and transition layers are provided; the output of each dense layer is fed into its corresponding transition layer, and the tampering feature map obtained from the last transition layer is fed into the global average pooling layer;

the global average pooling layer average-pools the tampering feature map, and the fully connected layer computes and outputs two probability values, representing the probabilities of tampered and non-tampered respectively, yielding the binary classification information of the image to be tested.

More specifically, the dense layer comprises multiple basic structure layers, each composed of two consecutive convolutional layers; the input of each basic structure layer is formed by concatenating the outputs of the preceding layers, making it a locally dense variant of the residual structure.

More specifically, the densely connected convolutional network is provided with four dense layers, containing 5, 10, 20, and 12 basic structure layers respectively.

More specifically, the transition layer comprises one convolutional layer: it first convolves the feature map received from the dense layer once, then applies average pooling to reduce the image size.

More specifically, the fully connected layer outputs the two probability values through the softmax function and is trained with the class-weighted cross-entropy given above:

$L = -\sum_{i} a_i \, y_i \log \frac{e^{\hat{y}_i}}{\sum_{j} e^{\hat{y}_j}}$

where $i$ indexes the two classes (tampered/non-tampered), $\hat{y}_i$ is the network's output value for class $i$, $y_i$ is the sample's true value for class $i$, and $a_i$ is the weight of class $i$.

In the specific implementation process, to better capture the tampering-noise characteristics of the image, one SRM convolution is applied to the three RGB channels of the input image. The convolution kernels are initialized with the normalized SRM filters, with all three channels of each kernel assigned from the same model, yielding 30 filters; the convolved output is then concatenated with the three RGB channels.

In the specific implementation process, after the pooling layer performs its pooling operation, the dense layers and transition layers of the densely connected convolutional network are used to build a deep network that facilitates extracting the features of tampered images. Two consecutive convolutional layers form one basic structure layer; a dense layer may contain multiple basic structure layers, and within a dense layer the input of each basic structure layer is the concatenation of the outputs of all preceding layers. Such a structure is a locally dense variant of the residual structure and helps train deeper networks without overfitting. One configuration of the convolutional network uses four dense layers containing 5, 10, 20, and 12 basic structure layers respectively. The transition layer is a convolutional layer that first convolves the input feature map once to reduce its depth and then applies average pooling to reduce its size. All pooling in the network is 2x2 pooling, and the feature map after the last dense layer should be one thirty-second of the original size. The global average pooling layer averages the feature maps, retaining only the depth; after the fully connected layer, two values are output and converted by the softmax function into probability values representing the probabilities of tampered and non-tampered respectively, and the larger one is taken as the final decision.

In the specific implementation process, each convolutional layer is followed by batch normalization and a ReLU activation layer, which prevents gradient explosion or vanishing and introduces nonlinearity into the model.

More specifically, as shown in Fig. 2, in step S3 the deconvolutional network comprises a fully connected layer, dense layers, and corresponding deconvolution transition layers. First, the fully connected layer computes point by point over the tampering feature map; then the dense layers and corresponding deconvolution transition layers restore the image layer by layer, localizing the tampered region.

More specifically, in step S4, according to the tampered region, the deconvolutional network outputs a binary image of the localized tampering for the image to be tested, completing the localization of the image tampering.

In the specific implementation process, the present invention builds the deconvolutional network with a structure as symmetric as possible to the convolutional network. First, the global pooling layer of the convolutional network is removed so that the complete feature map is retained; the corresponding fully connected layer can then operate point by point over the feature map, which is equivalent to a 1x1 convolution. This is followed by three dense layers containing 12, 6, and 3 basic structure layers and by deconvolution transition layers. A deconvolution transition layer is a modified transition layer in which the average pooling is replaced by a 2x deconvolution layer, doubling the size of the feature map.

In the specific implementation process, to better restore the details of the output image, the present invention feeds the outputs of the earlier convolutional network into the layers of the later deconvolutional network through direct connection, 2x deconvolution, and 4x deconvolution operations, forming a multi-scale feature concatenation. By merging the resulting feature maps in series, multi-scale context information can be studied, which helps the network learn to predict the boundary, contour, and size of the tampered region precisely. In addition, the present invention applies stage-by-stage 2x deconvolution to the output of the fully connected layer and connects it to later layers, raising its importance in the network's decision. The present invention regards the output of the fully connected layer as effective spatial decision information, because in the detection task the fully connected layer is trained for the tampered/non-tampered binary classification, so this information is exploited further.

Embodiment 2

More specifically, building on Embodiment 1 and as shown in Fig. 3, the training process of the densely connected convolutional network and the deconvolutional network is as follows:

collect training image data and preprocess it;

divide the preprocessed image data into a training set and a test set;

pre-train on 128x128 images with the training set, computing gradients to update parameters;

train on full-size images according to the gradient updates to obtain the weights of the densely connected convolutional network;

pre-train the deconvolutional network on 128x128 images from the weights of the densely connected convolutional network, computing gradients to update parameters;

train on full-size images according to the computed gradient updates, completing the training of the deconvolutional network;

evaluate and tune the deconvolutional network with the test set, finally outputting the densely connected convolutional network and the deconvolutional network with the corresponding weights.

More specifically, during training and tuning of the densely connected convolutional network and the deconvolutional network, five-fold cross-validation is used: each time, one fifth of the preprocessed image data is taken as the test set and four fifths as the training set, and after five rounds of training and evaluation the averaged result is taken as the final result.

In the specific implementation process, the present invention slides a small 128x128 window over each image, saves windows containing tampered regions as new images, and filters scientifically reasonable samples from the windows with a strategy based on the size of the tampered region. First, so that the tampered region is not too large, only windows in which the tampered region occupies no more than 40% are kept; second, to avoid tampered regions that are too small, windows whose tampered area is below 150 pixels are discarded. This prevents samples with unreasonable tampered-region areas and helps the network learn image tampering detection. Meanwhile, data augmentation rotates the images at multiple angles to strengthen the model's rotation invariance.

In the specific implementation process, training proceeds detection-first, localization-second: the binary classification convolutional network for detecting whether an image has been tampered with is trained first; its weights are then retained, and training continues with localizing the tampered region as the objective, updating the convolutional and deconvolutional networks using only the tampered training samples.

In the specific implementation process, training first uses small 128x128 images to compute gradient updates, making full use of GPU memory so that one forward pass can compute the gradients of multiple samples. When training on the full-size dataset, the images vary in size, so one forward pass can compute the gradient of only one sample. To make the loss decrease smoothly, the present invention programs a gradient accumulator that averages gradients over several passes before each parameter update.

Embodiment 3

In the specific implementation process, the network proposed by the present invention is built with the TensorFlow deep learning framework and can be trained on one GeForce GTX 1080Ti GPU. With 128x128 samples, one parameter update can use a batch of 128 images per iteration. On test sets whose image sizes range from 240x160 to 1000x1000 pixels, detecting whether an image has been tampered with takes 17.75 ms on average, and localizing a tampered image takes 99.84 ms on average.

In the specific implementation process, the present invention uses several public datasets for training and testing, including the four commonly used datasets CASIA v1.0, CASIA v2.0, NC 2016, and Columbia Uncompressed. The model is trained and tested five times and the averaged results are reported. Table 1 gives the average classification accuracy, average pixel classification accuracy, and average intersection-over-union (IoU) on the test sets. Accuracy is the ratio of correctly classified samples to the total number of samples. IoU is the ratio of the intersection to the union of the true tampered region and the predicted tampered region; it lies between 0 and 1, and a larger value indicates greater overlap, i.e., better model performance.
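
For binary tamper masks, the pixel-level IoU reported in Table 1 can be computed as in this sketch:

```python
import numpy as np

def pixel_iou(pred_mask, true_mask):
    """Intersection over union of the predicted and true tampered regions (0 to 1)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:  # neither prediction nor label marks any tampered pixel
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)
```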

Table 1

(The contents of Table 1 appear only as an image in the original publication.)

In the specific implementation process, the results on some localization test samples are shown in Fig. 4, where white pixels mark the tampered regions. Because the global pooling layer is removed from the convolutional network, the fully connected layer outputs a 2-channel feature map, visualized in the fourth column; this layer can be seen to output effective spatial decision information that roughly indicates the predicted location. The deconvolutional network uses this information, together with its dense connections to the shallow network, to refine the prediction further. The densely structured convolutional neural network proposed by the present invention effectively identifies splicing, copy-move, and removal tampering, outputs pixel-wise classification results, and accurately predicts the tampered object, its size, and its shape, closely matching the ground-truth annotation.

Obviously, the above embodiments of the present invention are merely examples given to illustrate the present invention clearly and do not limit its implementation. Those of ordinary skill in the art can make other changes or modifications in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. An image tampering detection and localization method using a densely structured convolutional network, characterized by comprising the following steps:

S1: inputting an image to be tested and preprocessing it with spatial rich model (SRM) convolution to obtain a preprocessed image;

S2: constructing a densely connected convolutional network to extract tampering features from the preprocessed image, obtaining the binary classification information of the image to be tested, and completing the detection of image tampering;

S3: constructing a deconvolutional network whose structure is symmetric to the convolutional network, taking the binary classification information of the image to be tested as input, and localizing the tampered region;

S4: from the obtained tampered region, producing the localized image with the deconvolutional network, completing the localization of the image tampering.

2. The image tampering detection and localization method using a densely structured convolutional network according to claim 1, characterized in that in step S2 the densely connected convolutional network comprises a pooling layer, dense layers, transition layers, a global average pooling layer, and a fully connected layer, wherein:

the pooling layer applies one convolution and one max-pooling operation to the preprocessed image and feeds the result into the dense layer;

multiple dense layers and transition layers are provided; the output of each dense layer is fed into its corresponding transition layer, and the tampering feature map obtained from the last transition layer is fed into the global average pooling layer;

the global average pooling layer average-pools the tampering feature map, and the fully connected layer computes and outputs two probability values, representing the probabilities of tampered and non-tampered respectively, yielding the binary classification information of the image to be tested.

3. The image tampering detection and localization method using a densely structured convolutional network according to claim 2, characterized in that the dense layer comprises multiple basic structure layers, each composed of two consecutive convolutional layers, wherein the input of each basic structure layer is formed by concatenating the outputs of the preceding layers, a locally dense variant of the residual structure.

4. The image tampering detection and localization method using a densely structured convolutional network according to claim 3, characterized in that the densely connected convolutional network is provided with four dense layers, containing 5, 10, 20, and 12 basic structure layers respectively.
5. The image tampering detection and localization method using a densely structured convolutional network according to claim 4, characterized in that the transition layer comprises one convolutional layer, which first convolves the feature map received from the dense layer once and then applies average pooling to reduce the image size.

6. The image tampering detection and localization method using a densely structured convolutional network according to claim 5, characterized in that the fully connected layer computes and outputs two probability values through the softmax function, with the formula:
$L = -\sum_{i} a_i \, y_i \log \frac{e^{\hat{y}_i}}{\sum_{j} e^{\hat{y}_j}}$

where $i$ indexes the two classes (tampered/non-tampered), $\hat{y}_i$ is the network's output value for class $i$, $y_i$ is the sample's true value for class $i$, and $a_i$ is the weight of class $i$.
7. The image tampering detection and localization method using a densely structured convolutional network according to claim 2, characterized in that in step S3 the deconvolutional network comprises a fully connected layer, dense layers, and corresponding deconvolution transition layers; the fully connected layer first computes point by point over the tampering feature map, and then the dense layers and corresponding deconvolution transition layers restore the image layer by layer, localizing the tampered region.

8. The image tampering detection and localization method using a densely structured convolutional network according to claim 7, characterized in that in step S4, according to the tampered region, the deconvolutional network outputs a binary image of the localized tampering for the image to be tested, completing the localization of the image tampering.

9. The image tampering detection and localization method using a densely structured convolutional network according to claim 8, characterized in that the training process of the densely connected convolutional network and the deconvolutional network is specifically:

collecting training image data and preprocessing it;

dividing the preprocessed image data into a training set and a test set;

pre-training on 128x128 images with the training set, computing gradients to update parameters;

training on full-size images according to the gradient updates to obtain the weights of the densely connected convolutional network;

pre-training the deconvolutional network on 128x128 images from the weights of the densely connected convolutional network, computing gradients to update parameters;

training on full-size images according to the computed gradient updates, completing the training of the deconvolutional network;

evaluating and tuning the deconvolutional network with the test set, finally outputting the densely connected convolutional network and the deconvolutional network with the corresponding weights.

10. The image tampering detection and localization method using a densely structured convolutional network according to claim 9, characterized in that during training and tuning of the densely connected convolutional network and the deconvolutional network, five-fold cross-validation is used: each time, one fifth of the preprocessed image data is taken as the test set and four fifths as the training set, and after five rounds of training and evaluation the averaged result is taken as the final result.
CN201911081464.9A 2019-11-07 2019-11-07 Image tampering detection and positioning method adopting convolution network with dense structure Expired - Fee Related CN110852316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911081464.9A CN110852316B (en) 2019-11-07 2019-11-07 Image tampering detection and positioning method adopting convolution network with dense structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911081464.9A CN110852316B (en) 2019-11-07 2019-11-07 Image tampering detection and positioning method adopting convolution network with dense structure

Publications (2)

Publication Number Publication Date
CN110852316A (en) 2020-02-28
CN110852316B CN110852316B (en) 2023-04-18

Family

ID=69598598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911081464.9A Expired - Fee Related CN110852316B (en) 2019-11-07 2019-11-07 Image tampering detection and positioning method adopting convolution network with dense structure

Country Status (1)

Country Link
CN (1) CN110852316B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111445454A (en) * 2020-03-26 2020-07-24 江南大学 Image authenticity identification method and application thereof in license identification
CN111814543A (en) * 2020-06-01 2020-10-23 湖南科技大学 Deep video object inpainting and tampering detection method
CN111915568A (en) * 2020-07-08 2020-11-10 深圳大学 Image tampering localization model generation method, image tampering localization method and device
CN112115912A (en) * 2020-09-28 2020-12-22 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN112233077A (en) * 2020-10-10 2021-01-15 北京三快在线科技有限公司 Image analysis method, device, equipment and storage medium
CN112365515A (en) * 2020-10-30 2021-02-12 深圳点猫科技有限公司 Edge detection method, device and equipment based on dense sensing network
CN112529835A (en) * 2020-10-22 2021-03-19 浙江大学 Image splicing tampering detection and positioning method based on source camera identification
CN112991239A (en) * 2021-03-17 2021-06-18 广东工业大学 Image reverse recovery method based on deep learning
CN113807392A (en) * 2021-08-05 2021-12-17 厦门市美亚柏科信息股份有限公司 Tampered image identification method based on multi-preprocessing-feature fusion
CN113920094A (en) * 2021-10-14 2022-01-11 厦门大学 Image tampering detection technology based on gradient residual U-shaped convolution neural network
CN114612476A (en) * 2022-05-13 2022-06-10 南京信息工程大学 Image tampering detection method based on full-resolution hybrid attention mechanism
CN114677670A (en) * 2022-03-30 2022-06-28 浙江康旭科技有限公司 Automatic identification and positioning method for identity card tampering
CN110852316B (en) * 2019-11-07 2023-04-18 中山大学 Image tampering detection and positioning method adopting convolution network with dense structure
CN117496225A (en) * 2023-10-17 2024-02-02 南昌大学 An image data forensics method and its system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564025A (en) * 2018-04-10 2018-09-21 广东电网有限责任公司 A kind of infrared image object identification method based on deformable convolutional neural networks
CN109191476A (en) * 2018-09-10 2019-01-11 重庆邮电大学 The automatic segmentation of Biomedical Image based on U-net network structure
CN110334805A (en) * 2019-05-05 2019-10-15 中山大学 A Generative Adversarial Network-Based Method and System for Image Steganography in JPEG Domain
CN110414670A (en) * 2019-07-03 2019-11-05 南京信息工程大学 A method of image mosaic tampering localization based on fully convolutional neural network
CN113920094A (en) * 2021-10-14 2022-01-11 厦门大学 Image tampering detection technology based on gradient residual U-shaped convolution neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852316B (en) * 2019-11-07 2023-04-18 中山大学 Image tampering detection and positioning method adopting convolution network with dense structure

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564025A (en) * 2018-04-10 2018-09-21 广东电网有限责任公司 A kind of infrared image object identification method based on deformable convolutional neural networks
CN109191476A (en) * 2018-09-10 2019-01-11 重庆邮电大学 The automatic segmentation of Biomedical Image based on U-net network structure
CN110334805A (en) * 2019-05-05 2019-10-15 中山大学 A Generative Adversarial Network-Based Method and System for Image Steganography in JPEG Domain
CN110414670A (en) * 2019-07-03 2019-11-05 南京信息工程大学 A method of image mosaic tampering localization based on fully convolutional neural network
CN113920094A (en) * 2021-10-14 2022-01-11 厦门大学 Image tampering detection technology based on gradient residual U-shaped convolution neural network

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852316B (en) * 2019-11-07 2023-04-18 中山大学 Image tampering detection and positioning method adopting convolution network with dense structure
CN111445454A (en) * 2020-03-26 2020-07-24 江南大学 Image authenticity identification method and application thereof in license identification
CN111814543A (en) * 2020-06-01 2020-10-23 湖南科技大学 Deep video object inpainting and tampering detection method
CN111814543B (en) * 2020-06-01 2023-07-21 湖南科技大学 Tamper detection method for deep video object inpainting
CN111915568A (en) * 2020-07-08 2020-11-10 深圳大学 Image tampering localization model generation method, image tampering localization method and device
CN111915568B (en) * 2020-07-08 2023-07-25 深圳大学 Image tampering positioning model generation method, image tampering positioning method and device
CN112115912A (en) * 2020-09-28 2020-12-22 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN112115912B (en) * 2020-09-28 2023-11-28 腾讯科技(深圳)有限公司 Image recognition method, device, computer equipment and storage medium
CN112233077A (en) * 2020-10-10 2021-01-15 北京三快在线科技有限公司 Image analysis method, device, equipment and storage medium
CN112529835A (en) * 2020-10-22 2021-03-19 浙江大学 Image splicing tampering detection and positioning method based on source camera identification
CN112365515A (en) * 2020-10-30 2021-02-12 深圳点猫科技有限公司 Edge detection method, device and equipment based on dense sensing network
CN112365515B (en) * 2020-10-30 2024-10-08 深圳点猫科技有限公司 Edge detection method, device and equipment based on dense sensing network
CN112991239A (en) * 2021-03-17 2021-06-18 广东工业大学 Image reverse recovery method based on deep learning
CN113807392B (en) * 2021-08-05 2022-09-16 厦门市美亚柏科信息股份有限公司 Tampered image identification method based on multi-preprocessing-feature fusion
CN113807392A (en) * 2021-08-05 2021-12-17 厦门市美亚柏科信息股份有限公司 Tampered image identification method based on multi-preprocessing-feature fusion
CN113920094A (en) * 2021-10-14 2022-01-11 厦门大学 Image tampering detection technology based on gradient residual U-shaped convolution neural network
CN114677670A (en) * 2022-03-30 2022-06-28 浙江康旭科技有限公司 Automatic identification and positioning method for identity card tampering
CN114677670B (en) * 2022-03-30 2024-04-26 康旭科技有限公司 Method for automatically identifying and positioning identity card tampering
CN114612476A (en) * 2022-05-13 2022-06-10 南京信息工程大学 Image tampering detection method based on full-resolution hybrid attention mechanism
CN117496225A (en) * 2023-10-17 2024-02-02 南昌大学 An image data forensics method and its system

Also Published As

Publication number Publication date
CN110852316B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110852316B (en) Image tampering detection and positioning method adopting convolution network with dense structure
CN110263705B (en) Two phases of high-resolution remote sensing image change detection system for the field of remote sensing technology
CN108830285B (en) Target detection method for reinforcement learning based on fast-RCNN
Kadam et al. Detection and localization of multiple image splicing using MobileNet V1
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN111178121B (en) Pest image positioning and identifying method based on spatial feature and depth feature enhancement technology
CN111046917B (en) Object-based enhanced target detection method based on deep neural network
CN116206185A (en) Lightweight small target detection method based on improved YOLOv7
CN115063373A (en) A social network image tampering localization method based on multi-scale feature intelligent perception
CN111209858B (en) Real-time license plate detection method based on deep convolutional neural network
CN111242026B (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN112150450A (en) A method and device for image tampering detection based on dual-channel U-Net model
CN110532914A (en) Building analyte detection method based on fine-feature study
TWI803243B (en) Method for expanding images, computer device and storage medium
CN114419014A (en) Surface defect detection method based on feature reconstruction
Huang et al. Change detection with various combinations of fluid pyramid integration networks
CN113610024B (en) A multi-strategy deep learning remote sensing image small target detection method
CN112580661A (en) Multi-scale edge detection method under deep supervision
CN115565150A (en) A pedestrian and vehicle target detection method and system based on improved YOLOv3
CN116664609A (en) An easy-to-deploy image edge detection method, device and electronic equipment thereof
CN110866922A (en) Image semantic segmentation model and modeling method based on reinforcement learning and transfer learning
CN114820541A (en) Defect detection method based on reconstructed network
CN117934510B (en) Colon polyp image segmentation method based on shape perception and feature enhancement
CN111582057B (en) Face verification method based on local receptive field
CN113469199A (en) Rapid and efficient image edge detection method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20230418