
CN108090485A - Image foreground extraction method based on multi-view fusion - Google Patents

Image foreground extraction method based on multi-view fusion Download PDF

Info

Publication number
CN108090485A
CN108090485A (application CN201711216652.9A)
Authority
CN
China
Prior art keywords
image
pixel
extracted
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711216652.9A
Other languages
Chinese (zh)
Inventor
王敏
马宏斌
侯本栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201711216652.9A priority Critical patent/CN108090485A/en
Publication of CN108090485A publication Critical patent/CN108090485A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an automatic image foreground extraction method based on multi-view fusion, which mainly solves the problems of a cumbersome extraction process and inaccurate foreground edges in the prior art. The implementation scheme is as follows: first train an SVM classifier and obtain the grayscale image of the image to be extracted; detect the sub-image containing the foreground in the grayscale image with the trained SVM classifier; use the position coordinates of this sub-image within the image to be extracted as the input of the GrabCut algorithm and perform foreground extraction, obtaining the extraction result of the image under the pixel view; generate the image under the superpixel view from the image to be extracted with the SLIC algorithm; fuse the image under the superpixel view with the extraction result under the pixel view to obtain the foreground extraction result of the image to be extracted. The invention simplifies the foreground extraction process, improves extraction efficiency and precision, and can be used for stereo vision, image semantic recognition, three-dimensional reconstruction and image search.

Description

Automatic image foreground extraction method based on multi-view fusion

Technical Field

The invention belongs to the technical field of image processing, and more particularly relates to an automatic image foreground extraction method based on multi-view fusion. The invention can be used in applications and research such as stereo vision, image semantic recognition and image search.

Background Art

Foreground extraction is a means of extracting objects of interest from an image. It is the technique and process of dividing an image into a number of specific regions with distinct properties and picking out the object of interest, and it has become a key step on the way from image processing to image analysis. More concretely, the image is divided into non-overlapping regions according to features such as gray level, color, texture and shape, so that these features are similar within a region but differ markedly between regions. After decades of development, foreground extraction has gradually formed its own scientific system; new extraction methods emerge continuously, and it has become an interdisciplinary field that attracts wide attention from researchers and practitioners in many areas, such as medicine, aerospace remote sensing, industrial inspection, security and the military.

Current foreground extraction methods mainly include threshold-based, edge-based, region-based, graph-cut-based, energy-functional-based and deep-learning-based methods. Among these, the graph-cut-based foreground extraction method is favored for its high extraction accuracy and simple operation. It is a combinatorial optimization method based on graph theory: according to the user's interaction information it maps an image onto a network graph, builds an energy function over the labels, and applies the max-flow/min-cut algorithm to cut the graph within a limited number of iterations; the resulting minimum cut is taken as the foreground extraction result of the image. However, because of the human-computer interaction involved, extracting the foreground from many images requires too much manual work, which limits its use in engineering. For example, in "GrabCut in One Cut" (Meng Tang et al., 2013 IEEE International Conference on Computer Vision), the user selects the foreground region, the region containing the foreground is mapped onto a graph, and One Cut performs a limited number of iterative cuts on the graph to obtain the foreground extraction result. The method still needs human-computer interaction to mark the region where the foreground lies, which makes the extraction process cumbersome, and the limited number of energy-minimization iterations can only reach a near-optimal minimum cut, so accurate foreground edges are hard to obtain.

Summary of the Invention

The purpose of the present invention is to address the above deficiencies of the prior art by proposing an automatic image foreground extraction method based on multi-view fusion, so as to solve two problems of existing graph-cut-based foreground extraction methods: the cumbersome extraction process caused by human-computer interaction, and the inaccurate foreground edges caused by the limited number of energy-optimization iterations.

To achieve the above purpose, the technical scheme adopted by the present invention includes the following steps:

(1) Train an SVM classifier to obtain a trained SVM classifier;

(2) Convert the image to be extracted to grayscale to obtain a grayscale image;

(3) Detect the sub-image p_k containing the foreground object in the grayscale image with the trained SVM classifier:

(3a) Slide a multi-scale window line by line over the grayscale image at a set interval, obtaining an image set P = {p_1, p_2, ..., p_k, ..., p_q} composed of multiple sub-images, where k ∈ [1, q], p_k is the k-th sub-image and q is the number of sub-images;

(3b) Extract the histogram-of-oriented-gradients (HOG) feature of each sub-image p_k in the image set P, input it into the trained SVM classifier for classification, and compute the label l_pk of the sub-image p_k;

(3c) Judge whether the label l_pk of the sub-image p_k is positive. If so, the sub-image p_k contains the foreground object; record the position of p_k in the image to be extracted, i.e. the position (x_min, y_min) of its top-left pixel and the position (x_max, y_max) of its bottom-right pixel in the image to be extracted, and go to step (4); otherwise, discard the sub-image p_k;

(4) Perform foreground extraction on the image to be extracted:

Use the position (x_min, y_min) of the top-left pixel and the position (x_max, y_max) of the bottom-right pixel of the sub-image p_k in the image to be extracted to replace the human-computer interaction of the GrabCut algorithm, and extract the foreground of the image to be extracted with this replacement, obtaining the extraction result S_1(x, y) of the image under the pixel view;

(5) Compute the superpixels of the image to be extracted with the simple linear iterative clustering algorithm SLIC, obtaining the image under the superpixel view: B = {b_1, b_2, ..., b_i, ..., b_m}, i ∈ [1, m], where b_i is the i-th superpixel and m is the number of superpixels;

(6) Perform multi-view fusion of the image B under the superpixel view and the extraction result S_1(x, y) of the image to be extracted under the pixel view, obtaining the foreground S_2(x_i, y_i) of the image to be extracted.

Compared with the prior art, the present invention has the following advantages:

1) The present invention uses a trained SVM classifier to obtain the sub-image in which the foreground lies, and uses the position coordinates of this sub-image within the image to be extracted, instead of the rectangular region obtained through human-computer interaction, as the input of the GrabCut algorithm to extract the foreground. By fully combining the SVM classifier with the GrabCut algorithm, the foreground extraction process is completed automatically, which removes the cumbersome interaction required by existing graph-cut-based methods and effectively improves the efficiency of image foreground extraction.

2) The present invention uses the SLIC algorithm to extract superpixels from the image to be extracted, making full use of the good consistency within a superpixel block; by fusing the image under the superpixel view with the extraction result under the pixel view, an accurate extraction of the foreground of the image to be extracted is obtained.

3) By introducing superpixels, the present invention makes the foreground extraction result more accurate and smooth, solves the problem of inaccurate foreground edges caused by the limited number of energy-optimization iterations in existing graph-cut-based methods, and improves the precision of image foreground extraction.

Brief Description of the Drawings

Fig. 1 is the implementation flowchart of the present invention;

Fig. 2 is the structure diagram of the sample image set used in the present invention;

Fig. 3 is a schematic diagram of HOG feature extraction in the present invention;

Fig. 4 is a visualization of HOG features in the present invention;

Fig. 5 shows experimental results of extracting pedestrians and leaves as the foreground with the present invention.

Detailed Description of the Embodiments

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Referring to Fig. 1, the automatic image foreground extraction method based on multi-view fusion includes the following steps:

Step 1: train the SVM classifier.

(1a) Collect a sample image set containing the foreground category and convert all sample images in it to grayscale, obtaining a sample grayscale image set;

The structure of the sample image set containing the foreground category is shown in Fig. 2. It consists of positive samples, negative samples and a sample label file, where the positive samples are images containing the foreground, the negative samples are images not containing the foreground, and the sample label file describes the category and storage location of the positive and negative samples;

Converting all sample images in the sample image set to grayscale means taking the weighted average of the red component R, the green component G and the blue component B of the three channels of a sample image to obtain the gray value Gray of the sample grayscale image:

Gray = R × 0.299 + G × 0.587 + B × 0.114;
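As an illustration only (not part of the patent text), this weighted graying can be sketched in Python for an RGB uint8 array as follows:

```python
import numpy as np

def to_gray(rgb):
    # Gray = 0.299*R + 0.587*G + 0.114*B, cast back to uint8
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb.astype(np.float64) @ weights).astype(np.uint8)
```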

(1b) Extract the histogram-of-oriented-gradients (HOG) feature of each image in the sample grayscale image set:

Referring to Fig. 3, this step is implemented as follows:

(1b1) Divide the input image into several connected, adjacent and non-overlapping cells, and compute the gradient magnitude G(x, y) and gradient direction α(x, y) of each pixel in every cell:

G(x, y) = √(G_x(x, y)² + G_y(x, y)²)

α(x, y) = tan⁻¹(G_x(x, y) / G_y(x, y))

where G_x(x, y) = H(x+1, y) - H(x-1, y) is the horizontal gradient at pixel (x, y) of the input image, G_y(x, y) = H(x, y+1) - H(x, y-1) is the vertical gradient at pixel (x, y), and H(x, y) is the pixel value at (x, y) of the input image;

(1b2) Divide all gradient directions α(x, y) into 9 angle bins as the horizontal axis of a histogram, and accumulate the gradient magnitudes falling into each angle range as the vertical axis, obtaining the gradient histogram;

(1b3) Count the gradient histogram of each cell to obtain the feature descriptor of each cell;

(1b4) Group 8 × 8 cells into one block and concatenate the feature descriptors of all cells in the block to obtain the HOG feature descriptor of the block;

(1b5) Concatenate the HOG feature descriptors of all blocks in the input image to obtain the HOG feature of the input image. A visualization of the HOG feature is shown in Fig. 4, where Fig. 4(a) is an example image and Fig. 4(b) is its HOG feature map; as Fig. 4 shows, the HOG feature describes the appearance and shape of a local object well through the density of gradients or edge directions;

(1b6) Concatenate the HOG features of all input images in the sample grayscale image set to obtain the HOG feature set of the sample grayscale image set;

(1c) Train the SVM classifier with all HOG features in the sample HOG feature set, obtaining the trained SVM classifier.
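A rough Python sketch of step 1 is given below, again only for illustration. It uses scikit-image's `hog` and scikit-learn's `LinearSVC` as stand-ins for the HOG extraction and SVM training described above; the window size and cell/block layout are assumptions (the patent itself groups 8 × 8 cells per block), and `to_gray` is the helper from the sketch under step (1a).

```python
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_of(gray, size=(64, 128)):
    # 9 orientation bins as in (1b2); cell and block sizes here are illustrative
    gray = cv2.resize(gray, size)
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

def train_svm(pos_images, neg_images):
    # Positive samples contain the foreground, negative samples do not
    feats = [hog_of(to_gray(im)) for im in pos_images] + \
            [hog_of(to_gray(im)) for im in neg_images]
    labels = [1] * len(pos_images) + [0] * len(neg_images)
    clf = LinearSVC(C=1.0)
    clf.fit(np.array(feats), np.array(labels))
    return clf
```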

Step 2: convert the image to be extracted to grayscale, obtaining a grayscale image.

Take the weighted average of the red component R′, the green component G′ and the blue component B′ of the three channels of the image to be extracted to obtain the gray value Gray′ of each pixel:

Gray′ = R′ × 0.299 + G′ × 0.587 + B′ × 0.114

The grayscale image of the image to be extracted is then obtained from the gray value of each pixel.

Step 3: use the trained SVM classifier to detect the sub-image p_k containing the foreground object in the grayscale image of the image to be extracted, obtaining the position (x_min, y_min) of the top-left pixel and the position (x_max, y_max) of the bottom-right pixel of the sub-image p_k in the image to be extracted.

(3a) Slide a multi-scale window line by line over the grayscale image of the image to be extracted at a set interval, obtaining an image set composed of multiple sub-images: P = {p_1, p_2, ..., p_k, ..., p_q}, where p_k is the k-th sub-image and q is the number of sub-images;

(3b) Extract the HOG feature of each sub-image p_k in the image set P and input it into the trained SVM classifier for classification, obtaining the label l_pk of the sub-image p_k, computed as:

l_pk = sgn(ω_k · x_k + φ)

where ω_k is the k-th normal vector of the hyperplane of the SVM classifier, x_k is the HOG feature of the sub-image p_k, k ∈ [1, q], q is the number of sub-images, and φ is the displacement term of the hyperplane of the SVM classifier.

(3c) Judge whether the label l_pk of the sub-image p_k is positive. If so, the sub-image p_k contains the foreground object; record the position of p_k in the image to be extracted, i.e. the position (x_min, y_min) of its top-left pixel and the position (x_max, y_max) of its bottom-right pixel, use these two positions to form the rectangular region containing the foreground, and go to step 4. Otherwise, discard the sub-image p_k.
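A hypothetical sketch of the multi-scale sliding-window detection of step 3 follows; the scales, stride and the choice of returning the first positive window are assumptions not taken from the patent, and `hog_of`/`clf` come from the training sketch above.

```python
def detect_foreground_window(gray, clf, scales=(1.0, 0.75, 0.5), base=(64, 128), stride=16):
    h, w = gray.shape
    for s in scales:
        win_w, win_h = int(base[0] / s), int(base[1] / s)
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                sub = gray[y:y + win_h, x:x + win_w]
                if clf.predict([hog_of(sub)])[0] == 1:   # positive label: contains foreground
                    # (x_min, y_min), (x_max, y_max) in image coordinates
                    return (x, y), (x + win_w, y + win_h)
    return None
```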

Step 4: perform foreground extraction on the image to be extracted.

Foreground extraction methods mainly include threshold-based, edge-based, region-based, graph-cut-based, energy-functional-based and deep-learning-based methods. This embodiment adopts, but is not limited to, the GrabCut algorithm among the graph-cut-based methods. Specifically, the positions (x_min, y_min) and (x_max, y_max) of the top-left and bottom-right pixels of the sub-image p_k in the image to be extracted, together with the image to be extracted, are taken as the input of the GrabCut algorithm, and foreground extraction is performed on the image to be extracted, obtaining the extraction result S_1(x, y) of the image under the pixel view.
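Step 4 can be sketched with OpenCV's `cv2.grabCut`, where the detected rectangle replaces the interactively drawn one; the iteration count below is an assumption, not a value given in the patent.

```python
import cv2
import numpy as np

def grabcut_foreground(img_bgr, x_min, y_min, x_max, y_max, iters=5):
    mask = np.zeros(img_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    rect = (x_min, y_min, x_max - x_min, y_max - y_min)   # rectangle containing the foreground
    cv2.grabCut(img_bgr, mask, rect, bgd_model, fgd_model, iters, cv2.GC_INIT_WITH_RECT)
    # S1(x, y): 1 where GrabCut marks (probable) foreground, 0 elsewhere
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
```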

Step 5: compute the superpixels of the image to be extracted.

Existing methods for computing superpixels include graph-theory-based methods and gradient-descent-based methods. This embodiment adopts, but is not limited to, the SLIC algorithm among the gradient-descent-based methods. The specific steps are as follows:

(5a) Convert the image to be extracted from the RGB color space to the CIE-Lab color space, obtaining a CIE-Lab image;

There is no direct conversion formula between RGB and Lab, so the XYZ color space must be used as an intermediate step. The three channel components r, g, b of a pixel of the image to be extracted are first corrected by the correction function gamma(t), giving the corrected components R, G, B. The X, Y, Z channel components of the XYZ color space are then obtained from the RGB-to-XYZ conversion formula

[X, Y, Z]ᵀ = M · [R, G, B]ᵀ

where M is a 3 × 3 conversion matrix;

In the CIE-Lab color space, the values of the three channels L, a, b are obtained from the XYZ-to-CIE-Lab conversion formulas:

L = 116 f(Y/Yn) - 16

a = 500 (f(X/Xn) - f(Y/Yn))

b = 200 (f(Y/Yn) - f(Z/Zn))

where X, Y, Z are the three channel components obtained from the RGB-to-XYZ conversion; Xn, Yn, Zn take the values 95.047, 100.0 and 108.883 respectively; and f(X/Xn), f(Y/Yn), f(Z/Zn) are computed with the function

f(t) = t^(1/3) if t > (6/29)³, and f(t) = t / (3 (6/29)²) + 4/29 otherwise;
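For illustration only, the conversion of step (5a) can be written out in numpy as below, assuming sRGB inputs in [0, 1] and the D65 white point; the sRGB gamma correction and conversion matrix are the standard ones and are assumptions where the patent text does not spell them out. In practice, `cv2.cvtColor(img, cv2.COLOR_BGR2LAB)` gives the same result up to OpenCV's 8-bit scaling.

```python
import numpy as np

def rgb_to_lab(rgb):
    # rgb: float array in [0, 1], shape (..., 3)
    def gamma(t):                                   # assumed sRGB linearization
        return np.where(t > 0.04045, ((t + 0.055) / 1.055) ** 2.4, t / 12.92)
    M = np.array([[0.4124, 0.3576, 0.1805],         # standard sRGB -> XYZ matrix (D65)
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = gamma(rgb) @ M.T * 100.0
    xn, yn, zn = 95.047, 100.0, 108.883             # reference white values used in the description
    def f(t):
        return np.where(t > (6 / 29) ** 3, np.cbrt(t), t / (3 * (6 / 29) ** 2) + 4 / 29)
    fx, fy, fz = f(xyz[..., 0] / xn), f(xyz[..., 1] / yn), f(xyz[..., 2] / zn)
    L = 116 * fy - 16
    a = 500 * (fx - fy)
    b = 200 * (fy - fz)
    return np.stack([L, a, b], axis=-1)
```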

(5b) Initialize the superpixel cluster centers: set the number of superpixels m = 200 and distribute the superpixel cluster centers evenly over the CIE-Lab image according to this number, obtaining the cluster center set C^d = {c_i^d = [l_i, a_i, b_i, x_i, y_i]}, where c_i^d is the i-th of the m cluster centers after the d-th iteration, l_i, a_i, b_i are the three channels of the CIE-Lab color space, and (x_i, y_i) are the coordinates of the center;

(5c) For each pixel pixel of the CIE-Lab image, set the label l(pixel) = -1 and the distance d(pixel) = ∞;

(5d) Compute the gradient values of all pixels in the 3 × 3 neighborhood of each cluster center c_i^d in the cluster center set C^d, and move the cluster center to the pixel with the smallest gradient in that neighborhood, obtaining a new cluster center set C^(d+1);

(5e) For each pixel pixel = [l_p, a_p, b_p, x_p, y_p] within the 2S × 2S neighborhood of each cluster center c_i^d in the cluster center set C^d, compute the distance D(pixel) between c_i^d and pixel:

D(pixel) = √(d_c² + (d_s/S)² × M²)

where d_c = √((l_p - l_i)² + (a_p - a_i)² + (b_p - b_i)²) is the color difference between the pixels,

d_s = √((x_p - x_i)² + (y_p - y_i)²) is the spatial distance between the pixels,

M is the maximum value of d_c, S = √(N/m), N is the number of pixels of the image, and m is the set number of superpixels;

(5f) Compare d(pixel) with D(pixel). If D(pixel) < d(pixel), assign D(pixel) to d(pixel), i.e. set d(pixel) = D(pixel) so that d(pixel) records the distance from the pixel to the cluster center c_i^d, and mark the pixel as belonging to the i-th superpixel with the label l(pixel) = i, obtaining a new superpixel b_i;

(5g) Repeat steps (5d)-(5f), updating the cluster centers until the residual error converges, and obtain the superpixel image B = {b_1, b_2, ..., b_i, ..., b_m}.
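Rather than re-implementing the iteration of steps (5b)-(5g), an illustrative sketch can lean on scikit-image's SLIC implementation; m = 200 superpixels matches step (5b), while the compactness value below is an assumption.

```python
from skimage.segmentation import slic

def superpixels(img_rgb, m=200):
    # Returns an integer label map: labels[y, x] = index i of the superpixel b_i
    return slic(img_rgb, n_segments=m, compactness=10, start_label=0)
```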

Step 6: perform multi-view fusion of the image under the superpixel view and the extraction result S_1(x, y) of the image to be extracted under the pixel view, obtaining the foreground S_2(x_i, y_i) of the image to be extracted.

(6a) For each superpixel b_i of the image B under the superpixel view, weight (sum) the labels l_ij in the pixel-view extraction result S_1(x, y) of all pixels contained in b_i, obtaining the label confidence Score_bi of the superpixel b_i:

Score_bi = Σ l_ij;

(6b) Set a confidence threshold gate and compare it with the label confidence Score_bi of the superpixel b_i to obtain the label l_bi of the superpixel b_i under the superpixel view; take this label l_bi as the label S_2(x_i, y_i) of each pixel (x_i, y_i) inside b_i, i.e. all pixels within the superpixel b_i carry the same label as the superpixel. S_2(x_i, y_i) is the foreground of the image to be extracted, where (x_i, y_i) ∈ b_i;

The smaller the confidence threshold gate, the smaller the probability that the superpixel b_i is judged to be foreground; the larger the gate, the larger that probability, but if the gate is too large, too much noise appears in the foreground extraction result;

The confidence threshold gate is compared with the label confidence Score_bi of the superpixel b_i to obtain the label l_bi of the superpixel b_i according to:

l_bi = 1 if Score_bi > num_bi / gate, and l_bi = 0 if Score_bi < num_bi / gate

where l_bi is the label of the superpixel b_i, num_bi is the number of pixels in b_i, gate is the confidence threshold, 1 is the foreground label and 0 is the background label;

Multi-view fusion here refers to fusing the image B under the superpixel view with the extraction result S_1(x, y) under the pixel view.
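A sketch of the fusion of step 6, combining the GrabCut mask S_1 from the step-4 sketch with the superpixel label map from the step-5 sketch; the value of `gate` below is illustrative only.

```python
import numpy as np

def fuse(superpixel_labels, s1, gate=2):
    # superpixel_labels: integer map of superpixel indices; s1: binary GrabCut mask
    s2 = np.zeros_like(s1)
    for i in np.unique(superpixel_labels):
        inside = superpixel_labels == i
        score = s1[inside].sum()          # Score_bi = sum of the pixel labels l_ij
        num = inside.sum()                # num_bi = number of pixels in superpixel b_i
        s2[inside] = 1 if score > num / gate else 0
    return s2
```

Chaining `detect_foreground_window`, `grabcut_foreground`, `superpixels` and `fuse` reproduces, under the stated assumptions, the pipeline of Fig. 1.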

The technical effect of the present invention is further illustrated below through foreground extraction experiments.

1. Experimental conditions and content

The experiments extract pedestrian and leaf targets respectively. The training data are pedestrian and leaf image sets randomly collected from the Internet, containing 736 and 186 images respectively. Positive and negative samples are taken from each image and labels are produced, forming a sample image set of the pedestrian category and a sample image set of the leaf category.

Foreground extraction of pedestrian and leaf targets was implemented by programming in MATLAB R2017a; the results are shown in Fig. 5.

2. Analysis of experimental results:

As Fig. 5 shows, for the four images tested on each of the two kinds of data, the output foreground extraction results contain no noise and the extracted foreground edges are good; for the four leaf images, for example, the extracted foreground edges are extremely accurate. The method also tolerates different degrees of foreground completeness in the input image: for input image 3 of the pedestrian category, with a half-length portrait as input, a good foreground extraction result is still obtained.

In addition, Fig. 5 shows that, once the SVM classifier has been trained, the present invention can automatically complete the foreground extraction of the image to be extracted and obtain its foreground extraction result, solving the problem that existing graph-cut-based foreground extraction methods need human-computer interaction to assist the extraction. At the same time, the present invention makes full use of the good consistency within a superpixel block to repair the edges of the pixel-view extraction result output by the GrabCut algorithm, making the foreground extraction result more accurate and smooth, so an accurate foreground extraction result is obtained and the foreground extraction precision is improved.

Claims (8)

1. An automatic image foreground extraction method based on multi-view fusion is characterized in that:
(1) training the SVM classifier to obtain a trained SVM classifier;
(2) graying an image to be extracted to obtain a grayscale image;
(3) detecting the sub-image p_k containing a foreground target in the gray image through the trained SVM classifier:
(3a) adopting a multi-scale window to slide line by line in the gray level image according to a set interval to obtain an image set P = {p_1, p_2, ..., p_k, ..., p_q} composed of a plurality of sub-images, in which k ∈ [1, q], p_k is the k-th sub-image and q is the number of sub-images;
(3b) extracting the histogram-of-oriented-gradients HOG feature of each sub-image p_k in the image set P, inputting it into the trained SVM classifier for classification, and calculating the label l_pk of the sub-image p_k;
(3c) judging whether the label l_pk of the sub-image p_k is positive; if so, the sub-image p_k contains a foreground object, the position of the sub-image p_k in the image to be extracted is recorded, i.e. the position (x_min, y_min) of its upper-left pixel and the position (x_max, y_max) of its lower-right pixel in the image to be extracted, and step (4) is executed; otherwise, the sub-image p_k is discarded;
(4) Carrying out foreground extraction on an image to be extracted:
using the position (x_min, y_min) of the upper-left pixel and the position (x_max, y_max) of the lower-right pixel of the sub-image p_k in the image to be extracted to replace the human-computer interaction of the GrabCut algorithm, and performing foreground extraction on the image to be extracted by using the replacement result to obtain the extraction result S_1(x, y) under the pixel viewing angle of the image to be extracted;
(5) calculating the superpixels of the image to be extracted by adopting the simple linear iterative clustering algorithm SLIC to obtain the image under the superpixel viewing angle: B = {b_1, b_2, ..., b_i, ..., b_m}, i ∈ [1, m], where b_i is the i-th superpixel and m is the number of superpixels;
(6) performing multi-view fusion of the image B under the superpixel viewing angle and the extraction result S_1(x, y) of the image to be extracted under the pixel viewing angle to obtain the foreground S_2(x_i, y_i) of the image to be extracted.
2. The method of claim 1, wherein the training of the SVM classifier in step (1) is performed by:
(1a) collecting a sample image set containing a foreground category, and graying all sample images in the sample image set to obtain a sample gray image set;
(1b) extracting the histogram-of-oriented-gradients HOG feature of each image in the sample gray level image set to obtain the sample HOG feature set;
(1c) training the SVM classifier by adopting all HOG features in the sample HOG feature set to obtain the trained SVM classifier.
3. The method of claim 2, wherein: graying all sample images in the sample image set in step (1a) means performing a weighted average of the three channels, the red component R, the green component G and the blue component B, of a sample image to obtain the gray value Gray of the sample grayscale image:
Gray = R × 0.299 + G × 0.587 + B × 0.114.
4. The method of claim 2, wherein: in step (1b) the histogram-of-oriented-gradients HOG feature of each image in the sample gray level image set is extracted by the following steps:
(1b1) dividing the input image into a plurality of connected, adjacent and non-overlapping units, and calculating the gradient amplitude G(x, y) and the gradient direction α(x, y) of the pixels in each unit:
G(x, y) = √(G_x(x, y)² + G_y(x, y)²)
α(x, y) = tan⁻¹(G_x(x, y) / G_y(x, y))
wherein G_x(x, y) = H(x+1, y) - H(x-1, y) and G_y(x, y) = H(x, y+1) - H(x, y-1) respectively represent the horizontal direction gradient and the vertical direction gradient at the pixel point (x, y) in the input image, H(x+1, y) represents the pixel value at the pixel point (x+1, y) in the input image, H(x-1, y) represents the pixel value at the pixel point (x-1, y), H(x, y+1) represents the pixel value at the pixel point (x, y+1), and H(x, y-1) represents the pixel value at the pixel point (x, y-1);
(1b2) dividing all gradient directions α (x, y) into 9 angles as the horizontal axis of the histogram, and accumulating gradient values corresponding to each angle range as the vertical axis of the histogram to obtain a gradient histogram;
(1b3) counting the gradient histogram of each unit to obtain a feature descriptor of each unit;
(1b4) combining n × n units into a block, and connecting the feature descriptors of all the units in the block in series to obtain the HOG feature descriptor of the block;
(1b5) connecting direction gradient histogram HOG feature descriptors of all blocks in the input image in series to obtain the HOG feature of the direction gradient histogram of the input image;
(1b6) and connecting the HOG features of the direction gradient histograms of all the input images in the sample gray level image set in series to obtain a HOG feature set of the direction gradient histograms of the sample gray level image set.
5. The method of claim 1, wherein: in step (3b) the label l_pk of the sub-image p_k is calculated by the following formula:
l_pk = sgn(ω_k · x_k + φ)
wherein ω_k is the k-th normal vector of the hyperplane of the SVM classifier, x_k is the histogram-of-oriented-gradients HOG feature of the sub-image p_k, k ∈ [1, q], q is the number of sub-images, and φ is the displacement term of the hyperplane of the SVM classifier.
6. The method of claim 1, wherein calculating the superpixels of the image to be extracted in step (5) comprises the following implementation steps:
(5a) converting the image to be extracted from the RGB color space to the CIE-Lab color space to obtain a CIE-Lab image;
(5b) initializing the cluster centers of the superpixels: setting the number of superpixels, and uniformly distributing the superpixel cluster centers over the CIE-Lab image according to that number to obtain a cluster center set C^d = {c_i^d = [l_i, a_i, b_i, x_i, y_i]}, wherein c_i^d is the i-th of the m cluster centers after the d-th iteration, l_i, a_i, b_i are the three channels of the CIE-Lab color space, and (x_i, y_i) are the coordinates of the center;
(5c) setting a label l(pixel) = -1 and a distance d(pixel) = ∞ for each pixel of the CIE-Lab image;
(5d) separately computing the gradient values of all pixel points in the 3 × 3 neighborhood of each cluster center c_i^d in the cluster center set C^d, and moving the cluster center to the pixel point with the minimum gradient in that neighborhood to obtain a new cluster center set C^(d+1);
(5e) for each pixel pixel = [l_p, a_p, b_p, x_p, y_p] within the 2S × 2S neighborhood of each cluster center c_i^d in the cluster center set C^d, calculating the distance D(pixel) between c_i^d and pixel:
D(pixel) = √(d_c² + (d_s/S)² × M²)
wherein d_c = √((l_p - l_i)² + (a_p - a_i)² + (b_p - b_i)²) is the color difference between the pixels,
d_s = √((x_p - x_i)² + (y_p - y_i)²) is the spatial distance between the pixels,
M is the maximum value of d_c, S = √(N/m), N is the number of all pixel points of the image, and m is the set number of superpixels;
(5f) comparing d(pixel) with D(pixel): if D(pixel) < d(pixel), D(pixel) is assigned to d(pixel) and l(pixel) = i, obtaining a new superpixel b_i;
(5g) repeatedly executing steps (5d)-(5f), updating the cluster centers until the residual error converges, and obtaining the superpixel image B = {b_1, b_2, ..., b_i, ..., b_m}.
7. The method according to claim 1, wherein in step (6) the multi-view fusion of the image B under the superpixel viewing angle and the extraction result S_1(x, y) of the image to be extracted under the pixel viewing angle is performed as follows:
(6a) for each superpixel b_i of the image B under the superpixel viewing angle, weighting the labels l_ij in the pixel-view extraction result S_1(x, y) of all pixels contained in b_i to obtain the tag confidence Score_bi of the superpixel b_i;
(6b) setting a confidence threshold gate, and comparing the confidence threshold gate with the tag confidence Score_bi of the superpixel b_i to obtain the label l_bi of the superpixel b_i:
l_bi = 1 if Score_bi > num_bi / gate, and l_bi = 0 if Score_bi < num_bi / gate
wherein num_bi is the number of pixels in the superpixel b_i, 1 is the foreground label and 0 is the background label;
(6c) taking the label l_bi as the label S_2(x_i, y_i) of the pixel point (x_i, y_i); S_2(x_i, y_i) is the foreground of the image to be extracted, where (x_i, y_i) ∈ b_i.
8. The method of claim 7, wherein: in step (6a) the tag confidence Score_bi of the superpixel b_i is obtained by summing the labels, in the pixel-view extraction result S_1(x, y), of all pixels contained in the superpixel b_i of the image B under the superpixel viewing angle, i.e.:
Score_bi = Σ l_ij
wherein Score_bi is the tag confidence of the superpixel b_i.
CN201711216652.9A 2017-11-28 2017-11-28 Image foreground extraction method based on multi-view fusion Pending CN108090485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711216652.9A CN108090485A (en) Image foreground extraction method based on multi-view fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711216652.9A CN108090485A (en) Image foreground extraction method based on multi-view fusion

Publications (1)

Publication Number Publication Date
CN108090485A 2018-05-29

Family

ID=62172999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711216652.9A Pending CN108090485A (en) Image foreground extraction method based on multi-view fusion

Country Status (1)

Country Link
CN (1) CN108090485A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846397A (en) * 2018-05-31 2018-11-20 浙江科技学院 A kind of cable semi-conductive layer automatic testing method based on image procossing
CN109242968A (en) * 2018-08-24 2019-01-18 电子科技大学 A kind of river three-dimensional modeling method cut based on the super voxel figure of more attributes
CN109784374A (en) * 2018-12-21 2019-05-21 西北工业大学 Multi-angle of view clustering method based on adaptive neighbor point
CN112766387A (en) * 2021-01-25 2021-05-07 海尔数字科技(上海)有限公司 Error correction method, device, equipment and storage medium for training data
CN113393455A (en) * 2021-07-05 2021-09-14 武汉智目智能技术合伙企业(有限合伙) Machine vision technology-based foreign fiber detection method
CN114677573A (en) * 2022-05-30 2022-06-28 上海捷勃特机器人有限公司 Visual classification method, system, device and computer readable medium
CN115953780A (en) * 2023-03-10 2023-04-11 清华大学 Multi-dimensional light field complex scene graph construction method based on multi-view information fusion

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663411A (en) * 2012-02-29 2012-09-12 宁波大学 Recognition method for target human body
CN107527054A (en) * 2017-09-19 2017-12-29 西安电子科技大学 Prospect extraction method based on various visual angles fusion

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663411A (en) * 2012-02-29 2012-09-12 宁波大学 Recognition method for target human body
CN107527054A (en) * 2017-09-19 2017-12-29 西安电子科技大学 Prospect extraction method based on various visual angles fusion

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846397A (en) * 2018-05-31 2018-11-20 浙江科技学院 A kind of cable semi-conductive layer automatic testing method based on image procossing
CN109242968A (en) * 2018-08-24 2019-01-18 电子科技大学 A kind of river three-dimensional modeling method cut based on the super voxel figure of more attributes
CN109784374A (en) * 2018-12-21 2019-05-21 西北工业大学 Multi-angle of view clustering method based on adaptive neighbor point
CN112766387A (en) * 2021-01-25 2021-05-07 海尔数字科技(上海)有限公司 Error correction method, device, equipment and storage medium for training data
CN112766387B (en) * 2021-01-25 2024-01-23 卡奥斯数字科技(上海)有限公司 Training data error correction method, device, equipment and storage medium
CN113393455A (en) * 2021-07-05 2021-09-14 武汉智目智能技术合伙企业(有限合伙) Machine vision technology-based foreign fiber detection method
CN114677573A (en) * 2022-05-30 2022-06-28 上海捷勃特机器人有限公司 Visual classification method, system, device and computer readable medium
CN114677573B (en) * 2022-05-30 2022-08-26 上海捷勃特机器人有限公司 Visual classification method, system, device and computer readable medium
CN115953780A (en) * 2023-03-10 2023-04-11 清华大学 Multi-dimensional light field complex scene graph construction method based on multi-view information fusion

Similar Documents

Publication Publication Date Title
CN110543837B (en) Visible light airport airplane detection method based on potential target point
CN108090485A (en) Image foreground extraction method based on multi-view fusion
CN108492343B (en) Image synthesis method for training data for expanding target recognition
CN107527054B (en) Foreground automatic extraction method based on multi-view fusion
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN106920243B (en) Sequenced Image Segmentation Method of Ceramic Material Parts with Improved Fully Convolutional Neural Network
CN107424142B (en) Weld joint identification method based on image significance detection
CN110717896B (en) Plate strip steel surface defect detection method based on significance tag information propagation model
CN100565559C (en) Image text location method and device based on connected component and support vector machine
WO2023083059A1 (en) Road surface defect detection method and apparatus, and electronic device and readable storage medium
CN105574527B (en) A kind of quick object detecting method based on local feature learning
CN107239777B (en) A method of tableware detection and recognition based on multi-view graph model
CN105354565A (en) Full convolution network based facial feature positioning and distinguishing method and system
CN108280397A (en) Human body image hair detection method based on depth convolutional neural networks
CN104134234A (en) Full-automatic three-dimensional scene construction method based on single image
CN105069466A (en) Pedestrian clothing color identification method based on digital image processing
CN105701467A (en) Many-people abnormal behavior identification method based on human body shape characteristic
CN109034035A (en) Pedestrian's recognition methods again based on conspicuousness detection and Fusion Features
CN109410238A (en) A kind of fructus lycii identification method of counting based on PointNet++ network
CN108629286A (en) A kind of remote sensing airport target detection method based on the notable model of subjective perception
CN103093470A (en) Rapid multi-modal image synergy segmentation method with unrelated scale feature
CN109344842A (en) A Pedestrian Re-identification Method Based on Semantic Region Representation
CN113177927A (en) Bone marrow cell classification and identification method and system based on multiple features and multiple classifiers
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
CN113673534B (en) RGB-D image fruit detection method based on FASTER RCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180529