CN104134234B

CN104134234B - A fully automatic 3D scene construction method based on a single image

Info

Publication number: CN104134234B
Application number: CN201410340189.9A
Authority: CN
Inventors: 陈雪锦; 王贵杭; 胡思宇
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2014-07-16
Filing date: 2014-07-16
Publication date: 2017-07-25
Anticipated expiration: 2034-07-16
Also published as: CN104134234A

Abstract

The invention discloses a fully automatic method for building a three-dimensional scene model based on a single image, including: a method based on machine learning, using a training image set to train a classifier capable of roughly classifying and labeling input images. The classifier is used to divide the input image into three sub-regions of vertical, ground and sky, and a rough classification and labeling result of the image region is obtained. Based on the "credible" regions in the rough classification and labeling results, the GrabCut image segmentation algorithm is used to correct the rough classification and labeling results and obtain precise boundaries between image geometric regions. On the basis of obtaining the accurate classification and labeling results of the geometric regions of the image and the precise boundaries between the geometric regions of the image, the method of computer graphics is used to model a realistic three-dimensional scene. The fully automatic three-dimensional scene construction based on a single image is realized by adopting the method disclosed in the invention.

Description

A fully automatic 3D scene construction method based on a single image

技术领域technical field

本发明涉及基于图像的建模领域，尤其涉及一种全自动的基于单幅图像的三维场景构建方法。The invention relates to the field of image-based modeling, in particular to a fully automatic three-dimensional scene construction method based on a single image.

背景技术Background technique

基于图像的三维重建技术能够从二维图像出发构造具有真实感的三维图形。基于图像的建模是最近几年兴起的一门新技术，它使用直接拍摄到的图像，采用尽量少的交互操作，重建场景。其最大的特点就是克服了传统的基于几何建模和绘制技术的不足，可以在只具有普通计算能力的计算机上实现具有照片般真实感场景的实时漫游。传统的三维建模工具虽然日益改进，但构建稍显复杂的三维模型依旧是一件非常耗时费力的工作。考虑到要构建的很多三维模型都能在现实世界中找到或加以塑造，因此三维扫描技术和基于图像建模技术就成了人们心目中理想的建模方式；又由于前者一般只能获取景物的几何信息，而后者为生成具有照片级真实感的合成图像提供了一种自然的方式，因此它迅速成为目前计算机图形学领域中的研究热点。Image-based 3D reconstruction technology can construct realistic 3D graphics from 2D images. Image-based modeling is a new technology that has emerged in recent years. It uses directly captured images and uses as few interactive operations as possible to reconstruct the scene. Its biggest feature is to overcome the shortcomings of traditional geometric modeling and rendering technology, and can realize real-time roaming with photorealistic scenes on a computer with only ordinary computing power. Although the traditional 3D modeling tools are improving day by day, it is still a very time-consuming and labor-intensive task to construct a somewhat complex 3D model. Considering that many 3D models to be constructed can be found or shaped in the real world, 3D scanning technology and image-based modeling technology have become ideal modeling methods in people's minds; Geometric information, and the latter provides a natural way to generate photorealistic synthetic images, so it has quickly become a research hotspot in the field of computer graphics.

基于图像的模型重建是计算机图形学研究的前沿问题。该技术结合了计算机图形学、图像处理以及计算机视觉等诸多领域的理论和方法，通过图像场景所包含的二维信息来获取用于模型重建的三维数据，实现在虚拟场景中的模型重建，因此在计算机辅助设计和逆向工程中有很好的应用前景。基于图像的建模技术是在二维图像的基础上进行图像理解并最终重建三维立体，它是计算机视觉要解决的主要问题之一，广泛应用于机器人导航、模糊识别、虚拟现实和建筑物重建等各个领域。Image-based model reconstruction is a frontier issue in computer graphics research. This technology combines theories and methods in many fields such as computer graphics, image processing, and computer vision. It uses the two-dimensional information contained in the image scene to obtain three-dimensional data for model reconstruction, and realizes model reconstruction in the virtual scene. Therefore, It has a good application prospect in computer-aided design and reverse engineering. Image-based modeling technology is based on two-dimensional images for image understanding and final reconstruction of three-dimensional stereo, it is one of the main problems to be solved in computer vision, widely used in robot navigation, fuzzy recognition, virtual reality and building reconstruction and other fields.

最初关于三维重建的研究是基于几何信息方法的，如点云等。近些年基于图像的三维重建技术兴起，它使用直接拍摄到的照片进行重建，克服了传统的基于几何重建技术中的标定问题，其具有很大的优越性，因而基于图像的三维重建技术成为众多学者研究的重要课题。The initial research on 3D reconstruction is based on geometric information methods, such as point cloud and so on. In recent years, image-based 3D reconstruction technology has emerged. It uses directly captured photos for reconstruction, which overcomes the calibration problem in traditional geometry-based reconstruction technology. It has great advantages. Therefore, image-based 3D reconstruction technology has become An important subject of research by many scholars.

目前大部分研究是针对两幅或多幅(序列)图像的三维重建。多幅图重建技术上需要先对每幅图像进行繁杂的预处理，寻找图像间用于匹配的特征点，而特征点匹配又是图像处理中的难点，因此利用多幅图像进行三维重建操作上存在重建成本高、操作复杂、计算量大，不适用于动态场景重建等问题。Most of the current research is on the 3D reconstruction of two or more (sequence) images. Multi-image reconstruction technology needs to perform complicated preprocessing on each image to find the feature points used for matching between images, and feature point matching is a difficult point in image processing. Therefore, the use of multiple images for 3D reconstruction operations There are problems such as high reconstruction cost, complicated operation, large amount of calculation, and it is not suitable for dynamic scene reconstruction.

基于单幅图像的三维重建的主要思想是通过单张数码影像提取目标的颜色、形状、共面性等二维、三维几何信息，从而利用少量已知条件获取该目标的空间三维信息。单幅图像的三维重建操作上避免了多幅图像重建的麻烦，它重建过程简单、速度快、只需拍摄一张角度合适的数码相片即可获得该目标的三维几何信息；它投入少，不需要多个摄像机或投影仪进行标定，大大减少了人力、物力的投资；而且技术上只对一幅图像进行预处理，无需多幅图像的匹配，避开了多幅图像重建的匹配难点，大大节约了时间，提高了效率。因此，用单幅图像进行三维重建越来越多地得到人们的重视。The main idea of 3D reconstruction based on a single image is to extract the 2D and 3D geometric information of the target such as color, shape, and coplanarity from a single digital image, so as to obtain the spatial 3D information of the target using a small number of known conditions. The three-dimensional reconstruction operation of a single image avoids the trouble of multiple image reconstruction. Its reconstruction process is simple and fast. It only needs to take a digital photo with a suitable angle to obtain the three-dimensional geometric information of the target; Multiple cameras or projectors are required for calibration, which greatly reduces the investment in manpower and material resources; and technically, only one image is preprocessed, without the need for multiple image matching, which avoids the matching difficulties of multiple image reconstruction, greatly Save time and improve efficiency. Therefore, more and more people pay attention to 3D reconstruction with a single image.

目前的研究方法中，单幅图像的三维重建方法包括交互式的三维场景构建方法和全自动的三维场景构建方法。交互式的三维场景构建方法需要用户的交互进行指导，全自动的三维场景构建方法一般是基于图像特征利用机器学习的方法获得相应的场景结构分类器，利用分类器将图像区域进行分类标注，在此基础上进行三维场景的建模。交互式的建模方法精度高，但需要用户交互指导。全自动的三维场景构建方法是近年来的研究热点，如何快速准确地估计图像区域类别，提高自动重建的精确度是全自动的单幅图像三维重建方法面临的主要问题。Among the current research methods, the 3D reconstruction method of a single image includes an interactive 3D scene construction method and a fully automatic 3D scene construction method. Interactive 3D scene construction methods require user interaction for guidance. Fully automatic 3D scene construction methods generally use machine learning methods to obtain corresponding scene structure classifiers based on image features, and use classifiers to classify and mark image regions. On this basis, the modeling of the 3D scene is carried out. Interactive modeling methods are highly accurate, but require user interaction guidance. The fully automatic 3D scene construction method is a research hotspot in recent years. How to quickly and accurately estimate the image region category and improve the accuracy of automatic reconstruction is the main problem faced by the fully automatic single image 3D reconstruction method.

发明内容Contents of the invention

本发明的目的是提供一种全自动的基于单幅图像的三维场景构建方法，在有效获取图像几何区域分类标注结果及图像中几何区域之间精准边界的基础上进行具有真实感的三维场景的构建。The purpose of the present invention is to provide a fully automatic method for constructing a three-dimensional scene based on a single image, and to construct a three-dimensional scene with a sense of reality on the basis of effectively obtaining the classification and labeling results of the geometric regions of the image and the precise boundaries between the geometric regions in the image. Construct.

本发明的目的是通过以下技术方案实现的：一种全自动的基于单幅图像的三维场景构建方法，包括以下步骤：The purpose of the present invention is achieved by the following technical solutions: a fully automatic three-dimensional scene construction method based on a single image, comprising the following steps:

步骤1：利用训练图像集获得能够进行图像几何区域划分的分类器Step 1: Use the training image set to obtain a classifier capable of image geometric region division

图像几何区域划分的分类器是基于机器学习获得的，首先需要收集训练图像集，然后利用训练图像集获得一组训练样本，最后利用训练样本训练分类器；所述训练样本的是在训练图像集上获得，包括样本标注和样本提取；The classifier of image geometric region division is obtained based on machine learning. First, it is necessary to collect a training image set, then use the training image set to obtain a set of training samples, and finally use the training samples to train the classifier; the training samples are in the training image set Obtained on the Internet, including sample annotation and sample extraction;

所述样本标注是指对训练图像集里面的每一幅图进行几何区域的标注，即把每一幅图像的整个区域划分成多个几何子区域，每个几何子区域应当归属为三种类别中的一种，这三种类别分别是：竖立区域、地面区域和天空区域；The sample annotation refers to the annotation of the geometric area of each image in the training image set, that is, the entire area of each image is divided into multiple geometric sub-areas, and each geometric sub-area should belong to three categories The three categories are: vertical area, ground area and sky area;

经过样本标注后，需要提取真正用于训练的样本集。为了尽可能精确的对图像区域进行几何子区域的划分，以30*40的矩形块作为样本单元，把每一幅图像以10为间隔步长划分成一系列具有一定重叠区域的30*40的样本矩形块。对于每个样本矩形块，提取1031维的样本特征；从而对于每一幅训练图像可以获得一组训练样本(一个训练样本集)，而所有训练图像的训练样本集形成最终的训练样本集；After the samples are labeled, it is necessary to extract the sample set that is actually used for training. In order to divide the image area into geometric sub-regions as accurately as possible, a 30*40 rectangular block is used as a sample unit, and each image is divided into a series of 30*40 samples with a certain overlapping area at a step size of 10. rectangular blocks. For each sample rectangular block, extract 1031-dimensional sample features; thus for each training image, one group of training samples (a training sample set) can be obtained, and the training sample sets of all training images form the final training sample set;

提取了训练样本集，采用有监督的训练方式获得能够进行图像几何区域划分的分类器，即采用支持向量机SVM(Support Vector Machine)分类器，训练得到的分类器模型能够输出一个测试样本分别属于三种类别的概率；The training sample set is extracted, and a supervised training method is used to obtain a classifier that can divide the geometric region of the image, that is, a Support Vector Machine (SVM) classifier is used. The trained classifier model can output a test sample that belongs to Probabilities for the three classes;

步骤2：利用训练得到的分类器对用户输入的图像进行几何区域的划分，得到粗略分类标注的结果；Step 2: Use the trained classifier to divide the geometric area of the image input by the user, and obtain the result of rough classification and labeling;

输入一幅图像，先以10为间隔步长将图像区域划分成一系列具有一定重叠区域的30*40的样本矩形块，对于每一个样本矩形块提取1031维度的样本特征；对于每一个样本矩形块，分类器根据其1031维的样本特征，输出该样本分别属于三种类别的概率：p(v|P_i)、p(g|P_i)和p(s|P_i)，其中p(v|P_i)表示样本P_i属于竖立区域的概率，p(g|P_i)和p(s|P_i)分别表示样本P_i属于地面区域和天空区域的概率；Input an image, first divide the image area into a series of 30*40 sample rectangular blocks with a certain overlapping area with a step size of 10, and extract 1031-dimensional sample features for each sample rectangular block; for each sample rectangular block , the classifier outputs the probabilities that the sample belongs to three categories according to its 1031-dimensional sample features: p(v|P _i ), p(g|P _i ) and p(s|P _i ), where p(v |P _i ) represents the probability that the sample P _i belongs to the vertical area, p(g|P _i ) and p(s|P _i ) represent the probability that the sample P _i belongs to the ground area and the sky area, respectively;

对于图像区域每一个10*10的决策单元C_j其属于三种类别的概率由N个包含该决策单元的样本矩形块的类别共同决定，每一个决策单元C_j其属于三种类别的概率计算为：For each 10*10 decision-making unit C _j in the image area, the probability of belonging to the three categories is determined by the categories of the N sample rectangular blocks containing the decision-making unit, and the probability of each decision-making unit C _j belonging to the three categories is calculated for:

其中N表示包含决策单元C_j的样本矩形块的个数，P_i表示N个矩形块中的某个，从而获得决策单元C_j分别属于三种类别的概率大小；p(v|C_j)表示决策单元C_j属于竖立区域的概率，p(g|C_j)和p(s|C_j)分别表示决策单元C_j属于地面区域和天空区域的概率；Among them, N represents the number of sample rectangular blocks containing the decision-making unit C _j , and P _i represents one of the N rectangular blocks, so as to obtain the probability that the decision-making unit C _j belongs to the three categories; p(v|C _j ) Indicates the probability that the decision-making unit C _j belongs to the vertical area, p(g|C _j ) and p(s|C _j ) represent the probability that the decision-making unit C _j belongs to the ground area and the sky area, respectively;

当且仅当决策单元C_j属于某种类别的概率p^*＞0.5时，才标注该决策单元为该类别，否则将其标注为未知类别；If and only if the probability p ^* of the decision-making unit C _j belonging to a certain category is greater than 0.5, the decision-making unit is marked as this category, otherwise it is marked as an unknown category;

步骤3：利用基于GrabCut图像分割算法修正步骤2中获得的粗略分类标注结果，并优化图像几何区域之间的边界，获得图像几何区域之间精准的边界Step 3: Use the GrabCut image segmentation algorithm to correct the rough classification and labeling results obtained in step 2, and optimize the boundaries between the geometric regions of the image to obtain precise boundaries between the geometric regions of the image

利用基于GrabCut图像分割算法时，以粗略分类结果中“可信的”的区域作为GrabCut的初始输入进行全自动地优化粗略标注结果，所述“可信”区域为具有较大可能性属于某种类别的像素的集合，即属于某种类别的概率大于0.5且在属于该类别的所有像素集合中属于概率较大的前90％；对于每一个类别均计算相应的“可信”区域，获得对于图像区域中某种类别P_*的“可信”区域；通过基于GrabCut图像分割算法的输出来修正粗略分类的结果，以获得图像中几何区域之间精准的边界；When using the GrabCut-based image segmentation algorithm, the "credible" region in the rough classification result is used as the initial input of GrabCut to automatically optimize the rough labeling result. A set of pixels of a category, that is, the probability of belonging to a certain category is greater than 0.5 and belongs to the top 90% of all pixel sets belonging to this category; the corresponding "credible" area is calculated for each category, and the The "credible" area of a certain category P _* in the image area; the result of the rough classification is corrected by the output of the GrabCut image segmentation algorithm to obtain the precise boundary between the geometric areas in the image;

步骤4、针对步骤3输出的标注结果，利用计算机图形学的方法进行三维场景的建模，提供用户具有真实感的三维场景漫游。Step 4. Based on the labeling results output in step 3, the computer graphics method is used to model the 3D scene, so as to provide users with realistic 3D scene roaming.

根据图像中几何区域之间精准的边界信息，把图像区域裁剪成不同的几何区域；在设定相机参数的基础上，通过参考地面引入相对深度信息，从而恢复出图像场景中几何区域的重要顶点的三维坐标；最终利用平面近似各个几何子区域，并把各个区域按照几何关系放置在三维场景中，从而生成具有真实感的三维场景漫游。According to the precise boundary information between the geometric areas in the image, the image area is cropped into different geometric areas; on the basis of setting the camera parameters, the relative depth information is introduced by referring to the ground, so as to restore the important vertices of the geometric areas in the image scene The three-dimensional coordinates; finally use the plane to approximate each geometric sub-area, and place each area in the three-dimensional scene according to the geometric relationship, so as to generate a realistic three-dimensional scene roaming.

所述步骤1中，1031维的样本特征包括：1000维的Bag of Visual Words特征、30维的颜色特征和1维的位置特征。In the step 1, the 1031-dimensional sample features include: 1000-dimensional Bag of Visual Words feature, 30-dimensional color feature and 1-dimensional position feature.

所述步骤1中，SVM分类器中的基函数选为径向基函数，模型类别选为多类别的分类器，概率估计参数b设置为1，即训练得到的分类器能够输出一个测试样本分别属于三种类别的概率。In the step 1, the basis function in the SVM classifier is selected as a radial basis function, the model category is selected as a multi-category classifier, and the probability estimation parameter b is set to 1, that is, the trained classifier can output a test sample respectively Probability of belonging to three classes.

所述步骤3具体实现为：The specific implementation of the step 3 is:

(1)中对于图像区域中某种类别P_*的“可信”区域的计算方法为：The calculation method for the "credible" area of a certain category P _* in the image area in (1) is:

对粗略标注结果中属于该类别P_*的所有像素按照它们属于该类别的概率大小降序排列，移除概率较小的像素，其百分比为k％； Arrange all the pixels belonging to the category P _* in the rough labeling result in descending order according to the probability that they belong to the category, and remove the pixels with lower probability, and the percentage is k%;

产生一个与P_*对应的二值模板图像M_*，M_*和原图大小一样，凡属于集合P_*中的像素，其在模板图像M_*的对应像素位置值为1，否则值为0； Generate a binary template image M _* corresponding to P _* , M _* is the same size as the original image, and any pixel belonging to the set P _* has a value of 1 at the corresponding pixel position of the template image M _* , otherwise the value is 0;

检测模板图像M_*中的连通区域，对于连通区域内部存在的面积小于A的0值区域，以1值覆盖填充； Detect the connected region in the template image M _* , and fill in the 0-value region with an area smaller than A in the connected region;

以大小为β的结构元素腐蚀模板图像M_*。对于被腐蚀的像素视为可能属于该类别的像素，其集合记为，经过腐蚀后模板图像M_*中值为1的像素，视为该类别的“可信”像素，其集合记为； The template image M _* is etched with a structuring element of size β. For the corroded pixels that may belong to this category, the set is denoted as , the pixel with a value of 1 in the template image M _* after erosion is regarded as the "credible" pixel of this category, and its set is denoted as ;

根据所述的计算“可信”区域的方法分别获得三种类别(地面、竖立和天空区域)的“可信”像素集和以及“可能”像素集和计算参数分别为：对于竖立区域，k,A,β分别取10,5000,20，针对地面区域和天空区域，k,A,β取分别0,5000,10；The "credible" pixel sets of the three categories (ground, vertical and sky areas) are respectively obtained according to the method of calculating the "credible" area with and the "maybe" pixel set with The calculation parameters are: for the vertical area, k, A, β are 10, 5000, 20 respectively; for the ground area and sky area, k, A, β are 0, 5000, 10 respectively;

(2)GrabCut算法进行全自动地优化粗略标注结果的实现为：(2) The implementation of the GrabCut algorithm to automatically optimize the rough labeling results is as follows:

根据所述的“可信”像素集和“可能”像素集分别对每个类别进行单独分割，对三类中某个类别的单独分割，其计算方法为：将该类别区域视为前景，另外两个类别区域视为背景。具体地，将该类别中“可信”的像素视为前景像素，另外两个类别的“可信”像素视为背景像素，并将该类别中的“可能”像素视为可能的前景，而剩下的其他像素均看作可能的背景；利用上述信息初始化GrabCut分割算法，分别建立前景和背景的混合高斯模型，经过分割后可以获得以某类别区域为前景的单独分割结果；According to the "credible" pixel set and the "possible" pixel set, each category is separately segmented, and the separate segmentation of a category among the three categories is calculated as follows: the category area is regarded as the foreground, and in addition The two-category regions are considered as background. Specifically, "trustworthy" pixels in this category are regarded as foreground pixels, "trustworthy" pixels in the other two categories are regarded as background pixels, and "likely" pixels in this category are regarded as possible foreground pixels, while The rest of the other pixels are regarded as the possible background; use the above information to initialize the GrabCut segmentation algorithm, respectively establish the mixed Gaussian model of the foreground and the background, and after segmentation, a separate segmentation result with a certain type of area as the foreground can be obtained;

根据所述的单独分割的结果进一步优化标注结果，方法为：在三步单独分割结果的基础上，按照单独分割某个区域的方法，再次以竖立区域为前景进行前景背景的分离从而获得最终的图像分割结果；According to the results of the separate segmentation, the labeling results are further optimized. The method is: on the basis of the three-step separate segmentation results, according to the method of separately segmenting a certain area, the foreground and background are separated again with the vertical area as the foreground to obtain the final Image segmentation results;

根据天空和地面单独分割的结果可以大致估计出地平线的位置，利用地平线将最终的图像分割结果中的背景区域划分成天空和地面区域，其方法为：位于地平线之上的背景区域标注为天空区域，位于地平线之下的背景区域标注为地面区域。According to the results of separate segmentation of the sky and the ground, the position of the horizon can be roughly estimated, and the background area in the final image segmentation result is divided into sky and ground areas by using the horizon. The method is: the background area above the horizon is marked as the sky area , the background area below the horizon is marked as the ground area.

在所述的图像几何区域标注的结果之上，利用计算机图形学的方法进行三维场景的建模，提供用户具有真实感的三维场景漫游，包括：Based on the results of the image geometric area labeling, the computer graphics method is used to model the three-dimensional scene, providing users with a realistic three-dimensional scene roaming, including:

根据所述的图像几何区域标注结果，获得图像几何区域之间的精准边界，用Douglas-Peucker算法对地面和竖立区域的边界用多边形近似获得边界的拟合多边形；According to the labeling result of the image geometric area, the precise boundary between the image geometric areas is obtained, and the boundary of the ground and the erected area is approximated with a polygon to obtain a fitted polygon of the boundary with the Douglas-Peucker algorithm;

其中上述利用计算机图形学的方法进行三维场景的建模的步骤为：Wherein the above-mentioned steps of using the method of computer graphics to carry out the modeling of the three-dimensional scene are:

(1)对所述的场景建模，使用针孔相机模型，光轴通过图像中心，世界坐标系和相机坐标系重合，相机视野设置为1.431rad；(1) To model the scene, use the pinhole camera model, the optical axis passes through the image center, the world coordinate system and the camera coordinate system coincide, and the camera field of view is set to 1.431rad;

(2)利用参考地平面，获得场景中重要顶点的三维坐标，方法为：引入参考地平面，地平面的高度设置为-5；根据上述建模信息获得投影矩阵，在地面高度确定的条件下，通过反投影，计算出图像中地面区域的每个像素所对应的在三维场景中的三维坐标，特别地，可以获得地面区域和竖立区域边界点的三维坐标；(2) Use the reference ground plane to obtain the three-dimensional coordinates of the important vertices in the scene. The method is: introduce the reference ground plane, and set the height of the ground plane to -5; obtain the projection matrix according to the above modeling information, under the condition that the ground height is determined , through back projection, calculate the three-dimensional coordinates in the three-dimensional scene corresponding to each pixel of the ground area in the image, especially, the three-dimensional coordinates of the boundary points of the ground area and the vertical area can be obtained;

(3)根据所述的地面区域和竖立区域边界点的三维坐标以及地面区域和竖立区域边界的拟合多边形，获得一系列竖直平面，方法为：将地面区域和竖立区域边界的拟合多边形上的每一条折线视为某个竖直平面和地面的区域的交线，每个竖直平面的上边界由所述的图像标注结果中竖立区域和天空区域的边界确定；(3) According to the three-dimensional coordinates of the boundary points of the ground area and the erected area and the fitting polygon of the ground area and the border of the erected area, obtain a series of vertical planes, the method is: the fitting polygon of the border of the ground area and the erected area Each polyline on above is regarded as the intersection line of a certain vertical plane and the area of the ground, and the upper boundary of each vertical plane is determined by the boundary of the vertical area and the sky area in the image annotation result;

(4)对所述的竖直平面和地面区域，利用纹理映射获得具有真实感的三维场景模型；真实感的场景漫游包括：变换相机的视角、调节焦距和变换观察位置观察场景模型。(4) For the vertical plane and the ground area, use texture mapping to obtain a realistic three-dimensional scene model; realistic scene roaming includes: changing the viewing angle of the camera, adjusting the focal length and changing the observation position to observe the scene model.

由上述本发明提供的技术方案可以看出，通过基于机器学习的方法训练得到一个能将输入的图像粗略划分成不同几何子区域的支持向量机(SVM)。这些子区域属于三大类的一种(竖立区域、地面区域和天空区域)。由于粗略分类标注的结果中，图像几何区域之间的边界常出现错分和混淆，从而提出利用图像分割的算法来修正粗略分类标注的结果，以获得几何区域之间精准的边界。利用精准的边界信息，在进行三维场景的构建中可以避免由于边界不精确导致的失真，从而生成具有真实感的三维场景模型。It can be seen from the above-mentioned technical solution provided by the present invention that a support vector machine (SVM) capable of roughly dividing an input image into different geometric sub-regions can be obtained through machine learning-based training. These subregions belong to one of three categories (vertical regions, ground regions, and sky regions). Due to the misclassification and confusion of boundaries between image geometric regions in the results of rough classification and labeling, an image segmentation algorithm is proposed to correct the results of rough classification and labeling to obtain precise boundaries between geometric regions. Using accurate boundary information, the distortion caused by inaccurate boundaries can be avoided in the construction of the 3D scene, thereby generating a realistic 3D scene model.

本发明与现有技术相比的优点在于：The advantage of the present invention compared with prior art is:

(1)本发明结合了机器学习和图像分割的优势，以图像分割方法修正机器学习获得的粗略分类标注结果,获得更为精准的几何区域之间的边界，从而在进行三维场景的构建中可以更好地避免由于边界不精确导致的场景模型失真。(1) The present invention combines the advantages of machine learning and image segmentation, corrects the rough classification and labeling results obtained by machine learning with the image segmentation method, and obtains more accurate boundaries between geometric regions, so that it can be used in the construction of three-dimensional scenes Better avoid scene model distortions due to inaccurate boundaries.

(2)在本发明获得的图像分类标注精确度和现有技术具有可比性的基础上，本发明提出的技术解决方法更加简单。现有的技术侧重于利用更多有效的图像特征和构建复杂的分类标注模型来达到更高的分类标注精确度。就图像特征而言，本发明仅使用了少量有效的图像特征。就分类标注模型而言，本发明仅采用单分类器模型。在获得具有可比性的分类标注精确度的基础上，相较于现有技术需要使用更多图像特征以及需要构建复杂的分类模型的而言，本发明提出的技术方案显得更为简单，复杂度低，易于实现。(2) On the basis that the accuracy of image classification and labeling obtained by the present invention is comparable to that of the prior art, the technical solution proposed by the present invention is simpler. Existing technologies focus on using more effective image features and building complex classification and labeling models to achieve higher classification and labeling accuracy. In terms of image features, the present invention only uses a small number of effective image features. As far as the classification labeling model is concerned, the present invention only uses a single classifier model. On the basis of obtaining comparable classification labeling accuracy, compared with the prior art that needs to use more image features and needs to build complex classification models, the technical solution proposed by the present invention is simpler and less complex. low and easy to implement.

附图说明Description of drawings

图1为本发明的技术方案的系统流程示意图；Fig. 1 is a system flow diagram of the technical solution of the present invention;

图2为本发明的实施例一中使用的训练图像集中的部分图像；Fig. 2 is a partial image in the training image set used in Embodiment 1 of the present invention;

图3为本发明技术方案中的基于图像分割算法来修正粗略分类标注结果的算法流程图；Fig. 3 is the algorithm flow chart of correcting the rough classification labeling result based on the image segmentation algorithm in the technical solution of the present invention;

图4为本发明技术方案涉及的GrabCut图像分割算法在不同初始化方法下的分割结果对比图；Fig. 4 is the comparison diagram of the segmentation results of the GrabCut image segmentation algorithm involved in the technical solution of the present invention under different initialization methods;

图5为本发明实施例一中的输入图像在本发明技术方案提出的基于粗略分类标注结果的“四步”GrabCut算法下获得的地面区域和竖立区域之间的精准边界；Fig. 5 is the precise boundary between the ground area and the vertical area obtained under the "four-step" GrabCut algorithm based on the rough classification and labeling results proposed by the technical solution of the present invention for the input image in Embodiment 1 of the present invention;

图6为本发明实施例一的输入图像的三维场景模型在不同视角下观察的结果图；FIG. 6 is a result diagram of observing the three-dimensional scene model of the input image under different viewing angles according to Embodiment 1 of the present invention;

图7是按照本发明的技术方案，在数据库1上进行分类标注获得的混淆矩阵；Fig. 7 is according to the technical scheme of the present invention, carries out the confusion matrix obtained by classifying and labeling on database 1;

图8是按照本发明的技术方案，在数据库2上以6折交叉验证的方式获得的分类标注结果的精确度和现有技术在该数据库上分类标注结果的对比；Fig. 8 is according to the technical scheme of the present invention, the comparison of the accuracy of the classification and labeling results obtained in the manner of 6-fold cross-validation on the database 2 and the classification and labeling results of the prior art on the database;

图9是按照本发明的技术方案，在数据库1和数据库2上采用支持向量机为分类器进行粗略分类标注获得的分类标注精确度和采用图像分割算法进行标注结果修正的精确度的对比。Fig. 9 is a comparison of classification and labeling accuracy obtained by using a support vector machine as a classifier for rough classification and labeling on database 1 and database 2 and the accuracy of labeling result correction using an image segmentation algorithm according to the technical solution of the present invention.

具体实施方式detailed description

下面结合本发明实施例中的附图，对本发明的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明的保护范围。The technical solution of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

本发明实施例所述的场景图像是针对室外场景图像，由于室外场景图像内容可以由三种类型的几何区域构成：竖立区域、地面区域和天空区域。一般情况下，室外场景图像内容可以由以上三种类型的几何区域组合构成，例如图2所示的几种常见的室外场景图像中，地面区域可以是草地、道路等，竖立区域可以是建筑、树木等，天空区域即为天空。由于本发明不仅仅要对图像内容进行精准的标注，还需要根据标注结果进行三维场景的构建，而三维场景构建中假设有参考地面的存在，所以适用于本发明技术方案进行三维场景构建的室外场景图像至少包括地面区域。如果仅用本发明技术方案进行图像内容的分类标注，则适用范围不受限于图像需要包括地面区域的假设。The scene image described in the embodiment of the present invention is aimed at the outdoor scene image, because the content of the outdoor scene image may be composed of three types of geometric areas: vertical area, ground area and sky area. In general, the content of an outdoor scene image can be composed of the above three types of geometric areas. For example, in several common outdoor scene images shown in Figure 2, the ground area can be grass, roads, etc., and the vertical area can be buildings, trees, etc., the sky area is the sky. Since the present invention not only needs to accurately label the image content, but also needs to construct a 3D scene according to the labeling results, and the existence of a reference ground is assumed in the construction of the 3D scene, so it is suitable for the outdoor environment where the technical solution of the present invention is used to construct the 3D scene. The scene image includes at least a ground area. If only the technical solution of the present invention is used to classify and mark the image content, the scope of application is not limited to the assumption that the image needs to include the ground area.

由于室外场景图像内容可以由三种类型的几何区域构成:竖立区域、地面区域和天空区域。不同的几何区域其图像特征具有一些可以区分性的特征，比如颜色，天空的颜色常见的可以有蓝色，草地的颜色一般是绿色。基于这些观察，本发明首先利用图像数据集来训练一个具有将图像内容按照三种几何区域进行划分判别的分类器，即通过训练得到的分类器能够把输入图像按照图像的局部特征将其划分成不同的几何子区域。本发明实施例中使用的图像特征包括：Dense SIFT(稠密的尺度不变特征转换，Dense Scale InvariantFeature Transform)特征、Bag of Visual Words(视觉词袋特征)，颜色特征(采用LUV或者RGB)，位置特征(采用归一化的高度值h)。训练的分类器采用支持向量机SVM(SupportVector Machine)，训练采用的基函数选为径向基函数，模型类别选为多类别的分类器，概率估计参数b设置为1，即训练得到的模型能够输出一个测试样本分别属于三种类别的概率，其余设置均选用默认参数。Since the outdoor scene image content can be composed of three types of geometric regions: vertical region, ground region and sky region. The image features of different geometric regions have some distinguishable features, such as color, the common color of the sky can be blue, and the color of grass is generally green. Based on these observations, the present invention first uses the image data set to train a classifier that can divide the image content according to three geometric regions, that is, the classifier obtained through training can divide the input image into three types according to the local features of the image. Different geometric subregions. The image features used in the embodiment of the present invention include: Dense SIFT (dense scale invariant feature transformation, Dense Scale InvariantFeature Transform) feature, Bag of Visual Words (visual word bag feature), color feature (using LUV or RGB), position feature (takes the normalized height value h). The trained classifier adopts Support Vector Machine SVM (SupportVector Machine), the basis function used in training is selected as radial basis function, the model category is selected as a multi-category classifier, and the probability estimation parameter b is set to 1, that is, the trained model can Output the probability that a test sample belongs to the three categories, and the rest of the settings use the default parameters.

上述采用机器学习训练的图像几何区域的分类器所使用的图像特征是局部特征，其对图像内容进行几何区域判别分类时，虽然可以获得有效的分类结果，但由于缺乏全局的约束会导致一些语义上的错分以及区域之间的边界不够精准等问题，所以利用GrabCut图像分割算法，进一步引入图像区域之间的约束，以优化和修正由分类器输出的粗略分类标注的结果，从而获得图像几何区域之间更为精准的边界。利用精准的边界信息，在进行三维场景的建模中可以避免由于边界不明确导致的失真，从而生成具有真实感的三维场景模型。The above-mentioned image features used by the classifier of the image geometric area trained by machine learning are local features. When it discriminates and classifies the geometric area of the image content, although effective classification results can be obtained, the lack of global constraints will lead to some semantic problems. Therefore, the GrabCut image segmentation algorithm is used to further introduce constraints between image regions to optimize and correct the results of rough classification and labeling output by the classifier, so as to obtain image geometry More precise boundaries between regions. Using accurate boundary information, the distortion caused by unclear boundaries can be avoided in the modeling of the 3D scene, thereby generating a realistic 3D scene model.

实施例一Embodiment one

图1为本发明实施例一提供的全自动的基于单幅图像的三维场景建模方法的系统流程图。实施例一的主要步骤包括：FIG. 1 is a system flowchart of a fully automatic single image-based 3D scene modeling method provided by Embodiment 1 of the present invention. The main steps of embodiment one include:

步骤1、利用训练图像集获得能够进行图像几何区域划分的分类器。Step 1. Use the training image set to obtain a classifier capable of dividing the image into geometric regions.

因为本发明实施例中图像几何区域划分的分类器是基于机器学习获得的，所以首先需要收集训练图像集，然后利用训练图像集获得一组训练样本，最后利用训练样本训练分类器。Because the classifier for image geometric region division in the embodiment of the present invention is obtained based on machine learning, it is first necessary to collect a training image set, then use the training image set to obtain a set of training samples, and finally use the training samples to train the classifier.

训练图像集的收集可以通过互联网搜索获得。由于室外场景图像的内容千变万化，收集的训练图像应当具有代表性，尽量多的涵盖各种可能的室外场景。附图2中展示了实施例一中使用的训练图像集中的部分图像，这些图像是几种常见的室外场景图像，它们至少包含了三种类别(地面、竖立和天空)中的一种。三种类别中的地面区域可以是草地、道路等，竖立区域可以是建筑、树木等，天空区域即为天空。当然，如果只是有针对特定室外场景的应用，训练图像集可以更具针对性，比如仅仅针对室外街景图像进行构建，则可以收集不同类别的街景图作为训练图像集。Collections of training image sets are available through internet searches. Since the content of outdoor scene images is ever-changing, the collected training images should be representative and cover as many possible outdoor scenes as possible. Figure 2 shows some images in the training image set used in Embodiment 1. These images are several common outdoor scene images, and they contain at least one of three categories (ground, vertical and sky). The ground area in the three categories can be grass, roads, etc., the vertical area can be buildings, trees, etc., and the sky area is the sky. Of course, if there is only an application for a specific outdoor scene, the training image set can be more targeted. For example, only for outdoor street view images, you can collect street view images of different categories as the training image set.

训练样本的是在训练图像集上获得，包括样本标注和样本提取。The training samples are obtained on the training image set, including sample annotation and sample extraction.

样本标注是指对训练图像集里面的每一幅图进行几何区域的标注，即把每一幅图像的整个区域划分成多个几何子区域，每个几何子区域应当归属为三种类别中的一种。这三种类别分别是：竖立区域、地面区域和天空区域。因为在本发明中采用有监督的方式训练分类器，所以样本标注需要人为手动进行标注。Sample labeling refers to labeling the geometric area of each image in the training image set, that is, dividing the entire area of each image into multiple geometric sub-areas, and each geometric sub-area should belong to one of the three categories A sort of. The three categories are: vertical area, ground area and sky area. Since the classifier is trained in a supervised manner in the present invention, the sample labeling needs to be manually marked.

经过样本标注后，需要提取真正由于训练的样本集。本发明的目的是尽可能精确的对图像区域进行几何子区域的划分，因此本发明的实施例中均以10*10的矩形块作为决策单元，以30*40的矩形块作为样本单元。本发明的实施例中把每一幅图像以10为间隔步长划分成一系列具有一定重叠区域的30*40的样本矩形块，以800*600的图像为例，可以获得58*77＝4466个样本矩形块。对于每个样本矩形块，提取1031维的样本特征，具体包括1000维的Bag of Visual Words特征，30维的颜色特征，1维的位置信息。After the samples are labeled, it is necessary to extract the real sample set due to training. The purpose of the present invention is to divide the image area into geometric sub-regions as accurately as possible. Therefore, in the embodiments of the present invention, a 10*10 rectangular block is used as a decision unit, and a 30*40 rectangular block is used as a sample unit. In the embodiment of the present invention, each image is divided into a series of 30*40 sample rectangular blocks with a certain overlapping area at intervals of 10. Taking an image of 800*600 as an example, 58*77=4466 pieces can be obtained Sample rectangular blocks. For each sample rectangular block, 1031-dimensional sample features are extracted, including 1000-dimensional Bag of Visual Words features, 30-dimensional color features, and 1-dimensional position information.

要提取1000维的Bag of Visual Words特征，首先需要预先提取每幅训练图像的Dense SIFT特征形成SIFT特征集，然后利用聚类算法对特征集进行聚类，获得SIFT特征的1000个聚类中心。本发明实施例中Dense SIFT特征提取采用的间隔步长为4，聚类算法采用K-means(K-均值)聚类算法。对于一幅训练图像中的每个30*40的样本矩形块，根据SIFT特征的聚类中心统计该矩形区域的SIFT特征词频直方图，形成1000维的Bag of VisualWords特征。本发明实施例中采用的颜色特征采用30维直方图特征，在LUV空间，每个通道统计10个维度的直方图特征。本发明实施例中采用的位置信息为1维度的相对高度信息，即每个样本矩形块在图像中的相对高度。对于每个样本矩形块提取1031维度的特征作为该样本的特征描述。对于每一幅训练图像可以获得一组训练样本，而所有训练图像的样本集形成最终的训练样本集。本发明实施例中仅使用纯的训练样本用于训练分类器，即训练样本所在的矩形区域均属于同一个类别的训练样本构成最终的训练样本集(最终的训练样本集包含三种类别的训练样本)。To extract 1000-dimensional Bag of Visual Words features, firstly, it is necessary to pre-extract Dense SIFT features of each training image to form a SIFT feature set, and then use a clustering algorithm to cluster the feature set to obtain 1000 cluster centers of SIFT features. In the embodiment of the present invention, the interval step used in Dense SIFT feature extraction is 4, and the clustering algorithm adopts K-means (K-means) clustering algorithm. For each 30*40 sample rectangular block in a training image, the SIFT feature word frequency histogram of the rectangular area is counted according to the cluster center of the SIFT feature to form a 1000-dimensional Bag of VisualWords feature. The color features used in the embodiment of the present invention adopt 30-dimensional histogram features, and in LUV space, each channel counts 10-dimensional histogram features. The position information used in the embodiment of the present invention is 1-dimensional relative height information, that is, the relative height of each sample rectangular block in the image. For each sample rectangular block, a 1031-dimensional feature is extracted as the feature description of the sample. A set of training samples can be obtained for each training image, and the sample sets of all training images form the final training sample set. In the embodiment of the present invention, only pure training samples are used to train the classifier, that is, the training samples in the rectangular area where the training samples are located all belong to the same category to form the final training sample set (the final training sample set includes three types of training samples). sample).

提取了训练样本集，本发明实例采用有监督的训练方式获得能够进行图像几何区域划分的分类器。具体地，分类器采用支持向量机SVM(Support Vector Machine)，基函数选为径向基函数，模型类别选为多类别的分类器，概率估计参数b设置为1，即训练得到的模型能够输出一个测试样本分别属于三种类别的概率。The training sample set is extracted, and the example of the present invention adopts a supervised training method to obtain a classifier capable of dividing image geometric regions. Specifically, the classifier adopts SVM (Support Vector Machine), the basis function is selected as the radial basis function, the model category is selected as a multi-category classifier, and the probability estimation parameter b is set to 1, that is, the trained model can output The probability that a test sample belongs to each of the three classes.

步骤2、对用户输入的图像，利用训练得到的分类器对其进行几何区域的划分，得到粗略分类标注的结果。Step 2. For the image input by the user, use the trained classifier to divide it into geometric regions, and obtain a rough classification and labeling result.

步骤2的目的是通过训练得到的分类器对输入的图像进行区域类别的粗略标注。输入一幅图像，先以10为间隔步长将图像区域划分成一系列具有一定重叠区域的30*40的样本矩形块，对于每一个样本矩形块提取1031维度的样本特征。对于每一个样本矩形块，分类器根据其1031维的样本特征，输出该样本分别属于三种类别的概率：p(v|P_i)、p(g|P_i)和p(s|P_i)，其中p(v|P_i)表示样本P_i属于竖立区域的概率，p(g|P_i)和p(s|P_i)分别表示样本P_i属于地面区域和天空区域的概率。The purpose of step 2 is to roughly label the region category of the input image through the trained classifier. Input an image, first divide the image area into a series of 30*40 sample rectangular blocks with a certain overlapping area with a step size of 10, and extract 1031-dimensional sample features for each sample rectangular block. For each sample rectangular block, the classifier outputs the probabilities that the sample belongs to three categories according to its 1031-dimensional sample features: p(v|P _i ), p(g|P _i ) and p(s|P _i ), where p(v|P _i ) represents the probability that the sample P _i belongs to the vertical region, and p(g|P _i ) and p(s|P _i ) represent the probability that the sample P _i belongs to the ground region and the sky region, respectively.

本发明的目的是尽可能精确的对图像区域进行子区域的划分，因此本发明的实施例中以10*10的矩形块作为决策单元(互相之间没有重叠)，在图像中每一个决策单元包含在多个样本矩形块中。本发明实施例中采用30*40的样本矩形块，采样间隔步骤是10，则在图像内部区域，每一个决策单元将包含于12个样本矩形块中。因此，对于每一个决策单元C_j其属于三种类别的概率可以由N个包含该决策单元的样本矩形块的类别共同决定。本发明实施例中，每一个决策单元C_j其属于三种类别的概率计算为：The purpose of the present invention is to divide the image area into sub-regions as accurately as possible, so in the embodiment of the present invention, a 10*10 rectangular block is used as a decision-making unit (without overlapping each other), and each decision-making unit in the image Contained in multiple sample rectangle blocks. In the embodiment of the present invention, a sample rectangular block of 30*40 is used, and the sampling interval step is 10, so in the internal area of the image, each decision-making unit will be included in 12 sample rectangular blocks. Therefore, for each decision-making unit _Cj , its probability of belonging to the three categories can be jointly determined by the categories of the N sample rectangular blocks containing the decision-making unit. In the embodiment of the present invention, the probability of each decision unit C _j belonging to the three categories is calculated as:

其中N表示包含决策单元C_j的样本矩形块的个数，P_i表示N个矩形块中的某个，从而获得决策单元C_j分别属于三种类别的概率大小。p(v|C_j)表示决策单元C_j属于竖立区域的概率，p(g|C_j)和p(s|C_j)分别表示决策单元C_j属于地面区域和天空区域的概率。Among them, N represents the number of sample rectangular blocks containing the decision-making unit C _j , and P _i represents one of the N rectangular blocks, so as to obtain the probabilities that the decision-making unit C _j belongs to the three categories. p(v|C _j ) represents the probability that the decision-making unit C _j belongs to the vertical area, and p(g|C _j ) and p(s|C _j ) represent the probability that the decision-making unit C _j belongs to the ground area and the sky area, respectively.

本发明实施例中，当且仅当决策单元C_j属于某种类别的概率p^*＞0.5时，才标注该决策单元为该类别，否则将其标注为未知类别。输入图像经过分类器(实施例中采用SVM)输出的分类标注结果比较粗略。一些错分主要发生在几何区域之间的边界处，区域内部也存在一些语义错分。为了修正粗略分类标注的结果获得几何区域之间精准的边界，从而利于真实感三维场景的建模，本发明提出一种基于图像分割的修正方法。In the embodiment of the present invention, if and only if the probability p ^* of a decision-making unit C _j belonging to a certain category is greater than 0.5, the decision-making unit is marked as the category, otherwise it is marked as the unknown category. The classification and labeling results output by the input image through the classifier (SVM is used in the embodiment) are relatively rough. Some misclassifications mainly occur at the boundaries between geometric regions, and there are also some semantic misclassifications inside regions. In order to correct the results of rough classification and labeling to obtain precise boundaries between geometric regions, thereby facilitating the modeling of realistic 3D scenes, the present invention proposes a correction method based on image segmentation.

步骤3、利用图像分割算法修正步骤2中获得的粗略分类标注结果，修正分类结果并优化图像几何区域之间的边界。Step 3. Use the image segmentation algorithm to correct the rough classification and labeling results obtained in step 2, correct the classification results and optimize the boundaries between the geometric regions of the image.

针对粗略标注的一些错分以及几何区域边界的不精准，本发明提出一种基于GrabCut图像分割算法的修正方法。GrabCut是一种有效的交互式的分离前景背景的图像分割算法。通过用户给定的一些初始标注信息来初始化前景背景的高斯混合模型。附图4中展示了不同的初始化方式下以竖立区域为前景的GrabCut图像分割算法的结果，图4中的(a)是仅用矩形框作为分割范围约束的GrabCut图像分割算法获得的竖立区域的分割结果，图4中的(b)和(c)是在矩形框约束的基础上，由用户交互标注了前景信息(竖立区域的线条)和背景信息(天空和地面区域的线条)作为GrabCut的分割输入的分割结果，图4中的(b)和(c)的区别在于(c)的用户交互更多，标注了更多前景和背景信息。图4中的(d)是本发明提出的基于图像粗略分类标注结果的全自动的GrabCut图像分割算法获得的竖立区域的分割结果。由于GrabCut图像分割算法需要用户交互，而本发明的旨在建立一个基于单幅图像的全自动的三维场景构建系统，所以不能直接利用GrabCut算法。注意到步骤2中产生的粗略分类结果，虽然存在一些错分，但是图像中仍然存在大部分标注正确的区域。因此，本发明提出利用粗略分类标注结果中“可信的”的区域作为GrabCut的初始输入。“可信”区域在这里定义为具有较大可能性属于某种类别的像素的集合，即属于某种类别的概率大于0.5且在属于该类别的所有像素集合中属于概率较大的前90％。在本发明实施例中，对于每一个类别都计算相应的“可信”区域。以三种类别中的某类别区域为例，“可信”区域的计算方法如下：Aiming at some misclassifications of rough labels and inaccurate boundaries of geometric regions, the present invention proposes a correction method based on the GrabCut image segmentation algorithm. GrabCut is an effective and interactive image segmentation algorithm for separating foreground and background. The Gaussian mixture model of the foreground and background is initialized by some initial annotation information given by the user. Figure 4 shows the results of the GrabCut image segmentation algorithm with the vertical area as the foreground under different initialization methods. (a) in Figure 4 is the vertical area obtained by the GrabCut image segmentation algorithm that only uses a rectangular frame as the segmentation range constraint. Segmentation results, (b) and (c) in Figure 4 are based on the constraints of the rectangular frame, and the foreground information (lines of the vertical area) and background information (lines of the sky and the ground area) are interactively marked by the user as GrabCut The segmentation results of the segmented input, the difference between (b) and (c) in Figure 4 is that (c) has more user interaction, and more foreground and background information is marked. (d) in FIG. 4 is the segmentation result of the vertical region obtained by the fully automatic GrabCut image segmentation algorithm based on the rough image classification and labeling results proposed by the present invention. Since the GrabCut image segmentation algorithm requires user interaction, and the present invention aims to establish a fully automatic 3D scene construction system based on a single image, the GrabCut algorithm cannot be directly used. Notice the rough classification results produced in step 2. Although there are some misclassifications, there are still most of the correctly labeled regions in the image. Therefore, the present invention proposes to use the "credible" region in the rough classification labeling result as the initial input of GrabCut. A "credible" region is defined here as a set of pixels with a high probability of belonging to a certain class, that is, the probability of belonging to a certain class is greater than 0.5 and belongs to the top 90% of all pixels belonging to the class . In an embodiment of the present invention, for each category a corresponding "trusted" region is calculated. Taking an area of one of the three categories as an example, the calculation method of the "trusted" area is as follows:

粗略标注结果中属于某类别区域的所有像素集合记为将该集合里面的像素按照属于该类别区域的概率大小降序排列。经过降序排列后，移除集合后k％的像素获得新的集合P_*。即集合中概率较小的后k％的像素视为“不可靠”像素并给予移除。 In the rough labeling result, all the pixel sets belonging to a certain category area are recorded as The pixels in the set are arranged in descending order according to the probability of belonging to the category area. After sorting in descending order, remove the collection The last k% of pixels get a new set P _* . the collection The bottom k% pixels with low probability are regarded as "unreliable" pixels and removed.

产生一个与P_*对应的二值模板图像M_*。M_*和原图大小一样，凡属于集合P_*中的像素，其在模板图像M_*的对应像素位置值为1，否则值为0。 Generate a binary template image M _{* corresponding to P*} _. M _* is the same size as the original image, and any pixel belonging to the set P _* has a value of 1 at the corresponding pixel position of the template image M _* , otherwise the value is 0.

检测模板图像M_*中的连通区域，对于连通区域内部存在的面积小于A的0值区域，以1值覆盖填充。 Detect the connected region in the template image M _* , and cover and fill with 1 value for the 0-valued region with an area smaller than A in the connected region.

以大小为β的结构元素腐蚀模板图像M_*。对于被腐蚀的像素视为“可能”属于该类别区域的像素，其集合记为。经过腐蚀后模板图像M_*中值为1的像素，视为该类别区域的“可信”像素，其集合记为。 The template image M _* is etched with a structuring element of size β. For the pixels that are regarded as "probably" belonging to the category area by the eroded pixels, the set is denoted as . After erosion, the pixels with a value of 1 in the template image M _* are regarded as "credible" pixels of the category area, and their collection is denoted as .

经过以上4步，可以获得某类别区域的“可信”像素集和“可能”属于该类别区域的像素集。采用上述方法，针对地面区域、竖立区域和天空区域可以分别获得“可信”像素集和以及“可能”像素集和本发明实施例中，对于竖立区域，(k,A,β)取(10,5000,20)，针对地面区域和天空区域，(k,A,β)取(0,5000,10)。After the above 4 steps, the "credible" pixel set of a certain category area can be obtained and the set of pixels that "likely" belong to the class region . Using the above method, "credible" pixel sets can be obtained for the ground area, vertical area and sky area respectively with and the "maybe" pixel set with In the embodiment of the present invention, (k, A, β) takes (10, 5000, 20) for the vertical area, and (k, A, β) takes (0, 5000, 10) for the ground area and the sky area.

由于GrabCut是针对前景背景分离的交互式的二值分割算法，而本发明实施例中涉及三种类别的区域：竖立、地面和天空。因此，本发明技术方案提出一种基于粗略分类标注结果的“四步”GrabCut算法进行全自动地优化粗略标注结果。Since GrabCut is an interactive binary segmentation algorithm aiming at separation of foreground and background, three types of regions are involved in the embodiment of the present invention: vertical, ground and sky. Therefore, the technical solution of the present invention proposes a "four-step" GrabCut algorithm based on rough classification and labeling results to automatically optimize the rough labeling results.

当获取了和之后，本发明实施例中先对每个类别分别进行单独分割。对三类中某个类别的单独分割，其计算方法为：将该类别区域视为前景，另外两个类别区域视为背景。具体地，将该类别中“可信”的像素视为前景像素，另外两个类别的“可信”像素视为背景像素，并将该类别中的“可能”像素视为可能的前景，而剩下的其他所有像素均看作可能的背景；利用上述信息初始化GrabCut分割算法，分别建立前景和背景的混合高斯模型，经过分割后可以获得以某类别区域为前景的单独分割结果。when acquired with Afterwards, in the embodiment of the present invention, each category is segmented separately. An individual segmentation of one of the three classes is computed by treating the region of that class as the foreground and the regions of the other two classes as the background. Specifically, "trustworthy" pixels in this category are regarded as foreground pixels, "trustworthy" pixels in the other two categories are regarded as background pixels, and "likely" pixels in this category are regarded as possible foreground pixels, while All other remaining pixels are regarded as possible backgrounds; the above information is used to initialize the GrabCut segmentation algorithm, and the mixed Gaussian models of the foreground and background are respectively established. After segmentation, a separate segmentation result with a certain type of area as the foreground can be obtained.

单独分割的结果可以修正粗略分类标注中不少的错分并且区域之间的边界更准确，但是竖立区域和地面区域，以竖立区域和天空区域之间仍然存在一些错分，为了进一步优化标注结果，本发明技术方案提出的“四步”GrabCut算法的第四步，以竖立区域为前景区域，地面和天空区域视为背景区域。在三步单独分割结果的基础上，按照单独分割竖立区域的方法，再次以竖立区域为前景进行前景背景的分割。图3中描述了利用GrabCut图像分割算法修正图像区域粗略分类标注结果的算法流程。相较于粗略分类标注结果而言，经过GrabCut图像分割算法后的分类标注结果，修正了粗略分类标注的不少错分并且几何区域之间的边界更加精确。利用天空和地面单独分割的结果可以大致估计出地平线的位置，从而可以进行几何校正，即利用地平线将最终的图像分割结果中的背景区域划分成天空和地面区域，其方法为：位于地平线之上的背景区域标注为天空区域，位于地平线之下的背景区域标注为地面区域The result of separate segmentation can correct many misclassifications in the rough classification labeling and the boundaries between regions are more accurate, but there are still some misclassifications between the vertical area and the ground area, and between the vertical area and the sky area. In order to further optimize the labeling results , the fourth step of the "four-step" GrabCut algorithm proposed by the technical solution of the present invention, the foreground area is taken as the vertical area, and the ground and sky areas are regarded as the background area. On the basis of the three-step separate segmentation results, according to the method of separately segmenting the vertical area, the foreground and background are segmented again with the vertical area as the foreground. Figure 3 describes the algorithm flow of using the GrabCut image segmentation algorithm to correct the results of rough classification and labeling of image regions. Compared with the rough classification and labeling results, the classification and labeling results after the GrabCut image segmentation algorithm have corrected many misclassifications of the rough classification and labeling and the boundaries between geometric regions are more precise. The position of the horizon can be roughly estimated by using the results of the separate segmentation of the sky and the ground, so that geometric correction can be performed, that is, the background area in the final image segmentation result is divided into sky and ground areas by using the horizon, and the method is: above the horizon The background area of , is marked as the sky area, and the background area below the horizon is marked as the ground area

图4中展示了GrabCut分割算法在不同初始化条件下的分割结果。对于较复杂的背景，GrabCut算法需要不少的用户交互才能获得较好的分割结果。而本发明技术方案提出的基于粗略分类标注结果的“四步”GrabCut分割算法在全自动的情况下可以获得不错的分割结果。Figure 4 shows the segmentation results of the GrabCut segmentation algorithm under different initialization conditions. For more complex backgrounds, the GrabCut algorithm requires a lot of user interaction to obtain better segmentation results. However, the "four-step" GrabCut segmentation algorithm based on the rough classification and labeling results proposed by the technical solution of the present invention can obtain good segmentation results under fully automatic conditions.

步骤3获得的图像几何区域的分类标注结果提供了几何区域之间精准的边界，如图5所示，曲线ABCDEF(白色线条)为地面区域和竖立区域之间的边界，其很好的区分开了竖立物体和地面。虽然仅仅由以上步骤获得的结果(几何标注区域、地平线的位置和区域边界)无法精确的恢复出三维场景模型，但仍然可以通过已有的信息，在合理的假设下对场景进行建模，提供用户具有真实感的三维场景漫游。The classification and labeling results of the geometric regions of the image obtained in step 3 provide precise boundaries between geometric regions, as shown in Figure 5, the curve ABCDEF (white line) is the boundary between the ground region and the vertical region, which is well distinguished upright objects and the ground. Although the results obtained by the above steps only (geometric labeling area, horizon position and area boundary) cannot accurately restore the 3D scene model, the scene can still be modeled under reasonable assumptions based on the existing information, providing The user roams through a realistic 3D scene.

本发明实施例中，使用针孔相机模型，光轴通过图像中心，同时假设世界坐标系和相机坐标系重合，相机视野设置为1.431rad。由于模型中参考地面的高度影响着场景模型的尺度，在本发明实施例中地平面的高度设置为-5。由以上条件可以获得投影矩阵，在地面高度确定的情况下，通过反投影，可以计算出图像中地面区域的每个像素所对应的在三维场景中的三维坐标。由于步骤3提供了地面和竖立区域的精确边界，则通过反投影，这些边界点所对应的三维坐标可以计算得到。为了获得竖立区域的三维坐标，本发明实施例中先用Douglas-Peucker算法对地面和竖立区域的边界用多边形近似获得边界的拟合多边形。拟合多边形上的每一条折线，可以看作是某个竖直平面和地面的交线。每一条折线对应一个竖直平面，每个竖直平面的上边界由标注结果中竖立区域和天空区域的边界确定。从而可以获得场景的几何模型，通过纹理映射可以获得具有真实感的三维场景。用户可以变换相机的视角、观察位置和调节焦距等操作进行场景漫游。附图6展示了本发明实施例一的输入图像的三维场景模型在不同视角下观察的结果图，图6中的(a)(b)(c)分别表示视角1、视角2和视角3下观察场景模型的结果图。In the embodiment of the present invention, a pinhole camera model is used, the optical axis passes through the center of the image, and it is assumed that the world coordinate system and the camera coordinate system coincide, and the camera field of view is set to 1.431 rad. Since the height of the reference ground in the model affects the scale of the scene model, the height of the ground plane is set to -5 in the embodiment of the present invention. The projection matrix can be obtained from the above conditions. When the ground height is determined, the 3D coordinates in the 3D scene corresponding to each pixel of the ground area in the image can be calculated through back projection. Since step 3 provides precise boundaries of the ground and vertical areas, the three-dimensional coordinates corresponding to these boundary points can be calculated through back projection. In order to obtain the three-dimensional coordinates of the erected area, the Douglas-Peucker algorithm is first used in the embodiment of the present invention to approximate the boundary between the ground and the erected area with a polygon to obtain a fitted polygon of the boundary. Each polyline on the fitted polygon can be regarded as the intersection line between a certain vertical plane and the ground. Each polyline corresponds to a vertical plane, and the upper boundary of each vertical plane is determined by the boundary of the vertical area and the sky area in the labeling result. Therefore, the geometric model of the scene can be obtained, and a realistic three-dimensional scene can be obtained through texture mapping. Users can change the camera's angle of view, observe the position, adjust the focal length and other operations to roam the scene. Accompanying drawing 6 shows the results of observing the 3D scene model of the input image in Embodiment 1 of the present invention under different viewing angles. (a)(b)(c) in FIG. Observe the resulting plot of the scene model.

按照本发明提出的技术方案，分别在两个公认的用于测试分类标注精确度的数据库Popup数据库(Derek Hoiem,Alexei A.Efros,and Martial Hebert,“Automatic photopop-up,”in ACM Transactions on Graphics(TOG).ACM,2005,vol.24,pp.577–584.)，简称“数据库1”和Geometric context数据库(Derek Hoiem,Alexei A.Efros,and MartialHebert,“Geometric context from a single image,”in International Conference ofComputer Vision(ICCV).2005,vol.1,pp.654–661.)，简称“数据库2”上进行评测本发明技术方案的有效性。数据库1包含144张图像，其中82张训练图像和62张测试图像。数据库2包含300张图像，以50张图像为一份，分成了6份。数据库2的标准测试方法采用6折交叉验证：测试时轮流将其中1份作为训练图像集，另外5份为测试图像集。附图7是在按照本发明的技术方案，在数据库1上以82张训练图像训练获得粗略标注的分类器，并对62张测试图像进行分类标注获得的混淆矩阵。与该混淆矩阵相对应的本发明的分类标注精确度为92％，即在测试图像集上，92％的图像像素被正确分类标注，而该数据库的分类标注精确度的基准线为87％。图8是按照本发明的技术方案在数据库2上以6折交叉验证的方式获得的分类标注结果的精确度和现有技术在该数据库上分类标注结果的对比。数据显示，在标准测试数据库2上，分类标注精确度的基准线为86.0％,目前最好的分类标注结果为88.9％，而本发明的分类方法获得的分类标注结果的精确度为88.7％。结果表明，本发明的分类方法可以获得和现有技术具有可比性的分类标注精确度。需要注意的是：就图像特征而言，本发明仅使用了少量有效的图像特征；就分类标注模型而言，本发明仅采用单分类器模型。因此在获得和现有技术具有可比性的分类标注精确度的基础上，相较于现有技术需要使用更多图像特征以及需要构建复杂的分类模型的而言，本发明提出的技术方案显得更为简单，复杂度低，易于实现。图9是按照本发明的技术方案在数据库1和数据库2上采用支持向量机为分类器进行粗略标注获得的分类标注精确度和采用图像分割算法进行标注结果修正的精确度的对比。数据显示，在数据库1和数据库2上标注结果修正后的精确度比粗略标注的精确度分别提高4.6％和3.5％。结果表明，通过本发明提出的利用图像分割算法对粗略标注结果进行修正的方法可以有效地提高分类标注的精确度。According to the technical scheme proposed by the present invention, two recognized database Popup databases (Derek Hoiem, Alexei A. Efros, and Martial Hebert, "Automatic photopop-up," in ACM Transactions on Graphics (TOG).ACM,2005,vol.24,pp.577–584.), referred to as "database 1" and Geometric context database (Derek Hoiem, Alexei A. Efros, and MartialHebert, "Geometric context from a single image," in International Conference of Computer Vision (ICCV).2005, vol.1, pp.654-661.), referred to as "database 2" to evaluate the effectiveness of the technical solution of the present invention. Database 1 contains 144 images, including 82 training images and 62 testing images. Database 2 contains 300 images, divided into 6 parts with 50 images as a part. The standard test method of database 2 adopts 6-fold cross-validation: during the test, one of them is used as the training image set in turn, and the other 5 are used as the test image set. Accompanying drawing 7 is according to the technical scheme of the present invention, trains on the database 1 with 82 training images and obtains the roughly labeled classifier, and carries out the confusion matrix obtained by classifying and labeling 62 test images. The classification labeling accuracy of the present invention corresponding to the confusion matrix is 92%, that is, on the test image set, 92% of the image pixels are correctly classified and labeled, while the baseline of the classification labeling accuracy of the database is 87%. Fig. 8 is a comparison between the accuracy of the classification and labeling results obtained by means of 6-fold cross-validation on the database 2 according to the technical solution of the present invention and the classification and labeling results of the prior art on the database. The data shows that on the standard test database 2, the baseline of classification and labeling accuracy is 86.0%, the best classification and labeling result is 88.9% at present, and the accuracy of the classification and labeling result obtained by the classification method of the present invention is 88.7%. The results show that the classification method of the present invention can obtain classification labeling accuracy comparable to that of the prior art. It should be noted that: in terms of image features, the present invention only uses a small amount of effective image features; in terms of classification and labeling models, the present invention only uses a single classifier model. Therefore, on the basis of obtaining classification and labeling accuracy comparable to that of the prior art, the technical solution proposed by the present invention appears to be more efficient than the prior art that requires the use of more image features and the construction of complex classification models. For simplicity, low complexity, and easy implementation. Fig. 9 is a comparison of classification labeling accuracy obtained by roughly labeling classifiers using support vector machines on database 1 and database 2 and accuracy of labeling result correction using image segmentation algorithms according to the technical solution of the present invention. The data shows that the corrected accuracy of annotation results on database 1 and database 2 is 4.6% and 3.5% higher than the accuracy of rough annotation, respectively. The results show that the accuracy of classification and labeling can be effectively improved by using the image segmentation algorithm proposed by the present invention to correct the rough labeling results.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例可以通过软件实现，也可以借助软件加必要的通用硬件平台的方式来实现。基于这样的理解，上述实施例的技术方案可以以软件产品的形式体现出来，该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM，U盘，移动硬盘等)中，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行本发明实施例所述的方法。Through the above description of the implementation manners, those skilled in the art can clearly understand that the above embodiments can be implemented by software, or by means of software plus a necessary general hardware platform. Based on this understanding, the technical solutions of the above-mentioned embodiments can be embodied in the form of software products, which can be stored in a non-volatile storage medium (which can be CD-ROM, U disk, mobile hard disk, etc.), including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute the method described in the embodiment of the present invention.

以上所述，仅为本发明较佳的具体实施方式，但本发明的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本发明披露的技术范围内，可轻易想到的变化或替换，都应涵盖在本发明的保护范围之内。因此，本发明的保护范围应该以权利要求书的保护范围为准。The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person familiar with the technical field can easily conceive of changes or changes within the technical scope disclosed in the present invention. Replacement should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the protection scope of the claims.

Claims

1. a kind of full automatic three-dimensional scene construction method based on single image, it is characterised in that comprise the following steps：

Step 1：The grader of image geometry region division can be carried out by being obtained using training image collection；

The grader of image geometry region division is obtained based on machine learning, it is necessary first to collect training image collection, then One group of training sample is obtained using training image collection, finally grader is trained using training sample；The training sample be Obtained in training image collection, including sample mark and sample extraction；

The sample mark refers to the mark that geometric areas is carried out to each width figure inside training image collection, i.e., each width figure The whole region of picture is divided into multiple geometry subregions, and each geometry subregion should be attributed to one kind in three kinds of classifications, this Three kinds of classifications are respectively：Erect region, ground region and sky areas；

, it is necessary to extract the real sample set for being used to train after sample is marked；Image-region is entered in order to reported as precisely as possible The division of row geometry subregion, using 30*40 rectangular block as sample unit, divides every piece image for interval steps with 10 Into the sample rectangular block of 30*40 with certain overlapping region a series of；For each sample rectangular block, the sample of 1031 dimensions is extracted Eigen；One group of training sample, i.e., one training sample set, and the instruction of all training images are obtained for each width training image Practice sample set and form final training sample set；

Training sample set is extracted, the classification of image geometry region division can be carried out by being obtained using the training method for having supervision Device, i.e., using support vector machines (Support Vector Machine) grader, train obtained model to export one Individual test sample is belonging respectively to the other probability of three species；

Step 2：The image that the grader obtained using training is inputted to user carries out the division of geometric areas, obtains rude classification The result of mark；

Piece image is inputted, is first that image-region is divided into a series of 30* with certain overlapping region by interval steps with 10 40 sample rectangular block, the sample characteristics of 1031 dimensions are extracted for each sample rectangular block；For each sample rectangle Block, grader exports the sample and is belonging respectively to the other probability of three species according to the sample characteristics of its 1031 dimension：p(v|P_i)、p(g| P_i) and p (s | P_i), wherein p (v | P_i) represent sample P_iBelong to erect region probability, p (g | P_i) and p (s | P_i) sample is represented respectively This P_iBelong to ground region and the probability of sky areas；

For each decision package C_jIt belongs to the other probability of three species by N number of sample rectangular block comprising the decision package Classification is together decided on, each decision package C_jIt belongs to the other probability calculation of three species：

Wherein N represents to include decision package C_jSample rectangular block number, P_iSome in N number of rectangular block is represented, so as to obtain Decision package C_jIt is belonging respectively to the other probability size of three species；p(v|C_j) represent decision package C_jBelong to the probability for erectting region, p (g|C_j) and p (s | C_j) decision package C is represented respectively_jBelong to ground region and the probability of sky areas；

And if only if decision package C_jBelong to the Probability p of certain classification^*>When 0.5, just mark the decision package for this it is described certain Classification, otherwise labels it as unknown classification；

Step 3：Using based on the rude classification annotation results obtained in GrabCut image segmentation algorithm amendment steps 2, and optimize Border between image geometry region, obtains between image geometry region accurately border；

During using based on GrabCut image segmentation algorithms, the region of " credible " is used as the first of GrabCut using in rude classification result Begin to input and fully automatically optimized rough annotation results；" credible " region is to belong to certain classification with larger possibility Pixel set, that is, the probability for belonging to certain classification is more than and 0.5 and belongs to general in all pixels set for belonging to the category Rate it is larger preceding 90%；" credible " region accordingly is calculated for each classification, is obtained for certain species in image-region Other P_*" credible " region；Based on rude classification mark " credible " region, using the output of GrabCut image segmentation algorithms come The result of rude classification mark is corrected, to obtain in image between geometric areas accurately border；

Step 4, the annotation results exported for step 3, carry out the modeling of three-dimensional scenic using the method for computer graphics, carry For the realistic three-dimensional scenic roaming of user；

According to accurately boundary information between geometric areas in image, image-region is cut into different geometric areas；Setting Determine on the basis of camera parameter, relative depth information is introduced by reference to ground, so as to recover geometric areas in image scene Important summit three-dimensional coordinate；It is final to utilize each geometry subregion of plane approximation, and regional according to geometrical relationship It is placed in three-dimensional scenic, so as to generate realistic three-dimensional scenic roaming；

The step 3 is implemented as：

(11) for certain classification P in image-region_*The computational methods in " credible " region be：

To belonging to category P in rough annotation results_*All pixels according to they belong to the category probability size descending arrange Row, remove the less pixel of probability, and its percentage is k%；

Produce one and P_*Corresponding two-value template image M_*, M_*It is all to belong to set P as artwork size_*In pixel, its In template image M_*Respective pixel positional value be 1, otherwise value be 0；

Detection template image M_*In connected region, for connected region inside exist area be less than A 0 value region, with 1 Value covering filling；

Structural element Erodent Algorithm image M by β of size_*, the picture of the category may be belonged to by being considered as the pixel being corroded Element, its set is designated asThrough excessive erosion rear pattern plate image M_*Intermediate value is 1 pixel, is considered as " credible " pixel of the category, and it collects Conjunction is designated as

Other " credible " set of pixels of three species is obtained according to the method in described calculating " credible " region respectivelyWithAnd " possibility " set of pixelsWithCalculating parameter is respectively：For erectting region, k, A, β takes 10,5000,20 respectively, for Ground region and sky areas, k, A, β take 0,5000,10 respectively；

(12) GrabCut algorithms are fully automatically optimized being embodied as rough annotation results：

Each classification is individually split respectively according to described " credible " set of pixels and " possibility " set of pixels, to certain in three classes The independent segmentation of individual classification, computational methods are：Category region is considered as prospect, two other category regions is considered as background, i.e., The pixel of " credible " in the category is considered as foreground pixel, " credible " pixel of two other classification is considered as background pixel, and will " possibility " pixel in the category is considered as possible prospect, and remaining every other pixel regards possible background as；Utilize Above- mentioned information initializes GrabCut partitioning algorithms, and the mixed Gauss model of foreground and background is set up respectively, can be with after over-segmentation Obtain the independent segmentation result by prospect of certain category regions；

Annotation results are further optimized according to the result of described independent segmentation, method is：In the base of the independent segmentation result of three steps On plinth, the method for erectting region according to independent segmentation, again to erect separation of the region as prospect progress prospect background so as to obtain Obtain image segmentation result finally；

The result individually split according to sky and ground can substantially estimate horizontal position, will be final using horizon Background area in image segmentation result is divided into sky and ground region, and its method is：Background area on horizon Domain is labeled as sky areas, and the background area under horizon is labeled as ground region；

For step 3 export annotation results, using computer graphics method carry out three-dimensional scenic modeling there is provided user Realistic three-dimensional scenic roaming, including：

The annotation results exported according to the step 3, obtain the accurate border between image geometry region, use Douglas- Fitted polygon of the Peucker algorithms to ground and the border polygonal approximation acquisition border for erectting region；

The modeling of three-dimensional scenic is carried out using the method for computer graphics, including：

(21) to described scene modeling, using pinhole camera model, optical axis is sat by picture centre, world coordinate system and camera Mark system overlaps, and camera fields of view is set to 1.431rad；

(22) reference horizontal plane of manufacturing is utilized, the three-dimensional coordinate on important summit in scene is obtained, method is：Introduce reference horizontal plane of manufacturing, ground The height of plane is set to -5；Projection matrix is obtained according to above-mentioned modeling information, under conditions of ground level determination, by anti- Projection, calculates the three-dimensional coordinate in three-dimensional scenic corresponding to each pixel of ground region in image, obtains ground area Domain and the three-dimensional coordinate for erectting zone boundary point；

(23) according to described ground region and the three-dimensional coordinate and ground region and setting regional edge of setting zone boundary point The fitted polygon on boundary, obtains a series of perpendiculars, and method is：By ground region and the fitted polygon of setting zone boundary On each broken line be considered as some perpendicular and ground region intersection, the coboundary of each perpendicular is by described The border that region and sky areas are erect in image labeling result is determined；

(24) to described perpendicular and ground region, realistic three-dimensional scene models are obtained using texture mapping； The scene walkthrough of the sense of reality includes：The visual angle of camera is converted, focus and converts observation position observation model of place.

2. according to the method described in claim 1, it is characterised in that：In the step 1, the sample characteristics of 1031 dimensions include： The position feature of the Bag of Visual Words features of 1000 dimensions, the color characteristic of 30 dimensions and 1 dimension.

3. according to the method described in claim 1, it is characterised in that：In the step 1, the basic function in SVM classifier is elected as RBF, model classification elects multi-class grader as, and probability Estimation parameter b is set to 1, that is, trains obtained grader A test sample can be exported and be belonging respectively to the other probability of three species.