CN110599537A

CN110599537A - Mask R-CNN-based unmanned aerial vehicle image building area calculation method and system

Info

Publication number: CN110599537A
Application number: CN201910677541.0A
Authority: CN
Inventors: 陈珺; 王干北; 龚文平; 罗林波; 程展; 王永涛
Original assignee: China University of Geosciences Wuhan
Current assignee: China University of Geosciences Wuhan
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2019-12-20

Abstract

The present invention provides a method and system for calculating the building area of UAV images based on Mask R-CNN. First, the UAV is used to collect multiple images of pre-selected areas, and the size of each image is consistent; Contains images of buildings; mark the buildings in the remaining images, and use the marked images as a training image set; prepare a complete satellite image dataset; use the Mask R-CNN algorithm, first use the satellite image data The training image set is used to pre-train the initial model to obtain an initial model; the training image set is used to train the initial model to obtain the final segmentation model; the drone is used to collect images of the area to be tested, stitch them into a panorama, perform down-sampling processing, and then crop into Small image; use the segmentation model to process the small image; count the number of pixels contained in each marked building; calculate the area of each building in the image according to the actual situation.

Description

Method and system for calculating building area in UAV images based on Mask R-CNN

技术领域technical field

本发明属于地理信息科学领域，具体涉及基于Mask R-CNN的无人机图像建筑物面积计算方法及系统。The invention belongs to the field of geographic information science, and in particular relates to a method and system for calculating the area of a building in an unmanned aerial vehicle image based on Mask R-CNN.

背景技术Background technique

近年来，随着气候的变化，地质灾害发生的频率越来越频繁。特别是滑坡，泥石流，洪水等都威胁到人的生命和财产安全，因此必须事先采取预防措施。在一些容易发生地质灾害的地区，比如滑坡带，进行财产评估，然后，根据财产价值采取相应的措施是非常有必要的。财产价值通常根据房屋面积进行评估。传统方法通常是手动操作，耗时相当长，效率低。近年来，遥感技术发展迅速，因此利用遥感图像提取建筑物已成为一种趋势。尽管遥感技术取得了很大突破，但是遥感图像分辨率依然过低，使用遥感图像提取建筑物并计算面积将带来很大的误差，这可能会对房屋价值评估产生不利影响。近年来，无人机技术发展迅速，特别是在航程和载荷上有了很大的进步。使用无人机图像进行财产评估有两个原因。首先，通过配备高清摄像头，可以轻松获得分辨率达到厘米级别的高分辨率图像；第二，在一些容易发生地质灾害的地区，地形复杂且非常危险，而使用无人机却可以在这些地区对指定区域进行拍摄。因此，本发明提出了一种基于无人机航拍图像并自动计算建筑物面积的新方法。In recent years, with climate change, geological disasters have occurred more and more frequently. Especially landslides, mudslides, floods, etc. all threaten the safety of human life and property, so preventive measures must be taken in advance. In some areas prone to geological disasters, such as landslide zones, it is very necessary to conduct property assessment, and then take corresponding measures according to the property value. Property values are usually assessed based on the square footage of the home. The traditional method is usually manual operation, which takes a long time and is inefficient. In recent years, remote sensing technology has developed rapidly, so it has become a trend to use remote sensing images to extract buildings. Although remote sensing technology has made great breakthroughs, the resolution of remote sensing images is still too low. Using remote sensing images to extract buildings and calculate the area will bring large errors, which may have a negative impact on house value evaluation. In recent years, UAV technology has developed rapidly, especially in terms of range and payload. There are two reasons to use drone imagery for property appraisals. First of all, by being equipped with a high-definition camera, it is easy to obtain high-resolution images with a resolution of centimeters; second, in some areas prone to geological disasters, the terrain is complex and very dangerous, but the use of drones can be used in these areas. Specify an area to shoot. Therefore, the present invention proposes a new method for automatically calculating the area of buildings based on aerial images taken by drones.

在传统的图像分割算法中，分水岭算法是一种较为流行的方法，在分割过程中，它把相邻像素之间的相似性作为重要的参考依据，从而获得具有相似空间位置和相似灰度值的像素，进而形成相互连接的封闭轮廓。分水岭算法的常用操作步骤为：彩色图像灰度化，然后获得梯度图，最后，在梯度图的基础上进行分水岭算法，得到分割图像的边缘线。在分水岭算法中，对梯度图像进行阈值处理，选择适当的阈值对最终的分割图像有很大的影响。因此，阈值的选择是图像分割效果的关键。此外，传统的分割算法还包括聚类法和边缘检测法等。In the traditional image segmentation algorithm, the watershed algorithm is a more popular method. In the segmentation process, it takes the similarity between adjacent pixels as an important reference, so as to obtain images with similar spatial positions and similar gray values. pixels, and then form a closed contour connected with each other. The common operation steps of the watershed algorithm are: grayscale the color image, then obtain the gradient map, and finally, perform the watershed algorithm on the basis of the gradient map to obtain the edge line of the segmented image. In the watershed algorithm, the gradient image is thresholded, and the selection of an appropriate threshold has a great impact on the final segmented image. Therefore, the choice of threshold is the key to the effect of image segmentation. In addition, traditional segmentation algorithms also include clustering and edge detection.

近年来，深度卷积神经网络(DCNNs)在计算机视觉应用中变得非常流行。DCNNs通常包含许多卷积层，可以从训练数据中学习深层特征。因此，深度卷积神经网络被引入到分割任务中，并且取得了良好的效果。然而，这些方法仍然有限制。在训练数据较少的的情况下很难实现良好的结果。作为一种有监督学习方法，基于深度神经网络进行图像分割通常需要大量的训练数据，但是有些情况下样本数量是较少的。遥感图像与无人机图像有较大的相似性，而前者可以在一些开源网站上获得，因此，借助于迁移学习的理念，可以使用大量遥感图像预训练网络，之后利用少量的无人机航拍图像进行微调以达到不错的效果。In recent years, deep convolutional neural networks (DCNNs) have become very popular in computer vision applications. DCNNs usually contain many convolutional layers, which can learn deep features from training data. Therefore, deep convolutional neural networks are introduced into the segmentation task and have achieved good results. However, these methods still have limitations. It is difficult to achieve good results with less training data. As a supervised learning method, image segmentation based on deep neural network usually requires a large amount of training data, but in some cases the number of samples is small. Remote sensing images are quite similar to UAV images, and the former can be obtained on some open source websites. Therefore, with the help of the concept of transfer learning, a large number of remote sensing images can be used to pre-train the network, and then a small number of UAVs can be used to take aerial photos The image is fine-tuned to achieve a nice effect.

目前已经有许多深度学习的方法被用来进行分割任务。这些方法可以分为两类。第一类是语义分段，如FCN和deeplab系列，第二类是实例分割，例如FCIS和Mask R-CNN。语义分割是指对图像中的每个像素进行分类，但它无法区分同一类中的不同目标。实例分割可以看成语义分割的延伸。与语义分割不同的是实例分割会区分每一个实例，即使是同一类目标，实例分割仍然会标记出每一个目标，并且以不同颜色和边界框加以区分。实例分割可以看成是目标检测和语义分割的组合。由于需要对一个指定区域的相邻目标进行区分，因此，实例分割比语义分割更加适合用来计算指定区域的建筑物面积。At present, many deep learning methods have been used for segmentation tasks. These methods can be divided into two categories. The first category is semantic segmentation, such as FCN and deeplab series, and the second category is instance segmentation, such as FCIS and Mask R-CNN. Semantic segmentation refers to classifying each pixel in an image, but it cannot distinguish different objects in the same class. Instance segmentation can be seen as an extension of semantic segmentation. Different from semantic segmentation, instance segmentation will distinguish each instance. Even if it is the same type of target, instance segmentation will still mark each target and distinguish it with different colors and bounding boxes. Instance segmentation can be seen as a combination of object detection and semantic segmentation. Since it is necessary to distinguish adjacent objects in a specified area, instance segmentation is more suitable for calculating the area of buildings in a specified area than semantic segmentation.

发明内容Contents of the invention

本发明要解决的技术问题在于，针对上述目前传统的建筑物面积计算方法耗时长、效率低的技术问题，提供基于Mask R-CNN的无人机图像建筑物面积计算方法及系统解决上述技术缺陷。The technical problem to be solved by the present invention is to provide a method and system for calculating the building area of UAV images based on Mask R-CNN to solve the above-mentioned technical defects. .

基于Mask R-CNN的无人机图像建筑物面积计算方法，包括：A method for calculating the building area of UAV images based on Mask R-CNN, including:

S1、使用无人机采集预选定地区的多张图像，每张图像尺寸保持一致；S1. Use drones to collect multiple images of pre-selected areas, and keep the size of each image consistent;

S2、对S1采集到的图像进行删选，删除完全不包含建筑物的图像；S2. Deleting the images collected in S1, and deleting images that do not contain buildings at all;

S3、对S2剩下的图像进行一一标注，标注出图像中的建筑物，将标注好的图像作为训练图像集；S3, mark the remaining images of S2 one by one, mark the buildings in the images, and use the marked images as the training image set;

S4、准备一个卫星图像数据集，该数据集中包含多张已经进行了建筑物标注的卫星图像；S4. Prepare a satellite image data set, which contains a plurality of satellite images that have been marked with buildings;

S5、采用Mask R-CNN算法，先使用卫星图像数据集进行预训练，预训练完成后得到一个初始模型；S5. Using the Mask R-CNN algorithm, first use the satellite image dataset for pre-training, and obtain an initial model after the pre-training is completed;

S6、使用训练图像集对初始模型进行训练，训练多轮之后模型收敛，得到最终的分割模型；S6. Using the training image set to train the initial model, the model converges after multiple rounds of training, and obtains the final segmentation model;

S7、采用无人机对待测地区进行图像采集，将采集到的多张图像拼接成一张全景图，对全景图进行降采样处理，然后裁剪成多张尺寸相同的小型图像；S7. Using the unmanned aerial vehicle to collect images of the area to be measured, splicing the collected multiple images into a panorama, down-sampling the panorama, and then cutting it into multiple small images of the same size;

S8、使用分割模型对S7中得到的所有的小型图像进行处理，以将每张小型图像中的建筑物标注出来；S8, use the segmentation model to process all the small images obtained in S7, to mark out the buildings in each small image;

S9、统计每一个标注出建筑物所包含的像素点的个数；S9, counting the number of pixels contained in each marked building;

S10、根据实际情况，设定每一个像素点代表的单位像素点面积，计算得到图像中每个建筑物的面积。S10. According to the actual situation, set the unit pixel area represented by each pixel, and calculate the area of each building in the image.

进一步的，S1中采集的图像为三通道的RGB图像。Further, the image collected in S1 is a three-channel RGB image.

进一步的，S3中使用labelme软件对图像进行建筑物标注。Further, labelme software is used in S3 to label buildings on the image.

进一步的，S10中计算方法是用单位像素点面积与像素点个数相乘。Further, the calculation method in S10 is to multiply the unit pixel area by the number of pixels.

基于Mask R-CNN的无人机图像建筑物面积计算系统，包括：处理器及存储设备；所述处理器加载并执行所述存储设备中的指令及数据用于实现任意一种基于Mask R-CNN的无人机图像建筑物面积计算方法。The UAV image building area calculation system based on Mask R-CNN includes: a processor and a storage device; the processor loads and executes instructions and data in the storage device for realizing any one based on Mask R-CNN A CNN-based method for building area calculations from drone images.

与现有技术相比，本发明优势在于：选用Mask R-CNN做为分割模型，Mask R-CNN具有结构简单、灵活、分割效果显著等优点。在执行Mask R-CNN之后，可以获得每个建筑物的轮廓并统计每个建筑物轮廓上的像素数量，然后利用每个像素代表的单位面积大小，可以相应地计算每个建筑物的面积。通过本方法得到的结果精度明显高于传统的计算方法。Compared with the prior art, the advantage of the present invention is that: Mask R-CNN is selected as the segmentation model , and Mask R-CNN has the advantages of simple structure, flexibility, and remarkable segmentation effect. After executing Mask R-CNN, the outline of each building can be obtained and the number of pixels on the outline of each building can be counted, and then the area of each building can be calculated accordingly by using the unit area represented by each pixel. The precision of the result obtained by this method is obviously higher than that of the traditional calculation method.

附图说明Description of drawings

下面将结合附图及实施例对本发明作进一步说明，附图中：The present invention will be further described below in conjunction with accompanying drawing and embodiment, in the accompanying drawing:

图1是本发明基于Mask R-CNN的无人机图像建筑物面积计算方法流程图；Fig. 1 is the flow chart of the method for calculating the building area of the UAV image based on Mask R-CNN in the present invention;

图2是各种算法的定性实验比较图；Figure 2 is a qualitative experiment comparison diagram of various algorithms;

图3是各算法的定量指标比较图；Figure 3 is a comparison chart of the quantitative indicators of each algorithm;

图4是本发明使用Mask R-CNN分割的效果图；Fig. 4 is the rendering of the present invention using Mask R-CNN segmentation;

图5是本发明计算出的面积与真实值比较图。Fig. 5 is a graph comparing the area calculated by the present invention with the real value.

具体实施方式Detailed ways

为了对本发明的技术特征、目的和效果有更加清楚的理解，现对照附图详细说明本发明的具体实施方式。In order to have a clearer understanding of the technical features, purposes and effects of the present invention, the specific implementation manners of the present invention will now be described in detail with reference to the accompanying drawings.

基于Mask R-CNN的无人机图像建筑物面积计算方法，如图1所示，包括：The method for calculating the building area of UAV images based on Mask R-CNN, as shown in Figure 1, includes:

S1、使用无人机采集预选定地区的多张图像，每张图像尺寸保持一致，图像为三通道的RGB图像；S1. Use drones to collect multiple images of pre-selected areas, each image size is consistent, and the image is a three-channel RGB image;

S3、使用labelme软件对S2剩下的图像进行一一标注，标注出图像中的建筑物，将标注好的图像作为训练图像集；S3, use the labelme software to label the remaining images of S2 one by one, label the buildings in the image, and use the labeled image as a training image set;

S4、准备一个完整的卫星图像数据集，该数据集中包含多张已经进行了建筑物标注的卫星图像；S4. Prepare a complete satellite image data set, which contains multiple satellite images that have been marked with buildings;

S7、采用无人机对待测地区进行图像采集，将采集到的多张图像拼接成一张全景图，对全景图进行降采样处理，然后裁剪成多张尺寸相同的小型图像，尺寸大小满足分割模型的处理要求；S7. Use drones to collect images of the area to be measured, stitch the collected images into a panorama, perform down-sampling processing on the panorama, and then crop it into multiple small images of the same size, the size of which meets the segmentation model processing requirements;

S10、根据实际情况，设定每一个像素点代表的单位像素点面积，用单位像素点面积与像素点个数相乘，计算得到图像中每个建筑物的面积。S10. According to the actual situation, set the unit pixel point area represented by each pixel point, and multiply the unit pixel point area by the number of pixel points to calculate the area of each building in the image.

本发明采用的Mask R-CNN模型是目标检测和语义分割的结合，称为实例分割或者目标分割。Mask R-CNN在目标检测算法Faster R-CNN的基础上，增加语义分割算法全卷积神经网络(FCN)算法以作为分割分支。当图片通过Faster R-CNN后，会产生许多感兴趣区域(RoI),FCN被应用到每一个感兴趣区域已实现对像素的分类。与Faster R-CNN不同的是，Mask R-CNN使用感兴趣对齐(RoI Align)而不是感兴趣池化(RoI Pool)，这可以解决空间错位问题，对提升分割质量有明显的帮助。此外采用二元损失而不是多项式损失，这可以产生准确的二进制掩摸。Mask R-CNN的另一个特点是使用残差网络或改进的ResNet网络而不是传统的vgg网络以增强提取特征的能力。ResNet也是由何凯明等人提出并获得ILSVRC2015比赛的冠军。与VGGNet相比，ResNet效果更好，但参数更少。一般来说，ResNet的结构可以加速深度神经网络的训练，并且准确性也大大提高。Mask R-CNN非常灵活，可用于各种计算机视觉任务，包括目标检测，图像分割和人体姿势识别。在COCO挑战中，Mask R-CNN的表现要好于以前的各种模型。与微软提出的FCIS不同，Mask R-CNN更简单，性能更好，扩展性更好，功能也更为多样，Mask R-CNN可以更改不同的主干结构，比如Resnet-101或Resnet-101-FPN等。这里，主要是FPN解决了多尺度检测的问题。简而言之，相对于Faster R-CNN，Mask R-CNN有三个主要的改进。首先，探索多种网络结构作为Mask R-CNN的骨干网络。其次，使用感兴趣对齐(RoI Align)代替感兴趣池化(RoI Pool)。第三，添加FCN算法作为分割分支。在我们的任务中，Mask R-CNN用边界框识别每个目标。然后将每个边界框分割成建筑物区域和非建筑区域。The Mask R-CNN model used in the present invention is a combination of target detection and semantic segmentation, called instance segmentation or target segmentation. Based on the target detection algorithm Faster R-CNN, Mask R-CNN adds a semantic segmentation algorithm full convolutional neural network (FCN) algorithm as a segmentation branch. When the picture passes through Faster R-CNN, many regions of interest (RoI) will be generated, and FCN is applied to each region of interest to realize the classification of pixels. Unlike Faster R-CNN, Mask R-CNN uses RoI Align instead of RoI Pool, which can solve the problem of spatial misalignment and is obviously helpful to improve the quality of segmentation. Furthermore, binary loss is adopted instead of multinomial loss, which can produce accurate binary masking. Another feature of Mask R-CNN is to use a residual network or an improved ResNet network instead of the traditional vgg network to enhance the ability to extract features. ResNet was also proposed by He Kaiming and others and won the ILSVRC2015 competition. Compared with VGGNet, ResNet works better, but has fewer parameters. In general, the structure of ResNet can speed up the training of deep neural networks, and the accuracy is also greatly improved. Mask R-CNN is flexible and can be used for a variety of computer vision tasks, including object detection, image segmentation, and human pose recognition. In the COCO challenge, Mask R-CNN performed better than various previous models. Unlike FCIS proposed by Microsoft, Mask R-CNN is simpler, has better performance, better scalability, and more functions. Mask R-CNN can change different backbone structures, such as Resnet-101 or Resnet-101-FPN Wait. Here, mainly FPN solves the problem of multi-scale detection. In short, Mask R-CNN has three main improvements over Faster R-CNN. First, explore various network structures as the backbone network of Mask R-CNN. Second, use the alignment of interest (RoI Align) instead of pooling of interest (RoI Pool). Third, add the FCN algorithm as a segmentation branch. In our task, Mask R-CNN identifies each object with a bounding box. Each bounding box is then segmented into building and non-building regions.

各种算法的定性实验如图2所示，第一列是真值，第二列是FCN分割结果，第三列是deeplab分割结果，第四列是SegNet分割结果，第五列是本发明方法分割的结果；上述算法的定量指标比较如图3所示。The qualitative experiments of various algorithms are shown in Figure 2. The first column is the true value, the second column is the FCN segmentation result, the third column is the deeplab segmentation result, the fourth column is the SegNet segmentation result, and the fifth column is the method of the present invention The results of the segmentation; the quantitative index comparison of the above algorithms are shown in Fig. 3.

如图4所示为使用Mask R-CNN分割的结果，图中选取了七栋建筑物，分别是A、B、C、D、E、F、G，通过本发明计算出的面积与真实值比较结果如图5所示，其中GT是真实值，单位是平方米。As shown in Figure 4, it is the result of using Mask R-CNN segmentation. In the figure, seven buildings are selected, namely A, B, C, D, E, F, and G. The area calculated by the present invention and the real value The comparison results are shown in Figure 5, where GT is the real value and the unit is square meters.

综上所述，本发明选用Mask R-CNN作为分割模型，且采用神经网络学习的方法训练和优化模型，Mask R-CNN具有结构简单、灵活、分割效果显著等优点。在执行Mask R-CNN之后，可以获得每个建筑物的轮廓并计算每个建筑物轮廓上的像素数量，然后利用每个像素代表的单位面积大小，可以相应地计算每个建筑物的面积。通过本方法得到的结果精度明显高于传统的计算方法。In summary, the present invention selects Mask R-CNN as the segmentation model, and adopts the method of neural network learning to train and optimize the model. Mask R-CNN has the advantages of simple structure, flexibility, and remarkable segmentation effect. After executing Mask R-CNN, the outline of each building can be obtained and the number of pixels on the outline of each building can be calculated, and then the area of each building can be calculated accordingly by using the unit area represented by each pixel. The precision of the result obtained by this method is obviously higher than that of the traditional calculation method.

上面结合附图对本发明的实施例进行了描述，但是本发明并不局限于上述的具体实施方式，上述的具体实施方式仅仅是示意性的，而不是限制性的，本领域的普通技术人员在本发明的启示下，在不脱离本发明宗旨和权利要求所保护的范围情况下，还可做出很多形式，这些均属于本发明的保护之内。Embodiments of the present invention have been described above in conjunction with the accompanying drawings, but the present invention is not limited to the above-mentioned specific implementations, and the above-mentioned specific implementations are only illustrative, rather than restrictive, and those of ordinary skill in the art will Under the enlightenment of the present invention, many forms can also be made without departing from the gist of the present invention and the protection scope of the claims, and these all belong to the protection of the present invention.

Claims

1. The method for calculating the building area of unmanned aerial vehicle images based on Mask R-CNN is characterized in that, comprising:

S1. Use drones to collect multiple images of pre-selected areas, and keep the size of each image consistent;

S2. Deleting the images collected in S1, and deleting images that do not contain buildings at all;

S3, mark the remaining images of S2 one by one, mark the buildings in the images, and use the marked images as the training image set;

S4. Prepare a satellite image data set, which contains a plurality of satellite images that have been marked with buildings;

S5. Using the Mask R-CNN algorithm, first use the satellite image dataset for pre-training, and obtain an initial model after the pre-training is completed;

S6. Using the training image set to train the initial model, the model converges after multiple rounds of training, and obtains the final segmentation model;

S7. Using the unmanned aerial vehicle to collect images of the area to be measured, splicing the collected multiple images into a panorama, down-sampling the panorama, and then cutting it into multiple small images of the same size;

S8, use the segmentation model to process all the small images obtained in S7, to mark out the buildings in each small image;

S9, counting the number of pixels contained in each marked building;

S10. According to the actual situation, set the unit pixel area represented by each pixel, and calculate the area of each building in the image.

2. the UAV image building area calculation method based on Mask R-CNN according to claim 1, is characterized in that, the image that gathers among the S1 is the RGB image of three channels.

3. the UAV image building area calculation method based on Mask R-CNN according to claim 1, is characterized in that, use labelme software to carry out building annotation to image in S3.

4. the UAV image building area calculation method based on Mask R-CNN according to claim 1, is characterized in that, in S10, calculation method is to multiply with unit pixel point area and pixel point number.

5. The UAV image building area calculation system based on Mask R-CNN is characterized in that it includes: a processor and a storage device; the processor loads and executes instructions and data in the storage device for realizing rights Any method for calculating the building area of UAV images based on Mask R-CNN described in 1-4 is required.