
CN110047142A - UAV three-dimensional map construction method, device, computer equipment and storage medium - Google Patents

UAV three-dimensional map construction method, device, computer equipment and storage medium

Info

Publication number
CN110047142A
CN110047142A
Authority
CN
China
Prior art keywords
video frame
frame images
transformation matrix
dimensional
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910209625.1A
Other languages
Chinese (zh)
Inventor
周翊民
龚亮
吴庆甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201910209625.1A priority Critical patent/CN110047142A/en
Publication of CN110047142A publication Critical patent/CN110047142A/en
Priority to PCT/CN2019/097745 priority patent/WO2020186678A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 - Geographic models

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a method for constructing a three-dimensional map for an unmanned aerial vehicle (UAV). The method includes: acquiring video frame images captured by a camera; extracting feature points from each video frame image; matching the feature points with a hybrid color-histogram and scale-invariant feature transform (SIFT) matching algorithm to obtain feature point matching pairs; calculating a pose transformation matrix from the feature point matching pairs; determining the three-dimensional coordinates corresponding to each video frame image from the pose transformation matrix; converting the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system to obtain a three-dimensional point cloud map; feeding the video frame images into a target detection model to obtain target object information; and combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information. The method improves the real-time performance and accuracy of three-dimensional point cloud map construction and produces an information-rich map. A UAV three-dimensional map construction device, a computer device, and a storage medium are also provided.

Description

UAV three-dimensional map construction method, device, computer equipment and storage medium

Technical Field

The present invention relates to the field of computer technology, and in particular to a UAV three-dimensional map construction method and device, a computer device, and a storage medium.

Background

With the development of science and technology, UAVs have become increasingly miniaturized and intelligent, and their flight space has expanded to forests, cities, and even the interiors of buildings. Environmental perception is the basis on which a UAV understands its environment, navigates, plans, and makes behavioral decisions during operation. The most important goal of environmental perception is to build a complete three-dimensional map, on which path planning and navigation are then based. Traditional three-dimensional map construction suffers from either low accuracy or poor real-time performance, and the resulting maps carry little information, which hampers subsequent path planning and navigation.

Summary of the Invention

In view of the above problems, it is necessary to provide a UAV three-dimensional map construction method and device, a computer device, and a storage medium that meet good real-time requirements while guaranteeing high accuracy and produce an information-rich map.

In a first aspect, an embodiment of the present invention provides a method for constructing a three-dimensional map for a UAV, the method comprising:

acquiring video frame images captured by a camera, and extracting feature points from each video frame image;

matching the feature points between video frame images using a hybrid color-histogram and scale-invariant feature transform matching algorithm, to obtain feature point matching pairs between the video frame images;

calculating a pose transformation matrix between video frame images from the feature point matching pairs between the video frame images;

determining the three-dimensional coordinates corresponding to each video frame image from the pose transformation matrix;

converting the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;

feeding the video frame images into a target detection model, and obtaining the target object information detected by the target detection model in the video frame images;

combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.

In a second aspect, an embodiment of the present invention provides a device for constructing a three-dimensional map for a UAV, the device comprising:

an extraction module, configured to acquire video frame images captured by a camera and extract feature points from each video frame image;

a matching module, configured to match the feature points between video frame images using a hybrid color-histogram and scale-invariant feature transform matching algorithm, to obtain feature point matching pairs between the video frame images;

a calculation module, configured to calculate a pose transformation matrix between video frame images from the feature point matching pairs between the video frame images;

a determination module, configured to determine the three-dimensional coordinates corresponding to each video frame image from the pose transformation matrix;

a conversion module, configured to convert the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;

a detection module, configured to feed the video frame images into a target detection model and obtain the target object information detected by the target detection model in the video frame images;

a combination module, configured to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.

In a third aspect, an embodiment of the present invention provides a computer device including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:

acquiring video frame images captured by a camera, and extracting feature points from each video frame image;

matching the feature points between video frame images using a hybrid color-histogram and scale-invariant feature transform matching algorithm, to obtain feature point matching pairs between the video frame images;

calculating a pose transformation matrix between video frame images from the feature point matching pairs between the video frame images;

determining the three-dimensional coordinates corresponding to each video frame image from the pose transformation matrix;

converting the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;

feeding the video frame images into a target detection model, and obtaining the target object information detected by the target detection model in the video frame images;

combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:

acquiring video frame images captured by a camera, and extracting feature points from each video frame image;

matching the feature points between video frame images using a hybrid color-histogram and scale-invariant feature transform matching algorithm, to obtain feature point matching pairs between the video frame images;

calculating a pose transformation matrix between video frame images from the feature point matching pairs between the video frame images;

determining the three-dimensional coordinates corresponding to each video frame image from the pose transformation matrix;

converting the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;

feeding the video frame images into a target detection model, and obtaining the target object information detected by the target detection model in the video frame images;

combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.

The above UAV three-dimensional map construction method, device, computer device, and storage medium match feature points between video frame images using a hybrid color-histogram and scale-invariant feature transform matching algorithm, which improves both the accuracy and the real-time performance of feature point matching. In addition, a target detection model identifies and detects the target objects in the video frame images, and the target object information is combined with the three-dimensional point cloud map to obtain a map containing object information, so that the constructed three-dimensional point cloud map carries richer information. In other words, the hybrid color-histogram and SIFT matching improves the accuracy of three-dimensional map construction, and combining it with the target object information recognized by the target detection model gives the three-dimensional point cloud map richer content, providing support for subsequent optimal path planning.

Brief Description of the Drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from the structures shown in these drawings without creative effort.

FIG. 1 is a flowchart of a UAV three-dimensional map construction method in one embodiment;

FIG. 2 is a schematic diagram of a UAV three-dimensional map construction method in one embodiment;

FIG. 3 is a schematic diagram of the combination of color histogram and SIFT feature matching in one embodiment;

FIG. 4 is a schematic diagram of the training and prediction of a deep-learning-based UAV target detection model in one embodiment;

FIG. 5 is a structural block diagram of a UAV three-dimensional map construction device in one embodiment;

FIG. 6 is a structural block diagram of a UAV three-dimensional map construction device in another embodiment;

FIG. 7 is a structural block diagram of a UAV three-dimensional map construction device in yet another embodiment;

FIG. 8 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

As shown in FIG. 1, a UAV three-dimensional map construction method is proposed. The method is applied to a UAV, or to a terminal or server connected to a UAV; this embodiment takes application to a UAV as an example. The method specifically includes the following steps:

Step 102: acquire video frame images captured by the camera, and extract feature points from each video frame image.

Feature points can be understood simply as relatively salient points in an image, such as contour points, bright spots in darker regions, and dark spots in brighter regions. In one embodiment, the UAV camera may be an RGB-D camera, which captures a color image and a depth image; the captured color and depth images are aligned in time, and feature points are then extracted from the color image. Feature extraction may use the color histogram and the scale-invariant feature transform.
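For illustration, the way a time-aligned RGB-D pair yields the 3D coordinates of a feature point can be sketched with pinhole back-projection; this is a standard model, not code from the patent, and the intrinsic parameters (fx, fy, cx, cy) below are hypothetical values for a 640x480 sensor.

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel into the camera frame: the colour image
    supplies the pixel (u, v), the aligned depth image supplies `depth`
    (in metres), and the intrinsics turn the pixel into a 3D point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Hypothetical intrinsics; a feature at pixel (424.5, 239.5) with
# depth 2 m maps to roughly (0.4, 0, 2) in the camera frame.
point = backproject(424.5, 239.5, 2.0, 525.0, 525.0, 319.5, 239.5)
```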

Step 104: match the feature points between video frame images using a hybrid color-histogram and scale-invariant feature transform matching algorithm, to obtain feature point matching pairs between the video frame images.

The color histogram matching algorithm focuses on matching color features, while the scale-invariant feature transform (SIFT) focuses on matching shape features. Mixing the two combines the "color" of the color histogram with the "shape" of the SIFT algorithm, which improves the accuracy of feature recognition and of feature point matching, and also helps improve the real-time performance of recognition, thereby improving the real-time performance and accuracy of the subsequent three-dimensional point cloud map generation.

After the feature points in each video frame image are extracted, feature matching is performed on the features of the feature points to obtain feature point matching pairs between the video frame images. Since the UAV is continuously flying, the same point in real space appears at different positions in different video frame images; by acquiring the features of the feature points in consecutive video frames and matching them, the position of the same real-space point in different video frames is obtained.

In one embodiment, two adjacent video frame images are acquired, features of multiple feature points are extracted from the previous and the following video frame image, and the features are matched to obtain the matched feature points between the two images, which form the feature point matching pairs. For example, if the feature points in the previous video frame image are P1, P2, P3, ..., Pn and the corresponding matched feature points in the following video frame image are Q1, Q2, Q3, ..., Qn, then P1 and Q1 form a matching pair, P2 and Q2 form a matching pair, P3 and Q3 form a matching pair, and so on. Feature matching may use brute-force matching or the fast approximate nearest neighbor (FLANN) algorithm. The approximate nearest-neighbor approach compares the ratio between the nearest and second-nearest matching distances against a preset threshold and accepts a match only when the nearest neighbor is sufficiently better than the second-nearest one (Lowe's ratio test), thereby reducing mismatched point pairs.
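The ratio test described above can be sketched as follows; this brute-force version stands in for FLANN's approximate nearest-neighbour index, and the 0.7 threshold is an illustrative choice, not a value from the patent.

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.7):
    """Match descriptor rows of desc_a against desc_b, keeping a pair
    only when the nearest neighbour is clearly better than the second
    nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)   # distance to every candidate
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:   # accept unambiguous matches only
            matches.append((i, int(nearest)))
    return matches
```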

Step 106: calculate the pose transformation matrix between video frame images from the feature point matching pairs between the video frame images.

Once the positions of the feature points in the video frame images are determined, the pose transformation matrix between the video frame images can be calculated from the correspondence between those positions.

Step 108: determine the three-dimensional coordinates corresponding to each video frame image from the pose transformation matrix.

The three-dimensional coordinates corresponding to a video frame image are the three-dimensional coordinates of the UAV's camera. Once the pose transformation matrices between video frame images are known, the three-dimensional coordinates of any video frame image can be calculated from the transformation relations; the three-dimensional coordinates corresponding to a video frame image are in fact the coordinates of the camera's position at the moment that frame was captured.

Step 110: convert the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map.

Since the three-dimensional coordinates corresponding to each video frame image are expressed in the corresponding camera coordinate system, the coordinates of the feature points in that video frame image are also in the camera coordinate system. To bring all feature point coordinates into the world coordinate system, the conversion is performed according to the pose transformation matrices, yielding the three-dimensional coordinates of the feature points in world coordinates and thereby the three-dimensional point cloud map.
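The conversion into the world coordinate system amounts to applying the camera pose to each camera-frame point, X_w = R X_c + t. A minimal sketch (illustrative, not code from the patent):

```python
import numpy as np

def to_world(points_cam, R, t):
    """Map camera-frame points (an N x 3 array) into the world frame
    with the camera pose (R, t): X_w = R @ X_c + t for each point."""
    return points_cam @ R.T + np.asarray(t)
```

With an identity rotation and a translation of one metre along x, a point one metre in front of the camera lands at (1, 0, 1) in the world frame.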

Step 112: feed the video frame images into the target detection model, and obtain the target object information detected by the target detection model in the video frame images.

The target detection model is trained in advance and is used to detect target objects, such as cars, appearing in the video frame images. Since a video frame image may contain multiple objects, if the category of every object needs to be recognized, multiple target detection models need to be trained accordingly. After the target detection model is trained, feeding a video frame image into it yields the target objects in the image and their positions.

Step 114: combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.

After the target object information in the video frame images is obtained, matching it against the feature points on the three-dimensional point cloud map determines the feature points corresponding to each target object, and the target object information is annotated onto the three-dimensional point cloud map at those feature points, so that the constructed map carries a richer amount of information. The target detection model performs local perception, while the construction of the three-dimensional point cloud map is based on global perception; combining the two improves the richness of the three-dimensional point cloud map.
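One simple way to sketch this annotation step is nearest-point labelling: every map point within a radius of a detection's estimated 3D position inherits the detection's label. The radius value and the assumption that each detection carries an estimated 3D position are illustrative, not details given by the patent.

```python
import numpy as np

def annotate_map(map_points, detections, radius=0.5):
    """Attach an object label to every map point lying within `radius`
    of a detection's estimated 3D position.  `detections` is a list of
    (label, 3D centre) pairs; returns one label (or None) per point."""
    labels = [None] * len(map_points)
    for label, centre in detections:
        dists = np.linalg.norm(map_points - centre, axis=1)
        for i in np.nonzero(dists <= radius)[0]:
            labels[i] = label
    return labels
```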

The above UAV three-dimensional map construction method matches feature points between video frame images using a hybrid color-histogram and scale-invariant feature transform matching algorithm, which improves the accuracy and real-time performance of feature point matching. In addition, a target detection model identifies and detects the target objects in the video frame images, and the target object information is combined with the three-dimensional point cloud map to obtain a map containing object information, so that the constructed three-dimensional point cloud map carries richer information. In other words, the hybrid color-histogram and SIFT matching improves the accuracy of three-dimensional map construction, and combining it with the target object information recognized by the target detection model gives the three-dimensional point cloud map richer content, providing support for subsequent optimal path planning and improving the intelligence level of the UAV's environmental perception.

As shown in FIG. 2, in one embodiment, the UAV three-dimensional map construction method comprises two parts: global perception and local perception. Global perception uses a structural framework mixing the color histogram and SIFT features for matching, followed by localization and construction of the three-dimensional point cloud map. Local perception uses the target detection model to recognize the target objects in the video frame images. Finally, the two are combined to obtain a three-dimensional point cloud map containing the target object information.

In one embodiment, matching the feature points between video frame images using the hybrid color-histogram and scale-invariant feature transform matching algorithm to obtain the feature point matching pairs includes: matching the feature points between video frame images with the color-histogram feature matching algorithm to obtain a first set of matching pairs; and further matching the matching points in the first set with the scale-invariant feature transform matching algorithm to obtain the target feature point matching pairs.

That is, the color histogram is used first for preliminary feature point matching, yielding the first set of matching pairs, and the scale-invariant feature transform matching algorithm then further matches the points in that set to obtain the target feature point matching pairs. In one embodiment, color histogram matching is computed with the Bhattacharyya distance or with the correlation distance. FIG. 3 shows, for one embodiment, the combination of color histogram and SIFT feature matching, with the two stages in a cascade relationship.
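The Bhattacharyya distance mentioned above can be sketched for two normalised histograms as follows; this is the standard Hellinger-style formulation, since the patent does not give the formula itself.

```python
import math

def bhattacharyya_distance(h1, h2):
    """Bhattacharyya distance between two normalised histograms
    (sequences summing to 1): 0 for identical histograms, 1 for
    histograms with no overlapping bins."""
    bc = sum(math.sqrt(p * q) for p, q in zip(h1, h2))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))
```

A smaller distance means the two colour distributions are more similar, so candidate matches from this stage are passed on to the SIFT stage of the cascade.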

在一个实施例中,根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵,包括:获取所述特征点匹配对中每个特征点的三维坐标;计算将一个视频帧图像中特征点的三维坐标转换到另一视频帧图像得到的转换三维坐标;获取所述另一视频帧图像中相应匹配的特征点对应的目标三维坐标;根据所述转换三维坐标和所述目标三维坐标计算得到位姿变换矩阵。In one embodiment, calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: acquiring the three-dimensional coordinates of each feature point in the feature point matching pairs; Calculate the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image to another video frame image; obtain the target three-dimensional coordinates corresponding to the corresponding matching feature points in the another video frame image; According to the converted three-dimensional coordinates The coordinates and the three-dimensional coordinates of the target are calculated to obtain a pose transformation matrix.

其中,在确定了特征点匹配对后,获取每个特征点的三维坐标,三维坐标是根据RGB-D相机拍摄得到的彩色图像和深度图像得到的,彩色图像用于识别得到特征点的x和y值,深度图像用于获取相应的z值。对于两个视频帧图像,将特征点匹配对分别作为两个集合,第一视频帧图像中的特征点的集合为{P|Pi∈R3,i=1,2KN},第二视频帧图像中的特征点的集合为{Q|Qi∈R3,i=1,2KN},将两个点集之间的误差作为代价函数,通过代价函数的最小化求得对应的旋转矩阵R和平移向量t。可以采用如下公式表示:Among them, after the matching pair of feature points is determined, the three-dimensional coordinates of each feature point are obtained. The three-dimensional coordinates are obtained according to the color image and depth image captured by the RGB-D camera. The color image is used to identify the x and y value, the depth image is used to get the corresponding z value. For two video frame images, the feature point matching pairs are taken as two sets respectively, the set of feature points in the first video frame image is {P|P i ∈ R 3 , i=1, 2KN}, the second video frame image The set of feature points in the image is {Q|Q i ∈ R 3 , i=1, 2KN}, the error between the two point sets is used as the cost function, and the corresponding rotation matrix R is obtained by minimizing the cost function and translation vector t. It can be expressed by the following formula:

其中,R和t分别为旋转矩阵和平移向量。迭代最近点算法的步骤为:where R and t are the rotation matrix and translation vector, respectively. The steps of the iterative closest point algorithm are:

1)对P中每一个点Pi，在Q中找到与之对应的最近点，记为Qi；1) For each point P i in P, find its nearest corresponding point in Q, denoted Q i ;

2)按照以上公式求取使代价函数最小的变换矩阵R和t；2) According to the above formula, solve for the transformation R and t that minimize the cost function;

3)利用R和t对点集P进行刚体变换操作得到新点集P'，计算新点集P'与点集Q之间的误差距离Ed。3) Apply the rigid-body transformation defined by R and t to the point set P to obtain a new point set P', and compute the error distance Ed between the new point set P' and the point set Q.

在实际操作中，可以将有约束条件的旋转矩阵和平移向量用无约束的李代数表示，并且记录误差距离小于设定阈值的特征点数量，即内点数量。如果步骤3)中计算的误差距离Ed小于阈值且内点数量大于设定阈值，或者迭代次数到达设定阈值，则迭代结束；如果不满足则转到步骤1)进行下一轮迭代。In practice, the constrained rotation matrix and translation vector can be represented by an unconstrained Lie algebra, and the number of feature points whose error distance is smaller than a set threshold, i.e., the number of inliers, is recorded. If the error distance Ed computed in step 3) is smaller than the threshold and the number of inliers is greater than the set threshold, or the number of iterations reaches the set limit, the iteration ends; otherwise, go to step 1) for the next round of iteration.
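The ICP steps above can be sketched as follows. This is a minimal illustration: it uses a standard SVD-based closed-form solution for R and t and a brute-force nearest-neighbor search; the iteration limits and tolerance are illustrative choices, not values from the specification.

```python
import numpy as np

def best_rigid_transform(P, Q):
    # Closed-form least-squares R, t minimizing sum ||Q_i - (R P_i + t)||^2
    # via SVD of the cross-covariance matrix (Kabsch/Umeyama method).
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t

def icp(P, Q, max_iter=50, tol=1e-6):
    R_total, t_total = np.eye(3), np.zeros(3)
    P_cur = P.copy()
    for _ in range(max_iter):
        # step 1: for each point in P, find the nearest point in Q
        d2 = ((P_cur[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
        nearest = Q[d2.argmin(axis=1)]
        # step 2: solve for the R, t minimizing the cost function
        R, t = best_rigid_transform(P_cur, nearest)
        # step 3: apply the rigid transform and check the error distance Ed
        P_cur = P_cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        Ed = np.linalg.norm(P_cur - nearest, axis=1).mean()
        if Ed < tol:
            break
    return R_total, t_total
```

For a small inter-frame motion the nearest-neighbor correspondences are mostly correct from the first iteration, which is why the IMU-provided initial pose discussed later in the document speeds up convergence.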

在一个实施例中，所述目标检测模型是基于深度学习模型训练得到的；在所述将所述视频帧图像作为目标检测模型的输入，获取所述目标检测模型输出的检测得到目标物之前，还包括：获取训练视频图像样本，所述训练视频图像样本包括正样本和负样本，所述正样本中包括有目标物以及所述目标物在所述视频图像中位置标记；根据所述训练视频图像样本对所述目标检测模型进行训练，得到训练好的目标检测模型。In one embodiment, the target detection model is obtained by training based on a deep learning model. Before the video frame image is used as the input of the target detection model and the detected target object output by the target detection model is acquired, the method further includes: acquiring training video image samples, where the training video image samples include positive samples and negative samples, and a positive sample includes a target object and a position mark of the target object in the video image; and training the target detection model according to the training video image samples to obtain a trained target detection model.

其中，目标检测模型是采用深度学习模型进行训练得到的。为了训练得到目标检测模型，首先获取训练视频图像样本，并设定正样本和负样本，正样本就是包含有目标物以及目标物在视频图像中的位置标记的视频图像，通过训练学习到能够检测目标物的目标检测模型。如图4所示，在一个实施例中，基于深度学习的无人机目标检测模型的训练以及预测的示意图，分为预处理和实时检测两大部分。实时检测目标，首先对无人机采集的数据进行预处理操作，将采集到的视频流分为一个个视频帧图像，对图像中的目标进行样本标记，分为训练和测试数据集，使用深度学习框架训练模型，然后将保存的模型应用于平台回传的视频流，完成对目标的实时检测。Among them, the target detection model is obtained by training a deep learning model. To train it, training video image samples are first acquired and positive and negative samples are set; a positive sample is a video image containing the target object together with the position mark of the target object in the video image, and a target detection model capable of detecting the target object is learned through training. As shown in FIG. 4, in one embodiment, the training and prediction of the deep-learning-based UAV target detection model is divided into two parts: preprocessing and real-time detection. For real-time detection, the data collected by the UAV is first preprocessed: the collected video stream is split into individual video frame images, the targets in the images are labeled, and the data is divided into training and test sets; the model is trained with a deep learning framework, and the saved model is then applied to the video stream returned by the platform to complete real-time detection of the target.

使用小型无人机载体，搭载工业摄像头，广泛地针对无人机视角下的情景大量取样视频数据，确定好无人机所需识别目标，在获取的视频数据中对所需识别目标进行标记，利用预处理好的数据对神经网络模型进行训练，调节模型参数至训练结果满足收敛条件，保存训练模型用于后续目标检测，将训练好的模型加载至无人机，运用无人机进行目标检测试验，不断地调节优化模型。Using a small UAV carrier equipped with an industrial camera, a large amount of video data is sampled for scenes from the UAV's perspective; the targets the UAV needs to recognize are determined and marked in the acquired video data; the neural network model is trained with the preprocessed data and the model parameters are adjusted until the training results meet the convergence conditions; the trained model is saved for subsequent target detection and loaded onto the UAV, and target detection experiments are run with the UAV while the model is continuously adjusted and optimized.

在一个具体的实施例中，深度学习模型采用YOLOv3网络结构(也叫做Darknet-53)，采用全卷积网络，包括：引入了residual(残差)结构，即ResNet跳层连接方式，大量使用残差网络特性。使用步长为2的卷积来进行降采样，同时使用了上采样、route操作，在一个网络结构中进行3次检测。使用维度聚类作为anchor boxes(锚箱)进行预测边界框，训练期间使用平方误差损失的总和，通过逻辑回归预测每个边界框的对象分数。如果以前的边界框不是最好的，且将待测对象重叠了一定的阈值以上后，我们会忽略这个预测，继续进行。我们使用阈值0.5，系统只为每个待测对象分配一个边界框。如果先前的边界框未分配给待测对象，则不会对坐标或类别预测造成损失。每个框使用多标签分类来预测边界框可能包含的类。在训练过程中，使用二元交叉熵损失来进行类别预测。使用YOLOv3轻量级目标检测神经网络结构应用于无人机平台，在无人机的有限算力下提高了目标的实时识别的能力。In a specific embodiment, the deep learning model adopts the YOLOv3 network structure (also called Darknet-53), a fully convolutional network that introduces the residual structure, i.e., the ResNet skip-layer connection, and makes extensive use of residual network features. Downsampling is performed with stride-2 convolutions, and upsampling and route operations are used so that detection is performed three times within one network structure. Bounding boxes are predicted using dimension clusters as anchor boxes; the sum of squared error losses is used during training, and the objectness score of each bounding box is predicted by logistic regression. If a prior bounding box is not the best one but overlaps the object under test by more than a certain threshold, the prediction is ignored and processing continues. With a threshold of 0.5, the system assigns only one bounding box to each object under test; a prior bounding box not assigned to any object incurs no loss for coordinate or class prediction. Each box uses multi-label classification to predict the classes it may contain, and binary cross-entropy loss is used for class prediction during training. Applying the lightweight YOLOv3 target detection network to the UAV platform improves real-time target recognition under the UAV's limited computing power.
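As a small illustration of the logistic-regression objectness score and the multi-label class prediction with binary cross-entropy described above: the logit values below are hypothetical, and this is only the per-box scoring portion, not the full YOLOv3 loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(p, y, eps=1e-7):
    # binary cross-entropy, applied per class for multi-label prediction
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

# hypothetical raw network outputs for one predicted box:
obj_logit = 2.0                              # objectness logit
cls_logits = np.array([1.5, -0.8, 0.3])      # one independent logit per class

obj_score = sigmoid(obj_logit)               # logistic-regression object score
cls_probs = sigmoid(cls_logits)              # multi-label: no softmax coupling
loss = bce(cls_probs, np.array([1.0, 0.0, 1.0]))  # ground-truth label vector
```

Because each class uses an independent sigmoid rather than a softmax, a single box can legitimately carry several labels, which is the multi-label behavior the paragraph refers to.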

在一个实施例中，所述将所述三维点云地图与所述目标物信息结合，得到包含有目标物信息的三维点云地图，包括：获取检测得到的目标物在视频帧图像中的目标位置；根据所述目标位置确定与之匹配的特征点；根据所述特征点将所述目标物信息标注到所述三维点云地图。In one embodiment, combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the target position of the detected target object in the video frame image; determining the matching feature points according to the target position; and annotating the target object information onto the three-dimensional point cloud map according to the feature points.

其中，根据检测得到的目标物在视频帧图像中的位置，以及特征点在视频帧图像中的位置，确定与特征点匹配的目标物信息，将目标物信息标注到三维点云地图上，从而得到信息量更丰富的三维点云地图。According to the position of the detected target object in the video frame image and the positions of the feature points in the video frame image, the target object information matching the feature points is determined, and the target object information is annotated onto the three-dimensional point cloud map, so that a more informative three-dimensional point cloud map is obtained.
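One plausible way to realize the association just described is to label every feature point whose pixel position falls inside a detection's bounding box; the data layout and function name below are assumptions for illustration, not from the specification.

```python
import numpy as np

def annotate_points(keypoints_uv, annotations, detections):
    """Attach a detected object label to every map point whose image
    position falls inside that detection's bounding box.

    keypoints_uv: (N, 2) array of feature-point pixel positions (u, v)
    annotations:  list collecting (point_index, label) pairs
    detections:   list of (label, (u_min, v_min, u_max, v_max))"""
    for label, (u0, v0, u1, v1) in detections:
        inside = ((keypoints_uv[:, 0] >= u0) & (keypoints_uv[:, 0] <= u1) &
                  (keypoints_uv[:, 1] >= v0) & (keypoints_uv[:, 1] <= v1))
        for idx in np.nonzero(inside)[0]:
            annotations.append((int(idx), label))
    return annotations
```

Each annotated index can then be carried along when the corresponding feature point is transformed into the world frame, so the label survives into the final point cloud map.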

在一个实施例中，所述方法还包括：获取惯性测量单元测量得到的测量数据；根据所述测量数据计算得到视频帧之间的初始位姿变换矩阵；所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵，包括：根据所述初始位姿变换矩阵和所述视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。In one embodiment, the method further includes: acquiring measurement data measured by an inertial measurement unit; and calculating an initial pose transformation matrix between video frames according to the measurement data. Calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images then includes: calculating a target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.

其中，惯性测量单元(Inertial measurement unit, IMU)是测量物体三轴姿态角(或角速率)以及加速度的装置。将惯性测量单元作为无人机的惯性参数测量装置，该装置包含了三轴陀螺仪、三轴加速度计和三轴磁力计。无人机可以直接读取惯性测量单元测量的测量数据，测量数据包括：角速度、加速度和磁力计数据等。在获取到惯性测量单元测量得到的测量数据后，直接可以根据测量数据计算得到无人机的位姿变换矩阵，由于惯性测量单元会存在累计误差，所以得到的无人机的位姿变换矩阵不够准确。为了与后续优化后的位姿变换矩阵进行区分，将根据测量数据直接计算得到的位姿变换矩阵称为“初始位姿变换矩阵”。位姿变换矩阵包括旋转矩阵R和平移向量t。在一个实施例中，通过采用互补滤波算法计算得到测量数据对应的初始位姿变换矩阵。在得到初始位姿变换矩阵后，将初始位姿变换矩阵作为初始矩阵，采用迭代最近点(Iterative Closest Point, ICP)算法根据视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。通过将惯性测量单元得到的初始位姿变换矩阵作为初始矩阵有利于提高计算的速度。An inertial measurement unit (IMU) is a device that measures an object's three-axis attitude angles (or angular rates) and acceleration. The IMU serves as the UAV's inertial parameter measurement device and contains a three-axis gyroscope, a three-axis accelerometer and a three-axis magnetometer. The UAV can directly read the measurement data from the IMU, including angular velocity, acceleration and magnetometer data. After the measurement data are obtained, the pose transformation matrix of the UAV can be calculated directly from them; however, since the IMU accumulates error, this pose transformation matrix is not accurate enough. To distinguish it from the subsequently optimized pose transformation matrix, the pose transformation matrix calculated directly from the measurement data is called the "initial pose transformation matrix". A pose transformation matrix consists of a rotation matrix R and a translation vector t. In one embodiment, the initial pose transformation matrix corresponding to the measurement data is computed with a complementary filtering algorithm. After the initial pose transformation matrix is obtained, it is used as the initial matrix, and the iterative closest point (ICP) algorithm calculates the target pose transformation matrix between video frames from the feature point matching pairs between the video frame images. Using the initial pose transformation matrix from the IMU as the initial matrix helps speed up the computation.
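The complementary filtering mentioned above can be sketched for a single attitude angle as follows: the gyroscope rate is integrated for short-term accuracy while the accelerometer-derived angle corrects long-term drift. The blending factor alpha = 0.98 is an illustrative choice, not a value from the specification.

```python
def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    # Blend the integrated gyroscope rate (accurate over short intervals
    # but drifting) with the accelerometer-derived angle (noisy but
    # drift-free over long intervals).
    return alpha * (angle + gyro_rate * dt) + (1 - alpha) * accel_angle
```

Run per axis at the IMU sample rate, the filtered angles can be assembled into the rotation part of the initial pose transformation matrix; the accumulated drift of pure gyro integration is what makes the IMU-only pose "not accurate enough", as stated above.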

在一个实施例中，在所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵之后，还包括：计算当前视频帧与前一关键帧之间的运动量，若运动量大于预设阈值，则将当前视频帧作为关键帧；当所述当前视频帧为关键帧时，将当前视频帧与之前的关键帧库中的关键帧进行匹配，若所述关键帧库中存在与当前视频帧匹配的关键帧，则将当前视频帧作为回环帧；根据所述回环帧对相应的位姿变换矩阵进行优化更新，得到更新位姿变换矩阵；所述根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标，包括：根据所述更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。In an embodiment, after the pose transformation matrix between the video frame images is calculated according to the feature point matching pairs, the method further includes: calculating the amount of motion between the current video frame and the previous key frame, and taking the current video frame as a key frame if the amount of motion is greater than a preset threshold; when the current video frame is a key frame, matching it against the key frames in the existing key frame library, and taking the current video frame as a loop closure frame if a matching key frame exists in the library; and optimizing and updating the corresponding pose transformation matrix according to the loop closure frame to obtain an updated pose transformation matrix. Determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix then includes: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.

其中，为了减少后续优化的复杂度，可以通过关键帧的提取来减少计算的复杂度。由于采集到的视频帧比较密集，比如，一般一秒内就可以采集30帧，可见，帧与帧之间的相似度很高，甚至是完全一样的，那么如果计算每一帧无疑会增加计算复杂度。所以可以通过提取关键帧来减少复杂度。具体地，首先将第一视频帧作为关键帧，然后通过计算当前视频帧与前一关键帧之间的运动量，若运动量在一定阈值范围则选为关键帧，其中运动量的计算公式为：To reduce the complexity of subsequent optimization, key frame extraction can be used to cut down the amount of computation. The collected video frames are dense - typically 30 frames are captured per second - so the similarity between adjacent frames is very high, and they may even be identical; computing every frame would therefore needlessly increase the computational complexity, which key frame extraction avoids. Specifically, the first video frame is taken as a key frame; then the amount of motion between the current video frame and the previous key frame is calculated, and the current frame is selected as a key frame if the amount of motion is within a certain threshold range, where the amount of motion is calculated as:

Em = ω1(|tx|+|ty|+|tz|) + ω2(|α|+|β|+|γ|)

其中，Em表示运动量的度量，tx、ty、tz表示平移向量t的三个平移距离，α、β、γ表示帧间运动旋转欧拉角，可以从旋转矩阵转化得到，ω12分别为平移和旋转运动量的平衡权重。对相机拍摄的视觉场，旋转比平移更容易带来较大的场景变化，因此ω2的取值比ω1大，具体取值要根据具体情况进行调整。Here Em is the measure of the amount of motion; t x , t y , t z are the three translation components of the translation vector t; α, β, γ are the inter-frame rotation Euler angles, which can be converted from the rotation matrix; and ω 1 , ω 2 are the balance weights for the translational and rotational motion, respectively. For the visual field captured by the camera, rotation brings larger scene changes more easily than translation, so ω 2 is set larger than ω 1 , with the specific values adjusted to the particular situation.
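The key frame test can be sketched as below. Since the published formula image is not reproduced here, a weighted sum of absolute translation and rotation components is used as one plausible form consistent with the variables defined above; the weights and threshold are purely illustrative.

```python
import numpy as np

def motion_amount(t, euler, w1=1.0, w2=2.0):
    """Weighted motion metric between a frame and the previous key frame.

    t:     translation vector (tx, ty, tz)
    euler: inter-frame rotation Euler angles, converted from the rotation
           matrix. w2 > w1 because rotation changes the scene more than
           translation of the same magnitude."""
    return w1 * np.abs(t).sum() + w2 * np.abs(euler).sum()

def is_keyframe(t, euler, threshold=0.3):
    # select the current frame as a key frame when it has moved enough
    return motion_amount(t, euler) > threshold
```

In a full pipeline this check runs once per incoming frame against the last accepted key frame, so nearly identical consecutive frames are skipped cheaply.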

在提取了关键帧后,采用回环检测的方法对得到的位姿变换矩阵进行优化更新。在一个实施例中,采用闭环检测算法进行回环检测。在进行回环检测后,根据回环检测结果对目标位姿变换矩阵进行更新优化,得到更准确的位姿变换矩阵,为了区分,称为“更新位姿变换矩阵”。根据更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。After the key frames are extracted, the loop closure detection method is used to optimize and update the obtained pose transformation matrix. In one embodiment, loop closure detection is performed using a closed loop detection algorithm. After loop closure detection, the target pose transformation matrix is updated and optimized according to the loop closure detection result, and a more accurate pose transformation matrix is obtained, which is called "updated pose transformation matrix" in order to distinguish. The three-dimensional coordinates corresponding to each video frame image are determined according to the updated pose transformation matrix.
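Matching a new key frame against the key frame library, as described above, can be sketched as a descriptor-similarity search. The global descriptor (e.g. a normalized image histogram) and the similarity threshold below are assumptions for illustration; the specification does not fix either.

```python
import numpy as np

def detect_loop(desc, keyframe_lib, sim_thresh=0.9):
    """Return the index of a matching key frame in the library, or -1.

    desc:         global descriptor of the current key frame (assumed)
    keyframe_lib: list of descriptors of previously stored key frames"""
    for i, ref in enumerate(keyframe_lib):
        denom = np.linalg.norm(desc) * np.linalg.norm(ref) + 1e-12
        sim = float(np.dot(desc, ref) / denom)   # cosine similarity
        if sim > sim_thresh:
            return i                             # loop closure candidate
    return -1
```

When a match is found, the current frame becomes a loop closure frame and the pose graph between the two matched key frames is re-optimized to produce the updated pose transformation matrix.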

如图5所示,提出了一种无人机三维地图构建装置,该装置包括:As shown in Figure 5, a device for constructing a three-dimensional map of an unmanned aerial vehicle is proposed, which includes:

提取模块502,用于获取相机拍摄得到的视频帧图像,提取每个视频帧图像中的特征点;The extraction module 502 is used to obtain the video frame images captured by the camera, and extract the feature points in each video frame image;

匹配模块504，用于采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配，得到视频帧图像之间的特征点匹配对；The matching module 504 is configured to match feature points between video frame images using a hybrid color histogram and scale-invariant feature transform matching algorithm to obtain feature point matching pairs between the video frame images;

计算模块506,用于根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵;The calculation module 506 is used to calculate the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images;

确定模块508,用于根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标;A determination module 508, configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;

转换模块510,用于根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下,得到三维点云地图;The conversion module 510 is used to convert the three-dimensional coordinates of the feature points in the video frame image to the world coordinate system according to the three-dimensional coordinates corresponding to the video frame image and the corresponding pose transformation matrix, so as to obtain a three-dimensional point cloud map;

检测模块512,用于将所述视频帧图像作为目标检测模型的输入,获取所述目标检测模型检测得到的视频帧图像中的目标物信息;The detection module 512 is configured to use the video frame image as the input of the target detection model, and obtain the target object information in the video frame image detected by the target detection model;

结合模块514,用于将所述三维点云地图与所述目标物信息结合,得到包含有目标物信息的三维点云地图。The combining module 514 is configured to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.

在一个实施例中，所述匹配模块504还用于采用颜色直方图特征匹配算法对视频帧图像之间的特征点进行匹配，得到第一匹配对集合；采用尺度不变特征变换匹配算法对所述第一匹配对集合中的匹配点进行进一步匹配得到目标特征点匹配对。In one embodiment, the matching module 504 is further configured to match the feature points between the video frame images using a color histogram feature matching algorithm to obtain a first set of matching pairs, and to further match the matching points in the first set using a scale-invariant feature transform matching algorithm to obtain target feature point matching pairs.
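The two-stage hybrid matching can be sketched as follows: a cheap color-histogram similarity test prunes candidate pairs, and only survivors are compared by SIFT-style descriptor distance. The histogram-intersection measure, the descriptors and both thresholds are illustrative assumptions, not values from the specification.

```python
import numpy as np

def hist_similarity(h1, h2):
    # normalized histogram intersection, in [0, 1]
    return np.minimum(h1, h2).sum() / max(h1.sum(), 1e-12)

def two_stage_match(hists_a, hists_b, descs_a, descs_b,
                    hist_thresh=0.7, desc_thresh=0.5):
    """Stage 1: keep candidate pairs whose local color histograms are
    similar. Stage 2: confirm candidates by descriptor distance and keep
    the best match per feature."""
    matches = []
    for i, (ha, da) in enumerate(zip(hists_a, descs_a)):
        best_j, best_d = -1, np.inf
        for j, (hb, db) in enumerate(zip(hists_b, descs_b)):
            if hist_similarity(ha, hb) < hist_thresh:
                continue                        # rejected by stage one
            d = np.linalg.norm(da - db)         # stage-two descriptor check
            if d < best_d:
                best_j, best_d = j, d
        if best_j >= 0 and best_d < desc_thresh:
            matches.append((i, best_j))
    return matches
```

The histogram prefilter is the design point here: it discards color-inconsistent candidates before the more expensive descriptor comparison runs.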

在一个实施例中，计算模块506还用于获取所述特征点匹配对中每个特征点的三维坐标；计算将一个视频帧图像中特征点的三维坐标转换到另一视频帧图像得到的转换三维坐标；获取所述另一视频帧图像中相应匹配的特征点对应的目标三维坐标；根据所述转换三维坐标和所述目标三维坐标计算得到位姿变换矩阵。In one embodiment, the calculation module 506 is further configured to acquire the three-dimensional coordinates of each feature point in the feature point matching pairs; calculate the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into the other video frame image; acquire the target three-dimensional coordinates corresponding to the matched feature points in the other video frame image; and calculate the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.

在一个实施例中，所述目标检测模型是基于深度学习模型训练得到的；上述无人机三维地图构建装置还包括：训练模块，用于获取训练视频图像样本，所述训练视频图像样本包括正样本和负样本，所述正样本中包括有目标物以及所述目标物在所述视频图像中位置标记；根据所述训练视频图像样本对所述目标检测模型进行训练，得到训练好的目标检测模型。In one embodiment, the target detection model is trained based on a deep learning model; the above apparatus for constructing a three-dimensional map for a UAV further includes a training module configured to acquire training video image samples, where the training video image samples include positive samples and negative samples, and a positive sample includes a target object and a position mark of the target object in the video image, and to train the target detection model according to the training video image samples to obtain a trained target detection model.

在一个实施例中，所述结合模块514还用于获取检测得到的目标物在视频帧图像中的目标位置；根据所述目标位置确定与之匹配的特征点；根据所述特征点将所述物体类别信息标注到所述三维点云地图。In one embodiment, the combining module 514 is further configured to acquire the target position of the detected target object in the video frame image, determine the matching feature points according to the target position, and annotate the object category information onto the three-dimensional point cloud map according to the feature points.

如图6所示,在一个实施例中,上述无人机三维地图构建装置还包括:As shown in Figure 6, in one embodiment, the above-mentioned device for constructing a three-dimensional map of an unmanned aerial vehicle further includes:

初始计算模块505,用于获取惯性测量单元测量得到的测量数据,根据所述测量数据计算得到视频帧之间的初始位姿变换矩阵;The initial calculation module 505 is used to obtain the measurement data obtained by the inertial measurement unit measurement, and calculate the initial pose transformation matrix between the video frames according to the measurement data;

所述计算模块还用于：根据所述初始位姿变换矩阵和所述视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。The calculation module is further configured to calculate the target pose transformation matrix between video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.

如图7所示,在一个实施例中,上述无人机三维地图构建装置还包括:As shown in FIG. 7 , in one embodiment, the above-mentioned device for constructing a three-dimensional map of an unmanned aerial vehicle further includes:

关键帧确定模块516,用于计算当前视频帧与前一关键帧之间的运动量,若运动量大于预设阈值,则将当前视频帧作为关键帧。The key frame determination module 516 is configured to calculate the motion amount between the current video frame and the previous key frame, and if the motion amount is greater than a preset threshold, the current video frame is used as the key frame.

回环帧确定模块518,用于当所述当前视频帧为关键帧时,将当前视频帧与之前的关键帧库中的关键帧进行匹配,若所述关键帧库中存在与当前视频帧匹配的关键帧,则将当前视频帧作为回环帧。The loopback frame determination module 518 is used to match the current video frame with the key frame in the previous key frame library when the current video frame is a key frame, if there is a matching frame with the current video frame in the key frame library. If the key frame is selected, the current video frame is used as the loopback frame.

优化模块520,用于根据所述回环帧对相应的位姿变换矩阵进行优化更新,得到更新位姿变换矩阵。The optimization module 520 is configured to optimize and update the corresponding pose transformation matrix according to the loop closure frame to obtain an updated pose transformation matrix.

所述确定模块508还用于根据所述更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。The determining module 508 is further configured to determine the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.

图8示出了一个实施例中计算机设备的内部结构图。该计算机设备可以是无人机、或与无人机连接的终端或服务器。如图8所示，该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中，存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统，还可存储有计算机程序，该计算机程序被处理器执行时，可使得处理器实现无人机三维地图构建方法。该内存储器中也可储存有计算机程序，该计算机程序被处理器执行时，可使得处理器执行无人机三维地图构建方法。网络接口用于与外界进行通信。本领域技术人员可以理解，图8中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的计算机设备的限定，具体的计算机设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。FIG. 8 shows an internal structure diagram of a computer device in one embodiment. The computer device may be a UAV, or a terminal or server connected to a UAV. As shown in FIG. 8, the computer device includes a processor, a memory and a network interface connected through a system bus, where the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the method for constructing a three-dimensional map for a UAV. A computer program may also be stored in the internal memory and, when executed by the processor, causes the processor to perform the method. The network interface is used to communicate with external devices. Those skilled in the art can understand that the structure shown in FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

在一个实施例中，本申请提供的无人机三维地图构建方法可以实现为一种计算机程序的形式，计算机程序可在如图8所示的计算机设备上运行。计算机设备的存储器中可存储组成该无人机三维地图构建装置的各个程序模块。比如，提取模块502，匹配模块504，计算模块506，确定模块508，转换模块510，检测模块512和结合模块514。In one embodiment, the method for constructing a three-dimensional map for a UAV provided by the present application may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 8. The memory of the computer device may store the program modules constituting the apparatus for constructing a three-dimensional map for a UAV, for example, the extraction module 502, the matching module 504, the calculation module 506, the determination module 508, the conversion module 510, the detection module 512 and the combining module 514.

一种计算机设备，包括存储器和处理器，所述存储器存储有计算机程序，所述计算机程序被所述处理器执行时，使得所述处理器执行如下步骤：获取相机拍摄得到的视频帧图像，提取每个视频帧图像中的特征点；采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配，得到视频帧图像之间的特征点匹配对；根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵；根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标；根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下，得到三维点云地图；将所述视频帧图像作为目标检测模型的输入，获取所述目标检测模型检测得到的视频帧图像中的目标物信息；将所述三维点云地图与所述目标物信息结合，得到包含有目标物信息的三维点云地图。A computer device includes a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the following steps: acquiring video frame images captured by a camera and extracting the feature points in each video frame image; matching the feature points between the video frame images using a hybrid color histogram and scale-invariant feature transform matching algorithm to obtain feature point matching pairs between the video frame images; calculating the pose transformation matrix between the video frame images according to the feature point matching pairs; determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix; converting the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices to obtain a three-dimensional point cloud map; using the video frame images as the input of a target detection model and acquiring the target object information in the video frame images detected by the target detection model; and combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.

在一个实施例中，所述采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配，得到视频帧图像之间的特征点匹配对，包括：采用颜色直方图特征匹配算法对视频帧图像之间的特征点进行匹配，得到第一匹配对集合；采用尺度不变特征变换匹配算法对所述第一匹配对集合中的匹配点进行进一步匹配得到目标特征点匹配对。In one embodiment, matching the feature points between the video frame images using the hybrid color histogram and scale-invariant feature transform matching algorithm to obtain feature point matching pairs between the video frame images includes: matching the feature points between the video frame images using a color histogram feature matching algorithm to obtain a first set of matching pairs; and further matching the matching points in the first set using a scale-invariant feature transform matching algorithm to obtain target feature point matching pairs.

在一个实施例中，根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵，包括：获取所述特征点匹配对中每个特征点的三维坐标；计算将一个视频帧图像中特征点的三维坐标转换到另一视频帧图像得到的转换三维坐标；获取所述另一视频帧图像中相应匹配的特征点对应的目标三维坐标；根据所述转换三维坐标和所述目标三维坐标计算得到位姿变换矩阵。In one embodiment, calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images includes: acquiring the three-dimensional coordinates of each feature point in the feature point matching pairs; calculating the converted three-dimensional coordinates obtained by converting the three-dimensional coordinates of the feature points in one video frame image into the other video frame image; acquiring the target three-dimensional coordinates corresponding to the matched feature points in the other video frame image; and calculating the pose transformation matrix according to the converted three-dimensional coordinates and the target three-dimensional coordinates.

在一个实施例中，所述目标检测模型是基于深度学习模型训练得到的；在所述将所述视频帧图像作为目标检测模型的输入，获取所述目标检测模型输出的检测得到目标物之前，还包括：获取训练视频图像样本，所述训练视频图像样本包括正样本和负样本，所述正样本中包括有目标物以及所述目标物在所述视频图像中位置标记；根据所述训练视频图像样本对所述目标检测模型进行训练，得到训练好的目标检测模型。In one embodiment, the target detection model is obtained by training based on a deep learning model. Before the video frame image is used as the input of the target detection model and the detected target object output by the target detection model is acquired, the method further includes: acquiring training video image samples, where the training video image samples include positive samples and negative samples, and a positive sample includes a target object and a position mark of the target object in the video image; and training the target detection model according to the training video image samples to obtain a trained target detection model.

在一个实施例中，所述将所述三维点云地图与所述目标物信息结合，得到包含有目标物信息的三维点云地图，包括：获取检测得到的目标物在视频帧图像中的目标位置；根据所述目标位置确定与之匹配的特征点；根据所述特征点将所述物体类别信息标注到所述三维点云地图。In one embodiment, combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information includes: acquiring the target position of the detected target object in the video frame image; determining the matching feature points according to the target position; and annotating the object category information onto the three-dimensional point cloud map according to the feature points.

在一个实施例中，所述计算机程序被所述处理器处理时，还用于执行以下步骤：获取惯性测量单元测量得到的测量数据；根据所述测量数据计算得到视频帧之间的初始位姿变换矩阵；所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵，包括：根据所述初始位姿变换矩阵和所述视频帧图像之间的特征点匹配对计算得到视频帧之间的目标位姿变换矩阵。In one embodiment, when the computer program is executed by the processor, the processor further performs the following steps: acquiring measurement data measured by an inertial measurement unit; and calculating an initial pose transformation matrix between video frames according to the measurement data. Calculating the pose transformation matrix between the video frame images according to the feature point matching pairs between the video frame images then includes: calculating the target pose transformation matrix between the video frames according to the initial pose transformation matrix and the feature point matching pairs between the video frame images.

在一个实施例中，在所述根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵之后，所述计算机程序被所述处理器处理时，还用于执行以下步骤：计算当前视频帧与前一关键帧之间的运动量，若运动量大于预设阈值，则将当前视频帧作为关键帧；当所述当前视频帧为关键帧时，将当前视频帧与之前的关键帧库中的关键帧进行匹配，若所述关键帧库中存在与当前视频帧匹配的关键帧，则将当前视频帧作为回环帧；根据所述回环帧对相应的位姿变换矩阵进行优化更新，得到更新位姿变换矩阵；所述根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标，包括：根据所述更新位姿变换矩阵确定每个视频帧图像对应的三维坐标。In one embodiment, after the pose transformation matrix between the video frame images is calculated according to the feature point matching pairs, when the computer program is executed by the processor, the processor further performs the following steps: calculating the amount of motion between the current video frame and the previous key frame, and taking the current video frame as a key frame if the amount of motion is greater than a preset threshold; when the current video frame is a key frame, matching it against the key frames in the existing key frame library, and taking the current video frame as a loop closure frame if a matching key frame exists in the library; and optimizing and updating the corresponding pose transformation matrix according to the loop closure frame to obtain an updated pose transformation matrix. Determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix then includes: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.

一种计算机可读存储介质，存储有计算机程序，所述计算机程序被处理器执行时，使得所述处理器执行如下步骤：获取相机拍摄得到的视频帧图像，提取每个视频帧图像中的特征点；采用颜色直方图和尺度不变特征变换混合匹配算法对视频帧图像之间的特征点进行匹配，得到视频帧图像之间的特征点匹配对；根据所述视频帧图像之间的特征点匹配对计算得到视频帧图像之间的位姿变换矩阵；根据所述位姿变换矩阵确定每个视频帧图像对应的三维坐标；根据视频帧图像对应的三维坐标和相应的位姿变换矩阵将视频帧图像中的特征点的三维坐标转换到世界坐标系下，得到三维点云地图；将所述视频帧图像作为目标检测模型的输入，获取所述目标检测模型检测得到的视频帧图像中的目标物信息；将所述三维点云地图与所述目标物信息结合，得到包含有目标物信息的三维点云地图。A computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor performs the following steps: acquiring video frame images captured by a camera and extracting the feature points in each video frame image; matching the feature points between the video frame images using a hybrid color histogram and scale-invariant feature transform matching algorithm to obtain feature point matching pairs between the video frame images; calculating the pose transformation matrix between the video frame images according to the feature point matching pairs; determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix; converting the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices to obtain a three-dimensional point cloud map; using the video frame images as the input of a target detection model and acquiring the target object information in the video frame images detected by the target detection model; and combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.

In one embodiment, matching the feature points between video frame images using the hybrid color histogram and scale-invariant feature transform matching algorithm to obtain feature point matching pairs between the video frame images comprises: matching the feature points between video frame images using a color histogram feature matching algorithm to obtain a first set of matching pairs; and further matching the matching points in the first set of matching pairs using a scale-invariant feature transform matching algorithm to obtain the target feature point matching pairs.
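The two-stage matcher described above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the per-keypoint color histograms and SIFT-like descriptors are assumed to be precomputed, and the threshold values `hist_thresh` and `ratio` are hypothetical. The coarse color-histogram stage prunes candidates cheaply; the descriptor stage then applies a Lowe-style nearest-neighbour ratio test.

```python
import numpy as np

def hist_similarity(h1, h2):
    # Normalized histogram intersection: 1.0 for identical color distributions.
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return np.minimum(h1, h2).sum()

def hybrid_match(hists1, descs1, hists2, descs2, hist_thresh=0.6, ratio=0.8):
    """Return (i, j) index pairs matching features of frame 1 to frame 2."""
    matches = []
    for i, (h1, d1) in enumerate(zip(hists1, descs1)):
        # Stage 1: keep only candidates whose local color histogram is similar.
        cand = [j for j, h2 in enumerate(hists2)
                if hist_similarity(h1, h2) >= hist_thresh]
        if len(cand) == 1:          # a single survivor is accepted directly
            matches.append((i, cand[0]))
            continue
        if len(cand) < 2:
            continue
        # Stage 2: SIFT-style nearest neighbour with a ratio test.
        dists = np.linalg.norm(descs2[cand] - d1, axis=1)
        order = np.argsort(dists)
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, cand[order[0]]))
    return matches
```

In a real pipeline the histograms would come from a patch around each keypoint and the descriptors from a SIFT extractor; the point of the sketch is only the coarse-to-fine ordering.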

In one embodiment, computing the pose transformation matrix between the video frame images from the feature point matching pairs between the video frame images comprises: obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs; computing the transformed three-dimensional coordinates obtained by transforming the three-dimensional coordinates of the feature points in one video frame image into another video frame image; obtaining the target three-dimensional coordinates corresponding to the matched feature points in the other video frame image; and computing the pose transformation matrix from the transformed three-dimensional coordinates and the target three-dimensional coordinates.
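A standard way to recover such a pose from matched 3-D point pairs is the SVD-based rigid alignment (Kabsch/Umeyama) method; the sketch below assumes this technique, which the patent text does not name explicitly, and all function names are illustrative.

```python
import numpy as np

def estimate_pose(src_pts, dst_pts):
    """Estimate the rigid transform mapping src_pts onto dst_pts.

    src_pts, dst_pts: (N, 3) arrays of matched 3-D feature coordinates.
    Returns a 4x4 homogeneous pose transformation matrix T with
    dst ~= R @ src + t.
    """
    src_c = src_pts.mean(axis=0)
    dst_c = dst_pts.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src_pts - src_c).T @ (dst_pts - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```

In practice this estimate would be wrapped in RANSAC to reject feature mismatches before the final least-squares fit.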

In one embodiment, the target detection model is trained on the basis of a deep learning model. Before taking the video frame images as the input of the target detection model and obtaining the target objects detected by the target detection model, the method further comprises: acquiring training video image samples, the training video image samples comprising positive samples and negative samples, each positive sample containing a target object and a mark of the position of the target object in the video image; and training the target detection model with the training video image samples to obtain a trained target detection model.

In one embodiment, combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information comprises: obtaining the target position of a detected target object in the video frame image; determining the feature points that match the target position; and annotating the object category information onto the three-dimensional point cloud map according to those feature points.
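One simple reading of "determining the feature points that match the target position" is to take every feature point whose pixel location falls inside the detector's bounding box and propagate the label to the corresponding map points. The sketch below assumes that interpretation; the data layout (`keypoints_uv`, `point_ids`, `detections`) is hypothetical.

```python
import numpy as np

def label_points(keypoints_uv, point_ids, detections):
    """Attach detected object labels to map points.

    keypoints_uv: (N, 2) pixel coordinates of the frame's feature points.
    point_ids:    map-point index for each feature point.
    detections:   list of (label, (u_min, v_min, u_max, v_max)) boxes.
    Returns {map_point_id: label} for every feature point inside a box.
    """
    labels = {}
    for label, (u0, v0, u1, v1) in detections:
        inside = ((keypoints_uv[:, 0] >= u0) & (keypoints_uv[:, 0] <= u1) &
                  (keypoints_uv[:, 1] >= v0) & (keypoints_uv[:, 1] <= v1))
        for pid in np.asarray(point_ids)[inside]:
            labels[int(pid)] = label
    return labels
```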

In one embodiment, the computer program, when executed by the processor, further causes the processor to perform the following steps: acquiring measurement data measured by an inertial measurement unit; and computing an initial pose transformation matrix between video frames from the measurement data. Computing the pose transformation matrix between the video frame images from the feature point matching pairs between the video frame images then comprises: computing the target pose transformation matrix between video frames from the initial pose transformation matrix and the feature point matching pairs between the video frame images.
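The patent does not specify how the IMU data is turned into the initial pose, but a common choice is first-order dead reckoning between the two frame timestamps, giving a prior that the visual matching then refines. A minimal sketch under that assumption (gravity assumed already removed from the accelerometer samples; all names hypothetical):

```python
import numpy as np

def imu_prior(omega, accel, dt):
    """Dead-reckon a rough inter-frame pose prior from IMU samples.

    omega: (N, 3) angular rates [rad/s]; accel: (N, 3) accelerations
    [m/s^2] with gravity removed; dt: sample period [s].
    Returns a 4x4 initial pose transformation matrix.
    """
    R = np.eye(3)
    v = np.zeros(3)
    p = np.zeros(3)
    for w, a in zip(omega, accel):
        # Rotate by the small-angle rotation w * dt (Rodrigues' formula).
        theta = np.linalg.norm(w) * dt
        if theta > 0:
            k = w / np.linalg.norm(w)
            K = np.array([[0, -k[2], k[1]],
                          [k[2], 0, -k[0]],
                          [-k[1], k[0], 0]])
            R = R @ (np.eye(3) + np.sin(theta) * K
                     + (1 - np.cos(theta)) * K @ K)
        # Integrate position with the velocity held over the step.
        p = p + v * dt + 0.5 * (R @ a) * dt ** 2
        v = v + (R @ a) * dt
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, p
    return T
```

Such a prior is mainly useful for seeding the visual estimate and rejecting grossly wrong feature matches; it drifts quickly on its own.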

In one embodiment, after the pose transformation matrix between the video frame images is computed from the feature point matching pairs between the video frame images, the computer program, when executed by the processor, further causes the processor to perform the following steps: computing the amount of motion between the current video frame and the previous key frame, and taking the current video frame as a key frame if the amount of motion is greater than a preset threshold; when the current video frame is a key frame, matching the current video frame against the key frames in the existing key frame library, and taking the current video frame as a loop closure frame if a key frame matching the current video frame exists in the key frame library; and optimizing and updating the corresponding pose transformation matrix according to the loop closure frame to obtain an updated pose transformation matrix. Determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix then comprises: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
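The key-frame test above compares an "amount of motion" against a preset threshold. One conventional scalar for this, assumed here since the patent does not define it, is the translation norm plus the rotation angle of the relative pose since the last key frame:

```python
import numpy as np

def motion_amount(T_rel):
    """Scalar motion of a relative pose: translation norm + rotation angle."""
    t = T_rel[:3, 3]
    # Rotation angle from the trace of the rotation block.
    cos_a = np.clip((np.trace(T_rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return np.linalg.norm(t) + np.arccos(cos_a)

def is_new_keyframe(T_prev_kf, T_cur, thresh=0.3):
    """Promote the current frame to a key frame if it has moved enough."""
    T_rel = np.linalg.inv(T_prev_kf) @ T_cur
    return motion_amount(T_rel) > thresh
```

The threshold value `0.3` is illustrative; in practice it trades map density against drift, and the loop-closure match itself would use an appearance index (e.g. a bag-of-words vocabulary) over the key frame library.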

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent of the present application. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

1. An unmanned aerial vehicle three-dimensional map construction method, characterized in that the method comprises:
acquiring video frame images captured by a camera, and extracting the feature points in each video frame image;
matching the feature points between video frame images using a hybrid color histogram and scale-invariant feature transform matching algorithm to obtain feature point matching pairs between the video frame images;
computing the pose transformation matrix between the video frame images from the feature point matching pairs between the video frame images;
determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;
transforming the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;
taking the video frame images as the input of a target detection model, and obtaining the target object information in the video frame images detected by the target detection model;
combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
2. The method according to claim 1, characterized in that matching the feature points between video frame images using the hybrid color histogram and scale-invariant feature transform matching algorithm to obtain feature point matching pairs between the video frame images comprises:
matching the feature points between video frame images using a color histogram feature matching algorithm to obtain a first set of matching pairs;
further matching the matching points in the first set of matching pairs using a scale-invariant feature transform matching algorithm to obtain the target feature point matching pairs.
3. The method according to claim 1, characterized in that computing the pose transformation matrix between the video frame images from the feature point matching pairs between the video frame images comprises:
obtaining the three-dimensional coordinates of each feature point in the feature point matching pairs;
computing the transformed three-dimensional coordinates obtained by transforming the three-dimensional coordinates of the feature points in one video frame image into another video frame image;
obtaining the target three-dimensional coordinates corresponding to the matched feature points in the other video frame image;
computing the pose transformation matrix from the transformed three-dimensional coordinates and the target three-dimensional coordinates.
4. The method according to claim 1, characterized in that the target detection model is trained on the basis of a deep learning model;
before taking the video frame images as the input of the target detection model and obtaining the target objects detected by the target detection model, the method further comprises:
acquiring training video image samples, the training video image samples comprising positive samples and negative samples, each positive sample containing a target object and a mark of the position of the target object in the video image;
training the target detection model with the training video image samples to obtain a trained target detection model.
5. The method according to claim 1, characterized in that combining the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information comprises:
obtaining the target position of a detected target object in the video frame image;
determining the matching feature points according to the target position;
annotating the object category information onto the three-dimensional point cloud map according to the feature points.
6. The method according to claim 1, characterized in that the method further comprises:
obtaining measurement data measured by an inertial measurement unit;
computing an initial pose transformation matrix between video frames from the measurement data;
wherein computing the pose transformation matrix between the video frame images from the feature point matching pairs between the video frame images comprises:
computing the target pose transformation matrix between video frames from the initial pose transformation matrix and the feature point matching pairs between the video frame images.
7. The method according to claim 1, characterized in that after computing the pose transformation matrix between the video frame images from the feature point matching pairs between the video frame images, the method further comprises:
computing the amount of motion between the current video frame and the previous key frame, and taking the current video frame as a key frame if the amount of motion is greater than a preset threshold;
when the current video frame is a key frame, matching the current video frame against the key frames in the existing key frame library, and taking the current video frame as a loop closure frame if a key frame matching the current video frame exists in the key frame library;
optimizing and updating the corresponding pose transformation matrix according to the loop closure frame to obtain an updated pose transformation matrix;
wherein determining the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix comprises: determining the three-dimensional coordinates corresponding to each video frame image according to the updated pose transformation matrix.
8. An unmanned aerial vehicle three-dimensional map construction apparatus, characterized in that the apparatus comprises:
an extraction module, configured to acquire video frame images captured by a camera and extract the feature points in each video frame image;
a matching module, configured to match the feature points between video frame images using a hybrid color histogram and scale-invariant feature transform matching algorithm to obtain feature point matching pairs between the video frame images;
a computation module, configured to compute the pose transformation matrix between the video frame images from the feature point matching pairs between the video frame images;
a determination module, configured to determine the three-dimensional coordinates corresponding to each video frame image according to the pose transformation matrix;
a transformation module, configured to transform the three-dimensional coordinates of the feature points in the video frame images into the world coordinate system according to the three-dimensional coordinates corresponding to the video frame images and the corresponding pose transformation matrices, to obtain a three-dimensional point cloud map;
a detection module, configured to take the video frame images as the input of a target detection model and obtain the target object information in the video frame images detected by the target detection model;
a combination module, configured to combine the three-dimensional point cloud map with the target object information to obtain a three-dimensional point cloud map containing the target object information.
9. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
CN201910209625.1A 2019-03-19 2019-03-19 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium Pending CN110047142A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910209625.1A CN110047142A (en) 2019-03-19 2019-03-19 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium
PCT/CN2019/097745 WO2020186678A1 (en) 2019-03-19 2019-07-25 Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910209625.1A CN110047142A (en) 2019-03-19 2019-03-19 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110047142A true CN110047142A (en) 2019-07-23

Family

ID=67273899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910209625.1A Pending CN110047142A (en) 2019-03-19 2019-03-19 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110047142A (en)
WO (1) WO2020186678A1 (en)

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490131A (en) * 2019-08-16 2019-11-22 北京达佳互联信息技术有限公司 A kind of localization method of capture apparatus, device, electronic equipment and storage medium
CN110487274A (en) * 2019-07-30 2019-11-22 中国科学院空间应用工程与技术中心 SLAM method, system, navigation vehicle and storage medium for weak texture scene
CN110543917A (en) * 2019-09-06 2019-12-06 电子科技大学 An indoor map matching method using pedestrian inertial navigation trajectory and video information
CN110580703A (en) * 2019-09-10 2019-12-17 广东电网有限责任公司 distribution line detection method, device, equipment and storage medium
CN110602456A (en) * 2019-09-11 2019-12-20 安徽天立泰科技股份有限公司 Display method and system of aerial photography focus
CN110660134A (en) * 2019-09-25 2020-01-07 Oppo广东移动通信有限公司 Three-dimensional map construction method, three-dimensional map construction device and terminal equipment
CN110705574A (en) * 2019-09-27 2020-01-17 Oppo广东移动通信有限公司 Positioning method and device, equipment and storage medium
CN110728245A (en) * 2019-10-17 2020-01-24 珠海格力电器股份有限公司 Optimization method and device for VSLAM front-end processing, electronic equipment and storage medium
CN110826448A (en) * 2019-10-29 2020-02-21 中山大学 Indoor positioning method with automatic updating function
CN110880187A (en) * 2019-10-17 2020-03-13 北京达佳互联信息技术有限公司 Camera position information determining method and device, electronic equipment and storage medium
CN111009012A (en) * 2019-11-29 2020-04-14 四川沃洛佳科技有限公司 Unmanned aerial vehicle speed measurement method based on computer vision, storage medium and terminal
CN111105454A (en) * 2019-11-22 2020-05-05 北京小米移动软件有限公司 Method, device and medium for acquiring positioning information
CN111105695A (en) * 2019-12-31 2020-05-05 智车优行科技(上海)有限公司 Map making method and device, electronic equipment and computer readable storage medium
CN111145339A (en) * 2019-12-25 2020-05-12 Oppo广东移动通信有限公司 Image processing method and device, equipment and storage medium
CN111199584A (en) * 2019-12-31 2020-05-26 武汉市城建工程有限公司 Target object positioning virtual-real fusion method and device
CN111311685A (en) * 2020-05-12 2020-06-19 中国人民解放军国防科技大学 An unsupervised method for motion scene reconstruction based on IMU/monocular image
CN111462029A (en) * 2020-03-27 2020-07-28 北京百度网讯科技有限公司 Visual point cloud and high-precision map fusion method and device and electronic equipment
CN111586360A (en) * 2020-05-14 2020-08-25 佳都新太科技股份有限公司 Unmanned aerial vehicle projection method, device, equipment and storage medium
WO2020186678A1 (en) * 2019-03-19 2020-09-24 中国科学院深圳先进技术研究院 Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium
CN111814731A (en) * 2020-07-23 2020-10-23 科大讯飞股份有限公司 Sitting posture detection method, device, equipment and storage medium
CN111968242A (en) * 2020-09-11 2020-11-20 中国石油集团西南管道有限公司 Pipe ditch measuring method and system for pipeline engineering construction
CN112215714A (en) * 2020-09-08 2021-01-12 北京农业智能装备技术研究中心 Rice ear detection method and device based on unmanned aerial vehicle
CN112241010A (en) * 2019-09-17 2021-01-19 北京新能源汽车技术创新中心有限公司 Positioning method, apparatus, computer equipment and storage medium
CN112393720A (en) * 2019-08-15 2021-02-23 纳恩博(北京)科技有限公司 Target equipment positioning method and device, storage medium and electronic device
CN112419375A (en) * 2020-11-18 2021-02-26 青岛海尔科技有限公司 Feature point matching method and device, storage medium, and electronic device
CN112613107A (en) * 2020-12-26 2021-04-06 广东电网有限责任公司 Method and device for determining construction progress of tower project, storage medium and equipment
CN112634370A (en) * 2020-12-31 2021-04-09 广州极飞科技有限公司 Unmanned aerial vehicle dotting method, device, equipment and storage medium
CN112819892A (en) * 2021-02-08 2021-05-18 北京航空航天大学 Image processing method and device
CN112819889A (en) * 2020-12-30 2021-05-18 浙江大华技术股份有限公司 Method and device for determining position information, storage medium and electronic device
CN112907550A (en) * 2021-03-01 2021-06-04 创新奇智(成都)科技有限公司 Building detection method and device, electronic equipment and storage medium
CN112950715A (en) * 2021-03-04 2021-06-11 杭州迅蚁网络科技有限公司 Visual positioning method and device for unmanned aerial vehicle, computer equipment and storage medium
CN112950667A (en) * 2021-02-10 2021-06-11 中国科学院深圳先进技术研究院 Video annotation method, device, equipment and computer readable storage medium
CN112966718A (en) * 2021-02-05 2021-06-15 深圳市优必选科技股份有限公司 Image identification method and device and communication equipment
CN112991448A (en) * 2021-03-22 2021-06-18 华南理工大学 Color histogram-based loop detection method and device and storage medium
CN113326769A (en) * 2021-05-28 2021-08-31 北京三快在线科技有限公司 High-precision map generation method, device, equipment and storage medium
CN113628286A (en) * 2021-08-09 2021-11-09 咪咕视讯科技有限公司 Video color gamut detection method and device, computing equipment and computer storage medium
CN113673388A (en) * 2021-08-09 2021-11-19 北京三快在线科技有限公司 Method and device for determining position of target object, storage medium and equipment
CN113793414A (en) * 2021-08-17 2021-12-14 中科云谷科技有限公司 Method, processor and device for establishing three-dimensional view of industrial field environment
CN113853577A (en) * 2020-04-28 2021-12-28 深圳市大疆创新科技有限公司 Image processing method and device, movable platform and control terminal thereof, and computer-readable storage medium
CN114119885A (en) * 2020-08-11 2022-03-01 中国电信股份有限公司 Image feature point matching method, device and system and map construction method and system
CN114331818A (en) * 2021-12-27 2022-04-12 北京达佳互联信息技术有限公司 Video image processing method and video image processing apparatus
CN114596363A (en) * 2022-05-10 2022-06-07 北京鉴智科技有限公司 Three-dimensional point cloud labeling method and device and terminal
CN114743116A (en) * 2022-04-18 2022-07-12 蜂巢航宇科技(北京)有限公司 A kind of unattended special load system and method based on barracks patrol scene
WO2022160790A1 (en) * 2021-02-01 2022-08-04 华为技术有限公司 Three-dimensional map construction method and apparatus
WO2023030062A1 (en) * 2021-09-01 2023-03-09 中移(成都)信息通信科技有限公司 Flight control method and apparatus for unmanned aerial vehicle, and device, medium and program
CN116134488A (en) * 2020-12-23 2023-05-16 深圳元戎启行科技有限公司 Point cloud labeling method, point cloud labeling device, computer equipment and storage medium
CN117115414A (en) * 2023-10-23 2023-11-24 西安羚控电子科技有限公司 GPS-free drone positioning method and device based on deep learning
CN117395377A (en) * 2023-12-06 2024-01-12 上海海事大学 Multi-view fusion-based coastal bridge sea side safety monitoring method, system and medium
CN118556568A (en) * 2024-07-31 2024-08-30 中国农业科学院草原研究所 Corn cultivation method and system
CN119206120A (en) * 2024-11-27 2024-12-27 珠海创智科技有限公司 A map construction method and system for high-level task requirements

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110570463B (en) * 2019-09-11 2023-04-07 深圳市道通智能航空技术股份有限公司 Target state estimation method and device and unmanned aerial vehicle
CN116033231B (en) * 2021-10-27 2024-10-15 海鹰航空通用装备有限责任公司 Video live broadcast AR label superposition method and device
CN115375870B (en) * 2022-10-25 2023-02-10 杭州华橙软件技术有限公司 Loop detection optimization method, electronic equipment and computer readable storage device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104835115A (en) * 2015-05-07 2015-08-12 中国科学院长春光学精密机械与物理研究所 Imaging method for aerial camera, and system thereof
CN106097304A (en) * 2016-05-31 2016-11-09 西北工业大学 A kind of unmanned plane real-time online ground drawing generating method
CN106485655A (en) * 2015-09-01 2017-03-08 张长隆 A kind of taken photo by plane map generation system and method based on quadrotor
CN106595659A (en) * 2016-11-03 2017-04-26 南京航空航天大学 Map merging method of unmanned aerial vehicle visual SLAM under city complex environment
CN108648240A (en) * 2018-05-11 2018-10-12 东南大学 Based on a non-overlapping visual field camera posture scaling method for cloud characteristics map registration
CN108692661A (en) * 2018-05-08 2018-10-23 深圳大学 Portable three-dimensional measuring system based on Inertial Measurement Unit and its measurement method
CN109073385A (en) * 2017-12-20 2018-12-21 深圳市大疆创新科技有限公司 A kind of localization method and aircraft of view-based access control model
CN109410316A (en) * 2018-09-21 2019-03-01 深圳前海达闼云端智能科技有限公司 Method, tracking, relevant apparatus and the storage medium of the three-dimensional reconstruction of object

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663391B (en) * 2012-02-27 2015-03-25 安科智慧城市技术(中国)有限公司 Image multifeature extraction and fusion method and system
US9488492B2 (en) * 2014-03-18 2016-11-08 Sri International Real-time system for multi-modal 3D geospatial mapping, object recognition, scene annotation and analytics
CN108932475B (en) * 2018-05-31 2021-11-16 中国科学院西安光学精密机械研究所 Three-dimensional target identification system and method based on laser radar and monocular vision
CN108303099B (en) * 2018-06-14 2018-09-28 江苏中科院智能科学技术应用研究院 Autonomous navigation method in unmanned plane room based on 3D vision SLAM
CN109146935B (en) * 2018-07-13 2021-03-12 中国科学院深圳先进技术研究院 A point cloud registration method, device, electronic device and readable storage medium
CN110047142A (en) * 2019-03-19 2019-07-23 中国科学院深圳先进技术研究院 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium

Cited By (84)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020186678A1 (en) * 2019-03-19 2020-09-24 中国科学院深圳先进技术研究院 Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium
CN110487274A (en) * 2019-07-30 2019-11-22 中国科学院空间应用工程与技术中心 SLAM method, system, navigation vehicle and storage medium for weak texture scene
CN110487274B (en) * 2019-07-30 2021-01-29 中国科学院空间应用工程与技术中心 SLAM method and system for weak texture scene, navigation vehicle and storage medium
CN112393720A (en) * 2019-08-15 2021-02-23 纳恩博(北京)科技有限公司 Target equipment positioning method and device, storage medium and electronic device
CN112393720B (en) * 2019-08-15 2023-05-30 纳恩博(北京)科技有限公司 Target equipment positioning method and device, storage medium and electronic device
CN110490131B (en) * 2019-08-16 2021-08-24 北京达佳互联信息技术有限公司 Positioning method and device of shooting equipment, electronic equipment and storage medium
CN110490131A (en) * 2019-08-16 2019-11-22 北京达佳互联信息技术有限公司 A kind of localization method of capture apparatus, device, electronic equipment and storage medium
CN110543917A (en) * 2019-09-06 2019-12-06 电子科技大学 An indoor map matching method using pedestrian inertial navigation trajectory and video information
CN110543917B (en) * 2019-09-06 2021-09-28 电子科技大学 Indoor map matching method by utilizing pedestrian inertial navigation track and video information
CN110580703A (en) * 2019-09-10 2019-12-17 广东电网有限责任公司 distribution line detection method, device, equipment and storage medium
CN110580703B (en) * 2019-09-10 2024-01-23 广东电网有限责任公司 Distribution line detection method, device, equipment and storage medium
CN110602456A (en) * 2019-09-11 2019-12-20 安徽天立泰科技股份有限公司 Display method and system of aerial photography focus
CN112241010A (en) * 2019-09-17 2021-01-19 北京新能源汽车技术创新中心有限公司 Positioning method, apparatus, computer equipment and storage medium
CN110660134A (en) * 2019-09-25 2020-01-07 Oppo广东移动通信有限公司 Three-dimensional map construction method, three-dimensional map construction device and terminal equipment
CN110660134B (en) * 2019-09-25 2023-05-30 Oppo广东移动通信有限公司 Three-dimensional map construction method, three-dimensional map construction device and terminal equipment
CN110705574B (en) * 2019-09-27 2023-06-02 Oppo广东移动通信有限公司 Positioning method and device, equipment and storage medium
CN110705574A (en) * 2019-09-27 2020-01-17 Oppo广东移动通信有限公司 Positioning method and device, equipment and storage medium
CN110880187B (en) * 2019-10-17 2022-08-12 北京达佳互联信息技术有限公司 Camera position information determining method and device, electronic equipment and storage medium
CN110880187A (en) * 2019-10-17 2020-03-13 北京达佳互联信息技术有限公司 Camera position information determining method and device, electronic equipment and storage medium
CN110728245A (en) * 2019-10-17 2020-01-24 珠海格力电器股份有限公司 Optimization method and device for VSLAM front-end processing, electronic equipment and storage medium
CN110826448B (en) * 2019-10-29 2023-04-07 中山大学 Indoor positioning method with automatic updating function
CN110826448A (en) * 2019-10-29 2020-02-21 中山大学 Indoor positioning method with automatic updating function
CN111105454B (en) * 2019-11-22 2023-05-09 北京小米移动软件有限公司 Method, device and medium for obtaining positioning information
CN111105454A (en) * 2019-11-22 2020-05-05 北京小米移动软件有限公司 Method, device and medium for acquiring positioning information
CN111009012A (en) * 2019-11-29 2020-04-14 四川沃洛佳科技有限公司 Unmanned aerial vehicle speed measurement method based on computer vision, storage medium and terminal
CN111009012B (en) * 2019-11-29 2023-07-28 四川沃洛佳科技有限公司 Unmanned aerial vehicle speed measuring method based on computer vision, storage medium and terminal
CN111145339A (en) * 2019-12-25 2020-05-12 Oppo广东移动通信有限公司 Image processing method and device, equipment and storage medium
CN111145339B (en) * 2019-12-25 2023-06-02 Oppo广东移动通信有限公司 Image processing method and device, equipment and storage medium
CN111105695A (en) * 2019-12-31 2020-05-05 智车优行科技(上海)有限公司 Map making method and device, electronic equipment and computer readable storage medium
CN111199584B (en) * 2019-12-31 2023-10-20 武汉市城建工程有限公司 Target object positioning virtual-real fusion method and device
CN111199584A (en) * 2019-12-31 2020-05-26 武汉市城建工程有限公司 Target object positioning virtual-real fusion method and device
CN111462029A (en) * 2020-03-27 2020-07-28 北京百度网讯科技有限公司 Visual point cloud and high-precision map fusion method and device and electronic equipment
CN111462029B (en) * 2020-03-27 2023-03-03 阿波罗智能技术(北京)有限公司 Visual point cloud and high-precision map fusion method and device and electronic equipment
CN113853577A (en) * 2020-04-28 2021-12-28 深圳市大疆创新科技有限公司 Image processing method and device, movable platform and control terminal thereof, and computer-readable storage medium
CN111311685B (en) * 2020-05-12 2020-08-07 中国人民解放军国防科技大学 An unsupervised method for moving scene reconstruction based on IMU and monocular images
CN111311685A (en) * 2020-05-12 2020-06-19 中国人民解放军国防科技大学 An unsupervised method for motion scene reconstruction based on IMU/monocular image
CN111586360A (en) * 2020-05-14 2020-08-25 佳都新太科技股份有限公司 Unmanned aerial vehicle projection method, device, equipment and storage medium
CN111586360B (en) * 2020-05-14 2021-09-10 佳都科技集团股份有限公司 Unmanned aerial vehicle projection method, device, equipment and storage medium
CN111814731B (en) * 2020-07-23 2023-12-01 科大讯飞股份有限公司 Sitting posture detection method, device, equipment and storage medium
CN111814731A (en) * 2020-07-23 2020-10-23 科大讯飞股份有限公司 Sitting posture detection method, device, equipment and storage medium
CN114119885A (en) * 2020-08-11 2022-03-01 中国电信股份有限公司 Image feature point matching method, device and system and map construction method and system
CN112215714A (en) * 2020-09-08 2021-01-12 北京农业智能装备技术研究中心 Rice ear detection method and device based on unmanned aerial vehicle
CN112215714B (en) * 2020-09-08 2024-05-10 北京农业智能装备技术研究中心 Unmanned aerial vehicle-based rice spike detection method and device
CN111968242B (en) * 2020-09-11 2024-05-31 国家管网集团西南管道有限责任公司 Pipeline trench measurement method and system for pipeline engineering construction
CN111968242A (en) * 2020-09-11 2020-11-20 中国石油集团西南管道有限公司 Pipe ditch measuring method and system for pipeline engineering construction
CN112419375B (en) * 2020-11-18 2023-02-03 青岛海尔科技有限公司 Feature point matching method and device, storage medium, electronic device
CN112419375A (en) * 2020-11-18 2021-02-26 青岛海尔科技有限公司 Feature point matching method and device, storage medium, and electronic device
CN116134488A (en) * 2020-12-23 2023-05-16 深圳元戎启行科技有限公司 Point cloud labeling method, point cloud labeling device, computer equipment and storage medium
CN112613107A (en) * 2020-12-26 2021-04-06 广东电网有限责任公司 Method and device for determining construction progress of tower project, storage medium and equipment
CN112819889B (en) * 2020-12-30 2024-05-10 浙江大华技术股份有限公司 Method and device for determining position information, storage medium and electronic device
CN112819889A (en) * 2020-12-30 2021-05-18 浙江大华技术股份有限公司 Method and device for determining position information, storage medium and electronic device
CN112634370A (en) * 2020-12-31 2021-04-09 广州极飞科技有限公司 Unmanned aerial vehicle dotting method, device, equipment and storage medium
WO2022160790A1 (en) * 2021-02-01 2022-08-04 华为技术有限公司 Three-dimensional map construction method and apparatus
CN112966718B (en) * 2021-02-05 2023-12-19 深圳市优必选科技股份有限公司 Image recognition method and device and communication equipment
CN112966718A (en) * 2021-02-05 2021-06-15 深圳市优必选科技股份有限公司 Image identification method and device and communication equipment
CN112819892B (en) * 2021-02-08 2022-11-25 北京航空航天大学 Image processing method and device
CN112819892A (en) * 2021-02-08 2021-05-18 北京航空航天大学 Image processing method and device
CN112950667B (en) * 2021-02-10 2023-12-22 中国科学院深圳先进技术研究院 Video labeling method, device, equipment and computer readable storage medium
CN112950667A (en) * 2021-02-10 2021-06-11 中国科学院深圳先进技术研究院 Video annotation method, device, equipment and computer readable storage medium
WO2022170844A1 (en) * 2021-02-10 2022-08-18 中国科学院深圳先进技术研究院 Video annotation method, apparatus and device, and computer readable storage medium
CN112907550A (en) * 2021-03-01 2021-06-04 创新奇智(成都)科技有限公司 Building detection method and device, electronic equipment and storage medium
CN112907550B (en) * 2021-03-01 2024-01-19 创新奇智(成都)科技有限公司 Building detection method and device, electronic equipment and storage medium
CN112950715B (en) * 2021-03-04 2024-04-30 杭州迅蚁网络科技有限公司 Visual positioning method and device of unmanned aerial vehicle, computer equipment and storage medium
CN112950715A (en) * 2021-03-04 2021-06-11 杭州迅蚁网络科技有限公司 Visual positioning method and device for unmanned aerial vehicle, computer equipment and storage medium
CN112991448B (en) * 2021-03-22 2023-09-26 华南理工大学 Loop detection method, device and storage medium based on color histogram
CN112991448A (en) * 2021-03-22 2021-06-18 华南理工大学 Color histogram-based loop detection method and device and storage medium
CN113326769A (en) * 2021-05-28 2021-08-31 北京三快在线科技有限公司 High-precision map generation method, device, equipment and storage medium
CN113673388A (en) * 2021-08-09 2021-11-19 北京三快在线科技有限公司 Method and device for determining position of target object, storage medium and equipment
CN113628286A (en) * 2021-08-09 2021-11-09 咪咕视讯科技有限公司 Video color gamut detection method and device, computing equipment and computer storage medium
CN113628286B (en) * 2021-08-09 2024-03-22 咪咕视讯科技有限公司 Video color gamut detection method, device, computing equipment and computer storage medium
CN113793414B (en) * 2021-08-17 2024-08-13 中科云谷科技有限公司 Method, processor and device for establishing three-dimensional view of industrial field environment
CN113793414A (en) * 2021-08-17 2021-12-14 中科云谷科技有限公司 Method, processor and device for establishing three-dimensional view of industrial field environment
WO2023030062A1 (en) * 2021-09-01 2023-03-09 中移(成都)信息通信科技有限公司 Flight control method and apparatus for unmanned aerial vehicle, and device, medium and program
CN114331818A (en) * 2021-12-27 2022-04-12 北京达佳互联信息技术有限公司 Video image processing method and video image processing apparatus
CN114743116A (en) * 2022-04-18 Unattended special load system and method based on barracks patrol scene
CN114596363A (en) * 2022-05-10 2022-06-07 北京鉴智科技有限公司 Three-dimensional point cloud labeling method and device and terminal
CN114596363B (en) * 2022-05-10 2022-07-22 北京鉴智科技有限公司 Three-dimensional point cloud marking method and device and terminal
CN117115414B (en) * 2023-10-23 2024-02-23 西安羚控电子科技有限公司 GPS-free unmanned aerial vehicle positioning method and device based on deep learning
CN117115414A (en) * 2023-10-23 2023-11-24 西安羚控电子科技有限公司 GPS-free drone positioning method and device based on deep learning
CN117395377B (en) * 2023-12-06 2024-03-22 上海海事大学 Multi-view fusion-based coastal bridge sea side safety monitoring method, system and medium
CN117395377A (en) * 2023-12-06 2024-01-12 上海海事大学 Multi-view fusion-based coastal bridge sea side safety monitoring method, system and medium
CN118556568A (en) * 2024-07-31 2024-08-30 中国农业科学院草原研究所 Corn cultivation method and system
CN119206120A (en) * 2024-11-27 2024-12-27 珠海创智科技有限公司 A map construction method and system for high-level task requirements
CN119206120B (en) * 2024-11-27 2025-02-25 珠海创智科技有限公司 Map construction method and system for high-level task demands

Also Published As

Publication number Publication date
WO2020186678A1 (en) 2020-09-24

Similar Documents

Publication Publication Date Title
CN110047142A (en) Unmanned aerial vehicle three-dimensional map construction method, device, computer equipment and storage medium
CN109974693B (en) UAV positioning method, device, computer equipment and storage medium
Zhao et al. Detection, tracking, and geolocation of moving vehicle from uav using monocular camera
JP7326720B2 (en) Mobile position estimation system and mobile position estimation method
CN110047108B (en) Unmanned aerial vehicle pose determination method and device, computer equipment and storage medium
CN107690840B (en) Unmanned plane vision auxiliary navigation method and system
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
CN109492580B (en) Multi-size aerial image positioning method based on neighborhood significance reference of full convolution network
Liu et al. R2YOLOX: A lightweight refined anchor-free rotated detector for object detection in aerial images
CN107610235B (en) Mobile platform navigation method and device based on deep learning
CN110969648A (en) 3D target tracking method and system based on point cloud sequence data
CN115861860B (en) Target tracking and positioning method and system for unmanned aerial vehicle
Ribeiro et al. Underwater place recognition in unknown environments with triplet based acoustic image retrieval
CN117036989A (en) Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision
Müller et al. Squeezeposenet: Image based pose regression with small convolutional neural networks for real time uas navigation
CN111812978B (en) Cooperative SLAM method and system for multiple unmanned aerial vehicles
Zhou et al. Place recognition and navigation of outdoor mobile robots based on random Forest learning with a 3D LiDAR
Hanyu et al. Absolute pose estimation of UAV based on large-scale satellite image
Yudin et al. Hpointloc: Point-based indoor place recognition using synthetic rgb-d images
Zhuang et al. 3D-SeqMOS: A Novel Sequential 3D Moving Object Segmentation in Autonomous Driving
Bikmaev et al. Visual Localization of a Ground Vehicle Using a Monocamera and Geodesic-Bound Road Signs
Wang et al. Online drone-based moving target detection system in dense-obstructer environment
CN112818837A (en) Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
Kasebi et al. Hybrid navigation based on GPS data and SIFT-based place recognition using Biologically-inspired SLAM
Allak et al. Siamese Neural Networks in Unmanned Aerial Vehicle Target Tracking Process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190723