CN112802096A - Device and method for realizing real-time positioning and mapping
- Publication number
- CN112802096A (application CN202010144849.1A)
- Authority
- CN
- China
- Prior art keywords
- pose
- feature
- line segment
- map
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
- G06T11/20—Drawing from basic elements, e.g. lines or circles
- G06T17/05—Geographic models
- G06T7/11—Region-based segmentation
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
- G06T7/97—Determining parameters from multiple pictures
- G06T2207/10024—Color image
- G06T2207/10028—Range image; Depth image; 3D point clouds
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Remote Sensing (AREA)
- Computer Graphics (AREA)
- Image Analysis (AREA)
Abstract
Provided are a device and a method for implementing simultaneous localization and mapping. The device includes: an image acquisition unit configured to acquire a color image and a depth image of the surrounding environment; an initial pose estimation unit configured to estimate an initial pose based on the color image and the depth image; a map construction unit configured to construct a three-dimensional map based on the depth image and the color image; and a pose determination unit configured to determine a final pose based on the initial pose and the three-dimensional map.
Description
Technical Field

The present application relates to the field of simultaneous localization and mapping (SLAM), and more particularly, to a device and method for implementing SLAM based on a reconstructed map.

Background
In prior-art methods for constructing three-dimensional maps, sensing devices installed on the equipment, such as cameras, laser sensors, and inertial measurement units, are typically used to acquire information about the surrounding environment in real time, so that a three-dimensional map of the surrounding environment is constructed and the position and attitude of the device in the world coordinate system are output in real time. This technology is called Simultaneous Localization And Mapping (SLAM).

Almost all SLAM systems are built on one of two frameworks: nonlinear optimization or statistical filtering. Depending on the hardware sensors used, SLAM systems can also be categorized as laser-sensor, binocular/multi-camera, monocular plus inertial measurement device, or monocular plus depth camera systems.

Laser sensing was commonly used in early SLAM systems; it offers high accuracy and yields relatively dense map points. However, laser devices are bulky, heavy, and expensive, making them poorly suited to today's lightweight SLAM platforms (such as drones and AR glasses), although they remain valuable on devices with less demanding payload requirements (such as sweeping robots and warehouse robots). Another approach is the binocular/multi-camera SLAM system, which estimates the approximate depth of scene feature points from the disparity between two cameras and then further refines the front-end estimates during back-end optimization. Classic binocular SLAM systems include OKVIS and ORB-SLAM.

In addition, vision-based SLAM has gradually found its way into consumer products. However, monocular cameras bring several problems when used in a SLAM system, for example, the need for a rather delicate initialization procedure and the difficulty of estimating the scale of the motion trajectory. Several remedies for monocular cameras have therefore been studied in depth; the classic ones are a monocular camera combined with an inertial sensor, a monocular camera combined with a depth camera, and refined initialization algorithms. Inertial sensors are sensitive to device motion, so they compensate well for the shortcomings of a monocular camera and allow the device pose to be estimated accurately during fast motion or large rotations. A monocular camera combined with a depth camera can roughly measure the three-dimensional position of the scene, so the scale of the motion trajectory can be recovered. Owing to their light weight, monocular cameras are widely used in many consumer products (such as drones, AR glasses, and automatic food-delivery robots).

However, most existing SLAM techniques include a nonlinear optimization component, in which real-time performance and accuracy are difficult to balance. To remain real-time, many existing SLAM systems can only use measurements from a short window of the recent past during nonlinear optimization, making good localization accuracy hard to achieve. In addition, some SLAM systems, such as warehouse robots and food-delivery robots, frequently operate in familiar environments, yet research on reusing previously built map information to reduce the system's computational load over time is relatively scarce. Many vision-based SLAM front ends use feature-point extraction and matching for the initial camera tracking, but in scenes with relatively poor texture (such as glass rooms or rooms with many white walls), few feature points can be extracted and matched, which degrades tracking accuracy and produces large drift errors.

Therefore, there is a need for a SLAM technique that ensures real-time performance, reduces the amount of computation, and still guarantees map accuracy.

The above information is presented as background only to assist with an understanding of the present disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the present disclosure.
Summary of the Invention
According to an exemplary embodiment of the present invention, a device for implementing simultaneous localization and mapping is provided, including: an image acquisition unit configured to acquire a color image and a depth image of the surrounding environment; an initial pose estimation unit configured to estimate an initial pose based on the color image and the depth image; a map construction unit configured to construct a three-dimensional map based on the depth image and the color image; and a pose determination unit configured to determine a final pose based on the initial pose and the three-dimensional map.

The map construction unit may include: a map reconstruction module configured to reconstruct an initial three-dimensional map based on a first depth image and a first color image of the surrounding environment; and a map update module configured to update the three-dimensional map based on a second depth image and a second color image of the surrounding environment.

The initial pose estimation unit may include: a feature extraction module configured to extract point features from the color image; a feature matching module configured to perform point matching according to the point features; and an estimation module configured to estimate the initial pose using the matched point features. When the number of matched point features is less than a first threshold, the feature extraction module may further extract line-segment features from the color image, the feature matching module may perform line-segment matching according to the line-segment features, and the estimation module may estimate the initial pose using both the matched point features and the matched line-segment features.

The feature matching module may perform line-segment matching according to geometric structure information around the line-segment features.

The device may further include a full-map tracking module configured to determine, among previous keyframes, keyframes sharing co-visible points with the current frame. The initial pose estimation unit may estimate the initial pose based on the color image and the depth image using the determined keyframes.

The device may further include a point-line optimization combination module configured to establish a three-dimensional collinearity constraint based on the matching result of the line-segment features. The pose determination unit may further determine the final pose according to the three-dimensional collinearity constraint. The three-dimensional collinearity constraint may express that a point on a line segment of a first frame lies on the corresponding line segment of a second frame.

The pose determination unit may determine the final pose based on the initial pose and the three-dimensional map using a preset frame set. The preset frame set may include a plurality of tracking subsets obtained by dividing a tracking set related to the current frame.

The pose determination unit may set a time-domain window of a predetermined size for the current frame and, based on the initial pose and the three-dimensional map, determine the final pose using the keyframes within the time-domain window.
According to another exemplary embodiment of the present invention, a method for implementing simultaneous localization and mapping is provided, the method including: acquiring a color image and a depth image of the surrounding environment; estimating an initial pose based on the color image and the depth image; constructing a three-dimensional map based on the depth image and the color image; and determining a final pose based on the initial pose and the three-dimensional map.

The step of constructing the three-dimensional map may include: reconstructing an initial three-dimensional map based on a first depth image and a first color image of the surrounding environment; and updating the three-dimensional map based on a second depth image and a second color image of the surrounding environment.

The step of estimating the initial pose may include: extracting point features from the color image; performing point matching according to the point features; and estimating the initial pose using the matched point features. When the number of matched point features is less than a first threshold, the step of estimating the initial pose may further include: extracting line-segment features from the color image, performing line-segment matching according to the line-segment features, and estimating the initial pose using both the matched point features and the matched line-segment features.

The step of performing line-segment matching according to the line-segment features may include performing line-segment matching according to geometric structure information around the line-segment features.

The method may further include determining, among previous keyframes, keyframes sharing co-visible points with the current frame, wherein the step of estimating the initial pose may include estimating the initial pose based on the color image and the depth image using the determined keyframes.

The method may further include establishing a three-dimensional collinearity constraint based on the matching result of the line-segment features. The step of determining the final pose may include determining the final pose further according to the three-dimensional collinearity constraint. The three-dimensional collinearity constraint may express that a point on a line segment of a first frame lies on the corresponding line segment of a second frame.

The step of determining the final pose may include determining the final pose based on the initial pose and the three-dimensional map using a preset frame set. The preset frame set may include a plurality of tracking subsets obtained by dividing a tracking set related to the current frame.

The step of determining the final pose may include setting a time-domain window of a predetermined size for the current frame and, based on the initial pose and the three-dimensional map, determining the final pose using the keyframes within the time-domain window.

According to an exemplary embodiment of the present invention, a computer-readable storage medium storing instructions is provided, wherein the instructions, when executed by at least one computing device, cause the at least one computing device to perform the aforementioned method for implementing simultaneous localization and mapping.

According to an exemplary embodiment of the present invention, a system is provided that includes at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the aforementioned method for implementing simultaneous localization and mapping.
Beneficial Effects

By applying the SLAM scheme according to the exemplary embodiments of the present invention, at least the following effects can be achieved:

Because a map with per-point spatial accuracy is reconstructed through 3D semantic reconstruction when the SLAM system first enters a particular environment, and the map is fused and refined by computing spatial-temporal confidences on every subsequent visit to that environment, the accuracy of the map is ensured. Moreover, during back-end optimization the coordinates of the three-dimensional map points can be held fixed so that only the pose (i.e., position and attitude) of the SLAM device is optimized, making the whole SLAM system more lightweight.

When the SLAM system enters a scene region with poor texture, line-segment features of the scene can additionally be extracted and matched, and a three-dimensional point-line collinearity constraint sub-term can be added to the back-end optimization, making the whole SLAM system more robust.

By tracking features over the global map, more effective constraints can be obtained and the localization accuracy of the SLAM system improved. Through feature re-identification, spatial-domain matching is added on top of the original temporal 2D-2D or 3D-2D matching, and these visual measurements are incorporated into the energy functions of the local and global bundle adjustment, further reducing the accumulated error.

When the global bundle adjustment method is carried out, an originally long feature tracking set can be decomposed into several small feature tracking subsets, improving computational efficiency.

When the global bundle adjustment method is carried out and a new keyframe arrives, it is no longer necessary to optimize the three-dimensional map points and poses of all keyframes involved; only the keyframes within a preset time-domain window associated with the new keyframe are optimized, further improving the efficiency of the back-end optimization.

Other aspects, advantages, and salient features of the invention will become apparent to those skilled in the art from the following detailed description, which, taken in conjunction with the accompanying drawings, discloses various embodiments of the invention.
Brief Description of the Drawings

The above and other aspects, features, and advantages of specific embodiments of the present disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a block diagram illustrating a map-reconstruction-based SLAM system according to an exemplary embodiment of the present invention.

Fig. 2 is a schematic diagram illustrating the processing flow of a map-reconstruction-based SLAM system according to an exemplary embodiment of the present invention.

Fig. 3 is a block diagram illustrating the initial pose estimation unit in a map-reconstruction-based SLAM system according to an exemplary embodiment of the present invention.

Fig. 4 is a block diagram illustrating the map construction unit in a map-reconstruction-based SLAM system according to an exemplary embodiment of the present invention.

Fig. 5 is a schematic diagram of the generation of line-segment feature descriptor vectors.

Fig. 6 is a schematic diagram of the feature re-identification operation.

Fig. 7 is a schematic diagram of keyframe screening by the full-map tracking module.

Fig. 8 is a schematic diagram of the full-map tracking flow of the full-map tracking module.

Fig. 9 is a schematic diagram of the division of a feature tracking set.

Fig. 10 is a schematic diagram of global bundle adjustment based on a time-domain window.

Fig. 11 is a schematic diagram of the loop-closure error elimination process.

Fig. 12 is a flowchart illustrating a map-reconstruction-based SLAM method according to an exemplary embodiment of the present invention.

Throughout the drawings, it should be noted that the same reference numerals are used to denote the same or similar elements, features, and structures.
Detailed Description

In order that those skilled in the art may better understand the present invention, exemplary embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific implementations.
Fig. 1 is a block diagram illustrating a map-reconstruction-based SLAM system 100 according to an exemplary embodiment of the present invention.

Referring to Fig. 1, the map-reconstruction-based SLAM system 100 (hereinafter referred to as the "SLAM system 100") includes an image acquisition unit 110, an initial pose estimation unit 120, a map construction unit 130, a pose determination unit 140, and a storage unit 150.

The image acquisition unit 110 may acquire a color image and a depth image of the environment in which the SLAM system 100 is currently located, as the color image and depth image of the current frame to be processed. A "previous frame" mentioned below refers to a frame that has already been processed by the SLAM system 100 according to an exemplary embodiment of the present invention and whose related information has been stored in the storage unit 150. The image acquisition unit 110 may include a monocular camera and a depth camera to obtain the color image and the depth image, respectively; it should be understood, however, that the present application is not limited thereto, and any other camera or camera combination capable of obtaining depth images and color images (for example, a binocular camera) may be used to implement the image acquisition unit 110. The color image may be an RGB image, or any other type of image capable of providing pixel intensity information (for example, a grayscale image).

The initial pose estimation unit 120 may estimate the initial pose of the SLAM system 100 based on the color image and the depth image. In an exemplary embodiment of the present invention, the initial pose may be a relative pose, which is explained in detail later with reference to Fig. 3.
Fig. 2 is a schematic diagram illustrating the processing flow of the map-reconstruction-based SLAM system 100 according to an exemplary embodiment of the present invention.

Referring to Fig. 2, the main processing flow of the SLAM system 100 includes: inertial navigation data pre-integration, feature tracking, new feature extraction, feature re-identification, global bundle adjustment, and local bundle adjustment.

In the inertial navigation data pre-integration process, the relative attitude is estimated from the collected inertial navigation data; the input of this process is the inertial navigation data and the output is the relative attitude. In the feature tracking process, the initially extracted features are tracked; the input is the initially extracted features and the output is the features tracked in the next frame. In the new feature extraction process, features are extracted from the image acquired by the image acquisition unit 110; the input is the color image and the output is the features. In the feature re-identification process, previously extracted features are searched for; the input is the original features and the pose results of the previous frames, and the output is the re-identified features (a concrete implementation is explained later with reference to Fig. 6). In the global bundle adjustment process, global nonlinear optimization is performed; the input is a set of feature matches and the output is poses. In the local bundle adjustment process, local nonlinear optimization is performed; the input is a set of feature matches and the output is poses.
Fig. 3 is a block diagram illustrating the initial pose estimation unit 120 in the map-reconstruction-based SLAM system 100 according to an exemplary embodiment of the present invention.

Referring to Fig. 3, the initial pose estimation unit 120 may include: a feature extraction module 121 configured to extract features from the color image acquired by the image acquisition unit 110; a feature matching module 122 configured to perform feature matching according to the features; and an estimation module 123 configured to estimate the initial pose of the SLAM system 100 using the matched features.

In an exemplary embodiment of the present invention, after the feature extraction module 121 completes feature extraction, the feature matching module 122 may obtain, from the depth information of the depth image, the three-dimensional coordinates of each extracted feature in the camera coordinate system of the current frame, then find its matching feature in another frame according to the local description of the feature, and obtain the three-dimensional coordinates of the matching feature in the camera coordinate system of that other frame. In an exemplary embodiment of the present invention, the other frame may be a specific frame among the previous frames, which is described in more detail later. The feature-related information obtained by the feature extraction module 121 and the feature matching module 122 may be stored in the storage unit 150 as information related to the current frame for use in subsequent processing.

By way of example only, the feature extraction module 121 may extract point features from the color image of the current frame, and the feature matching module 122 may perform point matching according to the point features. More specifically, the feature matching module 122 may search a specific frame among the previous frames for point features that match the point features extracted by the feature extraction module 121. The estimation module 123 may then estimate the initial pose of the SLAM system 100 using the matched point features. Methods of feature matching with point features are known to those skilled in the art, and for brevity they are not described in detail here.

In addition, when the environment of the SLAM system 100 lacks texture (for example, a glass room or a room of fairly uniform color), few matched point features can be obtained, which causes large errors or outright failure of the SLAM system 100. Therefore, preferably, in an exemplary embodiment of the present invention, when the number of point features matched by the feature matching module 122 is less than a first preset threshold (for example, 20), the feature extraction module 121 may further extract line-segment features from the color image of the current frame, and the feature matching module 122 may further perform line-segment matching according to the line-segment features, that is, search the specific frame for line-segment features that match the line-segment features extracted by the feature extraction module 121. In this case, the estimation module 123 may estimate the initial pose of the SLAM system 100 using both the matched point features and the matched line-segment features.
In an exemplary embodiment of the present invention, in order to perform feature matching with the extracted line-segment features, feature descriptors may be set for the line segments. Because the local intensity information around the line-segment features is of limited use in a texture-poor environment, in an exemplary embodiment of the present invention the line-segment features may be described according to the relative shape information between line segments. Specifically, any line segment extracted from the color image may be used as a reference segment, and a feature descriptor may then be established from the relative positions and relative angles of all remaining line segments of the color image with respect to this reference segment, using the midpoint coordinates of a line segment to represent its position. For example, for any line segment i, the vector form v_i(k) of its descriptor may be expressed as:

v_i(k) = #{q ≠ p_i : (q − p_i) ∈ bin(k)}    (1)

where bin(k) denotes the k-th region among the multiple regions of the color image, k is the region index, p_i is the midpoint coordinate of line segment i, and q denotes the midpoint coordinates of the line segments other than segment i.

Fig. 5 shows a schematic diagram of the generation of line-segment feature descriptor vectors. In Fig. 5, {v_1, v_2, …, v_n} respectively denote the line-segment feature descriptor vectors of the 1st, 2nd, …, and n-th regions of the 4×4 block regions of the color image shown in Fig. 5.
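For illustration, the descriptor of Equation (1) may be sketched as follows in Python; the 4×4 grid of bins over normalized relative coordinates is an assumption for the example, since the text specifies only that v_i(k) counts the midpoints of the other segments falling in region bin(k).

```python
# A minimal sketch of Equation (1), assuming a 4x4 grid of bins over
# normalized relative coordinates; the concrete bin layout is an assumption.
import numpy as np

def segment_descriptor(midpoints, i, grid=4):
    """v_i(k) = #{q != p_i : (q - p_i) in bin(k)} for segment i (Equation (1))."""
    p_i = midpoints[i]
    rel = np.delete(midpoints, i, axis=0) - p_i       # offsets q - p_i of the other midpoints
    span = np.abs(rel).max() + 1e-9                   # normalize by the local extent
    cells = np.clip(((rel / span + 1.0) * 0.5 * grid).astype(int), 0, grid - 1)
    v = np.zeros(grid * grid, dtype=int)
    for cx, cy in cells:                              # count midpoints per region bin(k)
        v[cy * grid + cx] += 1
    return v
```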
The feature matching module 122 may perform line-segment matching according to the geometric structure information around the line-segment features. Specifically, the feature matching module 122 may compute the Euclidean distance between every pair of line-segment descriptor vectors from the current frame and the specific frame to find the best segment matches. Here, to ensure the correctness of the matching, in an exemplary embodiment of the present invention it may be required that a best segment match consist of two segments that are each the best match of the other in the corresponding image.
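A sketch of this mutual-best-match rule might look as follows; the descriptor arrays are assumed to hold one descriptor vector per row.

```python
# Mutual-best matching of segment descriptors between two frames: a pair
# (i, j) is accepted only if each segment is the other's nearest descriptor
# by Euclidean distance, as required above.
import numpy as np

def mutual_best_matches(desc_a, desc_b):
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)  # pairwise distances
    best_ab = d.argmin(axis=1)                     # best segment in B for each segment in A
    best_ba = d.argmin(axis=0)                     # best segment in A for each segment in B
    return [(i, j) for i, j in enumerate(best_ab) if best_ba[j] == i]
```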
In an exemplary embodiment of the present invention, since the depth image of the current frame provides depth information, the estimation module 123 may estimate the pose change between the current frame and the specific frame based on the iterative closest point (ICP) method, using the matched point features and/or line-segment features, thereby obtaining the initial pose of the SLAM system 100. In other words, the initial pose of the SLAM system 100 obtained for the current frame is a pose relative to the specific frame. It should also be understood that the present application is not limited thereto, and various other suitable methods may be used to estimate the initial pose of the SLAM system 100.
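For reference, the closed-form rigid alignment that an ICP-style estimator iterates over matched 3D points may be sketched as below; this is the generic Kabsch/Umeyama step, not the patent's specific solver, and the data-association loop of full ICP is omitted.

```python
# Closed-form rigid alignment of matched 3D point sets (the inner step of ICP).
import numpy as np

def rigid_transform(src, dst):
    """R, t minimizing sum ||R @ src_i + t - dst_i||^2 over matched (N, 3) points."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)                          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflection
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t
```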
In addition, in an exemplary embodiment of the present invention, when the SLAM system 100 enters a particular environment for the first time, the estimation module 123 may simply set the initial pose of the SLAM system 100 to the zero vector.

In addition, in an exemplary embodiment of the present invention, when the number of matched point features is very small, for example, when the number of point features of the specific frame that match the point features extracted by the feature extraction module 121 is less than a second preset threshold (for example, 5), the relative pose estimated by point-feature matching is no longer reliable; in this case, the estimation module 123 may determine the initial pose of the SLAM system 100 to remain the same as the relative pose of the SLAM system 100 at the specific frame.

In addition, in an exemplary embodiment of the present invention, the specific frame may be the previous frame of the current frame or several nearby frames. However, if the constraints for the current frame are established using only the relationship with a few frames near the current frame, then when the SLAM system 100 moves quickly or rotates significantly, the number of frames that can be related to the current frame decreases, which may affect the accuracy of the SLAM system 100. Therefore, preferably, the SLAM system 100 according to an exemplary embodiment of the present invention may further include a full-map tracking module (not shown). The full-map tracking module (not shown) may find, among the previous frames (for example, the previous keyframes), co-view keyframes that share co-visible points (for example, co-visible feature points) with the current frame (that is, the co-visible point can be seen in both the current frame and these co-view keyframes, and projection points corresponding to the co-visible point exist in them); these co-view keyframes can be used to establish further constraints for the current frame. In this case, the specific frame may also include these co-view keyframes. That is, the feature matching module 122 may also search the co-view keyframes for features (including point features and line-segment features) that match the features extracted by the feature extraction module 121, so that the estimation module 123 may further use these matched features to estimate the initial pose. The full-map tracking module is described in detail below with reference to Figs. 6, 7, and 8.

Fig. 6 shows a schematic diagram of the feature re-identification operation. Fig. 7 shows a schematic diagram of keyframe screening by the full-map tracking module. Fig. 8 shows a schematic diagram of the full-map tracking flow of the full-map tracking module.
Referring to Fig. 6, a point feature labeled 6993 is extracted in the upper-left image and is tracked from the image of frame 1035 through the image of frame 1042; in the image of frame 1043 the track of this feature is lost, because a ladder occludes the feature in that image. In the image of frame 1723, however, the feature is re-identified. Between the frame in which the feature was extracted (frame 1035) and the frame in which it is re-identified (frame 1723), the rotation is 67.8 degrees and the translation is roughly 3 meters. Frames 1035 through 1042 share the common co-visible feature point 6993 with frame 1723; therefore, frames 1035 through 1042 can also be used to establish constraints for frame 1723. This increases the number of features producing inter-frame constraints and finds broader and more stable constraints for the current frame (that is, finds more frames that can establish constraints for the current frame), thereby improving the localization accuracy of the SLAM system 100.

Optionally, since the previous frame or nearby frames of the current frame have already been used for initial pose estimation, there is little benefit in additionally using co-view keyframes that are close in time to the current frame for initial pose estimation. Therefore, in an exemplary embodiment of the present invention, the initial pose estimation unit 120 may preferably select only co-view keyframes whose time separation from the current frame exceeds a preset time threshold for initial pose estimation. This is described further with reference to Fig. 7.

When selecting the keyframes to match against the current frame, keyframe screening is necessary because of the computational burden; here a strategy combining time and space may be used to screen the keyframes.

For the temporal screening, by way of example only, referring to Fig. 7, the keyframes temporally close to the current frame, such as K_t, K_t+1, and K_t+2 in the figure, already share many co-visible points with the current frame because of their proximity, so adding a few more co-visible points has little effect on the final accuracy; a time threshold may therefore be set so that matching features are sought in keyframes temporally distant from the current frame, improving computational efficiency. In addition, spatially, a series of screening operations is also required, for example, feature filtering, wide-baseline feature matching, and feature grading of the points of the keyframes. When all keyframes containing points sensitive to the spatial and temporal conditions (that is, points selected by the spatial and temporal screening) have been processed, the keyframe screening is complete.
The spatial screening of points may proceed as follows.

First, unstable points may be filtered out. The inverse depth of each map point is iteratively optimized in the global bundle adjustment, so the variation of each map point's inverse depth over a period of time can be computed; if the variation is too large, or the inverse depth is less than zero, the point is considered unstable and is removed.

In addition, points at the image edge may also be removed. Here, given a preliminary estimate of the camera pose (for example, using the initial pose), the projection of a map point usually lands near the corresponding feature point in the current frame; if the computed spatial distance between the projected point and the corresponding feature point is too large, the map point may be filtered out.

Afterwards, the descriptor distance between the projected point and the corresponding feature point may be computed; when this descriptor distance is below a certain threshold, the feature re-identification operation may be performed.
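These three spatial filters may be sketched as one predicate; the thresholds T_DEPTH, T_PIX, and T_DESC below are illustrative assumptions, since the text gives no concrete values.

```python
# Sketch of the three spatial filters described above (assumed thresholds).
import numpy as np

T_DEPTH, T_PIX, T_DESC = 0.05, 8.0, 64   # illustrative values, not from the patent

def keep_point(inv_depth_history, proj_uv, feat_uv, desc_dist):
    inv_d = np.asarray(inv_depth_history, dtype=float)
    # 1) drop unstable points: negative inverse depth or too much variation over time
    if inv_d.min() < 0 or np.ptp(inv_d) > T_DEPTH:
        return False
    # 2) drop points whose projection lands too far from the candidate feature
    if np.linalg.norm(np.asarray(proj_uv) - np.asarray(feat_uv)) > T_PIX:
        return False
    # 3) re-identify only when the descriptor distance is small enough
    return desc_dist < T_DESC
```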
Because points sensitive to the spatial and temporal conditions often have an excessively wide baseline with respect to their corresponding points in the current frame, false matches are likely; therefore, in an exemplary embodiment of the present invention, the process described below with reference to Fig. 8 may be used to determine co-view keyframes.

By way of example only, as shown in Fig. 8, C_1, C_2, and C_3 are keyframes and C_4 is the current frame, where C_1, C_2, and C_3 are far from C_4; conventional methods do not consider the constraints between C_1, C_2, C_3 and C_4. In an exemplary embodiment of the present invention, however, taking feature point P_1 as an example (here, P_1 is a map point), where P_1 is visible in keyframe C_1, the full-map tracking module (not shown) may determine whether keyframe C_1 is a co-view keyframe of the current frame C_4 through the following operations:

(1) Project the feature point P_1 into the current frame C_4 according to the initial relative pose relationship estimated by the initial pose estimation unit 120, and denote the projection as feature point p_i;

(2) Compute, for the feature points q_i near the feature point p_i in the current frame C_4, their spatial coordinates and the local intensity differences d(p_i, q_i) with respect to p_i;

(3) Find the set {q_i^k} of all feature points in the current frame C_4 whose local intensity difference from the feature point p_i is below a set threshold, where k = 1, 2, 3, …, m and m is the number of feature points found;

(4) Compare the feature descriptor D(p_i) of the feature point p_i with the feature descriptor D(q_i^k) of each feature point q_i^k based on Equations (2) and (3) below. Here, if the relative rotation ΔR and the relative translation ΔT between keyframe C_1 and the current frame C_4 are both below given thresholds (for example, a relative rotation threshold T_ΔR = 45 degrees and a relative translation threshold T_ΔT = 2 meters), the Hamming distance between the ORB descriptor D(p_i) of the feature point p_i and the ORB descriptor D(q_i^k) of the feature point q_i^k is compared directly; otherwise, as in Equation (3), the description of the feature point q_i^k is first warped onto the current frame, after which the Hamming distance between the ORB descriptors is compared;

(5) Determine the point of the set {q_i^k} with the smallest Hamming distance between ORB descriptors as the feature point matching the feature point P_1; at this point, it can be determined that keyframe C_1 and the current frame C_4 share the co-visible feature point P_1.

(Equations (2) and (3) give the Hamming distance between the ORB descriptors, compared directly and after warping onto the current frame, respectively.)

In the above description, the Hamming distance is used to indicate the distance between feature points, but this is only an example; various other distance measures may also be used to determine the distance between feature points.
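Steps (1) through (5) may be sketched in outline as follows; the packed 32-byte ORB descriptor layout and the helper warp_descriptor, which stands in for the warping of Equation (3), are assumptions for this example.

```python
# Pick the candidate q_i^k with the smallest ORB Hamming distance, warping
# first when the relative pose exceeds the gates T_dR and T_dT from the text.
import numpy as np

T_DR_DEG, T_DT_M = 45.0, 2.0             # gates on relative rotation and translation

def hamming(a, b):
    # a, b: packed uint8 ORB descriptors (32 bytes each)
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def best_coview_match(desc_p, candidates, d_rot_deg, d_trans_m, warp_descriptor):
    dists = []
    for cand in candidates:              # candidates: the set {q_i^k} of step (3)
        d_cand = cand["desc"]
        if d_rot_deg >= T_DR_DEG or d_trans_m >= T_DT_M:
            d_cand = warp_descriptor(cand)          # Equation (3): warp, then compare
        dists.append(hamming(desc_p, d_cand))       # Equation (2): direct comparison
    return candidates[int(np.argmin(dists))] if dists else None
```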
In an exemplary embodiment of the present invention, when the initial pose estimation unit 120 performs pose estimation with multiple frames (for example, using the above co-view keyframes in addition to the previous frame of the current frame) and thus obtains multiple initial poses, the initial pose estimation unit 120 may determine a statistic of the obtained initial poses (for example, the mean, the median, and so on) as the initial pose of the SLAM system 100.

Next, referring back to Fig. 1, the map construction unit 130 may construct a three-dimensional map based on the depth image and the color image acquired by the image acquisition unit 110.

Fig. 4 shows the structure of the map construction unit 130 in the map-reconstruction-based SLAM system 100 according to an exemplary embodiment of the present invention.

As shown in Fig. 4, according to an exemplary embodiment of the present invention, the map construction unit 130 may include a map reconstruction module 131, a map storage module 132, and a map update module 133.

The map reconstruction module 131 may reconstruct an initial three-dimensional map based on a first depth image and a first color image of a particular environment when that environment is entered for the first time; afterwards, the map update module 133 may update the three-dimensional map based on a second depth image and a second color image of the particular environment when that environment is entered again.

More specifically, when the SLAM system 100 enters a particular environment for the first time, the map reconstruction module 131 may reconstruct a three-dimensional map based on the depth image and the color image acquired when the SLAM system 100 first enters the environment, determining the three-dimensional coordinates (that is, the three-dimensional coordinates in the world coordinate system) of every map point of the reconstructed three-dimensional map together with its spatial accuracy.

By way of example only, the map reconstruction module 131 may reconstruct the three-dimensional map using three-dimensional semantic reconstruction, based on the depth information contained in the depth image and the intensity information contained in the color image. In the course of reconstructing the three-dimensional map with three-dimensional semantic reconstruction, the map reconstruction module 131 may determine, in addition to the three-dimensional coordinates of every map point, the spatial accuracy of every map point. The spatial accuracy may indicate the computed reprojection error of a map point; the computation of the spatial accuracy is described in detail below. Furthermore, it should be understood that the three-dimensional semantic reconstruction mentioned above is only one way of implementing the three-dimensional map reconstruction; the present application is not limited thereto, and any other suitable three-dimensional map reconstruction method known in the art may be used to reconstruct the three-dimensional map and determine the spatial accuracy of every map point.

After the three-dimensional map reconstruction is completed, the map reconstruction module 131 may store the three-dimensional map with spatial accuracy in the map storage module 132. Here, since no spatial-temporal confidence has been obtained when the SLAM system 100 enters the particular environment for the first time, the spatial-temporal confidence of the map points of the three-dimensional map at this time may be set to null and stored.

Afterwards, when the SLAM system 100 enters the particular environment again, the three-dimensional map need not be reconstructed anew; instead, the map update module 133 may compute the three-dimensional coordinates and the spatial-temporal confidence of every map point based on the depth image and the color image acquired when the SLAM system 100 re-enters the environment, and update the three-dimensional coordinates and spatial-temporal confidences of the corresponding map points of the three-dimensional map stored in the map storage module 132 according to the computed values. The updated three-dimensional map may be used subsequently as the three-dimensional map constructed by the map construction unit 130.
Here, for a particular map point on the three-dimensional map, the spatial-temporal confidence generally decreases over time, and its rate of decrease is generally related to the three-dimensional semantic type of that map point. In an exemplary embodiment of the present invention, the spatial-temporal confidence C_c of the particular map point may be determined as in Equation (4):

(Equation (4): the spatial-temporal confidence C_c, combining a spatial term weighted by ω_1 with a temporal term weighted by ω_2 over the elapsed time from t_0 to t_1.)

where t_1 denotes the current system time of the SLAM system 100 and t_0 denotes the initial system time of the SLAM system 100, and ω_1 and ω_2 denote the spatial weight and the temporal weight, respectively. The reprojection error of the particular map point during the three-dimensional semantic reconstruction may be computed by Equation (5):

e = x̂ − π(ε, X)    (5)

In Equation (5), x̂ is the two-dimensional coordinate of the projection point corresponding to the particular map point; ε denotes the pose of the SLAM system 100 in the world coordinate system, which here may be obtained from the initial pose estimated by the initial pose estimation unit 120; X denotes the three-dimensional coordinates of the particular map point in the world coordinate system; and π is a function that converts its input into two-dimensional coordinates.

Accordingly, the three-dimensional coordinates of the particular map point and its spatial-temporal confidence may be updated as in Equations (6) and (7), respectively:

(Equation (6): the update of the stored coordinates X_p from the newly computed coordinates X_c under the initial pose P_r.)

C_p′ = 0.5 × (C_p + C_c)    (7)

where X_p and C_p are, respectively, the three-dimensional coordinates of the particular map point as stored in the map storage module 132 and its spatial-temporal confidence; X_c is the current three-dimensional coordinate of the particular map point computed by the map update module 133; and X_p′ and C_p′ are, respectively, the updated three-dimensional coordinates of the particular map point and its updated spatial-temporal confidence. P_r denotes the initial pose of the SLAM system 100 estimated by the initial pose estimation unit 120.

After X_p′ and C_p′ have been computed, they may be used to replace the X_p and C_p stored in the map storage module 132, thereby updating the three-dimensional map stored in the map storage module 132.
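A sketch of the map-point update might look as follows. Equation (7) is implemented as printed; since Equation (6) is reproduced only as an image in the original, the confidence-weighted average used below for the coordinate update is an assumption, and X_c is assumed to have already been expressed in the world frame via the initial pose P_r.

```python
# Map-point fusion sketch: Eq. (7) as printed; the coordinate update is an
# assumed stand-in for Eq. (6), which is not reproduced in the source text.
import numpy as np

def update_map_point(X_p, C_p, X_c, C_c):
    w = C_c / (C_p + C_c + 1e-12)                    # assumed weighting, not from the patent
    X_p_new = (1.0 - w) * np.asarray(X_p) + w * np.asarray(X_c)   # assumed form of Eq. (6)
    C_p_new = 0.5 * (C_p + C_c)                      # Equation (7)
    return X_p_new, C_p_new
```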
In the illustration of Fig. 4, the map storage module 132 is shown as a separate module; it should be understood, however, that, by way of example only, the map storage module 132 may also be integrated with the map update module 133 into a single module.

Through the above update processing of the map update module 133, whenever the SLAM system 100 enters the same environment, only the stored three-dimensional map needs to be updated, and the three-dimensional map need not be reconstructed every time. This reduces the amount of computation required to obtain the three-dimensional map and also makes the map more and more accurate, so that during the subsequent use of the three-dimensional map the map points can be fixed and only the pose of the SLAM system 100 is optimized.
Next, referring back to Fig. 1, the pose determination unit 140 according to an exemplary embodiment of the present invention may determine the final pose of the SLAM system 100 based on the estimated initial pose of the SLAM system 100 and the constructed three-dimensional map. In an exemplary embodiment of the present invention, the pose determination unit 140 may determine the final pose of the SLAM system 100 based on the initial pose and the three-dimensional map using a preset frame set. The determined final pose is a pose in the world coordinate system and may be stored in the storage unit 150.

In an exemplary embodiment of the present invention, a global bundle adjustment method may be used to determine the final pose. Specifically, the pose determination unit 140 may perform global bundle adjustment based on Equations (8) and (9) below to determine the final pose of the SLAM system 100:

ψ* = argmin Σ_{i∈K} Σ_{j∈P} ||e_ij||    (8)

e_ij = x_ij − π(ε_iw, X_wj)    (9)

In Equations (8) and (9), K is the preset frame set and includes the current frame; for example, it may be a set including keyframes and the current frame. P denotes the set of feature points extracted by the feature extraction module 121. e_ij denotes the reprojection error of the j-th feature point of the set P on the i-th frame of the set K; when the j-th feature point has no projection on the i-th frame, e_ij may be set to 0. x_ij is the two-dimensional coordinate of the projection of the j-th feature point on the i-th frame. ε_iw denotes the pose of the SLAM system 100 in the world coordinate system for the i-th frame. X_wj denotes the three-dimensional coordinates of the j-th feature point in the world coordinate system and may be determined from the three-dimensional map provided by the map construction unit 130.

Solving Equation (8) for the optimum yields the poses of the SLAM system 100 corresponding to the individual frames of the set K (that is, the poses of the SLAM system 100 at the time each of those frames was obtained). Not only can the final pose of the SLAM system 100 be determined in this way, but the poses obtained for the other frames can also be used to update the previously determined corresponding poses, so that the localization accuracy of the SLAM system 100 is continually improved.

Furthermore, in the process of solving Equation (8) for the optimum, the initial pose determined by the initial pose estimation unit 120 may serve as a reference for ε_iw (for example, ε_iw may be initialized to the initial pose and adjusted from it in subsequent computation) so as to speed up the computation.
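A minimal pose-only bundle adjustment in the spirit of Equations (8) and (9) may be sketched as follows: the map points X_wj stay fixed, as described above, and only the camera poses are optimized. The axis-angle parameterization and the generic least-squares solver are choices made for this sketch, not details from the patent.

```python
# Pose-only bundle adjustment per Eqs. (8)-(9) with fixed map points.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(pose6, X, K_cam):
    """pi(eps_iw, X_wj): world point X into pixel coordinates under pose6."""
    R = Rotation.from_rotvec(pose6[:3]).as_matrix()
    uvw = K_cam @ (R @ X + pose6[3:])
    return uvw[:2] / uvw[2]

def residuals(params, obs, points_w, K_cam):
    poses = params.reshape(-1, 6)
    # obs: tuples (frame index i, point index j, measured 2D coordinate x_ij)
    return np.concatenate([x_ij - project(poses[i], points_w[j], K_cam)   # e_ij of Eq. (9)
                           for i, j, x_ij in obs])

def pose_only_ba(init_poses, obs, points_w, K_cam):
    # init_poses may be seeded from the initial poses of unit 120, as the text suggests
    sol = least_squares(residuals, np.asarray(init_poses, dtype=float).ravel(),
                        args=(obs, points_w, K_cam))   # minimizes Equation (8)
    return sol.x.reshape(-1, 6)
```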
In an exemplary embodiment of the present invention, the keyframes may be frames selected according to a preset rule from among the multiple frames stored in the storage unit 150, for example, the 1st, 5th, 9th, … frames selected at a preset interval, or even all of the stored frames.

When the global bundle adjustment is performed based on Equations (8) and (9) above to determine the final pose of the SLAM system 100, for example, for the j-th feature point of the current frame, all co-view frames whose co-visible feature point is the map feature point corresponding to the j-th feature point (that is, the map feature point corresponding to the j-th feature point can be seen in all of these co-view frames) may participate as a whole in the computation of the final pose, as one feature tracking set related to the j-th feature point (that is, as the preset frame set); for example, in the present invention the aforementioned full-map tracking module (not shown) may be used to find such co-view frames. Global consistency is thereby maintained. However, this takes a long time and cannot satisfy the real-time requirement of the SLAM system 100; sometimes the accuracy of the SLAM system 100 even has to be sacrificed to guarantee real-time performance.

Optionally, when performing the global bundle adjustment, the pose determination unit 140 may set the preset frame set to include a plurality of tracking subsets obtained by dividing the feature tracking set related to the current frame. More specifically, the pose determination unit 140 may divide each feature tracking set related to each feature of the current frame into one or more feature tracking subsets and determine the final pose of the SLAM system 100 by performing the global bundle adjustment based on the individual feature tracking subsets. That is, the preset frame set may include the one or more feature tracking subsets obtained by dividing each feature tracking set related to each feature of the current frame. This is described below with reference to Fig. 9.
Fig. 9 shows a schematic example of the division of a feature tracking set.

Referring to Fig. 9, by way of example only, assume that C_1 through C_4 are co-view frames related to the j-th feature point of the current frame and that K is a co-view keyframe related to the j-th feature point of the current frame. That is, the map feature point corresponding to the j-th feature point can be seen in the co-view frames C_1 through C_4 as well as in the co-view keyframe K, and T_j is the feature tracking set of the j-th feature point, consisting of the co-view keyframe K and the co-view frames C_1 through C_4.

In an exemplary embodiment of the present invention, referring to Fig. 9, the pose determination unit 140 may divide T_j into subsets T_j1, T_j2, T_j3, and T_j4, where T_j1 consists of the co-view keyframe K and the co-view frames C_1 through C_3, T_j2 consists of K and the co-view frames C_2 through C_4, T_j3 consists of K and the co-view frames C_1, C_3, and C_4, and T_j4 consists of K and the co-view frames C_1, C_2, and C_4.

In this way, when the final pose of the SLAM system 100 is computed with the global bundle adjustment of Equation (8), the frame set involved in the sub-terms of Equation (8) related to the j-th feature point is not T_j but the subsets T_j1, T_j2, T_j3, and T_j4; computing the corresponding sub-terms over these four shorter subsets effectively reduces the computational burden.

Furthermore, in an exemplary embodiment of the present invention, whether a feature tracking set is divided may be decided based on the number of frames in the feature tracking set corresponding to each feature. For example, a feature tracking set may be divided into several feature tracking subsets of a preset size only when the number of frames in the feature tracking set exceeds a preset threshold, such that the number of frames in each feature tracking subset is less than or equal to the preset threshold, each feature tracking subset includes the co-view keyframe, and the frames of all feature tracking subsets together cover all frames of the feature tracking set.
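Such a division may be sketched as follows. The example of Fig. 9 enumerates overlapping subsets; for simplicity this sketch forms contiguous chunks with a padded tail, which likewise keeps the co-view keyframe in every subset and covers every co-view frame of the set.

```python
# Split one long feature tracking set T_j into subsets that each retain the
# co-view keyframe K (the chunking scheme is a simplification of Fig. 9).
def split_track(keyframe, coview_frames, subset_size=3):
    subsets, n = [], len(coview_frames)
    for s in range(0, n, subset_size):
        chunk = coview_frames[s:s + subset_size]
        if len(chunk) < subset_size:                 # pad the tail with earlier frames
            chunk = coview_frames[max(0, n - subset_size):]
        subsets.append([keyframe] + chunk)           # every subset includes keyframe K
    return subsets
```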
Furthermore, in applications of the SLAM system 100, a long motion trajectory accumulates a considerable number of keyframes for back-end optimization, which easily prevents the SLAM system 100 from working efficiently. Therefore, in another exemplary embodiment of the present invention, the pose determination unit 140 may set a time-domain window of a predetermined size for the current frame and determine the final pose of the SLAM system 100 using the frames within the time-domain window (for example, the keyframes within the time-domain window, which may include both co-view keyframes and non-co-view keyframes). That is, the aforementioned preset frame set may also include the keyframes determined to lie within a time-domain window of predetermined size set for the current frame. Fig. 10 shows a schematic diagram of global bundle adjustment based on a time-domain window; as shown in Fig. 10, the time-domain window for the current frame may include both co-view keyframes and non-co-view keyframes.

Optionally, the pose determination unit 140 may also set a time-domain window of a predetermined size for the current frame, use the time-domain window to screen the frames of the feature tracking subsets produced in the foregoing processing by removing the frames outside the time-domain window, and then perform the global bundle adjustment based on the screened feature tracking subsets to determine the final pose of the SLAM system 100, thereby further reducing the computational burden and improving the image-processing efficiency.

However, when the motion trajectory of the SLAM system 100 contains a loop, processing with only the keyframes within the time-domain window instead of all keyframes may prevent the loop from closing, as shown in Fig. 11(a). In an exemplary embodiment of the present invention, loop-closure detection may be used to close the loop, and the resulting pose difference may be propagated back along the trajectory in turn so that the motion trajectory remains smooth; the process is shown in Figs. 11(a) through 11(d). The loop-closure error elimination shown in Fig. 11 is known to those skilled in the art and, for brevity, is not described here in further detail.
Furthermore, the above describes the detailed processing by which the pose determination unit 140 determines the final pose of the SLAM system 100 using point features as an example; when the initial pose estimation unit 120 has also extracted line-segment features, however, the line-segment features may further be taken into account in the global bundle adjustment.

Specifically, the SLAM system according to an exemplary embodiment of the present invention may further include a point-line optimization combination module (not shown), which may establish a three-dimensional collinearity constraint based on the line-segment matching results of the feature matching module 122, so that the pose determination unit 140 may further determine the final pose according to the three-dimensional collinearity constraint. By way of example only, a line-segment constraint sub-term corresponding to the three-dimensional collinearity constraint may be added to and used in the aforementioned global bundle adjustment method. Here, the three-dimensional collinearity constraint expresses that a point on a line segment of a first frame lies on the corresponding line segment of a second frame. More specifically, the segment matches between two frames are obtained from the segment descriptor vectors; for any segment match, the three-dimensional coordinates of both matched segments in their respective camera coordinate systems are available, from which it follows that when an endpoint (in spatial three-dimensional coordinates) of one of the segments is transformed by the inter-frame pose into the camera coordinate system of the other frame, it must lie on the segment matched to that segment.

In other words, when the final pose of the SLAM system 100 is computed as in Equation (8), the reprojection errors involved in the sub-terms on the right-hand side of the equation include not only the reprojection errors of the point features but also the reprojection errors of the line-segment features. Here, the reprojection error of a line-segment feature is computed in a manner similar to that of the point-feature reprojection error shown in Equation (9) and is known to those skilled in the art, so it is not repeated here.
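The residual that such a line-segment sub-term could contribute may be sketched as follows, with (R, t) denoting the inter-frame pose; this is an illustrative formulation of the constraint, not the patent's exact sub-term.

```python
# Point-to-line residual for the 3D collinearity constraint: an endpoint of
# a matched segment, moved into the other frame, should lie on the matched
# segment's supporting line.
import numpy as np

def collinearity_residual(endpoint_a, R, t, seg_b_start, seg_b_end):
    p = R @ endpoint_a + t                       # endpoint transformed into frame B
    d = seg_b_end - seg_b_start
    d = d / np.linalg.norm(d)                    # unit direction of the matched segment
    off = p - seg_b_start
    r = off - np.dot(off, d) * d                 # component orthogonal to the line
    return np.linalg.norm(r)                     # zero iff p lies on the line
```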
Furthermore, the above description uses the global bundle adjustment method, but the present invention is not limited thereto; other methods capable of determining the pose of the device may also be used.

Fig. 12 is a flowchart illustrating a map-reconstruction-based SLAM method according to an exemplary embodiment of the present invention.

Referring to Fig. 12, in step 1210 of the method, the image acquisition unit 110 may be used to acquire a color image and a depth image of the surroundings of the SLAM system 100.

Then, in step 1220, the initial pose estimation unit 120 may be used to estimate the initial pose of the SLAM system 100 based on the color image and the depth image obtained in step 1210.

More specifically, in step 1220, the initial pose estimation unit 120 may extract point features from the color image, perform point matching according to the point features, and then estimate the initial pose of the SLAM system 100 using the matched point features.

When the number of matched point features is less than the first preset threshold, the initial pose estimation unit 120 may further extract line-segment features from the color image and perform line-segment matching according to the line-segment features. In this case, the initial pose estimation unit 120 may estimate the initial pose of the SLAM system 100 using both the matched point features and the matched line-segment features.

Furthermore, when the number of matched point features is less than the second preset threshold (here, the second preset threshold is smaller than the first preset threshold), the initial pose estimation unit 120 may directly determine the initial pose of the SLAM system 100 to remain the same as the relative pose of the SLAM system 100 at the specific frame.

Here, the specific frame may be the previous frame of the current frame or several nearby frames.

Optionally, although not shown in the flowchart of Fig. 12, the method may further include an operation of determining co-view keyframes. In an exemplary embodiment of the present invention, the previously described full-map tracking module (not shown) may be used to determine, among the previous frames (for example, the previous keyframes), the co-view keyframes that share co-visible points with the current frame. In this way, in step 1220 the initial pose estimation unit 120 may further use these co-view keyframes to estimate the initial pose, enhancing the accuracy of the SLAM system 100.
Then, in step 1230, the map construction unit 130 may be used to construct a three-dimensional map based on the depth image and the color image.

The map construction unit 130 may reconstruct an initial three-dimensional map based on a first depth image and a first color image of a particular environment when that environment is entered for the first time, and afterwards update the three-dimensional map based on a second depth image and a second color image of the particular environment when that environment is entered again.

More specifically, when the SLAM system 100 enters a particular environment for the first time, the map construction unit 130 may reconstruct a three-dimensional map based on the depth image and the color image acquired by the image acquisition unit 110 upon first entering the environment, determine the three-dimensional coordinates and the spatial accuracy of every map point of the reconstructed three-dimensional map, and store the three-dimensional coordinates and the spatial-temporal confidence of every map point of the three-dimensional map with spatial accuracy. Here, since no spatial-temporal confidence has been obtained when the SLAM system 100 first enters the particular environment, the spatial-temporal confidence of the map points at this time may be set to null and stored.

Afterwards, when the SLAM system 100 enters the particular environment again, the map construction unit 130 may compute the three-dimensional coordinates and the spatial-temporal confidence of every map point based on the depth image and the color image acquired by the image acquisition unit 110 upon re-entering the environment, and update the three-dimensional coordinates and spatial-temporal confidences of the corresponding map points of the stored three-dimensional map according to the computed values. The updated three-dimensional map may be used subsequently as the three-dimensional map constructed by the map construction unit 130.

In step 1240, the pose determination unit 140 may be used to determine the final pose of the SLAM system 100 based on the initial pose of the SLAM system 100 estimated in step 1220 and the three-dimensional map constructed in step 1230.

Here, as described above, the pose determination unit 140 may determine the final pose based on the initial pose and the three-dimensional map using a preset frame set. The preset frame set may include a plurality of tracking subsets obtained by dividing the tracking set related to the current frame. Optionally, the pose determination unit 140 may also set a time-domain window of a predetermined size for the current frame and, based on the initial pose and the three-dimensional map, determine the final pose using the keyframes within the time-domain window.

Preferably, after the extraction and matching of the line-segment features has been performed in step 1220, the method may further use the point-line optimization combination module (not shown) to establish a three-dimensional collinearity constraint based on the line-segment matching results, so that the pose determination unit 140 may further use the three-dimensional collinearity constraint to determine the final pose, thereby further improving the accuracy of the SLAM system 100.

Finally, in step 1250, the storage unit 150 may store the current-frame-related information including the final pose of the SLAM system 100.
The operations performed in the individual steps of Fig. 12 by the components of the SLAM system 100 according to the exemplary embodiments of the present invention have been described in detail above with reference to Figs. 1 through 11, and for brevity they are not repeated here.

The SLAM system 100 and the method of the exemplary embodiments of the present invention can ensure real-time mapping, reduce the amount of computation, and still guarantee the accuracy of the map. Furthermore, it should be understood that the exemplary embodiments described in this specification are only examples shown for ease of understanding; the present application is not limited thereto, and any equivalents, substitutions, and modifications that do not depart from the spirit and scope of the inventive concept shall be regarded as falling within the scope of the present invention.

Exemplary embodiments of the present invention may also be implemented as computer-readable code on a computer-readable recording medium. A computer-readable recording medium is any data storage device that can store data that can thereafter be read by a computer system. Examples of computer-readable recording media include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission over the Internet via wired or wireless transmission paths). The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Furthermore, functional programs, code, and code segments for accomplishing the present invention can easily be construed by programmers of ordinary skill in the field to which the present invention pertains, within the scope of the present invention.

Although the present invention has been particularly shown and described with reference to its exemplary embodiments, those skilled in the art will understand that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the claims.
Claims (15)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020200147099A KR20210058686A (en) | 2019-11-14 | 2020-11-05 | Device and method of implementing simultaneous localization and mapping |
EP20206725.2A EP3822914B1 (en) | 2019-11-14 | 2020-11-10 | Device and method with simultaneous implementation of localization and mapping |
US17/093,944 US11636618B2 (en) | 2019-11-14 | 2020-11-10 | Device and method with simultaneous implementation of localization and mapping |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019111143251 | 2019-11-14 | ||
CN201911114325 | 2019-11-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112802096A (en) | 2021-05-14 |
Family
ID=75806111
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010144849.1A Pending CN112802096A (en) | 2019-11-14 | 2020-03-04 | Device and method for realizing real-time positioning and mapping |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR20210058686A (en) |
CN (1) | CN112802096A (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113610001B (en) * | 2021-08-09 | 2024-02-09 | 西安电子科技大学 | Indoor mobile terminal positioning method based on combination of depth camera and IMU |
US12354280B2 (en) | 2022-01-28 | 2025-07-08 | Samsung Electronics Co., Ltd. | Reconstructing a three-dimensional scene |
CN114529684A (en) * | 2022-02-11 | 2022-05-24 | 深圳市杉川机器人有限公司 | Three-dimensional map reconstruction method, self-moving device and computer-readable storage medium |
CN114913295B (en) * | 2022-03-31 | 2025-04-08 | 阿里巴巴(中国)有限公司 | Visual mapping method, device, storage medium and computer program product |
CN114742962B (en) * | 2022-04-11 | 2025-05-27 | 亿咖通(湖北)技术有限公司 | Map construction method, device, storage medium and processor |
CN116086427A (en) * | 2022-12-07 | 2023-05-09 | 达闼科技(北京)有限公司 | Reference map acquisition method, device, equipment and storage medium |
CN117073697B (en) * | 2023-08-17 | 2025-07-22 | 中国电子科技南湖研究院 | Autonomous hierarchical exploration map building method, device and system for ground mobile robot |
CN117893693B (en) * | 2024-03-15 | 2024-05-28 | 南昌航空大学 | Dense SLAM three-dimensional scene reconstruction method and device |
CN119379929B (en) * | 2024-12-30 | 2025-06-20 | 杭州秋果计划科技有限公司 | Three-dimensional reconstruction data acquisition method, electronic device and storage medium |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105856230A (en) * | 2016-05-06 | 2016-08-17 | 简燕梅 | ORB key frame closed-loop detection SLAM method capable of improving consistency of position and pose of robot |
CN107358629A (en) * | 2017-07-07 | 2017-11-17 | 北京大学深圳研究生院 | Figure and localization method are built in a kind of interior based on target identification |
Non-Patent Citations (2)
Title |
---|
RAÚL MUR-ARTAL et al.: "ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras", IEEE Transactions on Robotics, vol. 33, no. 5, 31 October 2017 (2017-10-31), pages 1255-1262, XP055597959, DOI: 10.1109/TRO.2017.2705103 * |
YU WENG et al.: "NAS-Unet: Neural Architecture Search for Medical Image Segmentation", IEEE Access, 4 April 2019 (2019-04-04), pages 1-11 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12183032B2 (en) | 2021-02-26 | 2024-12-31 | Samsung Electronics Co., Ltd. | Method and apparatus for simultaneous localization and mapping, and computer-readable storage medium |
CN113284176A (en) * | 2021-06-04 | 2021-08-20 | 深圳积木易搭科技技术有限公司 | Online matching optimization method combining geometry and texture and three-dimensional scanning system |
CN113284176B (en) * | 2021-06-04 | 2022-08-16 | 深圳积木易搭科技技术有限公司 | Online matching optimization method combining geometry and texture and three-dimensional scanning system |
WO2022252362A1 (en) * | 2021-06-04 | 2022-12-08 | 深圳积木易搭科技技术有限公司 | Geometry and texture based online matching optimization method and three-dimensional scanning system |
CN113465617A (en) * | 2021-07-08 | 2021-10-01 | 上海汽车集团股份有限公司 | Map construction method and device and electronic equipment |
CN113465617B (en) * | 2021-07-08 | 2024-03-19 | 上海汽车集团股份有限公司 | Map construction method and device and electronic equipment |
CN113758481A (en) * | 2021-09-03 | 2021-12-07 | Oppo广东移动通信有限公司 | Grid map generation method, device, system, storage medium and electronic equipment |
CN116012451A (en) * | 2023-02-14 | 2023-04-25 | 杭州萤石软件有限公司 | Object pose estimation method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
KR20210058686A (en) | 2021-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112802096A (en) | 2021-05-14 | Device and method for realizing real-time positioning and mapping |
US11636618B2 (en) | Device and method with simultaneous implementation of localization and mapping | |
Strasdat et al. | Double window optimisation for constant time visual SLAM | |
CN109166149B (en) | Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU | |
US10948297B2 (en) | Simultaneous location and mapping (SLAM) using dual event cameras | |
US9996941B2 (en) | Constrained key frame localization and mapping for vision-aided inertial navigation | |
Concha et al. | Visual-inertial direct SLAM | |
KR101776622B1 (en) | Apparatus for recognizing location mobile robot using edge based refinement and method thereof | |
CN110782494A (en) | Visual SLAM method based on point-line fusion | |
US8913055B2 (en) | Online environment mapping | |
EP2813082B1 (en) | Head pose tracking using a depth camera | |
KR101708659B1 (en) | Apparatus for recognizing location mobile robot using search based correlative matching and method thereof | |
US12260575B2 (en) | Scale-aware monocular localization and mapping | |
CN112219087A (en) | Pose prediction method, map construction method, movable platform and storage medium | |
US10636190B2 (en) | Methods and systems for exploiting per-pixel motion conflicts to extract primary and secondary motions in augmented reality systems | |
JP5012615B2 (en) | Information processing apparatus, image processing method, and computer program | |
US11948310B2 (en) | Systems and methods for jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator | |
CN114549738A (en) | Unmanned vehicle indoor real-time dense point cloud reconstruction method, system, equipment and medium | |
CN111998862B (en) | BNN-based dense binocular SLAM method | |
Zhang et al. | Hand-held monocular SLAM based on line segments | |
Chiu et al. | Precise vision-aided aerial navigation | |
US10977810B2 (en) | Camera motion estimation | |
Yang et al. | UAV image target localization method based on outlier filter and frame buffer | |
Liu et al. | Accurate real-time visual SLAM combining building models and GPS for mobile robot | |
Huai et al. | A consistent parallel estimation framework for visual-inertial SLAM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |