CN118038239A - An indoor visual loop detection method based on point-line feature global descriptor
- Publication number
- CN118038239A (application CN202410197669.8A)
- Authority
- CN
- China
- Prior art keywords
- line
- feature
- point
- line segment
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/761—Proximity, similarity or dissimilarity measures
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses an indoor visual loop detection method based on a point-line feature global descriptor. Point and line features and their descriptors are extracted from each image; key frames are established according to frame interval, feature quantity, and co-visibility criteria; for each key frame, clustered point features and grouped line features are aggregated into a global descriptor; and the current key frame is matched against historical key frames by similarity calculation followed by geometric and continuity verification, completing loop detection. The method needs no visual vocabulary built a priori and supports long-term, trajectory-consistent visual positioning in structured indoor scenes.
Description
Technical Field
The present invention belongs to the technical field of visual indoor positioning, and specifically relates to an indoor visual loop detection method based on a point-line feature global descriptor, which can be used to achieve long-term positioning with low-cost, portable visual sensors in indoor scenes.
Background Art
With the rapid development of society, indoor positioning is in great demand across many areas of daily life. In the commercial field, navigation services are needed inside large buildings such as shopping malls, airports, hospitals, and large office complexes to improve the user experience; in the entertainment field, indoor positioning supports virtual and augmented reality applications, providing more realistic and interactive experiences; in the industrial field, all kinds of IoT mobile machines need to know their own exact location in order to perform tasks well. The Global Navigation Satellite System (GNSS) is currently the most traditional and widely used positioning technology, but in indoor scenes GNSS signals are blocked and attenuated by buildings, which severely degrades positioning accuracy. Moreover, its large-scale positioning accuracy cannot satisfy motion at the small scales typical of indoor environments. In addition, some traditional indoor positioning approaches, such as UWB positioning and Wi-Fi fingerprinting, estimate the current position by triangulation from the signal strengths received at different base stations, but they depend on pre-deployed base stations and therefore lack the ability to position autonomously; their applicable scenarios are few and their restrictions many, so autonomous positioning technology needs to be introduced.
Vision-based SLAM is currently the most mainstream autonomous positioning technology. The core goal of SLAM (Simultaneous Localization and Mapping) is to enable machines or pedestrians to navigate autonomously in unknown environments while simultaneously building a map. The sensors it relies on fall roughly into three categories: visual sensors, lidar, and inertial measurement units (IMUs). Compared with inertial sensors, visual sensors can perceive the surrounding environment, and compared with lidar they are cheap and portable, so they are the sensors most commonly used in autonomous positioning. Vision-based SLAM is usually divided into three stages: front-end tracking, back-end optimization, and loop detection. Because the pose calculation of visual SLAM relies in essence on recursive matching between images, errors inevitably accumulate and eventually cause the pose to diverge. Loop detection counteracts this by recognizing that the same place has been reached again, imposing a global constraint and thereby preventing pose divergence.
The most common loop detection methods are based on the bag-of-words model, which relies on training over a large number of scenes to obtain a huge visual vocabulary; each image builds its own word vector from this vocabulary and is matched accordingly. This approach relies on feature points and is simple to implement, but the huge vocabulary is slow to load and does not suit every scene. Meanwhile, indoor scenes are often structured and low-textured, meaning that point feature information in the images decreases while line feature information increases. Given these characteristics, for indoor positioning scenarios there is an urgent need for a global description and matching method that is based on structured point and line features, does not depend on a vocabulary built a priori, and matches using only the information shared between images, so as to realize loop detection in the visual SLAM pipeline and satisfy the requirement of globally consistent trajectories for autonomous visual positioning in indoor scenes.
Summary of the Invention
To solve the problem that existing techniques suffer visual positioning divergence in structured indoor scenes where line features are abundant but point features are scarce, the present invention provides an indoor visual loop detection method based on a point-line feature global descriptor. Built on the structured point and line feature information of image appearance and oriented to the positioning needs of indoor scenes, it can deliver indoor loop detection that keeps long-term visual positioning trajectories consistent.
To achieve the above object, the present invention adopts the following technical scheme:
An indoor visual loop detection method based on a point-line feature global descriptor comprises the following steps:
Step 1: Use the convolutional-neural-network-based SuperPoint model to extract the image's feature points and their descriptors, use a line segment detection method to detect line features, and use a scheme that represents line features by point features to extract the line feature descriptors, thus completing the extraction of the image's point features, line features, and descriptors;
Step 2: Test each image frame against three criteria, namely frame interval, feature quantity, and co-visibility relationship, so as to establish the key frames;
Step 3: For each key frame, cluster the point features and group the line features according to their intersection relationships, then compute the global feature residuals from the clustering results, the grouping results, and the descriptors of the local feature points belonging to each set, thereby obtaining the key frame's global descriptor;
Step 4: Compute the similarity between the global descriptor of the current key frame and those of the historical key frames to obtain candidate matching key frames; perform geometric verification and continuity verification on the candidates to obtain matching key frames confirmed to be the same place, completing loop detection.
Beneficial effects:
The method of the present invention is based on image matching technology. Exploiting the structured character of indoor environments, it uses only the appearance information of the images themselves and, through point feature clustering and line feature grouping, realizes a local-to-global image description. It can detect and recognize a revisited place without any visual bag-of-words built a priori, is widely applicable to structured indoor environments, and thereby keeps the long-term trajectory of visual positioning consistent.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow chart of the indoor visual loop detection method based on a point-line feature global descriptor according to the present invention.
Detailed Description of the Embodiments
To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with one another as long as they do not conflict.
As shown in FIG. 1, the indoor visual loop detection method based on a point-line feature global descriptor of the present invention comprises the following steps:
Step 1: Extract image point and line features and descriptors;
Step 2: Establish key frames;
Step 3: Generate the global descriptor;
Step 4: Match historical key frames.
Specifically, this embodiment of the present invention takes an indoor scene as an example to describe the concrete steps of the visual loop detection method based on a point-line feature global descriptor.
Step 1: Extract image point and line features and descriptors:
In SLAM positioning and image matching, operating directly on pixels is computationally too expensive, so features are usually extracted first to achieve a dimensionality-reduction effect.
Step 1.1 Feature point extraction and description:
To improve feature point extraction, the present invention uses the convolutional-neural-network-based SuperPoint model for automatic feature point extraction. It adopts an encoder-decoder architecture: the encoder performs image dimensionality reduction and feature extraction through multiple convolutional layers and max-pooling layers, while a decoder decodes the feature information compressed by the encoder into the required type of information. The SuperPoint model contains two decoders: one obtains the feature point positions and the other the feature point descriptors. Thus, after passing through the SuperPoint model, a set of feature points {xi} is obtained from the image, where each feature point xi consists of its position (ui, vi) in the image pixel coordinate system and a 256×1-dimensional descriptor d(xi).
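For illustration, a minimal sketch of this extraction stage follows. The model interface assumed here (a callable returning a keypoint heatmap and a dense 256-channel descriptor map) mirrors the two-decoder layout described above, but the exact signature, the confidence threshold of 0.015, and all names are assumptions rather than the patent's implementation:

```python
import numpy as np
import torch
import torch.nn.functional as F

def extract_superpoint_features(gray, model, conf_thresh=0.015):
    """Keypoints (u_i, v_i) and 256-d descriptors from a SuperPoint-style model."""
    h, w = gray.shape
    img = torch.from_numpy(gray.astype(np.float32) / 255.0)[None, None]  # 1x1xHxW
    with torch.no_grad():
        heatmap, desc_map = model(img)         # assumed: 1xHxW scores, 1x256xHcxWc map
    scores = heatmap[0]
    ys, xs = torch.nonzero(scores > conf_thresh, as_tuple=True)
    # Bilinearly sample the coarse descriptor map at each keypoint location.
    grid = torch.stack([xs.float() / (w - 1), ys.float() / (h - 1)], dim=1)
    grid = (grid * 2 - 1).view(1, 1, -1, 2)    # grid_sample expects [-1, 1] coords
    desc = F.grid_sample(desc_map, grid, align_corners=True)[0, :, 0].T   # Nx256
    desc = F.normalize(desc, dim=1)            # unit-length descriptors
    kpts = torch.stack([xs, ys], dim=1)        # each row is (u_i, v_i)
    return kpts.numpy(), desc.numpy()
```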
Step 1.2 Line feature extraction:
Given the structured character of indoor environments, where much of the appearance information exists in the form of line segments, line feature detection is additionally performed on the image. LSD (Line Segment Detector) is a line segment detection algorithm based on image gradients that achieves sub-pixel accuracy. Its principle is that, for a line segment to be detected, the differing pixel gray values in the regions on its two sides necessarily produce gradients; pixels are therefore merged and separated according to gradient magnitude and direction to form line regions, yielding a set of line features {li}, where each line feature li consists of its two endpoints ei1, ei2 and its slope ki.
Since the line features extracted by LSD are often over-fragmented, the extracted line features must be merged. The merging criterion is judged from the minimum distance between segment endpoints and the segment slopes: when the minimum endpoint distance min{||eis − ejt||, s, t ∈ {1, 2}} between two line features (segments) li and lj is less than a threshold Td, and the slope difference |ki − kj| is less than a threshold Tk, the two line features are considered consistent in slope and contiguous at their endpoints, and they are merged into a single line feature ln. The endpoints en1, en2 of the new line feature ln are the two mutually farthest endpoints of li and lj, and the slope kn is recomputed from the new endpoints. Here {ej1, ej2} are the two endpoints of line feature lj and kj is its slope.
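A sketch of this merge rule is given below, using OpenCV's LSD detector (cv2.createLineSegmentDetector, available in recent OpenCV builds). The threshold values Td and Tk and the greedy restart-on-merge strategy are illustrative assumptions; the patent fixes only the criterion itself:

```python
import cv2
import numpy as np

def slope(seg):
    x1, y1, x2, y2 = seg
    return (y2 - y1) / (x2 - x1 + 1e-9)    # guarded against vertical segments

def min_endpoint_dist(a, b):
    pa, pb = a.reshape(2, 2), b.reshape(2, 2)
    return min(np.linalg.norm(p - q) for p in pa for q in pb)

def merge_pair(a, b):
    pts = np.vstack([a.reshape(2, 2), b.reshape(2, 2)])
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    i, j = np.unravel_index(d.argmax(), d.shape)
    return np.hstack([pts[i], pts[j]])      # two farthest endpoints define l_n

def detect_and_merge_lines(gray, Td=5.0, Tk=0.05):
    lsd = cv2.createLineSegmentDetector()
    raw = lsd.detect(gray)[0]               # (N, 1, 4) arrays of x1, y1, x2, y2
    segs = [s[0] for s in raw] if raw is not None else []
    merged = []
    while segs:
        cur = segs.pop(0)
        i = 0
        while i < len(segs):
            if (min_endpoint_dist(cur, segs[i]) < Td
                    and abs(slope(cur) - slope(segs[i])) < Tk):
                cur = merge_pair(cur, segs.pop(i))  # k_n recomputed from new ends
                i = 0                               # merged segment may chain on
            else:
                i += 1
        merged.append(cur)
    return merged
```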
To avoid line feature redundancy, when the number of line features exceeds a threshold nl, the line features are screened by quadtree homogenization, yielding a line feature set {li} from which the surplus line features of over-dense regions have been removed.
Step 1.3 Line feature description:
To save computing resources, the present invention represents a line feature's description by feature point descriptions, deciding from the distance between a feature point and a line feature whether the point is assigned to that line feature.
First, from its endpoints and slope, the line supporting line feature lj is expressed as the parameterized line Aj·x + Bj·y + Cj = 0, where x, y are image pixel coordinates and Aj, Bj, Cj are the parameters describing that line. The distance dij from a feature point xi to line feature lj is then:

$$d_{ij}=\frac{\left|A_j u_i + B_j v_i + C_j\right|}{\sqrt{A_j^{2}+B_j^{2}}}$$
where (ui, vi) is the position of feature point xi in the image pixel coordinate system. When the distance dij is smaller than a threshold, the feature point is considered to belong to that line feature. After the point-to-line distances are computed, the set of feature points belonging to each line feature lj is obtained (their union over all line features is denoted {xl}), together with the set {xp} of feature points that belong to no line feature. The description of line feature lj is then given by the descriptors of all feature points assigned to it.
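The following sketch illustrates this assignment. Assigning each point to its single nearest line within the threshold is a design choice made here for simplicity; the patent requires only the distance test:

```python
import numpy as np

def assign_points_to_lines(points, segments, dist_thresh=3.0):
    """Assign feature points to line features by point-to-line distance.

    points: (N, 2) pixel positions (u_i, v_i); segments: list of [x1, y1, x2, y2].
    Returns per-line lists of point indices and the leftover index set {x_p}.
    """
    params = []
    for x1, y1, x2, y2 in segments:
        A, B = y2 - y1, x1 - x2             # normal of the supporting line
        C = -(A * x1 + B * y1)              # so that A*x + B*y + C = 0 holds
        params.append((A, B, C))
    per_line = [[] for _ in segments]
    free = []
    for i, (u, v) in enumerate(points):
        dists = [abs(A * u + B * v + C) / (np.hypot(A, B) + 1e-12)
                 for A, B, C in params]
        j = int(np.argmin(dists)) if dists else -1
        if j >= 0 and dists[j] < dist_thresh:
            per_line[j].append(i)           # point i describes line feature j
        else:
            free.append(i)                  # stays in the point set {x_p}
    return per_line, free
```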
Step 1.4 Point feature homogenization:
For the set of feature points that belong to no line feature, redundancy is prevented as follows: when the number of points exceeds a threshold np, the feature points are screened by quadtree homogenization, yielding a feature point set {xpi} from which the surplus points of over-dense regions have been removed.
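A simplified grid-splitting stand-in for the quadtree screening (applicable to both line and point homogenization) is sketched below. Keeping the point nearest each cell center is an assumption (keeping the highest-response point is an equally valid rule), and the iteration cap merely guards degenerate inputs:

```python
import numpy as np

def quadtree_filter(points, bounds, max_cells):
    """Split the most crowded cell into quadrants until max_cells is reached,
    then keep one feature per cell, thinning over-dense regions.

    points: (N, 2) array of (u, v); bounds: (x0, y0, x1, y1), with all points
    strictly inside the upper edges (e.g. (0, 0, W, H) for a WxH image).
    """
    cells = [(bounds, list(range(len(points))))]
    for _ in range(4 * max_cells):             # iteration cap guards odd inputs
        if len(cells) >= max_cells:
            break
        cells.sort(key=lambda c: len(c[1]), reverse=True)
        (x0, y0, x1, y1), idx = cells[0]
        if len(idx) <= 1:                      # nothing left worth splitting
            break
        cells.pop(0)
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        for bx0, bx1 in ((x0, xm), (xm, x1)):
            for by0, by1 in ((y0, ym), (ym, y1)):
                sub = [i for i in idx
                       if bx0 <= points[i][0] < bx1 and by0 <= points[i][1] < by1]
                if sub:
                    cells.append(((bx0, by0, bx1, by1), sub))
    kept = []
    for (x0, y0, x1, y1), idx in cells:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        kept.append(min(idx, key=lambda i: np.hypot(points[i][0] - cx,
                                                    points[i][1] - cy)))
    return kept                                 # indices of retained features
```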
Step 2: Establish key frames:
To improve loop detection efficiency, the present invention establishes key frames from the image frames; all subsequent image matching required for loop detection is performed only among key frames, reducing the computation needed for matching. The procedure includes:
Step 2.1 Determine the frame interval:
Considering normal movement speeds during positioning, the visually perceived scene does not change much within a short time, so the current key frame must be separated from the previous key frame by a certain number of image frames. Only when this interval exceeds a threshold NGAP can the current frame be considered for establishment as a key frame; the threshold NGAP varies with the frame rate of the visual imagery.
Step 2.2 Determine the number of features:
After point and line feature extraction, the number of features must be guaranteed so that the subsequent matching process has sufficient information. Only when the number of point features exceeds a threshold Pkf and the number of line features exceeds Lkf can the current frame be established as a key frame.
Step 2.3 Determine the co-visibility relationship:
A newly established key frame needs to keep a certain separation from the previous key frame to reduce information redundancy, yet also to maintain informational continuity with it, so it is tested through the co-visibility relationship. Co-visibility means that the same feature appears in both images; the ratio of features in the current frame co-visible with the previous key frame is therefore required to stay within the threshold interval [CovL, CovU].
If the current frame passes all three tests of steps 2.1-2.3 above, it is established as the new key frame KFn. Judged by these three criteria, representative key frames are obtained, and thereafter loop detection performs global descriptor extraction and matching only on key frames.
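The three tests combine into a single predicate, sketched below with illustrative placeholder thresholds (the patent leaves NGAP to the frame rate and the other values to the deployment scenario):

```python
def is_keyframe(frame_idx, last_kf_idx, n_points, n_lines, covis_ratio,
                N_GAP=10, P_KF=80, L_KF=15, COV_L=0.3, COV_U=0.8):
    """Combined keyframe test of steps 2.1-2.3; thresholds are placeholders."""
    if frame_idx - last_kf_idx <= N_GAP:       # 2.1: enough frames since last KF
        return False
    if n_points <= P_KF or n_lines <= L_KF:    # 2.2: enough point/line features
        return False
    return COV_L <= covis_ratio <= COV_U       # 2.3: co-visibility ratio in range
```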
Step 3: Extract the global descriptor:
To obtain a descriptor vector for the whole image, the present invention adopts the idea of VLAD (Vector of Locally Aggregated Descriptors), but does not restrict itself to the cluster-based description of feature points; it also realizes a cluster-style description of line features, thereby extracting a global descriptor that incorporates both point and line features.
Step 3.1 Point feature clustering:
For point features, the present invention uses K-means++ clustering, which is simple and fast: at initialization it selects the K cluster centers one by one, with sample points farther from the existing centers more likely to be chosen as the next center, after which the K clusters and their centers are iteratively updated according to the distances between feature points and cluster centers. After clustering, K feature point clusters and their corresponding cluster-center feature points are obtained.
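A sketch using scikit-learn's K-means++ initialization follows. Clustering in descriptor space, as in standard VLAD, is an assumption here; clustering by image position would proceed analogously:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_point_features(descs, K=16):
    """K-means++ clustering of the descriptors of the points in {x_p}.

    Returns per-point cluster labels and, per cluster, the index of the member
    whose descriptor lies nearest the centroid (the cluster-center feature point).
    K=16 is an illustrative choice.
    """
    km = KMeans(n_clusters=K, init="k-means++", n_init=10).fit(descs)
    centers = []
    for m in range(K):
        members = np.where(km.labels_ == m)[0]
        d = np.linalg.norm(descs[members] - km.cluster_centers_[m], axis=1)
        centers.append(int(members[d.argmin()]))   # an actual feature point
    return km.labels_, centers
```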
Step 3.2 Line feature grouping:
In real indoor scenes, line features usually appear at object edges, and several contiguous line segments (sharing an endpoint) typically form one structure (such as a wall corner or a door frame). This can be summarized briefly as: line segments in a contiguity relation approximately belong to the same structure, and the present invention groups line features on the basis of this property.
At initialization, line segments are selected one by one, ensuring that the minimum endpoint distance between each newly selected segment and the already selected segments exceeds a threshold, finally yielding L initial segments belonging to L segment groups. When L such segments cannot be found in the image, the remainder are picked at random.
Distances are then computed between the remaining segments and the segments of the established groups: when the minimum endpoint distance min{||els − eat||, s, t ∈ {1, 2}} between a remaining segment ll and any segment la of some group is less than a threshold Tc, the two are considered contiguous and ll is added to that group; when a segment satisfies this relation with two groups simultaneously, the two groups are merged. Here el1, el2 are the two endpoints of the remaining segment ll and ea1, ea2 those of the arbitrary segment la. All line features are traversed in this way; after the traversal, two situations typically arise:
If segments remain that belong to no segment group, each is added to the group containing the fewest segments;
If the number of segment groups has fallen below L, the group with the most segments is split, dividing it evenly into two groups, until L segment groups are reached.
Once the L segment groups are obtained, each group's center point is computed: the centroid of the feature points belonging to the group's segments is calculated, and the feature point closest to that centroid is taken as the group center. At this point, L line feature groups and their corresponding group-center feature points have been obtained.
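The core of the grouping, treating the minimum-endpoint-distance relation as a contiguity graph and taking its connected components, can be sketched with a union-find structure as below; the seed selection of the initial L segments and the split/merge fix-ups that enforce exactly L groups are abbreviated:

```python
import numpy as np

def group_line_segments(segs, Tc=10.0):
    """Connected components of the endpoint-contiguity relation via union-find."""
    parent = list(range(len(segs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    ends = [np.asarray(s).reshape(2, 2) for s in segs]
    for i in range(len(segs)):
        for j in range(i + 1, len(segs)):
            d = min(np.linalg.norm(p - q) for p in ends[i] for q in ends[j])
            if d < Tc:                       # contiguous endpoints: same structure
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(segs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())             # caller re-balances to exactly L groups
```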
Step 3.3 Global feature residual calculation:
Having obtained the L line feature groups, the K point feature clusters, and the corresponding (L+K) center feature points, subtracting the center feature point from all feature points of its set yields a feature distribution that expresses each of the L+K cluster ranges, and these L+K local set features constitute the global descriptor. The calculation formula is as follows:

$$V(m,j)=\sum_{x_i} a_k\left(d(x_i)(j)-d(c_m)(j)\right)$$

where d(x)(j) denotes the j-th dimension of the descriptor of feature point x.
Here, V is the global descriptor matrix and V(m, j) is the element in its m-th row and j-th column, where j indexes the j-th dimension of a feature point descriptor (256 dimensions in total) and m indexes the m-th local set, of which the first L are the L line feature groups and the last K are the K point feature clusters, giving L+K sets in total. The L line feature groups are computed first: there xi ranges over the points of the line-feature point set {xl}, d(xi) is the descriptor of feature point xi, and d(cm) is the descriptor of the center feature point cm of the m-th line feature group. ak is the membership function: ak is 1 when feature point xi belongs to the m-th set and 0 otherwise. The K point feature clusters are computed next: there xi ranges over the points of the set {xp} of points belonging to no line feature, d(xi) is the descriptor of feature point xi, and d(cm−L) is the descriptor of the center feature point cm−L of the (m−L)-th point feature cluster. The final V is an (L+K)×256 matrix in which each row vector describes one line feature group or one point feature cluster. The formula accumulates all the feature residuals of each set, finally producing (L+K) global features. These (L+K) global features express a particular distribution of the local features within each cluster range; by subtracting the center feature from each feature point, this distribution erases the image's own feature distribution differences and retains only the distribution differences between the local features and the cluster centers.
Step 3.4 L2-norm normalization:
For convenience of computation, the (L+K)×256 matrix is unrolled into one dimension, i.e., its row vectors are concatenated into a one-dimensional vector V of length (L+K)·256. This description vector is then L2-norm normalized:

$$V\leftarrow\frac{V}{\left\|V\right\|_{2}}$$

so that each vector's representation is numerically uniform when similarities are computed later, making it easier to compare the similarity between images.
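Steps 3.3 and 3.4 combine into a short aggregation routine, sketched below; the membership lists and center descriptors are assumed to come from the preceding clustering and grouping stages:

```python
import numpy as np

def build_global_descriptor(descs, assignments, center_descs):
    """Steps 3.3-3.4: residual aggregation, flattening, and L2 normalization.

    assignments[m] lists indices (into descs) of features in local set m; the
    first L sets are line groups, the last K point clusters. center_descs[m]
    is the descriptor of set m's center feature point.
    """
    n_sets, dim = len(center_descs), descs.shape[1]   # dim == 256 here
    V = np.zeros((n_sets, dim))
    for m, members in enumerate(assignments):
        for i in members:                   # a_k = 1 exactly for members of set m
            V[m] += descs[i] - center_descs[m]
    v = V.reshape(-1)                       # one-dimensional (L+K)*256 vector
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v      # L2-norm normalization
```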
Step 4: Match historical key frames:
After the global descriptor of each key frame has been extracted, the descriptor of the current key frame must be matched against the historical key frames.
Step 4.1 Similarity calculation:
The present invention uses cosine similarity for vector matching: the global descriptors Vj of all historical key frames KFj are traversed, and the similarity Sim(n, j) of each to the global descriptor Vn of the current key frame KFn is computed as:

$$\mathrm{Sim}(n,j)=\frac{V_n\cdot V_j}{\left\|V_n\right\|\left\|V_j\right\|}$$
Since the descriptor vectors have already been L2-norm normalized, ||Vn|| = ||Vj|| = 1, so Sim(n, j) = Vn · Vj. After the similarity to all historical key frames has been computed, if no key frame's similarity exceeds the threshold Simth, the place shown in the image has not appeared in the historical record; otherwise, the top three candidate key frames {KFc} with the greatest similarity above Simth are selected (if fewer than three exist, all key frames above Simth are taken), and geometric verification is performed on them in order of similarity.
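A sketch of the candidate search follows; the threshold value sim_th = 0.6 is an illustrative assumption:

```python
import numpy as np

def find_candidates(v_cur, history, sim_th=0.6, top_n=3):
    """Rank historical keyframes by cosine similarity with the current one.

    Descriptors are already L2-normalized, so Sim(n, j) is just a dot product.
    """
    sims = [(j, float(np.dot(v_cur, v_j))) for j, v_j in enumerate(history)]
    sims = [(j, s) for j, s in sims if s > sim_th]   # no hit: place is new
    sims.sort(key=lambda t: t[1], reverse=True)
    return sims[:top_n]                              # candidates {KF_c}, best first
```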
Step 4.2 Geometric verification:
The feature points and line features of the current key frame and a candidate key frame are matched; after RANSAC (RANdom SAmple Consensus) outlier rejection, the relative pose is solved, and bidirectional reprojection under that pose yields the optimized matching inliers. When the number of inliers exceeds a threshold Pth, the two frames can be considered to satisfy a real geometric relationship, i.e., to show the same place in the real world. When no candidate key frame passes the geometric verification, the frames are deemed merely similar in appearance rather than the same place.
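A sketch of the geometric verification is given below using OpenCV's essential-matrix RANSAC and pose recovery. Counting the surviving inliers stands in for the bidirectional-reprojection refinement described above, which needs the full matching pipeline; K denotes the camera intrinsic matrix:

```python
import cv2
import numpy as np

def geometric_check(pts_cur, pts_cand, K, P_th=30):
    """RANSAC + pose recovery between current and candidate keyframe matches.

    pts_cur, pts_cand: matched (N, 2) float32 pixel coordinates; K: the 3x3
    camera intrinsic matrix. Passes when enough inliers survive.
    """
    if len(pts_cur) < 8:                      # too few matches to verify
        return False
    E, mask = cv2.findEssentialMat(pts_cur, pts_cand, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    if E is None:
        return False
    n_inliers, R, t, mask = cv2.recoverPose(E, pts_cur, pts_cand, K, mask=mask)
    return n_inliers > P_th                   # geometry agrees: same real place
```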
Step 4.3 Continuity verification:
When a candidate key frame KFc passes the geometric verification, to make sure it really is the same place, the three consecutive historical frames Fc+1, Fc+2, Fc+3 following that key frame and the three consecutive frames Fn+1, Fn+2, Fn+3 following the current key frame KFn are likewise geometrically verified in turn. When the consecutive frames all pass, it is confirmed that the same place has been reached again; the matching relationship between the current key frame and the candidate key frame is then output to the positioning system, completing the loop detection.
Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410197669.8A | 2024-02-22 | 2024-02-22 | An indoor visual loop detection method based on point-line feature global descriptor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118038239A (en) | 2024-05-14 |
Family
ID=91000013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410197669.8A Pending CN118038239A (en) | 2024-02-22 | 2024-02-22 | An indoor visual loop detection method based on point-line feature global descriptor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118038239A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119068267A (en) * | 2024-11-04 | 2024-12-03 | 深圳爱莫科技有限公司 | An intelligent similar image recognition and retrieval method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |