CN118038239A - An indoor visual loop detection method based on point-line feature global descriptor
- Publication number
- CN118038239A (application CN202410197669.8A)
- Authority
- CN
- China
- Prior art keywords
- line
- feature
- point
- line segment
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/761—Proximity, similarity or dissimilarity measures
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses an indoor visual loop detection method based on a point-line feature global descriptor. Point and line features and their descriptors are extracted from each image; key frames are established according to frame interval, feature quantity, and co-visibility criteria; for each key frame, clustered point features and grouped line features are aggregated into a global descriptor; and the current key frame is matched against historical key frames by similarity calculation followed by geometric and continuity verification, completing loop detection. The method needs no visual vocabulary built a priori and supports long-term, trajectory-consistent visual positioning in structured indoor scenes.
Description
Technical Field
The present invention belongs to the technical field of visual indoor positioning, and specifically relates to an indoor visual loop detection method based on a point-line feature global descriptor, which can be used to achieve long-term positioning with low-cost, portable visual sensors in indoor scenes.
Background Art
With the rapid development of society, indoor positioning is in great demand across many areas of daily life. In the commercial field, navigation services are needed inside large buildings such as shopping malls, airports, hospitals, and large office complexes to improve the user experience; in the entertainment field, indoor positioning supports virtual and augmented reality applications, providing more realistic and interactive experiences; in the industrial field, all kinds of IoT mobile machines need to know their own exact location in order to perform tasks well. The Global Navigation Satellite System (GNSS) is currently the most traditional and widely used positioning technology, but in indoor scenes GNSS signals are blocked and attenuated by buildings, which severely degrades positioning accuracy. Moreover, its large-scale positioning accuracy cannot satisfy motion at the small scales typical of indoor environments. In addition, some traditional indoor positioning approaches, such as UWB positioning and Wi-Fi fingerprinting, estimate the current position by triangulation from the signal strengths received at different base stations, but they depend on pre-deployed base stations and therefore lack the ability to position autonomously; their applicable scenarios are few and their restrictions many, so autonomous positioning technology needs to be introduced.
Vision-based SLAM is currently the most mainstream autonomous positioning technology. The core goal of SLAM (Simultaneous Localization and Mapping) is to enable machines or pedestrians to navigate autonomously in unknown environments while simultaneously building a map. The sensors it relies on fall roughly into three categories: visual sensors, lidar, and inertial measurement units (IMUs). Compared with inertial sensors, visual sensors can perceive the surrounding environment, and compared with lidar they are cheap and portable, so they are the sensors most commonly used in autonomous positioning. Vision-based SLAM is usually divided into three stages: front-end tracking, back-end optimization, and loop detection. Because the pose calculation of visual SLAM relies in essence on recursive matching between images, errors inevitably accumulate and eventually cause the pose to diverge. Loop detection counteracts this by recognizing that the same place has been reached again, imposing a global constraint and thereby preventing pose divergence.
The most common loop detection methods are based on the bag-of-words model, which relies on training over a large number of scenes to obtain a huge visual vocabulary; each image builds its own word vector from this vocabulary and is matched accordingly. This approach relies on feature points and is simple to implement, but the huge vocabulary is slow to load and does not suit every scene. Meanwhile, indoor scenes are often structured and low-textured, meaning that point feature information in the images decreases while line feature information increases. Given these characteristics, for indoor positioning scenarios there is an urgent need for a global description and matching method that is based on structured point and line features, does not depend on a vocabulary built a priori, and matches using only the information shared between images, so as to realize loop detection in the visual SLAM pipeline and satisfy the requirement of globally consistent trajectories for autonomous visual positioning in indoor scenes.
Summary of the Invention
To solve the problem that existing techniques suffer visual positioning divergence in structured indoor scenes where line features are abundant but point features are scarce, the present invention provides an indoor visual loop detection method based on a point-line feature global descriptor. Built on the structured point and line feature information of image appearance and oriented to the positioning needs of indoor scenes, it can deliver indoor loop detection that keeps long-term visual positioning trajectories consistent.
To achieve the above object, the present invention adopts the following technical scheme:
An indoor visual loop detection method based on a point-line feature global descriptor comprises the following steps:
Step 1: Use the convolutional-neural-network-based SuperPoint model to extract the image's feature points and their descriptors, use a line segment detection method to detect line features, and use a scheme that represents line features by point features to extract the line feature descriptors, thus completing the extraction of the image's point features, line features, and descriptors;
Step 2: Test each image frame against three criteria, namely frame interval, feature quantity, and co-visibility relationship, so as to establish the key frames;
Step 3: For each key frame, cluster the point features and group the line features according to their intersection relationships, then compute the global feature residuals from the clustering results, the grouping results, and the descriptors of the local feature points belonging to each set, thereby obtaining the key frame's global descriptor;
Step 4: Compute the similarity between the global descriptor of the current key frame and those of the historical key frames to obtain candidate matching key frames; perform geometric verification and continuity verification on the candidates to obtain matching key frames confirmed to be the same place, completing loop detection.
Beneficial effects:
The method of the present invention is based on image matching technology. Exploiting the structured character of indoor environments, it uses only the appearance information of the images themselves and, through point feature clustering and line feature grouping, realizes a local-to-global image description. It can detect and recognize a revisited place without any visual bag-of-words built a priori, is widely applicable to structured indoor environments, and thereby keeps the long-term trajectory of visual positioning consistent.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow chart of the indoor visual loop detection method based on a point-line feature global descriptor according to the present invention.
Detailed Description of the Embodiments
To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with one another as long as they do not conflict.
As shown in FIG. 1, the indoor visual loop detection method based on a point-line feature global descriptor of the present invention comprises the following steps:
Step 1: Extract image point and line features and descriptors;
Step 2: Establish key frames;
Step 3: Generate the global descriptor;
Step 4: Match historical key frames.
Specifically, this embodiment of the present invention takes an indoor scene as an example to describe the concrete steps of the visual loop detection method based on a point-line feature global descriptor.
Step 1: Extract image point and line features and descriptors:
In SLAM positioning and image matching, operating directly on pixels is computationally too expensive, so features are usually extracted first to achieve a dimensionality-reduction effect.
Step 1.1 Feature point extraction and description:
To improve feature point extraction, the present invention uses the convolutional-neural-network-based SuperPoint model for automatic feature point extraction. It adopts an encoder-decoder architecture: the encoder performs image dimensionality reduction and feature extraction through multiple convolutional layers and max-pooling layers, while a decoder decodes the feature information compressed by the encoder into the required type of information. The SuperPoint model contains two decoders: one obtains the feature point positions and the other the feature point descriptors. Thus, after passing through the SuperPoint model, a set of feature points {xi} is obtained from the image, where each feature point xi consists of its position (ui, vi) in the image pixel coordinate system and a 256×1-dimensional descriptor d(xi).
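For illustration, a minimal sketch of this extraction stage follows. The model interface assumed here (a callable returning a keypoint heatmap and a dense 256-channel descriptor map) mirrors the two-decoder layout described above, but the exact signature, the confidence threshold of 0.015, and all names are assumptions rather than the patent's implementation:

```python
import numpy as np
import torch
import torch.nn.functional as F

def extract_superpoint_features(gray, model, conf_thresh=0.015):
    """Keypoints (u_i, v_i) and 256-d descriptors from a SuperPoint-style model."""
    h, w = gray.shape
    img = torch.from_numpy(gray.astype(np.float32) / 255.0)[None, None]  # 1x1xHxW
    with torch.no_grad():
        heatmap, desc_map = model(img)         # assumed: 1xHxW scores, 1x256xHcxWc map
    scores = heatmap[0]
    ys, xs = torch.nonzero(scores > conf_thresh, as_tuple=True)
    # Bilinearly sample the coarse descriptor map at each keypoint location.
    grid = torch.stack([xs.float() / (w - 1), ys.float() / (h - 1)], dim=1)
    grid = (grid * 2 - 1).view(1, 1, -1, 2)    # grid_sample expects [-1, 1] coords
    desc = F.grid_sample(desc_map, grid, align_corners=True)[0, :, 0].T   # Nx256
    desc = F.normalize(desc, dim=1)            # unit-length descriptors
    kpts = torch.stack([xs, ys], dim=1)        # each row is (u_i, v_i)
    return kpts.numpy(), desc.numpy()
```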
Step 1.2 Line feature extraction:
Given the structured character of indoor environments, where much of the appearance information exists in the form of line segments, line feature detection is additionally performed on the image. LSD (Line Segment Detector) is a line segment detection algorithm based on image gradients that achieves sub-pixel accuracy. Its principle is that, for a line segment to be detected, the differing pixel gray values in the regions on its two sides necessarily produce gradients; pixels are therefore merged and separated according to gradient magnitude and direction to form line regions, yielding a set of line features {li}, where each line feature li consists of its two endpoints ei1, ei2 and its slope ki.
Since the line features extracted by LSD are often over-fragmented, the extracted line features must be merged. The merging criterion is judged from the minimum distance between segment endpoints and the segment slopes: when the minimum endpoint distance min{||eis − ejt||, s, t ∈ {1, 2}} between two line features (segments) li and lj is less than a threshold Td, and the slope difference |ki − kj| is less than a threshold Tk, the two line features are considered consistent in slope and contiguous at their endpoints, and they are merged into a single line feature ln. The endpoints en1, en2 of the new line feature ln are the two mutually farthest endpoints of li and lj, and the slope kn is recomputed from the new endpoints. Here {ej1, ej2} are the two endpoints of line feature lj and kj is its slope.
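A sketch of this merge rule is given below, using OpenCV's LSD detector (cv2.createLineSegmentDetector, available in recent OpenCV builds). The threshold values Td and Tk and the greedy restart-on-merge strategy are illustrative assumptions; the patent fixes only the criterion itself:

```python
import cv2
import numpy as np

def slope(seg):
    x1, y1, x2, y2 = seg
    return (y2 - y1) / (x2 - x1 + 1e-9)    # guarded against vertical segments

def min_endpoint_dist(a, b):
    pa, pb = a.reshape(2, 2), b.reshape(2, 2)
    return min(np.linalg.norm(p - q) for p in pa for q in pb)

def merge_pair(a, b):
    pts = np.vstack([a.reshape(2, 2), b.reshape(2, 2)])
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    i, j = np.unravel_index(d.argmax(), d.shape)
    return np.hstack([pts[i], pts[j]])      # two farthest endpoints define l_n

def detect_and_merge_lines(gray, Td=5.0, Tk=0.05):
    lsd = cv2.createLineSegmentDetector()
    raw = lsd.detect(gray)[0]               # (N, 1, 4) arrays of x1, y1, x2, y2
    segs = [s[0] for s in raw] if raw is not None else []
    merged = []
    while segs:
        cur = segs.pop(0)
        i = 0
        while i < len(segs):
            if (min_endpoint_dist(cur, segs[i]) < Td
                    and abs(slope(cur) - slope(segs[i])) < Tk):
                cur = merge_pair(cur, segs.pop(i))  # k_n recomputed from new ends
                i = 0                               # merged segment may chain on
            else:
                i += 1
        merged.append(cur)
    return merged
```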
To avoid line feature redundancy, when the number of line features exceeds a threshold nl, the line features are screened by quadtree homogenization, yielding a line feature set {li} from which the surplus line features of over-dense regions have been removed.
Step 1.3 Line feature description:
To save computing resources, the present invention represents a line feature's description by feature point descriptions, deciding from the distance between a feature point and a line feature whether the point is assigned to that line feature.
First, from its endpoints and slope, the line supporting line feature lj is expressed as the parameterized line Aj·x + Bj·y + Cj = 0, where x, y are image pixel coordinates and Aj, Bj, Cj are the parameters describing that line. The distance dij from a feature point xi to line feature lj is then:

$$d_{ij}=\frac{\left|A_j u_i + B_j v_i + C_j\right|}{\sqrt{A_j^{2}+B_j^{2}}}$$
where (ui, vi) is the position of feature point xi in the image pixel coordinate system. When the distance dij is smaller than a threshold, the feature point is considered to belong to that line feature. After the point-to-line distances are computed, the set of feature points belonging to each line feature lj is obtained (their union over all line features is denoted {xl}), together with the set {xp} of feature points that belong to no line feature. The description of line feature lj is then given by the descriptors of all feature points assigned to it.
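The following sketch illustrates this assignment. Assigning each point to its single nearest line within the threshold is a design choice made here for simplicity; the patent requires only the distance test:

```python
import numpy as np

def assign_points_to_lines(points, segments, dist_thresh=3.0):
    """Assign feature points to line features by point-to-line distance.

    points: (N, 2) pixel positions (u_i, v_i); segments: list of [x1, y1, x2, y2].
    Returns per-line lists of point indices and the leftover index set {x_p}.
    """
    params = []
    for x1, y1, x2, y2 in segments:
        A, B = y2 - y1, x1 - x2             # normal of the supporting line
        C = -(A * x1 + B * y1)              # so that A*x + B*y + C = 0 holds
        params.append((A, B, C))
    per_line = [[] for _ in segments]
    free = []
    for i, (u, v) in enumerate(points):
        dists = [abs(A * u + B * v + C) / (np.hypot(A, B) + 1e-12)
                 for A, B, C in params]
        j = int(np.argmin(dists)) if dists else -1
        if j >= 0 and dists[j] < dist_thresh:
            per_line[j].append(i)           # point i describes line feature j
        else:
            free.append(i)                  # stays in the point set {x_p}
    return per_line, free
```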
Step 1.4 Point feature homogenization:
For the set of feature points that belong to no line feature, redundancy is prevented as follows: when the number of points exceeds a threshold np, the feature points are screened by quadtree homogenization, yielding a feature point set {xpi} from which the surplus points of over-dense regions have been removed.
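A simplified grid-splitting stand-in for the quadtree screening (applicable to both line and point homogenization) is sketched below. Keeping the point nearest each cell center is an assumption (keeping the highest-response point is an equally valid rule), and the iteration cap merely guards degenerate inputs:

```python
import numpy as np

def quadtree_filter(points, bounds, max_cells):
    """Split the most crowded cell into quadrants until max_cells is reached,
    then keep one feature per cell, thinning over-dense regions.

    points: (N, 2) array of (u, v); bounds: (x0, y0, x1, y1), with all points
    strictly inside the upper edges (e.g. (0, 0, W, H) for a WxH image).
    """
    cells = [(bounds, list(range(len(points))))]
    for _ in range(4 * max_cells):             # iteration cap guards odd inputs
        if len(cells) >= max_cells:
            break
        cells.sort(key=lambda c: len(c[1]), reverse=True)
        (x0, y0, x1, y1), idx = cells[0]
        if len(idx) <= 1:                      # nothing left worth splitting
            break
        cells.pop(0)
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        for bx0, bx1 in ((x0, xm), (xm, x1)):
            for by0, by1 in ((y0, ym), (ym, y1)):
                sub = [i for i in idx
                       if bx0 <= points[i][0] < bx1 and by0 <= points[i][1] < by1]
                if sub:
                    cells.append(((bx0, by0, bx1, by1), sub))
    kept = []
    for (x0, y0, x1, y1), idx in cells:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        kept.append(min(idx, key=lambda i: np.hypot(points[i][0] - cx,
                                                    points[i][1] - cy)))
    return kept                                 # indices of retained features
```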
Step 2: Establish key frames:
To improve loop detection efficiency, the present invention establishes key frames from the image frames; all subsequent image matching required for loop detection is performed only among key frames, reducing the computation needed for matching. The procedure includes:
Step 2.1 Determine the frame interval:
Considering normal movement speeds during positioning, the visually perceived scene does not change much within a short time, so the current key frame must be separated from the previous key frame by a certain number of image frames. Only when this interval exceeds a threshold NGAP can the current frame be considered for establishment as a key frame; the threshold NGAP varies with the frame rate of the visual imagery.
Step 2.2 Determine the number of features:
After point and line feature extraction, the number of features must be guaranteed so that the subsequent matching process has sufficient information. Only when the number of point features exceeds a threshold Pkf and the number of line features exceeds Lkf can the current frame be established as a key frame.
Step 2.3 Determine the co-visibility relationship:
A newly established key frame needs to keep a certain separation from the previous key frame to reduce information redundancy, yet also to maintain informational continuity with it, so it is tested through the co-visibility relationship. Co-visibility means that the same feature appears in both images; the ratio of features in the current frame co-visible with the previous key frame is therefore required to stay within the threshold interval [CovL, CovU].
If the current frame passes all three tests of steps 2.1-2.3 above, it is established as the new key frame KFn. Judged by these three criteria, representative key frames are obtained, and thereafter loop detection performs global descriptor extraction and matching only on key frames.
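The three tests combine into a single predicate, sketched below with illustrative placeholder thresholds (the patent leaves NGAP to the frame rate and the other values to the deployment scenario):

```python
def is_keyframe(frame_idx, last_kf_idx, n_points, n_lines, covis_ratio,
                N_GAP=10, P_KF=80, L_KF=15, COV_L=0.3, COV_U=0.8):
    """Combined keyframe test of steps 2.1-2.3; thresholds are placeholders."""
    if frame_idx - last_kf_idx <= N_GAP:       # 2.1: enough frames since last KF
        return False
    if n_points <= P_KF or n_lines <= L_KF:    # 2.2: enough point/line features
        return False
    return COV_L <= covis_ratio <= COV_U       # 2.3: co-visibility ratio in range
```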
Step 3: Extract the global descriptor:
To obtain a descriptor vector for the whole image, the present invention adopts the idea of VLAD (Vector of Locally Aggregated Descriptors), but does not restrict itself to the cluster-based description of feature points; it also realizes a cluster-style description of line features, thereby extracting a global descriptor that incorporates both point and line features.
Step 3.1 Point feature clustering:
For point features, the present invention uses K-means++ clustering, which is simple and fast: at initialization it selects the K cluster centers one by one, with sample points farther from the existing centers more likely to be chosen as the next center, after which the K clusters and their centers are iteratively updated according to the distances between feature points and cluster centers. After clustering, K feature point clusters and their corresponding cluster-center feature points are obtained.
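A sketch using scikit-learn's K-means++ initialization follows. Clustering in descriptor space, as in standard VLAD, is an assumption here; clustering by image position would proceed analogously:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_point_features(descs, K=16):
    """K-means++ clustering of the descriptors of the points in {x_p}.

    Returns per-point cluster labels and, per cluster, the index of the member
    whose descriptor lies nearest the centroid (the cluster-center feature point).
    K=16 is an illustrative choice.
    """
    km = KMeans(n_clusters=K, init="k-means++", n_init=10).fit(descs)
    centers = []
    for m in range(K):
        members = np.where(km.labels_ == m)[0]
        d = np.linalg.norm(descs[members] - km.cluster_centers_[m], axis=1)
        centers.append(int(members[d.argmin()]))   # an actual feature point
    return km.labels_, centers
```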
Step 3.2 Line feature grouping:
In real indoor scenes, line features usually appear at object edges, and several contiguous line segments (sharing an endpoint) typically form one structure (such as a wall corner or a door frame). This can be summarized briefly as: line segments in a contiguity relation approximately belong to the same structure, and the present invention groups line features on the basis of this property.
At initialization, line segments are selected one by one, ensuring that the minimum endpoint distance between each newly selected segment and the already selected segments exceeds a threshold, finally yielding L initial segments belonging to L segment groups. When L such segments cannot be found in the image, the remainder are picked at random.
Distances are then computed between the remaining segments and the segments of the established groups: when the minimum endpoint distance min{||els − eat||, s, t ∈ {1, 2}} between a remaining segment ll and any segment la of some group is less than a threshold Tc, the two are considered contiguous and ll is added to that group; when a segment satisfies this relation with two groups simultaneously, the two groups are merged. Here el1, el2 are the two endpoints of the remaining segment ll and ea1, ea2 those of the arbitrary segment la. All line features are traversed in this way; after the traversal, two situations typically arise:
If segments remain that belong to no segment group, each is added to the group containing the fewest segments;
If the number of segment groups has fallen below L, the group with the most segments is split, dividing it evenly into two groups, until L segment groups are reached.
Once the L segment groups are obtained, each group's center point is computed: the centroid of the feature points belonging to the group's segments is calculated, and the feature point closest to that centroid is taken as the group center. At this point, L line feature groups and their corresponding group-center feature points have been obtained.
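The core of the grouping, treating the minimum-endpoint-distance relation as a contiguity graph and taking its connected components, can be sketched with a union-find structure as below; the seed selection of the initial L segments and the split/merge fix-ups that enforce exactly L groups are abbreviated:

```python
import numpy as np

def group_line_segments(segs, Tc=10.0):
    """Connected components of the endpoint-contiguity relation via union-find."""
    parent = list(range(len(segs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    ends = [np.asarray(s).reshape(2, 2) for s in segs]
    for i in range(len(segs)):
        for j in range(i + 1, len(segs)):
            d = min(np.linalg.norm(p - q) for p in ends[i] for q in ends[j])
            if d < Tc:                       # contiguous endpoints: same structure
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(segs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())             # caller re-balances to exactly L groups
```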
Step 3.3 Global feature residual calculation:
Having obtained the L line feature groups, the K point feature clusters, and the corresponding (L+K) center feature points, subtracting the center feature point from all feature points of its set yields a feature distribution that expresses each of the L+K cluster ranges, and these L+K local set features constitute the global descriptor. The calculation formula is as follows:

$$V(m,j)=\sum_{x_i} a_k\left(d(x_i)(j)-d(c_m)(j)\right)$$

where d(x)(j) denotes the j-th dimension of the descriptor of feature point x.
Here, V is the global descriptor matrix and V(m, j) is the element in its m-th row and j-th column, where j indexes the j-th dimension of a feature point descriptor (256 dimensions in total) and m indexes the m-th local set, of which the first L are the L line feature groups and the last K are the K point feature clusters, giving L+K sets in total. The L line feature groups are computed first: there xi ranges over the points of the line-feature point set {xl}, d(xi) is the descriptor of feature point xi, and d(cm) is the descriptor of the center feature point cm of the m-th line feature group. ak is the membership function: ak is 1 when feature point xi belongs to the m-th set and 0 otherwise. The K point feature clusters are computed next: there xi ranges over the points of the set {xp} of points belonging to no line feature, d(xi) is the descriptor of feature point xi, and d(cm−L) is the descriptor of the center feature point cm−L of the (m−L)-th point feature cluster. The final V is an (L+K)×256 matrix in which each row vector describes one line feature group or one point feature cluster. The formula accumulates all the feature residuals of each set, finally producing (L+K) global features. These (L+K) global features express a particular distribution of the local features within each cluster range; by subtracting the center feature from each feature point, this distribution erases the image's own feature distribution differences and retains only the distribution differences between the local features and the cluster centers.
Step 3.4 L2-norm normalization:
For convenience of computation, the (L+K)×256 matrix is unrolled into one dimension, i.e., its row vectors are concatenated into a one-dimensional vector V of length (L+K)·256. This description vector is then L2-norm normalized:

$$V\leftarrow\frac{V}{\left\|V\right\|_{2}}$$

so that each vector's representation is numerically uniform when similarities are computed later, making it easier to compare the similarity between images.
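Steps 3.3 and 3.4 combine into a short aggregation routine, sketched below; the membership lists and center descriptors are assumed to come from the preceding clustering and grouping stages:

```python
import numpy as np

def build_global_descriptor(descs, assignments, center_descs):
    """Steps 3.3-3.4: residual aggregation, flattening, and L2 normalization.

    assignments[m] lists indices (into descs) of features in local set m; the
    first L sets are line groups, the last K point clusters. center_descs[m]
    is the descriptor of set m's center feature point.
    """
    n_sets, dim = len(center_descs), descs.shape[1]   # dim == 256 here
    V = np.zeros((n_sets, dim))
    for m, members in enumerate(assignments):
        for i in members:                   # a_k = 1 exactly for members of set m
            V[m] += descs[i] - center_descs[m]
    v = V.reshape(-1)                       # one-dimensional (L+K)*256 vector
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v      # L2-norm normalization
```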
Step 4: Match historical key frames:
After the global descriptor of each key frame has been extracted, the descriptor of the current key frame must be matched against the historical key frames.
Step 4.1 Similarity calculation:
The present invention uses cosine similarity for vector matching: the global descriptors Vj of all historical key frames KFj are traversed, and the similarity Sim(n, j) of each to the global descriptor Vn of the current key frame KFn is computed as:

$$\mathrm{Sim}(n,j)=\frac{V_n\cdot V_j}{\left\|V_n\right\|\left\|V_j\right\|}$$
Since the descriptor vectors have already been L2-norm normalized, ||Vn|| = ||Vj|| = 1, so Sim(n, j) = Vn · Vj. After the similarity to all historical key frames has been computed, if no key frame's similarity exceeds the threshold Simth, the place shown in the image has not appeared in the historical record; otherwise, the top three candidate key frames {KFc} with the greatest similarity above Simth are selected (if fewer than three exist, all key frames above Simth are taken), and geometric verification is performed on them in order of similarity.
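A sketch of the candidate search follows; the threshold value sim_th = 0.6 is an illustrative assumption:

```python
import numpy as np

def find_candidates(v_cur, history, sim_th=0.6, top_n=3):
    """Rank historical keyframes by cosine similarity with the current one.

    Descriptors are already L2-normalized, so Sim(n, j) is just a dot product.
    """
    sims = [(j, float(np.dot(v_cur, v_j))) for j, v_j in enumerate(history)]
    sims = [(j, s) for j, s in sims if s > sim_th]   # no hit: place is new
    sims.sort(key=lambda t: t[1], reverse=True)
    return sims[:top_n]                              # candidates {KF_c}, best first
```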
Step 4.2 Geometric verification:
The feature points and line features of the current key frame and a candidate key frame are matched; after RANSAC (RANdom SAmple Consensus) outlier rejection, the relative pose is solved, and bidirectional reprojection under that pose yields the optimized matching inliers. When the number of inliers exceeds a threshold Pth, the two frames can be considered to satisfy a real geometric relationship, i.e., to show the same place in the real world. When no candidate key frame passes the geometric verification, the frames are deemed merely similar in appearance rather than the same place.
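A sketch of the geometric verification is given below using OpenCV's essential-matrix RANSAC and pose recovery. Counting the surviving inliers stands in for the bidirectional-reprojection refinement described above, which needs the full matching pipeline; K denotes the camera intrinsic matrix:

```python
import cv2
import numpy as np

def geometric_check(pts_cur, pts_cand, K, P_th=30):
    """RANSAC + pose recovery between current and candidate keyframe matches.

    pts_cur, pts_cand: matched (N, 2) float32 pixel coordinates; K: the 3x3
    camera intrinsic matrix. Passes when enough inliers survive.
    """
    if len(pts_cur) < 8:                      # too few matches to verify
        return False
    E, mask = cv2.findEssentialMat(pts_cur, pts_cand, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    if E is None:
        return False
    n_inliers, R, t, mask = cv2.recoverPose(E, pts_cur, pts_cand, K, mask=mask)
    return n_inliers > P_th                   # geometry agrees: same real place
```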
Step 4.3 Continuity verification:
When a candidate key frame KFc passes the geometric verification, to make sure it really is the same place, the three consecutive historical frames Fc+1, Fc+2, Fc+3 following that key frame and the three consecutive frames Fn+1, Fn+2, Fn+3 following the current key frame KFn are likewise geometrically verified in turn. When the consecutive frames all pass, it is confirmed that the same place has been reached again; the matching relationship between the current key frame and the candidate key frame is then output to the positioning system, completing the loop detection.
Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410197669.8A | 2024-02-22 | 2024-02-22 | An indoor visual loop detection method based on point-line feature global descriptor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118038239A (en) | 2024-05-14 |
Family
ID=91000013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410197669.8A Pending CN118038239A (en) | 2024-02-22 | 2024-02-22 | An indoor visual loop detection method based on point-line feature global descriptor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118038239A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119068267A (en) * | 2024-11-04 | 2024-12-03 | 深圳爱莫科技有限公司 | An intelligent similar image recognition and retrieval method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |