
CN116468786B - Semantic SLAM method based on point-line combination and oriented to dynamic environment - Google Patents

Semantic SLAM method based on point-line combination and oriented to dynamic environment

Info

Publication number
CN116468786B
CN116468786B (application CN202211619407.3A)
Authority
CN
China
Prior art keywords
point
matching
line
feature
points
Prior art date
Legal status
Active
Application number
CN202211619407.3A
Other languages
Chinese (zh)
Other versions
CN116468786A (en)
Inventor
杨健
董军宇
范浩
饶源
时正午
杨凯
李丛
刘伊美
Current Assignee
Ocean University of China
Original Assignee
Ocean University of China
Priority date
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202211619407.3A
Publication of CN116468786A
Application granted
Publication of CN116468786B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/54 Extraction of image or video features relating to texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/757 Matching configurations of points or features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a dynamic-environment-oriented semantic SLAM method based on point-line combination, which improves on ORB-SLAM3. The method extracts point and line features and uses them for accurate and robust matching and relocalization in scenes that lack texture or undergo illumination changes, so as to estimate the camera pose. Localization and relocalization errors are reduced, and the algorithm addresses the problems of failed feature-point detection and difficult localization in weak-texture regions and scenes with illumination changes.

Description

Semantic SLAM method based on point-line combination and oriented to dynamic environment
Technical Field
The invention relates to the field of computer vision, in particular to a semantic SLAM method based on point-line combination and oriented to a dynamic environment.
Background
Simultaneous Localization and Mapping (SLAM) refers to a robot, operating in an unknown environment, collecting information about its surroundings with onboard sensors, estimating its own position by algorithm, and building a map of the surrounding environment. Visual SLAM mainly uses cameras, including monocular, binocular and RGB-D cameras, to acquire data. Camera sensors are cost-effective, small and low-power, and can capture rich environmental information, which has made visual SLAM a popular research field in recent years.
Conventional visual SLAM algorithms achieve good feature matching in static scenes, but mismatches occur in dynamic scenes and introduce large errors into the localization and mapping of the SLAM system. To address the reduced localization accuracy and robustness of a SLAM system when moving objects are present in the application scene, a semantic SLAM method and system based on feature points and feature lines is proposed.
Existing semantic SLAM techniques mainly target scenes containing dynamic objects. They typically either delete all pixels belonging to a priori dynamic objects and use the remaining pixels for feature extraction and subsequent localization, or delete all dynamic feature points and use only static feature points for matching and back-end processing. Such methods can improve camera localization accuracy in texture-rich dynamic scenes, but in dynamic scenes with low texture and strong illumination, relying only on feature points and semantic information makes it difficult to obtain enough data; this easily causes the SLAM system to lose tracking and reduces localization accuracy.
Vision-based SLAM research has made great progress, for example ORB-SLAM2 (Oriented FAST and Rotated BRIEF SLAM) and LSD-SLAM (Large-Scale Direct monocular SLAM). However, these algorithms generally rest on a strong assumption: a static working environment with many features and no obvious illumination changes, which strictly limits the application environment. This assumption restricts the applicability of visual SLAM systems in real scenes. When the environment is a dynamic, weak-texture region with illumination changes, feature points are scene-sensitive and hard to detect; the accuracy and robustness of camera pose estimation decrease, vision-based localization becomes erroneous, and the three-dimensional reconstruction results deviate significantly.
The camera is typically in motion while a mobile robot localizes and maps with it, which makes classical motion-segmentation methods such as background subtraction unusable in visual SLAM. Early SLAM systems mostly relied on data-optimization methods to reduce the effect of dynamic objects. A Random Sample Consensus (RANSAC) algorithm is used to roughly estimate the fundamental matrix between two frames; semantic information is combined with motion-consistency detection results to build a two-stage semantic knowledge base, and all feature points inside dynamic contours are removed as noise or outliers. Inter-frame feature-point matches on dynamic objects are also rejected with RANSAC, reducing the influence of dynamic objects on the SLAM system to some extent. These methods all implicitly assume that objects in the image are mostly static, and they fail once the data produced by dynamic objects exceeds a certain threshold.
In the prior art, research on visual localization and robot navigation in feature-rich scenes such as cities and indoor environments has made some progress, but much of it remains insufficient. For low-texture scenes with geometric features and illumination changes, visual localization still has the following problems:
(1) In feature detection, existing methods are affected by problems such as occlusion and missing parts of objects, and complete geometric features are difficult to detect from the image, making it hard to compute the camera pose;
(2) Existing methods are affected by the scarcity of texture and feature points in low-texture images, so image features are difficult to extract or are matched incorrectly, SLAM tracking and relocalization fail, and camera pose estimation degrades;
(3) In regions with obvious illumination changes, feature-point detection is sensitive, and problems such as undetectable or unmatched feature points easily occur, resulting in inaccurate camera poses.
To address this, the method combines Mask R-CNN with multi-view geometry to achieve instance segmentation and rejection of dynamic targets, identifies dynamic feature points, removes the interference of dynamic targets in feature matching, and eliminates their influence on the SLAM system.
Disclosure of Invention
The invention improves on ORB-SLAM3 and provides a semantic SLAM method based on point and line features. Compared with point features, lines provide more geometric structure information about the environment, and jointly optimizing the camera pose with points and lines improves localization accuracy and robustness. The method extracts point and line features and uses them for accurate and robust matching and relocalization in scenes lacking texture or undergoing illumination changes, so as to estimate the camera pose; localization and relocalization errors are reduced, and the algorithm solves the problems of failed feature-point detection and difficult localization in weak-texture regions and illumination-change scenes.
The invention is realized by the following technical scheme: a dynamic-environment-oriented semantic SLAM method based on point-line combination, comprising the following steps:
Step S1: acquire the image stream of the scene and feed it frame by frame into a CNN network; segment objects with a priori dynamic properties pixel by pixel, separate the dynamic objects from the scene to obtain key frame images, and use information from the previous several frames to complete the static scene occluded by dynamic targets;
Step S2: extract feature points and feature lines from the key frame images obtained in step S1, and build a local map around the current frame image, including key frame images sharing a common viewpoint with the current frame and the adjacent frames of those key frames; search these frames for feature points and line segments matching the current frame; then perform a dynamic-consistency check on the a priori dynamic objects, remove feature points and feature lines on dynamic objects, keep those on static objects, and match with the remaining static feature points and static lines;
Step S3: match the feature points and feature lines from step S2 while filtering out incorrectly matched points and lines, obtain correct matching point pairs and line pairs, and use the matching point pairs to obtain the initial camera pose;
Step S4: compute the camera pose of the current frame from the matching point pairs and line pairs obtained in step S3, and obtain an accurate camera pose estimate by minimizing the reprojection error of the point pairs and line pairs;
Step S5: build a local map of the scene from key frame images, perform instance segmentation on every frame, merge the feature points and feature lines within each instance into the corresponding instance, locate the camera pose with the feature points and feature lines, and compute point clouds of objects and the scene to obtain a sparse point cloud map;
Step S6: perform pose optimization with loop-closure detection, correct drift errors, and obtain a more accurate camera pose estimate.
As a preferred scheme, step S1 extracts feature points and feature lines from the static region of the key frame image, specifically as follows: ORB feature points are used to extract features of the static image region, and ORB descriptors are computed at the same time to obtain the feature points and descriptors of the static region; line features are then extracted from the image with dynamic objects removed, using a Transformer network structure that fuses feature information at different scales through a series of up-sampling and down-sampling operations to obtain the line features of the static image region.
Further, the extracted line features use a horizontal distance d_x and a vertical distance d_y to generate a vector v = (d_x, d_y) that predicts the positions of the two endpoints of a single line segment, giving the line feature, where (x_1, y_1) and (x_2, y_2) denote the coordinates of the left and right endpoints of the segment, (x_m, y_m) is the midpoint coordinate of the segment, and v represents the vector relating the right endpoint coordinates (x_2, y_2) to the midpoint coordinates (x_m, y_m). In this method d_x and d_y are expressed as: d_x = x_2 − x_m, d_y = y_2 − y_m.
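For illustration only, the following minimal sketch (Python/NumPy, with the symbol names introduced above) shows how the two endpoints of a segment are recovered from the predicted midpoint and the displacement vector v = (d_x, d_y); it is a sketch of the parameterization, not of the line-detection network itself:

```python
import numpy as np

def endpoints_from_midpoint(x_m, y_m, d_x, d_y):
    """Recover the two endpoints of a line segment from its midpoint (x_m, y_m)
    and the displacement vector v = (d_x, d_y) from the midpoint to the right
    endpoint; the segment is symmetric about its midpoint."""
    right = np.array([x_m + d_x, y_m + d_y])   # (x_2, y_2)
    left = np.array([x_m - d_x, y_m - d_y])    # (x_1, y_1)
    return left, right

# Example: midpoint (100, 50), displacement (30, -10)
p1, p2 = endpoints_from_midpoint(100.0, 50.0, 30.0, -10.0)
print(p1, p2)   # [ 70.  60.] [130.  40.]
```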
As a preferred solution, the matching of feature points and feature lines in step S3 specifically includes the following steps. Feature-point matching: ORB descriptors are generated, and a fast nearest-neighbour search finds, in the current frame, the feature point with the closest descriptor distance as the matching point; mismatched pairs are then rejected. When the matching descriptor distance is greater than a threshold γ, or the ratio of the best matching distance to the second-best matching distance is smaller than 1, meaning the second match is comparable to the first, the matching pair is considered prone to mismatching and is discarded. Feature-line matching: 2D-2D matching line pairs are obtained through geometric constraints, mapped directly into 3D space after outlier rejection, and accurate 2D-3D line matching pairs are then obtained by minimizing the reprojection error.
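For illustration only, the following sketch shows this kind of descriptor matching and rejection with OpenCV's ORB detector and a brute-force Hamming matcher; the concrete values of gamma and the ratio threshold are assumptions made for the sketch, not values fixed by the method:

```python
import cv2

def match_orb(img_prev, img_cur, gamma=64, ratio=0.8):
    """Match ORB descriptors between two frames and reject ambiguous pairs:
    a match is dropped when its descriptor distance exceeds gamma, or when
    its distance is too close to that of the second-best candidate."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_cur, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(des1, des2, k=2)    # two nearest neighbours per query

    good = []
    for pair in knn:
        if len(pair) < 2:
            continue
        best, second = pair
        if best.distance > gamma:                     # absolute distance gate
            continue
        if best.distance > ratio * second.distance:   # ambiguous: best ~ second best
            continue
        good.append(best)
    return kp1, kp2, good
```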
As a preferred solution, the optimization of the camera pose in step S4 by minimizing the reprojection errors of the point pairs and line pairs is implemented as follows:
The pose is jointly optimized with points and lines, and the minimized reprojection error is defined as
E(T) = λ_p Σ_{j=1}^{M} ‖ p_j − f_p(T, P_j) ‖² + λ_l Σ_{i=1}^{N} e_l( l_i, f_l(T, L_i) )²
where N denotes the number of 2D-3D matching line pairs (and M the number of matching point pairs), the function f_l(T, L_i) is the projection of the 3D line L_i onto the 2D plane under camera pose T, the angle error e_l is defined by the two planes π_1 and π_2, the function f_p(T, P_j) is the projection of the 3D point P_j onto the 2D plane, p_j and l_i are the observed 2D point and 2D line, and λ_p and λ_l are given weight values. The camera pose is optimized by minimizing this reprojection error.
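For illustration only, a minimal numeric sketch of such a joint point-line cost is given below (Python/NumPy). The pinhole projection model, the use of the angle between the observed and projected line directions in the image plane as the line error, and the weights lam_p and lam_l are simplifying assumptions made for this sketch; the description above defines the angular error through the two planes π_1 and π_2, and the actual optimization would be carried out with a nonlinear least-squares solver rather than by direct evaluation:

```python
import numpy as np

def project(K, T, P):
    """Project a 3D point P (world frame) into the image with pose T = (R, t)
    and intrinsic matrix K."""
    R, t = T
    p = K @ (R @ np.asarray(P) + t)
    return p[:2] / p[2]

def point_line_cost(T, K, pts_3d, pts_2d, lines_3d, lines_2d,
                    lam_p=1.0, lam_l=1.0):
    """Joint reprojection error: squared pixel error for each point pair plus
    a squared angular error between each observed 2D line and the projection
    of its matched 3D line (both lines given by their two endpoints)."""
    err = 0.0
    for P, p in zip(pts_3d, pts_2d):                    # point term
        err += lam_p * float(np.sum((np.asarray(p) - project(K, T, P)) ** 2))
    for (A, B), (a, b) in zip(lines_3d, lines_2d):      # line term
        a_proj, b_proj = project(K, T, A), project(K, T, B)
        d_obs = np.asarray(b, float) - np.asarray(a, float)
        d_obs = d_obs / np.linalg.norm(d_obs)
        d_prj = (b_proj - a_proj) / np.linalg.norm(b_proj - a_proj)
        angle = np.arccos(np.clip(abs(float(d_obs @ d_prj)), 0.0, 1.0))
        err += lam_l * angle ** 2
    return err
```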
In a preferred scheme, in step S5, point cloud processing is performed through local mapping and the camera pose is optimized by global relocalization to obtain a sparse point cloud reconstruction map, specifically as follows:
The BOW (bag-of-words) vector of each frame of the data stream is computed; the current frame image, including its BOW vector and covisibility information, is computed and inserted into the map, and the covisibility graph is updated. During tracking, each key frame carries information including feature points, feature lines and descriptors, and map points are then created by triangulation. Whether other key frames remain in the key frame queue is checked; if not, the map points are optimized, and local BA (bundle adjustment) is performed using the current frame, the key frame images sharing a common viewpoint with the current frame, and the adjacent frames of those key frames;
Candidate key frames corresponding to the current frame are then found; for each candidate key frame, a BOW dictionary is used to match the current frame with that key frame, initialization uses the matching relation between the current frame and the candidate key frame, and the pose is estimated with EPnP for each candidate key frame.
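For illustration only, the relocalization step described above might be sketched as follows (Python/OpenCV). The BoW ranking and descriptor matching that produce the 2D-3D correspondences for each candidate key frame are assumed to have been done already, and EPnP is invoked through OpenCV's solvePnPRansac; this is a sketch under those assumptions, not ORB-SLAM3's implementation:

```python
import cv2
import numpy as np

def relocalize_epnp(candidate_matches, K, min_inliers=15):
    """Global relocalization sketch. candidate_matches is a list of
    (pts_3d, pts_2d) correspondence sets, one per candidate key frame already
    selected and matched through the BoW dictionary (that part is omitted).
    For each candidate, the current-frame pose is estimated with EPnP inside
    RANSAC; the first candidate with enough inliers is accepted."""
    for pts_3d, pts_2d in candidate_matches:
        if len(pts_3d) < min_inliers:
            continue
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            np.asarray(pts_3d, dtype=np.float64),
            np.asarray(pts_2d, dtype=np.float64),
            K, None, flags=cv2.SOLVEPNP_EPNP)
        if ok and inliers is not None and len(inliers) >= min_inliers:
            return rvec, tvec        # rotation (Rodrigues vector) and translation
    return None                      # relocalization failed
```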
Further, in step S6, optimizing the camera pose through loop-closure detection specifically includes the following steps:
Loop-closure detection is performed on key frames using both point and line features. When three consecutive closed-loop candidate key frames all have high similarity with the current key frame, a loop candidate frame is obtained. For each candidate loop frame, its feature points and feature lines are first matched with those of the current frame; a similarity transformation matrix is then solved from the three-dimensional information corresponding to the feature points and feature lines. If the loop frame contains enough inlier points and inlier lines, Sim(3) optimization is performed; loop correction is carried out with the loop candidate frames, the feature-point constraints and line-segment constraints are optimized, and the camera pose after joint point-line optimization is obtained.
(1) Compared with the prior art, the technical scheme adopted by the invention has the following beneficial effects: the invention improves on ORB-SLAM3 and proposes a SLAM algorithm based on feature points, feature lines and semantic information; it combines Mask R-CNN with multi-view geometry to achieve instance segmentation and rejection of dynamic targets, identifies dynamic feature points and feature lines, removes the interference of dynamic targets in feature matching, eliminates their influence on the SLAM system, and completes the static scene occluded by dynamic targets using information from the previous several frames;
(2) The invention provides a semantic SLAM system based on feature points and feature lines that extracts line features with a Transformer structure; the line features extracted in this way are more accurate than those extracted by traditional methods;
(3) Compared with point features, lines provide more geometric structure information about the environment. By extracting both point and line features, the method achieves more accurate and robust matching in weak-texture and illumination-change scenes, realizes camera pose estimation, and reduces localization and relocalization errors; the algorithm solves the problem of difficult localization in low-texture scenes.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a feature line detection diagram;
FIG. 2 is a flow chart of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
The semantic SLAM method based on the dotted line combination for the dynamic environment according to the embodiment of the present invention is specifically described below with reference to fig. 1 to 2.
As shown in fig. 1 and fig. 2, the invention provides a dynamic-environment-oriented semantic SLAM method based on point-line combination, characterized by comprising the following steps:
Step S1: acquire the image stream of the scene and feed it frame by frame into a CNN network; segment objects with a priori dynamic properties, such as pedestrians, vehicles and fish, pixel by pixel; separate the dynamic objects in the scene to obtain key frame images, and use information from the previous several frames to complete the static scene occluded by dynamic targets. Feature points and feature lines are then extracted from the static region of the key frame image as follows: ORB feature points are used to extract features of the static image region, and ORB descriptors are computed at the same time to obtain the feature points and descriptors of the static region; line features are extracted from the image with dynamic objects removed, using a Transformer network structure that fuses feature information at different scales through a series of up-sampling and down-sampling operations to obtain the line features of the static image region. If line features were extracted by using the segment length l and angle θ to obtain the two endpoints, a small change of the angle would, for a long segment, strongly shift the endpoint positions and cause large line errors; the method therefore uses a horizontal distance d_x and a vertical distance d_y to generate a vector v = (d_x, d_y) that predicts the positions of the two endpoints of a single segment, where (x_1, y_1) and (x_2, y_2) denote the coordinates of the left and right endpoints, (x_m, y_m) is the midpoint coordinate, and v represents the vector relating the right endpoint coordinates (x_2, y_2) to the midpoint coordinates (x_m, y_m). In this method d_x and d_y are expressed as: d_x = x_2 − x_m, d_y = y_2 − y_m.
Step S2: extract feature points and feature lines from the key frame images obtained in step S1, and build a local map around the current frame image, including key frame images sharing a common viewpoint with the current frame and the adjacent frames of those key frames; search these frames for feature points and line segments matching the current frame; then perform a dynamic-consistency check on the a priori dynamic objects, remove feature points and feature lines on dynamic objects, keep those on static objects, and match with the remaining static feature points and static lines;
step S3: matching the characteristic points and the characteristic lines in the step S2, filtering at the same time, removing the points and the lines which are incorrectly matched to obtain correct matching point pairs and line pairs, and obtaining the initial camera pose by using the matching point pairs; the matching of the feature points and the feature lines specifically comprises the following steps: the feature point matching is to find out a feature point with the closest descriptor distance as a matching point in the current frame through quick nearest neighbor search by generating ORB descriptors, then to reject the mismatching point pair, when the matching descriptor distance is larger than a threshold gamma or the ratio of the optimal matching point distance to the second optimal matching point distance is smaller than 1, the second matching point is equivalent to the first matching point, then the matching point pair is considered to be easy to be mismatched, and the matching point pair is rejected; the matching of the characteristic lines is to obtain 2D-2D matching line pairs through geometric constraint, map the 2D-2D matching line pairs to a 3D space directly through outlier rejection, and then obtain accurate 2D-3D line matching pairs by minimizing the reprojection error. The initial camera pose calculation specifically comprises the following steps: and calculating a basic matrix and an essential matrix through the feature points and the feature lines, and obtaining a relatively accurate pose transformation matrix between cameras through SVD decomposition.
Step S4: compute the camera pose of the current frame from the matching point pairs and line pairs obtained in step S3, and obtain an accurate camera pose estimate by minimizing the reprojection error of the point pairs and line pairs. The camera pose is optimized by minimizing the reprojection error as follows:
The pose is jointly optimized with points and lines, and the minimized reprojection error is defined as
E(T) = λ_p Σ_{j=1}^{M} ‖ p_j − f_p(T, P_j) ‖² + λ_l Σ_{i=1}^{N} e_l( l_i, f_l(T, L_i) )²
where N denotes the number of 2D-3D matching line pairs (and M the number of matching point pairs), the function f_l(T, L_i) is the projection of the 3D line L_i onto the 2D plane under camera pose T, the angle error e_l is defined by the two planes π_1 and π_2, the function f_p(T, P_j) is the projection of the 3D point P_j onto the 2D plane, p_j and l_i are the observed 2D point and 2D line, and λ_p and λ_l are given weight values; the camera pose is optimized by minimizing this reprojection error.
Step S5: build a local map of the scene from key frame images, perform instance segmentation on every frame, merge the feature points and feature lines within each instance into the corresponding instance, locate the camera pose with the feature points and feature lines, and compute point clouds of objects and the scene; point cloud processing is performed through local mapping, and the camera pose is optimized by global relocalization to obtain a sparse point cloud reconstruction map, specifically as follows:
The BOW (bag-of-words) vector of each frame of the data stream is computed; the current frame image, including its BOW vector and covisibility information, is computed and inserted into the map, and the covisibility graph is updated. During tracking, each key frame carries information including feature points, feature lines and descriptors; since not every feature point becomes a 3D map point, unqualified feature points and feature lines are removed first, and map points are then created by triangulation. Whether other key frames remain in the key frame queue is checked; if not, the map points are optimized, and local BA (bundle adjustment) is performed using the current frame, the key frame images sharing a common viewpoint with the current frame, and the adjacent frames of those key frames;
Candidate key frames corresponding to the current frame are then found; for each candidate key frame, a BOW dictionary is used to match the current frame with that key frame, initialization uses the matching relation between the current frame and the candidate key frame, and the pose is estimated with EPnP for each candidate key frame.
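For illustration only, the creation of new map points by triangulation mentioned above could be sketched as follows (Python/OpenCV), given the poses of two key frames and their matched static feature points; parallax and depth checks performed by a full local-mapping thread are omitted, so this is a simplified sketch rather than the method's implementation:

```python
import cv2
import numpy as np

def triangulate_map_points(K, pose1, pose2, pts1, pts2):
    """Create new map points by triangulating matched feature points between
    two key frames. pose = (R, t) maps world coordinates into the camera
    frame (R: 3x3, t: length-3 arrays); pts1 / pts2 are Nx2 pixel matches."""
    P1 = K @ np.hstack([pose1[0], np.asarray(pose1[1]).reshape(3, 1)])
    P2 = K @ np.hstack([pose2[0], np.asarray(pose2[1]).reshape(3, 1)])
    X_h = cv2.triangulatePoints(P1, P2,
                                np.asarray(pts1, dtype=np.float64).T,
                                np.asarray(pts2, dtype=np.float64).T)
    return (X_h[:3] / X_h[3]).T               # Nx3 world points
```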
Step S6: perform pose optimization with loop-closure detection, correct drift errors, and obtain a more accurate camera pose estimate. This specifically includes the following steps:
Loop-closure detection is performed on key frames using both point and line features. When three consecutive closed-loop candidate key frames all have high similarity with the current key frame, a loop candidate frame is obtained. For each candidate loop frame, its feature points and feature lines are first matched with those of the current frame; a similarity transformation matrix is then solved from the three-dimensional information corresponding to the feature points and feature lines. If the loop frame contains enough inlier points and inlier lines, Sim(3) optimization is performed; loop correction is carried out with the loop candidate frames, the feature-point constraints and line-segment constraints are optimized, and the camera pose after joint point-line optimization is obtained.
In the description of the present invention, the term "plurality" means two or more, unless explicitly defined otherwise, the orientation or positional relationship indicated by the terms "upper", "lower", etc. are based on the orientation or positional relationship shown in the drawings, merely for convenience of description of the present invention and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention; the terms "coupled," "mounted," "secured," and the like are to be construed broadly, and may be fixedly coupled, detachably coupled, or integrally connected, for example; can be directly connected or indirectly connected through an intermediate medium. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the present specification, the terms "one embodiment," "some embodiments," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. A dynamic-environment-oriented semantic SLAM method based on point-line combination, characterized by comprising the following steps:
Step S1: acquire the image stream of the scene and feed it frame by frame into a CNN network; segment objects with a priori dynamic properties pixel by pixel, separate the dynamic objects in the scene to obtain key frame images, and use information from the previous several frames to complete the static scene occluded by dynamic targets; extract feature points and feature lines from the static region of the key frame image, specifically as follows: ORB feature points are used to extract features of the static image region, and ORB descriptors are computed at the same time to obtain the feature points and descriptors of the static region; line features are extracted from the image with dynamic objects removed, using a Transformer network structure that fuses feature information at different scales through a series of up-sampling and down-sampling operations to obtain the line features of the static image region; the extracted line features use a horizontal distance d_x and a vertical distance d_y to generate a vector v = (d_x, d_y) that predicts the positions of the two endpoints of a single line segment, where (x_1, y_1) and (x_2, y_2) denote the coordinates of the left and right endpoints of the segment, (x_m, y_m) is the midpoint coordinate, and v represents the vector relating the right endpoint coordinates (x_2, y_2) to the midpoint coordinates (x_m, y_m), with d_x = x_2 − x_m and d_y = y_2 − y_m;
Step S2: extract feature points and feature lines from the key frame images obtained in step S1, and build a local map around the current frame image, including key frame images sharing a common viewpoint with the current frame and the adjacent frames of those key frames; search these frames for feature points and line segments matching the current frame; then perform a dynamic-consistency check on the a priori dynamic objects, remove feature points and feature lines on dynamic objects, keep those on static objects, and match with the remaining static feature points and static lines;
Step S3: match the feature points and feature lines from step S2 while filtering out incorrectly matched points and lines, obtain correct matching point pairs and line pairs, and use the matching point pairs to obtain the initial camera pose; the matching of feature points and feature lines specifically includes: feature-point matching generates ORB descriptors and, by fast nearest-neighbour search, finds the feature point in the current frame with the closest descriptor distance as the matching point, then rejects mismatched pairs: when the matching descriptor distance is greater than a threshold γ, or the ratio of the best matching distance to the second-best matching distance is smaller than 1, meaning the second match is comparable to the first, the matching pair is considered prone to mismatching and is discarded; feature-line matching obtains 2D-2D matching line pairs through geometric constraints, maps them directly into 3D space after outlier rejection, and then obtains accurate 2D-3D line matching pairs by minimizing the reprojection error;
Step S4: compute the camera pose of the current frame from the matching point pairs and line pairs obtained in step S3, and obtain an accurate camera pose estimate by minimizing the reprojection error of the point pairs and line pairs; the pose is jointly optimized with points and lines, and the minimized reprojection error is defined as
E(T) = λ_p Σ_{j=1}^{M} ‖ p_j − f_p(T, P_j) ‖² + λ_l Σ_{i=1}^{N} e_l( l_i, f_l(T, L_i) )²
where N denotes the number of 2D-3D matching line pairs (and M the number of matching point pairs), the function f_l(T, L_i) is the projection of the 3D line L_i onto the 2D plane under camera pose T, the angle error e_l is defined by the two planes π_1 and π_2, the function f_p(T, P_j) is the projection of the 3D point P_j onto the 2D plane, and λ_p and λ_l are given weight values; the camera pose is optimized by minimizing this reprojection error;
Step S5: build a local map of the scene from key frame images, perform instance segmentation on every frame, merge the feature points and feature lines within each instance into the corresponding instance, locate the camera pose with the feature points and feature lines, and compute point clouds of objects and the scene to obtain a sparse point cloud map;
Step S6: perform pose optimization with loop-closure detection, correct drift errors, and obtain a more accurate camera pose estimate.
2. The dynamic-environment-oriented semantic SLAM method based on point-line combination according to claim 1, characterized in that in step S5, point cloud processing is performed through local mapping and the camera pose is optimized by global relocalization to obtain a sparse point cloud reconstruction map, specifically as follows:
the BOW vector of each frame of the data stream is computed; the current frame image, including its BOW vector and covisibility information, is computed and inserted into the map, and the covisibility graph is updated; during tracking, each key frame carries information including feature points, feature lines and descriptors, and map points are then created by triangulation; whether other key frames remain in the key frame queue is checked; if not, the map points are optimized, and local BA is performed using the current frame, the key frame images sharing a common viewpoint with the current frame, and the adjacent frames of those key frames;
candidate key frames corresponding to the current frame are found; for each candidate key frame, a BOW dictionary is used to match the current frame with that key frame, initialization uses the matching relation between the current frame and the candidate key frame, and the pose is estimated with EPnP for each candidate key frame.
3. The dynamic-environment-oriented semantic SLAM method based on point-line combination according to claim 2, characterized in that optimizing the camera pose through loop-closure detection in step S6 specifically includes:
loop-closure detection is performed on key frames using both point and line features; when three consecutive closed-loop candidate key frames all have high similarity with the current key frame, a loop candidate frame is obtained; for each candidate loop frame, its feature points and feature lines are first matched with those of the current frame, and a similarity transformation matrix is then solved from the three-dimensional information corresponding to the feature points and feature lines; if the loop frame contains enough inlier points and inlier lines, Sim(3) optimization is performed; loop correction is carried out with the loop candidate frames, the feature-point constraints and line-segment constraints are optimized, and the camera pose after joint point-line optimization is obtained.
CN202211619407.3A 2022-12-16 2022-12-16 Semantic SLAM method based on point-line combination and oriented to dynamic environment Active CN116468786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211619407.3A CN116468786B (en) 2022-12-16 2022-12-16 Semantic SLAM method based on point-line combination and oriented to dynamic environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211619407.3A CN116468786B (en) 2022-12-16 2022-12-16 Semantic SLAM method based on point-line combination and oriented to dynamic environment

Publications (2)

Publication Number Publication Date
CN116468786A CN116468786A (en) 2023-07-21
CN116468786B true CN116468786B (en) 2023-12-26

Family

ID=87181281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211619407.3A Active CN116468786B (en) 2022-12-16 2022-12-16 Semantic SLAM method based on point-line combination and oriented to dynamic environment

Country Status (1)

Country Link
CN (1) CN116468786B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173342B (en) * 2023-11-02 2024-07-02 中国海洋大学 Underwater monocular and binocular camera-based natural light moving three-dimensional reconstruction device and method
CN117690192B (en) * 2024-02-02 2024-04-26 天度(厦门)科技股份有限公司 Abnormal behavior identification method and equipment for multi-view instance-semantic consensus mining

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489501A (en) * 2019-07-24 2019-11-22 西北工业大学 SLAM system rapid relocation algorithm based on line feature
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic S L AM-based dynamic environment camera pose estimation and semantic map construction method
CN112132897A (en) * 2020-09-17 2020-12-25 中国人民解放军陆军工程大学 A visual SLAM method for semantic segmentation based on deep learning
CN112381890A (en) * 2020-11-27 2021-02-19 上海工程技术大学 RGB-D vision SLAM method based on dotted line characteristics
CN112396595A (en) * 2020-11-27 2021-02-23 广东电网有限责任公司肇庆供电局 Semantic SLAM method based on point-line characteristics in dynamic environment
CN112435262A (en) * 2020-11-27 2021-03-02 广东电网有限责任公司肇庆供电局 Dynamic environment information detection method based on semantic segmentation network and multi-view geometry
CN112446882A (en) * 2020-10-28 2021-03-05 北京工业大学 Robust visual SLAM method based on deep learning in dynamic scene
CN113837277A (en) * 2021-09-24 2021-12-24 东南大学 A multi-source fusion SLAM system based on visual point and line feature optimization
WO2022041596A1 (en) * 2020-08-31 2022-03-03 同济人工智能研究院(苏州)有限公司 Visual slam method applicable to indoor dynamic environment
CN114283199A (en) * 2021-12-29 2022-04-05 北京航空航天大学 A point-line fusion semantic SLAM method for dynamic scenes
CN114627309A (en) * 2022-03-11 2022-06-14 长春工业大学 Visual SLAM method based on dotted line features in low texture environment
CN114708293A (en) * 2022-03-22 2022-07-05 广东工业大学 Robot motion estimation method based on deep learning point-line feature and IMU tight coupling
CN114862949A (en) * 2022-04-02 2022-08-05 华南理工大学 Structured scene vision SLAM method based on point, line and surface characteristics

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7096274B2 (en) * 2018-06-07 2022-07-05 馭勢科技(北京)有限公司 Method and device for simultaneous self-position estimation and environment map creation

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489501A (en) * 2019-07-24 2019-11-22 西北工业大学 SLAM system rapid relocation algorithm based on line feature
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic S L AM-based dynamic environment camera pose estimation and semantic map construction method
WO2022041596A1 (en) * 2020-08-31 2022-03-03 同济人工智能研究院(苏州)有限公司 Visual slam method applicable to indoor dynamic environment
CN112132897A (en) * 2020-09-17 2020-12-25 中国人民解放军陆军工程大学 A visual SLAM method for semantic segmentation based on deep learning
CN112446882A (en) * 2020-10-28 2021-03-05 北京工业大学 Robust visual SLAM method based on deep learning in dynamic scene
CN112396595A (en) * 2020-11-27 2021-02-23 广东电网有限责任公司肇庆供电局 Semantic SLAM method based on point-line characteristics in dynamic environment
CN112435262A (en) * 2020-11-27 2021-03-02 广东电网有限责任公司肇庆供电局 Dynamic environment information detection method based on semantic segmentation network and multi-view geometry
CN112381890A (en) * 2020-11-27 2021-02-19 上海工程技术大学 RGB-D vision SLAM method based on dotted line characteristics
CN113837277A (en) * 2021-09-24 2021-12-24 东南大学 A multi-source fusion SLAM system based on visual point and line feature optimization
CN114283199A (en) * 2021-12-29 2022-04-05 北京航空航天大学 A point-line fusion semantic SLAM method for dynamic scenes
CN114627309A (en) * 2022-03-11 2022-06-14 长春工业大学 Visual SLAM method based on dotted line features in low texture environment
CN114708293A (en) * 2022-03-22 2022-07-05 广东工业大学 Robot motion estimation method based on deep learning point-line feature and IMU tight coupling
CN114862949A (en) * 2022-04-02 2022-08-05 华南理工大学 Structured scene vision SLAM method based on point, line and surface characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Monocular visual simultaneous localization and mapping algorithm based on point and line features; Wang Dan et al.; Robot (No. 03); full text *

Also Published As

Publication number Publication date
CN116468786A (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN110223348B (en) Adaptive pose estimation method for robot scene based on RGB-D camera
CN112785702B (en) A SLAM method based on tightly coupled 2D lidar and binocular camera
CN110686677B (en) Global positioning method based on geometric information
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN109059895B (en) Multi-mode indoor distance measurement and positioning method based on mobile phone camera and sensor
CN110322511B (en) Semantic SLAM method and system based on object and plane features
WO2024114119A1 (en) Sensor fusion method based on binocular camera guidance
CN104732518B (en) An Improved Method of PTAM Based on Ground Features of Intelligent Robot
CN112258600A (en) A simultaneous localization and map construction method based on vision and lidar
CN108615246B (en) Method for improving robustness of visual odometer system and reducing calculation consumption of algorithm
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
US11788845B2 (en) Systems and methods for robust self-relocalization in a visual map
CN107945265A (en) Real-time dense monocular SLAM method and systems based on on-line study depth prediction network
CN108519102B (en) A binocular visual odometry calculation method based on secondary projection
CN113223045B (en) Vision and IMU sensor fusion positioning system based on dynamic object semantic segmentation
CN108776989B (en) Low-texture planar scene reconstruction method based on sparse SLAM framework
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
CN110097584A (en) The method for registering images of combining target detection and semantic segmentation
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN113506318A (en) A 3D object perception method in vehicle edge scene
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
CN110032965A (en) Vision positioning method based on remote sensing images
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN112101160A (en) A Binocular Semantic SLAM Method for Autonomous Driving Scenarios
CN111998862A (en) Dense binocular SLAM method based on BNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant