CN120726129B - Pipe network detection robot positioning method, device, equipment, medium and product - Google Patents
- Publication number
- CN120726129B (application CN202511149890.7A)
- Authority
- CN
- China
- Legal status (an assumption, not a legal conclusion)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05—Geographic models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
Abstract
The invention discloses a method, device, equipment, medium and product for positioning a pipe network detection robot. The method comprises: acquiring images through a binocular camera mounted on the detection robot and transmitting the left-eye and right-eye images respectively in the form of topics; extracting feature points and descriptors of the left-eye and right-eye images with an acceleration feature network; generating an initialized pipeline map with a SLAM system and determining the initial pose of the camera in the tracking thread; optimizing the initial pose through the feature points and descriptors; constructing a cylinder from the initialized pipeline map and inserting key frames meeting the conditions into the local mapping thread; generating map points and cylinder points from the received key frames in the local mapping thread and executing local BA; and, in the closed-loop detection thread, searching candidate loop frames in an XFeat database, calculating a similarity transformation to obtain the accumulated error, and executing global BA to optimize the map so as to position the detection robot. The scheme of the invention can achieve accurate positioning of the pipe network detection robot in complex pipeline scenes.
Description
Technical Field
The invention relates to the technical field of pipe network positioning, and in particular to a pipe network detection robot positioning method, device, equipment, medium and product.
Background
With the rapid development of urban pipe network systems, pipe network detection robots are increasingly widely applied in fields such as pipeline inspection and fault localization. Accurate positioning is a core requirement for a pipe network detection robot to achieve autonomous inspection, and visual Simultaneous Localization and Mapping (SLAM) has become the mainstream robot positioning scheme by virtue of its low cost and high information content.
The ORB-SLAM family of algorithms stands out in traditional visual SLAM thanks to its efficient feature processing, multithreaded architecture and loop detection, and is widely used for positioning tasks in various scenes.
However, in special scenes such as pipelines, the limitations of the traditional feature-point method gradually become apparent: in pipe network detection, the robot must operate in 2/3-water, semi-water or water-free environments and faces complex challenges such as sparse texture, uneven illumination, metal reflection and repetitive structure on the pipe inner wall, which place extremely high demands on the robustness and positioning accuracy of SLAM technology.
Disclosure of Invention
The invention provides a pipe network detection robot positioning method, device, equipment, medium and product, which can achieve accurate positioning of the pipe network detection robot in complex pipeline scenes.
According to an aspect of the present invention, there is provided a pipe network detection robot positioning method, including:
Acquiring images through a binocular camera mounted on the detection robot, transmitting the left-eye image and the right-eye image respectively in the form of topics, and extracting feature points and descriptors of the left-eye and right-eye images by using an acceleration feature network;
Generating an initialized pipeline map by using the SLAM system, determining the initial pose of the camera in the tracking thread, and optimizing the initial pose through the feature points and the descriptors;
Constructing a cylinder according to the initialized pipeline map, and inserting key frames meeting the conditions into the local mapping thread;
Generating map points and cylinder points from the received key frames in the local mapping thread, and executing local BA;
Searching candidate loop frames in the XFeat database by using the closed-loop detection thread, calculating a similarity transformation to obtain the accumulated error, executing global BA to optimize the map, and positioning the detection robot through the optimized map.
Optionally, the left-eye image and the right-eye image acquired by the binocular camera are transmitted through independent communication topics; the topic format adopts the standard Robot Operating System (ROS) message type, and the transmission frame rate equals the acquisition frame rate of the binocular camera.
Optionally, the acceleration feature network is an end-to-end deep learning network model; its input is the RGB data of the left-eye and right-eye images, and its output comprises:
a key point heat map representing the position probability distribution of feature points in the image, in the form of a two-dimensional matrix, where a higher first matrix element value indicates a greater likelihood that a feature point exists at that position;
a reliability heat map of the same size as the key point heat map, whose second matrix element value reflects the reliability score of the feature point at the corresponding position, with a scoring range of [0,1], where values closer to 1 indicate higher reliability;
and a descriptor set, in which each feature point corresponds to a 128-dimensional binary descriptor vector used for matching between feature points.
Optionally, the extracting the feature points of the left-eye image and the right-eye image by using the acceleration feature network includes:
carrying out confidence scoring on each feature point in the key point heat map and the reliability heat map through the following formula:

S = α·S_kp + β·S_rel;

wherein S_kp is the normalized score of the feature point in the key point heat map, S_rel is the normalized score of the feature point in the reliability heat map, α and β are weight coefficients, and α + β = 1;
and selecting the N feature points with the highest scores, then screening for uniformly distributed feature points by grid partitioning: the image is divided into M×M grids and at most P feature points are retained in each grid.
Optionally, the generating an initialized pipeline map by using the SLAM system and determining the initial pose of the camera in the tracking thread includes:
selecting a preset number of frames of binocular images, calculating the fundamental matrix through feature point matching, decomposing it to obtain an initial rotation matrix and translation vector, and triangulating a three-dimensional point cloud by combining the intrinsic parameters and baseline distance of the binocular camera to form the initial pipeline map;
determining whether a previous frame image currently exists; if so, predicting the current pose of the binocular camera based on a constant-velocity motion model as an initial value, matching the feature points of the current frame and the previous frame, solving the pose with the RANSAC-PnP algorithm, and performing iterative optimization;
and if the initial pose is being determined for the first time after the SLAM system is started, screening inliers from the feature point matching pairs through a random sample consensus algorithm, then calculating the essential matrix and decomposing it to obtain the initial pose.
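For a rectified stereo pair, the triangulation step above reduces to depth from disparity, z = f·b/d, followed by back-projection through the intrinsics. The following is a minimal illustrative sketch, not the patent's implementation; the intrinsic values and baseline in the example are made-up:

```python
import numpy as np

def triangulate_stereo(uv_left, uv_right, fx, fy, cx, cy, baseline):
    """Triangulate 3-D points from matched pixel coordinates in a
    rectified stereo pair.

    uv_left, uv_right: (N, 2) pixel coordinates in the left/right images;
    fx, fy, cx, cy: camera intrinsics; baseline: distance between the two
    optical centres in metres.
    """
    uv_left = np.asarray(uv_left, dtype=float)
    uv_right = np.asarray(uv_right, dtype=float)
    disparity = uv_left[:, 0] - uv_right[:, 0]      # rectified stereo: x_l - x_r
    disparity = np.where(disparity > 1e-6, disparity, np.nan)  # guard zero disparity
    z = fx * baseline / disparity                    # depth along the optical axis
    x = (uv_left[:, 0] - cx) * z / fx
    y = (uv_left[:, 1] - cy) * z / fy
    return np.column_stack([x, y, z])

# A point on the optical axis, 32 px of disparity at f = 640, b = 0.1 m:
pts = triangulate_stereo([[320.0, 240.0]], [[288.0, 240.0]],
                         fx=640.0, fy=640.0, cx=320.0, cy=240.0, baseline=0.1)
# pts ≈ [[0.0, 0.0, 2.0]] : the point lies 2 m in front of the camera
```

Points with zero or negative disparity are returned as NaN rather than producing spurious depths, mirroring the screening of invalid matches before map points are created.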
Optionally, the optimizing the initial pose through the feature points and the descriptors includes:
selecting map points corresponding to a preset number of key frames with the highest covisibility with the current frame from the local map, extracting descriptors of these map points to form a local feature library, and associating each map point with the observation information of at least the preset number of key frames;
adopting a bidirectional matching mechanism: first matching the current-frame feature point descriptors against the local feature library to obtain initial matching pairs, then verifying, for each matching pair, the projection consistency of the current-frame feature point and the map point in the corresponding key frame, and eliminating matching pairs whose projection error exceeds the preset number of pixels;
constructing a pose optimization objective function based on the Lie algebra with the initial pose as the iteration starting point, where the objective function is the weighted sum of the reprojection errors of all matching points and the weights are dynamically adjusted according to the number of observations of each map point;
and performing iterative optimization with the Gauss-Newton algorithm: the Jacobian matrix is calculated and the pose parameters are updated at each iteration, the iteration stops when the number of iterations reaches a preset count or the pose change between two adjacent iterations is smaller than 1e-6 radians and 1e-3 meters, and the optimized pose is output.
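The Gauss-Newton loop described above (iterate, build the Jacobian, solve for an update, stop on a small pose change) can be illustrated on a toy 2-D alignment problem. This is a sketch only: the patent optimizes a full camera pose on the Lie algebra, whereas this model has one rotation angle and a 2-D translation, with the same style of stopping rule (angle change below 1e-6, translation change below 1e-3):

```python
import numpy as np

def gauss_newton_pose_2d(src, dst, max_iters=10):
    """Estimate a 2-D rotation angle and translation aligning src to dst
    by Gauss-Newton, mirroring the iterate/Jacobian/update/stop loop
    described in the text."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    theta, t = 0.0, np.zeros(2)
    for _ in range(max_iters):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        dR = np.array([[-s, -c], [c, -s]])         # dR/dtheta
        r = (src @ R.T + t - dst).ravel()          # stacked residuals
        # Jacobian columns: [d r / d theta, d r / d t_x, d r / d t_y]
        J = np.zeros((r.size, 3))
        J[:, 0] = (src @ dR.T).ravel()
        J[0::2, 1] = 1.0
        J[1::2, 2] = 1.0
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]  # Gauss-Newton step
        theta += delta[0]
        t += delta[1:]
        # stop rule in the same spirit as the text (radians / metres)
        if abs(delta[0]) < 1e-6 and np.linalg.norm(delta[1:]) < 1e-3:
            break
    return theta, t
```

Because the problem has zero residual at the optimum, the step norm is a reliable convergence indicator, which is why thresholding the pose change (rather than the raw error) works as a stopping criterion.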
Optionally, the constructing a cylinder according to the initialized pipeline map and inserting key frames meeting the conditions into the local mapping thread includes:
fitting cylinder parameters with a random sample consensus algorithm based on the point cloud data of the initialized pipeline map, the cylinder parameters comprising the unit direction vector of the cylinder axis, the coordinates of the axis midpoint and the cylinder radius, wherein the distance from each point of the point cloud to a candidate axis is calculated through the point-to-line distance formula, and a candidate cylinder is determined to be a valid model when the inlier proportion exceeds a preset fraction of the total number of points;
and determining the current frame to be a key frame and inserting it into the local mapping thread when the following conditions are satisfied:
the translation distance between the key frame and the last key frame exceeds 1/5 of the diameter of the pipeline or the rotation angle exceeds 5 degrees;
the number of matching pairs between the current frame extracted feature points and the local map feature points is not less than 80, and the distribution of the matching points in the image covers at least 8 preset grid areas;
the root mean square value of the reprojection error after pose optimization is smaller than 1.5 pixels;
The key frames inserted into the local mapping thread carry the following information: the optimized camera pose, the feature point set and corresponding descriptors, the image acquisition timestamp, the covisibility relation with adjacent key frames, and the projection position parameters of the key frame in the cylinder model.
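The three key-frame conditions above translate directly into a predicate. The sketch below hard-codes the thresholds from the text (1/5 of the pipe diameter, 5 degrees, 80 matches, 8 grid cells, 1.5 pixels) and is illustrative only:

```python
def is_keyframe(translation, rotation_deg, n_matches, grid_cells_covered,
                reproj_rmse, pipe_diameter):
    """Key-frame test mirroring the three listed conditions:
    sufficient motion, sufficient well-spread matches, and an accurate
    optimized pose. Units: translation and pipe_diameter in metres,
    rotation in degrees, reproj_rmse in pixels."""
    moved_enough = translation > pipe_diameter / 5.0 or rotation_deg > 5.0
    tracked_well = n_matches >= 80 and grid_cells_covered >= 8
    accurate = reproj_rmse < 1.5
    return moved_enough and tracked_well and accurate

# e.g. for a 0.8 m pipe: 0.2 m of travel, 100 matches over 9 cells, 1.0 px RMSE
keep = is_keyframe(0.2, 0.0, 100, 9, 1.0, pipe_diameter=0.8)
```

Requiring both motion and tracking quality prevents the map from filling with redundant frames while standing still, and from accepting poorly tracked frames during fast motion.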
Optionally, the generating map points and cylinder points from the received key frames in the local mapping thread and executing local BA includes:
calculating the three-dimensional coordinates of matched feature point pairs between the key frame and adjacent key frames through binocular triangulation, screening points whose reprojection error is smaller than 2 pixels and whose parallax angle is larger than 1 degree as valid map points, and associating each map point with the indices and observed pixel coordinates of the key frames that observe it;
performing cylinder constraint verification on the valid map points based on the cylinder model: the distance from a map point to the cylinder axis is calculated, and if the deviation between this distance and the cylinder radius is within a preset value, the map point is marked as a cylinder point and its parameterized coordinates in the circumferential and axial directions of the cylinder are recorded;
and constructing an optimization window comprising the current key frame, its covisible key frames and the corresponding map points, with the camera poses represented on the Lie algebra. The vertices comprise key frame poses and map point coordinates, the edges are observation constraints, and an additional cylindrical constraint edge is added for each cylinder point, constraining its distance to the axis to equal the cylinder radius with a weight 1.5 times that of an ordinary observation edge. Sparse BA is solved with the g2o optimization library, iterating until the root mean square of the reprojection error is smaller than 1.2 pixels or 30 iterations are reached, and the optimized poses and map point coordinates are output.
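The additional cylindrical constraint edge can be written as a residual penalizing the deviation of a cylinder point's axis distance from the fitted radius, with the 1.5× weight stated above. This is a sketch of the residual only; in the actual system it would be implemented as a custom g2o edge:

```python
import numpy as np

def cylinder_edge_residual(point, axis_point, axis_dir, radius, weight=1.5):
    """Residual of the extra BA edge for a cylinder point: the weighted
    deviation of its distance-to-axis from the fitted cylinder radius
    (weight 1.5, as stated in the text)."""
    v = np.asarray(point, float) - np.asarray(axis_point, float)
    d = np.asarray(axis_dir, float)
    d = d / np.linalg.norm(d)                    # ensure unit axis direction
    dist = np.linalg.norm(v - (v @ d) * d)       # point-to-axis distance
    return weight * (dist - radius)

# A point 0.30 m from a z-axis cylinder of radius 0.25 m:
res = cylinder_edge_residual([0.3, 0.0, 1.0], [0.0, 0.0, 0.0],
                             [0.0, 0.0, 1.0], 0.25)
```

Squared and summed alongside the reprojection terms, this residual pulls cylinder points onto the fitted pipe surface during local BA, which is what suppresses drift in the texture-poor, repetitive pipe interior.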
Optionally, the searching candidate loop frames in the XFeat database with the closed-loop detection thread, calculating a similarity transformation to obtain the accumulated error, and executing the global BA to optimize the map includes:
inputting the feature point descriptors of the current key frame into the XFeat database, and retrieving, through an approximate nearest neighbour search, the 20 historical key frames with the highest feature matching degree to the current frame as the candidate loop frames;
estimating a similarity transformation matrix from the feature matching pairs of each candidate loop frame with the RANSAC algorithm, with the inlier threshold set to 3 pixels; a candidate frame is confirmed as a valid loop frame when the number of inliers exceeds 30% of the total number of matching pairs and the transformation matrix satisfies the homography constraint;
and constructing a global optimization graph comprising all key frame poses and map points, adding the similarity transformations obtained by loop detection to the graph as strong constraints whose weights are dynamically adjusted according to the accumulated error, and optimizing with a sparse BA algorithm that takes the weighted sum of reprojection errors and loop constraint errors as the objective function; during iteration, a marginalization operation is performed every 10 iterations to remove redundant variables, and optimization stops when the change of the objective function between two iterations is smaller than 1e-6 or 80 iterations are reached.
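Inside the RANSAC loop, each hypothesis requires estimating a similarity transformation from a set of feature matches. As an illustration, the closed-form least-squares (Umeyama-style) estimate in 2-D is sketched below; the patent's similarity estimation over 3-D matches follows the same idea with one more dimension, wrapped in RANSAC for robustness:

```python
import numpy as np

def fit_similarity_2d(src, dst):
    """Closed-form least-squares similarity transform (Umeyama style):
    scale s, rotation R and translation t with dst ≈ s * (src @ R.T) + t.
    This is the alignment step for one RANSAC hypothesis only."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    cs, cd = src - mu_s, dst - mu_d                  # centred point sets
    cov = cd.T @ cs / n                              # cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(U @ Vt))            # guard against reflection
    D = np.diag([1.0, sign])
    R = U @ D @ Vt
    var_src = (cs ** 2).sum() / n
    s = (S * np.diag(D)).sum() / var_src
    t = mu_d - s * (R @ mu_s)
    return s, R, t
```

The recovered scale is what exposes accumulated drift: for a drift-free trajectory the loop transform has s ≈ 1, and any deviation is distributed over the pose graph by the subsequent global BA.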
According to another aspect of the present invention, there is provided a pipe network detection robot positioning device, including:
an image acquisition unit, configured to acquire images through a binocular camera mounted on the detection robot and to transmit the left-eye image and the right-eye image respectively in the form of topics;
an initial pose determining unit, configured to generate an initialized pipeline map with the SLAM system and to determine the initial pose of the camera in the tracking thread;
a cylinder construction unit, configured to construct a cylinder according to the initialized pipeline map and to insert key frames meeting the conditions into the local mapping thread;
a local BA unit, configured to generate map points and cylinder points from the received key frames in the local mapping thread and to execute local BA;
and a global BA unit, configured to search candidate loop frames in the XFeat database with the closed-loop detection thread, calculate a similarity transformation to obtain the accumulated error, execute global BA to optimize the map, and position the detection robot through the optimized map.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the pipe network detection robot positioning method according to any embodiment of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions, where the computer instructions, when executed, cause a processor to implement the pipe network detection robot positioning method according to any embodiment of the present invention.
According to another aspect of the present invention, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the pipe network inspection robot positioning method according to any of the embodiments of the present invention.
The technical scheme provided by the embodiments of the invention solves the problems of the traditional visual SLAM method, whose positioning accuracy and robustness are seriously degraded by repetitive pipeline structure, sparse pipe wall texture and uneven illumination inside the pipe. The positioning method can achieve centimetre-level positioning in complex pipeline environments, reduces positioning errors and improves positioning accuracy compared with traditional methods; it is suitable for wide deployment in urban pipe network systems and can also be applied to similar scenes such as diversion tunnels, culverts and tunnels.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for positioning a pipe network detection robot according to an embodiment of the present invention;
fig. 2 is a flowchart of an initial pose determining method according to a second embodiment of the present invention;
fig. 3 is a flowchart of a method for local BA according to a third embodiment of the present invention;
fig. 4 is a flowchart of a global BA method according to a third embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a positioning device for a pipe network detection robot according to a fourth embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device for implementing the positioning method of the pipe network detection robot according to the embodiment of the invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a positioning method of a pipe network detection robot according to an embodiment of the present invention, where the embodiment is applicable to positioning a pipe network detection robot in a complex pipeline scenario, the method may be performed by a pipe network detection robot positioning device, the pipe network detection robot positioning device may be implemented in a hardware and/or software form, and the pipe network detection robot positioning device may be configured in an electronic device. As shown in fig. 1, the method includes:
S110, acquiring images through a binocular camera mounted on the detection robot, transmitting the left-eye image and the right-eye image respectively in the form of topics, and extracting feature points and descriptors of the left-eye and right-eye images by using an acceleration feature network.
Similar to human binocular vision, the binocular camera carried by the detection robot synchronously collects left-eye and right-eye images of the pipeline interior, and the two images are packaged into separate topics for real-time transmission through the communication mechanism of the robot operating system.
Feature points are representative key pixels identified in the image, such as salient features of the pipe inner wall (joints, protrusions, and the like); such points remain stable under different viewpoints and illumination. Descriptor generation produces a high-dimensional vector descriptor for each feature point, describing its local image characteristics such as texture and gradient, so that the same physical point in different images can be matched and associated through its descriptor.
In the embodiment of the invention, the left-eye image and the right-eye image acquired by the binocular camera are transmitted through independent communication topics; the topic format adopts the standard ROS message type, and the transmission frame rate equals the acquisition frame rate of the binocular camera.
The left-eye and right-eye images are transmitted over two mutually independent ROS topics. The core of binocular vision is computing depth information from the parallax between the left and right images; independent topics ensure that the two images are read separately by the front-end processing module of the SLAM system and avoid data aliasing.
The images are transmitted in the sensor_msgs/Image standard message format defined by ROS, so they can be parsed directly by image processing libraries in the ROS ecosystem; this also guarantees compatibility between the camera driver node and the SLAM algorithm node, without requiring a custom data format.
Keeping the transmission frame rate consistent with the camera acquisition frame rate guarantees the real-time performance and time synchronism of the image data: on the one hand, it avoids image backlog or loss caused by a transmission frame rate that is too low, ensuring that the SLAM system estimates the pose from the latest images; on the other hand, the left and right images stay strictly aligned in time, reducing the asynchrony errors caused by robot motion.
In the embodiment of the invention, the acceleration feature network is an end-to-end deep learning network model; its input is the RGB data of the left-eye and right-eye images, and its output comprises:
a key point heat map representing the position probability distribution of feature points in the image, in the form of a two-dimensional matrix, where a higher first matrix element value indicates a greater likelihood that a feature point exists at that position;
a reliability heat map of the same size as the key point heat map, whose second matrix element value reflects the reliability score of the feature point at the corresponding position, with a scoring range of [0,1], where values closer to 1 indicate higher reliability;
and a descriptor set, in which each feature point corresponds to a 128-dimensional binary descriptor vector used for matching between feature points.
Specifically, the acceleration feature network is an end-to-end deep learning model that needs no hand-designed feature extraction rules; it directly takes the RGB colour images collected by the left and right cameras as input and adapts to colour information extraction under the complex illumination inside a pipeline.
The key point heat map is a two-dimensional probability matrix related to the size of the input image; for example, for a 1280×720 input image, the heat map may be a scaled size such as 320×180. Each element of the matrix represents the probability that a feature point exists at the corresponding image location (a salient structure usable for positioning, such as a pipeline joint, protrusion or stain); a higher value indicates that the location is more likely to be a stable feature point.
The reliability heat map corresponds exactly to the key point heat map in size, and its element values (in the range 0 to 1) represent the reliability of the feature point at the corresponding position. The closer a value is to 1, the better the feature point maintains stability under viewpoint change, illumination fluctuation and slight occlusion, which effectively reduces the risk of feature mismatches caused by reflections, water stains and other interference inside the pipeline.
The descriptor set generates a 128-dimensional binary vector for each screened feature point, encoding local information around the point such as texture, gradient and colour distribution to enable fast matching of feature points across images. Compared with floating-point descriptors, binary descriptors are lighter, accelerate the real-time matching of the SLAM system, and meet the real-time requirements of robot inspection.
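Matching 128-bit binary descriptors reduces to Hamming distance on packed bytes. A minimal brute-force matcher might look like the sketch below; the 16-byte packing and the 40-bit acceptance threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

def hamming_match(desc_a, desc_b, max_dist=40):
    """Brute-force nearest-neighbour matching of binary descriptors.

    desc_a: (N, 16) uint8 array, desc_b: (M, 16) uint8 array; each row packs
    a 128-bit descriptor into 16 bytes. Returns (i, j, dist) for each
    a-descriptor whose best match in desc_b is within max_dist bits."""
    matches = []
    for i, da in enumerate(desc_a):
        x = np.bitwise_xor(desc_b, da)              # differing bits, byte-wise
        dists = np.unpackbits(x, axis=1).sum(axis=1)  # Hamming distances
        j = int(np.argmin(dists))
        if dists[j] <= max_dist:
            matches.append((i, j, int(dists[j])))
    return matches
```

XOR-plus-popcount is why binary descriptors are faster to match than floating-point ones: a full 128-bit comparison costs a handful of integer instructions per candidate.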
In the embodiment of the invention, extracting the feature points of the left-eye and right-eye images by using the acceleration feature network includes:
carrying out confidence scoring on each feature point in the key point heat map and the reliability heat map through the following formula:

S = α·S_kp + β·S_rel;

wherein S_kp is the normalized score of the feature point in the key point heat map, S_rel is the normalized score of the feature point in the reliability heat map, α and β are weight coefficients, and α + β = 1.
The N feature points with the highest scores are then selected, and uniformly distributed feature points are screened by grid partitioning: the image is divided into M×M grids and at most P feature points are retained per grid.
The normalized score S_kp, extracted from the key point heat map, reflects the probability that the position is a feature point; correspondingly, S_rel, extracted from the reliability heat map, reflects the stability of the feature point in a complex environment. The coefficients α and β balance the importance of key point probability and reliability; for example, in a texture-sparse pipeline scene β can be increased so that highly reliable feature points are preferentially retained.
Selecting the N highest-scoring feature points ensures that the strongest candidates are kept. The image is divided into M×M uniform grids (e.g. 16×16) with at most P feature points retained per grid, which prevents feature points from clustering in a local area (feature points can appear densely at a pipeline joint, for example) and guarantees a uniform distribution over the image. Uniformly distributed feature points provide more comprehensive spatial constraints for subsequent pose estimation and reduce positioning deviation caused by local feature loss; in the elongated, symmetric structure of a pipeline, this uniformity particularly improves the stability of rotation and translation estimation.
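The score-then-grid procedure can be sketched as follows; the values of α, β, N, the grid size and the per-cell cap here are illustrative, not the patent's:

```python
import numpy as np

def select_features(kp_heat, rel_heat, alpha=0.6, beta=0.4, top_n=500,
                    grid=16, per_cell=4):
    """Score S = alpha*S_kp + beta*S_rel on normalised heat maps, keep the
    top_n highest-scoring positions, then cap each cell of a grid x grid
    partition at per_cell points so features stay spread over the image."""
    assert abs(alpha + beta - 1.0) < 1e-9           # weights must sum to 1
    def norm(h):
        return (h - h.min()) / (h.max() - h.min() + 1e-12)
    score = alpha * norm(np.asarray(kp_heat, float)) + \
            beta * norm(np.asarray(rel_heat, float))
    flat = np.argsort(score, axis=None)[::-1][:top_n]   # best-first indices
    ys, xs = np.unravel_index(flat, score.shape)
    h, w = score.shape
    kept, counts = [], {}
    for y, x in zip(ys, xs):                        # fill grid cells best-first
        cell = (int(y) * grid // h, int(x) * grid // w)
        if counts.get(cell, 0) < per_cell:
            counts[cell] = counts.get(cell, 0) + 1
            kept.append((int(y), int(x), float(score[y, x])))
    return kept
```

Because candidates are visited best-first, each cell keeps its strongest points; the cap simply stops any one region, such as a feature-rich joint, from monopolising the budget.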
S120, generating an initialized pipeline map by utilizing the SLAM system, determining the initial pose of the camera by tracking threads, and optimizing the initial pose by utilizing the feature points and the descriptors.
After the SLAM system is started, a three-dimensional point cloud is calculated through binocular stereo matching using the first several frames of images acquired by the binocular camera, and an initial pipeline map comprising three-dimensional point coordinates and preliminary geometric constraints is constructed in combination with prior knowledge of the pipeline structure.
The tracking thread obtains the initial pose by:
If the system is started for the first time with no historical data, a fundamental matrix/essential matrix between images is calculated through feature point matching and decomposed to obtain the rotation and translation parameters of the camera, and a world coordinate system is established with the camera coordinate system of the first frame image as the origin. If the system is restarted or repositioned, the pose of the current camera in the world coordinate system is quickly solved by matching against the features of the existing initial map.
Matching the feature points and descriptors extracted from the current frame with three-dimensional points in the initialized map, and finding out the corresponding relation between the image feature points and the spatial three-dimensional points by comparing the similarity of the descriptors. And constructing a reprojection error function based on the matching pair, minimizing the error through a nonlinear optimization algorithm, and iteratively correcting rotation and translation parameters of the camera. Finally, the re-projection error is controlled within a preset threshold value, and the precision of the initial pose is ensured.
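The reprojection error minimized above can be sketched under a standard pinhole camera model; `mean_reproj_error` and the toy intrinsics below are assumptions for illustration, not the patent's code:

```python
import numpy as np

def mean_reproj_error(K, R, t, pts3d, pts2d):
    """Mean pixel distance between projected 3D map points and matched feature points."""
    cam = (R @ pts3d.T).T + t           # transform map points into the camera frame
    proj = (K @ cam.T).T                # apply the pinhole intrinsics
    uv = proj[:, :2] / proj[:, 2:3]     # perspective division to pixel coordinates
    return float(np.mean(np.linalg.norm(uv - pts2d, axis=1)))

K = np.array([[100., 0., 50.], [0., 100., 50.], [0., 0., 1.]])
p3 = np.array([[0., 0., 1.], [0.1, 0., 1.]])
p2 = np.array([[50., 50.], [60., 50.]])
err = mean_reproj_error(K, np.eye(3), np.zeros(3), p3, p2)
```

A nonlinear optimizer would iterate on (R, t) until this error falls below the preset threshold.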
S130, constructing a cylinder according to the initialized pipeline map, and inserting key frames meeting the conditions into the local mapping thread.
From the three-dimensional point cloud of the initialized pipeline map, cylindrical parameters including the spatial direction of a cylindrical axis, the coordinates of the central point of the axis, the radius of the cylinder and the like are fitted through an algorithm. During fitting, points conforming to the geometric constraint of the cylinder are screened out, for example, the distance from the point to the axis is close to the radius of the pipeline, and noise points, for example, abnormal points formed by interference of impurities, bubbles and the like in the pipeline are removed. The cylindrical model adds structural constraint to the map, and newly generated map points can be verified whether to accord with the cylindrical characteristics or not, so that accumulated errors are reduced, and the method is particularly suitable for long-distance repeated structural scenes such as pipelines.
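The point-screening step above (keeping points whose distance to the axis is close to the pipeline radius) can be sketched as follows; `cylinder_inlier_mask` and the tolerance value are illustrative assumptions:

```python
import numpy as np

def cylinder_inlier_mask(points, axis_dir, axis_pt, radius, tol):
    """True where a point's distance to the cylinder axis is within tol of the radius."""
    d = axis_dir / np.linalg.norm(axis_dir)
    v = points - axis_pt
    # point-to-axis distance: remove the axial component, measure what remains
    dist = np.linalg.norm(v - np.outer(v @ d, d), axis=1)
    return np.abs(dist - radius) <= tol

pts = np.array([[1., 0., 5.],      # on the wall of a radius-1 pipe along z
                [3., 0., 0.],      # noise point far from the wall
                [0., 1.02, -2.]])  # on the wall, within tolerance
mask = cylinder_inlier_mask(pts, np.array([0., 0., 1.]), np.zeros(3), 1.0, 0.05)
```

Points failing the mask would be discarded as interference from impurities or bubbles.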
The key frames are key image frames used for constructing a map in the SLAM system, and include core information such as camera pose, feature points, etc., and not all image frames are selected as key frames.
Screening conditions typically include:
Movement distance: compared with the previous key frame, the movement distance of the robot exceeds 1/5 of the pipeline diameter or the rotation angle exceeds 5 degrees, avoiding redundant frames and ensuring mapping efficiency;
Feature matching quality: the number of feature matching pairs between the current frame and the local map is sufficient, for example not less than 80 pairs, and the matches are uniformly distributed, covering multiple areas of the image;
Pose accuracy: the reprojection error of the optimized camera pose is small, for example a root mean square < 1.5 pixels, ensuring pose reliability.
The key frames meeting the conditions are transmitted to a local mapping thread for generating new three-dimensional map points, optimizing a local map structure, fusing the new three-dimensional map points with a cylindrical model and gradually constructing a more accurate pipeline environment map.
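The screening conditions above combine into a simple boolean test; this sketch uses the thresholds quoted in the text (1/5 diameter, 5 degrees, 80 pairs, 1.5 px) but omits the spatial-distribution check for brevity:

```python
def is_keyframe(trans_dist, rot_deg, pipe_diameter, n_matches, rms_err_px):
    """Key-frame screening: motion, matching quality and pose accuracy conditions."""
    moved = trans_dist > pipe_diameter / 5.0 or rot_deg > 5.0   # enough motion
    matched = n_matches >= 80                                   # enough matches
    accurate = rms_err_px < 1.5                                 # reliable pose
    return moved and matched and accurate
```

Only frames passing all three tests are handed to the local mapping thread.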
And S140, generating map points and cylindrical points from the received key frames in the local mapping thread, and executing local BA.
After the local mapping process receives the new key frame, the new key frame is matched with the adjacent key frames, three-dimensional space coordinates of the matched feature points are calculated through the binocular vision triangulation principle, and a new map point is generated. And screening the map points, removing the points with overlarge reprojection errors or overlarge parallax angles, and ensuring the accuracy of the map points.
And (3) carrying out geometric constraint verification on newly generated map points by combining the constructed pipeline cylindrical model, calculating the distance between each map point and the cylindrical axis, marking the map points as cylindrical points if the deviation between the distance and the theoretical radius of the pipeline is within an allowable range, and recording the parameters of the map points under a cylindrical coordinate system. The cylindrical points have the significance of strengthening the constraint of the cylindrical structure of the pipeline and reducing map drift caused by sparse features or noise.
BA is a nonlinear least square algorithm for optimizing camera pose and three-dimensional point coordinates, and the core is to improve the consistency of the pose and the map by minimizing the reprojection error. Unlike global BA for optimizing the whole graph, local BA only builds an optimization window for the current key frame, the neighboring key frames which are common to the current key frame and the map points observed by the key frames, so that high calculation cost of global optimization is avoided, and instantaneity is ensured.
And S150, searching candidate loop frames through an XFeat database by utilizing the closed loop detection thread, calculating a similarity transformation to obtain the accumulated error, executing global BA to optimize the map, and positioning the detection robot through the optimized map.
When the robot moves a certain distance in the pipeline, the robot may return to the previous passing region, and this phenomenon needs to be identified through loop detection to correct the accumulated positioning error. XFeat is an efficient feature extraction and matching algorithm, whose built database stores feature descriptors of historical key frames. The closed loop detection thread compares the feature descriptors of the current key frame with the historical data in the database, and finds the first N historical key frames with the highest feature matching degree through approximate nearest neighbor search to serve as candidate loop frames. To avoid repeated matches in a short time, candidate frames that are too close to the current frame interval are filtered out, ensuring the validity of loop detection.
The global BA is a global optimization algorithm: by adjusting the poses of all key frames and the coordinates of all map points, the overall reprojection error is minimized, the accumulated error is eliminated, and the global consistency of the map is ensured. A global optimization model containing all key frames and map points is constructed, and the similarity transformation obtained by loop detection is added to the model as a strong constraint. A sparse BA algorithm is adopted for iterative optimization until the error converges or the maximum number of iterations is reached. The optimized globally consistent map eliminates the scale drift and pose deviation caused by long-distance movement. The map after global optimization provides an accurate three-dimensional model of the pipeline environment; the robot can perform feature matching between the image acquired in real time and the map and solve the current camera pose by combining algorithms such as PnP, thereby realizing accurate positioning based on the globally consistent map. This positioning mode is not affected by accumulated errors and can maintain high positioning precision even in long-distance pipeline inspection.
Example two
Fig. 2 is a flowchart of an initial pose determining method according to a second embodiment of the present invention. As shown in fig. 2, the method includes:
S210, selecting a binocular image with a preset frame number, calculating a basic matrix through feature point matching, decomposing to obtain an initial rotation matrix and a translation vector, and generating a three-dimensional point cloud by combining the internal reference and the base line distance of the binocular camera and triangularization to form an initial pipeline map.
After the SLAM system is started, a preset number of binocular images are firstly acquired, and the images are required to meet certain motion constraint, so that the reliability of subsequent feature matching and geometric calculation is ensured. Although a single-frame binocular image can generate local point clouds, a globally consistent coordinate system is difficult to construct due to lack of motion information, and the motion relation of multiple frames of images can provide scale and direction references.
And extracting characteristic points and descriptors of left and right images of each frame from the selected image sequence through an acceleration characteristic network, and then carrying out characteristic point matching between adjacent frames to find out corresponding pixel positions of the same physical point in the space in different images.
Based on the matched feature point pairs, estimating a base matrix through a RANSAC algorithm. The basic matrix is an important geometric constraint in binocular vision, describes the projection relationship between left and right eye images, and essentially encodes the mathematical relationship of camera internal parameters, relative pose and space point coordinates, and can be used for verifying the rationality of feature matching. The basic matrix contains the relative pose information between the cameras corresponding to the two frames of images. By SVD of the base matrix, an initial rotation matrix and a translation vector of the camera can be obtained by solving.
And converting the characteristic point coordinates in the pixel coordinate system into the ray directions in the camera coordinate system by utilizing the internal parameters and the base line distances of the binocular camera. And calculating three-dimensional coordinates of the matched characteristic points in a world coordinate system according to the relative pose of the adjacent frames by using a geometric triangulation principle. This process is performed on all the matching points, generating a dense three-dimensional point cloud. And organizing the three-dimensional point cloud according to the space position, and combining the cylindrical priori of the pipeline to form an initial three-dimensional map containing key features of the inner wall of the pipeline.
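For a rectified stereo pair, the triangulation step reduces to the classical depth-from-disparity relation Z = f·b/d. The sketch below (function name and toy intrinsics assumed) recovers a camera-frame 3D point from a left/right pixel match:

```python
def stereo_point(uL, vL, uR, fx, fy, cx, cy, baseline):
    """Camera-frame 3D coordinates of a rectified binocular stereo match."""
    disparity = uL - uR               # horizontal pixel shift between the two views
    Z = fx * baseline / disparity     # depth from disparity
    X = (uL - cx) * Z / fx            # back-project through the pinhole model
    Y = (vL - cy) * Z / fy
    return X, Y, Z

X, Y, Z = stereo_point(60.0, 50.0, 50.0, 100.0, 100.0, 50.0, 50.0, 0.1)
```

Repeating this over all matches yields the dense point cloud that is then organized with the cylindrical prior.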
S220, determining whether a previous frame of image exists currently, if so, predicting the current pose of the binocular camera based on a constant-speed motion model to serve as an initial value, solving the pose by utilizing a RANSAC-PnP algorithm through matching of characteristic points of the current frame and the previous frame, and performing iterative optimization.
The SLAM system first checks whether there is image data processed in the previous frame. If the previous frame exists, the robot is in continuous motion, and the pose can be predicted by utilizing the temporal continuity of the motion: over a short time interval, the motion of the robot may be approximated as uniform linear motion and uniform rotation. In the prediction process, the predicted pose of the current frame is calculated from the camera pose of the previous frame and the motion speeds of the previous frames. Using this predicted value as the initial value for the subsequent pose solving greatly reduces the search range of the optimization algorithm, improves the solving speed and convergence, and avoids falling into a local optimum. For the current frame and the previous frame, the feature point pairs corresponding to the same physical point are found through similarity comparison of the feature descriptors.
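A minimal sketch of the constant-velocity prediction, assuming 4×4 homogeneous pose matrices (the function name is illustrative): the inter-frame motion of the last interval is simply reapplied.

```python
import numpy as np

def predict_pose(T_prev, T_prev2):
    """Constant-velocity model: reapply the last inter-frame motion.

    T_prev, T_prev2: 4x4 homogeneous poses of the previous two frames."""
    velocity = T_prev @ np.linalg.inv(T_prev2)   # motion over the last interval
    return velocity @ T_prev                     # predicted pose of the current frame

T0 = np.eye(4)
T1 = np.eye(4); T1[0, 3] = 1.0                   # moved 1 unit along x
T2 = predict_pose(T1, T0)
```

The predicted pose then seeds the RANSAC-PnP solve described next.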
Given the three-dimensional points and their two-dimensional projections in the current image, the camera pose is solved through a PnP (Perspective-n-Point) algorithm, which establishes and solves a nonlinear equation system from the projection relationship between space points and image points.
The random sample consensus RANSAC algorithm is used to handle mismatching in matching pairs. And solving the pose by partial matching through random sampling for a plurality of times, then counting the number of interior points conforming to the projection constraint of the pose, and reserving the pose with the largest interior points as an optimal solution, thereby greatly improving the robustness of pose solving. And taking the pose obtained by RANSAC as an initial value, constructing a reprojection error function, and carrying out iterative minimization of the error through a nonlinear optimization algorithm to further correct the pose parameters until the error converges.
And S230, if the initial pose is determined for the first time after the SLAM system is started, screening interior points from the feature point matching pair through a random sampling consistency algorithm, calculating an essential matrix and decomposing to obtain the initial pose.
When the SLAM system is just started, there is no history pose data or map information, and the pose cannot be predicted by using the motion information of the previous frame like the continuous tracking stage. At this time, it is necessary to calculate the initial pose of the camera from zero, completely depending on the initially acquired image data. And acquiring respective characteristic points and descriptors of the two first frames of binocular images acquired after the SLAM system is started through a characteristic extraction network, and then obtaining a characteristic point matching pair between the two frames through descriptor matching. The random sample consensus algorithm RANSAC screens interior points.
Because mismatching inevitably exists in the characteristic matching process, interior points are required to be screened through a RANSAC algorithm. And counting the number of the matched pairs conforming to the constraint of the model, reserving the matched pair corresponding to the model with the largest inner point as effective data, and eliminating the outer points not conforming to the constraint.
And calculating an essential matrix corresponding to the two frames of images based on the filtered interior points. The essential matrix is an important geometrical constraint describing the relative pose between two camera coordinate systems in binocular vision, and its mathematical expression encodes the relation of rotation matrix and translation vector between two frames and is related to camera internal parameters. The essential matrix contains relative pose information between two frames, and a possible rotation matrix and translation vector combination can be obtained by carrying out singular value decomposition on the essential matrix. And (3) combining with actual physical constraint, screening out a unique reasonable rotation matrix and a translation vector from the solutions, and obtaining the initial pose.
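The singular value decomposition mentioned above follows the standard construction (as in Hartley and Zisserman): an essential matrix yields two candidate rotations and a translation known only up to sign, giving four (R, t) pairs from which the cheirality check selects the physical one. A sketch, with names assumed:

```python
import numpy as np

def decompose_essential(E):
    """Four candidate (R, t) pairs from an essential matrix via SVD; the unique
    physical solution is then chosen by a points-in-front (cheirality) check."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U                           # enforce proper rotations (det = +1)
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                          # translation up to scale and sign
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# E = [t]x R for R = I, t = (1, 0, 0): the true pose must appear among the candidates
tx = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
cands = decompose_essential(tx)
```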
In the embodiment of the invention, optimizing the initial pose through the feature points and the descriptors includes:
selecting map points corresponding to a preset number of key frames with highest common view range degree of the current frame from the local map, extracting descriptors of the map points to form a local feature library, and associating observation information of at least the preset number of key frames with each map point;
Adopting a bidirectional matching mechanism, firstly matching the current frame feature point descriptor with a local feature library to obtain initial matching pairs, then verifying the projection consistency of the current frame feature point and the map point in the corresponding key frame for each matching pair, and eliminating the matching pairs with projection errors exceeding the preset pixel number;
constructing a pose optimization objective function based on a lie algebra by taking an initial pose as an iteration starting point, wherein the objective function is a weighted sum of all matching point re-projection errors, and the weight is dynamically adjusted according to the observation times of map points;
And (3) carrying out iterative optimization by adopting a Gaussian-Newton algorithm, calculating a Jacobian matrix after each iteration, updating pose parameters, stopping when the iteration times reach preset times or the pose variation of two adjacent iterations is smaller than 1e-6 radians and 1e-3 meters, and outputting the optimized pose.
And screening a preset number of key frames with highest common view range degree with the current frame from the local map, and extracting three-dimensional map points corresponding to the key frames and descriptors thereof to form a local feature library. The degree of common view refers to the number of shared feature points, and the higher the common view is, the stronger the spatial correlation between the key frames and the current frame is, and the greater the reference value of the map points to the current pose optimization is. Each map point needs to be associated with observation information of at least a preset number of key frames, so that the map point is ensured to be verified in multiple view angles, and the accuracy is higher.
The bidirectional matching mechanism firstly uses descriptors of the characteristic points of the current frame to match with map point descriptors in the local characteristic library to obtain initial matching pairs. And then, carrying out projection consistency verification on each initial matching pair, projecting the three-dimensional map points back to the key frame image according to the associated key frame pose, and checking the deviation between the projection position and the original observation feature points in the key frame. If the projection error exceeds the preset pixel number, the error is judged to be mismatching and is removed, and only the matching pair with the error meeting the requirement is reserved.
The camera pose is converted into a lie algebra form, so that the pose optimization problem is converted into a linear solution of a vector space, and the iterative calculation is simplified. And taking the weighted sum of the re-projection errors of all the matching points as an optimization target. The three-dimensional map point is projected to the pixel position on the image through the current camera pose, and the deviation between the three-dimensional map point and the actual position of the corresponding feature point in the current frame is achieved. The more times a map point is observed, the greater the weight, since the spatial coordinates of such points are more reliable, and the greater the contribution to the optimization should be.
And (3) taking the initial pose as a starting point, performing iterative optimization through a Gaussian-Newton algorithm, and calculating a jacobian matrix of the objective function on the lie algebraic variable in each iteration to reflect the sensitivity degree of the error on the pose change. And solving an increment equation based on the jacobian matrix, and updating pose parameters to reduce an objective function. And stopping iteration when the iteration times reach a preset value or the pose variation of two adjacent iterations is smaller than a threshold value, and outputting the current pose as an optimization result. And gradually correcting the pose by a numerical optimization algorithm to minimize the re-projection error, and finally obtaining the high-precision camera pose.
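Two small pieces of the iteration above can be sketched directly: the observation-count weighting (the log1p form is an illustrative choice, the text only requires the weight to grow with observation count) and the stopping rule with the 1e-6 rad / 1e-3 m thresholds from the text:

```python
import numpy as np

def obs_weight(n_obs):
    """Weight of a map point's reprojection residual: more observations, more
    trust (log1p is an assumed monotone form, not specified by the text)."""
    return float(np.log1p(n_obs))

def stop_iterating(d_rot_rad, d_trans_m, it, max_it):
    """Stop when the update falls below 1e-6 rad and 1e-3 m, or the budget is spent."""
    return it >= max_it or (d_rot_rad < 1e-6 and d_trans_m < 1e-3)
```

Inside the Gauss-Newton loop, each residual would be scaled by `obs_weight` before the normal equations are assembled.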
In the embodiment of the invention, according to the initialization pipeline map construction cylinder, key frames meeting the conditions are inserted into a local map construction thread, and the method comprises the following steps:
Based on the point cloud data of the initialized pipeline map, cylinder parameters are fitted by a random sample consensus algorithm, the cylinder parameters comprising a unit direction vector of the cylinder axis, the coordinates of the axis midpoint, and the cylinder radius; wherein the distance between each point in the point cloud and a candidate axis is calculated through a point-to-line distance formula, interior points whose distance is smaller than a preset threshold are screened, and when the proportion of interior points in the total number of point cloud points exceeds a preset value, the candidate cylinder is determined to be an effective model;
The current frame is determined to be a key frame and a local mapping thread is inserted when the following conditions are satisfied:
the translation distance between the key frame and the last key frame exceeds 1/5 of the diameter of the pipeline or the rotation angle exceeds 5 degrees;
the number of matching pairs between the current frame extracted feature points and the local map feature points is not less than 80, and the distribution of the matching points in the image covers at least 8 preset grid areas;
the root mean square value of the reprojection error after pose optimization is smaller than 1.5 pixels;
The key frame inserted into the local mapping thread comprises the following information: the optimized camera pose, the feature point set and corresponding descriptors, the image acquisition time stamp, the common-view relation with adjacent key frames, and the projection position parameters of the key frame in the cylindrical model.
The pipeline has a typical cylindrical structure, can be converted into geometric constraint through an algorithm, and improves map accuracy. And extracting key parameters of the cylinder from the three-dimensional point cloud of the initialized pipeline map, wherein the key parameters comprise unit direction vectors of the axis, coordinates of the middle point of the axis and the radius of the cylinder.
And calculating the distances between all the points in the point cloud and the candidate axis, and screening out the points with the distances smaller than a preset threshold value as internal points. When the proportion of the number of the inner points to the total number of the point clouds exceeds a preset value, the candidate cylinder is indicated to be capable of being better fitted with the pipeline structure, and the candidate cylinder is determined to be an effective model. The method is used for eliminating noise points in the point cloud and strengthening structural characteristics of the pipeline.
The key frame is a core image frame for constructing and optimizing the map, and strict conditions are required to be met to ensure the efficiency and the precision of the map construction. The translation distance between the current frame and the previous key frame exceeds 1/5 of the diameter of the pipeline, or the rotation angle exceeds 5 degrees, so that repeated insertion of the key frames in a short distance is avoided, meanwhile, enough parallax between adjacent key frames is ensured, and the triangularization is facilitated to generate high-precision map points.
The number of the matching pairs of the characteristic points of the current frame and the characteristic points of the local map is not less than 80, so that enough space constraint relations are ensured to be used for subsequent optimization. The matching points need to cover at least 8 preset grid areas in the image, so that the matching points are prevented from being concentrated in local areas, and the comprehensiveness of space constraint is ensured. The root mean square of the optimized camera pose re-projection error is required to be smaller than 1.5 pixels, so that pose estimation of the current frame is ensured to be accurate enough, and errors are prevented from being introduced into a map.
The key frames inserted into the local map building thread are required to carry multidimensional information so as to meet the map building and optimizing requirements, wherein the optimized camera pose is used as a space reference for map point triangularization, the feature points and descriptors are used for being matched with other key frames to establish cross-frame space association, the time stamp ensures time synchronization of multi-sensor data, the common view relation records shared feature point information of other key frames and is used for building a topological structure of a local map, and the projection parameters of the cylindrical model represent the axial position and the circumferential angle of the current frame in the cylindrical model of the pipeline to strengthen the geometric association of the key frames and the pipeline structure.
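The multidimensional key-frame record described above might be modeled as a small data structure; all field names here are illustrative, not the patent's identifiers:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class KeyFrame:
    """Information carried by a key frame handed to the local mapping thread."""
    pose: List[List[float]]                 # optimized camera pose (4x4 matrix rows)
    keypoints: List[Tuple[float, float]]    # feature point pixel coordinates
    descriptors: List[List[float]]          # one descriptor per feature point
    timestamp: float                        # image acquisition time stamp
    covisible: Dict[int, int] = field(default_factory=dict)  # kf id -> shared points
    cyl_axial: float = 0.0                  # axial position in the cylinder model
    cyl_angle: float = 0.0                  # circumferential angle in the model

kf = KeyFrame(pose=[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
              keypoints=[(10.0, 20.0)], descriptors=[[0.1, 0.2]],
              timestamp=12.5, covisible={3: 90}, cyl_axial=1.2, cyl_angle=0.7)
```

The `covisible` map realizes the common-view relation used to build the local map's topology.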
Example III
Fig. 3 is a flowchart of a method for local BA according to a third embodiment of the present invention. As shown in fig. 3, the method includes:
S310, calculating three-dimensional coordinates of matching feature point pairs of the key frames and adjacent key frames through binocular triangulation, screening points with the reprojection error smaller than 2 pixels and the parallax angle larger than 1 degree as effective map points, and associating the observed key frame index and the observed pixel coordinates of each map point.
And matching the key frames with the adjacent key frames through feature descriptors to obtain feature point pairs. And converting the two-dimensional pixel coordinates of a pair of matched feature points into three-dimensional space coordinates by utilizing the internal parameters of the binocular camera and the relative pose between two key frames.
And re-projecting the calculated three-dimensional map points back to the original key frame image according to the pose of the camera, if the pixel deviation between the projection position and the original feature points is less than 2 pixels, indicating that the three-dimensional coordinates are accurately calculated, otherwise, matching errors or calculation errors can be possibly caused, and the three-dimensional coordinates need to be removed. The parallax angle is an included angle formed by the three-dimensional points and the optical center connecting lines of the two key frame cameras, and the larger the angle is, the more obvious the difference between the two visual angles is, the higher the depth precision of the triangularization calculation is, and if the angle is smaller than 1 degree, the depth calculation is easily affected by noise. Through the two conditions, the three-dimensional points with low precision and unreliability can be filtered out, and the effective map points which can stably reflect the pipeline structure are reserved.
The observed keyframe index is used to establish a many-to-many association of map points with keyframes. The observation pixel coordinates store the pixel positions of the map points in each observation key frame, and provide the basis for calculating the reprojection errors for pose optimization.
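The two screening conditions of S310 (reprojection error < 2 px, parallax angle > 1 degree) can be sketched together; the parallax angle is computed from the rays to the two camera centers, and the function name is assumed:

```python
import numpy as np

def is_valid_map_point(p3d, c1, c2, reproj_err_px):
    """Accept a triangulated point if its reprojection error is below 2 px and the
    parallax angle between the two observing cameras exceeds 1 degree."""
    r1, r2 = p3d - c1, p3d - c2                  # rays from both camera centers
    cosang = r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2))
    parallax = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return reproj_err_px < 2.0 and parallax > 1.0

c1, c2 = np.array([-0.5, 0., 0.]), np.array([0.5, 0., 0.])
```

A distant point seen through a 1 m baseline yields a sub-degree parallax and is rejected as depth-unreliable.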
And S320, performing cylinder constraint verification on the effective map points based on a cylinder model, calculating the distance from the map points to the cylinder axis, marking the map points as cylinder points if the deviation between the distance and the cylinder radius is within a preset value, and recording parameterized coordinates of the map points in the circumferential direction and the axial direction of the cylinder.
And (3) based on the previously fitted cylindrical model, carrying out secondary verification on the effective map points generated by binocular triangulation, and judging whether the effective map points accord with the cylindrical geometric characteristics of the inner wall of the pipeline. And calculating the linear distance from each effective map point to the axis of the cylinder, comparing the linear distance with the radius of the cylinder model, and screening the cylinder points or the non-cylinder points. For a map point marked as a cylindrical point, the parameterized coordinate of the map point in a cylindrical coordinate system is required to be recorded, the circumferential angle theta is an included angle between the map point and a reference direction in a plane perpendicular to the axis by taking the cylindrical axis as a central axis, the position of the map point in the circumferential direction of the pipeline is described, and the axial distance s is a distance from the map point to the midpoint of the axis along the cylindrical axis, and the position of the map point in the length direction of the pipeline is described.
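The parameterized coordinates (θ, s) described above can be computed by splitting the point's offset from the axis midpoint into axial and radial components; `cylinder_coords` and the choice of reference direction are assumptions:

```python
import numpy as np

def cylinder_coords(point, axis_dir, axis_mid, ref_dir):
    """Circumferential angle theta and axial distance s of a cylindrical point."""
    d = axis_dir / np.linalg.norm(axis_dir)
    v = point - axis_mid
    s = float(v @ d)                         # signed distance along the axis
    radial = v - s * d                       # component perpendicular to the axis
    x = radial @ ref_dir                     # in-plane basis: ref_dir, d x ref_dir
    y = radial @ np.cross(d, ref_dir)
    theta = float(np.arctan2(y, x)) % (2 * np.pi)
    return theta, s

theta, s = cylinder_coords(np.array([0., 1., 2.]), np.array([0., 0., 1.]),
                           np.zeros(3), np.array([1., 0., 0.]))
```

For a point one radius off the x reference on a z-aligned pipe, θ is a quarter turn and s the distance along the axis.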
S330, constructing an optimization window containing the current key frame, the common-view key frames and the corresponding map points; representing the camera pose by lie algebra, the vertices comprising key frame poses and map point coordinates, and the edges being observation constraints; additionally adding a cylindrical constraint edge for each cylindrical point, constraining the distance from the cylindrical point to the axis to equal the cylinder radius, with its weight set to 1.5 times that of a common observation edge; adopting the g2o optimization library for sparse BA solution, iterating until the root mean square of the reprojection error is smaller than 1.2 pixels or the number of iterations reaches 30, and outputting the optimized poses and map point coordinates.
And selecting the current key frame, a plurality of key frames with highest common view degree and map points observed by the key frames together to form a local optimization window.
Vertices and edges of the optimization model are defined. The vertices are the variables to be solved in the optimization: the key frame poses, represented in lie algebra so that the nonlinear rotation matrix is converted into a vector in a linear space, simplifying the optimization calculation; and the map point coordinates, expressed as three-dimensional space coordinates (x, y, z), the spatial position parameters to be optimized. The edges describe the constraint relationships between vertices, i.e., the sources of the optimization objective function. An observation constraint edge requires that, for each map point, its projected position in the key frame image be consistent with the pixel position of the actually observed feature point, the constraint being quantified by the reprojection error. A cylindrical constraint edge is an additionally added structural constraint, forcing the distance from a cylindrical point to the cylinder axis to equal the pipeline radius; the weight of this constraint is 1.5 times that of a common observation edge, meaning that the geometric constraint of the pipeline is preferentially satisfied in the optimization, enhancing the contribution of the structural characteristics to precision.
A g2o optimization library is adopted for the sparse BA solution. Sparse BA exploits the fact that each map point is observed by only a few key frames, greatly reducing the amount of calculation through sparse matrix operations, making it suitable for a real-time system. With the current poses and map point coordinates as initial values, the pose and map point parameters are continuously corrected by iteratively minimizing the sum of squared residuals over all constraint edges. Convergence judgment: iteration stops when either the root mean square of the reprojection errors over all observation points is smaller than 1.2 pixels or the number of iterations reaches 30. The optimized key frame poses and map point coordinates eliminate accumulated errors in the local range, making the local map more accurate and self-consistent.
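The shape of the objective being minimized can be illustrated without g2o: squared observation residuals plus cylinder-constraint residuals (distance to axis minus radius) weighted 1.5×. The function name and scalar form are assumptions:

```python
import numpy as np

def local_ba_cost(obs_err_px, cyl_dist, radius, w_obs=1.0, w_cyl=1.5):
    """Sum of squared residuals over observation edges and cylinder-constraint
    edges, with the cylinder edges weighted 1.5x as described in the text."""
    r_obs = w_obs * np.asarray(obs_err_px, dtype=float)
    r_cyl = w_cyl * (np.asarray(cyl_dist, dtype=float) - radius)
    return float(np.sum(r_obs**2) + np.sum(r_cyl**2))

# two 1 px observation errors plus one cylinder point 0.2 off the radius
cost = local_ba_cost([1.0, 1.0], [1.2], 1.0)
```

The optimizer adjusts poses and map points to drive this total toward its minimum.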
Fig. 4 is a flowchart of a global BA method according to a third embodiment of the present invention, as shown in fig. 4, where the method includes:
S410, inputting the feature point descriptors of the current key frame into the XFeat database, and retrieving, through an approximate nearest neighbor search algorithm, the 20 historical key frames with the highest feature matching degree to the current frame as candidate loop frames.
Descriptors of the feature points in the current key frame are extracted and input into the XFeat feature database storing historical data. An approximate nearest neighbor search algorithm rapidly computes the similarity between the current frame's features and all historical key frame features in the database, and finds the 20 historical key frames with the highest matching degree. These historical key frames serve as candidate loop frames, meaning the robot may have previously reached the locations corresponding to these frames; each candidate is then verified to confirm whether a loop has actually formed.
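The retrieval step above can be illustrated with a brute-force similarity search. This is a stand-in for the approximate nearest neighbor index the method actually uses, and the per-frame representation (the mean of a frame's XFeat descriptors, pre-aggregated into one row per keyframe) is an assumption made here for brevity.

```python
import numpy as np

def find_loop_candidates(query_desc, db_descs, top_k=20):
    """Return indices of the top_k historical keyframes whose aggregated
    descriptor is most similar (cosine similarity) to the current
    keyframe's.  db_descs: one row per historical keyframe."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q               # cosine similarity per frame
    order = np.argsort(-sims)   # best match first
    return order[:top_k].tolist()
```

A production system would replace the `db @ q` scan with an ANN index so the search stays fast as the database grows.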
S420, for the candidate loop frames, estimating a similarity transformation matrix from the feature matching pairs through a RANSAC algorithm, with the inlier judgment threshold set to 3 pixels; confirming the candidate frame as a valid loop frame when the number of inliers exceeds 30% of the total number of matching pairs and the transformation matrix satisfies the homography constraint; and calculating the accumulated error based on the similarity transformation matrix, the error value being the root mean square of all inlier reprojection errors.
For the candidate loop frames, a similarity transformation matrix is estimated from the feature matching pairs using a RANSAC algorithm, with 3 pixels as the inlier judgment threshold. When the number of inliers exceeds 30% of the total matching pairs and the transformation matrix satisfies the homography geometric constraint, the candidate frame is confirmed as a valid loop frame, i.e., the robot has indeed returned to a previously visited position. Based on the similarity transformation matrix, the root mean square of all inlier reprojection errors is computed and used as the accumulated error before and after loop closure, reflecting the degree of deviation of the pose estimate.
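The inlier test and accumulated-error computation can be sketched as follows. The RANSAC estimation of the similarity transform and the homography check are omitted; only the 3-pixel threshold, the 30% inlier ratio, and the RMS accumulated error from the description are shown, and the function name is illustrative.

```python
import numpy as np

def loop_verification(errors_px, n_matches, inlier_thresh=3.0, ratio=0.30):
    """Classify a candidate loop frame and compute the accumulated error.
    errors_px: per-match reprojection error (pixels) under the estimated
    similarity transform."""
    errors_px = np.asarray(errors_px, dtype=float)
    inliers = errors_px < inlier_thresh          # 3-pixel inlier threshold
    valid = bool(inliers.sum() > ratio * n_matches)
    # Accumulated error: RMS of the inlier reprojection errors.
    rms = float(np.sqrt(np.mean(errors_px[inliers] ** 2))) if inliers.any() else None
    return valid, rms
```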
S430, constructing a global optimization graph containing all key frame poses and map points, adding the similarity transformation obtained by loop detection into the graph as a strong constraint, dynamically adjusting the constraint weight according to the accumulated error, and optimizing with a sparse BA algorithm, taking the weighted sum of the reprojection error and the loop constraint error as the objective function; executing a marginalization operation once every 10 iterations to remove redundant variables; and stopping the optimization when the change of the objective function between two iterations is smaller than 1e-6 or the iteration count reaches 80.
A global optimization graph containing all key frame poses and map points is constructed; the similarity transformation obtained by loop detection is added as a strong constraint, and the constraint weight is dynamically adjusted with the accumulated error.
A sparse BA algorithm performs the optimization, with the weighted sum of the reprojection error and the loop constraint error as the optimization target; every 10 iterations, redundant variables are eliminated through marginalization to reduce the computation.
Optimization stops when the objective function changes by less than 1e-6 over two consecutive iterations or 80 iterations are reached; the globally consistent poses and map are then output, eliminating the accumulated error.
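The outer iteration schedule of the global BA can be sketched as a plain loop. The `step` and `objective` callables are assumptions standing in for the real sparse-BA backend; only the marginalization interval (every 10 iterations), the 1e-6 convergence tolerance, and the 80-iteration cap come from the description.

```python
def run_global_ba(step, objective, max_iters=80, tol=1e-6, marg_every=10):
    """Skeleton of the global BA loop: iterate the solver, marginalize
    redundant variables periodically, and stop on convergence or on the
    iteration cap.  Returns the number of iterations executed."""
    prev = objective()
    for it in range(1, max_iters + 1):
        step()                      # one solver iteration
        if it % marg_every == 0:
            pass                    # marginalize redundant variables (backend-specific)
        cur = objective()           # weighted reprojection + loop-constraint error
        if abs(prev - cur) < tol:
            return it               # converged: objective change below tol
        prev = cur
    return max_iters                # hit the iteration cap
```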
Example IV
Fig. 5 is a schematic structural diagram of a positioning device for a pipe network detection robot according to a fourth embodiment of the present invention. As shown in fig. 5, the apparatus includes:
An image acquisition unit 510, configured to acquire images by detecting a binocular camera set by the robot, transmit left-eye images and right-eye images in topic form, and extract feature points and descriptors of the left-eye images and the right-eye images by using an acceleration feature network.
An initial pose determining unit 520, configured to generate an initialized pipeline map by using the SLAM system and determine an initial pose of the camera by tracking the thread, and optimize the initial pose by using the feature points and the descriptors.
And a cylinder construction unit 530, configured to construct a cylinder according to the initialized pipeline map, and insert the keyframes meeting the conditions into the local mapping thread.
And a local BA unit 540, configured to generate map points and cylinder points from the received key frames using the local mapping thread, and to execute local BA.
And the global BA unit 550 is used for searching candidate loop frames through the XFeat database by utilizing a closed loop detection thread, calculating similar transformation to obtain accumulated errors, executing a global BA optimization map, and positioning the detection robot through the optimized map.
The pipe network detection robot positioning device provided by the embodiment of the invention can execute the pipe network detection robot positioning method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example five
Fig. 6 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including an input unit 16, such as a keyboard, mouse, etc., an output unit 17, such as various types of displays, speakers, etc., a storage unit 18, such as a magnetic disk, optical disk, etc., and a communication unit 19, such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the pipe network inspection robot positioning method.
In some embodiments, the pipe network inspection robot positioning method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the pipe network inspection robot positioning method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the pipe network inspection robot positioning method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special or general purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), a blockchain network, and the Internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in the cloud computing service system that overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. The pipe network detection robot positioning method is characterized by comprising the following steps of:
Acquiring images by a binocular camera arranged by a detection robot, and respectively transmitting left-eye images and right-eye images in the form of topics;
Generating an initialized pipeline map by utilizing the SLAM system, and determining the initial pose of a camera by tracking threads;
Inserting key frames meeting the conditions into a local mapping thread according to the initialized pipeline map construction cylinder;
Generating map points and cylinder points from the received key frames by using the local mapping thread, and executing local BA;
Searching candidate loop frames through XFeat database by utilizing a closed loop detection thread, calculating similarity transformation to obtain accumulated errors, executing global BA optimization map, and positioning the detection robot through the optimized map;
The acceleration feature network is an end-to-end network model based on deep learning, the input of the acceleration feature network is RGB data of left and right eye images, and the output of the acceleration feature network comprises:
A key point heat map of the position probability distribution of the feature points in the image is represented in a two-dimensional matrix form, wherein the higher the first matrix element value in the key point heat map is, the greater the possibility that the feature points exist at the position is;
The reliability heat map with the same size as the key point heat map is characterized in that the second matrix element value of the reliability heat map reflects the reliability score of the characteristic point of the corresponding position, the scoring range is [0,1], and the closer the value is to 1, the higher the reliability is;
A descriptor set, wherein each feature point corresponds to a 128-dimensional binary descriptor vector and is used for matching the feature points;
the extracting the characteristic points of the left eye image and the right eye image by using the acceleration characteristic network comprises the following steps:
And carrying out confidence scoring on each characteristic point in the key point heat map and the reliability heat map through the following formula:
S = α·S_kp + β·S_rel;
wherein S is the confidence score of the feature point, S_kp is the normalized score of the feature point in the key point heat map, S_rel is the normalized score of the feature point in the reliability heat map, α and β are weight coefficients, and α + β = 1;
Selecting the top N feature points by score, and screening uniformly distributed feature points through a grid division method: the image is divided into M × M grids, and each grid retains at most P feature points;
the optimizing the initial pose through the feature points and the descriptors comprises the following steps:
selecting map points corresponding to a preset number of key frames with highest common view range degree of the current frame from the local map, extracting descriptors of the map points to form a local feature library, and associating observation information of at least the preset number of key frames with each map point;
Adopting a bidirectional matching mechanism, firstly matching the current frame feature point descriptor with a local feature library to obtain initial matching pairs, then verifying the projection consistency of the current frame feature point and the map point in the corresponding key frame for each matching pair, and eliminating the matching pairs with projection errors exceeding the preset pixel number;
constructing a pose optimization objective function based on a lie algebra by taking an initial pose as an iteration starting point, wherein the objective function is a weighted sum of all matching point re-projection errors, and the weight is dynamically adjusted according to the observation times of map points;
And carrying out iterative optimization by adopting a Gauss-Newton algorithm, calculating the Jacobian matrix at each iteration and updating the pose parameters, stopping when the iteration count reaches a preset number or the pose variation between two adjacent iterations is smaller than 1e-6 radians and 1e-3 meters, and outputting the optimized pose.
2. The method of claim 1, wherein the left eye image and the right eye image acquired by the binocular camera are respectively transmitted through independent communication topics, the topic format adopts a robot operating system ROS standard message type, and the transmission frame rate is the same as the acquisition frame rate of the binocular camera.
3. The method of claim 1, wherein generating an initialization pipeline map using the SLAM system and determining an initial pose of a camera by tracking threads comprises:
Selecting a preset number of frames of binocular images, calculating a fundamental matrix through feature point matching and decomposing it to obtain an initial rotation matrix and translation vector, and generating a three-dimensional point cloud through triangulation combining the intrinsic parameters and baseline distance of the binocular camera, forming the initialized pipeline map;
Determining whether a previous frame image exists currently, if so, predicting the current pose of the binocular camera based on a constant-speed motion model to serve as an initial value, matching the feature points of the current frame and the previous frame, solving the pose by using a RANSAC-PnP algorithm, and performing iterative optimization;
And if the initial pose is determined for the first time after the SLAM system is started, screening interior points from the feature point matching pair through a random sampling consistency algorithm, calculating an essential matrix and decomposing to obtain the initial pose.
4. The method of claim 1, wherein the inserting key frames meeting the conditions into the local mapping thread according to the initializing the pipeline map building cylinder comprises:
Based on the point cloud data of the initialized pipeline map, fitting a cylinder parameter by adopting a random sampling consistency algorithm, wherein the cylinder parameter comprises a unit direction vector of a cylinder axis, an axis midpoint coordinate and a cylinder radius, wherein the distance between each point in the point cloud and a candidate axis is calculated through a point-to-straight line distance formula, and when the internal point occupation ratio exceeds a preset proportion value of the total number of the point cloud, determining the candidate cylinder as an effective model;
The current frame is determined to be a key frame and a local mapping thread is inserted when the following conditions are satisfied:
the translation distance between the key frame and the last key frame exceeds 1/5 of the diameter of the pipeline or the rotation angle exceeds 5 degrees;
the number of matching pairs between the current frame extracted feature points and the local map feature points is not less than 80, and the distribution of the matching points in the image covers at least 8 preset grid areas;
the root mean square value of the reprojection error after pose optimization is smaller than 1.5 pixels;
The key frames inserted into the local mapping thread comprise the following information: the optimized camera pose, the feature point set and corresponding descriptors, the image acquisition timestamp, the co-visibility relationship with adjacent key frames, and the projection position parameters of the key frame in the cylinder model.
5. The method of claim 4, wherein the generating map points and cylinder points using the received keyframes with the local mapping process, performing a local BA, comprises:
Calculating three-dimensional coordinates of matching feature point pairs of the key frames and adjacent key frames through binocular triangulation, screening points with the reprojection error smaller than 2 pixels and the parallax angle larger than 1 degree as effective map points, and associating the observed key frame index and the observed pixel coordinates of each map point;
Based on the cylinder model, carrying out cylinder constraint verification on the effective map points, calculating the distance from the map points to the cylinder axis, marking the map points as cylinder points if the deviation between the distance and the cylinder radius is within a preset value, and recording parameterized coordinates of the map points in the circumferential direction and the axial direction of the cylinder;
An optimization window comprising the current key frame, co-visible key frames and the corresponding map points is constructed; a Lie algebra is used to represent the camera pose; the vertices comprise key frame poses and map point coordinates, and the edges are observation constraints; a cylinder constraint edge is additionally added for each cylinder point, constraining the distance from the cylinder point to the axis to equal the cylinder radius, with a weight 1.5 times that of an ordinary observation edge; a g2o optimization library is adopted for sparse BA solution, iterating until the root mean square of the reprojection error is smaller than 1.2 pixels or the iteration count reaches 30, and the optimized pose and map point coordinates are output.
6. The method of claim 1, wherein searching candidate loop frames through XFeat database using a closed loop detection thread, calculating a similarity transformation to obtain a cumulative error, and executing a global BA optimization map, comprises:
inputting the feature point descriptors of the current key frame into XFeat database, searching the first 20 historical key frames with highest feature matching degree with the current frame through approximate nearest neighbor searching algorithm, and taking the first 20 historical key frames as candidate loop frames;
Estimating a similar transformation matrix from the feature matching pair by a RANSAC algorithm for the candidate loop frames, wherein the interior point judgment threshold is set to be 3 pixels, and confirming the candidate frames as effective loop frames when the number of the interior points exceeds 30% of the total number of the matching pairs and the transformation matrix meets homography constraint;
The method comprises the steps of constructing a global optimization graph comprising all key frame poses and map points, adding similar transformation obtained by loop detection into the graph as strong constraint, dynamically adjusting constraint weights according to accumulated errors, optimizing by adopting a sparse BA algorithm, performing marginalization operation once every 10 iterations in an iteration process to remove redundant variables by taking weighted sum of re-projection errors and loop constraint errors as an objective function, and stopping optimizing when the variation of the objective function of two iterations is smaller than 1e-6 or the iteration times reach 80 times.
7. A pipe network detection robot positioning device, characterized by comprising:
The image acquisition unit is used for acquiring images through a binocular camera arranged by the detection robot, and transmitting left-eye images and right-eye images in a topic form respectively;
The initial pose determining unit is used for generating an initialized pipeline map by utilizing the SLAM system and determining the initial pose of the camera by tracking threads;
The column construction unit is used for constructing a column according to the initialized pipeline map and inserting key frames meeting the conditions into the local map construction thread;
A local BA unit, configured to generate map points and cylinder points by using the received key frame by using the local mapping process, and execute local BA;
The global BA unit is used for searching candidate loop back frames through a XFeat database by utilizing a closed loop detection thread, calculating similar transformation to obtain accumulated errors, executing a global BA optimization map, and positioning the detection robot through the optimized map;
The acceleration feature network is an end-to-end network model based on deep learning, the input of the acceleration feature network is RGB data of left and right eye images, and the output of the acceleration feature network comprises:
A key point heat map of the position probability distribution of the feature points in the image is represented in a two-dimensional matrix form, wherein the higher the first matrix element value in the key point heat map is, the greater the possibility that the feature points exist at the position is;
The reliability heat map with the same size as the key point heat map is characterized in that the second matrix element value of the reliability heat map reflects the reliability score of the characteristic point of the corresponding position, the scoring range is [0,1], and the closer the value is to 1, the higher the reliability is;
A descriptor set, wherein each feature point corresponds to a 128-dimensional binary descriptor vector and is used for matching the feature points;
The image acquisition unit, when executing the extraction of the feature points of the left-eye image and the right-eye image by using the acceleration feature network, executes:
And carrying out confidence scoring on each characteristic point in the key point heat map and the reliability heat map through the following formula:
S = α·S_kp + β·S_rel;
wherein S is the confidence score of the feature point, S_kp is the normalized score of the feature point in the key point heat map, S_rel is the normalized score of the feature point in the reliability heat map, α and β are weight coefficients, and α + β = 1;
Selecting the top N feature points by score, and screening uniformly distributed feature points through a grid division method: the image is divided into M × M grids, and each grid retains at most P feature points;
the initial pose determination unit, when executing the optimizing the initial pose by the feature points and descriptors, executes:
selecting map points corresponding to a preset number of key frames with highest common view range degree of the current frame from the local map, extracting descriptors of the map points to form a local feature library, and associating observation information of at least the preset number of key frames with each map point;
Adopting a bidirectional matching mechanism, firstly matching the current frame feature point descriptor with a local feature library to obtain initial matching pairs, then verifying the projection consistency of the current frame feature point and the map point in the corresponding key frame for each matching pair, and eliminating the matching pairs with projection errors exceeding the preset pixel number;
constructing a pose optimization objective function based on a lie algebra by taking an initial pose as an iteration starting point, wherein the objective function is a weighted sum of all matching point re-projection errors, and the weight is dynamically adjusted according to the observation times of map points;
And carrying out iterative optimization by adopting a Gauss-Newton algorithm, calculating the Jacobian matrix at each iteration and updating the pose parameters, stopping when the iteration count reaches a preset number or the pose variation between two adjacent iterations is smaller than 1e-6 radians and 1e-3 meters, and outputting the optimized pose.
8. An electronic device, the electronic device comprising:
At least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the pipe network inspection robot positioning method of any one of claims 1-6.
9. A computer readable storage medium, wherein the computer readable storage medium stores computer instructions for causing a processor to implement the pipe network inspection robot positioning method of any one of claims 1-6 when executed.
10. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the pipe network inspection robot positioning method according to any one of claims 1-6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202511149890.7A CN120726129B (en) | 2025-08-18 | 2025-08-18 | Pipe network detection robot positioning method, device, equipment, medium and product |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN120726129A CN120726129A (en) | 2025-09-30 |
| CN120726129B true CN120726129B (en) | 2025-11-04 |
Family
ID=97156833
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202511149890.7A Active CN120726129B (en) | 2025-08-18 | 2025-08-18 | Pipe network detection robot positioning method, device, equipment, medium and product |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120726129B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108665540A (en) * | 2018-03-16 | 2018-10-16 | 浙江工业大学 | Robot localization based on binocular vision feature and IMU information and map structuring system |
| CN117823741A (en) * | 2024-03-06 | 2024-04-05 | 福建巨联环境科技股份有限公司 | A trenchless repair method and system for pipe network combined with intelligent robot |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113989450B (en) * | 2021-10-27 | 2023-09-26 | 北京百度网讯科技有限公司 | Image processing method, device, electronic equipment and medium |
| CN120235924A (en) * | 2025-03-31 | 2025-07-01 | 东北大学 | Binocular vision SLAM method based on deep learning feature extraction and matching algorithm |
- 2025-08-18: CN application CN202511149890.7A granted as patent CN120726129B (Active)
Similar Documents
| Publication | Title |
|---|---|
| US12504286B2 (en) | Simultaneous location and mapping (SLAM) using dual event cameras |
| US10706567B2 (en) | Data processing method, apparatus, system and storage media |
| CN115471748B (en) | A monocular vision SLAM method for dynamic environments |
| Ge et al. | PIPO-SLAM: Lightweight visual-inertial SLAM with preintegration merging theory and pose-only descriptions of multiple view geometry |
| CN112767546B (en) | Binocular image-based visual map generation method for mobile robot |
| Zhang et al. | Improved feature point extraction method of ORB-SLAM2 dense map |
| CN113570713A (en) | Semantic map construction method and device for dynamic environment |
| CN113393524A (en) | Target pose estimation method combining deep learning and contour point cloud reconstruction |
| Chang et al. | YOLOv4-tiny-based robust RGB-D SLAM approach with point and surface feature fusion in complex indoor environments |
| CN118521601A (en) | Indoor scene 3D layout estimation method and device based on angular point depth prediction |
| CN118351181A (en) | Visual positioning method and electronic equipment |
| Chen et al. | Multiview stereo via noise suppression PatchMatch |
| Lee et al. | Robust uncertainty-aware multiview triangulation |
| CN117036447A (en) | Indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion |
| CN116128961A (en) | Product assembly-oriented label-free augmented reality tracking registration method |
| Kocur et al. | Robust self-calibration of focal lengths from the fundamental matrix |
| CN120726129B (en) | Pipe network detection robot positioning method, device, equipment, medium and product |
| CN120014579A (en) | SLAM method based on radar odometer and visual feature point depth filter |
| CN120198501A (en) | A visual positioning method, device, electronic equipment and medium |
| CN115375740B (en) | Pose determining method, three-dimensional model generating method, pose determining device, pose determining equipment and three-dimensional model generating medium |
| Olivier et al. | Live structural modeling using RGB-D SLAM |
| CN113920267A (en) | Three-dimensional scene model construction method, device, equipment and storage medium |
| Wu et al. | DPC-SLAM: Discrete Plane Constrained VSLAM for Intelligent Vehicle in Road Environment |
| Gou et al. | DR-Fusion: Dynamic SLAM 3D Reconstruction Method of Production Workshop |
| CN119295439B (en) | Method and device for identifying hidden danger of power transmission cable external damage, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| GR01 | Patent grant | | |