CN120070568A - Positioning method and related device - Google Patents
- Publication number
- CN120070568A (application number CN202510007135.9A)
- Authority
- CN
- China
- Prior art keywords
- map
- illumination
- target
- gaussian
- primitive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The application discloses a positioning method and a related device, relating to the technical field of computer vision. Gaussian primitives are initialized with 3D point cloud data of a target environment image to obtain new Gaussian primitives. Based on the target illuminance, a historical environment map satisfying a preset illumination matching condition is acquired from the historical environment maps carrying illuminance tags and used as an illumination matching map, and a positioning map is acquired based on the illumination matching map. The camera pose is then updated based on the new Gaussian primitives and the Gaussian primitives of the positioning map. Because the target illuminance is the ambient illuminance when the target environment image was captured, the illuminance tag of a historical environment map represents the ambient illuminance when that map was generated, and the illumination matching condition at least requires that the difference between the illuminance tag and the target illuminance be less than an illuminance difference threshold, positioning is performed on a historical environment map whose illumination conditions are similar to those of the current target environment image, which improves positioning accuracy.
Description
Technical Field
The application relates to the technical field of computer vision, in particular to a positioning method and a related device.
Background
SLAM (Simultaneous Localization And Mapping) is one of the core technologies that allow robots, unmanned aerial vehicles, intelligent security devices, locomotive trains and similar equipment to move and operate autonomously in complex environments. Through SLAM, a device estimates its own position and attitude in real time from repeatedly observed environmental features while it moves, and simultaneously builds an incremental environment map, thereby achieving autonomous localization and navigation. Compared with lidar SLAM, visual SLAM has the advantages of low cost, lightweight equipment and low power consumption, which are particularly significant in resource-constrained scenarios. Visual SLAM relies on image information acquired by a camera and does not need expensive lidar equipment, which greatly reduces hardware cost, and it can be seamlessly integrated with existing visual perception systems, giving it high applicability.
However, the accuracy and robustness of visual SLAM are easily affected by environmental conditions. How to improve the positioning accuracy and robustness of visual SLAM in complex environments has therefore become an important research direction in the visual SLAM field.
Disclosure of Invention
In view of the above, the present application provides a positioning method and a related apparatus aimed at improving the positioning accuracy of visual SLAM. The specific scheme is as follows:
the first aspect of the present application provides a positioning method, including:
Acquiring 3D point cloud data of a target environment image;
initializing a Gaussian primitive by using the 3D point cloud data to obtain a new Gaussian primitive;
acquiring, based on target illuminance, a historical environment map satisfying a preset illumination matching condition from a scene map table as an illumination matching map, wherein the illumination matching condition at least includes that the illuminance difference between an illuminance tag and the target illuminance is smaller than a preset illuminance difference threshold, the target illuminance is the ambient illuminance when the target environment image is captured, the scene map table includes historical environment maps carrying illuminance tags, and the illuminance tag of a historical environment map represents the ambient illuminance when that historical environment map was generated;
acquiring a positioning map based on the illumination matching map;
and updating the camera pose based on the new Gaussian primitive and the Gaussian primitive of the positioning map.
In one possible implementation, acquiring 3D point cloud data of a target environment image includes:
receiving a monocular environment image acquired by monocular image acquisition equipment as a target environment image;
obtaining a depth map of the target environment image through a pre-trained depth estimation model;
And mapping the depth map of the target environment image into a 3D point cloud based on a preset camera internal parameter to obtain the 3D point cloud data.
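The mapping in the step above is the standard pinhole back-projection. A minimal sketch — not part of the patent text, with illustrative function and parameter names — might look like:

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map (H x W, metres) to a coloured 3D point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = points[:, 2] > 0                        # drop pixels with no depth
    return points[valid], colors[valid]
```

The camera intrinsics (fx, fy, cx, cy) correspond to the "preset camera internal parameter" mentioned above.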
In one possible implementation, acquiring, based on target illuminance, a historical environment map satisfying the preset illumination matching condition from the scene map table includes:
acquiring illuminance of the target environment image based on preset illumination characteristics;
for each target historical environment map, comparing the illuminance tag of the target historical environment map with the target illuminance to obtain the illuminance difference of the target historical environment map, wherein the target historical environment maps include at least one historical environment map in the scene map table;
and for each target historical environment map, comparing its illuminance difference with the illuminance difference threshold; if the illuminance difference is smaller than the threshold, determining that the target historical environment map satisfies the illumination matching condition, and if not, determining that it does not.
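The matching condition described above can be sketched as a simple filter over the scene map table (a dict of illuminance tags is an illustrative assumption; the patent does not prescribe a data structure):

```python
def match_maps_by_illuminance(scene_maps, target_lux, diff_threshold):
    """Return (map_id, diff) pairs whose illuminance tag differs from the
    target illuminance by less than the threshold, most similar first.
    scene_maps: {map_id: illuminance_tag}."""
    matches = []
    for map_id, tag_lux in scene_maps.items():
        diff = abs(tag_lux - target_lux)      # illuminance difference
        if diff < diff_threshold:             # illumination matching condition
            matches.append((map_id, diff))
    return sorted(matches, key=lambda m: m[1])
```

Returning the candidates sorted by difference also serves the later step that picks the two most similar maps for fusion.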
In one possible implementation, based on the illumination matching map, obtaining a positioning map includes:
if the number of illumination matching maps is equal to 1, taking that illumination matching map as the positioning map;
If the number of the illumination matching maps is greater than 1, acquiring two historical environment maps with the smallest illumination difference degree from all the illumination matching maps, and respectively serving as a first illumination matching map and a second illumination matching map;
Respectively carrying out weighted fusion on Gaussian primitives at the same positions of the first illumination matching map and the second illumination matching map based on corresponding weighting coefficients to obtain a fusion Gaussian primitive set, wherein the weighting coefficients of the Gaussian primitives of a target illumination matching map are inversely related to the illumination difference degree of the target illumination matching map, and the target illumination matching map comprises the first illumination matching map and the second illumination matching map;
and constructing the positioning map based on the fusion Gaussian primitive set.
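The weighted fusion of co-located Gaussian primitives, with weights inversely related to each map's illuminance difference, might be sketched as follows for the primitive means (illustrative only; a full implementation would also fuse covariances, colors and opacities):

```python
import numpy as np

def fuse_gaussians(means1, means2, diff1, diff2, eps=1e-6):
    """Weighted fusion of co-located Gaussian primitive means (N x 3) from
    two illumination-matched maps. Each map's weight is inversely related
    to its illuminance difference, then normalised to sum to one."""
    w1, w2 = 1.0 / (diff1 + eps), 1.0 / (diff2 + eps)
    total = w1 + w2
    w1, w2 = w1 / total, w2 / total           # normalised weighting coefficients
    return w1 * means1 + w2 * means2
```

With equal illuminance differences the fusion degenerates to a plain average; the smaller a map's difference, the closer the fused primitives stay to it.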
In one possible implementation, updating the camera pose based on the new gaussian primitive and the gaussian primitive of the localization map includes:
If the positioning map exists, registering the new Gaussian primitive with the Gaussian primitive of the positioning map by using a preset point cloud-based registration algorithm to obtain a rotation matrix;
And updating the camera pose by using the rotation matrix.
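The patent does not name the point-based registration algorithm. One common building block of ICP-style registration that yields a rotation matrix from corresponded point sets is the SVD-based Kabsch alignment, sketched here as an assumption:

```python
import numpy as np

def kabsch_rotation(src, dst):
    """Estimate the rotation aligning corresponding point sets src -> dst
    (each N x 3) by SVD of the cross-covariance; one inner step of
    ICP-style registration between two sets of Gaussian centres."""
    src_c = src - src.mean(axis=0)            # centre both point sets
    dst_c = dst - dst.mean(axis=0)
    h = src_c.T @ dst_c                       # 3 x 3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T
```

In a full pipeline the correspondences themselves would be re-estimated iteratively, and the translation recovered from the two centroids.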
In one possible implementation, after acquiring, based on the target illuminance, a historical environment map satisfying the preset illumination matching condition from the scene map table, the positioning method further includes:
if an illumination matching map meeting a preset updating condition exists, selecting an illumination matching map meeting the updating condition as a reference map, wherein the updating condition comprises that the illumination difference is smaller than a preset updating threshold value, and the updating threshold value is smaller than the illumination difference threshold value;
Updating the reference map based on the new gaussian primitive;
And updating an illuminance tag of the reference map based on the target illuminance.
In one possible implementation, updating the reference map based on the new gaussian primitive includes:
adding the new Gaussian primitive to the reference map to obtain a first candidate map;
performing heuristic pruning on the Gaussian primitives in the first candidate map to obtain a second candidate map;
rendering via a differentiable rendering pipeline based on the second candidate map to obtain a rendered image under the current camera pose;
comparing the rendered image with the target monocular environment image to calculate a loss value, back-propagating the loss value, and calculating the gradient corresponding to each Gaussian primitive in the second candidate map;
Updating each Gaussian primitive based on the gradient corresponding to each Gaussian primitive in the second candidate map to obtain an environment map of the target environment image;
Updating the reference map to be an environment map of the target environment image.
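The render-compare-backpropagate-update loop above can be illustrated with a deliberately toy "renderer" that scatters each Gaussian's colour to one assigned pixel; the real pipeline uses a differentiable Gaussian Splatting renderer, which this sketch does not attempt to reproduce:

```python
import numpy as np

def optimize_map(colors, pixel_ids, target, lr=0.5, iters=100):
    """Toy map-update loop: 'render' by scattering each Gaussian's colour
    to its pixel, compute an L2 photometric residual against the target
    image, accumulate a per-Gaussian gradient, and update each colour."""
    for _ in range(iters):
        rendered = colors[pixel_ids]          # stand-in for splatting
        residual = rendered - target          # photometric error
        grad = np.zeros_like(colors)
        np.add.at(grad, pixel_ids, residual)  # gradient per Gaussian primitive
        colors = colors - lr * grad           # gradient-descent update
    return colors
```

The gradient scatter mirrors how, in the real pipeline, back-propagation attributes the image loss to every Gaussian that contributed to a pixel.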
In one possible implementation, after acquiring, based on the target illuminance, a historical environment map satisfying the preset illumination matching condition from the scene map table, the positioning method further includes:
If the illumination matching map meeting the updating conditions does not exist, a new environment map is built based on the new Gaussian primitives;
And storing the new environment map to the scene map by taking the target illuminance as an illuminance tag.
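The update-or-create branching of the last two implementations might be sketched as follows (the dict-based scene map table and field names are illustrative assumptions):

```python
def update_or_create(scene_maps, target_lux, new_gaussians, update_threshold):
    """If some map's illuminance difference is below the (stricter) update
    threshold, it becomes the reference map, absorbs the new Gaussians and
    has its illuminance tag refreshed; otherwise a new environment map is
    stored with the target illuminance as its tag.
    scene_maps: list of {'tag': illuminance, 'gaussians': [...]} dicts."""
    candidates = [m for m in scene_maps
                  if abs(m["tag"] - target_lux) < update_threshold]
    if candidates:
        ref = min(candidates, key=lambda m: abs(m["tag"] - target_lux))
        ref["gaussians"].extend(new_gaussians)   # update the reference map
        ref["tag"] = target_lux                  # refresh its illuminance tag
        return ref
    new_map = {"tag": target_lux, "gaussians": list(new_gaussians)}
    scene_maps.append(new_map)                   # store to the scene map table
    return new_map
```

Because the update threshold is smaller than the matching threshold, a map can be similar enough to localize against yet still too different to be overwritten — in that case a new tagged map is created instead.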
A second aspect of the present application provides a positioning device comprising:
The point cloud generation unit is used for acquiring 3D point cloud data of the target environment image;
the primitive initializing unit is used for initializing the Gaussian primitive by using the 3D point cloud data to obtain a new Gaussian primitive;
The illumination matching unit is used for acquiring, based on target illuminance, a historical environment map satisfying a preset illumination matching condition from a scene map table as an illumination matching map, wherein the illumination matching condition at least includes that the illuminance difference between an illuminance tag and the target illuminance is smaller than a preset illuminance difference threshold;
The positioning map acquisition unit is used for acquiring a positioning map based on the illumination matching map;
And the positioning unit is used for updating the camera pose based on the new Gaussian primitive and the Gaussian primitive of the positioning map.
A third aspect of the application provides an electronic device comprising at least one processor and a memory coupled to the processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to enable the electronic device to implement the positioning method of the first aspect or any implementation manner of the first aspect.
By means of the above technical scheme, the positioning method and related device provided by the application initialize Gaussian primitives with the 3D point cloud data of a target environment image to obtain new Gaussian primitives. Based on the target illuminance, a historical environment map satisfying a preset illumination matching condition is acquired from the historical environment maps carrying illuminance tags and used as an illumination matching map, and a positioning map is acquired based on the illumination matching map. The camera pose is then updated based on the new Gaussian primitives and the Gaussian primitives of the positioning map. Because the target illuminance is the ambient illuminance when the target environment image was captured, the illuminance tag of a historical environment map represents the ambient illuminance when that map was generated, and the illumination matching condition at least requires that the difference between the illuminance tag and the target illuminance be less than an illuminance difference threshold, positioning is performed on a historical environment map whose illumination conditions are similar to those of the current target environment image. This avoids the influence of illumination changes on positioning and thereby improves positioning accuracy and robustness.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a positioning method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of an SLAM system according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a specific implementation of a positioning method according to an embodiment of the present application;
FIG. 4 is a flowchart of a specific implementation for obtaining a positioning map and a reference map according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a positioning device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. The terminology used in the description of the embodiments of the application herein is for the purpose of describing particular embodiments of the application only and is not intended to be limiting of the application.
Embodiments of the present application are described below with reference to the accompanying drawings. As one of ordinary skill in the art can know, with the development of technology and the appearance of new scenes, the technical scheme provided by the embodiment of the application is also applicable to similar technical problems.
The terms first, second and the like in the description, the claims and the above figures are used to distinguish between similar elements and are not necessarily used to describe a particular sequential or chronological order. It is to be understood that such terms are interchangeable under appropriate circumstances and are merely used to distinguish objects of the same kind when describing embodiments of the application. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Taking a mobile device as an example of a locomotive train in a closed-scene intelligent transportation system, the locomotive train needs to be transported efficiently and accurately in a closed station.
In order to realize accurate control of the starting and stopping of a locomotive train, the traditional lidar SLAM technique achieves environmental awareness and map construction of a closed station through a lidar, so as to determine the position of the locomotive train in the station yard in real time, thereby ensuring transportation efficiency and safety. However, lidar SLAM relies on expensive lidar sensors, whose high cost and poor practicality limit intelligent transportation systems that require large-scale deployment. In addition, lidar SLAM needs to transmit high-bandwidth three-dimensional point cloud data and relies on a high-performance computing unit for processing, which further exacerbates the problems of high cost and high power consumption. The application scenarios of lidar SLAM are therefore limited by high equipment heat generation, heavy transmission bandwidth load, and expensive overall investment and operating costs.
In order to overcome the defects of the laser radar SLAM, the prior art uses a visual SLAM technology to construct an environment map. The visual SLAM relies on image information acquired by the camera equipment, and the environment map construction and positioning are realized through algorithm processing.
For example, in visual SLAM based on NeRF (Neural Radiance Fields), multi-angle images are acquired by the camera and an environment map is constructed using the NeRF method. However, NeRF requires heavy computation to learn and query its implicit representation of the three-dimensional scene, resulting in slow processing that struggles to meet real-time positioning requirements.
For another example, visual SLAM based on GS (Gaussian Splatting) acquires environment images and depth information with an RGB-D camera and constructs the environment map using the Gaussian Splatting technique. However, RGB-D camera hardware is expensive and positioning performance is poor.
The inventors found through research that the sensitivity of existing GS-based visual SLAM to illumination conditions further limits its applicability in practical scenarios. For example, when acquiring environment images and depth information with an RGB-D camera, depth is measured by actively emitting light, so the effective detection range and tolerable lighting conditions are limited in outdoor applications, especially in complex-illumination and large-scale environments. As another example, changes in lighting conditions significantly affect the extraction and matching of image features: feature points are extracted incompletely in low-light environments, while shadows and overexposed regions under strong or uneven lighting reduce the accuracy of positioning and map construction. This sensitivity to illumination changes makes it difficult for existing visual SLAM techniques to work stably under dynamic, complex lighting conditions. It can be seen that existing visual SLAM technology still struggles to overcome the technical disadvantage of poor adaptability.
To address these shortcomings, an embodiment of the present application provides a positioning method that uses GS-based visual SLAM to obtain a positioning map through a pre-constructed scene map table and an illuminance analysis of the currently acquired environment image, so as to overcome the influence of illumination conditions on positioning accuracy and improve positioning accuracy. The positioning method of the embodiment of the present application is described in detail below with reference to the accompanying drawings, taking the edge inference device in the SLAM system as the execution subject. Referring to fig. 1, fig. 1 is a flow chart of a positioning method according to an embodiment of the present application. As shown in fig. 1, the positioning method according to the embodiment of the present application may include steps S101 to S105, described in detail below.
S101, acquiring 3D point cloud data of a target environment image.
In this embodiment, the target environmental image is an environmental image acquired in real time based on a pre-configured camera, for example, a monocular camera or a multi-eye camera is configured at the top end of the intelligent mobile device, and the monocular camera or the multi-eye camera is controlled to acquire the environmental image based on a pre-set acquisition frequency. The 3D point cloud data includes spatial coordinates and color information of each point in the 3D point cloud.
In this embodiment, the information carried by the target environment image is different based on different camera types, so the method for acquiring the 3D point cloud data of the target environment image is different.
For example, a target environment image acquired by a monocular camera is a monocular image, depth information of the target environment image is predicted by using a pre-constructed depth estimation model, and 3D point cloud data is obtained through a GS technology.
For another example, when the camera is a multi-view camera, the acquired target environment image is a multi-view image. Based on the depth information carried by the multi-view images, the 3D point cloud data is obtained through the GS technique.
It should be noted that, the method for acquiring the 3D point cloud data of the target environment image may refer to the prior art.
S102, initializing the Gaussian primitive by using the 3D point cloud data to obtain a new Gaussian primitive.
In this embodiment, the environment map is represented using a set of gaussian primitives, that is, the environment map is composed of a plurality of gaussian primitives.
In this embodiment, a Gaussian primitive is a parameterized three-dimensional object whose parameters include a mean vector, a covariance matrix, a color and an opacity; the mean vector and the covariance matrix represent its position and shape. The mean vector μ of a Gaussian primitive describes its position in three-dimensional space: for each Gaussian primitive, the mean is the coordinate of the primitive's center, while the covariance matrix Σ describes its shape. At initialization, each point of the 3D point cloud is assumed to correspond to one Gaussian primitive, whose mean is the coordinate of that point and whose covariance is computed from the point's k nearest neighbors.
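The initialization described above — one Gaussian per point, mean at the point, covariance scaled by the distance to the k nearest neighbors — might be sketched as follows (an isotropic-covariance simplification, with a brute-force neighbor search that only suits small clouds):

```python
import numpy as np

def init_gaussians(points, k=4):
    """Initialise one Gaussian primitive per 3D point: the mean is the point
    itself; the covariance is isotropic with scale set by the mean distance
    to the k nearest neighbours of that point."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)             # exclude each point itself
    knn = np.sqrt(np.sort(d2, axis=1)[:, :k])
    scale = knn.mean(axis=1)                 # per-point neighbour distance
    covs = scale[:, None, None] ** 2 * np.eye(3)[None]
    return points.copy(), covs               # means, covariance matrices
```

A production implementation would use a spatial index (e.g. a k-d tree) for the neighbor search and typically anisotropic covariances.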
S103, acquiring, based on the target illuminance, a historical environment map satisfying the preset illumination matching condition from the scene map table as an illumination matching map.
In this embodiment, the illumination matching condition at least includes that the illuminance difference between the illuminance tag and the target illuminance is smaller than a preset illuminance difference threshold. The target illuminance is the ambient illuminance when the target environment image is captured; the scene map table includes historical environment maps carrying illuminance tags, and the illuminance tag of a historical environment map represents the ambient illuminance when that map was generated.
In this embodiment, the illumination difference is the absolute value of the difference between the illumination label and the target illumination.
S104, acquiring a positioning map based on the illumination matching map.
In this embodiment, there are various specific ways to obtain the positioning map based on the illumination matching maps: for example, randomly selecting one of the illumination matching maps as the positioning map, or fusing several illumination matching maps to obtain the positioning map.
In this embodiment, the illuminance of the positioning map still satisfies the illumination matching condition.
S105, updating the camera pose based on the new Gaussian primitive and the Gaussian primitive of the positioning map.
In this embodiment, the method for updating the camera pose based on the new gaussian primitive and the gaussian primitive of the positioning map includes multiple methods, and in an optional embodiment, a preset point cloud-based registration algorithm may be used to register the new gaussian primitive and the gaussian primitive of the positioning map to obtain a rotation matrix, and the camera pose is updated by using the rotation matrix.
According to the above technical scheme, the positioning method provided by the embodiment of the application initializes Gaussian primitives with the 3D point cloud data of the target environment image to obtain new Gaussian primitives. Based on the target illuminance, a historical environment map satisfying the preset illumination matching condition is acquired from the historical environment maps carrying illuminance tags and used as an illumination matching map, and a positioning map is acquired based on the illumination matching map. The camera pose is then updated based on the new Gaussian primitives and the Gaussian primitives of the positioning map. Because the target illuminance is the ambient illuminance when the target environment image was captured, the illuminance tag of a historical environment map represents the ambient illuminance when that map was generated, and the illumination matching condition at least requires that the difference between the illuminance tag and the target illuminance be less than the illuminance difference threshold, positioning is performed on a historical environment map whose illumination conditions are similar to those of the current target environment image. This avoids the influence of illumination changes on positioning and thereby improves positioning accuracy and robustness.
Further, an embodiment of the present application provides a SLAM system for implementing the positioning method. Fig. 2 is a schematic structural diagram of the SLAM system provided in the embodiment of the present application. As shown in fig. 2, the SLAM system includes an edge inference device and a monocular image acquisition device configured in an intelligent mobile device, and an inference server. Intelligent mobile devices include mobile phones, unmanned aerial vehicles, robots, automobiles, trains and the like; the edge inference device is a controller equipped with a computing card, computing chip, computing core or similar unit dedicated to large-scale parallel computation. The monocular image acquisition device includes a monocular camera arranged on the intelligent mobile device. For example, the monocular image acquisition device and edge inference device of a self-perceiving mobile device such as a sweeping robot, locomotive train or unmanned aerial vehicle are configured within the device itself. The inference server is a high-performance GPU server.
In this embodiment, a pre-trained depth estimation model is configured in the edge inference device; the model is trained by a high-performance GPU server on large-scale data. Optionally, the depth estimation model preconfigured in the edge inference device is a large-scale model trained using an RTX 6000, a high-performance GPU capable of handling the training of large models. The edge-side host device is equipped with a Samsung 990 EVO SSD and a 4-core Intel i5-1240P processor, supporting system operation and map-related computation. The edge-side computing device employs an NVIDIA Jetson AGX Orin to implement depth estimation and fast Gaussian Splatting inference. The Jetson AGX Orin offers strong computing power and a high energy-efficiency ratio; placing the computational tasks of the positioning method on the edge inference device reduces latency and improves the efficiency and real-time performance of the system.
It should be noted that, the edge inference device in the SLAM system and the monocular image acquisition device construct communication connection in advance, and the communication connection mode is determined based on the hardware configuration mode of the edge inference device and the monocular image acquisition device, for example, wired communication may be adopted for the edge inference device configured at the near end (in the mobile device) and the monocular image acquisition device, and wireless communication may be adopted for the edge inference device configured at the far end and the monocular image acquisition device. Specific communication means can be seen in the prior art.
In this embodiment, the monocular image acquisition device is configured to acquire monocular environment images in real time and send them to the edge inference device in time sequence, and the edge inference device is configured to perform positioning based on the monocular environment images by combining monocular depth estimation and GS (Gaussian Splatting) technology. The system performs environment perception with monocular environment images and imaging with Gaussian Splatting, which lowers the demands SLAM places on camera and edge inference hardware and improves both the speed of environment map construction and the applicability of the SLAM technique. Replacing the expensive lidar sensor and RGB-D camera with a low-cost monocular camera simplifies the hardware configuration, reduces hardware cost and improves the applicability of SLAM, making the system especially suitable for mobile intelligent devices that must operate with autonomous decision-making.
In addition, since the system uses monocular environment images for environment sensing, it can also be applied in the monocular mode of a multi-camera device, where the monocular mode may be a power-saving mode triggered actively to reduce power consumption, or a disaster-recovery mode triggered passively due to a fault.
Based on the above SLAM system, the embodiment of the present application provides a specific implementation of a positioning method, applied to the edge inference device in the SLAM system. Fig. 3 is a flowchart of a specific implementation of the positioning method provided by the embodiment of the present application; as shown in fig. 3, the method may specifically include:
S301, receiving a target monocular environment image acquired by monocular image acquisition equipment.
In this embodiment, the monocular image capturing device may use an Oak-D-Pro camera, where the target monocular environmental image is an RGB image with a resolution of 640 x 480, and the monocular image capturing device captures the target monocular environmental image at a preset frame rate to form a scene video stream, and sends the scene video stream to the edge inference device frame by frame.
The preset frame rate may be a fixed frame rate or a dynamic frame rate for meeting the real-time requirement and the requirement of scene change information. Optionally, the preset frame rate is 10 frames/second.
S302, obtaining a depth map of each frame of target monocular environment image through a pre-trained depth estimation model.
In this embodiment, the depth map of the target monocular environment image includes absolute depth information of each pixel in the target monocular environment image, and the absolute depth information includes a depth value. Specifically, the acquired target monocular environment image is input into a depth estimation model, feature analysis is carried out on each pixel in the target monocular environment image through the depth estimation model, absolute depth information (unit: meter) corresponding to the pixel is inferred, and a depth map of the target monocular environment image is output.
In this embodiment, the depth estimation model is pre-trained on a large-scale data set, has generalization capability for depth estimation, can adapt to a wide range of scenes including outdoors and indoors, and its depth map provides environmental information for 3D point cloud generation and SLAM mapping.
S303, mapping a depth map of the target monocular environment image into a 3D point cloud based on camera internal parameters to obtain 3D point cloud data.
In this embodiment, the 3D point cloud data includes spatial coordinates and color information of each point in the 3D point cloud.
Specifically, the spatial coordinates of each point in the 3D point cloud are generated by back-projecting pixels in the depth map of the target monocular environment image using the camera intrinsics, see formula (1):

$$X = \frac{(u - c_x)\, d}{f_x}, \qquad Y = \frac{(v - c_y)\, d}{f_y}, \qquad Z = d \quad (1)$$

where $(u, v)$ represents the image coordinates of the pixel, $(c_x, c_y)$ represents the principal point coordinates of the camera intrinsics, $f_x$ and $f_y$ denote the focal lengths, and $d$ represents the depth value of pixel $(u, v)$ on the depth map.
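The back-projection of S303 (formula (1)) can be sketched as follows. This is a minimal illustrative implementation, not the patented embodiment; the intrinsic values in the usage example below are assumptions.

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map (meters) to a 3D point cloud with the
    pinhole model of formula (1):
        X = (u - cx) * d / fx,  Y = (v - cy) * d / fy,  Z = d.
    Returns (N, 3) points and (N, 3) colors for pixels with d > 0."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    valid = depth > 0                                 # skip invalid depth
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    pts = np.stack([X[valid], Y[valid], depth[valid]], axis=-1)
    cols = rgb[valid]                                 # color per point
    return pts, cols

# usage with assumed intrinsics (fx = fy = 100, principal point (1, 1))
pts, cols = depth_to_point_cloud(
    np.array([[1.0, 2.0], [0.0, 4.0]]), np.zeros((2, 2, 3)), 100, 100, 1, 1)
```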
In this embodiment, the camera parameters are fixed after being initialized and configured in the initial state, and a specific method for initializing the camera parameters may refer to the prior art.
S304, initializing the Gaussian primitive by using the 3D point cloud data to obtain a new Gaussian primitive.
S305, acquiring a positioning map and a reference map based on the target illuminance and the scene map table.
In this embodiment, the target illuminance is the illuminance of the target monocular environment image, and the scene map table includes a plurality of historical environment maps with illuminance tags. The illuminance tag of the history environment map represents the illuminance of the monocular environment image that generated the history environment map.
In this embodiment, the positioning map is obtained based on an illumination matching map, where an illumination matching map is a historical environment map in the scene map table that matches the target illuminance, i.e., one whose illuminance tag differs from the target illuminance by less than a preset illuminance difference threshold.
In an alternative embodiment, a historical environment map whose illuminance difference from the target illuminance is smaller than the illuminance difference threshold and whose update timestamp is closest to the current time is selected from the scene map table as the positioning map.
In another alternative embodiment, a plurality of historical environment maps whose illuminance differences from the target illuminance are smaller than the illuminance difference threshold are selected from the scene map table and fused, and the fusion result is used as the positioning map.
In this embodiment, the reference map is an illumination matching map that satisfies an update condition, where the update condition includes that an illumination difference between the illumination label and the target illumination is minimum and less than a preset update difference threshold, and the update difference threshold is less than the illumination difference threshold.
And S306, if the positioning map exists, carrying out GICP registration of the new Gaussian primitives against the Gaussian primitives in the positioning map to obtain a transformation matrix, and updating the camera pose using the transformation matrix.
Specifically, a new Gaussian primitive set $G_{new}$ is first created based on the 3D point cloud data. Taking $G_{new}$ as the source and the Gaussian primitive set $G_{map}$ of the positioning map as the target, registration is performed using GICP (Generalized Iterative Closest Point, a point-cloud registration method) to obtain the transformation matrix of the camera, and the camera pose is updated by this transformation matrix.

The GICP formula is shown as formula (2):

$$T^{*} = \arg\min_{T} \sum_{i} d_i^{\top} \left( C_i^{target} + T\, C_i^{source}\, T^{\top} \right)^{-1} d_i, \qquad d_i = b_i - T a_i \quad (2)$$

where $T$ is the transformation matrix, $a_i$ and $b_i$ are corresponding points in the source and target, and $C_i^{source}$ and $C_i^{target}$ are their covariance matrices.
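As an illustrative sketch of the registration step in S306, the following implements plain point-to-point ICP in place of full GICP: formula (2) additionally weights each correspondence by the per-point covariance term, which is replaced by identity weights here for brevity. This is a simplified stand-in, not the patented method.

```python
import numpy as np

def icp_rigid(source, target, iters=20):
    """Simplified point-to-point ICP (identity covariances; full GICP
    would weight each residual by (C_target + T C_source T^T)^-1).
    Returns a 4x4 homogeneous transform aligning source to target."""
    T = np.eye(4)
    src = source.copy()
    for _ in range(iters):
        # brute-force nearest neighbours in the target cloud
        d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(-1)
        nn = target[d2.argmin(axis=1)]
        # closed-form rigid alignment (Kabsch / SVD)
        mu_s, mu_t = src.mean(0), nn.mean(0)
        H = (src - mu_s).T @ (nn - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:       # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t            # apply incremental transform
        step = np.eye(4); step[:3, :3] = R; step[:3, 3] = t
        T = step @ T
    return T

# usage: a 3x3x3 grid shifted by a known translation is recovered
g = np.arange(3.0)
grid = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T
shift = np.array([0.1, -0.05, 0.2])
T = icp_rigid(grid, grid + shift, iters=5)
```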
And S307, if the reference map exists, adding a new Gaussian primitive to the reference map to obtain a first candidate map.
In this embodiment, the new Gaussian primitives $G_{new}$ obtained by initialization are added to the reference map, i.e., the Gaussian primitive set $G_{ref}$ of the reference map is updated to $G_{ref} \cup G_{new}$.
S308, heuristic pruning is conducted on the number of Gaussian primitives in the first candidate map, and a second candidate map is obtained.
In this embodiment, to prevent the excessive number of gaussian primitives in the first candidate map from causing a decrease in computation efficiency, gaussian primitives that have not been updated for a long period or have higher transparency are removed from the first candidate map based on factors such as frequency of use, transparency, gradient, and the like.
Specifically, for each Gaussian primitive in the first candidate map, it is judged whether its transparency is greater than a transparency threshold $\tau_{\alpha}$ or the time since its last update is greater than an update-time threshold $\tau_t$; if so, the Gaussian primitive is removed from the first candidate map, and the first candidate map is updated to the second candidate map, as shown in formula (3):

$$G' = \{\, g \in G \mid \alpha_g \le \tau_{\alpha} \ \text{and} \ \Delta t_g \le \tau_t \,\} \quad (3)$$

where $G$ and $G'$ are the Gaussian primitive sets of the first and second candidate maps, $\alpha_g$ is the transparency of primitive $g$, and $\Delta t_g$ is the time since its last update.
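The heuristic pruning of S308 can be sketched as a mask over the primitive set; the threshold values below are illustrative assumptions, not values from the embodiment.

```python
import numpy as np

def prune_gaussians(transparency, age, tau_alpha=0.9, tau_t=30.0):
    """Heuristic pruning in the spirit of formula (3): remove primitives
    whose transparency exceeds tau_alpha or that have not been updated
    for longer than tau_t (thresholds are illustrative assumptions).
    Returns a boolean mask of primitives to keep."""
    remove = (transparency > tau_alpha) | (age > tau_t)
    return ~remove

# usage: the second primitive is too transparent, the third too stale
keep = prune_gaussians(np.array([0.1, 0.95, 0.2]),
                       np.array([5.0, 1.0, 40.0]))
```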
And S309, based on the second candidate map, rendering with the differentiable rendering path to obtain a rendered image under the camera pose.

In this embodiment, based on the second candidate map, the method for rendering the environment image under the camera pose with the differentiable rendering path comprises:
B1, projecting Gaussian primitives: the Gaussian primitives in three-dimensional space are projected to the image plane based on the current camera pose, taking into account the mean $\mu$ and covariance matrix $\Sigma$ of each Gaussian primitive in the projection process.
Specifically, the mean position is projected first: using the camera intrinsic matrix K, the three-dimensional mean vector $\mu = (x, y, z)^{\top}$ in camera coordinates is projected to the pixel position $(u, v)$ on the two-dimensional plane, as in formula (4):

$$u = f_x \frac{x}{z} + c_x, \qquad v = f_y \frac{y}{z} + c_y \quad (4)$$

where $x/z$ and $y/z$ are the normalized coordinates of $\mu$ on the image plane, and the camera intrinsic matrix K is generally expressed as:

$$K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$$

where $f_x$ and $f_y$ denote the focal lengths and $(c_x, c_y)$ are the principal point coordinates. The $(u, v)$ obtained after projection is the pixel location of the mean of the Gaussian primitive on the image plane.
Then the covariance matrix is projected: let the rotation matrix of the monocular camera be R and the translation vector be t; the projection of the three-dimensional covariance matrix $\Sigma$ is $\Sigma' = J R \Sigma R^{\top} J^{\top}$, where the Jacobian matrix J is expressed as formula (5):

$$J = \begin{pmatrix} f_x / z & 0 & -f_x x / z^{2} \\ 0 & f_y / z & -f_y y / z^{2} \end{pmatrix} \quad (5)$$

where z is the depth value of the mean $\mu$ of the Gaussian primitive, i.e., its projection along the camera viewing direction (typically the z-axis). J describes the local variation of the projection relationship, i.e., the change in the two-dimensional plane position (u, v) corresponding to a small change in the three-dimensional coordinates (x, y, z).
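Step B1 (formulas (4) and (5)) can be sketched as follows; the intrinsic values in the usage example are assumptions.

```python
import numpy as np

def project_gaussian(mu, Sigma, K, R, t):
    """Project a 3D Gaussian (mean mu, covariance Sigma) to the image
    plane: the mean via the pinhole model of formula (4), the covariance
    via the local Jacobian J of formula (5): Sigma2D = J R Sigma R^T J^T."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x, y, z = R @ mu + t                       # camera-frame position
    u = fx * x / z + cx                        # formula (4)
    v = fy * y / z + cy
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])   # formula (5)
    Sigma2D = J @ R @ Sigma @ R.T @ J.T
    return np.array([u, v]), Sigma2D

# usage: Gaussian on the optical axis at depth 2 m, assumed intrinsics
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1.0]])
uv, S2 = project_gaussian(np.array([0.0, 0.0, 2.0]), 0.01 * np.eye(3),
                          K, np.eye(3), np.zeros(3))
```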
And B2, at each pixel position of the image plane, sorting all projected Gaussian primitives covering that pixel position by depth value (distance from the camera) from large to small, obtaining the Gaussian primitive sequence of the pixel position.
In this embodiment, the Gaussian primitive sequence contains the target Gaussian primitives ordered from large to small depth value, where a target Gaussian primitive is a Gaussian primitive that covers the pixel after projection; the Gaussian primitive sequence of pixel position $p$ is expressed as $\{g_1, g_2, \dots, g_N\}$.

In this embodiment, the depth values of the Gaussian primitives in the sequence decrease in order, so that at rendering time Gaussian primitives with large depth (far away) are processed first, ensuring that the Gaussian primitives in front can cover those behind.
And B3, at each pixel position of the image plane, the transparency and color of each Gaussian primitive in the Gaussian primitive sequence corresponding to the pixel are weighted and summed to obtain the rendering color of the pixel.
In this embodiment, the specific method for computing the rendering color is shown in formula (6):

$$C(p) = \sum_{i=1}^{N} \alpha_i\, c_i \prod_{j<i} \left( 1 - \alpha_j \right) \quad (6)$$

where $\alpha_i$ is the transparency of the i-th Gaussian primitive in the Gaussian primitive sequence, $c_i$ is the color of the i-th Gaussian primitive in the sequence, and $\alpha_j$ is the transparency of the j-th Gaussian primitive in the sequence.
And B4, rendering based on the rendering colors of all the pixels to obtain the rendered image.
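The per-pixel blending of step B3 (formula (6)) can be sketched as a running-transmittance loop over the ordered Gaussian sequence of one pixel:

```python
import numpy as np

def composite_pixel(alphas, colors):
    """Blend one pixel's ordered Gaussian sequence per formula (6):
    C = sum_i alpha_i * c_i * prod_{j<i} (1 - alpha_j).
    `alphas` is (N,), `colors` is (N, 3), in the order of step B2."""
    C = np.zeros(3)
    transmittance = 1.0                 # prod_{j<i} (1 - alpha_j)
    for a, c in zip(alphas, colors):
        C += a * transmittance * c
        transmittance *= (1.0 - a)
    return C

# usage: a half-transparent red primitive over an opaque green one
C = composite_pixel(np.array([0.5, 1.0]),
                    np.array([[1.0, 0, 0], [0, 1.0, 0]]))
```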
S310, comparing the rendered image with the target monocular environment image, calculating a loss value, back-propagating the loss value, and calculating a gradient corresponding to each Gaussian primitive in the second candidate map.
In this embodiment, the parameters of each Gaussian primitive include the mean position $\mu$, the covariance matrix $\Sigma$, the transparency $\alpha$, and the color c, and a gradient is computed with respect to each of these parameters.

Specifically, the method for calculating the loss value, back-propagating it, and computing the gradients of the parameters of each Gaussian primitive in the second candidate map comprises the following steps:
And C1, calculating a difference value between the rendered image and the target monocular environment image based on the loss function to obtain a loss value.
In this embodiment, the difference value represents the degree of difference between the rendered image and the target monocular environment image, and the larger the difference is, the larger the difference between the map and the real scene is, and the larger the loss value is.
The loss function formula is formula (7):

$$L = \frac{1}{K} \sum_{k=1}^{K} \left| \hat{I}_k - I_k \right| \quad (7)$$

where $I_k$ is the k-th pixel in the input target monocular environment image, $\hat{I}_k$ is the k-th pixel in the rendered image, and the total number of pixels is K.
And C2, back-propagating the loss value along the differentiable rendering path, and calculating the gradient corresponding to the parameters of each Gaussian primitive in the second candidate map.
And S311, updating each Gaussian primitive based on the gradient corresponding to each Gaussian primitive in the second candidate map to obtain the environment map of the target monocular environment image.
In this embodiment, the optimization formula is formula (8):

$$\theta \leftarrow \theta - \eta\, \nabla_{\theta} L \quad (8)$$

where θ represents the parameters of a Gaussian primitive and η is a preset learning rate.
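The loss of S310 and the update of S311 (formulas (7) and (8)) can be sketched as follows; the mean absolute difference is assumed here as the concrete form of the "difference value", since the embodiment does not fix the norm.

```python
import numpy as np

def l1_loss(rendered, observed):
    """Photometric loss in the spirit of formula (7): mean absolute
    difference over all K pixels (the L1 form is an assumption)."""
    return np.abs(rendered - observed).mean()

def sgd_step(theta, grad, eta=0.01):
    """Gradient-descent update of formula (8): theta <- theta - eta*grad."""
    return theta - eta * grad

# usage: toy 3-pixel images and one parameter update
loss = l1_loss(np.array([1.0, 2.0, 3.0]), np.array([1.0, 1.0, 5.0]))
theta = sgd_step(np.array([1.0]), np.array([0.5]), eta=0.1)
```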
S312, updating the reference map to be the environment map of the target monocular environment image, and updating the illuminance tag of the reference map to be the target illuminance.
When the SLAM system operates, the above steps S301 to S312 are repeated for each acquired monocular environment image, i.e., the positioning task (S301 to S306) and the update task (S307 to S312) are executed repeatedly for each monocular environment image.
According to the above technical scheme, monocular depth estimation replaces the RGB-D camera and the laser radar, reducing hardware dependence, lowering system complexity, and improving the applicability of the scheme. Gaussian splatting is used to construct the map, improving the accuracy and efficiency of map construction and realizing efficient three-dimensional reconstruction. A positioning speed of 5-10 frames per second can be achieved, meeting the real-time requirement of closed-scene intelligent transportation systems. The solution is low-cost and efficient: through the combination of a monocular camera and Gaussian splatting, it provides an economical and efficient SLAM solution.
Specifically, in the positioning method provided by the embodiment of the application, the scene map table includes historical environment maps under different illumination conditions. The reference map in the scene map table is continuously optimized by the update task, so that the historical environment map under each illumination condition represents the actual scene more accurately; the positioning task obtains the positioning map based on the illumination condition to update the camera pose parameters, which resolves the influence of different illumination on visual SLAM and improves positioning precision.
Further, by the acquisition method of the positioning map and the reference map shown in S305, the positioning map used for the positioning task is isolated from the reference map used for the update task, and the positioning map is discarded rather than saved, which prevents the map from being frequently updated and stabilizes visual SLAM operation.
Further, by the execution method of the update task shown in S307-S312, under the condition that a reference map exists, the reference map and the illuminance tag are updated based on the target monocular environment image, so that the update and optimization of the historical environment map under the current illumination condition are realized, and the accuracy and the integrity of the historical environment map in the scene map are improved, thereby improving the accuracy of the subsequent positioning task.
It should be noted that, the embodiment corresponding to fig. 3 is only a specific implementation procedure of an optional positioning method provided by the embodiment of the present application, and the present application may also be implemented by other specific procedures.
For example, in an alternative embodiment, the present application further includes the following steps A1 and A2:
A1, initializing Gaussian primitives by using 3D point cloud data if a positioning map does not exist, obtaining a target environment map, and initializing a camera pose based on the target environment map.
In this embodiment, the situation in which no positioning map exists includes the case where no illumination matching map exists in the scene map table, i.e., no historical environment map matches the target illuminance. In the initial state there is no historical environment map in the scene map table, which is also a case where no positioning map exists.
In this embodiment, the camera pose is initialized using a two-frame-based initialization method, and the specific initialization method can be seen in the prior art.
In this embodiment, when creating a new environment map, a Gaussian primitive set is initialized therein using the Gaussian splatting technique to ensure a high-precision representation of the new environment map. The illuminance of the target monocular environment image is used as the illuminance tag of the new environment map.
Specifically, the Gaussian primitives are initialized as follows. A Gaussian primitive is a parameterized three-dimensional object whose mean $\mu$, covariance matrix $\Sigma$, and color c represent its position, shape, and color. For each point in the point cloud a corresponding Gaussian primitive is initialized: the mean is set to the coordinates of the point, $\Sigma$ is computed from the k nearest neighbors of the point, and c is set to the corresponding pixel color. The SLAM method of the present invention uses a set of Gaussian primitives to represent an environment map. The new environment map is set as the reference map.
It can be seen that under the condition that the positioning map does not exist, the camera pose is initialized, after the camera pose is initialized, the positioning map is obtained based on the monocular environment image of the subsequent frame, and the camera pose is continuously updated based on the positioning map to realize positioning.
A2, if the reference map does not exist, initializing Gaussian primitives by using 3D point cloud data to obtain new Gaussian primitives, constructing a new environment map based on a new Gaussian primitive set, and storing the new environment map in a scene map table by taking target illuminance as an illuminance tag.
In this embodiment, the situation in which no reference map exists includes the case where no illumination matching map satisfying the update condition exists in the scene map table, where the update condition includes that the illumination difference from the target illuminance is smaller than the update difference threshold. It will be appreciated that in the absence of a positioning map, there is necessarily no reference map.
In this embodiment, a new environment map is generated by using a new gaussian primitive, and the new environment map is stored in a scene map table by using the target illuminance as an illuminance tag, and after a subsequent monocular environment image is acquired, the new environment map is continuously updated by using the new environment map as a reference map because the illuminance does not change in a short time.
For another example, S306 is an optional method for executing a positioning task, where in the case of a positioning map, updating the pose of the camera, that is, positioning, is implemented based on GICP registration method. In other alternative embodiments, the camera pose may also be updated by other specific updating methods, which is not limited to this embodiment.
For another example, in an alternative embodiment, in order to overcome the influence of illumination variation in a scene, the embodiment of the present application provides a specific implementation of S305: the positioning map is acquired and the map is updated based on a multi-scale illumination-aware Gaussian map fusion scheme, improving the robustness of the system. In the SLAM optimization and positioning process, the positioning map and the reference map are decoupled, which solves the problem of system instability caused by simultaneous positioning and optimization.
For another example, the step of acquiring the reference map in S305 and the update task of S307 to S312 are optional, and are used to update the historical environment maps and their illuminance tags in the scene map table based on the currently acquired target environment map.
For another example, the present application is not limited to a specific application scenario, and is particularly important for applications that need to move and work in a complex environment, such as robots, unmanned aerial vehicles, intelligent security devices, etc., and the following are several optional application scenarios, which are exemplified:
A household sweeping robot captures images of the household environment through a single camera mounted on the robot, and the distance (depth) of each image pixel is calculated by monocular depth estimation to form three-dimensional point cloud data. The point cloud data is optimized into a high-precision home map by Gaussian splatting. The robot not only knows the layout of the room, but also knows its position in the room in real time, so that it can intelligently plan a cleaning route, avoid obstacles, and clean efficiently.

In the field of unmanned aerial vehicle navigation, an unmanned aerial vehicle captures environment images in real time with an installed camera during flight, acquires three-dimensional information of the environment by monocular depth estimation, and constructs a three-dimensional map by Gaussian splatting. The unmanned aerial vehicle can navigate autonomously in complex indoor or outdoor environments, avoid collisions, plan an optimal flight path according to the real-time map, and execute tasks such as express delivery and environment monitoring.

In the field of intelligent security, a camera installed in a security system captures images of a monitored area in real time, generates three-dimensional point cloud data by monocular depth estimation, and builds an environment map by Gaussian splatting; the layout and dynamic changes of the monitored area are known in real time, and personnel positioning and behavior analysis are performed, improving the intelligence level of security monitoring.

Therefore, the invention combines monocular depth estimation and Gaussian splatting to realize environment sensing and autonomous positioning and navigation with a single RGB camera, providing a low-cost, effective visual SLAM solution for various application scenarios.
For another example, the specific method for acquiring the positioning map and the reference map in S305 includes a plurality of methods, and fig. 4 provides a flowchart of a specific implementation for acquiring the positioning map and the reference map in an embodiment of the present application, as shown in fig. 4, the method includes:
S401, acquiring illuminance of a target monocular environment image.
In this embodiment, there are various methods for obtaining the illuminance of the monocular environment image: for example, the illuminance may be obtained by arranging an illuminance sensor in the scene, or it may be estimated by analyzing the illumination characteristics of the target monocular environment image. The specific method for analyzing the target monocular environment image to estimate the illuminance comprises the following steps:
D1, extracting a brightness channel L from the target monocular environment image.
Specifically, the target monocular environment image is converted into a luminance image, and the luminance channel L is calculated using formula (9):

$$L = 0.299\, R + 0.587\, G + 0.114\, B \quad (9)$$

The mean value of the luminance channel is calculated using formula (10):

$$\mu_L = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} L(i, j) \quad (10)$$

The variance of the luminance channel is calculated using formula (11):

$$\sigma_L^{2} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( L(i, j) - \mu_L \right)^{2} \quad (11)$$

where H and W are the height and width of the target monocular environment image, respectively.
D2, converting the target monocular environment image into the HSV color space.

The mean value of the hue channel is calculated using formula (12):

$$\mu_{Hue} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} Hue(i, j) \quad (12)$$

The variance of the hue channel is calculated using formula (13):

$$\sigma_{Hue}^{2} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( Hue(i, j) - \mu_{Hue} \right)^{2} \quad (13)$$
D3, the features are weighted and summed using formula (14) to obtain the illuminance:

$$E = w_1 \mu_L + w_2 \sigma_L^{2} + w_3 \mu_{Hue} + w_4 \sigma_{Hue}^{2} + w_5 C + w_6\, EV + w_7 R_{shadow} + w_8 R_{highlight} \quad (14)$$

where:

The contrast C is calculated by formula (15):

$$C = \frac{L_{max} - L_{min}}{L_{max} + L_{min}} \quad (15)$$

The exposure value EV is calculated by formula (16):

$$EV = \log_2 \left( \frac{N^{2}}{t} \cdot \frac{100}{ISO} \right) \quad (16)$$

where N is the aperture value, t is the shutter speed, and ISO is the sensitivity.

The proportion of shadow regions $R_{shadow}$ is calculated by formula (17):

$$R_{shadow} = \frac{\#\{(i, j) \mid L(i, j) < T_{shadow}\}}{HW} \quad (17)$$

The proportion of highlight regions $R_{highlight}$ is calculated by formula (18):

$$R_{highlight} = \frac{\#\{(i, j) \mid L(i, j) > T_{highlight}\}}{HW} \quad (18)$$

where $w_1, \dots, w_8$ are the weight coefficients of the individual features, and $T_{shadow}$ and $T_{highlight}$ are the shadow and highlight luminance thresholds. The weight coefficient of each feature is adjusted according to the actual situation and experimental data so as to optimize the accuracy of the illuminance E.
In summary, after the monocular environment image is read in this step, a multi-scale analysis method is used to extract various features of the image, such as the luminance histogram, luminance mean and variance, hue and saturation, color balance, color distribution, shadow and highlight regions, exposure value (EV), and image contrast, and a quantized illuminance value, i.e., the illuminance, is calculated by integrating these features. Analyzing the various features at different scales ensures accurate perception and processing of complex illumination environments and improves the accuracy of the illuminance.
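The illuminance estimate of S401 can be sketched over a subset of the features of formulas (9)-(18); the feature weights and the shadow/highlight cutoffs below are illustrative assumptions, and EV is omitted since it requires exposure metadata.

```python
import numpy as np

def illuminance_features(rgb, w=(0.5, 0.2, 0.15, 0.15)):
    """Weighted illuminance estimate over four of the features of S401:
    luminance mean (formulas (9)-(10)), contrast (formula (15)), and
    shadow/highlight ratios (formulas (17)-(18)). `rgb` is an HxWx3
    float image in [0, 1]; weights and cutoffs are assumptions."""
    L = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    mu = L.mean()                                    # formula (10)
    contrast = (L.max() - L.min()) / (L.max() + L.min() + 1e-8)
    shadow = (L < 0.2).mean()                        # formula (17)
    highlight = (L > 0.8).mean()                     # formula (18)
    feats = np.array([mu, contrast, shadow, highlight])
    return float(np.dot(w, feats))                   # formula (14)

# usage: a uniform mid-gray image has mu = 0.5 and no contrast/extremes
val = illuminance_features(np.full((4, 4, 3), 0.5))
```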
S402, comparing the illuminance of the target monocular environment image with the illuminance tag of the target historical environment map to obtain the illuminance difference, and acquiring an illuminance matching map according to the illuminance difference.
In this embodiment, the target historical environment maps include at least one historical environment map in the scene map table; for example, the target historical environment maps may be the n environment maps in the scene map table whose interval between timestamp and current time is less than a preset time-interval threshold. As another example, the timestamps may be ignored and all historical environment maps in the scene map table taken directly as the target historical environment maps.
In this embodiment, if the illumination difference is not less than the preset illumination difference threshold, the illumination matching result is unmatched; if the illumination difference is less than the preset illumination difference threshold, the illumination matching result is matched.
Specifically, the specific implementation method of the step comprises the following steps:
and E1, acquiring an illuminance difference threshold.
And E2, calculating the illumination difference degree.
Specifically, for each target historical environment map, formula (19) is used to calculate the absolute value of the difference between the target illuminance $E_t$ of the target monocular environment image and the illuminance tag $E_m$ of the target historical environment map, as the illumination difference:

$$\Delta E = \left| E_t - E_m \right| \quad (19)$$

E3, obtaining the minimum value $\Delta E_{min}$ of the illumination differences between the target historical environment maps and the target monocular environment image, and comparing this minimum with the illumination difference threshold $T_E$: if $\Delta E_{min} < T_E$, it is determined that an illumination matching map exists; the illumination difference of each target historical environment map is then compared with the illumination difference threshold, and every target historical environment map whose illumination difference is smaller than the threshold is taken as an illumination matching map.

If $\Delta E_{min} \ge T_E$, it is determined that no illumination matching map exists, i.e., no positioning map exists.
It should be further noted that the illumination difference threshold may be dynamically adjusted according to the intensity of the current environmental illumination change: for example, when the system detects a large illumination change, the threshold can be appropriately increased, while in scenes with stable illumination the threshold can be reduced, improving the accuracy of illumination matching.
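The matching logic of E1-E3 together with the reference-map condition of S407 can be sketched as follows; the threshold values are illustrative assumptions.

```python
import numpy as np

def select_maps(target_lum, map_labels, diff_thresh=0.15, update_thresh=0.05):
    """Illumination matching per formula (19): maps whose illuminance
    label differs from the target by less than diff_thresh match; the
    closest match also below update_thresh becomes the reference map.
    Returns (matching indices, reference index or None)."""
    diffs = np.abs(np.asarray(map_labels) - target_lum)  # formula (19)
    matching = [i for i, d in enumerate(diffs) if d < diff_thresh]
    reference = None
    if matching:
        best = min(matching, key=lambda i: diffs[i])
        if diffs[best] < update_thresh:
            reference = best
    return matching, reference

# usage: two maps match; the closest also qualifies as reference map
matching, reference = select_maps(0.5, [0.48, 0.6, 0.9])
```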
S403, if the number of the illumination matching maps is equal to 1, taking the illumination matching maps as positioning maps.
S404, if the number of the illumination matching maps is larger than 1, acquiring two illumination matching maps with the smallest illumination difference degree with the target monocular environment image, and respectively serving as a first illumination matching map and a second illumination matching map.
In this embodiment, based on the illumination difference results of E2, the two illumination matching maps with the smallest illumination difference from the target monocular environment image are selected, namely the first illumination matching map $M_1$ with illuminance tag $E_1$ and the second illumination matching map $M_2$ with illuminance tag $E_2$.
S405, carrying out weighted fusion on Gaussian primitives at the same positions of the first illumination matching map and the second illumination matching map to obtain a fusion Gaussian primitive set.
In this embodiment, the fused Gaussian primitive set includes the fused Gaussian primitives at each position. The weighting coefficient $\alpha$ of the first illumination matching map $M_1$ and the weighting coefficient $(1-\alpha)$ of the second illumination matching map $M_2$ are both related to the illumination difference: the larger the illumination difference between an illumination matching map and the target monocular environment image, the smaller its weighting coefficient.

The weighting coefficient is determined using formula (20):

$$\alpha = \frac{\Delta E_2}{\Delta E_1 + \Delta E_2} \quad (20)$$

where $\Delta E_1 = |E_t - E_1|$, $\Delta E_2 = |E_t - E_2|$, $E_1$ represents the illuminance tag of the first illumination matching map $M_1$, and $E_2$ represents the illuminance tag of the second illumination matching map $M_2$.

The Gaussian primitive sets of the first illumination matching map $M_1$ and the second illumination matching map $M_2$ are weighted and fused using formula (21) to compute the Gaussian primitive set of the fused map, i.e., the fused Gaussian primitive set:

$$G_{fused} = \alpha\, G_1 + (1 - \alpha)\, G_2 \quad (21)$$

where $G_1$ and $G_2$ are the Gaussian primitive sets of the first and second illumination matching maps, respectively.
S406, constructing a positioning map based on the fused Gaussian primitive set.
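Steps S404 to S406 (formulas (20) and (21)) can be sketched as follows. This is a simplified sketch in which primitive parameters at corresponding positions are blended directly; a real fusion must first associate primitives across the two maps.

```python
import numpy as np

def fuse_maps(target_lum, lum1, lum2, params1, params2):
    """Weighted fusion of the two closest illumination matching maps:
    alpha (formula (20)) weights map 1 by its illuminance proximity to
    the target, and corresponding primitive parameters are blended as
    alpha * G1 + (1 - alpha) * G2 (formula (21))."""
    d1, d2 = abs(target_lum - lum1), abs(target_lum - lum2)
    alpha = d2 / (d1 + d2)      # closer map receives the larger weight
    return alpha * params1 + (1 - alpha) * params2

# usage: map 1 (label 0.4) is closer to target 0.5 than map 2 (label 0.7),
# so it receives weight 2/3
out = fuse_maps(0.5, 0.4, 0.7, np.array([0.0]), np.array([3.0]))
```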
S407, selecting an illumination matching map meeting the updating conditions as a reference map.
In this embodiment, the update condition includes that the illumination difference is smaller than an update threshold, wherein the update threshold is smaller than the illumination difference threshold. Optionally, selecting an illumination matching map with the smallest illumination difference degree from the illumination matching maps with the illumination difference degree smaller than the updating threshold value as the reference map.
According to the above technical scheme, the method takes the illuminance of the collected target monocular environment image, i.e., the target illuminance, as a reference and determines whether a historical environment map meeting the illumination matching condition, i.e., an illumination matching map, exists; an illumination matching map is an environment map whose illumination conditions are similar to those of the target monocular environment image. The positioning map is determined based on the illumination matching map, avoiding the low positioning precision caused by environmental illumination changes. Further, when there are multiple illumination matching maps, the positioning map is determined by fusing them (this embodiment uses only two illumination matching maps as an example), and smooth transition and fine adjustment are performed using the characteristics of Gaussian splatting.
Thus, to obtain the positioning map by fusion, the system selects the two maps whose illuminance is closest to the current illuminance. The fusion process performs weighted addition of the Gaussian primitive sets based on weighting coefficients, adjusting and merging the Gaussian primitives of the historical environment maps to be fused (namely the first illumination matching map and the second illumination matching map), where the weighting coefficients are determined by the illumination difference degrees. In other words, the illuminance of the current target environment image steers the fusion of the Gaussian primitives, so that the fused illumination matching map fits the current target environment image; smooth transition and fine adjustment are performed using the characteristics of Gaussian splatting, generating a refined map that reflects the current scene. The fused map serves as the positioning map for the SLAM positioning task. The adaptation of the positioning map used for positioning (camera pose update) to the environmental conditions of the current scene is thereby improved, which improves positioning accuracy and precision under the current lighting conditions.
Further, when an illumination matching map exists, a historical environment map is also selected as the reference map based on the update condition for the subsequent map-update task: the reference map is updated based on the currently input target monocular environment image, optimizing the environment map under the current illumination condition and thus improving the accuracy of subsequent positioning tasks.
Furthermore, the positioning map and the reference map are acquired through different methods; decoupling their acquisition isolates the positioning map from the reference map and ensures the stability of the SLAM algorithm.
In summary, the application provides a visual SLAM method combining monocular capture with Gaussian splatting (GS) technology. First, by establishing multiple environment maps adapted to different illumination conditions, the positioning accuracy of visual SLAM under complex illumination conditions is greatly improved. To counter the influence of illumination changes on positioning precision, the application further provides a mechanism separating the positioning map from the reference map: during positioning, the map best matching the current illumination condition is dynamically selected, while the environment map is incrementally updated in the background, significantly improving the stability and reliability of visual SLAM in all-weather operation.
The method adopts 3D Gaussian splatting as the representation of the scene map. Compared with traditional dense point cloud or mesh representations, 3D Gaussian splatting has notable advantages in data storage and computational efficiency. Gaussian splatting describes the environment structure more naturally through continuous probability distributions, supports efficient rendering and update operations, and suits SLAM scenarios with high real-time requirements. Combined with depth information produced by monocular depth estimation, 3D Gaussian splatting can express scene details accurately and efficiently, further enhancing the system's expressiveness in complex environments. The method requires no RGB-D camera, markedly reducing hardware cost, and improves illumination adaptability through its optimization algorithm and data representation, achieving accurate positioning and efficient map construction in dynamic, complex illumination environments.
The invention can be widely applied in the field of artificial intelligence and is particularly suitable for intelligent mobile devices: monocular environment images are acquired through the camera, and by combining monocular depth estimation with Gaussian splatting, an environment map is constructed from monocular images in real time, realizing efficient positioning and environment perception in all-weather operation while ensuring the accuracy and stability of visual SLAM. The invention aims to provide a low-cost, adaptable visual SLAM solution through the combination of monocular depth estimation and Gaussian splatting. Depth information for each image pixel is inferred with a monocular depth estimation model to generate three-dimensional point cloud data, and Gaussian splatting replaces NeRF to optimize the generated point cloud, improving processing speed, ensuring the SLAM system runs at 5 to 10 frames per second, and meeting the real-time positioning requirements of intelligent devices.
The foregoing describes a positioning method provided by an embodiment of the present application, and an apparatus for performing the foregoing positioning method will be described below.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a positioning device according to an embodiment of the application. As shown in fig. 5, the positioning device 500 includes:
A point cloud generating unit 501, configured to acquire 3D point cloud data of a target environment image;
A primitive initializing unit 502, configured to initialize gaussian primitives to obtain new gaussian primitives by using the 3D point cloud data;
An illumination matching unit 503, configured to obtain, as an illumination matching map, a historical environment map satisfying a preset illumination matching condition from a scene map set based on a target illuminance, where the illumination matching condition at least includes that the illuminance difference between an illuminance tag and the target illuminance is smaller than a preset illuminance difference threshold;
A positioning map obtaining unit 504, configured to obtain a positioning map based on the illumination matching map;
and the positioning unit 505 is used for updating the pose of the camera based on the new Gaussian primitive and the Gaussian primitive of the positioning map.
In one possible implementation, when the point cloud generating unit is configured to acquire 3D point cloud data of the target environment image, the point cloud generating unit is specifically configured to:
receiving a monocular environment image acquired by monocular image acquisition equipment as a target environment image;
obtaining a depth map of the target environment image through a pre-trained depth estimation model;
And mapping the depth map of the target environment image into a 3D point cloud based on a preset camera internal parameter to obtain the 3D point cloud data.
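The mapping from a depth map to a 3D point cloud via camera intrinsics can be sketched as follows. This is an illustrative Python sketch of the standard pinhole back-projection, not the claimed implementation; the parameter names are assumptions.

```python
import numpy as np

def depth_to_point_cloud(depth, K):
    """Back-project a depth map into a 3D point cloud using pinhole
    intrinsics K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].

    Returns an (H*W, 3) array of camera-frame points.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    h, w = depth.shape
    # Pixel coordinate grids: u along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```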
In one possible implementation, when the illumination matching unit is configured to obtain, from the scene map set based on the target illuminance, a historical environment map satisfying the preset illumination matching condition, it is specifically configured to:
acquiring illuminance of the target environment image based on preset illumination characteristics;
For each target historical environment map, comparing the illuminance tag of the target historical environment map with the target illuminance to obtain the illuminance difference degree of the target historical environment map, wherein the target historical environment map comprises at least one historical environment map in the scene map;
And comparing the illumination difference degree with the illumination difference degree threshold value for each target historical environment map, determining that the target historical environment map meets the illumination matching condition if the illumination difference degree is smaller than the illumination difference degree threshold value, and determining that the target historical environment map does not meet the illumination matching condition if the illumination difference degree is not smaller than the illumination difference degree threshold value.
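The per-map matching check described above can be sketched as follows. The absolute-difference measure is an assumption for illustration; the text does not fix the exact difference-degree metric.

```python
def illumination_matches(illuminance_tag, target_illuminance, diff_threshold):
    """Illumination matching condition: the difference degree between a
    map's illuminance tag and the target illuminance must be smaller than
    the difference-degree threshold. Returns (matches, difference_degree).
    """
    diff = abs(illuminance_tag - target_illuminance)
    return diff < diff_threshold, diff
```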
In one possible implementation, when the positioning map obtaining unit is configured to obtain the positioning map based on the illumination matching map, it is specifically configured to:
if the number of the illumination matching maps is equal to 1, taking the illumination matching maps as the positioning maps;
If the number of the illumination matching maps is greater than 1, acquiring two historical environment maps with the smallest illumination difference degree from all the illumination matching maps, and respectively serving as a first illumination matching map and a second illumination matching map;
Respectively carrying out weighted fusion on Gaussian primitives at the same positions of the first illumination matching map and the second illumination matching map based on corresponding weighting coefficients to obtain a fusion Gaussian primitive set, wherein the weighting coefficients of the Gaussian primitives of a target illumination matching map are inversely related to the illumination difference degree of the target illumination matching map, and the target illumination matching map comprises the first illumination matching map and the second illumination matching map;
and constructing the positioning map based on the fusion Gaussian primitive set.
In one possible implementation, when the positioning unit is configured to update the camera pose based on the new Gaussian primitive and the Gaussian primitive of the positioning map, it is specifically configured to:
If the positioning map exists, registering the new Gaussian primitive with the Gaussian primitive of the positioning map by using a preset point cloud-based registration algorithm to obtain a rotation matrix;
And updating the camera pose by using the rotation matrix.
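The text leaves the point-cloud-based registration algorithm unspecified. As a hedged illustration, the rotation matrix between corresponding point sets can be estimated with the Kabsch (SVD) algorithm, a standard building block of ICP-style registration between the new primitives and the map primitives:

```python
import numpy as np

def kabsch_rotation(src, dst):
    """Estimate the rotation aligning src to dst (corresponding Nx3 point
    sets) via SVD. The result can be used to update the camera pose."""
    src_c = src - src.mean(axis=0)   # center both point sets
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c              # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflection
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T
```

A full registration pipeline would iterate correspondence search and this alignment step (ICP), and estimate a translation as well; only the rotation estimate is sketched here.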
In one possible implementation, the positioning device further comprises an updating unit. The updating unit is configured to, after a historical environment map satisfying the preset illumination matching condition is obtained from the scene map set based on the target illuminance: select an illumination matching map satisfying a preset update condition as a reference map, where the update condition includes that the illumination difference degree is smaller than a preset update threshold and the update threshold is smaller than the illumination difference degree threshold; update the reference map based on the new Gaussian primitive; and update the illuminance tag of the reference map based on the target illuminance.
In one possible implementation, the updating unit is configured to: add the new Gaussian primitives to the reference map to obtain a first candidate map; perform heuristic pruning on the number of Gaussian primitives in the first candidate map to obtain a second candidate map; render, based on the second candidate map, a rendered image under the camera pose using a differentiable rendering pipeline; compare the rendered image with the target monocular environment image and calculate a loss value; back-propagate the loss value and calculate the gradient corresponding to each Gaussian primitive in the second candidate map; update each Gaussian primitive based on its corresponding gradient to obtain the environment map of the target environment image; and update the reference map to the environment map of the target environment image.
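A toy sketch of one iteration of this render-compare-backpropagate loop is given below. It is only illustrative: the identity "renderer" and mean-squared loss stand in for the differentiable Gaussian-splatting renderer and photometric loss described above, and the scalar parameter array stands in for the full Gaussian primitive parameters.

```python
import numpy as np

def update_map_step(primitives, target, lr=0.1):
    """One gradient step of the map-update loop: 'render' the primitives
    (here the identity function), compare with the target image values,
    and update the primitives along the negative loss gradient."""
    rendered = primitives             # toy differentiable renderer
    residual = rendered - target
    loss = 0.5 * np.mean(residual ** 2)
    grad = residual / residual.size   # d(loss)/d(primitives)
    return primitives - lr * grad, loss
```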
In one possible implementation, the positioning device further comprises a map initializing unit, configured to, after a historical environment map satisfying the preset illumination matching condition is obtained from the scene map set based on the target illuminance, if no illumination matching map satisfying the update condition exists, construct a new environment map based on the new Gaussian primitive and store the new environment map to the scene map set with the target illuminance as its illuminance tag.
The embodiment of the application also provides an electronic device. Referring to fig. 6, a schematic diagram of an electronic device suitable for implementing embodiments of the present application is shown. The electronic device in the embodiment of the present application may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a PDA (personal digital assistant), and a PAD (tablet computer), and fixed terminals such as a desktop computer. The electronic device shown in fig. 6 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the application.
As shown in fig. 6, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the state where the electronic device is powered on, various programs and data necessary for the operation of the electronic device are also stored in the RAM 603. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 608 including, for example, a memory card and hard disk; and a communication device 609. The communication device 609 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While fig. 6 shows an electronic device having various means, it should be understood that not all of the illustrated means are required; more or fewer means may alternatively be implemented or provided.
Embodiments of the present application also provide a computer program product comprising computer readable instructions, which when executed on an electronic device, cause the electronic device to implement any of the positioning methods provided by the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, which carries one or more computer programs, and when the one or more computer programs are executed by the electronic device, the electronic device can realize any positioning method provided by the embodiment of the application.
It should be further noted that the above-described apparatus embodiments are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the application, the connection relations between modules indicate communication connections between them, which may be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus the necessary general-purpose hardware, or of course by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, functions performed by computer programs can easily be implemented by corresponding hardware, and the specific hardware structures implementing the same function can vary, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software program implementation is in more cases the preferred embodiment. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disk of a computer, comprising several instructions for causing a computer device (which may be a personal computer, a training device, a network device, etc.) to perform the methods according to the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a training device or data center containing an integration of one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
Claims (10)
1. A positioning method, comprising:
Acquiring 3D point cloud data of a target environment image;
initializing a Gaussian primitive by using the 3D point cloud data to obtain a new Gaussian primitive;
acquiring, based on a target illuminance, a historical environment map satisfying a preset illumination matching condition from a scene map set as an illumination matching map, wherein the illumination matching condition at least includes that the illuminance difference between an illuminance tag and the target illuminance is smaller than a preset illuminance difference threshold, the target illuminance is the ambient illuminance when the target environment image is captured, the scene map set comprises historical environment maps with illuminance tags, and the illuminance tag of a historical environment map represents the ambient illuminance when the historical environment map was generated;
acquiring a positioning map based on the illumination matching map;
and updating the camera pose based on the new Gaussian primitive and the Gaussian primitive of the positioning map.
2. The positioning method according to claim 1, wherein the acquiring 3D point cloud data of the target environment image includes:
receiving a monocular environment image acquired by monocular image acquisition equipment as a target environment image;
obtaining a depth map of the target environment image through a pre-trained depth estimation model;
And mapping the depth map of the target environment image into a 3D point cloud based on a preset camera internal parameter to obtain the 3D point cloud data.
3. The positioning method according to claim 1, wherein the obtaining, based on the target illuminance, a historical environment map satisfying a preset illumination matching condition from the scene map set includes:
acquiring illuminance of the target environment image based on preset illumination characteristics;
For each target historical environment map, comparing the illuminance tag of the target historical environment map with the target illuminance to obtain the illuminance difference degree of the target historical environment map, wherein the target historical environment map comprises at least one historical environment map in the scene map;
And comparing the illumination difference degree with the illumination difference degree threshold value for each target historical environment map, determining that the target historical environment map meets the illumination matching condition if the illumination difference degree is smaller than the illumination difference degree threshold value, and determining that the target historical environment map does not meet the illumination matching condition if the illumination difference degree is not smaller than the illumination difference degree threshold value.
4. The positioning method of claim 1, wherein the obtaining a positioning map based on the illumination matching map comprises:
if the number of the illumination matching maps is equal to 1, taking the illumination matching maps as the positioning maps;
If the number of the illumination matching maps is greater than 1, acquiring two historical environment maps with the smallest illumination difference degree from all the illumination matching maps, and respectively serving as a first illumination matching map and a second illumination matching map;
Respectively carrying out weighted fusion on Gaussian primitives at the same positions of the first illumination matching map and the second illumination matching map based on corresponding weighting coefficients to obtain a fusion Gaussian primitive set, wherein the weighting coefficients of the Gaussian primitives of a target illumination matching map are inversely related to the illumination difference degree of the target illumination matching map, and the target illumination matching map comprises the first illumination matching map and the second illumination matching map;
and constructing the positioning map based on the fusion Gaussian primitive set.
5. The positioning method of claim 1, wherein the updating the camera pose based on the new gaussian primitive and the gaussian primitive of the positioning map comprises:
If the positioning map exists, registering the new Gaussian primitive with the Gaussian primitive of the positioning map by using a preset point cloud-based registration algorithm to obtain a rotation matrix;
And updating the camera pose by using the rotation matrix.
6. The positioning method according to claim 5, further comprising, after the obtaining, based on the target illuminance, a historical environment map satisfying a preset illumination matching condition from the scene map set:
if an illumination matching map meeting a preset updating condition exists, selecting an illumination matching map meeting the updating condition as a reference map, wherein the updating condition comprises that the illumination difference is smaller than a preset updating threshold value, and the updating threshold value is smaller than the illumination difference threshold value;
Updating the reference map based on the new gaussian primitive;
And updating an illuminance tag of the reference map based on the target illuminance.
7. The positioning method of claim 6, wherein updating the reference map based on the new gaussian primitive comprises:
adding the new Gaussian primitive to the reference map to obtain a first candidate map;
Heuristic pruning is carried out on the number of Gaussian primitives in the first candidate map to obtain a second candidate map;
rendering, based on the second candidate map, a rendered image under the camera pose by using a differentiable rendering pipeline;
comparing the rendered image with a target monocular environment image, calculating a loss value, counter-propagating the loss value, and calculating a gradient corresponding to each Gaussian primitive in the second candidate map;
Updating each Gaussian primitive based on the gradient corresponding to each Gaussian primitive in the second candidate map to obtain an environment map of the target environment image;
Updating the reference map to be an environment map of the target environment image.
8. The positioning method according to claim 6, further comprising, after the obtaining, based on the target illuminance, a historical environment map satisfying a preset illumination matching condition from the scene map set:
If the illumination matching map meeting the updating conditions does not exist, a new environment map is built based on the new Gaussian primitives;
And storing the new environment map to the scene map by taking the target illuminance as an illuminance tag.
9. A positioning device, comprising:
The point cloud generation unit is used for acquiring 3D point cloud data of the target environment image;
the primitive initializing unit is used for initializing the Gaussian primitive by using the 3D point cloud data to obtain a new Gaussian primitive;
The illumination matching unit is configured to acquire, based on a target illuminance, a historical environment map satisfying a preset illumination matching condition from a scene map set as an illumination matching map, wherein the illumination matching condition at least includes that the illuminance difference between an illuminance tag and the target illuminance is smaller than a preset illuminance difference threshold;
The positioning map acquisition unit is used for acquiring a positioning map based on the illumination matching map;
And the positioning unit is used for updating the camera pose based on the new Gaussian primitive and the Gaussian primitive of the positioning map.
10. An electronic device comprising at least one processor and a memory coupled to the processor, wherein:
the memory is used for storing a computer program;
The processor is configured to execute the computer program to enable the electronic device to implement the positioning method according to any one of claims 1 to 8.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510007135.9A CN120070568A (en) | 2025-01-02 | 2025-01-02 | Positioning method and related device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120070568A true CN120070568A (en) | 2025-05-30 |
Family
ID=95790065
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |