CN110599542A

CN110599542A - Method and device for geometric-region-oriented adaptive VSLAM (visual simultaneous localization and mapping) local mapping

Info

Publication number: CN110599542A
Application number: CN201910817262.XA
Authority: CN
Inventors: 朱州
Original assignee: Beijing Yingpu Technology Co Ltd
Current assignee: Beijing Yingpu Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2019-12-20

Abstract

The application discloses a method and a device for geometric-region-oriented adaptive VSLAM local mapping, and belongs to the field of mapping. The method comprises the following steps: acquiring an experimental data set, and eliminating dynamic noise points from the images in the experimental data set by using an RCNN (region-based convolutional neural network); inputting the images from which the dynamic noise points have been removed into a CNN (convolutional neural network), determining attributes of local point clouds and voxels in a local space, calculating a homography matrix through pixel matching and a back propagation algorithm, and estimating the camera pose according to the homography matrix; and performing local mapping according to the camera pose, and adaptively optimizing the local mapping according to the attributes of the local point cloud and the voxels. The device comprises an acquisition module, a calculation module and a mapping module. The method is computationally efficient, accurate and generalizes well, and it enhances the effectiveness of the features, so that the output result is more credible.

Description

Method and device for geometric-region-oriented adaptive VSLAM (visual simultaneous localization and mapping) local mapping

Technical Field

The present application relates to the field of map construction, and in particular, to a method and an apparatus for local mapping of a geometric-region-oriented adaptive VSLAM.

Background

VSLAM (Visual Simultaneous Localization and Mapping) refers to the process of calculating one's own position and constructing an environment map from the information of a visual sensor. It addresses positioning and map construction while moving in an unknown environment, and does so relatively accurately and quickly. A VSLAM model mainly comprises sensor data preprocessing, a front end, a back end, loop closure detection and mapping. The front end, also called VO (visual odometry), mainly studies how to quantitatively estimate the inter-frame camera motion from adjacent frame images. The motion trajectory of the camera carrier (such as a robot or a vehicle) is formed by connecting the motions between adjacent frames, which solves the positioning problem. Then, according to the estimated camera position at each moment, the position of the spatial point corresponding to each pixel is calculated, completing the construction of the map.

The output of a VSLAM model is mainly affected by conditions such as real-time requirements, the environment and illumination, and visual odometry is prone to accumulated error, which reduces accuracy and in turn degrades the accuracy of the constructed map to some degree.

Disclosure of Invention

It is an object of the present application to overcome the above problems or to at least partially solve or mitigate the above problems.

According to an aspect of the present application, there is provided a method for local mapping of adaptive VSLAM for a geometric region, including:

acquiring an experimental data set, and eliminating dynamic noise points of images in the experimental data set by adopting an RCNN (region-based convolutional neural network);

inputting the image without the dynamic noise points into a CNN convolutional neural network, determining attributes of local point clouds and voxels in a local space, calculating a homography matrix through a pixel matching and back propagation algorithm, and estimating the pose of the camera according to the homography matrix;

and carrying out local mapping according to the camera pose, and carrying out self-adaptive optimization on the local mapping according to the attributes of the local point cloud and the voxels.

Optionally, the homography matrix is calculated by a pixel matching and back propagation algorithm, including:

and setting the loss function as the distance between the feature point of the previous frame and the corresponding feature point of the current frame after the change of the homography matrix, and executing a back propagation algorithm according to the loss function to calculate the homography matrix.
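The loss described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the homography is parameterized by eight values with the last entry fixed to 1, the gradients that back propagation would supply are derived by hand, plain gradient descent is used as the update rule, and the names `project`, `loss_and_grad` and `fit_homography`, the learning rate and the step count are all hypothetical. Point coordinates are assumed to be normalized to roughly [-1, 1].

```python
def project(h, x, y):
    """Apply homography h (8 parameters, last entry fixed to 1) to point (x, y)."""
    w = h[6] * x + h[7] * y + 1.0
    px = (h[0] * x + h[1] * y + h[2]) / w
    py = (h[3] * x + h[4] * y + h[5]) / w
    return px, py, w


def loss_and_grad(h, prev_pts, cur_pts):
    """Mean squared distance between warped previous-frame points and
    current-frame points, plus its gradient with respect to h."""
    n = len(prev_pts)
    loss = 0.0
    grad = [0.0] * 8
    for (x, y), (xc, yc) in zip(prev_pts, cur_pts):
        px, py, w = project(h, x, y)
        rx, ry = px - xc, py - yc          # residual in the current frame
        loss += (rx * rx + ry * ry) / n
        scale = 2.0 / (n * w)
        common = rx * px + ry * py         # shared term for the last row of h
        partials = [rx * x, rx * y, rx,
                    ry * x, ry * y, ry,
                    -common * x, -common * y]
        for i in range(8):
            grad[i] += scale * partials[i]
    return loss, grad


def fit_homography(prev_pts, cur_pts, lr=0.2, steps=2000):
    """Gradient descent on the reprojection loss, starting from the identity."""
    h = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        _, grad = loss_and_grad(h, prev_pts, cur_pts)
        h = [hi - lr * gi for hi, gi in zip(h, grad)]
    loss, _ = loss_and_grad(h, prev_pts, cur_pts)
    return h, loss


if __name__ == "__main__":
    # Synthetic matches: a 5x5 grid warped by a known, mild homography.
    grid = [(-1.0 + 0.5 * i, -1.0 + 0.5 * j) for i in range(5) for j in range(5)]
    h_true = [1.0, 0.05, 0.1, -0.03, 0.95, -0.05, 0.02, -0.01]
    warped = [project(h_true, x, y)[:2] for x, y in grid]
    h_est, final_loss = fit_homography(grid, warped)
    print("estimated parameters:", [round(v, 4) for v in h_est])
    print("final loss:", final_loss)
```

In a real pipeline the matched points would come from the pixel matching step, and an automatic differentiation framework would normally replace the hand-written gradient.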

Optionally, performing adaptive optimization on the local mapping according to the attributes of the local point cloud and the voxels, including:

dividing the local point cloud into a plurality of sub-areas, acquiring response intensity from the attribute of the voxel, and filtering the feature points in each sub-area according to the response intensity to obtain a local feature map;

and judging whether the number of the feature points in the local point cloud is smaller than the preset number q of the registration points, if so, expanding the local feature map by increasing edge adjacent voxel space and/or angle adjacent voxel space for the sub-area where the registration points in the local point cloud are located.
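The expansion step can be illustrated with a small sketch. It assumes, as an interpretation that the text does not spell out, that voxels are indexed by integer triples, that an "edge adjacent" voxel shares exactly an edge with the original voxel (two index offsets are non-zero) and that an "angle adjacent" voxel shares only a corner (all three offsets are non-zero). The names `neighbor_offsets` and `expand_local_map` are hypothetical.

```python
from itertools import product


def neighbor_offsets(kind):
    """Index offsets of neighbouring voxels.

    'edge'   -> voxels sharing an edge   (exactly two non-zero offsets, 12 voxels)
    'corner' -> voxels sharing a corner  (three non-zero offsets, 8 voxels)
    """
    nonzero = {"edge": 2, "corner": 3}[kind]
    return [o for o in product((-1, 0, 1), repeat=3)
            if sum(c != 0 for c in o) == nonzero]


def expand_local_map(active_voxels, points_by_voxel, q, kinds=("edge", "corner")):
    """Return the voxel set of the local feature map, enlarged if needed.

    active_voxels   -- set of (i, j, k) voxels that currently hold registration points
    points_by_voxel -- dict mapping a voxel to its list of feature points
    q               -- preset number of registration points
    """
    count = sum(len(points_by_voxel.get(v, [])) for v in active_voxels)
    if count >= q:
        return set(active_voxels)          # enough feature points, no expansion
    expanded = set(active_voxels)
    for i, j, k in active_voxels:
        for kind in kinds:
            for di, dj, dk in neighbor_offsets(kind):
                expanded.add((i + di, j + dj, k + dk))
    return expanded


if __name__ == "__main__":
    voxels = {(0, 0, 0)}
    points = {(0, 0, 0): ["p1", "p2"]}
    print(len(expand_local_map(voxels, points, q=10)))
```

Passing `kinds=("edge",)` or `kinds=("corner",)` covers the "and/or" alternatives in the text.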

Optionally, filtering the feature points in each sub-region according to the response strength to obtain a local feature map, including:

and in each sub-area, reserving the characteristic points with the response intensity larger than the specified value as a characteristic map, and filtering out the rest characteristic points.
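A minimal sketch of this filtering step follows. It assumes that sub-regions are cubic cells of a regular grid and that each feature point carries its response intensity as a fourth value; both are assumptions, since the text does not fix how the point cloud is divided. The function name `filter_by_response` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def filter_by_response(points, cell_size, min_response):
    """Group feature points into grid sub-regions and keep the strong ones.

    points       -- iterable of (x, y, z, response)
    cell_size    -- edge length of one cubic sub-region (assumed layout)
    min_response -- the specified value; points at or below it are filtered out
    Returns a dict mapping sub-region index (i, j, k) to the retained points.
    """
    regions = defaultdict(list)
    for x, y, z, response in points:
        key = (math.floor(x / cell_size),
               math.floor(y / cell_size),
               math.floor(z / cell_size))
        if response > min_response:
            regions[key].append((x, y, z, response))
    return dict(regions)


if __name__ == "__main__":
    cloud = [
        (0.1, 0.2, 0.3, 0.90),
        (0.4, 0.1, 0.2, 0.20),   # weak response, filtered out
        (1.5, 0.2, 0.1, 0.70),
        (-0.5, 0.2, 0.1, 0.95),
    ]
    for region, kept in sorted(filter_by_response(cloud, 1.0, 0.5).items()):
        print(region, kept)
```

A per-region threshold could be substituted for the single `min_response` if the specified value is meant to vary between sub-regions.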

Optionally, for the images in the experimental data set, removing dynamic noise points by using an RCNN (region-based convolutional neural network), including:

by means of RCNN combined with geometric features, for each current frame, finding N key frames with the highest overlapping degree, acquiring the projection of feature points of the key frames on the current frame, and calculating the position and depth value of the projection; and judging whether the depth value exceeds a set threshold value, if so, judging that the feature point is a dynamic noise point, calculating the variance between the dynamic noise point and the surrounding points, and removing the dynamic noise point of which the variance is smaller than a specified value.
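The geometric part of this check can be sketched as below. Several points are assumptions made only for illustration: a pinhole camera model, key-frame feature points already expressed in the current camera's coordinates, a dense depth map for the current frame, the "depth value" test read as the difference between projected and measured depth, and the variance taken over the depths in a small window around the projection. The RCNN stage is not modelled, and every function name is hypothetical.

```python
from statistics import pvariance


def project_to_pixel(point_cam, intrinsics, shape):
    """Pinhole projection of a 3D point; returns (row, col, depth) or None."""
    fx, fy, cx, cy = intrinsics
    x, y, z = point_cam
    if z <= 0:
        return None                      # behind the camera
    col = int(round(fx * x / z + cx))
    row = int(round(fy * y / z + cy))
    rows, cols = shape
    if not (0 <= row < rows and 0 <= col < cols):
        return None                      # projects outside the image
    return row, col, z


def local_depth_variance(depth_map, row, col, radius=1):
    """Population variance of the measured depths in a window around (row, col)."""
    rows, cols = len(depth_map), len(depth_map[0])
    values = [depth_map[r][c]
              for r in range(max(0, row - radius), min(rows, row + radius + 1))
              for c in range(max(0, col - radius), min(cols, col + radius + 1))]
    return pvariance(values)


def points_to_remove(points_cam, depth_map, intrinsics, depth_threshold, variance_limit):
    """Feature points flagged as dynamic noise and selected for removal."""
    shape = (len(depth_map), len(depth_map[0]))
    removed = []
    for point in points_cam:
        pixel = project_to_pixel(point, intrinsics, shape)
        if pixel is None:
            continue
        row, col, projected_depth = pixel
        is_dynamic = abs(depth_map[row][col] - projected_depth) > depth_threshold
        if is_dynamic and local_depth_variance(depth_map, row, col) < variance_limit:
            removed.append(point)
    return removed


if __name__ == "__main__":
    depth = [[2.0] * 5 for _ in range(5)]
    candidates = [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0)]
    print(points_to_remove(candidates, depth, (1.0, 1.0, 2.0, 2.0), 0.5, 0.1))
```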

According to another aspect of the present application, there is provided a local mapping apparatus for adaptive VSLAM facing a geometric region, including:

the acquisition module is configured to acquire an experimental data set, and eliminate dynamic noise points of images in the experimental data set by adopting an RCNN (region-based convolutional neural network);

the calculation module is configured to input the image with the dynamic noise points removed into a CNN convolutional neural network, determine attributes of local point clouds and voxels in local space, calculate a homography matrix through a pixel matching and back propagation algorithm, and estimate a camera pose according to the homography matrix;

a mapping module configured to perform local mapping according to the camera pose, the local mapping being adaptively optimized according to the attributes of the local point cloud and the voxels.

Optionally, the computing module is specifically configured to:

Optionally, the mapping module includes:

a mapping unit configured to perform local mapping according to the camera pose;

the self-adaptive unit is configured to divide the local point cloud into a plurality of sub-areas, obtain response intensity from attributes of the voxels, and filter feature points in each sub-area according to the response intensity to obtain a local feature map; and judging whether the number of the feature points in the local point cloud is smaller than the preset number q of the registration points, if so, expanding the local feature map by increasing edge adjacent voxel space and/or angle adjacent voxel space for the sub-area where the registration points in the local point cloud are located.

Optionally, the adaptation unit is specifically configured to:

Optionally, the acquisition module is specifically configured to:

According to yet another aspect of the application, there is provided a computing device comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor implements the method as described above when executing the computer program.

According to yet another aspect of the application, a computer-readable storage medium, preferably a non-volatile readable storage medium, is provided, having stored therein a computer program which, when executed by a processor, implements a method as described above.

According to yet another aspect of the application, there is provided a computer program product comprising computer readable code which, when executed by a computer device, causes the computer device to perform the method described above.

According to the technical scheme, the experimental data set is obtained, dynamic noise points of an image in the experimental data set are eliminated through RCNN, the image is input into CNN, attributes of local point cloud and voxels in a local space are determined, a homography matrix is calculated through pixel matching and a back propagation algorithm, the camera pose is estimated according to the homography matrix, local mapping is conducted according to the camera pose, self-adaptive optimization is conducted on the local mapping according to the attributes of the local point cloud and the voxels, the calculation is efficient, the accuracy rate is high, the generalization capability is strong, the feature effectiveness is enhanced, and the output result is enabled to be more credible. In addition, the accuracy of the output result of the model can be further improved by rejecting dynamic noise data.

The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.

Drawings

Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:

FIG. 1 is a flow diagram of a geometric region-oriented adaptive VSLAM local mapping method according to one embodiment of the present application;

FIG. 2 is a flow diagram of a geometric region-oriented adaptive VSLAM local mapping method according to another embodiment of the present application;

FIG. 3 is a flow diagram illustrating a geometric region-oriented adaptive VSLAM local mapping process according to another embodiment of the present application;

FIG. 4 is a block diagram of a geometric region-oriented adaptive VSLAM local mapping apparatus according to another embodiment of the present application;

FIG. 5 is a block diagram of a computing device according to another embodiment of the present application;

FIG. 6 is a structural diagram of a computer-readable storage medium according to another embodiment of the present application.

Detailed Description

Fig. 1 is a flow chart of a geometric region-oriented adaptive VSLAM local mapping method according to one embodiment of the present application. Referring to fig. 1, the method includes:

101: acquiring an experimental data set, and removing dynamic noise points of images in the experimental data set by adopting RCNN (Regions with Convolutional Neural Network features);

102: inputting the image without the dynamic noise points into a CNN (Convolutional Neural Network), determining attributes of local point clouds and voxels in a local space, calculating a homography matrix through a pixel matching and back propagation algorithm, and estimating the pose of the camera according to the homography matrix;

103: and performing local mapping according to the pose of the camera, and performing self-adaptive optimization on the local mapping according to the attributes of the local point cloud and the voxels.

The method provided by this embodiment describes a single VO (visual odometry) pass, which takes two adjacent key frame images as input. In practical applications, multiple VO passes are performed, each processing two adjacent key frame images in the same way as described above, so the details are not repeated here.
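How the results of successive VO passes combine into one trajectory can be sketched as follows. The example is simplified to planar motion, with each pass assumed to yield a relative motion (forward, sideways, rotation) expressed in the previous frame's coordinates; a full system would compose 3D rigid transforms instead. The names `compose` and `accumulate` are hypothetical.

```python
import math


def compose(pose, delta):
    """Apply a relative motion, given in the pose's own frame, to a planar pose.

    pose  -- (x, y, theta) in world coordinates
    delta -- (dx, dy, dtheta) estimated by one VO pass
    """
    x, y, theta = pose
    dx, dy, dtheta = delta
    return (x + math.cos(theta) * dx - math.sin(theta) * dy,
            y + math.sin(theta) * dx + math.cos(theta) * dy,
            theta + dtheta)


def accumulate(relative_motions, start=(0.0, 0.0, 0.0)):
    """Chain the per-pass motions into a list of world poses."""
    trajectory = [start]
    for delta in relative_motions:
        trajectory.append(compose(trajectory[-1], delta))
    return trajectory


if __name__ == "__main__":
    # Four passes: each moves one unit forward and turns left by 90 degrees.
    square = [(1.0, 0.0, math.pi / 2)] * 4
    for pose in accumulate(square):
        print(tuple(round(v, 3) for v in pose))
```

Because each pose is built on the previous one, an error in any single pass propagates to all later poses, which is the accumulated error mentioned in the Background.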

In this embodiment, optionally, the calculating the homography matrix through the pixel matching and back propagation algorithm includes:

In this embodiment, optionally, the performing adaptive optimization on the local mapping according to the attributes of the local point cloud and the voxels includes:

and judging whether the number of the feature points in the local point cloud is smaller than the preset number q of the registration points, if so, increasing edge adjacent voxel space and/or angle adjacent voxel space of the sub-region where the registration points in the local point cloud are located, and expanding the local feature map.

In this embodiment, optionally, the filtering the feature points in each sub-region according to the response strength to obtain a local feature map includes:

In this embodiment, optionally, removing dynamic noise points from an image in the experimental data set by using an RCNN (region-based convolutional neural network) includes:

by means of RCNN combined with geometric features, for each current frame, N key frames with the highest overlapping degree are found, projection of feature points of the key frames on the current frame is obtained, and the position and the depth value of the projection are calculated; judging whether the depth value exceeds a set threshold value, if so, judging that the characteristic point is a dynamic noise point, calculating the variance between the dynamic noise point and the surrounding points, and eliminating the dynamic noise point of which the variance is smaller than a specified value.
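Selecting the N key frames with the highest overlap can be sketched as below. The text does not define how overlap is measured; this example assumes it is the number of map-point identifiers a key frame shares with the current frame, and the name `top_overlapping_keyframes` is hypothetical.

```python
import heapq


def top_overlapping_keyframes(current_ids, keyframes, n=5):
    """Return the ids of the n key frames sharing the most map points.

    current_ids -- map-point ids observed in the current frame
    keyframes   -- dict mapping key-frame id to the map-point ids it observes
    n           -- number of key frames to keep
    """
    current = set(current_ids)
    return heapq.nlargest(
        n, keyframes, key=lambda kf: len(current & set(keyframes[kf])))


if __name__ == "__main__":
    observed = [1, 2, 3, 4]
    candidates = {"a": [1, 2, 3, 4], "b": [1, 2], "c": [9], "d": [1, 2, 3]}
    print(top_overlapping_keyframes(observed, candidates, n=2))
```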

According to the method provided by the embodiment, the experimental data set is obtained, dynamic noise points of the image are removed through RCNN, the image is input into CNN, attributes of local point cloud and voxels in a local space are determined, the homography matrix is calculated through pixel matching and a back propagation algorithm, the camera pose is estimated according to the homography matrix, local mapping is carried out according to the camera pose, self-adaptive optimization is carried out on the local mapping according to the attributes of the local point cloud and the voxels, the calculation is efficient, the accuracy rate is high, the generalization capability is strong, the feature effectiveness is enhanced, and the output result is more credible. In addition, the accuracy of the output result of the model can be further improved by rejecting dynamic noise data.

Fig. 2 is a flow diagram of a geometric region-oriented adaptive VSLAM local mapping method according to another embodiment of the present application. Referring to fig. 2, the method includes:

201: acquiring an experimental data set, and eliminating dynamic noise points of images in the experimental data set by adopting RCNN (region-based convolutional neural network);

In this embodiment, preferably, the selected experimental data set is the KITTI data set (jointly created by the Karlsruhe Institute of Technology, Germany, and the Toyota Technological Institute at Chicago, USA), which is currently the largest international computer vision algorithm evaluation data set for autonomous driving scenes. The acquisition platform of the KITTI data set includes 2 grayscale cameras, 2 color cameras, one Velodyne 3D lidar, 4 optical lenses, and 1 GPS navigation system. The entire data set consists of 389 pairs of stereo images and optical flow maps, a 39.2 km visual odometry sequence, and over 200,000 3D labeled objects, where each image includes at most 15 vehicles and 30 pedestrians and also contains varying degrees of occlusion.

In this embodiment, optionally, the removing dynamic noise points from the images in the experimental data set by using the RCNN includes:

Wherein, N may be set to 5 or other values, and is not limited specifically.

202: inputting the image without the dynamic noise points into a CNN (convolutional neural network), and determining attributes of local point clouds and voxels in a local space;

203: setting the loss function as the distance between the feature point of the previous frame and the corresponding feature point of the current frame after the change of the homography matrix, and calculating the homography matrix by executing a back propagation algorithm according to the loss function;

204: estimating the camera pose according to the homography matrix, and performing local mapping according to the camera pose;

205: dividing the local point cloud into a plurality of sub-areas, acquiring response intensity from attributes of voxels, reserving feature points with response intensity greater than a specified value in each sub-area as a feature map, and filtering out other feature points;

206: and judging whether the number of the feature points in the local point cloud is smaller than the preset number q of the registration points, if so, increasing edge adjacent voxel space and/or angle adjacent voxel space of the sub-region where the registration points in the local point cloud are located, and expanding the local feature map.

Fig. 3 is a flow diagram illustrating a geometric region-oriented adaptive VSLAM local mapping process according to another embodiment of the present application. Referring to fig. 3, an image in the experimental data set is input and its dynamic noise points are removed by the RCNN. The image is then input into the CNN, which determines the attributes of the local point clouds and voxels in the local space. A homography matrix is calculated by the pixel matching and back propagation algorithm, the camera pose is estimated, and local mapping is performed. Finally, the local mapping is adaptively optimized according to the attributes of the local point clouds and the voxels to obtain the output result. The process is computationally efficient, accurate and generalizes well, and it enhances the effectiveness of the features, so that the output result is more credible.

Fig. 4 is a block diagram of a geometry-area-oriented adaptive VSLAM local mapping apparatus according to another embodiment of the present application. Referring to fig. 4, the apparatus includes:

an acquisition module 401 configured to acquire an experimental data set, and for an image in the experimental data set, remove dynamic noise points by using an RCNN (region-based convolutional neural network);

a calculation module 402 configured to input the image from which the dynamic noise points are removed into a CNN convolutional neural network, determine attributes of local point clouds and voxels in a local space, calculate a homography matrix through a pixel matching and back propagation algorithm, and estimate a camera pose according to the homography matrix;

and a mapping module 403 configured to perform local mapping according to the camera pose, and perform adaptive optimization on the local mapping according to the attributes of the local point cloud and the voxels.

In this embodiment, optionally, the computing module is specifically configured to:

In this embodiment, optionally, the mapping module includes:

a mapping unit configured to perform local mapping according to a camera pose;

the adaptive unit is configured to divide the local point cloud into a plurality of sub-areas, acquire response intensity from attributes of voxels, and filter feature points in each sub-area according to the response intensity to obtain a local feature map; and judging whether the number of the feature points in the local point cloud is smaller than the preset number q of the registration points, if so, increasing edge adjacent voxel space and/or angle adjacent voxel space of the sub-region where the registration points in the local point cloud are located, and expanding the local feature map.

In this embodiment, optionally, the adaptive unit is specifically configured to:

In this embodiment, optionally, the acquisition module is specifically configured to:

The apparatus provided in this embodiment may perform the method provided in any of the above method embodiments, and details of the process are described in the method embodiments and are not described herein again.

According to the device provided by the embodiment, the experimental data set is obtained, dynamic noise points of an image in the experimental data set are removed through RCNN, the image is input into CNN, attributes of local point cloud and voxels in a local space are determined, a homography matrix is calculated through pixel matching and a back propagation algorithm, the camera pose is estimated according to the homography matrix, local mapping is carried out according to the camera pose, self-adaptive optimization is carried out on the local mapping according to the attributes of the local point cloud and the voxels, the calculation is efficient, the accuracy rate is high, the generalization capability is strong, the feature effectiveness is enhanced, and the output result is enabled to be more credible. In addition, the accuracy of the output result of the model can be further improved by rejecting dynamic noise data.

Embodiments also provide a computing device, referring to fig. 5, comprising a memory 1120, a processor 1110 and a computer program stored in said memory 1120 and executable by said processor 1110, the computer program being stored in a space 1130 for program code in the memory 1120, the computer program, when executed by the processor 1110, implementing the method steps 1131 for performing any of the methods according to the invention.

The embodiment of the application also provides a computer readable storage medium. Referring to fig. 6, the computer readable storage medium comprises a storage unit for program code provided with a program 1131' for performing the steps of the method according to the invention, which program is executed by a processor.

The embodiment of the application also provides a computer program product containing instructions which, when run on a computer, cause the computer to carry out the steps of the method according to the invention.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed by a computer, they cause the computer to perform, in whole or in part, the procedures or functions described in accordance with the embodiments of the application. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof.

The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A local mapping method for adaptive VSLAM facing to a geometric region comprises the following steps:

2. The method of claim 1, wherein computing the homography matrix by a pixel matching and back propagation algorithm comprises:

3. The method of claim 1, wherein adaptively optimizing the local map according to attributes of the local point cloud and the voxels comprises:

4. The method of claim 3, wherein filtering the feature points in each sub-region according to the response strength to obtain a local feature map comprises:

5. The method according to any one of claims 1 to 4, wherein removing dynamic noise points from the images in the experimental data set by using an RCNN (region-based convolutional neural network) comprises the following steps:

6. A local mapping device of adaptive VSLAM facing to a geometric region comprises:

7. The apparatus of claim 6, wherein the computing module is specifically configured to:

8. The apparatus of claim 6, wherein the mapping module comprises:

9. The apparatus of claim 8, wherein the adaptation unit is specifically configured to:

10. The apparatus according to any of claims 6-9, wherein the acquisition module is specifically configured to: