CN115601431A - Control method for stability of visual inertial odometer and related equipment - Google Patents
Control method for stability of visual inertial odometer and related equipment
- Publication number
- CN115601431A CN115601431A CN202211351015.3A CN202211351015A CN115601431A CN 115601431 A CN115601431 A CN 115601431A CN 202211351015 A CN202211351015 A CN 202211351015A CN 115601431 A CN115601431 A CN 115601431A
- Authority
- CN
- China
- Prior art keywords
- visual
- inertial
- feature point
- odometer
- frame image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/005—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 with correlation of navigation data from several sources, e.g. map or contour matching
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/10—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
- G01C21/12—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
- G01C21/16—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
- G01C21/165—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
- G01C21/1652—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with ranging devices, e.g. LIDAR or RADAR
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/10—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
- G01C21/12—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
- G01C21/16—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
- G01C21/165—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
- G01C21/1656—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/38—Electronic maps specially adapted for navigation; Updating thereof
- G01C21/3804—Creation or updating of map data
- G01C21/3833—Creation or updating of map data characterised by the source of data
- G01C21/3841—Data obtained from two or more sources, e.g. probe vehicles
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C22/00—Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/77—Determining position or orientation of objects or cameras using statistical methods
Abstract
The invention provides a control method for the stability of a visual inertial odometer and related equipment. The method comprises the following steps: performing feature point tracking and feature point matching on at least two frames of images acquired by the visual inertial odometer; performing outlier rejection based on a sampling consistency algorithm to obtain an optimized feature point matching result; constructing an inertial neural network, inputting an actual speed parameter related to the visual inertial odometer into the inertial neural network, and outputting a relative error between the actual speed parameter and a predicted speed parameter, related to the visual inertial odometer, that is predicted by the inertial neural network; and determining optimized pose information of the visual inertial odometer based on the optimized feature point matching result and the relative error. Therefore, the data of the inertial measurement unit can be denoised, the accuracy of the motion constraint of the inertial measurement unit is effectively improved, and the stability of the visual inertial odometer is improved.
Description
Technical Field
The present invention relates to the field of computer vision technologies, and in particular, to a method and an apparatus for controlling stability of a visual inertial odometer, an electronic device, and a storage medium.
Background
SLAM (Simultaneous Localization and Mapping) refers to simultaneous localization and map construction; its main function is to enable a device carrying the technology to complete localization and mapping in an unknown environment. At present, SLAM technology is widely applied in fields such as robotics, unmanned aerial vehicles, autonomous driving, AR, and VR, where sensors allow a machine to realize functions such as autonomous localization and mapping. There are two main types of mainstream SLAM technology: laser SLAM and visual SLAM. Laser SLAM is highly reliable and technically mature, but it is constrained by its mounting structure, captures only geometric information, and lacks rich texture and semantic information, so positioning fails in some scenes, such as repeated geometric structures, wheel slippage, and moving or dynamically changing environments. Therefore, most current commercial SLAM schemes adopt joint positioning with laser SLAM and visual SLAM. However, this introduces some problems of visual SLAM, most notably the instability of its optimization computation.
Visual SLAM technology comprises two parts: a front-end visual odometer and back-end optimization. The front-end visual odometer can give a trajectory and a map over a short time, but errors inevitably accumulate, so that the trajectory traveled and the map built at the same time become inaccurate over a long time. Therefore, optimization needs to be performed at the back end to improve how well the front-end visual odometer optimizes the trajectory over a larger scale and a longer time span, and thereby improve the stability of the optimization calculation. In the prior art, the stability of the optimization calculation mainly depends on the monocular camera and the inertial measurement unit in the visual odometer; for example, in a weak-texture, repeated-texture, or dynamic environment, or when the inertial measurement unit undergoes uniform motion or rapid vibration, the data noise of one or both sensors may become larger, resulting in errors in the final joint optimization.
Therefore, a new technical solution is needed to solve the above technical problems.
Disclosure of Invention
In this summary, concepts in a simplified form are introduced that are further described in the detailed description. The summary of the invention is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In a first aspect, the present invention provides a method for controlling the stability of a visual inertial odometer, comprising: performing feature point tracking and feature point matching on at least two frames of images acquired by the visual inertial odometer; performing outlier rejection based on a sampling consistency algorithm to obtain an optimized feature point matching result; constructing an inertial neural network, inputting an actual speed parameter related to the visual inertial odometer to the inertial neural network, and outputting a relative error between the actual speed parameter and a predicted speed parameter, related to the visual inertial odometer, that is predicted by the inertial neural network; and determining optimized pose information of the visual inertial odometer based on the optimized feature point matching result and the relative error.
Optionally, the feature point tracking and feature point matching performed on at least two frames of images acquired by the visual inertial odometer include: in the queue of images collected by the visual inertial odometer, finding a first frame image arranged last in the queue and a second frame image that satisfies a preset condition with respect to the first frame image; calculating an essential matrix between the first frame image and the second frame image to determine a transformation relation between the two frame images; and obtaining a first transformation matrix between the two frames of images based on the transformation relation.
Optionally, the performing of feature point tracking and feature point matching on at least two frames of images acquired by the visual inertial odometer, and the outlier rejection based on a sampling consistency algorithm to obtain an optimized feature point matching result, include: triangulating the common-view feature points with the first transformation matrix to obtain feature point coordinates of the common-view feature points in the two frame images relative to the second frame image; sequentially performing three-dimensional coordinate posture prediction, triangulation, and three-dimensional coordinate posture prediction again on the other frame images in the queue except the first frame image and the second frame image, so as to obtain a second transformation matrix of each other frame image relative to the second frame image and feature point coordinates of all feature points in the other frame images; determining the feature point coordinates of the feature points under different frames based on the feature point coordinates of all the feature points in all the images in the queue; constructing a reprojection error and randomly sampling feature points of the two frames of images so as to eliminate outliers whose reprojection error is larger than a third threshold; and performing bundle adjustment optimization on the first transformation matrix, the second transformation matrix, and the feature points after the outliers are removed, to obtain an optimized feature point matching result.
Optionally, the feature point coordinates of the feature point in different frames are calculated by the following formula: $P_n = R_n^F P_F + t_n^F$, wherein $R_n^F$ represents the rotation matrix of the nth frame image in the queue relative to the second frame image, $P_n$ represents the position of the feature point in the nth frame image, $t_n^F$ represents the coordinates (translation) of the nth frame image relative to the second frame image, and $P_F$ represents the feature point coordinates of the feature point with respect to the second frame image.
Optionally, the control method further comprises: and performing external reference calibration and pixel alignment by using the distance sensor and the visual inertial odometer to obtain depth information of part of feature points and further obtain visual scale information.
Optionally, the control method further includes: and acquiring relevant parameters of the positioning equipment, and inputting the relevant parameters into the position prediction model to output actual position information of the visual inertial odometer.
Optionally, the inertial neural network comprises a convolutional layer and a fully-connected layer, wherein the actual speed parameter related to the visual odometer is input to the inertial neural network to output a relative error between the predicted speed parameter related to the visual odometer and the actual speed parameter predicted by the inertial neural network, and the relative error comprises: inputting an actual speed parameter to the convolutional layer to output intermediate operation data to the long-term and short-term memory neural network; based on the intermediate operation data, the long-term and short-term memory neural network carries out time sequence prediction and inputs the prediction data into a full connection layer; the fully connected layer performs dimension conversion on the prediction data to output a relative error.
In a second aspect, a control device for stability of a visual inertial odometer is also proposed, comprising:
the characteristic point tracking and matching module is used for tracking and matching characteristic points of at least two frames of images acquired by the visual inertial odometer;
the characteristic point optimization module is used for removing outliers based on a sampling consistency algorithm to obtain an optimized characteristic point matching result;
the construction calculation module is used for constructing an inertial neural network, inputting an actual speed parameter related to the visual inertial odometer to the inertial neural network, and outputting a relative error between a predicted speed parameter predicted by the inertial neural network and related to the visual inertial odometer and the actual speed parameter;
and the pose information optimization module is used for determining the pose information of the optimized visual inertial odometer based on the optimized feature point matching result and the relative error.
In a third aspect, an electronic device is further proposed, comprising a processor and a memory, wherein the memory has stored therein computer program instructions for executing the control method for the stability of a visual inertial odometer as described above when the computer program instructions are executed by the processor.
In a fourth aspect, a storage medium is also proposed, on which program instructions are stored, which program instructions are adapted to perform the control method for the stability of a visual inertial odometer as described above when executed.
According to the technical scheme, the feature point tracking and feature point matching can be carried out on at least two frames of images acquired by the visual inertial odometer, and outliers in the feature point matching are removed to obtain an optimized feature point matching result. And obtaining a relative error between the predicted speed parameter and the actual speed parameter through an inertial neural network, and performing combined optimization on the optimized feature point matching result and the relative error to obtain the optimized pose information of the visual inertial odometer. Therefore, the data of the inertial measurement unit can be denoised, the correctness of the motion constraint of the inertial measurement unit is effectively improved, and the stability of the visual inertial odometer is improved.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the specification. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow diagram of a control method for stability of a visual inertial odometer according to one embodiment of the invention;
FIG. 2 shows a schematic flow diagram of feature point tracking and feature point matching for at least two frames of images acquired by a visual inertial odometer, according to one embodiment of the present invention;
FIG. 3 shows a schematic flow diagram of feature point tracking and feature point matching for at least two frames of images acquired by a visual inertial odometer and outlier rejection based on a sample consistency algorithm to obtain an optimized feature point matching result, according to an embodiment of the present invention;
FIG. 4 shows a schematic flow chart of inputting an actual speed parameter related to visual odometry to an inertial neural network to output a relative error between a predicted speed parameter related to visual odometry predicted by the inertial neural network and the actual speed parameter, according to one embodiment of the present invention;
FIG. 5 shows a schematic block diagram of a control device for stability of a visual inertial odometer according to an embodiment of the invention; and
FIG. 6 shows a schematic block diagram of an electronic device according to one embodiment of the invention.
Detailed Description
According to the technical scheme, the feature point tracking and feature point matching can be carried out on at least two frames of images acquired by the visual inertial odometer, and outliers in the feature point matching are removed to obtain an optimized feature point matching result. And obtaining a relative error between the predicted speed parameter and the actual speed parameter through an inertial neural network, and performing combined optimization on the optimized feature point matching result and the relative error to obtain the optimized pose information of the visual inertial odometer. Therefore, the data of the inertial measurement unit can be denoised, the correctness of the motion constraint of the inertial measurement unit is effectively improved, and the stability of the visual inertial odometer is improved.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
According to a first aspect of the invention, a control method for the stability of a visual inertial odometer is proposed. FIG. 1 shows a schematic flow diagram of a control method 100 for stability of a visual inertial odometer, according to one embodiment of the invention. As shown in fig. 1, the control method 100 may include the following steps.
And step S110, performing feature point tracking and feature point matching on at least two frames of images acquired by the visual inertial odometer.
Alternatively, the visual inertial odometer may include a monocular camera (MonoCamera) and an inertial measurement unit (IMU). For example, for at least two frames of images acquired by the visual inertial odometer, an optical flow method may be adopted for feature point tracking. Specifically, the correspondence between the current frame and the previous frame can be determined by using, for example, the temporal change of pixels between the two frames of images and the correlation between adjacent frames, so as to realize tracking of the feature points. Then, it can be determined which feature points are the same point according to the image information around the feature points, i.e., the difference of the descriptors, to realize feature point matching. It is understood that the above methods for performing feature point tracking and feature point matching are only exemplary, and the feature points can be tracked and matched by using any existing or future method for feature point tracking and feature point matching, which is not listed in the present application.
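As a rough sketch of how this tracking step could be realized, the Python snippet below uses OpenCV to detect Shi-Tomasi corners in the previous frame and track them into the current frame with pyramidal Lucas-Kanade optical flow; the function name, corner-detection parameters, and frame variables are illustrative assumptions rather than details taken from the patent.

```python
import cv2
import numpy as np

def track_feature_points(prev_gray, curr_gray):
    """Track corners from the previous grayscale frame into the current one (illustrative parameters)."""
    # Detect up to 200 Shi-Tomasi corners in the previous frame.
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                       qualityLevel=0.01, minDistance=10)
    if prev_pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    # Pyramidal Lucas-Kanade optical flow returns the matched positions in the current frame.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
    good = status.ravel() == 1
    return prev_pts.reshape(-1, 2)[good], curr_pts.reshape(-1, 2)[good]
```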
And step S120, carrying out outlier elimination based on a sampling consistency algorithm to obtain an optimized feature point matching result.
It will be appreciated that the sample consensus algorithm may iteratively estimate parameters of a mathematical model from a set of observed data that includes outliers. That is, the outlier data is removed from the feature point data in an iterative manner, where the outlier may refer to a point that does not conform to the optimized feature point model, and a point that conforms to the optimized feature point model may be referred to as an inlier. In addition, the number of selected feature points and the number of repeated iterative computations before the sampling consistency algorithm starts have a large influence on the final optimization result. Therefore, outlier rejection is performed on the feature point matching result in step S110, and an optimized feature point matching result can be obtained.
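One common way to realize such sampling-consistency (RANSAC) rejection is sketched below, where matches inconsistent with a RANSAC-estimated epipolar model are discarded; the pixel threshold and confidence value are assumptions for illustration only.

```python
import cv2

def reject_outliers_ransac(pts_prev, pts_curr, thresh_px=1.0):
    """Keep only matches consistent with a RANSAC-estimated fundamental matrix (assumed thresholds)."""
    F, inlier_mask = cv2.findFundamentalMat(pts_prev, pts_curr,
                                            cv2.FM_RANSAC, thresh_px, 0.999)
    if F is None:  # too few points or a degenerate configuration
        return pts_prev, pts_curr
    keep = inlier_mask.ravel() == 1
    return pts_prev[keep], pts_curr[keep]
```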
And step S130, constructing an inertial neural network, inputting an actual speed parameter related to the visual inertial odometer into the inertial neural network, and outputting a relative error between a predicted speed parameter and an actual speed parameter, which are predicted by the inertial neural network and related to the visual inertial odometer.
Preferably, the inertial neural network model may be implemented using a lightweight neural network (MobileNet). Specifically, the input of the lightweight neural network may be actual speed parameters related to the visual inertial odometer, such as the angular velocity, acceleration, and visual velocity of the IMU, and the accelerometer bias Ba and gyroscope bias Bg among the motion model parameters. Via the lightweight neural network, a relative error between the predicted speed parameter, predicted by the neural network based on the input actual speed parameter, and the actual speed parameter may be output.
And step S140, determining the pose information of the optimized visual inertial odometer based on the optimized feature point matching result and the relative error.
The optimized feature point matching result and the relative error between the predicted speed parameter and the actual speed parameter are jointly optimized, and the pose information of the monocular camera in the visual inertial odometer is finally output.
According to the technical scheme, the feature point tracking and the feature point matching can be carried out on at least two frames of images collected by the visual inertial odometer, and outliers in the feature point matching are removed to obtain an optimized feature point matching result. And obtaining a relative error between the predicted speed parameter and the actual speed parameter through an inertial neural network, and performing combined optimization on the optimized feature point matching result and the relative error to obtain the position and pose information of the optimized visual inertial odometer. Therefore, the data of the inertial measurement unit can be denoised, the accuracy of the motion constraint of the inertial measurement unit is effectively improved, and the stability of the visual inertial odometer is improved.
Fig. 2 shows a schematic flow chart of the feature point tracking and feature point matching for at least two frames of images acquired by the visual inertial odometer in step S110 according to an embodiment of the present invention. As shown in fig. 2, step S110 may include step S111, step S112, and step S113.
Step S111, in the queue of images collected by the visual inertial odometer, finding a first frame image arranged last in the queue and a second frame image that satisfies a preset condition with respect to the first frame image. Optionally, the preset condition may include: the number of common-view feature points in the first frame image and the second frame image is larger than a first threshold, and the parallax between the first frame image and the second frame image is larger than a second threshold.
Specifically, in this embodiment, in the queue of images acquired by the visual inertial odometer, the image frame arranged last in the queue is found and denoted as N, and another image frame whose number of feature points co-visible with N is greater than the first threshold and whose parallax relative to N is greater than the second threshold is found and denoted as F. The common-view feature points may be determined by using any existing or future feature point extraction and tracking solution, which is not limited herein. In addition, the first threshold and the second threshold may be set appropriately according to experience, and are not limited herein.
Step S112, calculating an essential matrix between the first frame image and the second frame image to determine a transformation relationship between the two frame images.
After the first frame image and the second frame image are found, the essential matrix between the first frame image and the second frame image can be calculated by using the eight-point method. Those skilled in the art understand how to use the eight-point method to calculate the essential matrix between two images, so it is not described here for the sake of brevity. In the calculation, the input is the pixel points of the common-view feature points in the images, and each pixel point can be represented by a vector, such as a one-dimensional matrix. To facilitate subsequent calculations, the essential matrix can be represented as a transformation relation between the two frame images.
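A hedged sketch of this step is given below; it uses OpenCV's findEssentialMat (a five-point RANSAC solver) as a stand-in for the eight-point method described above, and the intrinsic matrix K is a made-up example calibration rather than a value from the patent.

```python
import cv2
import numpy as np

# Hypothetical pinhole intrinsics; a real system would use the calibrated camera matrix.
K = np.array([[615.0, 0.0, 320.0],
              [0.0, 615.0, 240.0],
              [0.0, 0.0, 1.0]])

def essential_between_frames(pts_N, pts_F):
    """Estimate the essential matrix between frame N and frame F from co-visible pixel points."""
    E, inlier_mask = cv2.findEssentialMat(pts_N, pts_F, K, method=cv2.RANSAC,
                                          prob=0.999, threshold=1.0)
    return E, inlier_mask
```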
Step S113, based on the transformation relation, a first transformation matrix between two frames of images is obtained.
Then, SVD decomposition is carried out on the transformation relation, and a first transformation matrix can be obtained. In this embodiment, the first transformation matrix obtained is 4 × 4, where the upper left corner of the matrix, 3 × 3, may represent a rotational relationship between the two frames of images, and the upper right corner, 3 × 1, may represent a translational relationship between the two frames of images.
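The snippet below illustrates one plausible way to obtain such a 4 × 4 first transformation matrix with OpenCV; recoverPose performs the SVD-based decomposition and cheirality check internally, and the variable names and the frame-direction convention of R and t are assumptions.

```python
import cv2
import numpy as np

def first_transformation_matrix(E, pts_N, pts_F, K):
    """Recover R and t from the essential matrix and pack them into a 4x4 homogeneous transform."""
    _, R, t, _ = cv2.recoverPose(E, pts_N, pts_F, K)
    T = np.eye(4)
    T[:3, :3] = R          # 3x3 rotation in the upper-left corner
    T[:3, 3] = t.ravel()   # 3x1 translation (up to scale) in the upper-right corner
    return T
```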
Therefore, the error rate of feature point matching can be reduced under the condition that the environment texture is repeated and single, and accurate data support is provided for subsequently improving the stability of the visual inertial odometer.
Fig. 3 shows a schematic flow chart of the step S120 of performing feature point tracking and feature point matching on at least two frames of images acquired by the visual-inertial odometer and performing outlier rejection based on a sampling consistency algorithm to obtain an optimized feature point matching result according to an embodiment of the present invention. As shown in fig. 3, step S120 may further include step S121, step S122, step S123, step S124, and step S125.
Step S121, triangularization is carried out on the common-view feature points and the first transformation matrix, so that feature point coordinates of the common-view feature points in the two frame images relative to the second frame image are obtained.
It will be appreciated that the purpose of triangulation is to find the corresponding three-dimensional spatial points based on pairs of two-dimensional pixel points. Alternatively, triangulation methods include the midpoint method, the DLT (direct linear transform) method, optimization-based methods, and the like. Any existing or future technical solution that can implement the triangulation of the common-view feature points with the first transformation matrix falls within the protection scope of the present application. Thus, the feature point coordinates (positions) of the common-view feature points in the two frame images with respect to the second frame image F can be obtained.
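A minimal sketch of DLT-style triangulation with OpenCV is shown below. It assumes the second frame image F is used as the reference camera and that T_N_from_F is a 4 × 4 transform taking frame-F coordinates into frame N; both conventions, and the variable names, are illustrative assumptions.

```python
import cv2
import numpy as np

def triangulate_common_points(K, T_N_from_F, pts_F, pts_N):
    """Triangulate co-visible points and return their coordinates relative to frame F."""
    P_F = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # frame F as the reference camera
    P_N = K @ T_N_from_F[:3, :]                          # frame N posed by the first transformation matrix
    pts4d = cv2.triangulatePoints(P_F, P_N,
                                  pts_F.T.astype(np.float64), pts_N.T.astype(np.float64))
    return (pts4d[:3] / pts4d[3]).T                       # Nx3 points in frame-F coordinates
```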
And step S122, respectively and sequentially carrying out three-dimensional coordinate posture prediction, triangulation and three-dimensional coordinate posture prediction on other frame images except the first frame image and the second frame image in the queue so as to obtain a second transformation matrix of the other frame images relative to the second frame image and the characteristic point coordinates of all characteristic points in the other frame images.
And respectively carrying out three-dimensional coordinate posture prediction once, triangularization and three-dimensional coordinate posture prediction again on other frame images except the first frame image N and the second frame image F in the queue, and finally obtaining a transformation matrix of the frame images relative to the second frame and the feature point coordinates of all feature points in the frame images.
Step S123, determining feature point coordinates of the feature points in different frames based on the feature point coordinates of all the feature points in all the images in the queue. In a specific embodiment, the feature point coordinates of the feature points in different frames can be calculated by the following formula: $P_n = R_n^F P_F + t_n^F$, wherein $R_n^F$ represents the rotation matrix of the nth frame image in the queue relative to the second frame image, $P_n$ represents the position of the feature point in the nth frame image, $t_n^F$ represents the coordinates (translation) of the nth frame image relative to the second frame image, and $P_F$ represents the feature point coordinates of the feature point with respect to the second frame image.
As can be seen from the above, the second transformation matrix is 4 × 4, and the 3 × 3 matrix in its upper-left corner is the rotation matrix of that frame image relative to the second frame image. According to the second transformation matrix and the feature point coordinates of all feature points in the other frame images, the feature point coordinates of the nth frame image relative to the second frame image can be calculated, and the position of each feature point in the nth frame image is further determined, that is, the feature point coordinates of the feature points in different frames are determined. Optionally, when calculating the feature point coordinates of the feature points in different frames, the motion constraint of the monocular camera may be added to improve the robustness and accuracy of the calculation result.
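The relation above amounts to applying a rigid transform to a point expressed in frame-F coordinates; a small sketch, assuming the 4 × 4 layout described earlier, is given below.

```python
import numpy as np

def feature_point_in_frame_n(T_n_from_F, P_F):
    """Map a point given in frame-F coordinates into frame n using the 4x4 transform."""
    R_nF = T_n_from_F[:3, :3]   # rotation of the nth frame relative to the second frame image F
    t_nF = T_n_from_F[:3, 3]    # translation of the nth frame relative to F
    return R_nF @ P_F + t_nF    # P_n = R * P_F + t
```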
And step S124, constructing a re-projection error and randomly sampling feature points of the two frames of images to eliminate outliers with the re-projection error larger than a third threshold value.
After the feature point coordinates of the feature points in the different frame images are determined, the feature points can be reprojected into the image pixels of the monocular camera, and the Euclidean distance to the previously observed image pixels is calculated to construct the reprojection error. In addition, random sampling of feature points is performed on the two frames of images, and approximately 100 feature points can be acquired at a time. It is understood that, in the ideal case, the feature points between the two images are mapped onto each other exactly by the transformation matrix; in practice, however, there may be some noise. According to the Euclidean distance and the randomly sampled feature points, outliers whose reprojection error is larger than the third threshold, i.e., points that do not satisfy the mutual transformation relation, can be eliminated.
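The sketch below shows one way to build such a reprojection-error test with OpenCV; the 2-pixel value stands in for the third threshold and, like the variable names, is an assumption.

```python
import cv2
import numpy as np

def reprojection_inlier_mask(points3d, observed_px, R, t, K, thresh_px=2.0):
    """Flag points whose reprojection error stays within the (assumed) third threshold."""
    rvec, _ = cv2.Rodrigues(R)
    projected, _ = cv2.projectPoints(points3d, rvec, t, K, None)
    err = np.linalg.norm(projected.reshape(-1, 2) - observed_px, axis=1)  # Euclidean distance per point
    return err <= thresh_px   # True = inlier, False = outlier to be removed
```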
Step S125, performing bundle adjustment optimization on the first transformation matrix, the second transformation matrix, and the feature points after the outliers are removed, to obtain an optimized feature point matching result.
Finally, a unified bundle adjustment (BA) optimization can be carried out on the feature points with outliers removed, the first transformation matrix, and the second transformation matrix; after the optimization, the optimized first transformation matrix, the optimized second transformation matrix, and the feature point coordinates of all the optimized feature points can be output.
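A bare skeleton of this joint bundle adjustment step using SciPy is sketched below; how the transformation matrices and feature point coordinates are packed into the parameter vector, and the robust-loss settings, are left open as assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def bundle_adjust(x0, residual_fn):
    """Refine stacked poses and 3D points by minimizing all reprojection residuals."""
    # residual_fn(x) must return the stacked reprojection errors for every observation.
    result = least_squares(residual_fn, x0, method="trf", loss="huber", f_scale=1.0)
    return result.x
```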
Therefore, the error rate of feature point matching can be remarkably reduced, the adverse influence of mismatches on the visual constraints and on motion recovery is avoided, the accuracy of visual motion recovery is improved, and the accuracy of the motion constraint of the inertial measurement unit is improved.
Optionally, in an embodiment, the control method may further include performing external reference calibration and pixel alignment with the visual inertial odometer by using the distance sensor to obtain depth information of the partial feature points, thereby obtaining visual scale information.
For example, the distance sensor may be a depth camera, a laser sensor, or the like. After external reference (extrinsic) calibration and pixel alignment are carried out between the distance sensor and the visual inertial odometer, a relative pose transformation matrix between the distance sensor and the visual inertial odometer can be output. Then, when the above technical scheme is used to track feature points for visual motion recovery, the depth information of some of the feature points can be obtained directly, and the visual scale information is obtained in turn. The data output by the depth camera or laser sensor are the three-dimensional points of the environment that it detects, i.e., depth information. In addition, after the extrinsic calibration and pixel alignment, the pixels corresponding to the depth information can be aligned with the monocular camera, so that the visual scale information can be obtained. This compensates for the unobservability of the scale dimension in monocular vision, so that the visual constraints are more complete.
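The idea of attaching metric depth to image pixels after extrinsic calibration can be sketched as follows; the transform T_cam_lidar and the intrinsic matrix K are assumed inputs rather than values from the patent.

```python
import numpy as np

def depth_for_feature(pt_lidar_xyz, T_cam_lidar, K):
    """Project one range-sensor point into the camera to attach metric depth to a pixel."""
    p_cam = T_cam_lidar[:3, :3] @ pt_lidar_xyz + T_cam_lidar[:3, 3]   # extrinsic calibration
    u, v = (K @ p_cam)[:2] / p_cam[2]                                 # pixel alignment
    return (u, v), p_cam[2]                                           # aligned pixel and its metric depth
```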
In another embodiment, the control method may further include acquiring relevant parameters of the positioning device and inputting the relevant parameters into the position prediction model to output actual position information of the visual inertial odometer.
Alternatively, the positioning device may be a wireless fidelity (WIFI) device, an ultra-wideband (UWB) device, or the like. The following description takes WIFI as an example of the positioning device. For example, a WIFI list may be obtained, including WIFI names, signal strengths, and the like, and this list is used as the input to the position prediction model. Via the position prediction model, the actual position information of the visual inertial odometer may be output; in this way, training of the position prediction model can be realized. Preferably, the position prediction model may be end-to-end. The trained position prediction model can output low-frequency global position information, which effectively avoids positioning drift when the visual data and the IMU data are in failure states during the joint optimization, so that the constraints are more sufficient and the stability of the visual inertial odometer is higher.
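As a loose illustration of such an end-to-end position prediction model, the sketch below maps a fixed-length vector of WIFI signal strengths to a three-dimensional position; the input length, layer sizes, and the encoding of the WIFI list as an RSSI vector are assumptions.

```python
import torch
import torch.nn as nn

class WifiPositionNet(nn.Module):
    """End-to-end position prediction from a WIFI RSSI vector (all sizes are assumptions)."""
    def __init__(self, num_aps=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_aps, 64), nn.ReLU(),   # signal strength of each known access point
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 3),                    # low-frequency global x, y, z position
        )

    def forward(self, rssi):                     # rssi: (batch, num_aps)
        return self.net(rssi)
```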
In particular, the inertial neural network may include a convolutional layer and a fully-connected layer. Fig. 4 shows a schematic flow chart of step S130 of inputting the actual speed parameter related to the visual odometer to the inertial neural network to output the relative error between the predicted speed parameter related to the visual odometer and the actual speed parameter predicted by the inertial neural network according to one embodiment of the present invention. As shown in fig. 4, step S130 may include the following steps.
Step S131, inputting the actual speed parameter to the convolution layer to output the intermediate operation data to the long-short term memory neural network.
As mentioned above, the actual speed parameters may include the angular velocity, acceleration, and visual velocity of the IMU, the accelerometer bias Ba and gyroscope bias Bg among the motion model parameters, and the like. These are input to the convolutional layer of the inertial neural network, which outputs the intermediate operation data.
Step S132, based on the intermediate operation data, the long-short term memory neural network carries out time sequence prediction and inputs the prediction data to the full connection layer.
And then, inputting the intermediate operation data into the long-term and short-term memory neural network to perform time sequence prediction so as to obtain prediction data, and inputting the prediction data into the full connection layer. The prediction data may be the above-mentioned predicted speed parameter.
In step S133, the full link layer performs dimension conversion on the prediction data to output a relative error.
The fully-connected layer may perform dimension conversion on the prediction data after receiving the prediction data, for example, the prediction data is a 3 × 3 matrix, and the prediction data may be converted into a one-dimensional vector after passing through the fully-connected layer. Therefore, the relative error output by the full connection layer is a one-dimensional vector.
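A possible PyTorch sketch of such a convolution, LSTM, and fully-connected pipeline is given below; the channel counts, hidden size, and the three-dimensional output are assumptions for illustration, not the patent's actual network dimensions.

```python
import torch
import torch.nn as nn

class InertialErrorNet(nn.Module):
    """Convolution -> LSTM -> fully-connected sketch of the inertial neural network."""
    def __init__(self, in_channels=10, hidden=64):
        super().__init__()
        # 1-D convolution over the stacked IMU time series (angular velocity, acceleration,
        # visual velocity, biases Ba/Bg as channels) producing the intermediate operation data.
        self.conv = nn.Conv1d(in_channels, 32, kernel_size=5, padding=2)
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 3)   # dimension conversion to a one-dimensional error vector

    def forward(self, x):                       # x: (batch, channels, time)
        h = torch.relu(self.conv(x))
        h, _ = self.lstm(h.transpose(1, 2))     # reshape to (batch, time, 32) for the LSTM
        return self.fc(h[:, -1])                # relative error for the latest time step
```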
In this way, the inertial neural network can be tightly coupled with the vision and the IMU to denoise the IMU data. As a result, excessive dependence on visual information is effectively reduced, and accurate position information can be provided over a long time even when the visual information fails.
According to a second aspect of the invention, there is also provided a control device for the stability of a visual inertial odometer. Fig. 5 shows a schematic block diagram of a control device 500 for the stability of a visual inertial odometer according to one embodiment of the invention. As shown in fig. 5, the control device 500 includes:
the feature point tracking matching module 510 is configured to perform feature point tracking and feature point matching on at least two frames of images acquired by the visual inertial odometer.
The feature point optimization module 520 is configured to perform outlier rejection based on a sampling consistency algorithm to obtain an optimized feature point matching result.
The construction calculation module 530 is used for constructing the inertial neural network, and inputting the actual speed parameter related to the visual inertial odometer to the inertial neural network so as to output the relative error between the predicted speed parameter related to the visual inertial odometer and the actual speed parameter predicted by the inertial neural network.
The pose information optimization module 540 is configured to determine pose information of the optimized visual inertial odometer based on the optimized feature point matching result and the relative error.
According to a third aspect of the invention, an electronic device is also provided. FIG. 6 shows a schematic block diagram of an electronic device 600 according to one embodiment of the invention. As shown in fig. 6, electronic device 600 may include a processor 610 and a memory 620. Stored in the memory 620 are computer program instructions for executing the control method for the stability of the visual inertial odometer as described above, when executed by the processor 610.
According to a fourth aspect of the present invention, there is also provided a storage medium having stored thereon program instructions for executing the control method for the stability of a visual inertial odometer as described above when executed. The storage medium may include, for example, a storage component of a tablet computer, a hard disk of a computer, Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM), portable Compact Disc Read-Only Memory (CD-ROM), USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
The details and advantageous effects of the control apparatus, the electronic device, and the storage medium for the stability of the visual inertial odometer can be understood by those skilled in the art from the above description of the control method for the stability of the visual inertial odometer, and are not described herein again for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and/or device may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, which are essential or part of the technical solutions contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (10)
1. A method of controlling stability for a visual inertial odometer, comprising:
carrying out feature point tracking and feature point matching on at least two frames of images collected by the visual inertial odometer;
carrying out outlier elimination based on a sampling consistency algorithm to obtain an optimized feature point matching result;
constructing an inertial neural network, and inputting an actual speed parameter related to the visual inertial odometer to the inertial neural network to output a relative error between a predicted speed parameter related to the visual inertial odometer and the actual speed parameter predicted by the inertial neural network;
and determining the optimized pose information of the visual inertial odometer based on the optimized feature point matching result and the relative error.
2. The method of claim 1, wherein the feature point tracking and feature point matching for at least two frames of images acquired by the visual odometer comprises:
finding out a last first frame image arranged in a queue of images collected by the visual inertial odometer and a second frame image which is in accordance with a preset condition compared with the first frame image;
calculating an essential matrix between the first frame image and the second frame image to determine a transformation relation between the two frame images;
and obtaining a first transformation matrix between the two frames of images based on the transformation relation.
3. The method for controlling stability of a visual-inertial odometer according to claim 2, wherein the performing feature point tracking and feature point matching on at least two frames of images acquired by the visual-inertial odometer, and performing outlier rejection based on a sampling consistency algorithm to obtain an optimized feature point matching result comprises:
triangularization is carried out on the common-view feature points and the first transformation matrix so as to obtain feature point coordinates of the common-view feature points in the two frame images relative to the second frame image;
respectively and sequentially carrying out three-dimensional coordinate posture prediction, triangulation and three-dimensional coordinate posture prediction on other frame images except the first frame image and the second frame image in the queue so as to obtain a second transformation matrix of the other frame images relative to the second frame image and feature point coordinates of all feature points in the other frame images;
determining the feature point coordinates of the feature points under different frames based on the feature point coordinates of all the feature points in all the images in the queue;
constructing a re-projection error and randomly sampling feature points of the two frames of images to remove outer points of which the re-projection error is greater than a third threshold value;
and performing bundle adjustment optimization on the first transformation matrix, the second transformation matrix and the feature points from which the outliers are removed to obtain the optimized feature point matching result.
4. A control method for the stability of a visual inertial odometer according to claim 3, characterized in that the feature point coordinates of said feature points at different frames are calculated by the following formula:
$P_n = R_n^F P_F + t_n^F$
wherein $R_n^F$ represents a rotation matrix of the nth frame image in said queue with respect to said second frame image, $P_n$ represents the position of the feature point in the nth frame image, $t_n^F$ represents the coordinates (translation) of the nth frame image with respect to said second frame image, and $P_F$ represents the feature point coordinates of the feature point with respect to said second frame image.
5. A control method for the stability of a visual odometer according to claim 1, characterized in that it further comprises:
and performing external parameter calibration and pixel alignment by using a distance sensor and the visual inertial odometer to obtain depth information of part of feature points and further obtain visual scale information.
6. The control method for stability of a visual odometer according to claim 5, characterized in that it further comprises:
and acquiring relevant parameters of the positioning equipment, and inputting the relevant parameters into a position prediction model to output actual position information of the visual inertial odometer.
7. The method of claim 1, wherein the inertial neural network comprises a convolutional layer and a fully-connected layer, wherein the inputting the actual speed parameter related to the visual inertial odometer to the inertial neural network to output the relative error between the predicted speed parameter related to the visual inertial odometer predicted by the inertial neural network and the actual speed parameter comprises:
inputting the actual speed parameter to the convolutional layer to output intermediate operation data to a long-term and short-term memory neural network;
based on the intermediate operation data, the long-short term memory neural network carries out time sequence prediction and inputs the prediction data to the full connection layer;
the fully connected layer performs dimension conversion on the prediction data to output the relative error.
8. A control device for the stability of a visual inertial odometer, comprising:
the characteristic point tracking matching module is used for tracking and matching characteristic points of at least two frames of images acquired by the visual inertia odometer;
the characteristic point optimization module is used for removing outliers based on a sampling consistency algorithm to obtain an optimized characteristic point matching result;
the construction calculation module is used for constructing an inertial neural network, inputting an actual speed parameter related to the visual inertial odometer into the inertial neural network, and outputting a relative error between a predicted speed parameter predicted by the inertial neural network and related to the visual inertial odometer and the actual speed parameter;
and the pose information optimization module is used for determining the optimized pose information of the visual inertial odometer based on the optimized feature point matching result and the relative error.
9. An electronic device comprising a processor and a memory, wherein the memory has stored therein computer program instructions for executing the method of controlling for stability of a visual odometer according to any one of claims 1 to 7 when executed by the processor.
10. A storage medium on which are stored program instructions for performing, when running, a method of controlling for stability of a visual odometer according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211351015.3A CN115601431A (en) | 2022-10-31 | 2022-10-31 | Control method for stability of visual inertial odometer and related equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211351015.3A CN115601431A (en) | 2022-10-31 | 2022-10-31 | Control method for stability of visual inertial odometer and related equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115601431A true CN115601431A (en) | 2023-01-13 |
Family
ID=84851687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211351015.3A Pending CN115601431A (en) | 2022-10-31 | 2022-10-31 | Control method for stability of visual inertial odometer and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115601431A (en) |
- 2022-10-31: Application CN202211351015.3A filed in China (CN); published as CN115601431A (en); status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |