CN113516692B - SLAM method and device for multi-sensor fusion - Google Patents

Info

Publication number: CN113516692B (granted); other versions: CN113516692A
Application number: CN202110541906.4A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: camera, coordinate system, map, key frame, SLAM
Legal status: Active (granted)
Inventors: 刘燕霖, 陈敏鹤, 金忠孝
Assignees: SAIC Motor Corp Ltd; Shanghai Automotive Industry Corp Group
Application filed by SAIC Motor Corp Ltd and Shanghai Automotive Industry Corp Group; priority to CN202110541906.4A.

Classifications

    • G06T7/292 — Image analysis; analysis of motion; multi-camera tracking
    • G01C21/32 — Navigation in a road network with correlation of data from several navigational instruments; structuring or formatting of map data
    • G06F18/25 — Pattern recognition; fusion techniques
    • G06F9/5027 — Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals)
    • G06T7/248 — Analysis of motion using feature-based methods involving reference images or patches
    • G06T7/74 — Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T7/77 — Determining position or orientation of objects or cameras using statistical methods
    • G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters (camera calibration)
    • G06F2209/5018 — Indexing scheme relating to resource allocation; thread allocation
    • G06T2207/10016 — Image acquisition modality: video; image sequence
    • G06T2207/30252 — Subject of image: vehicle exterior; vicinity of vehicle
    • Y02T10/40 — Climate change mitigation in transportation; engine management systems

Abstract

The invention provides a multi-sensor fusion SLAM method and device that expand the observation field of view of the vision sensors and avoid feature-point extraction failures caused by occlusion, sudden strong illumination, or texture-less regions. At the same time, to eliminate the monocular degradation caused by scale uncertainty and fast rotation, the inertial observations of the IMU and the wheel speed meter are tightly coupled with the visual observations, yielding a more robust SLAM system, so that the multi-sensor fusion SLAM device has a wider field of view.

Description

SLAM method and device for multi-sensor fusion
Technical Field
The invention relates to the technical field of simultaneous localization and mapping (SLAM), and in particular to a SLAM method and device based on the multi-sensor fusion of multiple cameras, an IMU, and a wheel speed meter.
Background
Simultaneous Localization and Mapping (SLAM) is a technique in which a moving body, in an unknown environment and at an unknown position, takes the observations of its own sensors as input, outputs its position and attitude in real time, and builds a map from the pose increments. SLAM currently has important applications in autonomous driving, high-precision maps, robotics, unmanned aerial vehicles, AR/VR, and other fields.
According to the primary sensor, SLAM can be divided into laser SLAM, which mainly uses a lidar, and visual SLAM, which mainly uses cameras such as monocular, binocular, or RGB-D cameras. Laser SLAM has developed to relative maturity, but in view of the cost of mass-production applications, visual SLAM remains the mainstream of current research.
For visual SLAM, the measurement range of a binocular camera is limited by its baseline and resolution, and a large amount of computation is needed to recover pixel depth, so real-time performance is poor. A depth camera can obtain pixel depth directly, but it is expensive, has a small field of view and a narrow measurement range, is sensitive to light, and is currently used mainly indoors. Monocular cameras have attracted wide attention for their small size, low cost, and real-time performance, but they require translation to determine depth and carry an inherent scale uncertainty, so the reconstructed trajectory differs from the real trajectory by a scale factor. Monocular visual SLAM currently has a number of excellent open-source frameworks, such as the ORB-SLAM series, LSD-SLAM, DSO, and SVO.
The IMU and the wheel speed meter are independent of the external environment; through pre-integration they can give the velocity and angular velocity of the moving body, and they carry the absolute scale needed for reconstruction. Their higher output frequency provides prediction results between image frames, which can narrow the search range for matching between two frames and avoid track loss caused by degradation during fast monocular rotation. At the same time, however, the data errors of the IMU and the wheel speed meter are continuously amplified by accumulation; coupling them with visual observations in the optimization can reduce the divergence and accumulated error caused by bias, making the result more accurate. Existing SLAM open-source frameworks that fuse a monocular camera with an IMU mainly include VINS, OKVIS, and VINS-Fusion; these schemes require precise hardware alignment and strict initialization, in which the scale, bias, and gravity direction must be given. In the present invention, the wheel speed meter is introduced to obtain the velocity directly, which simplifies the initialization process.
In practical applications, observation from a single camera tends to have a limited field of view; when the view is occluded, hit by strong illumination, or contains no obvious texture, feature points are lost and the system fails. A multi-camera system provides observations from multiple viewing angles and widens the field of view; when one camera fails, the other cameras can continue tracking and mapping, which greatly improves the robustness of the system. MultiCol-SLAM provides a multi-camera pure-vision SLAM method based on a fisheye camera model, but it extracts feature points and descriptors directly from several fisheye images, which is prone to mismatching and has poor real-time performance; the heavily distorted edges of the fisheye images are discarded, so the extraction, matching, and optimization of point and line features are difficult; and it requires overlap between the camera fields of view to determine the scale, which limits its use. CubemapSLAM corrects a single fisheye image with a perspective cube model and makes full use of the image information, but it does not overcome the limitation of a single field of view, and a single monocular camera cannot recover the absolute scale.
Disclosure of Invention
In view of this, the embodiments of the invention provide a multi-sensor fusion SLAM method based on multiple cameras, an IMU, and a wheel speed meter, so as to realize a SLAM system that provides an absolute scale and an expanded observation field of view.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
A SLAM method of multisensor fusion, comprising:
calibrating internal parameters of a camera in the SLAM system;
calibrating external parameters of a camera, a wheel speed meter and an IMU in the SLAM system;
Giving a camera projection model based on calibration results of the internal parameters of the camera, the wheel speed meter and the external parameters of the IMU;
controlling loose-coupling initialization of the multiple cameras and the wheel speed meter in the SLAM system;
Acquiring initialized map points and poses by adopting a tracking thread, calculating and optimizing the pose of the current key frame based on the image data, IMU data and wheel speed data acquired from the reference frame to the current key frame, and sending the key frame to a local mapping thread;
Acquiring the key frames sent by the tracking thread by adopting a local mapping thread, triangulating new map points based on the key frames, fusing them, and screening out and removing redundant key frames and redundant map points;
and carrying out loop detection on the key frames screened by the local mapping thread by adopting a loop detection thread, and correcting the poses and map points of the screened key frames according to the loop frame adjustment.
Optionally, in the multi-sensor fusion SLAM method, when the camera in the SLAM system is a perspective camera, the internal parameters of the camera in the calibrated SLAM system include, but are not limited to, one or more combinations of focal length, center coordinates of perspective images, and distortion coefficients; when the camera in the SLAM system is a fisheye camera, the internal parameters of the camera in the calibrated SLAM system include, but are not limited to, one or more of a forward mapping coefficient, a reverse mapping coefficient, a fisheye image center coordinate, and an affine transformation coefficient.
Optionally, in the multi-sensor fusion SLAM method, the giving the camera projection model based on the calibration results of the internal parameters of the camera and the external parameters of the camera, the wheel speed meter and the IMU includes:
Based on the internal parameters and the external parameters, calculating to obtain a posture transformation matrix from a world coordinate system to a body coordinate system, a posture transformation matrix from the body coordinate system to a camera coordinate system, and a transformation matrix from a camera coordinate system of a fisheye camera to a camera coordinate system of a j-th virtual perspective camera;
The formula p = K_j · R_{v_j c_i} · T_{c_i b} · T_{bw} · P is given as the camera projection model, wherein p is the pixel point p(x, y) obtained by projecting a map point P(X, Y, Z) in the world coordinate system onto the pixel coordinate system of camera c_i, K_j is the perspective transformation matrix of the j-th virtual perspective camera, R_{v_j c_i} is the transformation matrix from the camera coordinate system of the fisheye camera c_i to the camera coordinate system of the j-th virtual perspective camera, T_{c_i b} is the pose transformation matrix from the body coordinate system to the camera coordinate system of the i-th camera c_i, T_{bw} is the pose transformation matrix from the world coordinate system to the body coordinate system, and P(X, Y, Z) is the map point in the world coordinate system.
Optionally, in the multi-sensor fusion SLAM method, the controlling of the loose-coupling initialization of the multiple cameras and the wheel speed meter in the SLAM system includes:
respectively calculating the essential matrix of each camera through the epipolar constraint, and selecting the camera corresponding to the essential matrix with the most inliers as the initializing camera;
Performing visual initialization to obtain a translation distance of a camera optical center of the initialized camera, and marking the translation distance as a first translation distance;
Obtaining the translation distance of the origin of the body coordinate system between two adjacent frames through wheel speed meter pre-integration;
Calculating the translation distance of the camera optical center of the initialization camera between two adjacent frames based on the translation distance of the coordinate points of the body coordinate system, and marking the translation distance as a second translation distance;
calculating a scale factor based on the first translation distance and the second translation distance;
And recovering the absolute scale of the poses and the map points based on the scale factor, thereby completing the initialization of the multiple cameras and the wheel speed meter.
Optionally, in the multi-sensor fusion SLAM method, a loop detection thread is used to perform loop detection on the key frames screened by the local mapping thread, and the poses and map point positions of the screened key frames are corrected according to the loop frame adjustment, including:
Selecting, as candidate frames, the key frames that share with the current key frame a number of words exceeding a threshold and whose similarity score is greater than the lowest score of the adjacent key frames, and selecting from the candidate frames the key frames with continuity as loop candidate frames, wherein the lowest score of the adjacent key frames refers to: the minimum similarity between the current key frame and its adjacent key frames;
Accelerating the matching of descriptors through bag-of-words vectors, computing with RANSAC the Sim3 transformation between the current key frame and the closed-loop candidate frame, and projecting map points according to the preliminarily obtained Sim3 transformation to obtain more matched map points;
Taking the Sim3 transformations corresponding to the two key frames and the map points as vertices, traversing the observations of all map points, adding binary edges of forward and reverse projection, fixing the map point positions, and optimizing the Sim3 transformation between the two frames by minimizing the reprojection error;
according to the optimized Sim3, projecting the map points of the closed-loop key frame and of the key frames connected with it onto the current key frame for matching, so as to obtain more matching points;
And solving the optimized Sim3, and adjusting the key frame poses and map point positions based on the solved Sim3.
A multisensor fused SLAM device, comprising:
The calibration unit is used for calibrating internal parameters of the camera in the SLAM system; calibrating external parameters of a camera, a wheel speed meter and an IMU in the SLAM system;
The projection calculation unit is used for giving a camera projection model based on the internal parameters of the camera and calibration results of the camera, the wheel speed meter and the external parameters of the IMU;
The initialization unit is used for controlling loose-coupling initialization of the multiple cameras and the wheel speed meter in the SLAM system;
The tracking thread unit is used for acquiring initialized map points and poses by adopting a tracking thread, calculating and optimizing the pose of the current key frame based on the image data, IMU data and wheel speed data acquired from the reference frame to the current key frame, and sending the key frame to a local mapping thread;
The local mapping thread unit is used for acquiring, by adopting a local mapping thread, the key frames inserted by the tracking thread, triangulating new map points based on the key frames, fusing them, and screening out and removing redundant key frames and redundant map points;
And the loop detection unit is used for carrying out loop detection on the key frames screened by the local map creation thread by adopting a loop detection thread, and correcting the pose and map points of the key frames screened according to loop frame adjustment.
Optionally, in the multi-sensor fusion SLAM device, when the camera in the SLAM system is a perspective camera, the internal parameters of the camera in the calibrated SLAM system include, but are not limited to, one or more combinations of focal length, center coordinates of perspective images, and distortion coefficients; when the camera in the SLAM system is a fisheye camera, the internal parameters of the camera in the calibrated SLAM system include, but are not limited to, one or more of a forward mapping coefficient, a reverse mapping coefficient, a fisheye image center coordinate, and an affine transformation coefficient.
Optionally, in the multi-sensor fusion SLAM device, the projection calculation unit is specifically configured to, when a camera projection model is given based on calibration results of the internal parameters of the camera and the external parameters of the camera, the wheel speed meter and the IMU:
Based on the internal parameters and the external parameters, calculating to obtain a posture transformation matrix from a world coordinate system to a body coordinate system, a posture transformation matrix from the body coordinate system to a camera coordinate system, and a transformation matrix from a camera coordinate system of a fisheye camera to a camera coordinate system of a j-th virtual perspective camera;
based on the formula p = K_j · R_{v_j c_i} · T_{c_i b} · T_{bw} · P, calculating the camera projection model, wherein P(X, Y, Z) is a map point in the world coordinate system, p(x, y) is the pixel point obtained by projecting the map point P(X, Y, Z) onto the pixel coordinate system of camera c_i, K_j is the perspective transformation matrix of the j-th virtual perspective camera, R_{v_j c_i} is the transformation matrix from the camera coordinate system of the fisheye camera c_i to the camera coordinate system of the j-th virtual perspective camera, T_{c_i b} is the pose transformation matrix from the body coordinate system to the camera coordinate system of the i-th camera c_i, and T_{bw} is the pose transformation matrix from the world coordinate system to the body coordinate system.
Optionally, in the multi-sensor fusion SLAM device, the initialization unit is specifically configured, when controlling the loose-coupling initialization of the multiple cameras and the wheel speed meter in the SLAM system, to:
respectively calculate the essential matrix of each camera through the epipolar constraint, and select the camera corresponding to the essential matrix with the most inliers as the initializing camera;
Performing visual initialization to obtain a translation distance of a camera optical center of the initialized camera, and marking the translation distance as a first translation distance;
Obtaining the translation distance of the origin of the body coordinate system between two adjacent frames through wheel speed meter pre-integration;
Calculating the translation distance of the camera optical center of the initialization camera between two adjacent frames based on the translation distance of the coordinate points of the body coordinate system, and marking the translation distance as a second translation distance;
calculating a scale factor based on the first translation distance and the second translation distance;
And recovering the absolute scale of the poses and the map points based on the scale factor, thereby completing the initialization of the IMU and the wheel speed meter.
Optionally, in the multi-sensor fusion SLAM device, when the loop detection unit performs, by adopting a loop detection thread, loop detection on the key frames screened by the local mapping thread and corrects the poses and map points of the screened key frames according to the loop frame adjustment, the loop detection unit is specifically configured to:
Select, as candidate frames, the key frames that share with the current key frame a number of words exceeding a threshold and whose similarity score is greater than the lowest score of the adjacent key frames, and select from the candidate frames the key frames with continuity as loop candidate frames, wherein the lowest score of the adjacent key frames refers to: the minimum similarity between the current key frame and its adjacent key frames;
Accelerate the matching of descriptors through bag-of-words vectors, compute with RANSAC the Sim3 transformation between the current key frame and the closed-loop candidate frame, and project map points according to the preliminarily obtained Sim3 transformation to obtain more matched map points;
Take the Sim3 transformations corresponding to the two key frames and the map points as vertices, traverse the observations of all map points, add binary edges of forward and reverse projection, fix the map point positions, and optimize the Sim3 transformation between the two frames by minimizing the reprojection error;
according to the optimized Sim3, project the map points of the closed-loop key frame and of the key frames connected with it onto the current key frame for matching, so as to obtain more matching points;
And solve the optimized Sim3, and adjust the key frame poses and map point positions based on the solved Sim3.
Based on the above technical solution, in the solution provided by the embodiments of the present invention, calibrating the internal parameters refers to calibrating the internal parameters of the cameras in the SLAM system, and the external parameters of the cameras, the wheel speed meter, and the IMU in the SLAM system are also calibrated. Using the calibrated internal and external parameters, a camera projection model is given, and the multiple cameras and the wheel speed meter in the SLAM system are initialized with loose coupling. A tracking thread continuously calculates new poses from new image, IMU, and wheel speed meter data, inserts key frames, and optimizes their poses; a local mapping thread processes the key frames sent by the tracking thread, triangulates and fuses new map points, and checks and removes redundant map points and key frames; finally, a loop detection thread performs loop detection on the key frames sent by the local mapping thread. This overcomes the limitations of the traditional single-camera pure-vision SLAM system, which cannot give an absolute scale and has a limited single-camera field of view.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a SLAM method of multi-sensor fusion disclosed in an embodiment of the present application;
Fig. 2 is a schematic diagram of an approximate arc motion of a moving body from t to t+Δt according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an optimized structure of a multi-sensor fusion SLAM system according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a multi-sensor fusion SLAM device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the application discloses a SLAM method for multi-sensor fusion, which can comprise the following steps of:
Step S100: calibrating parameters;
The calibration parameters in this step mainly refer to internal parameters and external parameters; calibrating the internal parameters means calibrating the internal parameters of the cameras in the SLAM system. In this example, the SLAM system supports a multi-camera setup: n (n ≥ 1) cameras, which may be perspective cameras or fisheye cameras, are rigidly mounted on the moving body at arbitrary angles, and no overlap between the camera fields of view is required. Taking two fisheye cameras as an example, the fisheye internal parameters are calibrated with the Taylor-series omnidirectional vision model proposed by Scaramuzza; the internal parameters include the forward mapping coefficients a_0, a_1, ..., a_N, the reverse mapping coefficients p_0, p_1, ..., p_N, the fisheye image center coordinates u_0 and v_0, and the affine transformation coefficients c, d, and e. Of course, those skilled in the art can select the camera internal parameters to be calibrated based on actual requirements.
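As a concrete illustration of this model, the following is a minimal sketch of back-projecting a fisheye pixel to a viewing ray under a Scaramuzza-style Taylor-series calibration; the function name and the exact polynomial convention are assumptions made for this example, not the patent's implementation.

```python
import numpy as np

def fisheye_pixel_to_ray(u, v, a, u0, v0, c, d, e):
    """Back-project a fisheye pixel (u, v) to a unit viewing ray in the camera frame.

    a       : forward-mapping polynomial coefficients a0..aN (a1 is typically 0)
    u0, v0  : fisheye image center coordinates
    c, d, e : affine transformation coefficients of the sensor plane
    """
    # Undo the affine transformation of the sensor plane.
    A = np.array([[c, d],
                  [e, 1.0]])
    up, vp = np.linalg.solve(A, np.array([u - u0, v - v0]))
    # Evaluate the Taylor-series polynomial at the radial distance rho.
    rho = np.hypot(up, vp)
    z = np.polyval(np.asarray(a)[::-1], rho)  # a0 + a1*rho + ... + aN*rho^N
    ray = np.array([up, vp, z])
    return ray / np.linalg.norm(ray)
```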
Calibrating the external parameters means calibrating the external parameters of the cameras, the wheel speed meter, and the IMU in the SLAM system, with the aim of giving the relative poses of the rigid-body transformations among the cameras, the wheel speed meter, and the IMU. In this example, the center point between the two wheel speed meters of the SLAM system is set as the coordinate origin of the Body coordinate system of the moving body, and the pose of the Body coordinate system can be regarded as the absolute pose of the moving body; those skilled in the art can select the location and pose of the body coordinate system based on actual requirements. Since the subsequent optimization is tightly coupled, the IMU and wheel speed meter coordinate systems are made to coincide with the Body coordinate system, i.e., the forward direction of the moving body is the positive x axis of the IMU and the gravity direction is the negative z axis of the IMU. The SLAM system thus involves a World coordinate system (World), a Body coordinate system (Body), and several Camera coordinate systems (Camera). The external parameter calibration needs to align the IMU and wheel speed meter reference frame with the Body coordinate system and give the pose transformation T_{c_i b} from the Body coordinate system to the camera coordinate system of the i-th camera c_i.
Step S200: giving a camera projection model;
in this step, a camera projection model is given based on calibration results of the internal and external parameters.
Specifically, the process of calculating the projection model from the calibration results of the internal and external parameters is as follows:
The pose transformation from the world coordinate system to the Body coordinate system is known as T_{bw}, and the pose transformation from the Body coordinate system to the camera coordinate system of the i-th camera c_i is known as T_{c_i b}. Based on T_{bw} and T_{c_i b}, the transformation from a three-dimensional point in the world coordinate system to a two-dimensional point in the pixel coordinate system can be obtained. Assuming that the map point in the world coordinate system is P(X, Y, Z) and that its projection onto the pixel coordinate system of camera c_i is the pixel point p(x, y), the following relationship holds:
p = π_{c_i}(T_{c_i b} · T_{bw} · P)
wherein π_{c_i} represents the projection model of camera c_i. For a perspective camera, π_{c_i} can be modelled with the projection matrix K_n and the distortion coefficients k_1, k_2, p_1, p_2. For a fisheye camera, π_{c_i} can be modelled with the calibrated general polynomial model.
In particular, if a fisheye camera is used as the visual input, the field of view can be greatly expanded, but the fisheye image is heavily distorted and subsequent feature extraction and matching become problematic. Therefore, the application can adopt a multi-pinhole perspective projection model as the camera projection model, projecting the fisheye image onto the imaging planes of pinhole cameras to obtain corrected fisheye images. Specifically, the multi-pinhole perspective projection model consists of N (N ≥ 2) virtual perspective cameras whose optical centers coincide with the optical center of the fisheye camera and whose imaging planes form certain included angles with one another. R_{v_j c_i} represents the transformation matrix from the camera coordinate system of the fisheye camera c_i to the camera coordinate system of the j-th virtual perspective camera, so the transformation from a three-dimensional point P(X, Y, Z) in the world coordinate system to a two-dimensional point p(x, y) in the pixel coordinate system of the j-th virtual perspective camera is:
p = K_j · R_{v_j c_i} · T_{c_i b} · T_{bw} · P
This is the camera projection model, where K_j is the perspective transformation matrix of the j-th virtual perspective camera.
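The projection chain above can be rendered in a few lines of numpy. The sketch below is only an illustration of p = K_j · R_{v_j c_i} · T_{c_i b} · T_{bw} · P; the argument names are chosen for this example and are not taken from the patent.

```python
import numpy as np

def project_to_virtual_pinhole(P_w, T_bw, T_cib, R_vjci, K_j):
    """Project a world-frame map point onto the j-th virtual perspective camera.

    P_w    : 3-vector, map point P(X, Y, Z) in the world frame
    T_bw   : 4x4 pose transform, world frame -> Body frame
    T_cib  : 4x4 pose transform, Body frame -> camera c_i frame
    R_vjci : 3x3 rotation, fisheye camera c_i frame -> j-th virtual pinhole frame
    K_j    : 3x3 perspective (intrinsic) matrix of the j-th virtual pinhole camera
    """
    P_h = np.append(P_w, 1.0)            # homogeneous world point
    P_ci = (T_cib @ T_bw @ P_h)[:3]      # point in the camera c_i frame
    P_vj = R_vjci @ P_ci                 # point in the virtual pinhole frame
    if P_vj[2] <= 0:
        return None                      # behind this virtual image plane
    uv1 = K_j @ (P_vj / P_vj[2])         # perspective projection
    return uv1[:2]                       # pixel point p(x, y)
```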
Step S300: loose-coupling initialization of the multiple cameras and the wheel speed meter;
This step mainly performs the loose-coupling initialization of the multiple cameras and the wheel speed meter. At the start of initialization, the SLAM system passes in the initial frame and the current frame of the multiple cameras; each frame contains several images from the multiple cameras, and the images are time-synchronized, i.e., they share the same timestamp. During the loose-coupling initialization, the gyroscope of the SLAM system is aligned with the wheel speed data and the wheel speed meter is assumed to be unbiased; the pre-integration between two adjacent frames of images can be obtained from the discrete motion equation of the inertial observations, and the position and attitude of the moving body are then given by this pre-integration. The multi-camera loose-coupling initialization is a visual initialization; the IMU initialization is completed at the moment of visual initialization by setting the initial gyroscope bias to 0, and the bias is optimized subsequently. The visual initialization extracts and matches features between the initial frame image and the current frame image of each camera, computes through the epipolar constraint the camera image rotation R and translation t with the most inliers, takes that camera as the leading camera (initializing camera), takes its motion as the reference, obtains the pose of the moving body from the extrinsic relationship T_{c_i b}, and triangulates to establish the initialization map.
Since the R and t computed by the initialization have a monocular scale uncertainty, the absolute scale of the initialization map and pose must be recovered from the wheel speed meter observations. The movement distance of the camera optical center between the two frames obtained by the visual odometry in the SLAM system is S_vo, and this distance differs from the real distance in the world coordinate system by a scale factor. Since the wheel speed meter gives the velocity of the target object in the world coordinate system, it can provide the absolute scale. Through the wheel speed meter pre-integration between the two frames, the displacement of the moving body in the x-axis direction is obtained by dead reckoning; taking into account also the displacement caused by rotation, the distance S_vio that the camera optical center moves between the two frames is finally obtained. The real scale is therefore scale = S_vio / S_vo, and the absolute scale of the pose and map points can be recovered accordingly, completing the initialization.
Specifically, in this embodiment, the observation equation of the inertial sensors in the SLAM system at time t is:
ω̃^b(t) = ω^b(t) + b_g(t) + η_g(t)
ṽ^b(t) = v^b(t) + η_v(t)
wherein ω̃^b(t) are the coordinates, in the Body coordinate system, of the angular velocity vector measured by the gyroscope, b_g(t) is the gyroscope bias, η_g(t) is the angular velocity noise, and ω^b(t) is the true value of the angular velocity in the Body coordinate system; ṽ^b(t) are the coordinates, in the Body coordinate system, of the velocity vector measured by the wheel speed meter, η_v(t) is the velocity noise, and v^b(t) is the true value of the velocity in the Body coordinate system. The discrete motion equation from time t to time t+Δt is:
R_wb(t+Δt) = R_wb(t) · Exp(ω^b(t) Δt)
p_wb(t+Δt) = p_wb(t) + R_wb(t+Δt) · v^b(t) Δt
where R_wb(t+Δt) denotes the rotation matrix from the Body coordinate system at time t+Δt to the world coordinate system, and R_wb(t) denotes the rotation matrix from the Body coordinate system at time t to the world coordinate system. The rotation vector ω^b(t)Δt in the Body coordinate system from time t to time t+Δt can be regarded as a Lie-algebra element, and its exponential map Exp(ω^b(t)Δt) gives the Lie-group element, i.e., the rotation matrix from the Body coordinate system at time t+Δt to the Body coordinate system at time t. The motion of the moving body between two frames is assumed to satisfy a constant turn rate and velocity magnitude (CTRV) model, i.e., it can move at a constant turn rate and constant speed while advancing along a straight line, approximating a circular-arc motion, as shown in fig. 2. Assuming the velocity is constant during this interval, v^b(t)Δt represents the displacement vector in the Body coordinate system from time t to time t+Δt; multiplying it by the rotation matrix R_wb(t+Δt) from the Body coordinate system at time t+Δt to the world coordinate system gives the displacement increment in the world coordinate system over Δt.
Substituting the observation equation into the discrete motion equation yields:
R_wb(t+Δt) = R_wb(t) · Exp((ω̃^b(t) − b_g(t) − η_g(t)) Δt)
p_wb(t+Δt) = p_wb(t) + R_wb(t+Δt) · (ṽ^b(t) − η_v(t)) Δt
Since the frequency of the inertial observations is much higher than that of the visual observations, there are several inertial observations between the i-th and the j-th frame images; they are pre-integrated as:
R_wb(j) = R_wb(i) · Π_{k=i}^{j−1} Exp((ω̃^b(k) − b_g(k)) Δt_k)
p_wb(j) = p_wb(i) + Σ_{k=i}^{j−1} R_wb(k+1) · ṽ^b(k) Δt_k
The position and attitude of the moving body can thus be given by the pre-integration. The IMU initialization is completed at the moment of visual initialization by setting the initial gyroscope bias to 0; the bias is optimized subsequently.
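A minimal sketch of this pre-integration is given below, assuming a list of gyroscope/wheel-speed samples expressed in the Body frame and using the discrete motion model above; the function and its interface are illustrative only.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def preintegrate(samples, b_g):
    """Accumulate gyro / wheel-speed samples between two image frames.

    samples : list of (omega_b, v_b, dt) tuples measured in the Body frame
    b_g     : current gyroscope bias estimate (3-vector)
    Returns (dR, dp): rotation and translation increment of the Body frame,
    expressed in the Body frame of the first image.
    """
    dR = np.eye(3)      # accumulated rotation
    dp = np.zeros(3)    # accumulated translation
    for omega_b, v_b, dt in samples:
        # Rotation update: exponential map of the bias-corrected rotation vector.
        dR = dR @ Rotation.from_rotvec((np.asarray(omega_b) - b_g) * dt).as_matrix()
        # Translation update: constant-velocity step rotated into the first frame.
        dp = dp + dR @ (np.asarray(v_b) * dt)
    return dR, dp
```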
The pose of the Body coordinate system is first given by visual initialization. Feature extraction and matching are performed between the initial frame image and the current frame image of the same camera, and a random sample consensus (RANSAC) scheme is iterated X times in total (for example, X may be 200). In each iteration, eight (or another number of) pairs of feature points are selected and the essential matrix E is computed from the epipolar constraint; the errors of all feature points with respect to their epipolar lines under E are computed, points whose error is below a given threshold are judged to be inliers and recorded, and the accumulated score of all feature point pairs under E is recorded (the larger the error, the lower the score). The essential matrix with the highest score is recorded as the essential matrix E_i of the initialization of camera c_i. Singular value decomposition (SVD) of E_i recovers the camera motion R and t from a total of four possible combinations: the matched inliers of the camera are recovered by triangulation, points outside the field of view, points with negative depth, and points with excessive reprojection error are removed, and finally the R and t with the most triangulated points are selected as the motion of the camera. The camera with the largest number of reconstructed points is taken as the leading camera (initializing camera), its motion is taken as the reference, the pose of the moving body, i.e., the pose of the Body coordinate system, is obtained from the extrinsics T_{c_i b}, and the initialization map is established.
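For illustration, the selection of the leading camera could look roughly like the OpenCV-based sketch below; it assumes the feature matching on the rectified virtual-pinhole images has already been done, and the function and parameter values (RANSAC confidence, threshold) are assumptions, not values taken from the patent.

```python
import cv2
import numpy as np

def pick_leading_camera(matched_points, K_virtual):
    """Choose the initializing camera: the one whose essential matrix yields
    the most inliers that survive the cheirality (triangulation) check.

    matched_points : dict camera_id -> (pts_init, pts_cur), Nx2 float arrays
    K_virtual      : 3x3 intrinsic matrix of the virtual pinhole cameras
    """
    best = None
    for cam_id, (pts0, pts1) in matched_points.items():
        E, mask = cv2.findEssentialMat(pts0, pts1, K_virtual,
                                       method=cv2.RANSAC,
                                       prob=0.999, threshold=1.0)
        if E is None:
            continue
        # recoverPose decomposes E by SVD and keeps the (R, t) combination
        # with the most points in front of both views.
        n_good, R, t, _ = cv2.recoverPose(E, pts0, pts1, K_virtual, mask=mask)
        if best is None or n_good > best[1]:
            best = (cam_id, n_good, R, t)
    return best   # (leading camera id, inlier count, R, t)
```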
Since the R and t computed by the pure visual initialization have a monocular scale uncertainty, the absolute scale of the initialization map and pose must be recovered from the wheel speed meter observations. First, a global optimization is performed once over the Body-coordinate-system poses of the initial frame and the current frame and over the map point positions, so that the reprojection error of the map points projected onto the pixel plane through the poses is minimized. Taking the leading-camera coordinate system of the initial frame as the world coordinate system, the absolute pose of the moving body of the initial frame (i.e., the pose of the Body coordinate system) is T_wb1 = [R_wb1 | t_wb1], where R_wb1 denotes the rotation and t_wb1 the translation (i.e., the coordinates of the initial-frame camera optical center in the world coordinate system). The absolute pose of the moving body of the current frame is T_wb2 = [R_wb2 | t_wb2], where R_wb2 denotes the rotation and t_wb2 the translation (i.e., the coordinates of the current-frame camera optical center in the world coordinate system). The movement distance of the camera optical center between the two frames obtained by the visual odometry is S_vo = ||t_wb2 − t_wb1||, but the scale of this distance is inaccurate.
Since the wheel speed meter gives the velocity in the world coordinate system, it can provide the absolute scale. Optionally, the calculation here uses the inertial observations of the initial frame and the current frame: the distance travelled by the camera optical center can be decomposed into a dead-reckoned x-direction component and a rotation-induced y-direction component. First, the distance S_odom travelled by the coordinate origin of the wheel speed meter coordinate system (i.e., the Body coordinate system) is obtained by pre-integrating the wheel speed meter between the two frames. Because the interval between the two frames is very short, S_odom is regarded approximately as the translation of the moving body along the x axis; at the same time, because of the rotation between the two frames, the camera optical center is approximately shifted along the y axis, and the shift of the optical center caused by the rotation is computed as follows. The extrinsics between the leading camera and the Body coordinate system are known, and the pre-calibrated extrinsics are regarded as unchanged during the initialization. Let t_bc be the coordinates of the initial-frame camera optical center in the initial-frame Body coordinate system; after the rotation R between the two frames, the coordinates of the current-frame camera optical center in the initial-frame Body coordinate system are R · t_bc, so the distance of camera optical-center movement caused by the rotation is S_rot = ||R · t_bc − t_bc||. Combining the distance given by the wheel speed meter with the rotation-induced distance gives the distance S_vio of camera optical-center movement between the two frames. The real scale is therefore scale = S_vio / S_vo, and the absolute scale of the poses and map points can be recovered accordingly, completing the initialization of the SLAM system.
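The scale recovery can be sketched as below; the combination of the dead-reckoned x component and the rotation-induced y component into S_vio by a Pythagorean sum is an assumption made for this illustration, as are the variable names.

```python
import numpy as np

def recover_scale(t_w_init, t_w_cur, S_odom, R_b1b2, t_bc):
    """Recover the absolute scale of the monocular initialization.

    t_w_init, t_w_cur : leading-camera optical center in the (up-to-scale)
                        visual world frame, at the initial / current frame
    S_odom            : metric translation of the Body origin from the
                        wheel-speed-meter pre-integration (x direction)
    R_b1b2            : Body-frame rotation between the two frames
    t_bc              : leading-camera optical center in the Body frame (extrinsic)
    """
    S_vo = np.linalg.norm(t_w_cur - t_w_init)       # up-to-scale VO distance
    S_rot = np.linalg.norm(R_b1b2 @ t_bc - t_bc)    # optical-center shift due to rotation
    S_vio = np.hypot(S_odom, S_rot)                 # assumed orthogonal x / y components
    return S_vio / S_vo                             # multiply poses and map points by this
```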
Specifically, in the technical solution disclosed in another embodiment of the present application, the controlling of the loose-coupling initialization of the multiple cameras and the wheel speed meter in the SLAM system includes:
respectively calculating the essential matrix of each camera through the epipolar constraint, and selecting the camera corresponding to the essential matrix with the most inliers as the initializing camera;
Performing visual initialization to obtain a translation distance of a camera optical center of the initialized camera, and marking the translation distance as a first translation distance;
Obtaining the translation distance of the origin of the body coordinate system between two adjacent frames through wheel speed meter pre-integration;
Calculating the translation distance of the camera optical center of the initialization camera between two adjacent frames based on the translation distance of the coordinate points of the body coordinate system, and marking the translation distance as a second translation distance;
calculating a scale factor based on the first translation distance and the second translation distance;
And recovering the absolute scale of the poses and the map points based on the scale factor, thereby completing the initialization of the multiple cameras and the wheel speed meter.
Step S400: tracking thread (Tracking).
After obtaining the initialized map points and poses, the SLAM system needs to continuously calculate new poses, insert key frames, and optimize the key frame poses as new image, IMU, and wheel speed meter data arrive. In this process, the tracking thread first calculates the pose between two frames (the previous frame or the previous key frame, and the current frame) using the visual and inertial observations of the previous frame or the previous key frame.
If the difference between the time of the last relocalization and the current time is not greater than a preset duration, the pose is solved from the multi-camera visual observations (Track Reference Key Frame). The feature points of the two acquired frames of images are matched as follows: the descriptors of the feature points are converted into bag-of-words vectors to match feature points between the current frame and the reference frame, and mismatches are removed by a distance threshold, a ratio threshold, and angle voting. After the feature points of the two frames have been matched, the pose of the reference frame is taken as the initial value of the pose of the current frame; the map points, the pose of the moving body, and the extrinsics of the multiple cameras are taken as vertices, the observation of each map point is taken as an edge, and the pose of the moving body of the current frame is optimized by minimizing the reprojection error from three-dimensional points in the world coordinate system to two-dimensional points in the pixel coordinate system. In this scheme, the optimization can be performed four times in total; after each round the observations are divided into outliers and inliers, the outliers do not participate in the next round, and finally the mismatched map points are removed according to the updated pose. Optionally, after the pose of the moving body has been updated, the pose of the moving body and the map points can be taken as known quantities, the extrinsics of each camera taken as vertices, and unary reprojection-error edges constructed to optimize the multi-camera extrinsics.
If no relocalization has been triggered, or if the difference between the time of the last relocalization and the current time is greater than the preset duration, the SLAM system solves the pose using the inertial observations (Track With Odom). The inertial-observation pose and bias of the reference frame are taken as the initial values of the current frame, and the pre-integration of the inertial observations from the reference frame to the current frame is calculated. The map points of the previous frame are projected to the current frame through the projection model, matching points are sought within a small range according to descriptor distance, and mismatches are removed by angle voting. The poses and inertial-observation biases of the two frames obtained from the inertial observations, the map points, and the multi-camera extrinsics are taken as vertices; the pre-integration of the inertial observations between the two frames adds a constraint between the poses and biases of the two frames, while the observations of the map points are also taken as constraints, added as unary edges with the current pose as the vertex. The optimization is performed four times in total; after each round the observations are divided into outliers and inliers, the outliers do not participate in the next round, and finally the mismatched map points are removed according to the updated pose. Optionally, the multi-camera extrinsics can then be further optimized by minimizing the reprojection error.
After the pose of the current frame has been obtained through matching between the two frames and pose optimization, the local map is tracked to obtain more matches and further optimize the pose of the current frame. Specifically, the local key frames and local map points are updated, and the points of the local map that fall in the field of view of the current frame are found and projected to the current frame for matching, so as to obtain more matched map points. Likewise, if the difference between the time of the last relocalization and the current time is not greater than the preset duration, the multi-camera visual observations are used to solve the pose (Track Local Map), with the same pose optimization method as in Track Reference Key Frame; if no relocalization has been triggered, or if the difference between the time of the last relocalization and the current time is greater than the preset duration, the visual and inertial observations are used to solve the pose (Track Local Map With Odom), with the same pose optimization method as in Track With Odom. Finally, the mismatched map points are removed through the updated moving-body pose of the current frame.
If the number of matches obtained after the two tracking stages is insufficient, a relocalization mode is entered. The relocalization process takes as relocalization candidate frames the key frames that share with the current frame a number of bag-of-words words exceeding a threshold and whose similarity score is greater than the lowest score of the adjacent key frames, and then solves the pose of the current frame and the map points from the relocalization candidate frames with the EPnP method. If relocalization fails, the entire SLAM system is restarted.
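As an illustration of the EPnP-based relocalization step, a hedged sketch using OpenCV's solvePnPRansac is shown below; the reprojection-error threshold, iteration count, and minimum inlier count are assumptions, not values given by the patent.

```python
import cv2
import numpy as np

def relocalize(map_points_3d, keypoints_2d, K_virtual):
    """Estimate the current-frame pose from 3D-2D matches against a
    relocalization candidate key frame, using EPnP inside RANSAC.

    map_points_3d : Nx3 world coordinates of the matched map points
    keypoints_2d  : Nx2 pixel coordinates of the matches in the current frame
    K_virtual     : 3x3 intrinsic matrix of the rectified virtual pinhole camera
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(map_points_3d, dtype=np.float64),
        np.asarray(keypoints_2d, dtype=np.float64),
        K_virtual, None,                      # rectified image: no distortion
        flags=cv2.SOLVEPNP_EPNP,
        reprojectionError=4.0, iterationsCount=100)
    if not ok or inliers is None or len(inliers) < 15:
        return None                           # relocalization failed
    R, _ = cv2.Rodrigues(rvec)                # world -> camera rotation
    return R, tvec.ravel(), inliers.ravel()
```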
The cost function of the reprojection error in the optimization process is:
E = Σ_{i,j} || p̂_ij − f(T_i, P_j) ||²
where p̂_ij is the observed pixel position and p_ij = f(T_i, P_j) represents the pixel coordinates obtained by projecting the three-dimensional point P_j onto the image plane according to the Body-coordinate-system pose T_i. For the n-th virtual pinhole camera of the m-th fisheye camera, the projection relationship is:
p_ij = K_n · R_n · T_m · T_i · P_j
Where T m represents the pose transformation from the Body coordinate system to the mth camera system, R n represents the pose transformation from the mth camera system to the nth virtual pinhole camera, and K n represents the projection matrix of the nth virtual pinhole camera.
The structure of the graph optimization used in the optimization process is shown in fig. 3. The pose T_i at time t=i can also be expressed by the pose of the inertial observation, i.e., PR_i. Meanwhile, the inertial observation at each moment also has a bias B_i: a pre-integration constraint edge exists between the inertial-observation pose PR_i at the current moment and the pose PR_{i-1} and bias B_{i-1} of the previous moment, and a constraint edge also exists between the biases of the two moments. At time t=1, camera c_1 sees map point P_0, and camera c_2 sees map points P_0 and P_1. At time t=2, camera c_1 sees map points P_1 and P_2, and camera c_2 sees map points P_2 and P_3. T_{c_m,i} denotes the extrinsics of the m-th camera at time t=i. The graph optimization can thus be represented by the structure shown in fig. 3, and the pose of the moving body, the extrinsics of the multiple cameras, and the map point positions are optimized by minimizing the observation errors of the edges.
Step S500: local Mapping thread (Local Mapping).
This thread processes the key frames sent by the tracking thread, checks and culls map points, triangulates new map points between the multiple cameras of the two frames and fuses them, and performs a local optimization of the poses and map points by combining the inserted key frame, its connected key frames, and all their map points (Local Bundle Adjustment). Finally, by continuously adding new key frames, generating new map points, and removing redundant key frames and redundant map points, it provides a stable set of key frames for loop detection.
In this step, when the number of key frames in the map is less than a preset threshold, local optimization is performed using the visual observations of a sliding window. Specifically, the moving-body poses of all the connected key frames in the covisibility graph are taken as pose vertices to be optimized, and all map points visible from these key frames are taken as map-point vertices to be optimized; the moving-body poses of the key frames that observe these map points but are not in the covisibility graph are taken as fixed pose vertices that do not participate in the optimization and only provide constraints, and the extrinsics of the multiple cameras are likewise taken as fixed pose vertices. The observations of each map point are traversed, multi-edges are added connecting the map point, the key-frame pose vertex, and the extrinsic vertex, and the map point positions and the key-frame moving-body poses in the covisibility graph are optimized by minimizing the reprojection error. Optionally, if the extrinsics of each camera need to be optimized, the observations of all map points are traversed, unary edges are added with the multi-camera extrinsics as vertices, and the camera extrinsics are optimized by minimizing the reprojection error.
When the number of key frames in the map is greater than the threshold, local optimization is performed using the inertial observations of a sliding window. Specifically, the inertial-observation poses and biases of all the connected key frames in the covisibility graph are taken as pose vertices and bias vertices to be optimized, and all map points visible from these key frames are taken as map-point vertices to be optimized; the inertial-observation poses and biases of the key frames that observe the local map points but do not belong to the covisibility graph are taken as fixed pose and bias vertices that do not participate in the optimization and only provide constraints, and the extrinsics of the multiple cameras are taken as fixed pose vertices. The constraint between the pose and bias of the current key frame and those of the previous key frame is given as an observation edge; the observations of each map point are traversed, multi-edges are added connecting the map point, the key-frame pose vertex, and the extrinsic vertex, and the map point positions and the key-frame moving-body poses in the covisibility graph are optimized by minimizing the reprojection error. Optionally, if the extrinsics of each camera need to be optimized, the observations of all map points are traversed, unary edges are added with the multi-camera extrinsics as vertices, and the camera extrinsics are optimized by minimizing the reprojection error.
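The split of the sliding-window problem into optimized and fixed vertices described in the two paragraphs above can be sketched with plain containers as follows; the data layout (dicts of key-frame and map-point ids) is purely illustrative and is not the patent's data structure.

```python
def build_local_ba_problem(current_kf, covis, observations, prev_kf, use_inertial):
    """Assemble vertex and edge sets for the sliding-window local optimization.

    current_kf   : id of the newly inserted key frame
    covis        : dict kf_id -> set of co-visible kf_ids (covisibility graph)
    observations : dict point_id -> set of kf_ids that observe the point
    prev_kf      : dict kf_id -> previous kf_id in the inertial chain (or None)
    use_inertial : add pre-integration edges when enough key frames exist
    """
    local_kfs = covis.get(current_kf, set()) | {current_kf}
    local_pts = {p for p, kfs in observations.items() if kfs & local_kfs}

    free_poses = set(local_kfs)                             # optimized Body poses
    free_points = set(local_pts)                            # optimized map points
    fixed_poses = {kf for p in local_pts
                   for kf in observations[p]} - local_kfs   # constraints only

    # Reprojection multi-edges: (key frame, map point); extrinsics stay fixed.
    edges = [("reproj", kf, p) for p in local_pts for kf in observations[p]]
    if use_inertial:
        # Pre-integration edges between consecutive key-frame poses and biases.
        edges += [("preint", kf, prev_kf[kf]) for kf in local_kfs
                  if prev_kf.get(kf) is not None]
    return free_poses, free_points, fixed_poses, edges
```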
Step S400: the loop detects the thread.
This thread carries out loop detection on the key frames sent by the local mapping thread, adjusts the poses and map point positions of all key frames in the map according to the loop frame, eliminates the accumulated drift error, and corrects the previous trajectory and map.
The specific process comprises the following steps:
First, the key frames that share with the current key frame a number of words exceeding a threshold and whose similarity score is greater than the lowest score of the adjacent key frames are taken as candidate frames, and the key frames with continuity are then selected from them as loop candidate frames. Matching of descriptors is accelerated through bag-of-words vectors, the Sim3 transformation between the current key frame and the closed-loop candidate frame is computed with RANSAC, and map points are projected according to the preliminarily obtained Sim3 transformation to obtain more matched map points. Taking the Sim3 transformations of the two key frames and the map points as vertices, the observations of all map points are traversed, binary edges of forward and reverse projection are added, the map point positions are fixed, and the Sim3 transformation between the two frames is optimized by minimizing the reprojection error, giving a more accurate Sim3 transformation. According to the updated Sim3 transformation, the map points of the closed-loop key frame and of the key frames connected with it are projected onto the current key frame for matching, so as to obtain more matching points. According to the solved Sim3 transformation, the key frame poses and map point positions are adjusted to eliminate the drift error, and finally all key frames and map points are updated through a global optimization, correcting the previous trajectory and map.
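For reference, the closed-form similarity estimate that RANSAC would call on each sampled set of matched map points can be written with the Umeyama method; the sketch below is a generic solver from matched 3-D pairs and is not the patent's exact Sim3 routine.

```python
import numpy as np

def estimate_sim3(X_cur, X_loop):
    """Closed-form similarity (s, R, t) aligning loop-frame map points onto
    the current key frame (Umeyama), such that X_cur ≈ s * R @ X_loop + t.

    X_cur, X_loop : Nx3 arrays of corresponding map-point coordinates
    """
    mu_c, mu_l = X_cur.mean(axis=0), X_loop.mean(axis=0)
    Xc, Xl = X_cur - mu_c, X_loop - mu_l
    # Cross-covariance of the centered point sets.
    U, D, Vt = np.linalg.svd(Xc.T @ Xl / len(X_cur))
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # keep a proper rotation (det = +1)
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / np.mean(np.sum(Xl ** 2, axis=1))
    t = mu_c - s * R @ mu_l
    return s, R, t
```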
Compared with the prior art, the application has the following advantages:
By introducing multiple cameras into the SLAM system, the observation field of view of the moving body is effectively enlarged, and the whole system is prevented from breaking down when one camera fails because of occlusion, a featureless field of view, a drastic viewpoint change while turning, or similar causes. Experiments running the system on a vehicle show that when another vehicle approaches from behind, its headlights can make feature extraction fail on the rear camera's images; a single-camera system would have to restart at that point, whereas the multi-camera system continues to track and map with the remaining cameras.
The SLAM system supports both perspective cameras and fisheye cameras, and a multi-pinhole perspective projection model is provided for the fisheye camera. A fisheye camera has a larger field of view, but its image edges suffer heavy distortion, and existing feature points and descriptors are not suited to such strongly distorted images. Multicol-SLAM proposes an improved descriptor, but its real-time performance is poor; moreover, to avoid mismatches the fisheye image edges are generally discarded and only the central part of the image is used, and extraction, matching and optimization of line features are difficult to perform directly on fisheye images. The invention uses the multi-pinhole perspective projection model to rectify the fisheye image into perspective projection images. Experiments show that the mature ORB features then run in real time with few mismatches, feature extraction and matching can be carried out over the whole image so that the edge information is fully exploited, and the rectified images are also suitable for line features.
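As a rough illustration of how one virtual perspective view of a multi-pinhole model can be sampled from a fisheye image, the sketch below builds a remap table by casting rays through the virtual pinhole pixels and looking them up in the fisheye image. It assumes a simple equidistant fisheye model (r = f·θ) purely for illustration; the calibration described in this application uses forward and reverse mapping coefficients instead, and all names are hypothetical.

```python
# Build a remap table for one virtual pinhole (perspective) view of a fisheye image.
# Equidistant fisheye model r = f * theta is an illustrative assumption only.
import numpy as np

def virtual_pinhole_remap(f_fish, c_fish, K_virt, size_virt, R_virt):
    h, w = size_virt
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    ones = np.ones_like(u, dtype=np.float64)
    rays = np.linalg.inv(K_virt) @ np.stack([u, v, ones]).reshape(3, -1)  # pinhole rays
    rays = R_virt @ rays                              # rotate into the fisheye frame
    rays /= np.linalg.norm(rays, axis=0)
    theta = np.arccos(np.clip(rays[2], -1.0, 1.0))    # angle from the optical axis
    phi = np.arctan2(rays[1], rays[0])
    r = f_fish * theta                                # equidistant projection
    map_x = (c_fish[0] + r * np.cos(phi)).reshape(h, w).astype(np.float32)
    map_y = (c_fish[1] + r * np.sin(phi)).reshape(h, w).astype(np.float32)
    return map_x, map_y

# usage: view = cv2.remap(fisheye_img, map_x, map_y, cv2.INTER_LINEAR)
```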
In addition, the invention does not require overlapping fields of view among the multiple cameras, and the wheel speed meter provides the absolute scale, so the initialization process has better real-time performance and robustness. Because the wheel speed meter directly provides velocity observations, there is no need to align the IMU's gravity direction during initialization; the initialization of the IMU and the wheel speed meter is completed together with the visual initialization, the wheel speed meter is assumed to be unbiased, and the initial gyroscope bias is set to 0 and refined in the subsequent optimization. The initialization time and computation are therefore greatly reduced, and a fairly robust initialization result can be delivered in real time.
In a traditional monocular SLAM framework such as ORB-SLAM2, the moving-body poses and the map points serve as the vertices to be optimized; after multiple cameras are introduced, the camera extrinsics can optionally also be treated as vertices to be optimized. Meanwhile, the wheel speed meter and IMU pre-integration add new constraints between the inertial-observation poses and inertial biases of consecutive frames, and the key-frame poses, map-point positions and camera extrinsics can be optimized through the edges that minimize the reprojection error. The results show that adding multiple cameras and optimizing the inertial observations makes the SLAM system run more robustly.
This embodiment further discloses a multi-sensor fusion SLAM device. For the specific operation of each unit in the device, reference is made to the foregoing method embodiment; the multi-sensor fusion SLAM device described below and the multi-sensor fusion SLAM method described above may be cross-referenced.
Referring to fig. 4, a multi-sensor fusion SLAM device disclosed in an embodiment of the present application may include:
A calibration unit 100 for calibrating internal parameters of a camera in the SLAM system; calibrating external parameters of a camera, a wheel speed meter and an IMU in the SLAM system;
a projection calculation unit 200 for giving a camera projection model based on calibration results of the camera's internal parameters and the camera, wheel speed meter and IMU's external parameters;
an initialization unit 300, configured to control the loosely coupled initialization of the multiple cameras and the wheel speed meter in the SLAM system;
a tracking thread unit 400, configured to acquire the initialized map points and poses with the tracking thread, calculate and optimize the pose of the current key frame based on the image data, IMU data and wheel speed meter data acquired from the reference frame to the current key frame, and send the key frame to the local mapping thread;
a local mapping thread unit 500, configured to use the local mapping thread to insert the key frames sent by the tracking thread, triangulate new map points from these key frames and fuse them, and screen out and remove redundant key frames and redundant map points;
and a loop detection unit 600, configured to use the loop detection thread to perform loop detection on the key frames screened by the local mapping thread, and to correct the key-frame poses and map-point positions by adjusting them according to the loop frame.
Corresponding to the method, for a perspective camera the internal parameters calibrated by the calibration unit include, but are not limited to, one or more of the focal length, the perspective-image center coordinates and the distortion coefficients; for a fisheye camera, the internal parameters include, but are not limited to, one or more of the forward mapping coefficients, the reverse mapping coefficients, the fisheye-image center coordinates and the affine transformation coefficients.
Corresponding to the method, the projection calculation unit is specifically configured to, when a camera projection model is given based on calibration results of the internal parameters of the camera and the external parameters of the camera, the wheel speed meter and the IMU:
Based on the internal parameters and the external parameters, calculating to obtain a posture transformation matrix from a world coordinate system to a body coordinate system, a posture transformation matrix from the body coordinate system to a camera coordinate system, and a transformation matrix from a camera coordinate system of a fisheye camera to a camera coordinate system of a j-th virtual perspective camera;
based on the formula p(x, y) = K_j · T_{c_j c_i} · T_{c_i b} · T_{bw} · P(X, Y, Z), obtain the camera projection model, where P(X, Y, Z) is a map point in the world coordinate system, p(x, y) is the pixel point obtained by projecting the map point P(X, Y, Z) into the pixel coordinate system of camera c_i, K_j is the perspective transformation matrix of the j-th virtual perspective camera, T_{c_j c_i} is the transformation matrix from the camera coordinate system of the fisheye camera c_i to the camera coordinate system of the j-th virtual perspective camera, T_{c_i b} is the pose transformation matrix from the body coordinate system to the camera coordinate system of the i-th camera c_i, and T_{bw} is the pose transformation matrix from the world coordinate system to the body coordinate system.
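The reconstructed projection chain can be written out directly; the short sketch below composes the pose matrices in the order given above and then applies the virtual camera's perspective matrix. It assumes all transforms are supplied as 4x4 homogeneous matrices, and the names are illustrative.

```python
# Projection chain: world point -> body frame (T_bw) -> fisheye camera c_i frame
# (T_cib) -> j-th virtual perspective camera frame (T_cjci) -> pixel via K_j.
import numpy as np

def project_point(P_w, T_bw, T_cib, T_cjci, K_j):
    P_h = np.append(P_w, 1.0)                 # homogeneous world point
    P_cj = T_cjci @ T_cib @ T_bw @ P_h        # point in the virtual camera frame
    uvw = K_j @ P_cj[:3]                      # perspective projection
    return uvw[:2] / uvw[2]                   # pixel coordinates p(x, y)
```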
Corresponding to the method, when controlling the loosely coupled initialization of the multiple cameras and the wheel speed meter in the SLAM system, the initialization unit is specifically configured to:
respectively calculating the essential matrix of each camera through the epipolar constraint, and selecting the camera whose essential matrix has the most inliers as the initialization camera;
Performing visual initialization to obtain a translation distance of a camera optical center of the initialized camera, and marking the translation distance as a first translation distance;
Obtaining the translation distance of the coordinate points of the body coordinate system between two adjacent frames through wheel speed meter pre-integration;
Calculating the translation distance of the camera optical center of the initialization camera between two adjacent frames based on the translation distance of the coordinate points of the body coordinate system, and marking the translation distance as a second translation distance;
calculating a scale factor based on the first translation distance and the second translation distance;
And recovering the absolute scale of the poses and the map points based on the scale factor, and initializing the IMU and the wheel speed meter.
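To illustrate the scale recovery in these steps, the sketch below pre-integrates planar wheel-speed measurements into a metric body translation, maps it to the camera optical center through the camera-to-body lever arm, and divides by the up-to-scale visual translation. The planar integration model and all names are simplifying assumptions, not the exact formulation of this application.

```python
# Scale factor for loosely coupled initialization: metric optical-center translation
# (wheel pre-integration + extrinsic lever arm p_bc) over the up-to-scale visual one.
import numpy as np

def preintegrate_wheel(velocities, yaw_rates, dts):
    """Integrate body-frame forward speed and yaw rate into a planar body translation."""
    x = y = yaw = 0.0
    for v, w, dt in zip(velocities, yaw_rates, dts):
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
        yaw += w * dt
    return np.array([x, y, 0.0]), yaw

def scale_factor(t_cam_visual, t_body_wheel, delta_yaw, p_bc):
    """p_bc: camera optical center expressed in the body frame (extrinsic lever arm)."""
    Rz = np.array([[np.cos(delta_yaw), -np.sin(delta_yaw), 0.0],
                   [np.sin(delta_yaw),  np.cos(delta_yaw), 0.0],
                   [0.0,                0.0,               1.0]])
    # optical-center displacement in the first body frame: C2 - C1 = t_body + (Rz - I) p_bc
    t_cam_metric = t_body_wheel + (Rz - np.eye(3)) @ p_bc
    return np.linalg.norm(t_cam_metric) / np.linalg.norm(t_cam_visual)
```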
Corresponding to the method, when using the loop detection thread to perform loop detection on the key frames screened by the local mapping thread and correcting the key-frame poses and map-point positions according to the loop frame, the loop detection unit is specifically configured to:
And selecting a key frame with the same word number as the current key frame and exceeding a threshold value and with similarity detection larger than the lowest score of an adjacent key frame as a candidate frame, and selecting the key frame with continuity from the candidate frames as a loop candidate frame, wherein the lowest score of the adjacent key frame refers to: the minimum similarity between the adjacent key frame of the current key frame and the previous key frame;
Matching of descriptors is accelerated through word bag vectors, the RANSAC is utilized to calculate the Sim3 transformation of the current key frame and the closed-loop candidate frame, and map points are projected according to the preliminarily obtained Sim3 transformation to obtain more matched map points;
Taking the Sim3 transformation and map points corresponding to the two key frames as vertexes, traversing the observation of all map points, adding binary sides of forward and reverse projection, fixing the pose of the map points, and optimizing the Sim3 transformation of the two frames by minimizing the reprojection error;
according to the optimized Sim3, projecting map points of the closed-loop key frame and the key frames connected with the closed-loop key frame to the current key frame for matching so as to obtain more matching points;
And solving the optimized Sim3, and adjusting the pose of the key frame and the position of the map point based on the solved Sim 3.
For convenience of description, the above system is described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in the same piece or pieces of software and/or hardware when implementing the present invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A multiple sensor fusion SLAM method, comprising:
calibrating internal parameters of a camera in the SLAM system;
calibrating external parameters of a camera, a wheel speed meter and an IMU in the SLAM system;
Giving a camera projection model based on calibration results of the internal parameters of the camera, the wheel speed meter and the external parameters of the IMU;
controlling the loosely coupled initialization of the multiple cameras and the wheel speed meter in the SLAM system;
Acquiring initialized map points and positions by adopting a tracking thread, inserting key frames based on the acquired image data, IMU data and wheel speed meter data, correcting the positions and map points based on the image data, and sending the key frames to a local map building thread;
acquiring the key frames sent by the tracking thread by adopting a local map building thread, triangulating and generating new map points based on the key frames and fusing the map points, and screening and removing redundant key frames and redundant map points;
performing loop detection on the key frames screened by the local map building thread by adopting a loop detection thread, and correcting the pose and map point positions of the screened key frames according to loop frame adjustment;
The controlling of the loosely coupled initialization of the multiple cameras and the wheel speed meter in the SLAM system includes:
respectively calculating the essential matrix of each camera through the epipolar constraint, and selecting the camera whose essential matrix has the most inliers as the initialization camera;
Performing visual initialization to obtain a translation distance of a camera optical center of the initialized camera, and marking the translation distance as a first translation distance;
Obtaining the translation distance of the coordinate points of the body coordinate system between two adjacent frames through wheel speed meter pre-integration;
Calculating the translation distance of the camera optical center of the initialization camera between two adjacent frames based on the translation distance of the coordinate points of the body coordinate system, and marking the translation distance as a second translation distance;
calculating a scale factor based on the first translation distance and the second translation distance;
And recovering the absolute scale of the gesture and the map point based on the scale factors, and initializing the multi-camera and the wheel speed meter.
2. The multi-sensor fused SLAM method of claim 1, wherein when the camera in the SLAM system is a perspective camera, the internal parameters of the camera in the calibrated SLAM system include, but are not limited to, one or more combinations of focal length, perspective image center coordinates, distortion coefficients; when the camera in the SLAM system is a fisheye camera, the internal parameters of the camera in the calibrated SLAM system include, but are not limited to, one or more of a forward mapping coefficient, a reverse mapping coefficient, a fisheye image center coordinate, and an affine transformation coefficient.
3. The SLAM method of claim 1, wherein the step of giving a camera projection model based on calibration results of the camera's internal parameters and camera's, wheel speed meter, and IMU's external parameters comprises:
Based on the internal parameters and the external parameters, calculating to obtain a posture transformation matrix from a world coordinate system to a body coordinate system, a posture transformation matrix from the body coordinate system to a camera coordinate system, and a transformation matrix from a camera coordinate system of a fisheye camera to a camera coordinate system of a j-th virtual perspective camera;
taking the formula p(x, y) = K_j · T_{c_j c_i} · T_{c_i b} · T_{bw} · P(X, Y, Z) as the camera projection model, wherein P(X, Y, Z) is a map point in the world coordinate system, p(x, y) is the pixel point obtained by projecting the map point P(X, Y, Z) into the pixel coordinate system of camera c_i, K_j is the perspective transformation matrix of the j-th virtual perspective camera, T_{c_j c_i} is the transformation matrix from the camera coordinate system of the fisheye camera c_i to the camera coordinate system of the j-th virtual perspective camera, T_{c_i b} is the pose transformation matrix from the body coordinate system to the camera coordinate system of the i-th camera c_i, and T_{bw} is the pose transformation matrix from the world coordinate system to the body coordinate system.
4. The SLAM method of claim 1, wherein performing loop detection on the screened keyframes of the local map process using a loop detection thread, correcting the pose and map point position of the screened keyframes according to loop frame adjustment, comprises:
And selecting a key frame with the same word number as the current key frame and exceeding a threshold value and with similarity detection larger than the lowest score of an adjacent key frame as a candidate frame, and selecting the key frame with continuity from the candidate frames as a loop candidate frame, wherein the lowest score of the adjacent key frame refers to: the minimum similarity between the adjacent key frame of the current key frame and the previous key frame;
Matching of descriptors is accelerated through word bag vectors, the RANSAC is utilized to calculate the Sim3 transformation of the current key frame and the closed-loop candidate frame, and map points are projected according to the preliminarily obtained Sim3 transformation to obtain more matched map points;
Taking the Sim3 transformation and map points corresponding to the two key frames as vertexes, traversing the observation of all map points, adding binary sides of forward and reverse projection, fixing the pose of the map points, and optimizing the Sim3 transformation of the two frames by minimizing the reprojection error;
according to the optimized Sim3, projecting map points of the closed-loop key frame and the key frames connected with the closed-loop key frame to the current key frame for matching so as to obtain more matching points;
And solving the optimized Sim3, and adjusting the pose of the key frame and the position of the map point based on the solved Sim 3.
5. A multisensor fused SLAM device, comprising:
The calibration unit is used for calibrating internal parameters of the camera in the SLAM system; calibrating external parameters of a camera, a wheel speed meter and an IMU in the SLAM system;
The projection calculation unit is used for giving a camera projection model based on the internal parameters of the camera and calibration results of the camera, the wheel speed meter and the external parameters of the IMU;
The initialization unit is used for controlling the loosely coupled initialization of the multiple cameras and the wheel speed meter in the SLAM system;
The tracking thread unit is used for acquiring initialized map points and positions by adopting a tracking thread, calculating and optimizing the positions of the current key frame based on image data, IMU data and wheel speed data acquired from a reference frame to the current key frame, and sending the key frame to a local map building thread;
the local mapping thread unit is used for inserting, by means of the local mapping thread, the key frames sent by the tracking thread, generating new map points based on the triangulation of the key frames and fusing them, and screening out and removing redundant key frames and redundant map points;
the loop detection unit is used for carrying out loop detection on the key frames screened by the local map building process by adopting a loop detection thread, and correcting the pose of the key frames and the positions of map points after the screening according to loop frame adjustment;
the initializing unit is specifically configured to, when controlling the loosely coupled initialization of the multiple cameras and the wheel speed meter in the SLAM system:
respectively calculating the essential matrix of each camera through the epipolar constraint, and selecting the camera whose essential matrix has the most inliers as the initialization camera;
Performing visual initialization to obtain a translation distance of a camera optical center of the initialized camera, and marking the translation distance as a first translation distance;
Obtaining the translation distance of the coordinate points of the body coordinate system between two adjacent frames through wheel speed meter pre-integration;
Calculating the translation distance of the camera optical center of the initialization camera between two adjacent frames based on the translation distance of the coordinate points of the body coordinate system, and marking the translation distance as a second translation distance;
calculating a scale factor based on the first translation distance and the second translation distance;
And recovering the absolute scale of the gesture and the map point based on the scale factors, and initializing the multi-camera and the wheel speed meter.
6. The multi-sensor fused SLAM device of claim 5, wherein when the camera in the SLAM system is a perspective camera, the internal parameters of the camera in the calibrated SLAM system include, but are not limited to, one or more combinations of focal length, perspective image center coordinates, distortion coefficients; when the camera in the SLAM system is a fisheye camera, the internal parameters of the camera in the calibrated SLAM system include, but are not limited to, one or more of a forward mapping coefficient, a reverse mapping coefficient, a fisheye image center coordinate, and an affine transformation coefficient.
7. The multi-sensor fusion SLAM device of claim 5, wherein the projection calculation unit is configured to, when given a camera projection model based on calibration results of the camera's internal parameters and camera's, wheel speed meter, and IMU's external parameters:
Based on the internal parameters and the external parameters, calculating to obtain a posture transformation matrix from a world coordinate system to a body coordinate system, a posture transformation matrix from the body coordinate system to a camera coordinate system, and a transformation matrix from a camera coordinate system of a fisheye camera to a camera coordinate system of a j-th virtual perspective camera;
based on the formula p(x, y) = K_j · T_{c_j c_i} · T_{c_i b} · T_{bw} · P(X, Y, Z), obtaining the camera projection model, wherein P(X, Y, Z) is a map point in the world coordinate system, p(x, y) is the pixel point obtained by projecting the map point P(X, Y, Z) into the pixel coordinate system of camera c_i, K_j is the perspective transformation matrix of the j-th virtual perspective camera, T_{c_j c_i} is the transformation matrix from the camera coordinate system of the fisheye camera c_i to the camera coordinate system of the j-th virtual perspective camera, T_{c_i b} is the pose transformation matrix from the body coordinate system to the camera coordinate system of the i-th camera c_i, and T_{bw} is the pose transformation matrix from the world coordinate system to the body coordinate system.
8. The multi-sensor fusion SLAM device of claim 5, wherein the loop detection unit performs loop detection on the key frames after the local map process screening by using a loop detection thread, and when correcting the pose and map points of the key frames after the screening according to loop frame adjustment, is specifically configured to:
And selecting a key frame with the same word number as the current key frame and exceeding a threshold value and with similarity detection larger than the lowest score of an adjacent key frame as a candidate frame, and selecting the key frame with continuity from the candidate frames as a loop candidate frame, wherein the lowest score of the adjacent key frame refers to: the minimum similarity between the adjacent key frame of the current key frame and the previous key frame;
Matching of descriptors is accelerated through word bag vectors, the RANSAC is utilized to calculate the Sim3 transformation of the current key frame and the closed-loop candidate frame, and map points are projected according to the preliminarily obtained Sim3 transformation to obtain more matched map points;
Taking the Sim3 transformation and map points corresponding to the two key frames as vertexes, traversing the observation of all map points, adding binary sides of forward and reverse projection, fixing the pose of the map points, and optimizing the Sim3 transformation of the two frames by minimizing the reprojection error;
according to the optimized Sim3, projecting map points of the closed-loop key frame and the key frames connected with the closed-loop key frame to the current key frame for matching so as to obtain more matching points;
And solving the optimized Sim3, and adjusting the pose of the key frame and the position of the map point based on the solved Sim 3.
CN202110541906.4A 2021-05-18 2021-05-18 SLAM method and device for multi-sensor fusion Active CN113516692B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110541906.4A CN113516692B (en) 2021-05-18 2021-05-18 SLAM method and device for multi-sensor fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110541906.4A CN113516692B (en) 2021-05-18 2021-05-18 SLAM method and device for multi-sensor fusion

Publications (2)

Publication Number Publication Date
CN113516692A CN113516692A (en) 2021-10-19
CN113516692B true CN113516692B (en) 2024-07-19

Family

ID=78064620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110541906.4A Active CN113516692B (en) 2021-05-18 2021-05-18 SLAM method and device for multi-sensor fusion

Country Status (1)

Country Link
CN (1) CN113516692B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114387352A (en) * 2021-12-29 2022-04-22 东软睿驰汽车技术(沈阳)有限公司 A kind of external parameter calibration method, device, equipment and storage medium
CN114545348A (en) * 2022-02-25 2022-05-27 中电科技扬州宝军电子有限公司 SVD-based radar system error calibration method
CN115164918B (en) * 2022-09-06 2023-02-03 联友智连科技有限公司 Semantic point cloud map construction method and device and electronic equipment
CN115272494B (en) * 2022-09-29 2022-12-30 腾讯科技(深圳)有限公司 Calibration method and device for camera and inertial measurement unit and computer equipment
CN117058430B (en) * 2023-10-12 2023-12-22 北京万龙精益科技有限公司 Method, apparatus, electronic device and storage medium for field of view matching
CN117808882B (en) * 2024-02-29 2024-05-17 上海几何伙伴智能驾驶有限公司 SLAM drift detection and compensation method based on multi-sensor fusion in degradation scene

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415375A (en) * 2020-02-29 2020-07-14 华南理工大学 A SLAM Method Based on Multiple Fisheye Cameras and Double Pinhole Projection Model
CN112734841A (en) * 2020-12-31 2021-04-30 华南理工大学 Method for realizing positioning by using wheel type odometer-IMU and monocular camera

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110462683B (en) * 2018-03-06 2022-04-12 斯坦德机器人(深圳)有限公司 Method, terminal and computer readable storage medium for tightly coupling visual SLAM
CN109376785B (en) * 2018-10-31 2021-09-24 东南大学 A Navigation Method Based on Iterative Extended Kalman Filter Fusion Inertial and Monocular Vision
CN109887087B (en) * 2019-02-22 2021-02-19 广州小鹏汽车科技有限公司 SLAM mapping method and system for vehicle
CN110411457B (en) * 2019-08-27 2024-04-19 纵目科技(上海)股份有限公司 Positioning method, system, terminal and storage medium based on stroke perception and vision fusion
CN112219087A (en) * 2019-08-30 2021-01-12 深圳市大疆创新科技有限公司 Pose prediction method, map construction method, movable platform and storage medium
CN111156984B (en) * 2019-12-18 2022-12-09 东南大学 Monocular vision inertia SLAM method oriented to dynamic scene
CN111462135B (en) * 2020-03-31 2023-04-21 华东理工大学 Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN111795686B (en) * 2020-06-08 2024-02-02 南京大学 Mobile robot positioning and mapping method
CN111986506B (en) * 2020-07-20 2022-04-01 苏州易航远智智能科技有限公司 Mechanical parking space parking method based on multi-vision system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415375A (en) * 2020-02-29 2020-07-14 华南理工大学 A SLAM Method Based on Multiple Fisheye Cameras and Double Pinhole Projection Model
CN112734841A (en) * 2020-12-31 2021-04-30 华南理工大学 Method for realizing positioning by using wheel type odometer-IMU and monocular camera

Also Published As

Publication number Publication date
CN113516692A (en) 2021-10-19

Similar Documents

Publication Publication Date Title
CN113516692B (en) SLAM method and device for multi-sensor fusion
US9171225B2 (en) Device, method, and recording medium for detecting and removing mistracked points in visual odometry systems
Kelly et al. A general framework for temporal calibration of multiple proprioceptive and exteroceptive sensors
JP5992184B2 (en) Image data processing apparatus, image data processing method, and image data processing program
Harmat et al. Multi-camera tracking and mapping for unmanned aerial vehicles in unstructured environments
CN111862673B (en) Parking lot vehicle self-positioning and map construction method based on top view
CN112649016A (en) Visual inertial odometer method based on point-line initialization
US9959625B2 (en) Method for fast camera pose refinement for wide area motion imagery
CN110033489A (en) A kind of appraisal procedure, device and the equipment of vehicle location accuracy
Honegger et al. Embedded real-time multi-baseline stereo
CN113551665A (en) High dynamic motion state sensing system and sensing method for motion carrier
Zhao et al. RTSfM: Real-time structure from motion for mosaicing and DSM mapping of sequential aerial images with low overlap
CN116380079B (en) An underwater SLAM method integrating forward-looking sonar and ORB-SLAM3
CN114638897B (en) Multi-camera system initialization method, system and device based on non-overlapping views
CN113345032B (en) Initialization map building method and system based on wide-angle camera large distortion map
Luo et al. Fast terrain mapping from low altitude digital imagery
CN116184430B (en) Pose estimation algorithm fused by laser radar, visible light camera and inertial measurement unit
Veth et al. Stochastic constraints for efficient image correspondence search
Beauvisage et al. Robust multispectral visual-inertial navigation with visual odometry failure recovery
CN113744308A (en) Pose optimization method, pose optimization device, electronic device, pose optimization medium, and program product
Xian et al. Fusing stereo camera and low-cost inertial measurement unit for autonomous navigation in a tightly-coupled approach
Gokhool et al. A dense map building approach from spherical RGBD images
CN117330052A (en) Positioning and mapping method and system based on infrared vision, millimeter wave radar and IMU fusion
Kuo et al. Calibrating a wide-area camera network with non-overlapping views using mobile devices
KR102225321B1 (en) System and method for building road space information through linkage between image information and position information acquired from a plurality of image sensors

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant