CN111354043A

CN111354043A - A three-dimensional attitude estimation method and device based on multi-sensor fusion

Info

Publication number: CN111354043A
Application number: CN202010108437.2A
Authority: CN
Inventors: 蔡国榕; 吴则彪; 陈桂锟; 沈启金; 王宗跃; 吴云东; 徐焕
Original assignee: Jimei University
Current assignee: Jimei University
Priority date: 2020-02-21
Filing date: 2020-02-21
Publication date: 2020-06-30

Abstract

The invention provides a three-dimensional attitude estimation method and device based on multi-sensor fusion. The method includes: acquiring an RGB image of a measured object, extracting FAST feature points and tracking the feature points; collecting IMU measurement values of the measured object , and integrate to obtain the position, velocity and rotation of the measured object at the current moment; obtain the depth image of the measured object, and process the depth image to obtain the iterative closest point; based on the prior information, the iterative closest point, the IMU measurement value and An objective function is constructed by visual reprojection; an iterative method is used to solve the minimum value of the objective function to obtain the optimal pose of the object. This method performs attitude estimation based on a variety of sensors, and constructs an objective function for attitude estimation. The objective function is solved by the least square method. The input value of the function is input through a sliding window, and the specific number of sliding windows is set. The solution time is reduced and the estimation accuracy of the pose is improved.

Description

A three-dimensional attitude estimation method and device based on multi-sensor fusion

技术领域technical field

本发明涉及三维重建技术领域，具体涉及一种基于多传感器融合的三维姿态估计方法及装置。The invention relates to the technical field of three-dimensional reconstruction, in particular to a three-dimensional attitude estimation method and device based on multi-sensor fusion.

背景技术Background technique

姿态估计问题就是确定某一三维目标物体的方位指向问题。姿态估计在机器人视觉、动作跟踪和单照相机定标等很多领域都有应用。在不同领域用于姿态估计的传感器是不一样的。Attitude estimation problem is to determine the orientation of a three-dimensional target object. Pose estimation has applications in many fields such as robot vision, motion tracking, and single-camera calibration. The sensors used for pose estimation in different fields are different.

基于模型的方法通常利用物体的几何关系或者物体的特征点来估计。其基本思想是利用某种几何模型或结构来表示物体的结构和形状，并通过提取某些物体特征，在模型和图像之间建立起对应关系，然后通过几何或者其它方法实现物体空间姿态的估计。这里所使用的模型既可能是简单的几何形体，如平面、圆柱，也可能是某种几何结构，也可能是通过激光扫描或其它方法获得的三维模型。基于模型的姿态估计方法是通过比对真实图像和合成图像，进行相似度计算更新物体姿态。目前基于模型的方法为了避免在全局状态空间中进行优化搜索，一般都将优化问题先降解成多个局部特征的匹配问题，非常依赖于局部特征的准确检测。当噪声较大无法提取准确的局部特征的时候，该方法的鲁棒性受到很大影响。Model-based methods usually use the geometric relationship of objects or the feature points of objects to estimate. The basic idea is to use a certain geometric model or structure to represent the structure and shape of the object, and by extracting some object features, establish a corresponding relationship between the model and the image, and then realize the estimation of the object's spatial pose through geometric or other methods. . The model used here may be a simple geometric shape, such as a plane, a cylinder, or a certain geometric structure, or a three-dimensional model obtained by laser scanning or other methods. The model-based pose estimation method is to update the pose of the object by comparing the real image and the synthetic image, and calculating the similarity. In order to avoid the optimization search in the global state space, the current model-based methods generally degrade the optimization problem into a matching problem of multiple local features, which is very dependent on the accurate detection of local features. When the noise is too large to extract accurate local features, the robustness of the method is greatly affected.

基于学习的方法借助于机器学习(machine learning)方法，从事先获取的不同姿态下的训练样本中学习二维观测与三维姿态之间的对应关系，并将学习得到的决策规则或回归函数应用于样本，所得结果作为对样本的姿态估计。基于学习的方法一般采用全局观测特征，不需检测或识别物体的局部特征，具有较好的鲁棒性。其缺点是由于无法获取在高维空间中进行连续估计所需要的密集采样，因此无法保证姿态估计的精度与连续性。Learning-based methods use machine learning methods to learn the correspondence between 2D observations and 3D poses from previously acquired training samples in different poses, and apply the learned decision rules or regression functions to sample, and the result is used as the pose estimation for the sample. Learning-based methods generally use global observation features, do not need to detect or identify local features of objects, and have better robustness. The disadvantage is that the accuracy and continuity of pose estimation cannot be guaranteed because the dense sampling required for continuous estimation in high-dimensional space cannot be obtained.

发明内容SUMMARY OF THE INVENTION

本发明针对上述现有技术中的缺陷，提出了如下技术方案。The present invention proposes the following technical solutions in view of the above-mentioned defects in the prior art.

一种基于多传感器融合的三维姿态估计方法，该方法包括：A three-dimensional pose estimation method based on multi-sensor fusion, the method includes:

RGB图像获取步骤，使用图像传感器获取被测对象的RGB图像，并提取所述RGB图像的FAST特征点以及使用KLT稀疏光流法进行特征点的跟踪；The RGB image acquisition step, using an image sensor to acquire the RGB image of the measured object, and extracting the FAST feature points of the RGB image and using the KLT sparse optical flow method to track the feature points;

IMU测量值获取步骤，使用IMU传感器采集被测对象的IMU测量值，并对所述IMU测量值进行积分得到当前时刻的被测对象的位置、速度和旋转；The IMU measurement value acquisition step is to use the IMU sensor to collect the IMU measurement value of the measured object, and integrate the IMU measurement value to obtain the position, speed and rotation of the measured object at the current moment;

深度图像获取步骤，使用深度图像传感器获取被测对象的深度图像，使用迭代最近点(ICP)算法对所述深度图像进行处理得到迭代最近点；In the depth image acquisition step, a depth image sensor is used to acquire a depth image of the measured object, and an iterative closest point (ICP) algorithm is used to process the depth image to obtain an iterative closest point;

目标函数构建步骤，基于先验信息、迭代最近点、IMU测量值和视觉重投影构建目标函数；Objective function construction step, construct objective function based on prior information, iterative closest point, IMU measurement value and visual reprojection;

位姿估计步骤，使用迭代法对所述目标函数求解最小值得到所述对象最优的位姿。In the pose estimation step, an iterative method is used to solve the minimum value of the objective function to obtain the optimal pose of the object.

更进一步地，所述方法还包括：回环检测步骤，检测当前帧是否发生回环，如果是，则执行全局的位姿图优化来矫正所述对象的定位轨迹，以消除累积误差。Further, the method further includes: a loop closure detection step, detecting whether a loop closure occurs in the current frame, and if so, performing a global pose graph optimization to correct the positioning trajectory of the object to eliminate accumulated errors.

更进一步地，所述目标函数为：Further, the objective function is:

min_(R,t){PP+∑||D||²+∑||I||²+∑||V||²}；min _(R,t) {PP+∑||D|| ² +∑||I|| ² +∑||V|| ² };

通过最小二乘法求得一个最合适的运动参数：旋转参数

和位移参数

使四个残差项PP、∑||D||²、∑||I||²与∑||V||²的和的值最低，其中，PP表示先验信息的先验分布、D表示迭代最近点残差项、I表示所述对象的两帧之间位置、速度、旋转、陀螺仪偏差和加速度偏差的变化量的差、V表示重投影误差。A most suitable motion parameter is obtained by the least square method: rotation parameter

and displacement parameters

The value of the sum of the four residual terms PP, ∑||D|| ² , ∑||I|| ² and ∑||V|| ² is the lowest, where PP represents the prior distribution of prior information, D represents the iterative nearest point residual term, I represents the difference in the amount of change in position, velocity, rotation, gyroscope bias and acceleration bias between two frames of the object, and V represents the reprojection error.

更进一步地，基于滑动窗口求解该最小二乘，滑动窗口内包含最小二乘法计算所需要的输入参数，每次最小二乘的除先验信息之外的误差项都由滑动窗口内的值提供。Further, the least squares are solved based on the sliding window, the sliding window contains the input parameters required for the calculation of the least squares, and the error term of each least squares except the prior information is provided by the value in the sliding window. .

更进一步地，基于以下条件确定RGB图像中的关键帧：Further, the keyframes in the RGB image are determined based on the following conditions:

该帧的平均视差与上一帧关键帧的平均视差相比差值超过第一阈值；The difference between the average disparity of the frame and the average disparity of the previous key frame exceeds the first threshold;

或者，该帧跟踪的特征点数量小于第二阈值。Alternatively, the number of feature points tracked in this frame is less than the second threshold.

更进一步地，通过深度图像得到空间的几何结构，从而估计两帧之间的变换矩阵，迭代最近点(ICP)残差项D使用点到面的距离为：Furthermore, the geometric structure of the space is obtained from the depth image, thereby estimating the transformation matrix between the two frames, and the iterative closest point (ICP) residual term D uses the point-to-surface distance as:

D＝(T·P_j-P_i)·n_i D=(T·P _j -P _i )·n _i

其中，P_j和P_i是上述深度图像中已匹配的三维点，n_i是上一帧对应的法向量，旋转参数

和位移参数

是两帧之间的运动参数；Among them, P _j and P _i are the matched 3D points in the above depth image, n _i is the normal vector corresponding to the previous frame, and the rotation parameter

and displacement parameters

is the motion parameter between two frames;

重投影误差V作为视觉约束，并将其定义在单位球面的正切平面上：The reprojection error V serves as a visual constraint and defines it on the tangent plane of the unit sphere:

其中，

是特征点经过刚体变换后三维点的单位向量的预测值，P是特征点由像素坐标经针孔相机模型反投影得到的三维点，

和

是P对应的切平面的任意两个正交基。in,

is the predicted value of the unit vector of the three-dimensional point after the feature point is transformed by the rigid body, P is the three-dimensional point obtained by the back-projection of the feature point by the pixel coordinates through the pinhole camera model,

and

are any two orthonormal bases of the tangent plane corresponding to P.

本发明还提出了一种基于多传感器融合的三维姿态估计装置，该装置包括：The present invention also proposes a three-dimensional attitude estimation device based on multi-sensor fusion, the device comprising:

RGB图像获取单元，用于使用图像传感器获取被测对象的RGB图像，并提取所述RGB图像的FAST特征点以及使用KLT稀疏光流法进行特征点的跟踪；The RGB image acquisition unit is used to obtain the RGB image of the measured object using the image sensor, and extracts the FAST feature point of the RGB image and uses the KLT sparse optical flow method to track the feature point;

IMU测量值获取单元，用于使用IMU传感器采集被测对象的IMU测量值，并对所述IMU测量值进行积分得到当前时刻的被测对象的位置、速度和旋转；The IMU measurement value acquisition unit is used to use the IMU sensor to collect the IMU measurement value of the measured object, and integrate the IMU measurement value to obtain the position, speed and rotation of the measured object at the current moment;

深度图像获取单元，用于使用深度图像传感器获取被测对象的深度图像，使用迭代最近点(ICP)算法对所述深度图像进行处理得到迭代最近点；a depth image acquisition unit, used for using a depth image sensor to acquire a depth image of the measured object, and using an iterative closest point (ICP) algorithm to process the depth image to obtain an iterative closest point;

目标函数构建单元，用于基于先验信息、迭代最近点、IMU测量值和视觉重投影构建目标函数；An objective function construction unit for constructing an objective function based on prior information, iterative closest points, IMU measurements and visual reprojection;

位姿估计单元，用于使用迭代法对所述目标函数求解最小值得到所述对象最优的位姿。The pose estimation unit is configured to use an iterative method to solve the minimum value of the objective function to obtain the optimal pose of the object.

更进一步地，回环检测单元，用于检测当前帧是否发生回环，如果是，则执行全局的位姿图优化来矫正所述对象的定位轨迹，以消除累积误差。Furthermore, a loop closure detection unit is used to detect whether a loop closure occurs in the current frame, and if so, perform global pose graph optimization to correct the positioning trajectory of the object to eliminate accumulated errors.

更进一步地，所述目标函数为：Further, the objective function is:

通过最小二乘法求得一个最合适的运动参数：旋转参数

和位移参数

and displacement parameters

D＝(T·P_j-P_i)·n_i D=(T·P _j -P _i )·n _i

和位移参数

and displacement parameters

is the motion parameter between two frames;

其中，

和

是P对应的切平面的任意两个正交基。in,

and

are any two orthonormal bases of the tangent plane corresponding to P.

本发明的技术效果在于：本发明的一种基于多传感器融合的三维姿态估计方法，该方法包括：使用图像传感器获取被测对象的RGB图像，并提取所述RGB图像的FAST特征点以及使用KLT稀疏光流法进行特征点的跟踪；使用IMU传感器采集被测对象的IMU测量值，并对所述IMU测量值进行积分得到当前时刻的被测对象的位置、速度和旋转；使用深度图像传感器获取被测对象的深度图像，使用迭代最近点(ICP)算法对所述深度图像进行处理得到迭代最近点；基于先验信息、迭代最近点、IMU测量值和视觉重投影构建目标函数；使用迭代法对所述目标函数求解最小值得到所述对象最优的位姿。本方法基于多种传感器采集运动对象的RGB图像、IMU测量值及深度图像进行姿态估计，并构建了用于姿态估计的目标函数，该目标函数使用最小二乘法进行求解，该函数的输入值通过滑动窗口进行输入，设置了滑动窗口的具体数目，降低求解时间，提高了位姿的估计精度，还设计了回环检测过程，在发生回环时执行全局的位姿图优化来矫正所述对象的定位轨迹，以消除累积误差，并具体设计相应的残差函数。The technical effect of the present invention is: a three-dimensional attitude estimation method based on multi-sensor fusion of the present invention, the method includes: using an image sensor to obtain an RGB image of a measured object, extracting FAST feature points of the RGB image, and using KLT The sparse optical flow method is used to track the feature points; the IMU measurement value of the measured object is collected by the IMU sensor, and the IMU measurement value is integrated to obtain the position, speed and rotation of the measured object at the current moment; the depth image sensor is used to obtain The depth image of the measured object, the iterative closest point (ICP) algorithm is used to process the depth image to obtain the iterative closest point; the objective function is constructed based on the prior information, the iterative closest point, the IMU measurement value and the visual reprojection; the iterative method is used The optimal pose of the object is obtained by solving the minimum value of the objective function. This method uses various sensors to collect RGB images, IMU measurement values and depth images of moving objects for attitude estimation, and constructs an objective function for attitude estimation. The objective function is solved by the least square method. The input value of the function is obtained by The sliding window is used for input, the specific number of sliding windows is set, the solution time is reduced, the estimation accuracy of the pose is improved, and the loopback detection process is also designed, and the global pose graph optimization is performed when a loopback occurs to correct the positioning of the object trajectory to eliminate the accumulated error, and specifically design the corresponding residual function.

附图说明Description of drawings

通过阅读参照以下附图所作的对非限制性实施例所作的详细描述，本申请的其它特征、目的和优点将会变得更明显。Other features, objects and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments taken with reference to the following drawings.

图1是根据本发明的实施例的一种基于多传感器融合的三维姿态估计方法的流程图。FIG. 1 is a flowchart of a three-dimensional pose estimation method based on multi-sensor fusion according to an embodiment of the present invention.

图2是根据本发明的实施例的一种基于多传感器融合的三维姿态估计装置的结构图。FIG. 2 is a structural diagram of a three-dimensional attitude estimation apparatus based on multi-sensor fusion according to an embodiment of the present invention.

具体实施方式Detailed ways

下面结合附图和实施例对本申请作进一步的详细说明。可以理解的是，此处所描述的具体实施例仅仅用于解释相关发明，而非对该发明的限定。另外还需要说明的是，为了便于描述，附图中仅示出了与有关发明相关的部分。The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the related invention, but not to limit the invention. In addition, it should be noted that, for the convenience of description, only the parts related to the related invention are shown in the drawings.

需要说明的是，在不冲突的情况下，本申请中的实施例及实施例中的特征可以相互组合。下面将参考附图并结合实施例来详细说明本申请。It should be noted that the embodiments in the present application and the features of the embodiments may be combined with each other in the case of no conflict. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

图1示出了本发明的一种基于多传感器融合的三维姿态估计方法，该方法包括：Fig. 1 shows a three-dimensional attitude estimation method based on multi-sensor fusion of the present invention, and the method includes:

RGB图像获取步骤S101，使用图像传感器获取被测对象的RGB图像，并提取所述RGB图像的FAST特征点以及使用KLT稀疏光流法进行特征点的跟踪；所述图像传感器可以是摄像头、摄像机等等，其可以采集被测对象的多帧图像。RGB image acquisition step S101, use an image sensor to acquire the RGB image of the measured object, and extract the FAST feature point of the RGB image and use the KLT sparse optical flow method to track the feature point; the image sensor can be a camera, a video camera, etc. etc., it can collect multiple frames of images of the measured object.

每当一帧新的RGB图像到来时，系统将自动提取FAST特征点，然后使用KLT稀疏光流法进行特征点的跟踪。为了保证跟踪的准确性又能保持比较小的计算代价，可以设置特征点跟踪的数量范围在100到300之间，在进行特征点跟踪的同时会检测新的特征点以维持跟踪特征点数量始终保持在一个阈值内。Whenever a new frame of RGB image arrives, the system will automatically extract FAST feature points, and then use the KLT sparse optical flow method to track the feature points. In order to ensure the accuracy of tracking and keep the computational cost relatively small, the number of feature points to be tracked can be set between 100 and 300, and new feature points will be detected during feature point tracking to maintain the number of tracked feature points. stay within a threshold.

IMU测量值获取步骤S102，使用IMU传感器采集被测对象的IMU测量值，并对所述IMU测量值进行积分得到当前时刻的被测对象的位置、速度和旋转；IMU传感器又称为惯性传感器，通过对离散的IMU测量值进行积分就可以得到当前时刻的位置、速度和旋转，可以给出当前搭载传感器的对象进行运动估计。IMU measurement value acquisition step S102, use the IMU sensor to collect the IMU measurement value of the measured object, and integrate the IMU measurement value to obtain the position, speed and rotation of the measured object at the current moment; the IMU sensor is also called an inertial sensor, By integrating the discrete IMU measurements, the position, velocity, and rotation at the current moment can be obtained, and the motion estimation of the object currently equipped with the sensor can be given.

深度图像获取步骤S103，使用深度图像传感器获取被测对象的深度图像，使用迭代最近点(ICP)算法对所述深度图像进行处理得到迭代最近点；所述深度图像传感器可以是深度摄像机、深度摄像头，通过深度图像可以得到被测对象空间的几何结构，从而估计两帧之间的变换矩阵。Step S103 of obtaining a depth image, using a depth image sensor to obtain a depth image of the measured object, and using an iterative closest point (ICP) algorithm to process the depth image to obtain an iterative closest point; the depth image sensor may be a depth camera, a depth camera , the geometric structure of the measured object space can be obtained through the depth image, so as to estimate the transformation matrix between the two frames.

此外，系统对于场景真实尺度的估计也通过深度图像完成。深度图像可以提供像素点相对于相机光心(Pinhole)的真实距离，由此便可通过针孔相机模型进行像素坐标和空间坐标的转化，从而恢复场景的真实尺度。In addition, the system's estimation of the real scale of the scene is also done through the depth image. The depth image can provide the real distance of the pixel relative to the camera's optical center (Pinhole), so that the pixel coordinates and spatial coordinates can be converted through the pinhole camera model, so as to restore the real scale of the scene.

在S102中IMU测量值的获取频率虽高于S103中深度图像的获取频率，而通常情况下，由于IMU数据和图像数据需要对齐，在相邻两帧深度图像间隔的时间差内累计的IMU测量值不足以积分要求。本发明在连续四帧的图像采集间隔内积累IMU测量值，以避免由于IMU测量数据不够而导致的积分结果波动。而相隔四帧的深度图像处理得到迭代最近点具有更大的误差，本发明保留了相邻帧的迭代最近点计算结果，将连续四帧的迭代最近点计算结果进行合并作为相邻4帧的迭代最近点结果。Although the acquisition frequency of the IMU measurement value in S102 is higher than the acquisition frequency of the depth image in S103, in general, since the IMU data and the image data need to be aligned, the IMU measurement value accumulated within the time difference between two adjacent frames of depth images Not enough points required. The present invention accumulates the IMU measurement value in the image acquisition interval of four consecutive frames, so as to avoid the fluctuation of the integration result caused by insufficient IMU measurement data. However, the iterative closest point obtained by processing the depth images separated by four frames has a larger error. The present invention retains the iterative closest point calculation results of adjacent frames, and combines the iterative closest point calculation results of four consecutive frames as the result of the adjacent four frames. Iterate the closest point result.

目标函数构建步骤S104，基于先验信息、迭代最近点、IMU测量值和视觉重投影构建目标函数。由于每种传感器都有其不适用的环境，但运动过程中的环境是无法人为控制的。为了提高在多场景下的鲁棒性，得到更精确的定位，本发明采用多传感器的进行数据的融合，提高位姿估计的精确度及鲁棒性。The objective function construction step S104 is to construct an objective function based on the prior information, the iterative closest point, the IMU measurement value and the visual reprojection. Since each sensor has its own inapplicable environment, the environment during motion cannot be controlled by humans. In order to improve the robustness in multiple scenarios and obtain more accurate positioning, the present invention adopts multi-sensor data fusion to improve the accuracy and robustness of pose estimation.

本发明以紧耦合(Tightly-coupled)的方式综合利用RGB图像、深度图像、IMU测量值等三种传感器信息估计两帧之间最优的姿态估计，通过迭代的方式解决这个最小二乘问题。The present invention comprehensively utilizes three kinds of sensor information such as RGB image, depth image, and IMU measurement value in a tightly-coupled manner to estimate the optimal attitude estimation between two frames, and solves the least squares problem in an iterative manner.

具体地，所述目标函数为：Specifically, the objective function is:

通过最小二乘法求得一个最合适的运动参数：旋转参数

和位移参数

使四个残差项PP、∑||D||²、∑||I||²与∑||V||²的和的值最低，其中，PP表示先验信息的先验分布、D表示迭代最近点残差项、I表示所述对象的两帧之间位置、速度、旋转、陀螺仪偏差和加速度偏差的变化量的差、V表示重投影误差，即PP、D、I、V也可以称为目标函数的四个约束，也就是四个残差项。A most suitable motion parameter is obtained by the least square method: rotation parameter

and displacement parameters

The value of the sum of the four residual terms PP, ∑||D|| ² , ∑||I|| ² and ∑||V|| ² is the lowest, where PP represents the prior distribution of prior information, D Represents the iterative nearest point residual term, I represents the difference in the amount of change in position, velocity, rotation, gyroscope bias and acceleration bias between the two frames of the object, V represents the reprojection error, i.e. PP, D, I, V It can also be called the four constraints of the objective function, that is, the four residual terms.

滑动窗口外的状态也能提供一定的约束信息，为了能够得到更真实的约束关系，所以滑动窗口外的状态也被考虑进来(但它不被优化)，系统使用边缘化的技巧，将它的约束信息转化为待优化变量的先验分布，即最小二乘问题中的PP。The state outside the sliding window can also provide certain constraint information. In order to obtain a more realistic constraint relationship, the state outside the sliding window is also taken into account (but it is not optimized). The constraint information is transformed into the prior distribution of the variables to be optimized, that is, PP in the least squares problem.

对于通过深度图像得到空间的几何结构，从而估计两帧之间的变换矩阵，迭代最近点(ICP)残差项D使用点到面的距离为：For obtaining the geometry of the space from the depth image and thus estimating the transformation matrix between the two frames, the iterative closest point (ICP) residual term D uses the point-to-surface distance as:

D＝(T·P_j-P_i)·n_i D=(T·P _j -P _i )·n _i

和位移参数

是两帧之间的运动参数。Among them, P _j and P _i are the matched 3D points in the above depth image, n _i is the normal vector corresponding to the previous frame, and the rotation parameter

and displacement parameters

is the motion parameter between two frames.

根据迭代最近点的计算特征，系统采用了GPU来提高迭代最近点的计算效率。因为两帧之间的深度图像相似度较高，为了避免不必要的迭代，所以限制迭代次数最高为5次。According to the calculation characteristics of the iterative closest point, the system adopts GPU to improve the calculation efficiency of the iterative closest point. Because the depth image similarity between two frames is high, in order to avoid unnecessary iterations, the number of iterations is limited to a maximum of 5.

对于IMU约束I，给定两帧的IMU测量值(加速度和角速度)，IMU预积分可以计算出位置、速度和旋转。所以系统将IMU误差项定义为两帧之间位置、速度、旋转、陀螺仪偏差和加速度偏差的变化量的差。For IMU constraint I, given two frames of IMU measurements (acceleration and angular velocity), IMU preintegration can calculate position, velocity, and rotation. So the system defines the IMU error term as the difference in the amount of change in position, velocity, rotation, gyroscope bias, and acceleration bias between two frames.

对于重投影误差V，作为视觉约束，并将其定义在单位球面的正切平面上：For the reprojection error V, as a visual constraint, and define it on the tangent plane of the unit sphere:

其中，

和

是P对应的切平面的任意两个正交基。对于投影与反投影

是根据针孔相机模型推导的从三维空间点投影到二维空间点的投影函数，定义为：in,

and

are any two orthonormal bases of the tangent plane corresponding to P. For projection and back projection

is a projection function derived from a 3D space point to a 2D space point based on the pinhole camera model, and is defined as:

其中，(f_x,f_y)和(c_x,c_y)是相机内参，可以通过棋盘格标定法等相机内参标定方法得到。投影的逆操作就是反投影。Among them, (f _x , f _y ) and (c _x , c _y ) are camera internal parameters, which can be obtained by camera internal parameter calibration methods such as checkerboard calibration method. The inverse operation of projection is backprojection.

位姿估计步骤S105，使用迭代法对所述目标函数求解最小值得到所述对象最优的位姿。In step S105 of pose estimation, an iterative method is used to solve the minimum value of the objective function to obtain the optimal pose of the object.

将所有残差项加在一起便构成了目标函数，使用迭代法最小化目标函数就可以得到最优的位姿。迭代方向由目标函数的一阶导决定。为了保证系统的实时性，求解位姿所用的最大迭代次数设置为8次。The objective function is formed by adding all the residual terms together, and the optimal pose can be obtained by minimizing the objective function using the iterative method. The iteration direction is determined by the first derivative of the objective function. In order to ensure the real-time performance of the system, the maximum number of iterations used to solve the pose is set to 8.

在一个实施例中，在实施本方法前，通过基于滑动窗口的运动恢复结构(SfM)和视觉惯性矫正模块来粗略估计系统所依赖的状态(重力向量、陀螺仪偏差和速度等)，为本方法提供初始值。In one embodiment, before implementing the method, the state (gravity vector, gyroscope bias and velocity, etc.) on which the system depends is roughly estimated through a sliding window-based motion recovery structure (SfM) and a visual-inertial correction module. Methods provide initial values.

在一个实施例中，所述方法还包括：回环检测步骤S106，检测当前帧是否发生回环，如果是，则执行全局的位姿图优化来矫正所述对象的定位轨迹，以消除累积误差。本方法采用基于词袋模型的回环检测，如果当前帧被认定为是一个关键帧，那么系统会根据字典把RGB图像转换成一个向量。接着把这个向量提交给回环检测，回环检测判断当前帧是否发生回环并返回结果。如果检测到发生了回环，那么将会执行全局的位姿图(Pose-Graph)优化来矫正定位轨迹，消除累积误差。In one embodiment, the method further includes: a loop closure detection step S106 , detecting whether a loop closure occurs in the current frame, and if so, performing a global pose graph optimization to correct the positioning trajectory of the object to eliminate accumulated errors. This method adopts the loop closure detection based on the bag-of-words model. If the current frame is identified as a key frame, the system will convert the RGB image into a vector according to the dictionary. Then submit this vector to the loopback detection, and the loopback detection determines whether a loopback occurs in the current frame and returns the result. If a loopback is detected, a global Pose-Graph optimization will be performed to correct the positioning trajectory and eliminate accumulated errors.

在一个实施例中，基于滑动窗口求解该最小二乘，滑动窗口内包含最小二乘法计算所需要的输入参数，每次最小二乘的除先验信息之外的误差项都由滑动窗口内的值提供。即本发明采用了基于滑动窗口的形式实现该最小二乘问题。滑动窗口内包含最小二乘所需要的所有输入(像素点、三维空间点、状态等)，每次最小二乘的误差项都由滑动窗口内的值提供(除了先验信息)。为了限制该最小二乘问题的求解时间，将滑动窗口的数量设置为10个，已足够得出精确的两帧之间的位姿(也可以称为姿势)估计。In one embodiment, the least squares is solved based on a sliding window, the sliding window contains input parameters required for the calculation of the least squares, and the error term of each least squares except the prior information is determined by the sliding window. value provided. That is, the present invention adopts the form based on the sliding window to realize the least squares problem. The sliding window contains all the inputs (pixel points, three-dimensional space points, states, etc.) required for least squares, and the error term of each least squares is provided by the values in the sliding window (except for prior information). In order to limit the solution time of this least squares problem, the number of sliding windows is set to 10, which is enough to obtain an accurate estimation of the pose (also called pose) between two frames.

在一个实施例中，基于以下条件确定RGB图像中的关键帧：该帧的平均视差与上一帧关键帧的平均视差相比差值超过第一阈值(比如，第一阈值设置为1.0，在大部分场景下定位性能良好)；或者，该帧跟踪的特征点数量小于第二阈值，比如，第二阈值设置为10，既保证系统的高鲁棒性又不至于产生过多的关键帧。In one embodiment, the key frame in the RGB image is determined based on the following condition: the difference between the average disparity of the frame and the average disparity of the key frame of the previous frame exceeds a first threshold (for example, the first threshold is set to 1.0, in the The positioning performance is good in most scenes); or, the number of feature points tracked in this frame is less than the second threshold, for example, the second threshold is set to 10, which not only ensures the high robustness of the system but also does not generate too many key frames.

本方法基于多种传感器采集运动对象的RGB图像、IMU测量值及深度图像进行姿态估计，并构建了用于姿态估计的目标函数，该目标函数使用最小二乘法进行求解，该函数的输入值通过滑动窗口进行输入，设置了滑动窗口的具体数目，降低求解时间，提高了位姿的估计精度，还设计了回环检测过程，在发生回环时执行全局的位姿图优化来矫正所述对象的定位轨迹，以消除累积误差，并具体设计相应的残差函数。This method uses various sensors to collect RGB images, IMU measurement values and depth images of moving objects for attitude estimation, and constructs an objective function for attitude estimation. The objective function is solved by the least square method. The input value of the function is obtained by The sliding window is used for input, the specific number of sliding windows is set, the solution time is reduced, the estimation accuracy of the pose is improved, and the loopback detection process is also designed, and the global pose graph optimization is performed when a loopback occurs to correct the positioning of the object trajectory to eliminate the accumulated error, and specifically design the corresponding residual function.

图2示出了本发明的本发明的一种基于多传感器融合的三维姿态估计装置，该装置包括：Fig. 2 shows a three-dimensional attitude estimation device based on multi-sensor fusion of the present invention, the device includes:

RGB图像获取单元201，用于使用图像传感器获取被测对象的RGB图像，并提取所述RGB图像的FAST特征点以及使用KLT稀疏光流法进行特征点的跟踪；所述图像传感器可以是摄像头、摄像机等等，其可以采集被测对象的多帧图像。The RGB image acquisition unit 201 is used to acquire the RGB image of the measured object using an image sensor, and extract the FAST feature point of the RGB image and use the KLT sparse optical flow method to track the feature point; the image sensor can be a camera, Cameras, etc., which can capture multiple frames of images of the object under test.

IMU测量值获取单元202，用于使用IMU传感器采集被测对象的IMU测量值，并对所述IMU测量值进行积分得到当前时刻的被测对象的位置、速度和旋转；IMU传感器又称为惯性传感器，通过对离散的IMU测量值进行积分就可以得到当前时刻的位置、速度和旋转，可以给出当前搭载传感器的对象进行运动估计。The IMU measurement value acquisition unit 202 is configured to use the IMU sensor to collect the IMU measurement value of the measured object, and integrate the IMU measurement value to obtain the position, speed and rotation of the measured object at the current moment; the IMU sensor is also called inertial For the sensor, by integrating the discrete IMU measurement values, the position, speed and rotation at the current moment can be obtained, and the motion estimation of the object currently equipped with the sensor can be given.

深度图像获取单元203，用于使用深度图像传感器获取被测对象的深度图像，使用迭代最近点(ICP)算法对所述深度图像进行处理得到迭代最近点；所述深度图像传感器可以是深度摄像机、深度摄像头，通过深度图像可以得到被测对象空间的几何结构，从而估计两帧之间的变换矩阵。The depth image acquisition unit 203 is used for acquiring the depth image of the measured object by using the depth image sensor, and using the iterative closest point (ICP) algorithm to process the depth image to obtain the iterative closest point; the depth image sensor may be a depth camera, In the depth camera, the spatial geometry of the measured object can be obtained through the depth image, thereby estimating the transformation matrix between two frames.

目标函数构建单元204，用于基于先验信息、迭代最近点、IMU测量值和视觉重投影构建目标函数。由于每种传感器都有其不适用的环境，但运动过程中的环境是无法人为控制的。为了提高在多场景下的鲁棒性，得到更精确的定位，本发明采用多传感器的进行数据的融合，提高位姿估计的精确度及鲁棒性。The objective function construction unit 204 is configured to construct an objective function based on the prior information, the iterative closest point, the IMU measurement value and the visual reprojection. Since each sensor has its own inapplicable environment, the environment during motion cannot be controlled by humans. In order to improve the robustness in multiple scenarios and obtain more accurate positioning, the present invention adopts multi-sensor data fusion to improve the accuracy and robustness of pose estimation.

具体地，所述目标函数为：Specifically, the objective function is:

通过最小二乘法求得一个最合适的运动参数：旋转参数

和位移参数

and displacement parameters

D＝(T·P_j-P_i)·n_i D=(T·P _j -P _i )·n _i

和位移参数

and displacement parameters

is the motion parameter between two frames.

其中，

和

是P对应的切平面的任意两个正交基。对于投影与反投影

and

位姿估计单元205，用于使用迭代法对所述目标函数求解最小值得到所述对象最优的位姿。The pose estimation unit 205 is configured to use an iterative method to solve the minimum value of the objective function to obtain the optimal pose of the object.

在一个实施例中，在实施本装置前，通过基于滑动窗口的运动恢复结构(SfM)和视觉惯性矫正模块来粗略估计系统所依赖的状态(重力向量、陀螺仪偏差和速度等)，为本装置提供初始值。In one embodiment, before implementing the device, the state (gravity vector, gyroscope bias and velocity, etc.) on which the system depends is roughly estimated by a sliding window-based motion recovery structure (SfM) and a visual-inertial correction module. The device provides initial values.

在一个实施例中，所述装置还包括：回环检测单元206，用于检测当前帧是否发生回环，如果是，则执行全局的位姿图优化来矫正所述对象的定位轨迹，以消除累积误差。本装置采用基于词袋模型的回环检测，如果当前帧被认定为是一个关键帧，那么系统会根据字典把RGB图像转换成一个向量。接着把这个向量提交给回环检测，回环检测判断当前帧是否发生回环并返回结果。如果检测到发生了回环，那么将会执行全局的位姿图(Pose-Graph)优化来矫正定位轨迹，消除累积误差。In one embodiment, the apparatus further includes: a loop closure detection unit 206, configured to detect whether a loop closure occurs in the current frame, and if so, perform global pose graph optimization to correct the positioning trajectory of the object to eliminate accumulated errors . The device adopts the loop closure detection based on the bag-of-words model. If the current frame is identified as a key frame, the system will convert the RGB image into a vector according to the dictionary. Then submit this vector to the loopback detection, and the loopback detection determines whether a loopback occurs in the current frame and returns the result. If a loopback is detected, a global Pose-Graph optimization will be performed to correct the positioning trajectory and eliminate accumulated errors.

本装置基于多种传感器采集运动对象的RGB图像、IMU测量值及深度图像进行姿态估计，并构建了用于姿态估计的目标函数，该目标函数使用最小二乘法进行求解，该函数的输入值通过滑动窗口进行输入，设置了滑动窗口的具体数目，降低求解时间，提高了位姿的估计精度，还设计了回环检测过程，在发生回环时执行全局的位姿图优化来矫正所述对象的定位轨迹，以消除累积误差，并具体设计相应的残差函数。The device collects RGB images, IMU measurement values and depth images of moving objects based on various sensors for attitude estimation, and constructs an objective function for attitude estimation. The objective function is solved by the least squares method, and the input value of the function is The sliding window is used for input, the specific number of sliding windows is set, the solution time is reduced, the estimation accuracy of the pose is improved, and the loopback detection process is also designed, and the global pose graph optimization is performed when a loopback occurs to correct the positioning of the object trajectory to eliminate the accumulated error, and specifically design the corresponding residual function.

为了描述的方便，描述以上装置时以功能分为各种单元分别描述。当然，在实施本申请时可以把各单元的功能在同一个或多个软件和/或硬件中实现。For the convenience of description, when describing the above device, the functions are divided into various units and described respectively. Of course, when implementing the present application, the functions of each unit may be implemented in one or more software and/or hardware.

通过以上的实施方式的描述可知，本领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件平台的方式来实现。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品可以存储在存储介质中，如ROM/RAM、磁碟、光盘等，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行本申请各个实施例或者实施例的某些部分所述的装置。From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus a necessary general hardware platform. Based on this understanding, the technical solutions of the present application can be embodied in the form of software products in essence or the parts that make contributions to the prior art, and the computer software products can be stored in storage media, such as ROM/RAM, magnetic disks , CD, etc., including several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute the apparatus described in each embodiment or some part of the embodiment of the present application.

最后所应说明的是：以上实施例仅以说明而非限制本发明的技术方案，尽管参照上述实施例对本发明进行了详细说明，本领域的普通技术人员应当理解：依然可以对本发明进行修改或者等同替换，而不脱离本发明的精神和范围的任何修改或局部替换，其均应涵盖在本发明的权利要求范围当中。Finally, it should be noted that the above embodiments are only to illustrate rather than limit the technical solutions of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: the present invention can still be modified or Equivalent replacements, and any modifications or partial replacements that do not depart from the spirit and scope of the present invention, shall all be included in the scope of the claims of the present invention.

Claims

1. A three-dimensional attitude estimation method based on multi-sensor fusion is characterized by comprising the following steps:

an RGB image acquisition step, wherein an image sensor is used for acquiring an RGB image of a measured object, FAST characteristic points of the RGB image are extracted, and a KLT sparse optical flow method is used for tracking the characteristic points;

an IMU measurement value acquisition step, namely acquiring an IMU measurement value of a measured object by using an IMU sensor, and integrating the IMU measurement value to obtain the position, the speed and the rotation of the measured object at the current moment;

a depth image obtaining step, namely obtaining a depth image of the measured object by using a depth image sensor, and processing the depth image by using an Iterative Closest Point (ICP) algorithm to obtain an iterative closest point;

an objective function construction step, namely constructing an objective function based on prior information, an iteration closest point, an IMU (inertial measurement Unit) measurement value and visual reprojection;

and a pose estimation step, namely solving the minimum value of the objective function by using an iterative method to obtain the optimal pose of the object.

2. The method of claim 1, further comprising:

and a loop detection step, namely detecting whether a loop occurs in the current frame, if so, executing global pose graph optimization to correct the positioning track of the object so as to eliminate the accumulated error.

3. The method of claim 2, wherein the objective function is:

min_(R，t){PP+∑||D||²+∑||I||²+∑||V||²}；

the most suitable motion parameter is obtained by the least square method: rotation parameter

And displacement parameter

Making four residual terms PP, ∑ | | | D | | non-woven²、∑||I||²And ∑ V calculation²Where PP represents the prior distribution of prior information, D represents the iterative closest point residual term, I represents the difference in the amount of change in position, velocity, rotation, gyroscope and acceleration biases between two frames of the object, and V represents the reprojection error.

4. The method of claim 3, wherein the least squares are solved based on a sliding window, the sliding window containing input parameters required for the least squares calculation, and the error term of each least square, excluding the prior information, is provided by a value within the sliding window.

5. The method of claim 4, wherein the key frame in the RGB image is determined based on the following conditions:

the difference value of the average parallax of the frame and the average parallax of the key frame of the previous frame exceeds a first threshold value;

alternatively, the number of feature points tracked by the frame is less than a second threshold.

6. The method of claim 3,

the spatial geometry is derived from the depth image to estimate the transformation matrix between two frames, and the Iterative Closest Point (ICP) residual term D uses the point-to-face distance as:

D＝(T·P_j-P_i)·n_i

wherein, P_jAnd P_iIs a matched three-dimensional point, n, in the depth image_iIs the corresponding normal vector, rotation parameter of the previous frame

And displacement parameter

Is a motion parameter between two frames;

the reprojection error V serves as a visual constraint and is defined on the tangent plane of the unit sphere:

wherein,

is the predicted value of the unit vector of the three-dimensional point after the rigid body transformation of the characteristic point, P is the three-dimensional point obtained by the back projection of the pixel coordinate of the characteristic point through the pinhole camera model,

and

are any two orthogonal bases of the tangent plane to which P corresponds.

7. A three-dimensional attitude estimation device based on multi-sensor fusion is characterized by comprising:

an RGB image acquisition unit, which is used for acquiring an RGB image of a measured object by using an image sensor, extracting FAST characteristic points of the RGB image and tracking the characteristic points by using a KLT sparse optical flow method;

the IMU measurement value acquisition unit is used for acquiring an IMU measurement value of the measured object by using an IMU sensor and integrating the IMU measurement value to obtain the position, the speed and the rotation of the measured object at the current moment;

the depth image acquisition unit is used for acquiring a depth image of the measured object by using a depth image sensor and processing the depth image by using an Iterative Closest Point (ICP) algorithm to obtain an iterative closest point;

the objective function construction unit is used for constructing an objective function based on the prior information, the iteration closest point, the IMU measurement value and the visual re-projection;

and the pose estimation unit is used for solving the minimum value of the objective function by using an iterative method to obtain the optimal pose of the object.

8. The apparatus of claim 7, further comprising:

and the loop detection unit is used for detecting whether a loop occurs to the current frame, and if so, performing global pose graph optimization to correct the positioning track of the object so as to eliminate the accumulated error.

9. The method of claim 8, wherein the objective function is:

min_(R，t){PP+∑||D||²+∑||I||²+∑||V||²}；

And displacement parameter

10. The method of claim 9, wherein the least squares are solved based on a sliding window, the sliding window containing input parameters required for the least squares calculation, and the error term of each least square, excluding the prior information, is provided by a value within the sliding window.

11. The method of claim 10, wherein determining the key frame in the RGB image is based on:

12. The method of claim 9,

D＝(T·P_j-P_i)·n_i

And displacement parameter

Is a motion parameter between two frames;

wherein,

of unit vectors of three-dimensional points after rigid body transformation of characteristic pointsPredicting value P is three-dimensional point obtained by pixel coordinate back projection of pinhole camera model,

and

are any two orthogonal bases of the tangent plane to which P corresponds.