CN103607584B

CN103607584B - Real-time registration method for depth maps shot by kinect and video shot by color camera

Info

Publication number: CN103607584B
Application number: CN201310609865.3A
Authority: CN
Inventors: 童若锋; 琚旋; 成可立
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2013-11-27
Filing date: 2013-11-27
Publication date: 2015-05-27
Anticipated expiration: 2033-11-27
Also published as: CN103607584A

Abstract

The invention proposes a stable and rapid real-time registration method for registering a depth map captured by kinect in real time and a video shot by a high-definition camera. The invention eliminates the step of estimating the internal parameters of the depth camera, reduces the parameters to be obtained, and enhances the stability of the algorithm. The present invention uses a linear optimization framework to solve the problem, and can obtain the global optimal solution in one step, and the calculation efficiency of the algorithm is greatly improved. Although the present invention removes the step of estimating the internal parameters of the depth camera in the traditional algorithm, the execution efficiency of mapping the depth information to the video is not affected, and the depth information captured by the kinect can still be mapped to the video in real time; and due to the defined mixing The parameters have good mathematical properties, and the matrix QR decomposition technology can still be used to reversely calculate the internal parameters of the depth camera for other algorithm applications.

Description

A real-time registration method of depth map captured by kinect and video captured by color camera

技术领域 technical field

本发明涉及一种kinect拍摄的深度图与彩色摄像机拍摄视频的实时配准方法。 The invention relates to a real-time registration method of a depth map captured by a kinect and a video captured by a color camera.

背景技术 Background technique

随着数据存储、传输技术的提高以及摄像机镜头工艺的进步，高清摄像机和高清网络摄像头逐渐进入了人们的视野。使用这些设备，人们可以轻松地获取高质量的视频素材，并与他人进行高清视频通信。但是近年来，随着增强现实技术和立体显示技术的发展，传统的二维高清视频已经不能满足人们的需求，人们渴望能够更简单地对视频素材进行高质量编辑，立体显示，或在实时视频通信过程中与计算机合成的虚拟物体进行高真实感的互动，而这一切都依赖于视频的深度生成技术。 With the improvement of data storage and transmission technology and the advancement of camera lens technology, high-definition cameras and high-definition network cameras have gradually entered people's field of vision. Using these devices, people can easily obtain high-quality video material and communicate with others in high-definition video. However, in recent years, with the development of augmented reality technology and stereoscopic display technology, traditional two-dimensional high-definition video can no longer meet people's needs. High-realistic interaction with computer-synthesized virtual objects during communication, all of which rely on deep video generation technology.

视频的深度生成技术是计算机视觉领域的经典难题。为了恢复在视频拍摄的过程中所丢失的深度信息，无数的科学家投身于这个领域，并提出了一系列经典的算法。但是到目前为止，没有一个算法能够在保证视频深度生成的实时性的同时保证对复杂场景深度生成的正确性。而且目前的所有证据都显示，距离同时满足“实时性”和“正确性”的视频深度生成算法的产生，还有很长的时间。 The deep generation technology of video is a classic problem in the field of computer vision. In order to restore the depth information lost in the process of video shooting, countless scientists have devoted themselves to this field and proposed a series of classic algorithms. But so far, no algorithm can guarantee the correctness of complex scene depth generation while ensuring the real-time performance of video depth generation. Moreover, all current evidence shows that there is still a long time before the generation of video depth generation algorithms that satisfy both "real-time" and "correctness".

因此，人们倾向于使用特殊的设备进行视频和深度的同时捕获，这样虽然能够同时满足深度信息的“实时性”和“正确性”，但这类特殊设备往往体积巨大而且价格不菲，是普通的用户无法承受的。 Therefore, people tend to use special equipment to capture video and depth at the same time. Although this can satisfy the "real-time" and "correctness" of depth information at the same time, such special equipment is often bulky and expensive, which is common users cannot bear.

近几年，微软推出了一种轻量级的深度捕获设备kinect，它体积小，价格便宜，能够实时地捕获场景深度，这个设备让人们对视频深度生成再次燃起了希望。但是不幸的是，虽然kinect设备本身提供了彩色摄像头用以捕获视频数据，但是这个摄像头的分辨率非常低，无法满足人们对高清视频的需求。而使用额外的高清摄像机和kinect分别捕捉高清视频和深度信息，则由于两个设备的视角存在差异而导致捕获的视频和深度数据在空间上无法对齐。 In recent years, Microsoft has launched a lightweight depth capture device, kinect, which is small in size and cheap, and can capture scene depth in real time. This device has revived people's hope for video depth generation. But unfortunately, although the kinect device itself provides a color camera to capture video data, the resolution of this camera is very low and cannot meet people's needs for high-definition video. While using an additional high-definition camera and kinect to capture high-definition video and depth information separately, the captured video and depth data cannot be aligned spatially due to differences in the viewing angles of the two devices.

近年来，许多学者相继提出了克服这个空间差异的算法。但是都存在着容易受到深度数据的噪声影响和计算效率低的问题。 In recent years, many scholars have proposed algorithms to overcome this spatial difference. However, they all have the problems of being easily affected by the noise of depth data and low computational efficiency.

发明内容 Contents of the invention

本发明所要解决的技术内容是提出一种将kinect实时捕获的深度图与高清摄像机拍摄的视频进行配准的、稳定的、快速的实时配准方法。 The technical content to be solved by the present invention is to propose a stable and fast real-time registration method for registering the depth map captured by the kinect in real time with the video shot by the high-definition camera.

为此本发明采用以下技术方案，它包括以下步骤： For this reason the present invention adopts following technical scheme, it may further comprise the steps:

(1)、将高清彩色相机和kinect深度相机分别固定，分别固定的位置为任意位置，本发明所述的实时配准方法能够处理kinect和深度相机之间存在任意较大视角差异情况下的准确配准，并设置同步拍摄信号； (1), the high-definition color camera and the kinect depth camera are fixed respectively, and the positions fixed respectively are arbitrary positions, and the real-time registration method of the present invention can handle the accuracy under the situation that there is any larger angle of view difference between the kinect and the depth camera Registration, and set synchronous shooting signal;

(2)、将标定用棋盘格随机置于彩色相机和kinect深度相机之前，并进行同步拍摄，得到同一时刻棋盘格的彩色图像和深度图像。将此操作重复多次，直到得到一组不同角度，不同距离的同步拍摄结果； (2) Randomly place the checkerboard for calibration in front of the color camera and the kinect depth camera, and perform synchronous shooting to obtain the color image and depth image of the checkerboard at the same time. Repeat this operation many times until you get a set of synchronous shooting results from different angles and different distances;

(3)、在kinect每次拍摄的深度图中的棋盘格对应的区域内手动随意标注一块或若干块任意形状区域，用户只需要在深度图平板区域内部随意标注一块或多块任意形状区域，而不像传统方法中需要精确标记平板的四角，有效地降低了深度相机噪声对算法的影响，并保证每张深度图上标记的区域的像素数目总和不少于10个像素。计算在第i次拍摄的kinect的深度图中，手动标注的任意区域内的第j个像素点的二维坐标的齐次坐标表示和该点的深度的乘积，记作。并将所有坐标的集合记作； (3) Manually mark one or several arbitrary-shaped areas in the area corresponding to the checkerboard grid in the depth map captured by kinect each time. The user only needs to mark one or more arbitrary-shaped areas in the flat area of the depth map. Unlike the traditional method that needs to accurately mark the four corners of the tablet, it effectively reduces the impact of depth camera noise on the algorithm, and ensures that the total number of pixels in the marked area on each depth map is not less than 10 pixels. Calculate the product of the homogeneous coordinate representation of the two-dimensional coordinates of the jth pixel point in any area manually marked in the depth map of the kinect captured for the i time and the depth of the point, denoted as . and denote the set of all coordinates as ;

(4)、对于第i次拍摄得到的坐标集合，将其所有元素带入空间平面方程：进行平面拟合。并计算得到的平面的法向量。对于集合中的所有元素，计算的均值，将中所有满足的点滤除，将剩余的点作为新的坐标集合； (4), for the coordinate set obtained for the i-th shooting , bringing all its elements into the spatial plane equation: Perform plane fitting. and calculate the normal vector of the resulting plane . for collections all elements in ,calculate mean of ,Will all satisfied in points are filtered out, and the remaining points are used as a new set of coordinates ;

(5)、使用张正友的经典彩色摄像机标定算法计算高清彩色相机内参和每次拍摄中彩色相机和棋盘格的相对位置关系，使用旋转矩阵和平移向量来表示第i次的拍摄的相对位置关系； (5) Using Zhang Zhengyou's classic color camera calibration algorithm to calculate the internal parameters of the high-definition color camera and the relative position of the color camera and the checkerboard in each shot, using the rotation matrix and translation vector to represent the relative positional relationship of the i-th shooting;

(6)、设3*3大小的混合参数和3维向量为未知量；将(4)、(5)两步骤中得到的经过噪声滤除步骤的点坐标和第i次拍摄的旋转参数作为已知量，建立平面约束线性方程：； (6), set the mixing parameters of 3*3 size and a 3D vector is an unknown quantity; the point coordinates obtained in (4) and (5) after the noise filtering step and the rotation parameter of the i-th shot As a known quantity, set up a plane constrained linear equation: ;

(7)、根据点坐标对应的深度对(6)中的方程施加惩罚系数，深度值越小的惩罚系数越大，使深度较小的点对应的约束方程在方程组中具有较大的权重； (7), according to point coordinates The corresponding depth imposes a penalty coefficient on the equation in (6) , the smaller the depth value, the larger the penalty coefficient, so that the constraint equation corresponding to the point with a smaller depth has a larger weight in the equation system;

(8)、建立约束方程组，使用线性优化框架求解：1)表示彩色相机和kinect之间的平移参数T；2)由深度相机内参和彩色相机与kinect之间相对旋转组成的混合参数H； (8), establish constraint equations , use the linear optimization framework to solve: 1) represents the translation parameter T between the color camera and kinect; 2) the mixing parameter H composed of the internal parameters of the depth camera and the relative rotation between the color camera and kinect;

(9)、使用(6)中得到的H和T两个参数实时地将kinect捕获的深度信号映射到高清摄像机捕获的视频上。具体方法为：使用kinect连续拍摄深度图，对每幅图像上的每个像素，计算其二维坐标的齐次坐标表示和该点的深度的乘积，接着计算该点在摄像机坐标系下三维坐标：，将该点投影到彩色图像上，得到其在彩色图像上的投影位置和对应的深度，完成实时地将kinect捕获的深度信号映射到高清摄像机捕获的视频上，其中：。由于深度图上的不同像素可能映射到彩色图像上的同一个坐标下，对于这种情况，将保存最小深度作为彩色图像上该点的深度。 (9), using the two parameters H and T obtained in (6) to map the depth signal captured by the kinect to the video captured by the high-definition camera in real time. The specific method is: use kinect to continuously shoot the depth map, and calculate the homogeneous coordinate representation of its two-dimensional coordinates for each pixel on each image and the depth of the point the product of , and then calculate the three-dimensional coordinates of the point in the camera coordinate system: , project the point onto the color image, and get its projected position on the color image and the corresponding depth , complete the real-time mapping of the depth signal captured by kinect to the video captured by the high-definition camera, where: . Since different pixels on the depth map may map to the same coordinate on the color image, for this case, the minimum depth will be saved as the depth of the point on the color image.

在采用以上技术方案的基础上，本发明还可以采用以下进一步的方案： On the basis of adopting the above technical solutions, the present invention can also adopt the following further solutions:

在步骤（7）中，根据点坐标对应的深度来决定所述平面约束线性方程在整个方程组中的权重，惩罚系数如下所示： In step (7), according to point coordinates The corresponding depth is used to determine the weight of the plane constrained linear equation in the entire equation system, and the penalty coefficient As follows:

（单位米） (Unit: m)

其中depth指的是该像素点对应的深度。 Where depth refers to the depth corresponding to the pixel.

在步骤（8）中，建立约束方程AX=b的具体构造方式为：将3*3大小的混合参数和3维向量组合成12维向量，建立最小二乘意义下的约束方程组：，使用线性优化框架求解：其中，记，，，则在矩阵中对应的行构造为： In step (8), the specific construction method of establishing the constraint equation AX=b is: the mixing parameter of size 3*3 and a 3D vector combined into a 12-dimensional vector , to establish a system of constraint equations in the sense of least squares: , using the linear optimization framework to solve: where, remember , , , then in the matrix the corresponding row in Constructed as:

相应地，列向量为： Correspondingly, the column vector for:

。 .

在步骤（9）中，所述将得到的H和T两个参数实时地将kinect捕获的深度信号映射到高清摄像机捕获的视频上的方法为：使用kinect连续拍摄深度图，对每幅图像上的每个像素，计算其二维坐标的齐次坐标表示和该点的深度的乘积，接着计算该点在摄像机坐标系下三维坐标：，将该点投影到彩色图像上，得到其在彩色图像上的投影位置和对应的深度，完成实时地将kinect捕获的深度信号映射到高清摄像机捕获的视频上，其中：。由于深度图上的不同像素可能映射到彩色图像上的同一个坐标下，对于这种情况，将保存最小深度作为彩色图像上该点的深度 In step (9), the method of mapping the obtained H and T parameters in real time from the depth signal captured by the kinect to the video captured by the high-definition camera is: use the kinect to continuously shoot the depth map, and on each image For each pixel of , calculate the homogeneous coordinate representation of its two-dimensional coordinates and the depth of the point the product of , and then calculate the three-dimensional coordinates of the point in the camera coordinate system: , project the point onto the color image, and get its projected position on the color image and the corresponding depth , complete the real-time mapping of the depth signal captured by kinect to the video captured by the high-definition camera, where: . Since different pixels on the depth map may map to the same coordinates on the color image, in this case, the minimum depth will be saved as the depth of the point on the color image

由于采用了本发明的技术方案，本发明具有以下有益效果： Due to the adoption of the technical solution of the present invention, the present invention has the following beneficial effects:

(1)、传统算法需要估计深度相机内参，但由于kinect捕获的深度信号包含大量噪声，这一步骤会受到噪声影响，造成整体算法的不稳定。本发明去除了深度相机内参估计的步骤，减少待求参数的同时，增强了算法的稳定性。 (1) The traditional algorithm needs to estimate the internal parameters of the depth camera, but since the depth signal captured by kinect contains a lot of noise, this step will be affected by the noise, resulting in the instability of the overall algorithm. The invention eliminates the step of estimating the internal parameters of the depth camera, reduces the parameters to be obtained, and enhances the stability of the algorithm.

(2)、传统算法为了降低深度相机内参估计的误差对算法造成的影响，采用了非线性优化框架迭代地对深度相机内参等参数进行修正。但该非线性优化速度慢，而且往往会陷入局部最优。本发明使用线性优化框架求解，可以一步得到全局最优解，算法计算效率得到了极大的提升。 (2) In order to reduce the impact of the estimation error of the internal parameters of the depth camera on the algorithm, the traditional algorithm uses a nonlinear optimization framework to iteratively correct the parameters such as the internal parameters of the depth camera. But this nonlinear optimization is slow and often falls into local optimum. The present invention uses a linear optimization framework to solve the problem, and can obtain the global optimal solution in one step, and the calculation efficiency of the algorithm is greatly improved.

(3)、本发明虽然去除了传统算法中深度相机内参估计的步骤，但是深度信息映射到视频的执行效率并未因此受到影响，仍然能够实时地将kinect捕获的深度信息映射到视频上。 (3), although the present invention removes the step of estimating the internal parameters of the depth camera in the traditional algorithm, the execution efficiency of mapping the depth information to the video is not affected, and the depth information captured by the kinect can still be mapped to the video in real time.

(4)、本发明虽然去除了传统算法中深度相机内参估计的步骤，但是由于定义的混合参数具有良好的数学性质，仍然可以使用矩阵QR分解技术反求出深度相机内参，供其他算法应用。 (4), although the present invention removes the step of estimating the internal parameters of the depth camera in the traditional algorithm, due to the good mathematical properties of the defined mixing parameters, the internal parameters of the depth camera can still be obtained by using the matrix QR decomposition technology for other algorithm applications.

附图说明 Description of drawings

图1为本发明所提供的方法的整体流程图。 Fig. 1 is the overall flowchart of the method provided by the present invention.

图2为本发明的效果示例图，本发明将同步拍摄的深度图（左上）映射到彩色图像上（左下）。为了方便观察，本发明将深度图进行彩色化表示。映射的结果如右图所示，可以清楚地看到彩色图像和深度图在边缘上贴合紧密，从深度图上人的耳朵上取出一个像素点，该像素点通过本发明的算法准确地映射到了彩色图像人物的耳朵上。 Fig. 2 is an example diagram of the effect of the present invention. The present invention maps the synchronously captured depth map (upper left) to a color image (lower left). For convenience of observation, the present invention displays the depth map in color. The result of the mapping is shown in the figure on the right. It can be clearly seen that the color image and the depth map fit closely on the edge, and a pixel point is taken from the human ear on the depth map, and the pixel point is accurately mapped by the algorithm of the present invention to the ears of the characters in the color image.

具体实施方式 Detailed ways

首先定义接下来说明中所要用到的缩写： First define the abbreviations that will be used in the following instructions:

：摄像机内参 : Camera internal reference

：棋盘格坐标系相对于摄像机坐标系的3维旋转矩阵； : 3D rotation matrix of the checkerboard coordinate system relative to the camera coordinate system;

：棋盘格坐标系相对于摄像机坐标系的3维旋转矩阵的第3列向量； : The third column vector of the 3D rotation matrix of the checkerboard coordinate system relative to the camera coordinate system;

：棋盘格坐标系相对于摄像机坐标系的平移； : The translation of the checkerboard coordinate system relative to the camera coordinate system;

：第i个样本中kinect捕获的深度图中，位于棋盘格平板上的第j个点的二维坐标的齐次坐标表示和该点的深度的乘积； : The homogeneous coordinate representation of the two-dimensional coordinates of the jth point on the checkerboard plate in the depth image captured by kinect in the ith sample and the depth of the point the product of

：由摄像机坐标系和kinect坐标系的旋转和以及kinect内参所构成的混合参数，； : Rotation by camera coordinate system and kinect coordinate system And and kinect internal reference The resulting mixing parameters, ;

：摄像机坐标系和kinect坐标系之间的平移。。 : The translation between the camera coordinate system and the kinect coordinate system. .

以下结合附图对本发明技术方案作进一步说明。 The technical solution of the present invention will be further described below in conjunction with the accompanying drawings.

图1是本发明的基本流程图，发明通过建立平面约束线性方程组，求解混合参数H和平移参数T，将kinect实时捕获的深度图映射到高清摄像机捕获的视频。下面对本发明的各个流程进行详细说明： Fig. 1 is a basic flow chart of the present invention. The invention maps the depth map captured by kinect in real time to the video captured by the high-definition camera by establishing a plane-constrained linear equation set, solving the mixing parameter H and the translation parameter T. Each flow process of the present invention is described in detail below:

(1)、将高清彩色相机和kinect深度相机分别固定，并设置同步拍摄信号： (1) Fix the high-definition color camera and kinect depth camera respectively, and set the synchronous shooting signal:

本发明要求摄像机和kinect的位置关系保持固定。同时为了将kinect拍摄的深度图映射到同一时刻摄像机拍摄的图像上，需要为两台采集设备之间设置同步拍摄信号。使用软件同步，由主机发送60HZ方波信号，由时钟上升沿触发同步拍摄。 The present invention requires that the positional relationship between the camera and the kinect be kept fixed. At the same time, in order to map the depth map captured by kinect to the image captured by the camera at the same time, it is necessary to set a synchronous shooting signal between the two acquisition devices. Using software synchronization, the host sends a 60HZ square wave signal, and the rising edge of the clock triggers synchronous shooting.

(2)、同步拍摄棋盘格： (2), Synchronously shoot checkerboard:

将标定用棋盘格随机置于相机之前，并进行同步拍摄，得到同一时刻棋盘格的彩色图像和深度图像。将此操作重复多次，直到得到一组不同角度，不同距离的同步拍摄结果。在实验中采用了13*9大小的棋盘格，选取了15个距离，并在每个距离下分别拍摄了3~5个角度的棋盘格，得到了48个样本。 Randomly place the checkerboard for calibration in front of the camera, and shoot synchronously to obtain the color image and depth image of the checkerboard at the same time. Repeat this operation many times until you get a set of simultaneous shooting results from different angles and different distances. In the experiment, a checkerboard with a size of 13*9 was used, 15 distances were selected, and checkerboards with 3 to 5 angles were photographed at each distance, and 48 samples were obtained.

(3)、手动标注平板： (3) Manually mark the tablet:

由于在参数计算时需要使用平面约束，所以需要人工地为步骤(2)中kinect采集的棋盘格平板所在的区域进行标注。需要声明的是，并不需要完整地标记出整个平板区域，而是用一个任意形状的封闭曲线标记平板内部的区域即可。这种标记的随意性有效地减少了人工交互的难度。标注时需要保证每张深度图上标记的区域的像素数目总和不少于10个像素。计算在第i次拍摄的kinect的深度图中，手动标注的任意区域内的第j个像素点的二维坐标的齐次坐标表示和该点的深度的乘积，记作。并将所有坐标的集合记作。 Since plane constraints need to be used during parameter calculation, it is necessary to manually mark the area where the checkerboard plate collected by kinect in step (2) is located. It should be stated that it is not necessary to mark the entire plate area completely, but a closed curve of any shape can be used to mark the area inside the plate. The randomness of this marking effectively reduces the difficulty of human interaction. When marking, it is necessary to ensure that the total number of pixels in the marked area on each depth map is not less than 10 pixels. Calculate the product of the homogeneous coordinate representation of the two-dimensional coordinates of the jth pixel point in any area manually marked in the depth map of the kinect captured for the i time and the depth of the point, denoted as . and denote the set of all coordinates as .

(4)、滤除噪声点： (4), filter out noise points:

Kinect拍摄的深度图包含大量的噪声，虽然本发明的方法不需要像传统方法中准确标注棋盘格的四角，而是在棋盘格区域内标注任意形状区域，但是kinect的噪声仍然会对本发明所述算法的映射正确性造成影响。所以在使用任意区域内的点建立约束之前，需要对其进行滤除噪声处理。具体地，对于第i次拍摄得到的坐标集合，将其所有元素带入空间平面方程：进行平面拟合。并计算得到的平面的法向量。对于集合中的所有元素，计算的均值，将中所有满足的点滤除，将剩余的点作为新的坐标集合。 The depth map that Kinect shoots contains a large amount of noises, although the method of the present invention does not need to accurately mark the four corners of checkerboard as in the traditional method, but marks the area of arbitrary shape in the checkerboard area, but the noise of kinect still can affect described in the present invention The mapping correctness of the algorithm is affected. Therefore, before using the points in any region to establish constraints, it needs to be processed to filter out noise. Specifically, for the coordinate set obtained in the i-th shooting , bringing all its elements into the spatial plane equation: Perform plane fitting. and calculate the normal vector of the resulting plane . for collections all elements in ,calculate mean of ,Will all satisfied in points are filtered out, and the remaining points are used as a new set of coordinates .

(5)、摄相机标定： (5), camera calibration:

使用张正友的经典摄像机标定算法计算摄像机内参；并计算步骤(2)中记录的每个样本中的棋盘格坐标系相对于摄像机坐标系的3维旋转矩阵，并记录其第3列向量，作为第i个样本中棋盘格的法向在摄像机坐标下的表示。同时记录每个样本中的棋盘格坐标系相对于摄像机坐标系的平移。 Using Zhang Zhengyou's classic camera calibration algorithm to calculate the internal parameters of the camera ; and calculate the 3-dimensional rotation matrix of the checkerboard coordinate system relative to the camera coordinate system in each sample recorded in step (2) , and record its 3rd column vector , as the representation of the normal of the checkerboard in the i-th sample in camera coordinates. Simultaneously record the translation of the checkerboard coordinate system relative to the camera coordinate system in each sample .

(6)、建立平面约束方程： (6) Establish plane constraint equation:

对于位于kinect深度图中棋盘格平板上的每一个点，可以建立方程： For each point on the checkerboard slab in the kinect depth map , the equation can be established:

其中，表示第i个样本中棋盘格的法向在摄像机坐标下的表示；表示第i个样本中的棋盘格坐标系相对于摄像机坐标系的平移；表示第i个样本中kinect捕获的深度图中，位于棋盘格平板上的第j个点的二维坐标的齐次坐标表示和该点的深度的乘积，其中平板区域在步骤(3)中被确定；为由摄像机坐标系和kinect坐标系的旋转和以及kinect内参所构成的混合参数，；表示摄像机坐标系和kinect坐标系之间的平移。和为待求参数，其中为3维矩阵，为3维向量，共12个未知量。 in, Indicates the representation of the normal direction of the checkerboard in the i-th sample in the camera coordinates; Indicates the translation of the checkerboard coordinate system in the i-th sample relative to the camera coordinate system; Indicates the homogeneous coordinate representation of the two-dimensional coordinates of the jth point on the checkerboard plate in the depth map captured by kinect in the ith sample and the depth of the point The product of , wherein the plate area is determined in step (3); For the rotation by the camera coordinate system and the kinect coordinate system And and kinect internal reference The resulting mixing parameters, ; Indicates the translation between the camera coordinate system and the kinect coordinate system. and is the parameter to be requested, where is a 3D matrix, is a 3-dimensional vector, with a total of 12 unknowns.

对于方程组的建立，需要注意两点：第一，由于每个平面上的点都可以建立一个方程，所以方程数目明显多于未知数，甚至只用一个样本就可以求解，但是为了不是在某一个位置，而是在整个空间中满足平面约束，仍然需要所有样本参与运算。第二，对于一些较远的平板上的点，kinect测量的深度测量会比较粗糙。对于这些点建立方程时，加入惩罚系数进行惩罚。 For the establishment of equations, we need to pay attention to two points: First, since each point on the plane can establish an equation, the number of equations is obviously more than the number of unknowns, and even only one sample can be solved, but in order not to be in a certain location, but to satisfy the planar constraint in the entire space, still requires all samples to participate in the operation. Second, for some distant points on the plate, the depth measurement measured by kinect will be rough. When establishing equations for these points, add a penalty coefficient to punish.

在步骤（8）中，建立约束方程AX=b的具体构造方式为：将3*3大小的混合参数和3维向量组合成12维向量。建立最小二乘意义下的约束方程组：，使用线性优化框架求解。其中，记，，，则在矩阵中对应的行构造为： In step (8), the specific construction method of establishing the constraint equation AX=b is: the mixing parameter of size 3*3 and a 3D vector combined into a 12-dimensional vector . Set up a system of constraint equations in the least squares sense: , solved using a linear optimization framework. Among them, remember , , , then in the matrix the corresponding row in Constructed as:

相应地，列向量为： Correspondingly, the column vector for:

使用最小二乘算法求解方程，得到和。 Using the least squares algorithm to solve the equation, we get and .

(7)深度图实时生成： (7) The depth map is generated in real time:

使用kinect连续拍摄深度图，对每幅图像上的每个像素，计算其二维坐标的齐次坐标表示和该点的深度的乘积。然后即可以得到该点在摄像机坐标系下三维坐标：，将该点投影到彩色图像上，得到其在彩色图像上的投影位置和对应的深度，其中：。由于深度图上的不同像素可能映射到彩色图像上的同一个坐标下，对于这种情况，保存最小深度作为彩色图像上该点的深度。 Use kinect to continuously shoot depth maps, and calculate the homogeneous coordinate representation of its two-dimensional coordinates for each pixel on each image and the depth of the point the product of . Then you can get the three-dimensional coordinates of the point in the camera coordinate system: , project the point onto the color image, and get its projected position on the color image and the corresponding depth ,in: . Since different pixels on the depth map may be mapped to the same coordinate on the color image, in this case, the minimum depth is saved as the depth of the point on the color image.

算法的结果如图2所示，本发明将同步拍摄的深度图（左上）映射到彩色图像上（左下）。映射的结果如右图所示，可以清楚地看到彩色图像和深度图在边缘上贴合紧密，从深度图上人的耳朵上取出一个像素点1，该像素点1通过本发明的算法准确地映射到了彩色图像人物的耳朵2上，并且通过这一步骤，获取得到人耳在彩色摄像机中的深度，附图标记3表示的是该像素点1在深度摄像机中的深度，附图标记4表示的是该像素点在彩色摄像机中的深度。 The result of the algorithm is shown in Figure 2. The invention maps the synchronously captured depth map (upper left) to a color image (lower left). The result of the mapping is shown in the figure on the right. It can be clearly seen that the color image and the depth map fit closely on the edge, and a pixel point 1 is taken from the human ear on the depth map, and the pixel point 1 is accurately detected by the algorithm of the present invention. is mapped onto the ear 2 of the person in the color image, and through this step, the depth of the human ear in the color camera is obtained, and the reference numeral 3 represents the depth of the pixel point 1 in the depth camera, and the reference numeral 4 Indicates the depth of the pixel in the color camera.

Claims

1. A real-time registration method for a depth map shot by a kinect and a video shot by a color camera is characterized by comprising the following steps:

(1) respectively fixing the high-definition color camera and the kinect depth camera, and setting synchronous shooting signals;

(2) randomly placing the checkerboards for calibration in front of the color camera and the kinect depth camera, and synchronously shooting to obtain color images and depth images of the checkerboards at the same time; repeating the operation for multiple times until a group of synchronous shooting results with different angles and different distances are obtained;

(3) manually and randomly marking one or a plurality of regions with any shapes in the region corresponding to the checkerboard in the depth map shot by kinect each time, and ensuring that the sum of the pixel number of the region marked on each depth map is not less than 10 pixels; calculating the product of homogeneous coordinate representation of two-dimensional coordinates of jth pixel point in any manually marked area and the depth of the point in the ith shot kinect depth map, and recording the product asAnd recording the set of all coordinates as；

(4) And for the coordinate set obtained by the ith shootingAll its elements are substituted into the spatial plane equation:performing plane fitting, and calculating to obtain normal vector of plane(ii) a For collectionsAll elements in (1)CalculatingMean value ofWill beAll satisfyFiltering out the remaining points as a new coordinate set；

(5) Calculating internal parameters of high-definition color camera by using Zhang Zhengyou classic color camera calibration algorithmAnd relative position relationship between the color camera and the checkerboard in each shooting, using a rotation matrixAnd translation vectorTo indicate the relative positional relationship of the i-th shot;

(6) let the mixing parameter be 3 x 3And a 3-dimensional vectorIs an unknown quantity; the point coordinates obtained in the two steps (4) and (5) and subjected to the noise filtering stepAnd rotation parameter of the ith shotAs a known quantityEstablishing a plane constraint linear equation:；

(7) according to the coordinates of the pointsCorresponding depth imposes penalty factor on equation in (6)The smaller the depth value, the larger the penalty coefficient is, so that the constraint equation corresponding to the point with the smaller depth has larger weight in the equation set;

(8) establishing a constraint equation setSolving using a linear optimization framework: 1) represents a translation parameter T between the color camera and kinect; 2) a blending parameter H consisting of depth camera internal parameters and relative rotation between the color camera and kinect;

(9) and (5) mapping the depth signal captured by kinect to the video captured by the high-definition camera in real time by using the H parameter and the T parameter obtained in the step (6).

2. The method as claimed in claim 1, wherein the real-time registration of the depth map captured by kinect with the video captured by color camera comprises: in step (7), based on the point coordinatesDetermining the weight and penalty coefficient of the plane constraint linear equation in the whole equation system according to the corresponding depthAs follows:

(unit meter)

Where depth refers to the depth to which the pixel point corresponds.

3. The method as claimed in claim 1, wherein the real-time registration of the depth map captured by kinect with the video captured by color camera comprises: in step (8), the specific construction manner for establishing constraint equation AX = b is as follows: mixing parameters of 3 x 3 sizeAnd a 3-dimensional vectorCombined into a 12-dimensional vectorEstablishing a constraint equation set in the least square sense:solving using a linear optimization framework: wherein, note，，Then in the matrixCorresponding row inThe structure is as follows:

accordingly, the column vectorComprises the following steps:

。

4. the method as claimed in claim 1, wherein the real-time registration of the depth map captured by kinect with the video captured by color camera comprises: in step (9), the method for mapping the depth signal captured by kinect to the video captured by the high-definition camera in real time by using the obtained two parameters of H and T is as follows: continuously shooting a depth map by using kinect, and calculating a homogeneous coordinate representation of two-dimensional coordinates of each pixel on each imageAnd the depth of the pointProduct of (2)Then, the three-dimensional coordinates of the point in the camera coordinate system are calculated:projecting the point on the color image to obtain the projection position on the color imageAnd corresponding depthAnd mapping the depth signal captured by kinect to the video captured by the high-definition camera in real time, wherein:since different pixels on the depth map may map to the same coordinate on the color image, for this case the minimum depth will be saved as the depth of the point on the color image.