CN100543775C - Method of 3D Human Motion Tracking Based on Multi-camera

Info

Publication number: CN100543775C
Application number: CNB2007100442191A
Authority: CN
Inventors: 邓浩龙; 申抒含; 刘允才
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2007-07-26
Filing date: 2007-07-26
Publication date: 2009-09-23
Anticipated expiration: 2027-07-26
Also published as: CN101154289A

Abstract

A method for three-dimensional human motion tracking based on multiple cameras, in the technical field of computer vision. The steps are: first, extract the voxel data points on the surface of the human body by processing the voxel data to obtain the outermost points; second, before skeleton extraction is used to track human motion, construct a standard three-dimensional human skeleton model whose pose is adjusted dynamically during tracking so that it fits the original three-dimensional image; third, construct a human skeleton observation method that adjusts the pose of the skeleton model according to whether the number of voxel points inside the sleeve of each bone is at its maximum; fourth, construct a method for obtaining the optimally matching skeleton, which changes the spatial position of the root node and the rotation angle of each bone around its parent node and judges whether the current skeleton pose meets the tracking requirement. On the basis of existing human voxel data, the invention quickly obtains the three-dimensional skeleton of the human body and thereby tracks human motion.

Description

Method of 3D Human Motion Tracking Based on Multi-camera

Technical field

The invention relates to a method in the technical field of telecommunications, specifically a method for three-dimensional human motion tracking based on multiple cameras.

Background

Three-dimensional human motion tracking is an active and difficult research topic in computer vision. Extracting a human skeleton model from video data is an effective approach to surveillance and tracking. Human skeletons obtained from three-dimensional voxel data are applied mainly in surveillance systems, virtual reality, advanced user interfaces, intelligent environments, entertainment, motion analysis, medicine, and education. "Voxel" is short for "volume element". It is the three-dimensional counterpart of the pixel in a two-dimensional image: a basic small cube in three-dimensional space that carries, among other information, the X, Y, and Z coordinates of a point. Using multiple cameras to acquire three-dimensional human voxel data for pose estimation and human motion tracking is a novel approach. However, voxel data alone is not enough to solve many practical problems. When the voxel data obtained from multiple cameras is reconstructed and tracked directly, the quality of the results varies with the number of cameras, the data volume is generally large, computation is slow, and the data is hard to transmit in real time or publish over a network, so the desired tracking and surveillance performance is not achieved.

Human motion tracking includes both two-dimensional and three-dimensional tracking, and the three-dimensional case is much more complex. Analyzing human motion from voxel data is already common, but few works obtain an accurate human skeleton from voxel data in order to track human motion. This work is significant because the tracking results can be used directly in a surveillance system and can also serve as intermediate results for recognizing human poses and actions in further research. In general, accurate tracking of human motion is of great significance and remains a difficult problem in computer vision.

A search of the prior art found the paper "Real-time markerless human body tracking using colored voxels and 3D blobs" by Caillette, F. et al., published in 2004 at the Third IEEE and ACM International Symposium on Mixed and Augmented Reality. The paper tracks the human body using colored voxel information. The method requires good color contrast between the tracked target and the scene, which greatly limits its use in engineering applications.

Summary of the invention

The purpose of the invention is to overcome the deficiencies of the prior art by providing a method for three-dimensional human motion tracking based on multiple cameras that is stable, fast, and accurate, and whose output files are small and easy to store and transmit. By tracking human motion, the invention accurately determines and predicts the pose and actions of the human body, so that a computer can analyze and judge them on its own. In a surveillance system, this information can be sent to the monitoring terminal in real time, so that abnormal situations are detected as they occur and appropriate measures are taken promptly.

The invention is achieved through the following technical solution, comprising the following steps:

First, extract the voxel data points on the surface of the human body: process the voxel data to obtain the outermost points;

Second, before skeleton extraction is used to track human motion, construct a standard three-dimensional human skeleton model whose pose is adjusted dynamically during tracking so that it fits the original three-dimensional image;

Third, construct a human skeleton extraction method that adjusts the pose of the skeleton model according to whether the number of voxel points inside the sleeve of each bone of the three-dimensional human skeleton model is at its maximum;

Fourth, construct a method for obtaining the optimally matching skeleton: change the spatial position of the root node of the three-dimensional human skeleton model and the rotation angle of each bone around its parent node, and judge whether the current skeleton pose meets the tracking requirement.

Extracting the voxel data points on the surface of the human body and processing the voxel data to obtain the outermost points means the following. After three-dimensional human voxel data has been obtained with 16 cameras, the outermost voxels of the human body are identified in order to reduce the data volume and increase processing speed. Given the three-dimensional voxel data, displaying all voxels on a computer yields a three-dimensional image. To obtain the outermost points of this image, the invention uses a 3×3×3 image template. Here 3×3×3 refers to the 27 points of the cube template formed around each voxel (including the voxel itself) in the front, back, left, right, up, and down directions in three-dimensional space. Unlike the two-dimensional case, this template is a 26-connected neighborhood: for any voxel, all points whose Euclidean distance to it is less than or equal to $\sqrt{3}$ (in units of the voxel spacing) are counted as its neighbors, 26 in total. During the test, a voxel whose neighborhood contains a background point is an outermost point.

Constructing a standard three-dimensional human skeleton model means designing such a model before skeleton extraction is used to track human motion. The skeleton model defines a number of nodes, one of which is the root node; the root node controls the spatial rotation and translation of the whole skeleton. Every node has a parent node, and each bone rotates and translates around its parent node in the local coordinate system. From these transformations, the change in the position of each node relative to the root node is obtained, which determines the coordinates of each node in three-dimensional space and hence the skeleton pose in each frame. Each bone is described by several parameters, such as an ID number, a parent node, and a length. Each bone has a unique ID number, through which the current node is linked to its parent node. The bones differ in length, and the lengths are obtained either by manual setting or by machine learning.
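By way of illustration only, the skeleton model described above might be represented as follows. This is a minimal sketch and not the implementation of the embodiment: the names `Bone`, `rot_xyz`, and `joint_positions` are hypothetical, the Z-Y-X rotation order is an assumption, and so is the convention that a bone with zero rotation points along the local +Z axis.

```python
import math
from dataclasses import dataclass


@dataclass
class Bone:
    bone_id: int                      # unique ID number of the bone
    parent_id: int                    # ID of the parent bone; -1 means the root node
    length: float                     # bone length (set by hand or learned)
    angles: tuple = (0.0, 0.0, 0.0)   # rotations about local X, Y, Z in radians


def rot_xyz(ax, ay, az):
    """3x3 rotation matrix Rz(az) * Ry(ay) * Rx(ax) (assumed rotation order)."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def joint_positions(bones, root_pos, root_angles=(0.0, 0.0, 0.0)):
    """Return {bone_id: (start, end)} in world coordinates.

    Bones must be ordered so that every parent precedes its children.
    """
    frames = {-1: (rot_xyz(*root_angles), tuple(root_pos))}
    segments = {}
    for bone in bones:
        parent_rot, start = frames[bone.parent_id]
        # The bone's orientation is its parent's orientation followed by
        # its own local rotation.
        rot = matmul(parent_rot, rot_xyz(*bone.angles))
        # Assumption: the rest direction of every bone is local +Z.
        offset = matvec(rot, (0.0, 0.0, bone.length))
        end = tuple(s + o for s, o in zip(start, offset))
        frames[bone.bone_id] = (rot, end)
        segments[bone.bone_id] = (start, end)
    return segments


# Two-bone chain: the second bone is bent 90 degrees about its parent's X axis.
chain = [Bone(0, -1, 2.0), Bone(1, 0, 1.0, (math.pi / 2, 0.0, 0.0))]
segments = joint_positions(chain, (0.0, 0.0, 0.0))
```

Storing only the root position and three angles per bone keeps the per-frame pose description compact, since every node position can be recomputed from these parameters.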

Constructing the human skeleton extraction method means judging the quality of human motion tracking by how well the human body model matches the observed skeleton features. The motion in the next frame is tracked from the tracking result of the previous frame, using that frame's root node position and the rotation angle of each bone around its parent node. The parameters of the first frame are obtained through human-computer interaction or machine learning. Because human motion changes gradually, the parameters usually change only within a limited range between consecutive frames, and this range can be specified manually from common knowledge. A simple but effective method is used: a sleeve is placed around each bone and the number of voxel points inside the sleeve is counted. Once this count reaches its maximum and the resulting skeleton lies within the surface points of the human body, the computation stops and the skeleton pose is taken as the optimal pose, which achieves the goal of tracking human motion through the skeleton pose.

The method for obtaining the optimally matching skeleton means the following. To match the human skeleton as closely as possible to the three-dimensional voxels of each frame, the exact spatial position of every skeleton node in each frame must be determined, which in turn requires the three-dimensional coordinates of the root node that controls the translation and rotation of the whole skeleton. After the three-dimensional skeleton of the first frame has been obtained, the skeleton pose of each subsequent frame is adjusted using the previous frame's skeleton as the reference. The main parameters that change the skeleton pose in each frame are the rotations of each bone about the X, Y, and Z axes of the local coordinate system of its parent node. During this transformation, a probabilistic genetic algorithm iterates over the angles of each bone around its parent node until the resulting skeleton lies inside the human body and represents the human pose. Each bone has a local optimum; when all bones reach their local optima, the pose of the whole skeleton is optimal. Tracking of the current frame then ends and tracking of the next frame begins, until the last frame has been processed.

Compared with the prior art, the invention is simple and effective. Its key point is that it quickly obtains the three-dimensional skeleton of the human body from existing human voxel data, so that the motion pose of the human body can be determined clearly and human motion can be tracked. The tracking output can be used as a final result or as an intermediate result for subsequent pattern recognition. Because the skeleton file for each frame is small (less than 1 KB) while the three-dimensional image file for each frame is large (more than 5 MB), the skeleton files greatly reduce storage space as well as transmission time and cost over a network. The invention is particularly suitable for surveillance systems and can also be applied to computer animation, games, virtual reality, and other fields.

Description of the drawings

Fig. 1 is a schematic diagram of the spatial positions of the cameras and the object in an embodiment of the invention.

Fig. 2 is a schematic diagram of the 3×3×3 image template in an embodiment of the invention.

Fig. 3 is a set of three-dimensional human voxel images from an embodiment of the invention; from left to right, the images are frames 15, 45, 75, 105, 135, and 165.

Fig. 4 is a diagram of the standard skeleton model in an embodiment of the invention.

Fig. 5 is a schematic diagram of the skeleton sleeves in an embodiment of the invention.

Fig. 6 is a set of result images from an embodiment of the invention (skeleton only); from left to right, the images are frames 15, 45, 75, 105, 135, and 165.

Fig. 7 is a set of result images from the embodiment (skeleton and voxels).

Detailed description of the embodiments

An embodiment of the invention is described in detail below with reference to the drawings. The embodiment is implemented on the basis of the technical solution of the invention, and a detailed implementation and procedure are given, but the scope of protection of the invention is not limited to the embodiment described below.

1. Extract the surface voxels of the human body. As shown in Fig. 1, 16 cameras are arranged around a scene in an upper and a lower layer. Each camera captures a two-dimensional image of the human body, and a three-dimensional image is constructed from these images; each primitive of the three-dimensional image is a voxel. To reduce computational cost and processing time, the data points of each frame are processed with the 3×3×3 template shown in Fig. 2, in which the solid circle is the voxel currently being tested and the hollow circles are its 26 neighboring voxels. The video has 191 frames in total; the images of outermost voxels shown in Fig. 3 are taken every 30 frames starting from frame 15. To show the results more clearly, this embodiment displays only every tenth point of the original data. The specific steps are as follows:

(1) Scan each frame of three-dimensional voxel data once in full and find the maximum and minimum point coordinates in the X, Y, and Z directions.

(2) For each direction, subtract the minimum from the maximum to obtain M, N, and P, respectively, and build an M×N×P three-dimensional grid in which cells containing a voxel are set to 1 and all other cells are set to 0.

(3) Process the voxels of each frame with the 3×3×3 template, resetting every point that is not on the outermost layer from 1 to 0; the points that remain set to 1 are the target points.
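The surface test in steps (1) to (3) can be sketched as follows. This is an illustrative example under stated assumptions rather than the code of the embodiment: instead of an explicit M×N×P grid it stores the occupied cells in a Python set, and any position not in the set (including positions outside the bounding box) is treated as background. The names `OFFSETS` and `surface_voxels` are hypothetical.

```python
from itertools import product

# The 26 neighbour offsets of the 3x3x3 template (the centre itself is excluded).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def surface_voxels(voxels):
    """Return the voxels that have at least one background neighbour."""
    occupied = set(voxels)
    surface = set()
    for (x, y, z) in occupied:
        for dx, dy, dz in OFFSETS:
            # A missing neighbour is a background point, so this voxel
            # belongs to the outermost layer.
            if (x + dx, y + dy, z + dz) not in occupied:
                surface.add((x, y, z))
                break
    return surface


# A solid 3x3x3 cube: only the centre voxel is interior.
cube = set(product(range(3), repeat=3))
shell = surface_voxels(cube)
```

For the solid cube, the centre voxel is the only one whose 26 neighbours are all occupied, so it is the only voxel removed.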

2. Construct the standard human skeleton model

(1) Determine the basic information of the human skeleton. Fig. 4 shows the standard skeleton of this embodiment. It consists of a number of bones; the length of each bone is set individually for each person, each bone is labeled with its ID number, and the relationships between the nodes are defined.

(2) Define a number of nodes in the standard skeleton model. The root node controls the spatial rotation and translation of the whole skeleton in the world coordinate system. Each bone rotates and translates around its parent node in the local coordinate system, from which the change in the position of each node relative to the root node is obtained. The transformation matrix for a rotation about an axis that does not coincide with a coordinate axis is obtained by composing translations with a rotation about a coordinate axis. In the general three-dimensional case: translate the bone so that it coincides with the coordinate axis parallel to it; perform the specified rotation about that axis; translate the bone back to its original position. Each bone adjusts its pose through translation and rotation around its parent node, which achieves the purpose of tracking.
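The translate, rotate, translate-back composition described in (2) can be written with 4×4 homogeneous matrices. The sketch below is illustrative only: it assumes the rotation axis is parallel to the Z axis and passes through a point `pivot`, and the function names are hypothetical.

```python
import math


def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def rotation_z(theta):
    """4x4 homogeneous rotation about the Z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rotation_about_parallel_axis(pivot, theta):
    """Rotation about the line through `pivot` that is parallel to the Z axis."""
    px, py, pz = pivot
    to_axis = translation(-px, -py, -pz)   # 1) move the axis onto the Z axis
    rotate = rotation_z(theta)             # 2) rotate about the Z axis
    back = translation(px, py, pz)         # 3) move the axis back
    return matmul4(back, matmul4(rotate, to_axis))


def apply(m, p):
    """Apply a 4x4 homogeneous matrix to a 3D point."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


M = rotation_about_parallel_axis((1.0, 1.0, 0.0), math.pi / 2)
moved = apply(M, (2.0, 1.0, 5.0))   # approximately (1.0, 2.0, 5.0)
```

Because the three matrices are multiplied once into a single matrix, the same composite transform can then be applied to every point of a bone.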

3. Construct the human skeleton extraction method

When evaluating how well the human body model matches the observed skeleton features, the motion in the next frame is tracked from the tracking result of the previous frame, using that frame's root node position and the rotation angle of each bone around its parent node. The parameters of the first frame are obtained through human-computer interaction or machine learning. With the probabilistic genetic algorithm, once the population size and the number of iterations have been set, the skeleton of the previous frame is used as the reference for the voxel data of the next frame, and the skeleton of the next frame is obtained quickly, which tracks the human motion. As shown in Fig. 5, a sleeve is placed around each bone: if the skeleton has K bones in total, K sleeves are needed, and the number of voxel points inside each sleeve is counted. The skeleton observation function is defined as follows:

$F(s) = \sum_{i=1}^{K} \sum_{j=1}^{N} x_{j}, \quad x_{j} = \begin{cases} 1 & \mathrm{voxel}_{j} \in U(s_{i}) \\ 0 & \mathrm{voxel}_{j} \notin U(s_{i}) \end{cases}$

where K is the number of defined bones and N is the number of points in the voxel data,

s = [s₁, s₂, ..., s_K] is the whole skeleton and s_i is one bone,

and U(s_i) is the space inside the sleeve of bone s_i.
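As an illustration of the observation function, the sketch below counts the voxels that fall inside the sleeves. It rests on assumptions that the text does not fix: each sleeve is modelled as a cylinder of a given `radius` around the bone segment, each bone is given by its two end points, and the names `in_sleeve` and `observe` are hypothetical.

```python
def in_sleeve(p, a, b, radius):
    """True if point p lies in the cylinder of `radius` around segment a-b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab2 = sum(c * c for c in ab)
    if ab2 == 0:
        return False                     # degenerate bone of zero length
    # Position of the projection of p on the bone axis (0 at a, 1 at b).
    t = sum(ab[i] * ap[i] for i in range(3)) / ab2
    if t < 0 or t > 1:
        return False                     # beyond either end of the sleeve
    # Squared distance from p to the bone axis.
    d2 = sum(c * c for c in ap) - t * t * ab2
    return d2 <= radius * radius


def observe(bone_segments, voxels, radius):
    """F(s): sum over bones i and voxels j of x_j, where x_j = 1 if the
    voxel lies inside the sleeve U(s_i) and 0 otherwise."""
    return sum(1 for a, b in bone_segments for p in voxels
               if in_sleeve(p, a, b, radius))


segments = [((0, 0, 0), (0, 0, 4))]
voxels = [(0, 0, 1), (1, 0, 2), (3, 0, 2), (0, 0, 5)]
score = observe(segments, voxels, radius=1.5)   # the first two voxels are inside
```

A larger value of `score` means that more surface voxels are explained by the current pose, which is the quantity that the optimization tries to maximize.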

During optimization, tracking is considered best when the number of three-dimensional points inside the sleeves is at its maximum. Tracking frame t is equivalent to searching the feasible region of all skeleton parameters for the parameters that give the best skeleton. Because the parameter vector s has a high dimension and the landscape of the skeleton observation function is usually multimodal with many local extrema, traditional optimization algorithms have difficulty finding the global optimum. A probabilistic genetic algorithm is therefore used to optimize the skeleton observation function and achieve tracking.

4. Implement the construction of the optimally matching skeleton

Determine the exact spatial position of every skeleton node in each frame; the three-dimensional coordinates of the root node can be obtained manually or by machine learning. Using the probabilistic genetic algorithm together with three-dimensional human motion analysis, the rotation angles about the three axes of the coordinate system of each bone's parent node are encoded with composite-bit coding. The steps are as follows:

1) Analyze the human skeleton and, taking into account the actual limits of human joint motion, determine the number of individuals to be encoded with composite-bit coding; this increases processing speed and saves time without affecting the result;

2) Apply composite-bit coding to the parameters (angles) required for each bone;

3) Set the population size and the number of iterations; the larger the population and the more iterations, the better the skeleton pose matches the original three-dimensional image;

4) Generate the initial population (the set of rotation angles of each bone around its parent node) and evaluate it, using the skeleton observation function to count the voxels inside the sleeves;

5) If this count first increases and then levels off during the iterations, go to step 6); if it keeps increasing, generate a new population and return to the previous step;

6) End the iteration.
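The text does not specify the details of the probabilistic genetic algorithm or of the composite-bit coding, so the following is only a generic real-valued genetic search, shown to illustrate the overall loop of steps 3) to 6). Everything in it is an assumption made for illustration: the function name `genetic_search`, the use of elitism, uniform crossover and Gaussian mutation, the fixed number of generations in place of the plateau test of step 5), and the toy fitness function, which stands in for the skeleton observation function.

```python
import random


def genetic_search(fitness, start, max_delta, pop_size=30, generations=40, seed=0):
    """Search around `start` (the previous frame's angles) for the parameter
    vector that maximises `fitness`, staying within +/- max_delta per angle."""
    rng = random.Random(seed)
    lo = [v - max_delta for v in start]
    hi = [v + max_delta for v in start]

    def random_individual():
        return [rng.uniform(l, h) for l, h in zip(lo, hi)]

    # Initial population: the previous pose plus random poses near it.
    population = [list(start)] + [random_individual() for _ in range(pop_size - 1)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]
        children = [ranked[0]]            # elitism: the best pose always survives
        while len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p, q)]   # uniform crossover
            i = rng.randrange(len(child))                      # mutate one angle
            mutated = child[i] + rng.gauss(0.0, max_delta / 10)
            child[i] = min(hi[i], max(lo[i], mutated))         # keep within range
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Toy fitness with a known optimum; a tracker would count sleeve voxels instead.
target = [0.3, -0.2, 0.1]


def toy_fitness(angles):
    return -sum((a - t) ** 2 for a, t in zip(angles, target))


previous_pose = [0.0, 0.0, 0.0]
best_pose = genetic_search(toy_fitness, previous_pose, max_delta=0.5)
```

Restricting the search to a window around the previous frame's angles reflects the observation that the parameters change only within a limited range between consecutive frames.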

The results of this embodiment are shown in Fig. 6 and Fig. 7. Each image in Fig. 6, from left to right, corresponds to the image in the same position in Fig. 3 and shows the tracked skeleton. Each image in Fig. 7 likewise corresponds to the image in the same position in Fig. 3, but shows the tracked skeleton together with the voxels. The recorded sequence shows basic movements of one person in a three-dimensional space, including normal walking, reaching out, and turning. Each frame contains about 45,000 three-dimensional points. The results show that, without using voxel color information, the skeleton information is still obtained well and the extra computation that color information would require is avoided. Obtaining the skeleton data of a frame from that frame's voxel data takes about 1 minute, which basically meets the real-time requirement and is much better than other methods. The skeletons obtained by the invention all lie inside the human body and reflect 100% of the motion pose of the human body at each moment, so they can play a key role in subsequent action recognition and data transmission.

Claims

1. A method for three-dimensional human motion tracking based on multiple cameras, characterized in that it comprises the following steps:

First, extracting the voxel data points on the surface of the human body by processing the voxel data to obtain the outermost points;

Second, before skeleton extraction is used to track human motion, constructing a standard three-dimensional human skeleton model whose pose is adjusted dynamically during tracking so that it fits the original three-dimensional image;

Third, constructing a human skeleton extraction method that adjusts the pose of the skeleton model according to whether the number of voxel points inside the sleeve of each bone of the three-dimensional human skeleton model is at its maximum;

Fourth, constructing a method for obtaining the optimally matching skeleton, which changes the spatial position of the root node of the three-dimensional human skeleton model and the rotation angle of each bone around its parent node and judges whether the current skeleton pose meets the tracking requirement.

2. The method for three-dimensional human motion tracking based on multiple cameras according to claim 1, characterized in that extracting the voxel data points on the surface of the human body and processing the voxel data to obtain the outermost points means: three-dimensional human voxel data is obtained with 16 cameras and displayed by computer as a three-dimensional image, and a 3×3×3 image template is then used to obtain the outermost points of the three-dimensional image, where 3×3×3 refers to the 27 points of the cube template formed around each voxel, including the voxel itself, in the front, back, left, right, up, and down directions in three-dimensional space; the template is a 26-connected neighborhood: for any voxel, all points whose Euclidean distance to it is less than or equal to $\sqrt{3}$ (in units of the voxel spacing) are counted as its neighbors, 26 in total; during the test, a voxel whose neighborhood contains a background point is an outermost point.

3. The method for three-dimensional human motion tracking based on multiple cameras according to claim 2, characterized in that constructing a standard three-dimensional human skeleton model means: in the three-dimensional human skeleton model, a number of nodes and a root node are defined, the root node controlling the spatial rotation and translation of the whole skeleton; every node has a parent node, and each bone rotates and translates around its parent node in the local coordinate system; the change in the position of each node relative to the root node is thereby obtained, which determines the coordinates of each node in three-dimensional space and hence the skeleton pose in each frame.

4. The method for three-dimensional human motion tracking based on multiple cameras according to claim 3, characterized in that each bone is described by several parameters, including an ID number, a parent node, and a length; each bone has a unique ID number, through which the current node is linked to its parent node; the bones differ in length, and the lengths are obtained by manual setting or by machine learning.

5. The method for three-dimensional human motion tracking based on multiple cameras according to claim 4, characterized in that each bone rotates and translates around its parent node in the local coordinate system, wherein: the transformation matrix for a rotation about an axis that does not coincide with a coordinate axis is obtained by composing translations with a rotation about a coordinate axis; the rotation and translation comprise: translating the bone so that it coincides with the coordinate axis parallel to it, performing the specified rotation about that axis, and translating the bone back to its original position.

6. The method for three-dimensional human motion tracking based on multiple cameras according to claim 1, characterized in that constructing the human skeleton extraction method means: the quality of human motion tracking is judged by how well the human body model matches the observed skeleton features; the motion in the next frame is tracked from the tracking result of the previous frame, using that frame's root node position and the rotation angle of each bone around its parent node; the parameters of the first frame are obtained through human-computer interaction or machine learning; the range of parameter variation between consecutive frames is determined by the following method: a sleeve is placed around each bone and the number of voxel points inside the sleeve is counted; once this count reaches its maximum and the resulting skeleton lies within the surface points of the human body, the computation stops and the skeleton pose is taken as the optimal pose, which achieves the goal of tracking human motion through the skeleton pose.

7. The method for three-dimensional human motion tracking based on multiple cameras according to claim 1, characterized in that the method for obtaining the optimally matching skeleton means: after the three-dimensional skeleton of the first frame has been obtained, the skeleton pose of each subsequent frame is adjusted using the previous frame's skeleton as the reference; the parameters that change the skeleton pose in each frame are the rotations of each bone about the X, Y, and Z axes of the local coordinate system of its parent node; during this transformation, a probabilistic genetic algorithm iterates over the angles of each bone around its parent node until the resulting skeleton lies inside the human body and represents the human pose; each bone has a local optimum, and when all bones reach their local optima the pose of the whole skeleton is optimal; tracking of the current frame then ends and tracking of the next frame begins, until the last frame has been processed.

8. The method for three-dimensional human motion tracking based on multiple cameras according to claim 1 or 6, characterized in that the method for obtaining the optimally matching skeleton is specifically: first, the exact spatial position of every skeleton node in each frame is determined, the three-dimensional coordinates of the root node being obtained manually or by machine learning; then the probabilistic genetic algorithm is used to encode, with composite-bit coding, the rotation angles about the three axes of the coordinate system of each bone's parent node:

1) determining, taking into account the actual limits of human joint motion, the number of individuals to be encoded with composite-bit coding;

2) applying composite-bit coding to the parameters required for each bone, namely the rotation angles about the three axes of the coordinate system of each bone's parent node;

3) setting the population size and the number of iterations; the larger the population and the more iterations, the better the skeleton pose matches the original three-dimensional image;

4) generating the initial population, namely the set of rotation angles of each bone around its parent node, and evaluating it, using the skeleton observation function to count the voxels inside the sleeves;

5) if this count first increases and then levels off during the iterations, going to step 6); if it keeps increasing, generating a new population and returning to the previous step;

6) ending the iteration.

9. The method for three-dimensional human motion tracking based on multiple cameras according to claim 8, characterized in that the skeleton observation function is as follows:

$F(s) = \sum_{i=1}^{K} \sum_{j=1}^{N} x_{j}, \quad x_{j} = \begin{cases} 1 & \mathrm{voxel}_{j} \in U(s_{i}) \\ 0 & \mathrm{voxel}_{j} \notin U(s_{i}) \end{cases}$

where K is the number of defined bones and N is the number of points in the voxel data,

s = [s₁, s₂, ..., s_K] is the whole skeleton and s_i is one bone,

and U(s_i) is the space inside the sleeve of bone s_i.