CN108392207B - An Action Recognition Method Based on Gesture Labels - Google Patents
- Publication number
- CN108392207B (application CN201810133363.0A)
- Authority
- CN
- China
- Prior art keywords
- label
- key node
- frame
- attitude
- tag
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- A61B5/1121: Determining geometric values, e.g. centre of rotation or angular range of movement
- A61B5/1128: Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb, using image analysis
- A61B5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
- G06F18/22: Pattern recognition; matching criteria, e.g. proximity measures
Abstract
The present invention provides an action recognition method based on gesture labels. The method abstracts action recognition into gesture recognition, abstracts each gesture into a gesture label based on the relative positions of key nodes, and identifies the action a person performs by comparing the person's gesture changes over a period of time. The method reduces the difficulty of building a template library, greatly reduces the computation time and processing requirements of action recognition, and improves the generality of action recognition across different individuals. The method has important application value in human-computer interaction, virtual reality, video surveillance and motion feature analysis.
Description
Technical Field
The invention belongs to the technical field of action recognition and relates to an action recognition method based on gesture labels.
Background
Action recognition has been a hot research topic in recent years. Research results in the field have been applied to civil defense and security, studies of human living habits, human-computer interaction, virtual reality and other areas, with considerable positive effect. Traditional action recognition analyzes images (including video, sequences of photographs, etc.) directly with image-processing techniques, proceeding through image segmentation, feature extraction, action feature extraction and action feature classification before finally recognizing the action. Although existing action recognition methods have made great progress, problems remain: the amount of computation is huge; the action feature library is difficult to build and requires professionals to record material; and recognition accuracy drops sharply for people whose body shape and height differ from those in the recorded material.
Summary of the Invention
In view of the problems in the prior art, the purpose of the present invention is to provide an action recognition method based on gesture labels, which solves the problems of heavy computation, difficulty in building a template library and poor generality of the template library in existing action recognition techniques.
To achieve the above purpose, the present invention adopts the following technical solution:
A method for decomposing an action into gesture labels, comprising the following steps:
Step 1: Use a skeleton-tracking device to acquire the position data of the key nodes of the human torso action at each moment; the position data of the key nodes are expressed in the coordinate system of the skeleton-tracking device. The key nodes include at least HEAD, SHOULDER CENTER, SPINE, HIP CENTER, SHOULDER RIGHT, SHOULDER LEFT, ELBOW RIGHT, ELBOW LEFT, WRIST RIGHT, WRIST LEFT, HAND RIGHT, HAND LEFT, HIP RIGHT, HIP LEFT, KNEE RIGHT, KNEE LEFT, ANKLE RIGHT, ANKLE LEFT, FOOT RIGHT and FOOT LEFT.
Step 2: Convert the position data of the key nodes at each moment obtained in step 1 into position data in a morphological coordinate system. The morphological coordinate system takes the facing direction of the human torso as the positive Z axis, the morphological upper-end direction of the torso as the positive Y axis, the person's left-side direction as the positive X axis, and the key node HIP CENTER as the origin.
Step 3: Use the position data of the key nodes in the morphological coordinate system obtained in step 2 to compute the gesture label at each moment. The gesture label comprises a body gesture label GLbody, a left forelimb gesture label GLlf, a right forelimb gesture label GLrf, a left hindlimb gesture label GLlb and a right hindlimb gesture label GLrb.
Optionally, the body gesture label GLbody in step 3 is obtained as follows:
Select the coordinate with the largest absolute value among XF, YF and ZF; the value of GLbody is the value associated with the interval in which that coordinate falls.
Here XF, YF and ZF are the coordinates of the unit vector F on the three coordinate axes, and F is the unit vector of the vector formed by key node HEAD and key node HIP CENTER in the skeleton-tracking device coordinate system.
The left forelimb gesture label GLlf, right forelimb gesture label GLrf, left hindlimb gesture label GLlb and right hindlimb gesture label GLrb in step 3 are obtained as follows:
Each of the four gesture labels involves three key nodes, denoted key node 1, key node 2 and key node 3. For the left forelimb gesture label GLlf the three key nodes are ELBOW LEFT, WRIST LEFT and HAND LEFT; for the right forelimb gesture label GLrf they are ELBOW RIGHT, WRIST RIGHT and HAND RIGHT; for the left hindlimb gesture label GLlb they are KNEE LEFT, ANKLE LEFT and FOOT LEFT; for the right hindlimb gesture label GLrb they are KNEE RIGHT, ANKLE RIGHT and FOOT RIGHT.
The data of these three key nodes in the morphological coordinate system are denoted (X1, Y1, Z1), (X2, Y2, Z2) and (X3, Y3, Z3). Each of the four gesture labels comprises a height label G1, an orientation label G2 and a curl label G3.
The height label G1 is obtained as follows:
G1 = (g1 + g2 + g3)/3, rounded to an integer, where
gn (n = 1, 2, 3) is determined from the Y coordinate of key node n; YH is the Y coordinate of key node HEAD in the morphological coordinate system, and YHC is the Y coordinate of key node SHOULDER CENTER in the morphological coordinate system.
The orientation label G2 is obtained as follows:
Count the signs of the X and Z coordinates of key node 1, key node 2 and key node 3, and determine the orientation label G2 from these signs.
The curl label G3 is obtained as follows:
Based on key node 1, key node 2 and key node 3, a fourth key node, key node 4, is introduced, and the distances D1, D2 and D3 between key nodes 1, 2 and 3 and key node 4 are computed. For the left forelimb gesture label GLlf key node 4 is SHOULDER LEFT; for the right forelimb gesture label GLrf it is SHOULDER RIGHT; for the left hindlimb gesture label GLlb it is HIP LEFT; for the right hindlimb gesture label GLrb it is HIP RIGHT.
The value of the curl label G3 is then determined from D1, D2 and D3.
The present invention also provides a method for constructing an action template library, comprising the following steps:
Step 1: Perform a standard action several times and decompose each performance into gesture labels at each moment. Take the gesture label at the initial moment as the starting-frame gesture label and the gesture label at the final moment as the ending-frame gesture label. Take the standard action performed the first time as the comparison standard action and the standard actions performed the other times as reference standard actions. Take the starting-frame gesture label of the comparison standard action as the starting-frame comparison gesture label, and the ending-frame gesture label of the first performance as the ending-frame comparison gesture label.
Each performance of the standard action is decomposed into gesture labels at each moment using the method of claim 1.
Step 2: Compute the starting-frame similarity coefficient group as follows:
For each reference standard action, compute the similarity Sl1(A)n of each attribute between its starting-frame gesture label and the starting-frame comparison gesture label, using the following formula:
Sl1(A)n = An × z1n ÷ ln (n ∈ Z, n ∈ [1, 13]; ln = 5 when n = 1, 4, 7, 10, 13 and ln = 3 otherwise)
Here An is the initialized similarity coefficient value, An = 1, and n is the attribute index. The attributes numbered 1 to 13 are, in order: the body gesture label GLbody; the height label G1, orientation label G2 and curl label G3 of the left forelimb gesture label GLlf; the height label G1, orientation label G2 and curl label G3 of the left hindlimb gesture label GLlb; the height label G1, orientation label G2 and curl label G3 of the right forelimb gesture label GLrf; and the height label G1, orientation label G2 and curl label G3 of the right hindlimb gesture label GLrb. z1n is the absolute value of the difference between the corresponding attribute of the starting-frame gesture label of the reference standard action and of the starting-frame comparison gesture label.
For each attribute n, take the second-largest value among the similarities Sl1(A)n computed over the reference standard actions as the similarity coefficient value A1n for that attribute. The values A1n for all attributes form the starting-frame similarity coefficient group Astar = {A1n, n ∈ Z, n = 1, 2, ..., 13}.
Step 3: Compute the ending-frame similarity coefficient group as follows:
For each reference standard action, compute the similarity Sl2(A)n of each attribute between its ending-frame gesture label and the ending-frame comparison gesture label, using the following formula:
Sl2(A)n = An × z2n ÷ ln (n ∈ Z, n ∈ [1, 13]; ln = 5 when n = 1, 4, 7, 10, 13 and ln = 3 otherwise)
Here z2n is the absolute value of the difference between the corresponding attribute of the ending-frame gesture label of the reference standard action and of the ending-frame comparison gesture label.
For each attribute n, take the second-largest value among the similarities Sl2(A)n computed over the reference standard actions as the similarity coefficient value A2n for that attribute. The values A2n for all attributes form the ending-frame similarity coefficient group Astop = {A2n, n ∈ Z, n = 1, 2, ..., 13}.
Step 4: Repeat steps 1-3 for each of the standard actions to obtain the starting-frame similarity coefficient group and ending-frame similarity coefficient group of every standard action. The starting-frame and ending-frame similarity coefficient groups of all standard actions form the action template library.
The present invention further provides an action recognition method based on gesture labels, comprising the following steps:
Step 1: Decompose the action to be recognized into gesture labels at each moment, using the method of claim 1.
Step 2: Select a standard action from the action template library and compute the similarity SL(B)n of each attribute between the ending-frame gesture label obtained in step 1 (denoted the frame-t gesture label) and the ending-frame gesture label of the selected standard action, using the following formula:
SL(B)n = A1n × z3n ÷ ln (n ∈ Z, n ∈ [1, 13]; ln = 5 when n = 1, 4, 7, 10, 13 and ln = 3 otherwise)
Here z3n is the absolute value of the difference between the corresponding attribute of the ending-frame gesture label obtained in step 1 and of the ending-frame gesture label of the selected standard action.
Then compute the overall similarity S(B) between the ending-frame gesture label obtained in step 1 and the ending-frame gesture label of the selected standard action from the values SL(B)n.
Step 3: If the overall similarity S(B) is greater than the set threshold MAXBLUR, return to step 2; otherwise, go to step 4.
Step 4: Compute the similarity SL(C)n of each attribute between the gesture label of the frame preceding the ending frame (denoted the frame t-1 gesture label) and the starting-frame gesture label of the selected standard action, using the following formula:
SL(C)n = A2n × z4n ÷ ln (n ∈ Z, n ∈ [1, 13]; ln = 5 when n = 1, 4, 7, 10, 13 and ln = 3 otherwise)
Here z4n is the absolute value of the difference between the corresponding attribute of the preceding-frame gesture label and of the starting-frame gesture label of the selected standard action.
Then compute the overall similarity S(C) between the preceding-frame gesture label and the starting-frame gesture label of the selected standard action from the values SL(C)n.
Step 5: If the overall similarity S(C) is less than the set threshold MAXBLUR, the action to be recognized matches the selected standard action. If S(C) is greater than MAXBLUR, return to step 4 with the frame t-2 gesture label in place of the frame t-1 gesture label, and so on; if the first-frame gesture label is reached and the overall similarity S(C) is still greater than MAXBLUR, return to step 2.
Compared with the prior art, the present invention has the following technical effects: the method abstracts action recognition into gesture recognition, abstracts each gesture into a gesture label based on the relative positions of key nodes, and identifies the action a person performs by comparing the person's gesture changes over a period of time. The method reduces the difficulty of building a template library, greatly reduces the computation time and processing requirements of action recognition, and improves the generality of action recognition across different individuals. The method has important application value in human-computer interaction, virtual reality, video surveillance and motion feature analysis.
The solution of the present invention is further explained and described in detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the coordinate system of the skeleton-tracking device used in the present invention.
Figure 2 is a schematic diagram of the positions of the twenty skeleton key nodes acquired by the present invention.
Detailed Description
The present invention provides a method for decomposing an action into gesture labels, comprising the following steps:
Step 1: Use a skeleton-tracking device to acquire the position data of the key nodes of the human torso action; the position data are expressed in the coordinate system of the skeleton-tracking device. A Kinect can be used as the skeleton-tracking device, acquiring the key-node data of the action at a fixed frequency. The key-node position data give the positions of twenty specific skeleton nodes, namely the twenty joints enumerated in step 1 of the method above (HEAD, SHOULDER CENTER, SPINE, HIP CENTER, and the shoulder, elbow, wrist, hand, hip, knee, ankle and foot joints on each side).
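The table of node names and serial numbers referenced in the original did not survive reproduction. Purely for illustration, the sketch below enumerates the twenty key nodes named in step 1 with hypothetical zero-based indices (the Kinect SDK defines its own joint ordering, which may differ).

```python
# Hypothetical enumeration of the twenty skeleton key nodes used by the method.
# The indices are illustrative; the Kinect SDK defines its own joint ordering.
KEY_NODES = [
    "HIP_CENTER", "SPINE", "SHOULDER_CENTER", "HEAD",
    "SHOULDER_LEFT", "ELBOW_LEFT", "WRIST_LEFT", "HAND_LEFT",
    "SHOULDER_RIGHT", "ELBOW_RIGHT", "WRIST_RIGHT", "HAND_RIGHT",
    "HIP_LEFT", "KNEE_LEFT", "ANKLE_LEFT", "FOOT_LEFT",
    "HIP_RIGHT", "KNEE_RIGHT", "ANKLE_RIGHT", "FOOT_RIGHT",
]
NODE_INDEX = {name: i for i, name in enumerate(KEY_NODES)}
```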
The coordinate system of the skeleton-tracking device takes the device camera as its origin; the direction the camera faces is the positive Z axis, the direction opposite to gravity is the positive Y axis, the camera's left direction is the positive X axis, and the unit length is 1 meter. The skeleton-tracking device coordinate system is a static coordinate system.
Step 2: Convert the position data of the key nodes at each moment obtained in step 1 into position data in the morphological coordinate system, as follows:
Let (x, y, z) = (X - XHC, Y - YHC, Z - ZHC) be the coordinates, in the skeleton-tracking device coordinate system, of the vector from key node HIP CENTER to any key node NODE obtained in step 1, where (X, Y, Z) is the position of key node NODE and (XHC, YHC, ZHC) is the position of key node HIP CENTER; α, β and γ are the rotation angles of the axes of the morphological coordinate system relative to the skeleton-tracking device coordinate system.
Applying the corresponding rotation to (x, y, z) gives the position of the key node in the morphological coordinate system, denoted (x', y', z').
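A minimal sketch of this conversion, assuming the morphological axes are available as three orthonormal unit vectors (obtained as described below) rather than as explicit rotation angles α, β and γ; projecting onto those axes is equivalent to applying the rotation.

```python
import numpy as np

def to_morphological(node_xyz, hip_center_xyz, x_axis, y_axis, z_axis):
    """Express a key node, given in the skeleton-tracking device coordinate
    system, in the morphological coordinate system whose origin is HIP CENTER
    and whose axes are the supplied orthonormal unit vectors."""
    v = np.asarray(node_xyz, dtype=float) - np.asarray(hip_center_xyz, dtype=float)
    # Projecting onto the morphological axes plays the role of the rotation
    # by the angles alpha, beta and gamma in the text.
    return np.array([np.dot(v, x_axis), np.dot(v, y_axis), np.dot(v, z_axis)])
```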
The morphological coordinate system takes the facing direction of the human torso as the positive Z axis, the morphological upper-end direction of the torso as the positive Y axis, the person's left-side direction as the positive X axis, and the key node HIP CENTER as the origin.
The morphological upper end of the torso is defined by starting at the head and moving downward and outward along the body: of two parts, the one reached earlier is the morphological upper end of the one reached later. For example, when a person stands at attention with the arms hanging naturally, the left shoulder is the morphological upper end of the left elbow, and the left elbow is the morphological upper end of the left hand.
Step 3: Compute the body gesture label GLbody, left forelimb gesture label GLlf, right forelimb gesture label GLrf, left hindlimb gesture label GLlb and right hindlimb gesture label GLrb at each moment.
Specifically, in a further embodiment, the facing direction of the human torso and the morphological upper-end direction of the torso in step 2 are determined as follows:
The position data obtained in step 1 for key node SHOULDER RIGHT are (XSR, YSR, ZSR), for key node SHOULDER LEFT are (XSL, YSL, ZSL) and for key node HIP CENTER are (XHC, YHC, ZHC). These three key nodes determine a plane, which is the plane in which the human torso lies.
The normal vector of the plane in which the torso lies is computed from these three key nodes, as the cross product of two vectors lying in the plane.
Compute the vector formed by key node HEAD and key node HIP CENTER in the skeleton-tracking device coordinate system.
Because the head always leans slightly forward, the sign of the normal vector is chosen from its product with the HEAD-HIP CENTER vector: if the product is positive the positive direction is taken, otherwise the negative direction. The direction of the resulting normal vector is the facing direction of the torso, and the direction of the HEAD-HIP CENTER vector is the morphological upper-end direction of the torso.
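A minimal sketch of this construction, assuming NumPy arrays for the joint positions and the sign convention reconstructed above (the forward-leaning head selects the forward-pointing normal).

```python
import numpy as np

def torso_axes(shoulder_right, shoulder_left, hip_center, head):
    """Return (x_axis, y_axis, z_axis) of the morphological coordinate system:
    z points in the torso facing direction, y toward the morphological upper
    end, x toward the person's left side."""
    sr, sl, hc, hd = (np.asarray(p, float) for p in (shoulder_right, shoulder_left, hip_center, head))
    normal = np.cross(sl - hc, sr - hc)          # normal of the torso plane
    up = hd - hc                                 # HEAD - HIP CENTER vector
    # The head leans slightly forward, so the forward-pointing normal is the
    # one whose dot product with the HEAD - HIP CENTER vector is positive.
    if np.dot(normal, up) < 0:
        normal = -normal
    z_axis = normal / np.linalg.norm(normal)     # facing direction
    y_axis = up / np.linalg.norm(up)             # morphological upper end
    x_axis = np.cross(y_axis, z_axis)            # person's left side
    x_axis /= np.linalg.norm(x_axis)
    return x_axis, y_axis, z_axis
```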
Specifically, the body gesture label GLbody is obtained as follows:
Compute the vector formed by key node HEAD and key node HIP CENTER in the skeleton-tracking device coordinate system, and let F be the corresponding unit vector.
Select the coordinate with the largest absolute value among XF, YF and ZF; the value of GLbody is the value associated with the interval in which that coordinate falls.
Since F is a unit vector, XF² + YF² + ZF² = 1; in the boundary case where one of XF, YF and ZF is 0 and the other two have equal absolute value, the two equal values are √2/2, so the coordinate with the largest absolute value is no smaller than √2/2 in that case.
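The interval-to-value mapping for GLbody was given as a figure in the original and is not reproduced here, so the sketch below only illustrates the structure of the computation; the mapping from dominant axis and sign to a numeric label is a placeholder, not the patent's.

```python
import numpy as np

def body_label(head, hip_center):
    """Illustrative GL_body computation: normalize the HEAD - HIP CENTER
    vector and classify it by its dominant axis. The mapping from
    (axis, sign) to a numeric label is a placeholder, not the patent's."""
    f = np.asarray(head, float) - np.asarray(hip_center, float)
    f /= np.linalg.norm(f)                      # unit vector F
    axis = int(np.argmax(np.abs(f)))            # 0: X_F, 1: Y_F, 2: Z_F
    sign = 1 if f[axis] >= 0 else -1
    placeholder_map = {(1, 1): 1,               # upright (Y dominant, positive)
                       (1, -1): 2,              # inverted
                       (2, 1): 3,               # leaning forward
                       (2, -1): 4,              # leaning backward
                       (0, 1): 5, (0, -1): 5}   # lying on a side
    return placeholder_map[(axis, sign)]
```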
The left forelimb gesture label GLlf, right forelimb gesture label GLrf, left hindlimb gesture label GLlb and right hindlimb gesture label GLrb are obtained as follows:
Each of the four gesture labels involves three key nodes, denoted key node 1, key node 2 and key node 3. For the left forelimb gesture label GLlf the three key nodes are ELBOW LEFT, WRIST LEFT and HAND LEFT; for the right forelimb gesture label GLrf they are ELBOW RIGHT, WRIST RIGHT and HAND RIGHT; for the left hindlimb gesture label GLlb they are KNEE LEFT, ANKLE LEFT and FOOT LEFT; for the right hindlimb gesture label GLrb they are KNEE RIGHT, ANKLE RIGHT and FOOT RIGHT.
The data of these three key nodes in the morphological coordinate system are denoted (X1, Y1, Z1), (X2, Y2, Z2) and (X3, Y3, Z3). Each of the four gesture labels comprises a height label G1, an orientation label G2 and a curl label G3.
The height label G1 is obtained as follows:
G1 = (g1 + g2 + g3)/3, rounded to an integer. The smaller the value of G1, the closer the limb is to the morphological upper end. Here:
gn (n = 1, 2, 3) is determined from the Y coordinate of key node n; YH is the Y coordinate of key node HEAD in the morphological coordinate system, YHC is the Y coordinate of key node SHOULDER CENTER in the morphological coordinate system, and YH > YHC.
The orientation label G2 is obtained as follows:
Count the signs of the X and Z coordinates of key node 1, key node 2 and key node 3, and determine the orientation label G2 from these signs.
The curl label G3 is obtained as follows:
Based on key node 1, key node 2 and key node 3, a fourth key node, key node 4, is introduced, and the distances D1, D2 and D3 between key nodes 1, 2 and 3 and key node 4 are computed. For the left forelimb gesture label GLlf key node 4 is SHOULDER LEFT; for the right forelimb gesture label GLrf it is SHOULDER RIGHT; for the left hindlimb gesture label GLlb it is HIP LEFT; for the right hindlimb gesture label GLrb it is HIP RIGHT.
The value of the curl label G3 is then determined from D1, D2 and D3.
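The piecewise formulas for gn, G2 and G3 were given as figures in the original. The sketch below shows how the three labels fit together for one limb under placeholder discretizations (the height bands, sign-to-G2 mapping and distance thresholds are illustrative assumptions, not the patent's values).

```python
import numpy as np

def limb_labels(p1, p2, p3, p4, y_head, y_shoulder_center):
    """Illustrative (G1, G2, G3) for one limb. p1..p3 are key nodes 1-3 of the
    limb, p4 is the limb's reference node (e.g. SHOULDER LEFT for GL_lf),
    all in morphological coordinates. The discretizations below are
    placeholders; the patent defines them with its own piecewise formulas."""
    pts = [np.asarray(p, float) for p in (p1, p2, p3)]
    p4 = np.asarray(p4, float)

    # Height label G1: average of per-node height codes g_n, then rounded.
    # Placeholder g_n: 1 above HEAD height, 2 between HEAD and SHOULDER
    # CENTER, 3 below SHOULDER CENTER.
    def g(y):
        return 1 if y > y_head else (2 if y > y_shoulder_center else 3)
    G1 = round(sum(g(p[1]) for p in pts) / 3)

    # Orientation label G2: placeholder based on the majority sign of the
    # nodes' X and Z coordinates (the patent counts these signs).
    x_sign = int(np.sign(sum(np.sign(p[0]) for p in pts)))
    z_sign = int(np.sign(sum(np.sign(p[2]) for p in pts)))
    G2 = {(1, 1): 1, (1, -1): 2, (-1, 1): 3, (-1, -1): 4}.get((x_sign, z_sign), 5)

    # Curl label G3: placeholder thresholds on the mean distance of the three
    # nodes to the reference node p4 (smaller distance = more curled).
    d_mean = np.mean([np.linalg.norm(p - p4) for p in pts])
    G3 = 1 if d_mean < 0.25 else (2 if d_mean < 0.45 else 3)
    return G1, G2, G3
```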
Another aspect of the present invention provides a method for constructing an action template library, comprising the following steps:
Step 1: Perform a standard action several times and, using the above method of decomposing an action into gesture labels, decompose each performance into gesture labels at each moment. Take the gesture label at the initial moment as the starting-frame gesture label and the gesture label at the final moment as the ending-frame gesture label. Take the standard action performed the first time as the comparison standard action and the standard actions performed the other times as reference standard actions. Take the starting-frame gesture label of the comparison standard action as the starting-frame comparison gesture label, and the ending-frame gesture label of the first performance as the ending-frame comparison gesture label.
Step 2: Compute the starting-frame similarity coefficient group as follows:
For each reference standard action, compute the similarity Sl1(A)n of each attribute between its starting-frame gesture label and the starting-frame comparison gesture label, using the following formula:
Sl1(A)n = An × z1n ÷ ln (n ∈ Z, n ∈ [1, 13]; ln = 5 when n = 1, 4, 7, 10, 13 and ln = 3 otherwise)  (6)
Here An is the initialized similarity coefficient value, An = 1, and n is the attribute index. The attributes numbered 1 to 13 are, in order: the body gesture label GLbody; the height label G1, orientation label G2 and curl label G3 of the left forelimb gesture label GLlf; the height label G1, orientation label G2 and curl label G3 of the left hindlimb gesture label GLlb; the height label G1, orientation label G2 and curl label G3 of the right forelimb gesture label GLrf; and the height label G1, orientation label G2 and curl label G3 of the right hindlimb gesture label GLrb. z1n is the absolute value of the difference between the corresponding attribute of the starting-frame gesture label of the reference standard action and of the starting-frame comparison gesture label; for example, z11 is the absolute value of the difference between the body gesture label GLbody in the starting-frame gesture label of the reference standard action and the body gesture label GLbody in the starting-frame comparison gesture label.
For each attribute n, take the second-largest value among the similarities Sl1(A)n computed over the reference standard actions as the similarity coefficient value A1n for that attribute. The values A1n for all attributes form the starting-frame similarity coefficient group Astar = {A1n, n ∈ Z, n = 1, 2, ..., 13}.
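A sketch of the starting-frame coefficient computation under formula (6), assuming only that a gesture label is represented as a sequence of its 13 attribute values in the order n = 1, ..., 13.

```python
# l_n from formula (6): 5 for n = 1, 4, 7, 10, 13 (1-based), otherwise 3.
L_N = [5 if n in (1, 4, 7, 10, 13) else 3 for n in range(1, 14)]

def similarity_per_attribute(coeffs, label_a, label_b):
    """Sl(A)_n = A_n * |difference of attribute n| / l_n for n = 1..13.
    Labels are 13-element sequences ordered as in the text."""
    return [a * abs(x - y) / l for a, x, y, l in zip(coeffs, label_a, label_b, L_N)]

def coefficient_group(reference_labels, comparison_label):
    """Similarity coefficient group: for each attribute, take the
    second-largest Sl(A)_n over all reference standard actions
    (initial coefficients A_n = 1). At least two references are expected."""
    ones = [1.0] * 13
    per_ref = [similarity_per_attribute(ones, ref, comparison_label)
               for ref in reference_labels]
    group = []
    for n in range(13):
        values = sorted((row[n] for row in per_ref), reverse=True)
        group.append(values[1] if len(values) > 1 else values[0])  # second largest
    return group
```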
Step 3: Compute the ending-frame similarity coefficient group as follows:
For each reference standard action, compute the similarity Sl2(A)n of each attribute between its ending-frame gesture label and the ending-frame comparison gesture label, using the following formula:
Sl2(A)n = An × z2n ÷ ln (n ∈ Z, n ∈ [1, 13]; ln = 5 when n = 1, 4, 7, 10, 13 and ln = 3 otherwise)  (7)
Here z2n is the absolute value of the difference between the corresponding attribute of the ending-frame gesture label of the reference standard action and of the ending-frame comparison gesture label; for example, z21 is the absolute value of the difference between the body gesture label GLbody in the ending-frame gesture label of the reference standard action and the body gesture label GLbody in the ending-frame comparison gesture label.
For each attribute n, take the second-largest value among the similarities Sl2(A)n computed over the reference standard actions as the similarity coefficient value A2n for that attribute. The values A2n for all attributes form the ending-frame similarity coefficient group Astop = {A2n, n ∈ Z, n = 1, 2, ..., 13}.
Step 4: Repeat steps 1-3 for each of the standard actions to obtain the starting-frame similarity coefficient group and ending-frame similarity coefficient group of every standard action. The starting-frame and ending-frame similarity coefficient groups of all standard actions form the action template library.
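Reusing coefficient_group and the label representation from the previous sketch, the template library can be organized as a mapping from action name to its comparison labels and coefficient groups; this data layout is an illustrative assumption, not the patent's storage format.

```python
def build_template_library(recorded_actions):
    """recorded_actions maps an action name to a list of performances; each
    performance is a sequence of 13-element gesture labels (one per frame).
    The first performance is the comparison standard action, the rest are
    reference standard actions (at least two references are expected)."""
    library = {}
    for name, performances in recorded_actions.items():
        comparison, references = performances[0], performances[1:]
        start_cmp, stop_cmp = comparison[0], comparison[-1]
        library[name] = {
            "start_label": start_cmp,
            "stop_label": stop_cmp,
            "A_star": coefficient_group([p[0] for p in references], start_cmp),
            "A_stop": coefficient_group([p[-1] for p in references], stop_cmp),
        }
    return library
```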
A third aspect of the present invention provides an action recognition method, comprising the following steps:
Step 1: Decompose the action to be recognized into gesture labels at each moment, using the above method of decomposing an action into gesture labels.
Step 2: Select a standard action from the action template library and compute the similarity SL(B)n of each attribute between the ending-frame gesture label obtained in step 1 (denoted the frame-t gesture label) and the ending-frame gesture label of the selected standard action, using the following formula:
SL(B)n = A1n × z3n ÷ ln (n ∈ Z, n ∈ [1, 13]; ln = 5 when n = 1, 4, 7, 10, 13 and ln = 3 otherwise)  (8)
Here z3n is the absolute value of the difference between the corresponding attribute of the ending-frame gesture label obtained in step 1 and of the ending-frame gesture label of the selected standard action.
Then compute the overall similarity S(B) between the ending-frame gesture label obtained in step 1 and the ending-frame gesture label of the selected standard action from the values SL(B)n.
Step 3: If the overall similarity S(B) is greater than the set threshold MAXBLUR, return to step 2; otherwise, go to step 4.
Step 4: Compute the similarity SL(C)n of each attribute between the gesture label of the frame preceding the ending frame (denoted the frame t-1 gesture label) and the starting-frame gesture label of the selected standard action, using the following formula:
SL(C)n = A2n × z4n ÷ ln (n ∈ Z, n ∈ [1, 13]; ln = 5 when n = 1, 4, 7, 10, 13 and ln = 3 otherwise)  (10)
Here z4n is the absolute value of the difference between the corresponding attribute of the preceding-frame gesture label and of the starting-frame gesture label of the selected standard action.
Then compute the overall similarity S(C) between the preceding-frame gesture label and the starting-frame gesture label of the selected standard action from the values SL(C)n.
Step 5: If the overall similarity S(C) is less than the set threshold MAXBLUR, the action to be recognized matches the selected standard action. If S(C) is greater than MAXBLUR, return to step 4 with the frame t-2 gesture label in place of the frame t-1 gesture label, and so on; if the first-frame gesture label is reached and the overall similarity S(C) is still greater than MAXBLUR, return to step 2. MAXBLUR expresses the fuzziness of the action-matching algorithm and takes a value between 0.05 and 0.25.
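A sketch of the matching loop of steps 2-5, reusing similarity_per_attribute and the library layout from the previous sketches. The formulas for the overall similarities S(B) and S(C) were given as figures in the original; the sketch assumes they are the sums of the thirteen per-attribute similarities, which is an assumption rather than the patent's exact definition.

```python
def recognize(frames, library, maxblur=0.15):
    """frames: gesture labels of the action to be recognized, one per moment.
    Returns the name of the matching standard action, or None.
    ASSUMPTION: overall similarity = sum of per-attribute similarities;
    maxblur is chosen inside the 0.05-0.25 range stated in the text."""
    ending = frames[-1]                      # frame t
    for name, tpl in library.items():
        # Steps 2-3: compare the ending frame against the template's ending frame.
        s_b = sum(similarity_per_attribute(tpl["A_star"], ending, tpl["stop_label"]))
        if s_b > maxblur:
            continue                         # try the next standard action
        # Steps 4-5: walk backwards looking for a frame matching the start.
        for earlier in reversed(frames[:-1]):
            s_c = sum(similarity_per_attribute(tpl["A_stop"], earlier, tpl["start_label"]))
            if s_c < maxblur:
                return name                  # action recognized
        # reached the first frame without a match: move on to the next action
    return None
```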
Example
Action recognition using the traditional method:
A single Kinect was used and the action to be recognized was a right-hand salute. The template library was built with the traditional method. Tester a is 173 cm tall and weighs 60 kg, tester b is 191 cm tall and weighs 100 kg, and tester c is 181 cm tall and weighs 80 kg. The first 50 samples were recorded by tester a and samples 51-80 by tester b. Recording each sample took about 2 minutes: the action to be recorded was selected, and the recorder stood 1.5 meters in front of the device and performed a right-hand salute. The sample library and test points used all 20 Kinect skeleton nodes.
During testing, each tester stood 1.5 meters in front of the device and performed the action as close to the standard as possible. Each time new samples were recorded to refine the template library, every tester performed the right-hand salute ten times and the recognition results were tallied. The recognition results are shown in Table 1.
Table 1
The test results show that when tester a acted both as tester and as sample recorder, the recognition success rate rose markedly as the number of recorded samples increased, reaching 100% at 50 samples, while the success rates of the other testers remained essentially unchanged. When tester b took over as sample recorder, tester b's success rate improved considerably while tester a's success rate actually dropped. Tester c, who did not take part in recording, had a lower recognition success rate, although it improved somewhat as the number of samples grew. Test 1 took a total of 4 hours and 20 minutes.
Action recognition using the method of the present invention:
A single Kinect was used and the actions to be recognized were a right-hand salute and waving both hands. The gesture label library and the action-gesture library were built with the present method using all 20 nodes; building the template library took 30 minutes in total and covered six actions: standing, left hand raised, right hand raised, both hands raised, left-arm salute and right-arm salute. Raising the right hand is similar to the right-arm salute, and raising both hands is a complex three-gesture action that satisfies both the left-hand-raised and right-hand-raised requirements; these were included to increase the difficulty of the test. Tester a is 173 cm tall and weighs 60 kg, tester b is 191 cm tall and weighs 100 kg, and tester c is 181 cm tall and weighs 80 kg, the same as in test 1.
During testing, each tester stood 1.5 meters in front of the device and performed the actions as close to the standard as possible; each tester performed the right-hand salute and the two-hand wave ten times each. The template library was not updated during the test, so multiple rounds per tester were not needed. The recognition results are shown in Table 2.
Table 2
For tester c, one right-arm salute was misrecognized as raising the right hand and one both-hands-raised action was misrecognized as raising the right hand; this is related to the way the corresponding actions are defined in the action-gesture library.
The overall recognition success rate was much higher than in test 1 and was good for all three testers despite their different body shapes. The whole test took 1 hour and 10 minutes, with a richer and more difficult set of actions to recognize than in test 1.
This shows that the method generalizes well across testers and that recording (designing) the template library is simpler and more convenient.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810133363.0A CN108392207B (en) | 2018-02-09 | 2018-02-09 | An Action Recognition Method Based on Gesture Labels |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108392207A CN108392207A (en) | 2018-08-14 |
CN108392207B true CN108392207B (en) | 2020-12-11 |
Family
ID=63096010
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810133363.0A Expired - Fee Related CN108392207B (en) | 2018-02-09 | 2018-02-09 | An Action Recognition Method Based on Gesture Labels |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108392207B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110215216B (en) * | 2019-06-11 | 2020-08-25 | 中国科学院自动化研究所 | Behavior identification method and system based on skeletal joint point regional and hierarchical level |
CN110309743A (en) * | 2019-06-21 | 2019-10-08 | 新疆铁道职业技术学院 | Human body attitude judgment method and device based on professional standard movement |
CN112617819A (en) * | 2020-12-21 | 2021-04-09 | 西南交通大学 | Method and system for recognizing lower limb posture of infant |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11272877A (en) * | 1998-03-25 | 1999-10-08 | Namco Ltd | Data representation method of skeleton model |
CN104268138A (en) * | 2014-05-15 | 2015-01-07 | 西安工业大学 | Method for capturing human motion by aid of fused depth images and three-dimensional models |
CN105608467A (en) * | 2015-12-16 | 2016-05-25 | 西北工业大学 | Kinect-based non-contact type student physical fitness evaluation method |
CN106022213A (en) * | 2016-05-04 | 2016-10-12 | 北方工业大学 | Human body motion recognition method based on three-dimensional bone information |
KR101722131B1 (en) * | 2015-11-25 | 2017-03-31 | 국민대학교 산학협력단 | Posture and Space Recognition System of a Human Body Using Multimodal Sensors |
CN107115102A (en) * | 2017-06-07 | 2017-09-01 | 西南科技大学 | A kind of osteoarticular function appraisal procedure and device |
CN107174255A (en) * | 2017-06-15 | 2017-09-19 | 西安交通大学 | Three-dimensional gait information gathering and analysis method based on Kinect somatosensory technology |
CN107225573A (en) * | 2017-07-05 | 2017-10-03 | 上海未来伙伴机器人有限公司 | The method of controlling operation and device of robot |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014112632A1 (en) * | 2013-01-18 | 2014-07-24 | 株式会社東芝 | Movement-information processing device and method |
CN103886588B (en) * | 2014-02-26 | 2016-08-17 | 浙江大学 | A kind of feature extracting method of 3 D human body attitude projection |
WO2015162158A1 (en) * | 2014-04-22 | 2015-10-29 | Université Libre de Bruxelles | Human motion tracking |
CN105243375B (en) * | 2015-11-02 | 2018-05-18 | 北京科技大学 | A kind of motion characteristic extracting method and device |
CN106528586A (en) * | 2016-05-13 | 2017-03-22 | 上海理工大学 | Human behavior video identification method |
CN106295616B (en) * | 2016-08-24 | 2019-04-30 | 张斌 | Exercise data analyses and comparison method and device |
CN106874884B (en) * | 2017-03-03 | 2019-11-12 | 中国民航大学 | Human body re-identification method based on part segmentation |
CN107038430B (en) * | 2017-05-05 | 2020-09-11 | 成都通甲优博科技有限责任公司 | Method and device for constructing human body posture data sample |
Non-Patent Citations (2)
- Kim, Yejin; Baek, Seongmin; Bae, Byung-Chull. "Motion Capture of the Human Body Using Multiple Depth Sensors." ETRI Journal, vol. 39, no. 2, April 2017. *
- Ke, Q.; An, S.; Bennamoun, M.; et al. "SkeletonNet: Mining Deep Part Features for 3-D Action Recognition." IEEE Signal Processing Letters, vol. 24, no. 6, June 2017. *
Also Published As
Publication number | Publication date |
---|---|
CN108392207A (en) | 2018-08-14 |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant
- CF01: Termination of patent right due to non-payment of annual fee (granted publication date: 2020-12-11; termination date: 2022-02-09)