
CN106529441A - Fuzzy boundary fragmentation-based depth motion map human body action recognition method - Google Patents


Info

Publication number
CN106529441A
CN106529441A
Authority
CN
China
Prior art keywords
depth
time slice
training
fuzzy
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610949051.8A
Other languages
Chinese (zh)
Other versions
CN106529441B (en)
Inventor
蒋敏
金科
孔军
昝宝锋
胡珂杰
徐海洋
刘天山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huirong Electronic System Engineering Ltd
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201610949051.8A priority Critical patent/CN106529441B/en
Publication of CN106529441A publication Critical patent/CN106529441A/en
Application granted granted Critical
Publication of CN106529441B publication Critical patent/CN106529441B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human action recognition method based on depth motion maps with fuzzy-boundary slicing. The model training method comprises the following steps: slice the depth-map sequence of a video along the time axis and determine the fuzzy boundaries of each slice according to a fuzzy parameter α; for each sub-sequence obtained by slicing, compute the depth motion maps (DMMs) of its front view, left view and top view; resize these depth motion maps to a fixed size by interpolation and normalise them; concatenate the normalised depth motion maps of all sub-sequences of a video sequence to obtain the feature vector of that sequence; and classify the features with the robust probabilistic collaborative representation classifier (R-ProCRC) to perform human action recognition. The disclosed method effectively captures how the temporal features evolve, strengthens the robustness of the action features against temporal variation, and thereby achieves robust recognition of human actions.

Description

Human action recognition method based on fuzzy-boundary slicing of depth motion maps

Technical Field

The invention belongs to the field of machine vision, and in particular relates to a human action recognition method that applies fuzzy-boundary slicing to depth motion maps.

Background Art

Human action recognition is the technique of processing video sequences of human activity, extracting action features from them, and using those features to identify the action being performed.

Human action recognition has become a very active branch of machine vision and pattern recognition. The technique has many potential applications in human-computer interaction, including video analysis, surveillance systems and intelligent robots. In earlier years, research on human action recognition was mainly based on image-frame sequences captured by colour cameras [1][2]. Such data has inherent drawbacks: it is highly sensitive to illumination, occlusion and complex backgrounds, which degrades recognition accuracy. Depth cameras have since received widespread attention; these sensors provide the 3D structure of objects, including the shape and motion of the human body. Current depth-based feature extraction methods fall into two main branches: methods based on estimated 3D skeleton joints [3] and methods based on the raw depth images. Joint estimation, however, can fail, particularly when the body is partially occluded or the background is cluttered. C. Chen et al. [4] obtained relatively good results by computing depth motion maps (DMMs) from the raw depth images and, after PCA dimensionality reduction, using the pixels as features. That method, however, accumulates all video frames onto a single DMM and therefore loses the temporal information of the action.

To address the above problems in human action recognition, the present invention proposes a human action recognition method based on fuzzy-boundary slicing of depth motion maps. It retains the advantages of raw depth images over skeleton data, and by slicing the DMM with fuzzy boundaries it extracts the temporal information of the action while remaining highly robust to illumination, occlusion, complex backgrounds and other interfering factors.

Summary of the Invention

The main purpose of the present invention is to propose a human action recognition method based on fuzzy-boundary slicing of depth motion maps, which captures the temporal information of an action and uses the robust probabilistic collaborative representation based classifier (R-ProCRC) [5] for classification, thereby improving recognition accuracy.

To achieve the above purpose, the present invention provides the following technical solution, which comprises a training phase and a testing phase.

The training phase of the human action recognition method based on fuzzy-boundary slicing of depth motion maps is as follows:

Step 1: A training set of depth-map samples of human action video sequences is given as {(X^(k), Y^(k))}, k ∈ [1, M], where X^(k) = {x_i^(k)}, i ∈ [1, N_k], is the depth-map sequence of the k-th training sample, x_i^(k) is the raw depth image of the i-th frame of the k-th sample, N_k is the total number of frames of the k-th sample, Y^(k) is the action class of the k-th training sample, and M is the number of samples in the training set;

Step 2: Each video-sequence training sample X^(k) in the training set is divided directly along the time axis into DIV equal-length time slices, each of length n = ⌊N_k/DIV⌋; the resulting time slices are denoted X_j^(k), j ∈ [1, DIV], where X_j^(k) contains the frames x_i^(k) with i ∈ [(j−1)n+1, jn];

Step 3: A suitable fuzzy parameter α is chosen and each time slice is given fuzzy boundaries, i.e. it is extended into its neighbouring slices by a number of frames determined by α; the fuzzy time slices are denoted X̃_j^(k). To avoid index overflow, the first time slice is not extended at its front boundary and the last time slice is not extended at its rear boundary;

Step 4: For each fuzzy time slice X̃_j^(k), its depth motion maps DMM_{j,v}^(k) are computed in three projection directions, where v ∈ {f, s, t} denotes the three projection views of the video sequence: the front view, the left view and the top view. This yields the set of depth motion maps {DMM_{j,v}^(k)}, j ∈ [1, DIV], v ∈ {f, s, t}, for every training sample X^(k);

Step 5: The depth motion maps DMM_{j,v}^(k) obtained in Step 4 are resized to one common size by bicubic interpolation, and their values are normalised to the range 0 to 1;

Step 6: For any training sample X^(k), its normalised depth motion maps are vectorised and all vectorised maps are concatenated, completing the construction of the feature of X^(k), denoted H^(k); the feature set of all samples is then {H^(k)}, k ∈ [1, M];

Step 7: The output features {H^(k)}, k ∈ [1, M], of all samples obtained in Step 6 are reduced in dimensionality by PCA, and all of the dimension-reduced features, denoted Ĥ^(k), are stored.

The testing phase of the human action recognition method based on fuzzy-boundary slicing of depth motion maps is as follows:

Step 1: A test sample (TestX, TestY) of a depth-map sequence of a human action video is given, where TestX = {X_i}, i ∈ [1, N_T], is the depth-map sequence of the test sample, X_i is the raw depth image of the i-th frame of the test sample, N_T is the total number of frames of the test sample, and TestY is the action class of the test sample;

Step 2: The test sample TestX is divided directly along the time axis into DIV equal-length time slices (with the same slicing scheme as in the training phase), each of length n = ⌊N_T/DIV⌋; the resulting time slices are denoted X_j, j ∈ [1, DIV];

Step 3: Using the fuzzy parameter α adopted in the training phase, each time slice is given fuzzy boundaries; the fuzzy time slices are denoted X̃_j. To avoid index overflow, the first time slice is not extended at its front boundary and the last time slice is not extended at its rear boundary;

Step 4: For each fuzzy time slice X̃_j, its depth motion maps DMM_{j,v} are computed in the three projection directions, where v ∈ {f, s, t} denotes the front view, the left view and the top view of the video-sequence projection. This yields the set of depth motion maps {DMM_{j,v}}, j ∈ [1, DIV], v ∈ {f, s, t}, of the test sample TestX;

Step 5: The test-sample depth motion maps {DMM_{j,v}} obtained in Step 4 are resized by bicubic interpolation to the same sizes used in the training phase and, following the normalisation used in training, are normalised to the range 0 to 1;

Step 6: The normalised depth motion maps of the test sample TestX are vectorised and all vectorised maps are concatenated, completing the construction of the feature of TestX, denoted H_T;

Step 7: The output feature H_T of the test sample TestX obtained in Step 6 is reduced in dimensionality by PCA, giving the reduced feature Ĥ_T; the PCA reduction is the same as in the training phase;

Step 8: The dimension-reduced output feature Ĥ_T is fed into the R-ProCRC classifier [5] to obtain the classification output PridY;

Step 9: PridY is compared with TestY; if PridY = TestY the recognition is correct, otherwise it is incorrect.

Compared with the prior art, the present invention has the following beneficial effects:

1. The method uses depth data for human action recognition. Compared with conventional colour video, depth data allows the human body to be segmented efficiently while preserving its shape and structure, which helps to improve classification accuracy.

2. Conventional feature extraction with depth motion maps projects the entire video onto a single DMM and therefore loses temporal information. The proposed fuzzy-boundary-slicing method slices the depth-map sequence along the time dimension and effectively captures how the temporal features evolve.

3. To cope with temporal variation in human actions, the proposed method uses the fuzzy parameter α to control the boundaries between slices so that adjacent slices share information. This further improves the robustness of the features to temporal differences in human behaviour, and together with the R-ProCRC classifier it markedly improves recognition accuracy.

Brief Description of the Drawings

Fig. 1 is a flow chart of feature construction for human action recognition based on fuzzy-boundary slicing of depth motion maps.

Fig. 2 is a schematic diagram of the fuzzy-boundary slicing strategy.

Fig. 3 shows examples of depth motion map projections under the three views.

Detailed Description

To better explain the purpose, concrete steps and characteristics of the present invention, the invention is described in further detail below with reference to the accompanying drawings, taking the MSR Action3D dataset as an example.

The flow of feature extraction in the proposed method is shown in Fig. 1. A sample is first divided into equal slices with fixed boundaries, and the degree of boundary blurring is then determined by the parameter α. For every sliced video sub-sequence its depth motion map (DMM) is computed; bicubic interpolation fixes all DMMs of the samples to one common size, they are normalised, and the vectorised sub-sequence features are concatenated to complete the construction of the output feature of the training sample.

The human action recognition method based on fuzzy-boundary slicing of depth motion maps proposed by the present invention comprises a training phase and a testing phase; their steps are as set out in the Summary of the Invention above, and the implementation details are as follows.


In the above technical solution, for the equal-length time-slice division of video sequences in Step 2 of the training phase, the number of slices DIV is chosen according to the specific human action dataset so as to obtain the optimal value; for the MSR Action3D dataset, DIV = 3.

In the above technical solution, for the equal-length time-slice division in Step 2 of the training phase, the length of each time slice is n = ⌊N_k/DIV⌋, rounded down; if the last time slice is shorter than n, its actual length is used.
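As an illustration only, a minimal Python sketch of this equal-length division follows. The NumPy array representation of the depth sequence, the function name, and the choice of letting the last slice absorb any leftover frames are assumptions of the sketch rather than details fixed by the patent.

```python
import numpy as np

def split_into_slices(depth_seq, div):
    """Split a depth-map sequence of shape (frames, H, W) into `div`
    equal-length time slices; the slice length is rounded down, and the
    last slice keeps whatever frames remain (an assumption of this sketch)."""
    n_frames = depth_seq.shape[0]
    n = n_frames // div                      # slice length, rounded down
    slices = []
    for j in range(div):
        start = j * n
        end = n_frames if j == div - 1 else (j + 1) * n
        slices.append(depth_seq[start:end])
    return slices
```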

In the above technical solution, for the boundary blurring of time slices in Step 3 of the training phase, the fuzzy parameter α is chosen according to the specific human action dataset so as to obtain the optimal value; for the MSR Action3D dataset, α = 0.8.

In the above technical solution, in the boundary blurring of Step 3 of the training phase, the first time slice of each sample is not extended at its front boundary and the last time slice of each sample is not extended at its rear boundary.

In the above technical solution, Steps 2 and 3 of the training phase together implement the fuzzy slicing of the video sequence. As shown in Fig. 2, the fuzzy parameter α controls the boundaries between slices so that adjacent slices share information, which further improves the robustness of the features to temporal differences in human behaviour.
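A minimal sketch of the fuzzy-boundary extension is given below. It assumes that each slice is extended into its neighbours by ⌊α·n⌋ frames, which is one plausible reading of how the fuzzy parameter controls the boundary; the function and variable names are illustrative only.

```python
def fuzzy_slice_bounds(n_frames, div, alpha):
    """Return the (start, end) frame indices of each fuzzy time slice.

    Each equal-length slice of length n = n_frames // div is extended by
    floor(alpha * n) frames into its neighbours (assumed extension rule);
    the first slice is not extended before frame 0 and the last slice is
    not extended past the end of the sequence."""
    n = n_frames // div
    ext = int(alpha * n)
    bounds = []
    for j in range(div):
        start = j * n
        end = n_frames if j == div - 1 else (j + 1) * n
        if j > 0:                  # no front extension for the first slice
            start = max(0, start - ext)
        if j < div - 1:            # no rear extension for the last slice
            end = min(n_frames, end + ext)
        bounds.append((start, end))
    return bounds

# Example: 90 frames, DIV = 3, alpha = 0.8 gives overlapping index ranges
print(fuzzy_slice_bounds(90, 3, 0.8))   # [(0, 54), (6, 84), (36, 90)]
```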

In the above technical solution, Step 4 of the training phase computes the depth motion maps DMM_{j,v}^(k) of each fuzzy time slice X̃_j^(k) in the three projection directions by accumulating absolute differences between video frames. Specifically:

The video frames within one time slice are projected onto the three views to obtain the front, left and top view of each frame; the projections of adjacent frames under the same view are then subtracted and the absolute values of the differences are accumulated, so that the trajectory of the motion is preserved. The formula is:

$$DMM^{(k)}_{j,v}\;=\;\sum_{i\in\tilde{T}^{(k)}_{j}}\left|\,map^{(k)}_{i+1,v}-map^{(k)}_{i,v}\,\right|,\qquad v\in\{f,s,t\}$$

where map^(k)_{i,v} denotes the projection of the i-th depth frame of the k-th sample onto view v, and T̃_j^(k) is the frame-index range of the j-th fuzzy time slice of the k-th sample: the first time slice (j = 1) and the last time slice (j = DIV) are not extended beyond the start and end of the sequence respectively, while the middle slices j ∈ (2, …, DIV−1) are extended at both ends according to α ∈ (0, 1); v ∈ {f, s, t} denotes the three projection views, i.e. the front view, the left view and the top view. As shown in Fig. 3, the DMM of each projection direction effectively preserves the trajectory information of a single fuzzy slice of the human action sequence in the three projection directions.
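For illustration, a NumPy sketch of this accumulation is given below. The way the left and top projections are formed (occupancy over quantised depth bins, with an assumed maximum depth and bin count) is one common way to realise the three-view projection and is an assumption of the sketch, not a detail fixed by the patent; all names are illustrative.

```python
import numpy as np

def project_three_views(depth, n_depth_bins=256, max_depth=4000.0):
    """Project one depth frame (H, W) onto the front (f), left (s) and
    top (t) view planes.  Front: the depth map itself; left: occupancy of
    the (row, depth-bin) plane; top: occupancy of the (depth-bin, column)
    plane.  Bin count and max_depth are assumptions of this sketch."""
    h, w = depth.shape
    front = depth.astype(np.float32)
    side = np.zeros((h, n_depth_bins), dtype=np.float32)
    top = np.zeros((n_depth_bins, w), dtype=np.float32)
    ys, xs = np.nonzero(depth)                       # pixels with valid depth
    bins = np.minimum((depth[ys, xs] / max_depth * (n_depth_bins - 1)).astype(int),
                      n_depth_bins - 1)
    side[ys, bins] = 1.0
    top[bins, xs] = 1.0
    return {"f": front, "s": side, "t": top}

def depth_motion_map(frames, view):
    """Accumulate absolute differences of consecutive projections of one
    fuzzy time slice under the given view ('f', 's' or 't')."""
    maps = [project_three_views(f)[view] for f in frames]
    return sum(np.abs(maps[i + 1] - maps[i]) for i in range(len(maps) - 1))
```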

In the above technical solution, Step 5 of the training phase resizes the depth motion maps DMM_{j,v}^(k) to a common size by bicubic interpolation; in this patent the sizes of the front view, left view and top view are defined as 50×25, 50×40 and 40×20 respectively. In practice a different interpolation method may be chosen to rescale the depth motion maps, provided the loss of image information is kept to a minimum.
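The resizing and normalisation of Step 5 can be sketched as follows; scipy.ndimage.zoom with order=3 performs bicubic interpolation, and the target sizes are the ones stated above. The helper name and the min-max normalisation are assumptions of the sketch.

```python
import numpy as np
from scipy.ndimage import zoom

# target sizes stated in the embodiment: front, left and top view
TARGET_SIZE = {"f": (50, 25), "s": (50, 40), "t": (40, 20)}

def resize_and_normalise(dmm, view):
    """Resize a DMM to the fixed size of its view with bicubic
    interpolation and rescale its values into [0, 1]."""
    th, tw = TARGET_SIZE[view]
    h, w = dmm.shape
    resized = zoom(dmm, (th / h, tw / w), order=3)   # order=3 -> bicubic
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo) if hi > lo else np.zeros_like(resized)
```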

In the above technical solution, Step 7 of the training phase reduces the dimensionality of the output features {H^(k)}, k ∈ [1, M], of all samples computed in Step 6 by PCA. The feature dimension after reduction can be set according to the number of training samples; in the embodiment of this patent, if the number of training samples is M, the final feature dimension is (M−20)×1.
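A sketch of the feature assembly and PCA reduction of Steps 6 and 7 follows, using scikit-learn's PCA. The helper names are illustrative, and n_components = M - 20 mirrors the dimension chosen in this embodiment.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_feature(normalised_dmms):
    """Vectorise the normalised DMMs of one sample (all slices and views)
    and concatenate them into a single feature vector H."""
    return np.concatenate([m.ravel() for m in normalised_dmms])

def reduce_training_features(features):
    """features: array of shape (M, D) holding H^(k) for all M samples.
    Keeps (M - 20) components, as in the embodiment, and returns the
    fitted PCA so the same projection can be applied to test samples."""
    m = features.shape[0]
    pca = PCA(n_components=m - 20)
    reduced = pca.fit_transform(features)
    return pca, reduced

# test phase: reduced_test = pca.transform(test_feature.reshape(1, -1))
```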

In the above technical solution, Steps 2 to 6 of the testing phase use the same feature construction method and parameters as the training phase.

In the above technical solution, Step 7 of the testing phase applies PCA to reduce the dimensionality of the output feature H_T of the test sample TestX computed in Step 6; the dimension after reduction is (M−20)×1, where M is the number of training samples.

In the above technical solution, the specific procedure by which Step 8 of the testing phase classifies the dimension-reduced action feature Ĥ_T obtained in Step 7 with the R-ProCRC classifier is as follows:

(1) Compute the optimal coefficient vector α̂:

$$\hat{\alpha}=\arg\min_{\alpha}\;\left\|W^{\frac{1}{2}}\big(\hat H\alpha-\hat H_T\big)\right\|_2^2+\lambda\left\|\alpha\right\|_2^2+\frac{\gamma}{\|C\|}\sum_{c=1}^{\|C\|}\left\|\hat H\alpha-\bar H_c\alpha\right\|_2^2$$

where Ĥ is the dictionary formed by the dimension-reduced training features, Ĥ_T is the test-sample feature, H_c is the set of all input feature vectors belonging to class c (c ∈ C), ‖C‖ is the total number of classes, and λ and γ are parameters between 0 and 1. The matrix H̄_c is constructed as follows: H̄_c is first initialised as a zero matrix of the same size as the dictionary Ĥ, and H_c is then copied into H̄_c at the positions that its columns occupy in Ĥ, so that H̄_c α selects only the contribution of class c. W is a diagonal weight matrix whose i-th diagonal entry is computed, as defined in [5], from the i-th row Ĥ(i,:) of the dictionary and the i-th value Ĥ_T(i) of the test-sample feature vector, so that unreliable feature dimensions are down-weighted.

(2) Estimate the probability that the test-sample output feature Ĥ_T belongs to class c:

$$P\big(c\mid\hat H_T\big)\;\propto\;\exp\!\Big(-\big\|\bar H_c\hat{\alpha}-\hat H\hat{\alpha}\big\|_2^2-\big\|\hat H\hat{\alpha}-\hat H_T\big\|_2^2\Big)$$

The term ‖Ĥα̂ − Ĥ_T‖² is the same for every class, so the expression simplifies to

$$P\big(c\mid\hat H_T\big)\;\propto\;\exp\!\Big(-\big\|\bar H_c\hat{\alpha}-\hat H\hat{\alpha}\big\|_2^2\Big)$$

The class to which the feature Ĥ_T belongs is the class c with the largest probability.
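For illustration, a simplified Python sketch of the collaborative-representation decision rule is given below. It implements an unweighted ProCRC-style objective (the diagonal weight matrix W of the robust variant is replaced by the identity for brevity) with a closed-form solution, and then compares the class-wise terms exactly as in the simplified probability above; it is a sketch of the idea, not the patented implementation, and the parameter values are placeholders.

```python
import numpy as np

def procrc_classify(H, labels, y, lam=1e-3, gamma=1e-3):
    """Simplified (unweighted) probabilistic collaborative representation
    classifier.  H: (d, M) dictionary whose columns are reduced training
    features; labels: (M,) class of each column; y: (d,) reduced test
    feature.  Solves
        min_a ||H a - y||^2 + lam ||a||^2 + gamma/K * sum_c ||H a - Hc_bar a||^2
    in closed form and returns the class whose sub-representation is
    closest to the full representation H a."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    K = len(classes)
    d, M = H.shape

    A = H.T @ H + lam * np.eye(M)
    class_dicts = {}
    for c in classes:
        Hc_bar = np.zeros_like(H)
        Hc_bar[:, labels == c] = H[:, labels == c]   # only class-c columns kept
        class_dicts[c] = Hc_bar
        D = H - Hc_bar
        A += (gamma / K) * (D.T @ D)

    alpha = np.linalg.solve(A, H.T @ y)              # closed-form minimiser
    Ha = H @ alpha
    # P(c | y) is proportional to exp(-||Hc_bar a - H a||^2); the shared term
    # cancels, so the predicted class minimises this distance
    scores = {c: float(np.sum((class_dicts[c] @ alpha - Ha) ** 2)) for c in classes}
    return min(scores, key=scores.get)
```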

To verify the effectiveness of the present invention, experiments were carried out on the well-known human action depth databases MSR Action3D and Action Pair. Table 1 describes the characteristics of the two depth databases.

Table 1: Description of the depth databases

As shown in Table 2, in the experiments the MSR Action3D database was divided into three fixed subsets. Each subset was evaluated with three experimental protocols: Test One uses the first performance of each subject as the training set and the remainder as the test set; Test Two uses the first two performances of each subject as the training set and the remainder as the test set; Cross Test uses all video sequences of subjects 1, 3, 5, 7 and 9 as the training set and the rest for testing. The experimental results are given in Table 3; the action recognition accuracy of the present invention is better than that of the traditional DMM method in most cases.
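The three evaluation protocols amount to subject-based splits of the sample list. A small sketch follows; each sample is assumed to carry 'subject' and 'repetition' metadata fields, which are illustrative names not taken from the patent.

```python
def split_samples(samples, protocol):
    """samples: list of dicts with keys 'subject' and 'repetition' (1-based).
    Returns (train_idx, test_idx) for the three MSR Action3D protocols."""
    train, test = [], []
    for i, s in enumerate(samples):
        if protocol == "test_one":        # first performance of every subject trains
            (train if s["repetition"] == 1 else test).append(i)
        elif protocol == "test_two":      # first two performances train
            (train if s["repetition"] <= 2 else test).append(i)
        elif protocol == "cross_test":    # subjects 1, 3, 5, 7, 9 train, the rest test
            (train if s["subject"] in {1, 3, 5, 7, 9} else test).append(i)
        else:
            raise ValueError(f"unknown protocol: {protocol}")
    return train, test
```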

Table 2: Subsets of the MSR Action3D database

Table 3: Comparison on the MSR Action3D subsets

Table 4 shows the recognition rate of the present invention on the Action Pair database and its comparison with DMM. The actions in the Action Pair database include many pairs whose temporal order is reversed, such as "pick up" and "put down" or "stand up" and "sit down", so this database is very sensitive to temporal information. The traditional DMM achieves a recognition rate of only 50.6%, whereas the recognition rate of the present invention reaches 97.2%.

Table 4: Recognition rates of different algorithms on Action Pair

Because the present invention uses depth data for human action recognition, the human body can be separated quickly and accurately compared with traditional colour video, and its shape and structural characteristics are preserved, which benefits accuracy; at the same time, processing with depth motion maps (DMMs) is more robust than using skeleton points estimated by skeleton-tracking techniques. Traditional DMM feature extraction projects the entire video onto a single DMM and loses temporal information; the fuzzy-boundary slicing proposed here slices the existing DMM and uses the fuzzy parameter α to control the boundaries between slices, so that adjacent slices share information and the DMM captures temporal information better. Combined with the R-ProCRC classifier [5], excellent recognition accuracy is obtained.

The specific embodiments of the present invention have been described in detail above with reference to the accompanying drawings; the present invention is not, however, limited to these embodiments, and various changes can be made within the knowledge of a person of ordinary skill in the art without departing from the spirit of the present invention.

References

[1] Bian W, Tao D, Rui Y. Cross-domain human action recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012, 42(2): 298-307.

[2] Niebles J C, Wang H, Li F F. Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision, 2008, 79(3): 299-318.

[3] Wang J, Liu Z, Wu Y, et al. Mining actionlet ensemble for action recognition with depth cameras. IEEE Conference on Computer Vision and Pattern Recognition, 2012: 1290-1297.

[4] Chen C, Liu K, Kehtarnavaz N. Real-time human action recognition based on depth motion maps. Journal of Real-Time Image Processing, 2013: 1-9.

[5] Cai S, Zhang L, et al. A probabilistic collaborative representation based approach for pattern classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

[6] Xu H, Chen E, Liang C, et al. Spatio-temporal pyramid model based on depth maps for action recognition. IEEE International Workshop on Multimedia Signal Processing (MMSP), 2015.

Claims (7)

1. A human action recognition method based on fuzzy-boundary slicing of depth motion maps, characterised in that it comprises a training phase and a testing phase.

2. The human action recognition method based on fuzzy-boundary slicing of depth motion maps according to claim 1, characterised in that the training phase comprises the following steps:

Step 1: a training set of depth-map samples of human action video sequences is given as {(X^(k), Y^(k))}, k ∈ [1, M], where X^(k) = {x_i^(k)}, i ∈ [1, N_k], is the depth-map sequence of the k-th training sample, x_i^(k) is the raw depth image of the i-th frame of the k-th sample, N_k is the total number of frames of the k-th sample, Y^(k) is the action class of the k-th training sample, and M is the number of samples in the training set;

Step 2: each video-sequence training sample X^(k) in the training set is divided directly along the time axis into DIV equal-length time slices, each of length n = ⌊N_k/DIV⌋, the resulting time slices being denoted X_j^(k), j ∈ [1, DIV];

Step 3: a suitable fuzzy parameter α is chosen and each time slice is given fuzzy boundaries, the fuzzy time slices being denoted X̃_j^(k); to avoid index overflow, the first time slice is not extended at its front boundary and the last time slice is not extended at its rear boundary;

Step 4: for each fuzzy time slice X̃_j^(k), its depth motion maps DMM_{j,v}^(k) are computed in three projection directions, where v ∈ {f, s, t} denotes the three projection views of the video sequence, i.e. the front view, the left view and the top view, thereby obtaining the set of depth motion maps {DMM_{j,v}^(k)}, j ∈ [1, DIV], v ∈ {f, s, t}, of every training sample X^(k);

Step 5: the depth motion maps DMM_{j,v}^(k) obtained in Step 4 are resized to one common size by bicubic interpolation and normalised to the range 0 to 1;

Step 6: for any training sample X^(k), its normalised depth motion maps are vectorised and all vectorised maps are concatenated to complete the construction of the feature of X^(k), denoted H^(k), the feature set of all samples being {H^(k)}, k ∈ [1, M];

Step 7: the output features {H^(k)}, k ∈ [1, M], of all samples obtained in Step 6 are reduced in dimensionality by PCA and all dimension-reduced features Ĥ^(k) are stored.

3. The human action recognition method based on fuzzy-boundary slicing of depth motion maps according to claim 1, characterised in that the testing phase comprises the following steps:

Step 1: a test sample (TestX, TestY) of a depth-map sequence of a human action video is given, where TestX = {X_i}, i ∈ [1, N_T], is the depth-map sequence of the test sample, X_i is the raw depth image of the i-th frame of the test sample, N_T is the total number of frames of the test sample, and TestY is the action class of the test sample;

Step 2: the test sample TestX is divided directly along the time axis into DIV equal-length time slices (with the same slicing scheme as in the training phase), each of length n = ⌊N_T/DIV⌋, the resulting time slices being denoted X_j, j ∈ [1, DIV];

Step 3: using the fuzzy parameter α adopted in the training phase, each time slice is given fuzzy boundaries, the fuzzy time slices being denoted X̃_j; to avoid index overflow, the first time slice is not extended at its front boundary and the last time slice is not extended at its rear boundary;

Step 4: for each fuzzy time slice X̃_j, its depth motion maps DMM_{j,v} are computed in the three projection directions, where v ∈ {f, s, t} denotes the front view, the left view and the top view of the video-sequence projection, thereby obtaining the set of depth motion maps {DMM_{j,v}}, j ∈ [1, DIV], v ∈ {f, s, t}, of the test sample TestX;

Step 5: the test-sample depth motion maps {DMM_{j,v}} obtained in Step 4 are resized by bicubic interpolation to the same sizes used in the training phase and, following the normalisation used in training, are normalised to the range 0 to 1;

Step 6: the normalised depth motion maps of the test sample TestX are vectorised and all vectorised maps are concatenated to complete the construction of the feature of TestX, denoted H_T;

Step 7: the output feature H_T of the test sample TestX obtained in Step 6 is reduced in dimensionality by PCA, giving the reduced feature Ĥ_T, the PCA reduction being the same as in the training phase;

Step 8: the dimension-reduced output feature Ĥ_T is fed into the R-ProCRC classifier to obtain the classification output PridY;

Step 9: PridY is compared with TestY; if PridY = TestY the recognition is correct, otherwise it is incorrect.

4. The method according to claim 2, wherein in the equal-length time-slice division of the video sequence in Step 2 of the training phase, the length of each time slice is n = ⌊N_k/DIV⌋, rounded down; if the last time slice is shorter than n, its actual length is used.

5. The method according to claim 2, wherein in the boundary blurring of the time slices in Step 3 of the training phase, the first time slice of each sample is not extended at its front boundary and the last time slice of each sample is not extended at its rear boundary.

6. The method according to claim 2, wherein Step 4 of the training phase computes the depth motion maps DMM_{j,v}^(k) of each fuzzy time slice X̃_j^(k) in the three projection directions by accumulating absolute differences of video frames, specifically: the video frames within one time slice are projected onto the three views to obtain the front, left and top view of each frame, the projections of adjacent frames under the same view are subtracted, and the absolute values of the differences are accumulated so that the trajectory of the motion is preserved:

$$DMM^{(k)}_{j,v}\;=\;\sum_{i\in\tilde{T}^{(k)}_{j}}\left|\,map^{(k)}_{i+1,v}-map^{(k)}_{i,v}\,\right|,\qquad v\in\{f,s,t\}$$

where map^(k)_{i,v} denotes the projection of the i-th depth frame of the k-th sample onto view v, T̃_j^(k) is the frame-index range of the j-th fuzzy time slice of the k-th sample (the first and last time slices are not extended beyond the start and end of the sequence, and the middle slices j ∈ (2, …, DIV−1) are extended at both ends according to α ∈ (0, 1)), and v ∈ {f, s, t} denotes the three projection views, i.e. the front view, the left view and the top view; the DMM of each projection direction effectively preserves the trajectory information of a single fuzzy slice of the human action sequence in the three projection directions.

7. The method according to claim 3, wherein the specific procedure by which Step 8 of the testing phase classifies the dimension-reduced action feature Ĥ_T obtained in Step 7 with the R-ProCRC classifier is:

(1) compute the optimal coefficient vector α̂:

$$\hat{\alpha}=\arg\min_{\alpha}\;\left\|W^{\frac{1}{2}}\big(\hat H\alpha-\hat H_T\big)\right\|_2^2+\lambda\left\|\alpha\right\|_2^2+\frac{\gamma}{\|C\|}\sum_{c=1}^{\|C\|}\left\|\hat H\alpha-\bar H_c\alpha\right\|_2^2$$

where Ĥ is the dictionary of dimension-reduced training features, Ĥ_T is the test-sample feature, H_c is the set of all input feature vectors belonging to class c (c ∈ C), ‖C‖ is the total number of classes, and λ and γ are parameters between 0 and 1; H̄_c is constructed by initialising it as a zero matrix of the same size as the dictionary and copying H_c into it at the positions its columns occupy in Ĥ; W is a diagonal weight matrix whose i-th diagonal entry is computed from the i-th row Ĥ(i,:) of the dictionary and the i-th value Ĥ_T(i) of the test-sample feature vector;

(2) estimate the probability that the test-sample output feature Ĥ_T belongs to class c:

$$P\big(c\mid\hat H_T\big)\;\propto\;\exp\!\Big(-\big\|\bar H_c\hat{\alpha}-\hat H\hat{\alpha}\big\|_2^2-\big\|\hat H\hat{\alpha}-\hat H_T\big\|_2^2\Big)$$

the term ‖Ĥα̂ − Ĥ_T‖² being the same for every class, the expression simplifies to

$$P\big(c\mid\hat H_T\big)\;\propto\;\exp\!\Big(-\big\|\bar H_c\hat{\alpha}-\hat H\hat{\alpha}\big\|_2^2\Big)$$

from which the class to which the feature Ĥ_T belongs is obtained.
CN201610949051.8A 2016-10-26 2016-10-26 Depth motion figure Human bodys' response method based on smeared out boundary fragment Active CN106529441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610949051.8A CN106529441B (en) 2016-10-26 2016-10-26 Depth motion figure Human bodys' response method based on smeared out boundary fragment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610949051.8A CN106529441B (en) 2016-10-26 2016-10-26 Depth motion figure Human bodys' response method based on smeared out boundary fragment

Publications (2)

Publication Number Publication Date
CN106529441A true CN106529441A (en) 2017-03-22
CN106529441B CN106529441B (en) 2019-04-05

Family

ID=58292902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610949051.8A Active CN106529441B (en) 2016-10-26 2016-10-26 Depth motion figure Human bodys' response method based on smeared out boundary fragment

Country Status (1)

Country Link
CN (1) CN106529441B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038420A (en) * 2017-11-21 2018-05-15 华中科技大学 A kind of Human bodys' response method based on deep video
CN108564043A (en) * 2018-04-17 2018-09-21 中国民航大学 A kind of Human bodys' response method based on time-space distribution graph
CN108573231A (en) * 2018-04-17 2018-09-25 中国民航大学 Human action recognition method based on deep motion map generated from motion history point cloud
CN109670401A (en) * 2018-11-15 2019-04-23 天津大学 A kind of action identification method based on skeleton motion figure
CN111950320A (en) * 2019-05-14 2020-11-17 四川大学 An Iris Recognition Method Based on Relative Total Variation and Probabilistic Co-Representation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100034462A1 (en) * 2008-06-16 2010-02-11 University Of Southern California Automated Single Viewpoint Human Action Recognition by Matching Linked Sequences of Key Poses
CN104598890A (en) * 2015-01-30 2015-05-06 南京邮电大学 Human body behavior recognizing method based on RGB-D video
CN105912991A (en) * 2016-04-05 2016-08-31 湖南大学 Behavior identification method based on 3D point cloud and key bone nodes

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100034462A1 (en) * 2008-06-16 2010-02-11 University Of Southern California Automated Single Viewpoint Human Action Recognition by Matching Linked Sequences of Key Poses
CN104598890A (en) * 2015-01-30 2015-05-06 南京邮电大学 Human body behavior recognizing method based on RGB-D video
CN105912991A (en) * 2016-04-05 2016-08-31 湖南大学 Behavior identification method based on 3D point cloud and key bone nodes

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHEN CHEN ; ROOZBEH JAFARI ; NASSER KEHTARNAVAZ: "UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor", 《2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING》 *
CHRISTIAN SCHULDT; IVAN LAPTEV; BARBARA CAPUTO: "Recognizing Human Actions: A Local SVM Approach", 《ICPR '04 PROCEEDINGS OF THE PATTERN RECOGNITION, 17TH INTERNATIONAL CONFERENCE ON (ICPR'04) VOLUME 3》 *
VENNILA MEGAVANNAN ; BHUVNESH AGARWAL ; R. VENKATESH BABU: "Human action recognition using depth maps", 《2012 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038420A (en) * 2017-11-21 2018-05-15 华中科技大学 A kind of Human bodys' response method based on deep video
CN108564043A (en) * 2018-04-17 2018-09-21 中国民航大学 A kind of Human bodys' response method based on time-space distribution graph
CN108573231A (en) * 2018-04-17 2018-09-25 中国民航大学 Human action recognition method based on deep motion map generated from motion history point cloud
CN108564043B (en) * 2018-04-17 2021-08-31 中国民航大学 A Human Behavior Recognition Method Based on Spatio-temporal Distribution Map
CN108573231B (en) * 2018-04-17 2021-08-31 中国民航大学 Human body behavior identification method of depth motion map generated based on motion history point cloud
CN109670401A (en) * 2018-11-15 2019-04-23 天津大学 A kind of action identification method based on skeleton motion figure
CN109670401B (en) * 2018-11-15 2022-09-20 天津大学 Action recognition method based on skeletal motion diagram
CN111950320A (en) * 2019-05-14 2020-11-17 四川大学 An Iris Recognition Method Based on Relative Total Variation and Probabilistic Co-Representation

Also Published As

Publication number Publication date
CN106529441B (en) 2019-04-05

Similar Documents

Publication Publication Date Title
Zheng et al. Deep learning for event-based vision: A comprehensive survey and benchmarks
CN111311666B (en) Monocular vision odometer method integrating edge features and deep learning
Li et al. In ictu oculi: Exposing ai created fake videos by detecting eye blinking
CN110490928B (en) Camera attitude estimation method based on deep neural network
CN109284738B (en) Irregular face correction method and system
CN108062525B (en) A deep learning hand detection method based on hand region prediction
WO2020108362A1 (en) Body posture detection method, apparatus and device, and storage medium
CN105069472B (en) A kind of vehicle checking method adaptive based on convolutional neural networks
CN103295242B (en) A kind of method for tracking target of multiple features combining rarefaction representation
CN109461172A (en) Manually with the united correlation filtering video adaptive tracking method of depth characteristic
CN104484890B (en) Video target tracking method based on compound sparse model
CN104574445A (en) Target tracking method and device
CN111901532B (en) Video stabilization method based on recurrent neural network iteration strategy
CN106529441A (en) Fuzzy boundary fragmentation-based depth motion map human body action recognition method
CN109146911A (en) A kind of method and device of target following
CN111428664A (en) Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision
CN112801019B (en) Method and system for eliminating unsupervised vehicle re-identification bias based on synthetic data
Min et al. COEB-SLAM: A robust VSLAM in dynamic environments combined object detection, epipolar geometry constraint, and blur filtering
CN110322479B (en) Dual-core KCF target tracking method based on space-time significance
CN112329662A (en) Multi-view saliency estimation method based on unsupervised learning
CN112069943A (en) Online multi-person pose estimation and tracking method based on top-down framework
CN107292908A (en) Pedestrian tracting method based on KLT feature point tracking algorithms
Feng et al. Efficient deep learning for stereo matching with larger image patches
CN105930793A (en) Human body detection method based on SAE characteristic visual learning
CN106778576A (en) A kind of action identification method based on SEHM feature graphic sequences

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20221114

Address after: Room 302, Building 3, No. 299 Zhongchen Road, Songjiang District, Shanghai, 201613

Patentee after: HUIRONG ELECTRONIC SYSTEM ENGINEERING LTD.

Address before: No. 1800 Lihu Avenue, Binhu District, Wuxi, Jiangsu Province, 214122

Patentee before: Jiangnan University

PE01 Entry into force of the registration of the contract for pledge of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Human behavior recognition method based on fuzzy boundary segmentation in deep action maps

Effective date of registration: 20230705

Granted publication date: 20190405

Pledgee: Agricultural Bank of China Limited Shanghai Songjiang Sub-branch

Pledgor: HUIRONG ELECTRONIC SYSTEM ENGINEERING LTD.

Registration number: Y2023980047238