
CN110580456A - Group Activity Recognition Method Based on Coherence Constraint Graph Long Short-Term Memory Network - Google Patents


Info

Publication number
CN110580456A
CN110580456A
Authority
CN
China
Prior art keywords
coherence
node
activity
vth
person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910778094.8A
Other languages
Chinese (zh)
Inventor
舒祥波
张瑞鹏
唐金辉
严锐
宋砚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University
Priority to CN201910778094.8A
Publication of CN110580456A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 - Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a group activity recognition method based on a coherence constrained graph long short-term memory (LSTM) network, comprising the following steps: (1) feeding the CNN features of all persons into the coherence constrained graph LSTM, which jointly learns each person's individual motion state over time under spatio-temporal context coherence constraints; (2) using an attention mechanism driven by global context coherence to quantify the contribution of each individual motion by learning a motion-specific attention factor; (3) at each time step, using an aggregation LSTM to fuse all individual motion states weighted by their attention factors into a hidden representation of the whole activity, which is fed into a softmax classifier; (4) averaging the outputs of the softmax classifier over all time steps to obtain a probability class vector, from which the category of the group activity is inferred.

Description

Group Activity Recognition Method Based on Coherence Constraint Graph Long Short-Term Memory Network

Technical Field

The invention relates to action recognition technology in the field of computer vision, and in particular to a group activity recognition method based on a coherence constrained graph long short-term memory (LSTM) network.

Background Art

Traditional action recognition tasks, such as single-person action recognition and two-person interaction recognition, typically involve only one or two people appearing in a video, and satisfactory performance has been achieved on such tasks over the past few decades. Compared with these traditional settings, group activities are more complex yet more common behaviors in real scenes. Unlike single-person actions and two-person interactions, a group activity is usually performed by multiple people at the same time. Group activity recognition therefore requires modeling the behaviors of multiple individuals as well as the interactions among them. It is a fine-grained recognition task that is considerably more difficult than traditional single-person action recognition or two-person interaction recognition.

Benefiting from the success of recurrent neural networks (RNNs), and in particular the development of long short-term memory (LSTM) networks, group activity recognition has made notable progress in recent years. A review of existing deep-learning methods for group activity recognition shows a common solution: first learn a person-level action representation for each individual, and then integrate all individual representations to recognize the group-level activity. Specifically, some early methods assumed that all people in the activity scene act independently of one another. Later methods treated all people in the scene as interdependent, modeling each person's individual motion with reference to the motion states of the others in the group activity. However, these methods treat every person's motion as contributing equally to the group activity, which suppresses the contribution of coherent motions to the whole activity and exaggerates outlier motions that are unrelated to it.

SUMMARY OF THE INVENTION

The purpose of the present invention is to provide a group activity recognition method based on a coherence constrained graph long short-term memory network.

The technical scheme for achieving the object of the present invention is a group activity recognition method based on a coherence constrained graph long short-term memory network, comprising the following steps:

Step 1: use a pre-trained convolutional neural network (CNN) model to extract the CNN features of each person within the tracked bounding boxes;

Step 2: feed the CNN features of all persons into the coherence constrained graph LSTM (CCG-LSTM), which jointly learns each person's individual motion state over time under spatio-temporal context coherence constraints;

Step 3: use an attention mechanism driven by global context coherence to learn the attention factor corresponding to each motion, and use these attention factors to obtain each individual's motion state under the global context coherence constraint;

Step 4: at each time step, the aggregation LSTM inside the CCG-LSTM fuses all individual motion states weighted by their attention factors into a hidden representation of the whole activity;

Step 5: at each time step, feed the hidden representation of the activity into a softmax classifier;

Step 6: average the outputs of the softmax classifier over all time steps to infer the category of the group activity.

Further, step 1 specifically includes the following: for each video clip, an object tracker from the Dlib library tracks a set of bounding boxes around each person over a certain number of time steps, and the CNN features of the person inside each tracked bounding box are extracted.

Further, in step 1, if the tracker fails to track a person in a certain frame, an all-zero matrix is used in that frame to stand in for the missing person's features.
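
A minimal sketch of this feature-extraction step, assuming a torchvision backbone and the Dlib correlation tracker; the VGG-16 backbone, crop size, feature dimension and tracking-confidence threshold are illustrative choices, not mandated by the filing (the filing only requires a pre-trained CNN and a Dlib object tracker).

```python
import dlib
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pre-trained CNN backbone (assumption: any of AlexNet/VGG/ResNet/GoogLeNet would do).
# A recent torchvision is assumed for the `weights=` argument.
backbone = models.vgg16(weights="IMAGENET1K_V1")
backbone.classifier = backbone.classifier[:-1]   # drop the last fc layer -> 4096-d features
backbone.eval()

preprocess = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
                        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

def extract_person_features(frames, init_boxes, feat_dim=4096):
    """frames: list of T HxWx3 uint8 RGB images; init_boxes: {person_id: (l, t, r, b)} in frame 0.
    Returns a (T, V, feat_dim) array; a person lost by the tracker keeps an all-zero row (step 1)."""
    trackers = {}
    for v, (l, t, r, b) in init_boxes.items():
        trk = dlib.correlation_tracker()
        trk.start_track(frames[0], dlib.rectangle(l, t, r, b))
        trackers[v] = trk

    feats = np.zeros((len(frames), len(init_boxes), feat_dim), dtype=np.float32)
    for t_idx, frame in enumerate(frames):
        for v, trk in trackers.items():
            score = trk.update(frame)              # low score ~ person lost in this frame
            if score < 5.0:                        # illustrative threshold
                continue                           # leave the all-zero row as the placeholder
            pos = trk.get_position()
            l, tp, r, b = map(int, (pos.left(), pos.top(), pos.right(), pos.bottom()))
            crop = frame[max(tp, 0):b, max(l, 0):r]
            if crop.size == 0:
                continue
            with torch.no_grad():
                x = preprocess(crop).unsqueeze(0)
                feats[t_idx, v] = backbone(x).squeeze(0).numpy()
    return feats
```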

Further, step 2 specifically includes the following steps:

Step 201: given a video clip of T frames describing a group activity involving V persons, let x_t^v denote the CNN feature of the v-th person in the t-th frame, where t ∈ {1,2,...,T} and v ∈ {1,2,...,V};

Step 202: represent the group activity in the spatial and temporal domains as a graph structure θ_t = {S_t, E_t} (t = 1,2,...,T), where S_t is the set of person nodes and E_t is the adjacency matrix;

Step 203: construct the coherence constrained graph LSTM (CCG-LSTM); at time step t, the motion state h_t^v of the v-th node of the CCG-LSTM is computed from the following gates and quantities:

For the v-th node, i_t^v is the input gate, f_t^v the forget gate, o_t^v the output gate, and \bar f_t^{vi} the adjacency forget gate; the temporal confidence gate and the spatial context confidence gate enforce the temporal and spatial context coherence constraints; φ(·) is a multilayer perceptron; h_{t-1}^v is the motion state of the v-th node at time step t-1, and h_{t-1}^i the motion state of the i-th node at time step t-1, where node i belongs to the neighbour set Φ(v) of node v; e_{t-1}^{vi} is the relation weight between node v and node i at time step t-1; p_{t-1}^v is the spatial context state of the v-th person; W_*, U_*, G_* are weight matrices and b_* bias vectors, with * ranging over the subscripts i, g, o and f; σ(·) denotes the sigmoid activation, tanh(·) the tanh activation, and ⊙ element-wise multiplication; W_p transforms the spatial context state and W_x transforms the input feature into a common space, \tilde q_t^v is the motion state projected into that space, and the parameter ρ controls the input range of the confidence-gate activation; \bar c_t^v denotes the spatial context memory state of the v-th node, and c_{t-1}^i the memory state of the corresponding neighbouring node.
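
The gate equations of step 203 appear only as formula images in the original filing, so the block below is a hedged reconstruction of a CCG-LSTM-style cell built from the symbol definitions above. The gate names mirror the description, but the exact functional forms, in particular the two confidence gates and how the neighbour memories are combined, are assumptions rather than the patent's formulas.

```python
import torch
import torch.nn as nn

class CCGLSTMCell(nn.Module):
    """One step of a graph LSTM with spatio-temporal context coherence (STCC) gates (sketch).
    W_*, U_*, G_* mirror the weight matrices named in the text, phi(.) is the small MLP,
    and rho scales the input of the two confidence gates."""

    def __init__(self, in_dim, hid_dim, rho=1.0):
        super().__init__()
        self.rho = rho
        names = ["i", "g", "o", "f"]
        self.W = nn.ModuleDict({k: nn.Linear(in_dim, hid_dim) for k in names})               # input term (+ bias b_*)
        self.U = nn.ModuleDict({k: nn.Linear(hid_dim, hid_dim, bias=False) for k in names})  # own previous state
        self.G = nn.ModuleDict({k: nn.Linear(hid_dim, hid_dim, bias=False) for k in names})  # aggregated neighbours
        self.Un = nn.Linear(hid_dim, hid_dim, bias=False)   # adjacency forget gate acts on each neighbour state
        self.Wx = nn.Linear(in_dim, hid_dim, bias=False)    # projects the input feature
        self.Wp = nn.Linear(hid_dim, hid_dim, bias=False)   # projects the spatial context state
        self.Wq = nn.Linear(hid_dim, hid_dim, bias=False)   # projects the motion state into the same space
        self.phi = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.Tanh())   # the MLP phi(.)

    def forward(self, x, h_prev, c_prev, h_nbr, c_nbr, e_nbr, p_prev, c_ctx):
        """x: (V, in_dim)   h_prev, c_prev, p_prev, c_ctx: (V, hid)   h_nbr, c_nbr: (V, N, hid)
        e_nbr: (V, N) relation weights between each node and its neighbours."""
        nbr_mean = (e_nbr.unsqueeze(-1) * h_nbr).mean(dim=1)       # aggregated neighbour motion state

        i = torch.sigmoid(self.W["i"](x) + self.U["i"](h_prev) + self.G["i"](nbr_mean))   # input gate
        o = torch.sigmoid(self.W["o"](x) + self.U["o"](h_prev) + self.G["o"](nbr_mean))   # output gate
        g = torch.tanh(self.W["g"](x) + self.U["g"](h_prev) + self.G["g"](nbr_mean))      # cell input
        f = torch.sigmoid(self.W["f"](x) + self.U["f"](h_prev))                           # forget gate
        f_nbr = torch.sigmoid(self.W["f"](x).unsqueeze(1) + self.Un(h_nbr))               # adjacency forget gate

        q = self.phi(self.Wq(h_prev))                                                     # projected motion state
        # temporal confidence gate: agreement between the current input and the previous motion state
        q_t = torch.tanh(self.rho * (self.Wx(x) * q).sum(dim=-1, keepdim=True))
        # spatial context confidence gate: agreement with the person's spatial context state
        q_s = torch.tanh(self.rho * (self.Wp(p_prev) * q).sum(dim=-1, keepdim=True))

        c = (i * g
             + q_t * f * c_prev                                     # temporally coherent own memory
             + q_s * c_ctx                                          # spatial context memory state
             + (e_nbr.unsqueeze(-1) * f_nbr * c_nbr).mean(dim=1))   # memories of neighbouring nodes
        h = o * torch.tanh(c)                                       # motion state h_t^v of node v
        return h, c
```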

Further, step 3 specifically includes the following steps:

Step 301: compute the average of all individual motion states, \bar h_t = (1/V) Σ_v h_t^v, and use it as the hidden representation of the overall activity at this time step;

Step 302: use an attention model to learn an attention factor that measures the contribution of each individual motion to the overall activity, where γ is a parameter of the attention model;

Step 303: use the attention factor to obtain the motion state of the v-th node under the global context coherence constraint.
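
The attention formula of steps 301 to 303 is likewise a formula image in the source, so the following is only one plausible realisation of the global-context-coherence (GCC) attention: each motion state is scored against the mean activity state and the score, scaled by γ, is normalised into an attention factor. The bilinear scoring function is an assumption.

```python
import torch
import torch.nn as nn

class GlobalContextAttention(nn.Module):
    """Sketch of the GCC attention: weight each person's motion state by how consistent
    it is with the average (overall) activity state at the same time step."""

    def __init__(self, hid_dim, gamma=1.0):
        super().__init__()
        self.gamma = gamma
        self.score = nn.Bilinear(hid_dim, hid_dim, 1)   # learned compatibility between h_t^v and the mean state

    def forward(self, h):                # h: (V, hid) motion states of all V persons at time t
        h_bar = h.mean(dim=0, keepdim=True).expand_as(h)                  # average motion state (step 301)
        alpha = torch.softmax(self.gamma * self.score(h, h_bar).squeeze(-1), dim=0)   # attention factors (step 302)
        return alpha.unsqueeze(-1) * h   # GCC-constrained motion states of all nodes (step 303)
```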

Further, step 4 specifically includes the following steps:

In the spatial domain, an aggregation LSTM folds the motion states of all persons into a hidden person-to-person representation of the whole activity at time step t.

Here \hat h_t^v denotes the hidden state of the aggregation LSTM after folding in the v-th person, z_t is the hidden representation of the whole activity at time step t, and \tilde h_t^v is the motion state of the v-th individual under the global context coherence constraint.
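
A sketch of this aggregation step, assuming a standard LSTM cell that is unrolled over the person dimension at one fixed time step; the ordering of persons in the recurrence is an illustrative choice, since the filing does not specify it.

```python
import torch
import torch.nn as nn

class AggregationLSTM(nn.Module):
    """Sketch of step 4: an LSTM run over the *person* dimension at a single time step,
    folding the V attention-weighted motion states into one activity representation z_t."""

    def __init__(self, hid_dim):
        super().__init__()
        self.cell = nn.LSTMCell(hid_dim, hid_dim)

    def forward(self, h_tilde):          # h_tilde: (V, hid) GCC-weighted motion states at time t
        hx = torch.zeros(1, h_tilde.size(1))
        cx = torch.zeros(1, h_tilde.size(1))
        for v in range(h_tilde.size(0)): # person-to-person aggregation in the spatial domain
            hx, cx = self.cell(h_tilde[v:v + 1], (hx, cx))
        return hx.squeeze(0)             # z_t: hidden representation of the whole activity at time t
```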

Further, step 5 specifically includes the following steps:

Feed the hidden representation z_t (t = 1,2,...,T) of the group activity at each time step t into a softmax classifier to obtain y_t = softmax(z_t), t = 1,2,...,T.

Further, step 6 specifically includes the following steps:

Average the outputs of all the softmax classifiers to obtain the probability class vector of the group activity, \bar y = (1/T) Σ_t y_t, from which the classification result is obtained.
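
Steps 5 and 6 amount to a per-time-step softmax followed by temporal averaging. The sketch below assumes a single linear classification layer and illustrative dimensions (a 512-d hidden state and 8 activity classes, roughly as in a volleyball-style setting); neither value is fixed by the filing.

```python
import torch
import torch.nn as nn

classifier = nn.Linear(512, 8)                       # hid_dim=512, 8 group-activity classes assumed

def classify_group_activity(z_seq):
    """z_seq: (T, hid) hidden activity representations, one per time step."""
    y_t = torch.softmax(classifier(z_seq), dim=-1)   # y_t = softmax(z_t) for every t   (step 5)
    return y_t.mean(dim=0)                           # averaged probability class vector (step 6)
```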

Compared with the prior art, the present invention has the following advantages: (1) it extends the graph LSTM with a spatio-temporal context coherence (STCC) constraint, understanding the group activity by exploring individual motions in both the spatial and temporal domains; (2) it measures how consistent each individual motion is with the whole activity under a global context coherence (GCC) constraint, thereby quantifying that motion's contribution to the group activity, which enables effective group activity recognition.

The present invention is further described below with reference to the accompanying drawings.

Brief Description of the Drawings

Figure 1 is a flow chart of the present invention.

Figure 2 is a visualization of the group activity recognition method based on the coherence constrained graph long short-term memory network.

Table 1 shows the recognition accuracy of different methods on the volleyball dataset.

Detailed Description of the Embodiments

1. A group activity recognition method based on a coherence constrained graph long short-term memory network comprises four stages: learning each individual's motion state under the spatio-temporal context coherence constraint; quantifying the contribution of each individual motion to the group activity under the global context coherence constraint; obtaining a hidden representation of the group activity with an aggregation LSTM; and obtaining the probability class vector of the group activity.

Learning the motion state of each individual under the spatio-temporal context coherence constraint includes the following steps:

Step 1: use a pre-trained convolutional neural network (CNN) model to extract the CNN features of each person inside the detected and tracked bounding boxes; the CNN backbone may be AlexNet, VGG, ResNet or GoogLeNet.

Step 2: add a temporal confidence gate and a spatial context confidence gate to an ordinary graph long short-term memory network (Graph LSTM) to learn the temporal and spatial context coherence constraints over all individuals.

Step 3: feed the individual CNN features obtained in step 1 into the coherence constrained graph LSTM, which jointly learns the motion states of all individuals over time under the spatio-temporal context coherence constraints. The procedure is as follows:

a. Given a video clip of T frames describing a group activity involving V persons, let x_t^v denote the CNN feature of the v-th person in the t-th frame, where t ∈ {1,2,...,T} and v ∈ {1,2,...,V}.

b. Represent the group activity in the spatial and temporal domains as a graph structure θ_t = {S_t, E_t} (t = 1,2,...,T), where E_t is the adjacency matrix.

c. Construct the coherence constrained graph LSTM. The gate and state definitions are the same as in step 203 above: for the v-th node, i_t^v is the input gate, f_t^v the forget gate, o_t^v the output gate, and \bar f_t^{vi} the adjacency forget gate; the temporal and spatial context confidence gates enforce the coherence constraints; φ(·) is a multilayer perceptron; h_{t-1}^v and h_{t-1}^i are the motion states of node v and of its neighbour i ∈ Φ(v) at time step t-1; e_{t-1}^{vi} is their relation weight; p_{t-1}^v is the spatial context state of the v-th person; W_*, U_*, G_* are weight matrices and b_* bias vectors; σ(·) and tanh(·) are the activation functions and ⊙ denotes element-wise multiplication; W_p, W_x and W_q are the transformation matrices of the confidence gates and ρ controls the input range of their activation. At time step t, the CCG-LSTM computes the motion state h_t^v of the v-th node, yielding the individual motion states.

Quantifying the contribution of individual motions to the group activity under the global context coherence constraint includes the following steps:

Step 4: apply the attention mechanism of global context coherence. By learning the attention factor corresponding to each individual motion, the contribution of the individual states obtained in step 3 to the group activity is quantified. First, the average motion state of all individuals, \bar h_t = (1/V) Σ_v h_t^v, is taken as an approximation of the hidden representation of the overall activity.

Step 5: use an attention model to learn an attention factor that measures the contribution of each individual motion from step 3 to the overall activity of step 4, where γ is a parameter of the model.

Step 6: use the attention factor of step 5 to obtain the motion state of the v-th node under the global context coherence constraint.

Obtaining the hidden representation of the group activity with the aggregation LSTM includes the following steps:

Step 7: in the spatial domain, use the aggregation LSTM to fold the motion states of all individuals from step 6 into a hidden person-to-person representation of the whole activity at time step t.

Here \hat h_t^v denotes the hidden state of the aggregation LSTM and z_t is the hidden representation of the whole activity at time step t.

Obtaining the probability class vector of the group activity includes the following steps:

Step 8: feed the hidden representation z_t (t = 1,2,...,T) of the group activity at time step t from step 7 into a softmax classifier to obtain y_t = softmax(z_t), t = 1,2,...,T.

Step 9: average the outputs of all the softmax classifiers of step 8 to obtain the probability class vector of the group activity, from which the classification result is obtained.
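
For completeness, a minimal end-to-end sketch that chains the components sketched earlier (CCGLSTMCell, GlobalContextAttention, AggregationLSTM) into one forward pass over a clip. The spatial-context state update and the uniform relation weights are placeholders, since the filing gives those details only in formula images; the neighbour index tensor `adjacency` is assumed to be precomputed from the graph of step 202.

```python
import torch

def ccg_lstm_forward(feats, cell, attention, aggregator, classifier, adjacency):
    """feats: (T, V, in_dim) per-person CNN features; adjacency: (V, N) LongTensor of neighbour indices;
    cell / attention / aggregator are instances of the sketched CCGLSTMCell, GlobalContextAttention
    and AggregationLSTM; classifier is an nn.Linear over the activity classes."""
    T, V, _ = feats.shape
    hid = aggregator.cell.hidden_size
    h = torch.zeros(V, hid)
    c = torch.zeros(V, hid)
    p = torch.zeros(V, hid)              # spatial-context state (its exact update rule is not shown here)
    c_ctx = torch.zeros(V, hid)          # spatial-context memory state (placeholder)
    probs = []
    for t in range(T):
        h_nbr = h[adjacency]             # (V, N, hid) states of each node's neighbours
        c_nbr = c[adjacency]
        e_nbr = torch.ones(V, adjacency.size(1))      # uniform relation weights as a placeholder
        h, c = cell(feats[t], h, c, h_nbr, c_nbr, e_nbr, p, c_ctx)   # STCC-constrained motion states
        p = h_nbr.mean(dim=1)            # crude stand-in for the spatial-context state update
        h_tilde = attention(h)           # GCC-weighted motion states (steps 4-6)
        z_t = aggregator(h_tilde)        # hidden representation of the whole activity (step 7)
        probs.append(torch.softmax(classifier(z_t), dim=-1))         # step 8
    return torch.stack(probs).mean(dim=0)             # averaged probability class vector (step 9)
```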

Table 1 shows the recognition accuracy of different methods on the volleyball dataset.

Claims (8)

1. A group activity recognition method based on a coherence constrained graph long short-term memory network, characterized by comprising the following steps:
Step 1: using a pre-trained convolutional neural network (CNN) model to extract the CNN features of each person within the tracked bounding boxes;
Step 2: feeding the CNN features of all persons into the coherence constrained graph LSTM (CCG-LSTM), which jointly learns each person's individual motion state over time under spatio-temporal context coherence constraints;
Step 3: using an attention mechanism driven by global context coherence to learn the attention factor corresponding to each motion, and using these attention factors to obtain each individual's motion state under the global context coherence constraint;
Step 4: at each time step, fusing, by the aggregation LSTM inside the CCG-LSTM, all individual motion states weighted by their attention factors into a hidden representation of the whole activity;
Step 5: at each time step, feeding the hidden representation of the activity into a softmax classifier;
Step 6: averaging the outputs of the softmax classifier over all time steps to infer the category of the group activity.

2. The method according to claim 1, characterized in that step 1 specifically comprises: for each video clip, tracking, with an object tracker from the Dlib library, a set of bounding boxes around each person over a certain number of time steps, and extracting the CNN features of the person inside each tracked bounding box.

3. The method according to claim 2, characterized in that in step 1, if the tracker fails to track a person in a certain frame, an all-zero matrix is used in that frame to stand in for the missing person's features.

4. The method according to claim 1, characterized in that step 2 specifically comprises:
Step 201: given a video clip of T frames describing a group activity involving V persons, letting x_t^v denote the CNN feature of the v-th person in the t-th frame, where t ∈ {1,2,...,T} and v ∈ {1,2,...,V};
Step 202: representing the group activity in the spatial and temporal domains as a graph structure θ_t = {S_t, E_t} (t = 1,2,...,T), where E_t is the adjacency matrix;
Step 203: constructing the coherence constrained graph LSTM (CCG-LSTM), in which the motion state h_t^v of the v-th node at time step t is computed from the input gate i_t^v, the forget gate f_t^v, the output gate o_t^v, the adjacency forget gate \bar f_t^{vi}, the temporal confidence gate and the spatial context confidence gate, where φ(·) is a multilayer perceptron; h_{t-1}^v and h_{t-1}^i are the motion states of node v and of its neighbour i at time step t-1, with i belonging to the neighbour set Φ(v) of node v; e_{t-1}^{vi} is the relation weight between node v and node i at time step t-1; p_{t-1}^v is the spatial context state of the v-th person; W_*, U_*, G_* are weight matrices and b_* bias vectors, with * ranging over the subscripts i, g, o and f; σ(·) denotes the sigmoid activation, tanh(·) the tanh activation, and ⊙ element-wise multiplication; W_p, W_x and W_q are transformation matrices mapping the spatial context state, the input feature and the motion state into a common space, and the parameter ρ controls the input range of the confidence-gate activation.

5. The method according to claim 1, characterized in that step 3 specifically comprises:
Step 301: computing the average of all individual motion states as the hidden representation of the overall activity at that time step;
Step 302: using an attention model to learn an attention factor that measures the contribution of each individual motion to the overall activity, where γ is a parameter;
Step 303: obtaining, through the attention factor, the motion state of the v-th node under the global context coherence constraint.

6. The method according to claim 1, characterized in that step 4 specifically comprises: in the spatial domain, using an aggregation LSTM to fold the motion states of all persons into a hidden person-to-person representation of the whole activity at time step t, where z_t is the hidden representation of the whole activity at time step t and \tilde h_t^v is the motion state of the v-th individual under the global context coherence constraint.

7. The method according to claim 1, characterized in that step 5 specifically comprises: feeding the hidden representation z_t (t = 1,2,...,T) of the group activity at time step t into a softmax classifier to obtain y_t = softmax(z_t), t = 1,2,...,T.

8. The method according to claim 1, characterized in that step 6 specifically comprises: averaging the outputs of all the softmax classifiers to obtain the probability class vector of the group activity, from which the classification result is obtained.
CN201910778094.8A 2019-08-22 2019-08-22 Group Activity Recognition Method Based on Coherence Constraint Graph Long Short-Term Memory Network Pending CN110580456A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910778094.8A CN110580456A (en) 2019-08-22 2019-08-22 Group Activity Recognition Method Based on Coherence Constraint Graph Long Short-Term Memory Network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910778094.8A CN110580456A (en) 2019-08-22 2019-08-22 Group Activity Recognition Method Based on Coherence Constraint Graph Long Short-Term Memory Network

Publications (1)

Publication Number Publication Date
CN110580456A true CN110580456A (en) 2019-12-17

Family

ID=68811636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910778094.8A Pending CN110580456A (en) 2019-08-22 2019-08-22 Group Activity Recognition Method Based on Coherence Constraint Graph Long Short-Term Memory Network

Country Status (1)

Country Link
CN (1) CN110580456A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112315456A (en) * 2020-10-07 2021-02-05 南京理工大学 Human Action Prediction Method Based on Jump Attention Mechanism
CN112861332A (en) * 2021-01-29 2021-05-28 太原理工大学 Cluster dynamics prediction method based on graph network
CN117992615A (en) * 2024-04-03 2024-05-07 中国科学技术大学 An emotion recognition method based on combined category grammar and large model memory plug-in

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JINHUI TANG et al.: "Coherence Constrained Graph LSTM for Group Activity Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence (Early Access) *


Similar Documents

Publication Publication Date Title
Shu et al. Multi-granularity anchor-contrastive representation learning for semi-supervised skeleton-based action recognition
Zhang et al. Federated multidomain learning with graph ensemble autoencoder GMM for emotion recognition
Shu et al. Host–parasite: Graph LSTM-in-LSTM for group activity recognition
Singh et al. A deeply coupled ConvNet for human activity recognition using dynamic and RGB images
Tang et al. Coherence constrained graph LSTM for group activity recognition
Sinha et al. Dibs: Diversity inducing information bottleneck in model ensembles
Tripathi et al. Posenet3d: Learning temporally consistent 3d human pose via knowledge distillation
Ansar et al. Robust hand gesture tracking and recognition for healthcare via Recurrent neural network
CN112597883A (en) Human skeleton action recognition method based on generalized graph convolution and reinforcement learning
Jain et al. GAN-Poser: an improvised bidirectional GAN model for human motion prediction
WO2023226186A1 (en) Neural network training method, human activity recognition method, and device and storage medium
CN110580456A (en) Group Activity Recognition Method Based on Coherence Constraint Graph Long Short-Term Memory Network
Yin et al. Graph-based normalizing flow for human motion generation and reconstruction
Dwivedi et al. Poco: 3d pose and shape estimation with confidence
Asif et al. DeepActsNet: A deep ensemble framework combining features from face, hands, and body for action recognition
Kuang et al. Au-aware 3d face reconstruction through personalized au-specific blendshape learning
Wang et al. 3D-unified spatial-temporal graph for group activity recognition
CN111611852A (en) A method, device and equipment for training an expression recognition model
CN115841647A (en) Human body behavior recognition method based on human body skeleton data
Wang et al. Bird-Count: a multi-modality benchmark and system for bird population counting in the wild
Berlin et al. R-STDP based spiking neural network for human action recognition
CN114882595A (en) Armed personnel behavior identification method and armed personnel behavior identification system
Cate et al. Deepface: Face generation using deep learning
Sun et al. Robust visual tracking based on convolutional neural network with extreme learning machine
Li Dance art scene classification based on convolutional neural networks

Legal Events

Code - Title/Description
PB01 - Publication
SE01 - Entry into force of request for substantive examination
WD01 - Invention patent application deemed withdrawn after publication (application publication date: 2019-12-17)
Application publication date: 20191217