CN112315456B - Human body action prediction method based on jump attention mechanism

Info

Publication number: CN112315456B
Application number: CN202011067849.2A
Authority: CN
Inventors: 舒祥波; 张瑞鹏; 宋砚; 唐金辉
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2020-10-07
Filing date: 2020-10-07
Publication date: 2022-02-11
Anticipated expiration: 2040-10-07
Also published as: CN112315456A

Abstract

The invention discloses a human body action prediction method based on a jump attention mechanism, comprising the following steps: inputting the normalized human skeleton point coordinates of each frame into an encoder composed of a multi-layer self-updating convolutional gated recurrent network (Self-updating ConvGRU), and extracting the hierarchical features of the human skeleton point sequence; aggregating the hierarchical features extracted at the last time step of the encoder to obtain a long-term semantic vector of the human body action; taking the hidden variables of each layer at the last time step of the encoder as the initial hidden variables of the decoder, and using the jump attention mechanism to calculate the weight of each layer of decoder features from the encoder's long-term semantic vector; concatenating the adjusted hierarchical features with the long-term semantic vector into new features, generating from them, through a convolutional neural network, the variation between the human skeleton point frame to be predicted and the previous frame, adding this variation to the input skeleton points, and then reversing the normalization to obtain the predicted human action skeleton points. The method can effectively predict human body actions with high accuracy.

Description

Human body action prediction method based on jump attention mechanism

Technical Field

The invention relates to human body motion prediction technology, and in particular to a human body action prediction method based on a jump attention mechanism.

Background

Human body action prediction automatically predicts a future action sequence from an observed human body action sequence, and can be applied to most human-computer interaction systems. It has a wide range of application scenarios: it complements industrial automation and autonomous driving technology, and it plays an important role in the interaction between intelligent robots and humans.

Human motion prediction has attracted growing attention from researchers. The main challenges are that early motion prediction work rarely targeted human behavior; that long-term human body action predictions tend to converge to a constant action; and that the prediction quality is poor for aperiodic actions.

Disclosure of Invention

The invention aims to provide a human body action prediction method based on a jump attention mechanism that achieves a good long-term prediction effect on both periodic and aperiodic human body actions.

The technical scheme for realizing the purpose of the invention is as follows: a human body action prediction method based on a jump attention mechanism, comprising the following steps:

step 1, inputting a human skeleton point coordinate sequence and normalizing it to obtain a processed human skeleton point coordinate sequence;

step 2, inputting the human skeleton point coordinates of each frame into an encoder composed of a multi-layer self-updating convolutional gated recurrent network (Self-updating ConvGRU), and extracting the hierarchical features of the human skeleton point sequence;

step 3, aggregating the hierarchical features extracted by the gated recurrent network at the last time step to obtain a long-term semantic vector of the human body action;

step 4, inputting the hierarchical features extracted by the encoder and the previous frame of human skeleton points into a decoder to obtain the hierarchical features of the decoder;

step 5, using the jump attention mechanism to calculate the weight of each layer of decoder features from the encoder's long-term semantic vector of the human body action, so as to adjust the importance of the hierarchical features;

step 6, concatenating the adjusted hierarchical features with the long-term semantic vector of the human body action into new features, and generating from the new features, through a convolutional neural network, the variation between the human skeleton point frame to be predicted and the previous frame;

step 7, adding the skeleton point variation to the input frame of the decoder, and then reversing the normalization to obtain the final predicted human action skeleton points.
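Steps 1 and 7 bracket the model with a normalization and its inverse. The text does not say which normalization is used, so the following is only a minimal sketch that assumes per-coordinate z-score normalization. The function names `fit_normalizer`, `normalize` and `denormalize` are hypothetical and do not come from the patent.

```python
from statistics import mean, pstdev


def fit_normalizer(frames):
    """Return the per-coordinate mean and standard deviation over all frames."""
    columns = list(zip(*frames))
    means = [mean(c) for c in columns]
    # Assumption: a constant coordinate (zero deviation) is left unscaled.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds


def normalize(frames, means, stds):
    """Step 1 (assumed form): subtract the mean, divide by the deviation."""
    return [[(v - m) / s for v, m, s in zip(f, means, stds)] for f in frames]


def denormalize(frames, means, stds):
    """Step 7 (assumed form): invert normalize() to recover coordinates."""
    return [[v * s + m for v, m, s in zip(f, means, stds)] for f in frames]


# Three frames with three coordinates each; the last coordinate is constant.
frames = [[0.0, 1.0, 5.0], [2.0, 3.0, 5.0], [4.0, 5.0, 5.0]]
means, stds = fit_normalizer(frames)
normalized = normalize(frames, means, stds)
restored = denormalize(normalized, means, stds)
```

Whatever normalization is actually chosen, its statistics must be kept so that step 7 can apply the exact inverse to the predicted frames.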

Compared with the prior art, the invention has the following notable advantages: (1) being based on the jump attention mechanism, the method can effectively predict human body actions with high prediction accuracy; (2) the method has a good long-term prediction effect on both periodic and aperiodic human body actions.

Drawings

Fig. 1 is a flowchart of a human body motion prediction method based on a jump attention mechanism according to the present invention.

Fig. 2 is an effect diagram of a human body motion prediction method based on a jump attention mechanism.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings:

As shown in FIG. 1, a human body action prediction method based on a jump attention mechanism comprises four processes: extracting the hierarchical features of the human skeleton point sequence, extracting the long-term semantic vector, calculating the weights of the decoder hierarchical features, and generating the skeleton point variation.

Extracting the hierarchical features of the human skeleton point sequence: inputting the normalized human skeleton point coordinates of each frame into an encoder composed of a multi-layer self-updating convolutional gated recurrent network (Self-updating ConvGRU), and extracting the hierarchical features of the human skeleton point sequence;

extracting the long-term semantic vector: aggregating the hierarchical features extracted at the last time step of the encoder to obtain a long-term semantic vector of the human body action;

calculating the weights of the decoder hierarchical features: taking the hidden variables of each layer at the last time step of the encoder as the initial hidden variables of the decoder, and using the jump attention mechanism to calculate the weight of each layer of decoder features from the encoder's long-term semantic vector of the human body action, so as to adjust the importance of the hierarchical features;

generating the skeleton point variation: concatenating the adjusted hierarchical features with the long-term semantic vector of the human body action into new features, generating from the new features, through a convolutional neural network, the variation between the human skeleton point frame to be predicted and the previous frame, adding the skeleton point variation to the input skeleton points of the decoder, and reversing the normalization to finally obtain the predicted human action skeleton points. The method can effectively predict human body actions with high prediction accuracy.

The steps of the above method will be described in detail below.

Extracting the hierarchical features of the human skeleton point sequence comprises the following steps:

step 1), inputting a human skeleton point coordinate sequence and normalizing it to obtain a processed human skeleton point coordinate sequence X = [x_1, …, x_s, …, x_S], which describes the activity of a person, where x_s ∈ R^N denotes the skeleton key point coordinates of the human body at time step s, and N is the number of joints of the human body.

Step 2), the sequence X = [x_1, …, x_s, …, x_S] obtained in step 1) is input into an encoder composed of a multi-layer self-updating convolutional gated recurrent network (Self-updating ConvGRU). For one layer of this network at time step t, the calculation formulas are as follows:

the update gate z_t is calculated by the formula:

z_t = σ(W_zh * h_{t-1} + W_zx * x_t + b_z),

the reset gate r_t is calculated by the formula:

r_t = σ(W_rh * h_{t-1} + W_rx * x_t + b_r),

the candidate hidden variable is calculated by the formula:

the self-updating gate hh_t is calculated by the formula:

the candidate hidden variable is then updated again:

and finally the current hidden variable h_t is obtained:

where h_{t-1} is the hidden variable at the previous time step t-1, x_t is the hidden variable of the previous layer at time step t, σ() is the sigmoid activation function, tanh() is the tanh activation function, W_* are learnable transformation parameters, b_* are bias terms, * is the convolution operation, and the remaining operator in the formulas denotes element-wise multiplication.

Step 3), through step 2), the hidden variables of each encoder layer at the last time step of the input sequence are obtained, where K denotes the number of layers of the encoder; these hidden variables are the hierarchical features of the input sequence.
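The update and reset gates of step 2) share one form, a sigmoid applied to the sum of two convolutions and a bias. The sketch below illustrates only these two gates. It assumes a single channel, a zero-padded 1-D convolution over the joint axis with an odd kernel length, and a scalar bias. None of these choices is stated in the text, and `conv1d_same` and `gate` are hypothetical helper names.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1d_same(signal, kernel):
    """Zero-padded 1-D convolution (cross-correlation form, as is usual in
    neural networks); the output has the same length as the input."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def gate(h_prev, x_t, w_h, w_x, bias):
    """sigma(W_h * h_{t-1} + W_x * x_t + b), with * as convolution."""
    from_hidden = conv1d_same(h_prev, w_h)
    from_input = conv1d_same(x_t, w_x)
    return [sigmoid(u + v + bias) for u, v in zip(from_hidden, from_input)]


h_prev = [0.0, 0.5, -0.5, 1.0]   # hidden variable at time step t-1
x_t = [1.0, 0.0, -1.0, 0.5]      # input from the previous layer at time step t

# Separate kernels for each gate, mirroring W_zh, W_zx and W_rh, W_rx.
z_t = gate(h_prev, x_t, w_h=[0.1, 0.2, 0.1], w_x=[0.3, 0.0, -0.3], bias=0.0)
r_t = gate(h_prev, x_t, w_h=[0.0, 1.0, 0.0], w_x=[0.0, 1.0, 0.0], bias=0.0)
```

Both gates take values strictly between 0 and 1 at every position. The candidate, self-updating gate and final hidden-variable updates are left out because their formulas are not reproduced in the text.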

Extracting the long-term semantic vector comprises the following steps:

step 4), aggregating the hierarchical features of the input sequence obtained in step 3) into the long-term semantic vector of the human body action:

where g() is the aggregation function; a convolution operation is used as the aggregation function.
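One simple way to read "a convolution operation as the aggregation function" is a kernel-size-1 convolution that treats the K encoder layers as input channels, so each position of the result is a learned weighted sum across layers. The sketch below assumes exactly that, and also assumes all layers produce features of the same length. The kernel shape is not given in the text, and `aggregate` is a hypothetical name.

```python
def aggregate(layer_features, layer_weights, bias=0.0):
    """Kernel-size-1 convolution across K layers (assumed form of g()):
    C[i] = sum_k w[k] * h_k[i] + bias."""
    if len(layer_features) != len(layer_weights):
        raise ValueError("one weight per encoder layer is required")
    length = len(layer_features[0])
    return [
        sum(w * h[i] for w, h in zip(layer_weights, layer_features)) + bias
        for i in range(length)
    ]


# Hidden variables of K = 3 encoder layers at the last time step (toy values).
hidden = [
    [1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0],
    [2.0, 2.0, 2.0],
]
C = aggregate(hidden, layer_weights=[0.5, 1.0, 0.25])
```

With these toy weights the three layers collapse into a single vector `C` of the same length, which stands in for the long-term semantic vector.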

Calculating the weights of the decoder hierarchical features mainly comprises the following steps:

step 5), constructing a decoder using the same calculation formulas as in step 2), taking the hidden variables of each encoder layer obtained in step 3) as the initial hidden variables of the decoder, and taking the human skeleton points predicted at time step t-1 as the decoder input at time step t, finally obtaining the hidden variables of each layer of the self-updating convolutional gated recurrent network;

step 6), extracting the features of each layer at time step t from the decoder hidden variables through a convolution operation;

step 7), calculating the weight of each layer of decoder features from the long-term semantic vector of the human body action; for the features of the n-th layer at time step t, the score is calculated by the following formula:

where W_f is a weight matrix and b_f is a bias vector;

the score represents the importance of the features of each decoder layer, and is normalized using the Softmax function:

finally, the result represents the adjusted features of each decoder layer.
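The Softmax normalization in step 7) can be sketched as follows. The score formula itself is not reproduced in the text, so the sketch takes the per-layer scores as given inputs. Scaling each layer's features by its normalized weight is an assumption about how the adjustment is applied, and `reweight` is a hypothetical name.

```python
import math


def softmax(scores):
    """Numerically stable Softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(layer_features, scores):
    """Scale each decoder layer's features by its Softmax weight (assumed)."""
    weights = softmax(scores)
    adjusted = [[w * v for v in f] for w, f in zip(weights, layer_features)]
    return weights, adjusted


# Features of three decoder layers at one time step, with toy scores.
features = [[1.0, 1.0], [2.0, 0.0], [0.0, 4.0]]
scores = [0.0, 0.0, math.log(2.0)]
weights, adjusted = reweight(features, scores)
```

The weights sum to 1 across layers, so a higher score for one layer necessarily reduces the share given to the others.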

Generating the skeleton point variation comprises the following steps:

step 8), concatenating the features at time step t adjusted by the decoder in step 7) with the long-term semantic vector of the human body action from step 4), and generating the human skeleton point variation through a convolution operation:

The predicted human skeleton point coordinates at time step t+1 are obtained by adding the human skeleton point variation at time step t to the decoder's input human skeleton point coordinates.
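The residual update described here, combined with the decoder feeding each prediction back as the next input (step 5), amounts to an autoregressive rollout. The sketch below shows only that loop. `predict_delta` is a hypothetical stand-in for the whole decoder, attention and convolution pipeline, and `toy_delta` is an arbitrary constant drift used purely for illustration.

```python
def rollout(last_frame, predict_delta, steps):
    """Predict `steps` future frames with y_{t+1} = y_t + delta_t,
    feeding every predicted frame back in as the next input."""
    frames = []
    y = list(last_frame)
    for _ in range(steps):
        delta = predict_delta(y)               # predicted variation at step t
        y = [a + d for a, d in zip(y, delta)]  # residual update
        frames.append(y)
    return frames


def toy_delta(frame):
    # Hypothetical stand-in for the network: every coordinate drifts by 0.5.
    return [0.5 for _ in frame]


predicted = rollout([0.0, 1.0], toy_delta, steps=3)
```

Predicting the variation instead of the absolute coordinates means that a zero output reproduces the previous frame, so the network only has to model the frame-to-frame change.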

Fig. 2 is an effect diagram of the human body action prediction method based on the jump attention mechanism. In the first row, the first four columns are real skeleton points and the columns from the fifth onward are predicted skeleton points; the second row shows the real skeleton points for comparison.

Claims

1. A human body action prediction method based on a jump attention mechanism, characterized by comprising the following steps:

step 7, adding the variable quantity of the skeleton points and the input frame of the decoder, and then restoring the normalized data to obtain a final predicted value of the human body action skeleton points;

the step 7 specifically comprises the following steps:

the human skeleton point variation at the last time step t is added to the decoder's input human skeleton point coordinate y_t at the last time step t to obtain the predicted human skeleton point coordinate y_{t+1} at time step t+1:

finally, the normalization of the predicted human skeleton point y_{t+1} at time step t+1 is reversed to obtain the final predicted human action skeleton points at time step t+1.

2. The human body action prediction method based on the jump attention mechanism according to claim 1, wherein step 5 specifically comprises the following steps:

step 501, calculating the weight of each layer of decoder features from the long-term semantic vector of the human body action; for the features of the n-th layer at time step t, the score is calculated by the following formula:

where W_f is a weight matrix, b_f is a bias vector, and C represents the long-term semantic vector of the human body action;

step 502, the score represents the importance of the features of each decoder layer, and is normalized using the Softmax function:

finally, the result represents the adjusted features of each decoder layer.

3. The human body action prediction method based on the jump attention mechanism according to claim 2, wherein step 6 specifically comprises the following steps:

concatenating the features at time step t adjusted by the decoder with the long-term semantic vector of the human body action, and generating the human skeleton point variation through a convolution operation: