Disclosure of Invention
In order to overcome, at least to some extent, the problems in the related art, the present application provides a human body action evaluation method, evaluation device, and evaluation system.
According to a first aspect of embodiments of the present application, there is provided a human body action evaluation method, including:
training and deploying a human body action evaluation model;
evaluating a human body action of a user by using the human body action evaluation model to obtain a feature score;
analyzing, by using an emotion analysis model, the emotion of the user when viewing the feature score output by the human body action evaluation model, and constructing an error matching library according to the analysis result;
updating the human body action evaluation model by using the error matching library;
and evaluating human body actions by using the updated human body action evaluation model.
In the above human body action evaluation method, the specific process of training and deploying the human body action evaluation model is as follows:
generating an action evaluation data set according to the human skeleton data, wherein the process comprises the following steps:
acquiring a human body two-dimensional video stream and a depth image stream by using an RGB-D camera;
generating 3D human skeleton data according to the human two-dimensional video stream and the depth image stream;
obtaining an input action sequence according to the 3D human skeleton data;
acquiring a standard action sequence from a standard action database;
performing feature quantization and metric comparison on the input action sequence and the standard action sequence to obtain a feature score;
obtaining a human body action evaluation sample from the standard action sequence, the input action sequence, and the corresponding feature score;
screening each human body action evaluation sample to obtain an action evaluation data set;
and training a deep learning model by using the human body action evaluation samples in the action evaluation data set to obtain and deploy the human body action evaluation model.
In the above human body action evaluation method, the specific process of evaluating the human body action of the user by using the human body action evaluation model to obtain the feature score is as follows:
acquiring a real-time input action sequence of a user and a standard action sequence selected to be learned by the user;
inputting the acquired real-time input action sequence and the standard action sequence into a human body action evaluation model;
and the human body action evaluation model evaluates the similarity between the input action sequence and the standard action sequence and outputs a feature score.
In the above human body action evaluation method, the process of analyzing the emotion when the user views the feature score output by the human body action evaluation model by using the emotion analysis model and constructing the error matching library according to the analysis result includes:
collecting, in real time by using an RGB-D camera, an expression video stream of the user viewing the feature score output by the human body action evaluation model, and sending the collected video stream to the emotion analysis model;
the emotion analysis model analyzes the emotion of the user when viewing the feature score output by the human body action evaluation model to obtain the probability of each emotion; the emotions include happiness, confusion, and frustration;
when confusion has the highest probability among the three emotions, the emotion analysis model compares the obtained confusion probability with a preset confusion probability threshold, and calculates an adjusted score from the difference between the two and an adjustment base;
and taking the input action sequence and standard action sequence for which the user's emotion on seeing the feature score is confusion, together with the adjusted score, as an error sample, and constructing the error matching library from such error samples.
Further, the specific process of updating the human body action evaluation model by using the error matching library is as follows:
inputting the samples in the error matching library into the human body action evaluation model;
and updating the weights of the human body action evaluation model through a backpropagation algorithm to reduce its loss function, stopping when the loss function converges, and taking the resulting model as the updated human body action evaluation model.
According to a second aspect of embodiments of the present application, there is provided a human body action evaluation device, including a human body action evaluation model, an emotion analysis model, and an error matching library;
the human body action evaluation model is used for evaluating the similarity between the acquired real-time input action sequence of the user and the standard action sequence the user has selected to learn, and outputting a feature score;
the emotion analysis model is used for analyzing the emotion of the user when viewing the feature score output by the human body action evaluation model, and constructing an error matching library according to the analysis result;
and the error matching library is used for updating the human body action evaluation model.
Further, the human body action evaluation model is trained and deployed in advance; the specific process for obtaining the human body action evaluation model includes the following steps:
generating an action assessment data set from human skeletal data, comprising the steps of:
acquiring a human body two-dimensional video stream and a depth image stream by using an RGB-D camera;
generating 3D human skeleton data according to the human two-dimensional video stream and the depth image stream;
obtaining an input action sequence according to the 3D human skeleton data;
acquiring a standard action sequence from a standard action database;
performing feature quantization and metric comparison on the input action sequence and the standard action sequence to obtain a feature score;
obtaining a human body action evaluation sample from the standard action sequence, the input action sequence, and the corresponding feature score;
screening each human body action evaluation sample to obtain an action evaluation data set;
and training a deep learning model by using the human body action evaluation samples in the action evaluation data set to obtain the human body action evaluation model.
Further, the specific process in which the emotion analysis model analyzes the emotion of the user when viewing the feature score output by the human body action evaluation model and constructs the error matching library according to the analysis result is as follows:
collecting, in real time by using an RGB-D camera, an expression video stream of the user viewing the feature score that the human body action evaluation model outputs for the user's real-time input action sequence, and sending the collected video stream to the emotion analysis model;
the emotion analysis model analyzes the emotion of the user when viewing the feature score output by the human body action evaluation model to obtain the probability of each emotion, wherein the emotions include happiness, confusion, and frustration;
when confusion has the highest probability among the three emotions, the emotion analysis model compares the obtained confusion probability with a preset confusion probability threshold, and calculates an adjusted score from the difference between the two and an adjustment base;
and taking the input action sequence and standard action sequence for which the user's emotion on seeing the feature score is confusion, together with the adjusted score, as an error sample, and constructing the error matching library from such error samples.
Furthermore, the specific process of updating the human body action evaluation model by using the error matching library is as follows:
inputting the samples in the error matching library into the human body action evaluation model;
and updating the weights of the human body action evaluation model through a backpropagation algorithm to reduce its loss function, stopping when the loss function converges, and taking the resulting model as the updated human body action evaluation model.
According to a third aspect of the embodiments of the present application, there is provided a human body action evaluation system, which includes an RGB-D camera, a human-computer interaction interface, and a human body action evaluation device;
the RGB-D camera is used for collecting a human body two-dimensional video stream and a depth image stream and is also used for collecting an expression video stream when a user sees a feature score output by a human body action evaluation model in the human body action evaluation device;
the human-computer interaction interface is used for the user to select a standard action sequence to learn, and for presenting the feature score output by the human body action evaluation model to the user;
the human body action evaluation device is used for constructing an error matching library according to the analysis result of the emotion analysis model, updating the human body action evaluation model by using the error matching library, and evaluating the real-time input action of the user by using the updated human body action evaluation model.
According to the above embodiments of the present application, at least the following advantages are obtained:
the traditional human body action labeling usually adopts a mode that a person looks at actions and then gives scores, and the mode is too subjective, so that the physical objectivity in action similarity is lost. Compared with the traditional data labeling mode, the method and the device have the advantage that the action evaluation data set is obtained by adopting a mode of combining objective knowledge system evaluation and manual screening.
The traditional human body action evaluation method uses two independent steps, feature quantization and metric comparison, both of which must be designed by human experts; such hand-designed methods cannot cover all actions, and a large portion of actions are always poorly recognized. Compared with the traditional method, the human body action evaluation model based on a deep learning model tightly couples feature quantization and metric comparison and achieves higher recognition accuracy.
The traditional human body action evaluation method is an open-loop process that cannot supervise whether the score given by the model is appropriate. Compared with the traditional approach, the present application adds an emotion analysis model, making human body action evaluation a closed-loop process whose results satisfy both physical objectivity and human sensory perception.
After a traditional human body action evaluation method is deployed on a device, the evaluation model is frozen; it cannot repair its own errors and can only be upgraded when an upgrade instruction is issued manually. Compared with the traditional method, the present application adds a lifelong-learning update process for the human body action evaluation model, so that the model automatically corrects itself after each mistake, no longer produces the same erroneous evaluation when the same error is encountered again, and keeps the device continuously self-learning and self-optimizing over its whole service life.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the scope of the invention, as claimed.
Detailed Description
For the purpose of promoting a clear understanding of the objects, aspects and advantages of the embodiments of the present application, reference will now be made to the accompanying drawings and detailed description, wherein like reference numerals refer to like elements throughout.
The illustrative embodiments and descriptions of the present application are provided to explain the present application and not to limit the present application. Additionally, the same or similar numbered elements/components used in the drawings and the embodiments are used to represent the same or similar parts.
As used herein, "first," "second," …, etc., are not specifically intended to mean in a sequential or chronological order, nor are they intended to limit the application, but merely to distinguish between elements or operations described in the same technical language.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including, but not limited to.
As used herein, "and/or" includes any and all combinations of the described items.
References to "plurality" herein include "two" and "more than two"; reference to "multiple sets" herein includes "two sets" and "more than two sets".
Certain words used to describe the present application are discussed below or elsewhere in this specification to provide additional guidance to those skilled in the art in describing the present application.
Fig. 1 is a main flowchart of a human body action evaluation method according to an embodiment of the present application. Fig. 2 is a detailed flowchart of a human body action evaluation method according to an embodiment of the present application.
As shown in fig. 1 and fig. 2, the human body action evaluation method provided by the present application includes the following steps:
s1, training and deploying a human body action evaluation model, wherein the specific process is as follows:
s11, generating an action evaluation data set according to the human skeleton data, as shown in fig. 3, the specific process is:
and S111, acquiring a human body two-dimensional video stream and a depth image stream by using the RGB-D camera.
Specifically, the RGB-D camera includes an image acquisition sensor and a corresponding depth collector; the human body two-dimensional video stream is acquired by using the image acquisition sensor, and the depth image stream is acquired by using the depth collector.
And S112, generating 3D human skeleton data according to the human two-dimensional video stream and the depth image stream.
Specifically, pixel coordinates of the skeleton points are first obtained from the RGB images in the human body two-dimensional video stream; 3D human skeleton data are then generated according to the point-cloud mapping relation and the depth information in the depth image stream.
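As an illustration of this step only, the following Python sketch lifts detected 2D skeleton points to 3D camera coordinates through the pinhole point-cloud mapping. The upstream 2D pose detector, the camera intrinsics (fx, fy, cx, cy), and the function name are assumptions for illustration; the disclosure does not prescribe a specific mapping implementation.

```python
# Hypothetical sketch: lifting 2D skeleton points to 3D with the pinhole
# camera model. The intrinsics and the upstream 2D pose detector are
# assumptions, not specified by this disclosure.
import numpy as np

def lift_to_3d(keypoints_2d, depth_image, fx, fy, cx, cy):
    """Back-project (u, v) pixel keypoints into camera-frame 3D coordinates.

    keypoints_2d: (J, 2) pixel coordinates of J skeleton joints.
    depth_image:  (H, W) depth map in meters, aligned with the RGB frame.
    """
    joints_3d = np.zeros((len(keypoints_2d), 3))
    for j, (u, v) in enumerate(keypoints_2d.astype(int)):
        z = depth_image[v, u]               # depth sampled at the joint's pixel
        joints_3d[j] = ((u - cx) * z / fx,  # pinhole back-projection
                        (v - cy) * z / fy,
                        z)
    return joints_3d
```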
And S113, obtaining an input action sequence according to the 3D human skeleton data.
Specifically, the input action sequence may be expressed as the input trainee's skeletal point coordinates.
A standard action sequence is acquired from the standard action database. In particular, the standard action sequence may be represented as pre-stored skeleton point coordinates of a trainer.
S114, performing feature quantization and metric comparison on the input action sequence and the standard action sequence to obtain a feature score. The specific process is as follows:
the traditional motion characteristic quantification method is adopted to quantify the human motion into the characteristics of joint vectors, joint planes, joint speeds, joint angles, joint relative positions and the like. Note that the human body motion here includes an input motion and a standard motion.
And measuring and comparing the characteristics corresponding to the input action and the standard action by adopting a measuring method to obtain a characteristic score. It should be noted that the difference between the quantized features of the input motion and the corresponding features of the quantized standard motion may be calculated by using a DTW distance, a euclidean distance, a cosine distance, a mahalanobis distance, or a manhattan distance.
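As a minimal sketch of one such combination, the code below quantizes each frame into joint-angle features and compares the two sequences with a frame-wise Euclidean distance mapped to a 0-100 score. The joint triples, the scale constant, and the score mapping are assumptions; the sequences are assumed to be time-aligned and of equal length, whereas a DTW distance, as mentioned above, would handle unaligned sequences.

```python
# Illustrative feature quantization (joint angles) and metric comparison
# (frame-wise Euclidean distance mapped to a 0-100 score). Joint triples,
# the scale constant, and the score mapping are assumptions.
import numpy as np

def joint_angle(a, b, c):
    """Angle in radians at joint b, formed by 3D joints a-b-c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def feature_score(input_seq, standard_seq, joint_triples, scale=50.0):
    """Compare joint-angle features of two time-aligned (T, J, 3) sequences.

    joint_triples: list of (a, b, c) joint indices defining each angle.
    """
    def angles(seq):
        return np.array([[joint_angle(f[a], f[b], f[c])
                          for a, b, c in joint_triples] for f in seq])
    dist = np.linalg.norm(angles(input_seq) - angles(standard_seq),
                          axis=1).mean()
    return max(0.0, 100.0 - scale * dist)   # higher score = more similar
```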
And S115, evaluating a large number of human body actions by the method of step S114 to obtain a large number of human body action evaluation samples, each of the form [standard action sequence, input action sequence, feature score].
And S116, screening each human body action evaluation sample to obtain the action evaluation data set.
It should be noted that, when the human body action evaluation samples are screened manually, an error threshold is preset. A professional action-scoring coach evaluates the similarity between the input action and the standard action according to an action-similarity guideline and gives a manual score. The manual score is then compared with the feature score; if the error between them exceeds the preset error threshold, the human body action evaluation sample corresponding to that feature score is discarded. All human body action evaluation samples that satisfy this rule together form the action evaluation data set, as sketched below.
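A minimal sketch of this screening rule, assuming each candidate sample carries both the computed feature score and the coach's manual score; the field names and the threshold value are illustrative only.

```python
# Keep a sample only when the manual score and the feature score agree
# within the preset error threshold; the sample layout is an assumption.
def screen_samples(samples, error_threshold=10.0):
    """samples: iterable of dicts with keys 'standard_seq', 'input_seq',
    'feature_score', and 'manual_score'."""
    dataset = []
    for s in samples:
        if abs(s["manual_score"] - s["feature_score"]) <= error_threshold:
            # Retained form: [standard action sequence, input action
            # sequence, feature score]
            dataset.append((s["standard_seq"], s["input_seq"],
                            s["feature_score"]))
    return dataset
```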
In generating the action evaluation data set from the human skeleton data, traditional human action feature quantization and metric comparison serve as the physically objective knowledge system, and manual screening then adds human sensory knowledge. Combining objective physical evaluation with manually screened sensory evaluation yields a high-quality action evaluation data set that accounts for both objective evaluation and human sensory comfort.
S12, training a deep learning model by using the human body action evaluation samples in the action evaluation data set to obtain the human body action evaluation model, as shown in fig. 4. The specific process is as follows:
A deep learning model based on a convolutional neural network and a temporal neural network learns from the human body action evaluation samples in the action evaluation data set; each training sample supplies a standard action sequence and an input action sequence as the model input, with the feature score as the supervision target.
Using a deep learning model allows the strengths and weaknesses of traditional human body action evaluation to be learned, while providing an end-to-end structure from input action sequences to output feature scores, without specially designing feature quantization and metric comparison models.
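One plausible realization of such a convolutional-plus-temporal architecture is sketched below in PyTorch: a shared 1-D convolution encodes each skeleton sequence, an LSTM summarizes it over time, and a small regression head maps the pair embedding to a feature score. All layer sizes and the pairing scheme are assumptions; the disclosure specifies only the combination of convolutional and time-sequence networks in an end-to-end structure.

```python
# A minimal PyTorch sketch under the assumptions stated above; not the
# disclosure's exact network.
import torch
import torch.nn as nn

class ActionEvaluationModel(nn.Module):
    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(              # per-frame feature extractor
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Sequential(              # pair embedding -> score
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def encode(self, seq):                      # seq: (B, T, in_dim)
        h = self.conv(seq.transpose(1, 2)).transpose(1, 2)  # (B, T, hidden)
        _, (hn, _) = self.lstm(h)
        return hn[-1]                           # (B, hidden) sequence summary

    def forward(self, input_seq, standard_seq):
        pair = torch.cat([self.encode(input_seq),
                          self.encode(standard_seq)], dim=1)
        return self.head(pair).squeeze(-1)      # predicted feature score
```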
S2, evaluating the human body action of the user by using the human body action evaluation model to obtain a feature score, wherein the specific process is as follows:
s21, acquiring a real-time input action sequence of the user and a standard action sequence selected by the user to be learned;
specifically, a human body two-dimensional video stream and a depth image stream of a user can be acquired in real time through an RGB-D camera; generating 3D human skeleton data according to the human two-dimensional video stream and the depth image stream; and obtaining a real-time input action sequence of the user according to the 3D human skeleton data.
The standard action sequence to be learned selected by the user can be directly obtained from a preset standard action database.
S22, inputting the acquired real-time input action sequence and the standard action sequence into a human action evaluation model;
and S23, evaluating the similarity of the input action sequence and the standard action sequence by the human body action evaluation model, and outputting a characteristic score.
The traditional human body action evaluation method evaluates human body actions in two independent steps, feature quantization and metric comparison, both designed by human experts; such hand-designed methods cannot cover all actions, and a large portion of actions cannot be recognized well. Compared with the traditional method, the human body action evaluation model based on a deep learning model tightly couples feature quantization and metric comparison and achieves higher recognition accuracy.
S3, analyzing, by using the emotion analysis model, the emotion of the user when viewing the feature score output by the human body action evaluation model, and constructing an error matching library according to the analysis result, as shown in fig. 5. The specific process is as follows:
And S31, acquiring, in real time by using the RGB-D camera, an expression video stream of the user viewing the feature score output by the human body action evaluation model, and sending the acquired video stream to the emotion analysis model.
It should be noted that the emotion analysis model may use, for example, a THIN (THrowable Information Networks) model, a deep FEA (facial expression analysis) model, or an EmotionNet model.
And S32, the emotion analysis model analyzes the emotion of the user when viewing the feature score output by the human body action evaluation model to obtain the probability of each emotion; the emotions include happiness, confusion, and frustration.
And S33, when confusion has the highest probability among the three emotions, the emotion analysis model compares the obtained confusion probability with a preset confusion probability threshold and calculates the adjusted score from the difference between the two and an adjustment base.
To facilitate a clearer understanding of how the adjusted score is calculated, a specific example follows.
The probabilities of the user's possible emotions when seeing the feature score sum to 100%. Suppose the obtained happiness probability is 10%, the frustration probability is 20%, and the confusion probability is 70%; the preset confusion probability threshold is 50% and the adjustment base is 10. The required adjustment is then (70% − 50%) × 10 = 2, so the current adjusted score is the last adjusted score + 2.
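The adjustment rule of this example can be written as a short function; the threshold 0.5 and base 10 are the example's values rather than fixed parameters, and the function name is illustrative.

```python
# Sketch of the score adjustment in the example above; the threshold and
# base are the example's values, not prescribed constants.
def adjusted_score(last_score, confusion_prob,
                   threshold=0.5, adjustment_base=10.0):
    """Raise the score when the confusion probability exceeds the preset
    confusion probability threshold."""
    if confusion_prob <= threshold:
        return last_score                   # no adjustment needed
    return last_score + (confusion_prob - threshold) * adjustment_base

# Example from the text: (0.7 - 0.5) * 10 = 2, so a last adjusted score
# of 80 becomes adjusted_score(80.0, 0.7) == 82.0.
```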
And S34, taking the input action sequence and the standard action sequence for which the user's emotion on seeing the feature score is confusion, together with the adjusted score, as an error sample, and constructing an error matching library from such error samples.
For convenience of explanation, the input action sequence and standard action sequence corresponding to the case where the user's emotion on seeing the feature score is confusion are referred to as the erroneous input action sequence and the erroneous standard action sequence, respectively. Specifically, an error sample takes the form [erroneous standard action sequence, erroneous input action sequence, adjusted score].
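Assembling an error sample in that form and appending it to the error matching library might look like the following; the in-memory list and the function name are assumptions for illustration.

```python
# Hypothetical in-memory error matching library; a persistent store could
# be substituted without changing the sample layout.
error_matching_library = []

def record_error_sample(erroneous_standard_seq, erroneous_input_seq,
                        adjusted):
    """Store [erroneous standard action sequence, erroneous input action
    sequence, adjusted score] for later model updating."""
    error_matching_library.append(
        (erroneous_standard_seq, erroneous_input_seq, adjusted))
```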
The human body action evaluation model yields a relatively physically objective feature score, while the user also has a perceptual judgment of the action. Adjusting the feature score output by the human body action evaluation model according to the output of the emotion analysis model makes the adjusted feature score account for both objective evaluation and sensory evaluation, reduces the probability of the unrecognized or biased cases seen in traditional methods, and keeps the final feature score objective.
S4, updating the human body action evaluation model by using the error matching library, as shown in fig. 6. The specific process is as follows:
And inputting the samples in the error matching library into the human body action evaluation model.
And updating the weights of the human body action evaluation model through a backpropagation algorithm to reduce its loss function; updating stops when the loss function converges, and the resulting model is the updated human body action evaluation model.
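A minimal fine-tuning sketch of this update step, reusing the ActionEvaluationModel sketched earlier: error samples are replayed, the weights are updated by backpropagation, and updating stops once the epoch loss change falls below a tolerance. The optimizer, the MSE loss, and the convergence criterion are assumptions; the disclosure states only backpropagation until the loss converges.

```python
# Assumed setup: 'model' follows the ActionEvaluationModel sketch above and
# 'error_loader' yields (standard_seq, input_seq, adjusted_score) tensors.
import torch
import torch.nn as nn

def update_model(model, error_loader, lr=1e-4, tol=1e-4, max_epochs=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    prev_loss = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for standard_seq, input_seq, adjusted in error_loader:
            pred = model(input_seq, standard_seq)   # predicted feature score
            loss = loss_fn(pred, adjusted)
            optimizer.zero_grad()
            loss.backward()                         # backpropagation
            optimizer.step()
            total += loss.item()
        if abs(prev_loss - total) < tol:            # loss has converged
            break
        prev_loss = total
    return model                                    # the updated model
```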
Updating the human body action evaluation model with the error matching library gives the updated model the ability to recognize the error data, so that when the same data are encountered again it produces an evaluation result that meets the user's psychological expectation while preserving objective evaluation of the key points of the action.
After each evaluation, the human body action evaluation model automatically updates and optimizes itself with the error samples in the error matching library, without relying on external personnel, so that its next evaluation result is more reasonable and accurate.
And S5, evaluating the human body action by using the updated human body action evaluation model.
The traditional human body action evaluation method is an open-loop process that cannot supervise whether the score given by the model is appropriate. Compared with the traditional approach, adding the emotion analysis model makes the evaluation performed by the human body action evaluation model a closed loop, so that the feature score it outputs satisfies both physical objectivity and human sensory perception.
After being deployed on a device, a traditional human body action evaluation model is frozen; it cannot repair its own errors and can only be upgraded when an upgrade instruction is issued manually. Compared with the traditional method, adding the lifelong-learning update process allows the model to correct itself automatically after each mistake, ensures that the same erroneous evaluation is not produced when the same error is encountered again, and keeps the device continuously self-learning and self-optimizing over its whole service life.
Based on the human body action evaluation method provided by the application, the application also provides a human body action evaluation device which comprises a human body action evaluation model, an emotion analysis model and an error matching library.
The human body action evaluation model is used for evaluating the similarity of the acquired real-time input action sequence of the user and a standard action sequence selected to be learned by the user and outputting a feature score.
The emotion analysis model is used for analyzing emotion when the user views the feature scores output by the human action evaluation model, and an error matching library is constructed according to the analysis result.
And the error matching library is used for updating the human body action evaluation model.
In a specific embodiment, the human body action evaluation model is trained and deployed in advance. The specific process for obtaining the human body action evaluation model includes the following steps:
firstly, an action evaluation data set is generated according to human skeleton data, and the method comprises the following steps:
and acquiring a human body two-dimensional video stream and a depth image stream by using the RGB-D camera.
And generating 3D human skeleton data according to the human two-dimensional video stream and the depth image stream.
And obtaining an input action sequence according to the 3D human skeleton data.
And acquiring the standard action sequence from the standard action database.
And performing feature quantization and metric comparison on the input action sequence and the standard action sequence to obtain a feature score.
And obtaining a large number of human body action evaluation samples from a large number of standard action sequences and input action sequences together with their corresponding feature scores.
And manually screening each human body action evaluation sample, discarding the evaluation samples that do not meet the preset rule, and obtaining the action evaluation data set from the retained evaluation samples that meet the preset rule.
Secondly, training the deep learning model by using the human body action evaluation samples in the action evaluation data set to obtain the human body action evaluation model.
In a specific embodiment, the specific process of analyzing the emotion when the user views the feature score output by the human body action evaluation model by the emotion analysis model and constructing the error matching library according to the analysis result is as follows:
and the RGB-D camera is used for acquiring expression video streams when the user sees the feature scores output by the human body action evaluation model corresponding to the user real-time input action sequence in real time, and the acquired video streams are sent to the emotion analysis model.
The emotion analysis model analyzes the emotion when the user sees the feature score output by the human body action evaluation model, and obtains the probability of the emotion when the user sees the feature score, wherein the emotion comprises happiness, confusion and frustration; the sum of the probabilities for the three emotions is 1.
And when the probability of the confusion among the three emotions is the maximum, the emotion analysis model compares the obtained confusion probability with a preset confusion probability threshold value, and calculates an adjusted score according to the difference value of the obtained confusion probability and the preset confusion probability threshold value and the adjustment base number.
And taking the corresponding input action sequence and standard action sequence when the emotion of the user is puzzled when the user sees the feature score and the adjusted score as error samples, and constructing an error matching library by using the error samples.
In a specific embodiment, the specific process of updating the human body action evaluation model by using the error matching library is as follows:
And inputting the error samples in the error matching library into the human body action evaluation model.
And updating the weights of the human body action evaluation model through a backpropagation algorithm to reduce its loss function; updating stops when the loss function converges, and the resulting model is the updated human body action evaluation model.
In each of the above embodiments, the human body action evaluation device further includes a standard action database in which standard action sequences are stored; the standard action database outputs a standard action sequence in response to a standard action selection command input by the user.
In another embodiment, the application further provides a human body action evaluation system which comprises an RGB-D camera, a human-computer interaction interface and a human body action evaluation device.
The RGB-D camera is used for collecting a human body two-dimensional video stream and a depth image stream and is also used for collecting an expression video stream when a user sees a feature score output by a human body action evaluation model in the human body action evaluation device.
The user can select a standard action sequence to be learned through the human-computer interaction interface, and the human-computer interaction interface can also present the feature score output by the human action evaluation model to the user.
The human body action evaluation device is used for constructing an error matching library according to the analysis result of the emotion analysis model, updating the human body action evaluation model by using the error matching library, and evaluating the real-time input action of the user by using the updated human body action evaluation model.
The human body action evaluation method and system of the present application are built from a serial process: data labeling to obtain the action evaluation data set, construction of the human body action evaluation model, emotion analysis, and lifelong-learning updating of the human body action evaluation model. They can continuously self-learn from human emotions and keep the model self-optimizing and self-updating.
The embodiments of the present application described above may be implemented in various hardware, in software code, or in a combination of both. For example, an embodiment may be implemented as program code that performs the above methods and is executed in a digital signal processor. The present application may also relate to various functions performed by a computer processor, digital signal processor, microprocessor, or field-programmable gate array. Such a processor may be configured according to the present application to perform particular tasks by executing machine-readable software code or firmware code that defines the particular methods disclosed herein. The software code or firmware code may be developed in different programming languages and in different formats or forms, and the software code may be compiled for different target platforms. However, the different code styles, types, and languages of such software code, and other forms of configuration code for performing tasks according to the present application, do not depart from the spirit and scope of the present application.
The foregoing is merely an illustrative embodiment of the present application, and any equivalent changes and modifications made by those skilled in the art without departing from the spirit and principles of the present application shall fall within the protection scope of the present application.