Human body joint point prediction method and device and motion type identification method and device
Technical Field
The embodiment of the invention relates to the technical field of machine learning, in particular to a human body joint point prediction method and device and an action type identification method and device.
Background
Human body joint point data plays an important role in fields such as human body recognition, robot driving, and behavior prediction. For the problem of calculating the positions of human body joint points, the prior art mainly relies on methods such as reconstructing missing joints by machine learning, forward dynamics calculation, and inverse dynamics calculation, but each has limitations. Machine learning reconstructs the key point positions of most human body joints by training a model to recover the missing joint information; forward dynamics strictly follows the parent-child hierarchy of the joint points, so the position of a joint point's parent must be provided before the position of that joint point can be calculated; and inverse dynamics calculates, from the position of a child joint, how the parent joint must move to bring the child joint to that position.
However, the inventors found that the prior art has at least the following problem: existing ways of acquiring human body joint points usually require collecting a large number of human body joint point positions or the parent-child hierarchical relationships of the joint points, which demands considerable manpower, material resources, and time and makes data acquisition costly.
Disclosure of Invention
An object of embodiments of the present invention is to provide a human body joint point prediction method and apparatus and an action type identification method and apparatus, which avoid the manpower, material resources, and time required to collect a large number of human body joint points or parent-child joint point hierarchies, thereby reducing the cost of data acquisition.
In order to solve the above technical problem, an embodiment of the present invention provides a human body joint point prediction method, including: acquiring a motion image and the positions of M designated joint points of a human body in the motion image, where M is greater than 0; and inputting the motion image and the positions of the M designated joint points into a pre-trained human body joint point recognition model to obtain the predicted positions of N human body joint points of the human body, where N is greater than M, the N human body joint points include the M designated joint points, and N and M are integers.
An embodiment of the present invention also provides a human body joint point prediction apparatus, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above human body joint point prediction method.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above human body joint point prediction method.
The embodiment of the present invention also provides an action type identification method, including: obtaining the predicted positions of N human body joint points of the human body in a motion image by using the above human body joint point prediction method; and inputting the motion image and the predicted positions of the N human body joint points into a pre-trained motion classification model to obtain the motion type of the motion image.
An embodiment of the present invention further provides an action type identification apparatus, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above action type identification method.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the above-described action type identification method.
Compared with the prior art, the embodiment of the present invention provides a human body joint point prediction method, including: acquiring a motion image and the positions of M designated joint points of a human body in the motion image, where M is greater than 0; and inputting the motion image and the positions of the M designated joint points into a pre-trained human body joint point recognition model to obtain the predicted positions of N human body joint points of the human body, where N is greater than M, the N human body joint points include the M designated joint points, and N and M are integers. In this embodiment, once the motion image and the positions of some of the human body joint points in the motion image are obtained, the positions of most of the human body joint points, consistent with the human body posture in the motion image, can be predicted by the pre-trained human body joint point recognition model. This avoids the manpower, material resources, and time needed to collect a large number of human body joint points or parent-child joint point hierarchies, reduces the cost of data acquisition, and provides convenience for technologies such as human body recognition, robot driving, and behavior prediction.
In addition, the pre-trained human body joint point recognition model is trained in the following way: a plurality of frames of training images containing human body motions are acquired, and each frame of training image includes the positions of the real human body joint points and of the real human body joint points designated from among them, where the number of designated real human body joint points is less than the number of real human body joint points; the plurality of frames of training images are input into the human body joint point recognition model to obtain the predicted position of each real human body joint point in each frame of training image, including the predicted positions of the designated real human body joint points; a first loss function is calculated according to the predicted positions, the positions of the real human body joint points, and the positions of the designated real human body joint points; and when the first loss function meets a first preset condition, the training is completed.
In addition, the first loss function L1 is calculated by the following formula:
where n denotes the number of training images, A_{ni} denotes the position of the i-th real human body joint point in the n-th frame of training image, Q_{ni} denotes the i-th predicted position in the n-th frame of training image, k denotes the number of designated real human body joint points, P_{j} denotes the position of the j-th designated real human body joint point in each frame of training image, and Q_{j} denotes the predicted position of the j-th designated real human body joint point in each frame of training image.
In addition, the pre-trained human body joint point recognition model is trained in the following way: a plurality of groups of training images containing human body motions are acquired, each group consisting of consecutive frames of training images, and each frame of training image in the consecutive frames includes the positions of the real human body joint points and of the real human body joint points designated from among them, where the number of designated real human body joint points is less than the number of real human body joint points; the consecutive frames of training images are input into the human body joint point recognition model to obtain the predicted position of each real human body joint point in each frame of training image, including the predicted positions of the designated real human body joint points; a second loss function is calculated according to the predicted positions, the positions of the real human body joint points, and the positions of the designated real human body joint points; and when the second loss function meets a second preset condition, the training is completed.
In addition, the second loss function L2 is calculated by the following formula:
where n denotes the number of groups of training images, s denotes the number of frames of training images in each group, A_{ni}^{s} denotes the position of the i-th real human body joint point in each of the s frames of training images of the n groups of training images, Q_{ni}^{s} denotes the i-th predicted position in each of the s frames of training images of the n groups of training images, k denotes the number of designated real human body joint points, P_{tj} denotes the position of the j-th designated real human body joint point in the t-th frame of the s frames of training images, and Q_{tj} denotes the predicted position of the j-th designated real human body joint point in the t-th frame of the s frames of training images.
Drawings
One or more embodiments are illustrated by way of example in the figures of the accompanying drawings, in which like reference numerals denote similar elements; the figures are not drawn to scale unless otherwise specified.
Fig. 1 is a flowchart of the human body joint point prediction method according to a first embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the human body joint point prediction apparatus according to a second embodiment of the present invention;
Fig. 3 is a flowchart of the action type identification method according to a fourth embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the action type identification apparatus according to a fifth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments so that the reader can better understand the present application; however, the technical solutions claimed in the present application can still be implemented without these technical details, or with various changes and modifications based on the following embodiments.
A first embodiment of the present invention relates to a human body joint point prediction method. The core of this embodiment is a human body joint point prediction method including: acquiring a motion image and the positions of M designated joint points of a human body in the motion image, where M is greater than 0; and inputting the motion image and the positions of the M designated joint points into a pre-trained human body joint point recognition model to obtain the predicted positions of N human body joint points of the human body, where N is greater than M, the N human body joint points include the M designated joint points, and N and M are integers. In this embodiment, once the motion image and the positions of some of the human body joint points in the motion image are obtained, the positions of most of the human body joint points, consistent with the human body posture in the motion image, can be predicted by the pre-trained human body joint point recognition model. This avoids the manpower, material resources, and time needed to collect a large number of human body joint points or parent-child joint point hierarchies, reduces the cost of data acquisition, and provides convenience for technologies such as human body recognition, robot driving, and behavior prediction.
The implementation details of the human body joint point prediction method of the present embodiment are described below. The following content is provided only for ease of understanding and is not essential to implementing this embodiment.
Fig. 1 is a schematic flowchart of the human body joint point prediction method according to the present embodiment:
step 101: and acquiring the motion image and the positions of M designated joint points of the human body in the motion image.
Specifically, a motion image containing a human body motion posture is acquired, and the positions of M designated joint points of the human body are indicated in the motion image, where the number of designated joint points is greater than 0, that is, M is greater than 0 and M is an integer. The M designated joint points may belong to the same joint part or to different joint parts; for example, they may be the joint points of the elbow or wrist, or the joint points of the upper or lower half of the human body.
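As an illustration only (the patent does not prescribe any particular data format or coordinate convention), the M designated joint points might be represented as a mapping from joint names to image coordinates:

```python
# Hypothetical input format for the M designated joint points; the joint
# names and the (x, y) pixel-coordinate convention are assumptions for
# illustration, not part of the patent.
designated_joints = {
    "right_elbow": (412.0, 305.5),
    "right_wrist": (468.3, 377.1),
    "left_knee": (295.8, 640.2),
}
M = len(designated_joints)  # here M = 3, satisfying M > 0
```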
Step 102: and inputting the motion image and the positions of the M designated joint points into a human body joint point recognition model trained in advance to obtain the predicted positions of the N human body joint points of the human body.
Specifically, a human body joint point recognition model is trained in advance that can obtain the positions of most of the human body joint points in a motion image from the motion image and the positions of some of the human body joint points in it. When the positions of most of the human body joint points in a motion image are needed, only the positions of M designated joint points (M greater than 0) need to be collected in that image, and the trained human body joint point recognition model then yields the predicted positions of N human body joint points in the motion image, where N is greater than M and N is an integer. Therefore, in scenarios such as human body recognition, robot driving, and behavior prediction that require a large number of human body joint point positions, the manpower, material resources, and time consumed in the prior art by collecting a large number of joint point positions or the parent-child hierarchical relationships of joint points are avoided, the cost of data acquisition is reduced, and convenience is provided for these technologies.
It should be noted that the N human body joint points predicted by the pre-trained human body joint point recognition model include the M designated joint points; that is, the obtained predicted positions of the N human body joint points contain both the predicted positions of the M designated joint points and the predicted positions of the N-M other human body joint points. Because the predicted positions of those N-M joint points often deviate somewhat from the actual joint positions, combining the original positions of the M designated joint points with the predicted positions of the N-M other joint points may not form a consistent human body motion posture. Therefore, after the predicted positions of the N human body joint points are obtained from the positions of the M designated joint points, the original positions of the M designated joint points are discarded, and subsequent calculations such as robot driving, human body recognition, and behavior prediction are performed using the predicted positions of the M designated joint points together with the predicted positions of the N-M other joint points, so as to ensure a mutually consistent human body motion posture.
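A minimal inference sketch of this step is given below. The `JointRecognitionModel` wrapper, its `predict` API, and the (N, 2) output layout are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

# Hypothetical wrapper around the pre-trained human body joint point
# recognition model; the module, class name, and API are assumptions.
from joint_model import JointRecognitionModel

def predict_joints(image: np.ndarray,
                   designated: dict[str, tuple[float, float]],
                   model: JointRecognitionModel) -> np.ndarray:
    """Return the predicted positions of all N human body joint points, shape (N, 2)."""
    # Pack the M designated joint point positions into an (M, 2) array.
    m_positions = np.array(list(designated.values()), dtype=np.float32)

    # The model takes the motion image and the M designated positions and
    # outputs N predicted positions (N > M), including predictions for the
    # M designated joint points themselves.
    predicted = model.predict(image, m_positions)  # shape (N, 2)

    # As described above, only the predicted positions are used downstream;
    # the original designated positions are discarded so that the N joint
    # points form a mutually consistent posture.
    return predicted
```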
In the present embodiment, two training methods are used, depending on the training images available during training.

When multiple frames of training images containing human body motions are used during training, the human body joint point recognition model is trained in the following manner:

Each frame of the plurality of frames of training images containing human body motions includes the positions of the real human body joint points and of the real human body joint points designated from among them, where the number of designated real human body joint points is less than the number of real human body joint points. The plurality of frames of training images are input into the human body joint point recognition model to obtain the predicted position of each real human body joint point in each frame of training image, including the predicted positions of the designated real human body joint points; a first loss function is calculated according to the predicted positions, the positions of the real human body joint points, and the positions of the designated real human body joint points; and when the first loss function meets the first preset condition, the training is completed.
Wherein the first loss function L1 is calculated by the following formula (1):
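The formula image from the original filing is not reproduced in this text. Purely as an illustration, a squared-error loss of the following form, with one term over all real joint points and an additional term over the k designated joint points, would be consistent with the variable definitions below; the exact form and weighting of the patent's formula (1) may differ:

```latex
L_{1} \;=\; \sum_{n}\sum_{i} \left\| A_{ni} - Q_{ni} \right\|^{2}
       \;+\; \sum_{n}\sum_{j=1}^{k} \left\| P_{j} - Q_{j} \right\|^{2}
```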
where n denotes the number of training images, A_{ni} denotes the position of the i-th real human body joint point in the n-th frame of training image, Q_{ni} denotes the i-th predicted position in the n-th frame of training image, k denotes the number of designated real human body joint points, P_{j} denotes the position of the j-th designated real human body joint point in each frame of training image, and Q_{j} denotes the predicted position of the j-th designated real human body joint point in each frame of training image.
Specifically, n frames of training images are acquired; these n frames of training images do not include consecutive frames. Each frame of training image contains i real human body joint points and k real human body joint points designated from among them, where 0 < k < i. Each frame of training image, together with the positions of its i real human body joint points and of its k designated real human body joint points, is used as the input of the human body joint point recognition model for training, and i predicted positions of the training image are obtained, including the predicted positions of the k designated real human body joint points. The positions of the i real human body joint points, the positions of the k designated real human body joint points, and the i predicted positions of the training image are substituted into the loss function L1 in formula (1) to calculate a loss value, and the model parameters of the human body joint point recognition model are adjusted according to the loss value until the loss function L1 meets the first preset condition. The prediction results of the trained human body joint point recognition model then conform to the human body motion posture in the training image, and the predicted positions of the k designated real human body joint points are close to the real positions.
It should be noted that the first preset condition may be set by the user. The first preset condition may be a threshold on the value of the loss function: when the value of the loss function is less than or equal to this threshold, the human body joint point recognition model is considered successfully trained. Alternatively, it may be a preset range on the variation of the loss function: when the variation of the loss function during training falls within this preset range, the human body joint point recognition model is considered successfully trained.
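The following PyTorch sketch shows one way a training step of this kind could be implemented. The squared-error loss mirrors the assumed reconstruction of formula (1) above, and the model interface, tensor shapes, and threshold-based stopping test are illustrative assumptions, not the patent's implementation.

```python
import torch

def first_loss(pred: torch.Tensor, real: torch.Tensor,
               designated_idx: torch.Tensor) -> torch.Tensor:
    """Assumed L1-style loss: a term over all i real joint points plus a term
    over the k designated joint points (squared-error formulation assumed).

    pred, real: tensors of shape (batch, i, 2); designated_idx: LongTensor of k joint indices.
    """
    all_joints_term = ((pred - real) ** 2).sum()
    designated_term = ((pred[:, designated_idx] - real[:, designated_idx]) ** 2).sum()
    return all_joints_term + designated_term

def train_step(model, optimizer, images, real_joints, designated_idx, threshold=1e-3):
    """Run one optimization step; return True once the first preset condition is met
    (here modeled as the loss falling below a user-set threshold)."""
    # The model receives the training images and the positions of the k
    # designated real joint points, and predicts all i joint positions.
    pred = model(images, real_joints[:, designated_idx])
    loss = first_loss(pred, real_joints, designated_idx)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item() <= threshold
```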
When multiple groups of training images containing human body motions are used during training, and each group consists of consecutive frames of training images, the human body joint point recognition model is trained in the following manner:

Each group of the plurality of groups of training images containing human body motions consists of consecutive frames of training images, and each frame in the consecutive frames includes the positions of the real human body joint points and of the real human body joint points designated from among them, where the number of designated real human body joint points is less than the number of real human body joint points. The consecutive frames of training images are input into the human body joint point recognition model to obtain the predicted position of each real human body joint point in each frame of training image, including the predicted positions of the designated real human body joint points; a second loss function is calculated according to the predicted positions, the positions of the real human body joint points, and the positions of the designated real human body joint points; and when the second loss function meets a second preset condition, the training is completed.
Wherein the second loss function L2 is calculated by the following formula (2):
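As with formula (1), the original formula image is not reproduced here. Purely as an illustration, the following squared-error form, additionally summed over the frames of the s consecutive frames in each group (with the superscript denoting the frame index, written t here and running from 1 to s), would be consistent with the variable definitions below; the exact form of the patent's formula (2) may differ:

```latex
L_{2} \;=\; \sum_{n}\sum_{t=1}^{s}\sum_{i} \left\| A_{ni}^{t} - Q_{ni}^{t} \right\|^{2}
       \;+\; \sum_{n}\sum_{t=1}^{s}\sum_{j=1}^{k} \left\| P_{tj} - Q_{tj} \right\|^{2}
```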
where n denotes the number of groups of training images, s denotes the number of frames of training images in each group, A_{ni}^{s} denotes the position of the i-th real human body joint point in each of the s frames of training images of the n groups of training images, Q_{ni}^{s} denotes the i-th predicted position in each of the s frames of training images of the n groups of training images, k denotes the number of designated real human body joint points, P_{tj} denotes the position of the j-th designated real human body joint point in the t-th frame of the s frames of training images, and Q_{tj} denotes the predicted position of the j-th designated real human body joint point in the t-th frame of the s frames of training images.
Specifically, n groups of training images are acquired, each group containing s frames of training images, where the s frames are consecutive frames of human body motion postures. Each frame of training image contains i real human body joint points and k real human body joint points designated from among them, where 0 < k < i. Each frame of training image, together with the positions of its i real human body joint points and of its k designated real human body joint points, is used as the input of the human body joint point recognition model for training, and i predicted positions of the training image are obtained, including the predicted positions of the k designated real human body joint points. The positions of the i real human body joint points, the positions of the k designated real human body joint points, and the i predicted positions of the training image are substituted into the loss function L2 in formula (2) to calculate a loss value, and the model parameters of the human body joint point recognition model are adjusted according to the loss value until the loss function L2 meets the second preset condition. The prediction results of the trained human body joint point recognition model then conform to the human body motion posture in the training images, and the predicted positions of the k designated real human body joint points are close to the real positions.
It is worth noting that the second preset condition may be set by the user. The second preset condition may be a threshold on the value of the loss function: when the value of the loss function is less than or equal to this threshold, the human body joint point recognition model is considered successfully trained. Alternatively, it may be a preset range on the variation of the loss function: when the variation of the loss function during training falls within this preset range, the human body joint point recognition model is considered successfully trained.
Compared with the prior art, the embodiment of the present invention provides a human body joint point prediction method: once the motion image and the positions of some of the human body joint points in the motion image are obtained, the positions of most of the human body joint points, consistent with the human body posture in the motion image, can be predicted by the pre-trained human body joint point recognition model. This avoids the manpower, material resources, and time needed to collect a large number of human body joint points or parent-child joint point hierarchies, reduces the cost of data acquisition, and provides convenience for technologies such as human body recognition, robot driving, and behavior prediction.
A second embodiment of the present invention relates to a human body joint point prediction apparatus, as shown in Fig. 2, including at least one processor 201 and a memory 202 communicatively connected to the at least one processor 201; the memory 202 stores instructions executable by the at least one processor 201, and the instructions are executed by the at least one processor 201 to enable the at least one processor 201 to perform the above human body joint point prediction method.
Where the memory 202 and the processor 201 are coupled in a bus, the bus may comprise any number of interconnected buses and bridges, the buses coupling one or more of the various circuits of the processor 201 and the memory 202 together. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 201 is transmitted over a wireless medium through an antenna, which further receives the data and transmits the data to the processor 201.
The processor 201 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory 202 may be used to store data used by the processor 201 in performing operations.
The third embodiment of the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above human body joint point prediction method.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
A fourth embodiment of the present invention relates to an action type identification method, in which the predicted positions of most of the human body joint points in a motion image are acquired by the human body joint point prediction method of the first embodiment.
A flowchart of the action type identification method in this embodiment is shown in fig. 3, and specifically includes:
step 301: and acquiring the motion image and the positions of M designated joint points of the human body in the motion image.
Step 302: and inputting the motion image and the positions of the M designated joint points into a human body joint point recognition model trained in advance to obtain the predicted positions of the N human body joint points of the human body.
Steps 301 and 302 are substantially the same as steps 101 and 102 in the first embodiment and are not repeated here.
Step 303: and inputting the motion image and the predicted positions of the N human body joint points into a pre-trained motion classification model to obtain the motion type of the motion image.
Specifically, in the prior art, when identifying the motion type of a motion image to be classified, usually only some of the human body joint points of that image are acquired, in order to avoid the cost of collecting a large number of joint points; but because the number of joint points is small, the accuracy of identifying the motion type of the image to be classified is not high. To improve the accuracy of the motion classification model, most of the human body joint points in the motion image to be classified need to be acquired. In this embodiment, the human body joint point recognition model of the first embodiment is used: the motion image to be classified and the positions of the designated joint points in it are input into the human body joint point recognition model, and then the motion image to be classified and the obtained predicted positions of the N human body joint points are input into the pre-trained motion classification model to obtain the motion type of the motion image. This avoids the manpower, material resources, and time consumed by collecting a large number of human body joint points, while improving the identification accuracy of the motion classification model.
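The following end-to-end sketch illustrates this pipeline. The `JointRecognitionModel` and `ActionClassifier` wrappers and their APIs are illustrative assumptions carried over from the earlier sketch, not the patent's implementation; the classification model itself is assumed to have been trained by existing means, as noted below.

```python
import numpy as np

# Hypothetical model wrappers; module names, class names, and APIs are assumptions.
from joint_model import JointRecognitionModel
from action_model import ActionClassifier

def identify_action(image: np.ndarray,
                    designated: dict[str, tuple[float, float]],
                    joint_model: JointRecognitionModel,
                    classifier: ActionClassifier) -> str:
    """Predict N joint positions from the M designated joints, then classify the motion."""
    m_positions = np.array(list(designated.values()), dtype=np.float32)
    predicted_joints = joint_model.predict(image, m_positions)  # shape (N, 2)
    # The pre-trained motion classification model takes the motion image
    # together with the N predicted joint positions and returns a motion
    # type label, e.g. "waving" or "running".
    return classifier.classify(image, predicted_joints)
```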
It should be noted that training the motion classification model from motion images and the human body joint points in those images belongs to the prior art and is not described in detail in this embodiment.
Compared with the prior art, the embodiment of the present invention provides a motion type identification method: the motion image to be classified and the positions of the designated joint points in it are input into the human body joint point recognition model, and then the motion image to be classified and the obtained predicted positions of the N human body joint points are input into the pre-trained motion classification model to obtain the motion type of the motion image. This avoids the manpower, material resources, and time consumed by collecting a large number of human body joint points and improves the identification accuracy of the motion classification model.
The steps of the above methods are divided only for clarity of description; in implementation, they may be combined into one step, or a single step may be split into multiple steps, as long as the same logical relationship is preserved, and all such variations fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or process, or introducing insignificant designs into it, without changing the core design also falls within the protection scope of this patent.
A fifth embodiment of the present invention relates to an action type identification apparatus, as shown in Fig. 4, including at least one processor 401 and a memory 402 communicatively connected to the at least one processor 401; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401 to enable the at least one processor 401 to perform the above action type identification method.
Where the memory 402 and the processor 401 are coupled by a bus, which may include any number of interconnected buses and bridges that couple one or more of the various circuits of the processor 401 and the memory 402 together. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 401 may be transmitted over a wireless medium via an antenna, which may receive the data and transmit the data to the processor 401.
The processor 401 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 402 may be used to store data used by processor 401 in performing operations.
The sixth embodiment of the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described action type identification method.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.