Facial expression recognition imitation algorithm for virtual character or robot
Technical Field
The invention relates to the technical field of image recognition and imitation, and in particular to a facial expression recognition and imitation algorithm for a virtual character or a robot.
Background
With the development of artificial intelligence technology, intelligent human-computer interaction has attracted more and more interest and is widely welcomed by users. However, traditional intelligent human-computer interaction is limited to voice and image interaction and can hardly meet the diversified demands of users, which has given rise to emerging expression recognition and interaction technology and eye tracking interaction technology. These technologies all need a carrier to be presented well. The traditional presentation carrier is generally an AI virtual character, but an expression robot that imitates human expressions is undoubtedly the best carrier for human-computer interaction at present: it can not only present various human-computer interaction technologies in a diversified and integrated way, but is also the most natural interaction carrier for users. Therefore, a mature expression robot product can be widely applied in the future to fields such as education assistance, medical companion nursing, customer guiding service, entertainment demonstration and social contact.
However, current expression robot technology also needs a large amount of expression recognition and imitation data to make robot expression interaction more natural and vivid. In the prior art, the patent document with publication number CN116597484A discloses a real-time expression imitation method and device for a robot. The method obtains the positions of the key points of all faces in an image by recognizing the face image to be detected, and determines the face to be imitated in the face image according to the key point positions. The current frame of the face image is compared with the previous frame in real time to obtain the transformation trend and transformation proportion of the face to be imitated; the transformation trend is used for adjusting the head rotation angle of the robot, and the transformation proportion is used for calculating the rotation angles and running times of all steering engines of the robot, so that the steering engines control the actions of the robot and the face to be imitated is imitated. Although the above scheme can provide robot expression imitation, it is difficult to provide a large amount of expression recognition and imitation data with a real-time imitation mode, and it is difficult to obtain robot expression imitation data for micro-expressions with small dynamic amplitude by acquiring the transformation trend and transformation proportion of all key point coordinates of the face to be imitated. The robot can therefore only be adjusted according to the transformation trend for expressions with large dynamic amplitude, and the scheme can hardly reflect micro-expressions with small dynamic amplitude onto the robot for imitation.
Disclosure of Invention
The invention provides a facial expression recognition and imitation algorithm for a virtual character or a robot which can obtain robot expression imitation data for micro-expressions with small dynamic amplitude, makes it convenient to obtain a large amount of robot recognition and imitation data by accurately mapping a large number of video expressions onto a robot entity, and links the pose of the whole head so that the expression is more accurate; at the same time, the algorithm can be associated with a virtual character to perform synchronous expression recognition and imitation, or a comparison-reference observation test.
The specific technical scheme is as follows:
there is provided a facial expression recognition simulation algorithm for a robot, comprising the steps of:
S1, circularly acquiring a face image, and carrying out cropping preprocessing on the face image;
S2, inputting the cut face image into a depth model mediapipe, identifying facial key points in the face image through the depth model mediapipe, obtaining head pose parameters, and predicting the value of the facial expression through the facial key points;
S3, calculating the position and the pose of the robot head, namely establishing three basic rotation matrixes through head pose parameters, wherein the product of the three basic rotation matrixes is a rotation matrix, and then converting the rotation matrix into three Euler angles corresponding to the position and the pose of the robot head;
S4, mapping the position of the predicted facial expression value within the range of the original prediction data set into the target data set of the robot facial control steering engine, so as to obtain the rotation angle of the robot facial control steering engine;
And S5, finally, controlling the robot head pose and the robot facial control steering engines to rotate according to the Euler angles and the rotation angles.
Further, in S3, the calculation of the robot head pose is specifically:
according to the depth model mediapipe, the head pose parameters are alpha, beta and gamma respectively, and three basic rotation matrixes are established as follows:

$$R_x(\alpha)=\begin{bmatrix}1&0&0\\ 0&\cos\alpha&-\sin\alpha\\ 0&\sin\alpha&\cos\alpha\end{bmatrix},\qquad R_y(\beta)=\begin{bmatrix}\cos\beta&0&\sin\beta\\ 0&1&0\\ -\sin\beta&0&\cos\beta\end{bmatrix},\qquad R_z(\gamma)=\begin{bmatrix}\cos\gamma&-\sin\gamma&0\\ \sin\gamma&\cos\gamma&0\\ 0&0&1\end{bmatrix}$$

Then the product of the three basic rotation matrixes is the rotation matrix $RM=R_z(\gamma)\,R_y(\beta)\,R_x(\alpha)$, and finally, according to the rotation matrix, the three Euler angles corresponding to the pose of the robot head are obtained by conversion.
Further, in S1, face images are circularly acquired by the image acquisition device.
Further, the Euler angle values and rotation angle values obtained in S3 and S4 are subjected to filtering processing.
Further, the filtering processing is mean filtering: the angle values generated at times adjacent to the time of the currently processed angle value are averaged, and the obtained average value replaces the currently processed angle value to regulate the robot head pose or the rotation of the robot facial control steering engine, specifically:

$$\bar{\theta}_t=\frac{1}{N}\sum_{i=0}^{N-1}\theta_{t+i}$$

wherein $\bar{\theta}_t$ is the stable angle value obtained after the filtering processing, $\theta_t$ is the currently processed angle value, $N$ is the size of the neighborhood window selected on the basis of $\theta_t$, and $\theta_{t+i}$ are the angle values obtained at adjacent times.
Further, the value of $N$ is 5 to 10.
Further, 52 facial key points in the face image are identified through the depth model mediapipe, 3 head pose parameters are obtained, and then the values of the facial expression predicted by the facial key points are also 52.
The method further comprises selecting, in the blender software, 52 morphological keys and 3 head morphological poses corresponding to the face of the virtual character and binding them to the corresponding positions of the character's head; a mapping relation is then formed between the 3 head pose parameters obtained by recognizing the face image through the depth model mediapipe and the 3 head morphological poses in the blender software, and between the 52 facial expression values predicted from the facial key points and the 52 morphological keys selected in the blender software, so that binding control of the 52 morphological keys and the 3 head morphological poses of the character's facial expression is realized.
The method has the advantages that robot expression imitation data for micro-expressions with small dynamic amplitude can be obtained, that a large amount of robot recognition and imitation data can conveniently be obtained by accurately mapping a large number of video expressions onto a robot entity, and that the pose of the whole head is linked so that the expression is more accurate; moreover, the method can be associated with a virtual character to perform synchronous expression recognition and imitation, joint interaction, or a comparison-reference observation test.
Drawings
FIG. 1 is a flow chart of the whole method of the invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that the advantages and features of the present invention can be more easily understood by those skilled in the art, thereby making clear and defining the scope of the present invention.
Examples:
As shown in fig. 1, a facial expression recognition simulation algorithm for a robot includes the steps of:
S1, circularly acquiring face images through an image acquisition device, or extracting them directly from video pictures, and cropping and preprocessing the face images.
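As an illustration of S1, the following is a minimal Python sketch of cyclic acquisition and cropping, assuming OpenCV is used; the Haar-cascade face detector is only one possible way to obtain the crop region and is an assumption, not a requirement of the method.

```python
# Minimal sketch of S1: cyclically read frames (camera or video file) and crop the face region.
# Uses OpenCV; the Haar cascade is only one possible cropping method (illustrative assumption).
import cv2

def acquire_face_images(source=0):
    cap = cv2.VideoCapture(source)  # 0 = default camera, or a video file path
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            yield frame[y:y + h, x:x + w]  # cropped face image handed to S2
    cap.release()
```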
S2, inputting the cropped face image into the depth model mediapipe (an open-source cross-platform framework developed by Google, mainly used for building multimedia processing and machine learning applications), identifying facial key points in the face image through the depth model mediapipe, obtaining head pose parameters, and predicting the values of the facial expression from the facial key points.
The depth model mediapipe identifies 52 facial key points in the face image, 3 head pose parameters are obtained, and then the facial expression values predicted by the facial key points are also 52.
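As a concrete sketch of S2, the snippet below obtains the 52 blendshape scores (used here as the predicted facial expression values) and the facial transformation matrix (from which the head pose parameters can be derived) from MediaPipe's Face Landmarker task. The model file name face_landmarker.task and the single-face configuration are assumptions; the option and field names follow the public MediaPipe Tasks API.

```python
# Sketch of S2: obtain 52 expression values and a head transformation matrix with mediapipe.
# Assumes the MediaPipe Tasks API and a downloaded 'face_landmarker.task' model file.
import cv2
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

options = vision.FaceLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,                  # 52 expression values in [0, 1]
    output_facial_transformation_matrixes=True,    # 4x4 head pose transform
    num_faces=1)
landmarker = vision.FaceLandmarker.create_from_options(options)

def predict_expression(bgr_face):
    """Return ({blendshape name: value}, 4x4 transformation matrix) for one cropped face."""
    rgb = cv2.cvtColor(bgr_face, cv2.COLOR_BGR2RGB)  # OpenCV is BGR; mediapipe expects RGB
    image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
    result = landmarker.detect(image)
    if not result.face_blendshapes:
        return None, None
    expr = {c.category_name: c.score for c in result.face_blendshapes[0]}
    return expr, result.facial_transformation_matrixes[0]
```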
And S3, calculating the robot head pose: three basic rotation matrixes are established from the head pose parameters, their product is taken as the rotation matrix, and the rotation matrix is then converted into the three Euler angles corresponding to the robot head pose. The pose of the whole head is thereby linked, so that the expression is more accurate and accurate mapping of the subsequent facial expression values is facilitated.
The calculation of the robot head pose in S3 is specifically:
according to the depth model mediapipe, the head pose parameters are alpha, beta and gamma respectively, and three basic rotation matrixes are established as follows:

$$R_x(\alpha)=\begin{bmatrix}1&0&0\\ 0&\cos\alpha&-\sin\alpha\\ 0&\sin\alpha&\cos\alpha\end{bmatrix},\qquad R_y(\beta)=\begin{bmatrix}\cos\beta&0&\sin\beta\\ 0&1&0\\ -\sin\beta&0&\cos\beta\end{bmatrix},\qquad R_z(\gamma)=\begin{bmatrix}\cos\gamma&-\sin\gamma&0\\ \sin\gamma&\cos\gamma&0\\ 0&0&1\end{bmatrix}$$

Then the product of the three basic rotation matrixes is the rotation matrix $RM=R_z(\gamma)\,R_y(\beta)\,R_x(\alpha)$, and finally, according to the rotation matrix, the three Euler angles corresponding to the pose of the robot head are obtained by conversion.
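The construction of the rotation matrix RM from the head pose parameters can be written compactly; the following is a minimal numpy sketch of the formulas above, using the Z-Y-X multiplication order that the Euler-angle conversion below assumes.

```python
# Sketch of S3 (first half): build the three basic rotation matrices and their product RM.
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """alpha, beta, gamma: head pose parameters (radians) about the X, Y and Z axes."""
    rx = np.array([[1, 0, 0],
                   [0, np.cos(alpha), -np.sin(alpha)],
                   [0, np.sin(alpha),  np.cos(alpha)]])
    ry = np.array([[ np.cos(beta), 0, np.sin(beta)],
                   [0, 1, 0],
                   [-np.sin(beta), 0, np.cos(beta)]])
    rz = np.array([[np.cos(gamma), -np.sin(gamma), 0],
                   [np.sin(gamma),  np.cos(gamma), 0],
                   [0, 0, 1]])
    return rz @ ry @ rx  # Z-Y-X order, matching the Euler-angle extraction below
```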
The rotation matrix is converted into the three Euler angles corresponding to the robot head pose generally as follows:

If the obtained rotation matrix is written as

$$RM=\begin{bmatrix}r_{11}&r_{12}&r_{13}\\ r_{21}&r_{22}&r_{23}\\ r_{31}&r_{32}&r_{33}\end{bmatrix}$$

the panning (yaw) rotation angle $\psi$ is calculated as:

$$\psi=\operatorname{atan2}(r_{21},\,r_{11})$$

If $\cos\theta=0$, the value of $\psi$ needs to be determined according to the specific rotation condition.

The pitch angle $\theta$ is calculated as:

$$\theta=-\arcsin(r_{31})$$

and its value range is $[-\pi/2,\,\pi/2]$.

The head-deflection (roll) angle $\phi$ is calculated as:

$$\phi=\operatorname{atan2}(r_{32},\,r_{33})$$

If $\cos\theta=0$, the value of $\phi$ likewise needs to be determined according to the specific rotation condition.

If the rotation order is different, the formulas for calculating the Euler angles will also be different; the calculation above uses the Z-Y-X order, and for the X-Y-Z order the calculation process and formulas change accordingly. In practical application, the Euler angles also suffer from the gimbal lock problem: when the pitch angle $\theta=\pm 90^{\circ}$, a degree of freedom is lost, which requires special attention in conversion and use.
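The Z-Y-X Euler-angle extraction described above translates directly into code. The sketch below handles the gimbal-lock case (cos θ ≈ 0) by fixing the roll angle to zero, which is one common convention rather than the only possible choice.

```python
# Sketch of S3 (second half): convert a rotation matrix to Z-Y-X Euler angles.
import numpy as np

def matrix_to_euler_zyx(rm, eps=1e-6):
    """Return (yaw psi, pitch theta, roll phi) in radians for a 3x3 rotation matrix rm."""
    theta = -np.arcsin(np.clip(rm[2, 0], -1.0, 1.0))    # pitch, in [-pi/2, pi/2]
    if abs(np.cos(theta)) > eps:
        psi = np.arctan2(rm[1, 0], rm[0, 0])             # yaw (panning rotation)
        phi = np.arctan2(rm[2, 1], rm[2, 2])             # roll (head deflection)
    else:
        # Gimbal lock: yaw and roll are coupled; fix roll = 0 (one common convention).
        phi = 0.0
        psi = np.arctan2(-rm[0, 1], rm[1, 1])
    return psi, theta, phi
```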
S4, mapping the position of the predicted facial expression value within the range of the original prediction data set into the target data set of the robot facial control steering engine, so as to obtain the rotation angle of the robot facial control steering engine. The minimum and maximum of the predicted facial expression values of the face image are 0 and 1 respectively, i.e. the limiting value range is [0, 1]. Assuming that the rotation limit range of the steering engine is 0-90 degrees, when the actually predicted facial expression value is 0.1 the rotation angle of the robot facial control steering engine is 9 degrees, when the actually predicted facial expression value is 0.5 the rotation angle is 45 degrees, and so on. This mapping can obtain robot expression imitation data for micro-expressions with small dynamic amplitude and has low requirements on picture quality, which makes it convenient to extract face images from a large number of existing rich expressions, such as videos or film and television dramas, and accurately map them onto a robot entity to obtain a large amount of robot recognition and imitation data; and since the head pose is determined first, the facial expression can be accurately extracted and expressed.
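The linear mapping of S4 can be sketched in a few lines; the 0-90 degree range is the example limit from the text and would be replaced by the actual per-servo limits of a given robot.

```python
# Sketch of S4: map predicted expression values in [0, 1] to servo rotation angles.
def map_to_servo_angle(value, servo_min=0.0, servo_max=90.0):
    """Linearly map an expression value from [0, 1] to [servo_min, servo_max] degrees."""
    value = min(max(value, 0.0), 1.0)          # clamp to the prediction range
    return servo_min + value * (servo_max - servo_min)

# Examples matching the text: 0.1 -> 9 degrees, 0.5 -> 45 degrees.
assert abs(map_to_servo_angle(0.1) - 9.0) < 1e-9
assert abs(map_to_servo_angle(0.5) - 45.0) < 1e-9
```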
The Euler angle values and rotation angle values obtained in S3 and S4 are respectively filtered to avoid shaking caused by excessively intense movement of the robot. The filtering is mean filtering: the angle values generated at times adjacent to the time of the currently processed angle value are averaged, and the obtained average value replaces the currently processed angle value to regulate the robot head pose or the rotation of the robot facial control steering engine, specifically:
$$\bar{\theta}_t=\frac{1}{N}\sum_{i=0}^{N-1}\theta_{t+i}$$

wherein $\bar{\theta}_t$ is the stable angle value obtained after the filtering processing, $\theta_t$ is the currently processed angle value, $N$ is the size of the neighborhood window selected on the basis of $\theta_t$ and takes a value of 5 to 10, and $\theta_{t+i}$ are the angle values obtained at adjacent times. Taking video processing at 28 frames per second as an example, $N$ takes the value 5. Because angle values at subsequent times are used, a certain delay is introduced; the delay must not be excessive, otherwise the human-computer interaction easily feels noticeably sluggish.
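The mean filtering can be sketched as a sliding window over the stream of angle values; the forward-looking window (the current value plus the following N-1 values) matches the delay behaviour described above, and N = 5 is simply the example value from the text.

```python
# Sketch of the mean filtering: average each angle with the following N-1 values.
from collections import deque

def mean_filter(angles, n=5):
    """Yield the filtered angle for each input angle; introduces a delay of n-1 samples."""
    window = deque(maxlen=n)
    for angle in angles:
        window.append(angle)
        if len(window) == n:
            yield sum(window) / n   # replaces the oldest (currently processed) value

# Example: a jittery servo-angle sequence smoothed with a window of 5.
smoothed = list(mean_filter([45, 47, 44, 46, 45, 60, 44, 45], n=5))
```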
And S5, finally, controlling the robot head pose and the robot facial control steering engines to rotate according to the Euler angles and the rotation angles.
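How S5 reaches the hardware depends entirely on the robot's servo controller. The sketch below assumes a purely hypothetical serial protocol (messages of the form "servo id:angle" sent with pyserial) for illustration only; the port, baud rate and message format are not part of the disclosed method.

```python
# Sketch of S5 with a *hypothetical* serial servo protocol: "<servo_id>:<angle>\n".
# The port, baud rate and message format are assumptions, not part of the disclosed method.
import serial

def drive_robot(port, euler_angles, servo_angles):
    """Send head Euler angles (ids 0-2) and facial servo angles (ids 3+) to a controller."""
    with serial.Serial(port, baudrate=115200, timeout=1) as link:
        for servo_id, angle in enumerate(list(euler_angles) + list(servo_angles)):
            link.write(f"{servo_id}:{angle:.2f}\n".encode("ascii"))

# Example call (port name is an assumption):
# drive_robot("/dev/ttyUSB0", euler_angles=(10.0, -5.0, 2.0), servo_angles=[9.0, 45.0])
```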
The invention also provides a facial expression recognition and imitation algorithm for a virtual character. In the blender software (3D computer graphics software), 52 morphological keys and 3 head morphological poses of the face of the virtual character are selected and respectively bound to the corresponding positions of the virtual character's head. A mapping relation is then formed between the 3 head pose parameters obtained by recognizing the face image through the depth model mediapipe and the 3 head morphological poses in the blender software (this mapping is simple and generally a direct input), and between the 52 facial expression values predicted from the facial key points and the 52 morphological keys selected in the blender software, so that binding control of the 52 morphological keys and the 3 head morphological poses of the virtual character's facial expression is realized. The virtual character and the robot entity participate in the expression recognition and imitation in parallel at the same time, so the virtual character can be associated to perform synchronous expression recognition and imitation, joint interaction, or a comparison-reference observation test.
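On the virtual-character side, driving the morphological keys amounts to writing the 52 predicted values into the character's shape keys inside blender. The sketch below uses blender's bpy API; the object names ("Face", "Head") and the assumption that the shape keys carry the blendshape names depend on how the character was rigged and are illustrative only.

```python
# Sketch: drive a blender character's shape keys and head rotation from predicted values.
# Must run inside blender; object names ("Face", "Head") are assumptions about the rig.
import bpy
import math

def apply_expression(expr_values, euler_deg, face_object="Face", head_object="Head"):
    """expr_values: {shape key name: value in [0, 1]}, euler_deg: (x, y, z) in degrees."""
    face = bpy.data.objects[face_object]
    for name, value in expr_values.items():
        key = face.data.shape_keys.key_blocks.get(name)
        if key is not None:
            key.value = value                      # morphological key bound to the face
    head = bpy.data.objects[head_object]
    head.rotation_euler = [math.radians(a) for a in euler_deg]  # 3 head morphological poses
```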
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the spirit and scope of the invention as defined by the appended claims.