CN112766063A - Micro-expression fitting method and system based on displacement compensation - Google Patents
Micro-expression fitting method and system based on displacement compensation
- Publication number
- CN112766063A (application CN202011624238.3A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- image
- face
- feature point
- micro
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Abstract
The invention provides a micro-expression fitting method and system based on displacement compensation. The method comprises the following steps: acquiring a reference image and an emotion image, where the reference image is an image captured when no stimulus source is present and the emotion image is an image captured when a predetermined stimulus source is present; obtaining reference feature points and emotion feature points from the reference image and the emotion image, respectively; compensating the emotion feature points to obtain compensated emotion feature points; calculating facial micro-feature vectors from the position information of the reference feature points and the position information of the compensated emotion feature points, and outputting the facial micro-feature vectors that satisfy a predetermined threshold as facial motion units; and fitting the facial motion units to obtain predicted emotional features.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a micro-expression fitting method based on displacement compensation.
Background
Micro-expression recognition is a non-contact method of analyzing psychological activity: a stimulus source is used to induce facial muscle changes, facial features are extracted by a deep-learning neural network, the resulting features are fused into a unified dimensionality, and psychological features are obtained in combination with psychological semantic analysis. During an interrogation, a non-contact micro-expression recognition method can reduce the difficulty of the interrogation and assist the interrogator in questioning and judgment.
Existing micro-expression recognition methods generally feed the whole face image into a neural network for emotion clustering to obtain seven basic emotions: happiness, sadness, fear, anger, disgust, surprise and contempt. As the number of emotion classes increases, accuracy drops sharply, so such prior-art methods are of very limited assistance during an interrogation. In addition, when the head pose changes, the positions of the facial feature points in the two-dimensional image space of the camera also change, so the accuracy of micro-feature localization decreases as the head pose change grows.
Disclosure of Invention
The invention aims to provide a micro-expression fitting method and system based on displacement compensation, so that the emotional features of a subject can be obtained quickly and accurately even when the captured face image is deflected (i.e., the head is rotated away from the camera).
The invention provides a micro-expression fitting method based on displacement compensation, which comprises the following steps: acquiring a reference image and an emotion image, where the reference image is an image captured when no stimulus source is present and the emotion image is an image captured when a predetermined stimulus source is present; obtaining reference feature points and emotion feature points from the reference image and the emotion image, respectively; compensating the emotion feature points to obtain compensated emotion feature points; calculating facial micro-feature vectors from the position information of the reference feature points and the position information of the compensated emotion feature points, and outputting the facial micro-feature vectors that satisfy a predetermined threshold as facial motion units; and fitting the facial motion units to obtain predicted emotional features.
In an embodiment according to the inventive concept, compensating the emotion feature points to obtain compensated emotion feature points may include: obtaining a head pose based on the emotion feature points; obtaining displacement compensation values of the feature points based on the head pose; and compensating the emotion feature points according to the displacement compensation values to obtain compensated emotion feature points.
In an embodiment of the inventive concept, in obtaining the displacement compensation values of the feature points based on the head pose, the displacement compensation values are obtained using a displacement estimation neural network, which may include: an input layer comprising three neurons, whose input parameters correspond to the rotation angles of the head about the X, Y and Z axes, respectively; a first hidden layer that uses a hyperbolic tangent function as its activation function and compresses the input values to a predetermined range; a second hidden layer that uses an exponential linear function as the activation function of the network model; and an output layer that uses a logistic regression model as its activation function and outputs the displacement compensation values of the feature points corresponding to the emotion feature points.
In an embodiment according to the inventive concept, the method may further include training the displacement estimation neural network with facial pose samples. In the training process, facial pose samples are constructed, each comprising the rotation angles of the head about the X, Y and Z axes and the feature point offset values corresponding to those rotation angles; the displacement estimation neural network is constructed; the network is trained with the facial pose samples, the weight coefficients of all layers being assigned randomly in the first iteration, and the displacement compensation values output by the output layer being compared with the corresponding feature point offset values; and the connection weight coefficients between neurons are adjusted according to the comparison result before the next iteration. The quality of the neural network model is evaluated by a loss function, which is a cross-entropy function; training is terminated when the loss function falls below a preset loss threshold, converges, or the number of iterations reaches a preset count.
In an embodiment according to the inventive concept, the feature point compensation unit is configured to perform the steps of: taking the intersection of the line connecting the inner corners of the two eyes with the perpendicular line through the nose tip as the facial reference point; calculating the horizontal and vertical pixel differences between each feature point and the facial reference point; and compensating the horizontal and vertical pixel differences of each feature point according to the corresponding displacement compensation value to obtain the compensated emotion feature points.
The invention further provides a micro-expression fitting system based on displacement compensation, which comprises: an image acquisition unit for acquiring a reference image and an emotion image, where the reference image is an image captured when no stimulus source is present and the emotion image is an image captured when a predetermined stimulus source is present; a face detection unit that obtains a reference face image and an emotion face image from the reference image and the emotion image, respectively; a feature point extraction unit that obtains reference feature points and emotion feature points from the reference face image and the emotion face image, respectively; a displacement compensation unit for compensating the emotion feature points to obtain compensated emotion feature points; a facial motion acquisition unit for calculating facial micro-feature vectors from the position information of the reference feature points and the position information of the compensated emotion feature points and outputting the facial micro-feature vectors that satisfy a predetermined threshold as facial motion units; and an emotion recognition unit for fitting the facial motion units to obtain predicted emotional features.
In an embodiment according to the inventive concept, the displacement compensation unit includes: a head pose calculation unit for obtaining the head pose from the emotion feature points; a displacement estimation unit for obtaining displacement compensation values of the feature points from the head pose; and a feature point compensation unit for compensating the emotion feature points according to the displacement compensation values to obtain compensated emotion feature points.
Another aspect of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the displacement compensation-based micro-expression fitting method as described above.
Another aspect of the present invention provides a computer apparatus, comprising: a processor; a memory storing a computer program which, when executed by the processor, implements the displacement compensation based micro-expression fitting method as described above.
According to one or more aspects of the invention, the displacement compensation based micro-expression fitting method and system predict emotions from a reference image and an emotion image. Because emotion fitting is performed on facial motion units derived from both the reference image and the emotion image, more accurate emotional features can be obtained.
According to one or more aspects of the invention, the displacement compensation based micro-expression fitting method and system apply displacement compensation to the facial emotion micro-features under different head poses, improving the accuracy of micro-expression recognition.
Drawings
The above and other aspects, features and advantages of the present disclosure will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a multiple feature point based micro-expression fitting method;
FIG. 2 is a flow chart of facial feature point extraction obtained by a facial detection neural network and a facial feature point labeling network;
FIG. 3 is a schematic diagram of a facial micro-feature vector;
FIG. 4 is a flow diagram of generating a recurrent neural network model for fitting of facial motion units from training samples;
FIG. 5 is a block diagram of a system for micro-expression fitting based on multiple feature points;
FIG. 6 is a schematic illustration of head pose and facial fiducial points;
FIG. 7 is a block diagram of a system for displacement-compensated micro-expression fitting;
FIG. 8 is a block diagram of a displacement compensation unit that compensates for emotional feature points; and
FIG. 9 is a schematic structural diagram of a displacement estimation neural network.
Detailed Description
The following detailed description is provided to assist the reader in obtaining a thorough understanding of the methods, devices, and/or systems described herein. Various changes, modifications, and equivalents of the methods, apparatus, and/or systems described herein will, however, be apparent to those of ordinary skill in the art. For example, the order of operations described herein is merely an example and is not limited to the order set forth herein, but rather, variations may be made which will be apparent to those of ordinary skill in the art in addition to operations which must be performed in a particular order. Furthermore, descriptions of features and structures that will be well known to those of ordinary skill in the art may be omitted for the sake of clarity and conciseness. The features described herein may be embodied in different forms and should not be construed as limited to the examples described herein. Rather, the examples described herein have been provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. Examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
FIG. 1 is a flow chart of a multiple feature point based micro-expression fitting method; FIG. 2 is a flow chart of facial feature point extraction obtained by a facial detection neural network and a facial feature point labeling network; FIG. 3 is a schematic diagram of a facial micro-feature vector.
Referring to fig. 1, in step S01 a reference image is acquired and in step S02 an emotion image is acquired, where the reference image refers to an image captured without a stimulus source and the emotion image refers to an image captured with a predetermined stimulus source. Specifically, the reference image may be a face image of the subject acquired separately when no stimulus source or stimulation is present; selecting a face image taken while the subject's emotions are relatively calm as the reference image is more favorable for emotion prediction. The emotion image may be an image selected from a video (e.g., an interrogation video). For example, the emotion image may capture the subject's reaction in the presence of a predetermined stimulus source, which may be, for example, a conversation occurring during the interrogation, presented evidence or testimony, or a particular action of the interrogator. It should be noted that a predetermined stimulus source, rather than an arbitrary one, is chosen so that only the emotion images corresponding to that stimulus source are predicted; this eliminates interference from expressions of the subject that are unrelated to the interrogation process, and also eliminates interference from unconscious or subconscious expressions. The above examples of predetermined stimulus sources are merely examples and are not intended to be limiting. In another embodiment, the reference image and the emotion image may be images selected from the same video according to the scene conditions. In an embodiment, the reference image and the emotion image may be two-dimensional images, or three-dimensional images with depth features captured and synthesized by a dual camera, a multi-camera array, or a depth camera.
In the embodiment of the invention, the input image is preprocessed by a multi-convolutional-neural-network joint model to obtain the facial feature points. The joint model comprises a face detection neural network and a facial feature point labeling network, both improved on the basis of convolutional neural networks. The steps of obtaining facial feature points are explained in detail below.
In step S03, the reference image and the emotion image are respectively input into the face detection neural network for face detection to obtain a reference face image and an emotion face image. Referring to fig. 2, an image containing a human face is input, and the face detection neural network outputs the position coordinates of the upper-left and lower-right corners of the face; the image is then cropped according to these coordinates. The specific structure and training of the face detection neural network are explained below.
Referring to table 1, the face detection neural network employs a lightweight network model (referred to as the box network model, or BoxNet model). The input image of the BoxNet model is 320 pixels wide by 180 pixels high with 3 channels. In an embodiment, the input image may be scaled down from the 1920 × 1080 pixel image of the acquisition camera. The input size given here is merely an example; in another embodiment, the input size may range from 230 × 130 to 400 × 225. The input image should not be too large, to avoid slowing detection through an increased amount of calculation, nor too small, to avoid reducing accuracy.
TABLE 1 BoxNet model Structure Table
The structure of the BoxNet model is: a first convolutional layer Conv1, a second convolutional layer S-Conv1, a first downsampling layer Block1, and a second downsampling layer Block2. The first convolutional layer Conv1 performs a convolution on the input image with a convolution kernel of a predetermined size and a predetermined step size. For example, the convolution kernel size may be 3 × 3, the step size 1, and the number of repetitions 1. The output has a size of 160 × 90 with 16 channels.
The second convolutional layer S-Conv1 may perform a depth separable convolution calculation on the result output by the first convolutional layer by the same convolution kernel as the first convolutional layer Conv1 and a predetermined step size. In this case, the size of the output result of the second convolution layer S-Conv1 is 80 × 45, and the number of channels is 24.
It should be noted that the sizes and step sizes of the convolution kernels of the first convolution layer Conv1 and the second convolution layer S-Conv1 may be appropriately adjusted according to the situation, and are not limited to the case where the convolution kernels and the predetermined step size are the same.
The first convolutional layer Conv1 and the second convolutional layer S-Conv1 may be referred to collectively as First_Conv.
The first downsampling layer Block1 downsamples the result output from the second convolutional layer, and the second downsampling layer Block2 downsamples the result output from the first downsampling layer.
Wherein, first downsampling layer Block1 and second downsampling layer Block2 both include: a first basic unit, the step length of which is 2, the width and height of the output result are half of the width and height of the input, and the number of channels of the output result is twice of the number of channels of the input; and a second basic unit, the step length of which is 1, and the width, height and channel number of the output result are the same as those of the input result.
For example, referring to table 1, in the first downsampling layer Block1, the first basic unit is performed once at step 2. Thus, the output of the first basic unit after one execution is half of its input, i.e., 40 × 23 pixels, and the number of channels is twice the input, i.e., 48. The width, height and number of channels of the output of the second basic cell are the same as the output of the first basic cell. In the second downsampled layer Block2, the first basic unit is performed twice with step 2. Thus, the output of the first basic unit after twice execution is one-fourth of its input, i.e., 10 × 6 pixels, and the number of channels is four times that of the input, i.e., 192. The width, height and number of channels of the output of the second basic cell are the same as the output of the first basic cell.
The fully connected layer FC non-linearly fits the result output by the second downsampling layer Block2 to obtain the face frame corresponding to the input image. For example, the 192-channel result after downsampling is fed into a fully connected layer of 500 neural units and then into a fully connected layer of 4 neural units, and the extracted results (e.g., feature maps) are non-linearly fitted to output the coordinates of the top-left and bottom-right corners of the face frame.
The first downsampling layer Block1 and the second downsampling layer Block2 may be referred to collectively as Second_Conv.
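As a sketch of the structure just described, the following PyTorch code assembles First_Conv (Conv1 plus the depthwise-separable S-Conv1), Second_Conv (Block1 and Block2 built from the two basic units) and the two fully connected layers. The exact implementation of the basic units, the use of stride 2 in Conv1 so that the stated 160 × 90 output is produced, and the pooling before the FC head are assumptions for illustration, not taken from Table 1.

```python
# Hypothetical PyTorch sketch of the BoxNet structure described above.
import torch
import torch.nn as nn

def depthwise_separable(cin, cout, stride):
    # Depthwise 3x3 convolution followed by a pointwise (1x1) convolution.
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class BoxNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.first_conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),  # Conv1: 320x180 -> 160x90, 16 ch
            nn.ReLU(inplace=True),
            depthwise_separable(16, 24, stride=2))     # S-Conv1: -> 80x45, 24 ch
        self.second_conv = nn.Sequential(
            depthwise_separable(24, 48, stride=2),     # Block1, first basic unit
            depthwise_separable(48, 48, stride=1),     # Block1, second basic unit
            depthwise_separable(48, 96, stride=2),     # Block2, first basic unit (1st pass)
            depthwise_separable(96, 192, stride=2),    # Block2, first basic unit (2nd pass)
            depthwise_separable(192, 192, stride=1))   # Block2, second basic unit
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(192, 500), nn.ReLU(inplace=True),
            nn.Linear(500, 4))                         # (x1, y1, x2, y2) of the face frame

    def forward(self, x):                              # x: (N, 3, 180, 320)
        return self.head(self.second_conv(self.first_conv(x)))
```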
In an embodiment, the method further comprises training the face detection neural network on a face detection training set. During training, the number of iterations is controlled by setting parameters (e.g., an epoch parameter), and training terminates when a certain condition is reached. For example, the intersection-over-union coefficient (IOU_accuracy) between the output face frame and the expected face frame can be defined as the criterion for model training, i.e., the ratio of the intersection to the union of the output and expected face frame areas. The intersection-over-union (IoU) threshold of the model is set to 0.1-0.5, preferably 0.3: if the IoU exceeds this threshold, face position detection is regarded as successful and face tracking detection can be performed well. In the embodiment, while the number of iterations is less than 300, the weight coefficients between neurons are adjusted by a back-propagation algorithm and the next iteration is started; when the number of iterations reaches 300, or the intersection-over-union coefficient converges (e.g., no longer rises), training is terminated.
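The intersection-over-union check used as the success criterion can be written in a few lines; the following sketch uses the 0.3 threshold mentioned above, with corner-format boxes as an assumed convention.

```python
# Illustrative IoU check for comparing the output face frame with the expected one.
def iou(box_a, box_b):
    """Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def detection_successful(predicted_box, expected_box, threshold=0.3):
    # Detection counts as successful when IoU exceeds the configured threshold.
    return iou(predicted_box, expected_box) > threshold
```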
In an example embodiment, the BoxNet model may be used for face recognition during interrogation. The ambient light in an interrogation room is stable, with no moving light sources, flicker or changes in brightness; the surrounding walls are of a single color; and the subject sits on a chair, so the range of facial movement during detection is small. Aimed at these characteristics of the interrogation environment, the BoxNet model adopts an optimized network structure that omits the convolutional layers needed to handle complex backgrounds and changing light sources and optimizes the face frame intersection-over-union coefficient. Because the BoxNet model has few parameters and low computational complexity, fast and accurate feature extraction from the image can be achieved with a lightweight backbone network. Compared with the Haar cascade classifier based on OpenCV, the model therefore has a higher detection speed and higher detection accuracy.
Referring back to fig. 1, in step S04 the reference face image and the emotion face image are respectively input into the facial feature point labeling network for facial feature point extraction to obtain reference feature points and emotion feature points. Referring to fig. 2, the feature points corresponding to a face image are obtained by the facial feature point labeling network from the face image generated by the face detection neural network. In an example embodiment according to the present invention, the facial feature point labeling network may be built from shuffle units, and its convolution modules may adopt a ShuffleNet structure. The facial feature point labeling network takes a face image (e.g., the reference face image or the emotion face image) as the input layer and outputs facial feature points (e.g., the reference feature points or the emotion feature points).
The facial feature point labeling network can adopt a high-performance convolutional neural network model in which the input is a 160 × 160 pixel face image and the output is the position information of 68 facial feature points. The input size given here is merely an example; in another embodiment, it may range from 110 × 110 to 210 × 210. Compared with the feature point extraction methods of common computer vision libraries such as DLib, this model has a higher extraction speed and higher accuracy.
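The two networks form a simple pipeline: detect, crop, resize to 160 × 160, then regress the 68 feature points. The following sketch illustrates that flow under the assumption that `box_net` and `landmark_net` are trained modules with the input/output shapes described above; the function name and scaling details are illustrative.

```python
# Hypothetical sketch of the detection -> landmark pipeline described above.
import cv2
import torch

def extract_feature_points(frame_bgr, box_net, landmark_net):
    # Detector input: 320x180, scaled down from the camera frame (e.g. 1920x1080).
    small = cv2.resize(frame_bgr, (320, 180))
    x = torch.from_numpy(small).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    x1, y1, x2, y2 = box_net(x)[0].tolist()              # face frame in 320x180 space
    sx, sy = frame_bgr.shape[1] / 320.0, frame_bgr.shape[0] / 180.0
    face = frame_bgr[int(y1 * sy):int(y2 * sy), int(x1 * sx):int(x2 * sx)]
    face = cv2.resize(face, (160, 160))                  # landmark network input size
    f = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    points = landmark_net(f).reshape(68, 2)              # 68 facial feature points (x, y)
    return points.detach().numpy()
```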
On a host configured with an i7-10700 CPU, the recognition speed of the multi-convolutional-neural-network joint model is about 10 ms, and its test accuracy exceeds 90% on the Face Detection Data Set and Benchmark (FDDB) evaluation.
In an embodiment, the position information of the reference feature point and the position information of the emotional feature point include two-dimensional position information. In the case of using two or more cameras, the position information of the reference feature point and the position information of the emotional feature point may include three-dimensional position information.
Referring back to fig. 1, in step S05, a face micro feature vector is calculated from the position information of the reference feature point and the position information of the emotion feature point, and the face micro feature vector satisfying a predetermined threshold is output as a face motion unit.
Referring to fig. 3, the face micro-feature vector may be defined as follows:
the eyebrow arch micro-feature 201 is an eyebrow arch feature point vertical displacement vector and can represent an eyebrow arch movement unit;
the eyebrow heart microfeatures 202 are eyebrow heart feature point vertical displacement vectors and can represent eyebrow heart movement units;
the eye micro-features 203 are eye feature point vertical displacement vectors, and may represent eye movement units, such as gazelle, squint, closed eye, and the like;
the nose micro features 204 are nose feature point vertical displacement vectors, which may represent nose movement units, e.g., nose folds;
the lip micro-feature 205 is a lip feature point vertical displacement vector, and can represent a lip motion unit;
the mouth angle micro-features 206 are horizontal displacement vectors and vertical displacement vectors of the mouth angle feature points, and may represent mouth angle motion units, such as mouth angle rise, mouth angle fall, mouth angle stretch, mouth angle contraction, and the like.
In the embodiment, taking the eyebrow center micro-feature as an example, when the eyebrow center micro-feature vector points upward and its length is greater than a predetermined threshold (for example, two pixels), this may be interpreted as triggering the eyebrow center motion unit to move upward, thereby converting machine language (changes of facial feature points) into facial semantics.
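A minimal sketch of this conversion, taking the eyebrow center as an example, is shown below. The grouping of landmarks into regions and the specific indices are assumptions; the two-pixel threshold follows the text above.

```python
# Illustrative computation of a micro-feature vector and its conversion into a motion unit.
import numpy as np

BROW_CENTER_IDX = [21, 22]        # hypothetical indices of the eyebrow-center points

def micro_feature(reference_pts, emotion_pts, idx):
    # Mean displacement (emotion - reference) of the selected feature points.
    return (emotion_pts[idx] - reference_pts[idx]).mean(axis=0)

def brow_center_motion_unit(reference_pts, emotion_pts, threshold=2.0):
    dx, dy = micro_feature(reference_pts, emotion_pts, BROW_CENTER_IDX)
    if dy < -threshold:           # image y axis points down, so "up" is negative dy
        return "brow_center_up"
    if dy > threshold:
        return "brow_center_down"
    return None                   # below threshold: no motion unit triggered
```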
Referring back to fig. 1, in step S06 the recurrent neural network model is used to fit the facial motion units to obtain the predicted emotional features. For example, a recurrent neural network can be used to fit the facial motion units to obtain seven basic emotions, such as happiness, sadness, fear, anger, disgust, surprise, contempt, and the like. In addition, because facial motion units are used for emotion fitting, complex emotions such as embarrassment, guilt, shame, pride, and the like can also be obtained.
FIG. 4 is a flow diagram of generating a recurrent neural network model for fitting of facial motion units from training samples.
The micro-expression fitting method based on multiple feature points further comprises the steps of constructing a recurrent neural network (RNN) model and training the RNN model.
Referring to fig. 4, in step S601 an emotion training set and an emotion test set are constructed from training samples, where each training sample includes the facial motion units obtained from a reference image and an emotion image, together with the emotional feature corresponding to those facial motion units. In an embodiment, the training samples may be images acquired in an interrogation environment together with their corresponding emotional features.
In step S602, a recurrent neural network model is constructed.
In step S603, the recurrent neural network model is trained on the emotion training set. For example, in one iteration the recurrent neural network model is trained on 50 input training sets and outputs 600 emotion values; whether the classification requirement is met is judged from the proportion of correctly predicted emotions among these 600 values. For example, the prediction accuracy of the recurrent neural network model may be required to be greater than 90%.
In step S604, when the iteration condition is not satisfied, for example when the prediction accuracy still differs from that of the last iteration (e.g., by more than a predetermined accuracy change threshold) or the predetermined maximum number of iterations has not been reached, the weighting coefficients between neurons are adjusted by back-propagation in step S605 to improve the accuracy of each iteration. In an example embodiment, each layer of the recurrent neural network model can share the same weight parameters, which reduces the amount of computation, improves computation speed, and gives the model stronger generalization. For example, the adjustment may be performed automatically by a function packaged in the program code. After the network parameters are adjusted, the method returns to step S603 for the next round of training.
By continuously iterating, when the iteration condition is satisfied, for example, in step S604, when the prediction accuracy no longer changes (for example, is less than or equal to a predetermined accuracy change threshold) or the number of iterations reaches a predetermined maximum number of iterations, the training of the recurrent neural network model is ended.
In step S606, it is determined whether the prediction accuracy of the recurrent neural network model satisfies the classification requirement. If not, the result is discarded; if the classification requirement is met, the flow proceeds to the next step.
In step S607, the trained recurrent neural network is evaluated on the emotion test set. According to the result of the evaluation, a recurrent neural network satisfying a predetermined accuracy is used for fitting the facial motion units. For example, the trained recurrent neural network can be used to predict the emotional features given by the facial motion units once its detection accuracy on the emotion test set exceeds 90%.
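The patent does not fix the RNN architecture; the sketch below shows one plausible setup in which sequences of motion-unit activations are fitted to seven emotion classes. The hidden size, sequence handling and class count are assumptions for illustration.

```python
# Minimal sketch of fitting facial motion units to emotion classes with a recurrent network.
import torch
import torch.nn as nn

class EmotionFitter(nn.Module):
    def __init__(self, num_motion_units=13, hidden=64, num_emotions=7):
        super().__init__()
        self.rnn = nn.RNN(num_motion_units, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_emotions)

    def forward(self, motion_units):          # (batch, time, num_motion_units)
        _, h_n = self.rnn(motion_units)
        return self.classifier(h_n[-1])       # emotion logits

model = EmotionFitter()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch (8 sequences of 10 frames).
x = torch.randn(8, 10, 13)
y = torch.randint(0, 7, (8,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```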
FIG. 5 is a block diagram of a system for micro-expression fitting based on multiple feature points.
In fig. 5, the system for micro-expression fitting based on multiple feature points includes: an image acquisition unit 100, a face detection unit 200, a feature point extraction unit 300, a facial motion acquisition unit 400, and an emotion recognition unit 500.
The image acquisition unit 100 acquires a reference image and a mood image. The image acquisition unit 100 may be configured to perform steps S01 and S02 described with reference to fig. 1, and thus redundant description is omitted herein.
The face detection unit 200 inputs the reference image and the emotion image into the face detection neural network, respectively, for face detection to obtain the reference face image and the emotion face image, respectively. The face detection unit 200 may be configured to perform step S03 described with reference to fig. 1 and table 1, and thus redundant description is omitted herein.
The feature point extraction unit 300 inputs the reference face image and the emotional face image into the facial feature point labeling network, respectively, to perform facial feature point extraction to obtain reference feature points and emotional feature points, respectively. The feature point extraction unit 300 may be configured to perform step S04 described with reference to fig. 1, and thus redundant description is omitted herein.
The face motion acquisition unit 400 calculates a face micro feature vector from the position information of the reference feature point and the position information of the emotion feature point, and outputs the face micro feature vector satisfying a predetermined threshold as a face motion unit. The face motion acquisition unit 400 may be configured to perform step S05 described with reference to fig. 1 and 2, and thus redundant description is omitted herein.
The emotion recognition unit 500 fits the facial motion unit using a recurrent neural network model to obtain predicted emotional characteristics. In addition, the emotion recognition unit 500 may train the recurrent neural network model according to the training samples. The emotion recognition unit 500 may be configured to perform step S06 described with reference to fig. 1 and steps S601 to S607 of training the recurrent neural network model described with reference to fig. 4, and thus redundant description is omitted herein.
The present invention also provides a displacement compensation based micro-expression fitting method, which includes a step of compensating the emotion feature points to obtain compensated emotion feature points. Although a specific flowchart is not shown, this step is performed after step S04 of fig. 1 and provides the position information of the compensated emotion feature points for the subsequent step S05. Steps S03 and S04 of fig. 1 may also be expressed jointly as obtaining the reference feature points and the emotion feature points from the reference image and the emotion image, respectively. Redundant description is omitted herein.
The step of compensating the emotional feature points to obtain compensated emotional feature points may include: obtaining a head posture based on the emotional feature points; obtaining a displacement compensation value of the feature point based on the head posture; and compensating the emotion characteristic points according to the displacement compensation values to obtain compensated emotion characteristic points.
The step of compensating the emotional feature points to obtain compensated emotional feature points will be described in detail below with reference to fig. 6 to 9.
FIG. 6 is a schematic illustration of head pose and facial fiducial points; FIG. 7 is a block diagram of a system for displacement-compensated micro-expression fitting; fig. 8 is a block diagram of a displacement compensation unit that compensates for emotional feature points; fig. 9 is a schematic structural diagram of a displacement estimation neural network.
Referring to fig. 6, when the head pose is at the reference position, the facing direction of the face may be taken as the Z axis, the horizontal line through the eyes as the X axis, and the vertical direction as the Y axis. This choice is merely for convenience of understanding and is not limiting; in other embodiments, a cylindrical or spherical coordinate system may be chosen. The intersection of the line connecting the inner corners of both eyes with the perpendicular line through the nose tip is taken as the facial reference point 601. The feature points extracted for facial expression analysis do not usually include this intersection, and selecting a point other than the facial feature points as the facial reference point is more advantageous for calculating the offset values of the facial feature points when the head is deflected. Moreover, this intersection is relatively fixed and varies least across expressions, so selecting it as the facial reference point 601 reduces computational complexity and yields a better correction effect.
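The reference point can be computed directly from the landmarks as the foot of the perpendicular from the nose tip onto the inner-canthus line. A short sketch follows; the landmark indices assume the common 68-point layout and are not taken from the patent.

```python
# Sketch of locating the facial reference point 601.
import numpy as np

def facial_reference_point(pts, left_inner_eye=39, right_inner_eye=42, nose_tip=30):
    a, b, n = pts[left_inner_eye], pts[right_inner_eye], pts[nose_tip]
    ab = b - a
    # Project the nose tip onto the inner-canthus line; the foot of the
    # perpendicular is taken as the facial reference point.
    t = np.dot(n - a, ab) / np.dot(ab, ab)
    return a + t * ab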
Referring to fig. 7, a system of micro-expression fitting based on displacement compensation is shown that is substantially the same as or similar to the system of micro-expression fitting shown with reference to fig. 5, except for the displacement compensation unit 350 depicted in the figure. Like reference numerals refer to like elements throughout. Therefore, in order to avoid redundant description, only the differences from fig. 5 will be described herein.
The system for micro-expression fitting based on displacement compensation further comprises a displacement compensation unit 350 for compensating the emotion feature points obtained from an emotion image captured in a deflected state, so that the subsequent facial motion acquisition unit 400 can obtain more accurate facial micro-feature vectors from the compensated emotion feature points.
Fig. 8 shows a specific configuration of the displacement compensation unit 350. The displacement compensation unit 350 may include a head pose calculation unit 351, a displacement estimation unit 353, and a feature point compensation unit 355. Fig. 9 is a schematic structural diagram of the displacement estimation neural network.
The head pose calculation unit 351 obtains the head pose from the emotion feature points. The head pose calculation unit 351 may be implemented as software, hardware, or a combination of both. For example, it may use a software method such as least-squares estimation of the head pose transformation matrix, a 3D convolutional neural network, a recurrent neural network, or an encoder-decoder neural network. In this case, the input parameters of the head pose calculation unit 351 are the facial feature points (for example, the emotion feature points of an emotion image), and the output is the rotation angles of the head about the X, Y, and Z axes. Alternatively, the head pose calculation unit 351 may adopt a hardware approach based on a 3-axis attitude sensor, obtaining the rotation angles of the head about the X, Y, and Z axes from the sensor's output parameters.
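One common software realization of such a pose solver (not specifically named in the text) fits a few 2D landmarks against a generic 3D face model with OpenCV's PnP solver. The 3D model coordinates, the landmark choice and the pinhole camera approximation below are all assumptions for illustration.

```python
# Possible software sketch of the head pose calculation from facial feature points.
import cv2
import numpy as np

MODEL_3D = np.array([              # rough generic face model, in millimetres
    (0.0, 0.0, 0.0),               # nose tip
    (0.0, -63.6, -12.5),           # chin
    (-43.3, 32.7, -26.0),          # left eye outer corner
    (43.3, 32.7, -26.0),           # right eye outer corner
    (-28.9, -28.9, -24.1),         # left mouth corner
    (28.9, -28.9, -24.1)], dtype=np.float64)  # right mouth corner

def head_pose(image_points, frame_width, frame_height):
    focal = frame_width                       # crude pinhole approximation
    camera = np.array([[focal, 0, frame_width / 2],
                       [0, focal, frame_height / 2],
                       [0, 0, 1]], dtype=np.float64)
    ok, rvec, _ = cv2.solvePnP(MODEL_3D, image_points.astype(np.float64), camera, None)
    rot, _ = cv2.Rodrigues(rvec)
    # Euler angles (degrees) about the X, Y and Z axes of fig. 6.
    sy = np.hypot(rot[0, 0], rot[1, 0])
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))
    yaw = np.degrees(np.arctan2(-rot[2, 0], sy))
    roll = np.degrees(np.arctan2(rot[1, 0], rot[0, 0]))
    return pitch, yaw, roll
```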
The displacement estimation unit 353 obtains the displacement compensation values of the feature points from the head pose. Referring to fig. 9, the displacement estimation unit 353 uses a displacement estimation neural network for this purpose. The displacement estimation neural network includes: an input layer comprising three neurons, whose input parameters correspond to the rotation angles of the head about the X, Y, and Z axes; a first hidden layer that uses the hyperbolic tangent (tanh) as its activation function and compresses the input values to a predetermined range, so that further processing is more stable and there is no non-zero-centering problem; a second hidden layer that uses the exponential linear unit (ELU) as its activation function, so that the network converges faster and the problem of dead neurons is avoided; and an output layer that uses a logistic regression model (softmax) as its activation function and outputs the displacement compensation values of the feature points corresponding to the emotion feature points. Because the logistic regression model is well suited to multi-class problems, it can output the feature point displacement compensation values, for example through an output layer of 26 neurons corresponding to the horizontal and vertical compensation values of 13 facial feature points. In an embodiment, the predetermined range may be [-1, 1]. In an embodiment, the displacement compensation values output by the displacement estimation unit 353 may take the facial reference point as their reference, i.e., the offsets of the feature points relative to the facial reference point are output.
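The layer sequence described above (three inputs, tanh hidden layer, ELU hidden layer, softmax output of 26 values) can be sketched as follows. The hidden layer sizes are assumptions, since the text does not specify them, and the softmax output activation simply follows the description.

```python
# Sketch of the displacement estimation neural network of fig. 9.
import torch
import torch.nn as nn

class DisplacementEstimator(nn.Module):
    def __init__(self, hidden1=32, hidden2=64, num_points=13):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden1), nn.Tanh(),       # first hidden layer (tanh)
            nn.Linear(hidden1, hidden2), nn.ELU(),  # second hidden layer (ELU)
            nn.Linear(hidden2, 2 * num_points),
            nn.Softmax(dim=-1))                     # output activation per the text

    def forward(self, angles):                      # angles: (batch, 3) rotation angles
        return self.net(angles)

# usage: compensation values for a head rotated about the Y axis
estimator = DisplacementEstimator()
comp = estimator(torch.tensor([[0.0, 10.0, 0.0]]))  # shape (1, 26)
```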
The feature point compensation unit 355 compensates the emotion feature points according to the displacement compensation values to obtain compensated emotion feature points. For example, the feature point compensation unit 355 may be configured to perform the following steps: taking the intersection of the line connecting the inner corners of the two eyes with the perpendicular line through the nose tip as the facial reference point; calculating the horizontal and vertical pixel differences between each feature point and the facial reference point; and compensating the horizontal and vertical pixel differences of each feature point according to the corresponding displacement compensation value to obtain the compensated emotion feature points.
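These three steps reduce to a small amount of vector arithmetic; a sketch under the assumption that the compensation values are already expressed in pixels and ordered like the tracked feature points is shown below.

```python
# Illustrative application of the displacement compensation to the emotion feature points.
import numpy as np

def compensate_feature_points(emotion_pts, reference_point, compensation):
    # compensation: array of shape (N, 2) with horizontal/vertical corrections
    # for the N tracked feature points (e.g. the 26 network outputs reshaped).
    offsets = emotion_pts - reference_point            # horizontal / vertical pixel differences
    corrected = offsets - np.asarray(compensation).reshape(-1, 2)
    return reference_point + corrected                 # compensated emotion feature points
```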
Referring back to fig. 7, the face motion acquisition unit 400 calculates a face micro feature vector from the position information of the reference feature point and the position information of the compensated emotion feature point, and outputs the face micro feature vector satisfying a predetermined threshold to the emotion recognition unit 500 as a face motion unit.
In an embodiment, the method further comprises training the displacement estimation neural network with facial pose samples. In the training process, facial pose samples are first constructed; each sample comprises the rotation angles of the head about the X, Y, and Z axes and the feature point offset values corresponding to those rotation angles, where the offset values are measured in advance and can be stored in the facial pose samples in the form of a lookup table. Second, the displacement estimation neural network is constructed with the structure described above. Then the network is trained with the facial pose samples: the weight coefficients of each layer are assigned randomly in the first iteration, the displacement compensation values output by the output layer are compared with the corresponding feature point offset values, the connection weight coefficients between neurons are adjusted according to the comparison result, and the next iteration is performed.
In an embodiment, the quality of the neural network model may be evaluated by a loss function, chosen as a cross-entropy function. Because the output layer of the model uses a logistic regression (softmax) activation and the cross-entropy function in effect performs negative log-likelihood estimation over the samples, the cross-entropy function matches the logistic regression model well; from the relative-entropy point of view it also fits the definition of a loss function. For example, the preset count may be 5000 iterations. When the number of iterations reaches 5000, or the loss value falls below a set threshold and converges without further decrease, the iteration stops and the resulting displacement estimation neural network can be used to obtain the displacement compensation values of the feature points.
The method and system for multi-feature point based micro-expression fitting and the system for displacement compensation based micro-expression fitting are described above with reference to the accompanying drawings. The method and the system predict the emotion according to the reference image and the emotion image. Because the face motion unit obtained according to the reference image and the emotion image is adopted to carry out emotion fitting, more accurate emotion characteristics can be obtained.
In addition, the micro-expression fitting method and system based on multiple feature points preprocess the input image with a multi-convolutional-neural-network joint model comprising a face detection neural network and a facial feature point labeling network to obtain the facial feature points. The method and system can be optimized for the interrogation scene, so that high accuracy can be achieved with a lightweight model even under limited computing power.
On the other hand, in the micro-expression fitting method and system based on the multiple feature points, the light-weight network model adopted by the face detection neural network has few parameters and low computational complexity, and rapid and accurate feature extraction of the image can be realized through a light-weight backbone network structure.
In another aspect, the displacement compensation-based micro-expression fitting method and system perform displacement compensation according to the micro-features of the facial emotion under different head postures, so that the accuracy of micro-expression recognition is improved.
The method and system for multi-feature point-based micro-expression fitting and the method and system for displacement compensation-based micro-expression fitting according to exemplary embodiments of the present disclosure have been described above with reference to fig. 1 to 9.
The various elements of the systems shown in fig. 5, 7, and 8 may be configured as software, hardware, firmware, or any combination thereof that performs the specified functions. For example, each unit may correspond to an application-specific integrated circuit, to pure software code, or to a module combining software and hardware. Furthermore, one or more functions implemented by the respective units may also be uniformly executed by components in a physical entity device (e.g., a processor, a client, a server, or the like).
Further, the methods described with reference to fig. 1 to 9 may be implemented by a program (or instructions) recorded on a computer-readable storage medium. For example, according to an exemplary embodiment of the present disclosure, a computer-readable storage medium storing instructions may be provided, wherein the instructions, when executed by at least one computing device, cause the at least one computing device to perform a method of multi-feature point-based micro-expression fitting and a method of displacement-compensated micro-expression fitting according to the present disclosure.
The computer program in the computer-readable storage medium may be executed in an environment deployed in a computer device such as a client, a host, a proxy device, a server, and the like, and it should be noted that the computer program may also be used to perform additional steps other than the above steps or perform more specific processing when the above steps are performed, and the content of the additional steps and the further processing is already mentioned in the description of the related method with reference to fig. 1 to 9, and therefore will not be described again in order to avoid repetition.
It should be noted that each unit in the system based on the micro expression fitting of the multi feature points and the system based on the micro expression fitting of the displacement compensation according to the exemplary embodiments of the present disclosure may completely depend on the execution of the computer program to realize the corresponding function, that is, each unit corresponds to each step in the functional architecture of the computer program, so that the whole system is called by a special software package (e.g., lib library) to realize the corresponding function.
Alternatively, the various elements shown in fig. 5, 7, and 8 may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and executing the corresponding program code or code segments.
For example, exemplary embodiments of the present disclosure may also be implemented as a computing device including a storage component having stored therein a set of computer-executable instructions that, when executed by a processor, perform a method of multi-feature point-based micro-expression fitting and a method of displacement compensation-based micro-expression fitting according to exemplary embodiments of the present disclosure.
In particular, computing devices may be deployed in servers or clients, as well as on node devices in a distributed network environment. Further, the computing device may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the set of instructions.
The computing device need not be a single computing device, but can be any device or collection of circuits capable of executing the above instructions (or instruction sets), individually or jointly. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote systems (e.g., via wireless transmission).
In a computing device, a processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Some operations described in the multi-feature point-based micro-expression fitting method and the displacement compensation-based micro-expression fitting method according to the exemplary embodiments of the present disclosure may be implemented in software, some may be implemented in hardware, and others may be implemented by a combination of software and hardware.
The processor may execute instructions or code stored in one of the memory components, which may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The storage component may be integral to the processor, e.g., RAM or flash memory disposed within an integrated-circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled, or may communicate with each other, for example through an I/O port or a network connection, so that the processor can read files stored in the storage component.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.
The multi-feature point-based micro-expression fitting method and the displacement compensation-based micro-expression fitting method according to exemplary embodiments of the present disclosure may be described in terms of various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated according to imprecise boundaries.
Thus, the method of multi-feature point-based micro-expression fitting and the method of displacement compensation-based micro-expression fitting described with reference to fig. 1 to 9 may be implemented by a system comprising at least one computing device and at least one storage device storing instructions.
According to an exemplary embodiment of the present disclosure, the at least one computing device is a computing device for performing the multi-feature point-based micro-expression fitting method and the displacement compensation-based micro-expression fitting method according to exemplary embodiments of the present disclosure, and the at least one storage device stores a set of computer-executable instructions that, when executed by the at least one computing device, perform the methods described with reference to fig. 1 to 9.
While various exemplary embodiments of the present disclosure have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present disclosure is not limited to the disclosed exemplary embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. Therefore, the protection scope of the present disclosure should be subject to the scope of the claims.
Claims (9)
1. A displacement compensation based micro-expression fitting method, the method comprising:
acquiring a reference image and an emotion image, wherein the reference image refers to an image under the condition that no stimulus source exists, and the emotion image refers to an image under the condition that a preset stimulus source exists;
obtaining reference feature points and emotion feature points from the reference image and the emotion image, respectively;
compensating the emotion feature points to obtain compensated emotion feature points;
calculating facial micro-feature vectors according to the position information of the reference feature points and the position information of the compensated emotion feature points, and outputting the facial micro-feature vectors meeting a preset threshold value as facial motion units; and
fitting the facial motion units to obtain predicted emotional characteristics.
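The overall data flow recited in claim 1 can be pictured with a short sketch. This is an illustration only and not part of the claims: the landmark count, the threshold value, and the weighted-sum stand-in for the fitting step are assumptions chosen for demonstration, not the patented fitting model.

```python
# Illustrative sketch of the claim-1 pipeline (assumptions: 68 landmarks,
# a pixel-magnitude threshold of 1.5, and a toy weighted-sum "fitting" step).
import numpy as np

def micro_feature_vectors(ref_pts: np.ndarray, comp_pts: np.ndarray) -> np.ndarray:
    """Displacement of each compensated emotion feature point relative to the
    corresponding reference feature point, shape (N, 2)."""
    return comp_pts - ref_pts

def facial_motion_units(vectors: np.ndarray, threshold: float = 1.5) -> np.ndarray:
    """Keep only the micro-feature vectors whose magnitude meets the preset
    threshold; these are treated as the active facial motion units."""
    return vectors[np.linalg.norm(vectors, axis=1) >= threshold]

def fit_emotion(motion_units: np.ndarray, weights: np.ndarray) -> float:
    """Stand-in for the emotion fitting step: a weighted sum over the motion
    unit magnitudes yielding a scalar emotion score."""
    mags = np.linalg.norm(motion_units, axis=1)
    return float(np.dot(mags, weights[: mags.size]))

# Tiny usage example with synthetic points
ref = np.random.rand(68, 2) * 100          # reference feature points
emo = ref + np.random.randn(68, 2) * 2     # compensated emotion feature points
units = facial_motion_units(micro_feature_vectors(ref, emo))
score = fit_emotion(units, np.ones(68))
```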
2. The method of claim 1, wherein the step of compensating the emotion feature points to obtain compensated emotion feature points comprises:
obtaining a head posture based on the emotion feature points;
obtaining a displacement compensation value of each feature point based on the head posture; and
compensating the emotion feature points according to the displacement compensation values to obtain the compensated emotion feature points.
3. The method according to claim 2, wherein in the step of obtaining the displacement compensation value of the feature point based on the head posture, a displacement estimation neural network is employed to obtain the displacement compensation value of the feature point, the displacement estimation neural network comprising:
an input layer which comprises three neurons and whose input parameters respectively correspond to the rotation angles of the head around the X axis, the Y axis and the Z axis;
a first hidden layer which adopts a hyperbolic tangent function as its activation function and compresses the input values into a preset range;
a second hidden layer which adopts an exponential linear function as its activation function; and
an output layer which adopts a logistic regression model as its activation function and outputs the displacement compensation values of the feature points corresponding to the emotion feature points.
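For concreteness, the layer structure described in claim 3 can be sketched in a few lines of PyTorch. This is not part of the claims; the hidden-layer widths, the number of feature points, and the choice of PyTorch itself are assumptions, since the claim only fixes the input size, the activation functions, and the output semantics.

```python
# Minimal sketch of the displacement estimation network of claim 3
# (assumed sizes: 68 feature points, 32 hidden units per layer).
import torch
import torch.nn as nn

class DisplacementEstimator(nn.Module):
    def __init__(self, num_feature_points: int = 68, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden),                      # input layer: rotation angles about X, Y, Z
            nn.Tanh(),                                 # first hidden layer: compresses values into (-1, 1)
            nn.Linear(hidden, hidden),
            nn.ELU(),                                  # second hidden layer: exponential linear unit
            nn.Linear(hidden, 2 * num_feature_points),
            nn.Sigmoid(),                              # output layer: logistic (sigmoid) activation
        )

    def forward(self, angles: torch.Tensor) -> torch.Tensor:
        # angles: (batch, 3) -> normalized displacement compensation values, (batch, 2 * N)
        return self.net(angles)
```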
4. The method of claim 3, further comprising: a step of training the displacement estimation neural network by using face posture samples, wherein in the training process,
constructing the face posture samples, wherein each face posture sample comprises rotation angles of the head around the X axis, the Y axis and the Z axis, and the feature point deviation values corresponding to those rotation angles;
constructing the displacement estimation neural network;
training the displacement estimation neural network with the face posture samples, wherein the weight coefficients of all layers are assigned randomly in the first iteration, and the displacement compensation values of the feature points output by the output layer are compared with the corresponding feature point deviation values; and
adjusting the connection weight coefficients among the neurons according to the comparison result and then performing the next iteration,
wherein the quality of the neural network model is evaluated through a loss function which adopts a cross-entropy function, and the training is terminated when the loss function falls below a preset loss threshold, converges, or the number of iterations reaches a preset count.
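A hedged training sketch for claim 4 follows. The optimizer, learning rate, loss threshold, and iteration cap are illustrative choices, and the feature point deviation targets are assumed to be normalized into [0, 1] so that the claimed cross-entropy loss is well defined against the sigmoid outputs.

```python
# Sketch of the training loop of claim 4 (assumptions: Adam optimizer,
# lr=1e-3, deviation targets normalized to [0, 1] so binary cross-entropy applies).
import torch

def train(model, angles, deviations, max_iters=5000, loss_threshold=1e-3, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # layer weights start randomly initialized
    criterion = torch.nn.BCELoss()                            # cross-entropy between outputs and deviation targets
    for step in range(max_iters):
        optimizer.zero_grad()
        pred = model(angles)                 # displacement compensation values from the output layer
        loss = criterion(pred, deviations)   # compare against the feature point deviation values
        loss.backward()                      # gradients used to adjust the connection weights
        optimizer.step()                     # weights for the next iteration
        if loss.item() < loss_threshold:     # terminate once the preset loss threshold is reached
            break
    return model
```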
5. The method according to claim 2, wherein the step of compensating the emotion feature points according to the displacement compensation values comprises:
taking the intersection point of the line connecting the inner canthi of the two eyes with the perpendicular through the nasal tip as a facial reference point;
calculating a horizontal pixel difference and a vertical pixel difference between each feature point and the facial reference point; and
compensating the horizontal pixel difference and the vertical pixel difference of each feature point according to the corresponding displacement compensation value to obtain the compensated emotion feature points.
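One possible reading of the geometric construction in claim 5 is sketched below, not as part of the claims: the facial reference point is taken as the foot of the perpendicular dropped from the nasal tip onto the line joining the inner eye corners, and the displacement compensation values are assumed to be expressed in pixels per feature point.

```python
# Sketch of the facial reference point and pixel-difference compensation of claim 5.
import numpy as np

def facial_reference_point(left_inner_canthus, right_inner_canthus, nose_tip):
    """Intersection of the inner-canthus line with the perpendicular through
    the nasal tip (i.e., the foot of the perpendicular from the nose tip)."""
    a, b, p = map(np.asarray, (left_inner_canthus, right_inner_canthus, nose_tip))
    ab = b - a
    t = np.dot(p - a, ab) / np.dot(ab, ab)
    return a + t * ab

def compensate(points: np.ndarray, ref_point: np.ndarray, displacement: np.ndarray) -> np.ndarray:
    """Horizontal and vertical pixel differences of each feature point to the
    facial reference point, corrected by the per-point displacement values."""
    diffs = points - ref_point        # (N, 2): horizontal and vertical pixel differences
    return diffs - displacement       # compensated emotion feature points (relative to the reference point)
```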
6. A system for micro-expression fitting based on multiple feature points, the system comprising:
an image acquisition unit which acquires a reference image and an emotion image, wherein the reference image refers to an image under the condition that no stimulus source exists, and the emotion image refers to an image under the condition that a preset stimulus source exists;
a face detection unit which obtains a reference face image and an emotion face image from the reference image and the emotion image, respectively;
a feature point extraction unit which obtains reference feature points and emotion feature points from the reference face image and the emotion face image, respectively;
a displacement compensation unit which compensates the emotion feature points to obtain compensated emotion feature points;
a facial motion acquisition unit which calculates facial micro-feature vectors according to the position information of the reference feature points and the position information of the compensated emotion feature points, and outputs the facial micro-feature vectors meeting a preset threshold value as facial motion units; and
an emotion recognition unit which fits the facial motion units to obtain predicted emotional characteristics.
7. The system of claim 6, wherein the displacement compensation unit comprises:
a head posture resolving unit which obtains a head posture according to the emotion feature points;
a displacement estimation unit which obtains displacement compensation values of the feature points according to the head posture; and
a feature point compensation unit which compensates the emotion feature points according to the displacement compensation values to obtain the compensated emotion feature points.
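The head posture resolving unit of claim 7 is commonly realized with a perspective-n-point solve; the sketch below uses OpenCV's solvePnP with generic 3D face-model coordinates and camera intrinsics that are placeholder assumptions, not values taken from the patent.

```python
# Sketch of head posture resolving: rotation angles about X, Y and Z from six 2D landmarks.
import cv2
import numpy as np

# Generic 3D reference positions (mm): nose tip, chin, eye corners, mouth corners.
MODEL_POINTS = np.array([
    [0.0, 0.0, 0.0],        # nose tip
    [0.0, -63.6, -12.5],    # chin
    [-43.3, 32.7, -26.0],   # left eye outer corner
    [43.3, 32.7, -26.0],    # right eye outer corner
    [-28.9, -28.9, -24.1],  # left mouth corner
    [28.9, -28.9, -24.1],   # right mouth corner
], dtype=np.float64)

def head_posture_angles(image_points: np.ndarray, frame_w: int, frame_h: int):
    """image_points: (6, 2) pixel coordinates in the same order as MODEL_POINTS."""
    focal = float(frame_w)                              # crude focal-length guess
    camera_matrix = np.array([[focal, 0, frame_w / 2],
                              [0, focal, frame_h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))                      # assume no lens distortion
    ok, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points.astype(np.float64),
                               camera_matrix, dist_coeffs)
    if not ok:
        raise ValueError("head posture could not be resolved")
    rot, _ = cv2.Rodrigues(rvec)                        # rotation vector -> rotation matrix
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))  # rotation about X
    yaw = np.degrees(np.arcsin(-rot[2, 0]))               # rotation about Y
    roll = np.degrees(np.arctan2(rot[1, 0], rot[0, 0]))   # rotation about Z
    return pitch, yaw, roll
```

These three angles are exactly the inputs expected by the displacement estimation network sketched under claim 3.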
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the displacement compensation-based micro-expression fitting method according to any one of claims 1 to 5.
9. A computer device, characterized in that the computer device comprises:
a processor;
a memory storing a computer program which, when executed by the processor, implements the displacement compensation-based micro-expression fitting method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011624238.3A CN112766063B (en) | 2020-12-31 | 2020-12-31 | Micro-expression fitting method and system based on displacement compensation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112766063A (en) | 2021-05-07 |
CN112766063B (en) | 2024-04-23 |
Family
ID=75698866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011624238.3A | Micro-expression fitting method and system based on displacement compensation | 2020-12-31 | 2020-12-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112766063B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101968853A (en) * | 2010-10-15 | 2011-02-09 | 吉林大学 | Improved immune algorithm based expression recognition method for optimizing support vector machine parameters |
CN104781837A (en) * | 2012-08-15 | 2015-07-15 | 汤姆森路透社全球资源公司 | System and method for forming predictions using event-based sentiment analysis |
CN104778472A (en) * | 2015-04-24 | 2015-07-15 | 南京工程学院 | Extraction method for facial expression feature |
CN107105320A (en) * | 2017-03-07 | 2017-08-29 | 上海交通大学 | A kind of Online Video temperature Forecasting Methodology and system based on user emotion |
CN108198219A (en) * | 2017-11-21 | 2018-06-22 | 合肥工业大学 | Error compensation method for camera calibration parameters for photogrammetry |
KR20180110472A (en) * | 2017-03-29 | 2018-10-10 | 안동과학대학교 산학협력단 | System and method for controlling a stereoscopic emotion lighting |
CN109426765A (en) * | 2017-08-23 | 2019-03-05 | 厦门雅迅网络股份有限公司 | Driving dangerousness mood based reminding method, terminal device and storage medium |
CN110175596A (en) * | 2019-06-04 | 2019-08-27 | 重庆邮电大学 | The micro- Expression Recognition of collaborative virtual learning environment and exchange method based on double-current convolutional neural networks |
CN112101096A (en) * | 2020-08-02 | 2020-12-18 | 华南理工大学 | Suicide emotion perception method based on multi-mode fusion of voice and micro-expression |
US20200401938A1 (en) * | 2019-05-29 | 2020-12-24 | The Board Of Trustees Of The Leland Stanford Junior University | Machine learning based generation of ontology for structural and functional mapping |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2023162132A1 (en) * | 2022-02-25 | 2023-08-31 | | |
JP7700951B2 (en) | 2022-02-25 | 2025-07-01 | Nippon Telegraph and Telephone Corporation | Image conversion device, method and program |
Also Published As
Publication number | Publication date |
---|---|
CN112766063B (en) | 2024-04-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||