CN120108028B - An emotion recognition method based on eye gaze analysis - Google Patents
An emotion recognition method based on eye gaze analysis
- Publication number
- CN120108028B (application CN202510592066.2A)
- Authority
- CN
- China
- Prior art keywords
- vector
- texture
- width
- centrifugal
- eye
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/193—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/54—Extraction of image or video features relating to texture
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Biodiversity & Conservation Biology (AREA)
- Ophthalmology & Optometry (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an emotion recognition method based on eye gaze analysis, belonging to the technical field of image processing. The method first extracts eye images at a plurality of consecutive moments and segments each image to obtain iris, sclera and periocular regions, marking the combined region where the iris and sclera regions are in contact as the eye region. Texture values are then extracted from the sub-regions of the periocular region to obtain texture centrifugal values and construct a texture centrifugal vector; width centrifugal values are obtained from the width of the eye region at each moment to construct a width centrifugal vector; position centrifugal values are calculated from the distance between the eye region and the iris region to construct a position centrifugal vector; and a characteristic change factor is obtained for each vector. Finally, the texture centrifugal vector, the width centrifugal vector, the position centrifugal vector and the characteristic change factors corresponding to the vectors are input into an emotion recognition neural network model to obtain the emotion type. The invention solves the problem of low emotion recognition accuracy in the prior art.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an emotion recognition method based on eye gaze analysis.
Background
The eyes, as windows to the heart, carry a great deal of emotional information. Subtle changes in gaze direction, fixation duration and the periocular muscles are closely connected with emotional state. Under normal conditions, during daily communication and observation of the environment, the eyeballs rotate naturally and flexibly to acquire surrounding visual information; the glance range is wide and the rotation speed is relatively uniform. However, individuals in a low or depressed mood often exhibit marked slowness and restriction in eye rotation: their rotation amplitudes in the horizontal and vertical directions are reduced, and the eyeballs cannot move to the target position as quickly and smoothly as those of a normal person. For example, when reading text or viewing images, such individuals may require more time and effort to accomplish the same visual search task.
Existing eye-based emotion recognition methods identify a person's emotion from the distances among the upper-left, lower-left, upper-right and lower-right eyelid feature points. However, the degree of eye opening differs from person to person, so the distances between feature points cannot truly reflect eye movement, and the recognition accuracy is therefore low.
Disclosure of Invention
Aiming at the above defects in the prior art, the present invention provides an emotion recognition method based on eye gaze analysis, which solves the problem of low emotion recognition accuracy in the prior art.
To achieve the aim of the invention, the technical scheme adopted by the invention is an emotion recognition method based on eye gaze analysis, comprising the following steps:
Extracting eye images at a plurality of continuous moments, and dividing each eye image to obtain an iris area, a sclera area and a periocular area;
Marking the combined region formed by the iris region and the sclera region, which are in positional contact, as the eye region;
Extracting texture values from each subarea in the periocular region, obtaining texture centrifugal values, and constructing a texture centrifugal vector;
Acquiring a width centrifugal value according to the width of the eye area at each moment, and constructing a width centrifugal vector;
Calculating a position centrifugal value according to the distance between the eye area and the iris area, and constructing a position centrifugal vector;
acquiring a characteristic change factor for each vector;
And inputting the texture centrifugal vector, the width centrifugal vector, the position centrifugal vector and the characteristic change factors corresponding to the vectors into the emotion recognition neural network model to obtain emotion types.
Further, the process of segmentation includes:
Carrying out gray scale processing on each eye image, and carrying out clustering processing on pixel points according to gray values to obtain a plurality of clusters;
Calculating the similarity between the average gray level of each cluster and the stored iris gray level value, and finding out the cluster corresponding to the maximum similarity as an iris area;
calculating the similarity between the average gray level of each cluster and the stored sclera gray level value, and finding out the cluster corresponding to the maximum similarity as a sclera area;
Taking the other clusters adjacent to the sclera region and the iris region as the periocular region.
Further, the process of constructing the texture centrifugation vector comprises:
Extracting a contour from the periocular region to obtain a contour map;
Counting the number of contour points in the contour map, and carrying out logarithmic normalization processing to obtain contour density of each periocular region;
calculating the standard deviation of the gray values in the periocular region, and normalizing the standard deviation to obtain the gray fluctuation value of each periocular region;
adding the contour density belonging to the same periocular region and the gray scale fluctuation value to obtain a texture value;
subtracting the average texture value from the texture value of each periocular region to obtain a texture centrifugal value;
and constructing the texture centrifugal value of the periocular region at each moment into a texture centrifugal vector.
Further, the process of constructing the width centrifugal vector includes:
constructing a circumscribed rectangle for the eye region;
Extracting the width of the circumscribed rectangle;
subtracting the average width from the width of each eye region to obtain a width centrifugal value;
The width centrifugal value of the eye region at each time is constructed as a width centrifugal vector.
Further, the process of constructing the positional centrifugal vector is as follows:
Acquiring the eye center position of an eye area at each moment;
acquiring an iris center position of an iris region at each moment;
calculating the distance between the central position of the iris and the central position of the eye to obtain the position centrifugal value of the iris;
The position centrifugal value of the iris at each moment is constructed as a position centrifugal vector.
Further, the formula of the position centrifugal value of the iris is $\gamma = f(x_r - x_e)\sqrt{(x_r - x_e)^2 + (y_r - y_e)^2}$, where $\gamma$ is the position centrifugal value of the iris, $x_r$ is the abscissa of the iris center, $y_r$ is the ordinate of the iris center, $x_e$ is the abscissa of the eye center, $y_e$ is the ordinate of the eye center, and $f$ is a sign function: $f(x_r - x_e)$ is assigned 1 when $x_r - x_e$ is greater than 0, and -1 when $x_r - x_e$ is less than 0.
Further, the process of obtaining the feature change factor includes:
summing absolute values of adjacent element differences in each vector to obtain a total characteristic change value;
According to the group number of the adjacent elements, taking an average value of the total characteristic change value to obtain a characteristic change average value;
and carrying out normalization processing on the characteristic change average value to obtain a characteristic change factor.
Further, the emotion recognition neural network model comprises a texture vector feature extraction module, a width vector feature extraction module, a position vector feature extraction module, a texture feature fusion module, a width feature fusion module, a position feature fusion module and a full-connection layer;
the input end of the texture vector feature extraction module is used for inputting a texture centrifugal vector, the input end of the width vector feature extraction module is used for inputting a width centrifugal vector, and the input end of the position vector feature extraction module is used for inputting a position centrifugal vector;
the first input end of the texture feature fusion module is connected with the output end of the texture vector feature extraction module, and the second input end of the texture feature fusion module is used for inputting texture feature change factors;
the first input end of the width characteristic fusion module is connected with the output end of the width vector characteristic extraction module, and the second input end of the width characteristic fusion module is used for inputting a width characteristic change factor;
the first input end of the position feature fusion module is connected with the output end of the position vector feature extraction module, and the second input end of the position feature fusion module is used for inputting a position feature change factor;
the input end of the full-connection layer is respectively connected with the output end of the texture feature fusion module, the output end of the width feature fusion module and the output end of the position feature fusion module, and the output end of the full-connection layer is used as the output end of the emotion recognition neural network model.
Further, the texture vector feature extraction module is used for extracting texture features from the texture centrifugal vector, the width vector feature extraction module is used for extracting width features from the width centrifugal vector, and the position vector feature extraction module is used for extracting position features from the position centrifugal vector;
the texture feature fusion module is used for fusing the texture features and the texture feature change factor to obtain texture fusion features, the width feature fusion module is used for fusing the width features and the width feature change factor to obtain width fusion features, and the position feature fusion module is used for fusing the position features and the position feature change factor to obtain position fusion features;
the full-connection layer is used for classifying according to texture fusion features, width fusion features and position fusion features to obtain emotion types.
Further, the texture vector feature extraction module, the width vector feature extraction module and the position vector feature extraction module all comprise an LSTM network, a two-dimensional feature construction layer and a CNN network which are sequentially connected;
The LSTM network is used for extracting shallow features from the vectors, the two-dimensional feature construction layer is used for constructing the shallow features into two-dimensional features, and the CNN network is used for extracting deep features from the two-dimensional features.
The beneficial effects of the invention are as follows:
1. According to the invention, the iris region, the sclera region and the periocular region are obtained by extracting and dividing eye images at a plurality of continuous moments, different characteristics of the regions (such as texture values of the periocular region, widths of the eye region and distances between the eye region and the iris region) are analyzed, and the texture centrifugal value, the width centrifugal value and the position centrifugal value are extracted, so that the states of the texture deviating from the mean value, the width deviating from the mean value and the iris position deviating from the center at each moment are reflected, the actual movement condition of eyes can be reflected more comprehensively and accurately, and the limitation caused by the distance of eyelid feature points is avoided, thereby improving the emotion recognition precision.
2. The invention acquires the characteristic change factor for each vector and reflects the change speed of the elements in the vector. The change in emotion is dynamic, not only in the static values of the eye features, but also in the rate of change of these features over time.
3. According to the invention, the texture centrifugal vector, the width centrifugal vector and the position centrifugal vector are processed by adopting the emotion recognition neural network model, and the feature change factors corresponding to the vectors are adopted, so that the accuracy of model emotion classification is further improved.
Drawings
FIG. 1 is a flow chart of a method of emotion recognition based on eye analysis;
fig. 2 is a schematic diagram of a structure of an emotion recognition neural network model.
Detailed Description
The following description of the embodiments is provided to help those skilled in the art understand the present invention, but it should be understood that the invention is not limited to the scope of these embodiments; for those skilled in the art, any invention that makes use of the inventive concept falls within the spirit and scope of the present invention as defined by the appended claims.
As shown in fig. 1, an emotion recognition method based on eye gaze analysis includes the following steps:
Extracting eye images at a plurality of continuous moments, and dividing each eye image to obtain an iris area, a sclera area and a periocular area;
Marking the combined region formed by the iris region and the sclera region, which are in positional contact, as the eye region;
Extracting texture values from each subarea in the periocular region, obtaining texture centrifugal values, and constructing a texture centrifugal vector;
Acquiring a width centrifugal value according to the width of the eye area at each moment, and constructing a width centrifugal vector;
Calculating a position centrifugal value according to the distance between the eye area and the iris area, and constructing a position centrifugal vector;
acquiring a characteristic change factor for each vector;
And inputting the texture centrifugal vector, the width centrifugal vector, the position centrifugal vector and the characteristic change factors corresponding to the vectors into the emotion recognition neural network model to obtain emotion types.
The iris region is located in the central portion of the eye and is a colored circular portion of the eye that is used to control the amount of light entering the eye. The scleral region, generally described as the "white of the eye", is a white region surrounding the iris, covering most of the surface of the eyeball, and serves to protect the internal tissues of the eyeball.
In this embodiment, 10-20 eye images can be acquired every 30 seconds to facilitate observation of eye movement over a period of time.
In this embodiment, the process of segmentation includes:
Carrying out gray scale processing on each eye image, and carrying out clustering processing on pixel points according to gray values to obtain a plurality of clusters;
Calculating the similarity between the average gray level of each cluster and the stored iris gray level value, and finding out the cluster corresponding to the maximum similarity as an iris area;
calculating the similarity between the average gray level of each cluster and the stored sclera gray level value, and finding out the cluster corresponding to the maximum similarity as a sclera area;
Taking the other clusters adjacent to the sclera region and the iris region as the periocular region.
In this embodiment, the similarity $S$ between the average gray level $G_{avg}$ of each cluster and the stored iris gray value or stored sclera gray value $G_s$ is calculated from the absolute difference $\lvert G_{avg} - G_s \rvert$: the smaller the difference, the higher the similarity.
After the gray level map is clustered, the iris area and the sclera area are found according to the similarity of gray level values of the clusters, and then other clusters adjacent to the sclera area and the iris area are used as periocular areas.
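For illustration, a minimal sketch of this segmentation step is given below, assuming OpenCV k-means clustering on gray values; the stored reference gray values (iris_gray, sclera_gray) and the reciprocal-of-difference similarity measure are assumptions introduced here, not values taken from the patent.

```python
import numpy as np
import cv2

def segment_eye(image_bgr, iris_gray=60.0, sclera_gray=200.0, n_clusters=4):
    """Cluster pixels by gray value, then label iris, sclera and periocular regions."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    samples = gray.reshape(-1, 1).astype(np.float32)

    # k-means on gray values: one cluster label per pixel
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(samples, n_clusters, None, criteria,
                                    5, cv2.KMEANS_PP_CENTERS)
    labels = labels.reshape(gray.shape)

    # assumed similarity: larger when the cluster's mean gray is close to the stored value
    def similarity(avg, ref):
        return 1.0 / (1.0 + abs(float(avg) - float(ref)))

    iris_label = int(np.argmax([similarity(c, iris_gray) for c in centers.ravel()]))
    sclera_label = int(np.argmax([similarity(c, sclera_gray) for c in centers.ravel()]))
    iris_mask = labels == iris_label
    sclera_mask = labels == sclera_label

    # periocular region: remaining clusters adjacent to the iris/sclera (eye) region
    eye_mask = (iris_mask | sclera_mask).astype(np.uint8)
    dilated = cv2.dilate(eye_mask, np.ones((3, 3), np.uint8)) > 0
    peri_mask = np.zeros(gray.shape, dtype=bool)
    for lab in range(n_clusters):
        if lab in (iris_label, sclera_label):
            continue
        cluster_mask = labels == lab
        if np.any(cluster_mask & dilated):   # touches the eye region
            peri_mask |= cluster_mask
    return iris_mask, sclera_mask, peri_mask
```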
In this embodiment, the process of constructing the texture centrifugation vector includes:
Extracting a contour from the periocular region to obtain a contour map;
Counting the number of contour points in the contour map, and carrying out logarithmic normalization processing to obtain contour density of each periocular region;
calculating the standard deviation of the gray values in the periocular region, and normalizing the standard deviation to obtain the gray fluctuation value of each periocular region;
adding the contour density belonging to the same periocular region and the gray scale fluctuation value to obtain a texture value;
subtracting the average texture value from the texture value of each periocular region to obtain a texture centrifugal value;
and constructing the texture centrifugal value of the periocular region at each moment into a texture centrifugal vector.
The invention obtains the contour density by extracting the periocular contour and counting the contour points, which reflects the shape and structural characteristics around the eye; the standard deviation of the gray values yields the gray fluctuation value, and the combination of contour density and gray fluctuation jointly characterizes the periocular texture. A person's emotional changes drive the periocular muscles: for example, when the eye opens wide, more wrinkles form in the periocular region and both the contour density and the gray fluctuation value increase. However, because the contours and wrinkling of each person's periocular region differ, the texture centrifugal value is obtained by subtracting the average texture value from the texture value of each periocular region, reflecting how far the texture deviates from the mean. In this embodiment, the average texture value may be the average of the texture values of the periocular region at a plurality of moments, or a daily average of the texture values of a plurality of periocular regions.
In the present embodiment, the specific process of extracting the contour of the periocular region is as follows: each pixel point is taken as the center in turn; when the gray value of the center pixel is the same as all gray values in its neighborhood, the center pixel is marked as a non-contour point and discarded; the remaining pixel points are taken as contour points, giving the contour map.
In the present embodiment, the contour density of each periocular region is obtained by logarithmically normalizing the number of contour points against the size of the region, where $\mu_o$ is the contour density of the periocular region, $M_o$ is the number of contour points in the contour map, and $M_E$ is the number of pixel points in the periocular region.
In the present embodiment, the expression for obtaining the gray fluctuation value of each periocular region is $\theta = \sigma / (G_{max} - G_{min})$, where $\theta$ is the gray fluctuation value of the periocular region, $\sigma$ is the standard deviation of the gray values in the periocular region, $G_{max}$ is the maximum gray value in the periocular region at each time, and $G_{min}$ is the minimum gray value in the periocular region at each time.
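A minimal sketch of the texture value and texture centrifugal vector computation follows; the concrete log normalization $\ln(1+M_o)/\ln(1+M_E)$ is an assumed form of the logarithmic normalization described above, not taken verbatim from the patent.

```python
import numpy as np

def texture_value(gray_region, region_mask):
    """Texture value of one periocular region: contour density + gray fluctuation value."""
    vals = gray_region[region_mask].astype(float)

    # contour points: pixels whose gray value differs from at least one 8-neighbor
    contour = np.zeros_like(region_mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(gray_region, dy, axis=0), dx, axis=1)
            contour |= (gray_region != shifted)
    contour &= region_mask

    m_o = int(contour.sum())                              # number of contour points
    m_e = int(region_mask.sum())                          # number of pixels in the region
    density = np.log1p(m_o) / max(np.log1p(m_e), 1e-6)    # assumed log normalization

    g_max, g_min = vals.max(), vals.min()
    fluctuation = vals.std() / max(g_max - g_min, 1e-6)   # theta = sigma / (Gmax - Gmin)
    return density + fluctuation

def texture_centrifugal_vector(texture_values):
    """Texture centrifugal values: each texture value minus the average texture value."""
    t = np.asarray(texture_values, dtype=float)
    return t - t.mean()
```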
In this embodiment, the process of constructing the width centrifugal vector includes:
constructing a circumscribed rectangle for the eye region;
Extracting the width of the circumscribed rectangle;
subtracting the average width from the width of each eye region to obtain a width centrifugal value;
The width centrifugal value of the eye region at each time is constructed as a width centrifugal vector.
According to the invention, a circumscribed rectangle is constructed for the eye region, and the opening and closing state of the eye is reflected by the width of this rectangle: the larger the width of the circumscribed rectangle, the larger the iris and sclera regions, indicating a greater degree of eye opening. For example, when a person is surprised the eyes open wider and the width of the circumscribed rectangle increases, whereas in a relaxed or tired state the eyes narrow slightly and the width decreases.
Because eye size differs from person to person, the raw width alone cannot accurately reflect the eye state. The average width is therefore subtracted from the width of each eye region to obtain the width centrifugal value, which reflects how far the width deviates from the average and thus characterizes the eye state more accurately.
In this embodiment, the average width may be an average of the widths of the eye regions at a plurality of times, or an average of the widths of a plurality of daily eye regions.
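A minimal sketch of the width centrifugal vector, assuming the eye region is supplied as a binary mask and the circumscribed-rectangle width is taken as its horizontal pixel extent:

```python
import numpy as np

def eye_region_width(eye_mask):
    """Width of the circumscribed rectangle (horizontal extent) of the eye region mask."""
    ys, xs = np.nonzero(eye_mask)
    if xs.size == 0:
        return 0.0
    return float(xs.max() - xs.min() + 1)

def width_centrifugal_vector(widths, reference_width=None):
    """Width centrifugal values: each width minus the average width
    (mean over the sampled moments, or an externally supplied daily average)."""
    w = np.asarray(widths, dtype=float)
    ref = w.mean() if reference_width is None else float(reference_width)
    return w - ref
```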
In this embodiment, the process of constructing the positional centrifugal vector is:
Acquiring the eye center position of an eye area at each moment;
acquiring an iris center position of an iris region at each moment;
calculating the distance between the central position of the iris and the central position of the eye to obtain the position centrifugal value of the iris;
The position centrifugal value of the iris at each moment is constructed as a position centrifugal vector.
In this embodiment, the formula of the position centrifugal value of the iris is $\gamma = f(x_r - x_e)\sqrt{(x_r - x_e)^2 + (y_r - y_e)^2}$, where $\gamma$ is the position centrifugal value of the iris, $x_r$ is the abscissa of the iris center, $y_r$ is the ordinate of the iris center, $x_e$ is the abscissa of the eye center, $y_e$ is the ordinate of the eye center, and $f$ is a sign function: $f(x_r - x_e)$ is assigned 1 when $x_r - x_e$ is greater than 0, and -1 when $x_r - x_e$ is less than 0.
The invention calculates the distance between the eye center and the iris center, which accurately quantifies the position of the iris within the eye region. The sign function f is added so that the value reflects not only the distance but also the direction of the iris relative to the eye center.
In one implementation, the abscissa of the eye center is the average of the abscissas of all pixel points in the eye region, and the ordinate of the eye center is the average of the ordinates of all pixel points in the eye region. In another implementation, the abscissa of the eye center is the average of the abscissas of all pixel points on the outer edge of the eye region, and the ordinate of the eye center is the average of the ordinates of those outer-edge pixel points; that is, the eye center position is the geometric center of the eye region. Likewise, the abscissa of the iris center is the average of the abscissas of all pixel points in the iris region and the ordinate of the iris center is the average of their ordinates, or the abscissa and ordinate of the iris center are the averages of the abscissas and ordinates of the pixel points on the outer edge of the iris region; that is, the iris center position is the geometric center of the iris region. The manner of acquiring the eye center position and the iris center position is not limited in this embodiment. The outer edge refers to the pixel points on the outermost boundary of the eye region or the iris region.
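A minimal sketch of the position centrifugal value, assuming the first implementation above (centers taken as the mean pixel coordinates of each region):

```python
import numpy as np

def position_centrifugal_value(iris_mask, eye_mask):
    """Signed distance between the iris center and the eye center (gamma)."""
    yr, xr = np.nonzero(iris_mask)
    ye, xe = np.nonzero(eye_mask)
    x_r, y_r = xr.mean(), yr.mean()          # iris center
    x_e, y_e = xe.mean(), ye.mean()          # eye center
    dist = np.hypot(x_r - x_e, y_r - y_e)
    sign = 1.0 if x_r - x_e > 0 else -1.0    # sign function f(x_r - x_e)
    return sign * dist
```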
In this embodiment, the process of obtaining the feature change factor includes:
summing absolute values of adjacent element differences in each vector to obtain a total characteristic change value;
According to the group number of the adjacent elements, taking an average value of the total characteristic change value to obtain a characteristic change average value;
and carrying out normalization processing on the characteristic change average value to obtain a characteristic change factor.
The expression for obtaining the characteristic change average value is $e = \frac{1}{T-1}\sum_{t=1}^{T-1}\lvert E_{t+1} - E_t \rvert$, where $e$ is the feature change average value, $E_{t+1}$ is the element at the (t+1)-th time in the vector, $E_t$ is the element at the t-th time in the vector, $\lvert \cdot \rvert$ is the absolute value operation, $T-1$ is the number of groups of adjacent elements, and $T$ is the number of times.
Summing the absolute values of the differences of adjacent elements gives the overall characteristic change; averaging this total over the number of groups of adjacent elements yields the characteristic change average value, which reflects the average change speed; the normalization processing then places the values on a uniform scale for evaluation.
In this embodiment, the normalization of the characteristic change average value is as follows: for the texture centrifugal vector, the characteristic change average value is divided by the average texture value; for the width centrifugal vector, it is divided by the average width; and for the position centrifugal vector, it is divided by the absolute value of the difference between the maximum and minimum values in the position centrifugal vector.
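A minimal sketch of the feature change factor; the `norm` argument stands for the modality-specific normalizer described above (average texture value, average width, or the max-minus-min span of the position centrifugal vector):

```python
import numpy as np

def feature_change_factor(vector, norm):
    """Mean absolute difference of adjacent elements, scaled by the normalizer."""
    v = np.asarray(vector, dtype=float)
    if v.size < 2 or norm == 0:
        return 0.0
    total = np.abs(np.diff(v)).sum()        # total characteristic change value
    mean_change = total / (v.size - 1)      # average over the T-1 adjacent pairs
    return float(mean_change / norm)        # characteristic change factor
```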
As shown in fig. 2, the emotion recognition neural network model comprises a texture vector feature extraction module, a width vector feature extraction module, a position vector feature extraction module, a texture feature fusion module, a width feature fusion module, a position feature fusion module and a full connection layer;
the input end of the texture vector feature extraction module is used for inputting a texture centrifugal vector, the input end of the width vector feature extraction module is used for inputting a width centrifugal vector, and the input end of the position vector feature extraction module is used for inputting a position centrifugal vector;
the first input end of the texture feature fusion module is connected with the output end of the texture vector feature extraction module, and the second input end of the texture feature fusion module is used for inputting texture feature change factors;
the first input end of the width characteristic fusion module is connected with the output end of the width vector characteristic extraction module, and the second input end of the width characteristic fusion module is used for inputting a width characteristic change factor;
the first input end of the position feature fusion module is connected with the output end of the position vector feature extraction module, and the second input end of the position feature fusion module is used for inputting a position feature change factor;
the input end of the full-connection layer is respectively connected with the output end of the texture feature fusion module, the output end of the width feature fusion module and the output end of the position feature fusion module, and the output end of the full-connection layer is used as the output end of the emotion recognition neural network model.
The texture characteristic change factor is a characteristic change factor corresponding to the texture centrifugal vector, the width characteristic change factor is a characteristic change factor corresponding to the width centrifugal vector, and the position characteristic change factor is a characteristic change factor corresponding to the position centrifugal vector.
According to the invention, three time sequence vectors are respectively processed through the three feature extraction modules, the vector features are extracted, and the three feature fusion modules are used for fusing the feature change factors and the vector features, so that the emotion recognition accuracy is improved.
In the embodiment, the texture vector feature extraction module is used for extracting texture features from texture centrifugal vectors, the width vector feature extraction module is used for extracting width features from width centrifugal vectors, and the position vector feature extraction module is used for extracting position features from position centrifugal vectors;
the texture feature fusion module is used for fusing the texture features and the texture feature change factor to obtain texture fusion features, the width feature fusion module is used for fusing the width features and the width feature change factor to obtain width fusion features, and the position feature fusion module is used for fusing the position features and the position feature change factor to obtain position fusion features;
the full-connection layer is used for classifying according to texture fusion features, width fusion features and position fusion features to obtain emotion types.
In this embodiment, the expression of the texture feature fusion module, the width feature fusion module and the position feature fusion module is $y = w_1 g_1 + w_2 g_2$, where $y$ is the output of the feature fusion module, $g_1$ is the input at the first input end of the feature fusion module, $g_2$ is the input at the second input end of the feature fusion module, $w_1$ is the weight of $g_1$, and $w_2$ is the weight of $g_2$.
In the embodiment, the texture vector feature extraction module, the width vector feature extraction module and the position vector feature extraction module all comprise an LSTM network and a full connection layer which are sequentially connected, wherein the LSTM network is used for extracting shallow features from vectors, and the full connection layer is used for carrying out feature mapping on the shallow features to obtain deep features. More preferably, the texture vector feature extraction module, the width vector feature extraction module and the position vector feature extraction module all comprise an LSTM network, a two-dimensional feature construction layer and a CNN network which are sequentially connected;
The LSTM network is used for extracting shallow features from the vectors, the two-dimensional feature construction layer is used for constructing the shallow features into two-dimensional features, and the CNN network is used for extracting deep features from the two-dimensional features. For the texture vector feature extraction module the deep features are texture features, for the width vector feature extraction module the deep features are width features, and for the position vector feature extraction module the deep features are position features.
The operation of the two-dimensional feature construction layer is $H = h^{T}h$, where $H$ is the two-dimensional feature, $h$ is the vector formed by the feature values $h_t$ output by the LSTM network, and $T$ denotes the transposition operation.
Through the LSTM network, the invention effectively extracts the time-sequence information in each vector and captures the dynamic change of the eye features over time. The LSTM output features are then assembled into two-dimensional data, which increases the data volume and makes it convenient to exploit the strong spatial feature extraction capability of the CNN network to mine the spatial relations among the features. The model can thus jointly use temporal and spatial information, represent the eye features more comprehensively, and improve emotion recognition accuracy.
In this embodiment, the emotion types include happiness, sadness, anger, low mood, and the like.
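A minimal PyTorch sketch of the emotion recognition neural network model described above is given below; the hidden sizes, CNN configuration, learnable fusion weights and four-class output are illustrative assumptions rather than values specified in the patent.

```python
import torch
import torch.nn as nn

class VectorBranch(nn.Module):
    """LSTM -> two-dimensional feature H = h^T h -> small CNN -> deep feature vector."""
    def __init__(self, hidden=16, out_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, out_dim), nn.ReLU())

    def forward(self, x):                    # x: (batch, T) centrifugal vector
        _, (h_n, _) = self.lstm(x.unsqueeze(-1))
        h = h_n[-1]                          # shallow feature vector (batch, hidden)
        H = h.unsqueeze(2) @ h.unsqueeze(1)  # two-dimensional feature H = h^T h
        return self.cnn(H.unsqueeze(1))      # deep feature (batch, out_dim)

class EmotionNet(nn.Module):
    """Three branches, weighted fusion with the change factors, shared FC classifier."""
    def __init__(self, feat_dim=32, n_classes=4):
        super().__init__()
        self.branches = nn.ModuleList([VectorBranch(out_dim=feat_dim) for _ in range(3)])
        self.w1 = nn.Parameter(torch.ones(3))   # weights of the vector features g1
        self.w2 = nn.Parameter(torch.ones(3))   # weights of the change factors g2
        self.fc = nn.Linear(3 * feat_dim, n_classes)

    def forward(self, vectors, change_factors):
        # vectors: list of 3 tensors (batch, T); change_factors: (batch, 3)
        fused = []
        for i, (branch, v) in enumerate(zip(self.branches, vectors)):
            g1 = branch(v)                                   # texture/width/position feature
            g2 = change_factors[:, i:i + 1]                  # corresponding change factor
            fused.append(self.w1[i] * g1 + self.w2[i] * g2)  # y = w1*g1 + w2*g2
        return self.fc(torch.cat(fused, dim=1))              # emotion class logits
```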
According to the invention, the iris region, the sclera region and the periocular region are obtained by extracting and dividing eye images at a plurality of continuous moments, different characteristics of the regions (such as texture values of the periocular region, widths of the eye region and distances between the eye region and the iris region) are analyzed, and the texture centrifugal value, the width centrifugal value and the position centrifugal value are extracted, so that the states of the texture deviating from the mean value, the width deviating from the mean value and the iris position deviating from the center at each moment are reflected, the actual movement condition of eyes can be reflected more comprehensively and accurately, and the limitation caused by the distance of eyelid feature points is avoided, thereby improving the emotion recognition precision.
The invention acquires the characteristic change factor for each vector and reflects the change speed of the elements in the vector. The change in emotion is dynamic, not only in the static values of the eye features, but also in the rate of change of these features over time.
According to the invention, the texture centrifugal vector, the width centrifugal vector and the position centrifugal vector are processed by adopting the emotion recognition neural network model, and the feature change factors corresponding to the vectors are adopted, so that the accuracy of model emotion classification is further improved.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510592066.2A CN120108028B (en) | 2025-05-09 | 2025-05-09 | An emotion recognition method based on eye gaze analysis |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510592066.2A CN120108028B (en) | 2025-05-09 | 2025-05-09 | An emotion recognition method based on eye gaze analysis |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN120108028A CN120108028A (en) | 2025-06-06 |
| CN120108028B true CN120108028B (en) | 2025-07-18 |
Family
ID=95890316
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510592066.2A Active CN120108028B (en) | 2025-05-09 | 2025-05-09 | An emotion recognition method based on eye gaze analysis |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120108028B (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113343943A (en) * | 2021-07-21 | 2021-09-03 | 西安电子科技大学 | Eye image segmentation method based on sclera region supervision |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10909363B2 (en) * | 2019-05-13 | 2021-02-02 | Fotonation Limited | Image acquisition system for off-axis eye images |
| CN112163456B (en) * | 2020-08-28 | 2024-04-09 | 北京中科虹霸科技有限公司 | Identity recognition model training method, testing method, recognition method and device |
- 2025-05-09: CN application CN202510592066.2A, patent CN120108028B/en, status Active
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113343943A (en) * | 2021-07-21 | 2021-09-03 | 西安电子科技大学 | Eye image segmentation method based on sclera region supervision |
Also Published As
| Publication number | Publication date |
|---|---|
| CN120108028A (en) | 2025-06-06 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Amrutha et al. | ML based sign language recognition system | |
| CN106960202B (en) | Smiling face identification method based on visible light and infrared image fusion | |
| Tome et al. | Facial soft biometric features for forensic face recognition | |
| US6611613B1 (en) | Apparatus and method for detecting speaking person's eyes and face | |
| US20160371539A1 (en) | Method and system for extracting characteristic of three-dimensional face image | |
| CN102013011B (en) | Front-face-compensation-operator-based multi-pose human face recognition method | |
| CN109271930B (en) | Micro-expression recognition method, device and storage medium | |
| CN112818899B (en) | Face image processing method, device, computer equipment and storage medium | |
| CN108062543A (en) | A kind of face recognition method and device | |
| WO2021196721A1 (en) | Cabin interior environment adjustment method and apparatus | |
| CN108629336A (en) | Face value calculating method based on human face characteristic point identification | |
| CN111460950B (en) | Cognitive distraction method based on head-eye evidence fusion in natural driving conversation behavior | |
| CN110543848B (en) | Driver action recognition method and device based on three-dimensional convolutional neural network | |
| CN113920575A (en) | Facial expression recognition method and device and storage medium | |
| CN110008920A (en) | Research on facial expression recognition method | |
| Jacintha et al. | A review on facial emotion recognition techniques | |
| CN116645717B (en) | A micro-expression recognition method and system based on PCANet+ and LSTM | |
| RU2768797C1 (en) | Method and system for determining synthetically modified face images on video | |
| CN120108028B (en) | An emotion recognition method based on eye gaze analysis | |
| CN107977622B (en) | Eye state detection method based on pupil characteristics | |
| Kim et al. | Facial landmark extraction scheme based on semantic segmentation | |
| Pathak et al. | Multimodal eye biometric system based on contour based E-CNN and multi algorithmic feature extraction using SVBF matching | |
| Lin et al. | A gender classification scheme based on multi-region feature extraction and information fusion for unconstrained images | |
| Barra et al. | F-FID: fast fuzzy-based iris de-noising for mobile security applications | |
| CN110688872A (en) | Lip-based person identification method, device, program, medium, and electronic apparatus |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |