
CN120108028B - An emotion recognition method based on eye gaze analysis - Google Patents

An emotion recognition method based on eye gaze analysis

Info

Publication number
CN120108028B
CN120108028B (application CN202510592066.2A)
Authority
CN
China
Prior art keywords
vector
texture
width
centrifugal
eye
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510592066.2A
Other languages
Chinese (zh)
Other versions
CN120108028A (en)
Inventor
高云龙
胡炜
马文越
程露红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Zhilan Health Co ltd
Original Assignee
Hangzhou Zhilan Health Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Zhilan Health Co ltd filed Critical Hangzhou Zhilan Health Co ltd
Priority to CN202510592066.2A priority Critical patent/CN120108028B/en
Publication of CN120108028A publication Critical patent/CN120108028A/en
Application granted granted Critical
Publication of CN120108028B publication Critical patent/CN120108028B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/54Extraction of image or video features relating to texture
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an emotion recognition method based on eye analysis, belonging to the technical field of image processing. The method first extracts eye images at a plurality of continuous moments and segments them to obtain iris, sclera and periocular regions, marking the contiguous region formed by the iris and sclera as the eye region. It then extracts texture values from all sub-regions of the periocular region to obtain texture centrifugal values and construct a texture centrifugal vector; obtains width centrifugal values from the width of the eye region at each moment to construct a width centrifugal vector; calculates position centrifugal values from the distance between the eye region and the iris region to construct a position centrifugal vector; and obtains a feature change factor for each vector. Finally, the texture centrifugal vector, the width centrifugal vector, the position centrifugal vector and the feature change factors corresponding to the vectors are input into an emotion recognition neural network model to obtain the emotion type. The invention solves the problem of low emotion recognition accuracy in the prior art.

Description

Emotion recognition method based on eye analysis
Technical Field
The invention relates to the technical field of image processing, in particular to an emotion recognition method based on eye analysis.
Background
The eyes, as windows to the soul, carry a great deal of emotional information. Subtle changes in the direction of eye rotation, the duration of fixation, and the periocular muscles are closely connected with emotional state. Under normal conditions, during daily communication and observation of the environment, the eyeballs rotate naturally and flexibly to acquire surrounding visual information; the glance range is wide and the rotation speed is relatively uniform. However, individuals in a low mood or depressed state often exhibit significant sluggishness and limitation in eye rotation. Their rotation amplitudes in the horizontal and vertical directions are reduced, and the eyeballs struggle to move to a target position as quickly and smoothly as those of a healthy person; for example, when reading text or viewing images, more time and effort may be required to accomplish the same visual search task.
The existing method of recognizing emotion through the eyes identifies a person's emotion according to the distances among upper-left-eyelid, lower-left-eyelid, upper-right-eyelid and lower-right-eyelid feature points. However, the degree of eye opening differs from person to person, and the distances between feature points cannot truly reflect the movement of the eyes, so the existing method suffers from low emotion recognition accuracy.
Disclosure of Invention
Aiming at the above defects in the prior art, the present invention provides an emotion recognition method based on eye analysis, which solves the problem of low emotion recognition accuracy in the prior art.
To achieve the aim of the invention, the technical scheme adopted by the invention is an emotion recognition method based on eye analysis, comprising the following steps:
Extracting eye images at a plurality of continuous moments, and dividing each eye image to obtain an iris area, a sclera area and a periocular area;
Marking the combined region formed by the iris region and the sclera region, which are in positional contact, as the eye region;
Extracting texture values from each subarea in the periocular region, obtaining texture centrifugal values, and constructing a texture centrifugal vector;
Acquiring a width centrifugal value according to the width of the eye area at each moment, and constructing a width centrifugal vector;
Calculating a position centrifugal value according to the distance between the eye area and the iris area, and constructing a position centrifugal vector;
acquiring a characteristic change factor for each vector;
And inputting the texture centrifugal vector, the width centrifugal vector, the position centrifugal vector and the characteristic change factors corresponding to the vectors into the emotion recognition neural network model to obtain emotion types.
Further, the process of segmentation includes:
Carrying out gray scale processing on each eye image, and carrying out clustering processing on pixel points according to gray values to obtain a plurality of clusters;
Calculating the similarity between the average gray level of each cluster and the stored iris gray level value, and finding out the cluster corresponding to the maximum similarity as an iris area;
calculating the similarity between the average gray level of each cluster and the stored sclera gray level value, and finding out the cluster corresponding to the maximum similarity as a sclera area;
Other clusters adjacent to the sclera region and the iris region were considered as periocular regions.
Further, the process of constructing the texture centrifugation vector comprises:
Extracting a contour from the periocular region to obtain a contour map;
Counting the number of contour points in the contour map, and carrying out logarithmic normalization processing to obtain contour density of each periocular region;
calculating standard deviation of each gray value in the periocular region, and normalizing the standard deviation to obtain gray fluctuation value of each periocular region;
adding the contour density belonging to the same periocular region and the gray scale fluctuation value to obtain a texture value;
subtracting the average texture value from the texture value of each periocular region to obtain a texture centrifugal value;
and constructing the texture centrifugal value of the periocular region at each moment into a texture centrifugal vector.
Further, the process of constructing the width centrifugal vector includes:
constructing an external rectangle for the eye region;
Extracting the width of the external rectangle;
subtracting the average width from the width of each eye region to obtain a width centrifugal value;
The width centrifugal value of the eye region at each time is constructed as a width centrifugal vector.
Further, the process of constructing the positional centrifugal vector is as follows:
Acquiring the eye center position of an eye area at each moment;
acquiring an iris center position of an iris region at each moment;
calculating the distance between the central position of the iris and the central position of the eye to obtain the position centrifugal value of the iris;
The position centrifugal value of the iris at each moment is constructed as a position centrifugal vector.
Further, the formula of the position centrifugal value of the iris is: γ = f(x_r - x_e) · √((x_r - x_e)² + (y_r - y_e)²), where γ is the position centrifugal value of the iris, x_r is the abscissa of the iris center, y_r is the ordinate of the iris center, x_e is the abscissa of the eye center, y_e is the ordinate of the eye center, and f is a sign function: f(x_r - x_e) is assigned 1 when x_r - x_e is greater than 0, and assigned -1 when x_r - x_e is less than 0.
Further, the process of obtaining the feature change factor includes:
summing absolute values of adjacent element differences in each vector to obtain a total characteristic change value;
According to the group number of the adjacent elements, taking an average value of the total characteristic change value to obtain a characteristic change average value;
and carrying out normalization processing on the characteristic change average value to obtain a characteristic change factor.
Further, the emotion recognition neural network model comprises a texture vector feature extraction module, a width vector feature extraction module, a position vector feature extraction module, a texture feature fusion module, a width feature fusion module, a position feature fusion module and a full-connection layer;
the input end of the texture vector feature extraction module is used for inputting a texture centrifugal vector, the input end of the width vector feature extraction module is used for inputting a width centrifugal vector, and the input end of the position vector feature extraction module is used for inputting a position centrifugal vector;
the first input end of the texture feature fusion module is connected with the output end of the texture vector feature extraction module, and the second input end of the texture feature fusion module is used for inputting texture feature change factors;
the first input end of the width characteristic fusion module is connected with the output end of the width vector characteristic extraction module, and the second input end of the width characteristic fusion module is used for inputting a width characteristic change factor;
the first input end of the position feature fusion module is connected with the output end of the position vector feature extraction module, and the second input end of the position feature fusion module is used for inputting a position feature change factor;
the input end of the full-connection layer is respectively connected with the output end of the texture feature fusion module, the output end of the width feature fusion module and the output end of the position feature fusion module, and the output end of the full-connection layer is used as the output end of the emotion recognition neural network model.
the texture vector feature extraction module is used for extracting texture features from the texture centrifugal vector, the width vector feature extraction module is used for extracting width features from the width centrifugal vector, and the position vector feature extraction module is used for extracting position features from the position centrifugal vector;
the texture feature fusion module is used for fusing the texture features and the texture feature change factor to obtain texture fusion features, the width feature fusion module is used for fusing the width features and the width feature change factor to obtain width fusion features, and the position feature fusion module is used for fusing the position features and the position feature change factor to obtain position fusion features;
the full-connection layer is used for classifying according to texture fusion features, width fusion features and position fusion features to obtain emotion types.
Further, the texture vector feature extraction module, the width vector feature extraction module and the position vector feature extraction module all comprise an LSTM network, a two-dimensional feature construction layer and a CNN network which are sequentially connected;
The LSTM network is used for extracting shallow features from the vector, the two-dimensional feature construction layer is used for constructing the shallow features into two-dimensional features, and the CNN network is used for extracting deep features from the two-dimensional features.
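To make the wiring of the model concrete, the following is a shape-only NumPy sketch of one vector feature extraction module, the fusion step, and the full-connection layer. The random projection, the 2x2 mean pooling, and the class count of 4 are stand-ins chosen for illustration, not the patent's trained LSTM/CNN; only the data flow (vector → shallow features → 2-D map → deep features → fused with the change factor → classification) follows the description.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_module(vec, hidden=16):
    """Stand-in for one vector feature extraction module: a fixed random
    projection plays the role of the LSTM's shallow features, which are
    reshaped into a 2-D map (the two-dimensional feature construction layer)
    and passed through a 2x2 mean pooling standing in for the CNN."""
    w = rng.standard_normal((hidden, len(vec)))
    shallow = np.tanh(w @ vec)                 # "LSTM": shallow features (16,)
    grid = shallow.reshape(4, 4)               # two-dimensional feature construction
    # "CNN": 2x2 block means -> deep features (4,)
    deep = grid.reshape(2, 2, 2, 2).mean(axis=(1, 3)).ravel()
    return deep

def fuse(deep, change_factor):
    """Fusion module stand-in: append the scalar change factor to the deep features."""
    return np.concatenate([deep, [change_factor]])

texture_vec, width_vec, pos_vec = rng.standard_normal((3, 10))
features = np.concatenate([
    fuse(extract_module(texture_vec), 0.2),
    fuse(extract_module(width_vec), 0.1),
    fuse(extract_module(pos_vec), 0.4),
])
# full-connection layer stand-in: logits over an assumed 4 emotion classes
logits = rng.standard_normal((4, features.size)) @ features
probs = np.exp(logits - logits.max()); probs /= probs.sum()
```

Each of the three branches contributes its deep features plus one change factor, so the full-connection layer here sees 3 × (4 + 1) = 15 inputs.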
The beneficial effects of the invention are as follows:
1. According to the invention, the iris region, the sclera region and the periocular region are obtained by extracting and segmenting eye images at a plurality of continuous moments. Different characteristics of these regions (such as the texture values of the periocular region, the width of the eye region, and the distance between the eye region and the iris region) are analyzed, and the texture centrifugal value, the width centrifugal value and the position centrifugal value are extracted. These values reflect how far the texture, the width and the iris position deviate from their means at each moment, so the actual movement of the eyes can be reflected more comprehensively and accurately, avoiding the limitation of relying on distances between eyelid feature points and thereby improving emotion recognition accuracy.
2. The invention acquires the characteristic change factor for each vector and reflects the change speed of the elements in the vector. The change in emotion is dynamic, not only in the static values of the eye features, but also in the rate of change of these features over time.
3. According to the invention, the texture centrifugal vector, the width centrifugal vector and the position centrifugal vector are processed by adopting the emotion recognition neural network model, and the feature change factors corresponding to the vectors are adopted, so that the accuracy of model emotion classification is further improved.
Drawings
FIG. 1 is a flow chart of a method of emotion recognition based on eye analysis;
fig. 2 is a schematic diagram of a structure of an emotion recognition neural network model.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding of the invention by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of these embodiments: for those skilled in the art, all inventions that make use of the inventive concept fall within the spirit and scope of the invention as defined by the appended claims.
As shown in fig. 1, an emotion recognition method based on eye analysis includes the following steps:
Extracting eye images at a plurality of continuous moments, and dividing each eye image to obtain an iris area, a sclera area and a periocular area;
Marking the combined region formed by the iris region and the sclera region, which are in positional contact, as the eye region;
Extracting texture values from each subarea in the periocular region, obtaining texture centrifugal values, and constructing a texture centrifugal vector;
Acquiring a width centrifugal value according to the width of the eye area at each moment, and constructing a width centrifugal vector;
Calculating a position centrifugal value according to the distance between the eye area and the iris area, and constructing a position centrifugal vector;
acquiring a characteristic change factor for each vector;
And inputting the texture centrifugal vector, the width centrifugal vector, the position centrifugal vector and the characteristic change factors corresponding to the vectors into the emotion recognition neural network model to obtain emotion types.
The iris region is located in the central portion of the eye and is a colored circular portion of the eye that is used to control the amount of light entering the eye. The scleral region, generally described as the "white of the eye", is a white region surrounding the iris, covering most of the surface of the eyeball, and serves to protect the internal tissues of the eyeball.
In this embodiment, 10-20 eye images can be acquired every 30 seconds to facilitate observation of eye movement over a period of time.
In this embodiment, the process of segmentation includes:
Carrying out gray scale processing on each eye image, and carrying out clustering processing on pixel points according to gray values to obtain a plurality of clusters;
Calculating the similarity between the average gray level of each cluster and the stored iris gray level value, and finding out the cluster corresponding to the maximum similarity as an iris area;
calculating the similarity between the average gray level of each cluster and the stored sclera gray level value, and finding out the cluster corresponding to the maximum similarity as a sclera area;
Other clusters adjacent to the sclera region and the iris region were considered as periocular regions.
In this embodiment, the formula for calculating the similarity between the average gray level of each cluster and the stored iris gray value or the stored sclera gray value is: S = 1 / (1 + |G_avg - G_s|), where S is the similarity, G_avg is the average gray level of the cluster, and G_s is the stored iris gray value or the stored sclera gray value.
After the gray level map is clustered, the iris area and the sclera area are found according to the similarity of gray level values of the clusters, and then other clusters adjacent to the sclera area and the iris area are used as periocular areas.
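The clustering-and-matching segmentation above can be sketched as follows. This is a minimal illustration: the 1-D k-means, the stored reference gray values (`iris_gray`, `sclera_gray`), and the similarity form S = 1/(1 + |G_avg - G_s|) are assumptions consistent with, but not dictated by, the text.

```python
import numpy as np

def segment_eye_regions(gray, iris_gray=60.0, sclera_gray=200.0, k=3, iters=20):
    """Cluster pixels of a grayscale eye image by intensity (1-D k-means),
    then label the clusters whose average gray is most similar to the stored
    iris / sclera gray values."""
    pixels = gray.astype(float).ravel()
    # initialize cluster centers evenly across the observed intensity range
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean()
    # similarity: larger when the cluster's average gray is closer to the stored value
    sim = lambda g_avg, g_s: 1.0 / (1.0 + abs(g_avg - g_s))
    iris_c = max(range(k), key=lambda c: sim(centers[c], iris_gray))
    sclera_c = max(range(k), key=lambda c: sim(centers[c], sclera_gray))
    labels2d = labels.reshape(gray.shape)
    return labels2d == iris_c, labels2d == sclera_c

# toy image: dark "iris" block inside a bright "sclera" on a mid-gray background
img = np.full((10, 10), 120.0)
img[3:7, 3:7] = 200.0   # sclera-like region
img[4:6, 4:6] = 50.0    # iris-like region
iris_mask, sclera_mask = segment_eye_regions(img)
```

The remaining clusters adjacent to the two masks would then be taken as the periocular region, as the embodiment describes.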
In this embodiment, the process of constructing the texture centrifugation vector includes:
Extracting a contour from the periocular region to obtain a contour map;
Counting the number of contour points in the contour map, and carrying out logarithmic normalization processing to obtain contour density of each periocular region;
calculating standard deviation of each gray value in the periocular region, and normalizing the standard deviation to obtain gray fluctuation value of each periocular region;
adding the contour density belonging to the same periocular region and the gray scale fluctuation value to obtain a texture value;
subtracting the average texture value from the texture value of each periocular region to obtain a texture centrifugal value;
and constructing the texture centrifugal value of the periocular region at each moment into a texture centrifugal vector.
The invention obtains the contour density by extracting the periocular contour and counting the number of contour points, which captures the shape and structural characteristics around the eyes; the standard deviation of the gray values gives the gray fluctuation value, and the combination of the two jointly characterizes the periocular texture. A person's emotional changes drive the periocular muscles: for example, when the eyes open wide, more wrinkles form in the periocular region, and both the contour density and the gray fluctuation value increase. However, because the contour and wrinkling of each periocular region differ, the texture centrifugal value is obtained by subtracting the average texture value from the texture value of each periocular region, reflecting how far the texture deviates from the mean. In this embodiment, the average texture value may be the average of the texture values of the periocular regions at a plurality of moments, or a day-by-day average of the texture values of the periocular regions.
In the present embodiment, the specific process of extracting the contour of the periocular region is as follows: each pixel point is taken as a center in turn, and when the gray value of the central pixel point is the same as all the gray values in its neighborhood range, the central pixel point is marked as a non-contour point and discarded; the remaining pixel points are taken as contour points, yielding the contour map.
In the present embodiment, the expression for obtaining the contour density of each periocular region is: μ_o = ln(1 + M_o) / ln(1 + M_E), where μ_o is the contour density of the periocular region, M_o is the number of contour points in the contour map, and M_E is the number of pixel points in the periocular region.
In the present embodiment, the expression for obtaining the gray fluctuation value of each periocular region is: θ = σ / (G_max - G_min), where θ is the gray fluctuation value of the periocular region, σ is the standard deviation of the gray values in the periocular region, and G_max and G_min are the maximum and minimum gray values in the periocular region at each moment.
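The texture steps above can be sketched as follows. The contour test (a pixel counts as a contour point if any 4-neighbour has a different gray value) and the two normalizations μ_o = ln(1+M_o)/ln(1+M_E) and θ = σ/(G_max - G_min) are illustrative readings of the embodiment, not a verified reproduction of it.

```python
import numpy as np

def texture_centrifugal(regions):
    """Per periocular sub-region: contour density + gray fluctuation value,
    then subtract the mean texture value to get texture centrifugal values."""
    values = []
    for r in regions:
        r = r.astype(float)
        # mark pixels whose gray value differs from any 4-neighbour as contour points
        diff = np.zeros_like(r, dtype=bool)
        diff[:-1, :] |= r[:-1, :] != r[1:, :]
        diff[1:, :]  |= r[1:, :]  != r[:-1, :]
        diff[:, :-1] |= r[:, :-1] != r[:, 1:]
        diff[:, 1:]  |= r[:, 1:]  != r[:, :-1]
        m_o, m_e = diff.sum(), r.size
        density = np.log1p(m_o) / np.log1p(m_e)      # log-normalized contour count
        rng_ = r.max() - r.min()
        fluct = r.std() / rng_ if rng_ > 0 else 0.0  # range-normalized std dev
        values.append(density + fluct)               # texture value
    values = np.asarray(values)
    return values - values.mean()                    # texture centrifugal values

flat = np.full((8, 8), 100.0)                        # textureless region
busy = np.indices((8, 8)).sum(0) % 2 * 80.0          # checkerboard: dense contours
cent = texture_centrifugal([flat, busy])
```

As expected, the smooth region falls below the mean and the high-texture region rises above it, so the centrifugal values sum to zero across the regions.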
In this embodiment, the process of constructing the width centrifugal vector includes:
constructing an external rectangle for the eye region;
Extracting the width of the external rectangle;
subtracting the average width from the width of each eye region to obtain a width centrifugal value;
The width centrifugal value of the eye region at each time is constructed as a width centrifugal vector.
According to the invention, a circumscribed rectangle is constructed for the eye region, and the opening and closing state of the eye is reflected by the width of this rectangle: the larger the width, the larger the iris and sclera regions, indicating a greater degree of eye opening. For example, when a person is surprised, the eyes open wider and the width of the circumscribed rectangle increases; when a person is relaxed or tired, the eyes narrow slightly and the width decreases.
Because the size of each person's eyes differs, the width alone cannot accurately reflect the state of the eyes. The average width is therefore subtracted from the width of each eye region to obtain the width centrifugal value, which reflects how far the width deviates from the average and thus accurately captures the state of the eyes.
In this embodiment, the average width may be an average of the widths of the eye regions at a plurality of times, or an average of the widths of a plurality of daily eye regions.
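The width steps above can be sketched as follows, a minimal version that takes the width of the axis-aligned bounding rectangle of each eye mask and subtracts the sequence mean (the per-moment mean variant described in the text):

```python
import numpy as np

def width_centrifugal(eye_masks):
    """Width of the circumscribed (axis-aligned bounding) rectangle of each
    boolean eye mask, minus the mean width across the sequence."""
    widths = []
    for mask in eye_masks:
        cols = np.where(mask.any(axis=0))[0]   # columns containing eye pixels
        widths.append(cols[-1] - cols[0] + 1 if cols.size else 0)
    widths = np.asarray(widths, dtype=float)
    return widths - widths.mean()              # width centrifugal values

open_eye = np.zeros((10, 20), dtype=bool); open_eye[3:7, 2:18] = True  # wide-open
squint   = np.zeros((10, 20), dtype=bool); squint[4:6, 6:14] = True    # narrowed
wc = width_centrifugal([open_eye, squint])
```

The wide-open mask yields a positive centrifugal value and the narrowed one a negative value, matching the "deviation from the average" interpretation.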
In this embodiment, the process of constructing the positional centrifugal vector is:
Acquiring the eye center position of an eye area at each moment;
acquiring an iris center position of an iris region at each moment;
calculating the distance between the central position of the iris and the central position of the eye to obtain the position centrifugal value of the iris;
The position centrifugal value of the iris at each moment is constructed as a position centrifugal vector.
In this embodiment, the formula of the position centrifugal value of the iris is: γ = f(x_r - x_e) · √((x_r - x_e)² + (y_r - y_e)²), where γ is the position centrifugal value of the iris, x_r is the abscissa of the iris center, y_r is the ordinate of the iris center, x_e is the abscissa of the eye center, y_e is the ordinate of the eye center, and f is a sign function: f(x_r - x_e) is assigned 1 when x_r - x_e is greater than 0, and assigned -1 when x_r - x_e is less than 0.
The invention calculates the distance between the center of the eye and the center of the iris, and can accurately quantify the position of the iris in the eye area. The sign function f is added, so that the distance value can be reflected, and the direction of the iris relative to the center of the eye can be reflected.
In one implementation, the abscissa of the eye center is the average of the abscissas of all pixel points in the eye region, and the ordinate of the eye center is the average of the ordinates of all pixel points in the eye region. In another implementation, the abscissa of the eye center is the average of the abscissas of the pixel points on the outer edge of the eye region, and the ordinate is the average of the ordinates of those pixel points; that is, the eye center is the geometric center of the eye region. Likewise, the iris center may be taken as the average of the coordinates of all pixel points in the iris region, or as the average of the coordinates of the pixel points on its outer edge, i.e., the geometric center of the iris region. The manner of acquiring the eye center position and the iris center position is not limited to those described in this embodiment. The outer edge refers to the pixel points on the outermost boundary of the eye region or iris region.
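The position centrifugal value can be sketched as follows, using centroids of the masks as the centers (the first implementation described above) and the signed-distance formula γ = f(x_r - x_e) · √((x_r - x_e)² + (y_r - y_e)²) as reconstructed from the variable definitions in the text:

```python
import numpy as np

def position_centrifugal(iris_mask, eye_mask):
    """Signed distance between the iris centroid and the eye centroid:
    positive when the iris sits to the right of the eye center, negative
    when it sits to the left."""
    ys_r, xs_r = np.nonzero(iris_mask)
    ys_e, xs_e = np.nonzero(eye_mask)
    x_r, y_r = xs_r.mean(), ys_r.mean()   # iris center
    x_e, y_e = xs_e.mean(), ys_e.mean()   # eye center
    sign = 1.0 if x_r - x_e > 0 else -1.0
    return sign * np.hypot(x_r - x_e, y_r - y_e)

eye = np.zeros((10, 20), dtype=bool);  eye[2:8, 2:18] = True
iris = np.zeros((10, 20), dtype=bool); iris[4:6, 12:16] = True  # gaze to the right
gamma = position_centrifugal(iris, eye)
```

Here the iris centroid lies 4 pixels to the right of the eye centroid at the same height, so γ comes out positive.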
In this embodiment, the process of obtaining the feature change factor includes:
summing absolute values of adjacent element differences in each vector to obtain a total characteristic change value;
According to the group number of the adjacent elements, taking an average value of the total characteristic change value to obtain a characteristic change average value;
and carrying out normalization processing on the characteristic change average value to obtain a characteristic change factor.
The expression for obtaining the feature change average value is: E = (Σ_{t=1}^{T-1} |e_{t+1} - e_t|) / (T - 1), where E is the feature change average value, e_{t+1} is the element at moment t+1 in the vector, e_t is the element at moment t, |·| is the absolute value operation, T - 1 is the number of groups of adjacent elements, and T is the number of moments.
The method sums the absolute values of the differences of adjacent elements to obtain the overall feature change, averages this total over the number of groups of adjacent elements to obtain the feature change average value, which reflects the average rate of change, and unifies the scale through normalization.
In this embodiment, the normalization of the characteristic change average value is as follows: for the texture centrifugal vector, the characteristic change average value is divided by the average texture value; for the width centrifugal vector, it is divided by the average width; for the position centrifugal vector, it is divided by the absolute value of the difference between the maximum value and the minimum value in the position centrifugal vector.
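The feature-change-factor computation above can be sketched as follows (a NumPy sketch under stated assumptions: the function name and the sample vector are illustrative, and the caller supplies the normalizer appropriate to the vector type):

```python
import numpy as np

def feature_change_factor(vec, norm_const):
    """Average absolute change between adjacent elements, normalized.

    vec        : 1-D sequence of T per-moment values (a centrifugal vector)
    norm_const : normalizer -- the average texture value, the average width,
                 or the |max - min| range of the position centrifugal vector
    """
    vec = np.asarray(vec, dtype=float)
    total_change = np.abs(np.diff(vec)).sum()    # sum over the T-1 adjacent pairs
    avg_change = total_change / (len(vec) - 1)   # characteristic change average E
    return avg_change / norm_const               # normalized feature change factor

# Position centrifugal vector example: normalize by |max - min|.
pos = [0.2, 0.5, 0.1, 0.4]
factor = feature_change_factor(pos, max(pos) - min(pos))
```

For a texture or width centrifugal vector, the same function would be called with the average texture value or average width as `norm_const`.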
As shown in fig. 2, the emotion recognition neural network model comprises a texture vector feature extraction module, a width vector feature extraction module, a position vector feature extraction module, a texture feature fusion module, a width feature fusion module, a position feature fusion module and a full connection layer;
the input end of the texture vector feature extraction module is used for inputting a texture centrifugal vector, the input end of the width vector feature extraction module is used for inputting a width centrifugal vector, and the input end of the position vector feature extraction module is used for inputting a position centrifugal vector;
the first input end of the texture feature fusion module is connected with the output end of the texture vector feature extraction module, and the second input end of the texture feature fusion module is used for inputting texture feature change factors;
the first input end of the width characteristic fusion module is connected with the output end of the width vector characteristic extraction module, and the second input end of the width characteristic fusion module is used for inputting a width characteristic change factor;
the first input end of the position feature fusion module is connected with the output end of the position vector feature extraction module, and the second input end of the position feature fusion module is used for inputting a position feature change factor;
the input end of the full-connection layer is respectively connected with the output end of the texture feature fusion module, the output end of the width feature fusion module and the output end of the position feature fusion module, and the output end of the full-connection layer is used as the output end of the emotion recognition neural network model.
The texture characteristic change factor is a characteristic change factor corresponding to the texture centrifugal vector, the width characteristic change factor is a characteristic change factor corresponding to the width centrifugal vector, and the position characteristic change factor is a characteristic change factor corresponding to the position centrifugal vector.
According to the invention, the three time sequence vectors are processed by the three feature extraction modules respectively to extract the vector features, and the three feature fusion modules fuse the feature change factors with the vector features, thereby improving emotion recognition accuracy.
In the embodiment, the texture vector feature extraction module is used for extracting texture features from texture centrifugal vectors, the width vector feature extraction module is used for extracting width features from width centrifugal vectors, and the position vector feature extraction module is used for extracting position features from position centrifugal vectors;
the system comprises a texture feature fusion module, a width feature fusion module, a position feature fusion module and a position feature fusion module, wherein the texture feature fusion module is used for fusing texture features and texture feature change factors to obtain texture fusion features;
the full-connection layer is used for classifying according to texture fusion features, width fusion features and position fusion features to obtain emotion types.
In this embodiment, the expression of the texture feature fusion module, the width feature fusion module, and the position feature fusion module is: y = w_1·g_1 + w_2·g_2, where y is the output of the feature fusion module, g_1 is the input at the first input end of the feature fusion module, g_2 is the input at the second input end of the feature fusion module, w_1 is the weight of g_1, and w_2 is the weight of g_2.
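A minimal sketch of this weighted fusion (the weight values chosen here are illustrative only; in the model they would be learned or configured parameters):

```python
import numpy as np

def fuse(vector_feature, change_factor, w1=0.8, w2=0.2):
    """Weighted sum y = w1*g1 + w2*g2 of the extracted vector feature (g1)
    and the scalar feature change factor (g2, broadcast over the feature).
    The weights w1, w2 are assumed values for illustration."""
    return w1 * np.asarray(vector_feature, dtype=float) + w2 * change_factor

# Fuse a 2-element feature with a change factor of 0.25.
y = fuse(np.array([0.5, -0.1]), 0.25)   # -> [0.45, -0.03]
```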
In the embodiment, the texture vector feature extraction module, the width vector feature extraction module and the position vector feature extraction module all comprise an LSTM network and a full connection layer which are sequentially connected, wherein the LSTM network is used for extracting shallow features from vectors, and the full connection layer is used for carrying out feature mapping on the shallow features to obtain deep features. More preferably, the texture vector feature extraction module, the width vector feature extraction module and the position vector feature extraction module all comprise an LSTM network, a two-dimensional feature construction layer and a CNN network which are sequentially connected;
The LSTM network is used for extracting shallow features from the vectors, the two-dimensional feature construction layer is used for constructing the shallow features into two-dimensional features, and the CNN network is used for extracting deep features from the two-dimensional features. For the texture vector feature extraction module, the deep features are texture features; for the width vector feature extraction module, the deep features are width features; for the position vector feature extraction module, the deep features are position features.
The two-dimensional feature construction layer computes H = hᵀh, where H is the two-dimensional feature, h is the vector formed by the feature values h_t output by the LSTM network, and ᵀ denotes the transpose operation.
According to the invention, the LSTM network effectively extracts the time sequence information in the vector and captures the dynamic changes of the eye features over time. Forming the LSTM output features into two-dimensional data increases the data volume and makes it convenient to exploit the strong spatial feature extraction capability of the CNN network to mine the spatial relations among the features, so that the model can comprehensively utilize both temporal and spatial information, represent the eye features more completely, and improve emotion recognition accuracy.
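The H = hᵀh construction above can be sketched in NumPy as follows (only the two-dimensional feature construction layer is shown; the LSTM and CNN stages are assumed to come from a deep learning framework and are omitted):

```python
import numpy as np

def build_two_dimensional_feature(h: np.ndarray) -> np.ndarray:
    """Form the T x T matrix H = h^T h from the vector h of per-step
    feature values output by the LSTM network, so that a CNN can mine
    spatial relations among the features."""
    h = np.asarray(h, dtype=float).reshape(1, -1)   # 1 x T row vector
    return h.T @ h                                   # T x T outer product

# Toy LSTM output of three feature values.
h = np.array([1.0, 2.0, 3.0])
H = build_two_dimensional_feature(h)    # H[i, j] = h[i] * h[j]
```

The resulting T×T matrix is what the CNN stage would consume as a two-dimensional input.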
In this embodiment, the emotion types include happiness, sadness, anger, depression, and the like.
According to the invention, eye images at a plurality of consecutive moments are extracted and segmented to obtain the iris region, the sclera region, and the periocular region. Different characteristics of these regions are analyzed (such as the texture value of the periocular region, the width of the eye region, and the distance between the eye region and the iris region) to extract the texture centrifugal value, the width centrifugal value, and the position centrifugal value, which reflect, at each moment, how far the texture deviates from its mean, the width deviates from its mean, and the iris position deviates from the center. This reflects the actual movement of the eyes more comprehensively and accurately and avoids the limitations of relying on eyelid feature point distances, thereby improving emotion recognition precision.
The invention acquires a characteristic change factor for each vector, which reflects the change speed of the elements in the vector. Emotional change is dynamic: it is embodied not only in the static values of the eye features, but also in the rate at which these features change over time.
According to the invention, the texture centrifugal vector, the width centrifugal vector and the position centrifugal vector are processed by adopting the emotion recognition neural network model, and the feature change factors corresponding to the vectors are adopted, so that the accuracy of model emotion classification is further improved.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An emotion recognition method based on eye gaze analysis, characterized in that it comprises the following steps:
extracting eye images at a plurality of consecutive moments, and segmenting each eye image to obtain an iris region, a sclera region, and a periocular region;
marking the combined region of the iris region and the sclera region that are in positional contact as the eye region;
extracting texture values for each sub-region in the periocular region, obtaining texture centrifugal values, and constructing a texture centrifugal vector;
obtaining width centrifugal values according to the width of the eye region at each moment, and constructing a width centrifugal vector;
calculating position centrifugal values according to the distance between the eye region and the iris region, and constructing a position centrifugal vector;
obtaining a feature change factor for each vector;
inputting the texture centrifugal vector, the width centrifugal vector, the position centrifugal vector, and the feature change factor corresponding to each vector into an emotion recognition neural network model to obtain an emotion type.
2. The emotion recognition method based on eye gaze analysis according to claim 1, characterized in that the segmentation process comprises:
converting each eye image to grayscale, and clustering the pixel points according to grayscale value to obtain a plurality of clusters;
calculating the similarity between the average grayscale of each cluster and a stored iris grayscale value, and taking the cluster with the maximum similarity as the iris region;
calculating the similarity between the average grayscale of each cluster and a stored sclera grayscale value, and taking the cluster with the maximum similarity as the sclera region;
taking the other clusters adjacent to the sclera region and the iris region as the periocular region.
3. The emotion recognition method based on eye gaze analysis according to claim 1, characterized in that the process of constructing the texture centrifugal vector comprises:
extracting contours of the periocular region to obtain a contour map;
counting the number of contour points in the contour map and normalizing the number to obtain the contour density of each periocular region;
calculating the standard deviation of the grayscale values in the periocular region and normalizing the standard deviation to obtain the grayscale fluctuation value of each periocular region;
adding the contour density and the grayscale fluctuation value belonging to the same periocular region to obtain the texture value;
subtracting the average texture value from the texture value of each periocular region to obtain the texture centrifugal value;
constructing the texture centrifugal values of the periocular region at each moment into a texture centrifugal vector.
4. The emotion recognition method based on eye gaze analysis according to claim 1, characterized in that the process of constructing the width centrifugal vector comprises:
constructing a bounding rectangle for the eye region;
extracting the width of the bounding rectangle;
subtracting the average width from the width of each eye region to obtain the width centrifugal value;
constructing the width centrifugal values of the eye region at each moment into a width centrifugal vector.
5. The emotion recognition method based on eye gaze analysis according to claim 1, characterized in that the process of constructing the position centrifugal vector is:
obtaining the eye center position for the eye region at each moment;
obtaining the iris center position for the iris region at each moment;
calculating the distance between the iris center position and the eye center position to obtain the position centrifugal value of the iris;
constructing the position centrifugal values of the iris at each moment into a position centrifugal vector.
6. The emotion recognition method based on eye gaze analysis according to claim 5, characterized in that in the formula for the position centrifugal value of the iris, γ is the position centrifugal value of the iris, x_r is the abscissa of the iris center, y_r is the ordinate of the iris center, x_e is the abscissa of the eye center, y_e is the ordinate of the eye center, and f is a sign function: when x_r − x_e is greater than 0, f(x_r − x_e) is assigned the value 1; when x_r − x_e is less than 0, f(x_r − x_e) is assigned the value −1.
7. The emotion recognition method based on eye gaze analysis according to claim 1, characterized in that the process of obtaining the feature change factor comprises:
summing the absolute values of the differences between adjacent elements in each vector to obtain a total feature change value;
averaging the total feature change value according to the number of groups of adjacent elements to obtain a feature change average value;
normalizing the feature change average value to obtain the feature change factor.
8. The emotion recognition method based on eye gaze analysis according to claim 1, characterized in that the emotion recognition neural network model comprises: a texture vector feature extraction module, a width vector feature extraction module, a position vector feature extraction module, a texture feature fusion module, a width feature fusion module, a position feature fusion module, and a fully connected layer;
the input end of the texture vector feature extraction module is used to input the texture centrifugal vector; the input end of the width vector feature extraction module is used to input the width centrifugal vector; the input end of the position vector feature extraction module is used to input the position centrifugal vector;
the first input end of the texture feature fusion module is connected to the output end of the texture vector feature extraction module, and its second input end is used to input the texture feature change factor;
the first input end of the width feature fusion module is connected to the output end of the width vector feature extraction module, and its second input end is used to input the width feature change factor;
the first input end of the position feature fusion module is connected to the output end of the position vector feature extraction module, and its second input end is used to input the position feature change factor;
the input end of the fully connected layer is connected to the output ends of the texture feature fusion module, the width feature fusion module, and the position feature fusion module respectively, and its output end serves as the output end of the emotion recognition neural network model.
9. The emotion recognition method based on eye gaze analysis according to claim 8, characterized in that the texture vector feature extraction module is used to extract texture features from the texture centrifugal vector; the width vector feature extraction module is used to extract width features from the width centrifugal vector; the position vector feature extraction module is used to extract position features from the position centrifugal vector;
the texture feature fusion module is used to fuse the texture features and the texture feature change factor to obtain texture fusion features; the width feature fusion module is used to fuse the width features and the width feature change factor to obtain width fusion features; the position feature fusion module is used to fuse the position features and the position feature change factor to obtain position fusion features;
the fully connected layer is used to classify according to the texture fusion features, the width fusion features, and the position fusion features to obtain the emotion type.
10. The emotion recognition method based on eye gaze analysis according to claim 8, characterized in that the texture vector feature extraction module, the width vector feature extraction module, and the position vector feature extraction module each comprise an LSTM network, a two-dimensional feature construction layer, and a CNN network connected in sequence;
the LSTM network is used to extract shallow features from the vector; the two-dimensional feature construction layer is used to construct the shallow features into two-dimensional features; the CNN network is used to extract deep features from the two-dimensional features.
CN202510592066.2A 2025-05-09 2025-05-09 An emotion recognition method based on eye gaze analysis Active CN120108028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510592066.2A CN120108028B (en) 2025-05-09 2025-05-09 An emotion recognition method based on eye gaze analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510592066.2A CN120108028B (en) 2025-05-09 2025-05-09 An emotion recognition method based on eye gaze analysis

Publications (2)

Publication Number Publication Date
CN120108028A CN120108028A (en) 2025-06-06
CN120108028B true CN120108028B (en) 2025-07-18

Family

ID=95890316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510592066.2A Active CN120108028B (en) 2025-05-09 2025-05-09 An emotion recognition method based on eye gaze analysis

Country Status (1)

Country Link
CN (1) CN120108028B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343943A (en) * 2021-07-21 2021-09-03 西安电子科技大学 Eye image segmentation method based on sclera region supervision

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10909363B2 (en) * 2019-05-13 2021-02-02 Fotonation Limited Image acquisition system for off-axis eye images
CN112163456B (en) * 2020-08-28 2024-04-09 北京中科虹霸科技有限公司 Identity recognition model training method, testing method, recognition method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343943A (en) * 2021-07-21 2021-09-03 西安电子科技大学 Eye image segmentation method based on sclera region supervision

Also Published As

Publication number Publication date
CN120108028A (en) 2025-06-06

Similar Documents

Publication Publication Date Title
Amrutha et al. ML based sign language recognition system
CN106960202B (en) Smiling face identification method based on visible light and infrared image fusion
Tome et al. Facial soft biometric features for forensic face recognition
US6611613B1 (en) Apparatus and method for detecting speaking person's eyes and face
US20160371539A1 (en) Method and system for extracting characteristic of three-dimensional face image
CN102013011B (en) Front-face-compensation-operator-based multi-pose human face recognition method
CN109271930B (en) Micro-expression recognition method, device and storage medium
CN112818899B (en) Face image processing method, device, computer equipment and storage medium
CN108062543A (en) A kind of face recognition method and device
WO2021196721A1 (en) Cabin interior environment adjustment method and apparatus
CN108629336A (en) Face value calculating method based on human face characteristic point identification
CN111460950B (en) Cognitive distraction method based on head-eye evidence fusion in natural driving conversation behavior
CN110543848B (en) Driver action recognition method and device based on three-dimensional convolutional neural network
CN113920575A (en) Facial expression recognition method and device and storage medium
CN110008920A (en) Research on facial expression recognition method
Jacintha et al. A review on facial emotion recognition techniques
CN116645717B (en) A micro-expression recognition method and system based on PCANet+ and LSTM
RU2768797C1 (en) Method and system for determining synthetically modified face images on video
CN120108028B (en) An emotion recognition method based on eye gaze analysis
CN107977622B (en) Eye state detection method based on pupil characteristics
Kim et al. Facial landmark extraction scheme based on semantic segmentation
Pathak et al. Multimodal eye biometric system based on contour based E-CNN and multi algorithmic feature extraction using SVBF matching
Lin et al. A gender classification scheme based on multi-region feature extraction and information fusion for unconstrained images
Barra et al. F-FID: fast fuzzy-based iris de-noising for mobile security applications
CN110688872A (en) Lip-based person identification method, device, program, medium, and electronic apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant