CN112800998A

CN112800998A - Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Info

Publication number: CN112800998A
Application number: CN202110159085.8A
Authority: CN
Inventors: 卢官明; 朱清扬; 卢峻禾
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2021-02-05
Filing date: 2021-02-05
Publication date: 2021-05-14
Anticipated expiration: 2041-02-05
Also published as: CN112800998B

Abstract

The invention discloses a multimodal emotion recognition method and system integrating attention mechanism and discriminating multiple set canonical correlation analysis (DMCCA). The method includes: extracting EEG signal features, peripheral physiological signal features and expression features respectively from preprocessed EEG signals, peripheral physiological signals and facial expression videos; using an attention mechanism to extract discriminative EEG emotional features respectively , peripheral physiological emotion feature, expression emotion feature; use DMCCA method on EEG emotion feature, peripheral physiological emotion feature and expression emotion feature to obtain EEG-peripheral physiological-expression multimodal emotion feature; use classifier to classify multimodal emotion Features are classified and identified. The invention adopts the attention mechanism to selectively focus on the more emotionally discriminating features in each modal, and makes full use of the correlation and complementarity between the emotional features of different modalities in combination with DMCCA, which can effectively improve the accuracy of emotion recognition. and robustness.

Description

Multimodal emotion recognition method and system integrating attention mechanism and DMCCA

技术领域technical field

本发明涉及情感识别和人工智能技术领域，尤其涉及一种融合注意力机制和鉴别多重集典型相关分析(DMCCA)的多模态情感识别方法及系统。The present invention relates to the technical field of emotion recognition and artificial intelligence, and in particular, to a multimodal emotion recognition method and system integrating attention mechanism and discriminative multiple set canonical correlation analysis (DMCCA).

背景技术Background technique

人类情感是伴随着人的意识过程产生的心理和生理状态，在人际交流中起着重要作用。随着人工智能等技术的不断进步，获得更加智能化、人性化的人机交互(Human–Computer Interactions，HCIs)体验越来越受到人们的关注。人们对机器智能化的要求越来越高，期望机器能够具有感知、理解甚至表达情感的能力，实现人性化的人机交互，更好地为人类服务。情感识别作为情感计算的一个分支，是实现人-机情感交互的基础和核心技术，已经成为计算机科学、认知科学和人工智能等领域的研究热点，受到了学术界和工业界的广泛关注。例如，在临床医疗护理中，如果能够知道患者，尤其是有表达障碍的患者的情感状态，就可以采取不同的护理措施，以提高护理质量。此外，在对精神障碍患者的心理行为监控、情感机器人的人-机友好交互等方面也得到了越来越多的关注。Human emotion is the psychological and physiological state that accompanies the process of human consciousness and plays an important role in interpersonal communication. With the continuous progress of artificial intelligence and other technologies, obtaining a more intelligent and humanized human-computer interaction (Human–Computer Interactions, HCIs) experience has attracted more and more attention. People have higher and higher requirements for machine intelligence, expecting machines to have the ability to perceive, understand and even express emotions, realize human-computer interaction, and better serve human beings. As a branch of affective computing, emotion recognition is the basic and core technology for realizing human-computer emotional interaction. For example, in clinical medical care, if the emotional state of patients, especially those with expression disorders, can be known, different nursing measures can be taken to improve the quality of care. In addition, more and more attention has been paid to the psychological behavior monitoring of patients with mental disorders and the human-machine friendly interaction of emotional robots.

以往对情感识别的研究大多集中在通过单一模态的信息来识别人类情感状态，比如基于语音的情感识别、基于面部表情的情感识别等。由于单一的语音或表情信息所表达的情感信息是不完整的，且容易受到外界各种因素的影响，例如面部表情识别容易受遮挡和光照变化的影响，而基于语音的情感识别容易受环境噪音干扰和不同受试者的声音差异的影响，此外，有时候人们为了掩盖自己的真实情感而强颜欢笑、装腔作势或沉默不语，此时，面部表情或身体姿态具有一定的欺骗性，而当人们沉默不语时基于语音的情感识别方法就会失效，所以，单模态情感识别存在一定的局限性。因此，越来越多的研究人员将目光转向基于多模态信息融合的情感识别研究，期望能够利用各个模态信息之间的互补性，来构建鲁棒的情感识别模型，以达到更高的情感识别准确率。Most of the previous researches on emotion recognition focus on identifying human emotional states through a single modality of information, such as speech-based emotion recognition, facial expression-based emotion recognition, etc. Because the emotional information expressed by a single voice or facial expression information is incomplete and easily affected by various external factors, for example, facial expression recognition is easily affected by occlusion and illumination changes, while speech-based emotion recognition is easily affected by environmental noise. The effects of interference and differences in the voice of different subjects, in addition, sometimes people smile, pose, or remain silent in order to hide their true emotions, at this time, facial expressions or body gestures can be deceptive, and when people Speech-based emotion recognition methods will fail when there is silence, so single-modal emotion recognition has certain limitations. Therefore, more and more researchers turn their attention to emotion recognition research based on multimodal information fusion, expecting to use the complementarity between each modal information to build a robust emotion recognition model to achieve higher Emotion recognition accuracy.

目前，在进行多模态情感识别研究中，比较常用的信息融合策略有决策层融合和特征层融合。决策层融合通常基于各模态单独识别的结果，再依据相关规则，如均值(Mean)规则、求和(Sum)规则、最大值(Max)规则、少数服从多数的投票表决机制等，做出决策判断，得到最终的识别结果。决策层融合技术根据不同模态信息对于情感识别的贡献不同，比较全面地考虑了不同模态信息之间的差异性，但却忽略了不同模态信息之间的相关性。基于决策层融合的多模态情感识别性能不仅与单个模态的情感识别率有关，还依赖于决策层融合算法的性能。特征层融合是指将多个模态的情感特征联合起来，形成一个融合特征向量。特征层融合方法利用了不同模态情感特征的互补性，但如何确定不同模态情感特征的权值，以体现不同特征在情感分类识别中的差异性，是进行多模态特征融合的关键，目前仍然是面临挑战的开放课题。At present, in the multimodal emotion recognition research, the more commonly used information fusion strategies are decision-level fusion and feature-level fusion. The decision-making level fusion is usually based on the results of the individual identification of each mode, and then according to relevant rules, such as the mean (Mean) rule, the sum (Sum) rule, the maximum value (Max) rule, and the voting mechanism where the minority obeys the majority, etc. Decision-making and judgment to obtain the final identification result. According to the different contributions of different modal information to emotion recognition, decision-level fusion technology comprehensively considers the difference between different modal information, but ignores the correlation between different modal information. The multimodal emotion recognition performance based on decision-level fusion is not only related to the emotion recognition rate of a single modality, but also depends on the performance of the decision-level fusion algorithm. Feature layer fusion refers to combining the emotional features of multiple modalities to form a fusion feature vector. The feature layer fusion method utilizes the complementarity of different modal emotional features, but how to determine the weights of different modal emotional features to reflect the differences in emotion classification and recognition of different features is the key to multi-modal feature fusion. It is still an open topic facing challenges.

发明内容SUMMARY OF THE INVENTION

发明目的：针对单模态情感识别准确率低、鲁棒性差以及现有多模态情感特征融合方法存在的缺点，本发明的目的是提供一种融合注意力机制和鉴别多重集典型相关分析(DMCCA)的多模态情感识别方法及系统，通过引入注意力机制选择性地重点关注各模态中有鉴别力的情感特征，并结合DMCCA充分利用不同模态情感特征之间的相关性和互补性，可以有效提高多模态情感识别的准确率和鲁棒性。Purpose of the invention: In view of the low accuracy of single-modal emotion recognition, poor robustness and the shortcomings of existing multi-modal emotional feature fusion methods, the purpose of the present invention is to provide a fusion attention mechanism and identification of multiple sets of canonical correlation analysis ( DMCCA) multi-modal emotion recognition method and system, selectively focus on the discriminative emotional features in each modality by introducing an attention mechanism, and make full use of the correlation and complementarity between emotional features of different modalities in combination with DMCCA It can effectively improve the accuracy and robustness of multimodal emotion recognition.

技术方案：本发明为实现上述发明目的采用以下技术方案：Technical scheme: the present invention adopts the following technical scheme for realizing the above-mentioned purpose of the invention:

一种融合注意力机制和DMCCA的多模态情感识别方法，包括以下步骤：A multimodal emotion recognition method integrating attention mechanism and DMCCA, including the following steps:

(1)对经过预处理后的脑电信号和面部表情视频分别使用各自训练好的神经网络模型提取脑电信号特征向量和表情特征向量，对预处理后的外周生理信号，通过抽取信号波形描述符及其统计特征，提取外周生理信号特征向量；(1) Extract the EEG signal feature vector and the expression feature vector from the preprocessed EEG signal and facial expression video using the respective trained neural network models. For the preprocessed peripheral physiological signal, extract the signal waveform to describe Symbol and its statistical characteristics, extract peripheral physiological signal feature vector;

(2)对所述的脑电信号特征向量、外周生理信号特征向量、表情特征向量分别通过线性变换矩阵映射成若干组特征向量，并分别使用注意力机制模块确定不同特征向量组的重要性权重，通过加权融合形成维数相同的有鉴别力的脑电情感特征向量、外周生理情感特征向量、表情情感特征向量；(2) The EEG signal feature vector, peripheral physiological signal feature vector, and expression feature vector are respectively mapped into several groups of feature vectors through a linear transformation matrix, and the attention mechanism module is used to determine the importance weights of different feature vector groups. , through weighted fusion to form discriminative EEG emotion feature vectors, peripheral physiological emotion feature vectors, and expression emotion feature vectors with the same dimension;

(3)对所述的脑电情感特征向量、外周生理情感特征向量和表情情感特征向量，使用鉴别多重集典型相关分析(DMCCA)方法，通过最大化同一类别样本的不同模态情感特征之间的相关性，确定各情感特征向量的投影矩阵，并将各情感特征向量投影到一个公共子空间，相加融合后得到脑电-外周生理-表情多模态情感特征向量；(3) For the EEG emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector, the discriminative multiple set canonical correlation analysis (DMCCA) method is used to maximize the difference between different modal emotion features of the same class of samples. Determine the projection matrix of each emotional feature vector, project each emotional feature vector into a common subspace, and obtain the EEG-peripheral physiology-expression multimodal emotional feature vector after addition and fusion;

(4)使用分类器对多模态情感特征向量进行分类识别，得到情感类别。(4) Use the classifier to classify and identify the multimodal emotion feature vector to obtain the emotion category.

进一步地，步骤(2)中使用注意力机制模块提取有鉴别力的脑电情感特征、外周生理情感特征、表情情感特征的具体步骤包括：Further, the specific steps of using the attention mechanism module to extract discriminative EEG emotional features, peripheral physiological emotional features, and facial expression emotional features in step (2) include:

(2.1)将步骤(1)提取到的脑电信号特征以矩阵形式表示成

并通过线性变换矩阵W⁽¹⁾映射成M₁组特征向量

4≤M₁≤16，每组特征向量的维数为N，16≤N≤64，令

其线性变换表达式为：(2.1) The EEG signal features extracted in step (1) are expressed in matrix form as

And through the linear transformation matrix W ⁽¹⁾ is mapped into M ₁ set of eigenvectors

4≤M ₁ ≤16, the dimension of each set of eigenvectors is N, 16≤N≤64, let

Its linear transformation expression is:

E⁽¹⁾＝(F⁽¹⁾)^TW⁽¹⁾ E ⁽¹⁾ = (F ⁽¹⁾ ) ^T W ⁽¹⁾

其中，上标(1)代表脑电模态，T表示转置符号；Among them, the superscript (1) represents the EEG mode, and T represents the transpose symbol;

使用第一个注意力机制模块来确定不同特征向量组的重要性权重，通过加权融合形成有鉴别力的脑电情感特征向量，其中第r组脑电信号特征向量的权重

以及脑电情感特征向量x⁽¹⁾表示为：The first attention mechanism module is used to determine the importance weights of different feature vector groups, and a discriminative EEG emotion feature vector is formed through weighted fusion, where the weight of the rth group EEG signal feature vector

And the EEG emotion feature vector x ⁽¹⁾ is expressed as:

其中，r＝1,2,…,M₁，

表示第r组脑电信号特征向量，

为可训练的线性变换参数向量，exp(·)表示以自然常数e为底的指数函数；Among them, r=1,2,...,M ₁ ,

represents the eigenvectors of the rth group EEG signals,

is the trainable linear transformation parameter vector, exp( ) represents the exponential function with the natural constant e as the base;

(2.2)将步骤(1)提取到的外周生理信号特征以矩阵形式表示成

并通过线性变换矩阵W⁽²⁾映射成M₂组特征向量

4≤M₂≤16，令

其线性变换表达式为：(2.2) The peripheral physiological signal features extracted in step (1) are expressed in matrix form as

And through the linear transformation matrix W ⁽²⁾ is mapped into M ₂ groups of eigenvectors

4≤M ₂ ≤16, let

Its linear transformation expression is:

E⁽²⁾＝(F⁽²⁾)^TW⁽²⁾ E ⁽²⁾ = (F ⁽²⁾ ) ^T W ⁽²⁾

其中，上标(2)代表外周生理模态；Among them, the superscript (2) represents the peripheral physiological mode;

使用第二个注意力机制模块来确定不同特征向量组的重要性权重，通过加权融合形成有鉴别力的外周生理情感特征向量，其中第s组外周生理信号特征向量的权重

以及外周生理情感特征向量x⁽²⁾表示为：The second attention mechanism module is used to determine the importance weights of different feature vector groups, and a discriminative peripheral physiological emotion feature vector is formed by weighted fusion, where the weight of the sth group of peripheral physiological signal feature vectors

And the peripheral physiological emotion feature vector x ⁽²⁾ is expressed as:

其中，s＝1,2,…,M₂，

表示第s组外周生理信号特征向量，

为可训练的线性变换参数向量；Among them, s=1,2,...,M ₂ ,

represents the feature vector of peripheral physiological signals in group s,

is a trainable linear transformation parameter vector;

(2.3)将步骤(1)提取到的表情特征以矩阵形式表示成

并通过线性变换矩阵W⁽³⁾映射成M₃组特征向量

4≤M₃≤16，令

其线性变换表达式为：(2.3) Express the facial expression features extracted in step (1) in matrix form as

And through the linear transformation matrix W ⁽³⁾ mapped into M ₃ groups of eigenvectors

4≤M ₃ ≤16, let

Its linear transformation expression is:

E⁽³⁾＝(F⁽³⁾)^TW⁽³⁾ E ⁽³⁾ = (F ⁽³⁾ ) ^T W ⁽³⁾

其中，上标(3)代表表情模态；Among them, the superscript (3) represents the expression mode;

使用第三个注意力机制模块来确定不同特征向量组的重要性权重，通过加权融合形成有鉴别力的表情情感特征向量，其中第t组表情特征向量的权重

以及表情情感特征向量x⁽³⁾表示为：Use the third attention mechanism module to determine the importance weights of different feature vector groups, and form discriminative emotion feature vectors through weighted fusion, where the weight of the t-th group of expression feature vectors

And the expression emotion feature vector x ⁽³⁾ is expressed as:

其中，t＝1,2,…,M₃，

表示第t组表情特征向量，

为可训练的线性变换参数向量。Among them, t=1,2,...,M ₃ ,

represents the t-th group of expression feature vectors,

is the trainable linear transformation parameter vector.

进一步地，步骤(3)具体包括以下子步骤：Further, step (3) specifically includes the following substeps:

(3.1)获取通过训练得到的分别对应于脑电情感特征、外周生理情感特征和表情情感特征的DMCCA投影矩阵

和

32≤d≤128；(3.1) Obtaining the DMCCA projection matrix corresponding to EEG emotional features, peripheral physiological emotional features and facial emotional features obtained through training

and

32≤d≤128;

(3.2)分别使用投影矩阵Ω、Φ和Ψ将步骤(2)提取到的脑电情感特征向量x⁽¹⁾、外周生理情感特征向量x⁽²⁾和表情情感特征向量x⁽³⁾投影到一个d维的公共子空间，其中脑电情感特征向量x⁽¹⁾到d维公共子空间的投影为Ω^Tx⁽¹⁾，外周生理情感特征向量x⁽²⁾到d维公共子空间的投影为Ψ^Tx⁽²⁾，表情情感特征向量x⁽³⁾到d维公共子空间的投影为Ψ^Tx⁽³⁾；(3.2) Using projection matrices Ω, Φ and Ψ to project the EEG emotion feature vector x ⁽¹⁾ , peripheral physiological emotion feature vector x ⁽²⁾ and expression emotion feature vector x ⁽³⁾ extracted in step (2) to A d-dimensional common subspace, in which the projection of the EEG emotion feature vector x ⁽¹⁾ to the d-dimensional common subspace is Ω ^T x ⁽¹⁾ , and the peripheral physiological emotion feature vector x ⁽²⁾ to the d-dimensional common subspace. The projection is Ψ ^T x ⁽²⁾ , and the projection of the expression emotion feature vector x ⁽³⁾ to the d-dimensional common subspace is Ψ ^T x ⁽³⁾ ;

(3.3)将Ω^Tx⁽¹⁾、Φ^Tx⁽²⁾和Ψ^Tx⁽³⁾进行融合，得到脑电-外周生理-表情多模态情感特征向量为Ω^Tx⁽¹⁾+Φ^Tx⁽²⁾+Ψ^Tx⁽³⁾。(3.3) Integrate Ω ^T x ⁽¹⁾ , Φ ^T x ⁽²⁾ and Ψ ^T x ⁽³⁾ to obtain the EEG-peripheral physiology-expression multimodal emotional feature vector as Ω ^T x ⁽¹⁾ +Φ ^T x ⁽²⁾ + Ψ ^T x ⁽³⁾ .

进一步地，步骤(3.1)中的投影矩阵Ω、Φ和Ψ通过以下步骤的训练得到：Further, the projection matrices Ω, Φ and Ψ in step (3.1) are obtained through the training of the following steps:

(3.1.1)从训练样本集中分别抽取各情感类别的训练样本生成3组情感特征向量

其中

M为训练样本数，N为

的维数，i＝1,2,3，m＝1,2,…,M；令i＝1代表脑电模态，i＝2代表外周生理模态，i＝3代表表情模态，

代表脑电情感特征向量,

代表外周生理情感特征向量,

代表表情情感特征向量；(3.1.1) Extract the training samples of each emotion category from the training sample set to generate 3 sets of emotion feature vectors

in

M is the number of training samples, and N is

The dimension of , i=1, 2, 3, m=1, 2,...,M; let i=1 represent the EEG modality, i=2 represent the peripheral physiological modality, i=3 represent the expression modality,

represents the EEG emotion feature vector,

represents the peripheral physiological emotion feature vector,

Represents the expression emotion feature vector;

(3.1.2)计算X⁽ⁱ⁾中各列向量的均值，对X⁽ⁱ⁾进行中心化操作；(3.1.2) Calculate the mean value of each column vector in X ( ⁱ ^), and perform the centering operation on X (i);

(3.1.3)基于鉴别多重集典型相关分析(DMCCA)的思想求得一组投影矩阵Ω、Φ和Ψ，使得同类样本在公共投影子空间的线性相关性最大，同时最大化模态内数据的类间散布与最小化模态内数据的类内散布，令X⁽ⁱ⁾的投影向量为

i＝1,2,3，DMCCA的目标函数为：(3.1.3) Based on the idea of discriminative multiple set canonical correlation analysis (DMCCA), a set of projection matrices Ω, Φ and Ψ are obtained, so that the linear correlation of similar samples in the common projection subspace is maximized, and the intra-modal data is maximized. The inter-class scatter of and minimizing the intra-class scatter of intra-modal data, let the projection vector of X ⁽ⁱ⁾ be

i=1,2,3, the objective function of DMCCA is:

其中，

表示X⁽ⁱ⁾的类内散布矩阵，

表示X⁽ⁱ⁾的类间散布矩阵，cov(·,·)表示协方差，i,j∈{1,2,3}；in,

represents the intra-class scatter matrix of X ⁽ⁱ⁾ ,

represents the inter-class scatter matrix of X ⁽ⁱ⁾ , cov(·,·) represents the covariance, i,j∈{1,2,3};

构建如下优化模型并求解得到投影矩阵Ω、Φ和Ψ：The following optimization model is constructed and solved to obtain the projection matrices Ω, Φ and Ψ:

进一步地，使用拉格朗日乘子法(Lagrange multiplier)求解DMCCA目标函数的优化模型，可得到如下拉格朗日(Lagrange)函数：Further, using the Lagrange multiplier method to solve the optimization model of the DMCCA objective function, the following Lagrange function can be obtained:

其中，λ是拉格朗日乘子，再分别求L(w⁽¹⁾，w⁽²⁾，w⁽³⁾)对w⁽¹⁾、w⁽²⁾和w⁽³⁾的偏导数并令其为零，即令Among them, λ is the Lagrange multiplier, and then find the partial derivatives of L(w ⁽¹⁾ , w ⁽²⁾ , w ⁽³⁾ ) to w ⁽¹⁾ , w ⁽²⁾ and w ⁽³⁾ respectively and combine Let it be zero, that is,

得到get

进一步对上式作简化处理，则可获得如下的广义特征值问题：By further simplifying the above formula, the following generalized eigenvalue problem can be obtained:

通过求解上式中的广义特征值问题，选取前d个最大特征值λ₁≥λ₂≥…≥λ_d对应的特征向量，即可得到投影矩阵

和

By solving the generalized eigenvalue problem in the above formula, the projection matrix can be obtained by selecting the eigenvectors corresponding to the first d largest eigenvalues λ ₁ ≥λ ₂ ≥...≥λ _d

and

基于相同的发明构思，本发明提供的融合注意力机制和DMCCA的多模态情感识别系统，包括：Based on the same inventive concept, the multimodal emotion recognition system integrating attention mechanism and DMCCA provided by the present invention includes:

特征初步提取模块，用于对经过预处理后的脑电信号和面部表情视频分别使用各自训练好的神经网络模型提取脑电信号特征向量和表情特征向量，对预处理后的外周生理信号，通过抽取信号波形描述符及其统计特征，提取外周生理信号特征向量；The feature preliminary extraction module is used to extract the EEG signal feature vector and the facial expression feature vector from the preprocessed EEG signal and facial expression video using the respective trained neural network models. Extract the signal waveform descriptor and its statistical features, and extract the peripheral physiological signal feature vector;

特征鉴别增强模块，用于对所述的脑电信号特征向量、外周生理信号特征向量、表情特征向量分别通过线性变换矩阵映射成若干组特征向量，并分别使用注意力机制模块确定不同特征向量组的重要性权重，通过加权融合形成维数相同的有鉴别力的脑电情感特征向量、外周生理情感特征向量、表情情感特征向量；The feature identification enhancement module is used to map the EEG signal feature vector, peripheral physiological signal feature vector, and expression feature vector into several groups of feature vectors through a linear transformation matrix, and use the attention mechanism module to determine different feature vector groups. The importance weights of , and the discriminative EEG emotion feature vector, peripheral physiological emotion feature vector, and expression emotion feature vector with the same dimension are formed by weighted fusion;

投影矩阵确定模块，用于使用鉴别多重集典型相关分析(DMCCA)方法，通过最大化同一类别样本的不同模态情感特征之间的相关性，确定各情感特征向量的投影矩阵；The projection matrix determination module is used to determine the projection matrix of each emotion feature vector by maximizing the correlation between different modal emotion features of the same class of samples by using the discriminative multiple set canonical correlation analysis (DMCCA) method;

特征融合模块，用于对所述的脑电情感特征向量、外周生理情感特征向量和表情情感特征向量，通过各自对应的投影矩阵投影到一个公共子空间，相加融合后得到脑电-外周生理-表情多模态情感特征向量；The feature fusion module is used to project the EEG emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector into a common subspace through their corresponding projection matrices, and then add and fuse to obtain the EEG-peripheral physiological emotion feature vector. - Expression multimodal emotion feature vector;

以及，分类识别模块，用于使用分类器对多模态情感特征向量进行分类识别，得到情感类别。And, the classification and recognition module is used for classifying and recognizing the multimodal emotion feature vector by using the classifier to obtain the emotion category.

基于相同的发明构思，本发明提供的融合注意力机制和DMCCA的多模态情感识别系统，包括至少一台计算设备，所述计算设备包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述计算机程序被加载至处理器时实现所述的融合注意力机制和DMCCA的多模态情感识别方法。Based on the same inventive concept, the multimodal emotion recognition system integrating attention mechanism and DMCCA provided by the present invention includes at least one computing device, and the computing device includes a memory, a processor, and a multimodal emotion recognition system stored on the memory and available on the processor. A computer program running on the computer program, when the computer program is loaded into the processor, realizes the multimodal emotion recognition method of fusion attention mechanism and DMCCA.

有益效果：与现有技术相比，本发明具有以下技术效果：Beneficial effects: compared with the prior art, the present invention has the following technical effects:

(1)本发明采用注意力机制选择性地重点关注各模态中对情感识别起关键作用的显著性特征，自适应地学习具有情感鉴别能力的特征，可以有效提高多模态情感识别的准确率和鲁棒性。(1) The present invention adopts the attention mechanism to selectively focus on the salient features that play a key role in emotion recognition in each modal, and adaptively learns the features with emotion discrimination ability, which can effectively improve the accuracy of multimodal emotion recognition. rate and robustness.

(2)本发明采用鉴别多重集典型相关分析方法，引入了样本的类别信息，通过最大化同一类别样本不同模态情感特征之间的相关性，以及最大化同一模态情感特征的类间散布与最小化同一模态情感特征的类内散布，能够挖掘不同模态之间非线性的相关关系，充分利用脑电情感特征、外周生理情感特征和表情情感特征之间的相关性和互补性，同时又消除了一些无效的冗余特征，可以有效提高特征表示的鉴别力与鲁棒性。(2) The present invention adopts the method of identifying multiple sets of canonical correlation analysis, introduces the category information of samples, maximizes the correlation between different modal emotional features of the same category of samples, and maximizes the inter-class dispersion of the same modal emotional features In addition to minimizing the intra-class dispersion of emotional features of the same modality, it can mine nonlinear correlations between different modalities, and make full use of the correlation and complementarity between EEG emotional features, peripheral physiological emotional features, and facial emotional features. At the same time, some invalid redundant features are eliminated, which can effectively improve the discrimination and robustness of feature representation.

(3)与单模态情感识别方法相比，本发明综合利用了情感表达过程中的多种模态信息，能够结合不同模态的特点并充分利用其互补性来挖掘多模态情感特征，可以有效提高情感识别的准确率和鲁棒性。(3) Compared with the single-modal emotion recognition method, the present invention comprehensively utilizes multiple modal information in the process of emotional expression, and can combine the characteristics of different modalities and make full use of their complementarity to mine multi-modal emotional features. It can effectively improve the accuracy and robustness of emotion recognition.

附图说明Description of drawings

图1是本发明实施例的方法流程图；Fig. 1 is the method flow chart of the embodiment of the present invention;

图2是本发明实施例的结构图。FIG. 2 is a structural diagram of an embodiment of the present invention.

具体实施方式Detailed ways

为了更加详细了解本发明，下面结合说明书附图和具体实施例对本发明做进一步详细的说明。In order to understand the present invention in more detail, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

如图1和图2所示，本发明实施例提供的一种融合注意力机制和DMCCA的多模态情感识别方法，主要包括如下步骤：As shown in FIG. 1 and FIG. 2 , a multimodal emotion recognition method integrating attention mechanism and DMCCA provided by an embodiment of the present invention mainly includes the following steps:

(1)对经过预处理后的脑电信号和面部表情视频分别使用各自训练好的神经网络模型提取脑电信号特征向量和表情特征向量，对预处理后的外周生理信号，通过抽取信号波形描述符及其统计特征，提取外周生理信号特征向量。(1) Extract the EEG signal feature vector and the expression feature vector from the preprocessed EEG signal and facial expression video using the respective trained neural network models. For the preprocessed peripheral physiological signal, extract the signal waveform to describe Symbol and its statistical characteristics, extract the peripheral physiological signal feature vector.

本实施例中采用DEAP(Database for Emotion Analysis using PhysiologicalSignals)情感数据库，在实际中也可以采用其他的包含脑电、外周生理信号、面部表情视频的情感数据库。本实施例中使用的DEAP数据库是由英国伦敦玛丽皇后大学的Koelstra等人采集并到公开的多模态情感数据库。该数据库包含32名受试者在观看40个时长均为1分钟的不同种类音乐视频片段诱发刺激下产生的生理信号、外周生理信号以及前22名受试者在观看音乐视频片段时的面部表情视频。每个受试者需要进行40次实验，并且在每次实验结束后都要进行及时的自我评估(Self-assessment Manikins，SAM)，需要在SAM问卷调查表上进行40次自我评估。SAM问卷调查表包含受试者对视频的唤醒度(Arousal)、效价度(Valence)、支配度(Dominance)和喜好度(Liking)的心理量表。唤醒度表示人的状态兴奋程度，变化范围由平静状态逐渐过渡到兴奋状态，用数字1到9的分值来衡量；效价度也称愉悦度，表示人的心情愉悦程度，变化范围由消极(Negative)状态逐渐过渡到积极(Positive)状态，也用数字1到9的分值来衡量；支配度的变化范围从顺从(或“无控制”)到支配(或“有控制”)；喜好度表示受试者对视频的个人喜好。每位受试者需要在每次实验后选择代表情感状态的分值，用作后面的情感分类的类别和识别分析。In this embodiment, the DEAP (Database for Emotion Analysis using Physiological Signals) emotion database is used. In practice, other emotion databases including EEG, peripheral physiological signals, and facial expression videos may also be used. The DEAP database used in this embodiment is a public multimodal emotion database collected by Koelstra et al. of Queen Mary University of London, UK. The database contains physiological signals, peripheral physiological signals, and facial expressions of the top 22 subjects when watching 40 different kinds of music video clips of 1 minute evoked stimuli. video. Each subject was required to conduct 40 experiments, and a timely self-assessment (Self-assessment Manikins, SAM) was required after each experiment, and 40 self-assessments were required on the SAM questionnaire. The SAM questionnaire contains psychological scales of the subjects' arousal, valence, dominance and liking to the video. The degree of arousal represents the degree of excitement of a person's state, and the change range is gradually transitioned from a calm state to an excited state, which is measured by the scores of numbers 1 to 9; Gradual transition from Negative to Positive, also measured on a scale of 1 to 9; dominance ranging from submissive (or "no control") to dominance (or "controlled"); liking The degree represents the subject's personal preference for the video. Each subject was required to select a score representing an emotional state after each experiment, which was used as a category and identification analysis for subsequent emotional classification.

在DEAP数据库中，生理信号采用512Hz采样、128Hz复采样(官方提供了经过预处理的复采样数据)，每个受试者的生理信号矩阵为40×40×8064(40个不同种类音乐视频片段，40个生理信号通道，8064个采样点)。在采集的40个生理信号通道中，前32个通道采集的是脑电信号，后8个通道采集的是外周生理信号。8064个采样数据则是在128Hz采样率下时长为63s的数据，每一段信号记录前，都有3s静默时间。In the DEAP database, the physiological signal adopts 512Hz sampling and 128Hz multi-sampling (officially provides preprocessed multi-sampling data), and the physiological signal matrix of each subject is 40 × 40 × 8064 (40 different kinds of music video clips) , 40 physiological signal channels, 8064 sampling points). Among the 40 physiological signal channels collected, the first 32 channels collected EEG signals, and the last 8 channels collected peripheral physiological signals. The 8064 sampled data is data with a duration of 63s at a sampling rate of 128Hz. Before each signal is recorded, there is a 3s silence time.

在本发明实施例中，我们采用同时具有脑电信号、外周生理信号和面部表情的880个样本作为训练样本，在唤醒度、效价度、支配度和喜好度的4个维度上分别做二分类识别。In the embodiment of the present invention, we use 880 samples that simultaneously have EEG signals, peripheral physiological signals and facial expressions as training samples, and make two samples in the four dimensions of arousal, valence, dominance, and liking, respectively. Classification identification.

用于提取脑电信号特征的神经网络模型可以采用长短时记忆(Long Short-TermMemory，LSTM)网络或卷积神经网络(Convolutional Neural Network，CNN)，用于提取表情特征的神经网络模型可以采用3D卷积神经网络、CNN-LSTM等。在本实施例中，使用训练好的卷积神经网络(CNN)模型对预处理后的脑电信号进行特征提取，得到256维的脑电信号特征向量；对预处理后的心电、呼吸、眼电、肌电等外周生理信号，通过抽取信号波形的低层描述符(Low Level Descriptor，LLD)及其统计特征(包括平均值、标准偏差、功率谱、中值、最大值和最小值)，提取了128维的外周生理信号特征向量；对预处理后的面部表情视频使用训练好的CNN-LSTM模型提取256维的表情特征向量。The neural network model for extracting EEG signal features can use Long Short-Term Memory (LSTM) network or Convolutional Neural Network (CNN), and the neural network model for extracting facial expression features can use 3D Convolutional Neural Network, CNN-LSTM, etc. In this embodiment, a trained convolutional neural network (CNN) model is used to perform feature extraction on the preprocessed EEG signal to obtain a 256-dimensional EEG signal feature vector; For peripheral physiological signals such as OMG and EMG, by extracting the Low Level Descriptor (LLD) of the signal waveform and its statistical characteristics (including mean, standard deviation, power spectrum, median, maximum and minimum values), The 128-dimensional peripheral physiological signal feature vector is extracted; the trained CNN-LSTM model is used to extract the 256-dimensional expression feature vector for the preprocessed facial expression video.

(2)对脑电信号特征向量、外周生理信号特征向量、表情特征向量分别使用注意力机制模块提取有鉴别力的脑电情感特征向量、外周生理情感特征向量、表情情感特征向量。(2) Using the attention mechanism module to extract the discriminative EEG emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector for EEG signal feature vector, peripheral physiological signal feature vector and expression feature vector respectively.

(3)对脑电情感特征向量、外周生理情感特征向量和表情情感特征向量，使用鉴别多重集典型相关分析(DMCCA)方法，得到脑电-外周生理-表情多模态情感特征向量。(3) Using the discriminative multiple set canonical correlation analysis (DMCCA) method to obtain the EEG-peripheral physiology-expression multimodal emotion feature vector for EEG emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector.

(2.1)将步骤(1)提取到的脑电信号特征以矩阵形式表示成

并通过线性变换矩阵W⁽¹⁾映射成M₁组特征向量

4≤M₁≤16，每组特征向量的维数为N，16≤N≤64，令

Its linear transformation expression is:

E⁽¹⁾＝(F⁽¹⁾)^TW⁽¹⁾ E ⁽¹⁾ = (F ⁽¹⁾ ) ^T W ⁽¹⁾

其中，上标(1)代表脑电模态，T表示转置符号。Among them, the superscript (1) represents the EEG modality, and T represents the transpose symbol.

And the EEG emotion feature vector x ⁽¹⁾ is expressed as:

其中，r＝1,2,…,M₁，

表示第r组脑电信号特征向量，

为可训练的线性变换参数向量，exp(·)表示以自然常数e为底的指数函数。在本实施例中，M₁＝8，N＝32。Among them, r=1,2,...,M ₁ ,

represents the eigenvectors of the rth group EEG signals,

is the trainable linear transformation parameter vector, exp( ) represents the exponential function with the natural constant e as the base. In this embodiment, M ₁ =8 and N=32.

为了训练线性变换矩阵W⁽¹⁾的参数，需要在第一个注意力机制模块之后连接一个softmax分类器，将第一个注意力机制模块输出的脑电情感特征向量x⁽¹⁾连接到softmax分类器的C个输出节点，经过softmax函数之后输出一个概率分布向量

其中c∈[1,C]，C为情感类别数。In order to train the parameters of the linear transformation matrix W ⁽¹⁾ , it is necessary to connect a softmax classifier after the first attention mechanism module, and connect the EEG emotion feature vector x ⁽¹⁾ output by the first attention mechanism module to the softmax The C output nodes of the classifier, after passing through the softmax function, output a probability distribution vector

where c∈[1,C], C is the number of emotion categories.

进一步地，由下式所示的交叉熵损失函数来训练线性变换矩阵W⁽¹⁾的参数。Further, the parameters of the linear transformation matrix W ⁽¹⁾ are trained by the cross-entropy loss function shown in the following equation.

其中，x⁽¹⁾为32维的脑电情感特征向量；

表示softmax分类模型预测情感类别的概率分布向量；

表示第m个脑电样本的真实情感类别标签，当采用one-hot编码时，若第m个脑电样本的真实情感类别标签为c，则

否则

表示softmax分类模型将第m个脑电样本预测为类别c的概率；Loss⁽¹⁾表示线性变换矩阵W⁽¹⁾在训练过程中的损失函数；在本实施例中，C＝2，M＝880。Among them, x ⁽¹⁾ is the 32-dimensional EEG emotion feature vector;

Represents the probability distribution vector of the sentiment category predicted by the softmax classification model;

Represents the true emotional category label of the mth EEG sample. When one-hot encoding is used, if the true emotional category label of the mth EEG sample is c, then

otherwise

Represents the probability that the softmax classification model predicts the mth EEG sample as category c; Loss ⁽¹⁾ represents the loss function of the linear transformation matrix W ⁽¹⁾ in the training process; in this embodiment, C=2, M= 880.

通过误差反向传播算法不断迭代训练，直至模型参数达到最优。之后，就可对新输入的测试样本的脑电信号提取脑电情感特征向量x⁽¹⁾。Iteratively trains through the error back propagation algorithm until the model parameters reach the optimum. After that, the EEG emotion feature vector x ⁽¹⁾ can be extracted from the EEG signal of the newly input test sample.

(2.2)将步骤(1)提取到的外周生理信号特征以矩阵形式表示成

并通过线性变换矩阵W⁽²⁾映射成M₂组特征向量

4≤M₂≤16，令

4≤M ₂ ≤16, let

Its linear transformation expression is:

E⁽²⁾＝(F⁽²⁾)^TW⁽²⁾ E ⁽²⁾ = (F ⁽²⁾ ) ^T W ⁽²⁾

其中，上标(2)代表外周生理模态。Among them, the superscript (2) represents the peripheral physiological modality.

其中，s＝1,2,…,M₂，

表示第s组外周生理信号特征向量，

为可训练的线性变换参数向量。在本实施例中，M₂＝4。Among them, s=1,2,...,M ₂ ,

represents the feature vector of peripheral physiological signals in group s,

is the trainable linear transformation parameter vector. In this embodiment, M ₂ =4.

为了训练线性变换矩阵W⁽²⁾的参数，需要在第二个注意力机制模块之后连接一个softmax分类器，将第二个注意力机制模块输出的外周生理情感特征向量x⁽²⁾连接到softmax分类器的C个输出节点，经过softmax函数之后输出一个概率分布向量

In order to train the parameters of the linear transformation matrix W ⁽²⁾ , a softmax classifier needs to be connected after the second attention mechanism module, and the peripheral physiological emotion feature vector x ⁽²⁾ output by the second attention mechanism module is connected to the softmax The C output nodes of the classifier, after passing through the softmax function, output a probability distribution vector

进一步地，由下式所示的交叉熵损失函数来训练线性变换矩阵W⁽²⁾的参数。Further, the parameters of the linear transformation matrix W ⁽²⁾ are trained by the cross-entropy loss function shown in the following equation.

其中，x⁽²⁾为32维的外周生理情感特征向量；

表示softmax分类模型预测情感类别的概率分布向量；

表示第m个外周生理信号样本的真实情感类别标签，当采用one-hot编码时，若第m个外周生理信号样本的真实情感类别标签为c，则

否则

表示softmax分类模型将第m个外周生理信号样本预测为类别c的概率；Loss⁽²⁾表示线性变换矩阵W⁽²⁾在训练过程中的损失函数；在本实施例中，C＝2，M＝880。Among them, x ⁽²⁾ is the 32-dimensional peripheral physiological emotion feature vector;

Represents the real emotion category label of the mth peripheral physiological signal sample. When one-hot encoding is used, if the real emotion category label of the mth peripheral physiological signal sample is c, then

otherwise

Represents the probability that the softmax classification model predicts the mth peripheral physiological signal sample as category c; Loss ⁽²⁾ represents the loss function of the linear transformation matrix W ⁽²⁾ in the training process; in this embodiment, C=2, M = 880.

通过误差反向传播算法不断迭代训练，直至模型参数达到最优。之后，就可对新输入的测试样本的外周生理信号提取外周生理情感特征向量x⁽²⁾。Iteratively trains through the error back propagation algorithm until the model parameters reach the optimum. After that, the peripheral physiological emotion feature vector x ⁽²⁾ can be extracted from the peripheral physiological signal of the newly input test sample.

(2.3)将步骤(1)提取到的表情特征以矩阵形式表示成

并通过线性变换矩阵W⁽³⁾映射成M₃组特征向量

4≤M₃≤16，令

4≤M ₃ ≤16, let

Its linear transformation expression is:

E⁽³⁾＝(F⁽³⁾)^TW⁽³⁾ E ⁽³⁾ = (F ⁽³⁾ ) ^T W ⁽³⁾

其中，上标(3)代表表情模态。Among them, the superscript (3) represents the expression mode.

And the expression emotion feature vector x ⁽³⁾ is expressed as:

其中，t＝1，2，…，M₃，

表示第t组表情特征向量，

为可训练的线性变换参数向量。在本实施例中，M₃＝8。where, t=1, 2, ..., M ₃ ,

represents the t-th group of expression feature vectors,

is the trainable linear transformation parameter vector. In this embodiment, M ₃ =8.

为了训练线性变换矩阵W⁽³⁾的参数，需要在第三个注意力机制模块之后连接一个softmax分类器，将第三个注意力机制模块输出的表情情感特征向量x⁽³⁾连接到softmax分类器的C个输出节点，经过softmax函数之后输出一个概率分布向量

In order to train the parameters of the linear transformation matrix W ⁽³⁾ , it is necessary to connect a softmax classifier after the third attention mechanism module, and connect the expression emotion feature vector x ⁽³⁾ output by the third attention mechanism module to the softmax classification The C output nodes of the device, after the softmax function, output a probability distribution vector

进一步地，由下式所示的交叉熵损失函数来训练线性变换矩阵W⁽³⁾的参数。Further, the parameters of the linear transformation matrix W ⁽³⁾ are trained by the cross-entropy loss function shown in the following equation.

其中，x⁽³⁾为32维的表情情感特征向量；

表示softmax分类模型预测情感类别的概率分布向量；

表示第m个表情视频样本的真实情感类别标签，当采用one-hot编码时，若第m个表情视频样本的真实情感类别标签为c，则

否则

表示softmax分类模型将第m个表情视频样本预测为类别c的概率；Loss⁽³⁾表示线性变换矩阵W⁽³⁾在训练过程中的损失函数；在本实施例中，C＝2，M＝880。Among them, x ⁽³⁾ is a 32-dimensional expression emotion feature vector;

Represents the real emotion category label of the mth expression video sample. When using one-hot encoding, if the real emotion category label of the mth expression video sample is c, then

otherwise

Represents the probability that the softmax classification model predicts the mth expression video sample as category c; Loss ⁽³⁾ represents the loss function of the linear transformation matrix W ⁽³⁾ in the training process; in this embodiment, C=2, M= 880.

通过误差反向传播算法不断迭代训练，直至模型参数达到最优。之后，就可对新输入的测试样本的表情视频提取表情情感特征向量x⁽³⁾。Iteratively trains through the error back propagation algorithm until the model parameters reach the optimum. After that, the expression emotion feature vector x ⁽³⁾ can be extracted from the expression video of the newly input test sample.

和

32≤d≤128。在本实施例中，d＝40。(3.1) Obtaining the DMCCA projection matrix corresponding to EEG emotional features, peripheral physiological emotional features and facial emotional features obtained through training

and

32≤d≤128. In this embodiment, d=40.

(3.2)分别使用投影矩阵Ω、Φ和Ψ将步骤(2)提取到的脑电情感特征向量x⁽¹⁾、外周生理情感特征向量x⁽²⁾和表情情感特征向量x⁽³⁾投影到一个d维的公共子空间，其中脑电情感特征向量x⁽¹⁾到d维公共子空间的投影为Ω^Tx⁽¹⁾，外周生理情感特征向量x⁽²⁾到d维公共子空间的投影为Φ^Tx⁽²⁾，表情情感特征向量x⁽³⁾到d维公共子空间的投影为Ψ^Tx⁽³⁾。(3.2) Using projection matrices Ω, Φ and Ψ to project the EEG emotion feature vector x ⁽¹⁾ , peripheral physiological emotion feature vector x ⁽²⁾ and expression emotion feature vector x ⁽³⁾ extracted in step (2) to A d-dimensional common subspace, in which the projection of the EEG emotion feature vector x ⁽¹⁾ to the d-dimensional common subspace is Ω ^T x ⁽¹⁾ , and the peripheral physiological emotion feature vector x ⁽²⁾ to the d-dimensional common subspace. The projection is Φ ^T x ⁽²⁾ , and the projection of the expression emotion feature vector x ⁽³⁾ to the d-dimensional common subspace is Ψ ^T x ⁽³⁾ .

(3.1.1)对于训练样本集中的C类情感类别的样本生成3组情感特征向量

其中

M为训练样本数(本例中样本集中数据量不大，所有样本参与计算，数据量大的样本集可随机抽取各情感类别的样本)，i＝1，2，3，m＝1,2，…，M；令i＝1代表脑电模态，i＝2代表外周生理模态，i＝3代表表情模态，

代表脑电情感特征向量，

代表外周生理情感特征向量，

代表表情情感特征向量；在本实施例中，C＝2，M＝880，N＝32。(3.1.1) Generate 3 sets of emotion feature vectors for the samples of the C-type emotion category in the training sample set

in

M is the number of training samples (in this example, the amount of data in the sample set is not large, all samples participate in the calculation, and the sample set with a large amount of data can randomly select samples of each emotion category), i=1, 2, 3, m=1, 2 , ..., M; let i=1 represent the EEG modality, i=2 represent the peripheral physiological modality, and i=3 represent the expression modality,

represents the EEG emotion feature vector,

represents the peripheral physiological emotion feature vector,

Represents the facial expression emotion feature vector; in this embodiment, C=2, M=880, and N=32.

(3.1.2)计算X⁽ⁱ⁾中各列向量的均值

对X⁽ⁱ⁾进行中心化操作，得到

为了便于描述，下面将中心化后的

仍记为X⁽ⁱ⁾，即假设

均已被中心化。(3.1.2) Calculate the mean of each column vector in X ⁽ⁱ⁾

Perform a centralization operation on X ⁽ⁱ⁾ to get

For the convenience of description, the following will be centralized

Still denoted as X ⁽ⁱ⁾ , that is, assuming

have been centralized.

(3.1.3)鉴别多重集典型相关分析(DMCCA)的思想旨在求得一组投影矩阵Ω、Φ和Ψ，使得同类样本在公共投影子空间的线性相关性最大，同时还最大化了模态内数据的类间散布与最小化了模态内数据的类内散布，令X⁽ⁱ⁾的投影向量为

i＝1，2，3，DMCCA的目标函数为：(3.1.3) The idea of discriminative multiple set canonical correlation analysis (DMCCA) aims to obtain a set of projection matrices Ω, Φ and Ψ, so that the linear correlation of similar samples in the common projection subspace is maximized, and the modulus is maximized. The inter-class scatter of intra-modal data and the intra-class scatter of intra-modal data are minimized, let the projection vector of X ⁽ⁱ⁾ be

i=1, 2, 3, the objective function of DMCCA is:

其中，

表示X⁽ⁱ⁾的类内散布矩阵，

表示X⁽ⁱ⁾的类间散布矩阵，cov(·，·)表示协方差，i，j∈{1，2，3}。in,

represents the intra-class scatter matrix of X ⁽ⁱ⁾ ,

represents the inter-class scatter matrix of X ⁽ⁱ⁾ , cov(·,·) represents the covariance, i,j∈{1,2,3}.

对DMCCA目标函数的求解可以表示为如下的优化模型：The solution to the DMCCA objective function can be expressed as the following optimization model:

(3.1.4)使用拉格朗日乘子法(Lagrange multiplier)求解DMCCA目标函数的优化模型，可得到如下拉格朗日(Lagrange)函数：(3.1.4) Using the Lagrange multiplier method to solve the optimization model of the DMCCA objective function, the following Lagrange function can be obtained:

其中，λ是拉格朗日乘子，再分别求L(w⁽¹⁾，w⁽²⁾,w⁽³⁾)对w⁽¹⁾、w⁽²⁾和w⁽³⁾的偏导数并令其为零，即令Among them, λ is the Lagrange multiplier, and then find the partial derivatives of L(w ⁽¹⁾ , w ⁽²⁾ , w ⁽³⁾ ) to w ⁽¹⁾ , w ⁽²⁾ and w ⁽³⁾ respectively and combine Let it be zero, that is,

得到get

和

在本实施例中，d＝40。By solving the generalized eigenvalue problem in the above formula, the projection matrix can be obtained by selecting the eigenvectors corresponding to the first d largest eigenvalues λ ₁ ≥λ ₂ ≥...≥λ _d

and

In this embodiment, d=40.

基于相同的发明构思，本发明实施例提供的融合注意力机制和DMCCA的多模态情感识别系统，包括：Based on the same inventive concept, the multimodal emotion recognition system integrating attention mechanism and DMCCA provided by the embodiment of the present invention includes:

特征鉴别增强模块，用于对的脑电信号特征向量、外周生理信号特征向量、表情特征向量分别通过线性变换矩阵映射成若干组特征向量，并分别使用注意力机制模块确定不同特征向量组的重要性权重，通过加权融合形成维数相同的有鉴别力的脑电情感特征向量、外周生理情感特征向量、表情情感特征向量；The feature identification enhancement module is used to map the EEG signal feature vectors, peripheral physiological signal feature vectors, and expression feature vectors into several groups of feature vectors through linear transformation matrices, and use the attention mechanism module to determine the importance of different feature vector groups. Sex weight, through weighted fusion to form discriminative EEG emotion feature vector, peripheral physiological emotion feature vector, and expression emotion feature vector with the same dimension;

投影矩阵确定模块，用于使用DMCCA方法，通过最大化同一类别样本的不同模态情感特征之间的相关性，确定各情感特征向量的投影矩阵；The projection matrix determination module is used to determine the projection matrix of each emotional feature vector by maximizing the correlation between the emotional features of different modalities of the same category of samples by using the DMCCA method;

特征融合模块，用于对的脑电情感特征向量、外周生理情感特征向量和表情情感特征向量，通过各自对应的投影矩阵投影到一个公共子空间，相加融合后得到脑电-外周生理-表情多模态情感特征向量；The feature fusion module is used to project the corresponding EEG emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector into a common subspace through their corresponding projection matrices, and then add and fuse to obtain EEG-peripheral physiology-expression Multimodal emotion feature vector;

各模块的具体实现参考上述方法实施例，不再赘述。本领域技术人员可以理解，可以对实施例中的模块进行自适应性地改变并且把它们设置在与该实施例不同的一个或多个系统中。可以把实施例中的模块或单元或组件组合成一个模块或单元或组件，以及此外可以把它们分成多个子模块或子单元或子组件。For the specific implementation of each module, reference is made to the foregoing method embodiments, and details are not repeated here. Those skilled in the art will appreciate that the modules in an embodiment can be adaptively changed and placed in one or more systems different from the embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and further they may be divided into multiple sub-modules or sub-units or sub-assemblies.

基于相同的发明构思，本发明实施例提供的融合注意力机制和DMCCA的多模态情感识别系统，包括至少一台计算设备，该计算设备包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，该计算机程序被加载至处理器时实现上述的融合注意力机制和DMCCA的多模态情感识别方法。Based on the same inventive concept, the multimodal emotion recognition system integrating attention mechanism and DMCCA provided by the embodiment of the present invention includes at least one computing device, and the computing device includes a memory, a processor, and a multimodal emotion recognition system stored on the memory and capable of processing A computer program running on the processor, when the computer program is loaded into the processor, realizes the above-mentioned fusion attention mechanism and the multimodal emotion recognition method of DMCCA.

本发明所公开的技术方案既包含了上述实施方案中涉及的技术方法，也包括由以上技术方法任意组合成的技术方案。本技术领域的普通技术人员，在不脱离本发明的原理的前提下，可以作出一定的改善和修饰，这些改善和修饰也被认为本发明的保护范围。The technical solutions disclosed in the present invention include not only the technical methods involved in the above embodiments, but also the technical solutions formed by any combination of the above technical methods. Those skilled in the art can make certain improvements and modifications without departing from the principles of the present invention, and these improvements and modifications are also considered to be within the protection scope of the present invention.

Claims

1. the multimodal emotion recognition method of fusion attention mechanism and DMCCA, is characterized in that, described method comprises the following steps:

(1) Extract the EEG signal feature vector and the expression feature vector from the preprocessed EEG signal and facial expression video using the respective trained neural network models. For the preprocessed peripheral physiological signal, extract the signal waveform to describe Symbol and its statistical characteristics, extract peripheral physiological signal feature vector;

(2) The EEG signal feature vector, peripheral physiological signal feature vector, and expression feature vector are respectively mapped into several groups of feature vectors through a linear transformation matrix, and the attention mechanism module is used to determine the importance weights of different feature vector groups. , through weighted fusion to form discriminative EEG emotion feature vectors, peripheral physiological emotion feature vectors, and expression emotion feature vectors with the same dimension;

(3) For the EEG emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector, the discriminative multiple set canonical correlation analysis (DMCCA) method is used to maximize the difference between different modal emotion features of the same class of samples. Determine the projection matrix of each emotional feature vector, project each emotional feature vector into a common subspace, and obtain the EEG-peripheral physiology-expression multimodal emotional feature vector after addition and fusion;

(4) Use the classifier to classify and identify the multimodal emotion feature vector to obtain the emotion category.

2. the multimodal emotion recognition method of fusion attention mechanism and DMCCA according to claim 1, is characterized in that, step (2) comprises the following substeps:

(2.1) The EEG signal features extracted in step (1) are expressed in matrix form as

Its linear transformation expression is:

E ⁽¹⁾ = (F ⁽¹⁾ ) ^T W ⁽¹⁾

Among them, the superscript (1) represents the EEG mode, and T represents the transpose symbol;

The first attention mechanism module is used to determine the importance weights of different feature vector groups, and a discriminative EEG emotion feature vector is formed through weighted fusion, where the weight of the rth group EEG signal feature vector

And the EEG emotion feature vector x ⁽¹⁾ is expressed as:

Among them, r=1,2,...,M ₁ ,

represents the eigenvectors of the rth group EEG signals,

(2.2) The peripheral physiological signal features extracted in step (1) are expressed in matrix form as

4≤M ₂ ≤16, let

Its linear transformation expression is:

E ⁽²⁾ = (F ⁽²⁾ ) ^T W ⁽²⁾

Among them, the superscript (2) represents the peripheral physiological mode;

The second attention mechanism module is used to determine the importance weights of different feature vector groups, and a discriminative peripheral physiological emotion feature vector is formed through weighted fusion, where the weight of the sth group of peripheral physiological signal feature vectors

Among them, s=1,2,...,M ₂ ,

represents the feature vector of peripheral physiological signals in group s,

is a trainable linear transformation parameter vector;

(2.3) Express the facial expression features extracted in step (1) in matrix form as

4≤M ₃ ≤16, let

Its linear transformation expression is:

E ⁽³⁾ = (F ⁽³⁾ ) ^T W ⁽³⁾

Among them, the superscript (3) represents the expression mode;

Use the third attention mechanism module to determine the importance weights of different feature vector groups, and form discriminative emotion feature vectors through weighted fusion, where the weight of the t-th group of expression feature vectors

And the expression emotion feature vector x ⁽³⁾ is expressed as:

Among them, t=1,2,...,M ₃ ,

represents the t-th group of expression feature vectors,

is the trainable linear transformation parameter vector.

3. the multimodal emotion recognition method of fusion attention mechanism and DMCCA according to claim 2, is characterized in that, step (3) comprises following substep:

(3.1) Obtaining the DMCCA projection matrix corresponding to EEG emotional features, peripheral physiological emotional features and facial emotional features obtained through training

and

32≤d≤128;

(3.2) Using projection matrices Ω, Φ and Ψ to project the EEG emotion feature vector x ⁽¹⁾ , peripheral physiological emotion feature vector x ⁽²⁾ and expression emotion feature vector x ⁽³⁾ extracted in step (2) to A d-dimensional common subspace, in which the projection of the EEG emotion feature vector x ⁽¹⁾ to the d-dimensional common subspace is Ω ^T x ⁽¹⁾ , and the peripheral physiological emotion feature vector x ⁽²⁾ to the d-dimensional common subspace. The projection is Φ ^T x ⁽²⁾ , and the projection of the expression emotion feature vector x ⁽³⁾ to the d-dimensional common subspace is Ψ ^T x ⁽³⁾ ;

(3.3) Integrate Ω ^T x ⁽¹⁾ , Φ ^T x ⁽²⁾ and Ψ ^T x ⁽³⁾ to obtain the EEG-peripheral physiology-expression multimodal emotional feature vector as Ω ^T x ⁽¹⁾ +Φ ^T x ⁽²⁾ + Ψ ^T x ⁽³⁾ .

4. the multimodal emotion recognition method of fusion attention mechanism and DMCCA according to claim 3, is characterized in that, the projection matrix Ω, Φ and Ψ in step (3.1) obtain by the training of following steps:

(3.1.1) Extract the training samples of each emotion category from the training sample set to generate 3 sets of emotion feature vectors

in

M is the number of training samples, i=1, 2, 3, m=1, 2, ..., M; let i=1 represent the EEG modality, i=2 represent the peripheral physiological modality, and i=3 represent the expression modality ,

represents the EEG emotion feature vector,

represents the peripheral physiological emotion feature vector,

Represents the expression emotion feature vector;

(3.1.2) Calculate the mean value of each column vector in X ( ⁱ ^), and perform the centering operation on X (i);

(3.1.3) Based on the idea of discriminative multiple set canonical correlation analysis (DMCCA), a set of projection matrices Ω, Φ and Ψ are obtained, so that the linear correlation of similar samples in the common projection subspace is maximized, and the intra-modal data is maximized. The inter-class scatter of and minimizing the intra-class scatter of intra-modal data, let the projection vector of X ⁽ⁱ⁾ be

The objective function of DMCCA is:

in,

represents the intra-class scatter matrix of X ⁽ⁱ⁾ ,

represents the inter-class scatter matrix of X ⁽ⁱ⁾ , cov( , ) represents the covariance, i, j ∈ {1, 2, 3}; construct the following optimization model and solve to obtain the projection matrices Ω, Φ and Ψ:

5. the multimodal emotion recognition method of fusion attention mechanism and DMCCA according to claim 4, is characterized in that, uses Lagrangian multiplier method to solve the optimization model of the constructed DMCCA objective function, is specially: The optimization model is expressed as the following Lagrangian function:

Among them, λ is the Lagrange multiplier, and then find the partial derivatives of L(w ⁽¹⁾ , w ⁽²⁾ , w ⁽³⁾ ) to w ⁽¹⁾ , w ⁽²⁾ and w ⁽³⁾ respectively and combine Let it be zero, that is,

get

By further simplifying the above formula, the following generalized eigenvalue problem can be obtained:

and

6. A multimodal emotion recognition system integrating attention mechanism and DMCCA, characterized in that it includes:

The feature preliminary extraction module is used to extract the EEG signal feature vector and the facial expression feature vector from the preprocessed EEG signal and facial expression video using the respective trained neural network models. Extract the signal waveform descriptor and its statistical features, and extract the peripheral physiological signal feature vector;

The feature identification enhancement module is used to map the EEG signal feature vector, peripheral physiological signal feature vector, and expression feature vector into several groups of feature vectors through a linear transformation matrix, and use the attention mechanism module to determine different feature vector groups. The importance weights of , and the discriminative EEG emotion feature vector, peripheral physiological emotion feature vector, and expression emotion feature vector with the same dimension are formed by weighted fusion;

The projection matrix determination module is used to determine the projection matrix of each emotion feature vector by maximizing the correlation between different modal emotion features of the same class of samples by using the discriminative multiple set canonical correlation analysis (DMCCA) method;

The feature fusion module is used to project the EEG emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector into a common subspace through their corresponding projection matrices, and then add and fuse to obtain the EEG-peripheral physiological emotion feature vector. - Expression multimodal emotion feature vector;

And, the classification and recognition module is used for classifying and recognizing the multimodal emotion feature vector by using the classifier to obtain the emotion category.

7. the multimodal emotion recognition system of fusion attention mechanism and DMCCA, is characterized in that, comprises at least one computing device, and described computing device comprises memory, processor and the computer that is stored on memory and can run on processor A program, when the computer program is loaded into the processor, implements the multimodal emotion recognition method fused with the attention mechanism and DMCCA according to any one of claims 1-5.