
CN112800998A: Multimodal emotion recognition method and system integrating an attention mechanism and DMCCA

Info

Publication number: CN112800998A (granted as CN112800998B)
Application number: CN202110159085.8A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: feature vector, emotion, expression, EEG, DMCCA
Inventors: 卢官明 (Lu Guanming), 朱清扬 (Zhu Qingyang), 卢峻禾 (Lu Junhe)
Assignee: Nanjing University of Posts and Telecommunications
Legal status: Active (granted)

Classifications

    • G06F 2218/00: Aspects of pattern recognition specially adapted for signal processing
    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045: Combinations of networks
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06F 2218/08: Feature extraction
    • G06F 2218/12: Classification; matching


Abstract

The invention discloses a multimodal emotion recognition method and system integrating an attention mechanism and discriminative multiset canonical correlation analysis (DMCCA). The method comprises: extracting EEG features, peripheral physiological signal features and facial expression features from preprocessed EEG signals, peripheral physiological signals and facial expression videos, respectively; using attention mechanisms to extract discriminative EEG emotion features, peripheral physiological emotion features and expression emotion features; applying the DMCCA method to the three sets of emotion features to obtain an EEG-peripheral-physiology-expression multimodal emotion feature; and classifying the multimodal emotion feature with a classifier. By adopting attention mechanisms to selectively focus on the features of each modality that are most discriminative for emotion, and combining them with DMCCA to fully exploit the correlation and complementarity between the emotion features of different modalities, the invention can effectively improve the accuracy and robustness of emotion recognition.

Description

Multimodal emotion recognition method and system integrating an attention mechanism and DMCCA

Technical Field

The present invention relates to the technical field of emotion recognition and artificial intelligence, and in particular to a multimodal emotion recognition method and system integrating an attention mechanism and discriminative multiset canonical correlation analysis (DMCCA).

Background

Human emotion is a psychological and physiological state that accompanies the process of human consciousness and plays an important role in interpersonal communication. With the continuous progress of artificial intelligence and related technologies, achieving more intelligent and humanized Human-Computer Interaction (HCI) experiences has attracted growing attention. People place ever higher demands on machine intelligence, expecting machines to perceive, understand and even express emotions, to realize humanized human-computer interaction and to better serve human beings. As a branch of affective computing, emotion recognition is the foundational and core technology for realizing human-machine emotional interaction; it has become a research hotspot in computer science, cognitive science and artificial intelligence, and has received wide attention from both academia and industry. For example, in clinical medical care, if the emotional state of patients, especially patients with expression disorders, can be known, different nursing measures can be taken to improve the quality of care. In addition, increasing attention has been paid to monitoring the psychological behavior of patients with mental disorders and to friendly human-machine interaction with emotional robots.

Most previous research on emotion recognition focused on identifying human emotional states from a single modality, such as speech-based or facial-expression-based emotion recognition. The emotional information carried by speech or facial expressions alone is incomplete and easily affected by external factors: facial expression recognition is sensitive to occlusion and illumination changes, while speech-based emotion recognition is sensitive to environmental noise and to voice differences between subjects. Moreover, people sometimes force a smile, strike a pose, or remain silent in order to hide their true emotions; in such cases facial expressions or body gestures can be deceptive, and speech-based methods fail when people stay silent. Single-modality emotion recognition therefore has inherent limitations. For this reason, more and more researchers have turned to emotion recognition based on multimodal information fusion, expecting to exploit the complementarity between the modalities to build robust emotion recognition models and achieve higher recognition accuracy.

At present, the most common information fusion strategies in multimodal emotion recognition research are decision-level fusion and feature-level fusion. Decision-level fusion is usually based on the results of recognizing each modality separately, from which a final decision is made according to rules such as the mean rule, the sum rule, the max rule, or a majority voting mechanism. Decision-level fusion weighs the contributions of the different modalities to emotion recognition and thus accounts fairly comprehensively for the differences between them, but it ignores the correlations between the modalities. The performance of multimodal emotion recognition based on decision-level fusion depends not only on the recognition rate of each single modality but also on the performance of the decision-level fusion algorithm. Feature-level fusion combines the emotion features of multiple modalities into a single fused feature vector. It exploits the complementarity of the emotion features of different modalities, but how to determine the weights of the different modalities' features, so as to reflect how differently they contribute to emotion classification, is the key problem of multimodal feature fusion and remains a challenging open topic.

Summary of the Invention

Purpose of the invention: In view of the low accuracy and poor robustness of single-modality emotion recognition and the shortcomings of existing multimodal emotion feature fusion methods, the purpose of the present invention is to provide a multimodal emotion recognition method and system integrating an attention mechanism and discriminative multiset canonical correlation analysis (DMCCA), which selectively focuses on the discriminative emotion features of each modality by introducing attention mechanisms and, combined with DMCCA, makes full use of the correlation and complementarity between the emotion features of different modalities, thereby effectively improving the accuracy and robustness of multimodal emotion recognition.

Technical solution: To achieve the above purpose, the present invention adopts the following technical solution:

A multimodal emotion recognition method integrating an attention mechanism and DMCCA, comprising the following steps:

(1) extracting an EEG feature vector and an expression feature vector from the preprocessed EEG signals and facial expression videos using respectively trained neural network models, and extracting a peripheral physiological feature vector from the preprocessed peripheral physiological signals by computing signal-waveform descriptors and their statistical features;

(2) mapping the EEG feature vector, the peripheral physiological feature vector and the expression feature vector into several groups of feature vectors through linear transformation matrices, determining the importance weights of the different feature-vector groups with attention modules, and forming, by weighted fusion, a discriminative EEG emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector of the same dimension;

(3) applying the discriminative multiset canonical correlation analysis (DMCCA) method to the EEG emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector, determining the projection matrix of each emotion feature vector by maximizing the correlation between the emotion features of different modalities for samples of the same category, projecting each emotion feature vector into a common subspace, and obtaining the EEG-peripheral-physiology-expression multimodal emotion feature vector after additive fusion;

(4) classifying the multimodal emotion feature vector with a classifier to obtain the emotion category.

Further, the specific steps of using attention modules in step (2) to extract discriminative EEG emotion features, peripheral physiological emotion features and expression emotion features include:

(2.1) Represent the EEG features extracted in step (1) in matrix form as $F^{(1)}$ and map them through a linear transformation matrix $W^{(1)}$ into $M_1$ groups of feature vectors $e_1^{(1)}, e_2^{(1)}, \dots, e_{M_1}^{(1)}$, where $4 \le M_1 \le 16$ and each group has dimension $N$ with $16 \le N \le 64$. Let $E^{(1)} = [e_1^{(1)}, e_2^{(1)}, \dots, e_{M_1}^{(1)}]$; the linear transformation is

$$E^{(1)} = (F^{(1)})^{T} W^{(1)}$$

where the superscript $(1)$ denotes the EEG modality and $T$ denotes transposition;

The first attention module is used to determine the importance weights of the different feature-vector groups, and a discriminative EEG emotion feature vector is formed by weighted fusion. The weight $\alpha_r^{(1)}$ of the $r$-th group of EEG feature vectors and the EEG emotion feature vector $x^{(1)}$ are

$$\alpha_r^{(1)} = \frac{\exp\!\big((u^{(1)})^{T} e_r^{(1)}\big)}{\sum_{k=1}^{M_1} \exp\!\big((u^{(1)})^{T} e_k^{(1)}\big)}, \qquad x^{(1)} = \sum_{r=1}^{M_1} \alpha_r^{(1)} e_r^{(1)}$$

where $r = 1, 2, \dots, M_1$, $e_r^{(1)} \in \mathbb{R}^{N}$ denotes the $r$-th group of EEG feature vectors, $u^{(1)} \in \mathbb{R}^{N}$ is a trainable linear-transformation parameter vector, and $\exp(\cdot)$ is the exponential function with base $e$;

(2.2) Represent the peripheral physiological features extracted in step (1) in matrix form as $F^{(2)}$ and map them through a linear transformation matrix $W^{(2)}$ into $M_2$ groups of feature vectors $e_1^{(2)}, e_2^{(2)}, \dots, e_{M_2}^{(2)}$, where $4 \le M_2 \le 16$. Let $E^{(2)} = [e_1^{(2)}, e_2^{(2)}, \dots, e_{M_2}^{(2)}]$; the linear transformation is

$$E^{(2)} = (F^{(2)})^{T} W^{(2)}$$

where the superscript $(2)$ denotes the peripheral physiological modality;

The second attention module is used to determine the importance weights of the different feature-vector groups, and a discriminative peripheral physiological emotion feature vector is formed by weighted fusion. The weight $\alpha_s^{(2)}$ of the $s$-th group of peripheral physiological feature vectors and the peripheral physiological emotion feature vector $x^{(2)}$ are

$$\alpha_s^{(2)} = \frac{\exp\!\big((u^{(2)})^{T} e_s^{(2)}\big)}{\sum_{k=1}^{M_2} \exp\!\big((u^{(2)})^{T} e_k^{(2)}\big)}, \qquad x^{(2)} = \sum_{s=1}^{M_2} \alpha_s^{(2)} e_s^{(2)}$$

where $s = 1, 2, \dots, M_2$, $e_s^{(2)} \in \mathbb{R}^{N}$ denotes the $s$-th group of peripheral physiological feature vectors and $u^{(2)} \in \mathbb{R}^{N}$ is a trainable linear-transformation parameter vector;

(2.3) Represent the expression features extracted in step (1) in matrix form as $F^{(3)}$ and map them through a linear transformation matrix $W^{(3)}$ into $M_3$ groups of feature vectors $e_1^{(3)}, e_2^{(3)}, \dots, e_{M_3}^{(3)}$, where $4 \le M_3 \le 16$. Let $E^{(3)} = [e_1^{(3)}, e_2^{(3)}, \dots, e_{M_3}^{(3)}]$; the linear transformation is

$$E^{(3)} = (F^{(3)})^{T} W^{(3)}$$

where the superscript $(3)$ denotes the expression modality;

The third attention module is used to determine the importance weights of the different feature-vector groups, and a discriminative expression emotion feature vector is formed by weighted fusion. The weight $\alpha_t^{(3)}$ of the $t$-th group of expression feature vectors and the expression emotion feature vector $x^{(3)}$ are

$$\alpha_t^{(3)} = \frac{\exp\!\big((u^{(3)})^{T} e_t^{(3)}\big)}{\sum_{k=1}^{M_3} \exp\!\big((u^{(3)})^{T} e_k^{(3)}\big)}, \qquad x^{(3)} = \sum_{t=1}^{M_3} \alpha_t^{(3)} e_t^{(3)}$$

where $t = 1, 2, \dots, M_3$, $e_t^{(3)} \in \mathbb{R}^{N}$ denotes the $t$-th group of expression feature vectors and $u^{(3)} \in \mathbb{R}^{N}$ is a trainable linear-transformation parameter vector.

Further, step (3) specifically includes the following substeps:

(3.1) Obtain the DMCCA projection matrices, learned during training, that correspond to the EEG emotion features, the peripheral physiological emotion features and the expression emotion features, respectively: $\Omega \in \mathbb{R}^{N \times d}$, $\Phi \in \mathbb{R}^{N \times d}$ and $\Psi \in \mathbb{R}^{N \times d}$, with $32 \le d \le 128$;

(3.2) Using the projection matrices $\Omega$, $\Phi$ and $\Psi$, project the EEG emotion feature vector $x^{(1)}$, the peripheral physiological emotion feature vector $x^{(2)}$ and the expression emotion feature vector $x^{(3)}$ extracted in step (2) into a $d$-dimensional common subspace: the projection of $x^{(1)}$ is $\Omega^{T} x^{(1)}$, the projection of $x^{(2)}$ is $\Phi^{T} x^{(2)}$, and the projection of $x^{(3)}$ is $\Psi^{T} x^{(3)}$;

(3.3) Fuse $\Omega^{T} x^{(1)}$, $\Phi^{T} x^{(2)}$ and $\Psi^{T} x^{(3)}$ by addition to obtain the EEG-peripheral-physiology-expression multimodal emotion feature vector $\Omega^{T} x^{(1)} + \Phi^{T} x^{(2)} + \Psi^{T} x^{(3)}$.

Further, the projection matrices $\Omega$, $\Phi$ and $\Psi$ in step (3.1) are obtained by training as follows:

(3.1.1) Draw training samples of each emotion category from the training sample set to generate three groups of emotion feature vectors $X^{(i)} = [x_1^{(i)}, x_2^{(i)}, \dots, x_M^{(i)}] \in \mathbb{R}^{N \times M}$, where $x_m^{(i)} \in \mathbb{R}^{N}$, $M$ is the number of training samples, $N$ is the dimension of $x_m^{(i)}$, $i = 1, 2, 3$ and $m = 1, 2, \dots, M$. Let $i = 1$ denote the EEG modality, $i = 2$ the peripheral physiological modality and $i = 3$ the expression modality, so that $x_m^{(1)}$ is an EEG emotion feature vector, $x_m^{(2)}$ a peripheral physiological emotion feature vector and $x_m^{(3)}$ an expression emotion feature vector;

(3.1.2) Compute the mean of the column vectors of $X^{(i)}$ and center $X^{(i)}$;

(3.1.3) Based on the idea of discriminative multiset canonical correlation analysis (DMCCA), find a set of projection matrices $\Omega$, $\Phi$ and $\Psi$ that maximizes the linear correlation of same-class samples in the common projection subspace while maximizing the between-class scatter and minimizing the within-class scatter of the data within each modality. Let the projection vector of $X^{(i)}$ be $w^{(i)} \in \mathbb{R}^{N}$, $i = 1, 2, 3$. The objective function of DMCCA is

$$J(w^{(1)}, w^{(2)}, w^{(3)}) = \sum_{i=1}^{3} \sum_{\substack{j=1 \\ j \ne i}}^{3} (w^{(i)})^{T} C_{ij}\, w^{(j)} + \sum_{i=1}^{3} (w^{(i)})^{T} \big(S_b^{(i)} - S_w^{(i)}\big) w^{(i)}$$

where $C_{ij} = \operatorname{cov}\big(X^{(i)}, X^{(j)}\big) = X^{(i)} (X^{(j)})^{T}$ for the centered data, $S_w^{(i)}$ denotes the within-class scatter matrix of $X^{(i)}$, $S_b^{(i)}$ denotes the between-class scatter matrix of $X^{(i)}$, $\operatorname{cov}(\cdot,\cdot)$ denotes covariance, and $i, j \in \{1, 2, 3\}$;

The following optimization model is constructed and solved to obtain the projection matrices $\Omega$, $\Phi$ and $\Psi$:

$$\max_{w^{(1)}, w^{(2)}, w^{(3)}} J(w^{(1)}, w^{(2)}, w^{(3)}) \quad \text{s.t.} \quad \sum_{i=1}^{3} (w^{(i)})^{T} C_{ii}\, w^{(i)} = 1$$

Further, the Lagrange multiplier method is used to solve the optimization model of the DMCCA objective function, giving the following Lagrange function:

$$L(w^{(1)}, w^{(2)}, w^{(3)}) = J(w^{(1)}, w^{(2)}, w^{(3)}) - \lambda \Big( \sum_{i=1}^{3} (w^{(i)})^{T} C_{ii}\, w^{(i)} - 1 \Big)$$

where $\lambda$ is the Lagrange multiplier. Taking the partial derivatives of $L(w^{(1)}, w^{(2)}, w^{(3)})$ with respect to $w^{(1)}$, $w^{(2)}$ and $w^{(3)}$ and setting them to zero, i.e.

$$\frac{\partial L}{\partial w^{(1)}} = \frac{\partial L}{\partial w^{(2)}} = \frac{\partial L}{\partial w^{(3)}} = 0,$$

gives

$$\sum_{\substack{j=1 \\ j \ne i}}^{3} C_{ij}\, w^{(j)} + \big(S_b^{(i)} - S_w^{(i)}\big) w^{(i)} = \lambda\, C_{ii}\, w^{(i)}, \quad i = 1, 2, 3.$$

Simplifying further, the following generalized eigenvalue problem is obtained, with $w = [(w^{(1)})^{T}, (w^{(2)})^{T}, (w^{(3)})^{T}]^{T}$:

$$
\begin{bmatrix}
S_b^{(1)} - S_w^{(1)} & C_{12} & C_{13} \\
C_{21} & S_b^{(2)} - S_w^{(2)} & C_{23} \\
C_{31} & C_{32} & S_b^{(3)} - S_w^{(3)}
\end{bmatrix} w
= \lambda
\begin{bmatrix}
C_{11} & 0 & 0 \\
0 & C_{22} & 0 \\
0 & 0 & C_{33}
\end{bmatrix} w
$$

By solving this generalized eigenvalue problem and selecting the eigenvectors corresponding to the $d$ largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$, the projection matrices $\Omega \in \mathbb{R}^{N \times d}$, $\Phi \in \mathbb{R}^{N \times d}$ and $\Psi \in \mathbb{R}^{N \times d}$ are obtained as the three $N$-row blocks of the stacked eigenvectors.

Based on the same inventive concept, the multimodal emotion recognition system integrating an attention mechanism and DMCCA provided by the present invention includes:

a preliminary feature extraction module, configured to extract an EEG feature vector and an expression feature vector from the preprocessed EEG signals and facial expression videos using respectively trained neural network models, and to extract a peripheral physiological feature vector from the preprocessed peripheral physiological signals by computing signal-waveform descriptors and their statistical features;

a feature discrimination enhancement module, configured to map the EEG feature vector, the peripheral physiological feature vector and the expression feature vector into several groups of feature vectors through linear transformation matrices, to determine the importance weights of the different feature-vector groups with attention modules, and to form, by weighted fusion, a discriminative EEG emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector of the same dimension;

a projection matrix determination module, configured to determine the projection matrix of each emotion feature vector using the discriminative multiset canonical correlation analysis (DMCCA) method by maximizing the correlation between the emotion features of different modalities for samples of the same category;

a feature fusion module, configured to project the EEG emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector into a common subspace through their respective projection matrices and to obtain, after additive fusion, the EEG-peripheral-physiology-expression multimodal emotion feature vector;

and a classification and recognition module, configured to classify the multimodal emotion feature vector with a classifier and obtain the emotion category.

Based on the same inventive concept, the multimodal emotion recognition system integrating an attention mechanism and DMCCA provided by the present invention includes at least one computing device, the computing device including a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program, when loaded into the processor, implementing the multimodal emotion recognition method integrating an attention mechanism and DMCCA described above.

Beneficial effects: Compared with the prior art, the present invention has the following technical effects:

(1) The present invention uses attention mechanisms to selectively focus on the salient features of each modality that play a key role in emotion recognition and to adaptively learn features with emotion discrimination ability, which can effectively improve the accuracy and robustness of multimodal emotion recognition.

(2) The present invention adopts the discriminative multiset canonical correlation analysis method and introduces the class information of the samples. By maximizing the correlation between the emotion features of different modalities for samples of the same category, maximizing the between-class scatter of the emotion features within each modality and minimizing their within-class scatter, it can mine the nonlinear correlations between different modalities, make full use of the correlation and complementarity between the EEG, peripheral physiological and expression emotion features, and at the same time eliminate some ineffective redundant features, which can effectively improve the discriminative power and robustness of the feature representation.

(3) Compared with single-modality emotion recognition methods, the present invention comprehensively utilizes the information of multiple modalities in the process of emotional expression, and can combine the characteristics of the different modalities and make full use of their complementarity to mine multimodal emotion features, which can effectively improve the accuracy and robustness of emotion recognition.

Brief Description of the Drawings

Fig. 1 is a flow chart of the method of an embodiment of the present invention;

Fig. 2 is a structural diagram of an embodiment of the present invention.

Detailed Description of the Embodiments

To make the present invention easier to understand in detail, it is further described below with reference to the accompanying drawings and specific embodiments.

As shown in Fig. 1 and Fig. 2, a multimodal emotion recognition method integrating an attention mechanism and DMCCA provided by an embodiment of the present invention mainly includes the following steps:

(1) For the preprocessed EEG signals and facial expression videos, respectively trained neural network models are used to extract an EEG feature vector and an expression feature vector; for the preprocessed peripheral physiological signals, a peripheral physiological feature vector is extracted by computing signal-waveform descriptors and their statistical features.

This embodiment uses the DEAP (Database for Emotion Analysis using Physiological Signals) emotion database; in practice, other emotion databases containing EEG, peripheral physiological signals and facial expression videos may also be used. The DEAP database used in this embodiment is a public multimodal emotion database collected by Koelstra et al. at Queen Mary University of London, UK. It contains the physiological signals of 32 subjects recorded while they watched 40 one-minute music-video clips of different kinds as evoking stimuli, as well as facial expression videos of the first 22 subjects recorded while they watched the clips. Each subject performed 40 trials and completed a Self-Assessment Manikins (SAM) questionnaire immediately after each trial, giving 40 self-assessments in total. The SAM questionnaire contains psychological scales for the subject's arousal, valence, dominance and liking with respect to the video. Arousal indicates the degree of excitement, ranging gradually from a calm state to an excited state and measured on a scale of 1 to 9; valence, also called pleasantness, indicates the degree of pleasure, ranging gradually from a negative to a positive state and likewise measured on a scale of 1 to 9; dominance ranges from submissive (or "without control") to dominant (or "in control"); liking indicates the subject's personal preference for the video. After each trial, each subject selected scores representing their emotional state, which are used later as the categories for emotion classification and recognition analysis.

In the DEAP database, the physiological signals were sampled at 512 Hz and downsampled to 128 Hz (preprocessed downsampled data are provided officially). The physiological-signal matrix of each subject is 40 x 40 x 8064 (40 music-video clips of different kinds, 40 physiological-signal channels, 8064 sampling points). Of the 40 channels, the first 32 carry EEG signals and the last 8 carry peripheral physiological signals. The 8064 samples correspond to 63 s of data at the 128 Hz sampling rate, with 3 s of silence recorded before each signal segment.
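
The channel layout above maps directly onto array slicing. The sketch below, which assumes one subject's matrix has been exported to a hypothetical NumPy file, separates the EEG and peripheral channels and drops the 3 s pre-stimulus baseline:

```python
import numpy as np

# One subject's preprocessed DEAP matrix: (40 trials, 40 channels, 8064 samples) at 128 Hz.
# "s01_data.npy" is a hypothetical export of the official preprocessed data.
data = np.load("s01_data.npy")

baseline = 3 * 128                    # the first 3 s of each trial are pre-stimulus silence
eeg = data[:, :32, baseline:]         # channels 0-31: EEG -> (40, 32, 7680)
peripheral = data[:, 32:, baseline:]  # channels 32-39: peripheral physiological signals
```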

In this embodiment of the present invention, 880 samples that simultaneously contain EEG signals, peripheral physiological signals and facial expressions are used as training samples, and binary classification is performed separately on each of the four dimensions of arousal, valence, dominance and liking.

The neural network model used to extract EEG features may be a Long Short-Term Memory (LSTM) network or a Convolutional Neural Network (CNN); the neural network model used to extract expression features may be a 3D convolutional neural network, a CNN-LSTM, or the like. In this embodiment, a trained convolutional neural network (CNN) model performs feature extraction on the preprocessed EEG signals to obtain a 256-dimensional EEG feature vector. For the preprocessed peripheral physiological signals such as ECG, respiration, EOG and EMG, a 128-dimensional peripheral physiological feature vector is extracted by computing Low-Level Descriptors (LLDs) of the signal waveforms and their statistical features (including mean, standard deviation, power spectrum, median, maximum and minimum). For the preprocessed facial expression videos, a trained CNN-LSTM model extracts a 256-dimensional expression feature vector.
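
As an illustration of the statistics listed above, the following sketch computes them for the peripheral channels of one trial (continuing from the `peripheral` array sliced earlier). Summarizing the power spectrum by its mean power, and using the first difference as an additional waveform LLD, are assumptions made here for brevity; in practice further LLDs would be stacked to reach the 128 dimensions of the text:

```python
import numpy as np

def channel_stats(sig):
    """Statistics of one peripheral channel: mean, std, median, max, min,
    plus the mean of a simple periodogram as a power-spectrum summary."""
    psd = np.abs(np.fft.rfft(sig)) ** 2 / sig.size
    return np.array([sig.mean(), sig.std(), np.median(sig),
                     sig.max(), sig.min(), psd.mean()])

# Feature vector of one trial: statistics of each of the 8 peripheral channels
# and of their first differences (one of many possible waveform LLDs).
trial = peripheral[0]                                   # shape (8, 7680)
feats = np.concatenate([channel_stats(ch) for ch in trial] +
                       [channel_stats(np.diff(ch)) for ch in trial])
```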

(2) Attention modules are used to extract a discriminative EEG emotion feature vector, peripheral physiological emotion feature vector and expression emotion feature vector from the EEG feature vector, the peripheral physiological feature vector and the expression feature vector, respectively.

(3) The discriminative multiset canonical correlation analysis (DMCCA) method is applied to the EEG emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector to obtain the EEG-peripheral-physiology-expression multimodal emotion feature vector.

(4) A classifier is used to classify the multimodal emotion feature vector and obtain the emotion category.
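
The text leaves the choice of classifier open. As one plausible instantiation, the sketch below trains a linear SVM from scikit-learn on the fused feature vectors; the SVM itself and the array names are assumptions for illustration only:

```python
from sklearn.svm import LinearSVC

# Z_train, Z_test: (n_samples, d) fused multimodal emotion features from step (3);
# y_train: binary labels on one dimension (e.g. high/low arousal). All assumed given.
clf = LinearSVC(C=1.0)
clf.fit(Z_train, y_train)
pred = clf.predict(Z_test)
```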

Further, the specific steps of using attention modules in step (2) to extract discriminative EEG emotion features, peripheral physiological emotion features and expression emotion features include:

(2.1) Represent the EEG features extracted in step (1) in matrix form as $F^{(1)}$ and map them through a linear transformation matrix $W^{(1)}$ into $M_1$ groups of feature vectors $e_1^{(1)}, e_2^{(1)}, \dots, e_{M_1}^{(1)}$, where $4 \le M_1 \le 16$ and each group has dimension $N$ with $16 \le N \le 64$. Let $E^{(1)} = [e_1^{(1)}, e_2^{(1)}, \dots, e_{M_1}^{(1)}]$; the linear transformation is

$$E^{(1)} = (F^{(1)})^{T} W^{(1)}$$

where the superscript $(1)$ denotes the EEG modality and $T$ denotes transposition.

The first attention module is used to determine the importance weights of the different feature-vector groups, and a discriminative EEG emotion feature vector is formed by weighted fusion. The weight $\alpha_r^{(1)}$ of the $r$-th group of EEG feature vectors and the EEG emotion feature vector $x^{(1)}$ are

$$\alpha_r^{(1)} = \frac{\exp\!\big((u^{(1)})^{T} e_r^{(1)}\big)}{\sum_{k=1}^{M_1} \exp\!\big((u^{(1)})^{T} e_k^{(1)}\big)}, \qquad x^{(1)} = \sum_{r=1}^{M_1} \alpha_r^{(1)} e_r^{(1)}$$

where $r = 1, 2, \dots, M_1$, $e_r^{(1)} \in \mathbb{R}^{N}$ denotes the $r$-th group of EEG feature vectors, $u^{(1)} \in \mathbb{R}^{N}$ is a trainable linear-transformation parameter vector, and $\exp(\cdot)$ is the exponential function with base $e$. In this embodiment, $M_1 = 8$ and $N = 32$.

To train the parameters of the linear transformation matrix $W^{(1)}$, a softmax classifier is connected after the first attention module: the EEG emotion feature vector $x^{(1)}$ output by the module is fed to the classifier's $C$ output nodes, which, after the softmax function, output a probability distribution vector $\hat{y}^{(1)} = [\hat{y}_1^{(1)}, \hat{y}_2^{(1)}, \dots, \hat{y}_C^{(1)}]$, where $c \in [1, C]$ and $C$ is the number of emotion categories.

Further, the parameters of the linear transformation matrix $W^{(1)}$ are trained with the cross-entropy loss shown below:

$$\hat{y}_{m,c}^{(1)} = \frac{\exp\big(z_{m,c}^{(1)}\big)}{\sum_{c'=1}^{C} \exp\big(z_{m,c'}^{(1)}\big)}, \qquad Loss^{(1)} = -\frac{1}{M} \sum_{m=1}^{M} \sum_{c=1}^{C} y_{m,c}^{(1)} \log \hat{y}_{m,c}^{(1)}$$

where $z_{m,c}^{(1)}$ denotes the activation of the $c$-th output node for the $m$-th EEG sample.

Here $x^{(1)}$ is the 32-dimensional EEG emotion feature vector; $\hat{y}^{(1)}$ is the probability distribution vector over emotion categories predicted by the softmax classification model; $y_m^{(1)}$ is the ground-truth emotion category label of the $m$-th EEG sample, one-hot encoded so that if the true category of the $m$-th EEG sample is $c$, then $y_{m,c}^{(1)} = 1$ and otherwise $y_{m,c}^{(1)} = 0$; $\hat{y}_{m,c}^{(1)}$ is the probability with which the softmax classification model predicts the $m$-th EEG sample as category $c$; and $Loss^{(1)}$ is the loss function of the linear transformation matrix $W^{(1)}$ during training. In this embodiment, $C = 2$ and $M = 880$.

Training is iterated with the error backpropagation algorithm until the model parameters are optimal. Afterwards, the EEG emotion feature vector $x^{(1)}$ can be extracted from the EEG signal of a newly input test sample.
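
A compact training sketch of this stage follows, reusing the `GroupAttentionFusion` module above. The optimizer, learning rate, epoch count and the `loader` yielding `(features, labels)` batches are assumptions; `nn.CrossEntropyLoss` folds the softmax and the cross-entropy of the two formulas above into one operation:

```python
import torch
import torch.nn as nn

attn = GroupAttentionFusion(in_dim=256, num_groups=8, group_dim=32)
head = nn.Linear(32, 2)                   # C = 2 output nodes of the softmax classifier
opt = torch.optim.Adam(list(attn.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()           # softmax + cross-entropy, i.e. Loss^(1)

for epoch in range(100):                  # iterate by backpropagation until convergence
    for feats, labels in loader:          # feats: (batch, 256); labels: (batch,)
        loss = loss_fn(head(attn(feats)), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
```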

(2.2) Represent the peripheral physiological features extracted in step (1) in matrix form as $F^{(2)}$ and map them through a linear transformation matrix $W^{(2)}$ into $M_2$ groups of feature vectors $e_1^{(2)}, e_2^{(2)}, \dots, e_{M_2}^{(2)}$, where $4 \le M_2 \le 16$. Let $E^{(2)} = [e_1^{(2)}, e_2^{(2)}, \dots, e_{M_2}^{(2)}]$; the linear transformation is

$$E^{(2)} = (F^{(2)})^{T} W^{(2)}$$

where the superscript $(2)$ denotes the peripheral physiological modality.

The second attention module is used to determine the importance weights of the different feature-vector groups, and a discriminative peripheral physiological emotion feature vector is formed by weighted fusion. The weight $\alpha_s^{(2)}$ of the $s$-th group of peripheral physiological feature vectors and the peripheral physiological emotion feature vector $x^{(2)}$ are

$$\alpha_s^{(2)} = \frac{\exp\!\big((u^{(2)})^{T} e_s^{(2)}\big)}{\sum_{k=1}^{M_2} \exp\!\big((u^{(2)})^{T} e_k^{(2)}\big)}, \qquad x^{(2)} = \sum_{s=1}^{M_2} \alpha_s^{(2)} e_s^{(2)}$$

where $s = 1, 2, \dots, M_2$, $e_s^{(2)} \in \mathbb{R}^{N}$ denotes the $s$-th group of peripheral physiological feature vectors and $u^{(2)} \in \mathbb{R}^{N}$ is a trainable linear-transformation parameter vector. In this embodiment, $M_2 = 4$.

To train the parameters of the linear transformation matrix $W^{(2)}$, a softmax classifier is connected after the second attention module: the peripheral physiological emotion feature vector $x^{(2)}$ output by the module is fed to the classifier's $C$ output nodes, which, after the softmax function, output a probability distribution vector $\hat{y}^{(2)} = [\hat{y}_1^{(2)}, \hat{y}_2^{(2)}, \dots, \hat{y}_C^{(2)}]$.

Further, the parameters of the linear transformation matrix $W^{(2)}$ are trained with the cross-entropy loss shown below:

$$\hat{y}_{m,c}^{(2)} = \frac{\exp\big(z_{m,c}^{(2)}\big)}{\sum_{c'=1}^{C} \exp\big(z_{m,c'}^{(2)}\big)}, \qquad Loss^{(2)} = -\frac{1}{M} \sum_{m=1}^{M} \sum_{c=1}^{C} y_{m,c}^{(2)} \log \hat{y}_{m,c}^{(2)}$$

where $z_{m,c}^{(2)}$ denotes the activation of the $c$-th output node for the $m$-th peripheral physiological signal sample.

Here $x^{(2)}$ is the 32-dimensional peripheral physiological emotion feature vector; $\hat{y}^{(2)}$ is the probability distribution vector over emotion categories predicted by the softmax classification model; $y_m^{(2)}$ is the ground-truth emotion category label of the $m$-th peripheral physiological signal sample, one-hot encoded so that if the true category of the $m$-th sample is $c$, then $y_{m,c}^{(2)} = 1$ and otherwise $y_{m,c}^{(2)} = 0$; $\hat{y}_{m,c}^{(2)}$ is the probability with which the softmax classification model predicts the $m$-th peripheral physiological signal sample as category $c$; and $Loss^{(2)}$ is the loss function of the linear transformation matrix $W^{(2)}$ during training. In this embodiment, $C = 2$ and $M = 880$.

Training is iterated with the error backpropagation algorithm until the model parameters are optimal. Afterwards, the peripheral physiological emotion feature vector $x^{(2)}$ can be extracted from the peripheral physiological signals of a newly input test sample.

(2.3) Represent the expression features extracted in step (1) in matrix form as $F^{(3)}$ and map them through a linear transformation matrix $W^{(3)}$ into $M_3$ groups of feature vectors $e_1^{(3)}, e_2^{(3)}, \dots, e_{M_3}^{(3)}$, where $4 \le M_3 \le 16$. Let $E^{(3)} = [e_1^{(3)}, e_2^{(3)}, \dots, e_{M_3}^{(3)}]$; the linear transformation is

$$E^{(3)} = (F^{(3)})^{T} W^{(3)}$$

where the superscript $(3)$ denotes the expression modality.

The third attention module is used to determine the importance weights of the different feature-vector groups, and a discriminative expression emotion feature vector is formed by weighted fusion. The weight $\alpha_t^{(3)}$ of the $t$-th group of expression feature vectors and the expression emotion feature vector $x^{(3)}$ are

$$\alpha_t^{(3)} = \frac{\exp\!\big((u^{(3)})^{T} e_t^{(3)}\big)}{\sum_{k=1}^{M_3} \exp\!\big((u^{(3)})^{T} e_k^{(3)}\big)}, \qquad x^{(3)} = \sum_{t=1}^{M_3} \alpha_t^{(3)} e_t^{(3)}$$

where $t = 1, 2, \dots, M_3$, $e_t^{(3)} \in \mathbb{R}^{N}$ denotes the $t$-th group of expression feature vectors and $u^{(3)} \in \mathbb{R}^{N}$ is a trainable linear-transformation parameter vector. In this embodiment, $M_3 = 8$.

To train the parameters of the linear transformation matrix $W^{(3)}$, a softmax classifier is connected after the third attention module: the expression emotion feature vector $x^{(3)}$ output by the module is fed to the classifier's $C$ output nodes, which, after the softmax function, output a probability distribution vector $\hat{y}^{(3)} = [\hat{y}_1^{(3)}, \hat{y}_2^{(3)}, \dots, \hat{y}_C^{(3)}]$.

Further, the parameters of the linear transformation matrix $W^{(3)}$ are trained with the cross-entropy loss shown below:

$$\hat{y}_{m,c}^{(3)} = \frac{\exp\big(z_{m,c}^{(3)}\big)}{\sum_{c'=1}^{C} \exp\big(z_{m,c'}^{(3)}\big)}, \qquad Loss^{(3)} = -\frac{1}{M} \sum_{m=1}^{M} \sum_{c=1}^{C} y_{m,c}^{(3)} \log \hat{y}_{m,c}^{(3)}$$

where $z_{m,c}^{(3)}$ denotes the activation of the $c$-th output node for the $m$-th expression video sample.

Here $x^{(3)}$ is the 32-dimensional expression emotion feature vector; $\hat{y}^{(3)}$ is the probability distribution vector over emotion categories predicted by the softmax classification model; $y_m^{(3)}$ is the ground-truth emotion category label of the $m$-th expression video sample, one-hot encoded so that if the true category of the $m$-th sample is $c$, then $y_{m,c}^{(3)} = 1$ and otherwise $y_{m,c}^{(3)} = 0$; $\hat{y}_{m,c}^{(3)}$ is the probability with which the softmax classification model predicts the $m$-th expression video sample as category $c$; and $Loss^{(3)}$ is the loss function of the linear transformation matrix $W^{(3)}$ during training. In this embodiment, $C = 2$ and $M = 880$.

Training is iterated with the error backpropagation algorithm until the model parameters are optimal. Afterwards, the expression emotion feature vector $x^{(3)}$ can be extracted from the expression video of a newly input test sample.

Further, step (3) specifically includes the following substeps:

(3.1) Obtain the DMCCA projection matrices, learned during training, that correspond to the EEG emotion features, the peripheral physiological emotion features and the expression emotion features, respectively: $\Omega \in \mathbb{R}^{N \times d}$, $\Phi \in \mathbb{R}^{N \times d}$ and $\Psi \in \mathbb{R}^{N \times d}$, with $32 \le d \le 128$. In this embodiment, $d = 40$.

(3.2) Using the projection matrices $\Omega$, $\Phi$ and $\Psi$, project the EEG emotion feature vector $x^{(1)}$, the peripheral physiological emotion feature vector $x^{(2)}$ and the expression emotion feature vector $x^{(3)}$ extracted in step (2) into a $d$-dimensional common subspace: the projection of $x^{(1)}$ is $\Omega^{T} x^{(1)}$, the projection of $x^{(2)}$ is $\Phi^{T} x^{(2)}$, and the projection of $x^{(3)}$ is $\Psi^{T} x^{(3)}$.

(3.3) Fuse $\Omega^{T} x^{(1)}$, $\Phi^{T} x^{(2)}$ and $\Psi^{T} x^{(3)}$ by addition to obtain the EEG-peripheral-physiology-expression multimodal emotion feature vector $\Omega^{T} x^{(1)} + \Phi^{T} x^{(2)} + \Psi^{T} x^{(3)}$.
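
Steps (3.2) and (3.3) amount to three matrix-vector products and a sum; a one-line NumPy sketch (array names are illustrative):

```python
# Omega, Phi, Psi: (N, d) projection matrices from step (3.1), here N=32, d=40;
# x1, x2, x3: the three 32-dimensional emotion feature vectors from step (2).
z = Omega.T @ x1 + Phi.T @ x2 + Psi.T @ x3   # fused multimodal feature, shape (d,)
```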

Further, the projection matrices $\Omega$, $\Phi$ and $\Psi$ in step (3.1) are obtained by training as follows:

(3.1.1) For the samples of the $C$ emotion categories in the training sample set, generate three groups of emotion feature vectors $X^{(i)} = [x_1^{(i)}, x_2^{(i)}, \dots, x_M^{(i)}] \in \mathbb{R}^{N \times M}$, where $x_m^{(i)} \in \mathbb{R}^{N}$ and $M$ is the number of training samples (in this example the sample set is small, so all samples take part in the computation; for a large sample set, samples of each emotion category can be drawn at random), $i = 1, 2, 3$ and $m = 1, 2, \dots, M$. Let $i = 1$ denote the EEG modality, $i = 2$ the peripheral physiological modality and $i = 3$ the expression modality, so that $x_m^{(1)}$ is an EEG emotion feature vector, $x_m^{(2)}$ a peripheral physiological emotion feature vector and $x_m^{(3)}$ an expression emotion feature vector. In this embodiment, $C = 2$, $M = 880$ and $N = 32$.

(3.1.2) Compute the mean of the column vectors of $X^{(i)}$,

$$\bar{x}^{(i)} = \frac{1}{M} \sum_{m=1}^{M} x_m^{(i)},$$

and center $X^{(i)}$ by subtracting it from every column, obtaining $\tilde{X}^{(i)} = [x_1^{(i)} - \bar{x}^{(i)}, \dots, x_M^{(i)} - \bar{x}^{(i)}]$. For ease of description, the centered $\tilde{X}^{(i)}$ is still written $X^{(i)}$ below, i.e. every $X^{(i)}$ is assumed to have been centered.

(3.1.3) The idea of discriminative multiset canonical correlation analysis (DMCCA) is to find a set of projection matrices $\Omega$, $\Phi$ and $\Psi$ that maximizes the linear correlation of same-class samples in the common projection subspace while also maximizing the between-class scatter and minimizing the within-class scatter of the data within each modality. Let the projection vector of $X^{(i)}$ be $w^{(i)} \in \mathbb{R}^{N}$, $i = 1, 2, 3$. The objective function of DMCCA is

$$J(w^{(1)}, w^{(2)}, w^{(3)}) = \sum_{i=1}^{3} \sum_{\substack{j=1 \\ j \ne i}}^{3} (w^{(i)})^{T} C_{ij}\, w^{(j)} + \sum_{i=1}^{3} (w^{(i)})^{T} \big(S_b^{(i)} - S_w^{(i)}\big) w^{(i)}$$

where $C_{ij} = \operatorname{cov}\big(X^{(i)}, X^{(j)}\big) = X^{(i)} (X^{(j)})^{T}$ for the centered data, $S_w^{(i)}$ denotes the within-class scatter matrix of $X^{(i)}$, $S_b^{(i)}$ denotes the between-class scatter matrix of $X^{(i)}$, $\operatorname{cov}(\cdot,\cdot)$ denotes covariance, and $i, j \in \{1, 2, 3\}$.

Solving the DMCCA objective function can be expressed as the following optimization model:

$$\max_{w^{(1)}, w^{(2)}, w^{(3)}} J(w^{(1)}, w^{(2)}, w^{(3)}) \quad \text{s.t.} \quad \sum_{i=1}^{3} (w^{(i)})^{T} C_{ii}\, w^{(i)} = 1.$$

(3.1.4)使用拉格朗日乘子法(Lagrange multiplier)求解DMCCA目标函数的优化模型,可得到如下拉格朗日(Lagrange)函数:(3.1.4) Using the Lagrange multiplier method to solve the optimization model of the DMCCA objective function, the following Lagrange function can be obtained:

Figure BDA0002935593770000156
Figure BDA0002935593770000156

其中,λ是拉格朗日乘子,再分别求L(w(1),w(2),w(3))对w(1)、w(2)和w(3)的偏导数并令其为零,即令Among them, λ is the Lagrange multiplier, and then find the partial derivatives of L(w (1) , w (2) , w (3) ) to w (1) , w (2) and w (3) respectively and combine Let it be zero, that is,

Figure BDA0002935593770000157
Figure BDA0002935593770000157

得到get

Figure BDA0002935593770000158
Figure BDA0002935593770000158

进一步对上式作简化处理,则可获得如下的广义特征值问题:By further simplifying the above formula, the following generalized eigenvalue problem can be obtained:

Figure BDA0002935593770000161
Figure BDA0002935593770000161

By solving the generalized eigenvalue problem above and selecting the eigenvectors corresponding to the d largest eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d, the projection matrices Ω ∈ ℝ^(N×d), Φ ∈ ℝ^(N×d) and Ψ ∈ ℝ^(N×d) are obtained, their columns being formed by the w^(1), w^(2) and w^(3) blocks of the selected eigenvectors, respectively. In this embodiment, d = 40.
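The block structure of this eigenvalue problem can be implemented directly. Below is a minimal NumPy/SciPy sketch of the training step under the reconstruction given above; it assumes standard estimators for the scatter and cross-covariance matrices, adds a small ridge term so that scipy.linalg.eigh can treat the right-hand block matrix as positive definite, and all names (dmcca_train, scatter, and so on) are illustrative rather than part of the patent:

```python
import numpy as np
from scipy.linalg import eigh

def dmcca_train(X_list, labels, d):
    """X_list: three centered modality matrices, each N x M (one sample per
    column); labels: length-M array of class labels; d: subspace dimension.
    Returns the projection matrices Omega, Phi, Psi, each N x d."""
    N, M = X_list[0].shape
    classes = np.unique(labels)

    def scatter(X):
        # within-class (Sw) and between-class (Sb) scatter of one modality
        Sw, Sb = np.zeros((N, N)), np.zeros((N, N))
        mu = X.mean(axis=1, keepdims=True)  # ~0, since X is centered
        for c in classes:
            Xc = X[:, labels == c]
            mc = Xc.mean(axis=1, keepdims=True)
            D = Xc - mc
            Sw += D @ D.T
            Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T
        return Sw, Sb

    # Assemble the block matrices of A w = lambda B w
    A, B = np.zeros((3 * N, 3 * N)), np.zeros((3 * N, 3 * N))
    for i in range(3):
        Sw_i, Sb_i = scatter(X_list[i])
        A[i*N:(i+1)*N, i*N:(i+1)*N] = Sb_i
        B[i*N:(i+1)*N, i*N:(i+1)*N] = Sw_i
        for j in range(3):
            if i != j:  # cross-covariance C_ij between modalities i and j
                A[i*N:(i+1)*N, j*N:(j+1)*N] = X_list[i] @ X_list[j].T / (M - 1)

    # Generalized eigenvalue problem; keep eigenvectors of the d largest eigenvalues
    eigvals, eigvecs = eigh(A, B + 1e-6 * np.eye(3 * N))
    top = np.argsort(eigvals)[::-1][:d]
    W = eigvecs[:, top]  # each column stacks w^(1), w^(2), w^(3)
    return W[:N], W[N:2*N], W[2*N:]
```

With the dimensions of this embodiment (N = 32, M = 880, d = 40), the returned Omega, Phi and Psi are 32 x 40 matrices, corresponding to Ω, Φ and Ψ above.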

Based on the same inventive concept, an embodiment of the present invention provides a multimodal emotion recognition system fusing an attention mechanism and DMCCA, comprising:

a preliminary feature extraction module, configured to extract an EEG signal feature vector and an expression feature vector from the preprocessed EEG signals and facial expression video using respective trained neural network models, and to extract a peripheral physiological signal feature vector from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and their statistical features;

a feature discrimination enhancement module, configured to map the EEG signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into several groups of feature vectors through linear transformation matrices, to determine the importance weights of the different feature vector groups using respective attention mechanism modules, and to form, by weighted fusion, discriminative EEG, peripheral physiological and expression emotion feature vectors of the same dimension;

a projection matrix determination module, configured to determine the projection matrix of each emotion feature vector using the DMCCA method, by maximizing the correlation between the different-modality emotion features of same-class samples;

a feature fusion module, configured to project the EEG emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector into a common subspace through their respective projection matrices, and to add and fuse the projections to obtain an EEG-peripheral physiology-expression multimodal emotion feature vector;

and a classification and recognition module, configured to classify and recognize the multimodal emotion feature vector using a classifier to obtain the emotion category; a sketch of how these modules fit together is given below.
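Read together, these modules form a short inference pipeline. The sketch below is illustrative only: attention_fuse implements the softmax-weighted fusion of feature-vector groups performed per modality by the feature discrimination enhancement module, and classifier stands in for whatever trained classifier the classification and recognition module uses, which the patent does not fix; all names are assumptions, not part of the patent:

```python
import numpy as np

def attention_fuse(E, u):
    """E: N x Mi matrix whose columns are the Mi feature-vector groups of one
    modality; u: trainable parameter vector of length N.
    Returns the attention-weighted emotion feature vector of length N."""
    scores = u @ E                        # one relevance score per group
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                  # softmax importance weights
    return E @ alpha                      # weighted fusion of the groups

def recognize(E1, E2, E3, u1, u2, u3, Omega, Phi, Psi, classifier):
    # Feature discrimination enhancement: one attention module per modality
    x1, x2, x3 = (attention_fuse(E, u) for E, u in ((E1, u1), (E2, u2), (E3, u3)))
    # Feature fusion: project into the d-dimensional common subspace and add
    z = Omega.T @ x1 + Phi.T @ x2 + Psi.T @ x3
    # Classification and recognition module
    return classifier(z)
```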

For the specific implementation of each module, reference is made to the foregoing method embodiment and details are not repeated here. Those skilled in the art will appreciate that the modules in an embodiment can be adaptively changed and arranged in one or more systems different from the embodiment. The modules, units or components in the embodiments may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components.

Based on the same inventive concept, an embodiment of the present invention provides a multimodal emotion recognition system fusing an attention mechanism and DMCCA that includes at least one computing device, the computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program, when loaded into the processor, implementing the above multimodal emotion recognition method fusing an attention mechanism and DMCCA.

The technical solutions disclosed in the present invention include both the technical methods involved in the above embodiments and technical solutions formed by any combination of the above technical methods. Those of ordinary skill in the art can make certain improvements and modifications without departing from the principles of the present invention, and such improvements and modifications are also considered to fall within the protection scope of the present invention.

Claims (7)

1. A multimodal emotion recognition method fusing an attention mechanism and DMCCA, characterized in that the method comprises the following steps:

(1) extracting an EEG signal feature vector and an expression feature vector from the preprocessed EEG signals and facial expression video using respective trained neural network models, and extracting a peripheral physiological signal feature vector from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and their statistical features;

(2) mapping the EEG signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into several groups of feature vectors through linear transformation matrices, determining the importance weights of the different feature vector groups using respective attention mechanism modules, and forming, by weighted fusion, discriminative EEG, peripheral physiological and expression emotion feature vectors of the same dimension;

(3) for the EEG emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector, using the discriminative multiset canonical correlation analysis (DMCCA) method to determine the projection matrix of each emotion feature vector by maximizing the correlation between the different-modality emotion features of same-class samples, projecting each emotion feature vector into a common subspace, and adding and fusing the projections to obtain an EEG-peripheral physiology-expression multimodal emotion feature vector;

(4) classifying the multimodal emotion feature vector with a classifier to obtain the emotion category.

2. The multimodal emotion recognition method fusing an attention mechanism and DMCCA according to claim 1, characterized in that step (2) comprises the following sub-steps:

(2.1) expressing the EEG signal features extracted in step (1) in matrix form as F^(1), and mapping them through a linear transformation matrix W^(1) into M_1 groups of feature vectors E^(1) = [e_1^(1), e_2^(1), ..., e_{M_1}^(1)], 4 ≤ M_1 ≤ 16, each group of feature vectors having dimension N, 16 ≤ N ≤ 64, the linear transformation expression being

E^(1) = (F^(1))^T W^(1)

where the superscript (1) denotes the EEG modality and T denotes transposition;

using a first attention mechanism module to determine the importance weights of the different feature vector groups and forming a discriminative EEG emotion feature vector by weighted fusion, the weight α_r^(1) of the r-th group of EEG signal feature vectors and the EEG emotion feature vector x^(1) being expressed as

$$\alpha_r^{(1)} = \frac{\exp\!\big((u^{(1)})^{\mathrm{T}} e_r^{(1)}\big)}{\sum_{k=1}^{M_1} \exp\!\big((u^{(1)})^{\mathrm{T}} e_k^{(1)}\big)}, \qquad x^{(1)} = \sum_{r=1}^{M_1} \alpha_r^{(1)} e_r^{(1)}$$

where r = 1, 2, ..., M_1, e_r^(1) denotes the r-th group of EEG signal feature vectors, u^(1) is a trainable linear transformation parameter vector, and exp(·) denotes the exponential function with the natural constant e as its base;

(2.2) expressing the peripheral physiological signal features extracted in step (1) in matrix form as F^(2), and mapping them through a linear transformation matrix W^(2) into M_2 groups of feature vectors E^(2) = [e_1^(2), e_2^(2), ..., e_{M_2}^(2)], 4 ≤ M_2 ≤ 16, each group of feature vectors having dimension N, the linear transformation expression being

E^(2) = (F^(2))^T W^(2)

where the superscript (2) denotes the peripheral physiological modality;

using a second attention mechanism module to determine the importance weights of the different feature vector groups and forming a discriminative peripheral physiological emotion feature vector by weighted fusion, the weight α_s^(2) of the s-th group of peripheral physiological signal feature vectors and the peripheral physiological emotion feature vector x^(2) being expressed as

$$\alpha_s^{(2)} = \frac{\exp\!\big((u^{(2)})^{\mathrm{T}} e_s^{(2)}\big)}{\sum_{k=1}^{M_2} \exp\!\big((u^{(2)})^{\mathrm{T}} e_k^{(2)}\big)}, \qquad x^{(2)} = \sum_{s=1}^{M_2} \alpha_s^{(2)} e_s^{(2)}$$

where s = 1, 2, ..., M_2, e_s^(2) denotes the s-th group of peripheral physiological signal feature vectors, and u^(2) is a trainable linear transformation parameter vector;

(2.3) expressing the expression features extracted in step (1) in matrix form as F^(3), and mapping them through a linear transformation matrix W^(3) into M_3 groups of feature vectors E^(3) = [e_1^(3), e_2^(3), ..., e_{M_3}^(3)], 4 ≤ M_3 ≤ 16, each group of feature vectors having dimension N, the linear transformation expression being

E^(3) = (F^(3))^T W^(3)

where the superscript (3) denotes the expression modality;

using a third attention mechanism module to determine the importance weights of the different feature vector groups and forming a discriminative expression emotion feature vector by weighted fusion, the weight α_t^(3) of the t-th group of expression feature vectors and the expression emotion feature vector x^(3) being expressed as

$$\alpha_t^{(3)} = \frac{\exp\!\big((u^{(3)})^{\mathrm{T}} e_t^{(3)}\big)}{\sum_{k=1}^{M_3} \exp\!\big((u^{(3)})^{\mathrm{T}} e_k^{(3)}\big)}, \qquad x^{(3)} = \sum_{t=1}^{M_3} \alpha_t^{(3)} e_t^{(3)}$$

where t = 1, 2, ..., M_3, e_t^(3) denotes the t-th group of expression feature vectors, and u^(3) is a trainable linear transformation parameter vector.
3. The multimodal emotion recognition method fusing an attention mechanism and DMCCA according to claim 2, characterized in that step (3) comprises the following sub-steps:

(3.1) obtaining the DMCCA projection matrices Ω ∈ ℝ^(N×d), Φ ∈ ℝ^(N×d) and Ψ ∈ ℝ^(N×d) obtained through training and corresponding respectively to the EEG emotion features, the peripheral physiological emotion features and the expression emotion features, 32 ≤ d ≤ 128;

(3.2) using the projection matrices Ω, Φ and Ψ to project the EEG emotion feature vector x^(1), the peripheral physiological emotion feature vector x^(2) and the expression emotion feature vector x^(3) extracted in step (2) into a d-dimensional common subspace, the projection of x^(1) into the d-dimensional common subspace being Ω^T x^(1), the projection of x^(2) being Φ^T x^(2), and the projection of x^(3) being Ψ^T x^(3);

(3.3) fusing Ω^T x^(1), Φ^T x^(2) and Ψ^T x^(3) to obtain the EEG-peripheral physiology-expression multimodal emotion feature vector Ω^T x^(1) + Φ^T x^(2) + Ψ^T x^(3).
4. The multimodal emotion recognition method fusing an attention mechanism and DMCCA according to claim 3, characterized in that the projection matrices Ω, Φ and Ψ in step (3.1) are obtained by training through the following steps:

(3.1.1) extracting training samples of each emotion category from the training sample set to generate three groups of emotion feature vectors X^(i) = [x_1^(i), x_2^(i), ..., x_M^(i)] ∈ ℝ^(N×M), where M is the number of training samples, i = 1, 2, 3, m = 1, 2, ..., M; i = 1 denotes the EEG modality, i = 2 the peripheral physiological modality, and i = 3 the expression modality; x_m^(1) denotes an EEG emotion feature vector, x_m^(2) a peripheral physiological emotion feature vector, and x_m^(3) an expression emotion feature vector;

(3.1.2) computing the mean of the column vectors of X^(i) and centering X^(i);

(3.1.3) based on the idea of discriminative multiset canonical correlation analysis (DMCCA), finding a set of projection matrices Ω, Φ and Ψ such that same-class samples have maximal linear correlation in the common projection subspace while the between-class scatter of the intra-modality data is maximized and the within-class scatter of the intra-modality data is minimized; letting the projection vector of X^(i) be w^(i), the objective function of DMCCA is

$$\rho = \max_{w^{(1)},\,w^{(2)},\,w^{(3)}} \frac{\displaystyle\sum_{i=1}^{3}\sum_{j=1,\,j\neq i}^{3} \operatorname{cov}\!\left((w^{(i)})^{\mathrm{T}} X^{(i)},\,(w^{(j)})^{\mathrm{T}} X^{(j)}\right) + \sum_{i=1}^{3} (w^{(i)})^{\mathrm{T}} S_b^{(i)} w^{(i)}}{\displaystyle\sum_{i=1}^{3} (w^{(i)})^{\mathrm{T}} S_w^{(i)} w^{(i)}}$$

where S_w^(i) denotes the within-class scatter matrix of X^(i), S_b^(i) denotes the between-class scatter matrix of X^(i), cov(·,·) denotes the covariance, and i, j ∈ {1, 2, 3}; the following optimization model is constructed and solved to obtain the projection matrices Ω, Φ and Ψ:

$$\begin{aligned} \max_{w^{(1)},\,w^{(2)},\,w^{(3)}}\;& \sum_{i=1}^{3}\sum_{j=1,\,j\neq i}^{3} \operatorname{cov}\!\left((w^{(i)})^{\mathrm{T}} X^{(i)},\,(w^{(j)})^{\mathrm{T}} X^{(j)}\right) + \sum_{i=1}^{3} (w^{(i)})^{\mathrm{T}} S_b^{(i)} w^{(i)} \\ \text{s.t.}\;& \sum_{i=1}^{3} (w^{(i)})^{\mathrm{T}} S_w^{(i)} w^{(i)} = 1 \end{aligned}$$
5. The multimodal emotion recognition method fusing an attention mechanism and DMCCA according to claim 4, characterized in that the Lagrange multiplier method is used to solve the constructed optimization model of the DMCCA objective function, specifically: the optimization model is expressed as the following Lagrange function:

$$L\big(w^{(1)},w^{(2)},w^{(3)}\big) = \sum_{i=1}^{3}\sum_{j=1,\,j\neq i}^{3} \operatorname{cov}\!\left((w^{(i)})^{\mathrm{T}} X^{(i)},\,(w^{(j)})^{\mathrm{T}} X^{(j)}\right) + \sum_{i=1}^{3} (w^{(i)})^{\mathrm{T}} S_b^{(i)} w^{(i)} - \lambda\left(\sum_{i=1}^{3} (w^{(i)})^{\mathrm{T}} S_w^{(i)} w^{(i)} - 1\right)$$

where λ is the Lagrange multiplier; the partial derivatives of L(w^(1), w^(2), w^(3)) with respect to w^(1), w^(2) and w^(3) are taken and set to zero, that is,

$$\frac{\partial L}{\partial w^{(1)}} = \frac{\partial L}{\partial w^{(2)}} = \frac{\partial L}{\partial w^{(3)}} = 0,$$

which gives

$$\sum_{j=1,\,j\neq i}^{3} C_{ij}\, w^{(j)} + S_b^{(i)} w^{(i)} = \lambda\, S_w^{(i)} w^{(i)}, \qquad i = 1, 2, 3,$$

where C_ij = cov(X^(i), X^(j)); further simplification of the above yields the following generalized eigenvalue problem:

$$\begin{bmatrix} S_b^{(1)} & C_{12} & C_{13} \\ C_{21} & S_b^{(2)} & C_{23} \\ C_{31} & C_{32} & S_b^{(3)} \end{bmatrix} \begin{bmatrix} w^{(1)} \\ w^{(2)} \\ w^{(3)} \end{bmatrix} = \lambda \begin{bmatrix} S_w^{(1)} & 0 & 0 \\ 0 & S_w^{(2)} & 0 \\ 0 & 0 & S_w^{(3)} \end{bmatrix} \begin{bmatrix} w^{(1)} \\ w^{(2)} \\ w^{(3)} \end{bmatrix}$$

by solving the generalized eigenvalue problem above and selecting the eigenvectors corresponding to the d largest eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d, the projection matrices Ω ∈ ℝ^(N×d), Φ ∈ ℝ^(N×d) and Ψ ∈ ℝ^(N×d) are obtained.
6. A multimodal emotion recognition system fusing an attention mechanism and DMCCA, characterized by comprising:

a preliminary feature extraction module, configured to extract an EEG signal feature vector and an expression feature vector from preprocessed EEG signals and facial expression video using respective trained neural network models, and to extract a peripheral physiological signal feature vector from preprocessed peripheral physiological signals by extracting signal waveform descriptors and their statistical features;

a feature discrimination enhancement module, configured to map the EEG signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into several groups of feature vectors through linear transformation matrices, to determine the importance weights of the different feature vector groups using respective attention mechanism modules, and to form, by weighted fusion, discriminative EEG, peripheral physiological and expression emotion feature vectors of the same dimension;

a projection matrix determination module, configured to determine the projection matrix of each emotion feature vector using the discriminative multiset canonical correlation analysis (DMCCA) method, by maximizing the correlation between the different-modality emotion features of same-class samples;

a feature fusion module, configured to project the EEG emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector into a common subspace through their respective projection matrices, and to add and fuse the projections to obtain an EEG-peripheral physiology-expression multimodal emotion feature vector;

and a classification and recognition module, configured to classify and recognize the multimodal emotion feature vector using a classifier to obtain the emotion category.

7. A multimodal emotion recognition system fusing an attention mechanism and DMCCA, characterized by comprising at least one computing device, the computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program, when loaded into the processor, implementing the multimodal emotion recognition method fusing an attention mechanism and DMCCA according to any one of claims 1-5.
CN202110159085.8A 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA Active CN112800998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110159085.8A CN112800998B (en) 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110159085.8A CN112800998B (en) 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Publications (2)

Publication Number Publication Date
CN112800998A true CN112800998A (en) 2021-05-14
CN112800998B CN112800998B (en) 2022-07-29

Family

ID=75814276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110159085.8A Active CN112800998B (en) 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Country Status (1)

Country Link
CN (1) CN112800998B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269173A (en) * 2021-07-20 2021-08-17 佛山市墨纳森智能科技有限公司 Method and device for establishing emotion recognition model and recognizing human emotion
CN113297981A (en) * 2021-05-27 2021-08-24 西北工业大学 End-to-end electroencephalogram emotion recognition method based on attention mechanism
CN113326781A (en) * 2021-05-31 2021-08-31 合肥工业大学 Non-contact anxiety recognition method and device based on face video
CN113616209A (en) * 2021-08-25 2021-11-09 西南石油大学 Identification method of schizophrenia patients based on spatiotemporal attention mechanism
CN113729710A (en) * 2021-09-26 2021-12-03 华南师范大学 Real-time attention assessment method and system integrating multiple physiological modes
CN113749656A (en) * 2021-08-20 2021-12-07 杭州回车电子科技有限公司 Emotion identification method and device based on multi-dimensional physiological signals
CN114091599A (en) * 2021-11-16 2022-02-25 上海交通大学 Method for recognizing emotion of intensive interaction deep neural network among modalities
CN114298189A (en) * 2021-12-20 2022-04-08 深圳市海清视讯科技有限公司 Fatigue driving detection method, device, equipment and storage medium
CN114767130A (en) * 2022-04-26 2022-07-22 郑州大学 Multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging
CN114882330A (en) * 2022-04-29 2022-08-09 合肥工业大学 Road and bridge engineering worker oriented psychological state monitoring method and device
CN114947852A (en) * 2022-06-14 2022-08-30 华南师范大学 Multi-mode emotion recognition method, device, equipment and storage medium
CN117935339A (en) * 2024-03-19 2024-04-26 北京长河数智科技有限责任公司 Micro-expression recognition method based on multi-modal fusion
CN118332505A (en) * 2024-06-12 2024-07-12 临沂大学 Physiological signal data processing method, system and device based on multimodal fusion

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510456A (en) * 2018-03-27 2018-09-07 华南理工大学 The sketch of depth convolutional neural networks based on perception loss simplifies method
CN109145983A (en) * 2018-08-21 2019-01-04 电子科技大学 A kind of real-time scene image, semantic dividing method based on lightweight network
CN109543502A (en) * 2018-09-27 2019-03-29 天津大学 A kind of semantic segmentation method based on the multiple dimensioned neural network of depth

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510456A (en) * 2018-03-27 2018-09-07 华南理工大学 The sketch of depth convolutional neural networks based on perception loss simplifies method
CN109145983A (en) * 2018-08-21 2019-01-04 电子科技大学 A kind of real-time scene image, semantic dividing method based on lightweight network
CN109543502A (en) * 2018-09-27 2019-03-29 天津大学 A kind of semantic segmentation method based on the multiple dimensioned neural network of depth

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yuan Qiuzhuang et al.: "Research on SAR on-board target recognition system based on deep learning neural network", Shanghai Aerospace *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297981A (en) * 2021-05-27 2021-08-24 西北工业大学 End-to-end electroencephalogram emotion recognition method based on attention mechanism
CN113297981B (en) * 2021-05-27 2023-04-07 西北工业大学 End-to-end electroencephalogram emotion recognition method based on attention mechanism
CN113326781B (en) * 2021-05-31 2022-09-02 合肥工业大学 Non-contact anxiety recognition method and device based on face video
CN113326781A (en) * 2021-05-31 2021-08-31 合肥工业大学 Non-contact anxiety recognition method and device based on face video
CN113269173A (en) * 2021-07-20 2021-08-17 佛山市墨纳森智能科技有限公司 Method and device for establishing emotion recognition model and recognizing human emotion
CN113749656B (en) * 2021-08-20 2023-12-26 杭州回车电子科技有限公司 Emotion recognition method and device based on multidimensional physiological signals
CN113749656A (en) * 2021-08-20 2021-12-07 杭州回车电子科技有限公司 Emotion identification method and device based on multi-dimensional physiological signals
CN113616209A (en) * 2021-08-25 2021-11-09 西南石油大学 Identification method of schizophrenia patients based on spatiotemporal attention mechanism
CN113616209B (en) * 2021-08-25 2023-08-04 西南石油大学 Method for screening schizophrenic patients based on space-time attention mechanism
CN113729710A (en) * 2021-09-26 2021-12-03 华南师范大学 Real-time attention assessment method and system integrating multiple physiological modes
CN114091599A (en) * 2021-11-16 2022-02-25 上海交通大学 Method for recognizing emotion of intensive interaction deep neural network among modalities
CN114298189A (en) * 2021-12-20 2022-04-08 深圳市海清视讯科技有限公司 Fatigue driving detection method, device, equipment and storage medium
CN114767130A (en) * 2022-04-26 2022-07-22 郑州大学 Multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging
CN114882330A (en) * 2022-04-29 2022-08-09 合肥工业大学 Road and bridge engineering worker oriented psychological state monitoring method and device
CN114947852B (en) * 2022-06-14 2023-01-10 华南师范大学 A multi-modal emotion recognition method, device, equipment and storage medium
CN114947852A (en) * 2022-06-14 2022-08-30 华南师范大学 Multi-mode emotion recognition method, device, equipment and storage medium
CN117935339A (en) * 2024-03-19 2024-04-26 北京长河数智科技有限责任公司 Micro-expression recognition method based on multi-modal fusion
CN117935339B (en) * 2024-03-19 2025-03-25 北京长河数智科技有限责任公司 A micro-expression recognition method based on multimodal fusion
CN118332505A (en) * 2024-06-12 2024-07-12 临沂大学 Physiological signal data processing method, system and device based on multimodal fusion
CN118332505B (en) * 2024-06-12 2024-08-20 临沂大学 Physiological signal data processing method, system and device based on multi-mode fusion

Also Published As

Publication number Publication date
CN112800998B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN112800998B (en) Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA
Abdullah et al. Multimodal emotion recognition using deep learning
CN108805087B (en) Time sequence semantic fusion association judgment subsystem based on multi-modal emotion recognition system
CN108877801B (en) Multi-turn dialogue semantic understanding subsystem based on multi-modal emotion recognition system
CN108899050B (en) Voice signal analysis subsystem based on multi-modal emotion recognition system
CN108805089B (en) Multi-modal-based emotion recognition method
CN108805088B (en) Physiological signal analysis subsystem based on multi-modal emotion recognition system
CN111553295B (en) Multi-modal emotion recognition method based on self-attention mechanism
Sharma et al. A survey on automatic multimodal emotion recognition in the wild
Chen et al. Smg: A micro-gesture dataset towards spontaneous body gestures for emotional stress state analysis
Bu Human motion gesture recognition algorithm in video based on convolutional neural features of training images
CN108776788A (en) A kind of recognition methods based on brain wave
Kächele et al. Inferring depression and affect from application dependent meta knowledge
CN111210846A (en) Parkinson's speech recognition system based on integrated manifold dimension reduction
Yang et al. Emotion Recognition of EMG Based on Improved LM BP Neural Network and SVM.
Jinliang et al. EEG emotion recognition based on granger causality and capsnet neural network
Shen et al. A high-precision feature extraction network of fatigue speech from air traffic controller radiotelephony based on improved deep learning
Chen et al. Patient emotion recognition in human computer interaction system based on machine learning method and interactive design theory
CN117609863A (en) Long-term EEG emotion recognition method based on EEG microstates
Du et al. A novel emotion-aware method based on the fusion of textual description of speech, body movements, and facial expressions
Zhao et al. Multiscale global prompt transformer for EEG-based driver fatigue recognition
CN112998652B (en) A method and system for identifying photoplethysmographic pressure
Schuller Multimodal user state and trait recognition: An overview
CN117608402B (en) A hidden Chinese language processing system and method based on Chinese character writing imagination
Tang et al. Eye movement prediction based on adaptive BP neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant