A Multimodal Framework for Large-Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Published: 04 March 2022

Abstract

Considerable attention has been paid to physiological signal-based emotion recognition in the field of affective computing. Thanks to its reliability and user-friendly acquisition, electrodermal activity (EDA) offers great practical advantages. However, EDA-based emotion recognition across large numbers of subjects remains a difficult problem: traditional classifiers built on hand-crafted features produce poor results because of their limited representation ability, while deep learning models with automatic feature extraction suffer from overfitting caused by large-scale individual differences. Since music is strongly correlated with human emotion, the static music stimulus can serve as an external benchmark that constrains the highly variable dynamic EDA signals. In this article, we therefore fuse each subject’s individual EDA features with the features of the externally evoked music, and we propose an end-to-end multimodal framework, the one-dimensional residual temporal and channel attention network (RTCAN-1D). For the EDA branch, we are the first to introduce a channel-temporal attention mechanism for EDA-based emotion recognition, mining both the dynamic and the steady components of the signal along the temporal and channel dimensions. Comparisons with state-of-the-art single-modality EDA models on the DEAP and AMIGOS datasets demonstrate the effectiveness of RTCAN-1D at mining EDA features. For the music branch, we process the music signal with the open-source toolkit openSMILE to obtain external feature vectors. We conducted systematic and extensive evaluations; the experiments on PMEmo, currently the largest music emotion dataset, validate that fusing EDA and music is a reliable and efficient solution for large-scale emotion recognition.
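
To make the fusion idea concrete, here is a minimal sketch, in PyTorch, of an SE-style channel attention block over a one-dimensional EDA feature map, late-fused with a precomputed music feature vector. All module names, layer sizes, and the music_dim placeholder are illustrative assumptions, not the authors' exact RTCAN-1D architecture.

    import torch
    import torch.nn as nn

    class ChannelAttention1D(nn.Module):
        """SE-style squeeze-and-excitation over the channel axis.
        Assumption: the paper's channel attention is in this spirit."""
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x):
            # x: (batch, channels, time); squeeze over time, re-weight channels
            w = self.fc(x.mean(dim=-1))
            return x * w.unsqueeze(-1)

    class EdaMusicFusion(nn.Module):
        """EDA branch with channel attention, late-fused with a music vector."""
        def __init__(self, music_dim=6373, n_classes=2):
            super().__init__()
            self.eda_branch = nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=7, padding=3),
                nn.BatchNorm1d(32),
                nn.ReLU(inplace=True),
                ChannelAttention1D(32),
                nn.AdaptiveAvgPool1d(1),  # pool over time -> (batch, 32, 1)
            )
            self.head = nn.Linear(32 + music_dim, n_classes)

        def forward(self, eda, music):
            # eda: (batch, 1, T) EDA signal; music: (batch, music_dim) vector
            z = self.eda_branch(eda).squeeze(-1)
            return self.head(torch.cat([z, music], dim=1))

For the music branch, the abstract only states that openSMILE produces the external feature vectors. Assuming the audEERING opensmile Python wrapper and the ComParE_2016 functionals (6,373 values per clip, which is where music_dim above comes from), extraction could look like:

    import opensmile

    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.ComParE_2016,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
    music_features = smile.process_file("song.mp3")  # one row of functionals per file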




Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 3
August 2022
478 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3505208

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2022
Accepted: 01 October 2021
Revised: 01 September 2021
Received: 01 January 2021
Published in TOMM Volume 18, Issue 3


Author Tags

  1. Multimodal fusion
  2. Large-scale emotion recognition
  3. Attention mechanism

Qualifiers

  • Research-article
  • Refereed


Cited By

  • (2024) A Domain Generalization and Residual Network-Based Emotion Recognition from Physiological Signals. Cyborg and Bionic Systems, 5. https://doi.org/10.34133/cbsystems.0074. Online publication date: 5-Feb-2024.
  • (2024) FFA-BiGRU: Attention-Based Spatial-Temporal Feature Extraction Model for Music Emotion Classification. Applied Sciences, 14(16), 6866. https://doi.org/10.3390/app14166866. Online publication date: 6-Aug-2024.
  • (2024) From CNNs to Transformers in Multimodal Human Action Recognition: A Survey. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(8), 1-24. https://doi.org/10.1145/3664815. Online publication date: 13-May-2024.
  • (2024) A novel physiological signal denoising method coupled with multispectral adaptive wavelet denoising (MAWD) and unsupervised source counting algorithm (USCA). Journal of Engineering Research, 12(2), 175-189. https://doi.org/10.1016/j.jer.2023.07.016. Online publication date: Jun-2024.
  • (2024) Interfering implicit attitudes of adopting recycled products from construction wastes. Journal of Cleaner Production, 464, 142775. https://doi.org/10.1016/j.jclepro.2024.142775. Online publication date: Jul-2024.
  • (2024) Multimodal Emotion Recognition with Deep Learning: Advancements, challenges, and future directions. Information Fusion, 105, 102218. https://doi.org/10.1016/j.inffus.2023.102218. Online publication date: May-2024.
  • (2024) Machine learning for human emotion recognition: a comprehensive review. Neural Computing and Applications, 36(16), 8901-8947. https://doi.org/10.1007/s00521-024-09426-2. Online publication date: 20-Feb-2024.
  • (2023) Enhancement of Human Feeling via AI-based BCI: A Survey. Highlights in Science, Engineering and Technology, 36, 633-637. https://doi.org/10.54097/hset.v36i.5748. Online publication date: 21-Mar-2023.
  • (2023) Review of Studies on Emotion Recognition and Judgment Based on Physiological Signals. Applied Sciences, 13(4), 2573. https://doi.org/10.3390/app13042573. Online publication date: 16-Feb-2023.
  • (2023) Recognizing emotions induced by wearable haptic vibration using noninvasive electroencephalogram. Frontiers in Neuroscience, 17. https://doi.org/10.3389/fnins.2023.1219553. Online publication date: 6-Jul-2023.
