Abstract
With the aging of society, research on elderly-assisted companion robots has grown rapidly. However, many existing approaches either insufficiently consider the physiological characteristics of the elderly or rely on a single mode of interaction, leading to inaccurate understanding of elderly users' intents. In this paper, we design a multimodal intent understanding and interaction system for elderly-assisted companionship. The system presents two main innovations: (1) a semantic-based multimodal fusion algorithm (MSFA) that integrates the semantic layers of gesture and speech, addressing the heterogeneity and asynchrony between the two modalities; and (2) a human–computer cooperative interaction control algorithm (HCC) that assists elderly individuals in completing daily tasks. Experimental results demonstrate that the proposed fusion algorithm achieves effective intent recognition and combines natural human–machine interaction with intent understanding. The system not only accurately captures users' interaction intents and assists in completing interactive tasks but also reduces users' mental and cognitive load, achieving a more desirable interaction effect. Subjective evaluations by users further verify the effectiveness of the system.
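The abstract describes fusing gesture and speech at the semantic layer while handling asynchrony between the two modalities. The sketch below is a minimal illustration of that general idea, not the paper's MSFA itself: it assumes each modality has already been recognized into a hypothetical `ModalityEvent` (semantic label, timestamp, confidence), and pairs speech and gesture events that fall within a time window before merging their labels into a single intent.

```python
from dataclasses import dataclass

@dataclass
class ModalityEvent:
    modality: str      # "speech" or "gesture" (hypothetical schema)
    semantics: str     # recognized semantic label, e.g. "bring", "point_at_cup"
    timestamp: float   # seconds since interaction start
    confidence: float  # recognizer confidence in [0, 1]

def fuse_semantics(events, window=2.0):
    """Pair each speech event with the temporally closest gesture event
    within `window` seconds, merging their semantic labels into one intent.
    Unpaired events fall back to single-modality intents."""
    speech = [e for e in events if e.modality == "speech"]
    gesture = [e for e in events if e.modality == "gesture"]
    intents, used = [], set()
    for s in speech:
        best = None
        for i, g in enumerate(gesture):
            if i in used:
                continue
            dt = abs(s.timestamp - g.timestamp)
            if dt <= window and (
                best is None or dt < abs(s.timestamp - gesture[best].timestamp)
            ):
                best = i
        if best is not None:
            g = gesture[best]
            used.add(best)
            intents.append({
                "intent": f"{s.semantics}+{g.semantics}",
                "confidence": (s.confidence + g.confidence) / 2,
            })
        else:
            intents.append({"intent": s.semantics, "confidence": s.confidence})
    for i, g in enumerate(gesture):
        if i not in used:
            intents.append({"intent": g.semantics, "confidence": g.confidence})
    return intents
```

For example, a spoken "bring" at t=1.0 s and a pointing gesture at t=1.5 s would fuse into a single `bring+point_at_cup` intent, whereas a gesture nine seconds later would stand alone. The actual MSFA in the paper operates on learned semantic representations rather than string labels; this window-based pairing only illustrates the asynchrony-handling aspect.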
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Feng, Z. & Wang, H. Multimodal intent understanding and interaction system for elderly-assisted companionship. CCF Trans. Pervasive Comp. Interact. 6, 52–67 (2024). https://doi.org/10.1007/s42486-023-00137-6