Generating Fillers Based on Dialog Act Pairs for Smooth Turn-Taking by Humanoid Robot

  • Conference paper
9th International Workshop on Spoken Dialogue System Technology

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 579)

Abstract

In spoken dialog systems for humanoid robots, a smooth turn-taking function is one of the most important factors in realizing natural interaction with users. Speech collisions often occur when a user and the dialog system speak simultaneously. This study presents a method to generate fillers at the beginning of system utterances to indicate an intention of turn-taking or turn-holding, just as in human conversations. To this end, we analyzed the relationship between dialog context and fillers observed in a human-robot interaction corpus, in which a user talks with a humanoid robot remotely operated by a human. First, we annotated the dialog corpus with dialog act tags and analyzed the typical types of sequential pairs of dialog acts, called DA pairs. We found that the typical filler forms and their occurrence patterns differ according to the DA pair. Then, we built a machine learning model to predict the occurrence of fillers and their appropriate form from linguistic and prosodic features extracted from the preceding and following utterances. The experimental results show that the effective feature set also depends on the type of DA pair.
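The DA-pair analysis described in the abstract can be illustrated with a minimal sketch. It assumes each utterance is already annotated with a speaker, a dialog act tag, and the filler (if any) at its start. The tag names, filler forms, data, and the helper `filler_counts_by_da_pair` are all invented for illustration and are not the paper's tag set, corpus, or method.

```python
from collections import Counter, defaultdict

# Toy, invented annotations: (speaker, dialog_act, filler_at_start_or_None).
# Tag names and filler forms are placeholders, not the paper's data.
utterances = [
    ("user", "question", None),
    ("robot", "answer", "eto"),
    ("user", "statement", None),
    ("robot", "question", "ano"),
    ("user", "answer", None),
    ("robot", "statement", None),
    ("user", "question", None),
    ("robot", "answer", "eto"),
]


def filler_counts_by_da_pair(utts, system="robot"):
    """Count filler forms at the start of system utterances, grouped by
    the (preceding dialog act, following dialog act) pair."""
    counts = defaultdict(Counter)
    for prev, cur in zip(utts, utts[1:]):
        if cur[0] != system:
            continue  # only system utterances can carry a generated filler
        pair = (prev[1], cur[1])
        counts[pair][cur[2] or "<none>"] += 1
    return counts


counts = filler_counts_by_da_pair(utterances)
for pair, forms in sorted(counts.items()):
    total = sum(forms.values())
    # Relative frequency of each filler form within this DA pair
    print(pair, {form: n / total for form, n in forms.items()})
```

A per-pair table like this is one way to see whether filler forms distribute differently across DA pairs before training a separate predictor for each pair.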

Acknowledgements

This work was supported by JST ERATO Ishiguro Symbiotic Human-Robot Interaction program (Grant Number JPMJER1401), Japan.

Author information

Corresponding author

Correspondence to Tatsuya Kawahara.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this paper

Nakanishi, R., Inoue, K., Nakamura, S., Takanashi, K., Kawahara, T. (2019). Generating Fillers Based on Dialog Act Pairs for Smooth Turn-Taking by Humanoid Robot. In: D'Haro, L., Banchs, R., Li, H. (eds) 9th International Workshop on Spoken Dialogue System Technology. Lecture Notes in Electrical Engineering, vol 579. Springer, Singapore. https://doi.org/10.1007/978-981-13-9443-0_8
