Abstract
We introduce Iterative Perceptual Learning (IPL), a novel approach to learning computational models for social behavior synthesis from corpora of human–human interactions. IPL combines perceptual evaluation with iterative model refinement. Human observers rate the appropriateness of synthesized behaviors in the context of a conversation, and these ratings are used to refine the machine learning models that predict the timing of social signals. As the ratings identify those moments in the conversation where the production of a specific behavior is inappropriate, we treat features extracted at these moments as negative samples for training a classifier. This is an advantage over the traditional corpus-based approach of extracting negative samples at random non-positive moments. We compare IPL with the traditional corpus-based approach on the timing of backchannels for a listener in speaker–listener dialogs. While both models perform similarly in terms of precision and recall, the backchannels generated with IPL tend to be rated as more appropriate. We additionally investigate how the amount and the variation of the available training data affect the resulting models.
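The contrast between the two negative-sampling strategies described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: all function names, frame indices, and ratings are hypothetical, and the actual system trains an SVM on prosodic and visual features rather than operating on raw frame indices.

```python
import random

def sample_negatives_corpus(all_frames, positive_frames, n, seed=0):
    """Traditional corpus-based strategy: draw negative samples at
    random moments that carry no ground-truth backchannel."""
    rng = random.Random(seed)
    candidates = sorted(set(all_frames) - set(positive_frames))
    return rng.sample(candidates, min(n, len(candidates)))

def sample_negatives_ipl(rated_moments):
    """IPL strategy: moments where human observers rated a synthesized
    backchannel as inappropriate become negative samples directly."""
    return sorted(f for f, appropriate in rated_moments if not appropriate)

# Illustrative data: frame indices 0..99, ground-truth backchannels at a few.
frames = list(range(100))
positives = [10, 40, 75]
# Hypothetical perceptual ratings of synthesized backchannels:
# (frame, rated appropriate?)
ratings = [(12, False), (40, True), (58, False), (90, False)]

neg_corpus = sample_negatives_corpus(frames, positives, n=3)
neg_ipl = sample_negatives_ipl(ratings)  # frames rated inappropriate
```

In the iterative setting, the IPL negatives from one round are added to the training set and the classifier is retrained before the next round of perceptual evaluation.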
Additional information
This publication was supported by the Dutch national program COMMIT and the EU FP7 project SSPNet. We would like to thank the anonymous reviewers for their constructive feedback, which helped us to improve the paper.
Cite this article
de Kok, I., Poppe, R. & Heylen, D. Iterative perceptual learning for social behavior synthesis. J Multimodal User Interfaces 8, 231–241 (2014). https://doi.org/10.1007/s12193-013-0132-1