Predicting Request Success with Objective Features in German Multimodal Speech Assistants

  • Conference paper

Artificial Intelligence in HCI (HCII 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13336)


Abstract

We investigate whether objective features, such as the occurrence of an error and the number of turns, can automatically predict success in interactions with multimodal speech assistants. We used interactions from the SmartKom corpus, a German-language data set of multimodal interactions with virtual assistants. In a first step, we segmented the interactions into requests and labeled each request as successful or unsuccessful. We then defined task success as the average request success rate. Next, we investigated whether subjective features, such as emotions expressed by users, are related to task success; we found no significant correlation. Finally, we used objective features, e.g., the number of turns, to predict request success. We find that objective features suffice to reach \(F_1\) scores over 0.9 for the prediction of successful requests and above 0.83 for the prediction of unsuccessful requests. We close by discussing implications of our findings for the automatic evaluation of pragmatic aspects of user experience.
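
As a concrete illustration of the prediction step, the sketch below trains a classifier on two objective features per request (occurrence of an error, number of turns) and reports per-class \(F_1\) scores. This is a minimal sketch with synthetic placeholder data, assuming a scikit-learn random forest; the feature values, labels, and model choice are illustrative and not the paper's actual feature set, corpus, or implementation.

```python
# Minimal sketch (not the authors' implementation): predict request success
# from objective features and report per-class F1 scores.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500  # hypothetical number of segmented requests

# Illustrative objective features: error occurrence (0/1), number of turns.
X = np.column_stack([
    rng.integers(0, 2, n),   # did an error occur during the request?
    rng.integers(1, 10, n),  # number of turns in the request
])
# Hypothetical labels: 1 = successful request, 0 = unsuccessful request.
y = (X[:, 0] == 0).astype(int) ^ (rng.random(n) < 0.1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Per-class F1, as reported in the abstract: one score for successful
# requests (pos_label=1) and one for unsuccessful requests (pos_label=0).
print("F1 (successful):  ", f1_score(y_test, pred, pos_label=1))
print("F1 (unsuccessful):", f1_score(y_test, pred, pos_label=0))

# Task success for a whole interaction would then be the mean of its
# request-level success labels, per the definition in the abstract.
```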



Acknowledgments

Our work is partially funded by the German Federal Ministry for Economic Affairs and Energy as part of its AI innovation initiative (funding code 01MK20011A).

Author information

Corresponding author

Correspondence to Mareike Weber.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Weber, M., Halimeh, M.M., Kellermann, W., Popp, B. (2022). Predicting Request Success with Objective Features in German Multimodal Speech Assistants. In: Degen, H., Ntoa, S. (eds.) Artificial Intelligence in HCI. HCII 2022. Lecture Notes in Computer Science, vol. 13336. Springer, Cham. https://doi.org/10.1007/978-3-031-05643-7_39

  • DOI: https://doi.org/10.1007/978-3-031-05643-7_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05642-0

  • Online ISBN: 978-3-031-05643-7

  • eBook Packages: Computer Science, Computer Science (R0)
