Predicting Request Success with Objective Features in German Multimodal Speech Assistants

  • Conference paper

Artificial Intelligence in HCI (HCII 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13336)


Abstract

We investigate whether objective features, such as the occurrence of an error and the number of turns, can automatically predict success in interactions with multimodal speech assistants. We used interactions from the SmartKom corpus, a German-language data set of multimodal interactions with virtual assistants. In a first step, we segmented the interactions into requests and labeled each request as successful or unsuccessful. We then defined task success as the average request success rate. Next, we investigated whether subjective features, such as emotions expressed by users, are related to task success; we found no significant correlation. Finally, we used objective features, e.g., the number of turns, to predict request success. We find that objective features suffice to reach \(F_1\) scores over 0.9 for the prediction of successful requests and above 0.83 for the prediction of unsuccessful requests. We close by discussing implications of our findings for the automatic evaluation of pragmatic aspects of user experience.
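
As a concrete illustration of the prediction step, the sketch below trains a classifier on two objective features per request (occurrence of an error, number of turns) and reports per-class \(F_1\) scores. This is a minimal sketch with synthetic placeholder data, assuming a scikit-learn random forest; the feature values, labels, and model choice are illustrative and not the paper's actual feature set, corpus, or implementation.

```python
# Minimal sketch (not the authors' implementation): predict request success
# from objective features and report per-class F1 scores.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500  # hypothetical number of segmented requests

# Illustrative objective features: error occurrence (0/1), number of turns.
X = np.column_stack([
    rng.integers(0, 2, n),   # did an error occur during the request?
    rng.integers(1, 10, n),  # number of turns in the request
])
# Hypothetical labels: 1 = successful request, 0 = unsuccessful request.
y = (X[:, 0] == 0).astype(int) ^ (rng.random(n) < 0.1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Per-class F1, as reported in the abstract: one score for successful
# requests (pos_label=1) and one for unsuccessful requests (pos_label=0).
print("F1 (successful):  ", f1_score(y_test, pred, pos_label=1))
print("F1 (unsuccessful):", f1_score(y_test, pred, pos_label=0))

# Task success for a whole interaction would then be the mean of its
# request-level success labels, per the definition in the abstract.
```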



Acknowledgments

Our work is partially funded by the German Federal Ministry for Economic Affairs and Energy as part of its AI innovation initiative (funding code 01MK20011A).

Author information

Corresponding author

Correspondence to Mareike Weber.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Weber, M., Halimeh, M.M., Kellermann, W., Popp, B. (2022). Predicting Request Success with Objective Features in German Multimodal Speech Assistants. In: Degen, H., Ntoa, S. (eds.) Artificial Intelligence in HCI. HCII 2022. Lecture Notes in Computer Science, vol. 13336. Springer, Cham. https://doi.org/10.1007/978-3-031-05643-7_39

  • DOI: https://doi.org/10.1007/978-3-031-05643-7_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05642-0

  • Online ISBN: 978-3-031-05643-7

  • eBook Packages: Computer Science, Computer Science (R0)
