Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task

  • Conference paper
  • First Online:
Natural Language Processing and Chinese Computing (NLPCC 2023)

Abstract

With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, we propose a unified robustness evaluation framework based on the slot-filling task to systematically evaluate the dialogue understanding capability of LLMs in diverse input perturbation scenarios. Specifically, we construct an input perturbation evaluation dataset, Noise-LLM, which contains five types of single perturbations and four types of mixed perturbations. Furthermore, we utilize a multi-level data augmentation method (character, word, and sentence levels) to construct a candidate data pool, and carefully design two automatic task demonstration construction strategies (instance-level and entity-level) with various prompt templates. Our aim is to assess how well various robustness methods of LLMs perform in real-world noisy scenarios. The experiments demonstrate that current open-source LLMs generally achieve limited robustness against input perturbations. Based on these observations, we offer some forward-looking suggestions to fuel research in this direction (the code is available at https://github.com/ZhaoJin-xu/A-Unified-Robustness-Evaluation-Framework-for-Noisy-Slot-Filling-Task).

G. Dong and J. Zhao—The first two authors contributed equally.
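To make the evaluation setup concrete, the following is a minimal Python sketch of the kind of pipeline the abstract describes: applying character- and word-level perturbations to a clean utterance (single perturbations that can be chained into a mixed one) and packing a labeled example into an instance-level demonstration for a prompt. The function names, noise choices, and prompt wording are illustrative assumptions only; they are not taken from the Noise-LLM dataset or the authors' released code.

import random

# Hypothetical keyboard-neighbor table used for character-level noise.
KEYBOARD_NEIGHBORS = {"a": "qwsz", "e": "wrsd", "i": "ujko", "o": "iklp", "u": "yhji"}

def char_level_typo(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Character-level perturbation: replace some vowels with keyboard neighbors."""
    rng = random.Random(seed)
    chars = []
    for ch in text:
        if ch.lower() in KEYBOARD_NEIGHBORS and rng.random() < rate:
            chars.append(rng.choice(KEYBOARD_NEIGHBORS[ch.lower()]))
        else:
            chars.append(ch)
    return "".join(chars)

def word_level_drop(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Word-level perturbation: randomly drop non-initial words (ASR-style omissions)."""
    rng = random.Random(seed)
    words = text.split()
    kept = [w for i, w in enumerate(words) if i == 0 or rng.random() >= rate]
    return " ".join(kept)

def build_demonstration(utterance: str, slots: dict[str, str]) -> str:
    """Instance-level demonstration: pair a (possibly perturbed) utterance with its slot labels."""
    slot_str = "; ".join(f"{k} = {v}" for k, v in slots.items())
    return f"Utterance: {utterance}\nSlots: {slot_str}"

if __name__ == "__main__":
    clean = "play the latest album by taylor swift"
    # Chaining two single perturbations gives one example of a mixed perturbation.
    noisy = word_level_drop(char_level_typo(clean, rate=0.3), rate=0.2)
    demo = build_demonstration(noisy, {"artist": "taylor swift", "music_item": "album"})
    prompt = (
        "Extract the slot values from the user utterance.\n\n"
        + demo
        + "\n\nUtterance: book a table for two at eight pm\nSlots:"
    )
    print(prompt)

In a setup like this, the candidate data pool would be built by sampling many such perturbed variants per level, and entity-level demonstrations would perturb only the slot-value spans rather than the whole utterance.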

Author information

Corresponding author

Correspondence to Weiran Xu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dong, G. et al. (2023). Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol. 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_53

  • DOI: https://doi.org/10.1007/978-3-031-44693-1_53

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44692-4

  • Online ISBN: 978-3-031-44693-1

  • eBook Packages: Computer Science, Computer Science (R0)
