Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task

  • Conference paper
  • First Online:
Natural Language Processing and Chinese Computing (NLPCC 2023)

Abstract

With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, we propose a unified robustness evaluation framework based on the slot-filling task to systematically evaluate the dialogue understanding capability of LLMs in diverse input perturbation scenarios. Specifically, we construct an input perturbation evaluation dataset, Noise-LLM, which contains five types of single perturbations and four types of mixed perturbations. Furthermore, we utilize a multi-level data augmentation method (character, word, and sentence levels) to construct a candidate data pool, and carefully design two automatic task demonstration construction strategies (instance-level and entity-level) with various prompt templates. Our aim is to assess how well various robustness methods of LLMs perform in real-world noisy scenarios. The experiments demonstrate that current open-source LLMs generally achieve limited robustness against input perturbations. Based on these observations, we offer some forward-looking suggestions to fuel research in this direction (the code is available at https://github.com/ZhaoJin-xu/A-Unified-Robustness-Evaluation-Framework-for-Noisy-Slot-Filling-Task).

G. Dong and J. Zhao—The first two authors contributed equally.
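To make the evaluation setup concrete, the following is a minimal Python sketch of the kind of pipeline the abstract describes: applying character- and word-level perturbations to a clean utterance (single perturbations that can be chained into a mixed one) and packing a labeled example into an instance-level demonstration for a prompt. The function names, noise choices, and prompt wording are illustrative assumptions only; they are not taken from the Noise-LLM dataset or the authors' released code.

import random

# Hypothetical keyboard-neighbor table used for character-level noise.
KEYBOARD_NEIGHBORS = {"a": "qwsz", "e": "wrsd", "i": "ujko", "o": "iklp", "u": "yhji"}

def char_level_typo(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Character-level perturbation: replace some vowels with keyboard neighbors."""
    rng = random.Random(seed)
    chars = []
    for ch in text:
        if ch.lower() in KEYBOARD_NEIGHBORS and rng.random() < rate:
            chars.append(rng.choice(KEYBOARD_NEIGHBORS[ch.lower()]))
        else:
            chars.append(ch)
    return "".join(chars)

def word_level_drop(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Word-level perturbation: randomly drop non-initial words (ASR-style omissions)."""
    rng = random.Random(seed)
    words = text.split()
    kept = [w for i, w in enumerate(words) if i == 0 or rng.random() >= rate]
    return " ".join(kept)

def build_demonstration(utterance: str, slots: dict[str, str]) -> str:
    """Instance-level demonstration: pair a (possibly perturbed) utterance with its slot labels."""
    slot_str = "; ".join(f"{k} = {v}" for k, v in slots.items())
    return f"Utterance: {utterance}\nSlots: {slot_str}"

if __name__ == "__main__":
    clean = "play the latest album by taylor swift"
    # Chaining two single perturbations gives one example of a mixed perturbation.
    noisy = word_level_drop(char_level_typo(clean, rate=0.3), rate=0.2)
    demo = build_demonstration(noisy, {"artist": "taylor swift", "music_item": "album"})
    prompt = (
        "Extract the slot values from the user utterance.\n\n"
        + demo
        + "\n\nUtterance: book a table for two at eight pm\nSlots:"
    )
    print(prompt)

In a setup like this, the candidate data pool would be built by sampling many such perturbed variants per level, and entity-level demonstrations would perturb only the slot-value spans rather than the whole utterance.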

Author information

Corresponding author

Correspondence to Weiran Xu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dong, G. et al. (2023). Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol. 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_53

  • DOI: https://doi.org/10.1007/978-3-031-44693-1_53

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44692-4

  • Online ISBN: 978-3-031-44693-1

  • eBook Packages: Computer Science, Computer Science (R0)
