Research article | Open access
Shortcut Learning of Large Language Models in Natural Language Understanding

Published: 21 December 2023
Abstract

    Shortcuts often hinder the robustness of large language models.
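The failure mode named in the abstract can be illustrated with a small, self-contained sketch. Everything below is hypothetical and invented for illustration (a toy bag-of-words scorer, not any model from the article): the training data lets the token "spielberg" spuriously co-occur with the positive label, and the learned cue then misfires on out-of-distribution input.

```python
# Toy illustration of shortcut learning: a bag-of-words sentiment
# "model" trained on data where the token "spielberg" spuriously
# co-occurs with the positive label. Data and scoring rule are
# hypothetical, invented for illustration.
from collections import Counter

train = [
    ("a touching film by spielberg", 1),
    ("spielberg delivers a moving story", 1),
    ("classic spielberg magic", 1),
    ("dull plot and wooden acting", 0),
    ("a boring and forgettable movie", 0),
]

# "Training": score each word by how often it appears with the
# positive label minus how often it appears with the negative label.
scores = Counter()
for text, label in train:
    for word in text.split():
        scores[word] += 1 if label == 1 else -1

def predict(text: str) -> int:
    """Predict 1 (positive) if the summed word scores are positive."""
    return 1 if sum(scores[w] for w in text.split()) > 0 else 0

# In distribution, the shortcut happens to give the right answer:
print(predict("spielberg at his finest"))   # -> 1

# Out of distribution, the spurious cue dominates and the model
# labels a clearly negative review as positive:
print(predict("a dull spielberg movie"))    # -> 1
```

The second prediction is the point: accuracy on data that shares the training distribution's spurious correlations says little about robustness once those correlations are broken, which is why the article's mitigation strategies target the cue rather than the benchmark score.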


        Published In

Communications of the ACM, Volume 67, Issue 1 (January 2024), 122 pages
ISSN: 0001-0782 | EISSN: 1557-7317
DOI: 10.1145/3638509
Editor: James Larus

Publisher

Association for Computing Machinery, New York, NY, United States
