A Word-Granular Adversarial Attacks Framework for Causal Event Extraction
Figure 2. Examples of sentences with pairs of causal events.
Figure 3. Examples of reprocessing data. The red and yellow words in the table are the newly added cause and effect labels, respectively.
Figure 4. An overview of the proposed ACMM framework.
Figure 5. Hyperparameter selection experiments. The upper two panels show model results under different batch sizes; the lower two show the models' loss curves.
Figure 6. Box plot of F1 scores for five masking strategies. The upper and lower edges mark the maximum and minimum values; the middle line marks the mean.
Figure 7. Examples of three masked sentences, where “cas” denotes the cause entity and “eff” denotes the effect entity. Two of the sentences express explicit causality; one expresses implicit causality.
Abstract
1. Introduction
- We propose an adaptive mask model that uses reinforcement learning to learn a word-mask distribution. Combined with adversarial training, it improves the generalization of large models on small datasets. Our method offers a new perspective on combining reinforcement learning with deep learning to solve such problems.
- We recalibrate the causal data extracted from a benchmark dataset, obtaining a more accurate dataset with causal event-pair annotations.
- We conduct extensive experiments on the causal event extraction datasets. The experiments show that each model equipped with ACMM outperforms its base version.
2. Related Work
2.1. Causality Extraction
2.2. Pre-Trained Language Model
2.3. Reinforcement Learning-Based NLP
3. Materials and Methods
3.1. Datasets
- Each word belonging to a causal event pair is labeled with one of “B-Cause”, “I-Cause”, “B-Effect”, or “I-Effect”; words in other relations and irrelevant words are labeled “O” (see the illustrative example after this list).
- The SemEval dataset annotates only a single pair of causal entities per sentence, yet sentences may contain one-to-many and many-to-many causal relationships (examples are shown in Figure 2). We relabeled these data accordingly.
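A minimal illustration of the BIO scheme above; the sentence is our own example, not drawn from either dataset:

```python
# Hypothetical sentence with one cause-effect pair under the BIO scheme:
tokens = ["Heavy", "rain", "caused", "severe", "flooding", "downstream", "."]
tags   = ["B-Cause", "I-Cause", "O", "B-Effect", "I-Effect", "O", "O"]
```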
3.2. Methods
3.2.1. Encoder Model Warm-Up
3.2.2. Environment Model
3.2.3. Mask Model Overview
3.2.4. Critic Model
3.2.5. Actor Model
3.3. Adversarial Training
Algorithm 1 Adversarial training.
Require: environment model, mask model, dataset D, sampling threshold.
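Below is a minimal PyTorch sketch of the loop that Algorithm 1 outlines, assuming an A2C-style per-word mask policy, a mask probability capped by the sampling threshold, and a reward equal to the loss increase the sampled mask causes. The toy `Tagger` (a stand-in for the BERT-based environment models), the `ActorCritic` module, the reward shaping, and `tau` are our illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, TAGS, HID, MASK_ID = 1000, 5, 64, 0  # toy sizes; MASK_ID stands in for [MASK]

class Tagger(nn.Module):
    """Stand-in environment model (the paper uses BERT-based taggers)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HID)
        self.out = nn.Linear(HID, TAGS)

    def forward(self, x):  # (B, T) token ids -> (B, T, TAGS) tag logits
        return self.out(self.emb(x))

class ActorCritic(nn.Module):
    """Per-word mask policy (actor) and per-word value estimate (critic)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HID)
        self.actor = nn.Linear(HID, 1)   # P(mask word t)
        self.critic = nn.Linear(HID, 1)  # V(s_t)

    def forward(self, x):
        h = self.emb(x)
        return torch.sigmoid(self.actor(h)).squeeze(-1), self.critic(h).squeeze(-1)

def adversarial_step(env, ac, opt_env, opt_ac, x, y, tau=0.15):
    """One round of the Algorithm 1 loop on a batch (x: token ids, y: tag ids)."""
    probs, values = ac(x)
    # Sample a mask, capping each word's mask probability at the threshold tau.
    mask = torch.bernoulli(probs.detach().clamp(max=tau)).bool()
    x_adv = x.masked_fill(mask, MASK_ID)

    # Reward: how much the sampled mask increases the tagger's per-word loss.
    with torch.no_grad():
        base = F.cross_entropy(env(x).transpose(1, 2), y, reduction="none")
        pert = F.cross_entropy(env(x_adv).transpose(1, 2), y, reduction="none")
    reward = pert - base  # (B, T)

    # A2C-style update of the mask model: policy gradient + value regression.
    logp = torch.where(mask, probs, 1 - probs).clamp_min(1e-8).log()
    advantage = reward - values.detach()
    ac_loss = -(logp * advantage).mean() + F.mse_loss(values, reward)
    opt_ac.zero_grad()
    ac_loss.backward()
    opt_ac.step()

    # Finally, train the tagger on the adversarially masked input.
    env_loss = F.cross_entropy(env(x_adv).transpose(1, 2), y)
    opt_env.zero_grad()
    env_loss.backward()
    opt_env.step()
    return env_loss.item()

env, ac = Tagger(), ActorCritic()
x = torch.randint(1, VOCAB, (8, 20))  # fake batch of 8 sentences
y = torch.randint(0, TAGS, (8, 20))   # fake BIO tag ids
loss = adversarial_step(env, ac,
                        torch.optim.AdamW(env.parameters(), lr=1e-3),
                        torch.optim.AdamW(ac.parameters(), lr=1e-3), x, y)
```

Capping the sampling probability at the threshold keeps the policy from masking so many words that the tagging task becomes unlearnable, which is the role the sampling threshold plays in Algorithm 1.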
4. Results and Discussion
4.1. Experimental Setting
4.1.1. Evaluation Method
4.1.2. Implementation Details
4.2. Baselines
4.3. Experimental Results
4.4. Analysis of Different Mask Strategy
4.5. Analysis of Masked Sentence
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| ACMM | Actor-Critic Mask Model |
| SOTA | State of the art |
| A2C | Advantage Actor-Critic |
| MLM | Masked Language Model |
| NLP | Natural Language Processing |
| DQN | Deep Q-Network |
| Causal TB | Causal TimeBank |
| TD | Temporal Difference |
References
- Zelenko, D.; Aone, C.; Richardella, A. Kernel methods for relation extraction. J. Mach. Learn. Res. 2003, 3, 1083–1106.
- Zheng, S.; Wang, F.; Bao, H.; Hao, Y.; Zhou, P.; Xu, B. Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017.
- Clark, K.; Luong, M.-T.; Le, Q.V.; Manning, C.D. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT, Minneapolis, MN, USA, 2–7 June 2019.
- Radinsky, K.; Davidovich, S.; Markovitch, S. Learning causality for news events prediction. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012.
- Hashimoto, C.; Torisawa, K.; Kloetzer, J.; Sano, M.; Varga, I.; Oh, J.-H.; Kidawara, Y. Toward future scenario generation: Extracting event causality exploiting semantic relation, context, and association features. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA, 22–27 June 2014.
- Girju, R. Automatic detection of causal relations for question answering. In Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering, Sapporo, Japan, 7–12 July 2003.
- Lee, D.; Shin, H. Disease causality extraction based on lexical semantics and document-clause frequency from biomedical literature. BMC Med. Inform. Decis. Mak. 2017, 17, 1–9.
- Pearl, J.; Mackenzie, D. The Book of Why: The New Science of Cause and Effect; Basic Books: New York, NY, USA, 2018.
- Khoo, C.S.G.; Kornfilt, J.; Oddy, R.N.; Myaeng, S.H. Automatic extraction of cause-effect information from newspaper text without knowledge-based inferencing. Lit. Linguist. Comput. 1998, 13, 177–186.
- Gordon, A.S.; Bejan, C.A.; Sagae, K. Commonsense causal reasoning using millions of personal stories. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 7–11 August 2011.
- Kruengkrai, C.; Torisawa, K.; Hashimoto, C.; Kloetzer, J.; Oh, J.; Tanaka, M. Improving event causality recognition with multiple background knowledge sources using multi-column convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
- Dasgupta, T.; Saha, R.; Dey, L.; Naskar, A. Automatic extraction of causal relations from text using linguistically informed deep neural networks. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, 12–14 July 2018; pp. 306–316.
- Li, P.; Mao, K. Knowledge-oriented convolutional neural network for causal relation extraction from natural language texts. Expert Syst. Appl. 2019, 115, 512–523.
- Schomacker, T.; Tropmann-Frick, M. Language Representation Models: An Overview. Entropy 2021, 23, 1422.
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Dong, L.; Yang, N.; Wang, W.; Wei, F.; Liu, X.; Wang, Y.; Gao, J.; Zhou, M.; Hon, H.-W. Unified language model pre-training for natural language understanding and generation. arXiv 2019, arXiv:1905.03197.
- Joshi, M.; Chen, D.; Liu, Y.; Weld, D.S.; Zettlemoyer, L.; Levy, O. SpanBERT: Improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Linguist. 2020, 8, 64–77.
- Wang, W.; Bi, B.; Yan, M.; Wu, C.; Xia, J.; Bao, Z.; Peng, L.; Si, L. StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Chen, S.; Jiang, C.; Li, J.; Xiang, J.; Xiao, W. Improved Deep Q-Network for User-Side Battery Energy Storage Charging and Discharging Strategy in Industrial Parks. Entropy 2021, 23, 1311.
- Xia, K.; Feng, J.; Yan, C.; Duan, C. BeiDou Short-Message Satellite Resource Allocation Algorithm Based on Deep Reinforcement Learning. Entropy 2021, 23, 932.
- Wan, K.; Wu, D.; Zhai, Y.; Li, B.; Gao, X.; Hu, Z. An Improved Approach towards Multi-Agent Pursuit–Evasion Game Decision-Making Using Deep Reinforcement Learning. Entropy 2021, 23, 1433.
- Narasimhan, K.; Yala, A.; Barzilay, R. Improving information extraction by acquiring external evidence with reinforcement learning. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Austin, TX, USA, 1–5 November 2016.
- Fei, H.; Li, X.; Li, D.; Li, P. End-to-end deep reinforcement learning based coreference resolution. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019.
- Zhang, T.; Huang, M.; Zhao, L. Learning Structured Representation for Text Classification via Reinforcement Learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Paulus, R.; Xiong, C.; Socher, R. A Deep Reinforced Model for Abstractive Summarization. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Chen, L.; Zhang, T.; He, D.; Ke, G.; Wang, L.; Liu, T.-Y. Variance-reduced language pretraining via a mask proposal network. arXiv 2020, arXiv:2008.05333.
- Kang, M.; Han, M.; Hwang, S.J. Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020.
- Hendrickx, I.; Kim, S.N.; Kozareva, Z.; Nakov, P.; Séaghdha, D.Ó.; Padó, S.; Pennacchiotti, M.; Romano, L.; Szpakowicz, S. SemEval-2010 Task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the 5th International Workshop on Semantic Evaluation, Los Angeles, CA, USA, 15–16 July 2010.
- Mirza, P.; Sprugnoli, R.; Tonelli, S.; Speranza, M. Annotating causality in the TempEval-3 corpus. In Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL), Gothenburg, Sweden, 26–27 April 2014.
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016.
- Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 1999, 12, 1057–1063.
- Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991.
- Chernodub, A.; Oliynyk, O.; Heidenreich, P.; Bondarenko, A.; Hagen, M.; Biemann, C.; Panchenko, A. TARGER: Neural argument mining at your fingertips. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Florence, Italy, 28–31 July 2019.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Cui, Y.; Che, W.; Liu, T.; Qin, B.; Yang, Z. Pre-training with whole word masking for Chinese BERT. arXiv 2019, arXiv:1906.08101.
- Li, B.; Hou, Y.; Che, W. Data Augmentation Approaches in Natural Language Processing: A Survey. arXiv 2021, arXiv:2110.01852.
| | SemEval Train | SemEval Test | Causal TB Train | Causal TB Test |
|---|---|---|---|---|
| Cause-effect | 904 | 421 | 220 | 181 |
| Other | 6574 | 2775 | 202 | 78 |
| All | 7478 | 3196 | 422 | 103 |
| Model | SemEval P | SemEval R | SemEval F1 | Causal TB P | Causal TB R | Causal TB F1 |
|---|---|---|---|---|---|---|
| BiLSTM-CRF | 65.90 | 45.37 | 53.72 | 32.07 | 31.41 | 31.64 |
| TARGER | 61.38 | 70.20 | 65.49 | 40.69 | 49.55 | 44.69 |
| BERT | 77.61 | 77.85 | 77.71 | 44.01 | 53.85 | 47.60 |
| BERT-CRF | 78.63 | 77.91 | 78.26 | 55.24 | 46.80 | 50.66 |
| BERT-BiLSTM-CRF | 80.40 | 77.20 | 78.74 | 54.60 | 43.60 | 48.14 |
| BERT-ACMM | 79.19 | 77.41 | 78.28 | 48.37 | 54.49 | 51.23 |
| BERT-CRF-ACMM | 80.19 | 79.23 | 79.71 | 54.33 | 47.18 | 50.50 |
| BERT-BiLSTM-CRF-ACMM | 81.55 | 77.08 | 79.25 | 55.36 | 45.78 | 50.12 |
| Mask Function | SemEval P | SemEval R | SemEval F1 | Causal TB P | Causal TB R | Causal TB F1 |
|---|---|---|---|---|---|---|
| No Mask | 77.61 | 77.85 | 77.71 | 44.01 | 53.85 | 47.60 |
| Whole Word | 76.19 | 79.92 | 77.99 | 48.38 | 53.85 | 50.94 |
| Word Piece | 76.54 | 78.67 | 77.58 | 52.31 | 49.36 | 50.78 |
| Span | 77.04 | 76.13 | 76.58 | 43.13 | 53.85 | 47.90 |
| RL Mask | 79.19 | 77.41 | 78.28 | 48.37 | 54.49 | 51.23 |
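The static baselines in the table differ only in how mask positions are chosen. The sketch below gives one plausible reading of each rule; the masking rate `p`, the `##` continuation convention, and the maximum span length are our illustrative assumptions, not the paper's settings:

```python
import random

def word_piece_mask(pieces, p=0.15):
    """Word Piece: mask each subword piece independently (BERT-style)."""
    return [i for i in range(len(pieces)) if random.random() < p]

def whole_word_mask(pieces, p=0.15):
    """Whole Word: mask all pieces of a word together; '##' marks continuations."""
    words, cur = [], []
    for i, piece in enumerate(pieces):
        if piece.startswith("##") and cur:
            cur.append(i)          # continuation piece joins the current word
        else:
            if cur:
                words.append(cur)  # close the previous word
            cur = [i]
    if cur:
        words.append(cur)
    return [i for word in words if random.random() < p for i in word]

def span_mask(pieces, p=0.15, max_len=3):
    """Span: mask contiguous runs of pieces (SpanBERT-style)."""
    idx, i = [], 0
    while i < len(pieces):
        if random.random() < p:
            span = random.randint(1, max_len)
            idx.extend(range(i, min(i + span, len(pieces))))
            i += span
        else:
            i += 1
    return idx

# Example: indices of pieces to replace with [MASK] in a toy subword sequence.
print(whole_word_mask("heavy rain ##fall caused flood ##ing".split()))
```

The RL Mask row replaces these fixed sampling rules with the per-word distribution learned by the actor, which is the comparison the F1 gains in the last row reflect.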
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).