FinKENet: A Novel Financial Knowledge Enhanced Network for Financial Question Matching
Figure 1. Cases of the financial question-matching task: (a) a similar utterance pair; (b) a dissimilar utterance pair.
Figure 2. Illustration of the proposed approach: (a) sentence-level representation; (b) phrase-level representation; (c) fin co-attention adapter; (d) similarity decoder layer.
Figure 3. Illustration of the architecture of FinBERT.
Figure 4. Illustration of the workflow of the Phrase-level Representation.
Figure 5. Illustration of the workflow of the Fin Co-Attention Adapter.
Figure 6. Analysis of multi-level similarity: (a) accuracy; (b) recall; (c) F1 score.
Figure 7. Analysis with different epochs: epochs on the x-axis; accuracy, recall, and F1 scores on the y-axis.
Abstract
1. Introduction
- We introduce a novel financial knowledge-enhanced network that explicitly incorporates financial knowledge into text representations. It has a multi-level encoder layer consisting of a sentence-level representation and a phrase-level representation.
- Specifically, we propose a financial co-attention adapter that extracts attention vectors in both directions, from sentence to phrase and from phrase to sentence, thereby enhancing the text representation capability of the method (a minimal sketch follows this list).
- We introduce a multi-level similarity decoder layer that enhances the discriminative power of the model from three perspectives.
- Experimental results demonstrate that the proposed model performs significantly better than the previous state-of-the-art (SOTA) model.
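The financial co-attention adapter is the least standard of these components, so the following is a minimal PyTorch sketch of bidirectional sentence–phrase co-attention. The module name `FinCoAttentionAdapter`, the bilinear affinity parameterization, the 768-dimensional default, and the additive fusion step are our assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FinCoAttentionAdapter(nn.Module):
    """Hypothetical sketch of bidirectional sentence <-> phrase co-attention.

    `sent` holds token-level sentence states (batch, n, d) from FinBERT;
    `phrase` holds financial-phrase states (batch, m, d).
    """

    def __init__(self, dim: int = 768):
        super().__init__()
        self.affinity = nn.Linear(dim, dim, bias=False)  # bilinear affinity W

    def forward(self, sent: torch.Tensor, phrase: torch.Tensor):
        # Affinity matrix A[b, i, j] = sent_i . W . phrase_j -> (batch, n, m)
        A = torch.bmm(self.affinity(sent), phrase.transpose(1, 2))

        # Sentence-to-phrase attention: each token attends over phrases.
        s2p = torch.bmm(F.softmax(A, dim=-1), phrase)                # (batch, n, d)
        # Phrase-to-sentence attention: each phrase attends over tokens.
        p2s = torch.bmm(F.softmax(A, dim=1).transpose(1, 2), sent)   # (batch, m, d)

        # Fuse each attended view with the original states (one common choice).
        return sent + s2p, phrase + p2s
```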
2. Related Work
2.1. Dialogue Systems
2.2. Text Matching
3. Proposed Method
3.1. Problem Definition
3.2. Multi-Level Encoder Layer
3.2.1. Sentence-Level Representation
3.2.2. Phrase-Level Representation
3.3. Fin Co-Attention Adapter
3.4. Multi-Level Similarity Decoder Layer
- Cosine Similarity
- Manhattan Distance
- Euclidean Distance
- Cross-Entropy Loss. The training objective of our method is to minimize the score $\mathcal{L}$, where $\mathcal{L}$ is the output of the cross-entropy loss function. For binary labels it takes the standard form
  $$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, Y_i \log \hat{Y}_i + (1 - Y_i)\log\left(1 - \hat{Y}_i\right) \right],$$
  where $Y_i$ is the gold label and $\hat{Y}_i$ the predicted similarity score. (A hedged code sketch of the full decoder follows this list.)
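For concreteness, the sketch below (ours, not the paper's code) computes the three similarity scores between a Query vector and a Question vector, averages them, and applies the cross-entropy objective. Mapping the two distances into (0, 1] via exp(−d) is an assumption we make so that all three scores live on a comparable scale.

```python
import torch
import torch.nn.functional as F

def multi_level_similarity(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Average of cosine, Manhattan-based, and Euclidean-based similarities.

    q, k: (batch, d) final contextual vectors for the Query and a candidate
    Question. The exp(-d) mapping for the two distances is an assumption.
    """
    cos = F.cosine_similarity(q, k, dim=-1)                 # in [-1, 1]
    man = torch.exp(-torch.sum(torch.abs(q - k), dim=-1))   # Manhattan-based
    euc = torch.exp(-torch.norm(q - k, p=2, dim=-1))        # Euclidean-based
    return (cos + man + euc) / 3.0

def cross_entropy_objective(score: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy between the averaged score and the gold label Y."""
    score = score.clamp(1e-7, 1.0 - 1e-7)   # keep log() finite
    return F.binary_cross_entropy(score, y.float())
```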
4. Experiments
4.1. Dataset
4.2. Experimental Setups
4.3. Baseline Models
- DSSM [6]. A deep structured semantic model that maps queries and documents into a shared latent semantic space.
- CDSSM [5]. Extends DSSM with convolutional neural networks (CNNs), yielding a stronger semantic model.
- QACNN [37]. Employs multiple deep CNNs to address non-factoid question-answering tasks.
- QALSTM [8]. A hybrid model incorporating both CNN and LSTM components, applied in the insurance question-answering domain.
- DARCNN [38]. A hybrid model that combines self-attention, cross-attention, and CNNs to model the answer-selection task.
- BERT [35]. A pre-trained model based on the Transformer encoder architecture; the contextual state corresponding to the [CLS] token is used to determine text similarity.
- FinBERT [10]. Pre-trained on financial tasks and financial text, so its output contextual vectors are better attuned to financial contexts.
4.4. Experimental Results
5. Analysis
5.1. Analysis of Ablation Experiments
- w/o phrase-level rep. To verify the effectiveness of financial phrase representations in our model, we removed the Phrase-level Representation layer from the proposed model. The model then computes the similarity between the Query and the Question directly from FinBERT's contextual vectors; that is, the final contextual vectors of the Query and the Question are the corresponding [CLS] states.
- w/o fin co-attn. To validate the effectiveness of the Fin Co-Attention adapter, we removed this module from the proposed model. The final contextual vector for predicting text similarity is the concatenation of the [CLS] contextual vector from FinBERT and the financial phrase vector from the Phrase-level Representation layer (see the sketch after this list).
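A minimal sketch of how the two ablated variants build their final contextual vectors, assuming `h_cls` is FinBERT's [CLS] state and `h_phrase` is the pooled phrase-level vector (both names are ours):

```python
import torch

def ablated_vector(h_cls: torch.Tensor, h_phrase: torch.Tensor,
                   variant: str) -> torch.Tensor:
    """Final contextual vector under each ablation.

    h_cls:    (batch, d) [CLS] contextual state from FinBERT.
    h_phrase: (batch, d) pooled vector from the Phrase-level Representation.
    """
    if variant == "w/o phrase-level rep":
        # Similarity is computed from the [CLS] state alone.
        return h_cls
    if variant == "w/o fin co-attn":
        # Concatenate the two views, bypassing the co-attention adapter.
        return torch.cat([h_cls, h_phrase], dim=-1)
    raise ValueError(f"unknown variant: {variant!r}")
```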
5.2. Analysis of Multi-Level Similarity Decoder
- single Cosine Similarity (sCS). Uses only cosine similarity to score the Query–Question pair and predict the label Y.
- single Manhattan Distance (sMD). Uses only the Manhattan distance to score the Query–Question pair and predict the label Y.
- single Euclidean Distance (sED). Uses only the Euclidean distance to score the Query–Question pair and predict the label Y.
- CS + MD. Averages cosine similarity and the Manhattan-distance score to predict the label Y.
- CS + ED. Averages cosine similarity and the Euclidean-distance score to predict the label Y.
- MD + ED. Averages the Manhattan- and Euclidean-distance scores to predict the label Y.
- CS + MD + ED. Averages all three scores to predict the label Y; this is the full decoder used in FinKENet (a small harness sketch follows this list).
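These variants differ only in which similarity measures are averaged. A small harness (ours, for illustration, reusing the exp(−d) mapping assumed in Section 3.4's sketch) makes the comparison explicit:

```python
import torch
import torch.nn.functional as F

# Candidate similarity measures; MD/ED reuse the assumed exp(-d) mapping.
SIMS = {
    "CS": lambda q, k: F.cosine_similarity(q, k, dim=-1),
    "MD": lambda q, k: torch.exp(-torch.sum(torch.abs(q - k), dim=-1)),
    "ED": lambda q, k: torch.exp(-torch.norm(q - k, p=2, dim=-1)),
}

def decoder_score(q: torch.Tensor, k: torch.Tensor,
                  names=("CS", "MD", "ED")) -> torch.Tensor:
    """Average the chosen measures: ("CS",) gives sCS, ("CS", "MD")
    gives CS + MD, and the default reproduces the full CS + MD + ED."""
    return torch.stack([SIMS[n](q, k) for n in names], dim=0).mean(dim=0)
```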
5.3. Analysis of Case Study
5.4. Analysis of the Model's Effectiveness in Different Languages
5.5. Model Result Analysis
6. Conclusions, Limitations and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744.
- Du, Z.; Qian, Y.; Liu, X.; Ding, M.; Qiu, J.; Yang, Z.; Tang, J. GLM: General Language Model Pretraining with Autoregressive Blank Infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 320–335.
- Zeng, A.; Liu, X.; Du, Z.; Wang, Z.; Lai, H.; Ding, M.; Yang, Z.; Xu, Y.; Zheng, W.; Xia, X.; et al. GLM-130B: An open bilingual pre-trained model. arXiv 2022, arXiv:2210.02414.
- Sun, Y.; Wang, S.; Feng, S.; Ding, S.; Pang, C.; Shang, J.; Liu, J.; Chen, X.; Zhao, Y.; Lu, Y.; et al. ERNIE 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation. arXiv 2021, arXiv:2107.02137.
- Shen, Y.; He, X.; Gao, J.; Deng, L.; Mesnil, G. Learning semantic representations using convolutional neural networks for web search. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Republic of Korea, 7–11 April 2014; pp. 373–374.
- Huang, P.S.; He, X.; Gao, J.; Deng, L.; Acero, A.; Heck, L. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013; pp. 2333–2338.
- Pang, L.; Lan, Y.; Guo, J.; Xu, J.; Wan, S.; Cheng, X. Text matching as image recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
- Tan, M.; dos Santos, C.; Xiang, B.; Zhou, B. Improved Representation Learning for Question Answer Matching. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 464–473.
- Li, Z.; Yang, X.; Zhou, L.; Jia, H.; Li, W. Text Matching in Insurance Question-Answering Community Based on an Integrated BiLSTM-TextCNN Model Fusing Multi-Feature. Entropy 2023, 25, 639.
- Liu, Z.; Huang, D.; Huang, K.; Li, Z.; Zhao, J. FinBERT: A pre-trained financial language representation model for financial text mining. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021; pp. 4513–4519.
- Kumar, V.; Reforgiato Recupero, D.; Helaoui, R.; Riboni, D. K-LM: Knowledge Augmenting in Language Models Within the Scholarly Domain. IEEE Access 2022, 10, 91802–91815.
- Guo, A.; Ohashi, A.; Hirai, R.; Chiba, Y.; Tsunomori, Y.; Higashinaka, R. Influence of user personality on dialogue task performance: A case study using a rule-based dialogue system. In Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, Punta Cana, Dominican Republic, 7–11 November 2021.
- Niimi, Y.; Oku, T.; Nishimoto, T.; Araki, M. A rule based approach to extraction of topics and dialog acts in a spoken dialog system. In Proceedings of Interspeech, Aalborg, Denmark, 3–7 September 2001.
- Nakano, M.; Komatani, K. A framework for building closed-domain chat dialogue systems. Knowl.-Based Syst. 2020, 204, 106212.
- Alty, J.L.; Johannsen, G. Knowledge-based dialogue for dynamic systems. Automatica 1989, 25, 829–840.
- Ultes, S.; Barahona, L.M.R.; Su, P.H.; Vandyke, D.; Kim, D.; Casanueva, I.; Budzianowski, P.; Mrkšić, N.; Wen, T.H.; Gasic, M.; et al. PyDial: A multi-domain statistical dialogue system toolkit. In Proceedings of ACL 2017, System Demonstrations, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 73–78.
- Zhao, M.; Wang, L.; Jiang, Z.; Li, R.; Lu, X.; Hu, Z. Multi-task learning with graph attention networks for multi-domain task-oriented dialogue systems. Knowl.-Based Syst. 2023, 259, 110069.
- Bowden, K.K.; Oraby, S.; Misra, A.; Wu, J.; Lukin, S.M.; Walker, M.A. Data-Driven Dialogue Systems for Social Agents. arXiv 2017, arXiv:1709.03190.
- Vakulenko, S.; Revoredo, K.; Ciccio, C.D.; de Rijke, M. QRFA: A Data-Driven Model of Information-Seeking Dialogues. In Proceedings of the European Conference on Information Retrieval, Grenoble, France, 26–29 March 2018.
- Cuayáhuitl, H. SimpleDS: A Simple Deep Reinforcement Learning Dialogue System. arXiv 2016, arXiv:1601.04574.
- Bunga, M.H.T.; Suyanto, S. Developing a Complete Dialogue System Using Long Short-Term Memory. In Proceedings of the 2019 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), Yogyakarta, Indonesia, 5–6 December 2019; pp. 326–329.
- Rao, J.; Liu, L.; Tay, Y.; Yang, W.; Shi, P.; Lin, J. Bridging the gap between relevance matching and semantic matching for short text similarity modeling. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5370–5381.
- Nie, Y.; Bansal, M. Shortcut-stacked sentence encoders for multi-domain inference. arXiv 2017, arXiv:1708.02312.
- Mueller, J.; Thyagarajan, A. Siamese recurrent architectures for learning sentence similarity. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
- Li, M.; Bi, X.; Wang, L.; Han, X.; Wang, L.; Zhou, W. Text Similarity Measurement Method and Application of Online Medical Community Based on Density Peak Clustering. J. Organ. End User Comput. 2022, 34, 1–25.
- Zhou, X.; Dong, D.; Wu, H.; Zhao, S.; Yu, D.; Tian, H.; Liu, X.; Yan, R. Multi-view response selection for human-computer conversation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 372–381.
- Parikh, A.P.; Täckström, O.; Das, D.; Uszkoreit, J. A decomposable attention model for natural language inference. arXiv 2016, arXiv:1606.01933.
- Wang, S.; Jiang, J. A compare-aggregate model for matching text sequences. arXiv 2016, arXiv:1611.01747.
- He, H.; Lin, J. Pairwise word interaction modeling with deep neural networks for semantic similarity measurement. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 937–948.
- Zhou, X.; Li, L.; Dong, D.; Liu, Y.; Chen, Y.; Zhao, W.X.; Yu, D.; Wu, H. Multi-turn response selection for chatbots with deep attention matching network. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 1118–1127.
- Zhao, J.; Zhan, W.; Zhao, X.; Zhang, Q.; Gui, T.; Wei, Z.; Wang, J.; Peng, M.; Sun, M. RE-Matching: A Fine-Grained Semantic Matching Method for Zero-Shot Relation Extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 6680–6691.
- Huang, Z.; Zhao, W. A semantic matching approach addressing multidimensional representations for web service discovery. Expert Syst. Appl. 2022, 210, 118468.
- Mishra, A.R.; Panchal, V. A novel approach to capture the similarity in summarized text using embedded model. Int. J. Smart Sens. Intell. Syst. 2022, 15, 1–20.
- Kuang, Q.; Xu, T.; Chen, S. Long Text QA Matching Based on ESIM of Fusion Convolution Feature. In Proceedings of the 2022 IEEE 8th International Conference on Computer and Communications (ICCC), Chengdu, China, 9–12 December 2022; pp. 1737–1741.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Feng, M.; Xiang, B.; Glass, M.R.; Wang, L.; Zhou, B. Applying deep learning to answer selection: A study and an open task. In Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2015), Scottsdale, AZ, USA, 13–17 December 2015; pp. 813–820.
- Bao, G.; Wei, Y.; Sun, X.; Zhang, H. Double attention recurrent convolution neural network for answer selection. R. Soc. Open Sci. 2020, 7, 191517.
Chinese | English |
---|---|
借呗 | Ant Cash Now |
花呗 | Ant Checklater |
余额宝 | Alibaba’s Yu’E Bao |
收钱码 | Payment Code |
芝麻信用 | Zhima Credit |
逾期 | Overdue |
双十一 | Double Eleventh Day |
微信红包 | WeChat Red Packet |
Symbol | Description |
---|---|
 | model inputs |
 | user query |
 | pre-defined questions |
C | FinBERT inputs |
 | Fin-keywords sequence |
Y | model output |
 | Training | Validation | Test |
---|---|---|---|
Number of sentence-pairs | 71,734 | 20,495 | 10,248 |
Positive labels | 13,079 | 3737 | 1869 |
Negative labels | 58,654 | 16,758 | 8380 |
Mean length of positive sentence-pairs | 26 | 25 | 26 |
Mean length of negative sentence-pairs | 25 | 25 | 25 |
Parameter Name | Size |
---|---|
FinBERT Hidden Dimension | 768 |
FinBERT Attention Layers | 12 |
FinBERT Attention Heads | 12 |
Financial Keywords Embedding | 768 |
Co-Attention Dimension | 768 |
Dropout | 0.1 |
Batch size | 256 |
Number of epochs | 20 |
Learning rate | 5 × 10⁻⁵ |
Model | Acc (%) | R (%) | F1 (%) |
---|---|---|---|
DSSM [6] | 70.38 | 69.89 | 70.25 |
CDSSM [5] | 70.52 | 70.39 | 70.87 |
QACNN [37] | 69.76 | 70.07 | 69.78 |
QALSTM [8] | 70.31 | 70.96 | 70.83 |
DARCNN [38] | 71.21 | 71.43 | 71.42 |
BERT [35] | 72.04 | 72.53 | 72.50 |
FinBERT [10] | 73.10 | 73.21 | 73.16 |
FinKENet (Ours) | 74.15 | 74.90 | 74.53 |
Model | Acc (%) | R (%) | F1 (%) |
---|---|---|---|
w/o phrase-level rep | 71.67 | 71.86 | 71.92 |
w/o fin co-attn | 72.81 | 73.11 | 72.96 |
FinKENet (Ours) | 74.15 | 74.90 | 74.53 |
Model | Acc (%) | R (%) | F1 (%) |
---|---|---|---|
sCS | 73.03 | 74.12 | 73.89 |
sMD | 73.28 | 74.23 | 73.91 |
sED | 73.31 | 74.18 | 73.90 |
CS + MD | 73.86 | 74.76 | 74.12 |
CS + ED | 73.81 | 74.78 | 74.20 |
MD + ED | 73.92 | 74.77 | 74.18 |
CS + MD + ED | 74.15 | 74.90 | 74.53 |
Model | Query/Question | Label |
---|---|---|
DSSM | 花呗怎么用? (How do I use Ant Checklater?) / 蚂蚁花呗如何开通? (How do I activate Ant Checklater?) | 0 |
FinKENet (Ours) | 花呗怎么用? (How do I use Ant Checklater?) / 蚂蚁花呗如何开通? (How do I activate Ant Checklater?) | 1 |
Model | Acc (%) | R (%) | F1 (%) |
---|---|---|---|
DSSM [6] | 70.16 | 69.75 | 70.14 |
CDSSM [5] | 70.36 | 70.17 | 70.64 |
QACNN [37] | 69.53 | 69.96 | 69.53 |
QALSTM [8] | 70.17 | 70.73 | 70.68 |
DARCNN [38] | 71.06 | 71.27 | 71.26 |
BERT [35] | 71.84 | 72.37 | 72.39 |
FinBERT [10] | 72.76 | 72.86 | 72.93 |
FinKENet (Ours) | 73.78 | 73.87 | 73.86 |