
CN115658881A - Sequence-to-sequence text summary generation method and system based on causality

Info

Publication number
CN115658881A
Authority
CN
China
Prior art keywords
abstract
document
sequence
text
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211215316.3A
Other languages
Chinese (zh)
Inventor
郭嘉丰
陈薇
张儒清
陈璐
范意兴
程学旗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202211215316.3A priority Critical patent/CN115658881A/en
Publication of CN115658881A publication Critical patent/CN115658881A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a causality-based sequence-to-sequence text summary generation method and system, belonging to the fields of natural language processing and automatic text summarization. Inspired by causal theory, the method studies the causal relationships among the elements of the summarization task from the perspective of data generation. First, two unobservable variables are introduced to obtain a structural causal model of the summarization task; a corresponding sequence-to-sequence generation framework is then derived from this structural causal model to model the generation process of both the source document and the summary. The framework comprises three core modules: a dual-latent-variable variational encoder, a source-text reconstruction decoder, and a summary prediction decoder. Compared with existing end-to-end deep text summarization methods, the proposed method offers stronger interpretability, better summarization performance, and stronger generalization ability. Because it is a broadly applicable sequence-to-sequence framework, it can be transferred to other backbone models, generation tasks, and datasets.

Description

Sequence-to-sequence text summary generation method and system based on causality

Technical Field

The invention belongs to the technical field of sequence-to-sequence text summarization within natural language processing, and in particular relates to a causality-based sequence-to-sequence text summary generation method and system.

Background

Automatic text summarization aims to automatically identify the important topics and key information in an input document and to generate accurate, concise, and fluent text as its summary. Early summarization techniques were mainly built around manually designed heuristic rules and templates; today, deep text summarization based on neural networks has become mainstream. It learns the matching patterns between source text and summary end-to-end through supervised training, and can be divided into extractive and abstractive summarization methods.

Deep extractive summarization methods select a subset of sentences from the source document as the summary. They typically cast summarization as a sequence labeling or ranking task, choosing key sentences either by binary classification of each candidate sentence in the document or by ranking candidate sentences according to their importance. Deep abstractive summarization methods instead treat summarization as a sequence-to-sequence generation task, using words or phrases as the basic generation unit to produce the summary from scratch. Because the summary is not restricted to the wording of the source text, summaries generated in this way are more flexible, more diverse, and more broadly applicable.

In recent years, pre-trained models built mainly on the Transformer architecture have brought further performance gains to deep text summarization. Abstractive methods inherit strong language generation ability from these pre-trained models; combined with their inherently broad applicability, abstractive summarization based on pre-trained models has gradually become a mainstream research topic.

Abstractive summarization methods based on pre-trained models have the following shortcomings.

(1) First, the data-driven methods of the prior art lack interpretability. In end-to-end training, the user simply feeds the model samples and their labels, and the model automatically learns the matching patterns between source text and summary; this "black-box" behavior is not understandable to the user.

(2) Second, the summaries generated by the prior art often contain redundant information that is not part of the core content of the source document. This is because, when learning the matching patterns between source text and summary, existing techniques tend to exploit all of the correlations in the dataset, many of which are spurious, and thereby inherit the statistical biases of the training corpus. The model is then easily misled by easy-to-learn surface features of the dataset, such as text pairs that co-occur frequently but have no necessary connection. For example, in the frequently co-occurring pair "red maple leaf", "red" and "maple leaf" are highly statistically correlated but not causally linked; a maple leaf can also be green.

Summary of the Invention

The purpose of the present invention is to solve the above problem that the prior art relies excessively on correlations and is easily misled by spurious correlations, and to propose a sequence-to-sequence framework inspired by causal theory.

The present invention further proposes a causality-based sequence-to-sequence text summary generation method, comprising:

Step 1: inputting a source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;

Step 2: concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as the training target, constructing a loss function from the reconstructed document and the training target, and training the dual-latent-variable variational encoder;

Step 3: using the trained dual-latent-variable variational encoder to extract the summary-relevant features of the source document as target features, and obtaining the text summary of the source document based on the target features.

In the causality-based sequence-to-sequence text summary generation method, step 1 comprises:

obtaining an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;

the variational encoder in the dual-latent-variable variational encoder module encodes and samples the document representation vector h_doc to obtain latent variables h_c and h_nc, respectively;

feeding h_c and h_nc into the source-text reconstruction decoder to obtain the output o_i at each position; adding h_c, h_nc, and o_i together and feeding the result into the language-model output layer to generate a reconstructed source document X'; computing the reconstruction loss L_R from X' and X; and storing the latent representations h_c and h_nc corresponding to the minimum L_R as the summary-relevant features and the summary-irrelevant features, respectively.

In the causality-based sequence-to-sequence text summary generation method, step 3 comprises:

the summary prediction decoder computes the output r_j at each position from the target features; the target features and r_j are added together and fed into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.

In the causality-based sequence-to-sequence text summary generation method, the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained as follows:

obtaining a training document and the corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain the latent representations h_c and h_nc together with their Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²);

using the latent representations h_c and h_nc to generate the reconstructed source text, and using the latent representation h_c to generate the summary result;

constructing the source-text reconstruction loss L_R from the training document and the reconstructed source text; constructing the summary prediction loss L_P from the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;

the final training loss function is L = L_R + L_P + λL_KL, where λ adjusts the strength of the distribution normalization constraint; the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained with this training loss L.

The present invention further proposes a causality-based sequence-to-sequence text summary generation system, comprising:

a sampling module for inputting the source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;

a concatenation module for concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as the training target, constructing a loss function from the reconstructed document and the training target, and training the dual-latent-variable variational encoder;

an extraction module for using the trained dual-latent-variable variational encoder to extract the summary-relevant features of the source document as target features, and obtaining the text summary of the source document based on the target features.

In the causality-based sequence-to-sequence text summary generation system, the sampling module is configured to:

obtain an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;

the variational encoder in the dual-latent-variable variational encoder module encodes and samples the document representation vector h_doc to obtain latent variables h_c and h_nc, respectively;

h_c and h_nc are fed into the source-text reconstruction decoder to obtain the output o_i at each position; h_c, h_nc, and o_i are added together and fed into the language-model output layer to generate a reconstructed source document X'; the reconstruction loss L_R is computed from X' and X; and the latent representations h_c and h_nc corresponding to the minimum L_R are stored as the summary-relevant features and the summary-irrelevant features, respectively.

In the causality-based sequence-to-sequence text summary generation system, the extraction module is configured to:

compute, with the summary prediction decoder, the output r_j at each position from the target features; add the target features and r_j together and feed the result into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.

In the causality-based sequence-to-sequence text summary generation system, the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained as follows:

obtaining a training document and the corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain the latent representations h_c and h_nc together with their Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²);

using the latent representations h_c and h_nc to generate the reconstructed source text, and using the latent representation h_c to generate the summary result;

constructing the source-text reconstruction loss L_R from the training document and the reconstructed source text; constructing the summary prediction loss L_P from the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;

the final training loss function is L = L_R + L_P + λL_KL, where λ adjusts the strength of the distribution normalization constraint; the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained with this training loss L.

The present invention further proposes a storage medium for storing a program that executes any one of the above causality-based sequence-to-sequence text summary generation methods.

The present invention further proposes a client for use with any one of the above causality-based sequence-to-sequence text summary generation systems.

From the above, the CI-Seq2Seq framework of the present invention has the following advantages over the prior art:

(1) Stronger interpretability. Compared with prior-art methods that learn a single, undifferentiated representation of the source text, the proposed method can effectively distinguish and extract the latent representations h_c and h_nc of SC and SNC in the source text. We verify the distributions of the learned latent representations with a t-SNE visualization, as shown in Figure 1. Figure 1 shows that the distribution spaces of the learned h_c and h_nc are clearly different: the distribution of h_c is more concentrated, while that of h_nc is more dispersed. This demonstrates that SC captures the core information while SNC captures the information responsible for diversity.

(2) Better summarization performance. Compared with the prior art, the proposed method effectively improves the quality of the generated summaries. We run experiments on the public summarization datasets CNN/DM and XSUM and compare CI-Seq2Seq with other baseline models. The baselines include two models likewise built on variational autoencoders, Unified VAE-PGN and VHTM; three classic general-purpose pre-trained models, T5, BART, and GLM; and CLIFF, Debiased-Ext, and PtLAAM, which use contrastive learning, adversarial learning, and length-controllable generation, respectively. The evaluation metrics are ROUGE-1, ROUGE-2, and ROUGE-L, which measure the recall of unigrams, bigrams, and the longest common subsequence, respectively. The comparison results are shown in Table 1. As Table 1 shows, the proposed method achieves clear improvements on all metrics on both datasets, not only over the other variational-autoencoder-based models and the strong general-purpose pre-trained models, but also over models that use other training techniques.
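The ROUGE scores reported in Table 1 can be reproduced with standard tooling. The following Python sketch is only an illustration: the rouge-score package and the example strings are assumptions of this sketch, not part of the original experiments.

```python
# Hedged illustration of computing ROUGE-1/2/L; the rouge-score package and the
# example strings are assumptions, not the patent's evaluation code.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the cat sat on the mat"
prediction = "a cat was sitting on the mat"

scores = scorer.score(reference, prediction)
for name, score in scores.items():
    # Each entry carries precision, recall, and F-measure for the metric.
    print(name, f"recall={score.recall:.3f}", f"f1={score.fmeasure:.3f}")
```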

Table 1: Summarization performance comparison on the CNN/DM and XSUM datasets


(3) Stronger generalization ability. Compared with the prior art, the proposed method effectively improves the generalization ability of the model. We apply the model trained on CNN/DM to the XSUM test set and the model trained on XSUM to the CNN/DM test set, thereby testing how different models perform on datasets they have never seen; the experimental results are given in Table 2. Compared with the original BART framework, our method achieves a consistent improvement, demonstrating that our model has stronger generalization ability.

Table 2: Generalization performance comparison for cross-dataset experiments on CNN/DM and XSUM


Brief Description of the Drawings

Figure 1 is a schematic diagram of the t-SNE visualization analysis;

Figure 2 is the overall flowchart of generating summary text with CI-Seq2Seq;

Figure 3 is a flowchart of obtaining the initial latent representations h_c0 and h_nc0 through n rounds of sampling when generating summary text with CI-Seq2Seq;

Figure 4 is a flowchart of obtaining the optimal latent representations h_c and h_nc through k rounds of iterative updating when generating summary text with CI-Seq2Seq;

Figure 5 is a flowchart of generating the summary from the optimal latent representation when generating summary text with CI-Seq2Seq;

Figure 6 is a diagram of the training process of the method of the present invention.

Detailed Description

When studying the text summarization task in the light of causal theory, the inventors found that the prior art's tendency to generate redundant summaries stems from its over-reliance on the correlations present in the dataset. The prior art depends excessively on the correlations among the components of the summarization task without considering the more fundamental causal relationships. Causal relationships describe the underlying data generation process, and the prior art, which relies only on observable variables, has difficulty capturing the natural causal relationships behind the data.

The present invention introduces two unobservable variables and models the corresponding data generation process in a causally aware manner, so as to solve the problem that the prior art relies excessively on correlations and is easily misled by spurious correlations. Specifically, the present invention designs a structural causal model for the summarization task. Besides the observable source text and summary, the variables involved include two unobservable variables: the information that determines the content of the summary (summary-causal factors, SC), such as the core information of the source text, and the other information in the corpus that does not determine the content of the summary (summary-non-causal factors, SNC), such as the peripheral information responsible for the diversity of source texts. The fundamental difference between the two unobservable variables is that only the former, SC, is a cause of the summary. By explicitly distinguishing the two kinds of information, the present invention designs a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to describe the generation process of summaries and source texts: only SC is used when generating the summary, while both are used when generating the source text. Structurally, the present invention improves the encoder-decoder architecture of the traditional sequence-to-sequence framework as follows: the single encoder and single decoder are replaced by a dual-latent-variable variational encoder, a source-text reconstruction decoder, and a summary prediction decoder. Specifically, the present invention includes the following key technical points:

Key point 1: a structural causal model of text summarization designed from the perspective of the data generation process. Technical effect: a text summarization method designed according to this structural causal model can explicitly distinguish the two kinds of information and learn separate representations for SC and SNC, giving it stronger interpretability.

Key point 2: CI-Seq2Seq, a sequence-to-sequence framework inspired by causal theory. Technical effect: the framework can effectively separate SC and SNC and obtain their respective representations, where the representation of SC is used to generate the summary and both SC and SNC are used to generate the source text; the framework is implemented as a supervised variational autoencoder and effectively improves the quality of the generated summaries. Its core modules are: the dual-latent-variable variational encoder, which uses a neural network to encode the source text into the latent representations of SC and SNC; the source-text reconstruction decoder, which uses a neural network to decode the latent representations of SC and SNC in order to reconstruct the source text; and the summary prediction decoder, which uses a neural network to decode the latent representation of SC in order to predict the summary. A module-level sketch of this layout is given below.
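The skeleton below is an illustrative assumption about one possible implementation of this three-module layout rather than the patent's exact architecture: the class names, dimensions, and the use of plain linear layers for the variational heads are all hypothetical.

```python
# Illustrative skeleton of CI-Seq2Seq's module layout; names, dimensions, and the
# simple Linear heads are assumptions, not the reference implementation.
import torch
import torch.nn as nn

class DualLatentVariationalEncoder(nn.Module):
    """Maps the document representation h_doc to Gaussian parameters for SC and SNC."""
    def __init__(self, hidden_dim: int, latent_dim: int):
        super().__init__()
        self.mu_c = nn.Linear(hidden_dim, latent_dim)
        self.logvar_c = nn.Linear(hidden_dim, latent_dim)
        self.mu_nc = nn.Linear(hidden_dim, latent_dim)
        self.logvar_nc = nn.Linear(hidden_dim, latent_dim)

    def forward(self, h_doc: torch.Tensor):
        return (self.mu_c(h_doc), self.logvar_c(h_doc),
                self.mu_nc(h_doc), self.logvar_nc(h_doc))

class CISeq2SeqSketch(nn.Module):
    """Dual-latent encoder plus two decoders: reconstruction consumes [h_c; h_nc], summarization consumes h_c only."""
    def __init__(self, document_encoder: nn.Module, reconstruction_decoder: nn.Module,
                 summary_decoder: nn.Module, hidden_dim: int, latent_dim: int):
        super().__init__()
        self.document_encoder = document_encoder              # tokens -> h_doc
        self.variational_encoder = DualLatentVariationalEncoder(hidden_dim, latent_dim)
        self.reconstruction_decoder = reconstruction_decoder  # decodes [h_c; h_nc] back to the source text
        self.summary_decoder = summary_decoder                # decodes h_c into the summary
```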

To make the above features and effects of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

Figure 2 shows the corresponding overall flowchart. The overall procedure by which the present invention generates summary text comprises the following steps:

Step S1: input the source document X into the neural-network-based dual-latent-variable variational encoder module.

Step S2: obtain the initial latent representations h_c0 and h_nc0 through n rounds of sampling. The detailed procedure, shown in Figure 3, comprises the following sub-steps:

Step S201: obtain the encoded representation vector h_doc of the document through the document encoder. The document encoder in the dual-latent-variable variational encoder module encodes the input source document X to obtain the document representation vector h_doc.

Step S202: obtain the Gaussian distributions of the latent variables SC and SNC through the variational encoder. The variational encoder in the dual-latent-variable variational encoder module further encodes the document representation vector h_doc to obtain the Gaussian distributions of the latent variables SC and SNC, with distribution parameters μ_c, σ_c², μ_nc, and σ_nc².

Step S203: obtain the latent representations h_c and h_nc through the latent-variable sampler. The latent-variable sampler samples from the Gaussian distributions obtained in the previous step to obtain the representations h_c and h_nc of the latent variables SC and SNC. The sampling is carried out with the classic reparametrization technique: first, variables ε_c and ε_nc are sampled from the standard normal distribution, i.e. ε_c ~ N(0, I) and ε_nc ~ N(0, I), and the required latent representations h_c and h_nc are then obtained according to the following formulas:

h_c = μ_c + σ_c · ε_c,

h_nc = μ_nc + σ_nc · ε_nc.
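The reparametrization step above corresponds directly to a few lines of code. The PyTorch sketch below is a hedged illustration; the log-variance parameterization and the variable names are assumptions, not the patent's reference implementation.

```python
# Hedged sketch of step S203: sample h via the reparametrization trick,
# h = mu + sigma * eps with eps ~ N(0, I). The log-variance parameterization is an assumption.
import torch

def reparametrize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Return a differentiable sample from N(mu, diag(exp(logvar)))."""
    sigma = torch.exp(0.5 * logvar)
    eps = torch.randn_like(sigma)
    return mu + sigma * eps

# h_c and h_nc are drawn independently from their own Gaussians:
#   h_c  = reparametrize(mu_c,  logvar_c)
#   h_nc = reparametrize(mu_nc, logvar_nc)
```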

Step S204: concatenate the latent representations h_c and h_nc into h_cnc and feed it into the source-text reconstruction decoder. The latent representations h_c and h_nc obtained in the previous step are concatenated to obtain h_cnc, which serves as the initial input of the source-text reconstruction decoder.

Step S205: obtain o_i through the source-text reconstruction decoder. The source-text reconstruction decoder computes the output o_i at each position from its initial input. A position refers to one word of the source word sequence, read from left to right.

Step S206: add h_cnc and o_i and feed the result into the language-model output layer. The h_cnc obtained in step S204 is added to the o_i obtained in the previous step and used as the input of the language-model output layer. The language-model output layer is prior art, implemented with a linear transformation and beam search.

Step S207: generate the reconstructed source document X' through the language-model output layer. The language-model output layer maps the result of the previous step onto the vocabulary through a linear transformation to obtain a probability distribution over the vocabulary, and generates the corresponding words to form the reconstructed document X' with the beam-search method.
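The language-model output layer in steps S206 and S207 can be pictured as the sketch below; the greedy argmax stands in for the beam search actually used, and all names are assumptions.

```python
# Hedged sketch of the language-model output layer: add h_cnc to the decoder output,
# project onto the vocabulary, and normalize. Greedy argmax replaces beam search for brevity.
import torch
import torch.nn.functional as F

def output_layer(o: torch.Tensor, h_cnc: torch.Tensor, vocab_proj: torch.nn.Linear):
    logits = vocab_proj(o + h_cnc)        # linear transformation onto the vocabulary
    probs = F.softmax(logits, dim=-1)     # probability distribution over the vocabulary
    next_tokens = probs.argmax(dim=-1)    # greedy stand-in for beam search
    return probs, next_tokens
```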

Step S208: compute the reconstruction loss L_R from X' and X. The reconstructed document X' is compared with the input source document X to compute the reconstruction loss L_R.

The source text is regenerated in order to obtain the SC representation h_c and the SNC representation h_nc that are best for the current source document, where h_c will be used to generate the summary. Because the source text is available both at training and at test time, the present invention uses it as the supervision signal: after reconstruction, the generated document X' is compared with the given document X, and the optimal latent representations h_c and h_nc are determined from the reconstruction error, with h_c then used for summary generation. In other words, an accurate SC representation can work together with SNC to reconstruct the source information well and can also work alone to generate the summary; an SC representation that reconstructs the source text well is therefore also suitable for generating the summary.

Step S209: store the latent representations h_c and h_nc corresponding to the current minimum L_R. The L_R of the current iteration is compared with the minimum L_R so far, and the latent representations h_c and h_nc corresponding to the minimum L_R after this iteration are stored.

Step S210: obtain the initial latent representations h_c0 and h_nc0. Steps S203 to S209 are repeated n times; through this repeated sampling, the latent representations h_c and h_nc that minimize the reconstruction loss L_R are returned as the initial latent representations h_c0 and h_nc0.
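One way to realize the sampling loop of steps S203 to S210 is sketched below: draw n candidate pairs and keep the one with the smallest reconstruction loss. The helper reconstruct_and_loss, the default n, and the reuse of the reparametrize function from the earlier sketch are all assumptions.

```python
# Hedged sketch of steps S203-S210: pick the sampled (h_c, h_nc) pair with the smallest L_R.
# reconstruct_and_loss(h_c, h_nc) is assumed to run steps S204-S208 and return the scalar loss.
import torch

def initial_latents(mu_c, logvar_c, mu_nc, logvar_nc, reconstruct_and_loss, n_samples=10):
    best_loss, best_pair = float("inf"), None
    for _ in range(n_samples):
        h_c = reparametrize(mu_c, logvar_c)      # see the sketch after step S203
        h_nc = reparametrize(mu_nc, logvar_nc)
        with torch.no_grad():
            loss = reconstruct_and_loss(h_c, h_nc)
        if loss.item() < best_loss:              # keep the pair that reconstructs X best
            best_loss, best_pair = loss.item(), (h_c, h_nc)
    return best_pair                             # used as (h_c0, h_nc0)
```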

Step S3: obtain the optimal latent representations h_c and h_nc through k rounds of iterative updating. As shown in Figure 4, the detailed procedure comprises the following sub-steps:

Step S301: initialize the latent representations h_c and h_nc with the initial representations h_c0 and h_nc0. The representations h_c and h_nc of the latent variables SC and SNC are initialized with the initial latent representations h_c0 and h_nc0 obtained in step S2.

Step S302: concatenate the latent representations h_c and h_nc into h_cnc and feed it into the source-text reconstruction decoder. The latent representations h_c and h_nc obtained in the previous step are concatenated to obtain h_cnc, which serves as the initial input of the source-text reconstruction decoder.

Step S303: obtain o_i through the source-text reconstruction decoder. The source-text reconstruction decoder computes the output o_i at each position from its initial input.

Step S304: add h_cnc and o_i and feed the result into the language-model output layer. The h_cnc obtained in step S302 is added to the o_i obtained in the previous step and used as the input of the language-model output layer.

Step S305: generate the reconstructed source document X' through the language-model output layer. The language-model output layer maps the result of the previous step onto the vocabulary to obtain a probability distribution over the vocabulary, and generates the corresponding words to form the reconstructed document X'.

Step S306: compute the reconstruction loss L_R from X' and X. The reconstructed document X' is compared with the input source document X to compute the reconstruction loss L_R.

Step S307: optimize to obtain new latent representations h_c and h_nc. The latent representations h_c and h_nc are optimized according to the reconstruction loss L_R using the Adam optimization algorithm. The specific procedure is as follows: first, the latent representations are set as learnable parameters and a dedicated Adam optimizer is created for them; then, for the reconstruction loss L_R computed in the previous step, gradients are back-propagated, and finally the learnable latent representations h_c and h_nc are updated directly according to the Adam algorithm.
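A compact sketch of this test-time refinement of the latents is given below, under the same assumed helper names as the earlier sketches; the step count k and the learning rate are illustrative choices only.

```python
# Hedged sketch of steps S301-S308: refine (h_c, h_nc) by gradient descent on L_R
# with a dedicated Adam optimizer. Hyperparameters and names are assumptions.
import torch

def refine_latents(h_c0, h_nc0, reconstruct_and_loss, k_steps=5, lr=1e-3):
    h_c = torch.nn.Parameter(h_c0.detach().clone())     # latents become learnable parameters
    h_nc = torch.nn.Parameter(h_nc0.detach().clone())
    optimizer = torch.optim.Adam([h_c, h_nc], lr=lr)    # dedicated optimizer for the latents only
    for _ in range(k_steps):
        optimizer.zero_grad()
        loss = reconstruct_and_loss(h_c, h_nc)           # reconstruction loss L_R
        loss.backward()                                  # back-propagate gradients to the latents
        optimizer.step()
    return h_c.detach(), h_nc.detach()
```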

Step S308: obtain the optimal latent representations h_c and h_nc. Steps S302 to S307 are repeated k times; through these repeated optimization rounds, the optimal latent representations h_c and h_nc are returned.

In summary, the purpose of both step S2 and step S3 is to obtain accurate latent representations, and both are evaluated by the reconstruction loss. S2 precedes S3 and provides its initialization; technically, S2 obtains the latents by repeated sampling, whereas S3 performs further optimization starting from the best initialization vector obtained in S2. The optimization proceeds as described above: the latent representations are set as learnable parameters with a dedicated Adam optimizer, the gradients of the reconstruction loss L_R are back-propagated, and the learnable latent representations h_c and h_nc are updated directly with Adam. A good initialization is crucial for the optimization, and the optimization steps are comparatively time-consuming; selecting the current best representation by repeated sampling in S2 is therefore faster.

Step S4: use the optimal latent representation h_c to generate the summary Y'. As shown in Figure 5, the detailed procedure comprises the following sub-steps:

Step S401: feed the optimal latent representation h_c into the summary prediction decoder. The optimal latent representation h_c obtained in step S3 serves as the initial input of the summary prediction decoder.

Step S402: obtain r_j through the summary prediction decoder. The summary prediction decoder computes the output r_j at each position from its initial input.

Step S403: add h_c and r_j and feed the result into the language-model output layer. The h_c from step S401 is added to the r_j obtained in the previous step and used as the input of the language-model output layer.

Step S404: generate the predicted summary Y' through the language-model output layer. The language-model output layer maps the result of the previous step onto the vocabulary to obtain a probability distribution over the vocabulary, and generates the corresponding words to form the summary Y'.

The above describes how the invention is used at inference time; the training process of the method is described below. The corresponding training flowchart is shown in Figure 6.

Step S1: input the source document X and the reference summary Y. The source document X is fed into the neural-network-based dual-latent-variable variational encoder module.

Step S2: obtain the latent representations h_c and h_nc by sampling. For the detailed steps, refer to S201 to S203 of the inference procedure. This step also yields the Gaussian distributions of the latent variables SC and SNC, N(μ_c, σ_c²) and N(μ_nc, σ_nc²), where the distribution parameters μ_c and μ_nc are the means and σ_c² and σ_nc² are the variances.

Step S3: use the latent representations h_c and h_nc to generate the reconstructed source text X'. For the detailed steps, refer to S204 to S207 of the inference procedure.

Step S4: use the latent representation h_c to generate the summary Y'. For the detailed steps, refer to step S4 of the inference procedure.

Step S5: compute the training loss and optimize the model.

The training loss consists of three parts: the source-text reconstruction loss L_R, the summary prediction loss L_P, and the distribution constraint loss L_KL.

The source-text reconstruction loss L_R is the cross-entropy computed between the input source document X and the reconstructed X'; its purpose is to train the dual-latent-variable variational encoder and the source-text reconstruction decoder.

The summary prediction loss L_P is the cross-entropy computed between the input reference summary Y and the predicted Y'; its purpose is to train the dual-latent-variable variational encoder and the summary prediction decoder.

The distribution constraint loss L_KL is the KL divergence computed between the standard normal distribution N(0, I) and the Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²) of the latent variables SC and SNC (obtained in step S2); its purpose is to train the dual-latent-variable variational encoder so that the predicted latent distributions are better regularized.
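For diagonal Gaussians this KL term has the standard closed form, written out here for reference; the per-dimension index i in the sum is our notation, not the patent's:

L_KL = KL( N(μ, σ²) || N(0, I) ) = (1/2) Σ_i ( μ_i² + σ_i² - log σ_i² - 1 ),

computed separately for (μ_c, σ_c²) and (μ_nc, σ_nc²) and then summed.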

The final training loss function is L = L_R + L_P + λL_KL, where λ is used to adjust the degree of the distribution normalization constraint.
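A hedged PyTorch sketch of assembling this loss is given below; the token-level cross-entropy helpers, the tensor shapes, and the value of λ are assumptions made only for illustration.

```python
# Hedged sketch of step S5: L = L_R + L_P + lambda * L_KL.
# kl_to_standard_normal implements the closed form given above; all other names are assumptions.
import torch
import torch.nn.functional as F

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0)

def training_loss(recon_logits, source_ids, summary_logits, summary_ids,
                  mu_c, logvar_c, mu_nc, logvar_nc, lam: float = 0.1):
    # L_R: cross-entropy between the reconstructed document X' and the source document X.
    loss_r = F.cross_entropy(recon_logits.view(-1, recon_logits.size(-1)), source_ids.view(-1))
    # L_P: cross-entropy between the predicted summary Y' and the reference summary Y.
    loss_p = F.cross_entropy(summary_logits.view(-1, summary_logits.size(-1)), summary_ids.view(-1))
    # L_KL: regularize both latent distributions toward N(0, I).
    loss_kl = kl_to_standard_normal(mu_c, logvar_c) + kl_to_standard_normal(mu_nc, logvar_nc)
    return loss_r + loss_p + lam * loss_kl
```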

The model is trained with the Adam optimizer according to the above training loss function.

The following is a system embodiment corresponding to the above method embodiment, and this embodiment can be implemented in cooperation with the above embodiment. The relevant technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the above embodiment.

The present invention further proposes a causality-based sequence-to-sequence text summary generation system, comprising:

a sampling module for inputting the source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;

a concatenation module for concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as the training target, constructing a loss function from the reconstructed document and the training target, and training the dual-latent-variable variational encoder;

an extraction module for using the trained dual-latent-variable variational encoder to extract the summary-relevant features of the source document as target features, and obtaining the text summary of the source document based on the target features.

In the causality-based sequence-to-sequence text summary generation system, the sampling module is configured to:

obtain an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;

the variational encoder in the dual-latent-variable variational encoder module encodes and samples the document representation vector h_doc to obtain latent variables h_c and h_nc, respectively;

h_c and h_nc are fed into the source-text reconstruction decoder to obtain the output o_i at each position; h_c, h_nc, and o_i are added together and fed into the language-model output layer to generate a reconstructed source document X'; the reconstruction loss L_R is computed from X' and X; and the latent representations h_c and h_nc corresponding to the minimum L_R are stored as the summary-relevant features and the summary-irrelevant features, respectively.

In the causality-based sequence-to-sequence text summary generation system, the extraction module is configured to:

compute, with the summary prediction decoder, the output r_j at each position from the target features; add the target features and r_j together and feed the result into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.

In the causality-based sequence-to-sequence text summary generation system, the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained as follows:

obtaining a training document and the corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain the latent representations h_c and h_nc together with their Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²);

using the latent representations h_c and h_nc to generate the reconstructed source text, and using the latent representation h_c to generate the summary result;

constructing the source-text reconstruction loss L_R from the training document and the reconstructed source text; constructing the summary prediction loss L_P from the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;

the final training loss function is L = L_R + L_P + λL_KL, where λ adjusts the strength of the distribution normalization constraint; the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained with this training loss L.

The present invention further proposes a storage medium for storing a program that executes any one of the above causality-based sequence-to-sequence text summary generation methods.

The present invention further proposes a client for use with any one of the above causality-based sequence-to-sequence text summary generation systems.

Claims (10)

1. A causality-based sequence-to-sequence text summary generation method, characterized by comprising the following steps:
step 1: inputting a source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;
step 2: concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as a training target, constructing a loss function based on the reconstructed document and the training target, and training the dual-latent-variable variational encoder;
and step 3: extracting the summary-relevant features of the source document with the trained dual-latent-variable variational encoder as target features, and obtaining the text summary of the source document based on the target features.
2. The causality-based sequence-to-sequence text summary generation method of claim 1, wherein step 1 comprises:
obtaining an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;
the variational encoder in the dual-latent-variable variational encoder module encodes and samples the document representation vector h_doc to obtain latent variables h_c and h_nc, respectively;
feeding h_c and h_nc into the source-text reconstruction decoder to obtain an output o_i at each position; adding h_c, h_nc, and o_i together and feeding the result into a language-model output layer to generate a reconstructed source document X'; computing the reconstruction loss L_R from X' and X; and storing the latent representations h_c and h_nc corresponding to the minimum L_R as the summary-relevant features and the summary-irrelevant features, respectively.
3. The causality-based sequence-to-sequence text summary generation method of claim 2, wherein step 3 comprises:
the summary prediction decoder computes the output r_j at each position from the target features; the target features and r_j are added together and fed into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.
4. The method of claim 3, wherein the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained by:
obtaining a training document and a corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain latent representations h_c and h_nc together with the Gaussian distributions of h_c and h_nc, N(μ_c, σ_c²) and N(μ_nc, σ_nc²);
using the latent representations h_c and h_nc to generate a reconstructed source text, and using the latent representation h_c to generate a summary result;
constructing a source-text reconstruction loss L_R based on the training document and the reconstructed source text; constructing a summary prediction loss L_P based on the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;
the final training loss function is L = L_R + L_P + λL_KL, wherein λ is used to adjust the degree of the distribution normalization constraint, and the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained based on the training loss function L.
5. A causality-based sequence-to-sequence text summary generation system, comprising:
a sampling module for inputting a source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;
a concatenation module for concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as a training target, constructing a loss function based on the reconstructed document and the training target, and training the dual-latent-variable variational encoder;
and an extraction module for extracting the summary-relevant features of the source document with the trained dual-latent-variable variational encoder as target features, and obtaining the text summary of the source document based on the target features.
6. The causality-based sequence-to-sequence text summary generation system of claim 5, wherein the sampling module is configured to:
obtain an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;
encode and sample the document representation vector h_doc with the variational encoder in the dual-latent-variable variational encoder module to obtain latent variables h_c and h_nc, respectively;
feed h_c and h_nc into the source-text reconstruction decoder to obtain an output o_i at each position; add h_c, h_nc, and o_i together and feed the result into a language-model output layer to generate a reconstructed source document X'; compute the reconstruction loss L_R from X' and X; and store the latent representations h_c and h_nc corresponding to the minimum L_R as the summary-relevant features and the summary-irrelevant features, respectively.
7. The causality-based sequence-to-sequence text summary generation system of claim 6, wherein the extraction module is configured to:
compute, with the summary prediction decoder, the output r_j at each position from the target features; add the target features and r_j together and feed the result into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.
8. The system of claim 7, wherein the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained by:
obtaining a training document and a corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain latent representations h_c and h_nc together with the Gaussian distributions of h_c and h_nc, N(μ_c, σ_c²) and N(μ_nc, σ_nc²);
using the latent representations h_c and h_nc to generate a reconstructed source text, and using the latent representation h_c to generate a summary result;
constructing a source-text reconstruction loss L_R based on the training document and the reconstructed source text; constructing a summary prediction loss L_P based on the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;
the final training loss function is L = L_R + L_P + λL_KL, wherein λ is used to adjust the degree of the distribution normalization constraint, and the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained based on the training loss function L.
9. A storage medium storing a program for executing the causality-based sequence-to-sequence text summary generation method according to any one of claims 1 to 4.
10. A client for use with the causality-based sequence-to-sequence text summary generation system according to any one of claims 5 to 8.
CN202211215316.3A 2022-09-30 2022-09-30 Sequence-to-sequence text summary generation method and system based on causality Pending CN115658881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211215316.3A CN115658881A (en) 2022-09-30 2022-09-30 Sequence-to-sequence text summary generation method and system based on causality

Publications (1)

Publication Number Publication Date
CN115658881A true CN115658881A (en) 2023-01-31

Family

ID=84985693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211215316.3A Pending CN115658881A (en) 2022-09-30 2022-09-30 Sequence-to-sequence text summary generation method and system based on causality

Country Status (1)

Country Link
CN (1) CN115658881A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI847696B (en) * 2023-05-15 2024-07-01 中國信託商業銀行股份有限公司 Summary generation method based on prompt engineering and its computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination