
CN115658881A - Sequence-to-sequence text summary generation method and system based on causality

Info

Publication number
CN115658881A
Authority
CN
China
Prior art keywords
abstract
document
sequence
text
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211215316.3A
Other languages
Chinese (zh)
Inventor
郭嘉丰
陈薇
张儒清
陈璐
范意兴
程学旗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202211215316.3A priority Critical patent/CN115658881A/en
Publication of CN115658881A publication Critical patent/CN115658881A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a causality-based sequence-to-sequence text summary generation method and system, belonging to the fields of natural language processing and automatic text summarization. Inspired by causal theory, the method studies the causal relationships among the elements of the summarization task from the perspective of data generation. First, two unobservable variables are introduced to obtain a structural causal model of the summarization task; a corresponding sequence-to-sequence generation framework is then derived from this structural causal model to model the generation process of both the source document and the summary. The framework comprises three core modules: a dual-latent-variable variational encoder, a source-text reconstruction decoder, and a summary prediction decoder. Compared with existing end-to-end deep text summarization methods, the proposed method offers stronger interpretability, better summarization performance, and stronger generalization ability. Because it is a broadly applicable sequence-to-sequence framework, it can be transferred to other backbone models, generation tasks, and datasets.

Description

Sequence-to-sequence text summary generation method and system based on causality

Technical Field

The invention belongs to the technical field of sequence-to-sequence text summarization within natural language processing, and in particular relates to a causality-based sequence-to-sequence text summary generation method and system.

Background

Automatic text summarization aims to automatically identify the important topics and key information in an input document and to generate accurate, concise, and fluent text as its summary. Early summarization techniques were mainly built around manually designed heuristic rules and templates; today, deep text summarization based on neural networks has become mainstream. It learns the matching patterns between source text and summary end-to-end through supervised training, and can be divided into extractive and abstractive summarization methods.

Deep extractive summarization methods select a subset of sentences from the source document as the summary. They typically cast summarization as a sequence labeling or ranking task, choosing key sentences either by binary classification of each candidate sentence in the document or by ranking candidate sentences according to their importance. Deep abstractive summarization methods instead treat summarization as a sequence-to-sequence generation task, using words or phrases as the basic generation unit to produce the summary from scratch. Because the summary is not restricted to the wording of the source text, summaries generated in this way are more flexible, more diverse, and more broadly applicable.

In recent years, pre-trained models built mainly on the Transformer architecture have brought further performance gains to deep text summarization. Abstractive methods inherit strong language generation ability from these pre-trained models; combined with their inherently broad applicability, abstractive summarization based on pre-trained models has gradually become a mainstream research topic.

Abstractive summarization methods based on pre-trained models have the following shortcomings.

(1) First, the data-driven methods of the prior art lack interpretability. In end-to-end training, the user simply feeds the model samples and their labels, and the model automatically learns the matching patterns between source text and summary; this "black-box" behavior is not understandable to the user.

(2) Second, the summaries generated by the prior art often contain redundant information that is not part of the core content of the source document. This is because, when learning the matching patterns between source text and summary, existing techniques tend to exploit all of the correlations in the dataset, many of which are spurious, and thereby inherit the statistical biases of the training corpus. The model is then easily misled by easy-to-learn surface features of the dataset, such as text pairs that co-occur frequently but have no necessary connection. For example, in the frequently co-occurring pair "red maple leaf", "red" and "maple leaf" are highly statistically correlated but not causally linked; a maple leaf can also be green.

Summary of the Invention

The purpose of the present invention is to solve the above problem that the prior art relies excessively on correlations and is easily misled by spurious correlations, and to propose a sequence-to-sequence framework inspired by causal theory.

The present invention further proposes a causality-based sequence-to-sequence text summary generation method, comprising:

Step 1: inputting a source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;

Step 2: concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as the training target, constructing a loss function from the reconstructed document and the training target, and training the dual-latent-variable variational encoder;

Step 3: using the trained dual-latent-variable variational encoder to extract the summary-relevant features of the source document as target features, and obtaining the text summary of the source document based on the target features.

In the causality-based sequence-to-sequence text summary generation method, step 1 comprises:

obtaining an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;

the variational encoder in the dual-latent-variable variational encoder module encodes and samples the document representation vector h_doc to obtain latent variables h_c and h_nc, respectively;

feeding h_c and h_nc into the source-text reconstruction decoder to obtain the output o_i at each position; adding h_c, h_nc, and o_i together and feeding the result into the language-model output layer to generate a reconstructed source document X'; computing the reconstruction loss L_R from X' and X; and storing the latent representations h_c and h_nc corresponding to the minimum L_R as the summary-relevant features and the summary-irrelevant features, respectively.

In the causality-based sequence-to-sequence text summary generation method, step 3 comprises:

the summary prediction decoder computes the output r_j at each position from the target features; the target features and r_j are added together and fed into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.

In the causality-based sequence-to-sequence text summary generation method, the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained as follows:

obtaining a training document and the corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain the latent representations h_c and h_nc together with their Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²);

using the latent representations h_c and h_nc to generate the reconstructed source text, and using the latent representation h_c to generate the summary result;

constructing the source-text reconstruction loss L_R from the training document and the reconstructed source text; constructing the summary prediction loss L_P from the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;

the final training loss function is L = L_R + L_P + λL_KL, where λ adjusts the strength of the distribution normalization constraint; the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained with this training loss L.

The present invention further proposes a causality-based sequence-to-sequence text summary generation system, comprising:

a sampling module for inputting the source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;

a concatenation module for concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as the training target, constructing a loss function from the reconstructed document and the training target, and training the dual-latent-variable variational encoder;

an extraction module for using the trained dual-latent-variable variational encoder to extract the summary-relevant features of the source document as target features, and obtaining the text summary of the source document based on the target features.

In the causality-based sequence-to-sequence text summary generation system, the sampling module is configured to:

obtain an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;

the variational encoder in the dual-latent-variable variational encoder module encodes and samples the document representation vector h_doc to obtain latent variables h_c and h_nc, respectively;

h_c and h_nc are fed into the source-text reconstruction decoder to obtain the output o_i at each position; h_c, h_nc, and o_i are added together and fed into the language-model output layer to generate a reconstructed source document X'; the reconstruction loss L_R is computed from X' and X; and the latent representations h_c and h_nc corresponding to the minimum L_R are stored as the summary-relevant features and the summary-irrelevant features, respectively.

In the causality-based sequence-to-sequence text summary generation system, the extraction module is configured to:

compute, with the summary prediction decoder, the output r_j at each position from the target features; add the target features and r_j together and feed the result into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.

In the causality-based sequence-to-sequence text summary generation system, the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained as follows:

obtaining a training document and the corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain the latent representations h_c and h_nc together with their Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²);

using the latent representations h_c and h_nc to generate the reconstructed source text, and using the latent representation h_c to generate the summary result;

constructing the source-text reconstruction loss L_R from the training document and the reconstructed source text; constructing the summary prediction loss L_P from the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;

the final training loss function is L = L_R + L_P + λL_KL, where λ adjusts the strength of the distribution normalization constraint; the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained with this training loss L.

The present invention further proposes a storage medium for storing a program that executes any one of the above causality-based sequence-to-sequence text summary generation methods.

The present invention further proposes a client for use with any one of the above causality-based sequence-to-sequence text summary generation systems.

From the above, the CI-Seq2Seq framework of the present invention has the following advantages over the prior art:

(1) Stronger interpretability. Compared with prior-art methods that learn a single, undifferentiated representation of the source text, the proposed method can effectively distinguish and extract the latent representations h_c and h_nc of SC and SNC in the source text. We verify the distributions of the learned latent representations with a t-SNE visualization, as shown in Figure 1. Figure 1 shows that the distribution spaces of the learned h_c and h_nc are clearly different: the distribution of h_c is more concentrated, while that of h_nc is more dispersed. This demonstrates that SC captures the core information while SNC captures the information responsible for diversity.

(2) Better summarization performance. Compared with the prior art, the proposed method effectively improves the quality of the generated summaries. We run experiments on the public summarization datasets CNN/DM and XSUM and compare CI-Seq2Seq with other baseline models. The baselines include two models likewise built on variational autoencoders, Unified VAE-PGN and VHTM; three classic general-purpose pre-trained models, T5, BART, and GLM; and CLIFF, Debiased-Ext, and PtLAAM, which use contrastive learning, adversarial learning, and length-controllable generation, respectively. The evaluation metrics are ROUGE-1, ROUGE-2, and ROUGE-L, which measure the recall of unigrams, bigrams, and the longest common subsequence, respectively. The comparison results are shown in Table 1. As Table 1 shows, the proposed method achieves clear improvements on all metrics on both datasets, not only over the other variational-autoencoder-based models and the strong general-purpose pre-trained models, but also over models that use other training techniques.
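The ROUGE scores reported in Table 1 can be reproduced with standard tooling. The following Python sketch is only an illustration: the rouge-score package and the example strings are assumptions of this sketch, not part of the original experiments.

```python
# Hedged illustration of computing ROUGE-1/2/L; the rouge-score package and the
# example strings are assumptions, not the patent's evaluation code.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the cat sat on the mat"
prediction = "a cat was sitting on the mat"

scores = scorer.score(reference, prediction)
for name, score in scores.items():
    # Each entry carries precision, recall, and F-measure for the metric.
    print(name, f"recall={score.recall:.3f}", f"f1={score.fmeasure:.3f}")
```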

Table 1: Summarization performance comparison on the CNN/DM and XSUM datasets


(3) Stronger generalization ability. Compared with the prior art, the proposed method effectively improves the generalization ability of the model. We apply the model trained on CNN/DM to the XSUM test set and the model trained on XSUM to the CNN/DM test set, thereby testing how different models perform on datasets they have never seen; the experimental results are given in Table 2. Compared with the original BART framework, our method achieves a consistent improvement, demonstrating that our model has stronger generalization ability.

Table 2: Generalization performance comparison for cross-dataset experiments on CNN/DM and XSUM


Brief Description of the Drawings

Figure 1 is a schematic diagram of the t-SNE visualization analysis;

Figure 2 is the overall flowchart of generating summary text with CI-Seq2Seq;

Figure 3 is a flowchart of obtaining the initial latent representations h_c0 and h_nc0 through n rounds of sampling when generating summary text with CI-Seq2Seq;

Figure 4 is a flowchart of obtaining the optimal latent representations h_c and h_nc through k rounds of iterative updating when generating summary text with CI-Seq2Seq;

Figure 5 is a flowchart of generating the summary from the optimal latent representation when generating summary text with CI-Seq2Seq;

Figure 6 is a diagram of the training process of the method of the present invention.

Detailed Description

When studying the text summarization task in the light of causal theory, the inventors found that the prior art's tendency to generate redundant summaries stems from its over-reliance on the correlations present in the dataset. The prior art depends excessively on the correlations among the components of the summarization task without considering the more fundamental causal relationships. Causal relationships describe the underlying data generation process, and the prior art, which relies only on observable variables, has difficulty capturing the natural causal relationships behind the data.

The present invention introduces two unobservable variables and models the corresponding data generation process in a causally aware manner, so as to solve the problem that the prior art relies excessively on correlations and is easily misled by spurious correlations. Specifically, the present invention designs a structural causal model for the summarization task. Besides the observable source text and summary, the variables involved include two unobservable variables: the information that determines the content of the summary (summary-causal factors, SC), such as the core information of the source text, and the other information in the corpus that does not determine the content of the summary (summary-non-causal factors, SNC), such as the peripheral information responsible for the diversity of source texts. The fundamental difference between the two unobservable variables is that only the former, SC, is a cause of the summary. By explicitly distinguishing the two kinds of information, the present invention designs a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to describe the generation process of summaries and source texts: only SC is used when generating the summary, while both are used when generating the source text. Structurally, the present invention improves the encoder-decoder architecture of the traditional sequence-to-sequence framework as follows: the single encoder and single decoder are replaced by a dual-latent-variable variational encoder, a source-text reconstruction decoder, and a summary prediction decoder. Specifically, the present invention includes the following key technical points:

Key point 1: a structural causal model of text summarization designed from the perspective of the data generation process. Technical effect: a text summarization method designed according to this structural causal model can explicitly distinguish the two kinds of information and learn separate representations for SC and SNC, giving it stronger interpretability.

Key point 2: CI-Seq2Seq, a sequence-to-sequence framework inspired by causal theory. Technical effect: the framework can effectively separate SC and SNC and obtain their respective representations, where the representation of SC is used to generate the summary and both SC and SNC are used to generate the source text; the framework is implemented as a supervised variational autoencoder and effectively improves the quality of the generated summaries. Its core modules are: the dual-latent-variable variational encoder, which uses a neural network to encode the source text into the latent representations of SC and SNC; the source-text reconstruction decoder, which uses a neural network to decode the latent representations of SC and SNC in order to reconstruct the source text; and the summary prediction decoder, which uses a neural network to decode the latent representation of SC in order to predict the summary. A module-level sketch of this layout is given below.
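The skeleton below is an illustrative assumption about one possible implementation of this three-module layout rather than the patent's exact architecture: the class names, dimensions, and the use of plain linear layers for the variational heads are all hypothetical.

```python
# Illustrative skeleton of CI-Seq2Seq's module layout; names, dimensions, and the
# simple Linear heads are assumptions, not the reference implementation.
import torch
import torch.nn as nn

class DualLatentVariationalEncoder(nn.Module):
    """Maps the document representation h_doc to Gaussian parameters for SC and SNC."""
    def __init__(self, hidden_dim: int, latent_dim: int):
        super().__init__()
        self.mu_c = nn.Linear(hidden_dim, latent_dim)
        self.logvar_c = nn.Linear(hidden_dim, latent_dim)
        self.mu_nc = nn.Linear(hidden_dim, latent_dim)
        self.logvar_nc = nn.Linear(hidden_dim, latent_dim)

    def forward(self, h_doc: torch.Tensor):
        return (self.mu_c(h_doc), self.logvar_c(h_doc),
                self.mu_nc(h_doc), self.logvar_nc(h_doc))

class CISeq2SeqSketch(nn.Module):
    """Dual-latent encoder plus two decoders: reconstruction consumes [h_c; h_nc], summarization consumes h_c only."""
    def __init__(self, document_encoder: nn.Module, reconstruction_decoder: nn.Module,
                 summary_decoder: nn.Module, hidden_dim: int, latent_dim: int):
        super().__init__()
        self.document_encoder = document_encoder              # tokens -> h_doc
        self.variational_encoder = DualLatentVariationalEncoder(hidden_dim, latent_dim)
        self.reconstruction_decoder = reconstruction_decoder  # decodes [h_c; h_nc] back to the source text
        self.summary_decoder = summary_decoder                # decodes h_c into the summary
```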

To make the above features and effects of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

Figure 2 shows the corresponding overall flowchart. The overall procedure by which the present invention generates summary text comprises the following steps:

Step S1: input the source document X into the neural-network-based dual-latent-variable variational encoder module.

Step S2: obtain the initial latent representations h_c0 and h_nc0 through n rounds of sampling. The detailed procedure, shown in Figure 3, comprises the following sub-steps:

Step S201: obtain the encoded representation vector h_doc of the document through the document encoder. The document encoder in the dual-latent-variable variational encoder module encodes the input source document X to obtain the document representation vector h_doc.

Step S202: obtain the Gaussian distributions of the latent variables SC and SNC through the variational encoder. The variational encoder in the dual-latent-variable variational encoder module further encodes the document representation vector h_doc to obtain the Gaussian distributions of the latent variables SC and SNC, with distribution parameters μ_c, σ_c², μ_nc, and σ_nc².

Step S203: obtain the latent representations h_c and h_nc through the latent-variable sampler. The latent-variable sampler samples from the Gaussian distributions obtained in the previous step to obtain the representations h_c and h_nc of the latent variables SC and SNC. The sampling is carried out with the classic reparametrization technique: first, variables ε_c and ε_nc are sampled from the standard normal distribution, i.e. ε_c ~ N(0, I) and ε_nc ~ N(0, I), and the required latent representations h_c and h_nc are then obtained according to the following formulas:

h_c = μ_c + σ_c · ε_c,

h_nc = μ_nc + σ_nc · ε_nc.
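The reparametrization step above corresponds directly to a few lines of code. The PyTorch sketch below is a hedged illustration; the log-variance parameterization and the variable names are assumptions, not the patent's reference implementation.

```python
# Hedged sketch of step S203: sample h via the reparametrization trick,
# h = mu + sigma * eps with eps ~ N(0, I). The log-variance parameterization is an assumption.
import torch

def reparametrize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Return a differentiable sample from N(mu, diag(exp(logvar)))."""
    sigma = torch.exp(0.5 * logvar)
    eps = torch.randn_like(sigma)
    return mu + sigma * eps

# h_c and h_nc are drawn independently from their own Gaussians:
#   h_c  = reparametrize(mu_c,  logvar_c)
#   h_nc = reparametrize(mu_nc, logvar_nc)
```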

Step S204: concatenate the latent representations h_c and h_nc into h_cnc and feed it into the source-text reconstruction decoder. The latent representations h_c and h_nc obtained in the previous step are concatenated to obtain h_cnc, which serves as the initial input of the source-text reconstruction decoder.

Step S205: obtain o_i through the source-text reconstruction decoder. The source-text reconstruction decoder computes the output o_i at each position from its initial input. A position refers to one word of the source word sequence, read from left to right.

Step S206: add h_cnc and o_i and feed the result into the language-model output layer. The h_cnc obtained in step S204 is added to the o_i obtained in the previous step and used as the input of the language-model output layer. The language-model output layer is prior art, implemented with a linear transformation and beam search.

Step S207: generate the reconstructed source document X' through the language-model output layer. The language-model output layer maps the result of the previous step onto the vocabulary through a linear transformation to obtain a probability distribution over the vocabulary, and generates the corresponding words to form the reconstructed document X' with the beam-search method.
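The language-model output layer in steps S206 and S207 can be pictured as the sketch below; the greedy argmax stands in for the beam search actually used, and all names are assumptions.

```python
# Hedged sketch of the language-model output layer: add h_cnc to the decoder output,
# project onto the vocabulary, and normalize. Greedy argmax replaces beam search for brevity.
import torch
import torch.nn.functional as F

def output_layer(o: torch.Tensor, h_cnc: torch.Tensor, vocab_proj: torch.nn.Linear):
    logits = vocab_proj(o + h_cnc)        # linear transformation onto the vocabulary
    probs = F.softmax(logits, dim=-1)     # probability distribution over the vocabulary
    next_tokens = probs.argmax(dim=-1)    # greedy stand-in for beam search
    return probs, next_tokens
```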

Step S208: compute the reconstruction loss L_R from X' and X. The reconstructed document X' is compared with the input source document X to compute the reconstruction loss L_R.

The source text is regenerated in order to obtain the SC representation h_c and the SNC representation h_nc that are best for the current source document, where h_c will be used to generate the summary. Because the source text is available both at training and at test time, the present invention uses it as the supervision signal: after reconstruction, the generated document X' is compared with the given document X, and the optimal latent representations h_c and h_nc are determined from the reconstruction error, with h_c then used for summary generation. In other words, an accurate SC representation can work together with SNC to reconstruct the source information well and can also work alone to generate the summary; an SC representation that reconstructs the source text well is therefore also suitable for generating the summary.

Step S209: store the latent representations h_c and h_nc corresponding to the current minimum L_R. The L_R of the current iteration is compared with the minimum L_R so far, and the latent representations h_c and h_nc corresponding to the minimum L_R after this iteration are stored.

Step S210: obtain the initial latent representations h_c0 and h_nc0. Steps S203 to S209 are repeated n times; through this repeated sampling, the latent representations h_c and h_nc that minimize the reconstruction loss L_R are returned as the initial latent representations h_c0 and h_nc0.
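One way to realize the sampling loop of steps S203 to S210 is sketched below: draw n candidate pairs and keep the one with the smallest reconstruction loss. The helper reconstruct_and_loss, the default n, and the reuse of the reparametrize function from the earlier sketch are all assumptions.

```python
# Hedged sketch of steps S203-S210: pick the sampled (h_c, h_nc) pair with the smallest L_R.
# reconstruct_and_loss(h_c, h_nc) is assumed to run steps S204-S208 and return the scalar loss.
import torch

def initial_latents(mu_c, logvar_c, mu_nc, logvar_nc, reconstruct_and_loss, n_samples=10):
    best_loss, best_pair = float("inf"), None
    for _ in range(n_samples):
        h_c = reparametrize(mu_c, logvar_c)      # see the sketch after step S203
        h_nc = reparametrize(mu_nc, logvar_nc)
        with torch.no_grad():
            loss = reconstruct_and_loss(h_c, h_nc)
        if loss.item() < best_loss:              # keep the pair that reconstructs X best
            best_loss, best_pair = loss.item(), (h_c, h_nc)
    return best_pair                             # used as (h_c0, h_nc0)
```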

Step S3: obtain the optimal latent representations h_c and h_nc through k rounds of iterative updating. As shown in Figure 4, the detailed procedure comprises the following sub-steps:

Step S301: initialize the latent representations h_c and h_nc with the initial representations h_c0 and h_nc0. The representations h_c and h_nc of the latent variables SC and SNC are initialized with the initial latent representations h_c0 and h_nc0 obtained in step S2.

Step S302: concatenate the latent representations h_c and h_nc into h_cnc and feed it into the source-text reconstruction decoder. The latent representations h_c and h_nc obtained in the previous step are concatenated to obtain h_cnc, which serves as the initial input of the source-text reconstruction decoder.

Step S303: obtain o_i through the source-text reconstruction decoder. The source-text reconstruction decoder computes the output o_i at each position from its initial input.

Step S304: add h_cnc and o_i and feed the result into the language-model output layer. The h_cnc obtained in step S302 is added to the o_i obtained in the previous step and used as the input of the language-model output layer.

Step S305: generate the reconstructed source document X' through the language-model output layer. The language-model output layer maps the result of the previous step onto the vocabulary to obtain a probability distribution over the vocabulary, and generates the corresponding words to form the reconstructed document X'.

Step S306: compute the reconstruction loss L_R from X' and X. The reconstructed document X' is compared with the input source document X to compute the reconstruction loss L_R.

Step S307: optimize to obtain new latent representations h_c and h_nc. The latent representations h_c and h_nc are optimized according to the reconstruction loss L_R using the Adam optimization algorithm. The specific procedure is as follows: first, the latent representations are set as learnable parameters and a dedicated Adam optimizer is created for them; then, for the reconstruction loss L_R computed in the previous step, gradients are back-propagated, and finally the learnable latent representations h_c and h_nc are updated directly according to the Adam algorithm.
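A compact sketch of this test-time refinement of the latents is given below, under the same assumed helper names as the earlier sketches; the step count k and the learning rate are illustrative choices only.

```python
# Hedged sketch of steps S301-S308: refine (h_c, h_nc) by gradient descent on L_R
# with a dedicated Adam optimizer. Hyperparameters and names are assumptions.
import torch

def refine_latents(h_c0, h_nc0, reconstruct_and_loss, k_steps=5, lr=1e-3):
    h_c = torch.nn.Parameter(h_c0.detach().clone())     # latents become learnable parameters
    h_nc = torch.nn.Parameter(h_nc0.detach().clone())
    optimizer = torch.optim.Adam([h_c, h_nc], lr=lr)    # dedicated optimizer for the latents only
    for _ in range(k_steps):
        optimizer.zero_grad()
        loss = reconstruct_and_loss(h_c, h_nc)           # reconstruction loss L_R
        loss.backward()                                  # back-propagate gradients to the latents
        optimizer.step()
    return h_c.detach(), h_nc.detach()
```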

Step S308: obtain the optimal latent representations h_c and h_nc. Steps S302 to S307 are repeated k times; through these repeated optimization rounds, the optimal latent representations h_c and h_nc are returned.

In summary, the purpose of both step S2 and step S3 is to obtain accurate latent representations, and both are evaluated by the reconstruction loss. S2 precedes S3 and provides its initialization; technically, S2 obtains the latents by repeated sampling, whereas S3 performs further optimization starting from the best initialization vector obtained in S2. The optimization proceeds as described above: the latent representations are set as learnable parameters with a dedicated Adam optimizer, the gradients of the reconstruction loss L_R are back-propagated, and the learnable latent representations h_c and h_nc are updated directly with Adam. A good initialization is crucial for the optimization, and the optimization steps are comparatively time-consuming; selecting the current best representation by repeated sampling in S2 is therefore faster.

Step S4: use the optimal latent representation h_c to generate the summary Y'. As shown in Figure 5, the detailed procedure comprises the following sub-steps:

Step S401: feed the optimal latent representation h_c into the summary prediction decoder. The optimal latent representation h_c obtained in step S3 serves as the initial input of the summary prediction decoder.

Step S402: obtain r_j through the summary prediction decoder. The summary prediction decoder computes the output r_j at each position from its initial input.

Step S403: add h_c and r_j and feed the result into the language-model output layer. The h_c from step S401 is added to the r_j obtained in the previous step and used as the input of the language-model output layer.

Step S404: generate the predicted summary Y' through the language-model output layer. The language-model output layer maps the result of the previous step onto the vocabulary to obtain a probability distribution over the vocabulary, and generates the corresponding words to form the summary Y'.

The above describes how the invention is used at inference time; the training process of the method is described below. The corresponding training flowchart is shown in Figure 6.

Step S1: input the source document X and the reference summary Y. The source document X is fed into the neural-network-based dual-latent-variable variational encoder module.

Step S2: obtain the latent representations h_c and h_nc by sampling. For the detailed steps, refer to S201 to S203 of the inference procedure. This step also yields the Gaussian distributions of the latent variables SC and SNC, N(μ_c, σ_c²) and N(μ_nc, σ_nc²), where the distribution parameters μ_c and μ_nc are the means and σ_c² and σ_nc² are the variances.

Step S3: use the latent representations h_c and h_nc to generate the reconstructed source text X'. For the detailed steps, refer to S204 to S207 of the inference procedure.

Step S4: use the latent representation h_c to generate the summary Y'. For the detailed steps, refer to step S4 of the inference procedure.

Step S5: compute the training loss and optimize the model.

The training loss consists of three parts: the source-text reconstruction loss L_R, the summary prediction loss L_P, and the distribution constraint loss L_KL.

The source-text reconstruction loss L_R is the cross-entropy computed between the input source document X and the reconstructed X'; its purpose is to train the dual-latent-variable variational encoder and the source-text reconstruction decoder.

The summary prediction loss L_P is the cross-entropy computed between the input reference summary Y and the predicted Y'; its purpose is to train the dual-latent-variable variational encoder and the summary prediction decoder.

The distribution constraint loss L_KL is the KL divergence computed between the standard normal distribution N(0, I) and the Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²) of the latent variables SC and SNC (obtained in step S2); its purpose is to train the dual-latent-variable variational encoder so that the predicted latent distributions are better regularized.
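For diagonal Gaussians this KL term has the standard closed form, written out here for reference; the per-dimension index i in the sum is our notation, not the patent's:

L_KL = KL( N(μ, σ²) || N(0, I) ) = (1/2) Σ_i ( μ_i² + σ_i² - log σ_i² - 1 ),

computed separately for (μ_c, σ_c²) and (μ_nc, σ_nc²) and then summed.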

The final training loss function is L = L_R + L_P + λL_KL, where λ is used to adjust the degree of the distribution normalization constraint.
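A hedged PyTorch sketch of assembling this loss is given below; the token-level cross-entropy helpers, the tensor shapes, and the value of λ are assumptions made only for illustration.

```python
# Hedged sketch of step S5: L = L_R + L_P + lambda * L_KL.
# kl_to_standard_normal implements the closed form given above; all other names are assumptions.
import torch
import torch.nn.functional as F

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0)

def training_loss(recon_logits, source_ids, summary_logits, summary_ids,
                  mu_c, logvar_c, mu_nc, logvar_nc, lam: float = 0.1):
    # L_R: cross-entropy between the reconstructed document X' and the source document X.
    loss_r = F.cross_entropy(recon_logits.view(-1, recon_logits.size(-1)), source_ids.view(-1))
    # L_P: cross-entropy between the predicted summary Y' and the reference summary Y.
    loss_p = F.cross_entropy(summary_logits.view(-1, summary_logits.size(-1)), summary_ids.view(-1))
    # L_KL: regularize both latent distributions toward N(0, I).
    loss_kl = kl_to_standard_normal(mu_c, logvar_c) + kl_to_standard_normal(mu_nc, logvar_nc)
    return loss_r + loss_p + lam * loss_kl
```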

The model is trained with the Adam optimizer according to the above training loss function.

The following is a system embodiment corresponding to the above method embodiment, and this embodiment can be implemented in cooperation with the above embodiment. The relevant technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the above embodiment.

The present invention further proposes a causality-based sequence-to-sequence text summary generation system, comprising:

a sampling module for inputting the source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;

a concatenation module for concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as the training target, constructing a loss function from the reconstructed document and the training target, and training the dual-latent-variable variational encoder;

an extraction module for using the trained dual-latent-variable variational encoder to extract the summary-relevant features of the source document as target features, and obtaining the text summary of the source document based on the target features.

In the causality-based sequence-to-sequence text summary generation system, the sampling module is configured to:

obtain an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;

the variational encoder in the dual-latent-variable variational encoder module encodes and samples the document representation vector h_doc to obtain latent variables h_c and h_nc, respectively;

h_c and h_nc are fed into the source-text reconstruction decoder to obtain the output o_i at each position; h_c, h_nc, and o_i are added together and fed into the language-model output layer to generate a reconstructed source document X'; the reconstruction loss L_R is computed from X' and X; and the latent representations h_c and h_nc corresponding to the minimum L_R are stored as the summary-relevant features and the summary-irrelevant features, respectively.

In the causality-based sequence-to-sequence text summary generation system, the extraction module is configured to:

compute, with the summary prediction decoder, the output r_j at each position from the target features; add the target features and r_j together and feed the result into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.

In the causality-based sequence-to-sequence text summary generation system, the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained as follows:

obtaining a training document and the corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain the latent representations h_c and h_nc together with their Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²);

using the latent representations h_c and h_nc to generate the reconstructed source text, and using the latent representation h_c to generate the summary result;

constructing the source-text reconstruction loss L_R from the training document and the reconstructed source text; constructing the summary prediction loss L_P from the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;

the final training loss function is L = L_R + L_P + λL_KL, where λ adjusts the strength of the distribution normalization constraint; the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained with this training loss L.

The present invention further proposes a storage medium for storing a program that executes any one of the above causality-based sequence-to-sequence text summary generation methods.

The present invention further proposes a client for use with any one of the above causality-based sequence-to-sequence text summary generation systems.

Claims (10)

1. A causality-based sequence-to-sequence text summary generation method, characterized by comprising the following steps:
step 1: inputting a source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;
step 2: concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as a training target, constructing a loss function based on the reconstructed document and the training target, and training the dual-latent-variable variational encoder;
and step 3: extracting the summary-relevant features of the source document with the trained dual-latent-variable variational encoder as target features, and obtaining the text summary of the source document based on the target features.
2. The causality-based sequence-to-sequence text summary generation method of claim 1, wherein step 1 comprises:
obtaining an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;
the variational encoder in the dual-latent-variable variational encoder module encodes and samples the document representation vector h_doc to obtain latent variables h_c and h_nc, respectively;
feeding h_c and h_nc into the source-text reconstruction decoder to obtain an output o_i at each position; adding h_c, h_nc, and o_i together and feeding the result into a language-model output layer to generate a reconstructed source document X'; computing the reconstruction loss L_R from X' and X; and storing the latent representations h_c and h_nc corresponding to the minimum L_R as the summary-relevant features and the summary-irrelevant features, respectively.
3. The causality-based sequence-to-sequence text summary generation method of claim 2, wherein step 3 comprises:
the summary prediction decoder computes the output r_j at each position from the target features; the target features and r_j are added together and fed into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.
4. The method of claim 3, wherein the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained by:
obtaining a training document and a corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain latent representations h_c and h_nc together with the Gaussian distributions of h_c and h_nc, N(μ_c, σ_c²) and N(μ_nc, σ_nc²);
using the latent representations h_c and h_nc to generate a reconstructed source text, and using the latent representation h_c to generate a summary result;
constructing a source-text reconstruction loss L_R based on the training document and the reconstructed source text; constructing a summary prediction loss L_P based on the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;
the final training loss function is L = L_R + L_P + λL_KL, wherein λ is used to adjust the degree of the distribution normalization constraint, and the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained based on the training loss function L.
5. A causality-based sequence-to-sequence text summary generation system, comprising:
a sampling module for inputting a source document into a neural-network-based dual-latent-variable variational encoder, wherein the dual-latent-variable variational encoder samples the source document multiple times to extract summary-relevant features and summary-irrelevant features of the source document;
a concatenation module for concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the source document based on the combined summary features to obtain a reconstructed document, taking the source document as a training target, constructing a loss function based on the reconstructed document and the training target, and training the dual-latent-variable variational encoder;
and an extraction module for extracting the summary-relevant features of the source document with the trained dual-latent-variable variational encoder as target features, and obtaining the text summary of the source document based on the target features.
6. The causality-based sequence-to-sequence text summary generation system of claim 5, wherein the sampling module is configured to:
obtain an encoded representation vector of the document through a document encoder: the document encoder in the dual-latent-variable variational encoder encodes the source document X to obtain the document representation vector h_doc;
encode and sample the document representation vector h_doc with the variational encoder in the dual-latent-variable variational encoder module to obtain latent variables h_c and h_nc, respectively;
feed h_c and h_nc into the source-text reconstruction decoder to obtain an output o_i at each position; add h_c, h_nc, and o_i together and feed the result into a language-model output layer to generate a reconstructed source document X'; compute the reconstruction loss L_R from X' and X; and store the latent representations h_c and h_nc corresponding to the minimum L_R as the summary-relevant features and the summary-irrelevant features, respectively.
7. The causality-based sequence-to-sequence text summary generation system of claim 6, wherein the extraction module is configured to:
compute, with the summary prediction decoder, the output r_j at each position from the target features; add the target features and r_j together and feed the result into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.
8. The system of claim 7, wherein the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained by:
obtaining a training document and a corresponding reference summary, and feeding the training document into the neural-network-based dual-latent-variable variational encoder to obtain latent representations h_c and h_nc together with the Gaussian distributions of h_c and h_nc, N(μ_c, σ_c²) and N(μ_nc, σ_nc²);
using the latent representations h_c and h_nc to generate a reconstructed source text, and using the latent representation h_c to generate a summary result;
constructing a source-text reconstruction loss L_R based on the training document and the reconstructed source text; constructing a summary prediction loss L_P based on the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;
the final training loss function is L = L_R + L_P + λL_KL, wherein λ is used to adjust the degree of the distribution normalization constraint, and the dual-latent-variable variational encoder, the summary prediction decoder, and the source-text reconstruction decoder are trained based on the training loss function L.
9. A storage medium storing a program for executing the causality-based sequence-to-sequence text summary generation method according to any one of claims 1 to 4.
10. A client for use with the causality-based sequence-to-sequence text summary generation system according to any one of claims 5 to 8.
CN202211215316.3A 2022-09-30 2022-09-30 Sequence-to-sequence text summary generation method and system based on causality Pending CN115658881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211215316.3A CN115658881A (en) 2022-09-30 2022-09-30 Sequence-to-sequence text summary generation method and system based on causality

Publications (1)

Publication Number Publication Date
CN115658881A true CN115658881A (en) 2023-01-31

Family

ID=84985693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211215316.3A Pending CN115658881A (en) 2022-09-30 2022-09-30 Sequence-to-sequence text summary generation method and system based on causality

Country Status (1)

Country Link
CN (1) CN115658881A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI847696B (en) * 2023-05-15 2024-07-01 中國信託商業銀行股份有限公司 Summary generation method based on prompt engineering and its computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination