CN115658881A - Sequence-to-sequence text summary generation method and system based on causality - Google Patents
Sequence-to-sequence text summary generation method and system based on causality
- Publication number: CN115658881A
- Application number: CN202211215316.3A
- Authority: CN (China)
- Prior art keywords: abstract, document, sequence, text, training
- Prior art date: 2022-09-30
- Legal status: Pending
Abstract
Description
Technical Field
The present invention belongs to the field of sequence-to-sequence text summarization within the text summarization technology of natural language processing, and in particular relates to a causality-based sequence-to-sequence text summary generation method and system.
Background Art
Automatic text summarization aims to identify the important topics and key information in an input document and to generate accurate, concise, and fluent text as a summary. Early summarization techniques were built mainly around hand-crafted heuristic rules and templates; today, deep neural summarization has become mainstream. It learns the matching patterns between source text and summary end to end through supervised training, and can be divided into extractive and abstractive methods.
Deep extractive summarization selects a subset of sentences from the source text as the summary. It usually recasts summarization as a sequence-labeling or ranking task: key sentences are chosen either by binary classification of each candidate sentence in the document or by ranking candidate sentences by importance. Deep abstractive summarization instead treats summarization as a sequence-to-sequence generation task and generates the summary from scratch, with words or phrases as the basic generation units. Because the summary is not restricted to the wording of the source text, summaries produced this way are more flexible, more diverse, and more broadly applicable.
In recent years, pre-trained models built mainly on the Transformer architecture have brought further performance gains to deep summarization methods. Abstractive methods inherit strong language ability from pre-trained models; combined with their inherently broad applicability, abstractive summarization based on pre-trained models has gradually become a mainstream research topic.
Abstractive summarization methods based on pre-trained models have the following shortcomings.
(1) First, existing data-driven methods lack interpretability. In end-to-end training, the user only feeds the model samples and their labels, and the model automatically learns the matching patterns between source text and summary; this "black-box" mode is opaque to the user.
(2) Second, summaries generated by the prior art often contain redundant information that is not part of the core content of the source text. This is because, when learning the matching patterns between source text and summary, existing methods tend to exploit every correlation in the dataset, including a large number of spurious correlations, and thus inherit the statistical biases of the training corpus. The model is then easily misled by easy-to-learn surface features of the dataset, such as text pairs that co-occur frequently but have no necessary connection. For example, in the frequently co-occurring pair "red maple leaf", "red" and "maple leaf" are highly correlated statistically but have no necessary causal relationship; a maple leaf can also be green.
Summary of the Invention
The purpose of the present invention is to solve the above problem that the prior art relies excessively on correlation and is easily misled by spurious correlations, and to propose a sequence-to-sequence framework inspired by causal theory.
The present invention further proposes a causality-based sequence-to-sequence text summary generation method, which includes:
Step 1: inputting an original document into a neural-network-based dual latent-variable variational encoder, which samples the original document multiple times to extract summary-relevant features and summary-irrelevant features from the original document;
Step 2: concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the original document based on the combined summary features to obtain a reconstructed document, and, taking the original document as the training target, constructing a loss function from the reconstructed document and the training target to train the dual latent-variable variational encoder;
Step 3: using the trained dual latent-variable variational encoder to extract the summary-relevant features of the original document as target features, and obtaining a text summary of the original document based on the target features.
In the above causality-based sequence-to-sequence text summary generation method, step 1 includes:
obtaining an encoded representation vector of the document through a document encoder: the document encoder in the dual latent-variable variational encoder encodes the original document X to obtain the document representation vector h_doc;
the variational encoder in the dual latent-variable variational encoder module encodes and samples the document representation vector h_doc to obtain the latent variables h_c and h_nc, respectively;
feeding h_c and h_nc into the original-text reconstruction decoder to obtain the output o_i at each position; superimposing h_c and h_nc on o_i and feeding the result into the language-model output layer to generate a reconstructed document X′; computing the reconstruction loss L_R from X′ and X; and storing the latent representations h_c and h_nc corresponding to the smallest L_R as the summary-relevant features and the summary-irrelevant features, respectively.
In the above causality-based sequence-to-sequence text summary generation method, step 3 includes:
the summary prediction decoder computes the output r_j at each position from the target features; the target features and r_j are superimposed and fed into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.
In the above causality-based sequence-to-sequence text summary generation method, the dual latent-variable variational encoder, the summary prediction decoder, and the original-text reconstruction decoder are trained as follows:
obtaining a training document and its reference summary, and feeding the training document into the neural-network-based dual latent-variable variational encoder to obtain the latent representations h_c and h_nc, together with the Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²) of h_c and h_nc, respectively;
generating a reconstructed original text using the latent representations h_c and h_nc, and generating a summary result using the latent representation h_c;
constructing the original-text reconstruction loss L_R from the training document and the reconstructed text; constructing the summary prediction loss L_P from the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;
the final training loss function is L = L_R + L_P + λ·L_KL, where λ adjusts the strength of the distribution-normalization constraint; the dual latent-variable variational encoder, the summary prediction decoder, and the original-text reconstruction decoder are trained with this loss function L.
The present invention further proposes a causality-based sequence-to-sequence text summary generation system, which includes:
a sampling module for inputting an original document into a neural-network-based dual latent-variable variational encoder, which samples the original document multiple times to extract summary-relevant features and summary-irrelevant features from the original document;
a concatenation module for concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the original document based on the combined summary features to obtain a reconstructed document, and, taking the original document as the training target, constructing a loss function from the reconstructed document and the training target to train the dual latent-variable variational encoder;
an extraction module for using the trained dual latent-variable variational encoder to extract the summary-relevant features of the original document as target features, and obtaining a text summary of the original document based on the target features.
In the above causality-based sequence-to-sequence text summary generation system, the sampling module is configured to:
obtain the encoded representation vector of the document through the document encoder: the document encoder in the dual latent-variable variational encoder encodes the original document X to obtain the document representation vector h_doc;
have the variational encoder in the dual latent-variable variational encoder module encode and sample the document representation vector h_doc to obtain the latent variables h_c and h_nc, respectively;
feed h_c and h_nc into the original-text reconstruction decoder to obtain the output o_i at each position; superimpose h_c and h_nc on o_i and feed the result into the language-model output layer to generate a reconstructed document X′; compute the reconstruction loss L_R from X′ and X; and store the latent representations h_c and h_nc corresponding to the smallest L_R as the summary-relevant features and the summary-irrelevant features, respectively.
In the above causality-based sequence-to-sequence text summary generation system, the extraction module is configured to:
have the summary prediction decoder compute the output r_j at each position from the target features; superimpose the target features and r_j and feed the result into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.
In the above causality-based sequence-to-sequence text summary generation system, the dual latent-variable variational encoder, the summary prediction decoder, and the original-text reconstruction decoder are trained as follows:
obtaining a training document and its reference summary, and feeding the training document into the neural-network-based dual latent-variable variational encoder to obtain the latent representations h_c and h_nc, together with the Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²) of h_c and h_nc, respectively;
generating a reconstructed original text using the latent representations h_c and h_nc, and generating a summary result using the latent representation h_c;
constructing the original-text reconstruction loss L_R from the training document and the reconstructed text; constructing the summary prediction loss L_P from the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;
the final training loss function is L = L_R + L_P + λ·L_KL, where λ adjusts the strength of the distribution-normalization constraint; the dual latent-variable variational encoder, the summary prediction decoder, and the original-text reconstruction decoder are trained with this loss function L.
The present invention further proposes a storage medium for storing a program that executes any one of the above causality-based sequence-to-sequence text summary generation methods.
The present invention further proposes a client for use with any one of the above causality-based sequence-to-sequence text summary generation systems.
From the above schemes, the advantages of the CI-Seq2Seq framework of the present invention over the prior art are as follows:
(1) Stronger interpretability. Whereas the prior art learns an undifferentiated representation of the source text, the method proposed in this patent effectively separates and extracts the latent representations h_c and h_nc of SC and SNC in the source text. We verify the distributions of the learned latent representations by t-SNE visualization, as shown in Figure 1. The figure shows that the distributions of the learned h_c and h_nc differ clearly: the distribution of h_c is more concentrated, while that of h_nc is more dispersed. This demonstrates that SC and SNC capture, respectively, the core information and the information responsible for diversity.
(2) Better summarization performance. Compared with the prior art, the method proposed in this patent effectively improves the quality of the generated summaries. We conduct experiments on the public summarization datasets CNN/DM and XSUM, comparing CI-Seq2Seq with other baseline models. The baselines include two models likewise built on variational autoencoders, Unified VAE-PGN and VHTM; three classic general-purpose pre-trained models, T5, BART, and GLM; and CLIFF, Debiased-Ext, and PtLAAM, which use contrastive learning, adversarial learning, and length control, respectively. The evaluation metrics are Rouge-1, Rouge-2, and Rouge-L, which measure the recall of unigrams, bigrams, and the longest common subsequence, respectively. The comparison results are shown in Table 1. They show that our method achieves clear improvements on all metrics on both datasets, not only over other models built on variational autoencoders and over strong general-purpose pre-trained models, but also over models trained with other techniques.
Table 1: Summary performance comparison on the CNN/DM and XSUM datasets
(3) Stronger generalization ability. Compared with the prior art, the method proposed in this patent effectively improves the generalization ability of the model. We apply the model trained on CNN/DM to the XSUM test set and the model trained on XSUM to the CNN/DM test set, thereby testing how different models perform on unseen datasets; the experimental results are shown in Table 2. Compared with the original BART framework, our method achieves a consistent improvement, demonstrating stronger generalization ability.
Table 2: Generalization performance comparison in cross-dataset experiments on CNN/DM and XSUM
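For reference, Rouge-1, Rouge-2, and Rouge-L scores of the kind reported in Tables 1 and 2 can be computed with the open-source rouge-score package. The snippet below is only a minimal illustration of how such scores are obtained; the example reference and prediction strings are placeholders, not data from our experiments.

```python
# Minimal ROUGE computation with the rouge-score package (pip install rouge-score).
# The reference and prediction strings are placeholders for illustration only.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the storm caused widespread flooding across several coastal towns"
prediction = "the storm flooded several coastal towns"

scores = scorer.score(reference, prediction)
for name, result in scores.items():
    # Each result carries precision, recall, and F-measure; comparisons like those
    # in Table 1 typically report recall or F-measure.
    print(name, f"recall={result.recall:.3f}", f"f1={result.fmeasure:.3f}")
```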
Brief Description of the Drawings
Figure 1 is a schematic diagram of the t-SNE visualization analysis;
Figure 2 is the overall flow chart of generating summary text with CI-Seq2Seq;
Figure 3 is the flow chart of obtaining the initial latent representations h_c0 and h_nc0 through n rounds of sampling when generating summary text with CI-Seq2Seq;
Figure 4 is the flow chart of obtaining the optimal latent representations h_c and h_nc through k rounds of iterative updating when generating summary text with CI-Seq2Seq;
Figure 5 is the flow chart of generating the summary from the optimal latent representation when generating summary text with CI-Seq2Seq;
Figure 6 is a diagram of the training process of the method of the present invention.
Detailed Description of Embodiments
While studying the text summarization task in the light of causal theory, the inventors found that the prior art's tendency to generate redundant summaries results from its over-reliance on the correlations in the dataset. Existing methods depend too heavily on the correlations among the components of the summarization task and do not consider the more fundamental causal relationships. Causal relationships describe the underlying data-generation process, and by relying only on observable variables the prior art can hardly capture the natural causal structure behind the data.
The present invention introduces two unobservable variables and models the corresponding data-generation process in a causality-aware manner, so as to solve the prior-art problem of over-reliance on correlation and susceptibility to spurious correlations. Specifically, the present invention designs a structural causal model for the summarization task. Besides the observable source text and summary, the variables involved include two unobservable variables: the information that determines the content of the summary (summary-causal factors, SC), such as the core information of the source text, and the other information in the corpus that does not determine the summary content (summary-non-causal factors, SNC), such as the peripheral information responsible for the diversity of source texts. The fundamental difference between the two unobservable variables is that only the former, SC, is a cause of the summary. By explicitly distinguishing these two kinds of information, the present invention designs a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to characterize the generation processes of the summary and the text: only SC is used when generating the summary, while both are used when generating the source text. In terms of model structure, the present invention improves the encoder-decoder architecture of the traditional sequence-to-sequence framework as follows: the single encoder and single decoder are replaced with a dual latent-variable variational encoder, an original-text reconstruction decoder, and a summary prediction decoder. Specifically, the present invention includes the following key technical points:
Key point 1: a structural causal model for text summarization, designed from the perspective of the data-generation process. Technical effect: a summarization method designed according to this structural causal model can explicitly distinguish the two kinds of information and learn separate representations for SC and SNC, giving it stronger interpretability;
Key point 2: CI-Seq2Seq, a sequence-to-sequence framework inspired by causal theory. Technical effect: the framework effectively separates SC and SNC and obtains their respective representations, where the SC representation is used to generate the summary and both SC and SNC are used to generate the source text; the framework is implemented on top of a supervised variational autoencoder and effectively improves the quality of the generated summaries. Its core modules are: the dual latent-variable variational encoder, which uses a neural network to encode the source text into latent representations of SC and SNC; the original-text reconstruction decoder, which uses a neural network to decode the SC and SNC representations in order to reconstruct the source text; and the summary prediction decoder, which uses a neural network to decode the SC representation in order to predict the summary. A minimal skeleton of these modules is sketched below.
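As a concrete illustration of how these three modules could fit together, the following PyTorch sketch shows one possible skeleton of such an architecture. It is a simplified assumption rather than the actual implementation: the encoder and the two decoders are stand-ins (the real system builds on a pre-trained Transformer such as BART), and all layer sizes are arbitrary.

```python
# Hypothetical CI-Seq2Seq skeleton, for illustration only; not the patented implementation.
import torch
import torch.nn as nn

class CISeq2SeqSketch(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        # Document encoder producing the document representation h_doc.
        self.doc_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Variational heads producing the Gaussian parameters of SC and SNC.
        self.mu_c, self.logvar_c = nn.Linear(hidden, hidden), nn.Linear(hidden, hidden)
        self.mu_nc, self.logvar_nc = nn.Linear(hidden, hidden), nn.Linear(hidden, hidden)
        # Stand-ins for the original-text reconstruction decoder and the summary
        # prediction decoder (real decoders would be Transformer decoders).
        self.recon_decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.summ_decoder = nn.GRU(hidden, hidden, batch_first=True)

    def encode(self, doc_embeddings):
        # doc_embeddings: (batch, seq_len, hidden); mean-pool into one h_doc per document.
        h_doc = self.doc_encoder(doc_embeddings).mean(dim=1)
        return (self.mu_c(h_doc), self.logvar_c(h_doc),
                self.mu_nc(h_doc), self.logvar_nc(h_doc))
```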
In order to make the above features and effects of the present invention clearer and easier to understand, specific embodiments are described in detail below with reference to the accompanying drawings.
Figure 2 shows the corresponding overall flow chart. The overall procedure by which the present invention generates summary text includes the following steps:
Step S1: input the original document X and feed it into the neural-network-based dual latent-variable variational encoder module.
Step S2: obtain the initial latent representations h_c0 and h_nc0 through n rounds of sampling. The specific procedure, shown in Figure 3, includes the following sub-steps:
Step S201: obtain the encoded representation vector h_doc of the document through the document encoder. The document encoder in the dual latent-variable variational encoder module encodes the input document X to obtain the document representation vector h_doc.
Step S202: obtain the Gaussian distributions of the latent variables SC and SNC through the variational encoder. The variational encoder in the dual latent-variable variational encoder module further encodes the document representation vector h_doc to obtain the Gaussian distributions of the latent variables SC and SNC (with distribution parameters μ_c, μ_nc and σ_c², σ_nc²).
Step S203: obtain the latent representations h_c and h_nc through the latent-variable sampler. The latent-variable sampler draws samples from the Gaussian distributions obtained in the previous step to obtain the representations h_c and h_nc of the latent variables SC and SNC. The sampling is carried out with the classic reparametrization technique: first the variables ε_c and ε_nc are sampled from the standard normal distribution, i.e., ε_c, ε_nc ~ N(0, I), and then the required latent representations h_c and h_nc are obtained according to the following formulas:
h_c = μ_c + σ_c * ε_c,
h_nc = μ_nc + σ_nc * ε_nc.
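A minimal PyTorch sketch of this reparametrized sampling is given below; the log-variance parametrization and the tensor shapes are illustrative assumptions, since the description above specifies only the means and standard deviations.

```python
# Reparametrization trick: sample h_c and h_nc from their Gaussian distributions
# while keeping the sampling step differentiable with respect to the parameters.
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    sigma = torch.exp(0.5 * logvar)   # standard deviation
    eps = torch.randn_like(sigma)     # eps ~ N(0, I)
    return mu + sigma * eps           # h = mu + sigma * eps

# Illustrative usage with arbitrary Gaussian parameters for SC and SNC:
mu_c, logvar_c = torch.zeros(1, 768), torch.zeros(1, 768)
mu_nc, logvar_nc = torch.zeros(1, 768), torch.zeros(1, 768)
h_c = reparameterize(mu_c, logvar_c)
h_nc = reparameterize(mu_nc, logvar_nc)
```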
Step S204: concatenate the latent representations h_c and h_nc into h_cnc and feed it into the original-text reconstruction decoder. The latent representations h_c and h_nc obtained in the previous step are concatenated to obtain h_cnc, which serves as the initial input of the original-text reconstruction decoder.
Step S205: obtain o_i through the original-text reconstruction decoder. From the initial input, the original-text reconstruction decoder computes the output o_i at each position. A position refers to a slot in the word sequence of the source text, read from left to right, with each position corresponding to one word.
Step S206: superimpose h_cnc and o_i and feed the result into the language-model output layer. The h_cnc obtained in step S204 is superimposed on the o_i obtained in the previous step and used as the input of the language-model output layer. The language-model output layer is prior art, implemented with a linear transformation and beam search.
Step S207: generate the reconstructed document X′ through the language-model output layer. The language-model output layer maps the result of the previous step onto the vocabulary through a linear transformation to obtain a probability distribution over the vocabulary, and generates the corresponding words with beam search to form the reconstructed document X′.
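The projection performed by the language-model output layer in steps S206 and S207 can be pictured with the small sketch below; the greedy argmax stands in for the beam search actually used, and the layer sizes and the shape of the superimposed latent are assumptions.

```python
# Sketch of the language-model output layer: superimpose h_cnc on each decoder
# output o_i, map the result onto the vocabulary, and pick tokens (greedy here;
# the procedure described above uses beam search).
import torch
import torch.nn as nn

hidden, vocab_size = 768, 50000              # assumed sizes
lm_head = nn.Linear(hidden, vocab_size)

o = torch.randn(1, 40, hidden)               # decoder outputs o_i for 40 positions
h_cnc = torch.randn(1, 1, hidden)            # latent vector to superimpose (assumed shape)

logits = lm_head(o + h_cnc)                  # superimpose, then linear map to the vocabulary
probs = torch.softmax(logits, dim=-1)        # probability distribution over the vocabulary
token_ids = probs.argmax(dim=-1)             # greedy stand-in for beam search
```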
Step S208: compute the reconstruction loss L_R from X′ and X. The reconstructed document X′ is compared with the input document X to compute the reconstruction loss L_R.
The purpose of generating the source text is to obtain the SC representation h_c and the SNC representation h_nc that are best for the current source text, where h_c will be used to generate the summary. Because the source text is given both at training and at test time, the present invention uses the source text as the supervision signal: after reconstruction, the generated text X′ is compared with the given text X, and the optimal latent representations h_c and h_nc are determined from the reconstruction error, with h_c then used for summary generation. That is, an accurate SC representation can both work together with SNC to reconstruct the source information well and act alone to generate the summary: when an SC representation reconstructs the source text well, it is also suitable for generating the summary.
Step S209: store the latent representations h_c and h_nc corresponding to the current smallest L_R. The L_R of the current iteration is compared with the smallest L_R so far, and the latent representations h_c and h_nc corresponding to the smallest L_R after this iteration are stored.
Step S210: obtain the initial latent representations h_c0 and h_nc0. The above steps (S203 to S209) are repeated n times; through this repeated sampling, the latent representations h_c and h_nc that minimize the reconstruction loss L_R are returned as the initial latent representations h_c0 and h_nc0.
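The repeated sampling and selection of steps S203 through S210 amounts to the loop sketched below; reparameterize is the sampler sketched earlier, while reconstruct and reconstruction_loss are placeholder callables standing in for the reconstruction decoder with its output layer and for the loss of step S208.

```python
# Sketch of step S2: draw n candidate latent pairs and keep the pair whose
# reconstructed document yields the smallest reconstruction loss L_R.
def select_initial_latents(mu_c, logvar_c, mu_nc, logvar_nc, doc_tokens,
                           reconstruct, reconstruction_loss, n=5):
    best_loss, best_pair = float("inf"), None
    for _ in range(n):
        h_c = reparameterize(mu_c, logvar_c)             # sample an SC representation
        h_nc = reparameterize(mu_nc, logvar_nc)          # sample an SNC representation
        x_recon = reconstruct(h_c, h_nc)                 # reconstructed document X'
        loss = reconstruction_loss(x_recon, doc_tokens)  # compare X' with the input X
        if loss.item() < best_loss:
            best_loss, best_pair = loss.item(), (h_c, h_nc)
    return best_pair                                     # used as (h_c0, h_nc0)
```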
Step S3: obtain the optimal latent representations h_c and h_nc through k rounds of iterative updating. As shown in Figure 4, the specific procedure includes the following sub-steps:
Step S301: initialize the latent representations h_c and h_nc with the initial representations h_c0 and h_nc0. The representations h_c and h_nc of the latent variables SC and SNC are initialized with the initial latent representations h_c0 and h_nc0 obtained in step S2.
Step S302: concatenate the latent representations h_c and h_nc into h_cnc and feed it into the original-text reconstruction decoder. The latent representations h_c and h_nc obtained in the previous step are concatenated to obtain h_cnc, which serves as the initial input of the original-text reconstruction decoder.
Step S303: obtain o_i through the original-text reconstruction decoder. From the initial input, the original-text reconstruction decoder computes the output o_i at each position.
Step S304: superimpose h_cnc and o_i and feed the result into the language-model output layer. The h_cnc obtained in step S302 is superimposed on the o_i obtained in the previous step and used as the input of the language-model output layer.
Step S305: generate the reconstructed document X′ through the language-model output layer. The language-model output layer maps the result of the previous step onto the vocabulary to obtain a probability distribution over the vocabulary, and generates the corresponding words to form the reconstructed document X′.
Step S306: compute the reconstruction loss L_R from X′ and X. The reconstructed document X′ is compared with the input document X to compute the reconstruction loss L_R.
Step S307: optimize to obtain new latent representations h_c and h_nc. The latent representations h_c and h_nc are optimized according to the reconstruction loss L_R with the Adam optimization algorithm. The specific optimization procedure is as follows: the latent representations are first set as learnable parameters and given a dedicated Adam optimizer; the reconstruction loss L_R computed in the previous step is back-propagated to obtain gradients; finally, the learnable latent representations h_c and h_nc are updated directly by the Adam algorithm.
Step S308: obtain the optimal latent representations h_c and h_nc. The above steps (S302 to S307) are repeated k times, and after this repeated optimization the optimal latent representations h_c and h_nc are returned.
In summary, the purpose of both steps S2 and S3 is to obtain accurate latent representations, and both are evaluated by the reconstruction loss. S2 precedes S3 and provides its initialization; technically, it obtains the latent variables by repeated sampling, whereas S3 performs the subsequent optimization starting from the best initialization vectors obtained in S2. The optimization procedure is as described above: the latent representations are set as learnable parameters with a dedicated Adam optimizer, the reconstruction loss L_R is back-propagated, and the learnable representations h_c and h_nc are updated directly by the Adam algorithm. A good initialization is crucial for the optimization process, and the optimization itself takes longer; selecting the current best representation by repeated sampling in S2 is therefore faster.
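A sketch of the test-time refinement performed in step S3 is given below, assuming PyTorch; reconstruct and reconstruction_loss are again placeholders for the reconstruction decoder and the differentiable loss of step S306.

```python
# Sketch of step S3: treat the latent representations as learnable parameters,
# give them a dedicated Adam optimizer, and refine them for k rounds by
# back-propagating the reconstruction loss L_R.
import torch

def refine_latents(h_c0, h_nc0, doc_tokens, reconstruct, reconstruction_loss,
                   k=10, lr=1e-3):
    h_c = h_c0.clone().detach().requires_grad_(True)
    h_nc = h_nc0.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([h_c, h_nc], lr=lr)
    for _ in range(k):
        optimizer.zero_grad()
        x_recon = reconstruct(h_c, h_nc)                 # reconstructed document X'
        loss = reconstruction_loss(x_recon, doc_tokens)  # L_R against the given X
        loss.backward()                                  # gradient back-propagation
        optimizer.step()                                 # direct update of h_c and h_nc
    return h_c.detach(), h_nc.detach()
```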
Step S4: generate the summary Y′ using the optimal latent representation h_c. As shown in Figure 5, the specific procedure includes the following sub-steps:
Step S401: feed the optimal latent representation h_c into the summary prediction decoder. The optimal latent representation h_c obtained in step S3 serves as the initial input of the summary prediction decoder.
Step S402: obtain r_j through the summary prediction decoder. From the initial input, the summary prediction decoder computes the output r_j at each position.
Step S403: superimpose h_c and r_j and feed the result into the language-model output layer. The h_c from step S401 is superimposed on the r_j obtained in the previous step and used as the input of the language-model output layer.
Step S404: generate the predicted summary Y′ through the language-model output layer. The language-model output layer maps the result of the previous step onto the vocabulary to obtain a probability distribution over the vocabulary, and generates the corresponding words to form the summary Y′.
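Steps S401 through S404 can be sketched as the decoding loop below. The decoder interface, the greedy token choice, and the tensor handling are simplifying assumptions; the procedure described above superimposes h_c on the decoder outputs and uses the language-model output layer with beam search.

```python
# Sketch of step S4: generate the summary Y' autoregressively from the optimal
# SC representation h_c (greedy decoding shown in place of beam search).
import torch

def generate_summary(h_c, summ_decoder, lm_head, bos_id, eos_id, max_len=64):
    tokens = [bos_id]
    for _ in range(max_len):
        prefix = torch.tensor([tokens])            # summary tokens generated so far
        r = summ_decoder(prefix, latent=h_c)       # outputs r_j at each position (assumed API)
        logits = lm_head(r[:, -1] + h_c)           # superimpose h_c, map to the vocabulary
        next_id = int(logits.argmax(dim=-1))       # greedy stand-in for beam search
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens
```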
The above describes how the invention is used at inference time; the training procedure of the method of the present invention is described below. The corresponding training flow chart is shown in Figure 6.
Step S1: input the original document X and the reference summary Y. The original document X is fed into the neural-network-based dual latent-variable variational encoder module.
Step S2: obtain the latent representations h_c and h_nc by sampling. For the specific sub-steps, refer to S201 to S203 of the inference procedure. This step also yields the Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²) of the latent variables SC and SNC (with means μ_c, μ_nc and variances σ_c², σ_nc²).
Step S3: generate the reconstructed text X′ using the latent representations h_c and h_nc. For the specific sub-steps, refer to S204 to S207 of the inference procedure.
Step S4: generate the summary Y′ using the latent representation h_c. For the specific sub-steps, refer to S4 of the inference procedure.
Step S5: compute the training loss and optimize the model.
The training loss consists of three parts: the original-text reconstruction loss L_R, the summary prediction loss L_P, and the distribution constraint loss L_KL.
The original-text reconstruction loss L_R is the cross-entropy computed between the input document X and the reconstructed X′; its purpose is to train the dual latent-variable variational encoder and the original-text reconstruction decoder.
The summary prediction loss L_P is the cross-entropy computed between the input reference summary Y and the predicted Y′; its purpose is to train the dual latent-variable variational encoder and the summary prediction decoder.
The distribution constraint loss L_KL is the KL divergence computed between the standard normal distribution N(0, I) and the Gaussian distributions of the latent variables SC and SNC (obtained in step S2); its purpose is to train the dual latent-variable variational encoder so that the predicted variable distributions are better regularized.
The final training loss function is L = L_R + L_P + λ·L_KL, where λ adjusts the strength of the distribution-normalization constraint.
The model is trained with the Adam optimizer according to the above training loss function.
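A sketch of how these three terms could be assembled in PyTorch is shown below; it uses the closed-form KL divergence between a diagonal Gaussian and the standard normal, and the tensor shapes and reduction choices are assumptions rather than the exact formulation used in training.

```python
# Sketch of the training objective L = L_R + L_P + lambda * L_KL.
import torch
import torch.nn.functional as F

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()

def training_loss(recon_logits, doc_ids, summ_logits, summ_ids,
                  mu_c, logvar_c, mu_nc, logvar_nc, lam=0.1):
    # L_R: cross-entropy between the reconstructed document X' and the input X.
    l_r = F.cross_entropy(recon_logits.transpose(1, 2), doc_ids)
    # L_P: cross-entropy between the predicted summary Y' and the reference Y.
    l_p = F.cross_entropy(summ_logits.transpose(1, 2), summ_ids)
    # L_KL: distribution constraint on both latent variables SC and SNC.
    l_kl = kl_to_standard_normal(mu_c, logvar_c) + kl_to_standard_normal(mu_nc, logvar_nc)
    return l_r + l_p + lam * l_kl
```

In an actual training loop this scalar is simply back-propagated and the Adam optimizer stepped, as described in step S5.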
The following are system embodiments corresponding to the above method embodiments; this embodiment can be implemented in cooperation with the above embodiments. The relevant technical details mentioned in the above embodiments remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied in the above embodiments.
The present invention further proposes a causality-based sequence-to-sequence text summary generation system, which includes:
a sampling module for inputting an original document into a neural-network-based dual latent-variable variational encoder, which samples the original document multiple times to extract summary-relevant features and summary-irrelevant features from the original document;
a concatenation module for concatenating the summary-relevant features and the summary-irrelevant features to obtain combined summary features, reconstructing the original document based on the combined summary features to obtain a reconstructed document, and, taking the original document as the training target, constructing a loss function from the reconstructed document and the training target to train the dual latent-variable variational encoder;
an extraction module for using the trained dual latent-variable variational encoder to extract the summary-relevant features of the original document as target features, and obtaining a text summary of the original document based on the target features.
In the above causality-based sequence-to-sequence text summary generation system, the sampling module is configured to:
obtain the encoded representation vector of the document through the document encoder: the document encoder in the dual latent-variable variational encoder encodes the original document X to obtain the document representation vector h_doc;
have the variational encoder in the dual latent-variable variational encoder module encode and sample the document representation vector h_doc to obtain the latent variables h_c and h_nc, respectively;
feed h_c and h_nc into the original-text reconstruction decoder to obtain the output o_i at each position; superimpose h_c and h_nc on o_i and feed the result into the language-model output layer to generate a reconstructed document X′; compute the reconstruction loss L_R from X′ and X; and store the latent representations h_c and h_nc corresponding to the smallest L_R as the summary-relevant features and the summary-irrelevant features, respectively.
In the above causality-based sequence-to-sequence text summary generation system, the extraction module is configured to:
have the summary prediction decoder compute the output r_j at each position from the target features; superimpose the target features and r_j and feed the result into the language-model output layer to obtain a probability distribution over the vocabulary, from which the corresponding words are generated to form the text summary.
In the above causality-based sequence-to-sequence text summary generation system, the dual latent-variable variational encoder, the summary prediction decoder, and the original-text reconstruction decoder are trained as follows:
obtaining a training document and its reference summary, and feeding the training document into the neural-network-based dual latent-variable variational encoder to obtain the latent representations h_c and h_nc, together with the Gaussian distributions N(μ_c, σ_c²) and N(μ_nc, σ_nc²) of h_c and h_nc, respectively;
generating a reconstructed original text using the latent representations h_c and h_nc, and generating a summary result using the latent representation h_c;
constructing the original-text reconstruction loss L_R from the training document and the reconstructed text; constructing the summary prediction loss L_P from the reference summary and the summary result; and computing the KL divergence between the standard normal distribution N(0, I) and the above Gaussian distributions;
the final training loss function is L = L_R + L_P + λ·L_KL, where λ adjusts the strength of the distribution-normalization constraint; the dual latent-variable variational encoder, the summary prediction decoder, and the original-text reconstruction decoder are trained with this loss function L.
The present invention further proposes a storage medium for storing a program that executes any one of the above causality-based sequence-to-sequence text summary generation methods.
The present invention further proposes a client for use with any one of the above causality-based sequence-to-sequence text summary generation systems.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211215316.3A CN115658881A (en) | 2022-09-30 | 2022-09-30 | Sequence-to-sequence text summary generation method and system based on causality |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115658881A (en) | 2023-01-31 |
Family
ID=84985693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211215316.3A Pending CN115658881A (en) | 2022-09-30 | 2022-09-30 | Sequence-to-sequence text summary generation method and system based on causality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115658881A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI847696B (en) * | 2023-05-15 | 2024-07-01 | 中國信託商業銀行股份有限公司 | Summary generation method based on prompt engineering and its computing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||