
CN118520976A - Text dialogue generation model training method, text dialogue generation method and equipment - Google Patents


Info

Publication number
CN118520976A
Authority
CN
China
Prior art keywords
model
knowledge
vector
training
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410978470.9A
Other languages
Chinese (zh)
Other versions
CN118520976B (en)
Inventor
吴俊江
王晓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Athena Eyes Co Ltd filed Critical Athena Eyes Co Ltd
Priority to CN202410978470.9A priority Critical patent/CN118520976B/en
Publication of CN118520976A publication Critical patent/CN118520976A/en
Application granted granted Critical
Publication of CN118520976B publication Critical patent/CN118520976B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/313 Selection or weighting of terms for indexing
    • G06F 16/3329 Natural language query formulation
    • G06F 16/335 Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a text dialogue generation model training method, a text dialogue generation method, and a device, comprising: searching knowledge bases of question-answer pairs and documents to obtain a preset number of knowledge items as basic knowledge; constructing a first prompt and a second prompt based on the basic knowledge, the current context, and the current question; constructing a main model and an enhanced model, performing task training on the main model with the first prompt and on the enhanced model with the second prompt, and fusing the resulting models to obtain an initial fusion model; and training the initial fusion model to obtain a target fusion model. The present invention improves the accuracy of text dialogue generation.

Description

Text dialogue generation model training method, text dialogue generation method and device

Technical Field

The present invention relates to the technical field of natural language processing, and in particular to a text dialogue generation model training method, a text dialogue generation method, and a device.

Background Art

With the development of artificial intelligence technology, intelligent question answering has been widely applied. An intelligent answering system needs to search according to the question and return a suitable answer. To improve answering efficiency and accuracy, it is often necessary to retrieve relevant information from a large document collection and then use the retrieved information to guide text generation, thereby improving the quality and accuracy of the prediction.

In some existing implementations, the top-N text knowledge items are recalled from a knowledge base by an embedding model, and a prompt is constructed from the top-N knowledge and the current query and fed into a large language model for retrieval-based generative question answering.
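As an illustration of this conventional flow, a minimal sketch is given below; `embed`, the prompt template, and all other names are placeholders rather than the API of any specific library:

```python
# Minimal sketch of the conventional embedding-recall RAG flow described above.
import numpy as np

def recall_top_n(query: str, knowledge: list[str], embed, n: int = 5) -> list[str]:
    """Recall the top-N knowledge items by cosine similarity to the query."""
    q = embed(query)
    scores = []
    for text in knowledge:
        k = embed(text)
        scores.append(float(np.dot(q, k) / (np.linalg.norm(q) * np.linalg.norm(k))))
    order = np.argsort(scores)[::-1][:n]
    return [knowledge[i] for i in order]

def build_prompt(query: str, top_n: list[str]) -> str:
    """Concatenate the recalled knowledge and the current query into one prompt."""
    knowledge_block = "\n".join(f"- {k}" for k in top_n)
    return f"Known information:\n{knowledge_block}\n\nQuestion: {query}\nAnswer:"
```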

In the process of implementing the present application, the inventors found that existing methods have at least the following problems. First, the coverage and accuracy of the knowledge recalled by the vector model are in tension: recalling more knowledge may introduce noise, while recalling less may leave the accuracy insufficient. Second, different large models differ in capability, especially across different domains and tasks. These problems lead to low accuracy and efficiency in existing text dialogue generation.

Summary of the Invention

The embodiments of the present invention provide a text dialogue generation model training method, a text dialogue generation method, an apparatus, a computer device, and a storage medium, so as to improve the accuracy and efficiency of text dialogue generation.

In order to solve the above technical problems, an embodiment of the present application provides a text dialogue generation model training method, including:

searching knowledge bases of question-answer pairs and documents to obtain a preset number of knowledge items as basic knowledge;

constructing a first prompt and a second prompt based on the basic knowledge, the current context, and the current question, where the first prompt is selected based on the basic knowledge and is used to obtain corresponding knowledge from the basic knowledge, and the second prompt is used to generate reply keywords, the reply keywords being used to generate a response answer according to the current context and the current question;

constructing a main model and an enhanced model, performing task training on the main model with the first prompt and on the enhanced model with the second prompt, and fusing the resulting models to obtain an initial fusion model; and

training the initial fusion model to obtain a target fusion model.

In order to solve the above technical problems, an embodiment of the present application provides a text dialogue generation method, including:

obtaining a target corpus, where the target corpus is of question-answer pair type or document type; and

inputting the target corpus into the target fusion model for task training to obtain a target text dialogue.

In order to solve the above technical problems, an embodiment of the present application further provides a text dialogue generation model training apparatus, including:

a knowledge retrieval module, configured to search knowledge bases of question-answer pairs and documents to obtain a preset number of knowledge items as basic knowledge;

a prompt construction module, configured to construct a first prompt and a second prompt based on the basic knowledge, the current context, and the current question, where the first prompt is selected based on the basic knowledge and is used to obtain corresponding knowledge from the basic knowledge, and the second prompt is used to generate reply keywords, the reply keywords being used to generate a response answer according to the current context and the current question;

a model fusion module, configured to construct a main model and an enhanced model, perform task training on the main model with the first prompt and on the enhanced model with the second prompt, and fuse the resulting models to obtain an initial fusion model; and

a model training module, configured to train the initial fusion model to obtain a target fusion model.

Optionally, the knowledge retrieval module includes:

a retrieval unit, configured to search the knowledge bases of question-answer pairs and documents to obtain M knowledge items, where M is a positive integer;

a first encoding unit, configured to encode each knowledge item in the knowledge base into a vector to obtain knowledge base vectors $E = \{e_1, e_2, \dots, e_M\}$, where the knowledge base is $D = \{d_1, d_2, \dots, d_M\}$, $d_1$ to $d_M$ are the first to the M-th knowledge items, $e_i \in \mathbb{R}^h$ is the encoding vector corresponding to the i-th knowledge item, and $h$ is the vector dimension;

a second encoding unit, configured to encode the current question to obtain a question vector $q \in \mathbb{R}^h$;

a similarity calculation unit, configured to calculate the cosine similarity $s_i = \cos(e_i, q) = \dfrac{e_i \cdot q}{\lVert e_i \rVert \, \lVert q \rVert}$ between the knowledge base vectors and the question vector;

a screening unit, configured to take, among the knowledge base vectors, the knowledge whose cosine similarity value with the question vector is greater than a preset threshold as the knowledge to be screened; and

a selection unit, configured to sort the knowledge to be screened in descending order of cosine similarity value to obtain a knowledge ranking sequence, and select a preset number of knowledge items from front to back as the basic knowledge.

Optionally, the first encoding unit includes:

a first encoding subunit, configured to, if the knowledge is in the form of a question-answer pair, encode the question in the question-answer pair to obtain the encoding vector corresponding to the knowledge; and

a second encoding subunit, configured to, if the knowledge is in the form of a document, encode the document to obtain the encoding vector.

Optionally, the model fusion module includes:

a vectorization unit, configured to input the first prompt into the main model $M_1$ to obtain a first encoding vector $T_1 \in \mathbb{R}^{n_1 \times h_1}$, and input the second prompt into the enhanced model $M_2$ to obtain a second encoding vector $T_2 \in \mathbb{R}^{n_2 \times h_2}$, where $T_1$ is the token vector of the first prompt after tokenization and word vectorization by the main model $M_1$, $T_2$ is the token vector of the second prompt after tokenization and word vectorization by the enhanced model $M_2$, $n_1$ is the number of tokens of the first prompt, $n_2$ is the number of tokens of the second prompt, and $h_1$ and $h_2$ are the vector dimensions;

a two-dimensional vector generation unit, configured to feed the first encoding vector forward through $j$ layers of the main model to obtain a first two-dimensional vector $H_1$, and feed the second encoding vector forward through $i$ layers of the enhanced model to obtain a second two-dimensional vector $H_2$, where the total numbers of layers of the main model and the enhanced model are $L_1$ and $L_2$ respectively, the set number of fusion layers is $K$, $j = L_1 - K$, and $i = L_2 - K$; and

a fusion unit, configured to fuse the main model and the enhanced model over the last $K$ layers based on the first two-dimensional vector and the second two-dimensional vector to obtain an initial fusion model.

In order to solve the above technical problems, an embodiment of the present application further provides a text dialogue generation apparatus, including:

a corpus acquisition module, configured to obtain a target corpus, where the target corpus is of question-answer pair type or document type; and

a dialogue generation module, configured to input the target corpus into the target fusion model for task training to obtain a target text dialogue.

In order to solve the above technical problems, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above text dialogue generation model training method when executing the computer program.

In order to solve the above technical problems, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the above text dialogue generation model training method.

The text dialogue generation model training method, text dialogue generation method, apparatus, computer device, and storage medium provided in the embodiments of the present invention search knowledge bases of question-answer pairs and documents to obtain a preset number of knowledge items as basic knowledge; construct a first prompt and a second prompt based on the basic knowledge, the current context, and the current question, where the first prompt is selected based on the basic knowledge and is used to obtain corresponding knowledge from the basic knowledge, and the second prompt is used to generate reply keywords, the reply keywords being used to generate a response answer according to the current context and the current question; construct a main model and an enhanced model, perform task training on the main model with the first prompt and on the enhanced model with the second prompt, and fuse the resulting models to obtain an initial fusion model; and train the initial fusion model to obtain a target fusion model. This realizes an end-to-end knowledge selection and reply generation scheme: the two model tasks are combined for end-to-end training while the capabilities of different models are fused, building a text dialogue generation strategy based on model fusion and improving the accuracy of text dialogue generation.

Brief Description of the Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a diagram of an exemplary system architecture to which the present application may be applied;

FIG. 2 is a flow chart of an embodiment of the text dialogue generation model training method of the present application;

FIG. 3 is a flow chart of an embodiment of the text dialogue generation method of the present application;

FIG. 4 is a schematic structural diagram of an embodiment of the text dialogue generation model training apparatus according to the present application;

FIG. 5 is a schematic structural diagram of an embodiment of the text dialogue generation apparatus according to the present application;

FIG. 6 is a schematic structural diagram of an embodiment of the computer device according to the present application.

Detailed Description

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which this application belongs. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the application. The terms "including" and "having" and any variations thereof in the specification, claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, claims, or the above drawings are used to distinguish different objects rather than to describe a specific order.

Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Referring to FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.

Users may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, for example a background server that supports the pages displayed on the terminal devices 101, 102, 103.

It should be noted that the text dialogue generation model training method and the text dialogue generation method provided in the embodiments of the present application are executed by the server; accordingly, the text dialogue generation model training apparatus and the text dialogue generation apparatus are arranged in the server.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs; the terminal devices 101, 102, 103 in the embodiments of the present application may specifically correspond to application systems in actual production.

Referring to FIG. 2, FIG. 2 shows a text dialogue generation model training method provided by an embodiment of the present invention. The method is described by taking its application to the server in FIG. 1 as an example and is detailed as follows:

S201: Search knowledge bases of question-answer pairs and documents to obtain a preset number of knowledge items as basic knowledge.

In a specific optional embodiment, searching the knowledge bases of question-answer pairs and documents to obtain the preset number of basic knowledge items includes:

searching the knowledge bases of question-answer pairs and documents to obtain M knowledge items, where M is a positive integer;

encoding each knowledge item in the knowledge base into a vector to obtain knowledge base vectors $E = \{e_1, e_2, \dots, e_M\}$, where the knowledge base is $D = \{d_1, d_2, \dots, d_M\}$, $d_1$ to $d_M$ are the first to the M-th knowledge items, $e_i \in \mathbb{R}^h$ is the encoding vector corresponding to the i-th knowledge item, and $h$ is the vector dimension;

encoding the current question to obtain a question vector $q \in \mathbb{R}^h$;

calculating the cosine similarity $s_i = \cos(e_i, q) = \dfrac{e_i \cdot q}{\lVert e_i \rVert \, \lVert q \rVert}$ between the knowledge base vectors and the question vector;

taking, among the knowledge base vectors, the knowledge whose cosine similarity value with the question vector is greater than a preset threshold as the knowledge to be screened; and

sorting the knowledge to be screened in descending order of cosine similarity value to obtain a knowledge ranking sequence, and selecting a preset number of knowledge items from front to back as the basic knowledge.

In a specific optional embodiment, encoding each knowledge item in the knowledge base into a vector to obtain the knowledge base vectors $E$ includes:

if the knowledge is in the form of a question-answer pair, encoding the question in the question-answer pair to obtain the encoding vector corresponding to the knowledge; and

if the knowledge is in the form of a document, encoding the document to obtain the encoding vector.
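Putting the retrieval steps of S201 together, a minimal sketch is shown below. The `embed` function stands in for any embedding model, and the threshold and preset number are illustrative values rather than ones stated in the patent:

```python
import numpy as np

def retrieve_basic_knowledge(question, knowledge_base, embed,
                             threshold=0.6, preset_number=5):
    """S201 steps: encode, score by cosine similarity, threshold, sort, truncate."""
    # Encode each knowledge item: for a QA pair encode its question,
    # for a document encode the document itself.
    vectors = []
    for item in knowledge_base:
        text = item["question"] if item["type"] == "qa_pair" else item["document"]
        vectors.append(embed(text))
    q = embed(question)
    sims = [float(np.dot(e, q) / (np.linalg.norm(e) * np.linalg.norm(q)))
            for e in vectors]
    # Keep items above the preset threshold, sort descending, take the preset number.
    candidates = [(s, item) for s, item in zip(sims, knowledge_base) if s > threshold]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in candidates[:preset_number]]
```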

S202: Construct a first prompt and a second prompt based on the basic knowledge, the current context, and the current question. The first prompt is selected based on the basic knowledge and is used to obtain corresponding knowledge from the basic knowledge; the second prompt is used to generate reply keywords, and the reply keywords are used to generate a response answer according to the current context and the current question.
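As an illustration of S202, the sketch below builds the two prompts from the retrieved basic knowledge, the current context, and the current question. The patent does not disclose concrete prompt templates, so the wording here is an assumption:

```python
def build_prompts(basic_knowledge, context, question):
    """Build the first (knowledge-selection) and second (reply-keyword)
    prompts of S202. The templates are hypothetical: the patent does not
    disclose its actual prompt wording."""
    knowledge_block = "\n".join(f"[{i}] {k}" for i, k in enumerate(basic_knowledge))
    # First prompt: ask the model to pick the relevant knowledge entries.
    first_prompt = (
        f"Candidate knowledge:\n{knowledge_block}\n"
        f"Context: {context}\nQuestion: {question}\n"
        "Select the knowledge entries needed to answer the question."
    )
    # Second prompt: ask the model for reply keywords used to form the answer.
    second_prompt = (
        f"Context: {context}\nQuestion: {question}\n"
        "Generate the reply keywords for answering the question."
    )
    return first_prompt, second_prompt
```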

S203: Construct a main model and an enhanced model, perform task training on the main model with the first prompt and on the enhanced model with the second prompt, and fuse the resulting models to obtain an initial fusion model.

In a specific optional embodiment, constructing a main model and an enhanced model, performing task training on the main model with the first prompt and on the enhanced model with the second prompt, and fusing the resulting models to obtain an initial fusion model includes:

inputting the first prompt into the main model $M_1$ to obtain a first encoding vector $T_1 \in \mathbb{R}^{n_1 \times h_1}$, and inputting the second prompt into the enhanced model $M_2$ to obtain a second encoding vector $T_2 \in \mathbb{R}^{n_2 \times h_2}$, where $T_1$ is the token vector of the first prompt after tokenization and word vectorization by the main model $M_1$, $T_2$ is the token vector of the second prompt after tokenization and word vectorization by the enhanced model $M_2$, $n_1$ is the number of tokens of the first prompt, $n_2$ is the number of tokens of the second prompt, and $h_1$ and $h_2$ are the vector dimensions;

feeding the first encoding vector forward through $j$ layers of the main model to obtain a first two-dimensional vector $H_1$, and feeding the second encoding vector forward through $i$ layers of the enhanced model to obtain a second two-dimensional vector $H_2$, where the total numbers of layers of the main model and the enhanced model are $L_1$ and $L_2$ respectively, the set number of fusion layers is $K$, $j = L_1 - K$, and $i = L_2 - K$; and

starting from the last $K$ layers, fusing the main model and the enhanced model based on the first two-dimensional vector and the second two-dimensional vector to obtain the initial fusion model.

In a specific optional embodiment, starting from the last $K$ layers, fusing the main model and the enhanced model based on the first two-dimensional vector and the second two-dimensional vector to obtain the initial fusion model includes:

for each layer $k$ of the last $K$ layers, constructing a transformation matrix $W_k$ and transforming the second two-dimensional vector into the same dimension as the first two-dimensional vector to obtain a second transformed vector $\tilde{H}_2 = H_2 W_k$; and

fusing the second transformed vector and the first two-dimensional vector with a multi-head attention mechanism to obtain a fused vector $A$, superimposing the fused vector $A$ on the first two-dimensional vector as the input vector of the next layer, and performing fusion layer by layer until the last layer is fused, to obtain the initial fusion model.

Specifically, in the model fusion stage, two large models need to be built, and both need to be chat models that have undergone SFT (supervised fine-tuning). Unlike conventional RAG usage, this model-fusion RAG scheme needs to be trained to combine the two models. The specific steps are as follows.

First, the two models are designated as the main model $M_1$ and the enhanced model $M_2$; that is, the enhanced model $M_2$ is used to improve the effect of the main model $M_1$. The input to $M_1$ performs the reply generation task, and the input to $M_2$ performs the knowledge selection task, as shown in FIG. 2.

The first prompt and the second prompt are input into their respective models $M_1$ and $M_2$ to obtain the respective token encoding vectors $T_1 \in \mathbb{R}^{n_1 \times h_1}$ and $T_2 \in \mathbb{R}^{n_2 \times h_2}$, where $T_1$ is the token vector after tokenization and word vectorization by model $M_1$, $T_2$ is the token vector after tokenization and word vectorization by model $M_2$, $n_1$ and $n_2$ are the numbers of tokens of the two prompts, and $h_1$ and $h_2$ are the vector dimensions, which are not necessarily equal.

$T_1$ and $T_2$ then undergo feed-forward computation in $M_1$ and $M_2$ respectively: $T_1$ yields the two-dimensional vector $H_1$ after $j$ layers, and $T_2$ yields $H_2$ after $i$ layers. The values of $i$ and $j$ are computed as follows: assuming the total numbers of layers of $M_1$ and $M_2$ are $L_1$ and $L_2$, and the set number of fusion layers is $K$, then $j = L_1 - K$ and $i = L_2 - K$; that is, the two models are fused over the last $K$ layers.

In the last $K$ layers, $H_1$ and $H_2$ are fused layer by layer in the following way. For each layer $k$ of the last $K$ layers, a transformation matrix $W_k$ is constructed to convert $H_2$ to the same dimension as $H_1$, giving $\tilde{H}_2 = H_2 W_k$; then $H_1$ and $\tilde{H}_2$ undergo the attention computation shown in FIG. 2 to obtain $A$:

$$A = \operatorname{softmax}\!\left(\frac{(H_1 W_Q)(\tilde{H}_2 W_K)^{\top}}{\sqrt{h_1}}\right)\tilde{H}_2 W_V$$

where $W_Q$, $W_K$, $W_V$ are all trainable mapping matrices (the above operation is a multi-head attention operation; the multi-head form is omitted here for simplicity). $A$ already contains the semantic information in $H_2$; since $M_2$ is an SFT-trained large model whose prompt is related to knowledge selection, it assigns more attention score to the knowledge tokens relevant to the current query, so the semantic information captured by $A$ contains knowledge-screening information, which achieves end-to-end knowledge screening and model fusion. After $A$ is obtained, it is added to $H_1$ to produce the input vector of the next layer, i.e., $H_1 \leftarrow H_1 + A$; the model fusion then proceeds layer by layer until the last layer.
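A minimal sketch of one such fusion layer is given below, written in PyTorch-style Python. It implements the single-head form of the attention formula above (the patent notes the multi-head form is used in practice); all class and variable names are illustrative:

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """One of the last K fusion layers: project H2 to H1's width, attend, add."""
    def __init__(self, h1: int, h2: int):
        super().__init__()
        self.W = nn.Linear(h2, h1, bias=False)    # transformation matrix W_k
        self.Wq = nn.Linear(h1, h1, bias=False)   # trainable projections of the
        self.Wk = nn.Linear(h1, h1, bias=False)   # single-head attention above
        self.Wv = nn.Linear(h1, h1, bias=False)

    def forward(self, H1: torch.Tensor, H2: torch.Tensor) -> torch.Tensor:
        H2t = self.W(H2)                          # second transformed vector, n2 x h1
        Q, K, V = self.Wq(H1), self.Wk(H2t), self.Wv(H2t)
        scores = Q @ K.transpose(-2, -1) / (H1.shape[-1] ** 0.5)
        A = torch.softmax(scores, dim=-1) @ V     # fused vector A, shape n1 x h1
        return H1 + A                             # input of the next main-model layer
```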

S204: Train the initial fusion model to obtain a target fusion model.

Specifically, since this model fusion scheme introduces newly added trainable parameters, it needs to be trained on training data. During training, the main model $M_1$ is frozen, while the enhanced model $M_2$ may be frozen or partially trained depending on the specific situation; the newly added matrices $W_k$, $W_Q$, $W_K$, $W_V$ are all parameters that need to be trained.

Meanwhile, the two models in this scheme can be trained and optimized separately: knowledge-selection data is used to train the enhanced model $M_2$, reply-generation data is used to train the main model $M_1$, and the two types of data are then combined to train the fusion scheme. Through the optimization of each model, the effect of the model-fusion-based RAG dialogue scheme is comprehensively improved.
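A sketch of this training setup, assuming PyTorch-style modules and the freezing arrangement described above, might look as follows; the function and argument names are illustrative:

```python
def configure_trainable_parameters(main_model, enhanced_model, fusion_layers,
                                   train_enhanced=False):
    """Freeze the backbones and leave the new fusion matrices trainable.

    Assumes PyTorch-style modules; which backbone is frozen follows the
    reading of the patent text above and is an assumption."""
    for p in main_model.parameters():        # M1 stays frozen
        p.requires_grad = False
    for p in enhanced_model.parameters():    # M2 frozen, or partially trained
        p.requires_grad = train_enhanced
    # Only W_k and the attention projections W_Q, W_K, W_V are newly added.
    trainable = [p for layer in fusion_layers for p in layer.parameters()]
    return trainable                         # hand these to the optimizer
```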

In this embodiment, a preset number of basic knowledge items are obtained by searching knowledge bases of question-answer pairs and documents; a first prompt and a second prompt are constructed based on the basic knowledge, the current context, and the current question, where the first prompt is selected based on the basic knowledge and is used to obtain corresponding knowledge from the basic knowledge, and the second prompt is used to generate reply keywords, the reply keywords being used to generate a response answer according to the current context and the current question; a main model and an enhanced model are constructed, task training is performed on the main model with the first prompt and on the enhanced model with the second prompt, and the resulting models are fused to obtain an initial fusion model; and the initial fusion model is trained to obtain a target fusion model. This realizes an end-to-end knowledge selection and reply generation scheme: the two model tasks are combined for end-to-end training while the capabilities of different models are fused, building a text dialogue generation strategy based on model fusion and improving the accuracy of text dialogue generation.

Referring to FIG. 3, FIG. 3 shows a text dialogue generation method provided by an embodiment of the present invention. The method is described by taking its application to the server in FIG. 1 as an example and is detailed as follows:

S205: Obtain a target corpus, where the target corpus is of question-answer pair type or document type.

S206: Input the target corpus into the target fusion model for task training to obtain a target text dialogue.

In this embodiment, a target corpus of question-answer pair type or document type is obtained and input into the target fusion model for task training to obtain the target text dialogue. Using the target fusion model trained in advance, the target text dialogue is generated quickly, which improves the efficiency of target text dialogue generation.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

FIG. 4 shows a block diagram of a text dialogue generation model training apparatus corresponding one-to-one to the text dialogue generation model training method of the above embodiment. As shown in FIG. 4, the apparatus includes a knowledge retrieval module 31, a prompt construction module 32, a model fusion module 33, and a model training module 34. The functional modules are described in detail as follows:

a knowledge retrieval module 31, configured to search knowledge bases of question-answer pairs and documents to obtain a preset number of knowledge items as basic knowledge;

a prompt construction module 32, configured to construct a first prompt and a second prompt based on the basic knowledge, the current context, and the current question, where the first prompt is selected based on the basic knowledge and is used to obtain corresponding knowledge from the basic knowledge, and the second prompt is used to generate reply keywords, the reply keywords being used to generate a response answer according to the current context and the current question;

a model fusion module 33, configured to construct a main model and an enhanced model, perform task training on the main model with the first prompt and on the enhanced model with the second prompt, and fuse the resulting models to obtain an initial fusion model; and

a model training module 34, configured to train the initial fusion model to obtain a target fusion model.

Optionally, the knowledge retrieval module 31 includes:

a retrieval unit, configured to search the knowledge bases of question-answer pairs and documents to obtain M knowledge items, where M is a positive integer;

a first encoding unit, configured to encode each knowledge item in the knowledge base into a vector to obtain knowledge base vectors $E = \{e_1, e_2, \dots, e_M\}$, where the knowledge base is $D = \{d_1, d_2, \dots, d_M\}$, $d_1$ to $d_M$ are the first to the M-th knowledge items, $e_i \in \mathbb{R}^h$ is the encoding vector corresponding to the i-th knowledge item, and $h$ is the vector dimension;

a second encoding unit, configured to encode the current question to obtain a question vector $q \in \mathbb{R}^h$;

a similarity calculation unit, configured to calculate the cosine similarity $s_i = \cos(e_i, q)$ between the knowledge base vectors and the question vector;

a screening unit, configured to take, among the knowledge base vectors, the knowledge whose cosine similarity value with the question vector is greater than a preset threshold as the knowledge to be screened; and

a selection unit, configured to sort the knowledge to be screened in descending order of cosine similarity value to obtain a knowledge ranking sequence, and select a preset number of knowledge items from front to back as the basic knowledge.

Optionally, the first encoding unit includes:

a first encoding subunit, configured to, if the knowledge is in the form of a question-answer pair, encode the question in the question-answer pair to obtain the encoding vector corresponding to the knowledge; and

a second encoding subunit, configured to, if the knowledge is in the form of a document, encode the document to obtain the encoding vector.

Optionally, the model fusion module 33 includes:

a vectorization unit, configured to input the first prompt into the main model $M_1$ to obtain a first encoding vector $T_1 \in \mathbb{R}^{n_1 \times h_1}$, and input the second prompt into the enhanced model $M_2$ to obtain a second encoding vector $T_2 \in \mathbb{R}^{n_2 \times h_2}$, where $T_1$ is the token vector of the first prompt after tokenization and word vectorization by the main model $M_1$, $T_2$ is the token vector of the second prompt after tokenization and word vectorization by the enhanced model $M_2$, $n_1$ is the number of tokens of the first prompt, $n_2$ is the number of tokens of the second prompt, and $h_1$ and $h_2$ are the vector dimensions;

a two-dimensional vector generation unit, configured to feed the first encoding vector forward through $j$ layers of the main model to obtain a first two-dimensional vector $H_1$, and feed the second encoding vector forward through $i$ layers of the enhanced model to obtain a second two-dimensional vector $H_2$, where the total numbers of layers of the main model and the enhanced model are $L_1$ and $L_2$ respectively, the set number of fusion layers is $K$, $j = L_1 - K$, and $i = L_2 - K$; and

a fusion unit, configured to fuse the main model and the enhanced model over the last $K$ layers based on the first two-dimensional vector and the second two-dimensional vector to obtain an initial fusion model.

FIG. 5 shows a block diagram of a text dialogue generation apparatus corresponding one-to-one to the text dialogue generation method of the above embodiment. As shown in FIG. 5, the apparatus includes a corpus acquisition module 35 and a dialogue generation module 36. The functional modules are described in detail as follows:

a corpus acquisition module 35, configured to obtain a target corpus, where the target corpus is of question-answer pair type or document type; and

a dialogue generation module 36, configured to input the target corpus into the target fusion model for task training to obtain a target text dialogue.

For specific limitations on the text dialogue generation model training apparatus/text dialogue generation apparatus, reference may be made to the limitations on the text dialogue generation model training method/text dialogue generation method above, which are not repeated here. Each module in the above apparatuses may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in a computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.

To solve the above technical problems, an embodiment of the present application further provides a computer device. Referring to FIG. 6, FIG. 6 is a basic structural block diagram of the computer device of this embodiment.

The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to one another through a system bus. It should be noted that the figure only shows the computer device 4 with the memory 41, the processor 42, and the network interface 43, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.

The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device may perform human-computer interaction with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device.

The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the computer device 4. Of course, the memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device thereof. In this embodiment, the memory 41 is generally used to store the operating system and various application software installed on the computer device 4, such as the program code of the text dialogue generation model training method. In addition, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may be, in some embodiments, a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is used to run the program code stored in the memory 41 or to process data, for example to run the program code of the text dialogue generation model training method.

The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.

The present application further provides another embodiment, namely a computer-readable storage medium storing an interface display program, the interface display program being executable by at least one processor so that the at least one processor performs the steps of the text dialogue generation model training method described above.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), including several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

Obviously, the embodiments described above are only some rather than all of the embodiments of the present application; the preferred embodiments of the present application are given in the drawings, but they do not limit the patent scope of the present application. The present application can be implemented in many different forms; rather, these embodiments are provided so that the disclosure of the present application will be understood more thoroughly and comprehensively. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific implementations or make equivalent replacements of some of the technical features. Any equivalent structure made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, likewise falls within the patent protection scope of the present application.

Claims (10)

1. A method for training a text dialogue generation model, comprising:
searching knowledge bases of question-answer pairs and documents to obtain a preset number of knowledge items as basic knowledge;
constructing a first prompt and a second prompt based on the basic knowledge, a current context, and a current question, wherein the first prompt is selected based on the basic knowledge and used for acquiring corresponding knowledge from the basic knowledge, the second prompt is used for generating a reply keyword, and the reply keyword is used for generating a response answer according to the current context and the current question;
constructing a main model and an enhanced model, performing task training on the main model with the first prompt and on the enhanced model with the second prompt respectively, and fusing the obtained models to obtain an initial fusion model; and
training the initial fusion model to obtain a target fusion model.
2. The text dialogue generation model training method according to claim 1, wherein searching the knowledge bases of question-answer pairs and documents to obtain the preset number of basic knowledge items comprises:
searching the knowledge bases of question-answer pairs and documents to obtain M knowledge items, wherein M is a positive integer;
encoding each knowledge item in the knowledge base into a vector to obtain knowledge base vectors $E = \{e_1, e_2, \dots, e_M\}$, wherein the knowledge base is $D = \{d_1, d_2, \dots, d_M\}$, $d_1$ to $d_M$ are the first to the M-th knowledge items, $e_i \in \mathbb{R}^h$ is the encoding vector corresponding to the i-th knowledge item, and $h$ is the vector dimension;
encoding the current question to obtain a question vector $q \in \mathbb{R}^h$;
calculating the cosine similarity $s_i = \cos(e_i, q)$ between the knowledge base vectors and the question vector;
taking, among the knowledge base vectors, the knowledge whose cosine similarity value with the question vector is greater than a preset threshold as knowledge to be screened; and
sorting the knowledge to be screened in descending order of cosine similarity value to obtain a knowledge ranking sequence, and selecting a preset number of knowledge items from front to back as the basic knowledge.
3. The text dialogue generation model training method according to claim 2, wherein encoding each knowledge item in the knowledge base into a vector to obtain the knowledge base vectors $E$ comprises:
if the knowledge is in the form of a question-answer pair, encoding the question in the question-answer pair to obtain the encoding vector corresponding to the knowledge; and
if the knowledge is in the form of a document, encoding the document to obtain the encoding vector.
4. The text dialogue generation model training method according to any one of claims 1 to 3, wherein constructing a main model and an enhanced model, performing task training on the main model with the first prompt and on the enhanced model with the second prompt respectively, and fusing the obtained models to obtain an initial fusion model comprises:
inputting the first prompt into the main model to obtain a first encoding vector $T_1 \in \mathbb{R}^{n_1 \times h_1}$, and inputting the second prompt into the enhanced model to obtain a second encoding vector $T_2 \in \mathbb{R}^{n_2 \times h_2}$, wherein $T_1$ is the token vector of the first prompt after tokenization and word vectorization by the main model, $T_2$ is the token vector of the second prompt after tokenization and word vectorization by the enhanced model, $n_1$ is the number of tokens of the first prompt, $n_2$ is the number of tokens of the second prompt, and $h_1$ and $h_2$ are the vector dimensions;
performing feed-forward computation on the first encoding vector through $j$ layers of the main model to obtain a first two-dimensional vector $H_1$, and performing feed-forward computation on the second encoding vector through $i$ layers of the enhanced model to obtain a second two-dimensional vector $H_2$, wherein the total numbers of layers of the main model and the enhanced model are $L_1$ and $L_2$ respectively, the set number of fusion layers is $K$, $j = L_1 - K$, and $i = L_2 - K$; and
starting from the last $K$ layers, fusing the main model and the enhanced model based on the first two-dimensional vector and the second two-dimensional vector to obtain an initial fusion model.
5. The text dialogue generation model training method according to claim 4, wherein starting from the last $K$ layers, fusing the main model and the enhanced model based on the first two-dimensional vector and the second two-dimensional vector to obtain an initial fusion model comprises:
for each layer $k$ of the last $K$ layers, constructing a transformation matrix $W_k$ and converting the second two-dimensional vector into the same dimension as the first two-dimensional vector to obtain a second transformed vector $\tilde{H}_2$; and
fusing the second transformed vector and the first two-dimensional vector with a multi-head attention mechanism to obtain a fused vector $A$, superimposing the fused vector $A$ on the first two-dimensional vector as the input vector of the next layer, and performing fusion on the next layer until the last layer is fused, to obtain the initial fusion model.
6. A method for generating a text dialogue, comprising:
acquiring a target corpus, wherein the target corpus is of question-answer pair type or document type; and
inputting the target corpus into a target fusion model for task training to obtain a target text dialogue, wherein the target fusion model is obtained according to the method of any one of claims 1 to 5.
7. A text dialogue generation model training device, comprising:
a knowledge retrieval module used for retrieving knowledge bases of the question-answer pair type and the document type to obtain a preset quantity of basic knowledge;
a prompt word construction module used for constructing a first prompt word and a second prompt word based on the basic knowledge, the current context and the current question, wherein the first prompt word is used for selecting and acquiring corresponding knowledge from the basic knowledge, the second prompt word is used for generating a reply keyword, and the reply keyword is used for generating a reply answer according to the current context and the current question;
a model fusion module used for constructing a main model and an enhancement model, performing task training on the main model with the first prompt word, performing task training on the enhancement model with the second prompt word, and fusing the obtained models to obtain an initial fusion model;
and a model training module used for training the initial fusion model to obtain a target fusion model.
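The device of claim 7 can be pictured as cooperating modules; the plain-Python sketch below mirrors the module names in the claim, while the retrieval scoring, the prompt templates, and all identifiers are hypothetical:

```python
# Hypothetical sketch of the claim-7 device; only the module roles come from
# the claim, everything else (templates, overlap-based top-k retrieval) is assumed.
class TextDialogueModelTrainingDevice:
    def __init__(self, knowledge_base: list[str]):
        self.knowledge_base = knowledge_base  # question-answer pairs + documents

    def retrieve_knowledge(self, question: str, preset_quantity: int = 5) -> list[str]:
        """Knowledge retrieval module: naive scoring by term overlap."""
        scored = sorted(self.knowledge_base,
                        key=lambda doc: -sum(w in doc for w in question.split()))
        return scored[:preset_quantity]

    def build_prompts(self, knowledge: list[str], context: str, question: str):
        """Prompt word construction module: the first prompt selects knowledge,
        the second prompt asks for reply keywords."""
        first_prompt = ("Select the knowledge relevant to the question:\n"
                        + "\n".join(knowledge) + f"\nQuestion: {question}")
        second_prompt = (f"Context: {context}\nQuestion: {question}\n"
                         "Generate reply keywords for the answer:")
        return first_prompt, second_prompt
```

The model fusion and model training modules would then wrap something like the `fuse_models` sketch above and a standard training loop, respectively.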
8. A text dialogue generation device, comprising:
an acquisition module used for acquiring a target corpus, wherein the target corpus is of the question-answer pair type or the document type;
and a generation module used for inputting the target corpus into a target fusion model for task training to obtain a target text dialogue, wherein the target fusion model is obtained by the method of any one of claims 1 to 5.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the text dialogue generation model training method of any one of claims 1 to 5, or implements the text dialogue generation method of claim 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the text dialogue generation model training method of any one of claims 1 to 5, or implements the text dialogue generation method of claim 6.
CN202410978470.9A 2024-07-22 2024-07-22 Text dialogue generation model training method, text dialogue generation method and equipment Active CN118520976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410978470.9A CN118520976B (en) 2024-07-22 2024-07-22 Text dialogue generation model training method, text dialogue generation method and equipment

Publications (2)

Publication Number Publication Date
CN118520976A true CN118520976A (en) 2024-08-20
CN118520976B CN118520976B (en) 2024-12-24

Family

ID=92276722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410978470.9A Active CN118520976B (en) 2024-07-22 2024-07-22 Text dialogue generation model training method, text dialogue generation method and equipment

Country Status (1)

Country Link
CN (1) CN118520976B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119250184A (en) * 2024-12-05 2025-01-03 成都佳发安泰教育科技股份有限公司 Dialogue data generation method, dialogue model training method and electronic device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116226338A (en) * 2022-11-16 2023-06-06 四川封面传媒科技有限责任公司 Multi-round dialogue system and method based on searching and generating fusion
CN117056471A (en) * 2023-07-11 2023-11-14 数字郑州科技有限公司 Knowledge base construction method and question-answer dialogue method and system based on generation type large language model
CN116756295A (en) * 2023-08-16 2023-09-15 北京盛通知行教育科技集团有限公司 Knowledge base retrieval method, device and storage medium
CN117033602A (en) * 2023-08-24 2023-11-10 北京邮电大学 Method for constructing multi-mode user mental perception question-answering model
CN117390156A (en) * 2023-10-12 2024-01-12 平安科技(深圳)有限公司 Cross-modal-based question-answer dialogue method, system, equipment and storage medium
CN117743559A (en) * 2024-02-20 2024-03-22 厦门国际银行股份有限公司 Multi-round dialogue processing method, device and equipment based on RAG

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU YANG et al.: "A Hybrid Retrieval-Generation Neural Conversation Model", ACM, 7 October 2019 (2019-10-07), pages 1341-1350 *
YIPING SONG et al.: "Two are Better than One: An Ensemble of Retrieval- and Generation-Based Dialog Systems", arXiv:1610.07149v1, 23 October 2016 (2016-10-23), pages 1-11 *

Also Published As

Publication number Publication date
CN118520976B (en) 2024-12-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Training method for text dialogue generation model, text dialogue generation method and device

Granted publication date: 20241224

Pledgee: Hunan Xiangjiang New Area Rural Commercial Bank Co.,Ltd. Dongfanghong Branch

Pledgor: Wisdom Eye Technology Co.,Ltd.

Registration number: Y2025980023483