CN118377887B

CN118377887B - Automatic question and answer method

Info

Publication number: CN118377887B
Application number: CN202410824422.4A
Authority: CN
Inventors: 刘子君; 李若鹏
Original assignee: Alibaba Cloud Feitian Hangzhou Cloud Computing Technology Co ltd
Current assignee: Alibaba Cloud Feitian Hangzhou Cloud Computing Technology Co ltd
Priority date: 2024-06-24
Filing date: 2024-06-24
Publication date: 2024-09-24
Anticipated expiration: 2044-06-24
Also published as: CN118377887A

Abstract

Embodiments of this specification provide an automatic question-answering method comprising: receiving a question instruction that includes a question text, and determining a document database corresponding to the question instruction; determining at least one piece of reference document data in the document database, and generating a reference analysis text based on the question text and the reference document data, wherein the reference analysis text is a text that analyzes the question text based on the reference document data; obtaining matching weight information corresponding to each piece of document data according to the reference analysis text and the document data; and determining at least one piece of target document data among the document data based on the matching weight information, and generating an answer text corresponding to the question instruction according to the question text and the target document data. Because the matching weight information of each piece of document data is determined from the reference analysis text, the more logical, written-style language of the reference analysis text helps select the documents that support a correct reply, which improves the accuracy of automatic question answering.

Description

Automatic question and answer method

Technical Field

Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to an automatic question-answering method, an automatic question-answering method applied to a cloud device, an automatic question-answering apparatus, a computing device, a computer-readable storage medium, and a computer program product.

Background

With the development of large model technology, applying a large model to a specific scenario often requires an external knowledge base. Such knowledge bases provide up-to-date knowledge that was not covered in the large model's pre-training, particularly in specialized fields; without them, the model may answer questions in those fields incorrectly.

Currently, the common industry approach to determining the documents corresponding to a question is to match the question against the knowledge and take the top few documents most similar to the query, which meets the basic requirement of information retrieval. However, this approach often ignores other information that affects how well a document matches, so the documents determined may not be the most relevant ones, which impairs the large model's ability to generate a correct answer. A more accurate automatic question-answering method is therefore needed, one that mines and uses information more precisely and improves the accuracy of model answers.

Disclosure of Invention

In view of this, the embodiments of the present disclosure provide an automatic question-answering method, and an automatic question-answering method applied to a cloud device. One or more embodiments of the present specification are also directed to an automatic question and answer apparatus, a computing device, a computer-readable storage medium, and a computer program product that address the shortcomings of the prior art.

According to a first aspect of embodiments of the present disclosure, there is provided an automatic question-answering method, including:

receiving a question instruction and determining a document database corresponding to the question instruction, wherein the question instruction comprises a question text, and the document database comprises at least one piece of document data;

determining at least one piece of reference document data in the document database, and generating a reference analysis text based on the question text and each piece of reference document data, wherein the reference analysis text is a text that analyzes the question text based on the reference document data;

obtaining matching weight information corresponding to each piece of document data according to the reference analysis text and each piece of document data;

and determining at least one piece of target document data among the document data based on the matching weight information corresponding to each piece of document data, and generating an answer text corresponding to the question instruction according to the question text and each piece of target document data.

According to a second aspect of embodiments of the present disclosure, an automatic question-answering method is provided, applied to a cloud device, and the method includes:

receiving a question instruction sent by an end-side device, and determining a document database corresponding to the question instruction, wherein the question instruction comprises a question text, and the document database comprises at least one piece of document data;

determining at least one piece of target document data among the document data based on the matching weight information corresponding to each piece of document data, and generating an answer text corresponding to the question instruction according to the question text and each piece of target document data;

and returning the answer text to the end-side device.

According to a third aspect of embodiments of the present specification, there is provided a computing device comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions that, when executed by the processor, perform the steps of the automatic question-answering method described above.

According to a fourth aspect of embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the automatic question-answering method described above.

According to a fifth aspect of embodiments of the present specification, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the automatic question-answering method described above.

In one embodiment of this specification, a question instruction is received and a document database corresponding to the question instruction is determined, wherein the question instruction comprises a question text, and the document database comprises at least one piece of document data; at least one piece of reference document data is determined in the document database, and a reference analysis text is generated based on the question text and each piece of reference document data, wherein the reference analysis text is a text that analyzes the question text based on the reference document data; matching weight information corresponding to each piece of document data is obtained according to the reference analysis text and each piece of document data; and at least one piece of target document data is determined among the document data based on the matching weight information, and an answer text corresponding to the question instruction is generated according to the question text and each piece of target document data.

With the scheme of the embodiments of this specification, a reference analysis text of the question text is obtained, and the matching weight information corresponding to each piece of document data is determined based on that reference analysis text. Because the reference analysis text uses more logical, written-style language than the question itself, it helps select target documents that support a correct reply. The answer corresponding to the question text is then generated from the selected target documents, which improves the accuracy of automatic question answering.
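The overall flow summarized above can be sketched as a short skeleton. This is only an illustrative outline under stated assumptions, not the claimed implementation: the function name `answer_question`, the four injected callables, and the `top_k` parameter are hypothetical, and how each callable works is left open here.

```python
def answer_question(question_text, document_database, select_references,
                    generate_analysis, match_weight, generate_answer, top_k=2):
    """Illustrative four-stage flow; every callable is a hypothetical plug-in.

    document_database maps a document id to its document data.
    """
    # Stage 1: pick reference document data for the question.
    reference_docs = select_references(question_text, document_database)

    # Stage 2: produce the reference analysis text from question + references.
    analysis_text = generate_analysis(question_text, reference_docs)

    # Stage 3: matching weight of every document against the analysis text.
    weights = {
        doc_id: match_weight(analysis_text, doc)
        for doc_id, doc in document_database.items()
    }

    # Stage 4: keep the highest-weighted documents as targets and answer.
    target_ids = sorted(weights, key=lambda d: (-weights[d], d))[:top_k]
    target_docs = [document_database[d] for d in target_ids]
    return generate_answer(question_text, target_docs)
```

The point of the sketch is the ordering: documents are weighted against the analysis text rather than against the raw question text.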

Drawings

FIG. 1 is a flow chart of an automatic question-answering method provided by one embodiment of the present disclosure;

FIG. 2 is a schematic diagram illustrating a calculation method of resolution correlation information according to an embodiment of the present disclosure;

FIG. 3 is a flowchart of an automatic question-answering method applied to a cloud device according to an embodiment of the present disclosure;

FIG. 4 is a block diagram of an automated question-answering system according to one embodiment of the present disclosure;

FIG. 5 is a process flow diagram of a question answering method provided by one embodiment of the present disclosure;

FIG. 6 is a process flow diagram of a legal knowledge question-answering method provided by one embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of an automatic question answering device according to one embodiment of the present disclosure;

FIG. 8 is a block diagram of a computing device provided in one embodiment of the present description.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. However, this specification can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; this specification is therefore not limited by the specific implementations disclosed below.

The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of this specification. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".

Furthermore, it should be noted that, user information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for analysis, stored data, presented data, etc.) according to one or more embodiments of the present disclosure are information and data authorized by a user or sufficiently authorized by each party, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions, and is provided with corresponding operation entries for the user to select authorization or denial.

In one or more embodiments of this specification, a large model refers to a deep learning model with large-scale parameters, typically hundreds of millions, billions, or even trillions of parameters or more. A large model may also be called a foundation model: it is trained on a large-scale unlabeled corpus to produce a pre-trained model with more than one hundred million parameters that can adapt to a wide range of downstream tasks and has good generalization capability. Examples include the large language model (LLM) and the multi-modal pre-training model.

In practical applications, a pre-trained large model can be adapted to different tasks by fine-tuning on a small number of samples. Large models are widely applied in fields such as natural language processing (NLP) and computer vision. In computer vision, tasks include visual question answering (VQA), image captioning (IC), and image generation; in natural language processing, tasks include text-based sentiment classification, text summarization, and machine translation. The main application scenarios of large models include digital assistants, intelligent robots, search, online education, office software, e-commerce, and intelligent design.

First, terms related to one or more embodiments of the present specification will be explained.

Word embedding method: a technique for converting the words in text data into numerical form so that a computer can process language information more effectively. Each word is represented as a real-valued vector such that semantic relationships between words are reflected in the vector space. The vectors usually have low dimensionality, so compared with the original vocabulary space this representation helps improve the efficiency and effectiveness of algorithms. Common word embedding techniques include Word2Vec (word to vector) and Global Vectors for Word Representation (GloVe).

Reciprocal Rank Fusion (RRF): a technique for merging multiple search result lists. Each result is assigned a reciprocal score based on its rank in each list, and the reciprocal scores of the same result are summed, which provides an effective way of combining the results of multiple queries. Because results from different recalls are mapped by rank into the same value range, their scoring standards are unified, which improves the accuracy of the final ranking.

Best Matching 25 (BM25) algorithm: a widely used ranking function for evaluating the relevance between a query and documents in information retrieval systems. The algorithm is based on a probabilistic retrieval framework and considers two factors, term frequency and inverse document frequency. BM25 assigns each term a weight that increases with the term's frequency in the document, but by progressively smaller increments as the frequency grows, so that a frequently repeated term does not dominate the score.

Elasticsearch (ES) library: a library that provides keyword-based search and analysis, designed for fast search, storage, and analysis of large amounts of data. It supports structured and unstructured data and lets users conveniently run and manage complex queries.
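Of the terms defined above, reciprocal rank fusion is compact enough to sketch directly. The following is a minimal illustration, not code from this specification: the function name and the smoothing constant `k=60` (a commonly used default) are assumptions, and the document ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (assumed value; not specified in the text above).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes the reciprocal of (k + rank).
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from a keyword recall and a vector recall.
keyword_ranking = ["doc_b", "doc_a", "doc_c"]
vector_ranking = ["doc_a", "doc_c", "doc_b"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

Because only ranks enter the score, the two recalls need not produce comparable raw scores, which is the "same value range" property described in the definition.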

In the present specification, an automatic question-answering method applied to a cloud device is provided, and the present specification also relates to an automatic question-answering apparatus, a computing device, and a computer-readable storage medium, a computer program product, which are described in detail in the following embodiments one by one.

Referring to fig. 1, fig. 1 shows a flowchart of an automatic question-answering method according to one embodiment of the present specification, which specifically includes the following steps.

Step 102: Receive a question instruction and determine a document database corresponding to the question instruction, wherein the question instruction comprises a question text, and the document database comprises at least one piece of document data.

In practical application, the question instruction is an instruction for acquiring an answer text corresponding to the question text, the document database is a database related to the question instruction, the question text is a text needing to acquire an answer, and the document data is document data composed of knowledge in the field corresponding to the question instruction.

In particular, the question instruction may be understood as an instruction for acquiring answer text, and the question instruction may be sent by a user or may be automatically generated by a program, which is not limited in this specification. Determining the document database to which the question instruction corresponds may be understood as obtaining a document database that more closely matches the knowledge required for the question text carried in the question instruction. A document database is understood to be a database comprising at least one document data corresponding to a domain of knowledge, a plurality of document data comprised in the same document database being document data comprising the same domain of knowledge.

It should be noted that the document texts could be stored directly, but doing so may occupy a large amount of storage space. Optionally, therefore, the document data is data obtained by processing the document text. For example, the document text may be divided into segments and at least one vector obtained for each document by a word embedding method, each vector being document data corresponding to that document. Alternatively, keywords of the document text may be obtained, an index of each document generated from the keywords according to their weights, and the document then processed into compressed data in dictionary form that carries the index. Using a document database of such document data, rather than a database of document texts, compresses the text through vectorization, dictionary processing, or similar means, which reduces the storage space required to execute the method and relieves the storage pressure on the device that runs it.

In one embodiment provided in this specification, a question instruction sent by a user is received, and the question instruction carries the question text "Who are the major vendors in the current global electric automobile market, and how is their market share distributed?" The document database corresponding to the question instruction is an electric automobile industry analysis document database, which includes document data such as sales condition document data, market share document data, consumer preference document data, and technical development document data.

Determining the document database corresponding to the question text narrows the scope within which target documents are searched, which improves the efficiency of determining the target document data corresponding to the question instruction.

Further, the document database is generated by: acquiring at least one document text and a document database description text; processing each document text according to a preset text vector strategy, and generating document data corresponding to each document text, wherein the text vector strategy is used for vectorizing each document text; and determining a document database corresponding to each document text according to the document database description text and each document data.

In practical application, the document database description text is text describing the knowledge field corresponding to each document data in the document database, and is used for determining the document database corresponding to the problem instruction, and the text vector strategy is a strategy for vectorizing each document text.

In particular, a text vector policy may be understood as a policy that compresses document text for reducing the storage pressure of the device running the method. Since the document has a longer text and the length of the transformation vector of the word embedding method is limited, the document text is optionally firstly segmented, then each text segment after segmentation is transformed into a vector, and the transformed multiple vectors are the document data corresponding to the document text.

It should be noted that the document database description text may be obtained by extracting what the document texts have in common through a deep learning model and automatically generating a text that describes the domain knowledge they jointly cover, or it may be entered by the user who uploads the document texts; this specification does not limit this.

In one embodiment provided herein, a sales condition document text, a market share document text, a consumer preference document text, a technical development document text, and so on are received, together with the description text for these document texts: "A repository that specifically stores and provides information about the market conditions of various industries, trend analysis, consumer behavior, competitor analysis, and the like." Each document text is divided into text segments of fixed length, a vector is generated for each text segment by a word embedding method, and the vectors are then combined with the description text to determine the document database.
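The segmentation and vectorization just described can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real word embedding model, and the function names, the 200-character segment length, and the dictionary layout of the database are all illustrative choices rather than details from this specification.

```python
import hashlib
import math


def split_into_segments(text, segment_length=200):
    """Split a document text into consecutive fixed-length character segments."""
    return [text[i:i + segment_length] for i in range(0, len(text), segment_length)]


def toy_embed(segment, dim=64):
    """Hypothetical stand-in for a word embedding model (hashed bag of words)."""
    vector = [0.0] * dim
    for token in segment.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    # Normalize to unit length so vectors are comparable by dot product.
    return [v / norm for v in vector] if norm else vector


def build_document_database(description_text, document_texts, segment_length=200):
    """Bundle the description text with one list of segment vectors per document."""
    return {
        "description": description_text,
        "documents": {
            name: [toy_embed(seg) for seg in split_into_segments(text, segment_length)]
            for name, text in document_texts.items()
        },
    }
```

Splitting on a fixed character count can cut a word in half at a segment boundary; a real system might split on sentence or token boundaries instead.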

By acquiring the document database description text describing the knowledge field covered by the document database, the efficiency of determining the document database corresponding to the problem text later can be higher, and the efficiency of determining the target document data corresponding to the problem instruction can be improved.

Further, receiving a question instruction and determining a document database corresponding to the question instruction includes: parsing the question instruction to determine the question text; when a corresponding document database needs to be determined for the question text, obtaining at least one initial document database and the initial document database description text corresponding to each initial document database; calculating library description relevance information between the question text and each initial document database description text, and determining a target document database description text according to the library description relevance information corresponding to each initial document database description text; and determining the initial document database corresponding to the target document database description text as the document database corresponding to the question instruction.

In practical application, an initial document database is a document database corresponding to the knowledge of one field related to a specific scenario, the initial document database description text is the description text corresponding to each such database, the library description relevance information is the degree of relevance between the question text and each initial document database description text, and the target document database description text is the description text of the document database that is most relevant to the question text.

Specifically, determining whether a corresponding document database needs to be determined for the question text can be understood as determining whether the question can only be answered with specific information from a specific scenario.

In one embodiment provided in this specification, the question text obtained is "Who are the major vendors in the current global electric automobile market, and how is their market share distributed?" Because the question involves expertise in the electric automobile scenario, a corresponding document database needs to be determined for the question text.

In another embodiment provided in this specification, the question text obtained is "If you fold a square of paper once, what is the resulting shape?" Because the question does not involve knowledge from any particular scenario and can be answered by logical inference alone, no corresponding document database needs to be determined for the question text.

The document database most relevant to the field of the question text is determined from the degree of relevance (the library description relevance information) between the question text and the description text of each initial document database. Compared with evaluating every piece of document data in every database, this effectively reduces the number of comparisons, which improves both the efficiency of this determination and the efficiency of determining the target document data corresponding to the question instruction.
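A minimal sketch of selecting a document database by description relevance is shown below. It rests on assumptions that are not in the text above: relevance is approximated by bag-of-words cosine similarity, and a hypothetical `threshold` doubles as the check for whether any database is needed at all. The function names and sample descriptions are illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a crude stand-in for a real text representation."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_document_database(question_text, descriptions, threshold=0.1):
    """Return the name of the most relevant database, or None if none qualifies.

    descriptions maps a database name to its description text.
    threshold is an assumed cut-off below which no database is used.
    """
    question_vec = bag_of_words(question_text)
    relevance = {
        name: cosine(question_vec, bag_of_words(text))
        for name, text in descriptions.items()
    }
    best = max(relevance, key=relevance.get, default=None)
    if best is None or relevance[best] < threshold:
        return None
    return best
```

Only one comparison per database description is made, which is the efficiency argument given above: the individual documents are not touched until a database has been chosen.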

Step 104: Determine at least one piece of reference document data in the document database, and generate a reference analysis text based on the question text and each piece of reference document data, wherein the reference analysis text is a text that analyzes the question text based on the reference document data.

In practical application, the reference document data is the document data used to generate the reference analysis text, and the reference analysis text is the analysis of the question text made on the basis of the reference document data.

Specifically, reference document data may be understood as document data that is relatively relevant to the question text, and the reference analysis text may be understood as an analysis of the question text that is obtained automatically from the reference document data. The reference document data may be determined by identifying the category of the question text with a pre-trained classification model and then selecting the document data that corresponds to the question text according to preset category information for each piece of document data, or by determining the similarity between the question text and each piece of document data and then selecting the reference document data according to that similarity.

In one embodiment provided in this specification, continuing the example above, the question text mainly concerns the market situation of each manufacturer in the electric automobile field, so the reference document data used to analyze the question text is determined to be the sales condition document data, the market share document data, and the consumer preference document data.

Analyzing the question text with the reference document data makes the resulting reference analysis text more accurate. This in turn improves the accuracy with which target document data is determined from the reference analysis text, and therefore the accuracy of the automatically generated answer.

Further, generating a reference analysis text based on the question text and each piece of reference document data includes: obtaining at least one reference document text according to the reference document data; generating a text to be analyzed based on the question text and each reference document text, wherein the text to be analyzed is a prompt text that causes an analysis generation model to generate the reference analysis text corresponding to the question text; and inputting the text to be analyzed into the analysis generation model and obtaining the reference analysis text output by the analysis generation model.

In practical application, the reference document text is the text corresponding to the reference document data, and the text to be analyzed is the text that is input to the analysis generation model so that the model outputs the reference analysis text.

Specifically, the text to be analyzed may be generated by directly concatenating the reference document text and the question text and inputting the concatenated text to the analysis generation model so that the model generates the reference analysis text for the question text, or by combining the reference document text and the question text through an analysis generation prompt template; this specification does not limit this.

It should be noted that the analysis generation model may be understood as any model capable of text processing. In view of the accuracy of the output reference analysis text, a large language model may optionally be used as the analysis generation model to process the text to be analyzed.

In one embodiment provided in this specification, the generation of the reference analysis text is described for the case in which the text to be analyzed is produced by combining the reference document text and the question text through an analysis generation prompt template. The template is: "Question text: [input question text here] Related documents: [input reference document text here] Requested analysis: Based on the question text and related documents above, provide a detailed analysis that explains the background of the question, the key factors involved, and possible answers or solutions." The question text and the reference document text are inserted into the prompt template to obtain the text to be analyzed, which is then input into the analysis generation model to obtain the reference analysis text.

Because of server storage constraints, the reference document data is not stored on the server directly as text. The reference document data therefore needs to be converted into reference document text first, and the text to be analyzed is then generated from each reference document text and the question text, so that the analysis generation model can generate the analysis corresponding to the question more accurately.
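The prompt-template approach described in this step can be sketched as follows. The template wording follows the embodiment above; everything else is an assumption made for illustration: the function names, the numbering of reference documents, and the `generate` callable, which is a hypothetical wrapper around whatever large language model is used.

```python
# Template wording follows the embodiment; placeholder names are illustrative.
ANALYSIS_PROMPT_TEMPLATE = (
    "Question text: {question}\n"
    "Related documents:\n{documents}\n"
    "Requested analysis: Based on the question text and related documents above, "
    "provide a detailed analysis that explains the background of the question, "
    "the key factors involved, and possible answers or solutions."
)


def build_text_to_be_analyzed(question_text, reference_document_texts):
    """Insert the question and the reference document texts into the template."""
    documents = "\n".join(
        f"[{index}] {text}"
        for index, text in enumerate(reference_document_texts, start=1)
    )
    return ANALYSIS_PROMPT_TEMPLATE.format(question=question_text, documents=documents)


def generate_reference_analysis(question_text, reference_document_texts, generate):
    """Run the analysis generation model on the text to be analyzed.

    generate: hypothetical callable that takes a prompt string and returns
    the model's output string.
    """
    prompt = build_text_to_be_analyzed(question_text, reference_document_texts)
    return generate(prompt)
```

Passing the model in as a callable keeps the sketch independent of any particular model API.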

Further, determining at least one piece of reference document data in the document database includes: generating question relevance information corresponding to each piece of document data according to the question text and each piece of document data, wherein the question relevance information characterizes the degree of relevance between that document data and the question text; obtaining reference order information corresponding to each piece of document data based on the question relevance information; and determining at least one piece of reference document data according to the reference order information corresponding to each piece of document data.

In practical application, the question relevance information reflects the degree of matching between the document data and the question text, and the reference order information represents where a piece of document data ranks among all the document data by its relevance to the question text.

Specifically, determining the question relevance of each document may be understood as calculating the degree of relevance between each document and the question text. The question relevance may be obtained in any manner that determines the degree of relevance between a document and the question text; this specification does not limit this.

In one embodiment provided in this specification, the keyword similarity between the question text and each document is used as the question relevance information. The keyword similarity may be calculated by, for example, the BM25 (Best Matching 25) algorithm, or the text corresponding to each piece of document data may be added directly to an ES (Elasticsearch) library and the keyword similarity obtained with the method that the ES library provides. The keyword similarity obtained for each document is then used to determine its question relevance.
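As an illustration of keyword similarity, here is a minimal BM25 sketch in plain Python. It is not code from this specification and not the Elasticsearch implementation: the parameter values `k1=1.5` and `b=0.75` are common defaults assumed here, the inverse document frequency uses the non-negative `log(1 + ...)` form, and repeated query terms are counted once for simplicity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokenizer; real systems use language-specific analyzers."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(question_text, document_texts, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given question text."""
    docs = [tokenize(text) for text in document_texts]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(question_text))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturates: each extra occurrence adds less.
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores
```

The saturating term-frequency factor is what keeps a heavily repeated word from dominating the score, as noted in the BM25 definition earlier.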

In another embodiment provided in the present specification, the text feature similarity between the question text and each document is used as the problem correlation information. For example, any text processing model may be used to extract the text features of the question text and of the document text, and the similarity between these text features is calculated; alternatively, the text features of the document data produced by the text processing model may be added to an ES library, and the feature similarity between each document and the question text obtained directly with the feature-similarity method provided by the ES library. The feature similarity corresponding to each document is then used to determine the problem correlation information.
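The feature-similarity option can be sketched as a cosine similarity between feature vectors. In the sketch below, `embed` is a deliberately simple stand-in (character trigram counts) for "any text processing model"; this stand-in and the function names are assumptions for illustration, and a real system would replace `embed` with a call to an embedding model.

```python
import math
from collections import Counter


def embed(text, n=3):
    # Stand-in for a text processing model (assumption): sparse vector of
    # character n-gram counts over whitespace-normalized, lowercased text.
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(max(len(s) - n + 1, 0)))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def feature_similarities(question, documents):
    """Return one feature-similarity score per document."""
    q = embed(question)
    return [cosine(q, embed(d)) for d in documents]
```

Only the `cosine` step is meant to carry over to a real deployment; the quality of the similarity depends entirely on the model that produces the feature vectors.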

It should be noted that the problem correlation information between each document and the question text may also be determined by using both the keyword similarity and the feature similarity between that document and the question text.

By calculating the degree of association between each document data and the problem text, sorting the document data by that degree of association, and taking the document data with a higher degree of association with the problem text as the reference document data, the document data most closely related to the problem text are the ones used to analyze it, which further improves the accuracy of the reference analysis text.
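The ranking and selection just described can be sketched as follows. The function name, the dictionary representation, and the cutoff `top_k` are illustrative assumptions; the specification does not fix how many reference documents are selected.

```python
def select_reference_documents(relevance_by_doc, top_k=3):
    """Rank documents by problem correlation and pick the top_k as references.

    relevance_by_doc maps a document ID to its problem correlation score.
    Returns (reference_ids, order), where order maps each document ID to its
    1-based rank, playing the role of the reference sequence information.
    """
    ranked = sorted(relevance_by_doc.items(), key=lambda item: item[1], reverse=True)
    order = {doc_id: rank for rank, (doc_id, _) in enumerate(ranked, start=1)}
    reference_ids = [doc_id for doc_id, _ in ranked[:top_k]]
    return reference_ids, order
```

For example, with scores `{"a": 0.2, "b": 0.9, "c": 0.5}` and `top_k=2`, documents `b` and `c` become the reference document data.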

Further, generating problem correlation information corresponding to each document data according to the problem text and each document data, including: acquiring problem keyword attribute information corresponding to the problem text; determining first document data, wherein the first document data is any one of the document data; acquiring first document keyword attribute information corresponding to the first document data, and generating problem keyword relevance information based on the problem keyword attribute information and the first document keyword attribute information; acquiring problem feature information corresponding to the problem text and first document feature information corresponding to the first document data, and generating problem feature correlation information based on the problem feature information and the first document feature information; and acquiring the problem correlation information corresponding to the first document data according to the problem keyword correlation information and the problem feature correlation information.

In practical application, the problem keyword attribute information represents the attributes of the keywords in the question text, and the document keyword attribute information represents the attributes of the keywords in the document data. The problem keyword correlation information is the degree of correlation between the document data and the question text in terms of keywords. The problem feature information is the text feature information corresponding to the question text, and the document feature information is the text feature information of the document text corresponding to the document data. The problem feature correlation information is the degree of correlation between the document data and the question text in terms of text features.

Specifically, the problem correlation information may be obtained by weighting and summing the problem keyword correlation information and the problem feature correlation information according to preset weights; alternatively, the two may be combined using RRF (Reciprocal Rank Fusion). Either way, the result is the problem correlation information that characterizes the degree of correlation between the document data and the problem text.

Because the degree of correlation between the document data and the question text is determined from both the keyword correlation and the text feature correlation, the broad coverage of keyword matching and the deeper semantic understanding of text feature matching are used at the same time, which provides more comprehensive and accurate retrieval results and makes the calculated degree of correlation between the document data and the question text more accurate.
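The two fusion options, a preset-weight sum and Reciprocal Rank Fusion, can be sketched as below. The weights and the RRF constant `k=60` are illustrative assumptions (60 is a commonly used default, not a value given in the specification). Because keyword scores and feature scores usually lie on different scales, a weighted sum would in practice normally be applied after normalizing both, whereas RRF uses only ranks and needs no normalization.

```python
def weighted_sum(keyword_scores, feature_scores, w_keyword=0.5, w_feature=0.5):
    """Fuse two score dicts (doc ID -> score) with preset weights."""
    return {
        doc_id: w_keyword * keyword_scores[doc_id] + w_feature * feature_scores[doc_id]
        for doc_id in keyword_scores
    }


def ranks(scores):
    """Map each doc ID to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: rank for rank, doc_id in enumerate(ordered, start=1)}


def reciprocal_rank_fusion(score_dicts, k=60):
    """Fuse several score dicts by summing 1 / (k + rank) for each document."""
    fused = {}
    for scores in score_dicts:
        for doc_id, rank in ranks(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused
```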

Step 106: acquiring matching weight information corresponding to each document data according to the reference analysis text and each document data.

In practical application, the matching weight information is information representing the matching degree between the document data and the problem instruction.

Specifically, the matching weight information may be understood as the importance of each document data for answering the question instruction. The matching weight information of each document data is determined with the help of the reference analysis text: by further analyzing the question text, a reference analysis text is obtained that describes the question more logically and more comprehensively, and the importance of each document data for the answer to the question instruction is then calculated according to that reference analysis text. When the document data to be referenced for generating the answer are subsequently selected, better-matching document data can therefore be chosen, which improves the accuracy of automatic question answering.

Further, according to the reference analysis text and each document data, obtaining matching weight information corresponding to each document data includes: according to the reference analysis text and each document data, obtaining analysis correlation information corresponding to each document data, wherein the analysis correlation information characterizes the correlation degree between each document data and the reference analysis text; and determining matching weight information corresponding to each document data based on the problem correlation information and the analysis correlation information corresponding to each document data.

In practical application, the analysis correlation information reflects the degree of matching between the document data and the reference analysis text.

Specifically, determining the analysis correlation information of each document can be understood as calculating the degree of association between each document and the reference analysis text. The analysis correlation information may be obtained in any manner that determines the degree of association between a document and the reference analysis text, which is not limited in this specification.

Referring to fig. 2, fig. 2 is a schematic diagram of a manner of calculating the analysis correlation information provided in an embodiment of the present disclosure. The embodiment includes three pieces of document data, namely document data 1, document data 2, and document data 3. The analysis correlation information of each of the three is calculated against the reference analysis text, yielding analysis correlation information 1 corresponding to document data 1, analysis correlation information 2 corresponding to document data 2, and analysis correlation information 3 corresponding to document data 3.

In one embodiment provided in the present specification, the keyword similarity between the reference analysis text and each document is used as the analysis correlation information. The keyword similarity may be calculated with, for example, the BM25 algorithm; alternatively, the text corresponding to each document data may be added to the ES library and the keyword similarity obtained directly with the keyword-similarity method provided by the ES library. The keyword similarity between each document and the reference analysis text obtained in this way is then used to determine the analysis correlation information.

In another embodiment provided in the present specification, the text feature similarity between the reference analysis text and each document is used as the analysis correlation information. For example, any text processing model may be used to extract the text features of the reference analysis text and of the document text, and the similarity between these text features is calculated; alternatively, the text features of the document data produced by the text processing model may be added to an ES library, and the feature similarity between each document and the reference analysis text obtained directly with the feature-similarity method provided by the ES library. The feature similarity corresponding to each document is then used to determine the analysis correlation information.

It should be noted that the analysis correlation information between each document and the reference analysis text may also be determined by using both the keyword similarity and the feature similarity between that document and the reference analysis text.

Determining the matching weight information corresponding to each document data according to the problem correlation information and the analysis correlation information corresponding to that document data can be understood as determining the matching weight information of each document from the problem text, the reference analysis text, and the document data. Specifically, in one manner, the document data are sorted once according to their problem correlation information and once according to their analysis correlation information, the document data ranked near the top in both orderings (that is, the document data whose degree of correlation with both the problem text and the reference analysis text is greater than that of the other documents) are selected, and the matching weight information of each selected document data is determined according to its positions in the two rankings. In another manner, the weighted sum of the problem correlation information and the analysis correlation information corresponding to a single document data is calculated, and the result is used as the matching weight information corresponding to that document data. This specification does not limit the manner in any way.
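The rank-based manner can be sketched as follows. The specification only says that the weight is determined from the rankings; the cutoff `top_n` and the use of summed reciprocal ranks as the weight are assumptions made here purely for illustration.

```python
def rank_positions(scores):
    """Map each doc ID to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: rank for rank, doc_id in enumerate(ordered, start=1)}


def matching_weights_by_rank(problem_rel, analysis_rel, top_n=2):
    """Keep documents ranked within top_n in BOTH orderings.

    problem_rel and analysis_rel map doc ID -> correlation score.
    The weight of a kept document is 1/rank_problem + 1/rank_analysis
    (an illustrative choice, not prescribed by the specification).
    """
    p_rank = rank_positions(problem_rel)
    a_rank = rank_positions(analysis_rel)
    weights = {}
    for doc_id in problem_rel:
        if p_rank[doc_id] <= top_n and a_rank[doc_id] <= top_n:
            weights[doc_id] = 1.0 / p_rank[doc_id] + 1.0 / a_rank[doc_id]
    return weights
```

A document that ranks highly against only one of the two texts is excluded, which reflects the requirement that the selected documents be top-ranked in both orderings.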

Because the reference analysis text is generated by the large model from the question text and the reference document data, determining the importance of each document data for answer generation from both its degree of association with the question text and its degree of association with the reference analysis text can be understood as using the generative reasoning capability of the large model to obtain the documents that are better suited for the answer model to generate a correct answer. Moreover, because the reference analysis text is itself generated by a large model, this is equivalent to having one large model select the target document data for the answer model, which can be understood as homogeneous communication between models. Compared with manually collecting data, training a document selection model on the collected data, and selecting the target document data with that model, such homogeneous communication between models can select documents that are more favorable for the model to generate a correct answer.

Further, determining matching weight information corresponding to each document data based on the problem correlation information and the analysis correlation information corresponding to each document data includes: determining third document data, and determining third problem correlation information and third analysis correlation information corresponding to the third document data, wherein the third document data is any one of the document data; and acquiring matching weight information corresponding to the third document data based on the third problem correlation information and the third analysis correlation information.

Specifically, the matching weight information corresponding to a single document data may be determined from its problem correlation information and analysis correlation information by calculating the weighted sum of the two and using the result as the matching weight information corresponding to that document data, or by calculating the weighted sum of only those of the problem correlation information and the analysis correlation information that are larger than a preset threshold and using the result as the matching weight information corresponding to that document data, which is not limited in this specification.
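Both per-document variants can be sketched in one function. The weights `0.4` and `0.6` are arbitrary illustrative values, and treating a score that does not exceed the threshold as contributing zero is one possible reading of the thresholded variant, stated here as an assumption.

```python
def matching_weight(problem_rel, analysis_rel,
                    w_problem=0.4, w_analysis=0.6, threshold=None):
    """Matching weight of a single document from its two correlation scores.

    With threshold=None this is a plain weighted sum. With a threshold,
    only scores strictly greater than it contribute (assumed reading).
    """
    p, a = problem_rel, analysis_rel
    if threshold is not None:
        p = p if p > threshold else 0.0
        a = a if a > threshold else 0.0
    return w_problem * p + w_analysis * a
```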

By determining the matching weight information corresponding to the document data through the problem correlation information and the analysis correlation information corresponding to the single document data, the document which is more favorable for the model to generate the correct answer can be selected.

Further, according to the reference analysis text and each document data, obtaining analysis correlation information corresponding to each document data includes: generating an analysis text to be processed according to the reference analysis text, and acquiring analysis keyword attribute information corresponding to the analysis text to be processed, wherein the analysis text to be processed is a text for calculating the association degree between document data and the reference analysis text; determining second document data and acquiring second document keyword attribute information corresponding to the second document data, wherein the second document data is any one of the document data; generating analysis keyword correlation information based on the analysis keyword attribute information and the second document keyword attribute information; acquiring analysis feature information corresponding to the analysis text to be processed and second document feature information corresponding to the second document data, and generating analysis feature correlation information based on the analysis feature information and the second document feature information; and acquiring analysis correlation information corresponding to the second document data according to the analysis keyword correlation information and the analysis feature correlation information.

In practical application, the to-be-processed analysis text is a text for calculating the degree of association between each document data and the reference analysis text. The analysis keyword attribute information represents the attributes of the keywords in the to-be-processed analysis text, and the analysis keyword correlation information is the degree of correlation between the document data and the to-be-processed analysis text in terms of keywords. The analysis feature information is the text feature information corresponding to the to-be-processed analysis text, and the document feature information is the text feature information of the document text corresponding to the document data. The analysis feature correlation information is the degree of correlation between the document data and the to-be-processed analysis text in terms of text features.

Specifically, the to-be-processed analysis text may be understood as the text used to calculate the degree of correlation between the document data and the analysis of the problem text. The to-be-processed analysis text may be the reference analysis text itself, a text generated by splicing the reference analysis text and the problem text, or a text obtained by further processing the reference analysis text with a large language model to highlight its key points, which is not limited in the present specification.

Because the degree of correlation between the document data and the reference analysis text is determined from the correlation between the document data and the to-be-processed analysis text in terms of both keywords and text features, the broad coverage of keyword matching and the deeper semantic understanding of text feature matching are used together, which provides comprehensive and accurate retrieval results and makes the calculated degree of correlation between the document data and the reference analysis text more accurate.

Further, generating the to-be-processed parsed text according to the reference parsed text includes: splicing the problem text and the reference analysis text to generate an analysis text to be processed; or determining the reference analysis text as the analysis text to be processed.

Determining document similarity using both the question text and the reference analysis text provides more comprehensive content coverage and stronger semantic understanding than using the reference analysis text alone. This approach combines the specific information need expressed by the question with the detailed background provided by the analysis, and therefore covers the related topics and concepts more completely. This improves the calculated degree of correlation between the document data and the question instruction, so that document data with a higher degree of correlation are selected to generate the answer text, which helps ensure that the result is highly relevant to the actual needs of the user.
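The two ways of generating the to-be-processed analysis text can be sketched in a few lines. The function name, the newline separator, and the choice to place the question text first are illustrative assumptions.

```python
def build_pending_analysis_text(reference_analysis, question=None, separator="\n"):
    """Build the text that is scored against each document.

    If a question text is given, splice it with the reference analysis
    text; otherwise use the reference analysis text as is.
    """
    if question:
        return question + separator + reference_analysis
    return reference_analysis
```

The returned text is then used in place of the question text when computing keyword and feature correlation against each document data.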

Step 108: determining at least one target document data in each document data based on the matching weight information corresponding to each document data, and generating an answer text corresponding to the question instruction according to the question text and each target document data.

In practical application, the target document data is the document data used to generate the answer text for the question instruction, and the answer text is the text that answers the question described by the question text in the question instruction.

Specifically, the target document data may be understood as document data that is relatively matched with the question instruction, and the answer text may be understood as text output by the model for the question text according to the target document data.

Further, generating an answer text corresponding to the question instruction according to the question text and each target document data includes: acquiring at least one target document text according to each target document data; generating a text to be answered based on the question text and each target document text, wherein the text to be answered is a prompt text that enables a question-answering model to generate the answer text corresponding to the question text; and inputting the text to be answered to the question-answering model, and obtaining the answer text output by the question-answering model.

In practical application, the target document text is the text corresponding to the target document data, the text to be answered is the text for inputting a question-answering model so that the question-answering model outputs the answer text, and the question-answering model is a model for outputting the answer text aiming at the question instruction.

Specifically, the question-answering model may be understood as any text processing model capable of processing text. The question-answering model may have the same model structure and parameters as the above-mentioned parsing generation model, or a deep learning model different from the above-mentioned parsing generation model may be used as the question-answering model. In consideration of the accuracy of the output answer text, a model whose parameters have been adjusted for the target scene is optionally used. The structure and parameters of the model are not limited in this specification.

It should be noted that the text to be answered may be generated by directly splicing the question text and the target document text and inputting the spliced text to the question-answering model, so that the question-answering model generates the answer text for the question text; it may also be generated by combining the target document text and the question text through an answer generation prompt word template, or in other manners, which is not limited in this specification.

In one embodiment provided in the present specification, the text to be answered is generated by combining the target document text and the question text through an answer generation prompt word template, and the answer text is then generated by the question-answering model. The corresponding answer generation prompt word template is: "Question text: [input question text here] Related document: [input target document text here] Request parsing: generate a detailed answer based on the question text and the related documents. The answer should specifically and accurately explain the answer to the question and provide supporting evidence or arguments in combination with the information in the document data." The question text and the target document text are inserted into the prompt word template to obtain the text to be answered, and the text to be answered is input into the question-answering model to obtain the answer text.
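Filling the prompt word template can be sketched as simple string formatting. The template wording below paraphrases the template quoted above; the constant name, the function name, and the numbered layout of the documents are illustrative assumptions, and the call to the question-answering model is deliberately left out because the specification does not fix a particular model interface.

```python
ANSWER_PROMPT_TEMPLATE = (
    "Question text: {question}\n"
    "Related documents:\n{documents}\n"
    "Request parsing: generate a detailed answer based on the question text "
    "and the related documents. The answer should specifically and accurately "
    "explain the answer to the question and provide supporting evidence or "
    "arguments in combination with the information in the documents."
)


def build_answer_prompt(question, target_texts):
    """Insert the question text and target document texts into the template."""
    documents = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(target_texts, start=1)
    )
    return ANSWER_PROMPT_TEMPLATE.format(question=question, documents=documents)
```

The string returned by `build_answer_prompt` corresponds to the text to be answered, which would then be passed to whichever question-answering model is deployed.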

By applying the scheme of the embodiment of the specification, the document database corresponding to the question text is determined first, which narrows the range in which the target documents must be identified. The reference documents are then determined according to the problem correlation information between the question text and the document data in the document database, and the reference analysis text of the question text is determined according to the reference documents, which improves the accuracy of the reference analysis text and, in turn, the accuracy of determining the target document data according to it. Next, the analysis correlation information corresponding to each document data is generated based on the reference analysis text produced by the large model, and the matching weight corresponding to each document is determined according to its problem correlation information and analysis correlation information. Because the reasoning and analysis capability of the large model yields a reference analysis text that is more logical and accurate, the target document data determined with the reference analysis text are more favorable for the large model to generate a correct reply, which further improves the accuracy of the model's answers.

Corresponding to the above method embodiment, the present disclosure further provides an automatic question and answer method embodiment applied to the cloud device, referring to fig. 3, and fig. 3 shows a flowchart of an automatic question and answer method applied to the cloud device according to one embodiment of the present disclosure, which specifically includes the following steps.

Step 302: receiving a problem instruction sent by the end-side device, and determining a document database corresponding to the problem instruction, wherein the problem instruction comprises a problem text, and the document database comprises at least one document data.

Step 304: determining at least one reference document data in the document database, and generating a reference analysis text based on the problem text and each reference document data, wherein the reference analysis text is a text for analyzing the problem text based on the reference document data.

Step 306: acquiring matching weight information corresponding to each document data according to the reference analysis text and each document data.

Step 308: determining at least one target document data in each document data based on the matching weight information corresponding to each document data, and generating an answer text corresponding to the question instruction according to the question text and each target document data.

Step 310: returning the answer text to the end-side device.
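The flow of steps 302 to 310 on the cloud device can be sketched end to end as follows. All function and parameter names are hypothetical. The scoring function and the two model calls are injected as callables because the specification leaves them open, and the weighted sum with equal weights is only one of the permitted ways of deriving the matching weight.

```python
def answer_question(question, documents, score, generate_analysis, generate_answer,
                    n_reference=2, n_target=2, w_problem=0.5, w_analysis=0.5):
    """Sketch of the cloud-side flow; returns the answer text for the caller.

    score(text, document) -> float       correlation between a text and a document
    generate_analysis(question, refs)    large-model call producing the reference analysis text
    generate_answer(question, targets)   question-answering model call
    """
    # Step 304: choose reference documents by problem correlation, then
    # have the model analyze the question against them.
    problem_rel = [score(question, d) for d in documents]
    by_problem = sorted(range(len(documents)), key=lambda i: problem_rel[i], reverse=True)
    references = [documents[i] for i in by_problem[:n_reference]]
    analysis = generate_analysis(question, references)

    # Step 306: matching weight from problem and analysis correlation.
    analysis_rel = [score(analysis, d) for d in documents]
    weights = [w_problem * p + w_analysis * a for p, a in zip(problem_rel, analysis_rel)]

    # Step 308: choose target documents by matching weight and generate the answer.
    by_weight = sorted(range(len(documents)), key=lambda i: weights[i], reverse=True)
    targets = [documents[i] for i in by_weight[:n_target]]

    # Step 310: the caller returns this answer text to the end-side device.
    return generate_answer(question, targets)
```

Injecting the callables keeps the sketch independent of any particular retrieval library or model service, so stubs can be substituted for testing.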

Further, before receiving the problem instruction sent by the end-side device, the method further includes: receiving at least one initial document database sent by the end-side device; or receiving at least one document text sent by the end-side device, and generating at least one initial document database according to each document text.

Specifically, considering that storing each document takes a long time, the conversion of the documents into document data may be completed on the end-side device before the question instruction is received, that is, before an answer to the question is generated, and the user then transmits the document data to the cloud device. Considering the computing power required to process document text into document data, the end-side device may instead send the document text to the cloud device, and the cloud device completes the processing into document data, which reduces the computing power required of the end-side device and improves the efficiency of answering questions.

The above is a schematic scheme of an automatic question-answering method applied to cloud devices in this embodiment. It should be noted that, the technical solution of the automatic question-answering method applied to the cloud device and the technical solution of the automatic question-answering method described above belong to the same concept, and the details of the technical solution of the automatic question-answering method applied to the cloud device, which are not described in detail, can be referred to the description of the technical solution of the automatic question-answering method described above.

By applying the scheme of the embodiment of the specification, receiving the question instruction and the initial document database sent by the end-side device improves the accuracy of determining the document database corresponding to the question text. Because the question-answering steps, which place higher demands on hardware, run on the cloud device, the end-side device only sends the question instruction and the initial document database or the document text, which effectively reduces the resource consumption of the end-side device and improves the efficiency of answering questions. Because the cloud device generally has a higher hardware level, a large model with higher computing power requirements can be deployed to answer questions, which improves the accuracy of the model's answers. In addition, because the matching weight information corresponding to each document data is determined based on the reference analysis text, the more logical language of the reference analysis text can be used to select documents that better help the large model generate a correct reply, which further improves the accuracy of the model's answers.

Referring to fig. 4, fig. 4 illustrates an architecture diagram of an automatic question-answering system provided in one embodiment of the present description, which may include a client 100 and a server 200; the client 100 is configured to send a question instruction to the server 200, where the question instruction includes a question text; the server 200 is configured to receive the question instruction and determine a document database corresponding to the question instruction, where the question instruction includes a question text, and the document database includes at least one document data; determining at least one reference document data in the document database, and generating a reference analysis text based on the problem text and each reference document data, wherein the reference analysis text is a text for analyzing the problem text based on the reference document data; according to the reference analysis text and each document data, obtaining matching weight information corresponding to each document data; determining at least one target document data in each document data based on the matching weight information corresponding to each document data, and generating an answer text corresponding to the question instruction according to the question text and each target document data; sending answer text to the client 100; the client 100 is further configured to receive the answer text sent by the server 200.

By applying the scheme of the embodiment of the specification, receiving the question instruction and the initial document database sent by the client improves the accuracy of determining the document database corresponding to the question text. Because the question-answering steps, which place higher demands on hardware, run on the server, the client only sends the question instruction and the initial document database or the document text, which effectively reduces the resource consumption of the client and improves the efficiency of answering questions. Because the server usually has a higher hardware level, a large model with higher computing power requirements can be deployed to answer questions, which improves the accuracy of the model's answers. In addition, because the matching weight information corresponding to each document data is determined based on the reference analysis text, the more logical language of the reference analysis text can be used to select documents that better help the large model generate a correct reply, which further improves the accuracy of the model's answers.

The automatic question and answer system may include a plurality of clients 100 and a server 200, wherein the clients 100 may be referred to as end-side devices and the server 200 may be referred to as cloud devices. Communication connection can be established between the plurality of clients 100 through the server 200, in the automatic question-answering scenario, the server 200 is used to provide an automatic question-answering service between the plurality of clients 100, and the plurality of clients 100 can respectively serve as a transmitting end or a receiving end, and communication is realized through the server 200.

The user may interact with the server 200 through the client 100 to receive data transmitted from other clients 100, or transmit data to other clients 100, etc. In the automatic question-answer scenario, the user may issue a data stream to the server 200 through the client 100, and the server 200 generates an answer text according to the data stream and pushes the answer text to other clients establishing communication.

The client 100 and the server 200 are connected through a network. The network provides a medium for a communication link between the client 100 and the server 200. The network may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables. The data transmitted by the client 100 may need to be encoded, transcoded, compressed, or otherwise processed before being sent to the server 200.

The client 100 may be a browser, an app (application), a web application such as an H5 (HyperText Markup Language, 5th edition) application, a light application (also called an applet, a lightweight application), a cloud application, or the like. The client 100 may be developed on the basis of a software development kit (SDK) of a corresponding service provided by the server 200, for example a real-time communication (RTC) SDK. The client 100 may be deployed in an electronic device and may need to rely on the device, or on certain apps in the device, to run. The electronic device may, for example, have a display screen and support information browsing, and may be a personal mobile terminal such as a mobile phone, a tablet computer, or a personal computer. Various other types of applications are also commonly deployed in electronic devices, such as human-machine conversation applications, model training applications, text processing applications, web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.

The server 200 may include a server that provides various services, such as a server that provides communication services for multiple clients, a server for background training that supports a model used on a client, a server that processes data sent by a client, and so on. It should be noted that the server 200 may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. The server may also be a server of a distributed system or a server that incorporates a blockchain. The server may also be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology.

It should be noted that, the automatic question-answering method provided in the embodiments of the present disclosure is generally executed by the server, but in other embodiments of the present disclosure, the client may also have a similar function to the server, so as to execute the automatic question-answering method provided in the embodiments of the present disclosure. In other embodiments, the automatic question answering method provided in the embodiments of the present disclosure may be performed by the client and the server together.

The automatic question answering method provided in the present specification will be further described with reference to fig. 5 by taking an application of the automatic question answering method to question answering as an example. Fig. 5 is a flowchart of a processing procedure of a question answering method according to an embodiment of the present disclosure, which specifically includes the following steps.

Step 502: generating at least one initial document database from the document text.

Step 504: receiving a question instruction and parsing the question instruction to obtain a question text.

Step 506: determining whether answering the question text requires documents from a document database; if so, performing step 508; if not, performing step 524.

Step 508: determining whether the question text has a corresponding document database; if so, performing step 510; if not, performing step 526.

Step 510: determining the document database corresponding to the question text.

Step 512: determining question relevance information of each document in the document database according to the question text.

Step 514: determining, according to the question relevance information of each document, the reference document data used for generating the reference parsed text, and generating the reference parsed text according to each reference document data.

Step 516: determining parsing relevance information of each document in the document database according to the reference parsed text.

Step 518: generating matching weight information of each document by combining the question relevance information and the parsing relevance information of that document.

Step 520: selecting the preset number of document data ranked highest by matching weight information as target document data.

Step 522: the answer model outputs an answer to the question text according to the question text and the target document text corresponding to each target document data.

Step 524: the answer model outputs an answer to the question text directly according to the question text.

Step 526: directly outputting: "Sorry, existing knowledge cannot support solving this problem."

By applying the scheme of the embodiments of this specification, the question text input by the user is examined to determine how it should be answered, which effectively avoids determining a corresponding document database and retrieving target documents for every question.
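The branching at steps 506 and 508 can be illustrated with a minimal sketch. The `Route` labels and the `choose_route` function are hypothetical names introduced only for illustration; how the two conditions are actually evaluated (for example by a large model) is outside this sketch.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    """Hypothetical labels for the three outcomes of the Fig. 5 flow."""
    RETRIEVE_THEN_ANSWER = "steps 510-522"
    ANSWER_DIRECTLY = "step 524"
    REFUSE = "step 526"


def choose_route(needs_documents: bool, matched_database: Optional[str]) -> Route:
    # Step 506: a question that needs no document goes straight to the answer model.
    if not needs_documents:
        return Route.ANSWER_DIRECTLY
    # Step 508: a document is needed, but no document database corresponds to the question.
    if matched_database is None:
        return Route.REFUSE
    # Steps 510-522: retrieve target documents, then answer.
    return Route.RETRIEVE_THEN_ANSWER
```

Only the first route incurs the cost of document retrieval and reference parsing, which is the saving the paragraph above describes.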

The automatic question-answering method provided in the present specification will be further described with reference to fig. 6, by taking an application of the automatic question-answering method to legal knowledge question-answering as an example. Fig. 6 shows a flowchart of a processing procedure of a legal knowledge question-answering method according to an embodiment of the present disclosure, which specifically includes the following steps.

Step 602: receiving, as uploaded by a user, document text A: "work 1"; document text B: "case set 1", corresponding to work 1; and description text 1 for document text A and document text B: "Documents storing all information related to disputes between individuals or businesses. These documents typically contain detailed information about contract disputes, property rights, family matters, and the like. Dispute events concerning personal rights and responsibilities are tracked and analyzed through such documents."

Step 604: receiving document text C uploaded by a user: work 2, document text D: case set 2 corresponding to work 2.

Step 606: processing the document text A according to the word embedding method to generate document data A, processing the document text B according to the word embedding method to generate document data B, combining the document data A and the document data B to generate a document database alpha, and taking the description text 1 as a database description text corresponding to the document database alpha.

Step 608: processing the document text C according to the word embedding method to generate document data C, processing the document text D according to the word embedding method to generate document data D, combining the document data C and the document data D to generate a document database beta, generating a description text 2 according to the contents recorded in the document text C and the document text D, and taking the description text 2 as the database description text corresponding to the document database beta.

Specifically, description text 2 is: "Documents storing all information related to personal or social behavior violations. These documents typically include detailed information about theft, violence, fraud, and the like. Events concerning personal behavior and social responsibility are tracked and analyzed through such documents."

Step 610: receiving a question instruction sent by a user that includes the question text: "In a lease contract between a homeowner and a tenant, what are the consequences if the tenant subleases without the homeowner's approval?"

Step 612: determining, through the large model, that the relevance between the question text and the description text 1 is 0.9 and the relevance between the question text and the description text 2 is 0.1, and determining that the document database alpha corresponding to the description text 1 is the document database corresponding to the question instruction.
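The selection in step 612 amounts to picking the database whose description text is most relevant to the question. A minimal sketch follows; `select_database` and its `min_relevance` cutoff are illustrative assumptions (the text does not state a threshold), and the relevance values are taken from step 612 rather than computed by a model.

```python
from typing import Dict, Optional


def select_database(description_relevance: Dict[str, float],
                    min_relevance: float = 0.0) -> Optional[str]:
    """Return the database whose description is most relevant to the question.

    min_relevance is an assumed cutoff, not part of the described method;
    None signals that no database corresponds to the question.
    """
    if not description_relevance:
        return None
    best = max(description_relevance, key=description_relevance.get)
    if description_relevance[best] < min_relevance:
        return None
    return best


# Values from step 612: description text 1 belongs to alpha, description text 2 to beta.
selected = select_database({"alpha": 0.9, "beta": 0.1})
```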

Step 614: extracting the question keyword attribute information corresponding to the question text as follows: homeowners, tenants, lease contracts, subleases, and complaints.

Step 616: comparing the question keyword attribute information with the document keyword attribute information corresponding to "work 1" and determining that the question keyword relevance corresponding to "work 1" is 0.9; and comparing the question keyword attribute information with the document keyword attribute information corresponding to "case 1" and determining that the question keyword relevance corresponding to "case 1" is 0.6.

Specifically, the question keyword relevance may be determined by directly calculating keyword similarity with the BM25 algorithm; alternatively, the text corresponding to each document data may be added to an ES library and the keyword similarity obtained directly using the methods the ES library provides, and so on.
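As an illustration of the BM25 option, the following is a minimal sketch of the standard Okapi BM25 scoring formula over pre-tokenized documents. The function name, the tokenization, the parameter defaults (`k1=1.5`, `b=0.75`) and the non-negative IDF variant are assumptions of this sketch, not details given in the text. Raw BM25 scores are unbounded, so values such as 0.9 and 0.6 in step 616 would presumably require some normalization that the text does not describe.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query_terms: Sequence[str],
                documents: Sequence[Sequence[str]],
                k1: float = 1.5,
                b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query keywords with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Non-negative IDF variant (assumed here).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


# Toy keyword lists standing in for the question and two documents.
question_keywords = ["tenant", "sublease", "contract"]
docs = [["tenant", "sublease", "contract", "breach"], ["theft", "fraud", "penalty"]]
scores = bm25_scores(question_keywords, docs)
```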

Step 618: extracting feature information of the question text; comparing that feature information with the feature information corresponding to "work 1" and determining that the question feature relevance corresponding to "work 1" is 0.8; and comparing that feature information with the feature information corresponding to "case 1" and determining that the question feature relevance corresponding to "case 1" is 0.9.

Specifically, the question feature relevance may be determined by extracting text features of the question text and of the document text with any text processing model and calculating the similarity between those text features; alternatively, the text features produced by the text processing model for each document data may be added to an ES library, and the text feature similarity between the question text and each document data obtained directly using the methods the ES library provides, and so on.
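The text does not name a similarity measure for text features. A common choice is cosine similarity between feature vectors, sketched below under that assumption; the vectors are toy values rather than the output of any particular text processing model.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy feature vectors (assumed values, for illustration only).
question_features = [0.2, 0.7, 0.1]
document_features = [0.25, 0.6, 0.3]
feature_relevance = cosine_similarity(question_features, document_features)
```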

Step 620: since the question keyword relevance corresponding to "work 1" is 0.9 and its question feature relevance is 0.8, while the question keyword relevance corresponding to "case 1" is 0.6 and its question feature relevance is 0.9, the question relevance information corresponding to "work 1" is 1.7 and that corresponding to "case 1" is 1.5.

Specifically, the question relevance information may be determined by computing a weighted sum of the question keyword relevance information and the question feature relevance information (the direct summation in step 620 can be understood as a weighted sum in which both weights are set to 1), which yields question relevance information representing the degree of relevance between the document data and the question text. Alternatively, the question keyword relevance information and the question feature relevance information may be processed using RPF to obtain the question relevance information, and so on.
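Both combination options can be sketched briefly. The weighted sum reproduces the figures of step 620. For the second option, this sketch assumes that "RPF" refers to reciprocal rank fusion, which combines rankings rather than scores; that reading, the constant `k=60` and the function names are assumptions, not statements from the text.

```python
from typing import Dict, Sequence


def weighted_relevance(keyword_rel: float, feature_rel: float,
                       w_keyword: float = 1.0, w_feature: float = 1.0) -> float:
    """Weighted sum of keyword relevance and feature relevance (weights of 1 give step 620)."""
    return w_keyword * keyword_rel + w_feature * feature_rel


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> Dict[str, float]:
    """Fuse several best-first rankings; each contributes 1 / (k + rank) per document."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


# Step 620 values.
work_1 = weighted_relevance(0.9, 0.8)
case_1 = weighted_relevance(0.6, 0.9)

# Rankings implied by steps 616 and 618: keyword ranks work 1 first, feature ranks case 1 first.
fused = reciprocal_rank_fusion([["work 1", "case 1"], ["case 1", "work 1"]])
```

Because rank fusion ignores score magnitudes, the two documents tie under this second option in the worked example, whereas the weighted sum separates them (1.7 versus 1.5).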

Step 622: since one piece of reference document data needs to be determined to generate the reference parsed text, the reference parsed text is generated using "work 1", which has the highest question relevance information, and the question text.

Specifically, the reference parsed text is: "Breach of contract: according to most lease laws, if a tenant violates an explicit term in the contract (e.g., by subleasing without authorization), this constitutes a breach of contract. (from work 1 page)

Liability for compensation: the tenant needs to compensate the landlord for the direct loss caused by the sublease.

Liquidated damages: if the lease contract contains a liquidated damages clause, the tenant also needs to pay liquidated damages according to the contract.

Legal action: the landlord has the right to file a lawsuit with the court, asking it to confirm the tenant's breach and to grant appropriate legal remedies, including but not limited to compensation for damages."

It should be noted that one piece of reference document data is used here because the current project requirement is configured to generate the reference parsed text from a single piece of reference document data. That is, the number of pieces of reference document data used to generate the reference parsed text is determined by the actual project requirement and is not limited to one.

Step 624: calculating, for each document, the parsing relevance information with respect to the reference parsed text, and determining that the parsing relevance information corresponding to "work 1" is 1.7 and that corresponding to "case 1" is 1.9.

Specifically, the method for determining the parsing relevance between each document and the reference parsed text is similar to the method for determining the question relevance between each document and the question text, and will not be described again here.

Step 626: the weight of the question relevance information is set to 0.4 and the weight of the parsing relevance information is set to 0.6, so the matching weight information corresponding to "work 1" is calculated as 0.4 × 1.7 + 0.6 × 1.7 = 1.7, and that corresponding to "case 1" as 0.4 × 1.5 + 0.6 × 1.9 = 1.74.
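The arithmetic of step 626 can be checked with a short sketch. The function name `matching_weight` is illustrative; the weights 0.4 and 0.6 and the input values come from steps 620, 624 and 626.

```python
def matching_weight(question_rel: float, parsing_rel: float,
                    w_question: float = 0.4, w_parsing: float = 0.6) -> float:
    """Weighted combination of question relevance and parsing relevance."""
    return w_question * question_rel + w_parsing * parsing_rel


weights = {
    "work 1": matching_weight(1.7, 1.7),  # question relevance 1.7, parsing relevance 1.7
    "case 1": matching_weight(1.5, 1.9),  # question relevance 1.5, parsing relevance 1.9
}
```

Although "work 1" has the higher question relevance, the larger weight on parsing relevance moves "case 1" ahead, which is the effect the reference parsed text is meant to have.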

Step 628: determining "case 1" as the target document data, and inputting "case 1" and the question text into the question-answering model to obtain the answer text.

Specifically, the answer text is: "In most cases, if subleasing is explicitly prohibited in the contract, subleasing without permission constitutes a breach of contract. Referring to similar cases, for example, in case numbered (described on page 341 of case 1), the court found that the tenant Zhang Mou violated the sublease clause of the lease contract and therefore bears the following liabilities. Compensation for loss: courts typically require the tenant to compensate the landlord for the actual loss caused by the breach. This may include house maintenance costs, increased management costs, and so on, resulting from the sublease. Return of property: the court ruled that, because of the breach, the contract must be terminated early and the tenant must return the house. Payment of liquidated damages: depending on the contract terms, the tenant may also need to pay the agreed liquidated damages if a liquidated damages clause is set. Court enforcement: if the tenant refuses to comply with the judgment, the landlord may apply for court enforcement to ensure recovery of the house and compensation."

It should be noted that "case 1" alone is determined as the target document data because the current project requirement is likewise configured to generate the answer text from a single piece of target document data. That is, as with the reference document data used to generate the reference parsed text, the number of pieces of target document data used to generate the answer text is not limited to one.
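Selecting a configurable number of target documents by matching weight can be sketched as follows; `select_targets` and `top_k` are illustrative names, with `top_k=1` matching the configuration used in this example.

```python
from typing import Dict, List


def select_targets(matching_weights: Dict[str, float], top_k: int = 1) -> List[str]:
    """Return the top_k document identifiers, highest matching weight first."""
    ranked = sorted(matching_weights, key=matching_weights.get, reverse=True)
    return ranked[:top_k]


# Matching weights from step 626.
targets = select_targets({"work 1": 1.7, "case 1": 1.74})
```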

By applying the scheme of the embodiments of this specification, the document database corresponding to the question text is determined first, which narrows the range in which the target document must be identified. The reference document is then determined according to the question relevance information between the question text and the document data in the document database, and the reference parsed text of the question text is generated according to the reference document, which improves the accuracy of the reference parsed text and, in turn, the accuracy of determining the target document data from it. Next, parsing relevance information corresponding to each document data is generated by the large model based on the reference parsed text, and the matching weight corresponding to each document is determined according to that document's question relevance information and parsing relevance information. In this way the reasoning and analysis capability of the large model is used to obtain a more logical and accurate reference parsed text, so that the target document data determined with the reference parsed text are the documents that better help the large model generate a correct reply, further improving the accuracy of legal question answering.

Corresponding to the method embodiment, the present disclosure further provides an automatic question-answering device embodiment, and fig. 7 shows a schematic structural diagram of an automatic question-answering device provided in one embodiment of the present disclosure. As shown in fig. 7, the apparatus includes:

an instruction receiving module 702 configured to receive a question instruction and determine a document database corresponding to the question instruction, wherein the question instruction includes a question text, and the document database includes at least one document data;

a parsing generation module 704 configured to determine at least one reference document data in the document database and generate a reference parsed text based on the question text and each reference document data, wherein the reference parsed text is a text that parses the question text based on the reference document data;

a weight obtaining module 706 configured to obtain matching weight information corresponding to each document data according to the reference parsed text and each document data; and

an answer generation module 708 configured to determine at least one target document data among the document data based on the matching weight information corresponding to each document data, and generate an answer text corresponding to the question instruction according to the question text and each target document data.
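How modules 704 to 708 hand data to one another can be sketched as a single function with injected callables. This is only an outline under stated assumptions: the document database is taken as already selected by module 702, one `relevance` callable is reused for both question relevance and parsing relevance, and all names, signatures and default values are hypothetical.

```python
from typing import Callable, Dict, List

Relevance = Callable[[str, str], float]      # (query text, document text) -> score
Generator = Callable[[str, List[str]], str]  # (question text, document texts) -> text


def top_ids(scores: Dict[str, float], n: int) -> List[str]:
    """Identifiers of the n highest-scoring documents."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def answer_question(question: str,
                    documents: Dict[str, str],
                    relevance: Relevance,
                    generate_parsed_text: Generator,
                    generate_answer: Generator,
                    n_reference: int = 1,
                    n_target: int = 1,
                    w_question: float = 0.4,
                    w_parsing: float = 0.6) -> str:
    # Parsing generation module 704: choose reference documents, produce the reference parsed text.
    question_rel = {doc_id: relevance(question, text) for doc_id, text in documents.items()}
    reference_texts = [documents[i] for i in top_ids(question_rel, n_reference)]
    parsed_text = generate_parsed_text(question, reference_texts)

    # Weight obtaining module 706: combine question relevance and parsing relevance.
    parsing_rel = {doc_id: relevance(parsed_text, text) for doc_id, text in documents.items()}
    weights = {doc_id: w_question * question_rel[doc_id] + w_parsing * parsing_rel[doc_id]
               for doc_id in documents}

    # Answer generation module 708: choose target documents and generate the answer.
    target_texts = [documents[i] for i in top_ids(weights, n_target)]
    return generate_answer(question, target_texts)
```

Passing the two generators in as callables keeps the sketch independent of any particular large model or prompt format.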

Optionally, the automatic question-answering device further includes a database generating module configured to: acquire at least one document text and a document database description text; process each document text according to a preset text vector strategy to generate document data corresponding to each document text, wherein the text vector strategy is used to vectorize each document text; and determine the document database corresponding to each document text according to the document database description text and each document data.

Optionally, the instruction receiving module 702 is further configured to: parse the question instruction to determine the question text; in the case where a corresponding document database needs to be determined for the question text, acquire at least one initial document database and the initial document database description text corresponding to each initial document database; calculate library description relevance information between the question text and each initial document database description text, and determine a target document database description text according to the library description relevance information corresponding to each initial document database description text; and determine the initial document database corresponding to the target document database description text as the document database corresponding to the question instruction.

Optionally, the parsing generation module 704 is further configured to: acquire at least one reference document text according to each reference document data; generate a text to be parsed based on the question text and each reference document text, wherein the text to be parsed is a prompt text that causes a parsing generation model to generate the reference parsed text corresponding to the question text; and input the text to be parsed into the parsing generation model to obtain the reference parsed text output by the parsing generation model.
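Assembling the text to be parsed can be sketched as simple prompt construction. The instruction wording and the `[Reference i]` labels below are placeholder assumptions; the text does not disclose the actual prompt.

```python
from typing import Sequence


def build_text_to_be_parsed(question_text: str, reference_texts: Sequence[str]) -> str:
    """Combine the question and the reference document texts into one prompt.

    The instruction sentence is a placeholder assumption, not the prompt used in practice.
    """
    numbered = "\n".join(
        f"[Reference {i}] {text}" for i, text in enumerate(reference_texts, start=1)
    )
    return (
        "Using only the reference documents below, analyze the question and "
        "list the relevant points.\n\n"
        f"{numbered}\n\n"
        f"Question: {question_text}"
    )


prompt = build_text_to_be_parsed(
    "What are the consequences of subleasing without approval?",
    ["Excerpt from work 1 on lease contracts."],
)
```

The text to be answered for the question-answering model could be built in the same way from the target document texts.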

Optionally, the parsing generation module 704 is further configured to: generate question relevance information corresponding to each document data according to the question text and each document data, wherein the question relevance information characterizes the degree of relevance between each document data and the question text; acquire reference order information corresponding to each document data based on the question relevance information corresponding to each document data; and determine at least one reference document data according to the reference order information corresponding to each document data.

Optionally, the parsing generation module 704 is further configured to: acquire question keyword attribute information corresponding to the question text; determine first document data, wherein the first document data is any one of the document data; acquire first document keyword attribute information corresponding to the first document data, and generate question keyword relevance information based on the question keyword attribute information and the first document keyword attribute information; acquire question feature information corresponding to the question text and first document feature information corresponding to the first document data, and generate question feature relevance information based on the question feature information and the first document feature information; and acquire the question relevance information corresponding to the first document data according to the question keyword relevance information and the question feature relevance information.

Optionally, the weight obtaining module 706 is further configured to: obtain parsing relevance information corresponding to each document data according to the reference parsed text and each document data, wherein the parsing relevance information characterizes the degree of relevance between each document data and the reference parsed text; and determine matching weight information corresponding to each document data based on the question relevance information and the parsing relevance information corresponding to each document data.

Optionally, the weight obtaining module 706 is further configured to: generate a parsed text to be processed according to the reference parsed text, and acquire parsing keyword attribute information corresponding to the parsed text to be processed, wherein the parsed text to be processed is the text used to calculate the degree of relevance between document data and the reference parsed text; determine second document data and acquire second document keyword attribute information corresponding to the second document data, wherein the second document data is any one of the document data; generate parsing keyword relevance information based on the parsing keyword attribute information and the second document keyword attribute information; acquire parsing feature information corresponding to the parsed text to be processed and second document feature information corresponding to the second document data, and generate parsing feature relevance information based on the parsing feature information and the second document feature information; and acquire the parsing relevance information corresponding to the second document data according to the parsing keyword relevance information and the parsing feature relevance information.

Optionally, the weight obtaining module 706 is further configured to: concatenate the question text and the reference parsed text to generate the parsed text to be processed; or determine the reference parsed text itself as the parsed text to be processed.
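The two alternatives for producing the parsed text to be processed can be sketched in a few lines. The function name and the newline separator are assumptions; the text only states that the two texts are concatenated.

```python
from typing import Optional


def build_pending_parsed_text(reference_parsed_text: str,
                              question_text: Optional[str] = None,
                              separator: str = "\n") -> str:
    """Return the text used to score documents against the reference parsed text.

    With a question text, the two are concatenated (question first, an assumed order);
    without one, the reference parsed text is used unchanged.
    """
    if question_text is None:
        return reference_parsed_text
    return f"{question_text}{separator}{reference_parsed_text}"
```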

Optionally, the weight obtaining module 706 is further configured to: determine third document data, and determine third question relevance information and third parsing relevance information corresponding to the third document data, wherein the third document data is any one of the document data; and acquire matching weight information corresponding to the third document data based on the third question relevance information and the third parsing relevance information.

Optionally, the answer generation module 708 is further configured to: acquire at least one target document text according to each target document data; generate a text to be answered based on the question text and each target document text, wherein the text to be answered is a prompt text that causes a question-answering model to generate the answer text corresponding to the question text; and input the text to be answered into the question-answering model to obtain the answer text output by the question-answering model.

By applying the scheme of the embodiments of this specification, the instruction receiving module first determines the document database corresponding to the question text, which narrows the range in which the target document must be identified. The parsing generation module then determines the reference document according to the question relevance information between the question text and the document data in the document database, and generates the reference parsed text of the question text according to the reference document, which improves the accuracy of the reference parsed text and, in turn, the accuracy of determining the target document data from it. The weight obtaining module then uses the large model to generate parsing relevance information corresponding to each document data based on the reference parsed text, and determines the matching weight corresponding to each document according to that document's question relevance information and parsing relevance information. As a result, the answer generation module works with target document data selected using a more logical and accurate reference parsed text, which better helps the large model generate a correct answer and further improves the accuracy of the model's answers.

The above is a schematic scheme of an automatic question answering apparatus of this embodiment. It should be noted that, the technical solution of the automatic question-answering device and the technical solution of the automatic question-answering method belong to the same concept, and details of the technical solution of the automatic question-answering device, which are not described in detail, can be referred to the description of the technical solution of the automatic question-answering method.

Fig. 8 illustrates a block diagram of a computing device 800 provided in accordance with one embodiment of the present description. The components of computing device 800 include, but are not limited to, memory 810 and processor 820. Processor 820 is coupled to memory 810 through bus 830 and database 850 is used to hold data.

Computing device 800 also includes access device 840, which enables computing device 800 to communicate via one or more networks 860. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the internet. The access device 840 may include one or more of any type of network interface, wired or wireless, such as a network interface card (NIC), an IEEE 802.11 wireless local area network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a bluetooth interface, or a near field communication (NFC) interface.

In one embodiment of the present description, the above-described components of computing device 800, as well as other components not shown in FIG. 8, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 8 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 800 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC, personal Computer). Computing device 800 may also be a mobile or stationary server.

The processor 820 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the automatic question-answering method described above.

The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the automatic question-answering method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the automatic question-answering method.

An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the automatic question-answering method described above.

The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the automatic question-answering method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the automatic question-answering method.

An embodiment of the present specification also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the automatic question-answering method described above.

The above is an exemplary scheme of a computer program product of the present embodiment. It should be noted that the technical solution of the computer program product and the technical solution of the automatic question-answering method belong to the same concept, and details of the technical solution of the computer program product that are not described in detail can be found in the description of the technical solution of the automatic question-answering method.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be expanded or reduced as appropriate according to the requirements of patent practice; for example, in some jurisdictions, in accordance with patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments are not limited by the order of actions described, because according to the embodiments of the present disclosure some steps may be performed in another order or simultaneously. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the embodiments of this specification.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are merely intended to help clarify the present specification. The optional embodiments do not set out every detail exhaustively, nor do they limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical application, and thereby to enable others skilled in the art to best understand and utilize the invention. This specification is to be limited only by the claims and their full scope and equivalents.

Claims

1. An automatic question-answering method, comprising:

generating question correlation information corresponding to each document data according to the question text and each document data; acquiring reference sequence information corresponding to each document data based on the question correlation information corresponding to each document data; determining at least one reference document data according to the reference sequence information corresponding to each document data, and generating a reference analysis text based on the question text and each reference document data, wherein the reference analysis text is a text that analyzes the question text based on the reference document data, the question correlation information characterizes the degree of correlation between each document data and the question text, and the question correlation information corresponding to a document data is obtained from question keyword correlation information and question feature correlation information between the question text and that document data;

2. The method of claim 1, wherein generating question correlation information corresponding to each document data according to the question text and each document data comprises:

acquiring question keyword attribute information corresponding to the question text;

determining first document data, wherein the first document data is any one of the document data;

acquiring first document keyword attribute information corresponding to the first document data, and generating question keyword correlation information based on the question keyword attribute information and the first document keyword attribute information;

acquiring question feature information corresponding to the question text and first document feature information corresponding to the first document data, and generating question feature correlation information based on the question feature information and the first document feature information;

and acquiring the question correlation information corresponding to the first document data according to the question keyword correlation information and the question feature correlation information.
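By way of non-limiting illustration, the combination of keyword correlation and feature correlation recited in claim 2 might be sketched as follows. The claims do not fix any particular measure; the Jaccard overlap, the cosine similarity, the weighted sum, and the parameter `alpha` are all assumptions chosen for the sketch, and every function name is hypothetical.

```python
import math


def keyword_correlation(question_tokens, document_tokens):
    """Jaccard overlap of keyword sets (an assumed stand-in for any keyword measure)."""
    q, d = set(question_tokens), set(document_tokens)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def feature_correlation(question_vec, document_vec):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(question_vec, document_vec))
    norm_q = math.sqrt(sum(a * a for a in question_vec))
    norm_d = math.sqrt(sum(b * b for b in document_vec))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)


def question_correlation(keyword_score, feature_score, alpha=0.5):
    """Weighted sum of the two scores; alpha is a hypothetical mixing weight."""
    return alpha * keyword_score + (1.0 - alpha) * feature_score
```

Under these assumptions, documents can then be ordered by `question_correlation` to obtain a ranking from which the reference document data are chosen.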

3. The method of claim 1, wherein obtaining matching weight information corresponding to each document data according to the reference analysis text and each document data comprises:

obtaining analysis correlation information corresponding to each document data according to the reference analysis text and each document data, wherein the question correlation information characterizes the degree of correlation between each document data and the reference analysis text;

and determining matching weight information corresponding to each document data based on the question correlation information and the analysis correlation information corresponding to each document data.

4. The method of claim 3, wherein obtaining analysis correlation information corresponding to each document data according to the reference analysis text and each document data comprises:

generating an analysis text to be processed according to the reference analysis text, and acquiring analysis keyword attribute information corresponding to the analysis text to be processed, wherein the analysis text to be processed is a text used for calculating the degree of association between document data and the reference analysis text;

determining second document data and acquiring second document keyword attribute information corresponding to the second document data, wherein the second document data is any one of the document data;

generating analysis keyword correlation information based on the analysis keyword attribute information and the second document keyword attribute information;

acquiring analysis feature information corresponding to the analysis text to be processed and second document feature information corresponding to the second document data, and generating analysis feature correlation information based on the analysis feature information and the second document feature information;

and acquiring analysis correlation information corresponding to the second document data according to the analysis keyword correlation information and the analysis feature correlation information.

5. The method of claim 4, wherein generating the analysis text to be processed according to the reference analysis text comprises:

splicing the question text and the reference analysis text to generate the analysis text to be processed; or

determining the reference analysis text as the analysis text to be processed.
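As a minimal sketch of the two alternatives in claim 5, the analysis text to be processed may either be the question text spliced with the reference analysis text or the reference analysis text alone. The function name, the `splice` flag, and the newline separator are assumptions for illustration and are not specified by the claims.

```python
def build_analysis_text_to_be_processed(question_text, reference_analysis_text,
                                        splice=True, separator="\n"):
    """Return the text later compared against each document data.

    splice=True  -> question text followed by the reference analysis text
    splice=False -> the reference analysis text on its own
    """
    if splice:
        return question_text + separator + reference_analysis_text
    return reference_analysis_text
```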

6. The method of claim 3, wherein determining matching weight information corresponding to each document data based on the question correlation information and the analysis correlation information corresponding to each document data comprises:

determining third document data, and determining third question correlation information and third analysis correlation information corresponding to the third document data, wherein the third document data is any one of the document data;

and acquiring matching weight information corresponding to the third document data based on the third question correlation information and the third analysis correlation information.
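For illustration only, one way to derive matching weight information from the two correlation values, and then to pick target document data by weight as outlined in the abstract, is sketched below. The linear combination, the parameter `beta`, the `top_k` cut-off, and all names are assumptions; the claims leave the combination rule open.

```python
def matching_weight(question_corr, analysis_corr, beta=0.5):
    """Combine the two correlations; beta is a hypothetical mixing weight."""
    return beta * question_corr + (1.0 - beta) * analysis_corr


def select_target_documents(doc_ids, question_corrs, analysis_corrs,
                            top_k=2, beta=0.5):
    """Rank documents by matching weight and keep the top_k highest."""
    weights = {
        doc_id: matching_weight(question_corrs[doc_id], analysis_corrs[doc_id], beta)
        for doc_id in doc_ids
    }
    ranked = sorted(doc_ids, key=lambda doc_id: weights[doc_id], reverse=True)
    return ranked[:top_k], weights
```

In this sketch a document that scores poorly against the question text alone can still be selected when it matches the reference analysis text well, which is the effect the matching weight is meant to capture.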

7. The method of claim 1, wherein generating the reference analysis text based on the question text and each reference document data comprises:

acquiring at least one reference document text according to each reference document data;

generating a text to be analyzed based on the question text and each reference document text, wherein the text to be analyzed is a prompt text that causes an analysis generation model to generate the reference analysis text corresponding to the question text;

and inputting the text to be analyzed into the analysis generation model, and obtaining the reference analysis text output by the analysis generation model.
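The prompt construction of claim 7 can be sketched as follows, assuming the analysis generation model is exposed as a plain callable that maps a prompt string to an output string. The prompt wording, the numbering of reference documents, and all function names are hypothetical; no particular model or API is implied.

```python
from typing import Callable, List


def build_text_to_be_analyzed(question_text: str,
                              reference_document_texts: List[str]) -> str:
    """Assemble a prompt from the question and the reference document texts."""
    numbered = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(reference_document_texts, start=1)
    )
    return (
        "Reference documents:\n"
        f"{numbered}\n\n"
        f"Question: {question_text}\n"
        "Explain how the question can be answered from the reference documents."
    )


def generate_reference_analysis_text(question_text: str,
                                     reference_document_texts: List[str],
                                     analysis_model: Callable[[str], str]) -> str:
    """Feed the prompt to the (assumed) model callable and return its output."""
    prompt = build_text_to_be_analyzed(question_text, reference_document_texts)
    return analysis_model(prompt)
```

The same pattern would apply to the text to be answered and the question-answering model, with the target document texts taking the place of the reference document texts.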

8. The method of claim 1, wherein generating the answer text corresponding to the question instruction according to the question text and each target document data comprises:

acquiring at least one target document text according to each target document data;

generating a text to be answered based on the question text and each target document text, wherein the text to be answered is a prompt text that causes a question-answering model to generate the answer text corresponding to the question text;

and inputting the text to be answered into the question-answering model, and obtaining the answer text output by the question-answering model.

9. The method of claim 1, wherein the document database is generated by:

acquiring at least one document text and a document database description text;

processing each document text according to a preset text vector strategy, and generating document data corresponding to each document text, wherein the text vector strategy is used for vectorizing each document text;

and determining a document database corresponding to each document text according to the document database description text and each document data.
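A minimal sketch of the database generation in claim 9 is given below. The hashed bag-of-words vectorizer stands in for the unspecified "preset text vector strategy" and is purely an assumption, as are the class names, field names, and the vector dimension.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List


def hashed_bag_of_words(text: str, dim: int = 16) -> List[float]:
    """Toy vectorizer: count whitespace tokens into dim hashed buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


@dataclass
class DocumentData:
    text: str
    vector: List[float]


@dataclass
class DocumentDatabase:
    description: str
    documents: List[DocumentData] = field(default_factory=list)


def build_document_database(
    document_texts: List[str],
    description: str,
    vectorize: Callable[[str], List[float]] = hashed_bag_of_words,
) -> DocumentDatabase:
    """Vectorize each document text and bundle the results with the description."""
    documents = [DocumentData(text=t, vector=vectorize(t)) for t in document_texts]
    return DocumentDatabase(description=description, documents=documents)
```

In practice the vectorizer would more plausibly be a learned embedding model; the toy version is used here only so the sketch runs without external dependencies.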

10. The method of claim 1, wherein receiving the question instruction and determining the document database corresponding to the question instruction comprises:

analyzing the question instruction to determine the question text;

in a case where the corresponding document database needs to be determined from the question text, acquiring at least one initial document database and the initial document database description text corresponding to each initial document database;

calculating library description correlation information between the question text and each initial document database description text, and determining a target document database description text according to the library description correlation information corresponding to each initial document database description text;

and determining the initial document database corresponding to the target document database description text as the document database corresponding to the question instruction.
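The database selection of claim 10 might be sketched as follows, assuming the library description correlation is approximated by token overlap between the question text and each description text. The overlap measure, the mapping from database name to description, and the function names are assumptions for illustration.

```python
def description_correlation(question_text, description_text):
    """Jaccard overlap of lowercase whitespace tokens (an assumed measure)."""
    q = set(question_text.lower().split())
    d = set(description_text.lower().split())
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def select_document_database(question_text, database_descriptions):
    """Pick the database whose description best matches the question.

    database_descriptions maps a database name to its description text.
    Returns the chosen name together with all computed scores.
    """
    scores = {
        name: description_correlation(question_text, description)
        for name, description in database_descriptions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```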

11. An automatic question-answering method applied to a cloud-side device, the method comprising:

returning the answer text to the end-side device.

12. The method of claim 11, wherein, prior to receiving the question instruction sent by the end-side device, the method further comprises:

receiving at least one initial document database sent by the end-side device; or

generating at least one initial document database according to at least one document text received from the end-side device.

13. A computing device, comprising:

a memory and a processor;

the memory is configured to store computer programs/instructions, and the processor is configured to execute the computer programs/instructions, which, when executed by the processor, implement the steps of the method according to any one of claims 1-12.

14. A computer-readable storage medium storing computer programs/instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1-12.

15. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1-12.