
CN118227655B - Database query statement generation method, device, equipment and storage medium - Google Patents

Info

Publication number
CN118227655B
Authority
CN
China
Prior art keywords
query
knowledge
item
intention
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410647323.3A
Other languages
Chinese (zh)
Other versions
CN118227655A (en)
Inventor
章子晗
黎洋
方懿德
王子威
常卓
黄紫岳
薛焕然
黄丹青
杨晓峰
陈鹏
蒋杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202410647323.3A
Publication of CN118227655A
Application granted
Publication of CN118227655B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, an apparatus, a device, and a storage medium for generating database query statements. The method comprises: acquiring a query question in a business domain, wherein the query question is natural language text; determining a database table and a domain knowledge base configured for the business domain, wherein the domain knowledge base comprises at least one knowledge item; retrieving knowledge items related to the query question from the domain knowledge base, and calling an intent optimization model to optimize the query intent of the query question based on the retrieved knowledge items, to obtain an intent-optimized query question; and calling a statement generation model to generate, according to the database table, a database query statement corresponding to the intent-optimized query question. The application can generate database query statements automatically and improve their accuracy.

Description

Database query statement generation method, device, equipment and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a database query statement.
Background
Traditionally, accessing and analyzing data has relied on a data analyst writing code (i.e., database query statements) in an underlying database query language such as SQL, so that the code can later be used to retrieve data from database tables. SQL stands for Structured Query Language, a programming language for storing and processing information in relational databases. However, as database tables grow in size and complexity, manually writing database query statements (such as SQL statements) can no longer keep up with growing data query demands. To solve this problem, the research community has proposed methods for automatically generating database query statements; these methods let users enter query questions in natural language and use a model to convert each query question into a database query statement. Practice shows, however, that these methods feed the query question directly into the model for conversion, so the accuracy of the generated database query statement is low.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, a device, and a storage medium for generating database query statements, which can generate database query statements automatically and improve their accuracy.
In one aspect, an embodiment of the present application provides a method for generating a database query statement, the method comprising:
acquiring a query question in a business domain, wherein the query question is natural language text;
determining a database table and a domain knowledge base configured for the business domain, wherein the domain knowledge base comprises at least one knowledge item, and one knowledge item comprises a term in the business domain and interpretation information of that term;
retrieving knowledge items related to the query question from the domain knowledge base, and calling an intent optimization model to optimize the query intent of the query question based on the retrieved knowledge items, to obtain an intent-optimized query question;
and calling a statement generation model to generate, according to the database table, a database query statement corresponding to the intent-optimized query question.
In another aspect, an embodiment of the present application provides an apparatus for generating a database query statement, the apparatus comprising:
an acquisition unit, configured to acquire a query question in a business domain, wherein the query question is natural language text;
the acquisition unit being further configured to determine a database table and a domain knowledge base configured for the business domain, wherein the domain knowledge base comprises at least one knowledge item, and one knowledge item comprises a term in the business domain and interpretation information of that term;
a processing unit, configured to retrieve knowledge items related to the query question from the domain knowledge base, and to call an intent optimization model to optimize the query intent of the query question based on the retrieved knowledge items, to obtain an intent-optimized query question;
the processing unit being further configured to call a statement generation model to generate, according to the database table, a database query statement corresponding to the intent-optimized query question.
In a specific embodiment, when retrieving knowledge items related to the query question from the domain knowledge base, the processing unit may be specifically configured to:
acquire a plurality of item recall strategies, wherein each item recall strategy is used to determine a relevance score between the query question and each knowledge item in the domain knowledge base and to recall K knowledge items from the domain knowledge base in descending order of relevance score, K being a positive integer; different item recall strategies determine the relevance score between the query question and a knowledge item in different ways;
perform, according to the plurality of item recall strategies, item recall in the domain knowledge base based on the query question, to obtain a plurality of item recall results, wherein one item recall result corresponds to one item recall strategy and each item recall result comprises K knowledge items;
traverse the plurality of item recall results, and normalize the relevance scores of the knowledge items in the item recall result currently being traversed, to obtain a normalized score for each of those knowledge items;
and retrieve knowledge items related to the query question from the plurality of item recall results based on the normalized scores of the knowledge items in those results.
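The recall-and-normalize steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: min-max scaling is assumed as the normalization, keeping each item's best normalized score is an assumed merge rule, and the names `min_max_normalize`, `merge_recall_results`, and the sample scores are hypothetical.

```python
from typing import Dict, List

# One recall result maps a knowledge-item id to its raw relevance score (K entries).
RecallResult = Dict[str, float]


def min_max_normalize(result: RecallResult) -> RecallResult:
    """Rescale the scores of one recall result to [0, 1] (assumed normalization)."""
    if not result:
        return {}
    lo, hi = min(result.values()), max(result.values())
    if hi == lo:
        # All scores are equal: treat every recalled item as equally relevant.
        return {item: 1.0 for item in result}
    return {item: (score - lo) / (hi - lo) for item, score in result.items()}


def merge_recall_results(results: List[RecallResult], top_n: int) -> List[str]:
    """Normalize each result, keep the best normalized score per item, return top_n ids."""
    best: Dict[str, float] = {}
    for result in results:  # traverse the recall results one by one
        for item, score in min_max_normalize(result).items():
            best[item] = max(best.get(item, 0.0), score)
    # Sort by score descending, then by id so that ties are deterministic.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Two strategies score on different scales, which is why normalization is needed.
semantic = {"GMV": 0.91, "DAU": 0.40, "ARPU": 0.35}  # e.g. vector similarities
keyword = {"GMV": 12.0, "ROI": 7.0, "DAU": 2.0}      # e.g. keyword-match scores
selected = merge_recall_results([semantic, keyword], top_n=2)
```

Normalizing per result keeps a strategy with a large raw score range (here the keyword scores) from dominating a strategy whose scores lie in [0, 1].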
In another embodiment, the plurality of item recall strategies includes a semantic recall strategy; correspondingly, when performing item recall in the domain knowledge base based on the query question according to the semantic recall strategy to obtain an item recall result, the processing unit may be specifically configured to:
perform, according to the semantic recall strategy, semantic recognition on the query question to obtain a semantic vector of the query question;
acquire a semantic vector of each knowledge item in the domain knowledge base, wherein the semantic vector of any knowledge item is obtained by cleaning the knowledge item and then performing semantic recognition on it; the cleaning comprises removing invalid words from the knowledge item;
determine a relevance score between the query question and each knowledge item based on the vector similarity between the semantic vector of the query question and the semantic vector of that knowledge item;
and recall K knowledge items from the domain knowledge base in descending order of relevance score to obtain an item recall result.
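A minimal sketch of the semantic recall steps above, under stated assumptions: a bag-of-words counter stands in for the semantic encoder (a real system would presumably use a learned embedding model), cosine similarity is assumed as the vector similarity, and `INVALID_WORDS`, `clean`, `embed`, and `semantic_recall` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

INVALID_WORDS = {"the", "a", "of", "is", "by", "per"}  # hypothetical invalid-word list


def clean(text: str) -> List[str]:
    """Lower-case, split on whitespace, and drop invalid words."""
    return [w for w in text.lower().split() if w not in INVALID_WORDS]


def embed(tokens: List[str]) -> Counter:
    """Stand-in for a semantic encoder: a sparse bag-of-words vector."""
    return Counter(tokens)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def semantic_recall(question: str, knowledge_base: Dict[str, str], k: int) -> List[Tuple[str, float]]:
    """Score every knowledge item against the question and return the top k."""
    # The question is cleaned the same way as the items here; that is a choice of this sketch.
    q_vec = embed(clean(question))
    scored = [
        (term, cosine(q_vec, embed(clean(f"{term} {explanation}"))))
        for term, explanation in knowledge_base.items()
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # descending relevance score
    return scored[:k]


kb = {
    "GMV": "total value of orders placed",
    "DAU": "number of distinct users active per day",
}
top = semantic_recall("what is the total value of orders last month", kb, k=1)
```

Because item vectors do not depend on the question, they can be computed once when a knowledge item is added and reused for every query.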
In another embodiment, the intent optimization model is a language model; correspondingly, when calling the intent optimization model to optimize the query intent of the query question based on the retrieved knowledge items to obtain the intent-optimized query question, the processing unit may be specifically configured to:
acquire a chain-of-thought prompt text, wherein the chain-of-thought prompt text prompts the model to perform a question rewriting task on the question in an input field according to a plurality of steps; the steps comprise, in order: analyzing the question to obtain a query intent, optimizing the query intent, and rewriting the question based on the optimized query intent and the retrieved knowledge items to obtain an intent-optimized question;
generate a task description instruction from the chain-of-thought prompt text, the query question, and the retrieved knowledge items, wherein the task description instruction includes the input field and the query question is located in the input field;
and call the intent optimization model to perform task processing according to the task description instruction, to obtain the intent-optimized query question.
In another specific embodiment, when generating the task description instruction from the chain-of-thought prompt text, the query question, and the retrieved knowledge items, the processing unit may be specifically configured to:
obtain an in-context learning example, the in-context learning example comprising an example question, example knowledge items, and question rewriting information; the question rewriting information comprises the execution result of each step obtained when the question rewriting task is performed on the example question, according to the plurality of steps, based on the example knowledge items;
and generate the task description instruction from the in-context learning example, the chain-of-thought prompt text, the query question, and the retrieved knowledge items.
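One way the task description instruction could be assembled is sketched below. The prompt wording, the section order, and the names `COT_PROMPT`, `format_knowledge`, and `build_task_instruction` are assumptions made for illustration; the source does not give the actual prompt text.

```python
from typing import List, Tuple

# Hypothetical chain-of-thought prompt text covering the three steps.
COT_PROMPT = (
    "Rewrite the question in the Input field in three steps:\n"
    "1. Analyze the question to obtain its query intent.\n"
    "2. Optimize the query intent.\n"
    "3. Rewrite the question based on the optimized intent and the knowledge items."
)


def format_knowledge(items: List[Tuple[str, str]]) -> str:
    """Render (term, interpretation) pairs as one bullet per knowledge item."""
    return "\n".join(f"- {term}: {explanation}" for term, explanation in items)


def build_task_instruction(question: str, knowledge: List[Tuple[str, str]], example: str = "") -> str:
    """Join prompt text, optional in-context example, knowledge items, and the input field."""
    parts = [COT_PROMPT]
    if example:
        parts.append("Example:\n" + example)
    parts.append("Knowledge items:\n" + format_knowledge(knowledge))
    parts.append("Input: " + question)  # the query question sits in the input field
    return "\n\n".join(parts)


# In-context example: one worked rewrite showing the result of each step.
example = (
    "Input: What was the GMV yesterday?\n"
    "Knowledge items:\n- GMV: total value of orders placed\n"
    "Step 1: The user wants one aggregate figure for yesterday.\n"
    "Step 2: The figure is the sum of order values, filtered to yesterday.\n"
    "Step 3: What was the total value of orders placed yesterday?"
)
instruction = build_task_instruction(
    "How did DAU change last week?",
    [("DAU", "number of distinct users active per day")],
    example,
)
```

The resulting string would then be passed to whichever language model serves as the intent optimization model; that call is omitted because the source does not name a model interface.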
In another specific embodiment, when calling the statement generation model to generate, according to the database table, the database query statement corresponding to the intent-optimized query question, the processing unit may be specifically configured to:
retrieve, in the database table, M data fields related to the intent-optimized query question to obtain a field retrieval result, M being a positive integer;
and call the statement generation model to generate, according to the field retrieval result, the database query statement corresponding to the intent-optimized query question.
In another specific embodiment, when retrieving, in the database table, the M data fields related to the intent-optimized query question to obtain the field retrieval result, the processing unit may be specifically configured to:
acquire a plurality of field recall strategies, wherein each field recall strategy is used to determine a relevance score between the intent-optimized query question and each data field in the database table and to recall H data fields from the database table in descending order of relevance score, H being a positive integer;
perform, according to the plurality of field recall strategies, field recall in the database table based on the intent-optimized query question, to obtain a plurality of field recall results, wherein one field recall result corresponds to one field recall strategy and each field recall result comprises H data fields;
and retrieve, from the plurality of field recall results, the M data fields related to the intent-optimized query question to obtain the field retrieval result.
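The field recall steps can be sketched in the same spirit. The two scoring functions are toy token-overlap strategies, and reciprocal rank fusion is a swapped-in technique for choosing the final M fields, since this passage does not say how the recall results are combined. All names (`name_overlap`, `comment_overlap`, `retrieve_fields`, the sample fields) are hypothetical.

```python
from typing import Callable, Dict, List

Field = Dict[str, str]            # {"name": ..., "comment": ...}
Scorer = Callable[[str, Field], float]


def tokens(text: str) -> set:
    """Lower-case token set; underscores in field names count as separators."""
    return set(text.lower().replace("_", " ").split())


def name_overlap(question: str, field: Field) -> float:
    """Strategy 1: count question tokens that appear in the field name."""
    return float(len(tokens(question) & tokens(field["name"])))


def comment_overlap(question: str, field: Field) -> float:
    """Strategy 2: count question tokens that appear in the field comment."""
    return float(len(tokens(question) & tokens(field["comment"])))


def recall(question: str, fields: List[Field], scorer: Scorer, h: int) -> List[str]:
    """Recall the H highest-scoring field names for one strategy."""
    ranked = sorted(fields, key=lambda f: (-scorer(question, f), f["name"]))
    return [f["name"] for f in ranked[:h]]


def retrieve_fields(question: str, fields: List[Field], scorers: List[Scorer],
                    h: int, m: int, rrf_k: int = 60) -> List[str]:
    """Fuse the per-strategy recall results with reciprocal rank fusion; keep M fields."""
    fused: Dict[str, float] = {}
    for scorer in scorers:
        for rank, name in enumerate(recall(question, fields, scorer, h), start=1):
            fused[name] = fused.get(name, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(fused, key=lambda n: (-fused[n], n))[:m]


fields = [
    {"name": "order_amount", "comment": "total value of the order in yuan"},
    {"name": "order_date", "comment": "date the order was placed"},
    {"name": "user_id", "comment": "identifier of the buyer"},
]
question = "total value of orders placed per date"
result = retrieve_fields(question, fields, [name_overlap, comment_overlap], h=2, m=2)
```

Rank-based fusion avoids comparing raw scores across strategies; score normalization, as described for knowledge items, would be an equally plausible choice.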
In another embodiment, the statement generation model is a language model; correspondingly, when calling the statement generation model to generate, according to the field retrieval result, the database query statement corresponding to the intent-optimized query question, the processing unit may be specifically configured to:
generate task prompt information for the statement generation model from the field retrieval result and the intent-optimized query question, wherein the task prompt information prompts the model to write, according to the field retrieval result, a database query statement for the intent-optimized query question;
and call the statement generation model to perform task processing according to the task prompt information, and take the data output by the statement generation model as the database query statement corresponding to the intent-optimized query question.
In another embodiment, when there are a plurality of database tables, each database table corresponds to one field retrieval result; correspondingly, when generating the task prompt information for the statement generation model from the field retrieval results and the intent-optimized query question, the processing unit may be specifically configured to:
acquire attribute information of each database table, wherein the attribute information comprises at least the primary key and the foreign key of the database table;
generate the task prompt information for the statement generation model from the attribute information of each database table, the field retrieval result of each database table, and the intent-optimized query question;
wherein the task prompt information prompts the model to write, according to the attribute information of each database table and the corresponding field retrieval result, a database query statement for the intent-optimized query question.
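A minimal sketch of how the task prompt information might combine table attributes with field retrieval results. The `TableInfo` structure, the prompt wording, and the sample tables are hypothetical; the source specifies only that primary keys, foreign keys, retrieved fields, and the intent-optimized question go into the prompt.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableInfo:
    """Attribute information plus the field retrieval result for one database table."""
    name: str
    primary_key: str
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # column -> "table.column"
    fields: List[str] = field(default_factory=list)             # retrieved data fields


def build_generation_prompt(question: str, tables: List[TableInfo]) -> str:
    """Render the task prompt: instruction, one block per table, then the question."""
    lines = ["Write one SQL query that answers the question, using only the tables and fields listed."]
    for t in tables:
        lines.append(f"Table {t.name} (primary key: {t.primary_key})")
        for column, target in t.foreign_keys.items():
            lines.append(f"  foreign key: {column} -> {target}")
        lines.append("  fields: " + ", ".join(t.fields))
    lines.append(f"Question: {question}")
    return "\n".join(lines)


tables = [
    TableInfo("orders", "id", {"user_id": "users.id"}, ["amount", "order_date"]),
    TableInfo("users", "id", {}, ["city"]),
]
prompt = build_generation_prompt("What is the total order amount per city?", tables)
```

Listing the key relationships gives the model what it needs to write the join between `orders` and `users`; without them it would have to guess the join columns.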
In another embodiment, the processing unit may be further configured to:
acquire an operation instruction for the domain knowledge base, wherein the operation instruction indicates at least one of the following operations: adding a knowledge item to the domain knowledge base, deleting a knowledge item from the domain knowledge base, modifying a knowledge item in the domain knowledge base, and searching for a knowledge item in the domain knowledge base;
and perform the corresponding operation on the domain knowledge base according to the operation instruction.
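The four operations can be illustrated with an in-memory stand-in. The class `DomainKnowledgeBase`, the dictionary-shaped operation instruction, and the substring search are assumptions of this sketch; the source does not describe the storage or the instruction format.

```python
from typing import Dict, List, Tuple


class DomainKnowledgeBase:
    """In-memory stand-in for a domain knowledge base: term -> interpretation."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def add(self, term: str, explanation: str) -> None:
        if term in self._items:
            raise KeyError(f"knowledge item already exists: {term}")
        self._items[term] = explanation

    def delete(self, term: str) -> None:
        del self._items[term]  # raises KeyError if the term is unknown

    def modify(self, term: str, explanation: str) -> None:
        if term not in self._items:
            raise KeyError(f"no such knowledge item: {term}")
        self._items[term] = explanation

    def search(self, keyword: str) -> List[Tuple[str, str]]:
        """Case-insensitive substring match on the term or its interpretation."""
        kw = keyword.lower()
        return sorted(
            (t, e) for t, e in self._items.items()
            if kw in t.lower() or kw in e.lower()
        )

    def execute(self, instruction: Dict[str, str]):
        """Dispatch a (hypothetical) operation instruction such as {"op": "add", ...}."""
        op = instruction["op"]
        if op == "add":
            return self.add(instruction["term"], instruction["explanation"])
        if op == "delete":
            return self.delete(instruction["term"])
        if op == "modify":
            return self.modify(instruction["term"], instruction["explanation"])
        if op == "search":
            return self.search(instruction["keyword"])
        raise ValueError(f"unsupported operation: {op}")


kb = DomainKnowledgeBase()
kb.execute({"op": "add", "term": "GMV", "explanation": "total value of orders placed"})
kb.execute({"op": "modify", "term": "GMV", "explanation": "gross merchandise volume"})
hits = kb.execute({"op": "search", "keyword": "merchandise"})
```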
In another specific embodiment, after generating the database query statement corresponding to the intent-optimized query question, the processing unit is further configured to:
perform a data query in the database table based on the generated database query statement to obtain a query result, and generate a response answer to the query question according to the query result;
display, in a user interface, the response answer and a statement view entry corresponding to the database query statement;
and display the database query statement when the statement view entry is triggered.
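The execute-then-display flow can be sketched with Python's built-in `sqlite3` module. The table, the hard-coded "generated" statement, and the function names are hypothetical, and the user interface is reduced to a dictionary plus a callback.

```python
import sqlite3
from typing import Dict


def answer_question(conn: sqlite3.Connection, question: str, sql: str) -> Dict[str, str]:
    """Run the generated statement and package what the interface would show."""
    rows = conn.execute(sql).fetchall()
    # A single scalar is rendered directly; anything else is shown as the raw row list.
    value = rows[0][0] if len(rows) == 1 and len(rows[0]) == 1 else rows
    return {"answer": f"{question} -> {value}", "statement": sql}


def on_view_entry_triggered(view: Dict[str, str]) -> str:
    """Return the statement only when the user triggers the statement view entry."""
    return view["statement"]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10,), (20,)])

# Stands in for the output of the statement generation model.
generated_sql = "SELECT SUM(amount) FROM orders"
view = answer_question(conn, "What is the total order amount?", generated_sql)
```

Keeping the statement alongside the answer lets a user inspect the query behind a result without cluttering the default view.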
In yet another aspect, an embodiment of the present application provides a computer device, including an input interface and an output interface, the computer device further including:
a processor and a computer storage medium;
wherein the processor is adapted to implement one or more instructions, and the computer storage medium stores one or more instructions adapted to be loaded by the processor to perform the above method for generating a database query statement.
In yet another aspect, embodiments of the present application provide a computer storage medium storing one or more instructions adapted to be loaded by a processor to perform the above method for generating a database query statement.
In yet another aspect, embodiments of the present application provide a computer program product comprising one or more instructions; when executed by a processor, the one or more instructions implement the above method for generating a database query statement.
In the embodiments of the present application, a database table and a domain knowledge base are configured for a business domain. After a query question in the business domain is acquired, knowledge items related to the query question are retrieved from the domain knowledge base, the intent optimization model is called to optimize the query intent of the query question based on the retrieved knowledge items, and the statement generation model is then called to generate, according to the database table, a database query statement corresponding to the intent-optimized query question. The embodiments thus generate database query statements automatically. During generation, the relevant domain knowledge is injected and used to optimize the query intent of the query question, so that the statement generation model understands the query intent more clearly and accurately, generates a correct database query statement, and thereby improves the accuracy of the database query statement.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic diagram of the framework process executed in a server according to an embodiment of the present application;
FIG. 1b is a schematic diagram of the framework process executed cooperatively in a terminal and a server according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for generating a database query statement according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a user entering a query question according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for generating a database query statement according to another embodiment of the present application;
FIG. 5a is a schematic diagram of the implementation logic of a chain-of-thought prompt text according to an embodiment of the present application;
FIG. 5b is a schematic diagram of triggering the display of a database query statement according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for generating database query statements according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.
Embodiments of the present application provide a database query statement generation technique. For questions that users raise in different business domains (i.e., query questions entered in natural language), the technique automatically retrieves the domain knowledge of the corresponding business domain and, based on the retrieved domain knowledge, automatically generates executable database query statements that match the user's query intent, thereby improving the accuracy of the database query statements. A database query statement is a code statement that can be used to query data from a database, for example an SQL statement. It should be emphasized that, when any embodiment of the present application is applied to a specific product or technology, any data involved (e.g., users' questions in different business domains) is collected with the permission or consent of the user or web page, and the collection, use, and processing of that data comply with the relevant laws, regulations, and standards of the relevant region. A business domain, as mentioned in the embodiments of the present application, can be understood as the industry in which the business services provided by an enterprise or individual are located, such as the financial domain, the medical domain, or the e-commerce domain.
Specifically, the technique uses at least the following two service components: an intent optimization component and a database query statement generation component (e.g., an SQL statement generation component). ① The intent optimization component receives a query question that a user enters in natural language in any business domain and, based on the domain knowledge base of that business domain, performs intent optimization on the query question by means of an intent optimization model. Intent optimization means optimizing the query intent of the query question to help the statement generation model understand the user's real query intent more accurately, so that it can more easily generate a database query statement that matches that intent. ② The database query statement generation component receives the intent-optimized query question, determines the database table configured for the business domain to which the query question belongs, and generates, by means of the statement generation model and based on that database table, a database query statement (e.g., an SQL statement) corresponding to the intent-optimized query question. It should be noted that the intent optimization model and the statement generation model mentioned here may be integrated and deployed as one model or deployed as two independent models, and their model types may be the same or different; this is not limited here.
For example, the intent optimization model and the statement generation model may both be large language models (Large Language Model, LLM); or the intent optimization model may be another neural network model built on AI (Artificial Intelligence) technology while the statement generation model is a large language model; or both models may be other neural network models built on AI technology, and so on.
AI technology refers to the theories, methods, technologies, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, AI is a comprehensive technology of computer science; it seeks to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence, so that the machine has functions such as perception, reasoning, and decision-making. Accordingly, AI is a comprehensive discipline that may include, but is not limited to, machine learning (Machine Learning, ML), deep learning, and the like. Machine learning is the core of AI and the fundamental way to make computers intelligent, and it is applied throughout the fields of artificial intelligence. Specifically, machine learning is an interdisciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines; it studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continually improve their own performance. Deep learning is a technique that performs machine learning using deep neural networks; machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, and supervised learning.
Large language models, which may also be referred to simply as large models or language models, are language models (a kind of deep learning model) with a large number of parameters (e.g., trillions or more) that are trained to understand and generate natural language text. These models may be based on the Transformer architecture (a model structure that processes data using an attention mechanism) and are pre-trained on large amounts of text data to learn and capture the complex patterns and structures of language; the pre-trained models can then address various natural language processing (NLP) tasks through supervised fine-tuning (Supervised Fine-Tuning, SFT) on task-specific datasets. Supervised fine-tuning is the process of further training a pre-trained model on labeled data so that the model performs better on a particular task or dataset. The basic steps are as follows: 1. Pre-training: train the model on large amounts of text data. 2. Task-related data: collect labeled data related to the specific task. 3. Fine-tuning: fine-tune the parameters of the pre-trained model using the collected labeled data. 4. Evaluation and application: evaluate the fine-tuned model and apply it to the actual problem. The advantage of supervised fine-tuning is that it reuses the general knowledge of the pre-trained model and adapts the model to a specific task by fine-tuning its parameters, which reduces data labeling cost and training time.
Based on the above description of the service component, the database query statement generation technique according to the embodiment of the present application may be divided into three phases:
Stage one: the domain knowledge base construction stage, which is a preparation stage before the intent optimization component is put into use. Specifically, the domain knowledge base can be constructed in advance, and users can add, delete, modify, and search its contents during use. For example, when a user is about to ask a question (i.e., enter a query question in natural language), the business domain to which the query question belongs can be considered; if the query question involves knowledge items specific to an industry (e.g., a non-generic, complex index calculation method in the financial domain, which is the term, together with the interpretation information of that method), the user can first enter the corresponding knowledge items into the domain knowledge base of that business domain and then ask the question.
Stage two: the intent optimization stage, in which the intent optimization component is put into use. Specifically, after the user asks a question (i.e., enters a query question in natural language), the intent optimization component optimizes the query intent of the query question based on the domain knowledge base of the corresponding business domain. It retrieves the knowledge items related to the query question from that domain knowledge base and calls the intent optimization model to optimize the query intent in combination with the retrieved knowledge items, so that the query intent is clearer and more accurate for the statement generation model and the statement generation model can generate a correct database query statement.
Stage three: the database query statement generation stage, in which the database query statement generation component is put into use. Specifically, after receiving the intent-optimized query question and the database table configured for the corresponding business domain, the database query statement generation component (e.g., the SQL statement generation component) generates and returns, according to the database table, a database query statement (e.g., an SQL statement) corresponding to the intent-optimized query question.
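The hand-off between stage two and stage three can be sketched end to end. Both models are replaced by stub functions so the sketch runs without any model service; the prompt formats, the substring-based retrieval, and every name below are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]  # prompt in, text out

seen_prompts: List[str] = []  # records what each stub model was asked


def retrieve(question: str, kb: Dict[str, str]) -> List[Tuple[str, str]]:
    """Retrieval stub: keep knowledge items whose term appears in the question."""
    return [(t, e) for t, e in kb.items() if t.lower() in question.lower()]


def generate_query_statement(question: str, kb: Dict[str, str], table_schema: str,
                             intent_model: Model, statement_model: Model) -> str:
    # Stage two: retrieve knowledge items and rewrite the question.
    knowledge = retrieve(question, kb)
    knowledge_text = "; ".join(f"{t} = {e}" for t, e in knowledge)
    optimized = intent_model(f"Knowledge: {knowledge_text}\nRewrite: {question}")
    # Stage three: generate the statement from the rewritten question and the table.
    return statement_model(f"Schema: {table_schema}\nQuestion: {optimized}")


def fake_intent_model(prompt: str) -> str:
    """Stub: a real intent optimization model would produce this rewrite."""
    seen_prompts.append(prompt)
    return "What is the total value of orders placed yesterday?"


def fake_statement_model(prompt: str) -> str:
    """Stub: a real statement generation model would produce this statement."""
    seen_prompts.append(prompt)
    return "SELECT SUM(amount) FROM orders WHERE order_date = DATE('now', '-1 day')"


# Stage one is represented by a knowledge base that was filled in beforehand.
kb = {"GMV": "total value of orders placed", "DAU": "distinct users active per day"}
sql = generate_query_statement(
    "What was the GMV yesterday?", kb, "orders(id, amount, order_date)",
    fake_intent_model, fake_statement_model,
)
```

Passing the models in as plain callables mirrors the note that the two models may be one integrated model or two independent ones: the same function can be supplied for both arguments.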
In a specific implementation, the above-mentioned framework process may be performed in a computer device, where the computer device may be a terminal or a server; taking the case where the computer device is a server as an example, a schematic diagram of the execution of the three stages of the framework process in the server can be seen in fig. 1a. Alternatively, the framework process may be executed cooperatively by the terminal and the server, which is not limited here; for example, referring to fig. 1b, the first stage and the second stage of the framework process are performed in the terminal, and the third stage is performed in the server. That is, in this case, the domain knowledge base may be constructed in the terminal through the first stage; after the terminal acquires the query question input by the user, it performs intent optimization on the query question through the second stage and sends the query question after the intent optimization to the server, and the server generates a database query statement corresponding to the query question after the intent optimization through the third stage.
The above-mentioned terminals may be smart phones, computers (such as tablet computers, notebook computers, desktop computers, etc.), smart wearable devices (such as smart watches, smart glasses), smart voice interaction devices, smart home appliances (such as smart televisions), vehicle-mounted terminals, aircraft, etc.; the servers mentioned above may be independent physical servers, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be cloud servers that provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), and basic cloud computing services such as big data and artificial intelligence platforms, and so on. Further, the terminal and the server may be located within or outside the blockchain network, which is not limited; furthermore, the terminal and the server can upload any data stored in the terminal and the server to the blockchain network for storage, so that the data stored in the terminal and the server are prevented from being tampered, and the data security is improved.
Based on the above related description about the database query statement generation technology, the embodiment of the application provides a database query statement generation method. The method may be performed by the above-mentioned computer device (i.e., terminal or server), or may be performed by both the terminal and server; for ease of illustration, embodiments of the application will be described with respect to computer devices performing the methods. Referring to fig. 2, the method may include the following steps S201 to S204:
S201, acquiring a query problem in the service field.
In the embodiment of the present application, the service field may include, but is not limited to, any of the following: the financial domain, the medical domain, the e-commerce domain, etc. The query question in the business domain may be entered by the user based on natural language, i.e. the query question may be a natural language text. It can be appreciated that the manner in which the user inputs the query question is not limited in the embodiments of the present application; for example, taking the query question "the economic situation in 2023" as an example, see fig. 3: the user may open a question-answering session with an intelligent assistant (question-answering robot) and input the query question in natural language in the question-answering session, or the user may open a web application and input the query question in natural language in a query web page provided by the web application, and so on. Optionally, in other embodiments, the user may input a voice query instruction in the service domain to trigger the computer device to convert the voice query instruction into text, so as to obtain the query question in the service domain.
S202, determining a database table and a domain knowledge base configured for the service domain, and retrieving knowledge items related to the query problem from the domain knowledge base.
The database table may be preconfigured by a user who inputs a query problem, or may be preconfigured by a service provider of a query service in the business field, which is not limited. Also, the number of database tables may be one or more, and each database table may include a plurality of data fields; the so-called data fields refer to fields storing service data. Optionally, if the database table stores service data in partitions, the database table may further include a partition field, where the partition field refers to a field holding a partition identifier, and the partition identifier is an identifier of a storage area generated by performing partition storage on the service data in the database table. Further, when the number of database tables is plural, the database tables may further have a primary key and a foreign key; the so-called primary key is a field or combination of fields for uniquely identifying each row of service data in a database table, while the foreign key is a field for establishing a relationship between different database tables, and the foreign key of one database table (database table A) may be the primary key of another database table (database table B), whereby database table A can reference database table B.
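The relationship between data fields, a partition field, a primary key and a foreign key can be illustrated with a minimal sketch using Python's standard sqlite3 module. The table names, field names and values below are hypothetical and are not taken from the embodiment; SQLite is used only because it is self-contained, and the "ds" column merely stands in for a partition field (SQLite itself does not partition tables).

```python
import sqlite3

# Hypothetical schema for illustration only; names are not from the embodiment.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE table_b (
    customer_id   INTEGER PRIMARY KEY,  -- primary key of database table B
    customer_name TEXT                  -- data field storing service data
);
CREATE TABLE table_a (
    order_id    INTEGER PRIMARY KEY,    -- primary key of database table A
    customer_id INTEGER REFERENCES table_b(customer_id),  -- foreign key into table B
    amount      REAL,                   -- data field storing service data
    ds          TEXT                    -- stand-in for a partition field (partition identifier)
);
""")
conn.execute("INSERT INTO table_b VALUES (1, 'Alice')")
conn.execute("INSERT INTO table_a VALUES (100, 1, 25.0, '2023-01-01')")


def insert_order_for_unknown_customer() -> bool:
    """Return True if the database rejects a row whose foreign key has no match."""
    try:
        conn.execute("INSERT INTO table_a VALUES (101, 999, 10.0, '2023-01-01')")
    except sqlite3.IntegrityError:
        return True
    return False


# Table A references table B through the foreign key, so the two can be joined.
rows = conn.execute(
    "SELECT a.order_id, b.customer_name FROM table_a a "
    "JOIN table_b b ON a.customer_id = b.customer_id"
).fetchall()
```

In this sketch, the join works because the foreign key of table A holds values of the primary key of table B, and a row pointing at a non-existent customer is rejected.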
In addition, the domain knowledge base may be preconfigured by a user who inputs a query problem, or may be preconfigured by a service provider of a query service of the business domain, which is not limited. The domain knowledge base may include at least one knowledge item, and one knowledge item includes: one term in the business field and the interpretation information of that term. The term refers to a professional term in the business field, and can be a word, a phrase, a calculation formula or the like; the interpretation information of the term refers to information for interpreting the term, through which the meaning of the term can be clearly known. Illustratively, taking the business domain as the financial domain as an example, one knowledge item in the corresponding domain knowledge base may be "ring ratio (Sequential Comparison, or Quarter on Quarter, QoQ) refers to comparing a certain financial index with the data of the immediately previous period (usually the last month or the last quarter)"; in this knowledge item, "ring ratio" is the term, and "(Sequential Comparison, or Quarter on Quarter, QoQ) refers to comparing a certain financial index with the data of the immediately previous period (usually the last month or the last quarter)" is the interpretation information of the term.
In one specific implementation, the computer device may retrieve knowledge items related to the query problem from the domain knowledge base based on a one-way recall manner, so as to improve the retrieval efficiency of the knowledge items. Specifically, the computer device may obtain a preset entry recall policy, where the entry recall policy is used to determine a relevance score between the query problem and each knowledge entry in the domain knowledge base, and recall K knowledge entries from the domain knowledge base according to a sequence of the relevance scores from high to low, where K is a positive integer. After the preset item recall strategy is obtained, item recall can be carried out in the domain knowledge base based on the query problem according to the preset item recall strategy to obtain an item recall result, and each knowledge item in the item recall result is used as a knowledge item related to the query problem. Further, the preset entry recall policy may include, but is not limited to, any of the following: text recall policies or semantic recall policies.
In another specific implementation, the computer device may retrieve knowledge items related to the query problem from the domain knowledge base based on a multi-way recall manner, so as to improve retrieval accuracy of the knowledge items. Specifically, the computer device may obtain a plurality of entry recall policies, where different entry recall policies determine the relevance scores between the query problem and the knowledge items in different ways; illustratively, the multiple entry recall policies mentioned herein may include, but are not limited to: a text recall policy and a semantic recall policy. After obtaining the multiple item recall policies, the computer device may perform item recall based on the query problem in the domain knowledge base according to each of the item recall policies to obtain multiple item recall results, where one item recall result corresponds to one item recall policy, and each item recall result includes K knowledge items. After the multiple item recall results are obtained, each knowledge item in the multiple item recall results can be used as a knowledge item related to the query problem; alternatively, a preset number of knowledge items can be retrieved from the multiple item recall results as the knowledge items related to the query problem.
Specifically, when the computer device retrieves a preset number of knowledge items from the multiple item recall results, it may randomly select the preset number of knowledge items from the multiple item recall results as the knowledge items related to the query problem. Alternatively, to improve the accuracy of the retrieved knowledge items, the computer device may select the preset number of knowledge items in descending order of their relevance scores. However, because each item recall policy determines the relevance score in a different way, the relevance scores of knowledge items in item recall results produced by different item recall policies cannot be compared directly. Based on this, the computer device may traverse the multiple item recall results and normalize the relevance score of each knowledge item in the currently traversed item recall result to obtain a normalized score for that knowledge item. The knowledge items related to the query problem are then retrieved from the multiple item recall results based on the normalized scores; in particular, the preset number of knowledge items are selected in descending order of normalized score, thereby improving the accuracy of the retrieved knowledge items.
The normalization may specifically proceed as follows: determine the maximum value among the relevance scores corresponding to the K knowledge items in the currently traversed item recall result, and take the ratio between the relevance score of each knowledge item and that maximum value as the normalized score of the corresponding knowledge item.
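The max-ratio normalization and the selection of a preset number of knowledge items across several recall results can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the sample scores, the handling of a knowledge item recalled by more than one policy (its best normalized score is kept) and the alphabetical tie-break are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

RecallResult = List[Tuple[str, float]]  # (knowledge item, relevance score)


def normalize_by_max(result: RecallResult) -> RecallResult:
    """Divide every relevance score by the maximum score of the same recall result."""
    if not result:
        return []
    max_score = max(score for _, score in result)
    if max_score <= 0:
        # Assumption: with no positive score there is nothing to scale, so use zeros.
        return [(item, 0.0) for item, _ in result]
    return [(item, score / max_score) for item, score in result]


def merge_recall_results(results: List[RecallResult], preset_number: int) -> List[str]:
    """Pick the preset number of items in descending order of normalized score."""
    best: Dict[str, float] = {}
    for result in results:  # traverse the item recall results one by one
        for item, normalized in normalize_by_max(result):
            # Assumption: an item recalled by several policies keeps its best score.
            best[item] = max(best.get(item, 0.0), normalized)
    # Assumption: ties are broken alphabetically to keep the output deterministic.
    ranked = sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:preset_number]]


# Hypothetical scores: the two policies use incomparable raw scales.
text_result = [("ring ratio", 12.0), ("year-on-year", 6.0), ("gross margin", 3.0)]
semantic_result = [("year-on-year", 0.9), ("ring ratio", 0.72), ("net profit", 0.45)]
top_items = merge_recall_results([text_result, semantic_result], preset_number=3)
```

After normalization the best item of each recall result scores 1.0, so a raw text score of 12.0 and a raw semantic score of 0.9 become directly comparable.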
Based on the above description, text recall policies and semantic recall policies used in the embodiments of the present application are respectively described below, and specifically as follows:
(1) The text recall strategy mainly depends on keyword matching, the basis of the method is word frequency statistics, and knowledge items related to the query problem are searched in the domain knowledge base based on word frequency of the keywords in each knowledge item in the domain knowledge base by analyzing the keywords in the query problem; the word frequency of the keyword in the knowledge item refers to: the number of times a keyword appears in a knowledge item.
Specifically, according to the text recall strategy, the method for carrying out item recall based on the query problem in the domain knowledge base to obtain an item recall result can comprise the following steps: s11, performing word segmentation processing on the query problem according to the text recall strategy to obtain I keywords of the query problem, wherein I is a positive integer. s12, for any knowledge item in the domain knowledge base (denoted as knowledge item D), counting the word frequency of each of the I keywords in knowledge item D. s13, calculating a relevance score between the query problem and knowledge item D based on the word frequency of each keyword in knowledge item D. s14, after obtaining the relevance scores between the query problem and each knowledge item, recalling K knowledge items from the domain knowledge base in descending order of relevance score to obtain an item recall result; specifically, the knowledge items in the domain knowledge base are sorted in descending order of relevance score, and the top-K (i.e. first K) knowledge items are recalled to obtain the item recall result.
Further, one specific embodiment of step s13 may be: integrating (e.g. by weighted summation or direct summation) the word frequencies of the keywords in knowledge item D to obtain a relevance score between the query problem and knowledge item D. Alternatively, another embodiment of step s13 may be: calculating the inverse document frequency (IDF value) of each keyword, and calculating a relevance score between the query question and knowledge item D based on the inverse document frequency of each keyword and the word frequency of the corresponding keyword in knowledge item D. The inverse document frequency of a keyword is an index that measures the uniqueness or information content of the keyword with respect to the whole domain knowledge base, and it is determined by the number of knowledge items in the whole domain knowledge base that contain the keyword; a keyword that appears in many knowledge items has a low IDF value, and vice versa, which means that rare keywords typically have higher IDF values and thus carry greater weight in the relevance score. It can be seen that the inverse document frequency can be used to reduce the weight of common words and increase the weight of rare words, which can improve the accuracy of the finally calculated relevance score.
Illustratively, let the i-th keyword be denoted q_i, i ∈ [1, I], let the total number of knowledge items in the domain knowledge base be N, and let the number of knowledge items containing the i-th keyword be n(q_i); then the calculation formula of the inverse document frequency of the i-th keyword (expressed as IDF(q_i)) can be shown in the following formula 1.1:
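[Formula 1.1: IDF(q_i) expressed in terms of N and n(q_i); the formula image is not reproduced in this text]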
When calculating the relevance score between the query question and the knowledge item D based on the inverse document frequency of each keyword and the word frequency of the corresponding keyword in the knowledge item D, the computer device may perform product operation on the inverse document frequency of each keyword and the word frequency of the corresponding keyword in the knowledge item D, respectively, to obtain an importance measure value of each keyword, and integrate (e.g. weight summation or direct summation) the importance measures of each keyword, to obtain the relevance score between the query question and the knowledge item D. Or the following formula 1.2 is adopted to calculate a relevance score between the query question and the knowledge item D based on the inverse document frequency of each keyword and the word frequency of the corresponding keyword in the knowledge item D:
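$$\mathrm{Score}(D, Q) = \sum_{i=1}^{I} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)} \qquad (1.2)$$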
In the above formula 1.2, Score(D, Q) represents the relevance score between the query question Q and knowledge item D, and f(q_i, D) represents the word frequency of the i-th keyword in knowledge item D; |D| represents the length of knowledge item D, which may be the number of characters in knowledge item D; avgdl represents the average length of the knowledge items in the domain knowledge base, obtained by averaging the lengths of the knowledge items; k_1 and b are adjustable parameters, where k_1 controls word-frequency saturation and b controls the degree of length normalization of the knowledge items; k_1 typically takes a value between 1.2 and 2.0, and b typically takes a value of 0.75.
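The text recall steps s11 to s14 with the scoring described around formula 1.2 can be sketched as below. This is an illustrative sketch under stated assumptions: the inverse document frequency uses a commonly seen smoothed BM25 form, ln((N - n(q_i) + 0.5) / (n(q_i) + 0.5) + 1), which is an assumption and may differ from formula 1.1; keywords are matched by plain substring counting rather than by a real word segmenter; the function name and the sample knowledge items are hypothetical.

```python
import math
from typing import List, Sequence


def bm25_scores(keywords: Sequence[str], items: Sequence[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every knowledge item against the query keywords (BM25-style)."""
    if not items:
        return []
    n_items = len(items)                                  # N
    avgdl = sum(len(item) for item in items) / n_items    # lengths counted in characters
    if avgdl == 0:
        return [0.0] * n_items
    scores = []
    for item in items:
        score = 0.0
        for q in keywords:
            n_q = sum(1 for other in items if q in other)  # n(q_i)
            # Assumed smoothed IDF form; not necessarily identical to formula 1.1.
            idf = math.log((n_items - n_q + 0.5) / (n_q + 0.5) + 1.0)
            f = item.count(q)                              # f(q_i, D), substring count
            length_norm = 1.0 - b + b * len(item) / avgdl
            score += idf * f * (k1 + 1.0) / (f + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical knowledge items and keywords (s11 would normally produce the keywords).
items = [
    "ring ratio compares an index with the previous period",
    "gross margin is revenue minus cost divided by revenue",
    "year on year compares an index with the same period last year",
]
keywords = ["ring ratio", "index"]
scores = bm25_scores(keywords, items)
# s14: recall the top-K items in descending order of relevance score (K = 2 here).
top_k = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)[:2]
```

The rare keyword "ring ratio" appears in only one item and therefore contributes more to that item's score than the more common keyword "index", which is the effect the inverse document frequency is meant to achieve.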
(2) The semantic recall strategy mainly depends on semantic vector matching, the basis of the method is semantic vectors, semantic information of the query problem is represented by vectors in a high-dimensional space by focusing on understanding deep meaning and context of the query problem, the semantic vectors of the query problem are obtained, and knowledge items related to the query problem are searched in a domain knowledge base based on the semantic vectors of the query problem.
Specifically, according to the semantic recall policy, the method for performing item recall based on the query problem in the domain knowledge base to obtain an item recall result may include the following steps: s21, carrying out semantic recognition on the query problem according to the semantic recall strategy to obtain a semantic vector of the query problem; specifically, a pre-trained vector embedding model can be used to carry out semantic recognition on the query problem, so that the query problem is converted into an embedded vector, and the converted embedded vector is used as the semantic vector of the query problem. s22, obtaining the semantic vector of each knowledge item in the domain knowledge base. s23, determining a relevance score between the query question and each knowledge item based on the vector similarity between the semantic vector of the query question and the semantic vector of that knowledge item; the vector similarity mentioned here may be, for example, a cosine similarity calculated based on a cosine similarity algorithm, or may be based on a Euclidean distance calculated by a Euclidean distance algorithm (in which case a smaller distance indicates a higher relevance), or the like. s24, recalling K knowledge items from the domain knowledge base in descending order of relevance score to obtain an item recall result; specifically, the knowledge items in the domain knowledge base are sorted in descending order of relevance score, and the top-K (i.e. first K) knowledge items are recalled to obtain the item recall result.
It should be noted that the semantic vector of any knowledge item mentioned above may be obtained by directly performing semantic recognition on the corresponding knowledge item. Alternatively, a knowledge item may contain invalid words, also called stop words, which are words in the knowledge item that can be ignored or deleted; these are usually frequently occurring function words or words without practical meaning, such as prepositions, conjunctions, articles and pronouns. Because invalid words contribute little to the semantic vector of a knowledge item while occupying storage space and computing resources, the semantic vector of any knowledge item can instead be obtained by cleaning the corresponding knowledge item and then performing semantic recognition, so as to improve the generation efficiency of the semantic vectors and save computing resources; the cleaning mentioned here includes removing the invalid words from the knowledge item.
In addition, similar to the generation mode of the semantic vector of the query problem, for any knowledge item, the semantic recognition can be performed on the corresponding knowledge item by using a pre-trained vector embedding model, so that the corresponding knowledge item is converted into an embedding vector, and the converted embedding vector is used as the semantic vector of the corresponding knowledge item. In addition, the semantic vector of each knowledge item can be generated in real time in the process of item recall according to a semantic recall policy; or the semantic vector of each knowledge item can be generated in advance and stored in the index system, so that the semantic vector of each knowledge item can be directly and rapidly obtained from the index system when item recall is carried out according to a semantic recall strategy, the time cost is saved, and the efficiency of semantic recall is improved.
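Steps s21 to s24, together with the optional cleaning of invalid words and the pre-built index of item vectors, can be sketched as follows. The sketch makes several assumptions that are not part of the embodiment: the three-dimensional vectors are hand-written stand-ins for the output of a pre-trained vector embedding model, the stop-word list is a tiny hypothetical one, and cosine similarity is used as the relevance score.

```python
import math
from typing import Dict, List, Sequence, Tuple

STOP_WORDS = {"the", "of", "a", "an"}  # hypothetical, deliberately tiny stop-word list


def clean_item(text: str) -> str:
    """Remove invalid (stop) words before the item is embedded."""
    return " ".join(word for word in text.split() if word.lower() not in STOP_WORDS)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_recall(query_vector: Sequence[float],
                    index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every indexed knowledge item and return the top-K by relevance score."""
    scored = [(item, cosine_similarity(query_vector, vector))
              for item, vector in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Pre-computed item vectors, standing in for an index system filled in advance.
index = {
    "ring ratio": [0.9, 0.1, 0.0],
    "gross margin": [0.0, 1.0, 0.0],
    "year-on-year": [0.7, 0.0, 0.7],
}
query_vector = [1.0, 0.0, 0.0]  # stand-in for the embedded query problem
recalled = semantic_recall(query_vector, index, k=2)
recalled_items = [item for item, _ in recalled]
```

Storing the item vectors in advance means that only the query has to be embedded at recall time, which is the time saving described above.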
S203, invoking an intention optimizing model to optimize the query intention of the query problem based on the retrieved knowledge item, and obtaining the query problem after intention optimization.
In one particular implementation, the intent optimization model may be a language model. In this case, the computer device may obtain a task prompt text designed based on prompt engineering; prompt engineering refers to the process of designing and optimizing the task prompt text that is input to a language model when interacting with it, so that the language model can better understand the task requirements and give more accurate output; an effectively designed task prompt text can bring out the performance of a large language model. Specifically, the task prompt text designed based on prompt engineering can be used for prompting the model to optimize the query intention of the query problem based on the retrieved knowledge items; based on this, when the computer device executes step S203, the intent optimization model may be invoked to perform task processing according to the task prompt text, so as to obtain the query problem after intent optimization.
Or the computer device may obtain a Chain of thought prompt text based on a prompt engineering and Chain of thought (CoT) design; the so-called thought chain is an improved hint strategy for improving the performance of language models in complex reasoning tasks, such as arithmetic reasoning, common sense reasoning and symbolic reasoning. The thought chain does not build hints with input-output pairs as simply as In-Context Learning (ICL), but rather introduces intermediate derivation steps that can introduce hints for the final output of the language model so that the language model can generate and output intent-optimized query questions based on the relevant hints. From this, the thinking chain is a kind of discrete prompt learning; compared with the context learning, the thinking chain has more middle deduction prompts, so that the logical reasoning capacity of the language model can be enhanced, and the accuracy of the finally generated query problem after the intention optimization is improved. The context learning mentioned here is a learning manner that does not need to fine-tune the language model, but allows the language model to learn some examples in the model reasoning phase (i.e. model application phase).
Specifically, the thought chain prompt text designed based on prompt engineering and the thought chain can be used for prompting the model to execute a question rewriting task on the question in an input field according to a plurality of steps, where the plurality of steps sequentially comprise: analyzing the question to obtain the query intent, optimizing the query intent, and rewriting the question based on the optimized query intent and the retrieved knowledge items to obtain the question after intent optimization. Based on this, when executing step S203, the computer device may generate a task description instruction from the thought chain prompt text, the query problem and the retrieved knowledge items, and then invoke the intent optimization model to perform task processing according to the task description instruction, so as to obtain the query problem after intent optimization. The task description instruction includes an input field, and the query question is located in the input field. It should be noted that the embodiment of the application does not limit the specific form of the thought chain prompt text; for example, the specific form of the thought chain prompt text may be: "You are an assistant for intent optimization and are required to perform the question rewriting task in multiple steps. The first step: question analysis, which requires you to analyze the input question to obtain the query intent. The second step: intent optimization, which requires you to optimize the analyzed query intent. The third step: question rewriting, which requires you to rewrite the input question based on the optimized query intent and the retrieved knowledge items to generate the intent-optimized question."
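How the thought chain prompt text, the retrieved knowledge items and the query question might be combined into one task description instruction can be sketched as follows. The layout of the instruction, the function name and the sample question are assumptions made for illustration; the call to the intent optimization model itself is deliberately left out, since the embodiment does not tie it to any particular model interface.

```python
from typing import List, Tuple

# Thought chain prompt text, paraphrasing the three steps described above.
COT_PROMPT = (
    "You are an assistant for intent optimization and must perform the "
    "question rewriting task in multiple steps. "
    "Step 1, question analysis: analyze the input question to obtain the query intent. "
    "Step 2, intent optimization: optimize the analyzed query intent. "
    "Step 3, question rewriting: rewrite the input question based on the optimized "
    "query intent and the retrieved knowledge items to produce the intent-optimized question."
)


def build_task_description(question: str, knowledge_items: List[Tuple[str, str]]) -> str:
    """Assemble the instruction: prompt text, knowledge items, then the input field."""
    knowledge = "\n".join(f"- {term}: {info}" for term, info in knowledge_items)
    return (
        f"{COT_PROMPT}\n\n"
        f"Retrieved knowledge items:\n{knowledge}\n\n"
        f"Input: {question}"  # the query question sits in the input field
    )


# Hypothetical question and knowledge item.
instruction = build_task_description(
    "What is the ring ratio of revenue in Q2 2023?",
    [("ring ratio",
      "compares a financial index with the data of the immediately previous period")],
)
```

The resulting string would then be passed to whichever language model serves as the intent optimization model.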
In another specific implementation, the intent optimization model may be obtained by performing supervised training, using sample data, on a neural network model constructed based on AI technology. The sample data may include: a first sample question, knowledge items related to the first sample question, and a reference optimization result for the intent optimization of the first sample question. The manner in which the neural network model is supervised-trained with the sample data to obtain the intent optimization model may be roughly as follows: invoking the neural network model to optimize the query intention of the first sample question based on the knowledge items in the sample data, and obtaining an actual optimization result; calculating a model loss value of the neural network model using a model loss function based on the difference between the actual optimization result and the reference optimization result in the sample data, and optimizing the model parameters of the neural network model in the direction of reducing the model loss value; after the neural network model is optimized, if the neural network model has converged, the current neural network model is used as the intent optimization model; otherwise, the neural network model continues to be optimized until it converges. It can be seen that an intent optimization model with intent optimization capability can be obtained in this way; in this case, when executing step S203, the computer device may directly input the query question and the retrieved knowledge items into the intent optimization model, so that the intent optimization model optimizes the query intent of the query question based on the received knowledge items, resulting in the query question after intent optimization.
① The model loss function may be set according to actual requirements or empirical values, which is not limited. ② The above-mentioned direction of reducing the model loss value means: the model optimization direction that targets the minimum model loss value; by performing model optimization in this direction, the model loss value produced by the neural network model after each optimization is smaller than the model loss value produced before that optimization. For example, if the model loss value obtained by the current calculation is 0.85, then after the model parameters of the neural network model are optimized in the direction of reducing the model loss value, the model loss value produced by the optimized neural network model should be less than 0.85. ③ The above-mentioned convergence of the neural network model means: the number of training iterations of the neural network model reaches a preset number; or the model parameters of the neural network model no longer change, or the magnitude of their change is smaller than a threshold; or the model loss value of the neural network model no longer decreases, or the magnitude of its decrease is smaller than a threshold, or the like.
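The three convergence criteria in ③ can be illustrated with a toy gradient-descent loop. This is only a sketch: a single scalar parameter and a quadratic loss stand in for the neural network model and its model loss function, and the learning rate, thresholds and function names are assumptions chosen for the example.

```python
from typing import Callable, Tuple


def train_until_converged(loss: Callable[[float], float],
                          grad: Callable[[float], float],
                          theta: float,
                          lr: float = 0.1,
                          max_steps: int = 1000,
                          param_tol: float = 1e-8,
                          loss_tol: float = 1e-12) -> Tuple[float, int, str]:
    """Update theta in the direction that reduces the loss until a criterion is met."""
    prev_loss = loss(theta)
    for step in range(1, max_steps + 1):
        new_theta = theta - lr * grad(theta)   # move against the gradient
        new_loss = loss(new_theta)
        if abs(new_theta - theta) < param_tol:
            return new_theta, step, "parameter change below threshold"
        if prev_loss - new_loss < loss_tol:
            return new_theta, step, "loss decrease below threshold"
        theta, prev_loss = new_theta, new_loss
    return theta, max_steps, "preset number of training steps reached"


# Toy stand-in for a model loss function: minimum at theta = 3.
def toy_loss(theta: float) -> float:
    return (theta - 3.0) ** 2


def toy_grad(theta: float) -> float:
    return 2.0 * (theta - 3.0)


theta_final, steps_used, reason = train_until_converged(toy_loss, toy_grad, theta=0.0)
```

With these settings the loop stops well before the preset number of steps because one of the threshold criteria is met first; a real training run would apply the same checks to the network's parameters and loss.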
S204, calling a statement generation model to generate a database query statement corresponding to the query problem after intention optimization according to the database table.
In a specific implementation, the computer device may directly call the statement generation model to generate a database query statement corresponding to the query problem after the intent optimization according to the database table. Alternatively, the database table may be very wide (for example, a wide table containing hundreds of data fields), while only some of its data fields are related to the query problem after the intent optimization. Therefore, in order to improve the generation efficiency of the database query statement, when executing step S204, the computer device may retrieve, from the database table, M data fields related to the query problem after the intent optimization to obtain a field retrieval result, and then call the statement generation model to generate the database query statement corresponding to the query problem after the intent optimization according to the field retrieval result. The field retrieval result comprises the M retrieved data fields, where M is a positive integer.
The sentence generation model mentioned above may be a language model. In this case, when the computer device invokes the sentence generation model to generate a database query statement corresponding to the query problem after the intent optimization according to the database table (or the field retrieval result), it may use the database table (or the field retrieval result) and the query problem after the intent optimization to generate task prompt information for the sentence generation model, where the task prompt information is used for prompting the model to write a corresponding database query statement for the query problem after the intent optimization according to the database table (or the field retrieval result); the sentence generation model can then be called to perform task processing according to the task prompt information, and the data output by the sentence generation model after the task processing is used as the database query statement corresponding to the query problem after the intent optimization. It can be seen that in this case the task of generating database query statements is accomplished through a language model and prompt engineering. Optionally, to improve the quality of the database query statements, the language model selected as the sentence generation model may satisfy two conditions: it can understand complex natural language instructions, and it has strong logical reasoning and code generation capabilities; based on this, in terms of the category of model, the embodiment of the application can select a general-purpose code generation large model as the sentence generation model.
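The retrieval of M data fields and the construction of the task prompt information can be sketched as follows. The embodiment does not specify how related fields are retrieved, so the word-overlap scoring used here is purely an assumption for illustration, as are the table name, the field names, the prompt wording and the function names; the call to the sentence generation model is omitted.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (field name, field description)


def retrieve_fields(question: str, fields: List[Field], m: int) -> List[Field]:
    """Assumed retrieval rule: rank fields by word overlap with the question."""
    question_words = set(question.lower().split())

    def overlap(field: Field) -> int:
        name, description = field
        field_words = set(name.lower().split()) | set(description.lower().split())
        return len(question_words & field_words)

    # sorted() is stable, so fields with equal overlap keep their original order.
    return sorted(fields, key=lambda field: -overlap(field))[:m]


def build_sql_task_prompt(question: str, table_name: str, fields: List[Field]) -> str:
    """Assemble task prompt information from the retrieved fields and the question."""
    field_lines = "\n".join(f"- {name}: {description}" for name, description in fields)
    return (
        f"Write a database query statement for the question below, "
        f"using only the listed fields of table {table_name}.\n"
        f"Fields:\n{field_lines}\n"
        f"Question: {question}"
    )


# Hypothetical table, fields and intent-optimized question.
all_fields = [
    ("revenue", "quarterly revenue of the company"),
    ("quarter", "fiscal quarter of the record"),
    ("employee_count", "number of employees"),
]
optimized_question = (
    "What is the quarter on quarter change in revenue for the latest quarter"
)
field_result = retrieve_fields(optimized_question, all_fields, m=2)
task_prompt = build_sql_task_prompt(optimized_question, "finance_report", field_result)
```

Passing only the M retrieved fields keeps the prompt short for wide tables, which is the efficiency gain described above.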
Or the sentence generation model mentioned above may be a neural network model obtained by supervised training of a neural network model constructed based on AI technology using training data. Wherein the training data may include: the second sample question, the sample field set and the annotation database query statement corresponding to the second sample question; the sample field set referred to herein may include all data fields in the database table, or may include data fields in the database table that are related to the second sample problem, without limitation. Then, the manner in which the neural network model is supervised trained with the training data to obtain the sentence-generating model may be approximately as follows: invoking a neural network model, writing a database query statement for the second sample problem according to the sample field set in the training data, and obtaining a target database query statement; calculating a model loss value of the neural network model by adopting a model loss function based on the difference between the target database query statement and the labeling database query statement in the training data, and optimizing model parameters of the neural network model according to the direction of reducing the model loss value; after the neural network model is optimized, if the neural network model is converged, the current neural network model is used as a statement generation model, otherwise, the neural network model is continuously optimized until the neural network model is converged. 
Therefore, a sentence generation model with sentence generation capability can be obtained in this way; in this case, when the sentence generation model is called to generate a database query sentence corresponding to the query question after intention optimization according to the database table (or the field search result), the computer device may directly input the query question after intention optimization and the database table (or the field search result) into the sentence generation model, so that the sentence generation model writes and generates a corresponding database query sentence for the query question after intention optimization based on the database table (or the field search result).
According to the embodiment of the application, the database table and the domain knowledge base are configured for the service domain, so that after the query problem in the service domain is acquired, knowledge items related to the query problem can be retrieved from the domain knowledge base, and the query intention of the query problem is optimized based on the retrieved knowledge items by calling the intention optimizing model, so that the statement generating model is called to generate a database query statement corresponding to the query problem after intention optimization according to the database table. Therefore, the embodiment of the application realizes automatic generation of the database query statement, and when the database query statement is automatically generated, the corresponding domain knowledge can be injected, and the query intention of the query problem is optimized by combining the injected domain knowledge, so that the statement generation model can more clearly and accurately understand the query intention of the query problem, thereby generating a correct database query statement and improving the accuracy of the database query statement.
Based on the description of the method embodiment shown in fig. 2, the embodiment of the application further provides another method for generating database query statements. The method takes into account that, in practice, the same term may have different meanings in different business scenarios, while a language model itself is static and, without knowledge of the specific business scenario, may provide false, outdated or generic information. The method therefore introduces retrieval-augmented generation (a technique that uses information from private or proprietary data sources to assist text generation) to bring the domain knowledge of the specific business scenario into the language model, keeping its output relevant, accurate and practical. The method is described below by taking a computer device as the execution body as an example. Referring to fig. 4, the method may include the following steps S401 to S407:
S401, acquiring a query problem in the service field.
S402, determining a database table and a domain knowledge base configured for the service domain, and retrieving knowledge items related to the query problem from the domain knowledge base.
Optionally, in order to support scenario migration and knowledge maintenance, the embodiment of the application allows any user (such as the user who inputs the query question, or the provider of the query service in the business field) to add, delete, modify and look up content in the domain knowledge base. In particular, the computer device may obtain an operation instruction for the domain knowledge base, where the operation instruction instructs at least one of the following operations: adding a knowledge item to the domain knowledge base, deleting a knowledge item from the domain knowledge base, modifying a knowledge item in the domain knowledge base, and searching for a knowledge item in the domain knowledge base. The computer device can then execute the corresponding operation on the domain knowledge base according to the operation instruction. One knowledge item comprises: one term in the business field and the interpretation information of that term.
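The operation instructions described above can be sketched as a small in-memory store. This is only an illustrative sketch: the class names, the dictionary-shaped instruction and its keys ("op", "term", "explanation") are assumptions made for the example and are not specified by the application.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeItem:
    term: str          # a term in the business field
    explanation: str   # interpretation information of that term


@dataclass
class DomainKnowledgeBase:
    # Hypothetical storage layout: knowledge items keyed by their term.
    items: dict = field(default_factory=dict)

    def apply(self, instruction: dict) -> list:
        """Execute one operation instruction; returns matches for "search"."""
        op = instruction["op"]
        term = instruction.get("term", "")
        if op == "add":
            self.items[term] = KnowledgeItem(term, instruction["explanation"])
        elif op == "modify":
            if term not in self.items:
                raise KeyError(term)
            self.items[term].explanation = instruction["explanation"]
        elif op == "delete":
            self.items.pop(term, None)
        elif op == "search":
            # Simple substring match on the term (an assumption for the sketch).
            return [it for it in self.items.values() if term in it.term]
        else:
            raise ValueError(f"unknown operation: {op}")
        return []


kb = DomainKnowledgeBase()
kb.apply({"op": "add", "term": "revenue", "explanation": "post-tax revenue"})
found = kb.apply({"op": "search", "term": "rev"})
```

Migrating from one business scenario to another then amounts to issuing a batch of delete and add instructions against the same store.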
It should be noted that, in the embodiment of the present application, the time point of acquiring the operation instruction is not limited, that is, the user may input the operation instruction at any time point, so as to trigger the computer device to execute the corresponding operation on the domain knowledge base according to the operation instruction. For example, when a user inputs a query problem, if the target knowledge item related to the query problem is found to be lacking in the domain knowledge base, an operation instruction for instructing to newly add the target knowledge item in the domain knowledge base may be input, so that the computer device may add the target knowledge item to the domain knowledge base based on the operation instruction. For another example, after the user inputs the query question, the computer device displays a response answer corresponding to the query question on the user interface based on the subsequent steps S403 to S407, if the user finds that the response answer has an error, the user can check whether the domain knowledge base has an error knowledge item, and after checking the error knowledge item, an operation instruction for indicating to modify the error knowledge item is input, so that the computer device can modify the corresponding knowledge item based on the operation instruction. For another example, when there is a need for the user to migrate the service scenario (e.g., migrate from the service scenario a to the service scenario B), a corresponding operation instruction may also be input, so that the computer device performs a corresponding operation on the domain knowledge base based on the operation instruction, so as to update the knowledge item in the domain knowledge base to the knowledge item related to the service scenario B.
S403, calling an intention optimizing model to optimize the query intention of the query problem based on the retrieved knowledge item, and obtaining the query problem after intention optimization.
In the embodiment of the present application, the intent optimization model is a language model and can be understood as an intent optimization agent based on the language model. An intent optimization agent is an agent that optimizes the query intent of a query question; an agent is an important concept in the field of artificial intelligence and refers to a system capable of autonomously perceiving, independently thinking, and performing corresponding actions. In a specific implementation of step S403, the computer device may obtain a chain-of-thought prompt text (also referred to herein as the thinking chain prompt text), which may be used for prompting: perform a question rewriting task, in a plurality of steps, on the question in the input field. The steps include, in order: analyzing the question to obtain a query intent, optimizing the query intent, and rewriting the question based on the optimized query intent and the retrieved knowledge items to obtain an intent-optimized question.
It should be noted that, in the step of "analyzing the question to obtain a query intent", either only the query intent of the question is analyzed, or the key components of the question are analyzed together with the query intent, so that the optimization can focus on the knowledge items related to the key components, thereby improving the effect of intent optimization. In the latter case, analyzing the question also yields its key components, and the step of "rewriting the question based on the optimized query intent and the retrieved knowledge items to obtain an intent-optimized question" may specifically be: comparing the key components with the retrieved knowledge items to obtain a comparison result, where the comparison result indicates the knowledge items related to the key components; and rewriting the question based on the optimized query intent and the comparison result to obtain the intent-optimized question.
In addition, in the step of "optimizing the query intent", the intent optimization model may optimize the query intent according to the information of the query question itself. Alternatively, the embodiment of the application may also introduce the user's historical query records into the thinking chain prompt text, so that the query intent can be optimized more accurately by referring to and analyzing the historical query records, further improving the effect of intent optimization; in this case, the step of "optimizing the query intent" may specifically be: optimizing the query intent according to the historical query records. If the query question is entered in a question-and-answer session, the historical query records may include the questions previously entered in that session; if the query question is entered in a query web page, the historical query records may include the questions entered in the query web page within a historical time period (e.g., the previous week or the previous day). It can be understood that, if the query question is the first question entered in a question-and-answer session, or if no question was entered in the query web page within the historical time period, the historical query records are empty.
To sum up, fig. 5a illustrates an implementation logic of the thinking chain (chain-of-thought) prompting, in which case a specific form of the thinking chain prompt text may be as follows: "You are a question rewriting assistant and are required to perform the question rewriting task in four steps. Step 1: question analysis, which requires you to understand the query intent and key components of the input question. Step 2: knowledge comparison, which requires you to compare the key components of the query question with the retrieved knowledge items, to ensure that the terms and descriptions fit the current business scenario. Step 3: historical query record analysis, which requires you to refer to and analyze the previous historical query records and to optimize the query intent of the query question according to them. Step 4: question rewriting, which requires you to rewrite the input question based on the results of the first three steps to generate an intent-optimized question."
After the thinking chain prompt text is obtained based on the mode, the computer equipment can generate a task description instruction by adopting the thinking chain prompt text, the query problem and the retrieved knowledge item, so that an intention optimization model is called to perform task processing according to the task description instruction, and the query problem after intention optimization is obtained. Wherein the task description instruction includes an input field and the query question is located in the input field. In the process of generating the task description instruction, the computer equipment can directly splice the thinking chain prompt text, the query problem and the retrieved knowledge item to obtain the task description instruction. Or the computer equipment can acquire the context learning example and generate a task description instruction by adopting the context learning example, the thinking chain prompt text, the query problem and the retrieved knowledge item; under the specific implementation, through the thinking chain prompt and the context learning example, the intention optimization model can combine the domain knowledge of specific services and analyze the query intention of the query problem, so that the query problem input by the user is optimized more clearly and accurately.
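The splicing described above can be sketched as follows. This is a minimal illustration, not the application's actual prompt layout: the function name, the section labels ("Retrieved knowledge items", "Input") and the ordering of the parts are assumptions.

```python
def build_task_instruction(cot_prompt, question, knowledge_items, icl_examples=()):
    """Splice the prompt text, optional context learning examples,
    retrieved knowledge items and the query question into one instruction."""
    parts = [cot_prompt]
    for i, example in enumerate(icl_examples, 1):
        parts.append(f"Example {i}:\n{example}")
    knowledge = "\n".join(f"{i}. {k}" for i, k in enumerate(knowledge_items, 1))
    parts.append(f"Retrieved knowledge items:\n{knowledge}")
    # The query question is placed in the input field of the instruction.
    parts.append(f"Input: {question}")
    return "\n\n".join(parts)


instruction = build_task_instruction(
    cot_prompt="You are a question rewriting assistant. Work in four steps.",
    question="How did revenue change for customer A?",
    knowledge_items=["Revenue generally refers to post-tax revenue.",
                     "The customer identification refers to the customer name."],
)
```

The resulting string would then be passed to whichever language model serves as the intent optimization model; that call is omitted here because the application does not name a specific model interface.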
A context learning example may include: an example question, example knowledge items, and question rewriting information. The question rewriting information includes the execution results obtained by performing the question rewriting task on the example question, in the plurality of steps, based on the example knowledge items, with each step corresponding to one execution result. It can be understood that, where the query intent needs to be optimized based on historical query records, the context learning example may further include example query records. Taking the four steps shown in fig. 5a as an example, the execution result of the first step includes: the query intent and key components of the example question; the execution result of the second step includes: the comparison result obtained by comparing the key components with the example knowledge items; the execution result of the third step includes: the optimized query intent obtained by optimizing the query intent based on the historical query records; the execution result of the fourth step includes: the intent-optimized example question. For example, assuming the example question is "How much has customer A's revenue changed period-over-period compared with last year?", one specific form of the context learning example may be as follows:
"Example knowledge items: 1. Period-over-period comparison refers to comparing a financial indicator with the data of the immediately preceding period (typically the previous month or the previous quarter). The period-over-period growth rate shows the rate of change between two consecutive periods and is used to analyze short-term trends and periodic changes; 2. Revenue generally refers to post-tax revenue; 3. The customer identification refers to the customer name.
Example query records: none.
Example question: How much has customer A's revenue changed period-over-period compared with last year?
Question rewriting information:
Execution result of the first step (i.e., query intent and key components): the query intent is to query the period-over-period change in customer A's revenue compared with last year; the key components include: "customer A", "revenue", "period-over-period".
Execution result of the second step (i.e., knowledge comparison result): "customer A" corresponds to knowledge item 3, "revenue" corresponds to knowledge item 2, and "period-over-period" corresponds to knowledge item 1.
Execution result of the third step (i.e., the optimized query intent): query the rate of change of customer A's revenue this year compared with last year.
Execution result of the fourth step (i.e., the intent-optimized query question): query the rate of change of the total post-tax revenue this year, compared with last year, for the customer whose customer name is "A"."
S404, retrieving M data fields related to the query problem after intention optimization in a database table to obtain a field retrieval result.
In a specific implementation, the computer device may treat each data field in the database table as a knowledge item and treat all the data fields in a single database table as a domain knowledge base, so that a recall technique similar to the single-way recall method or the multi-way recall method described above for knowledge items can be used to retrieve, from the database table, the data fields related to the intent-optimized query question, thereby reducing the database table to the relevant fields.
Taking a recall technique similar to the multi-way recall method for knowledge items described above as an example, when executing step S404 the computer device may acquire a plurality of field recall policies. Any one field recall policy is used to determine a relevance score between the intent-optimized query question and each data field in the database table, and to recall H data fields from the database table in descending order of relevance score, where H is a positive integer. The field recall policies are implemented similarly to the item recall policies mentioned previously, and different field recall policies differ in how the relevance score between the query question and a data field is determined. For example, one field recall policy may determine the relevance score based on a relevance algorithm (such as BM25, BM25+, or TF-IDF (term frequency-inverse document frequency)); another field recall policy may determine the relevance score based on the vector similarity between the semantic vector of the query question and the semantic vector of the data field; and so on.
After acquiring the plurality of field recall policies, the computer device may perform multi-way recall in the database table, based on the intent-optimized query question, according to the plurality of field recall policies, to obtain a plurality of field recall results; one field recall result corresponds to one field recall policy, and each field recall result includes H data fields. Then, M data fields related to the intent-optimized query question can be retrieved from the plurality of field recall results to obtain the field retrieval result, where M is a positive integer. Specifically, for example, the plurality of field recall results may be traversed, and the relevance scores of the data fields in the currently traversed field recall result are normalized to obtain a normalized score for each of those data fields; the M data fields related to the intent-optimized query question are then retrieved from the plurality of field recall results based on the normalized scores of the data fields in those results.
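The normalization and merging of several recall results can be sketched as below. The text does not fix the normalization method or the rule for combining scores, so the sketch assumes min-max normalization within each recall result and summation of the normalized scores across results; the function names and the sample scores are hypothetical.

```python
def normalize(result):
    """Min-max normalize the relevance scores of one recall result into [0, 1]."""
    if not result:
        return {}
    lo, hi = min(result.values()), max(result.values())
    if hi == lo:
        return {name: 1.0 for name in result}
    return {name: (score - lo) / (hi - lo) for name, score in result.items()}


def fuse_recall_results(recall_results, m):
    """Traverse the recall results, normalize each one, and keep the top-M fields."""
    fused = {}
    for result in recall_results:
        for name, score in normalize(result).items():
            # Assumed combination rule: sum normalized scores across policies.
            fused[name] = fused.get(name, 0.0) + score
    ranked = sorted(fused, key=lambda name: (-fused[name], name))
    return ranked[:m]


# Two hypothetical recall results with H = 3 fields each, on different scales.
keyword_result = {"revenue": 12.0, "customer_name": 3.0, "region": 1.0}
vector_result = {"revenue": 0.9, "order_date": 0.8, "customer_name": 0.5}
top_fields = fuse_recall_results([keyword_result, vector_result], m=2)
```

Normalizing before merging matters because keyword scores (unbounded) and vector similarities (bounded) are not directly comparable.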
The above description is given by taking one database table as an example. In practical applications, the number of database tables may be one or more; when the number of the database tables is multiple, the computer device may search M data fields related to the query problem after the intent optimization in each database table based on the above manner, to obtain multiple field search results, that is, one database table corresponds to one field search result.
S405, calling a statement generation model to generate a database query statement corresponding to the query problem after intention optimization according to the field retrieval result.
In the embodiment of the application, the sentence generating model is a language model. The specific implementation manner of step S405 may be: generating task prompt information of a sentence generating model by adopting field retrieval results and query problems after intention optimization; the task prompt information is used for prompting: and writing a corresponding database query statement for the query problem subjected to the intent optimization according to the field retrieval result. Then, the computer device can call the sentence generation model to perform task processing according to the task prompt information, and the data output by the sentence generation model after performing task processing is used as a database query sentence corresponding to the query problem after intention optimization.
Optionally, when the number of the database tables is multiple, in order to facilitate the statement generation model to generate complex database query statements with multiple table connections, the computer device may acquire attribute information of each database table when the task prompt information of the statement generation model is generated by adopting the field search result and the query problem after intention optimization. Wherein the attribute information at least includes: a primary key and a foreign key of the database table; when the database table includes a partition field, the attribute information may also include a partition field. Then, the computer equipment can adopt attribute information of each database table, a field retrieval result of each database table and the query problem after intention optimization to generate task prompt information of a sentence generation model; in this case, the task prompt information may be used to prompt: and writing a corresponding database query statement for the query problem after the intention optimization according to the attribute information of each database table and the corresponding field retrieval result.
Optionally, the data output by the statement generation model after task processing is used as the database query statement corresponding to the intent-optimized query question, but the statement generation model may be unstable and may end up outputting additional information. To improve the accuracy of the database query statement, the embodiment of the present application may further provide that the task prompt information is also used for prompting the statement generation model to output only the written database query statement after task processing, and to refrain from outputting interpretation information related to that statement, where interpretation information refers to information explaining how the database query statement was generated. In this manner, the statement generation model can be constrained to output only a string that conforms to the database query language format (e.g., SQL format), so that the resulting database query statement contains no other explanation or description.
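A task prompt that carries per-table attribute information and the output restriction described above might be assembled as in the following sketch. The dataclass, the prompt wording and the table and column names are all assumptions made for illustration; the application does not prescribe a concrete prompt format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TableInfo:
    name: str
    fields: list                       # field retrieval result for this table
    primary_key: str
    foreign_keys: dict = field(default_factory=dict)   # column -> "table.column"
    partition_field: Optional[str] = None


def build_sql_prompt(question, tables):
    """Build task prompt information from table attributes and retrieved fields."""
    lines = ["Write a database query statement for the question below.",
             "Use only the tables and fields listed here."]
    for t in tables:
        lines.append(f"Table {t.name}: fields = {', '.join(t.fields)}")
        lines.append(f"  primary key: {t.primary_key}")
        for column, target in t.foreign_keys.items():
            lines.append(f"  foreign key: {column} -> {target}")
        if t.partition_field:
            lines.append(f"  partition field: {t.partition_field}")
    lines.append(f"Question: {question}")
    # Output restriction: only the statement itself, no interpretation information.
    lines.append("Output only the SQL statement, with no explanation.")
    return "\n".join(lines)


tables = [
    TableInfo("orders", ["order_id", "customer_id", "revenue"], "order_id",
              {"customer_id": "customers.customer_id"}, "order_date"),
    TableInfo("customers", ["customer_id", "customer_name"], "customer_id"),
]
prompt = build_sql_prompt("Total post-tax revenue of customer A this year", tables)
```

Listing the primary and foreign keys gives the model the join path between tables, which is what makes multi-table statements feasible from a reduced field list.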
S406, based on the generated database query statement, carrying out data query in the database table to obtain a query result; and generating a response answer to the query question according to the query result.
In a specific implementation, the computer device may invoke the generated database query statement to perform data query in the database table, thereby obtaining a query result. Further, the computer device may directly use the query result as a response answer to the query question; or in order to promote the intuitiveness of the query result, the computer equipment can draw a chart based on the query result, so that the drawn chart is used as a response answer of the query question. It should be understood that the specific implementation of generating a response answer to a query question based on the query result is merely illustrative and not limiting.
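Executing a generated statement and turning the rows into a response answer can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a stand-in for whatever database actually holds the table, and the table, its columns and the "generated" SQL string are hypothetical.

```python
import sqlite3


def run_query(conn, sql):
    """Execute the generated statement and return rows as dictionaries."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2023-01-05", 100.0), ("2023-01-20", 50.0),
     ("2023-02-03", 70.0), ("2022-12-31", 999.0)],
)

# Stand-in for a statement produced by the statement generation model.
generated_sql = """
SELECT substr(order_date, 1, 7) AS month, SUM(revenue) AS total_revenue
FROM orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'
GROUP BY month
ORDER BY month
"""
answer = run_query(conn, generated_sql)
```

The list of row dictionaries can be returned directly as the response answer or handed to a charting routine.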
S407, displaying response answers and statement check entries corresponding to the database query statements in a user interface; when the statement view entry is triggered, the database query statement is displayed.
The statement viewing entry may specifically be a button or a component. After the computer device displays the response answer and the statement viewing entry, the user may perform a triggering operation on the statement viewing entry (e.g., a click, a press, or the input of a preset voice command) to trigger it, thereby causing the computer device to display the database query statement. For example, taking an SQL statement as the database query statement, see fig. 5b: after the user inputs a query question (e.g., "revenue trend in 2023"), the computer device may automatically complete the whole process of domain knowledge base retrieval, intent optimization and SQL statement generation based on the description of the previous steps, and complete SQL statement execution and result visualization on the user interface side. The whole process requires no user intervention and is transparent to the user; after waiting for a period of time, the user can view the data query result and the analysis process in the user interface. The user interface also provides an SQL button 51 (i.e., the statement viewing entry); after clicking the SQL button 51, the user can see the corresponding SQL statement 52. The SQL statement 52 conforms to ClickHouse syntax, where ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). In line with the user's requirement, the SQL statement filters the 2023 data, aggregates the revenue field by month, and returns the trend of total monthly revenue in 2023.
Based on the above description, the embodiment of the application provides a general scheme for constructing and retrieving a domain knowledge base and supports individual or enterprise users in uploading their own domain knowledge bases, so that domain knowledge can be injected when database query statements are generated. Through retrieval-augmented generation and intent optimization, the method adapts better to users' diverse questions and generates database query statements that better match the users' intent. In addition, the domain knowledge base supports adding, deleting, modifying and looking up knowledge and is flexible and pluggable, so that different business scenarios can be accessed quickly and knowledge changes within the same scenario can be adapted to automatically. In addition, the base model (language model) adopted by the embodiment of the application is an open-source model that permits commercial use, so it can be deployed privately on local computer devices, avoiding privacy leakage and data security risks. In addition, the embodiment of the application reasonably decomposes the database query statement generation task into retrieving the domain knowledge base, generating the intent-optimized query question, generating the database query statement, and so on, which can reduce the latency and cost caused by multi-step reasoning. Moreover, generating the database query statement does not require feedback from executing the statement in the database table, which reduces the complexity of generation and the time spent on execution, thereby improving the efficiency of generating database query statements.
It should be emphasized that the method for generating database query statements provided by the embodiment of the present application has been validated on a target platform, which aggregates multiple real business data sets covering travel and business analysis; the detailed test results are shown in Table 1 below:
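TABLE 1
Scheme | Execution accuracy of generated SQL statements
Text2SQL (base) | 47.2%
GPT-3.5 | 43.5%
GPT-4 | 56.5%
RAG-Text2SQL (units) | 60.2%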
In Table 1, Text2SQL (base) refers to a scheme using the basic Text-to-SQL technology. Text-to-SQL is the process of converting a natural language query into Structured Query Language (SQL); the basic scheme inputs the query question directly into a Text-to-SQL model, which converts it into an SQL statement. GPT-3.5 and GPT-4 are both existing language models, and RAG-Text2SQL (units) refers to the scheme provided by the embodiment of the application, which fuses the domain knowledge of the business domain and uses retrieval augmentation. As can be seen from Table 1, when only the basic Text-to-SQL technology is applied, the execution accuracy of the generated SQL statements is 47.2%. With the scheme provided by the embodiment of the application, the execution accuracy of the SQL statements rises to 60.2%, a relative improvement of 27.5% over the basic Text-to-SQL technology. This result not only clearly exceeds the 43.5% accuracy of the GPT-3.5 model but is also better than the 56.5% accuracy of the GPT-4 model, giving a better practical result.
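For reference, the relative improvement quoted above follows directly from the two accuracy figures in the text:

```latex
\[
\frac{60.2\% - 47.2\%}{47.2\%} = \frac{13.0}{47.2} \approx 0.275 = 27.5\%
\]
```

The absolute gain is 13.0 percentage points; the 27.5% figure is that gain expressed relative to the 47.2% baseline.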
Based on the above description of the embodiments of the method for generating database query statements, the embodiment of the application also discloses an apparatus for generating database query statements. The apparatus may be a computer program (comprising one or more instructions) running on a computer device, and may perform the steps of the method flow shown in fig. 2 or fig. 4. Referring to fig. 6, the apparatus may run the following units:
an obtaining unit 601, configured to obtain a query question in a service field, where the query question is a natural language text;
The obtaining unit 601 is further configured to determine a database table and a domain knowledge base configured for the service domain, where the domain knowledge base includes at least one knowledge item, and one knowledge item includes: one term in the business field and the interpretation information of that term;
The processing unit 602 is configured to retrieve a knowledge item related to the query question from the domain knowledge base, and invoke an intent optimization model to perform intent optimization on a query intent of the query question based on the retrieved knowledge item, so as to obtain a query question after the intent optimization;
The processing unit 602 is further configured to invoke a statement generation model to generate a database query statement corresponding to the query problem after the intent optimization according to the database table.
In one embodiment, the processing unit 602, when configured to retrieve knowledge items related to the query question from the domain knowledge base, may be specifically configured to:
Acquiring a plurality of item recall strategies, wherein the item recall strategies are used for determining the relevance scores between the query problem and each knowledge item in the domain knowledge base, and recalling K knowledge items from the domain knowledge base according to the sequence of the relevance scores from high to low, wherein K is a positive integer; different entry recall policies differ in the manner in which the relevance scores between the query questions and the knowledge entries are determined;
According to the multiple item recall strategies, carrying out item recall based on the query problem in the domain knowledge base to obtain multiple item recall results; one item recall result corresponds to one item recall policy, and each item recall result comprises K knowledge items;
traversing the plurality of item recall results, and carrying out normalization processing on the relevance scores corresponding to the knowledge items in the item recall results of the current traversal to obtain normalization scores corresponding to the corresponding knowledge items;
And retrieving knowledge items related to the query problem from the plurality of item recall results based on the normalized scores corresponding to the knowledge items in the plurality of item recall results.
In another embodiment, the plurality of item recall policies includes a semantic recall policy; correspondingly, when the processing unit 602 is configured to perform item recall based on the query problem in the domain knowledge base according to the semantic recall policy to obtain an item recall result, the processing unit may be specifically configured to:
According to the semantic recall strategy, carrying out semantic recognition on the query problem to obtain a semantic vector of the query problem;
Acquiring semantic vectors of each knowledge item in the domain knowledge base, wherein the semantic vector of any knowledge item is obtained by carrying out semantic recognition after cleaning the corresponding knowledge item; the cleaning process includes: removing invalid words in the knowledge items;
Determining a relevance score between the query question and the corresponding knowledge item based on vector similarity between the semantic vector of the query question and the semantic vector of each knowledge item, respectively;
and recalling K knowledge items from the domain knowledge base according to the sequence of the relevance scores from high to low to obtain an item recall result.
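The semantic recall strategy can be illustrated with the sketch below. It is only an approximation under stated assumptions: a bag-of-words count vector stands in for the semantic vector that a real embedding model would produce, `STOP_WORDS` is a hypothetical list of invalid words, and cosine similarity is assumed as the vector similarity.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical list of "invalid words" removed during cleaning.
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "to", "and"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and strip surrounding punctuation from each token."""
    tokens = [tok.strip(".,:;()?!").lower() for tok in text.split()]
    return [tok for tok in tokens if tok]


def clean(tokens: List[str]) -> List[str]:
    """Cleaning step: drop invalid words from a knowledge item."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def embed(tokens: List[str]) -> Counter:
    """Stand-in for semantic recognition: a sparse word-count vector."""
    return Counter(tokens)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * v[tok] for tok, count in u.items() if tok in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_recall(question: str, entries: List[str],
                    k: int) -> List[Tuple[str, float]]:
    """Recall the K knowledge items most similar to the question."""
    q_vec = embed(tokenize(question))
    # In practice the entry vectors would be computed once and stored.
    scored = [(entry, cosine(q_vec, embed(clean(tokenize(entry)))))
              for entry in entries]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```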
In another embodiment, the intent optimization model is a language model; accordingly, when the processing unit 602 is configured to invoke the intent optimization model to optimize the query intent of the query question based on the retrieved knowledge item to obtain the intent-optimized query question, the processing unit may be specifically configured to:
Acquiring a thinking chain prompt text, wherein the thinking chain prompt text is used for prompting: executing a problem rewriting task on the problem in the input field according to a plurality of steps; the steps sequentially comprise: analyzing the problem to obtain a query intention, optimizing the query intention, and rewriting the corresponding problem based on the optimized query intention and the retrieved knowledge item to obtain the problem with optimized intention;
Generating a task description instruction by adopting the thinking chain prompt text, the query problem and the retrieved knowledge item; wherein the task description instruction includes the input field and the query question is located in the input field;
And calling an intention optimizing model to process the task according to the task description instruction, and obtaining the query problem after intention optimization.
In another embodiment, when analyzing the problem, key components of the problem are also obtained;
the optimizing the query intent includes: optimizing the query intention according to the historical query record; wherein the query questions are entered in a question-and-answer session, and the historical query records include: a question input in history in the question-answering session;
The rewriting of the corresponding problem based on the optimized query intention and the retrieved knowledge item to obtain the problem with optimized intention comprises the following steps: comparing the key components with the retrieved knowledge item to obtain a comparison result, wherein the comparison result is used for indicating the knowledge item related to the key components; and rewriting the corresponding problem based on the optimized query intention and the comparison result to obtain the problem with optimized intention.
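As an illustration of how a task description instruction might be assembled from the thinking chain prompt text, the query problem, the retrieved knowledge items and the historical questions, consider the sketch below. The wording of `COT_PROMPT`, the section labels, and the function name `build_task_instruction` are assumptions made for this example; the application does not disclose the literal prompt.

```python
from typing import List

# Hypothetical thinking chain prompt text listing the steps in order.
COT_PROMPT = (
    "Rewrite the question in the Input field by following these steps in order:\n"
    "1. Analyze the question to obtain its query intention and key components.\n"
    "2. Optimize the query intention using the historical questions of this session.\n"
    "3. Compare the key components with the knowledge items and note which items relate to them.\n"
    "4. Rewrite the question based on the optimized intention and the comparison result.\n"
)


def build_task_instruction(question: str, knowledge_items: List[str],
                           history: List[str]) -> str:
    """Place the query question in the input field of the instruction."""
    knowledge = "\n".join(f"- {item}" for item in knowledge_items) or "- (none)"
    past = "\n".join(f"- {q}" for q in history) or "- (none)"
    return (
        f"{COT_PROMPT}\n"
        f"Knowledge items:\n{knowledge}\n\n"
        f"Historical questions:\n{past}\n\n"
        f"Input: {question}\n"
    )
```

The resulting string would then be passed to the intent optimization model, whose output is taken as the query problem after intention optimization.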
In another embodiment, the processing unit 602, when configured to generate the task description instruction using the thinking chain prompt text, the query problem, and the retrieved knowledge item, may be specifically configured to:
Obtaining a context learning example, the context learning example comprising: example questions, example knowledge items, and question rewrite information; the problem rewriting information includes: based on the example knowledge item, executing a problem rewriting task on the example problem according to the plurality of steps, wherein the executing result corresponds to each step;
And generating a task description instruction by adopting the context learning example, the thinking chain prompt text, the query problem and the retrieved knowledge item.
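A context learning example can be rendered and placed ahead of the real input, as in the following sketch. The `InContextExample` structure, the text layout, and the function names are hypothetical and only show one possible arrangement of the four inputs named above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InContextExample:
    question: str               # example question
    knowledge_items: List[str]  # example knowledge items
    step_results: List[str]     # one execution result per step


def render_example(example: InContextExample) -> str:
    """Render the example so that each step result is visible to the model."""
    lines = ["Example", f"Input: {example.question}", "Knowledge items:"]
    lines += [f"- {item}" for item in example.knowledge_items]
    lines += [f"Step {i} result: {result}"
              for i, result in enumerate(example.step_results, start=1)]
    return "\n".join(lines)


def build_instruction_with_example(example: InContextExample, cot_prompt: str,
                                   question: str,
                                   knowledge_items: List[str]) -> str:
    """Combine prompt text, rendered example, knowledge items and the input field."""
    knowledge = "\n".join(f"- {item}" for item in knowledge_items)
    return "\n\n".join([
        cot_prompt,
        render_example(example),
        f"Knowledge items:\n{knowledge}",
        f"Input: {question}",
    ])
```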
In another specific embodiment, when the processing unit 602 is configured to invoke a statement generation model to generate, according to the database table, a database query statement corresponding to the query problem after the intent optimization, the processing unit may be specifically configured to:
Retrieving M data fields related to the query problem after the intention optimization in the database table to obtain a field retrieval result; wherein M is a positive integer;
And calling a statement generation model to generate a database query statement corresponding to the query problem after intention optimization according to the field retrieval result.
In another embodiment, the processing unit 602, when configured to retrieve M data fields related to the query question after the intent optimization in the database table, may be specifically configured to:
Acquiring a plurality of field recall strategies, wherein any field recall strategy is used for determining a correlation score between the query problem after intention optimization and each data field in the database table, and recalling H data fields from the database table according to the sequence of the correlation score from high to low, wherein H is a positive integer;
according to the multiple field recall strategies, carrying out field recall in the database table based on the query problem after intention optimization to obtain multiple field recall results; one field recall result corresponds to one field recall policy, and each field recall result comprises H data fields;
And retrieving M data fields related to the query problem after the intention optimization from the field recall results to obtain a field retrieval result.
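The field retrieval can be sketched in the same spirit. The text does not state how the M data fields are selected from the field recall results, so the merge rule below (number of strategies that recalled a field, then best rank, then name) is an assumption, as are the two scorers and the representation of a table as a mapping from field name to field comment.

```python
from collections import Counter
from typing import Callable, Dict, List

# (question, field name, field comment) -> relevance score
FieldScorer = Callable[[str, str, str], float]


def name_match(question: str, name: str, comment: str) -> float:
    """Hypothetical strategy 1: parts of the field name found among question words."""
    words = set(question.lower().split())
    return float(sum(part in words for part in name.lower().split("_")))


def comment_match(question: str, name: str, comment: str) -> float:
    """Hypothetical strategy 2: question words found in the field comment."""
    words = set(question.lower().split())
    return float(len(words & set(comment.lower().split())))


def recall_fields(question: str, fields: Dict[str, str],
                  scorer: FieldScorer, h: int) -> List[str]:
    """One field recall result: the H field names with the highest score."""
    ranked = sorted(fields, key=lambda n: scorer(question, n, fields[n]),
                    reverse=True)
    return ranked[:h]


def merge_field_recalls(recalls: List[List[str]], m: int) -> List[str]:
    """Assumed merge rule: more votes first, then better rank, then name."""
    votes: Counter = Counter()
    best_rank: Dict[str, int] = {}
    for result in recalls:
        for rank, name in enumerate(result):
            votes[name] += 1
            best_rank[name] = min(best_rank.get(name, rank), rank)
    ordered = sorted(votes, key=lambda n: (-votes[n], best_rank[n], n))
    return ordered[:m]


def retrieve_fields(question: str, fields: Dict[str, str],
                    scorers: List[FieldScorer], h: int, m: int) -> List[str]:
    """Run every field recall strategy and merge the results into M fields."""
    recalls = [recall_fields(question, fields, s, h) for s in scorers]
    return merge_field_recalls(recalls, m)
```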
In another embodiment, the sentence generation model is a language model; correspondingly, when the processing unit 602 is configured to invoke the sentence generation model to generate the database query sentence corresponding to the query problem after the intent optimization according to the field retrieval result, the processing unit may be specifically configured to:
Generating task prompt information of a sentence generation model by adopting the field retrieval result and the query problem after intention optimization; the task prompt information is used for prompting: writing a corresponding database query statement for the query problem subjected to intent optimization according to the field retrieval result;
And calling the statement generation model to perform task processing according to the task prompt information, and taking the data output by the statement generation model after performing task processing as a database query statement corresponding to the query problem after intention optimization.
In another embodiment, when the number of the database tables is plural, one database table corresponds to one field retrieval result; correspondingly, when the processing unit 602 is configured to generate the task prompt information of the sentence generation model by using the field search result and the query problem after the intent optimization, the processing unit may be specifically configured to:
acquiring attribute information of each database table, wherein the attribute information at least comprises: a primary key and a foreign key of the database table;
Generating task prompt information of a sentence generation model by adopting the attribute information of each database table, the field retrieval result of each database table and the query problem after intention optimization;
wherein, the task prompt information is used for prompting: and writing a corresponding database query statement for the query problem subjected to intent optimization according to the attribute information of each database table and the corresponding field retrieval result.
In another embodiment, the task prompt information is further used for prompting the sentence generation model to output the written database query statement after task processing, and prohibiting it from outputting explanatory information related to that database query statement.
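For the multi-table case, the task prompt information might be built as in the sketch below. The `TableInfo` structure, the prompt wording and the function name `build_sql_task_prompt` are assumptions for illustration; only the ingredients (primary key, foreign keys, per-table field retrieval result, the query problem after intention optimization, and the instruction to output no explanation) come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableInfo:
    name: str
    primary_key: str
    # Maps a local column to the referenced "table.column".
    foreign_keys: Dict[str, str] = field(default_factory=dict)
    # Field retrieval result for this table.
    retrieved_fields: List[str] = field(default_factory=list)


def build_sql_task_prompt(question: str, tables: List[TableInfo]) -> str:
    """Assemble task prompt information for the statement generation model."""
    parts = [
        "Write one database query statement that answers the question, "
        "using only the tables and fields listed below.",
        "Output the statement only; do not output any explanation.",
    ]
    for table in tables:
        fks = ", ".join(f"{col} -> {ref}"
                        for col, ref in table.foreign_keys.items()) or "none"
        parts.append(
            f"Table {table.name}\n"
            f"  primary key: {table.primary_key}\n"
            f"  foreign keys: {fks}\n"
            f"  fields: {', '.join(table.retrieved_fields)}"
        )
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```

Listing the keys alongside the retrieved fields gives the model what it needs to write join conditions between the tables.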
In another embodiment, the processing unit 602 may be further configured to:
Acquiring an operation instruction aiming at the domain knowledge base, wherein the operation instruction is used for indicating at least one of the following operations: adding knowledge items in the domain knowledge base, deleting the knowledge items in the domain knowledge base, modifying the knowledge items in the domain knowledge base and searching the knowledge items in the domain knowledge base;
And executing corresponding operation on the domain knowledge base according to the operation instruction.
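The four operations on the domain knowledge base can be illustrated with a small in-memory sketch. The class `DomainKnowledgeBase`, the dictionary-shaped operation instruction, and the `execute` dispatcher are hypothetical; a real deployment would presumably persist the knowledge items.

```python
from typing import Dict, Optional


class DomainKnowledgeBase:
    """In-memory store mapping a term to its interpretation information."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def add(self, term: str, interpretation: str) -> None:
        self._items[term] = interpretation

    def delete(self, term: str) -> None:
        self._items.pop(term, None)

    def modify(self, term: str, interpretation: str) -> None:
        if term not in self._items:
            raise KeyError(f"unknown term: {term}")
        self._items[term] = interpretation

    def search(self, keyword: str) -> Dict[str, str]:
        """Return items whose term or interpretation contains the keyword."""
        key = keyword.lower()
        return {t: i for t, i in self._items.items()
                if key in t.lower() or key in i.lower()}


def execute(kb: DomainKnowledgeBase, instruction: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Dispatch an operation instruction (assumed to be a plain dict)."""
    op = instruction["op"]
    if op == "add":
        kb.add(instruction["term"], instruction["interpretation"])
    elif op == "delete":
        kb.delete(instruction["term"])
    elif op == "modify":
        kb.modify(instruction["term"], instruction["interpretation"])
    elif op == "search":
        return kb.search(instruction["keyword"])
    else:
        raise ValueError(f"unsupported operation: {op}")
    return None
```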
In another embodiment, after generating the database query statement corresponding to the query question after the intent optimization, the processing unit 602 is further configured to:
based on the generated database query statement, carrying out data query in the database table to obtain a query result; generating a response answer of the query question according to the query result;
displaying the response answer and a statement view entry corresponding to the database query statement in a user interface;
and when the statement view entry is triggered, displaying the database query statement.
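The query-and-display flow above can be sketched with Python's built-in `sqlite3` module. The response dictionary, the `[View SQL]` placeholder for the statement view entry, and the function names are assumptions for this example; the text does not prescribe a database engine or a user interface.

```python
import sqlite3
from typing import Any, Dict


def answer_question(conn: sqlite3.Connection, sql: str) -> Dict[str, Any]:
    """Run the generated statement and build a response answer from the result."""
    rows = conn.execute(sql).fetchall()
    return {
        "answer": f"Query returned: {rows}",
        "statement": sql,
        "statement_visible": False,  # hidden until the view entry is triggered
    }


def trigger_statement_view(response: Dict[str, Any]) -> Dict[str, Any]:
    """Simulate the user triggering the statement view entry."""
    return dict(response, statement_visible=True)


def render(response: Dict[str, Any]) -> str:
    """Text rendering of the user interface: answer plus entry or statement."""
    text = response["answer"]
    if response["statement_visible"]:
        text += "\nSQL: " + response["statement"]
    else:
        text += "\n[View SQL]"
    return text
```

Because the statement is model-generated, a practical system would likely execute it over a read-only connection; that safeguard is outside this sketch.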
According to another embodiment of the present application, the units in the generating device of the database query statement shown in fig. 6 may be separately or jointly combined into one or several other units, or one or more of the units may be further split into a plurality of functionally smaller units, which may achieve the same operation without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present application, the generating device of the database query statement may also include other units, and in practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by a plurality of units in cooperation.
According to another embodiment of the present application, the generating device of the database query statement shown in fig. 6 may be constructed, and the generation method of the database query statement of the embodiment of the present application may be implemented, by running a computer program (including one or more instructions) capable of executing the steps involved in the respective methods shown in fig. 2 or fig. 4 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer readable storage medium, and loaded into and executed by the computing device described above.
It should be noted that, in the embodiment of the present application, the term "module" or "unit" refers to a computer program or a part of a computer program that has a predetermined function and works together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that includes the functionality of that module or unit.
According to the embodiment of the application, the database table and the domain knowledge base are configured for the service domain, so that after the query problem in the service domain is acquired, knowledge items related to the query problem can be retrieved from the domain knowledge base, and the query intention of the query problem is optimized based on the retrieved knowledge items by calling the intention optimizing model, so that the statement generating model is called to generate a database query statement corresponding to the query problem after intention optimization according to the database table. Therefore, the embodiment of the application realizes automatic generation of the database query statement, and when the database query statement is automatically generated, the corresponding domain knowledge can be injected, and the query intention of the query problem is optimized by combining the injected domain knowledge, so that the statement generation model can more clearly and accurately understand the query intention of the query problem, thereby generating a correct database query statement and improving the accuracy of the database query statement.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application also provides a computer device. Referring to fig. 7, the computer device includes at least a processor 701, an input interface 702, an output interface 703, and a computer storage medium 704. The processor 701, the input interface 702, the output interface 703, and the computer storage medium 704 within the computer device may be connected by a bus or other means. The computer storage medium 704 may be stored in a memory of the computer device, the computer storage medium 704 being configured to store a computer program, the computer program comprising one or more instructions, and the processor 701 being configured to execute the one or more instructions of the computer program stored by the computer storage medium 704. The processor 701 (or CPU, Central Processing Unit) is the computing core and control core of the computer device; it is adapted to implement one or more instructions, and in particular to load and execute one or more instructions to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 701 according to the embodiment of the present application may be configured to perform a series of processing on the query problem to automatically generate a corresponding database query statement, specifically including: acquiring a query problem in the service field, wherein the query problem is a natural language text; determining a database table and a domain knowledge base configured for the business field, wherein the domain knowledge base comprises at least one knowledge item, and one knowledge item comprises: one term in the business field and interpretation information of that term; retrieving knowledge items related to the query problem from the domain knowledge base, and calling an intention optimization model to perform intention optimization on the query intention of the query problem based on the retrieved knowledge items to obtain the query problem after intention optimization; invoking a statement generation model to generate, according to the database table, a database query statement corresponding to the query problem after intention optimization; and the like.
The embodiment of the application also provides a computer storage medium (Memory), which is a memory device in the computer device and is used for storing computer programs and data. It is understood that the computer storage medium herein may include both a built-in storage medium in the computer device and an extended storage medium supported by the computer device. The computer storage medium provides a storage space that stores an operating system of the computer device. Also stored in the storage space is a computer program comprising one or more instructions, which may be one or more program codes, adapted to be loaded and executed by the processor 701. The computer storage medium herein may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory; alternatively, it may be at least one computer storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by a processor to implement the corresponding steps in the method embodiments described above with respect to FIG. 2 or FIG. 4; in particular implementations, one or more instructions in a computer storage medium may be loaded by a processor and perform the steps of:
acquiring a query problem in the service field, wherein the query problem is a natural language text;
determining a database table and a domain knowledge base configured for the business field, wherein the domain knowledge base comprises at least one knowledge item, and one knowledge item comprises: one term in the business field and interpretation information of that term;
retrieving knowledge items related to the query problem from the domain knowledge base, and calling an intention optimization model to perform intention optimization on the query intention of the query problem based on the retrieved knowledge items to obtain the query problem after intention optimization;
And calling a statement generation model to generate a database query statement corresponding to the query problem after intention optimization according to the database table.
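The four steps above can be wired together as in the following sketch, in which both models are passed in as plain callables. Everything here is a simplified stand-in: term matching replaces the recall strategies, the prompt wording is invented, and `LanguageModel` is a hypothetical type alias rather than any real model API.

```python
from typing import Callable, Dict

# Hypothetical model interface: prompt text in, generated text out.
LanguageModel = Callable[[str], str]


def generate_query_statement(question: str,
                             knowledge_base: Dict[str, str],
                             table_schema: str,
                             intent_model: LanguageModel,
                             statement_model: LanguageModel) -> str:
    """Question -> knowledge retrieval -> intent optimization -> query statement."""
    # Retrieve knowledge items whose term occurs in the question
    # (a stand-in for the item recall strategies).
    items = [f"{term}: {meaning}" for term, meaning in knowledge_base.items()
             if term.lower() in question.lower()]

    # Ask the intent optimization model to rewrite the question.
    rewrite_prompt = ("Knowledge items:\n" + "\n".join(items)
                      + f"\nInput: {question}")
    optimized_question = intent_model(rewrite_prompt)

    # Ask the statement generation model for the database query statement.
    sql_prompt = (f"Schema:\n{table_schema}\n"
                  f"Question: {optimized_question}\n"
                  "Output only the query statement.")
    return statement_model(sql_prompt).strip()
```

Passing the models as callables keeps the sketch testable with fakes and leaves the choice of language model open.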
In one embodiment, the one or more instructions may be loaded and executed by the processor in particular when retrieving knowledge items related to the query question from the domain knowledge base:
Acquiring a plurality of item recall strategies, wherein each item recall strategy is used for determining the relevance scores between the query problem and each knowledge item in the domain knowledge base, and recalling K knowledge items from the domain knowledge base in descending order of relevance score, wherein K is a positive integer; different item recall strategies determine the relevance scores between the query problem and the knowledge items in different manners;
According to the multiple item recall strategies, carrying out item recall based on the query problem in the domain knowledge base to obtain multiple item recall results; one item recall result corresponds to one item recall policy, and each item recall result comprises K knowledge items;
traversing the plurality of item recall results, and carrying out normalization processing on the relevance scores corresponding to the knowledge items in the item recall results of the current traversal to obtain normalization scores corresponding to the corresponding knowledge items;
And retrieving knowledge items related to the query problem from the plurality of item recall results based on the normalized scores corresponding to the knowledge items in the plurality of item recall results.
In another embodiment, the plurality of item recall policies includes a semantic recall policy; correspondingly, when the item recall is performed in the domain knowledge base based on the query problem according to the semantic recall policy to obtain an item recall result, the one or more instructions may be loaded and specifically executed by the processor:
According to the semantic recall strategy, carrying out semantic recognition on the query problem to obtain a semantic vector of the query problem;
Acquiring semantic vectors of each knowledge item in the domain knowledge base, wherein the semantic vector of any knowledge item is obtained by carrying out semantic recognition after cleaning the corresponding knowledge item; the cleaning process includes: removing invalid words in the knowledge items;
Determining a relevance score between the query question and the corresponding knowledge item based on vector similarity between the semantic vector of the query question and the semantic vector of each knowledge item, respectively;
and recalling K knowledge items from the domain knowledge base according to the sequence of the relevance scores from high to low to obtain an item recall result.
In another embodiment, the intent optimization model is a language model; accordingly, when invoking the intent optimization model to optimize the query intent of the query question based on the retrieved knowledge items to obtain the intent-optimized query question, the one or more instructions may be loaded by the processor and executed in particular:
Acquiring a thinking chain prompt text, wherein the thinking chain prompt text is used for prompting: executing a problem rewriting task on the problem in the input field according to a plurality of steps; the steps sequentially comprise: analyzing the problem to obtain a query intention, optimizing the query intention, and rewriting the corresponding problem based on the optimized query intention and the retrieved knowledge item to obtain the problem with optimized intention;
Generating a task description instruction by adopting the thinking chain prompt text, the query problem and the retrieved knowledge item; wherein the task description instruction includes the input field and the query question is located in the input field;
And calling an intention optimizing model to process the task according to the task description instruction, and obtaining the query problem after intention optimization.
In another embodiment, when analyzing the problem, key components of the problem are also obtained;
the optimizing the query intent includes: optimizing the query intention according to the historical query record; wherein the query questions are entered in a question-and-answer session, and the historical query records include: a question input in history in the question-answering session;
The rewriting of the corresponding problem based on the optimized query intention and the retrieved knowledge item to obtain the problem with optimized intention comprises the following steps: comparing the key components with the retrieved knowledge item to obtain a comparison result, wherein the comparison result is used for indicating the knowledge item related to the key components; and rewriting the corresponding problem based on the optimized query intention and the comparison result to obtain the problem with optimized intention.
In another embodiment, when the task description instruction is generated by using the thinking chain prompt text, the query problem and the retrieved knowledge item, the one or more instructions may be loaded and specifically executed by the processor:
Obtaining a context learning example, the context learning example comprising: example questions, example knowledge items, and question rewrite information; the problem rewriting information includes: based on the example knowledge item, executing a problem rewriting task on the example problem according to the plurality of steps, wherein the executing result corresponds to each step;
And generating a task description instruction by adopting the context learning example, the thinking chain prompt text, the query problem and the retrieved knowledge item.
In another embodiment, when the statement generation model is called to generate a database query statement corresponding to the query question after the intent optimization according to the database table, the one or more instructions may be loaded and specifically executed by the processor:
Retrieving M data fields related to the query problem after the intention optimization in the database table to obtain a field retrieval result; wherein M is a positive integer;
And calling a statement generation model to generate a database query statement corresponding to the query problem after intention optimization according to the field retrieval result.
In another embodiment, when M data fields related to the query problem after the intention optimization are retrieved in the database table to obtain a field retrieval result, the one or more instructions may be loaded and specifically executed by the processor:
Acquiring a plurality of field recall strategies, wherein any field recall strategy is used for determining a correlation score between the query problem after intention optimization and each data field in the database table, and recalling H data fields from the database table according to the sequence of the correlation score from high to low, wherein H is a positive integer;
according to the multiple field recall strategies, carrying out field recall in the database table based on the query problem after intention optimization to obtain multiple field recall results; one field recall result corresponds to one field recall policy, and each field recall result comprises H data fields;
And retrieving M data fields related to the query problem after the intention optimization from the field recall results to obtain a field retrieval result.
In another embodiment, the sentence generation model is a language model; correspondingly, when the statement generation model is called to generate the database query statement corresponding to the query problem after the intention optimization according to the field retrieval result, the one or more instructions can be loaded and specifically executed by the processor:
Generating task prompt information of a sentence generation model by adopting the field retrieval result and the query problem after intention optimization; the task prompt information is used for prompting: writing a corresponding database query statement for the query problem subjected to intent optimization according to the field retrieval result;
And calling the statement generation model to perform task processing according to the task prompt information, and taking the data output by the statement generation model after performing task processing as a database query statement corresponding to the query problem after intention optimization.
In another embodiment, when the number of the database tables is plural, one database table corresponds to one field retrieval result; correspondingly, when the field retrieval result and the query problem after the intention optimization are adopted to generate the task prompt information of the sentence generation model, the one or more instructions can be loaded and specifically executed by the processor:
acquiring attribute information of each database table, wherein the attribute information at least comprises: a primary key and a foreign key of the database table;
Generating task prompt information of a sentence generation model by adopting the attribute information of each database table, the field retrieval result of each database table and the query problem after intention optimization;
wherein, the task prompt information is used for prompting: and writing a corresponding database query statement for the query problem subjected to intent optimization according to the attribute information of each database table and the corresponding field retrieval result.
In another embodiment, the task prompt information is further used for prompting the sentence generation model to output the written database query statement after task processing, and prohibiting it from outputting explanatory information related to that database query statement.
In another embodiment, the one or more instructions may be loaded and executed in particular by a processor:
Acquiring an operation instruction aiming at the domain knowledge base, wherein the operation instruction is used for indicating at least one of the following operations: adding knowledge items in the domain knowledge base, deleting the knowledge items in the domain knowledge base, modifying the knowledge items in the domain knowledge base and searching the knowledge items in the domain knowledge base;
And executing corresponding operation on the domain knowledge base according to the operation instruction.
In another embodiment, after generating the database query statement corresponding to the intent-optimized query question, the one or more instructions may be loaded and executed by the processor:
based on the generated database query statement, carrying out data query in the database table to obtain a query result; generating a response answer of the query question according to the query result;
displaying the response answer and a statement view entry corresponding to the database query statement in a user interface;
and when the statement view entry is triggered, displaying the database query statement.
According to the embodiment of the application, the database table and the domain knowledge base are configured for the service domain, so that after the query problem in the service domain is acquired, knowledge items related to the query problem can be retrieved from the domain knowledge base, and the query intention of the query problem is optimized based on the retrieved knowledge items by calling the intention optimizing model, so that the statement generating model is called to generate a database query statement corresponding to the query problem after intention optimization according to the database table. Therefore, the embodiment of the application realizes automatic generation of the database query statement, and when the database query statement is automatically generated, the corresponding domain knowledge can be injected, and the query intention of the query problem is optimized by combining the injected domain knowledge, so that the statement generation model can more clearly and accurately understand the query intention of the query problem, thereby generating a correct database query statement and improving the accuracy of the database query statement.
It should be noted that, according to an aspect of the present application, there is also provided a computer program product or a computer program, which comprises one or more instructions stored in a computer storage medium. The processor of the computer device reads the one or more instructions from the computer storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations of the method embodiments illustrated in fig. 2 or fig. 4 described above. It should be understood that the foregoing disclosure is only illustrative of the preferred embodiments of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (15)

1. A method for generating a database query statement, comprising:
acquiring a query problem in the service field, wherein the query problem is a natural language text;
determining a database table and a domain knowledge base configured for the business domain, wherein the domain knowledge base comprises at least one knowledge item, and each knowledge item comprises a term in the business domain and interpretation information for that term; wherein, when a user inputs the query question and the domain knowledge base lacks a target knowledge item related to the query question, the target knowledge item is added to the domain knowledge base by the user;
retrieving knowledge items related to the query question from the domain knowledge base, and acquiring a chain-of-thought prompt text, wherein the chain-of-thought prompt text is used to prompt: performing a question rewriting task on the question in an input field according to a plurality of steps; the plurality of steps comprise, in order: analyzing the question to obtain a query intention and key components of the question; optimizing the query intention according to the user's historical query records; comparing the key components with the retrieved knowledge items to obtain a comparison result, wherein the comparison result indicates the knowledge items related to the key components; and rewriting the question based on the optimized query intention and the comparison result to obtain an intention-optimized question; wherein, if the query question is entered in a question-and-answer session, the historical query records include questions previously entered in that session, and if the query question is entered in a query web page, the historical query records include questions entered in the query web page during a historical time period;
generating a task description instruction using the chain-of-thought prompt text, the query question, and the retrieved knowledge items, wherein the task description instruction includes the input field and the query question is located in the input field;
invoking an intention optimization model to perform task processing according to the task description instruction, to obtain an intention-optimized query question; and
invoking a statement generation model to generate, according to the database table, a database query statement corresponding to the intention-optimized query question.
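As an illustration of how the task description instruction of claim 1 might be assembled, the following Python sketch concatenates a chain-of-thought prompt text, the retrieved knowledge items, the historical query records, and an input field that holds the query question. The function name `build_task_instruction`, the prompt wording, the `Input:` label, and the choice to list the historical records inside the instruction are all assumptions made for illustration; the claim does not prescribe them.

```python
# Hypothetical chain-of-thought prompt text; the wording is an assumption.
COT_PROMPT = "\n".join([
    "Rewrite the question in the Input field by following these steps in order:",
    "1. Analyze the question to obtain its query intention and key components.",
    "2. Optimize the query intention using the historical query records.",
    "3. Compare the key components with the knowledge items below.",
    "4. Rewrite the question from the optimized intention and the comparison result.",
])


def build_task_instruction(question, knowledge_items, history):
    """Assemble a task description instruction as a single prompt string.

    knowledge_items maps a business term to its interpretation information;
    history is a list of questions the user entered earlier.
    """
    knowledge = "\n".join(f"- {term}: {text}" for term, text in knowledge_items.items())
    records = "\n".join(f"- {q}" for q in history) or "- (none)"
    return (
        f"{COT_PROMPT}\n\n"
        f"Knowledge items:\n{knowledge}\n\n"
        f"Historical query records:\n{records}\n\n"
        f"Input: {question}"  # the input field that holds the query question
    )


instruction = build_task_instruction(
    "What was GMV last week?",
    {"GMV": "gross merchandise volume, the total value of paid orders"},
    ["What was GMV yesterday?"],
)
```

The resulting string would then be passed to the intention optimization model; that call is omitted because the claims do not name a specific model or API.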
2. The method of claim 1, wherein retrieving knowledge items related to the query question from the domain knowledge base comprises:
acquiring a plurality of item recall strategies, wherein each item recall strategy is used to determine a relevance score between the query question and each knowledge item in the domain knowledge base, and to recall K knowledge items from the domain knowledge base in descending order of relevance score, K being a positive integer; different item recall strategies determine the relevance scores between the query question and the knowledge items in different ways;
performing item recall in the domain knowledge base based on the query question according to the plurality of item recall strategies, to obtain a plurality of item recall results, wherein each item recall result corresponds to one item recall strategy and comprises K knowledge items;
traversing the plurality of item recall results, and normalizing the relevance scores of the knowledge items in the currently traversed item recall result to obtain a normalized score for each of those knowledge items; and
retrieving the knowledge items related to the query question from the plurality of item recall results based on the normalized scores of the knowledge items in the plurality of item recall results.
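The normalization and merging of recall results described in claim 2 can be sketched as follows. The claim does not say which normalization or merging rule is used, so this example assumes min-max normalization within each recall result and a sum of normalized scores across results; the names `min_max_normalize` and `fuse_recall_results` are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one recall result's relevance scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every item gets the same normalized score.
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def fuse_recall_results(recall_results, top_n):
    """Merge several recall results into one ranked list of knowledge items.

    recall_results holds one dict per recall strategy, mapping a knowledge
    item to its raw relevance score under that strategy.
    """
    fused = {}
    for result in recall_results:  # traverse the recall results
        for item, score in min_max_normalize(result).items():
            fused[item] = fused.get(item, 0.0) + score
    # Highest fused score first; ties are broken by item name.
    return sorted(fused, key=lambda item: (-fused[item], item))[:top_n]


results = [
    {"GMV": 1.0, "DAU": 0.75, "ARPU": 0.0},    # e.g. a semantic strategy
    {"GMV": 12.0, "ARPU": 8.0, "churn": 4.0},  # e.g. a keyword strategy
]
top_items = fuse_recall_results(results, top_n=2)
```

Normalizing before merging matters because the raw scores of different strategies are on different scales (here roughly 0 to 1 versus 4 to 12), so they cannot be compared directly.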
3. The method of claim 2, wherein the plurality of item recall strategies includes a semantic recall strategy, and performing item recall in the domain knowledge base based on the query question according to the semantic recall strategy to obtain an item recall result comprises:
performing semantic recognition on the query question according to the semantic recall strategy to obtain a semantic vector of the query question;
acquiring a semantic vector of each knowledge item in the domain knowledge base, wherein the semantic vector of any knowledge item is obtained by cleaning that knowledge item and then performing semantic recognition on it, the cleaning comprising removing invalid words from the knowledge item;
determining the relevance score between the query question and each knowledge item based on the vector similarity between the semantic vector of the query question and the semantic vector of that knowledge item; and
recalling K knowledge items from the domain knowledge base in descending order of relevance score to obtain the item recall result.
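A minimal sketch of the semantic recall flow in claim 3 is given below. A real system would presumably obtain semantic vectors from an embedding model; here a bag-of-words count vector stands in for that model, cosine similarity is assumed as the vector similarity, and a small stop-word set is assumed as the "invalid words". The names `clean`, `embed`, `cosine`, and `semantic_recall` are hypothetical.

```python
import math
from collections import Counter

# Assumed list of "invalid words" removed when cleaning a knowledge item.
INVALID_WORDS = {"the", "of", "a", "an", "is", "in", "for"}


def clean(text):
    """Lowercase the text, split it into words, and drop invalid words."""
    return [w for w in text.lower().split() if w not in INVALID_WORDS]


def embed(words):
    """Stand-in for semantic recognition: a bag-of-words count vector."""
    return Counter(words)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_recall(question, knowledge_base, k):
    """Return the K knowledge items most similar to the question."""
    q_vec = embed(question.lower().split())
    scored = [
        (term, cosine(q_vec, embed(clean(f"{term} {text}"))))
        for term, text in knowledge_base.items()
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # descending score
    return scored[:k]


kb = {
    "GMV": "gross merchandise volume is the total value of orders",
    "DAU": "daily active users in a day",
}
recalled = semantic_recall("total value of orders last month", kb, k=2)
```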
4. The method of claim 1, wherein generating a task description instruction using the chain-of-thought prompt text, the query question, and the retrieved knowledge items comprises:
obtaining a context learning example comprising an example question, example knowledge items, and question rewriting information, wherein the question rewriting information comprises the execution result of each step obtained when the question rewriting task is performed on the example question according to the plurality of steps based on the example knowledge items; and
generating the task description instruction using the context learning example, the chain-of-thought prompt text, the query question, and the retrieved knowledge items.
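One way to picture the context learning example of claim 4 is as a worked demonstration placed in the prompt ahead of the real question. The sketch below renders such an example with one line per step result; the dictionary layout, the labels, and the function names `render_example` and `with_example` are assumptions for illustration only.

```python
def render_example(example):
    """Render one context learning example with the result of every step."""
    knowledge = "; ".join(f"{t}: {d}" for t, d in example["knowledge_items"].items())
    steps = "\n".join(
        f"Step {i}: {result}"
        for i, result in enumerate(example["step_results"], start=1)
    )
    return (
        f"Example question: {example['question']}\n"
        f"Example knowledge items: {knowledge}\n"
        f"{steps}"
    )


def with_example(example, cot_prompt, question, knowledge_items):
    """Place the example between the chain-of-thought text and the real input."""
    knowledge = "; ".join(f"{t}: {d}" for t, d in knowledge_items.items())
    return "\n\n".join([
        cot_prompt,
        render_example(example),
        f"Knowledge items: {knowledge}",
        f"Input: {question}",
    ])


example = {
    "question": "GMV last week?",
    "knowledge_items": {"GMV": "total value of paid orders"},
    "step_results": [
        "intention: aggregate a metric; key components: GMV, last week",
        "optimized intention: sum of GMV over the previous calendar week",
        "comparison result: GMV matches the knowledge item GMV",
        "rewritten: What is the total value of paid orders in the previous week?",
    ],
}
prompt = with_example(
    example, "Follow the steps in order.", "DAU this month?", {"DAU": "daily active users"}
)
```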
5. The method of claim 1, wherein invoking the statement generation model to generate, according to the database table, the database query statement corresponding to the intention-optimized query question comprises:
retrieving, in the database table, M data fields related to the intention-optimized query question to obtain a field retrieval result, M being a positive integer; and
invoking the statement generation model to generate, according to the field retrieval result, the database query statement corresponding to the intention-optimized query question.
6. The method of claim 5, wherein retrieving, in the database table, M data fields related to the intention-optimized query question to obtain a field retrieval result comprises:
acquiring a plurality of field recall strategies, wherein any field recall strategy is used to determine a relevance score between the intention-optimized query question and each data field in the database table, and to recall H data fields from the database table in descending order of relevance score, H being a positive integer;
performing field recall in the database table based on the intention-optimized query question according to the plurality of field recall strategies, to obtain a plurality of field recall results, wherein each field recall result corresponds to one field recall strategy and comprises H data fields; and
retrieving the M data fields related to the intention-optimized query question from the plurality of field recall results to obtain the field retrieval result.
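The field retrieval of claim 6 can be sketched with two toy recall strategies. Both strategies (word overlap with the field comment, and word overlap with the field name) and the merging rule (keep each field's best rank across the recall results) are assumptions; the claim only requires that several strategies each recall H fields and that M fields are selected from the results. All function names are hypothetical.

```python
def words(text):
    """Split text into a set of lowercase words, treating '_' as a space."""
    return set(text.lower().replace("_", " ").split())


def comment_overlap(question, name, comment):
    """Strategy 1: words shared by the question and the field comment."""
    return len(words(question) & words(comment))


def name_overlap(question, name, comment):
    """Strategy 2: words shared by the question and the field name."""
    return len(words(question) & words(name))


def recall_fields(question, fields, strategy, h):
    """Recall H fields in descending order of the strategy's score."""
    ranked = sorted(fields, key=lambda n: (-strategy(question, n, fields[n]), n))
    return ranked[:h]


def retrieve_fields(question, fields, strategies, h, m):
    """Select M fields from all recall results by best rank (assumed rule)."""
    best_rank = {}
    for strategy in strategies:
        for rank, name in enumerate(recall_fields(question, fields, strategy, h)):
            best_rank[name] = min(rank, best_rank.get(name, rank))
    return sorted(best_rank, key=lambda n: (best_rank[n], n))[:m]


fields = {  # field name -> field comment
    "order_amount": "amount paid for the order",
    "user_id": "identifier of the user",
    "created_at": "time the order was created",
    "region": "sales area",
}
question = "total order amount by region"
selected = retrieve_fields(question, fields, [comment_overlap, name_overlap], h=2, m=3)
```

In this example the comment-based strategy surfaces `created_at` while the name-based strategy surfaces `region`, which illustrates why combining strategies can recover fields that a single strategy would miss.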
7. The method according to claim 5 or 6, wherein the statement generation model is a language model, and invoking the statement generation model to generate, according to the field retrieval result, the database query statement corresponding to the intention-optimized query question comprises:
generating task prompt information for the statement generation model using the field retrieval result and the intention-optimized query question, wherein the task prompt information is used to prompt: writing a database query statement for the intention-optimized query question according to the field retrieval result; and
invoking the statement generation model to perform task processing according to the task prompt information, and taking the data output by the statement generation model after the task processing as the database query statement corresponding to the intention-optimized query question.
8. The method of claim 7, wherein, when there are a plurality of database tables, each database table corresponds to one field retrieval result;
generating the task prompt information for the statement generation model using the field retrieval result and the intention-optimized query question comprises:
acquiring attribute information of each database table, the attribute information comprising at least a primary key and a foreign key of the database table; and
generating the task prompt information for the statement generation model using the attribute information of each database table, the field retrieval result of each database table, and the intention-optimized query question;
wherein the task prompt information is used to prompt: writing a database query statement for the intention-optimized query question according to the attribute information of each database table and the corresponding field retrieval result.
9. The method of claim 7, wherein the task prompt information is further used to prompt the statement generation model to output the written database query statement after the task processing, and to prohibit it from outputting explanatory information about that database query statement.
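Claims 7 to 9 together describe task prompt information that carries the retrieved fields, the primary and foreign keys of each table, and an instruction to output only the query statement. A possible layout is sketched below; the wording, the dictionary structure, and the function name `build_sql_prompt` are assumptions, and the call to the statement generation model is left out because the claims do not specify an API.

```python
def build_sql_prompt(question, tables):
    """Build task prompt information for a statement generation model.

    tables maps a table name to a dict with 'primary_key', an optional
    'foreign_keys' mapping (column -> referenced table.column), and
    'fields' (the field retrieval result for that table).
    """
    lines = [
        "Write one database query statement for the question below.",
        "Use only the listed tables and fields.",
        "Output the statement only; do not output any explanation.",
        "",
    ]
    for name, info in tables.items():
        lines.append(f"Table {name}")
        lines.append(f"  primary key: {info['primary_key']}")
        for column, target in info.get("foreign_keys", {}).items():
            lines.append(f"  foreign key: {column} -> {target}")
        lines.append(f"  fields: {', '.join(info['fields'])}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


sql_prompt = build_sql_prompt(
    "What is the total paid order amount per region last week?",
    {
        "orders": {
            "primary_key": "order_id",
            "foreign_keys": {"user_id": "users.user_id"},
            "fields": ["order_amount", "created_at", "user_id"],
        },
        "users": {"primary_key": "user_id", "fields": ["region"]},
    },
)
```

Listing the keys lets the model infer how to join the tables, and the explicit "statement only" instruction means the model output can be used directly as the database query statement.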
10. The method of claim 1, wherein the method further comprises:
acquiring an operation instruction for the domain knowledge base, wherein the operation instruction indicates at least one of the following operations: adding a knowledge item to the domain knowledge base, deleting a knowledge item from the domain knowledge base, modifying a knowledge item in the domain knowledge base, and searching for a knowledge item in the domain knowledge base; and
performing the corresponding operation on the domain knowledge base according to the operation instruction.
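The four operations of claim 10 map naturally onto a small in-memory store. The sketch below is only an illustration: the class name `DomainKnowledgeBase`, the instruction format (a dict with `op`, `term`, and `interpretation` keys), and the error handling are assumptions, and a production system would presumably persist the items.

```python
class DomainKnowledgeBase:
    """In-memory domain knowledge base keyed by business term."""

    def __init__(self):
        self._items = {}  # term -> interpretation information

    def apply(self, instruction):
        """Execute one operation instruction and return its result, if any."""
        op, term = instruction["op"], instruction["term"]
        if op == "add":
            self._items[term] = instruction["interpretation"]
            return None
        if op == "modify":
            if term not in self._items:
                raise KeyError(term)  # cannot modify an item that is absent
            self._items[term] = instruction["interpretation"]
            return None
        if op == "delete":
            self._items.pop(term, None)
            return None
        if op == "search":
            return self._items.get(term)
        raise ValueError(f"unknown operation: {op}")


kb = DomainKnowledgeBase()
kb.apply({"op": "add", "term": "GMV", "interpretation": "gross merchandise volume"})
kb.apply({"op": "modify", "term": "GMV", "interpretation": "total value of paid orders"})
found = kb.apply({"op": "search", "term": "GMV"})
kb.apply({"op": "delete", "term": "GMV"})
missing = kb.apply({"op": "search", "term": "GMV"})
```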
11. The method of claim 1, wherein after generating the database query statement corresponding to the intention-optimized query question, the method further comprises:
performing a data query in the database table based on the generated database query statement to obtain a query result, and generating a response answer to the query question according to the query result;
displaying, in a user interface, the response answer and a statement view entry corresponding to the database query statement; and
displaying the database query statement when the statement view entry is triggered.
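The post-generation flow of claim 11 can be illustrated with Python's built-in `sqlite3` module. In this sketch an in-memory table and a hard-coded SQL string stand in for the real database table and the model-generated statement; the response layout and the function names `answer_question` and `on_view_entry_triggered` are assumptions.

```python
import sqlite3


def answer_question(conn, sql):
    """Run the generated statement and package the answer for display."""
    rows = conn.execute(sql).fetchall()
    return {
        "answer": f"The query returned {len(rows)} row(s): {rows}",
        "rows": rows,
        # Kept alongside the answer so the UI can reveal it on demand.
        "statement": sql,
    }


def on_view_entry_triggered(response):
    """Return the statement to display when the view entry is triggered."""
    return response["statement"]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100), ("east", 200), ("west", 50)],
)

# Stand-in for the statement produced by the statement generation model.
generated_sql = "SELECT SUM(amount) FROM orders WHERE region = 'east'"
response = answer_question(conn, generated_sql)
shown = on_view_entry_triggered(response)
conn.close()
```

Keeping the statement hidden behind a view entry lets users read the answer first and inspect the underlying query only when they want to verify it.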
12. A database query statement generation apparatus, comprising:
an obtaining unit, configured to acquire a query question in a business domain, wherein the query question is a natural language text;
the obtaining unit being further configured to determine a database table and a domain knowledge base configured for the business domain, wherein the domain knowledge base comprises at least one knowledge item, and each knowledge item comprises a term in the business domain and interpretation information for that term; wherein, when a user inputs the query question and the domain knowledge base lacks a target knowledge item related to the query question, the target knowledge item is added to the domain knowledge base by the user;
a processing unit, configured to: retrieve knowledge items related to the query question from the domain knowledge base, and acquire a chain-of-thought prompt text, wherein the chain-of-thought prompt text is used to prompt: performing a question rewriting task on the question in an input field according to a plurality of steps; the plurality of steps comprise, in order: analyzing the question to obtain a query intention and key components of the question; optimizing the query intention according to the user's historical query records; comparing the key components with the retrieved knowledge items to obtain a comparison result, wherein the comparison result indicates the knowledge items related to the key components; and rewriting the question based on the optimized query intention and the comparison result to obtain an intention-optimized question; wherein, if the query question is entered in a question-and-answer session, the historical query records include questions previously entered in that session, and if the query question is entered in a query web page, the historical query records include questions entered in the query web page during a historical time period; generate a task description instruction using the chain-of-thought prompt text, the query question, and the retrieved knowledge items, wherein the task description instruction includes the input field and the query question is located in the input field; and invoke an intention optimization model to perform task processing according to the task description instruction, to obtain an intention-optimized query question;
the processing unit being further configured to invoke a statement generation model to generate, according to the database table, a database query statement corresponding to the intention-optimized query question.
13. A computer device comprising an input interface and an output interface, further comprising: a processor and a computer storage medium;
wherein the processor is adapted to execute one or more instructions, and the computer storage medium stores one or more instructions adapted to be loaded by the processor to perform the database query statement generation method of any one of claims 1 to 11.
14. A computer storage medium storing one or more instructions adapted to be loaded by a processor to perform the database query statement generation method of any one of claims 1 to 11.
15. A computer program product comprising one or more instructions, wherein the one or more instructions, when executed by a processor, implement the database query statement generation method of any one of claims 1 to 11.
CN202410647323.3A 2024-05-23 2024-05-23 Database query statement generation method, device, equipment and storage medium Active CN118227655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410647323.3A CN118227655B (en) 2024-05-23 2024-05-23 Database query statement generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410647323.3A CN118227655B (en) 2024-05-23 2024-05-23 Database query statement generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN118227655A CN118227655A (en) 2024-06-21
CN118227655B true CN118227655B (en) 2024-08-27

Family

ID=91503100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410647323.3A Active CN118227655B (en) 2024-05-23 2024-05-23 Database query statement generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118227655B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118733747A (en) * 2024-09-03 2024-10-01 金现代信息产业股份有限公司 Database question answering method, system, device and medium based on large language model
CN118796982A (en) * 2024-09-11 2024-10-18 阿里云飞天(杭州)云计算技术有限公司 A data processing method, computing device, storage medium and program product
CN119149579A (en) * 2024-11-19 2024-12-17 北京火山引擎科技有限公司 Method, apparatus, electronic device and computer program product for data query
CN119166662B (en) * 2024-11-21 2025-05-16 烟台海颐软件股份有限公司 A method for constructing SQL agent in power field based on KMDI chain
CN119599131A (en) * 2024-11-21 2025-03-11 北京神州泰岳软件股份有限公司 Query statement model fine tuning method, device and equipment based on large model
CN119377255B (en) * 2024-12-30 2025-05-13 浙江大学计算机创新技术研究院 Method and device for generating SQL (structured query language) based on semantic alignment and layering agents
CN119623604B (en) * 2025-02-11 2025-06-27 国网天津市电力公司城东供电分公司 A method, system, device and storage medium for compiling and researching archive knowledge

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117931843A (en) * 2024-01-31 2024-04-26 星环信息科技(上海)股份有限公司 SQL statement generation method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186026B (en) * 2021-12-14 2025-06-24 中国建设银行股份有限公司 Natural language processing method, device, equipment and storage medium
CN116821168B (en) * 2023-08-24 2024-01-23 吉奥时空信息技术股份有限公司 Improved NL2SQL method based on large language model
CN117891458B (en) * 2023-11-23 2024-12-17 星环信息科技(上海)股份有限公司 SQL sentence generation method, device, equipment and storage medium
CN117633252B (en) * 2023-12-14 2024-06-18 广州华微明天软件技术有限公司 Auxiliary retrieval method integrating knowledge graph and large language model
CN117971861A (en) * 2024-01-08 2024-05-03 国网浙江省电力有限公司营销服务中心 A feature-decoupled and configurable NL2SQL method

Also Published As

Publication number Publication date
CN118227655A (en) 2024-06-21

Similar Documents

Publication Publication Date Title
CN118227655B (en) Database query statement generation method, device, equipment and storage medium
US11334635B2 (en) Domain specific natural language understanding of customer intent in self-help
US10474478B2 (en) Methods, systems, and computer program product for implementing software applications with dynamic conditions and dynamic actions
US12061954B2 (en) Methods, systems, and computer program product for dynamically modifying a dynamic flow of a software application
US11726997B2 (en) Multiple stage filtering for natural language query processing pipelines
US11410031B2 (en) Dynamic updating of a word embedding model
CN118103834A (en) Information acquisition method and device
CN110750649A (en) Knowledge graph construction and intelligent response method, device, equipment and storage medium
US12353477B2 (en) Providing an object-based response to a natural language query
CN116974554A (en) Code data processing method, apparatus, computer device and storage medium
CN113779176B (en) Query request completion method, device, electronic equipment and storage medium
CN119357408A (en) A method for constructing electric power knowledge graph based on large language model
KR102682244B1 (en) Method for learning machine-learning model with structured ESG data using ESG auxiliary tool and service server for generating automatically completed ESG documents with the machine-learning model
CN117709359A (en) Method, device, equipment and storage medium for acquiring insurance knowledge question-answer pair
CN119166763A (en) Query processing method, device, computer equipment and medium based on artificial intelligence
CN117313857A (en) Intelligent question-answering implementation method, system and medium integrating large model and knowledge base
Mangla et al. Unstructured data analysis and processing using big data tool-hive and machine learning algorithm linear regression
US20250094460A1 (en) Query answering method based on large model, electronic device, storage medium, and intelligent agent
CN118070925B (en) Model training method, device, electronic equipment, storage medium and program product
US12321343B1 (en) Natural language to SQL on custom enterprise data warehouse powered by generative artificial intelligence
US12079216B2 (en) Systems and methods for querying and performing operations on data using predicate logic extended to include quotation
EP4582968A1 (en) Efficient generation of application programming interface calls using language models, data types, and enriched schema
Tiwari et al. ROC Bot: Towards Designing Virtual Command Centre for Energy Management
Yan Application and Effectiveness of Improving Retrieval Systems Based on User Understanding in Smart Archive Management Systems.
Gan et al. Knowledge base question answering based on regularization and feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant