WO2008128423A1 - An intelligent dialog system and a method for realization thereof - Google Patents
An intelligent dialog system and a method for realization thereof Download PDFInfo
- Publication number
- WO2008128423A1 (PCT application PCT/CN2008/000764)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
Definitions
- The invention relates to the field of human-machine voice interaction, and in particular to an intelligent chat system and a method for realizing it, applicable to home service robots, entertainment robots, and other voice dialogue applications.
- Background Art
- A voice chat system is of great significance to individuals and society.
- Existing products mainly use a voice recognition chip to perform waveform matching, establishing a mapping to voice answers recorded in advance in order to reply to an input sentence. The number of conversations such products support is therefore limited, conversations and understanding cannot be added dynamically, and natural interaction with people cannot truly be achieved.
- Chat agents also exist on some instant-messaging tools. The main technique is to construct a virtual agent attached to the Internet through a chat tool such as MSN or QQ, answering questions and chatting by means of information retrieval and database queries. Such agents communicate only through written text and are completely dependent on the Internet or a communication network; they cannot converse with people in spoken natural language, lack the experience and fun of a real spoken dialogue with a machine, and cannot meet the various social needs described above.
- Prior-art voice chat also includes automatic speech recognition, spoken-text comprehension, and speech synthesis steps; the overall effect is better when the recognition accuracy is high and the synthesis quality is good. Spoken-text understanding is generally attempted through semantic analysis and can be implemented using a semantic framework or an ontology representation.
- Semantic analysis derives a formal representation of the meaning of an input sentence from its syntactic structure and the meaning of each content word in it.
- The semantic framework is the carrier of semantic analysis; some systems use an ontology to represent or organize the semantic frames.
- The main difficulty of the semantic-framework approach is how to express semantics: because the semantic expression of a framework is empirical, it is hard to establish a unified standard, and the number of frames required is massive, making the framework difficult to build.
- An intelligent chat system comprising a text comprehension answering module for obtaining output text according to input text; the text comprehension answering module comprises a word segmentation unit, an XML-based mapping corpus, a mapping unit, an XML-based dialog corpus, and a searching unit; the word segmentation unit is configured to perform part-of-speech tagging on the input text to obtain a word set with part-of-speech tags; the mapping corpus is used to establish and store a mapping relationship between keywords and concept sentences; the mapping unit searches the mapping corpus according to the word set and maps it to a concept sentence; the dialog corpus is used to establish and store a mapping relationship between concept sentences and output text; the searching unit is configured to search the dialog corpus according to the concept sentence and map it to the output text.
- the smart chat system also includes a voice recognition module for converting input speech into input text.
- the intelligent chat system wherein it further comprises a speech synthesis module for converting the output text into an output speech.
- the intelligent chat system wherein the mapping corpus and the conversation corpus are set in the same corpus.
- The smart chat system further comprises a pre-processing unit configured to replace word-set information, add a dialog flag, or set a dialog flag bit for the word set from the word segmentation unit, to obtain the word set used by the mapping unit.
- The smart chat system further includes a post-processing unit configured to perform the following processing on the output text from the search unit: adding or storing history information, setting a conversation topic, and adding relevant information obtained by searching, to obtain the output text delivered to the speech synthesis module.
- A realization method for an intelligent chat system comprising a text comprehension answering module for obtaining output text according to input text, comprising the steps of: A1, establishing an XML-based mapping corpus and a dialog corpus, the mapping corpus establishing and storing a mapping relationship between keywords and concept sentences, the dialog corpus establishing and storing a mapping relationship between concept sentences and output text; A2, performing part-of-speech tagging on the input text to obtain a word set with part-of-speech tags; A3, performing a matching calculation between the word set and the keyword sets of the mapping corpus to obtain a concept sentence; A4, searching the dialog corpus according to the concept sentence to generate the output text.
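Steps A1 through A4 can be sketched as a minimal pipeline. All corpus entries and names below are illustrative stand-ins; the patent's corpora are XML-based and far larger, and its A2 step performs real part-of-speech tagging rather than simple word splitting.

```python
# A1: toy mapping corpus (keyword set -> concept sentence) and dialog
# corpus (concept sentence -> output text). Contents are hypothetical.
MAPPING_CORPUS = {
    frozenset(["name", "what"]): "What is your name",
    frozenset(["weather", "today"]): "How is the weather today",
}
DIALOG_CORPUS = {
    "What is your name": "My name is Robot.",
    "How is the weather today": "It is sunny today.",
}

def segment(text):
    # A2: stand-in for part-of-speech tagged word segmentation.
    return set(text.lower().strip("?").split())

def map_to_concept(words):
    # A3: match the word set against each keyword set; best overlap wins.
    best, best_score = None, 0
    for keys, concept in MAPPING_CORPUS.items():
        score = len(words & keys)
        if score > best_score:
            best, best_score = concept, score
    return best

def answer(text):
    # A4: look the mapped concept sentence up in the dialog corpus.
    concept = map_to_concept(segment(text))
    return DIALOG_CORPUS.get(concept, "Sorry, I did not understand.")
```

The fallback string plays the role of the default answer library described later.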
- the method further includes the step of: converting the input voice into the input text.
- the implementation method further includes the step A5: converting the output text into an output voice.
- After step A4, the implementation method further includes a post-processing step for increasing answer accuracy: adding or storing history information, setting a conversation topic, and adding related search information.
- The method further includes the steps of: B1, determining whether the input text satisfies one of the following conditions: a demonstrative pronoun occurs, the topic does not change, or common-sense knowledge needs to be added; if so, perform the corresponding pre-processing step: replace word-set information, add a dialog flag, or set a dialog flag bit; otherwise go to step A3. B2, determine whether the pre-processing is complete; if so, return a success flag and perform step A4; otherwise return a failure flag and go to step A3.
- Step A1 further includes setting a weight value for each part of speech of the mapping corpus, the weight values being obtained by orthogonal optimization or by two rounds of orthogonal optimization.
- the implementation method further includes a step A6, the user evaluates the output voice, and the text understanding answering module adjusts the weight value according to the evaluation.
- the implementation method further includes the step of storing personal information for the user, and storing the weight value in the personal information of the user; when the user logs in, reading the weight value and correspondingly adjusting the mapping corpus.
- The present invention establishes a corpus with part-of-speech weight optimization and learning functions, maps and categorizes semantics, and establishes answers for the mapped semantics. It can thereby communicate with people in natural language with higher accuracy, provides language communication and voice reminders, and realizes real spoken dialogue between person and machine, so that the user gets a genuine language experience and enjoyment.
- FIG. 1 is a general framework diagram of the chat system of the present invention;
- FIG. 2 is a flow chart of the spoken-text answering process of the present invention;
- FIG. 3 is a schematic diagram of the spoken-text understanding and answering module of the present invention;
- FIG. 4 is a schematic diagram of the mapping description format of the mapping corpus of the present invention;
- FIG. 5 is a schematic diagram of the direct-answer format for a concept sentence in the dialog corpus of the present invention;
- FIG. 6 is a schematic diagram of the format of a reply that uses history information in the dialog corpus of the present invention;
- FIG. 7 is a schematic diagram of the format of the default answer library of the dialog corpus of the present invention;
- FIG. 8 is a flow chart of the method of the present invention;
- FIG. 9 is a schematic diagram of the optimization method for part-of-speech weights of the present invention;
- FIG. 10 is a flow chart of the online learning of part-of-speech weights of the present invention.
Detailed Description
- The object of the present invention is to construct an intelligent chat system, or a robot, based on text interaction that meets people's needs. Preferred embodiments of the present invention are described in detail below.
- the present invention provides a voice chat system.
- The present invention can employ a basic framework of three modules. An automatic speech recognition module (speech to text, ASR/STT) converts the user's natural speech into input text. A spoken-text comprehension answering module (text to text, TTT), i.e. the text comprehension answering module that obtains output text according to input text, performs spoken-language understanding on the text and generates an answer text, drawing on the various required corpora and the system's chat history.
- A speech synthesis module (text to speech, TTS) converts the output text into output speech, so that the answer text reaches the user as voice. If natural-language voice interaction is not required and only text interaction is considered, the system can include only the text understanding answering module.
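The three-module framework can be sketched as follows. The ASR and TTS stages here are pure stubs (the patent uses off-the-shelf modules for both), and the TTT stage is reduced to a trivial rule; only the module boundaries are meant to mirror the description.

```python
class SpeechRecognizer:             # ASR/STT: speech -> input text
    def recognize(self, audio):
        return audio["transcript"]  # stub: pretend decoding succeeded

class TextUnderstanding:            # TTT: input text -> output text
    def reply(self, text):
        return "Hello!" if "hello" in text.lower() else "Tell me more."

class SpeechSynthesizer:            # TTS: output text -> output speech
    def synthesize(self, text):
        return {"waveform": f"<audio:{text}>"}  # stub waveform token

class ChatSystem:
    # Chain the three modules exactly as in the general framework.
    def __init__(self):
        self.asr = SpeechRecognizer()
        self.ttt = TextUnderstanding()
        self.tts = SpeechSynthesizer()

    def chat(self, audio):
        text = self.asr.recognize(audio)
        reply = self.ttt.reply(text)
        return self.tts.synthesize(reply)
```

Dropping the `asr` and `tts` members and calling `ttt.reply` directly gives the text-only variant mentioned above.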
- The automatic speech recognition module and the speech synthesis module can use modules available on the market, including the corresponding software on embedded platforms; the main requirements are high recognition accuracy and the best possible synthesis quality.
- The understanding method used in this patent maps and classifies semantics while establishing answers for the mapped semantics.
- The implementation is simple, but it faces a huge semantic space and a huge number of categories.
- The spoken speech signal produced by the person becomes the corresponding text via the automatic speech recognition module; the spoken-language understanding answering module processes the input text and gives a text answer according to the dialog corpus and the conversation context; finally, the speech synthesis module converts the answer text into a sound signal with which the user interacts.
- The spoken-comprehension answering module thus processes input text and produces a textual response based on the dialog corpus and the conversation context, excluding the input or output of sound.
- The voice chat system takes the user's speech as system input: the voice signal, captured for example through a microphone, is transmitted to the voice recognition module 1, converted into text, and passed to the spoken-text understanding answering module 2.
- That module executes the whole process of FIG. 2, uses the corresponding databases, and returns the corresponding answer-sentence text.
- The answer text then enters the speech synthesis module 3, which converts the text into speech so that the user hears the feedback through a speaker.
- The invention can be applied not only to voice chat but also to information inquiry systems, automatic tour-guide systems, automatic introduction systems, language learning systems, and other occasions where information must be output; this reduces labor costs while improving the accuracy and management of the information.
- The textual spoken-language comprehension and answering of the intelligent chat system of the present invention proceeds by Chinese part-of-speech tagging; the resulting keyword set and the spoken-text understanding corpus are then mapped to a concept sentence, and an answer to the concept sentence is given according to the dialog corpus, the history information, and an information database or network, as shown in FIG. 3.
- The main process is: the input text passes through the part-of-speech tagger 4 of the word segmentation unit, which performs part-of-speech tagging on the input text to obtain a word set with part-of-speech tags; the mapping unit, i.e. mapping module 5, searches the mapping corpus 7 according to the word set and maps it to a concept sentence; the search unit, i.e. search module 6, then searches the dialog corpus 8 according to the concept sentence and maps it to the output text.
- The mapping corpus 7, i.e. database 7, describes the mapping from keyword sets to concept sentences.
- The specific description format can be as shown in FIG. 4, which defines 14 Chinese parts of speech and gives the concept sentence corresponding to each keyword set.
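A FIG. 4-style mapping entry might look like the snippet below. The actual XML schema is not reproduced in the text, so the tag and attribute names (`mappings`, `map`, `keys`, `pattern`) and the `word/pos` key notation are hypothetical; only the idea of an XML-described mapping from keyword sets to concept sentences comes from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML mapping corpus: each <map> entry pairs a keyword set
# (word/part-of-speech pairs in "keys") with a concept sentence ("pattern").
MAPPING_XML = """
<mappings>
  <map keys="name/n what/r" pattern="What is your name"/>
  <map keys="weather/n today/t" pattern="How is the weather today"/>
</mappings>
"""

def load_mapping(xml_text):
    # Parse the document into {frozenset(keywords): concept_sentence}.
    corpus = {}
    for node in ET.fromstring(xml_text).findall("map"):
        keys = frozenset(k.split("/")[0] for k in node.get("keys").split())
        corpus[keys] = node.get("pattern")
    return corpus
```

Storing the corpus as XML keeps it easy to edit by hand or to extend programmatically, which matches the dynamic-modification property claimed later in the description.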
- The dialog corpus 8, i.e. database 8, mainly records answers to concept sentences.
- FIG. 5 shows the specific format description of a direct answer to a concept sentence, which involves no environmental or historical information.
- FIG. 6 describes and records answer sentences given on the basis of historical information, environmental information, and the current concept sentence.
- FIG. 7 is the default answer library; when needed, the program selects the output text from the default answer library.
- Under good conditions the speech recognition module can produce "what is your name", and part-of-speech tagging then yields a segmentation and tagging result: "you (pronoun) of (auxiliary) name (noun) is (verb) what (pronoun)". The mapping process scores this tagging result against the concept corpus and keeps the three highest-scoring concept sentences, for example, from high score to low, "What is your name", "What is the name", and "Do you know the name". The first obviously expresses the intended meaning and has the highest score, so it becomes the concept sentence obtained by mapping; the dialog corpus is then searched according to this concept sentence to obtain an answer. For some utterances, such as "like", the system needs to know the conversational context: by matching against information from the previous turn it can determine how to answer, for example after "What movie do you like?"
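The weighted scoring behind this example can be sketched as follows. The weight values, the English tagging of the sentence, and the candidate keyword sets are all illustrative assumptions, not the patent's optimized values; only the scheme (per-part-of-speech weights summed over matched keywords, top three candidates kept) follows the description.

```python
# Illustrative part-of-speech weights: nouns and verbs dominate.
WEIGHTS = {"noun": 3, "verb": 2, "pronoun": 1, "auxiliary": 0}

# Hypothetical tagging of "what is your name" and three candidate concepts.
TAGGED = [("you", "pronoun"), ("name", "noun"),
          ("is", "verb"), ("what", "pronoun")]
CORPUS = {
    "What is your name": {("name", "noun"), ("what", "pronoun"),
                          ("is", "verb")},
    "What is the name": {("name", "noun"), ("what", "pronoun")},
    "Do you know the name": {("name", "noun"), ("you", "pronoun"),
                             ("know", "verb")},
}

def score(tagged_input, keyword_set):
    # Sum the weights of the input words that appear in the keyword set.
    return sum(WEIGHTS.get(pos, 0) for w, pos in tagged_input
               if (w, pos) in keyword_set)

def top_concepts(tagged_input, corpus, n=3):
    # Rank candidate concept sentences by score and keep the best n.
    ranked = sorted(corpus.items(),
                    key=lambda kv: score(tagged_input, kv[1]), reverse=True)
    return [c for c, _ in ranked[:n]]
```

With these weights, "What is your name" scores 3 + 2 + 1 = 6 and wins, mirroring the ranking given in the text.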
- The smart chat system may further include a pre-processing unit configured to replace word-set information, add a dialog flag, or set a dialog flag bit for the word set coming from the word segmentation unit.
- The smart chat system may further include a post-processing unit that performs the following processing on the output text from the search unit: adding or storing history information, setting a conversation topic, and adding relevant information obtained by searching, yielding the output text delivered to the speech synthesis module.
- This increases the accuracy of the information, makes the user's input easier to understand, and allows the system to produce information that is more accurate and easier for the user to understand.
- The present invention also provides a realization method for an intelligent chat system, as shown in FIG. 8, for a smart chat system comprising a text comprehension answering module that produces output text according to input text, comprising the following steps:
- Step A1 may further include setting a weight value for each part of speech of the mapping corpus, the weight values being obtainable by orthogonal optimization or by two rounds of orthogonal optimization. The specific orthogonal optimization methods are described in detail later.
- The method may further include converting the input voice into input text, that is, collecting external voice information and converting it into text. If natural-language voice interaction is not required and only text interaction is considered, this step can be omitted.
- Step A3: perform a matching calculation between the word set and the keyword sets of the mapping corpus to obtain a concept sentence.
- The method may further include the steps of: B1, determining whether the input text satisfies one of the following conditions: a demonstrative pronoun occurs, the topic does not change, or common-sense knowledge needs to be added; if so, perform the corresponding pre-processing step (replace word-set information, add a dialog flag, or set a dialog flag bit), otherwise go to step A3; B2, determine whether the pre-processing is complete; if so, return a success flag and execute step A4, otherwise return a failure flag and go to step A3.
- Word-set information must be replaced when the current user input contains a demonstrative pronoun, for example the input: "Is that city beautiful?"
- The chat history or the information stored in the database can then be queried: if the stored city is Shenzhen, "that city" is replaced accordingly, giving "Is Shenzhen beautiful?" for subsequent processing.
- The dialog flag mainly indicates whether the conversation topic has changed. When a new topic appears, the topic of the conversation is modified: for example, when the user is at first talking about the weather but suddenly switches to cars, the conversation topic must be modified, the dialog flag added or set, and the history information invalidated or changed. Setting the dialog flag is similar to adding one: the dialog flag is added when a topic first appears and set when the topic changes.
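The two pre-processing cases just described (demonstrative-pronoun replacement and topic change) can be sketched as below. History handling is reduced to a single remembered entity, the pronoun pattern is hard-coded, and `topic_of` is a hypothetical helper that labels a sentence with its topic; a real system would generalize all three.

```python
def preprocess(text, history, current_topic, topic_of):
    # Case 1: demonstrative pronoun -> replace with the stored referent,
    # e.g. "Is that city beautiful?" -> "Is Shenzhen beautiful?".
    if "that city" in text and "city" in history:
        return text.replace("that city", history["city"]), current_topic
    # Case 2: topic changed (e.g. weather -> cars) -> set the dialog
    # flag by switching to the new topic, invalidating old history.
    new_topic = topic_of(text)
    if new_topic != current_topic:
        return text, new_topic
    # Otherwise the input passes through unchanged to step A3.
    return text, current_topic
```

The returned pair plays the role of the success/failure flag in steps B1/B2: the caller can compare the returned text and topic with the originals to decide whether pre-processing changed anything.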
- Step A4: search the dialog corpus according to the concept sentence to generate the output text.
- After step A4, post-processing steps can also be included: adding or storing history information, setting the conversation topic, and adding relevant information obtained by searching.
- The history information contains the sentences already exchanged with the user, as well as other important information such as the user's name, age, and hobbies.
- The conversation topic refers to the subject currently under discussion, such as weather, stocks, news, culture, or sports; it is an effective cue for the robot when searching for answer information.
- The relevant search information means that, according to the conversation topic, the user's needs can be satisfied by searching a database or the network: for example, when talking about the weather, the weather of the corresponding city or region, or its change over time, can be retrieved from the time and place the user gives, and these search results allow the required answer to be produced.
- This increases the answer accuracy, making the output text more precise.
- Step A5 may also be included: converting the output text into output speech. If natural-language voice interaction is not required and only text interaction is considered, this step can be omitted.
- Step A6 may also be included: the user evaluates the output voice, and the text understanding answering module adjusts the weight values according to the evaluation.
- A personal information file may also be established for each user, that is, a step of storing personal information for the user and storing the weight values in it; when the user logs in, the weight values are read and the mapping corpus is adjusted accordingly.
- the evaluation is subjective.
- The user can give three levels of evaluation, for example good, okay, and bad, or evaluations on some other scale; the present invention places no additional limit on this.
- The confirmation can also be given by voice; the system then adjusts the part-of-speech weight values of the mapping corpus according to the result.
- The present invention also provides a method of spoken-language understanding. Because of differences in the quietness of the user's environment and in the characteristics of the speech recognition software used, as well as repetitions, omissions, pauses, ill-formed sentences, and the many rich ways of expressing the same semantics, the output of automatic speech recognition is uncertain and diverse. It is therefore difficult to parse and express semantics with the rules commonly used in natural-language understanding. In fact, when human beings chat in a noisy environment they sometimes cannot hear every word the other party says, yet if they catch the key words they can, from the partial context, recover the meaning the other party intends to express. The same idea is used here: the mapping from keywords to concept sentences yields the speaker's semantics, and the concept sentences are represented directly by the corresponding natural sentences.
- FIG. 2 is a flow chart of the answering process for spoken text.
- The word segmentation module 9 produces the word set with part-of-speech tags.
- Chinese word segmentation has been studied extensively and achieves a high accuracy rate, so it is not described further here.
- Some input sentences contain demonstrative pronouns, continue a conversation on the same topic, or require common-sense knowledge; these need pre-processing. Pre-processing module 10 replaces or adds the necessary information and sets the dialog flag bit, and the system reports the result of the pre-processing by directly returning a flag.
- If pre-processing fully handles the input, control passes directly to the post-processing module 14, which gives the final output text; if further processing is needed after pre-processing, the matching and sorting module 11 is entered, which works against the corpus shown in FIG. 4.
- There, the part-of-speech tag set of the input is matched against the candidate part-of-speech sets described by the keys attribute in the corpus.
- Different parts of speech have different weights, and each candidate concept sentence in the corpus receives a score. For example, in "What is your name", the word that best expresses the semantics is the noun "name"; the other words matter less, so during matching the most important parts of speech should be matched first, and the degree to which they match directly affects the accuracy of the concept sentence.
- The matching and sorting module finally forms a set of the three highest-scoring patterns. Because of the inherent shortcomings of speech recognition and the influence of the environment, the recognized text may not be a complete sentence at all, or may even be confused text. In that case the word segmentation result is poor and the scores of all mapped concept sentences are zero; the chat system is then considered not to have heard the speaker, and the set of concept sentences is set to empty.
- If the set is empty, control passes directly to the default corpus shown in FIG. 7. If the set is not empty, the highest-scoring sentence is compared with the first threshold 12: when the score is less than the threshold, control likewise passes to the default corpus of FIG. 7;
- when the score is not less than the threshold, the mapped concept sentence is obtained successfully, and its corresponding pattern is used as the concept sentence.
- To choose the first threshold, a typical test set of 100 sentences can be selected, the matching results scored, and the threshold giving the highest score adopted.
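The fallback logic above can be sketched as follows. The threshold value and the default answer are illustrative placeholders; the patent tunes the threshold on the 100-sentence test set just mentioned.

```python
FIRST_THRESHOLD = 4  # illustrative; chosen by scoring a test set
DEFAULT_ANSWERS = ["Sorry, could you say that again?"]

def select_concept(scored_candidates):
    # scored_candidates: [(concept_sentence, score), ...] -- the top three
    # from the matching and sorting module. Returns (concept, default):
    # exactly one of the two is None.
    if not scored_candidates:
        # Empty set: the system "did not hear" the speaker.
        return None, DEFAULT_ANSWERS[0]
    concept, best = max(scored_candidates, key=lambda cs: cs[1])
    if best < FIRST_THRESHOLD:
        # Best score below the first threshold: use the default library.
        return None, DEFAULT_ANSWERS[0]
    return concept, None
```

The caller proceeds to the dialog-corpus search only when a concept sentence is returned; otherwise it emits the default answer directly.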
- After the concept sentence is obtained, the search module 13 tries to give the response text according to the history information and the corpus shown in FIG. 6. This is a search process using the current concept sentence and the system's previous answer sentence as inputs; because both inputs need not be satisfied simultaneously, the search result may be empty. If an answer text is found, the search is regarded as successful and the answer is sent directly to the post-processing module 14; if the search result is empty, it is regarded as a failure, the corpus shown in FIG. 5 is consulted for the answer, and the final output likewise enters the post-processing module 14.
- The output sentence undergoes the corresponding processing in the post-processing module 14: history information is added or stored, the state of the conversation topic is set, and the related search information is queried; the answer text is finally formed and returned to the speech synthesis module.
- The final answer text can thus be generated jointly from the answers to the concept sentences, the searched information, and the history information.
- The invention also provides a structure and a storage method for the dialog corpus.
- An XML-based storage-structure description language is designed to describe these unstructured data structures; XML documents describe the corpora, and a relational database stores the data.
- The mapping corpus, the dialog corpus, and the history information are all described and stored using XML, with the attribute nodes needed to describe the corpora defined accordingly.
- The database stores the collections of parts of speech, the concept sentences, the answer sentences, and the history information. This makes the corpora easy to organize and manage, and their contents can be modified dynamically.
- The various corpora can be modified and extended manually, and data can also be added and modified directly through voice interaction, with specific data stored automatically.
- The present invention also provides a process and method for learning knowledge by voice.
- The chat system can accumulate knowledge told to it by the interlocutor in a natural, interactive manner; through mutual questioning it determines whether it has acquired the knowledge the user gave, and it provides corresponding natural-language feedback.
- The invention also provides a method for recording and using chat context information.
- During interaction with a person, the system automatically stores certain information in the context record, keeps important information and conversation content, adds the corresponding information during the dialogue, and dynamically organizes the response sentence according to this information.
- The invention also provides an optimization of part-of-speech weights and an online learning method.
- In mapping keywords to concept sentences, each keyword carries a different weight depending on its part of speech.
- An optimization method is used to obtain near-optimal weight values for the individual parts of speech, and the weight values can be modified dynamically through online learning.
- Mapping keywords to the corresponding concept sentences requires weighting the parts of speech of the different keywords: keywords of different parts of speech contribute differently to the sentence's semantics, and usually the nouns and verbs of a sentence carry higher weight and matter most for understanding its meaning. Natural language, however, has many parts of speech, and no fixed weight value exists for each; the part-of-speech weight optimization method and the online learning method are therefore proposed to maximize the accuracy of the keyword-to-concept-sentence mapping.
- Fourteen parts of speech are defined and, according to linguistic knowledge, divided into two groups of seven: a relatively more important group containing, for example, nouns, verbs, pronouns, adjectives, time words, name words, etc.
- The second group contains modal particles, orientation words, distinguishing words, auxiliary words, idioms, adverbs and numerals; a usable set of weights is obtained through two rounds of orthogonal optimization experiments.
- In the first round, the relatively more important seven parts of speech are used as factors with three levels, for example 3, 2 and 1, and a standard L18 orthogonal test table (seven three-level factors) is selected.
- The weights of the other seven parts of speech are set to 0.
- Every sentence in the test set is of a spoken type, and each part of speech should appear in it with roughly its natural probability.
- Each sentence in the test set is scored manually according to the rationality of the matched concept sentence, and this score is taken as the result of the trial; 18 rounds are tested in this way.
- A set of currently optimal weight values is thus obtained.
- In the second round, the relatively important seven parts of speech are fixed at the weight values found in the first set of trials.
- Orthogonal optimization is applied again, for example with the levels 2, 1 and 0, again using a standard L18 orthogonal test table; the remaining seven parts of speech are optimized with the same test set and scoring criteria as in the first round.
- The two sets of part-of-speech weights are then combined to obtain the weights of the 14 parts of speech used by the system.
- The database can also be trained by voice.
- The user speaks a test input into the mapping module 15 (the mapping module 5 shown in FIG. 2), and the mapping result is returned to the user in the form of voice.
- In the discriminating module 16, the user gives an evaluation of this feedback; the discriminating module 16 then adjusts the weights according to the algorithm in the weight-adjusting module 17 and sends the adjusted weights to the mapping module 15 for the next round of adjustment, until the user is satisfied with the match.
- The invention also provides a natural-language behavior-driving method.
- Commands can be given in natural spoken language: in the mapping from keywords to concept sentences, and from concept sentences to final answers and feedback, there are specific formats and action-driving scripts, so the system can be driven or commanded naturally in spoken language.
- With behavior driving, the system's behavior is no longer driven by pre-defined phrases or simple imperatives; instead, the system gives correct responses to natural command expressions and confirms and responds by voice, thereby realizing, for example, a reminder function for the user. This behavior-driven approach is better suited to people's daily habits, and new users can drive the system in natural language without much learning.
- The invention also provides an embedded implementation of the voice chat system.
- For this voice-chat design framework there are various implementation methods, such as using a voice recognition chip to perform recognition and map to a stored corpus, or using an embedded system to realize speech recognition, speech synthesis and language understanding much as an ordinary processor would.
- The embedded implementation is one of these. It requires automatic speech recognition, semantic understanding and speech synthesis under a specific embedded operating system, as well as their integration, and the implementation software may differ across platforms.
- This solution fully provides the inherent features of a voice chat system and is characterized by portability, low power consumption, compact size and low price.
- The present invention also provides a method of naturally querying and answering information by voice.
- Information is queried and fed back using natural speech, and answers are given in human language. This lets people request the information they need in natural language and ask, answer and confirm information interactively; the data can come from existing databases or from the Internet.
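The two-round part-of-speech weighting described in the bullets above can be sketched as follows. This is an illustrative stand-in, not the patent's procedure: the trial rows (a few rows in the spirit of an L18 three-level orthogonal table), the part-of-speech names, and the scoring function (which replaces the manual rationality scores of matched concept sentences) are all assumptions.

```python
# Sketch: pick part-of-speech weights by scoring rows of an orthogonal-style
# test table. Rows assign levels {3, 2, 1} to the seven "important" parts of
# speech; the best-scoring row is kept. All values here are illustrative.

IMPORTANT_POS = ["noun", "verb", "pronoun", "adjective",
                 "time word", "name word", "quantifier"]

# A few rows in the spirit of an L18 table (the full table has 18 rows).
TRIAL_ROWS = [
    (3, 3, 2, 2, 1, 1, 1),
    (3, 2, 3, 1, 2, 1, 2),
    (2, 3, 1, 3, 1, 2, 2),
]

def score_row(row):
    """Stand-in for the manual per-sentence rationality scores: here we
    simply favor high noun/verb weights, mimicking the stated intuition."""
    w = dict(zip(IMPORTANT_POS, row))
    return 2 * w["noun"] + 2 * w["verb"] + w["pronoun"]

best_row = max(TRIAL_ROWS, key=score_row)
best_weights = dict(zip(IMPORTANT_POS, best_row))
# In the first round the seven "unimportant" parts of speech are fixed at 0.
best_weights.update({pos: 0 for pos in
                     ["modal particle", "orientation word",
                      "distinguishing word", "auxiliary word",
                      "idiom", "adverb", "numeral"]})
print(best_weights["noun"], best_weights["adverb"])  # -> 3 0
```

The second optimization round would repeat the same loop over the remaining seven parts of speech with levels 2, 1 and 0, holding the first-round winners fixed.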
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
An intelligent dialog system includes a text-comprehending answering module (2) which is used to obtain an output text based on an input text. The module includes a word dividing unit, a mapping corpus (7), a mapping unit (5), a dialog corpus (8) and a searching unit (6). The word dividing unit is used to tag parts of speech for said input text (4) and obtain a word set with part-of-speech tagging. The mapping corpus (7) is used to set and store a mapping relationship between keywords and concept sentences. The mapping unit (5) is used to search said mapping corpus (7) based on said word set and map it to obtain a concept sentence. The dialog corpus (8) is used to set and store a mapping relationship between concept sentences and output texts. The searching unit (6) is used to search said dialog corpus (8) based on said concept sentence and obtain the output text.
Description
Intelligent chat system and implementation method thereof

Technical field
The invention relates to the field of human-machine voice interaction, and in particular to an intelligent chat system using natural language as its medium, applied to home service robots, entertainment robots and the field of voice dialogue, and to a method of realizing it.

Background art
With the aging of society and the acceleration of its rhythm, people lack face-to-face communication and interact more through telephone, mail and the Internet. As a result, some people may feel lonely, find it difficult to find a suitable person to chat with to relieve boredom, and have nowhere to confide their feelings. They hope for a channel through which they can express their emotions, dispel loneliness, or receive specific kinds of help.
Moreover, in the fast-paced, high-pressure environment of modern society, people want to be understood, to relieve their own stress and to confide in someone; there is a demand for an intelligent entity that can communicate in natural language and can listen, understand and answer. Especially for the elderly, in order to guard against dementia or memory loss, there is a great demand for a device capable of language communication and voice reminders. For certain users, it is necessary to interact in natural language to obtain the information they want.
In home intelligent service robots, people hope to operate and control part of the robot's functions with natural language, achieving harmony between human and robot and serving people better. A voice chat system is therefore of great significance to individuals and to society. There are many simple voice-dialogue toys on the market; their technology mainly uses a voice recognition chip to perform waveform matching and to establish a mapping to voice answers recorded in advance, thereby answering the input sentence. The number of conversations such products support is limited, they cannot dynamically add conversations or perform understanding, and they cannot truly achieve natural interaction with people.
There are also chat entities built on instant-messaging tools. Their main technique is to construct a virtual agent on top of a chat tool such as MSN or QQ, attached to the Internet, which answers questions and chats through information retrieval and database queries. They use text as the medium of communication and are completely dependent on the Internet or a communication network; such entities cannot use spoken natural language to communicate with people, lack the experience and fun of a real spoken dialogue with a machine, and cannot meet the social needs described above.
Prior-art voice chat also comprises automatic speech recognition, spoken-text understanding and speech synthesis steps; synthesis works well when recognition accuracy is high. Spoken-text understanding generally attempts to identify meaning through semantic analysis, which can be implemented with semantic frames or ontology representations. Semantic analysis derives, from the syntactic structure of the input sentence and the sense of each content word in it, some formal representation that reflects the meaning of the sentence; the semantic frame is the carrier of semantic analysis, and some systems use an ontology to represent or organize the frames. However, the main difficulty of the semantic-frame approach lies in how to express semantics: because the semantic expression of a frame is empirical, it is hard to establish a unified standard, and the number of frames required is massive, which makes building them difficult.
The prior art therefore has drawbacks and needs improvement.

Summary of the invention
It is an object of the present invention to provide an intelligent chat system and a method of realizing it, for use with home service robots, entertainment robots and in the field of voice dialogue.
The technical solution of the present invention is as follows:
An intelligent chat system comprises a text understanding and answering module for obtaining an output text from an input text. The text understanding and answering module comprises a word segmentation unit, an XML-based mapping corpus, a mapping unit, an XML-based dialogue corpus and a search unit. The word segmentation unit performs part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags; the mapping corpus establishes and stores the mapping from keywords to concept sentences; the mapping unit searches the mapping corpus according to the word set and maps it to a concept sentence; the dialogue corpus establishes and stores the mapping from concept sentences to output texts; and the search unit searches the dialogue corpus according to the concept sentence to obtain the output text.
The intelligent chat system may further comprise a speech recognition module for converting input speech into input text.
It may further comprise a speech synthesis module for converting the output text into output speech.
The mapping corpus and the dialogue corpus may be set up in the same corpus. The intelligent chat system may further comprise a pre-processing unit for taking the word set from the word segmentation unit and replacing word-set information, adding a conversation flag or setting a conversation flag bit, to obtain the word set used by the mapping unit.
The intelligent chat system may further comprise a post-processing unit for taking the output text from the search unit and performing the following processing: adding or storing history information, setting the conversation topic, and adding relevant information obtained by searching, to obtain the output text delivered to the speech synthesis module.
A method of realizing an intelligent chat system that comprises a text understanding and answering module for obtaining an output text from an input text, comprising the steps of: A1, establishing an XML-based mapping corpus and a dialogue corpus, the mapping corpus establishing and storing the mapping from keywords to concept sentences and the dialogue corpus establishing and storing the mapping from concept sentences to output texts; A2, performing part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags; A3, performing a matching calculation between this word set and the keyword sets of the mapping corpus to obtain a concept sentence; A4, searching the dialogue corpus according to the concept sentence to generate the output text.
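Steps A2-A4 can be sketched end to end as follows. This is a minimal illustration, not the patent's implementation: the toy lexicon-based tagger stands in for a real Chinese word segmenter, and the one-entry corpora stand in for the XML mapping and dialogue corpora; the concept key `ASK_NAME` and all weights are invented for the example.

```python
# Sketch of steps A2-A4: tag the input (A2), map the tagged word set to a
# concept sentence by weighted keyword matching (A3), then look up the
# answer in the dialogue corpus (A4). Corpora and weights are illustrative.

POS_WEIGHTS = {"noun": 3, "verb": 3, "pronoun": 2, "particle": 0}

LEXICON = {"you": "pronoun", "name": "noun", "is": "verb", "what": "pronoun"}

MAPPING_CORPUS = {"ASK_NAME": {"you", "name", "is"}}     # keywords -> concept
DIALOGUE_CORPUS = {"ASK_NAME": "My name is Robot."}      # concept -> answer

def tag(text):                                           # step A2 (toy tagger)
    return {(w, LEXICON.get(w, "particle")) for w in text.split()}

def map_concept(tagged):                                 # step A3
    def score(concept):
        keywords = MAPPING_CORPUS[concept]
        return sum(POS_WEIGHTS[p] for w, p in tagged if w in keywords)
    return max(MAPPING_CORPUS, key=score)

def answer(text):                                        # step A4
    return DIALOGUE_CORPUS[map_concept(tag(text))]

print(answer("what is you name"))  # -> My name is Robot.
```

With a populated corpus, `map_concept` would return the highest-scoring of many concept sentences, exactly as in the "top three candidates" example given later in the description.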
Before step A2, the method may further comprise the step of converting input speech into input text. The method may further comprise step A5: converting the output text into output speech.
After step A4, the method may further comprise post-processing steps for increasing answer accuracy: adding or storing history information, setting the conversation topic, and adding relevant searched information.
Before step A3, the method may further comprise the steps of: B1, judging whether the input text presents any of the following cases: a demonstrative pronoun appears, the topic has not changed, or common sense needs to be added; if so, correspondingly performing the pre-processing steps of replacing word-set information, adding a conversation flag or setting a conversation flag bit, otherwise executing step A3; B2, judging whether pre-processing is complete: if yes, returning a success flag and executing step A4, otherwise returning a failure flag and executing step A3.
The mapping corpus and the dialogue corpus may be set up in the same corpus. Step A1 may further comprise: setting weight values for the parts of speech of the mapping corpus, the weight values being obtained by orthogonal optimization or by two rounds of orthogonal optimization.
The method may further comprise step A6: the user evaluates the output speech, and the text understanding and answering module adjusts the weight values according to the evaluation.
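The evaluation-driven weight adjustment of step A6 might look like the following minimal loop body. The update rule, step size and part-of-speech names are assumptions for illustration; the patent does not specify the adjustment algorithm here.

```python
# Sketch of one round of online weight learning: the user rates the last
# mapping result, and the weights of the parts of speech that took part in
# the match are nudged up or down. The update rule is an assumed example.

POS_WEIGHTS = {"noun": 3.0, "verb": 3.0, "pronoun": 2.0, "particle": 0.5}

def adjust_weights(weights, used_pos, satisfied, step=0.1):
    """Reward or penalize the parts of speech used in the last match."""
    sign = 1 if satisfied else -1
    for pos in used_pos:
        weights[pos] = max(0.0, weights[pos] + sign * step)
    return weights

# One round: the user was unsatisfied with a match driven by pronouns and
# particles, so those weights are lowered slightly.
adjust_weights(POS_WEIGHTS, ["pronoun", "particle"], satisfied=False)
print(round(POS_WEIGHTS["pronoun"], 1))  # -> 1.9
```

Iterating this until the user is satisfied mirrors the mapping-module / discriminating-module / weight-adjusting-module cycle described earlier.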
The method may further comprise a step of storing personal information for the user and storing the weight values in the user's personal information; when the user logs in, the weight values are read and the mapping corpus is adjusted accordingly.
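Persisting the learned weights per user, as this step describes, could be sketched as below. The JSON profile layout and file location are assumptions made for illustration; the patent does not specify a storage format.

```python
# Sketch: store the learned part-of-speech weights in a per-user profile at
# logout/update time, and restore them at login. Layout is an assumption.
import json
import os
import tempfile

PROFILE = os.path.join(tempfile.gettempdir(), "chat_profile_alice.json")

def save_profile(path, user, pos_weights):
    """Store the user's personal information, including the weight values."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"user": user, "pos_weights": pos_weights}, f)

def load_weights(path):
    """At login, read back the stored weights to adjust the mapping corpus."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["pos_weights"]

save_profile(PROFILE, "alice", {"noun": 3.1, "verb": 2.9})
print(load_weights(PROFILE)["noun"])  # -> 3.1
```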
With the above scheme, the invention establishes a corpus with part-of-speech weight optimization and learning functions, maps and classifies semantics, and establishes answers between the mapped semantics. It can therefore communicate with people in natural language with high accuracy, provides language communication and voice reminder functions, and realizes a real spoken dialogue between human and machine, giving the user the experience and fun of a real conversation.

Brief description of the drawings
FIG. 1 is an overall framework diagram of the chat system of the present invention;

FIG. 2 is a flow chart of spoken-text understanding and answering according to the present invention;

FIG. 3 is a schematic diagram of the spoken-text understanding and answering module of the present invention;

FIG. 4 is a schematic diagram of the mapping description format of the mapping corpus of the present invention;

FIG. 5 is a schematic diagram of the direct-answer format for concept sentences in the dialogue corpus of the present invention;

FIG. 6 is a schematic diagram of the format of answers with history information in the dialogue corpus of the present invention;

FIG. 7 is a schematic diagram of the format of the default answer library of the dialogue corpus of the present invention;

FIG. 8 is a flow chart of the method of the present invention;

FIG. 9 is a schematic diagram of an optimization method for the part-of-speech weights of the present invention;

FIG. 10 is a flow chart of online learning of the part-of-speech weights of the present invention.

Detailed description of embodiments
The object of the present invention is to construct an intelligent chat system, or a robot, that can interact not only through text but also through voice, to meet the needs described above. Preferred embodiments of the present invention are described in detail below.
The present invention provides a voice chat system. Specifically, to realize natural-language interaction, the invention can adopt a basic framework of three modules. An automatic speech recognition module (Automatic Speech Recognition, ASR; Speech to Text, STT) converts the user's natural speech into the corresponding text, i.e. it converts input speech into input text. A spoken-text understanding and answering module (Text to Text, TTT), i.e. the text understanding and answering module that obtains an output text from the input text, performs spoken-language understanding of the text and produces an answer text; in this process the various required corpora and the system's chat-history information are used. A speech synthesis module (Speech Synthesis; Text to Speech, TTS) converts the output text into output speech, so that the answer text is spoken back to the user. If natural-language interaction is not required and only text interaction is considered, the system may include only the text understanding and answering module.
The automatic speech recognition module and the speech synthesis module can use modules available on the market, including the corresponding module software on embedded platforms; the main requirements are high recognition accuracy and good synthesis quality.
For the text understanding and answering module, the understanding method used here is to map and classify semantics and, at the same time, to establish answers between the mapped semantics. Compared with traditional methods this is simple to implement, but it faces a huge semantic space and many categories. The spoken voice signal produced by the person is turned into the corresponding text by the automatic speech recognition module; the spoken-language understanding and answering module processes the input text and gives a text answer according to the dialogue corpus and the conversation context; finally, the speech synthesis module converts the resulting answer text into a voice signal and interacts with the user. The process can of course also be simpler: the spoken-language understanding and answering module processes the input text and gives a text answer according to the dialogue corpus and the conversation context, without voice input or output.
As shown in FIG. 1, the voice chat system takes the user's speech as system input: for example through a microphone, the voice signal is passed to the speech recognition module 1, which converts the speech into text; the text enters the spoken-text understanding and answering module 2, which executes the whole process of FIG. 2 using the corresponding databases and returns the text of the answer sentence; the answer text then enters the speech synthesis module 3, which converts it into speech so that the user can hear the feedback through a loudspeaker. The invention can be used not only for voice chat but also in information inquiry systems, automatic tour-guide systems, automatic introduction systems, language learning systems and so on, wherever information output is needed; it can reduce labor costs while improving the accuracy of the information and its management.
In the intelligent chat system, spoken-text understanding and answering proceeds as follows: Chinese part-of-speech tagging yields a set of keywords, and this set, via the spoken-text understanding corpus, is mapped to a concept sentence; an answer is then given according to the concept sentence, the dialogue corpus, the history information and an information database or the network. As shown in FIG. 3, in the spoken-text understanding and answering module 2 the main flow is: the input text passes through the part-of-speech tagging 4 of the word segmentation unit, which tags the input text and produces a set of words with part-of-speech tags; the mapping unit, i.e. mapping module 5, then searches the mapping corpus 7 according to the word set and maps it to a concept sentence; the search unit, i.e. search module 6, then searches the dialogue corpus 8 according to the concept sentence and maps it to the output text. Two kinds of databases are involved. The mapping corpus 7 (database 7) describes the mapping from keyword sets to concept sentences; its description format can be as shown in FIG. 4, which defines 14 Chinese parts of speech and gives, for each set of keywords, the one concept sentence it should correspond to. The dialogue corpus 8 (database 8) mainly records the answers to concept sentences: FIG. 5 describes the format of direct answers to concept sentences, which involve no environment or history information; FIG. 6 describes and records answer sentences given according to history information, environment information and the current concept sentence; FIG. 7 is the default answer library, from which the program produces output text in a specified way when needed. For example, when the user says "What is your name?", under good conditions the speech recognition module yields the text "What is your name"; part-of-speech tagging yields the segmented, tagged result "you (pronoun) / de 的 (particle) / name (noun) / is (verb) / what (pronoun)". The mapping process then scores this tagged word set against the concept corpus and obtains the three highest-scoring concept sentences, for example, from high to low, "what is your name", "what are you called", "do you know the name". The highest-scoring one clearly expresses the intended meaning and is the mapped concept sentence; searching the dialogue corpus with it yields the answer. For some sentences, such as "I like it", the system needs to know the context; by matching against the preceding information it can determine how to answer, for example "What movie do you like?" and so on.
The intelligent chat system, or the text understanding and answering module, may further comprise a pre-processing unit for taking the word set from the word segmentation unit and replacing word-set information, adding a conversation flag or setting a conversation flag bit, to obtain the word set used by the mapping unit.
The intelligent chat system, or the text understanding and answering module, may further comprise a post-processing unit for taking the output text from the search unit and performing the following processing: adding or storing history information, setting the conversation topic, and adding relevant information obtained by searching, to obtain the output text delivered to the speech synthesis module.
With the above pre-processing and post-processing units, the accuracy of information can be increased, the user's input can be understood more easily, and the system can produce information that is easier for the user to understand and more accurate.
On this basis, the present invention also provides a method of realizing the intelligent chat system, as shown in FIG. 8, for an intelligent chat system comprising a text understanding and answering module that obtains an output text from an input text, comprising the following steps.
A1: establishing an XML-based mapping corpus and a dialogue corpus, the mapping corpus establishing and storing the mapping from keywords to concept sentences and the dialogue corpus establishing and storing the mapping from concept sentences to output texts. Step A1 may also include setting weight values for the parts of speech of the mapping corpus, the weight values being obtainable by orthogonal optimization or by two rounds of orthogonal optimization; the specific methods are described in detail later.
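An XML-based mapping corpus of the kind built in step A1 could be parsed as in the sketch below. The element and attribute names (`mappings`, `entry`, `concept`, `kw`, `pos`) are invented for illustration; the patent's actual description format is the one shown in its FIG. 4.

```python
# Sketch: load a keyword-to-concept mapping corpus from an XML description.
# The schema here is a hypothetical stand-in for the FIG. 4 format.
import xml.etree.ElementTree as ET

CORPUS_XML = """
<mappings>
  <entry concept="what is your name">
    <kw pos="pronoun">you</kw>
    <kw pos="noun">name</kw>
    <kw pos="verb">is</kw>
  </entry>
</mappings>
"""

def load_mapping_corpus(xml_text):
    """Parse concept -> {(word, pos), ...} from the XML description."""
    root = ET.fromstring(xml_text)
    corpus = {}
    for entry in root.findall("entry"):
        corpus[entry.get("concept")] = {
            (kw.text, kw.get("pos")) for kw in entry.findall("kw")
        }
    return corpus

corpus = load_mapping_corpus(CORPUS_XML)
print(sorted(w for w, _ in corpus["what is your name"]))  # -> ['is', 'name', 'you']
```

An XML description like this keeps the corpus human-editable (matching the earlier point that the corpora can be modified manually) while remaining easy to load at startup.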
A2: performing part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags; the tags are used in the subsequent matching-calculation step. Before step A2 there may also be a step of converting input speech into input text, i.e. collecting external voice information and converting it into text. If natural-language interaction is not required and only text interaction is considered, the speech-to-text step can be omitted.
A3. Perform a matching calculation between the word set and the keyword word sets of the mapping corpus to obtain a concept sentence. Before step A3, the method may further include the steps:
B1. Check the input text for the following conditions: a demonstrative pronoun appears, the topic has not changed, or common-sense knowledge must be added. For each condition, perform the corresponding pre-processing step: replace word-set information, add a dialogue flag, or set a dialogue flag bit; otherwise, proceed to step A3.
B2. Determine whether pre-processing is complete: if so, return a success flag and go to step A4; otherwise, return a failure flag and go to step A3.
Replacing word-set information is required when the current input text contains a demonstrative pronoun. For example, if the user asks "那个城市漂亮吗?" ("Is that city beautiful?"), the system consults the chat history or the information stored in the database; if the stored city is 深圳 (Shenzhen), the input is rewritten as "深圳漂亮吗?" ("Is Shenzhen beautiful?") before further processing. The dialogue flag mainly indicates whether the conversation topic has switched: whenever a new topic appears, the conversation topic must be updated. For example, if the user is first talking about the weather but suddenly switches to cars, the topic is modified and a dialogue flag is added or set so that the stale history information is invalidated or changed. Setting the dialogue flag is a concept similar to adding one: a flag is added the first time a topic appears and set whenever the topic changes.
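A minimal sketch of the B1/B2 pre-processing described above, assuming a simple pronoun-to-history-key table; the table, the history format, and the flag convention are all illustrative:

```python
# Step B1/B2 sketch: replace a demonstrative pronoun from the chat
# history and report the result with a success/failure flag, as the
# text describes. Table contents are illustrative assumptions.
DEMONSTRATIVES = {"那个城市": "city", "那里": "city"}  # pronoun -> history key

def preprocess(text, history):
    """Return (possibly rewritten text, success flag)."""
    for pronoun, key in DEMONSTRATIVES.items():
        if pronoun in text and key in history:
            return text.replace(pronoun, history[key]), True
    return text, False  # nothing to substitute: fall through to step A3

print(preprocess("那个城市漂亮吗?", {"city": "深圳"}))
# ('深圳漂亮吗?', True)
```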
A4. Search the dialogue corpus according to the concept sentence to generate the output text. After step A4, post-processing steps may be included: adding or storing history information, setting the conversation topic, and adding information obtained by searching. The history contains the sentences previously exchanged with the user as well as other important facts such as the speaker's name, age, and hobbies. The conversation topic is the subject currently under discussion, such as weather, stocks, news, culture, or sports, and serves as an effective cue for the system's information search and answering. "Information obtained by searching" means that, according to the topic, the user's needs can be satisfied by querying a database or the network: when the weather is discussed, for example, the system can report the weather of the corresponding city or region for the time and place the user gave, or describe how the weather will change, and use the retrieved information to give the answer the user needs. These post-processing steps increase answer accuracy, making the output text more reliable.
After step A4, a step A5 may also be included: converting the output text into output speech. If natural spoken-language interaction is not required and only text interaction is considered, this text-to-speech step may be omitted.
After step A4, a step A6 may also be included: the user evaluates the output speech, and the text understanding-and-answering module adjusts the weight values according to the evaluation. A personal information profile may also be created for each user, that is, the method further includes storing personal information for the user and storing the weight values in it; when the user logs in, the weight values are read and the mapping corpus is adjusted accordingly. The evaluation is subjective: for a system answer, the user can give one of three grades, for example good, acceptable, or bad, or any other grading scheme, and the invention places no additional restriction on this. After receiving the evaluation, the system may acknowledge it by voice and, based on the result, adjusts the part-of-speech weight values of the mapping corpus.
The present invention also provides a spoken-language understanding method. Because usage environments differ in noise level, because speech-recognition software has its own characteristics, and because spoken language itself contains repetitions, omissions, pauses, and ill-formed sentences and expresses the same meaning in many different ways, the output of automatic speech recognition is uncertain and varied; it is therefore difficult to parse and represent its semantics with the rule-based methods commonly used in natural-language understanding. Humans chatting in a noisy environment likewise cannot always hear every word the other party says, yet if they catch the few key words and use part of the context they can recover the intended meaning. Here, therefore, a mapping from keywords to concept sentences is used to obtain the speaker's semantics, and each concept sentence is represented directly by a corresponding natural-language sentence.
Figure 2 is a flow chart of spoken-text understanding and answering.
First, the word-segmentation module 9 produces a set of words with part-of-speech tags; Chinese word segmentation has been studied extensively and achieves high accuracy, so it is not described further here. Meanwhile, according to the chat history, pre-processing is needed whenever the input sentence contains demonstrative pronouns, continues an unchanged topic, or requires common-sense knowledge to be added. The pre-processing module 10 replaces or adds the necessary information or sets the dialogue flag bit, and the system indicates the result of pre-processing by returning a flag. If pre-processing returns success, control passes directly to the post-processing module 14, which produces the final output text. If further processing is needed after pre-processing, control enters the matching-and-ranking module 11, which matches the input part-of-speech tag set against the candidate part-of-speech sets described by the keys attribute of the corpus shown in Figure 4. Different parts of speech carry different weights, and every candidate concept sentence in the corpus receives a score. In "你叫什么名字" ("What is your name"), for instance, the word that best conveys the meaning is the noun "名字" ("name"), while the other words are comparatively unimportant, so matching should favour the parts of speech of highest importance; the quality of this part-of-speech matching directly determines the accuracy of the concept sentence.
The matching-and-ranking module finally forms a set from the three highest-scoring patterns. Because of the inherent shortcomings of speech recognition and the influence of the usage environment, the recognized text may not be a complete sentence at all, or may even be garbled; in that case the segmentation result is poor and every mapped sentence scores zero. The chat system then concludes that it did not hear the speaker at all, and the concept-sentence set is set to empty.
If the set is empty, control goes directly to the default corpus shown in Figure 7. If the set is non-empty, the highest-scoring sentence is compared with a first threshold 12: when the score is below the threshold, control again goes to the default corpus of Figure 7; when the score is not below the threshold, the mapping succeeds and the corresponding pattern becomes the concept sentence. The first threshold can be determined by selecting a fairly typical test set of 100 sentences, scoring the matching results, and choosing the threshold that yields the highest score.
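The weighted matching, top-3 selection, and threshold test described above can be sketched as follows; the weight table, candidate patterns, and threshold value are illustrative assumptions, not the patent's tuned values:

```python
# Step A3 sketch: score each candidate pattern by the weights of the
# POS tags it shares with the input, keep the 3 best, and fall back to
# the default corpus when the best score is below the first threshold.
POS_WEIGHTS = {"n": 3, "v": 3, "r": 2, "a": 1, "x": 0}  # illustrative

def score(input_tags, pattern_tags):
    return sum(POS_WEIGHTS.get(t, 0) for t in input_tags if t in pattern_tags)

def match(input_tags, patterns, threshold=3):
    ranked = sorted(patterns.items(),
                    key=lambda kv: score(input_tags, kv[1]),
                    reverse=True)[:3]      # the 3 highest-scoring patterns
    if not ranked or score(input_tags, ranked[0][1]) < threshold:
        return None                        # treated as "not heard": default corpus
    return ranked[0][0]

print(match(["r", "v", "r", "n"],
            {"你叫什么名字": {"n", "v", "r"}, "今天天气": {"n", "t"}}))
# 你叫什么名字
```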
Once the concept sentence is obtained, the search module 13 attempts to produce a response text from the history information and the corpus shown in Figure 6. This is a search that takes the current concept sentence and the previous system answer as inputs; because both inputs are not necessarily satisfied at once, the search result may be empty. If an answer text is found, the search is deemed successful and the answer is passed directly to the post-processing module 14; if the result is empty, the search is deemed to have failed and an answer is drawn from the corpus shown in Figure 5, whose output likewise enters the post-processing module 14. There the output sentence is processed further: history information is added or stored, the conversation-topic state is set, and related information is looked up, finally forming the answer text that is returned to the speech-synthesis module. The final answer text can thus be generated jointly from the answer to the concept sentence, the information search, and the history information.
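The answer search keyed on the concept sentence and the previous system reply might look like the following sketch; the corpus contents and the fallback reply are invented for the example:

```python
# Step A4 sketch: look up an answer by (concept sentence, last reply);
# an empty search result falls back to the default reply, as above.
DIALOG_CORPUS = {("你叫什么名字", None): "我叫小智。"}   # illustrative entry
DEFAULT_ANSWER = "不好意思, 我没听清。"                   # illustrative fallback

def answer(concept, last_reply=None):
    return DIALOG_CORPUS.get((concept, last_reply), DEFAULT_ANSWER)

print(answer("你叫什么名字"))  # 我叫小智。
print(answer("今天天气"))      # 不好意思, 我没听清。
```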
The present invention also provides a structure and storage method for the dialogue corpus. To describe the mapping from keywords to concept sentences, and to describe and store the output sentences that correspond to a concept sentence in a given context, a storage-structure description language based on XML (extensible markup language) was designed for these non-structured data: XML documents describe the corpus, and a relational database stores the data. The mapping corpus, the dialogue corpus, and the history information are all described and stored in XML, and the attribute nodes needed to describe the corpus are defined. The database stores part-of-speech sets, concept sentences, answer sentences, history information, and so on. This design is easy to organize and manage, and the contents of the corpus can be modified dynamically: the various corpora can be edited and extended manually, can be extended and modified directly through voice interaction, and can store specific data automatically.
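The patent does not publish its exact XML schema, so the element and attribute names below (entry, keys, pattern, answer) are assumptions; the sketch only shows how a keys-to-pattern corpus entry could be described in XML and read back:

```python
# One hypothetical mapping-corpus entry, parsed with the standard
# library. "keys" lists candidate keyword/POS items, "pattern" is the
# concept sentence, and <answer> is a stored reply. Names are assumed.
import xml.etree.ElementTree as ET

CORPUS_XML = """
<corpus>
  <entry keys="名字/n 叫/v" pattern="你叫什么名字">
    <answer>我叫小智。</answer>
  </entry>
</corpus>
"""

root = ET.fromstring(CORPUS_XML)
for entry in root.findall("entry"):
    print(entry.get("keys").split(), entry.get("pattern"), entry.findtext("answer"))
# ['名字/n', '叫/v'] 你叫什么名字 我叫小智。
```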
The present invention also provides a process and method for learning knowledge through speech. The chat system accumulates knowledge told to it by the interlocutor in a natural, interactive way; mutual questioning determines whether the system has actually acquired the knowledge the user supplied, and the system gives corresponding natural-language feedback.
The present invention also provides a method for recording and using chat context information. While interacting with a person, the system automatically stores information in the context record, keeping important facts and conversation content; during the dialogue it adds the corresponding information and dynamically organizes its answer sentences according to it.
The present invention also provides part-of-speech weight optimization and an online learning method. When keywords are mapped to concept sentences, keywords of different parts of speech carry different weights. An optimization method is used to obtain the optimal weight value for each part of speech, and the weights can be modified dynamically through online learning. When mapping keywords to their corresponding concept sentences, the part of speech of each keyword must be weighted: different parts of speech contribute differently to the meaning of a sentence, and a sentence's nouns and verbs usually carry the highest weights and matter most for understanding its semantics. Natural language, however, has many parts of speech, and no definite weight value exists for each of them. A part-of-speech weight-optimization method and an online learning method are therefore proposed to maximize the accuracy of the keyword-to-concept-sentence mapping.
As shown in Figure 9, part-of-speech weights are determined by orthogonal optimization. Chinese has many parts of speech, and their relative importance for semantic expression is not known exactly, so an optimization method is needed to obtain the weight of each. Following general linguistics and common sense, 14 relatively important classes were selected: verbs, nouns, pronouns, numerals, adjectives, place nouns, adverbs, idioms, time words, auxiliary words, modal particles, personal names, distinguishing words, and locative words. These 14 parts of speech are obtained from need and experience and, using linguistic knowledge, are divided into two groups: for example, nouns, verbs, pronouns, place nouns, adjectives, time words, and personal names form the first group, while modal particles, locative words, distinguishing words, auxiliary words, idioms, adverbs, and numerals form the second. A usable weight set is then obtained through two groups of orthogonal-optimization experiments. In the first group, the seven relatively more important classes are the factors, each with three levels, for example 3, 2, and 1, using the standard L18(3^7) orthogonal-experiment table; the other seven classes are set to 0. The test set is built so that every sentence is of a spoken type and each part of speech appears with roughly its natural frequency. In each trial, every sentence in the test set is scored manually according to the reasonableness of the matched concept sentence, and the total score is the result of that trial; 18 rounds of trials are run in this way. The first group of trials yields a set of currently optimal weight values. In the second group of trials, the seven more important classes keep the weight values obtained in the first group, and the weights of the remaining seven classes are optimized orthogonally, for example with levels 2, 1, and 0, again using the standard L18(3^7) orthogonal-experiment table, with the same test set and scoring criteria as the first time. Finally, the two results are combined to obtain the weight values of all 14 parts of speech used by the system.
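The two-stage search can be illustrated in miniature. For brevity this sketch exhaustively enumerates a tiny level grid for three POS factors instead of using an L18(3^7) orthogonal array, and replaces the manual scoring with an automatic scorer; the factors, levels, and test set are all illustrative:

```python
# Miniature version of the weight search: try level assignments for a
# few POS factors and keep the assignment scoring best on a small
# labelled test set (scored automatically here, manually in the patent).
import itertools

FACTORS = ["n", "v", "r"]   # 3 of the 14 POS classes, for illustration
LEVELS = [3, 2, 1]

def evaluate(weights, test_set):
    """Count cases where the heaviest tag is the one marked as the key tag."""
    return sum(max(tags, key=lambda t: weights.get(t, 0)) == expected
               for tags, expected in test_set)

TEST_SET = [(["n", "r"], "n"), (["v", "r"], "v"), (["n", "v"], "n")]

best = max(itertools.product(LEVELS, repeat=len(FACTORS)),
           key=lambda row: evaluate(dict(zip(FACTORS, row)), TEST_SET))
print(dict(zip(FACTORS, best)))
```

An orthogonal array would evaluate only a balanced subset of these rows instead of the full grid, which is what makes the L18 design tractable for 7 factors.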
Figure 10 shows the online learning process for the part-of-speech weights. When the user enters the weight-training mode, the database is trained through speech. The user first supplies a test input utterance to the mapping module 15 (the mapping module shown in Figure 2), which returns the mapping result in speech form to the user and to the discrimination module 16. The user gives an evaluation based on this feedback; using it, the discrimination module 16 has the weight-adjustment module 17 adjust the weights according to the algorithm and sends the adjusted weights back to the mapping module 15 for the next round, until the matching finally satisfies the user. For example, when the user says "你的特长是什么" ("What is your specialty"), the system may ask back "你说的是'你的特长是什么'吗" ("Did you say 'What is your specialty'?") or "你说的是'你是什么'吗" ("Did you say 'What are you'?"); the user naturally answers yes ("是") or no ("不是"), and from that answer the system adjusts the part-of-speech weights so that its interpretations become as correct as possible.
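The patent does not specify the adjustment algorithm, so the fixed-step update below is purely an assumed placeholder showing where the user's confirmation feeds back into the weights:

```python
# Online-learning sketch (Fig. 10): after the user confirms ("是") or
# rejects ("不是") the mapped concept, nudge the weights of the POS
# tags that drove the mapping. The +/- fixed step is an assumption.
def adjust(weights, tags_used, confirmed, step=0.5):
    delta = step if confirmed else -step
    return {t: w + delta if t in tags_used else w for t, w in weights.items()}

w = {"n": 3.0, "v": 3.0, "r": 2.0}
w = adjust(w, {"n", "v"}, confirmed=False)   # user answered "不是"
print(w)  # {'n': 2.5, 'v': 2.5, 'r': 2.0}
```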
The present invention also provides a natural-language behavior-driving method. Commands are issued in natural spoken language: from the part-of-speech set to the concept sentence, and from the concept sentence to the final answer and feedback, specific formats and action-driving scripts allow the system to be driven or commanded naturally by voice. Behavior driving no longer relies on phrases or simple imperative sentences specified in advance by the system; instead, the system reacts correctly to natural command expressions and confirms and responds by voice, providing a user-reminder function. This driving style better matches people's everyday habits, and new users can drive the system in natural language without much learning.
The present invention also provides an embedded implementation of the voice-chat system. The voice-chat design framework can be implemented in several ways, for example using a speech-recognition chip for the recognition function together with the mapping and storage corpora, or using an embedded system to perform speech recognition, speech synthesis, and language understanding similar to an ordinary processor. The embedded implementation is one of these: automatic speech recognition, semantic understanding, and speech synthesis must be completed and integrated under a specific embedded operating system, and the implementing software differs between platforms. This solution retains all the inherent characteristics of the voice-chat system while being portable, low in power consumption, compact, and inexpensive.
The present invention also provides a method for querying and answering information naturally by voice. Both the query and the feedback use natural speech, and the answers conform to human language. People can obtain the information they need by communicating in natural language, asking, answering, and confirming interactively; the data may come from existing databases or from the Internet.
It is to be understood that those skilled in the art may make improvements or modifications according to the above description, and all such improvements and modifications shall fall within the protection scope of the appended claims of the present invention.
Claims
What is claimed is:
1. An intelligent chat system, characterized by comprising a text understanding-and-answering module for obtaining an output text from an input text, the text understanding-and-answering module comprising a word-segmentation unit, an XML-based mapping corpus, a mapping unit, an XML-based dialogue corpus, and a search unit;
wherein the word-segmentation unit is configured to perform part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags;
the mapping corpus is configured to establish and store mapping relationships from keywords to concept sentences;
the mapping unit is configured to search the mapping corpus according to the word set and obtain a concept sentence by mapping;
the dialogue corpus is configured to establish and store mapping relationships from concept sentences to output texts; and
the search unit is configured to search the dialogue corpus according to the concept sentence and obtain the output text by mapping.
2. The intelligent chat system according to claim 1, characterized by further comprising a speech-recognition module for converting input speech into the input text.
3. The intelligent chat system according to claim 1, characterized by further comprising a speech-synthesis module for converting the output text into output speech.
4. The intelligent chat system according to claim 1, characterized in that the mapping corpus and the dialogue corpus are arranged in the same corpus.
5. The intelligent chat system according to claim 1, characterized by further comprising a pre-processing unit configured to take the word set from the word-segmentation unit and replace word-set information, add a dialogue flag, or set a dialogue flag bit, yielding the word set used by the mapping unit.
6. The intelligent chat system according to claim 1, characterized by further comprising a post-processing unit configured to apply the following processing to the output text from the search unit: adding or storing history information, setting the conversation topic, and adding related information obtained by searching, yielding the output text delivered to the speech-synthesis module.
7. A method for implementing an intelligent chat system comprising a text understanding-and-answering module for obtaining an output text from an input text, the method comprising the steps of:
A1. establishing an XML-based mapping corpus and an XML-based dialogue corpus, the mapping corpus establishing and storing mapping relationships from keywords to concept sentences and the dialogue corpus establishing and storing mapping relationships from concept sentences to output texts;
A2. performing part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags;
A3. performing a matching calculation between the word set and the keyword word sets of the mapping corpus to obtain a concept sentence;
A4. searching the dialogue corpus according to the concept sentence to generate the output text.
8. The method according to claim 7, characterized in that, before step A2, the method further comprises the step of converting input speech into the input text.
9. The method according to claim 7, characterized by further comprising a step A5 of converting the output text into output speech.
10. The method according to claim 7, characterized in that, after step A4, the method further comprises post-processing steps for increasing answer accuracy: adding or storing history information, setting the conversation topic, and adding related information obtained by searching.
11. The method according to claim 7, characterized in that, before step A3, the method further comprises the steps of:
B1. checking the input text for the following conditions: a demonstrative pronoun appears, the topic has not changed, or common-sense knowledge must be added, and correspondingly performing the pre-processing step of replacing word-set information, adding a dialogue flag, or setting a dialogue flag bit; otherwise performing step A3;
B2. determining whether pre-processing is complete: if so, returning a success flag and performing step A4; otherwise returning a failure flag and performing step A3.
12. The method according to claim 7, characterized in that the mapping corpus and the dialogue corpus are arranged in the same corpus.
13. The method according to claim 7, characterized in that step A1 further comprises setting weight values for the parts of speech of the mapping corpus, the weight values being obtained by orthogonal optimization or by a two-stage orthogonal-optimization method.
14. The method according to claim 13, characterized by further comprising a step A6 in which the user evaluates the output speech and the text understanding-and-answering module adjusts the weight values according to the evaluation.
15. The method according to claim 14, characterized by further comprising the step of storing personal information for the user and storing the weight values in the user's personal information; when the user logs in, the weight values are read and the mapping corpus is adjusted accordingly.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN2007100741121A (CN101075435B) | 2007-04-19 | 2007-04-19 | Intelligent chatting system and its realizing method
CN200710074112.1 | 2007-04-19 | |
Publications (1)
Publication Number | Publication Date
---|---
WO2008128423A1 (en) | 2008-10-30
Family
ID=38976431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2008/000764 (WO2008128423A1) | An intelligent dialog system and a method for realization thereof | 2007-04-19 | 2008-04-15
Country Status (2)
Country | Link
---|---
CN (1) | CN101075435B (en)
WO (1) | WO2008128423A1 (en)
Families Citing this family (84)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101075435B (en) * | 2007-04-19 | 2011-05-18 | 深圳先进技术研究院 | Intelligent chatting system and its realizing method |
US8374859B2 (en) | 2008-08-20 | 2013-02-12 | Universal Entertainment Corporation | Automatic answering device, automatic answering system, conversation scenario editing device, conversation server, and automatic answering method |
JP5829000B2 (en) * | 2008-08-20 | 2015-12-09 | 株式会社ユニバーサルエンターテインメント | Conversation scenario editing device |
CN101551998B (en) * | 2009-05-12 | 2011-07-27 | 上海锦芯电子科技有限公司 | A group of voice interaction devices and method of voice interaction with human |
CN101610164B (en) * | 2009-07-03 | 2011-09-21 | 腾讯科技(北京)有限公司 | Implementation method, device and system of multi-person conversation |
CN101794304B (en) * | 2010-02-10 | 2016-05-25 | 深圳先进技术研究院 | Industry information service system and method |
CN102737631A (en) * | 2011-04-15 | 2012-10-17 | 富泰华工业(深圳)有限公司 | Electronic device and method for interactive speech recognition |
US8260615B1 (en) * | 2011-04-25 | 2012-09-04 | Google Inc. | Cross-lingual initialization of language models |
CN102194005B (en) * | 2011-05-26 | 2014-01-15 | 卢玉敏 | Chat robot system and automatic chat method |
US8930189B2 (en) * | 2011-10-28 | 2015-01-06 | Microsoft Corporation | Distributed user input to text generated by a speech to text transcription service |
CN103150981A (en) * | 2013-01-02 | 2013-06-12 | 曲东阳 | Self-service voice tour-guiding system and triggering method thereof |
CN103198155B (en) * | 2013-04-27 | 2017-09-22 | 北京光年无限科技有限公司 | A kind of intelligent answer interactive system and method based on mobile terminal |
CN103279528A (en) * | 2013-05-31 | 2013-09-04 | 俞志晨 | Question-answering system and question-answering method based on man-machine integration |
CN104281609B (en) * | 2013-07-08 | 2020-03-17 | 腾讯科技(深圳)有限公司 | Configuration method and device for voice input instruction matching rule |
EP3061086B1 (en) * | 2013-10-24 | 2019-10-23 | Bayerische Motoren Werke Aktiengesellschaft | Text-to-speech performance evaluation |
CN103593054B (en) * | 2013-11-25 | 2018-04-20 | 北京光年无限科技有限公司 | A kind of combination Emotion identification and the question answering system of output |
CN104754110A (en) * | 2013-12-31 | 2015-07-01 | 广州华久信息科技有限公司 | Mobile phone for emotional release based on machine voice conversation |
JP6359327B2 (en) * | 2014-04-25 | 2018-07-18 | シャープ株式会社 | Information processing apparatus and control program |
US10726831B2 (en) * | 2014-05-20 | 2020-07-28 | Amazon Technologies, Inc. | Context interpretation in natural language processing using previous dialog acts |
CN104123939A (en) * | 2014-06-06 | 2014-10-29 | 国家电网公司 | Substation inspection robot based voice interaction control method |
CN105404617B (en) * | 2014-09-15 | 2018-12-14 | 华为技术有限公司 | A kind of control method of remote desktop, controlled end and control system |
CN104392720A (en) * | 2014-12-01 | 2015-03-04 | 江西洪都航空工业集团有限责任公司 | Voice interaction method of intelligent service robot |
CN104615646A (en) * | 2014-12-25 | 2015-05-13 | 上海科阅信息技术有限公司 | Intelligent chatting robot system |
CN104898589B (en) * | 2015-03-26 | 2019-04-30 | 天脉聚源(北京)传媒科技有限公司 | A kind of intelligent response method and apparatus for intelligent steward robot |
WO2016173326A1 (en) * | 2015-04-30 | 2016-11-03 | 北京贝虎机器人技术有限公司 | Subject based interaction system and method |
CN105094315B (en) * | 2015-06-25 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Method and apparatus for intelligent human-machine chat based on artificial intelligence |
CN106326208B (en) * | 2015-06-30 | 2019-06-07 | 芋头科技(杭州)有限公司 | A kind of system and method that robot is trained by voice |
CN105206284B (en) * | 2015-09-11 | 2019-06-18 | 清华大学 | Online chat method and system for relieving adolescents' psychological stress |
JP6120927B2 (en) * | 2015-09-24 | 2017-04-26 | シャープ株式会社 | Dialog system, method for controlling dialog, and program for causing computer to function as dialog system |
CN105376140A (en) * | 2015-09-25 | 2016-03-02 | 云活科技有限公司 | A voice message prompt method and device |
CN108139988B (en) * | 2015-10-20 | 2021-07-30 | 索尼公司 | Information processing system and information processing method |
CN105573710A (en) * | 2015-12-18 | 2016-05-11 | 合肥寰景信息技术有限公司 | Voice service method for network community |
CN105912712B (en) * | 2016-04-29 | 2019-09-17 | 华南师范大学 | Robot dialog control method and system based on big data |
CN105895097A (en) * | 2016-05-20 | 2016-08-24 | 杨天君 | Voice conversation information inquiry platform |
CN106057203A (en) * | 2016-05-24 | 2016-10-26 | 深圳市敢为软件技术有限公司 | Precise voice control method and device |
CN106095834A (en) * | 2016-06-01 | 2016-11-09 | 竹间智能科技(上海)有限公司 | Intelligent dialogue method and system based on topic |
CN106294321B (en) * | 2016-08-04 | 2019-05-31 | 北京儒博科技有限公司 | A kind of the dialogue method for digging and device of specific area |
CN106228983B (en) * | 2016-08-23 | 2018-08-24 | 北京谛听机器人科技有限公司 | A kind of scene process method and system in man-machine natural language interaction |
CN106469212B (en) * | 2016-09-05 | 2019-10-15 | 北京百度网讯科技有限公司 | Man-machine interaction method and device based on artificial intelligence |
CN107844470B (en) * | 2016-09-18 | 2021-04-30 | 腾讯科技(深圳)有限公司 | Voice data processing method and equipment thereof |
CN106412263A (en) * | 2016-09-19 | 2017-02-15 | 合肥视尔信息科技有限公司 | Human-computer interaction voice system |
JP2018054790A (en) * | 2016-09-28 | 2018-04-05 | トヨタ自動車株式会社 | Voice interaction system and voice interaction method |
CN106653006B (en) * | 2016-11-17 | 2019-11-08 | 百度在线网络技术(北京)有限公司 | Searching method and device based on interactive voice |
CN108132952B (en) * | 2016-12-01 | 2022-03-15 | 百度在线网络技术(北京)有限公司 | Active type searching method and device based on voice recognition |
CN106802951B (en) * | 2017-01-17 | 2019-06-11 | 厦门快商通科技股份有限公司 | A kind of topic abstracting method and system for Intelligent dialogue |
CN107193978A (en) * | 2017-05-26 | 2017-09-22 | 武汉泰迪智慧科技有限公司 | A kind of many wheel automatic chatting dialogue methods and system based on deep learning |
CN107256260A (en) * | 2017-06-13 | 2017-10-17 | 浪潮软件股份有限公司 | A kind of intelligent semantic recognition methods, searching method, apparatus and system |
CN107393538A (en) * | 2017-07-26 | 2017-11-24 | 上海与德通讯技术有限公司 | Robot interactive method and system |
CN107463699A (en) * | 2017-08-15 | 2017-12-12 | 济南浪潮高新科技投资发展有限公司 | A kind of method for realizing question and answer robot based on seq2seq models |
CN108255804A (en) * | 2017-09-25 | 2018-07-06 | 上海四宸软件技术有限公司 | A kind of communication artificial intelligence system and its language processing method |
CN107644643A (en) * | 2017-09-27 | 2018-01-30 | 安徽硕威智能科技有限公司 | A kind of voice interactive system and method |
CN110121706B (en) * | 2017-10-13 | 2022-05-03 | 微软技术许可有限责任公司 | Providing responses in a conversation |
CN108231080A (en) * | 2018-01-05 | 2018-06-29 | 广州蓝豹智能科技有限公司 | Voice method for pushing, device, smart machine and storage medium |
CN108364655B (en) * | 2018-01-31 | 2021-03-09 | 网易乐得科技有限公司 | Voice processing method, medium, device and computing equipment |
KR102648815B1 (en) * | 2018-04-30 | 2024-03-19 | 현대자동차주식회사 | Apparatus and method for spoken language understanding |
WO2020018724A1 (en) * | 2018-07-19 | 2020-01-23 | Dolby International Ab | Method and system for creating object-based audio content |
CN109325155A (en) * | 2018-07-25 | 2019-02-12 | 南京瓦尔基里网络科技有限公司 | A kind of novel dialogue state storage method and system |
CN109461448A (en) * | 2018-12-11 | 2019-03-12 | 百度在线网络技术(北京)有限公司 | Voice interactive method and device |
CN109726265A (en) * | 2018-12-13 | 2019-05-07 | 深圳壹账通智能科技有限公司 | Information processing method, device and computer-readable storage medium for assisting chat |
CN109410913B (en) | 2018-12-13 | 2022-08-05 | 百度在线网络技术(北京)有限公司 | Voice synthesis method, device, equipment and storage medium |
CN109829039B (en) * | 2018-12-13 | 2023-06-09 | 平安科技(深圳)有限公司 | Intelligent chat method, intelligent chat device, computer equipment and storage medium |
DE102018222156B4 (en) * | 2018-12-18 | 2025-01-30 | Volkswagen Aktiengesellschaft | Method, speech dialogue system and use of a speech dialogue system for generating a response output in response to speech input information |
CN109559754B (en) * | 2018-12-24 | 2020-11-03 | 焦点科技股份有限公司 | Voice rescue method and system for tumble identification |
CN111400464B (en) * | 2019-01-03 | 2023-05-26 | 百度在线网络技术(北京)有限公司 | Text generation method, device, server and storage medium |
CN109686360A (en) * | 2019-01-08 | 2019-04-26 | 哈尔滨理工大学 | Voice-based meal-ordering robot |
CN110111788B (en) * | 2019-05-06 | 2022-02-08 | 阿波罗智联(北京)科技有限公司 | Voice interaction method and device, terminal and computer readable medium |
US10868778B1 (en) * | 2019-05-30 | 2020-12-15 | Microsoft Technology Licensing, Llc | Contextual feedback, with expiration indicator, to a natural understanding system in a chat bot |
CN112153213A (en) * | 2019-06-28 | 2020-12-29 | 青岛海信移动通信技术股份有限公司 | Method and equipment for determining voice information |
CN110427475A (en) * | 2019-08-05 | 2019-11-08 | 安徽赛福贝特信息技术有限公司 | A kind of speech recognition intelligent customer service system |
US11194970B2 (en) * | 2019-09-23 | 2021-12-07 | International Business Machines Corporation | Context-based topic recognition using natural language processing |
CN110704595B (en) * | 2019-09-27 | 2022-08-23 | 百度在线网络技术(北京)有限公司 | Dialogue processing method and device, electronic equipment and readable storage medium |
CN110880316A (en) * | 2019-10-16 | 2020-03-13 | 苏宁云计算有限公司 | Audio output method and system |
CN110827807B (en) * | 2019-11-29 | 2022-03-25 | 恒信东方文化股份有限公司 | Voice recognition method and system |
CN112988985A (en) * | 2019-12-02 | 2021-06-18 | 浙江思考者科技有限公司 | One-key addition and use of dialects in AI intelligent voice interaction |
CN111326160A (en) * | 2020-03-11 | 2020-06-23 | 南京奥拓电子科技有限公司 | A speech recognition method, system and storage medium for correcting noise text |
CN112133284B (en) * | 2020-04-23 | 2023-07-07 | 中国医学科学院北京协和医院 | A medical voice dialogue method and device |
CN111754977A (en) * | 2020-06-16 | 2020-10-09 | 普强信息技术(北京)有限公司 | Voice real-time synthesis system based on Internet |
CN112115722A (en) * | 2020-09-10 | 2020-12-22 | 文化传信科技(澳门)有限公司 | A human brain-like Chinese parsing method and intelligent interactive system |
CN112231451B (en) * | 2020-10-12 | 2023-09-29 | 中国平安人寿保险股份有限公司 | Reference word recovery method and device, conversation robot and storage medium |
CN112100338B (en) * | 2020-11-02 | 2022-02-25 | 北京淇瑀信息科技有限公司 | Dialog theme extension method, device and system for intelligent robot |
US11907678B2 (en) | 2020-11-10 | 2024-02-20 | International Business Machines Corporation | Context-aware machine language identification |
CN113327612A (en) * | 2021-05-27 | 2021-08-31 | 广州广电运通智能科技有限公司 | Voice response optimization method, system, device and medium based on intelligent comment |
CN114218452A (en) * | 2021-10-29 | 2022-03-22 | 赢火虫信息科技(上海)有限公司 | Lawyer recommending method and device based on public information and electronic equipment |
CN114386424B (en) * | 2022-03-24 | 2022-06-10 | 上海帜讯信息技术股份有限公司 | Industry professional text automatic annotation method, device, terminal and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1516112A (en) * | 1995-03-01 | 2004-07-28 | Seiko Epson Corporation | Voice Recognition Dialogue Device |
JP2005025602A (en) * | 2003-07-04 | 2005-01-27 | Matsushita Electric Ind Co Ltd | Sentence / language generation apparatus and selection method thereof |
US20050256717A1 (en) * | 2004-05-11 | 2005-11-17 | Fujitsu Limited | Dialog system, dialog system execution method, and computer memory product |
US20060173686A1 (en) * | 2005-02-01 | 2006-08-03 | Samsung Electronics Co., Ltd. | Apparatus, method, and medium for generating grammar network for use in speech recognition and dialogue speech recognition |
JP2006208905A (en) * | 2005-01-31 | 2006-08-10 | Nissan Motor Co Ltd | Voice dialog device and voice dialog method |
CN101075435A (en) * | 2007-04-19 | 2007-11-21 | 深圳先进技术研究院 | Intelligent chatting system and its realizing method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6499013B1 (en) * | 1998-09-09 | 2002-12-24 | One Voice Technologies, Inc. | Interactive user interface using speech recognition and natural language processing |
- 2007
  - 2007-04-19 CN CN2007100741121A patent/CN101075435B/en active Active
- 2008
  - 2008-04-15 WO PCT/CN2008/000764 patent/WO2008128423A1/en active Application Filing
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108153875A (en) * | 2017-12-26 | 2018-06-12 | 广州蓝豹智能科技有限公司 | Language material processing method, device, intelligent sound box and storage medium |
CN108153875B (en) * | 2017-12-26 | 2022-03-11 | 北京金山安全软件有限公司 | Corpus processing method and device, intelligent sound box and storage medium |
CN107968896A (en) * | 2018-01-08 | 2018-04-27 | 杭州声讯网络科技有限公司 | Unattended communication on telephone system and communication method |
CN109597986A (en) * | 2018-10-16 | 2019-04-09 | 深圳壹账通智能科技有限公司 | Localization method, device, equipment and the storage medium of abnormal problem |
CN112912954A (en) * | 2018-10-31 | 2021-06-04 | 三星电子株式会社 | Electronic device and control method thereof |
US11893982B2 (en) | 2018-10-31 | 2024-02-06 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method therefor |
CN112912954B (en) * | 2018-10-31 | 2024-05-24 | 三星电子株式会社 | Electronic device and control method thereof |
CN109584882A (en) * | 2018-11-30 | 2019-04-05 | 南京天溯自动化控制系统有限公司 | A kind of optimization method and system of the speech-to-text for special scenes |
CN109829052A (en) * | 2019-02-19 | 2019-05-31 | 田中瑶 | A kind of open dialogue method and system based on human-computer interaction |
CN110347996B (en) * | 2019-07-15 | 2023-06-20 | 北京百度网讯科技有限公司 | Text modification method and device, electronic equipment and storage medium |
CN110347996A (en) * | 2019-07-15 | 2019-10-18 | 北京百度网讯科技有限公司 | Amending method, device, electronic equipment and the storage medium of text |
CN110516043A (en) * | 2019-08-30 | 2019-11-29 | 苏州思必驰信息科技有限公司 | Answer generation method and device for question answering system |
CN110516043B (en) * | 2019-08-30 | 2022-09-20 | 思必驰科技股份有限公司 | Answer generation method and device for question-answering system |
CN111125124A (en) * | 2019-11-18 | 2020-05-08 | 云知声智能科技股份有限公司 | Corpus labeling method and apparatus based on big data platform |
CN111125124B (en) * | 2019-11-18 | 2023-04-25 | 云知声智能科技股份有限公司 | Corpus labeling method and device based on big data platform |
CN111259649A (en) * | 2020-01-19 | 2020-06-09 | 深圳壹账通智能科技有限公司 | Interactive data classification method and device of information interaction platform and storage medium |
CN111325034A (en) * | 2020-02-12 | 2020-06-23 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for semantic completion in multi-round conversation |
CN111563029A (en) * | 2020-03-13 | 2020-08-21 | 深圳市奥拓电子股份有限公司 | Testing method, system, storage medium and computer equipment for conversation robot |
CN111666381A (en) * | 2020-06-17 | 2020-09-15 | 中国电子科技集团公司第二十八研究所 | Task type question-answer interaction system oriented to intelligent control |
CN111666381B (en) * | 2020-06-17 | 2022-11-18 | 中国电子科技集团公司第二十八研究所 | Task type question-answer interaction system oriented to intelligent control |
CN111783439B (en) * | 2020-06-28 | 2022-10-04 | 平安普惠企业管理有限公司 | Man-machine interaction dialogue processing method and device, computer equipment and storage medium |
CN111783439A (en) * | 2020-06-28 | 2020-10-16 | 平安普惠企业管理有限公司 | Man-machine interaction dialogue processing method and device, computer equipment and storage medium |
CN111968680A (en) * | 2020-08-14 | 2020-11-20 | 北京小米松果电子有限公司 | Voice processing method, device and storage medium |
CN113641778A (en) * | 2020-10-30 | 2021-11-12 | 浙江华云信息科技有限公司 | A Topic Recognition Method for Dialogue Texts |
CN112562678A (en) * | 2020-11-26 | 2021-03-26 | 携程计算机技术(上海)有限公司 | Intelligent dialogue method, system, equipment and storage medium based on customer service recording |
CN112463108A (en) * | 2020-12-14 | 2021-03-09 | 美的集团股份有限公司 | Voice interaction processing method and device, electronic equipment and storage medium |
CN112463108B (en) * | 2020-12-14 | 2023-03-31 | 美的集团股份有限公司 | Voice interaction processing method and device, electronic equipment and storage medium |
CN112559691B (en) * | 2020-12-22 | 2023-11-14 | 珠海格力电器股份有限公司 | Semantic similarity determining method and device and electronic equipment |
CN112559691A (en) * | 2020-12-22 | 2021-03-26 | 珠海格力电器股份有限公司 | Semantic similarity determination method and device and electronic equipment |
CN113555018A (en) * | 2021-07-20 | 2021-10-26 | 海信视像科技股份有限公司 | Voice interaction method and device |
CN113555018B (en) * | 2021-07-20 | 2024-05-28 | 海信视像科技股份有限公司 | Voice interaction method and device |
CN113535921A (en) * | 2021-07-21 | 2021-10-22 | 携程旅游网络技术(上海)有限公司 | Speech output method, system, electronic device and storage medium for customer service |
CN113869066A (en) * | 2021-10-15 | 2021-12-31 | 中通服创立信息科技有限责任公司 | Semantic understanding method and system based on agricultural field text |
CN117874847A (en) * | 2023-12-19 | 2024-04-12 | 浙江大学 | A human-machine collaborative concept design generation method and system based on FBS theory |
CN118447845A (en) * | 2024-05-31 | 2024-08-06 | 南京龙垣信息科技有限公司 | Intelligent customer service dialogue system and equipment |
CN119599029A (en) * | 2024-11-14 | 2025-03-11 | 广东数业智能科技有限公司 | Psychological accompanying dialogue method based on multi-agent cooperation and storage medium |
CN119940345A (en) * | 2025-04-03 | 2025-05-06 | 湖南科技大学 | Intelligent scene dialogue analysis method and system based on model identification |
Also Published As
Publication number | Publication date |
---|---|
CN101075435B (en) | 2011-05-18 |
CN101075435A (en) | 2007-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2008128423A1 (en) | An intelligent dialog system and a method for realization thereof | |
KR102803154B1 (en) | Tailoring an interactive dialog application based on creator provided content | |
JP6629375B2 (en) | Method and system for estimating user intention in search input of conversational interaction system | |
JP6678764B1 (en) | Facilitating end-to-end communication with automated assistants in multiple languages | |
EP3183728B1 (en) | Orphaned utterance detection system and method | |
JP5166661B2 (en) | Method and apparatus for executing a plan based dialog | |
US9529787B2 (en) | Concept search and semantic annotation for mobile messaging | |
EP3640938B1 (en) | Incremental speech input interface with real time feedback | |
Fang et al. | Sounding Board – University of Washington's Alexa Prize submission | |
KR101677859B1 (en) | Method for generating system response using knowledgy base and apparatus for performing the method | |
KR20090000442A (en) | Universal conversation service device and method | |
JP2001357053A (en) | Dialogue device | |
WO2025071899A1 (en) | Natural language generation | |
CN112883350B (en) | Data processing method, device, electronic equipment and storage medium | |
Hung et al. | Context‐Centric Speech‐Based Human–Computer Interaction | |
Wang et al. | Understanding differences between human language processing and natural language processing by the synchronized model | |
US12271360B1 (en) | Dynamic indexing in key-value stores | |
US20240428787A1 (en) | Generating model output using a knowledge graph | |
Verma et al. | Deep analysis of chatbots: Features, methodology, and comparison | |
JP2003345823A (en) | Access system and access control method | |
JP2004086246A (en) | Conversation control system, conversation control method, program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08733962; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 08733962; Country of ref document: EP; Kind code of ref document: A1 |