WO2008128423A1 - An intelligent dialog system and a method for realization thereof - Google Patents
An intelligent dialog system and a method for realization thereof Download PDFInfo
- Publication number
- WO2008128423A1 (PCT application PCT/CN2008/000764)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
Definitions
- The invention relates to the field of human-machine voice interaction, and in particular to an intelligent chat system and a method for realizing it, applicable to home service robots, entertainment robots, and other voice dialogue applications.
- Background Art
- A voice chat system is of great significance to individuals and society.
- Existing products mainly use a voice recognition chip to perform waveform matching, establishing a mapping to voice answers recorded in advance in order to reply to an input sentence. The number of conversations such products support is therefore limited, conversations and understanding cannot be added dynamically, and natural interaction with people cannot truly be achieved.
- Chat agents also exist on some instant-messaging tools. The main technique is to construct a virtual agent attached to the Internet through a chat tool such as MSN or QQ, answering questions and chatting by means of information retrieval and database queries. Such agents communicate only through written text and are completely dependent on the Internet or a communication network; they cannot converse with people in spoken natural language, lack the experience and fun of a real spoken dialogue with a machine, and cannot meet the various social needs described above.
- Prior-art voice chat also includes automatic speech recognition, spoken-text comprehension, and speech synthesis steps; the overall effect is better when the recognition accuracy is high and the synthesis quality is good. Spoken-text understanding is generally attempted through semantic analysis and can be implemented using a semantic framework or an ontology representation.
- Semantic analysis derives a formal representation of the meaning of an input sentence from its syntactic structure and the meaning of each content word in it.
- The semantic framework is the carrier of semantic analysis; some systems use an ontology to represent or organize the semantic frames.
- The main difficulty of the semantic-framework approach is how to express semantics: because the semantic expression of a framework is empirical, it is hard to establish a unified standard, and the number of frames required is massive, making the framework difficult to build.
- An intelligent chat system comprising a text comprehension answering module for obtaining output text according to input text; the text comprehension answering module comprises a word segmentation unit, an XML-based mapping corpus, a mapping unit, an XML-based dialog corpus, and a searching unit; the word segmentation unit is configured to perform part-of-speech tagging on the input text to obtain a word set with part-of-speech tags; the mapping corpus is used to establish and store a mapping relationship between keywords and concept sentences; the mapping unit searches the mapping corpus according to the word set and maps it to a concept sentence; the dialog corpus is used to establish and store a mapping relationship between concept sentences and output text; the searching unit is configured to search the dialog corpus according to the concept sentence and map it to the output text.
- the smart chat system also includes a voice recognition module for converting input speech into input text.
- the intelligent chat system wherein it further comprises a speech synthesis module for converting the output text into an output speech.
- the intelligent chat system wherein the mapping corpus and the conversation corpus are set in the same corpus.
- The smart chat system further comprises a pre-processing unit configured to replace word-set information, add a dialog flag, or set a dialog flag bit for the word set from the word segmentation unit, to obtain the word set used by the mapping unit.
- The smart chat system further includes a post-processing unit configured to perform the following processing on the output text from the search unit: adding or storing history information, setting a conversation topic, and adding relevant information obtained by searching, to obtain the output text delivered to the speech synthesis module.
- A realization method for an intelligent chat system comprising a text comprehension answering module for obtaining output text according to input text, comprising the steps of: A1, establishing an XML-based mapping corpus and a dialog corpus, the mapping corpus establishing and storing a mapping relationship between keywords and concept sentences, the dialog corpus establishing and storing a mapping relationship between concept sentences and output text; A2, performing part-of-speech tagging on the input text to obtain a word set with part-of-speech tags; A3, performing a matching calculation between the word set and the keyword sets of the mapping corpus to obtain a concept sentence; A4, searching the dialog corpus according to the concept sentence to generate the output text.
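Steps A1 through A4 can be sketched as a minimal pipeline. All corpus entries and names below are illustrative stand-ins; the patent's corpora are XML-based and far larger, and its A2 step performs real part-of-speech tagging rather than simple word splitting.

```python
# A1: toy mapping corpus (keyword set -> concept sentence) and dialog
# corpus (concept sentence -> output text). Contents are hypothetical.
MAPPING_CORPUS = {
    frozenset(["name", "what"]): "What is your name",
    frozenset(["weather", "today"]): "How is the weather today",
}
DIALOG_CORPUS = {
    "What is your name": "My name is Robot.",
    "How is the weather today": "It is sunny today.",
}

def segment(text):
    # A2: stand-in for part-of-speech tagged word segmentation.
    return set(text.lower().strip("?").split())

def map_to_concept(words):
    # A3: match the word set against each keyword set; best overlap wins.
    best, best_score = None, 0
    for keys, concept in MAPPING_CORPUS.items():
        score = len(words & keys)
        if score > best_score:
            best, best_score = concept, score
    return best

def answer(text):
    # A4: look the mapped concept sentence up in the dialog corpus.
    concept = map_to_concept(segment(text))
    return DIALOG_CORPUS.get(concept, "Sorry, I did not understand.")
```

The fallback string plays the role of the default answer library described later.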
- the method further includes the step of: converting the input voice into the input text.
- the implementation method further includes the step A5: converting the output text into an output voice.
- After step A4, the implementation method further includes a post-processing step for increasing answer accuracy: adding or storing history information, setting a conversation topic, and adding related search information.
- The method further includes the steps of: B1, determining whether the input text satisfies one of the following conditions: a demonstrative pronoun occurs, the topic does not change, or common-sense knowledge needs to be added; if so, perform the corresponding pre-processing step: replace word-set information, add a dialog flag, or set a dialog flag bit; otherwise go to step A3. B2, determine whether the pre-processing is complete; if so, return a success flag and perform step A4; otherwise return a failure flag and go to step A3.
- Step A1 further includes setting a weight value for each part of speech of the mapping corpus, the weight values being obtained by orthogonal optimization or by two rounds of orthogonal optimization.
- the implementation method further includes a step A6, the user evaluates the output voice, and the text understanding answering module adjusts the weight value according to the evaluation.
- the implementation method further includes the step of storing personal information for the user, and storing the weight value in the personal information of the user; when the user logs in, reading the weight value and correspondingly adjusting the mapping corpus.
- The present invention establishes a corpus with part-of-speech weight optimization and learning functions, maps and categorizes semantics, and establishes answers for the mapped semantics. It can thereby communicate with people in natural language with higher accuracy, provides language communication and voice reminders, and realizes real spoken dialogue between person and machine, so that the user gets a genuine language experience and enjoyment.
- FIG. 1 is a general framework diagram of the chat system of the present invention;
- FIG. 2 is a flow chart of the spoken-text answering process of the present invention;
- FIG. 3 is a schematic diagram of the spoken-text understanding and answering module of the present invention;
- FIG. 4 is a schematic diagram of the mapping description format of the mapping corpus of the present invention;
- FIG. 5 is a schematic diagram of the direct-answer format for a concept sentence in the dialog corpus of the present invention;
- FIG. 6 is a schematic diagram of the format of a reply that uses history information in the dialog corpus of the present invention;
- FIG. 7 is a schematic diagram of the format of the default answer library of the dialog corpus of the present invention;
- FIG. 8 is a flow chart of the method of the present invention;
- FIG. 9 is a schematic diagram of the optimization method for part-of-speech weights of the present invention;
- FIG. 10 is a flow chart of the online learning of part-of-speech weights of the present invention.
Detailed Description
- The object of the present invention is to construct an intelligent chat system, or a robot, based on text interaction that meets people's needs. Preferred embodiments of the present invention are described in detail below.
- the present invention provides a voice chat system.
- The present invention can employ a basic framework of three modules. An automatic speech recognition module (speech to text, ASR/STT) converts the user's natural speech into input text. A spoken-text comprehension answering module (text to text, TTT), i.e. the text comprehension answering module that obtains output text according to input text, performs spoken-language understanding on the text and generates an answer text, drawing on the various required corpora and the system's chat history.
- A speech synthesis module (text to speech, TTS) converts the output text into output speech, so that the answer text reaches the user as voice. If natural-language voice interaction is not required and only text interaction is considered, the system can include only the text understanding answering module.
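The three-module framework can be sketched as follows. The ASR and TTS stages here are pure stubs (the patent uses off-the-shelf modules for both), and the TTT stage is reduced to a trivial rule; only the module boundaries are meant to mirror the description.

```python
class SpeechRecognizer:             # ASR/STT: speech -> input text
    def recognize(self, audio):
        return audio["transcript"]  # stub: pretend decoding succeeded

class TextUnderstanding:            # TTT: input text -> output text
    def reply(self, text):
        return "Hello!" if "hello" in text.lower() else "Tell me more."

class SpeechSynthesizer:            # TTS: output text -> output speech
    def synthesize(self, text):
        return {"waveform": f"<audio:{text}>"}  # stub waveform token

class ChatSystem:
    # Chain the three modules exactly as in the general framework.
    def __init__(self):
        self.asr = SpeechRecognizer()
        self.ttt = TextUnderstanding()
        self.tts = SpeechSynthesizer()

    def chat(self, audio):
        text = self.asr.recognize(audio)
        reply = self.ttt.reply(text)
        return self.tts.synthesize(reply)
```

Dropping the `asr` and `tts` members and calling `ttt.reply` directly gives the text-only variant mentioned above.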
- The automatic speech recognition module and the speech synthesis module can use modules available on the market, including the corresponding software on embedded platforms; the main requirements are high recognition accuracy and the best possible synthesis quality.
- The understanding method used in this patent maps and classifies semantics while establishing answers for the mapped semantics.
- The implementation is simple, but it faces a huge semantic space and a huge number of categories.
- The spoken speech signal produced by the person becomes the corresponding text via the automatic speech recognition module; the spoken-language understanding answering module processes the input text and gives a text answer according to the dialog corpus and the conversation context; finally, the speech synthesis module converts the answer text into a sound signal with which the user interacts.
- The spoken-comprehension answering module thus processes input text and produces a textual response based on the dialog corpus and the conversation context, excluding the input or output of sound.
- The voice chat system takes the user's speech as system input: the voice signal, captured for example through a microphone, is transmitted to the voice recognition module 1, converted into text, and passed to the spoken-text understanding answering module 2.
- That module executes the whole process of FIG. 2, uses the corresponding databases, and returns the corresponding answer-sentence text.
- The answer text then enters the speech synthesis module 3, which converts the text into speech so that the user hears the feedback through a speaker.
- The invention can be applied not only to voice chat but also to information inquiry systems, automatic tour-guide systems, automatic introduction systems, language learning systems, and other occasions where information must be output; this reduces labor costs while improving the accuracy and management of the information.
- The textual spoken-language comprehension and answering of the intelligent chat system of the present invention proceeds by Chinese part-of-speech tagging; the resulting keyword set and the spoken-text understanding corpus are then mapped to a concept sentence, and an answer to the concept sentence is given according to the dialog corpus, the history information, and an information database or network, as shown in FIG. 3.
- The main process is: the input text passes through the part-of-speech tagger 4 of the word segmentation unit, which performs part-of-speech tagging on the input text to obtain a word set with part-of-speech tags; the mapping unit, i.e. mapping module 5, searches the mapping corpus 7 according to the word set and maps it to a concept sentence; the search unit, i.e. search module 6, then searches the dialog corpus 8 according to the concept sentence and maps it to the output text.
- The mapping corpus 7, i.e. database 7, describes the mapping from keyword sets to concept sentences.
- The specific description format can be as shown in FIG. 4, which defines 14 Chinese parts of speech and gives the concept sentence corresponding to each keyword set.
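A FIG. 4-style mapping entry might look like the snippet below. The actual XML schema is not reproduced in the text, so the tag and attribute names (`mappings`, `map`, `keys`, `pattern`) and the `word/pos` key notation are hypothetical; only the idea of an XML-described mapping from keyword sets to concept sentences comes from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML mapping corpus: each <map> entry pairs a keyword set
# (word/part-of-speech pairs in "keys") with a concept sentence ("pattern").
MAPPING_XML = """
<mappings>
  <map keys="name/n what/r" pattern="What is your name"/>
  <map keys="weather/n today/t" pattern="How is the weather today"/>
</mappings>
"""

def load_mapping(xml_text):
    # Parse the document into {frozenset(keywords): concept_sentence}.
    corpus = {}
    for node in ET.fromstring(xml_text).findall("map"):
        keys = frozenset(k.split("/")[0] for k in node.get("keys").split())
        corpus[keys] = node.get("pattern")
    return corpus
```

Storing the corpus as XML keeps it easy to edit by hand or to extend programmatically, which matches the dynamic-modification property claimed later in the description.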
- The dialog corpus 8, i.e. database 8, mainly records answers to concept sentences.
- FIG. 5 shows the specific format description of a direct answer to a concept sentence, which involves no environmental or historical information.
- FIG. 6 describes and records answer sentences given on the basis of historical information, environmental information, and the current concept sentence.
- FIG. 7 is the default answer library; when needed, the program selects the output text from the default answer library.
- Under good conditions the speech recognition module can produce "what is your name", and part-of-speech tagging then yields a segmentation and tagging result: "you (pronoun) of (auxiliary) name (noun) is (verb) what (pronoun)". The mapping process scores this tagging result against the concept corpus and keeps the three highest-scoring concept sentences, for example, from high score to low, "What is your name", "What is the name", and "Do you know the name". The first obviously expresses the intended meaning and has the highest score, so it becomes the concept sentence obtained by mapping; the dialog corpus is then searched according to this concept sentence to obtain an answer. For some utterances, such as "like", the system needs to know the conversational context: by matching against information from the previous turn it can determine how to answer, for example after "What movie do you like?"
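The weighted scoring behind this example can be sketched as follows. The weight values, the English tagging of the sentence, and the candidate keyword sets are all illustrative assumptions, not the patent's optimized values; only the scheme (per-part-of-speech weights summed over matched keywords, top three candidates kept) follows the description.

```python
# Illustrative part-of-speech weights: nouns and verbs dominate.
WEIGHTS = {"noun": 3, "verb": 2, "pronoun": 1, "auxiliary": 0}

# Hypothetical tagging of "what is your name" and three candidate concepts.
TAGGED = [("you", "pronoun"), ("name", "noun"),
          ("is", "verb"), ("what", "pronoun")]
CORPUS = {
    "What is your name": {("name", "noun"), ("what", "pronoun"),
                          ("is", "verb")},
    "What is the name": {("name", "noun"), ("what", "pronoun")},
    "Do you know the name": {("name", "noun"), ("you", "pronoun"),
                             ("know", "verb")},
}

def score(tagged_input, keyword_set):
    # Sum the weights of the input words that appear in the keyword set.
    return sum(WEIGHTS.get(pos, 0) for w, pos in tagged_input
               if (w, pos) in keyword_set)

def top_concepts(tagged_input, corpus, n=3):
    # Rank candidate concept sentences by score and keep the best n.
    ranked = sorted(corpus.items(),
                    key=lambda kv: score(tagged_input, kv[1]), reverse=True)
    return [c for c, _ in ranked[:n]]
```

With these weights, "What is your name" scores 3 + 2 + 1 = 6 and wins, mirroring the ranking given in the text.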
- The smart chat system may further include a pre-processing unit configured to replace word-set information, add a dialog flag, or set a dialog flag bit for the word set coming from the word segmentation unit.
- The smart chat system may further include a post-processing unit that performs the following processing on the output text from the search unit: adding or storing history information, setting a conversation topic, and adding relevant information obtained by searching, yielding the output text delivered to the speech synthesis module.
- This increases the accuracy of the information, makes the user's input easier to understand, and allows the system to produce information that is more accurate and easier for the user to understand.
- The present invention also provides a realization method for an intelligent chat system, as shown in FIG. 8, for a smart chat system comprising a text comprehension answering module that produces output text according to input text, comprising the following steps:
- Step A1 may further include setting a weight value for each part of speech of the mapping corpus, the weight values being obtainable by orthogonal optimization or by two rounds of orthogonal optimization. The specific orthogonal optimization methods are described in detail later.
- The method may further include converting the input voice into input text, that is, collecting external voice information and converting it into text. If natural-language voice interaction is not required and only text interaction is considered, this step can be omitted.
- Step A3: perform a matching calculation between the word set and the keyword sets of the mapping corpus to obtain a concept sentence.
- The method may further include the steps of: B1, determining whether the input text satisfies one of the following conditions: a demonstrative pronoun occurs, the topic does not change, or common-sense knowledge needs to be added; if so, perform the corresponding pre-processing step (replace word-set information, add a dialog flag, or set a dialog flag bit), otherwise go to step A3; B2, determine whether the pre-processing is complete; if so, return a success flag and execute step A4, otherwise return a failure flag and go to step A3.
- Word-set information must be replaced when the current user input contains a demonstrative pronoun, for example the input: "Is that city beautiful?"
- The chat history or the information stored in the database can then be queried: if the stored city is Shenzhen, "that city" is replaced accordingly, giving "Is Shenzhen beautiful?" for subsequent processing.
- The dialog flag mainly indicates whether the conversation topic has changed. When a new topic appears, the topic of the conversation is modified: for example, when the user is at first talking about the weather but suddenly switches to cars, the conversation topic must be modified, the dialog flag added or set, and the history information invalidated or changed. Setting the dialog flag is similar to adding one: the dialog flag is added when a topic first appears and set when the topic changes.
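The two pre-processing cases just described (demonstrative-pronoun replacement and topic change) can be sketched as below. History handling is reduced to a single remembered entity, the pronoun pattern is hard-coded, and `topic_of` is a hypothetical helper that labels a sentence with its topic; a real system would generalize all three.

```python
def preprocess(text, history, current_topic, topic_of):
    # Case 1: demonstrative pronoun -> replace with the stored referent,
    # e.g. "Is that city beautiful?" -> "Is Shenzhen beautiful?".
    if "that city" in text and "city" in history:
        return text.replace("that city", history["city"]), current_topic
    # Case 2: topic changed (e.g. weather -> cars) -> set the dialog
    # flag by switching to the new topic, invalidating old history.
    new_topic = topic_of(text)
    if new_topic != current_topic:
        return text, new_topic
    # Otherwise the input passes through unchanged to step A3.
    return text, current_topic
```

The returned pair plays the role of the success/failure flag in steps B1/B2: the caller can compare the returned text and topic with the originals to decide whether pre-processing changed anything.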
- Step A4: search the dialog corpus according to the concept sentence to generate the output text.
- After step A4, post-processing steps can also be included: adding or storing history information, setting the conversation topic, and adding relevant information obtained by searching.
- The history information contains the sentences already exchanged with the user, as well as other important information such as the user's name, age, and hobbies.
- The conversation topic refers to the subject currently under discussion, such as weather, stocks, news, culture, or sports; it is an effective cue for the robot when searching for answer information.
- The relevant search information means that, according to the conversation topic, the user's needs can be satisfied by searching a database or the network: for example, when talking about the weather, the weather of the corresponding city or region, or its change over time, can be retrieved from the time and place the user gives, and these search results allow the required answer to be produced.
- This increases the answer accuracy, making the output text more precise.
- Step A5 may also be included: converting the output text into output speech. If natural-language voice interaction is not required and only text interaction is considered, this step can be omitted.
- Step A6 may also be included: the user evaluates the output voice, and the text understanding answering module adjusts the weight values according to the evaluation.
- A personal information file may also be established for each user, that is, a step of storing personal information for the user and storing the weight values in it; when the user logs in, the weight values are read and the mapping corpus is adjusted accordingly.
- the evaluation is subjective.
- The user can give three levels of evaluation, for example good, okay, and bad, or evaluations on some other scale; the present invention places no additional limit on this.
- The confirmation can also be given by voice; the system then adjusts the part-of-speech weight values of the mapping corpus according to the result.
- The present invention also provides a method of spoken-language understanding. Because of differences in the quietness of the user's environment and in the characteristics of the speech recognition software used, as well as repetitions, omissions, pauses, ill-formed sentences, and the many rich ways of expressing the same semantics, the output of automatic speech recognition is uncertain and diverse. It is therefore difficult to parse and express semantics with the rules commonly used in natural-language understanding. In fact, when human beings chat in a noisy environment they sometimes cannot hear every word the other party says, yet if they catch the key words they can, from the partial context, recover the meaning the other party intends to express. The same idea is used here: the mapping from keywords to concept sentences yields the speaker's semantics, and the concept sentences are represented directly by the corresponding natural sentences.
- FIG. 2 is a flow chart of the answering process for spoken text.
- The word segmentation module 9 produces the word set with part-of-speech tags.
- Chinese word segmentation has been studied extensively and achieves a high accuracy rate, so it is not described further here.
- Some input sentences contain demonstrative pronouns, continue a conversation on the same topic, or require common-sense knowledge; these need pre-processing. Pre-processing module 10 replaces or adds the necessary information and sets the dialog flag bit, and the system reports the result of the pre-processing by directly returning a flag.
- If pre-processing fully handles the input, control passes directly to the post-processing module 14, which gives the final output text; if further processing is needed after pre-processing, the matching and sorting module 11 is entered, which works against the corpus shown in FIG. 4.
- There, the part-of-speech tag set of the input is matched against the candidate part-of-speech sets described by the keys attribute in the corpus.
- Different parts of speech have different weights, and each candidate concept sentence in the corpus receives a score. For example, in "What is your name", the word that best expresses the semantics is the noun "name"; the other words matter less, so during matching the most important parts of speech should be matched first, and the degree to which they match directly affects the accuracy of the concept sentence.
- The matching and sorting module finally forms a set of the three highest-scoring patterns. Because of the inherent shortcomings of speech recognition and the influence of the environment, the recognized text may not be a complete sentence at all, or may even be confused text. In that case the word segmentation result is poor and the scores of all mapped concept sentences are zero; the chat system is then considered not to have heard the speaker, and the set of concept sentences is set to empty.
- If the set is empty, control passes directly to the default corpus shown in FIG. 7. If the set is not empty, the highest-scoring sentence is compared with the first threshold 12: when the score is less than the threshold, control likewise passes to the default corpus of FIG. 7;
- when the score is not less than the threshold, the mapped concept sentence is obtained successfully, and its corresponding pattern is used as the concept sentence.
- To choose the first threshold, a typical test set of 100 sentences can be selected, the matching results scored, and the threshold giving the highest score adopted.
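The fallback logic above can be sketched as follows. The threshold value and the default answer are illustrative placeholders; the patent tunes the threshold on the 100-sentence test set just mentioned.

```python
FIRST_THRESHOLD = 4  # illustrative; chosen by scoring a test set
DEFAULT_ANSWERS = ["Sorry, could you say that again?"]

def select_concept(scored_candidates):
    # scored_candidates: [(concept_sentence, score), ...] -- the top three
    # from the matching and sorting module. Returns (concept, default):
    # exactly one of the two is None.
    if not scored_candidates:
        # Empty set: the system "did not hear" the speaker.
        return None, DEFAULT_ANSWERS[0]
    concept, best = max(scored_candidates, key=lambda cs: cs[1])
    if best < FIRST_THRESHOLD:
        # Best score below the first threshold: use the default library.
        return None, DEFAULT_ANSWERS[0]
    return concept, None
```

The caller proceeds to the dialog-corpus search only when a concept sentence is returned; otherwise it emits the default answer directly.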
- After the concept sentence is obtained, the search module 13 tries to give the response text according to the history information and the corpus shown in FIG. 6. This is a search process using the current concept sentence and the system's previous answer sentence as inputs; because both inputs need not be satisfied simultaneously, the search result may be empty. If an answer text is found, the search is regarded as successful and the answer is sent directly to the post-processing module 14; if the search result is empty, it is regarded as a failure, the corpus shown in FIG. 5 is consulted for the answer, and the final output likewise enters the post-processing module 14.
- The output sentence undergoes the corresponding processing in the post-processing module 14: history information is added or stored, the state of the conversation topic is set, and the related search information is queried; the answer text is finally formed and returned to the speech synthesis module.
- The final answer text can thus be generated jointly from the answers to the concept sentences, the searched information, and the history information.
- The invention also provides a structure and a storage method for the dialog corpus.
- An XML-based storage-structure description language is designed to describe these unstructured data structures; XML documents describe the corpora, and a relational database stores the data.
- The mapping corpus, the dialog corpus, and the history information are all described and stored using XML, with the attribute nodes needed to describe the corpora defined accordingly.
- The database stores the collections of parts of speech, the concept sentences, the answer sentences, and the history information. This makes the corpora easy to organize and manage, and their contents can be modified dynamically.
- The various corpora can be modified and extended manually, and data can also be added and modified directly through voice interaction, with specific data stored automatically.
- The present invention also provides a process and method for learning knowledge by voice.
- The chat system can accumulate knowledge told to it by the interlocutor in a natural, interactive manner; through mutual questioning it determines whether it has acquired the knowledge the user gave, and it provides corresponding natural-language feedback.
- The invention also provides a method for recording and using chat context information.
- During interaction with a person, the system automatically stores certain information in the context record, keeps important information and conversation content, adds the corresponding information during the dialogue, and dynamically organizes the response sentence according to this information.
- The invention also provides an optimization of part-of-speech weights and an online learning method.
- In mapping keywords to concept sentences, each keyword carries a different weight depending on its part of speech.
- An optimization method is used to obtain near-optimal weight values for the individual parts of speech, and the weight values can be modified dynamically through online learning.
- Mapping keywords to the corresponding concept sentences requires weighting the parts of speech of the different keywords: keywords of different parts of speech contribute differently to the sentence's semantics, and usually the nouns and verbs of a sentence carry higher weight and matter most for understanding its meaning. Natural language, however, has many parts of speech, and no fixed weight value exists for each; the part-of-speech weight optimization method and the online learning method are therefore proposed to maximize the accuracy of the keyword-to-concept-sentence mapping.
- Fourteen parts of speech are defined and, according to linguistic knowledge, divided into two groups of seven: a relatively more important group containing, for example, nouns, verbs, pronouns, adjectives, time words, name words, etc.
- The second group contains modal particles, orientation words, distinguishing words, auxiliary words, idioms, adverbs and numerals; a usable set of weights is obtained through two rounds of orthogonal optimization experiments.
- In the first round, the relatively more important seven parts of speech are used as factors with three levels, for example 3, 2 and 1, and a standard L18 orthogonal test table (seven three-level factors) is selected.
- The weights of the other seven parts of speech are set to 0.
- Every sentence in the test set is of a spoken type, and each part of speech should appear in it with roughly its natural probability.
- Each sentence in the test set is scored manually according to the rationality of the matched concept sentence, and this score is taken as the result of the trial; 18 rounds are tested in this way.
- A set of currently optimal weight values is thus obtained.
- In the second round, the relatively important seven parts of speech are fixed at the weight values found in the first set of trials.
- Orthogonal optimization is applied again, for example with the levels 2, 1 and 0, again using a standard L18 orthogonal test table; the remaining seven parts of speech are optimized with the same test set and scoring criteria as in the first round.
- The two sets of part-of-speech weights are then combined to obtain the weights of the 14 parts of speech used by the system.
- The database can also be trained by voice.
- The user speaks a test input into the mapping module 15 (the mapping module 5 shown in FIG. 2), and the mapping result is returned to the user in the form of voice.
- In the discriminating module 16, the user gives an evaluation of this feedback; the discriminating module 16 then adjusts the weights according to the algorithm in the weight-adjusting module 17 and sends the adjusted weights to the mapping module 15 for the next round of adjustment, until the user is satisfied with the match.
- The invention also provides a natural-language behavior-driving method.
- Commands can be given in natural spoken language: in the mapping from keywords to concept sentences, and from concept sentences to final answers and feedback, there are specific formats and action-driving scripts, so the system can be driven or commanded naturally in spoken language.
- With behavior driving, the system's behavior is no longer driven by pre-defined phrases or simple imperatives; instead, the system gives correct responses to natural command expressions and confirms and responds by voice, thereby realizing, for example, a reminder function for the user. This behavior-driven approach is better suited to people's daily habits, and new users can drive the system in natural language without much learning.
- The invention also provides an embedded implementation of the voice chat system.
- For this voice-chat design framework there are various implementation methods, such as using a voice recognition chip to perform recognition and map to a stored corpus, or using an embedded system to realize speech recognition, speech synthesis and language understanding much as an ordinary processor would.
- The embedded implementation is one of these. It requires automatic speech recognition, semantic understanding and speech synthesis under a specific embedded operating system, as well as their integration, and the implementation software may differ across platforms.
- This solution fully provides the inherent features of a voice chat system and is characterized by portability, low power consumption, compact size and low price.
- The present invention also provides a method of naturally querying and answering information by voice.
- Information is queried and fed back using natural speech, and answers are given in human language. This lets people request the information they need in natural language and ask, answer and confirm information interactively; the data can come from existing databases or from the Internet.
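The two-round part-of-speech weighting described in the bullets above can be sketched as follows. This is an illustrative stand-in, not the patent's procedure: the trial rows (a few rows in the spirit of an L18 three-level orthogonal table), the part-of-speech names, and the scoring function (which replaces the manual rationality scores of matched concept sentences) are all assumptions.

```python
# Sketch: pick part-of-speech weights by scoring rows of an orthogonal-style
# test table. Rows assign levels {3, 2, 1} to the seven "important" parts of
# speech; the best-scoring row is kept. All values here are illustrative.

IMPORTANT_POS = ["noun", "verb", "pronoun", "adjective",
                 "time word", "name word", "quantifier"]

# A few rows in the spirit of an L18 table (the full table has 18 rows).
TRIAL_ROWS = [
    (3, 3, 2, 2, 1, 1, 1),
    (3, 2, 3, 1, 2, 1, 2),
    (2, 3, 1, 3, 1, 2, 2),
]

def score_row(row):
    """Stand-in for the manual per-sentence rationality scores: here we
    simply favor high noun/verb weights, mimicking the stated intuition."""
    w = dict(zip(IMPORTANT_POS, row))
    return 2 * w["noun"] + 2 * w["verb"] + w["pronoun"]

best_row = max(TRIAL_ROWS, key=score_row)
best_weights = dict(zip(IMPORTANT_POS, best_row))
# In the first round the seven "unimportant" parts of speech are fixed at 0.
best_weights.update({pos: 0 for pos in
                     ["modal particle", "orientation word",
                      "distinguishing word", "auxiliary word",
                      "idiom", "adverb", "numeral"]})
print(best_weights["noun"], best_weights["adverb"])  # -> 3 0
```

The second optimization round would repeat the same loop over the remaining seven parts of speech with levels 2, 1 and 0, holding the first-round winners fixed.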
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
An intelligent dialog system includes a text-comprehending answering module (2) which is used to obtain an output text based on an input text. The module includes a word dividing unit, a mapping corpus (7), a mapping unit (5), a dialog corpus (8) and a searching unit (6). The word dividing unit is used to tag parts of speech for said input text (4) and obtain a word set with part-of-speech tagging. The mapping corpus (7) is used to set and store a mapping relationship between keywords and concept sentences. The mapping unit (5) is used to search said mapping corpus (7) based on said word set and map it to obtain a concept sentence. The dialog corpus (8) is used to set and store a mapping relationship between concept sentences and output texts. The searching unit (6) is used to search said dialog corpus (8) based on said concept sentence and obtain the output text.
Description
Intelligent chat system and implementation method thereof

Technical field
The invention relates to the field of human-machine voice interaction, and in particular to an intelligent chat system using natural language as its medium, applied to home service robots, entertainment robots and the field of voice dialogue, and to a method of realizing it.

Background art
With the aging of society and the acceleration of its rhythm, people lack face-to-face communication and interact more through telephone, mail and the Internet. As a result, some people may feel lonely, find it difficult to find a suitable person to chat with to relieve boredom, and have nowhere to confide their feelings. They hope for a channel through which they can express their emotions, dispel loneliness, or receive specific kinds of help.
Moreover, in the fast-paced, high-pressure environment of modern society, people want to be understood, to relieve their own stress and to confide in someone; there is a demand for an intelligent entity that can communicate in natural language and can listen, understand and answer. Especially for the elderly, in order to guard against dementia or memory loss, there is a great demand for a device capable of language communication and voice reminders. For certain users, it is necessary to interact in natural language to obtain the information they want.
In home intelligent service robots, people hope to operate and control part of the robot's functions with natural language, achieving harmony between human and robot and serving people better. A voice chat system is therefore of great significance to individuals and to society. There are many simple voice-dialogue toys on the market; their technology mainly uses a voice recognition chip to perform waveform matching and to establish a mapping to voice answers recorded in advance, thereby answering the input sentence. The number of conversations such products support is limited, they cannot dynamically add conversations or perform understanding, and they cannot truly achieve natural interaction with people.
There are also chat entities built on instant-messaging tools. Their main technique is to construct a virtual agent on top of a chat tool such as MSN or QQ, attached to the Internet, which answers questions and chats through information retrieval and database queries. They use text as the medium of communication and are completely dependent on the Internet or a communication network; such entities cannot use spoken natural language to communicate with people, lack the experience and fun of a real spoken dialogue with a machine, and cannot meet the social needs described above.
Prior-art voice chat also comprises automatic speech recognition, spoken-text understanding and speech synthesis steps; synthesis works well when recognition accuracy is high. Spoken-text understanding generally attempts to identify meaning through semantic analysis, which can be implemented with semantic frames or ontology representations. Semantic analysis derives, from the syntactic structure of the input sentence and the sense of each content word in it, some formal representation that reflects the meaning of the sentence; the semantic frame is the carrier of semantic analysis, and some systems use an ontology to represent or organize the frames. However, the main difficulty of the semantic-frame approach lies in how to express semantics: because the semantic expression of a frame is empirical, it is hard to establish a unified standard, and the number of frames required is massive, which makes building them difficult.
The prior art therefore has drawbacks and needs improvement.

Summary of the invention
It is an object of the present invention to provide an intelligent chat system and a method of realizing it, for use with home service robots, entertainment robots and in the field of voice dialogue.
The technical solution of the present invention is as follows:
An intelligent chat system comprises a text understanding and answering module for obtaining an output text from an input text. The text understanding and answering module comprises a word segmentation unit, an XML-based mapping corpus, a mapping unit, an XML-based dialogue corpus and a search unit. The word segmentation unit performs part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags; the mapping corpus establishes and stores the mapping from keywords to concept sentences; the mapping unit searches the mapping corpus according to the word set and maps it to a concept sentence; the dialogue corpus establishes and stores the mapping from concept sentences to output texts; and the search unit searches the dialogue corpus according to the concept sentence to obtain the output text.
The intelligent chat system may further comprise a speech recognition module for converting input speech into input text.
It may further comprise a speech synthesis module for converting the output text into output speech.
The mapping corpus and the dialogue corpus may be set up in the same corpus. The intelligent chat system may further comprise a pre-processing unit for taking the word set from the word segmentation unit and replacing word-set information, adding a conversation flag or setting a conversation flag bit, to obtain the word set used by the mapping unit.
The intelligent chat system may further comprise a post-processing unit for taking the output text from the search unit and performing the following processing: adding or storing history information, setting the conversation topic, and adding relevant information obtained by searching, to obtain the output text delivered to the speech synthesis module.
A method of realizing an intelligent chat system that comprises a text understanding and answering module for obtaining an output text from an input text, comprising the steps of: A1, establishing an XML-based mapping corpus and a dialogue corpus, the mapping corpus establishing and storing the mapping from keywords to concept sentences and the dialogue corpus establishing and storing the mapping from concept sentences to output texts; A2, performing part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags; A3, performing a matching calculation between this word set and the keyword sets of the mapping corpus to obtain a concept sentence; A4, searching the dialogue corpus according to the concept sentence to generate the output text.
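Steps A2-A4 can be sketched end to end as follows. This is a minimal illustration, not the patent's implementation: the toy lexicon-based tagger stands in for a real Chinese word segmenter, and the one-entry corpora stand in for the XML mapping and dialogue corpora; the concept key `ASK_NAME` and all weights are invented for the example.

```python
# Sketch of steps A2-A4: tag the input (A2), map the tagged word set to a
# concept sentence by weighted keyword matching (A3), then look up the
# answer in the dialogue corpus (A4). Corpora and weights are illustrative.

POS_WEIGHTS = {"noun": 3, "verb": 3, "pronoun": 2, "particle": 0}

LEXICON = {"you": "pronoun", "name": "noun", "is": "verb", "what": "pronoun"}

MAPPING_CORPUS = {"ASK_NAME": {"you", "name", "is"}}     # keywords -> concept
DIALOGUE_CORPUS = {"ASK_NAME": "My name is Robot."}      # concept -> answer

def tag(text):                                           # step A2 (toy tagger)
    return {(w, LEXICON.get(w, "particle")) for w in text.split()}

def map_concept(tagged):                                 # step A3
    def score(concept):
        keywords = MAPPING_CORPUS[concept]
        return sum(POS_WEIGHTS[p] for w, p in tagged if w in keywords)
    return max(MAPPING_CORPUS, key=score)

def answer(text):                                        # step A4
    return DIALOGUE_CORPUS[map_concept(tag(text))]

print(answer("what is you name"))  # -> My name is Robot.
```

With a populated corpus, `map_concept` would return the highest-scoring of many concept sentences, exactly as in the "top three candidates" example given later in the description.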
Before step A2, the method may further comprise the step of converting input speech into input text. The method may further comprise step A5: converting the output text into output speech.
After step A4, the method may further comprise post-processing steps for increasing answer accuracy: adding or storing history information, setting the conversation topic, and adding relevant searched information.
Before step A3, the method may further comprise the steps of: B1, judging whether the input text presents any of the following cases: a demonstrative pronoun appears, the topic has not changed, or common sense needs to be added; if so, correspondingly performing the pre-processing steps of replacing word-set information, adding a conversation flag or setting a conversation flag bit, otherwise executing step A3; B2, judging whether pre-processing is complete: if yes, returning a success flag and executing step A4, otherwise returning a failure flag and executing step A3.
The mapping corpus and the dialogue corpus may be set up in the same corpus. Step A1 may further comprise: setting weight values for the parts of speech of the mapping corpus, the weight values being obtained by orthogonal optimization or by two rounds of orthogonal optimization.
The method may further comprise step A6: the user evaluates the output speech, and the text understanding and answering module adjusts the weight values according to the evaluation.
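The evaluation-driven weight adjustment of step A6 might look like the following minimal loop body. The update rule, step size and part-of-speech names are assumptions for illustration; the patent does not specify the adjustment algorithm here.

```python
# Sketch of one round of online weight learning: the user rates the last
# mapping result, and the weights of the parts of speech that took part in
# the match are nudged up or down. The update rule is an assumed example.

POS_WEIGHTS = {"noun": 3.0, "verb": 3.0, "pronoun": 2.0, "particle": 0.5}

def adjust_weights(weights, used_pos, satisfied, step=0.1):
    """Reward or penalize the parts of speech used in the last match."""
    sign = 1 if satisfied else -1
    for pos in used_pos:
        weights[pos] = max(0.0, weights[pos] + sign * step)
    return weights

# One round: the user was unsatisfied with a match driven by pronouns and
# particles, so those weights are lowered slightly.
adjust_weights(POS_WEIGHTS, ["pronoun", "particle"], satisfied=False)
print(round(POS_WEIGHTS["pronoun"], 1))  # -> 1.9
```

Iterating this until the user is satisfied mirrors the mapping-module / discriminating-module / weight-adjusting-module cycle described earlier.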
The method may further comprise a step of storing personal information for the user and storing the weight values in the user's personal information; when the user logs in, the weight values are read and the mapping corpus is adjusted accordingly.
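Persisting the learned weights per user, as this step describes, could be sketched as below. The JSON profile layout and file location are assumptions made for illustration; the patent does not specify a storage format.

```python
# Sketch: store the learned part-of-speech weights in a per-user profile at
# logout/update time, and restore them at login. Layout is an assumption.
import json
import os
import tempfile

PROFILE = os.path.join(tempfile.gettempdir(), "chat_profile_alice.json")

def save_profile(path, user, pos_weights):
    """Store the user's personal information, including the weight values."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"user": user, "pos_weights": pos_weights}, f)

def load_weights(path):
    """At login, read back the stored weights to adjust the mapping corpus."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["pos_weights"]

save_profile(PROFILE, "alice", {"noun": 3.1, "verb": 2.9})
print(load_weights(PROFILE)["noun"])  # -> 3.1
```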
With the above scheme, the invention establishes a corpus with part-of-speech weight optimization and learning functions, maps and classifies semantics, and establishes answers between the mapped semantics. It can therefore communicate with people in natural language with high accuracy, provides language communication and voice reminder functions, and realizes a real spoken dialogue between human and machine, giving the user the experience and fun of a real conversation.

Brief description of the drawings
FIG. 1 is an overall framework diagram of the chat system of the present invention;

FIG. 2 is a flow chart of spoken-text understanding and answering according to the present invention;

FIG. 3 is a schematic diagram of the spoken-text understanding and answering module of the present invention;

FIG. 4 is a schematic diagram of the mapping description format of the mapping corpus of the present invention;

FIG. 5 is a schematic diagram of the direct-answer format for concept sentences in the dialogue corpus of the present invention;

FIG. 6 is a schematic diagram of the format of answers with history information in the dialogue corpus of the present invention;

FIG. 7 is a schematic diagram of the format of the default answer library of the dialogue corpus of the present invention;

FIG. 8 is a flow chart of the method of the present invention;

FIG. 9 is a schematic diagram of an optimization method for the part-of-speech weights of the present invention;

FIG. 10 is a flow chart of online learning of the part-of-speech weights of the present invention.

Detailed description of embodiments
The object of the present invention is to construct an intelligent chat system, or a robot, that can interact not only through text but also through voice, to meet the needs described above. Preferred embodiments of the present invention are described in detail below.
The present invention provides a voice chat system. Specifically, to realize natural-language interaction, the invention can adopt a basic framework of three modules. An automatic speech recognition module (Automatic Speech Recognition, ASR; Speech to Text, STT) converts the user's natural speech into the corresponding text, i.e. it converts input speech into input text. A spoken-text understanding and answering module (Text to Text, TTT), i.e. the text understanding and answering module that obtains an output text from the input text, performs spoken-language understanding of the text and produces an answer text; in this process the various required corpora and the system's chat-history information are used. A speech synthesis module (Speech Synthesis; Text to Speech, TTS) converts the output text into output speech, so that the answer text is spoken back to the user. If natural-language interaction is not required and only text interaction is considered, the system may include only the text understanding and answering module.
The automatic speech recognition module and the speech synthesis module can use modules available on the market, including the corresponding module software on embedded platforms; the main requirements are high recognition accuracy and good synthesis quality.
For the text understanding and answering module, the understanding method used here is to map and classify semantics and, at the same time, to establish answers between the mapped semantics. Compared with traditional methods this is simple to implement, but it faces a huge semantic space and many categories. The spoken voice signal produced by the person is turned into the corresponding text by the automatic speech recognition module; the spoken-language understanding and answering module processes the input text and gives a text answer according to the dialogue corpus and the conversation context; finally, the speech synthesis module converts the resulting answer text into a voice signal and interacts with the user. The process can of course also be simpler: the spoken-language understanding and answering module processes the input text and gives a text answer according to the dialogue corpus and the conversation context, without voice input or output.
As shown in FIG. 1, the voice chat system takes the user's speech as system input: for example through a microphone, the voice signal is passed to the speech recognition module 1, which converts the speech into text; the text enters the spoken-text understanding and answering module 2, which executes the whole process of FIG. 2 using the corresponding databases and returns the text of the answer sentence; the answer text then enters the speech synthesis module 3, which converts it into speech so that the user can hear the feedback through a loudspeaker. The invention can be used not only for voice chat but also in information inquiry systems, automatic tour-guide systems, automatic introduction systems, language learning systems and so on, wherever information output is needed; it can reduce labor costs while improving the accuracy of the information and its management.
In the intelligent chat system, spoken-text understanding and answering proceeds as follows: Chinese part-of-speech tagging yields a set of keywords, and this set, via the spoken-text understanding corpus, is mapped to a concept sentence; an answer is then given according to the concept sentence, the dialogue corpus, the history information and an information database or the network. As shown in FIG. 3, in the spoken-text understanding and answering module 2 the main flow is: the input text passes through the part-of-speech tagging 4 of the word segmentation unit, which tags the input text and produces a set of words with part-of-speech tags; the mapping unit, i.e. mapping module 5, then searches the mapping corpus 7 according to the word set and maps it to a concept sentence; the search unit, i.e. search module 6, then searches the dialogue corpus 8 according to the concept sentence and maps it to the output text. Two kinds of databases are involved. The mapping corpus 7 (database 7) describes the mapping from keyword sets to concept sentences; its description format can be as shown in FIG. 4, which defines 14 Chinese parts of speech and gives, for each set of keywords, the one concept sentence it should correspond to. The dialogue corpus 8 (database 8) mainly records the answers to concept sentences: FIG. 5 describes the format of direct answers to concept sentences, which involve no environment or history information; FIG. 6 describes and records answer sentences given according to history information, environment information and the current concept sentence; FIG. 7 is the default answer library, from which the program produces output text in a specified way when needed. For example, when the user says "What is your name?", under good conditions the speech recognition module yields the text "What is your name"; part-of-speech tagging yields the segmented, tagged result "you (pronoun) / de 的 (particle) / name (noun) / is (verb) / what (pronoun)". The mapping process then scores this tagged word set against the concept corpus and obtains the three highest-scoring concept sentences, for example, from high to low, "what is your name", "what are you called", "do you know the name". The highest-scoring one clearly expresses the intended meaning and is the mapped concept sentence; searching the dialogue corpus with it yields the answer. For some sentences, such as "I like it", the system needs to know the context; by matching against the preceding information it can determine how to answer, for example "What movie do you like?" and so on.
The intelligent chat system, or the text understanding and answering module, may further comprise a pre-processing unit for taking the word set from the word segmentation unit and replacing word-set information, adding a conversation flag or setting a conversation flag bit, to obtain the word set used by the mapping unit.
The intelligent chat system, or the text understanding and answering module, may further comprise a post-processing unit for taking the output text from the search unit and performing the following processing: adding or storing history information, setting the conversation topic, and adding relevant information obtained by searching, to obtain the output text delivered to the speech synthesis module.
With the above pre-processing and post-processing units, the accuracy of information can be increased, the user's input can be understood more easily, and the system can produce information that is easier for the user to understand and more accurate.
On this basis, the present invention also provides a method of realizing the intelligent chat system, as shown in FIG. 8, for an intelligent chat system comprising a text understanding and answering module that obtains an output text from an input text, comprising the following steps.
A1: establishing an XML-based mapping corpus and a dialogue corpus, the mapping corpus establishing and storing the mapping from keywords to concept sentences and the dialogue corpus establishing and storing the mapping from concept sentences to output texts. Step A1 may also include setting weight values for the parts of speech of the mapping corpus, the weight values being obtainable by orthogonal optimization or by two rounds of orthogonal optimization; the specific methods are described in detail later.
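An XML-based mapping corpus of the kind built in step A1 could be parsed as in the sketch below. The element and attribute names (`mappings`, `entry`, `concept`, `kw`, `pos`) are invented for illustration; the patent's actual description format is the one shown in its FIG. 4.

```python
# Sketch: load a keyword-to-concept mapping corpus from an XML description.
# The schema here is a hypothetical stand-in for the FIG. 4 format.
import xml.etree.ElementTree as ET

CORPUS_XML = """
<mappings>
  <entry concept="what is your name">
    <kw pos="pronoun">you</kw>
    <kw pos="noun">name</kw>
    <kw pos="verb">is</kw>
  </entry>
</mappings>
"""

def load_mapping_corpus(xml_text):
    """Parse concept -> {(word, pos), ...} from the XML description."""
    root = ET.fromstring(xml_text)
    corpus = {}
    for entry in root.findall("entry"):
        corpus[entry.get("concept")] = {
            (kw.text, kw.get("pos")) for kw in entry.findall("kw")
        }
    return corpus

corpus = load_mapping_corpus(CORPUS_XML)
print(sorted(w for w, _ in corpus["what is your name"]))  # -> ['is', 'name', 'you']
```

An XML description like this keeps the corpus human-editable (matching the earlier point that the corpora can be modified manually) while remaining easy to load at startup.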
A2: performing part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags; the tags are used in the subsequent matching-calculation step. Before step A2 there may also be a step of converting input speech into input text, i.e. collecting external voice information and converting it into text. If natural-language interaction is not required and only text interaction is considered, the speech-to-text step can be omitted.
A3. Perform a matching calculation between the word set and the keyword word sets of the mapping corpus to obtain a concept sentence. Before step A3, the method may further include the steps:
B1. Check the input text for the following conditions: a demonstrative pronoun appears, the topic has not changed, or common-sense knowledge must be added. For each condition, perform the corresponding pre-processing step: replace word-set information, add a dialogue flag, or set a dialogue flag bit; otherwise, proceed to step A3.
B2. Determine whether pre-processing is complete: if so, return a success flag and go to step A4; otherwise, return a failure flag and go to step A3.
Replacing word-set information is required when the current input text contains a demonstrative pronoun. For example, if the user asks "那个城市漂亮吗?" ("Is that city beautiful?"), the system consults the chat history or the information stored in the database; if the stored city is 深圳 (Shenzhen), the input is rewritten as "深圳漂亮吗?" ("Is Shenzhen beautiful?") before further processing. The dialogue flag mainly indicates whether the conversation topic has switched: whenever a new topic appears, the conversation topic must be updated. For example, if the user is first talking about the weather but suddenly switches to cars, the topic is modified and a dialogue flag is added or set so that the stale history information is invalidated or changed. Setting the dialogue flag is a concept similar to adding one: a flag is added the first time a topic appears and set whenever the topic changes.
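A minimal sketch of the B1/B2 pre-processing described above, assuming a simple pronoun-to-history-key table; the table, the history format, and the flag convention are all illustrative:

```python
# Step B1/B2 sketch: replace a demonstrative pronoun from the chat
# history and report the result with a success/failure flag, as the
# text describes. Table contents are illustrative assumptions.
DEMONSTRATIVES = {"那个城市": "city", "那里": "city"}  # pronoun -> history key

def preprocess(text, history):
    """Return (possibly rewritten text, success flag)."""
    for pronoun, key in DEMONSTRATIVES.items():
        if pronoun in text and key in history:
            return text.replace(pronoun, history[key]), True
    return text, False  # nothing to substitute: fall through to step A3

print(preprocess("那个城市漂亮吗?", {"city": "深圳"}))
# ('深圳漂亮吗?', True)
```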
A4. Search the dialogue corpus according to the concept sentence to generate the output text. After step A4, post-processing steps may be included: adding or storing history information, setting the conversation topic, and adding information obtained by searching. The history contains the sentences previously exchanged with the user as well as other important facts such as the speaker's name, age, and hobbies. The conversation topic is the subject currently under discussion, such as weather, stocks, news, culture, or sports, and serves as an effective cue for the system's information search and answering. "Information obtained by searching" means that, according to the topic, the user's needs can be satisfied by querying a database or the network: when the weather is discussed, for example, the system can report the weather of the corresponding city or region for the time and place the user gave, or describe how the weather will change, and use the retrieved information to give the answer the user needs. These post-processing steps increase answer accuracy, making the output text more reliable.
After step A4, a step A5 may also be included: converting the output text into output speech. If natural spoken-language interaction is not required and only text interaction is considered, this text-to-speech step may be omitted.
After step A4, a step A6 may also be included: the user evaluates the output speech, and the text understanding-and-answering module adjusts the weight values according to the evaluation. A personal information profile may also be created for each user, that is, the method further includes storing personal information for the user and storing the weight values in it; when the user logs in, the weight values are read and the mapping corpus is adjusted accordingly. The evaluation is subjective: for a system answer, the user can give one of three grades, for example good, acceptable, or bad, or any other grading scheme, and the invention places no additional restriction on this. After receiving the evaluation, the system may acknowledge it by voice and, based on the result, adjusts the part-of-speech weight values of the mapping corpus.
The present invention also provides a spoken-language understanding method. Because usage environments differ in noise level, because speech-recognition software has its own characteristics, and because spoken language itself contains repetitions, omissions, pauses, and ill-formed sentences and expresses the same meaning in many different ways, the output of automatic speech recognition is uncertain and varied; it is therefore difficult to parse and represent its semantics with the rule-based methods commonly used in natural-language understanding. Humans chatting in a noisy environment likewise cannot always hear every word the other party says, yet if they catch the few key words and use part of the context they can recover the intended meaning. Here, therefore, a mapping from keywords to concept sentences is used to obtain the speaker's semantics, and each concept sentence is represented directly by a corresponding natural-language sentence.
Figure 2 is a flow chart of spoken-text understanding and answering.
First, the word-segmentation module 9 produces a set of words with part-of-speech tags; Chinese word segmentation has been studied extensively and achieves high accuracy, so it is not described further here. Meanwhile, according to the chat history, pre-processing is needed whenever the input sentence contains demonstrative pronouns, continues an unchanged topic, or requires common-sense knowledge to be added. The pre-processing module 10 replaces or adds the necessary information or sets the dialogue flag bit, and the system indicates the result of pre-processing by returning a flag. If pre-processing returns success, control passes directly to the post-processing module 14, which produces the final output text. If further processing is needed after pre-processing, control enters the matching-and-ranking module 11, which matches the input part-of-speech tag set against the candidate part-of-speech sets described by the keys attribute of the corpus shown in Figure 4. Different parts of speech carry different weights, and every candidate concept sentence in the corpus receives a score. In "你叫什么名字" ("What is your name"), for instance, the word that best conveys the meaning is the noun "名字" ("name"), while the other words are comparatively unimportant, so matching should favour the parts of speech of highest importance; the quality of this part-of-speech matching directly determines the accuracy of the concept sentence.
The matching-and-ranking module finally forms a set from the three highest-scoring patterns. Because of the inherent shortcomings of speech recognition and the influence of the usage environment, the recognized text may not be a complete sentence at all, or may even be garbled; in that case the segmentation result is poor and every mapped sentence scores zero. The chat system then concludes that it did not hear the speaker at all, and the concept-sentence set is set to empty.
If the set is empty, control goes directly to the default corpus shown in Figure 7. If the set is non-empty, the highest-scoring sentence is compared with a first threshold 12: when the score is below the threshold, control again goes to the default corpus of Figure 7; when the score is not below the threshold, the mapping succeeds and the corresponding pattern becomes the concept sentence. The first threshold can be determined by selecting a fairly typical test set of 100 sentences, scoring the matching results, and choosing the threshold that yields the highest score.
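The weighted matching, top-3 selection, and threshold test described above can be sketched as follows; the weight table, candidate patterns, and threshold value are illustrative assumptions, not the patent's tuned values:

```python
# Step A3 sketch: score each candidate pattern by the weights of the
# POS tags it shares with the input, keep the 3 best, and fall back to
# the default corpus when the best score is below the first threshold.
POS_WEIGHTS = {"n": 3, "v": 3, "r": 2, "a": 1, "x": 0}  # illustrative

def score(input_tags, pattern_tags):
    return sum(POS_WEIGHTS.get(t, 0) for t in input_tags if t in pattern_tags)

def match(input_tags, patterns, threshold=3):
    ranked = sorted(patterns.items(),
                    key=lambda kv: score(input_tags, kv[1]),
                    reverse=True)[:3]      # the 3 highest-scoring patterns
    if not ranked or score(input_tags, ranked[0][1]) < threshold:
        return None                        # treated as "not heard": default corpus
    return ranked[0][0]

print(match(["r", "v", "r", "n"],
            {"你叫什么名字": {"n", "v", "r"}, "今天天气": {"n", "t"}}))
# 你叫什么名字
```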
Once the concept sentence is obtained, the search module 13 attempts to produce a response text from the history information and the corpus shown in Figure 6. This is a search that takes the current concept sentence and the previous system answer as inputs; because both inputs are not necessarily satisfied at once, the search result may be empty. If an answer text is found, the search is deemed successful and the answer is passed directly to the post-processing module 14; if the result is empty, the search is deemed to have failed and an answer is drawn from the corpus shown in Figure 5, whose output likewise enters the post-processing module 14. There the output sentence is processed further: history information is added or stored, the conversation-topic state is set, and related information is looked up, finally forming the answer text that is returned to the speech-synthesis module. The final answer text can thus be generated jointly from the answer to the concept sentence, the information search, and the history information.
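The answer search keyed on the concept sentence and the previous system reply might look like the following sketch; the corpus contents and the fallback reply are invented for the example:

```python
# Step A4 sketch: look up an answer by (concept sentence, last reply);
# an empty search result falls back to the default reply, as above.
DIALOG_CORPUS = {("你叫什么名字", None): "我叫小智。"}   # illustrative entry
DEFAULT_ANSWER = "不好意思, 我没听清。"                   # illustrative fallback

def answer(concept, last_reply=None):
    return DIALOG_CORPUS.get((concept, last_reply), DEFAULT_ANSWER)

print(answer("你叫什么名字"))  # 我叫小智。
print(answer("今天天气"))      # 不好意思, 我没听清。
```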
The present invention also provides a structure and storage method for the dialogue corpus. To describe the mapping from keywords to concept sentences, and to describe and store the output sentences that correspond to a concept sentence in a given context, a storage-structure description language based on XML (extensible markup language) was designed for these non-structured data: XML documents describe the corpus, and a relational database stores the data. The mapping corpus, the dialogue corpus, and the history information are all described and stored in XML, and the attribute nodes needed to describe the corpus are defined. The database stores part-of-speech sets, concept sentences, answer sentences, history information, and so on. This design is easy to organize and manage, and the contents of the corpus can be modified dynamically: the various corpora can be edited and extended manually, can be extended and modified directly through voice interaction, and can store specific data automatically.
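The patent does not publish its exact XML schema, so the element and attribute names below (entry, keys, pattern, answer) are assumptions; the sketch only shows how a keys-to-pattern corpus entry could be described in XML and read back:

```python
# One hypothetical mapping-corpus entry, parsed with the standard
# library. "keys" lists candidate keyword/POS items, "pattern" is the
# concept sentence, and <answer> is a stored reply. Names are assumed.
import xml.etree.ElementTree as ET

CORPUS_XML = """
<corpus>
  <entry keys="名字/n 叫/v" pattern="你叫什么名字">
    <answer>我叫小智。</answer>
  </entry>
</corpus>
"""

root = ET.fromstring(CORPUS_XML)
for entry in root.findall("entry"):
    print(entry.get("keys").split(), entry.get("pattern"), entry.findtext("answer"))
# ['名字/n', '叫/v'] 你叫什么名字 我叫小智。
```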
The present invention also provides a process and method for learning knowledge through speech. The chat system accumulates knowledge told to it by the interlocutor in a natural, interactive way; mutual questioning determines whether the system has actually acquired the knowledge the user supplied, and the system gives corresponding natural-language feedback.
The present invention also provides a method for recording and using chat context information. While interacting with a person, the system automatically stores information in the context record, keeping important facts and conversation content; during the dialogue it adds the corresponding information and dynamically organizes its answer sentences according to it.
The present invention also provides part-of-speech weight optimization and an online learning method. When keywords are mapped to concept sentences, keywords of different parts of speech carry different weights. An optimization method is used to obtain the optimal weight value for each part of speech, and the weights can be modified dynamically through online learning. When mapping keywords to their corresponding concept sentences, the part of speech of each keyword must be weighted: different parts of speech contribute differently to the meaning of a sentence, and a sentence's nouns and verbs usually carry the highest weights and matter most for understanding its semantics. Natural language, however, has many parts of speech, and no definite weight value exists for each of them. A part-of-speech weight-optimization method and an online learning method are therefore proposed to maximize the accuracy of the keyword-to-concept-sentence mapping.
As shown in Figure 9, part-of-speech weights are determined by orthogonal optimization. Chinese has many parts of speech, and their relative importance for semantic expression is not known exactly, so an optimization method is needed to obtain the weight of each. Following general linguistics and common sense, 14 relatively important classes were selected: verbs, nouns, pronouns, numerals, adjectives, place nouns, adverbs, idioms, time words, auxiliary words, modal particles, personal names, distinguishing words, and locative words. These 14 parts of speech are obtained from need and experience and, using linguistic knowledge, are divided into two groups: for example, nouns, verbs, pronouns, place nouns, adjectives, time words, and personal names form the first group, while modal particles, locative words, distinguishing words, auxiliary words, idioms, adverbs, and numerals form the second. A usable weight set is then obtained through two groups of orthogonal-optimization experiments. In the first group, the seven relatively more important classes are the factors, each with three levels, for example 3, 2, and 1, using the standard L18(3^7) orthogonal-experiment table; the other seven classes are set to 0. The test set is built so that every sentence is of a spoken type and each part of speech appears with roughly its natural frequency. In each trial, every sentence in the test set is scored manually according to the reasonableness of the matched concept sentence, and the total score is the result of that trial; 18 rounds of trials are run in this way. The first group of trials yields a set of currently optimal weight values. In the second group of trials, the seven more important classes keep the weight values obtained in the first group, and the weights of the remaining seven classes are optimized orthogonally, for example with levels 2, 1, and 0, again using the standard L18(3^7) orthogonal-experiment table, with the same test set and scoring criteria as the first time. Finally, the two results are combined to obtain the weight values of all 14 parts of speech used by the system.
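The two-stage search can be illustrated in miniature. For brevity this sketch exhaustively enumerates a tiny level grid for three POS factors instead of using an L18(3^7) orthogonal array, and replaces the manual scoring with an automatic scorer; the factors, levels, and test set are all illustrative:

```python
# Miniature version of the weight search: try level assignments for a
# few POS factors and keep the assignment scoring best on a small
# labelled test set (scored automatically here, manually in the patent).
import itertools

FACTORS = ["n", "v", "r"]   # 3 of the 14 POS classes, for illustration
LEVELS = [3, 2, 1]

def evaluate(weights, test_set):
    """Count cases where the heaviest tag is the one marked as the key tag."""
    return sum(max(tags, key=lambda t: weights.get(t, 0)) == expected
               for tags, expected in test_set)

TEST_SET = [(["n", "r"], "n"), (["v", "r"], "v"), (["n", "v"], "n")]

best = max(itertools.product(LEVELS, repeat=len(FACTORS)),
           key=lambda row: evaluate(dict(zip(FACTORS, row)), TEST_SET))
print(dict(zip(FACTORS, best)))
```

An orthogonal array would evaluate only a balanced subset of these rows instead of the full grid, which is what makes the L18 design tractable for 7 factors.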
Figure 10 shows the online learning process for the part-of-speech weights. When the user enters the weight-training mode, the database is trained through speech. The user first supplies a test input utterance to the mapping module 15 (the mapping module shown in Figure 2), which returns the mapping result in speech form to the user and to the discrimination module 16. The user gives an evaluation based on this feedback; using it, the discrimination module 16 has the weight-adjustment module 17 adjust the weights according to the algorithm and sends the adjusted weights back to the mapping module 15 for the next round, until the matching finally satisfies the user. For example, when the user says "你的特长是什么" ("What is your specialty"), the system may ask back "你说的是'你的特长是什么'吗" ("Did you say 'What is your specialty'?") or "你说的是'你是什么'吗" ("Did you say 'What are you'?"); the user naturally answers yes ("是") or no ("不是"), and from that answer the system adjusts the part-of-speech weights so that its interpretations become as correct as possible.
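The patent does not specify the adjustment algorithm, so the fixed-step update below is purely an assumed placeholder showing where the user's confirmation feeds back into the weights:

```python
# Online-learning sketch (Fig. 10): after the user confirms ("是") or
# rejects ("不是") the mapped concept, nudge the weights of the POS
# tags that drove the mapping. The +/- fixed step is an assumption.
def adjust(weights, tags_used, confirmed, step=0.5):
    delta = step if confirmed else -step
    return {t: w + delta if t in tags_used else w for t, w in weights.items()}

w = {"n": 3.0, "v": 3.0, "r": 2.0}
w = adjust(w, {"n", "v"}, confirmed=False)   # user answered "不是"
print(w)  # {'n': 2.5, 'v': 2.5, 'r': 2.0}
```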
The present invention also provides a natural-language behavior-driving method. Commands are issued in natural spoken language: from the part-of-speech set to the concept sentence, and from the concept sentence to the final answer and feedback, specific formats and action-driving scripts allow the system to be driven or commanded naturally by voice. Behavior driving no longer relies on phrases or simple imperative sentences specified in advance by the system; instead, the system reacts correctly to natural command expressions and confirms and responds by voice, providing a user-reminder function. This driving style better matches people's everyday habits, and new users can drive the system in natural language without much learning.
The present invention also provides an embedded implementation of the voice-chat system. The voice-chat design framework can be implemented in several ways, for example using a speech-recognition chip for the recognition function together with the mapping and storage corpora, or using an embedded system to perform speech recognition, speech synthesis, and language understanding similar to an ordinary processor. The embedded implementation is one of these: automatic speech recognition, semantic understanding, and speech synthesis must be completed and integrated under a specific embedded operating system, and the implementing software differs between platforms. This solution retains all the inherent characteristics of the voice-chat system while being portable, low in power consumption, compact, and inexpensive.
The present invention also provides a method for querying and answering information naturally by voice. Both the query and the feedback use natural speech, and the answers conform to human language. People can obtain the information they need by communicating in natural language, asking, answering, and confirming interactively; the data may come from existing databases or from the Internet.
It is to be understood that those skilled in the art may make improvements or modifications according to the above description, and all such improvements and modifications shall fall within the protection scope of the appended claims of the present invention.
Claims
What is claimed is:
1. An intelligent chat system, characterized by comprising a text understanding-and-answering module for obtaining an output text from an input text, the text understanding-and-answering module comprising a word-segmentation unit, an XML-based mapping corpus, a mapping unit, an XML-based dialogue corpus, and a search unit;
wherein the word-segmentation unit is configured to perform part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags;
the mapping corpus is configured to establish and store mapping relationships from keywords to concept sentences;
the mapping unit is configured to search the mapping corpus according to the word set and obtain a concept sentence by mapping;
the dialogue corpus is configured to establish and store mapping relationships from concept sentences to output texts; and
the search unit is configured to search the dialogue corpus according to the concept sentence and obtain the output text by mapping.
2. The intelligent chat system according to claim 1, characterized by further comprising a speech-recognition module for converting input speech into the input text.
3. The intelligent chat system according to claim 1, characterized by further comprising a speech-synthesis module for converting the output text into output speech.
4. The intelligent chat system according to claim 1, characterized in that the mapping corpus and the dialogue corpus are arranged in the same corpus.
5. The intelligent chat system according to claim 1, characterized by further comprising a pre-processing unit configured to take the word set from the word-segmentation unit and replace word-set information, add a dialogue flag, or set a dialogue flag bit, yielding the word set used by the mapping unit.
6. The intelligent chat system according to claim 1, characterized by further comprising a post-processing unit configured to apply the following processing to the output text from the search unit: adding or storing history information, setting the conversation topic, and adding related information obtained by searching, yielding the output text delivered to the speech-synthesis module.
7. A method for implementing an intelligent chat system comprising a text understanding-and-answering module for obtaining an output text from an input text, the method comprising the steps of:
A1. establishing an XML-based mapping corpus and an XML-based dialogue corpus, the mapping corpus establishing and storing mapping relationships from keywords to concept sentences and the dialogue corpus establishing and storing mapping relationships from concept sentences to output texts;
A2. performing part-of-speech tagging on the input text to obtain a set of words with part-of-speech tags;
A3. performing a matching calculation between the word set and the keyword word sets of the mapping corpus to obtain a concept sentence;
A4. searching the dialogue corpus according to the concept sentence to generate the output text.
8. The method according to claim 7, characterized in that, before step A2, the method further comprises the step of converting input speech into the input text.
9. The method according to claim 7, characterized by further comprising a step A5 of converting the output text into output speech.
10. The method according to claim 7, characterized in that, after step A4, the method further comprises post-processing steps for increasing answer accuracy: adding or storing history information, setting the conversation topic, and adding related information obtained by searching.
11. The method according to claim 7, characterized in that, before step A3, the method further comprises the steps of:
B1. checking the input text for the following conditions: a demonstrative pronoun appears, the topic has not changed, or common-sense knowledge must be added, and correspondingly performing the pre-processing step of replacing word-set information, adding a dialogue flag, or setting a dialogue flag bit; otherwise performing step A3;
B2. determining whether pre-processing is complete: if so, returning a success flag and performing step A4; otherwise returning a failure flag and performing step A3.
12. The method according to claim 7, characterized in that the mapping corpus and the dialogue corpus are arranged in the same corpus.
13. The method according to claim 7, characterized in that step A1 further comprises setting weight values for the parts of speech of the mapping corpus, the weight values being obtained by orthogonal optimization or by a two-stage orthogonal-optimization method.
14. The method according to claim 13, characterized by further comprising a step A6 in which the user evaluates the output speech and the text understanding-and-answering module adjusts the weight values according to the evaluation.
15. The method according to claim 14, characterized by further comprising the step of storing personal information for the user and storing the weight values in the user's personal information; when the user logs in, the weight values are read and the mapping corpus is adjusted accordingly.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN2007100741121A (CN101075435B) | 2007-04-19 | 2007-04-19 | Intelligent chatting system and its realizing method
CN200710074112.1 | 2007-04-19 | |
Publications (1)
Publication Number | Publication Date
---|---
WO2008128423A1 (en) | 2008-10-30
Family
ID=38976431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2008/000764 (WO2008128423A1) | An intelligent dialog system and a method for realization thereof | 2007-04-19 | 2008-04-15
Country Status (2)
Country | Link
---|---
CN (1) | CN101075435B (en)
WO (1) | WO2008128423A1 (en)
Families Citing this family (84)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101075435B (en) * | 2007-04-19 | 2011-05-18 | 深圳先进技术研究院 | Intelligent chatting system and its realizing method |
US8374859B2 (en) | 2008-08-20 | 2013-02-12 | Universal Entertainment Corporation | Automatic answering device, automatic answering system, conversation scenario editing device, conversation server, and automatic answering method |
JP5829000B2 (en) * | 2008-08-20 | 2015-12-09 | 株式会社ユニバーサルエンターテインメント | Conversation scenario editing device |
CN101551998B (en) * | 2009-05-12 | 2011-07-27 | 上海锦芯电子科技有限公司 | A group of voice interaction devices and method of voice interaction with human |
CN101610164B (en) * | 2009-07-03 | 2011-09-21 | 腾讯科技(北京)有限公司 | Implementation method, device and system of multi-person conversation |
CN101794304B (en) * | 2010-02-10 | 2016-05-25 | 深圳先进技术研究院 | Industry information service system and method |
CN102737631A (en) * | 2011-04-15 | 2012-10-17 | 富泰华工业(深圳)有限公司 | Electronic device and method for interactive speech recognition |
US8260615B1 (en) * | 2011-04-25 | 2012-09-04 | Google Inc. | Cross-lingual initialization of language models |
CN102194005B (en) * | 2011-05-26 | 2014-01-15 | 卢玉敏 | Chat robot system and automatic chat method |
US8930189B2 (en) * | 2011-10-28 | 2015-01-06 | Microsoft Corporation | Distributed user input to text generated by a speech to text transcription service |
CN103150981A (en) * | 2013-01-02 | 2013-06-12 | 曲东阳 | Self-service voice tour-guiding system and triggering method thereof |
CN103198155B (en) * | 2013-04-27 | 2017-09-22 | 北京光年无限科技有限公司 | A kind of intelligent answer interactive system and method based on mobile terminal |
CN103279528A (en) * | 2013-05-31 | 2013-09-04 | 俞志晨 | Question-answering system and question-answering method based on man-machine integration |
CN104281609B (en) * | 2013-07-08 | 2020-03-17 | 腾讯科技(深圳)有限公司 | Configuration method and device for voice input instruction matching rule |
EP3061086B1 (en) * | 2013-10-24 | 2019-10-23 | Bayerische Motoren Werke Aktiengesellschaft | Text-to-speech performance evaluation |
CN103593054B (en) * | 2013-11-25 | 2018-04-20 | 北京光年无限科技有限公司 | A kind of combination Emotion identification and the question answering system of output |
CN104754110A (en) * | 2013-12-31 | 2015-07-01 | 广州华久信息科技有限公司 | Mobile phone for emotional release based on machine voice conversation |
JP6359327B2 (en) * | 2014-04-25 | 2018-07-18 | シャープ株式会社 | Information processing apparatus and control program |
US10726831B2 (en) * | 2014-05-20 | 2020-07-28 | Amazon Technologies, Inc. | Context interpretation in natural language processing using previous dialog acts |
CN104123939A (en) * | 2014-06-06 | 2014-10-29 | 国家电网公司 | Substation inspection robot based voice interaction control method |
CN105404617B (en) * | 2014-09-15 | 2018-12-14 | 华为技术有限公司 | A kind of control method of remote desktop, controlled end and control system |
CN104392720A (en) * | 2014-12-01 | 2015-03-04 | 江西洪都航空工业集团有限责任公司 | Voice interaction method of intelligent service robot |
CN104615646A (en) * | 2014-12-25 | 2015-05-13 | 上海科阅信息技术有限公司 | Intelligent chatting robot system |
CN104898589B (en) * | 2015-03-26 | 2019-04-30 | 天脉聚源(北京)传媒科技有限公司 | A kind of intelligent response method and apparatus for intelligent steward robot |
WO2016173326A1 (en) * | 2015-04-30 | 2016-11-03 | 北京贝虎机器人技术有限公司 | Subject based interaction system and method |
CN105094315B (en) * | 2015-06-25 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Method and apparatus for intelligent human-machine chat based on artificial intelligence |
CN106326208B (en) * | 2015-06-30 | 2019-06-07 | 芋头科技(杭州)有限公司 | A kind of system and method that robot is trained by voice |
CN105206284B (en) * | 2015-09-11 | 2019-06-18 | 清华大学 | Online chat method and system for relieving adolescents' psychological stress |
JP6120927B2 (en) * | 2015-09-24 | 2017-04-26 | シャープ株式会社 | Dialog system, method for controlling dialog, and program for causing computer to function as dialog system |
CN105376140A (en) * | 2015-09-25 | 2016-03-02 | 云活科技有限公司 | A voice message prompt method and device |
CN108139988B (en) * | 2015-10-20 | 2021-07-30 | 索尼公司 | Information processing system and information processing method |
CN105573710A (en) * | 2015-12-18 | 2016-05-11 | 合肥寰景信息技术有限公司 | Voice service method for network community |
CN105912712B (en) * | 2016-04-29 | 2019-09-17 | 华南师范大学 | Robot dialog control method and system based on big data |
CN105895097A (en) * | 2016-05-20 | 2016-08-24 | 杨天君 | Voice conversation information inquiry platform |
CN106057203A (en) * | 2016-05-24 | 2016-10-26 | 深圳市敢为软件技术有限公司 | Precise voice control method and device |
CN106095834A (en) * | 2016-06-01 | 2016-11-09 | 竹间智能科技(上海)有限公司 | Intelligent dialogue method and system based on topic |
CN106294321B (en) * | 2016-08-04 | 2019-05-31 | 北京儒博科技有限公司 | A kind of the dialogue method for digging and device of specific area |
CN106228983B (en) * | 2016-08-23 | 2018-08-24 | 北京谛听机器人科技有限公司 | A kind of scene process method and system in man-machine natural language interaction |
CN106469212B (en) * | 2016-09-05 | 2019-10-15 | 北京百度网讯科技有限公司 | Man-machine interaction method and device based on artificial intelligence |
CN107844470B (en) * | 2016-09-18 | 2021-04-30 | 腾讯科技(深圳)有限公司 | Voice data processing method and equipment thereof |
CN106412263A (en) * | 2016-09-19 | 2017-02-15 | 合肥视尔信息科技有限公司 | Human-computer interaction voice system |
JP2018054790A (en) * | 2016-09-28 | 2018-04-05 | トヨタ自動車株式会社 | Voice interaction system and voice interaction method |
CN106653006B (en) * | 2016-11-17 | 2019-11-08 | 百度在线网络技术(北京)有限公司 | Searching method and device based on interactive voice |
CN108132952B (en) * | 2016-12-01 | 2022-03-15 | 百度在线网络技术(北京)有限公司 | Active type searching method and device based on voice recognition |
CN106802951B (en) * | 2017-01-17 | 2019-06-11 | 厦门快商通科技股份有限公司 | A kind of topic abstracting method and system for Intelligent dialogue |
CN107193978A (en) * | 2017-05-26 | 2017-09-22 | 武汉泰迪智慧科技有限公司 | A kind of many wheel automatic chatting dialogue methods and system based on deep learning |
CN107256260A (en) * | 2017-06-13 | 2017-10-17 | 浪潮软件股份有限公司 | A kind of intelligent semantic recognition methods, searching method, apparatus and system |
CN107393538A (en) * | 2017-07-26 | 2017-11-24 | 上海与德通讯技术有限公司 | Robot interactive method and system |
CN107463699A (en) * | 2017-08-15 | 2017-12-12 | 济南浪潮高新科技投资发展有限公司 | A kind of method for realizing question and answer robot based on seq2seq models |
CN108255804A (en) * | 2017-09-25 | 2018-07-06 | 上海四宸软件技术有限公司 | A kind of communication artificial intelligence system and its language processing method |
CN107644643A (en) * | 2017-09-27 | 2018-01-30 | 安徽硕威智能科技有限公司 | A kind of voice interactive system and method |
CN110121706B (en) * | 2017-10-13 | 2022-05-03 | 微软技术许可有限责任公司 | Providing responses in a conversation |
CN108231080A (en) * | 2018-01-05 | 2018-06-29 | 广州蓝豹智能科技有限公司 | Voice method for pushing, device, smart machine and storage medium |
CN108364655B (en) * | 2018-01-31 | 2021-03-09 | 网易乐得科技有限公司 | Voice processing method, medium, device and computing equipment |
KR102648815B1 (en) * | 2018-04-30 | 2024-03-19 | 현대자동차주식회사 | Apparatus and method for spoken language understanding |
WO2020018724A1 (en) * | 2018-07-19 | 2020-01-23 | Dolby International Ab | Method and system for creating object-based audio content |
CN109325155A (en) * | 2018-07-25 | 2019-02-12 | 南京瓦尔基里网络科技有限公司 | A kind of novel dialogue state storage method and system |
CN109461448A (en) * | 2018-12-11 | 2019-03-12 | 百度在线网络技术(北京)有限公司 | Voice interactive method and device |
CN109726265A (en) * | 2018-12-13 | 2019-05-07 | 深圳壹账通智能科技有限公司 | Information processing method, device and computer-readable storage medium for assisting chat |
CN109410913B (en) | 2018-12-13 | 2022-08-05 | 百度在线网络技术(北京)有限公司 | Voice synthesis method, device, equipment and storage medium |
CN109829039B (en) * | 2018-12-13 | 2023-06-09 | 平安科技(深圳)有限公司 | Intelligent chat method, intelligent chat device, computer equipment and storage medium |
DE102018222156B4 (en) * | 2018-12-18 | 2025-01-30 | Volkswagen Aktiengesellschaft | Method, speech dialogue system and use of a speech dialogue system for generating a response output in response to speech input information |
CN109559754B (en) * | 2018-12-24 | 2020-11-03 | 焦点科技股份有限公司 | Voice rescue method and system for tumble identification |
CN111400464B (en) * | 2019-01-03 | 2023-05-26 | 百度在线网络技术(北京)有限公司 | Text generation method, device, server and storage medium |
CN109686360A (en) * | 2019-01-08 | 2019-04-26 | 哈尔滨理工大学 | Voice-based meal-ordering robot |
CN110111788B (en) * | 2019-05-06 | 2022-02-08 | 阿波罗智联(北京)科技有限公司 | Voice interaction method and device, terminal and computer readable medium |
US10868778B1 (en) * | 2019-05-30 | 2020-12-15 | Microsoft Technology Licensing, Llc | Contextual feedback, with expiration indicator, to a natural understanding system in a chat bot |
CN112153213A (en) * | 2019-06-28 | 2020-12-29 | 青岛海信移动通信技术股份有限公司 | Method and equipment for determining voice information |
CN110427475A (en) * | 2019-08-05 | 2019-11-08 | 安徽赛福贝特信息技术有限公司 | A kind of speech recognition intelligent customer service system |
US11194970B2 (en) * | 2019-09-23 | 2021-12-07 | International Business Machines Corporation | Context-based topic recognition using natural language processing |
CN110704595B (en) * | 2019-09-27 | 2022-08-23 | 百度在线网络技术(北京)有限公司 | Dialogue processing method and device, electronic equipment and readable storage medium |
CN110880316A (en) * | 2019-10-16 | 2020-03-13 | 苏宁云计算有限公司 | Audio output method and system |
CN110827807B (en) * | 2019-11-29 | 2022-03-25 | 恒信东方文化股份有限公司 | Voice recognition method and system |
CN112988985A (en) * | 2019-12-02 | 2021-06-18 | 浙江思考者科技有限公司 | One-key addition and use of dialects in AI intelligent voice interaction |
CN111326160A (en) * | 2020-03-11 | 2020-06-23 | 南京奥拓电子科技有限公司 | A speech recognition method, system and storage medium for correcting noise text |
CN112133284B (en) * | 2020-04-23 | 2023-07-07 | 中国医学科学院北京协和医院 | A medical voice dialogue method and device |
CN111754977A (en) * | 2020-06-16 | 2020-10-09 | 普强信息技术(北京)有限公司 | Voice real-time synthesis system based on Internet |
CN112115722A (en) * | 2020-09-10 | 2020-12-22 | 文化传信科技(澳门)有限公司 | A human brain-like Chinese parsing method and intelligent interactive system |
CN112231451B (en) * | 2020-10-12 | 2023-09-29 | 中国平安人寿保险股份有限公司 | Reference word recovery method and device, conversation robot and storage medium |
CN112100338B (en) * | 2020-11-02 | 2022-02-25 | 北京淇瑀信息科技有限公司 | Dialog theme extension method, device and system for intelligent robot |
US11907678B2 (en) | 2020-11-10 | 2024-02-20 | International Business Machines Corporation | Context-aware machine language identification |
CN113327612A (en) * | 2021-05-27 | 2021-08-31 | 广州广电运通智能科技有限公司 | Voice response optimization method, system, device and medium based on intelligent comment |
CN114218452A (en) * | 2021-10-29 | 2022-03-22 | 赢火虫信息科技(上海)有限公司 | Lawyer recommending method and device based on public information and electronic equipment |
CN114386424B (en) * | 2022-03-24 | 2022-06-10 | 上海帜讯信息技术股份有限公司 | Industry professional text automatic annotation method, device, terminal and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1516112A (en) * | 1995-03-01 | 2004-07-28 | Seiko Epson Corporation | Voice Recognition Dialogue Device |
JP2005025602A (en) * | 2003-07-04 | 2005-01-27 | Matsushita Electric Ind Co Ltd | Sentence / language generation apparatus and selection method thereof |
US20050256717A1 (en) * | 2004-05-11 | 2005-11-17 | Fujitsu Limited | Dialog system, dialog system execution method, and computer memory product |
US20060173686A1 (en) * | 2005-02-01 | 2006-08-03 | Samsung Electronics Co., Ltd. | Apparatus, method, and medium for generating grammar network for use in speech recognition and dialogue speech recognition |
JP2006208905A (en) * | 2005-01-31 | 2006-08-10 | Nissan Motor Co Ltd | Voice dialog device and voice dialog method |
CN101075435A (en) * | 2007-04-19 | 2007-11-21 | 深圳先进技术研究院 | Intelligent chatting system and its realizing method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6499013B1 (en) * | 1998-09-09 | 2002-12-24 | One Voice Technologies, Inc. | Interactive user interface using speech recognition and natural language processing |
- 2007
  - 2007-04-19 CN CN2007100741121A patent/CN101075435B/en active Active
- 2008
  - 2008-04-15 WO PCT/CN2008/000764 patent/WO2008128423A1/en active Application Filing
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108153875A (en) * | 2017-12-26 | 2018-06-12 | 广州蓝豹智能科技有限公司 | Language material processing method, device, intelligent sound box and storage medium |
CN108153875B (en) * | 2017-12-26 | 2022-03-11 | 北京金山安全软件有限公司 | Corpus processing method and device, intelligent sound box and storage medium |
CN107968896A (en) * | 2018-01-08 | 2018-04-27 | 杭州声讯网络科技有限公司 | Unattended communication on telephone system and communication method |
CN109597986A (en) * | 2018-10-16 | 2019-04-09 | 深圳壹账通智能科技有限公司 | Localization method, device, equipment and the storage medium of abnormal problem |
CN112912954A (en) * | 2018-10-31 | 2021-06-04 | 三星电子株式会社 | Electronic device and control method thereof |
US11893982B2 (en) | 2018-10-31 | 2024-02-06 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method therefor |
CN112912954B (en) * | 2018-10-31 | 2024-05-24 | 三星电子株式会社 | Electronic device and control method thereof |
CN109584882A (en) * | 2018-11-30 | 2019-04-05 | 南京天溯自动化控制系统有限公司 | A kind of optimization method and system of the speech-to-text for special scenes |
CN109829052A (en) * | 2019-02-19 | 2019-05-31 | 田中瑶 | A kind of open dialogue method and system based on human-computer interaction |
CN110347996B (en) * | 2019-07-15 | 2023-06-20 | 北京百度网讯科技有限公司 | Text modification method and device, electronic equipment and storage medium |
CN110347996A (en) * | 2019-07-15 | 2019-10-18 | 北京百度网讯科技有限公司 | Amending method, device, electronic equipment and the storage medium of text |
CN110516043A (en) * | 2019-08-30 | 2019-11-29 | 苏州思必驰信息科技有限公司 | Answer generation method and device for question answering system |
CN110516043B (en) * | 2019-08-30 | 2022-09-20 | 思必驰科技股份有限公司 | Answer generation method and device for question-answering system |
CN111125124A (en) * | 2019-11-18 | 2020-05-08 | 云知声智能科技股份有限公司 | Corpus labeling method and apparatus based on big data platform |
CN111125124B (en) * | 2019-11-18 | 2023-04-25 | 云知声智能科技股份有限公司 | Corpus labeling method and device based on big data platform |
CN111259649A (en) * | 2020-01-19 | 2020-06-09 | 深圳壹账通智能科技有限公司 | Interactive data classification method and device of information interaction platform and storage medium |
CN111325034A (en) * | 2020-02-12 | 2020-06-23 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for semantic completion in multi-round conversation |
CN111563029A (en) * | 2020-03-13 | 2020-08-21 | 深圳市奥拓电子股份有限公司 | Testing method, system, storage medium and computer equipment for conversation robot |
CN111666381A (en) * | 2020-06-17 | 2020-09-15 | 中国电子科技集团公司第二十八研究所 | Task type question-answer interaction system oriented to intelligent control |
CN111666381B (en) * | 2020-06-17 | 2022-11-18 | 中国电子科技集团公司第二十八研究所 | Task type question-answer interaction system oriented to intelligent control |
CN111783439B (en) * | 2020-06-28 | 2022-10-04 | 平安普惠企业管理有限公司 | Man-machine interaction dialogue processing method and device, computer equipment and storage medium |
CN111783439A (en) * | 2020-06-28 | 2020-10-16 | 平安普惠企业管理有限公司 | Man-machine interaction dialogue processing method and device, computer equipment and storage medium |
CN111968680A (en) * | 2020-08-14 | 2020-11-20 | 北京小米松果电子有限公司 | Voice processing method, device and storage medium |
CN113641778A (en) * | 2020-10-30 | 2021-11-12 | 浙江华云信息科技有限公司 | A Topic Recognition Method for Dialogue Texts |
CN112562678A (en) * | 2020-11-26 | 2021-03-26 | 携程计算机技术(上海)有限公司 | Intelligent dialogue method, system, equipment and storage medium based on customer service recording |
CN112463108A (en) * | 2020-12-14 | 2021-03-09 | 美的集团股份有限公司 | Voice interaction processing method and device, electronic equipment and storage medium |
CN112463108B (en) * | 2020-12-14 | 2023-03-31 | 美的集团股份有限公司 | Voice interaction processing method and device, electronic equipment and storage medium |
CN112559691B (en) * | 2020-12-22 | 2023-11-14 | 珠海格力电器股份有限公司 | Semantic similarity determining method and device and electronic equipment |
CN112559691A (en) * | 2020-12-22 | 2021-03-26 | 珠海格力电器股份有限公司 | Semantic similarity determination method and device and electronic equipment |
CN113555018A (en) * | 2021-07-20 | 2021-10-26 | 海信视像科技股份有限公司 | Voice interaction method and device |
CN113555018B (en) * | 2021-07-20 | 2024-05-28 | 海信视像科技股份有限公司 | Voice interaction method and device |
CN113535921A (en) * | 2021-07-21 | 2021-10-22 | 携程旅游网络技术(上海)有限公司 | Speech output method, system, electronic device and storage medium for customer service |
CN113869066A (en) * | 2021-10-15 | 2021-12-31 | 中通服创立信息科技有限责任公司 | Semantic understanding method and system based on agricultural field text |
CN117874847A (en) * | 2023-12-19 | 2024-04-12 | 浙江大学 | A human-machine collaborative concept design generation method and system based on FBS theory |
CN118447845A (en) * | 2024-05-31 | 2024-08-06 | 南京龙垣信息科技有限公司 | Intelligent customer service dialogue system and equipment |
CN119599029A (en) * | 2024-11-14 | 2025-03-11 | 广东数业智能科技有限公司 | Psychological accompanying dialogue method based on multi-agent cooperation and storage medium |
CN119940345A (en) * | 2025-04-03 | 2025-05-06 | 湖南科技大学 | Intelligent scene dialogue analysis method and system based on model identification |
Also Published As
Publication number | Publication date |
---|---|
CN101075435B (en) | 2011-05-18 |
CN101075435A (en) | 2007-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2008128423A1 (en) | An intelligent dialog system and a method for realization thereof | |
KR102803154B1 (en) | Tailoring an interactive dialog application based on creator provided content | |
JP6629375B2 (en) | Method and system for estimating user intention in search input of conversational interaction system | |
JP6678764B1 (en) | Facilitating end-to-end communication with automated assistants in multiple languages | |
EP3183728B1 (en) | Orphaned utterance detection system and method | |
JP5166661B2 (en) | Method and apparatus for executing a plan based dialog | |
US9529787B2 (en) | Concept search and semantic annotation for mobile messaging | |
EP3640938B1 (en) | Incremental speech input interface with real time feedback | |
Fang et al. | Sounding Board – University of Washington's Alexa Prize submission | |
KR101677859B1 (en) | Method for generating system response using knowledgy base and apparatus for performing the method | |
KR20090000442A (en) | Universal conversation service device and method | |
JP2001357053A (en) | Dialogue device | |
WO2025071899A1 (en) | Natural language generation | |
CN112883350B (en) | Data processing method, device, electronic equipment and storage medium | |
Hung et al. | Context‐Centric Speech‐Based Human–Computer Interaction | |
Wang et al. | Understanding differences between human language processing and natural language processing by the synchronized model | |
US12271360B1 (en) | Dynamic indexing in key-value stores | |
US20240428787A1 (en) | Generating model output using a knowledge graph | |
Verma et al. | Deep analysis of chatbots: Features, methodology, and comparison | |
JP2003345823A (en) | Access system and access control method | |
JP2004086246A (en) | Conversation control system, conversation control method, program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08733962; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 08733962; Country of ref document: EP; Kind code of ref document: A1 |