CN109344231A

CN109344231A - Method and system for completing semantically incomplete corpus

Info

Publication number: CN109344231A
Application number: CN201811288739.1A
Authority: CN
Inventors: 魏誉荧
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2018-10-31
Filing date: 2018-10-31
Publication date: 2019-02-15
Anticipated expiration: 2038-10-31
Also published as: CN109344231B

Abstract

The invention provides a method and a system for completing semantically incomplete corpus. The method comprises: acquiring a semantically complete corpus sample database, and establishing an audio repository, a semantic slot and a regular expression library according to the corpus sample database; acquiring user speech; matching the user speech against the audio repository; when the matching result is consistent, determining the part of speech corresponding to a matched word according to the semantic slot, the matched word being a segmented word in the user speech that matches the audio repository; comparing the part of speech of the matched word with the regular expression library, and completing the incomplete components in the user speech according to the regular expressions in the regular expression library to obtain a completed text; and performing semantic parsing according to the completed text. By completing the incomplete components in the corpus, the invention intelligently identifies the user's true intention.

Description

A method and system for completing semantically incomplete corpus

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a method and system for completing semantically incomplete corpus.

Background technique

With the rapid development of the Internet, people's lives are becoming more and more intelligent, and people are increasingly accustomed to using intelligent terminals to meet their various needs. As artificial intelligence technologies mature, terminals of all kinds are also becoming more intelligent. Voice interaction, one of the mainstream forms of human-computer interaction on intelligent terminals, is increasingly favored by users.

An intelligent terminal recognizes the voice input by the user and then takes corresponding measures, so the accuracy of the voice that the user inputs through the terminal strongly affects the feedback given by the terminal.

Accidents may occur while a user inputs voice. For example, part of the input may be cut off by an unexpected interruption, part of the voice may not be picked up by the microphone, or the external environment may interfere, as when the environment is so noisy that only part of the voice is recognized. Voice acquired under such conditions has incomplete components, which makes it difficult to accurately identify the user's true intention.

In addition, younger students are at the beginning stage of learning, so their spoken expressions often have incomplete language components and vague intentions, which makes it difficult for speech recognition products to intelligently identify the user's true intention.

Summary of the invention

The object of the present invention is to provide a method and system for completing semantically incomplete corpus, so that the user's true intention is intelligently identified by completing the incomplete components in the corpus.

The technical solution provided by the present invention is as follows:

The present invention provides a method for completing semantically incomplete corpus, comprising:

obtaining a semantically complete corpus sample database, and establishing an audio repository, a semantic slot and a regular expression library according to the corpus sample database;

obtaining user speech;

matching the user speech against the audio repository;

when the matching result is consistent, determining the part of speech corresponding to a matched word according to the semantic slot, the matched word being a segmented word in the user speech that matches the audio repository;

comparing the part of speech of the matched word with the regular expression library, and completing the incomplete components in the user speech according to the regular expressions in the regular expression library to obtain a completed text;

performing semantic parsing according to the completed text.

Further, obtaining the semantically complete corpus sample database and establishing the audio repository, the semantic slot and the regular expression library according to the corpus sample database specifically includes:

obtaining the semantically complete corpus sample database, and segmenting the corpus samples in the corpus sample database using a word segmentation technique to obtain the segmented words contained in the corpus samples and their corresponding parts of speech;

establishing the semantic slot according to the segmented words and the parts of speech;

obtaining the audio corresponding to the segmented words, and establishing the audio repository according to the audio;

analyzing and summarizing the corpus samples to obtain regular expressions, and establishing the regular expression library according to the regular expressions.

Further, analyzing and summarizing the corpus samples to obtain regular expressions and establishing the regular expression library according to the regular expressions specifically includes:

analyzing the association relationships between the segmented words in the corpus samples;

summarizing the parts of speech and the association relationships to obtain regular expressions, and establishing the regular expression library according to the regular expressions.

Further, after obtaining the user speech and before matching the user speech against the audio repository, the method includes:

converting the user speech into recognized text, and parsing the recognized text;

when the components of the recognized text are incomplete, completing the recognized text according to the audio repository, the semantic slot and the regular expression library.

Further, comparing the part of speech of the matched word with the regular expression library and completing the incomplete components in the user speech according to the regular expressions in the regular expression library to obtain the completed text specifically includes:

determining the relative positions of all the matched words in the user speech;

determining the corresponding part-of-speech relative positions according to the relative positions;

comparing the part-of-speech relative positions with the regular expression library, and selecting a preset quantity of regular expressions whose matching ratio is greater than or equal to a preset ratio as target regular expressions;

completing the incomplete components in the user speech according to the target regular expressions to obtain the completed text.

The present invention also provides a system for completing semantically incomplete corpus, comprising:

a database building module, which obtains a semantically complete corpus sample database and establishes an audio repository, a semantic slot and a regular expression library according to the corpus sample database;

an acquisition module, which obtains user speech;

a matching module, which matches the user speech obtained by the acquisition module against the audio repository established by the database building module;

an analysis module, which, when the matching result is consistent, determines the part of speech corresponding to a matched word according to the semantic slot established by the database building module, the matched word being a segmented word in the user speech that matches the audio repository;

a processing module, which compares the part of speech of the matched word determined by the analysis module with the regular expression library established by the database building module, and completes the incomplete components in the user speech according to the regular expressions in the regular expression library to obtain a completed text;

a parsing module, which performs semantic parsing according to the completed text obtained by the processing module.

Further, the database building module specifically includes:

an acquisition unit, which obtains the semantically complete corpus sample database;

a word segmentation unit, which segments the corpus samples in the corpus sample database obtained by the acquisition unit using a word segmentation technique to obtain the segmented words contained in the corpus samples and their corresponding parts of speech;

a database building unit, which establishes the semantic slot according to the segmented words and the parts of speech obtained by the word segmentation unit;

the acquisition unit further obtains the audio corresponding to the segmented words;

the database building unit further establishes the audio repository according to the audio obtained by the acquisition unit;

an analysis unit, which analyzes and summarizes the corpus samples in the corpus sample database obtained by the acquisition unit to obtain regular expressions;

the database building unit further establishes the regular expression library according to the regular expressions obtained by the analysis unit.

Further, the analysis unit specifically includes:

an analysis subunit, which analyzes the association relationships between the segmented words in the corpus samples;

a generation subunit, which summarizes the parts of speech and the association relationships obtained by the analysis subunit to obtain regular expressions.

Further, the system also includes:

a conversion module, which converts the user speech obtained by the acquisition module into recognized text and parses the recognized text;

a completion module, which, when the conversion module's parsing finds that the components of the recognized text are incomplete, completes the recognized text according to the audio repository, the semantic slot and the regular expression library.

Further, the processing module specifically includes:

a processing unit, which determines the relative positions of all the matched words in the user speech;

the processing unit further determines the corresponding part-of-speech relative positions according to the relative positions it has determined;

a selection unit, which compares the part-of-speech relative positions determined by the processing unit with the regular expression library, and selects a preset quantity of regular expressions whose matching ratio is greater than or equal to a preset ratio as target regular expressions;

a completion unit, which completes the incomplete components in the user speech according to the target regular expressions selected by the selection unit to obtain the completed text.

The method and system for completing semantically incomplete corpus provided by the present invention can bring at least one of the following beneficial effects:

1. In the present invention, the audio repository, the semantic slot and the regular expression library are established from a semantically complete corpus sample database, so that the rules followed by semantically complete corpus are extracted and can later be used to complete semantically incomplete corpus.

2. In the present invention, the obtained user speech is first checked for incomplete components, and the semantics are completed only when the components are judged to be incomplete, which avoids unnecessary workload.

3. In the present invention, the obtained user speech is compared with the features summarized from semantically complete corpus samples (the audio repository, the semantic slot and the regular expression library), so that the incomplete components are completed with the most likely content.

Brief description of the drawings

The preferred embodiments are described below in a clear and easily understandable manner with reference to the drawings, to further explain the above characteristics, technical features, advantages and implementation of the method and system for completing semantically incomplete corpus.

Fig. 1 is a flowchart of one embodiment of the method for completing semantically incomplete corpus of the present invention;

Fig. 2 is a flowchart of a second embodiment of the method for completing semantically incomplete corpus of the present invention;

Fig. 3 is a flowchart of a third embodiment of the method for completing semantically incomplete corpus of the present invention;

Fig. 4 is a flowchart of a fourth embodiment of the method for completing semantically incomplete corpus of the present invention;

Fig. 5 is a schematic structural diagram of a fifth embodiment of the system for completing semantically incomplete corpus of the present invention;

Fig. 6 is a schematic structural diagram of a sixth embodiment of the system for completing semantically incomplete corpus of the present invention;

Fig. 7 is a schematic structural diagram of a seventh embodiment of the system for completing semantically incomplete corpus of the present invention;

Fig. 8 is a schematic structural diagram of an eighth embodiment of the system for completing semantically incomplete corpus of the present invention.

Description of reference numerals:

1000 system for completing semantically incomplete corpus

1100 database building module; 1110 acquisition unit; 1120 word segmentation unit; 1130 database building unit; 1140 analysis unit

1141 analysis subunit; 1142 generation subunit

1200 acquisition module; 1300 matching module; 1400 analysis module

1500 processing module; 1510 processing unit; 1520 selection unit; 1530 completion unit

1600 parsing module; 1700 conversion module; 1800 completion module

Specific embodiment

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, specific embodiments of the present invention are described below with reference to the drawings. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings and other embodiments from them without creative effort.

To keep the figures simple, each figure only schematically shows the parts relevant to the present invention; the figures do not represent the actual structure of the product. In addition, to keep the figures simple and easy to understand, where several components in a figure have the same structure or function, only one of them is schematically depicted or labeled. Herein, "one" means not only "only one" but also "more than one".

In the first embodiment of the present invention, as shown in Fig. 1, a method for completing semantically incomplete corpus comprises:

S100: Obtain a semantically complete corpus sample database, and establish an audio repository, a semantic slot and a regular expression library according to the corpus sample database.

Specifically, a large number of semantically complete corpus samples are collected to establish the corpus sample database, and all the corpus samples are analyzed to summarize the features of semantically complete corpus and establish the audio repository, the semantic slot and the regular expression library.

S200: Obtain user speech.

Specifically, the user speech may be speech input by the user in real time, for example where the user inputs only part of the speech, or where the system collects only part of the speech because of environmental or other factors. It may also be downloaded or recorded audio, for example a recording so noisy that only part of the speech can be recognized.

S500: Match the user speech against the audio repository.

Specifically, the obtained user speech is matched one by one against the audio in the audio repository, which was summarized from a large number of corpus samples.

S600: When the matching result is consistent, determine the part of speech corresponding to a matched word according to the semantic slot, the matched word being a segmented word in the user speech that matches the audio repository.

Specifically, when an audio item in the audio repository matches some part of the obtained user speech, the segmented word corresponding to that audio item is taken as a matched word, and the matched word is looked up in the semantic slot to determine its part of speech.
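Steps S500 and S600 can be illustrated with a minimal sketch. It assumes each audio item is reduced to a string fingerprint so that matching becomes a dictionary lookup; a real system would compare acoustic features instead. The names `AUDIO_REPOSITORY`, `SEMANTIC_SLOT` and `match_words`, and all the data, are hypothetical.

```python
from typing import Optional

# Hypothetical audio repository: fingerprint of a recording -> word it pronounces.
# Several recordings may map to the same word (different speakers or accents).
AUDIO_REPOSITORY = {
    "fp_describe_a": "describe",
    "fp_describe_b": "describe",
    "fp_autumn": "autumn",
    "fp_composition": "composition",
}

# Hypothetical semantic slot: word -> part of speech.
SEMANTIC_SLOT = {"describe": "verb", "autumn": "time word", "composition": "noun"}


def match_words(user_fragments: list[Optional[str]]) -> list[tuple[int, str, str]]:
    """Return (position, matched word, part of speech) for each recognizable fragment.

    A fragment of None stands for a stretch of speech that was lost or unintelligible.
    """
    matches = []
    for position, fingerprint in enumerate(user_fragments):
        word = AUDIO_REPOSITORY.get(fingerprint) if fingerprint else None
        if word is not None:  # matching result is consistent (S600)
            matches.append((position, word, SEMANTIC_SLOT[word]))
    return matches


heard = ["fp_describe_b", "fp_autumn", None, "fp_composition"]
print(match_words(heard))
```

The positions returned alongside each word are what a later step would need in order to work out where the missing components sit.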

S700: Compare the part of speech of the matched word with the regular expression library, and complete the incomplete components in the user speech according to the regular expressions in the regular expression library to obtain a completed text.

Specifically, the part of speech of each matched word is compared one by one with the regular expressions in the regular expression library to determine the most likely part of speech of the missing part of the user speech. The incomplete components in the user speech are then completed according to the content already present in the user speech and the inferred part of speech of the missing part, yielding the completed text.

S800: Perform semantic parsing according to the completed text.

Specifically, the completed text is parsed according to the parts of speech of its segmented words and the relationships between those parts of speech to obtain the semantics of the user speech, so that the user's true intention is identified and the corresponding feedback or measure is taken.

In this embodiment, the audio repository, the semantic slot and the regular expression library are established from a semantically complete corpus sample database, so that the rules followed by semantically complete corpus are extracted and can later be used to complete semantically incomplete corpus. The obtained user speech is then compared with the features summarized from the semantically complete corpus samples (the audio repository, the semantic slot and the regular expression library), so that the incomplete components are completed with the most likely content.

The second embodiment of the present invention is an optimized embodiment of the first embodiment and, as shown in Fig. 2, comprises:

S110: Obtain the semantically complete corpus sample database, and segment the corpus samples in it using a word segmentation technique to obtain the segmented words contained in the corpus samples and their corresponding parts of speech.

Specifically, a large number of semantically complete corpus samples are collected to establish the corpus sample database. Corpus samples are not limited to written text; they also include voice, audio and the like, the difference being that voice and audio samples must first be converted into the corresponding text before subsequent processing.

The corpus samples are segmented using the word segmentation technique: the structure of each sentence in a corpus sample is determined, the part of speech of each word in the sentence is identified, and the whole sentence is then divided by part of speech into segmented words such as characters, words and phrases. This yields the segmented words contained in the corpus samples and their corresponding parts of speech.
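The text does not name a particular word segmentation technique. As a minimal sketch, assuming a small hand-made lexicon and forward maximum matching (a common dictionary-based approach, substituted here purely for illustration), segmentation with part-of-speech tagging might look like the following. The `LEXICON` entries, the function name `segment`, and the Chinese test sentence (an assumed sentence meaning roughly "what compositions describing autumn are there") are all hypothetical.

```python
# Hypothetical toy lexicon: word -> part of speech.
LEXICON = {
    "描写": "verb",
    "秋天": "time word",
    "的": "auxiliary word",
    "作文": "noun",
    "有": "verb",
    "哪些": "pronoun",
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(sentence: str) -> list[tuple[str, str]]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    result = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink.
        for length in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in LEXICON:
                result.append((candidate, LEXICON[candidate]))
                i += length
                break
        else:
            # No lexicon word starts here: keep the character as an unknown segment.
            result.append((sentence[i], "unknown"))
            i += 1
    return result


for word, pos in segment("描写秋天的作文有哪些"):
    print(word, pos)
```

In practice a trained segmenter and tagger would replace the lexicon lookup; the sketch only shows the shape of the output, a list of (segmented word, part of speech) pairs.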

S120: Establish the semantic slot according to the segmented words and the parts of speech.

Specifically, all the segmented words contained in all the corpus samples are obtained, the semantic slot is established according to these segmented words and their parts of speech, and the correspondence between segmented words and parts of speech is stored in the semantic slot.

S130: Obtain the audio corresponding to the segmented words, and establish the audio repository according to the audio.

Specifically, the audio corresponding to each segmented word is obtained. Because of factors such as the user's age and accent, the same segmented word may correspond to multiple audio items, so as many different audio items as possible are collected for each segmented word so that user speech can later be recognized comprehensively without omissions. The audio repository is then established from all the audio, and the correspondence between segmented words and audio is stored in the audio repository.
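As a rough illustration of S120 and S130, both stores can be modeled as dictionaries. This is a sketch under the assumption that segmentation has already produced (word, part of speech) pairs and that recordings are identified by file name; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_semantic_slot(segmented_samples):
    """Map each segmented word to the set of parts of speech it was seen with."""
    slot = defaultdict(set)
    for sample in segmented_samples:
        for word, pos in sample:
            slot[word].add(pos)
    return dict(slot)


def build_audio_repository(recordings):
    """Map each word to all of its recordings; one word may have several."""
    repository = defaultdict(list)
    for word, audio_id in recordings:
        repository[word].append(audio_id)
    return dict(repository)


samples = [
    [("describe", "verb"), ("autumn", "time word")],
    [("describe", "verb"), ("composition", "noun")],
]
recordings = [
    ("autumn", "child_01.wav"),
    ("autumn", "adult_07.wav"),  # same word, different speaker
    ("describe", "adult_02.wav"),
]
print(build_semantic_slot(samples))
print(build_audio_repository(recordings))
```

Keeping a list of recordings per word reflects the point that age and accent produce several audio variants of the same word.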

S140: Analyze and summarize the corpus samples to obtain regular expressions, and establish the regular expression library according to the regular expressions.

Specifically, each corpus sample is analyzed and summarized in turn to obtain a regular expression, one per corpus sample, and the regular expression library is established from all the regular expressions. The rules whose proportion among all the regular expressions is greater than or equal to a preset ratio are identified statistically and used as the rules for completing the incomplete components of a corpus, for example the rule that, in semantically complete corpus, a segmented word of a certain part of speech should be connected to one or more other specific kinds of segmented words.

S200: Obtain user speech.

S500: Match the user speech against the audio repository.

S800: Perform semantic parsing according to the completed text.

Step S140, analyzing and summarizing the corpus samples to obtain regular expressions and establishing the regular expression library according to the regular expressions, specifically includes:

S141: Analyze the association relationships between the segmented words in the corpus samples.

Specifically, the segmented words contained in the corpus samples and their parts of speech have been obtained above using the word segmentation technique; the association relationships between the segmented words in a corpus sample are then analyzed according to the sentence constituents of the corpus sample.

For example, suppose a corpus sample is: "What compositions describing autumn are there?" In the word order of the original sentence, the word segmentation technique determines the parts of speech of the words in the corpus sample as: describe (verb), autumn (time word), structural particle (auxiliary word), composition (noun), have (verb), which (pronoun). The relationships between the words are: attributive-head relationship: composition (noun) - describe (verb); verb-object relationship: describe (verb) - autumn (time word).

S142: Summarize the parts of speech and the association relationships to obtain regular expressions, and establish the regular expression library according to the regular expressions.

Specifically, a regular expression is summarized from the parts of speech of the segmented words in a corpus sample and the association relationships between them, and the regular expression library is established from all the regular expressions. The rules whose proportion among all the regular expressions is greater than or equal to a preset ratio are identified statistically and used as the rules for completing the incomplete components of a corpus, for example the rule that, in semantically complete corpus, a segmented word of a certain part of speech should be connected to one or more other specific kinds of segmented words.

Continuing the example above, the corpus regular expression corresponding to that corpus sample is: verb # time word # auxiliary word # noun # verb # pronoun. The noun and the first verb are in an attributive-head relationship, and the first verb and the time word are in a verb-object relationship.
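The derivation of a `#`-separated part-of-speech expression, and the statistical filtering by a preset ratio, can be sketched as follows. This is an illustrative reading only: it joins parts of speech with `#` (without the surrounding spaces), treats each whole expression as one "rule", and ignores the association relationships. The function names and the sample data are hypothetical.

```python
from collections import Counter


def to_pattern(segmented_sample):
    """Join the parts of speech of a segmented sample into a '#'-separated expression."""
    return "#".join(pos for _, pos in segmented_sample)


def frequent_patterns(segmented_samples, preset_ratio):
    """Keep the expressions whose share of all samples is at least preset_ratio."""
    patterns = [to_pattern(sample) for sample in segmented_samples]
    counts = Counter(patterns)
    total = len(patterns)
    return {pattern for pattern, count in counts.items() if count / total >= preset_ratio}


autumn_sample = [
    ("describe", "verb"), ("autumn", "time word"), ("of", "auxiliary word"),
    ("composition", "noun"), ("have", "verb"), ("which", "pronoun"),
]
rare_sample = [("hello", "interjection")]

print(to_pattern(autumn_sample))
print(frequent_patterns([autumn_sample, autumn_sample, rare_sample], preset_ratio=0.5))
```

With a ratio of 0.5, the expression seen in two of the three samples is kept and the one seen only once is dropped.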

In this embodiment, the semantically complete corpus samples are segmented using a word segmentation technique to establish the audio repository, the semantic slot and the regular expression library, from which the rules followed by semantically complete corpus are statistically derived, so that semantically incomplete corpus can later be completed according to those rules.

The third embodiment of the present invention is an optimized embodiment of the first embodiment and, as shown in Fig. 3, comprises:

S200: Obtain user speech.

S300: Convert the user speech into recognized text, and parse the recognized text.

S400: When the components of the recognized text are incomplete, complete the recognized text according to the audio repository, the semantic slot and the regular expression library.

Specifically, the obtained user speech is converted into recognized text, and the recognized text is parsed to judge whether its components are incomplete. If they are, the components of the recognized text are completed according to the audio repository, the semantic slot and the regular expression library summarized from a large number of semantically complete corpus samples. If they are not, the user's true intention is identified directly from the recognized text and the corresponding feedback or measure is taken.
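The check-before-complete flow of S300 and S400 can be sketched as below. The text does not define how incompleteness is judged; this sketch assumes, purely for illustration, that a recognized text is complete when its part-of-speech sequence equals an expression in the library. `is_complete`, `understand` and the callback `complete_fn` are hypothetical names.

```python
def is_complete(pos_sequence: list[str], pattern_library: set[str]) -> bool:
    """Assumed criterion: the part-of-speech sequence equals a known expression."""
    return "#".join(pos_sequence) in pattern_library


def understand(pos_sequence, pattern_library, complete_fn):
    """Run the costlier completion step only when the input is incomplete."""
    if is_complete(pos_sequence, pattern_library):
        return pos_sequence           # parse directly, no completion needed
    return complete_fn(pos_sequence)  # incomplete: hand over to completion


library = {"verb#noun"}
print(understand(["verb", "noun"], library, lambda seq: seq + ["noun"]))
print(understand(["verb"], library, lambda seq: seq + ["noun"]))
```

Skipping the completion call for already complete input is what keeps the extra workload down.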

S500: Match the user speech against the audio repository.

S800: Perform semantic parsing according to the completed text.

In this embodiment, after the user speech is obtained, it is first checked for incomplete components, and the semantics are completed only when the components of the user speech are determined to be incomplete, which avoids unnecessary workload.

The fourth embodiment of the present invention is an optimized embodiment of the first embodiment and, as shown in Fig. 4, comprises:

S200: Obtain user speech.

S500: Match the user speech against the audio repository.

S710: Determine the relative positions of all the matched words in the user speech.

Specifically, after the user speech has been matched against the audio in the audio repository and the matched words have been obtained, the position of each matched word in the user speech and the positions of all the matched words relative to one another are determined.

S720: Determine the corresponding part-of-speech relative positions according to the relative positions.

Specifically, after the matched words have been obtained, the part of speech of each matched word is determined from the semantic slot, and the part-of-speech relative positions are then derived from the relative positions of the matched words.

S730: Compare the part-of-speech relative positions with the regular expression library, and select a preset quantity of regular expressions whose matching ratio is greater than or equal to a preset ratio as target regular expressions.

Specifically, the part-of-speech relative positions are compared one by one with the regular expressions in the regular expression library to obtain a matching ratio for each, and a preset quantity of regular expressions whose matching ratio is greater than or equal to the preset ratio are chosen as the target regular expressions. The preset quantity and the preset ratio are set automatically by the system or chosen by the user.
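The selection in S730 can be sketched as follows. The matching ratio is not defined in the text, so this sketch assumes one possible definition: if the matched parts of speech occur in the same order inside an expression (as a subsequence), the ratio is the number of matched parts of speech divided by the number of slots in the expression; otherwise it is zero. All function names and data are hypothetical.

```python
def is_subsequence(short, long):
    """True if the items of `short` appear in `long` in the same order."""
    remaining = iter(long)
    return all(item in remaining for item in short)


def matching_ratio(matched_pos, pattern):
    """Assumed definition: covered share of the expression's slots, or 0.0 if out of order."""
    slots = pattern.split("#")
    if not is_subsequence(matched_pos, slots):
        return 0.0
    return len(matched_pos) / len(slots)


def select_targets(matched_pos, library, preset_quantity, preset_ratio):
    """Pick up to preset_quantity expressions whose ratio is at least preset_ratio."""
    scored = [(matching_ratio(matched_pos, pattern), pattern) for pattern in library]
    kept = [(ratio, pattern) for ratio, pattern in scored if ratio >= preset_ratio]
    kept.sort(key=lambda item: (-item[0], item[1]))  # best ratio first, then alphabetical
    return [pattern for _, pattern in kept[:preset_quantity]]


library = [
    "verb#time word#auxiliary word#noun#verb#pronoun",
    "noun#verb",
    "pronoun#verb#noun",
]
matched = ["verb", "time word", "noun"]  # parts of speech in spoken order
print(select_targets(matched, library, preset_quantity=2, preset_ratio=0.5))
```

Here only the first expression contains the three matched parts of speech in order, covering three of its six slots, so it is the only one that reaches the 0.5 threshold.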

S740: Complete the incomplete components in the user speech according to the target regular expressions to obtain the completed text.

Specifically, the part-of-speech relative positions of the matched words are compared with the selected target regular expressions to determine the part of speech of the missing part. The most likely semantics are then inferred from the part of the user speech that could be recognized and parsed, and a word with those semantics is selected from among the words having the determined part of speech to fill the missing part, completing the incomplete components in the user speech and yielding the completed text.
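A minimal sketch of the completion in S740 follows. It aligns the matched words greedily, left to right, against the slots of one target expression and fills every uncovered slot with a candidate word of the required part of speech. Choosing the first candidate is a placeholder for the "most likely semantics" selection, which the text does not specify; `complete` and `candidates_by_pos` are hypothetical names.

```python
def complete(matched, pattern, candidates_by_pos):
    """Fill the slots of `pattern` not covered by `matched`.

    matched: list of (word, part of speech) in spoken order.
    candidates_by_pos: part of speech -> candidate words, most likely first (assumed).
    """
    slots = pattern.split("#")
    completed = []
    next_match = 0
    for pos in slots:
        if next_match < len(matched) and matched[next_match][1] == pos:
            completed.append(matched[next_match][0])   # slot covered by the speech
            next_match += 1
        else:
            completed.append(candidates_by_pos[pos][0])  # missing slot: fill it in
    if next_match != len(matched):
        raise ValueError("matched words do not fit the target expression")
    return completed


matched = [("describe", "verb"), ("autumn", "time word"), ("composition", "noun")]
pattern = "verb#time word#auxiliary word#noun#verb#pronoun"
candidates = {"auxiliary word": ["of"], "verb": ["have"], "pronoun": ["which"]}
print(complete(matched, pattern, candidates))
```

The `ValueError` branch covers the case where a selected expression cannot actually host the matched words in order, in which case the next target expression could be tried.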

S800: Perform semantic parsing according to the completed text.

In this embodiment, the user speech is processed to determine the matched words, the relative positions of the matched words and the corresponding part-of-speech relative positions are obtained, and the target regular expressions are selected by comparison, so that the incomplete components in the user speech are completed. The user's true intention can thus be identified and the corresponding feedback or measure taken.

In the fifth embodiment of the present invention, as shown in Fig. 5, a system 1000 for completing semantically incomplete corpus comprises:

A database building module 1100, which obtains a semantically complete corpus sample database and establishes an audio repository, a semantic slot and a regular expression library according to the corpus sample database.

An acquisition module 1200, which obtains user speech.

A matching module 1300, which matches the user speech obtained by the acquisition module 1200 against the audio repository established by the database building module 1100.

An analysis module 1400, which, when the matching result is consistent, determines the part of speech corresponding to a matched word according to the semantic slot established by the database building module 1100, the matched word being a segmented word in the user speech that matches the audio repository.

A processing module 1500, which compares the part of speech of the matched word determined by the analysis module 1400 with the regular expression library established by the database building module 1100, and completes the incomplete components in the user speech according to the regular expressions in the regular expression library to obtain a completed text.

A parsing module 1600, which performs semantic parsing according to the completed text obtained by the processing module 1500.

In this embodiment, the audio repository, the semantic slot and the regular expression library are established from a semantically complete corpus sample database, so that the features of semantically complete corpus are extracted and can later be used to complete semantically incomplete corpus. The obtained user speech is then compared with the features summarized from the semantically complete corpus samples (the audio repository, the semantic slot and the regular expression library), so that the incomplete components are completed with the most likely content.

The sixth embodiment of the present invention is the optimal enforcement example of above-mentioned 5th embodiment, as shown in Figure 6, comprising:

The Database module 1100 specifically includes:

Acquiring unit 1110 obtains the semantic complete corpus sample database.

Specifically, acquiring unit 1110, which collects a large amount of semantic complete corpus sample of acquisition, establishes corpus sample database, language Material sample refers not only to penman text, further includes voice, audio etc., and difference is that the corpus sample such as voice, audio needs first to turn It is melted into corresponding text information, then carries out subsequent processing.

Participle unit 1120, the language in the corpus sample database that the acquiring unit 1110 is obtained according to participle technique Material sample is segmented to obtain the participle for including in the corpus sample and corresponding part of speech.

Specifically, the word segmentation unit 1120 segments the corpus samples according to the word segmentation technique: it determines the structure of each sentence in a corpus sample, identifies the part of speech of the words in each sentence, and then, according to those parts of speech, divides the whole sentence into participles such as characters, words and phrases. The participles contained in the corpus samples and their corresponding parts of speech are thereby obtained.
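The segmentation step can be illustrated with a minimal sketch. The patent does not name a particular segmentation algorithm, so forward maximum matching against a small dictionary is used here purely as an assumption; the `LEXICON` table, the tag names and the function `segment` are all hypothetical.

```python
# Hypothetical toy lexicon: participle -> part of speech (not from the patent).
LEXICON = {"我": "pronoun", "想": "verb", "听": "verb", "音乐": "noun", "音": "noun"}


def segment(sentence, lexicon=LEXICON):
    """Forward maximum matching: at each position take the longest lexicon entry."""
    max_len = max(len(word) for word in lexicon)
    result, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            word = sentence[i:i + size]
            if word in lexicon:
                result.append((word, lexicon[word]))
                i += size
                break
        else:
            # No lexicon entry starts here: keep the character with an "unknown" tag.
            result.append((sentence[i], "unknown"))
            i += 1
    return result


tagged = segment("我想听音乐")  # list of (participle, part of speech) pairs
```

A production system would more plausibly rely on a trained segmenter and tagger; the sketch only shows the shape of the output that the later steps consume.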

Database establishing unit 1130 establishes the semantic slot according to the participles and the parts of speech obtained by the word segmentation unit 1120.

Specifically, once all participles contained in all of the corpus samples have been obtained, the database establishing unit 1130 establishes the corresponding semantic slot according to all participles and their parts of speech, and records in the semantic slot the correspondence between each participle and its part of speech.
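One simple way to picture the semantic slot is as a lookup table from participle to part of speech. The sketch below assumes that representation (the patent does not specify a data structure) and allows a participle to carry more than one part of speech; `build_semantic_slot` is a hypothetical name.

```python
from collections import defaultdict


def build_semantic_slot(tagged_samples):
    """Collect, for every participle, the parts of speech seen in the samples.

    tagged_samples: list of samples, each a list of (participle, part_of_speech).
    """
    slot = defaultdict(set)
    for sample in tagged_samples:
        for participle, pos in sample:
            slot[participle].add(pos)
    return dict(slot)


semantic_slot = build_semantic_slot([
    [("I", "pronoun"), ("play", "verb"), ("music", "noun")],
    [("the", "article"), ("play", "noun"), ("ended", "verb")],
])
```

Here `semantic_slot["play"]` holds both `"verb"` and `"noun"`, which is the kind of ambiguity a later step would have to resolve from context.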

The acquiring unit 1110 acquires the audio corresponding to the participles.

The database establishing unit 1130 establishes the audio library according to the audio acquired by the acquiring unit 1110.

Specifically, the acquiring unit 1110 acquires the audio corresponding to each participle. Owing to factors such as the user's age and accent, the same participle may correspond to multiple audio recordings, so as many different recordings of the same participle as possible are acquired, allowing user voice to be recognized comprehensively later and avoiding omissions. The database establishing unit 1130 then establishes the audio library from all of the audio and records in the audio library the correspondence between participles and audio.
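The one-to-many relationship between a participle and its recordings can be sketched as follows. Audio is represented only by opaque identifiers here, which is an assumption made for illustration; the patent does not describe how audio is stored or compared, and both function names are hypothetical.

```python
from collections import defaultdict


def build_audio_library(recordings):
    """Group recordings by participle.

    recordings: iterable of (participle, audio_id) pairs; one participle may
    appear many times (different speakers, ages, accents).
    """
    library = defaultdict(list)
    for participle, audio_id in recordings:
        if audio_id not in library[participle]:
            library[participle].append(audio_id)
    return dict(library)


def build_reverse_index(library):
    """Map each audio identifier back to the participle it represents."""
    return {audio_id: participle
            for participle, audio_ids in library.items()
            for audio_id in audio_ids}


audio_library = build_audio_library([
    ("music", "music_child.wav"),
    ("music", "music_accent_a.wav"),
    ("play", "play_adult.wav"),
])
audio_to_participle = build_reverse_index(audio_library)
```

The reverse index is what a matching step would consult once a segment of user voice has been found to agree with one of the stored recordings.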

Analysis unit 1140 analyzes and summarizes the corpus samples in the corpus sample library acquired by the acquiring unit 1110 to obtain regular expressions.

Specifically, the analysis unit 1140 analyzes and summarizes the corpus samples one by one to obtain regular expressions, each corpus sample corresponding to one regular expression, and the regular expression library is established from all of the regular expressions. The rules whose proportion among all regular expressions is greater than or equal to a preset ratio are identified by statistical analysis and used as the rules for completing the incomplete components of a corpus, for example the rule that, in a semantically complete corpus, a participle of a certain part of speech should be connected with one or more other specific kinds of participle.

The analysis unit 1140 specifically includes:

Analysis subunit 1141 analyzes the association relationships between the participles in the corpus samples.

Specifically, the participles contained in the corpus samples and their corresponding parts of speech have been obtained above according to the word segmentation technique; the analysis subunit 1141 then analyzes the association relationships between the participles in the corpus samples according to the sentence constituents in the corpus samples.

Generation subunit 1142 summarizes the parts of speech and the association relationships obtained by the analysis subunit 1141 to obtain regular expressions.

The database establishing unit 1130 establishes the regular expression library according to the regular expressions obtained by the analysis unit 1140.

Specifically, the generation subunit 1142 summarizes the parts of speech of the participles in the corpus samples and the association relationships between them to obtain regular expressions, and the database establishing unit 1130 establishes the regular expression library from all of the regular expressions. The rules whose proportion among all regular expressions is greater than or equal to the preset ratio are identified by statistical analysis and used as the rules for completing the incomplete components of a corpus, for example the rule that, in a semantically complete corpus, a participle of a certain part of speech should be connected with one or more other specific kinds of participle.
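The statistical filtering described above can be sketched under a simplifying assumption: each corpus sample is reduced to the sequence of its parts of speech, and that sequence, written as a space-separated string, serves as its regular expression. The encoding and the names `pos_pattern` and `frequent_patterns` are hypothetical and stand in for whatever rule format the system actually uses.

```python
import re
from collections import Counter


def pos_pattern(tagged_sample):
    """Turn one tagged sample into a regular expression over part-of-speech tags."""
    return " ".join(re.escape(pos) for _, pos in tagged_sample)


def frequent_patterns(tagged_samples, preset_ratio):
    """Keep the patterns whose share of all samples is >= preset_ratio."""
    patterns = [pos_pattern(sample) for sample in tagged_samples]
    counts = Counter(patterns)
    total = len(patterns)
    return [pattern for pattern, n in counts.items() if n / total >= preset_ratio]


samples = [
    [("I", "pronoun"), ("play", "verb"), ("music", "noun")],
    [("you", "pronoun"), ("read", "verb"), ("books", "noun")],
    [("we", "pronoun"), ("sing", "verb"), ("songs", "noun")],
    [("play", "verb"), ("music", "noun")],
]
rules = frequent_patterns(samples, preset_ratio=0.5)
```

With these four samples, only the pronoun, verb, noun ordering reaches the 0.5 threshold, so it alone would be retained as a completion rule.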

Acquisition module 1200 acquires user voice.

The seventh embodiment of the present invention is a preferred embodiment of the fifth embodiment above and, as shown in Figure 7, comprises:

Acquisition module 1200 acquires user voice.

Conversion module 1700 converts the user voice acquired by the acquisition module 1200 into recognition text and parses the recognition text.

Completion module 1800, when the conversion module 1700 determines by parsing that the components of the recognition text are incomplete, completes the recognition text according to the audio library, the semantic slot and the regular expression library.

Specifically, the conversion module 1700 converts the acquired user voice into recognition text and parses the recognition text, and the completion module 1800 judges whether the components of the recognition text are incomplete. If they are incomplete, the components of the recognition text are completed according to the audio library, the semantic slot and the regular expression library summarized above from a large number of semantically complete corpus samples. If they are not incomplete, the user's true intention is identified directly from the recognition text so that the corresponding feedback or measure can be taken.

The eighth embodiment of the present invention is a preferred embodiment of the fifth embodiment above and, as shown in Figure 8, comprises:

Acquisition module 1200 acquires user voice.

The processing module 1500 specifically includes:

Processing unit 1510 determines the relative positions of all matched participles in the user voice.

Specifically, after the user voice is matched against the audio in the audio library and the consistently matched participles are obtained, the processing unit 1510 determines the position of each matched participle in the user voice and the relative positions of all matched participles with respect to one another.

The processing unit 1510 determines the corresponding part-of-speech relative positions according to the relative positions that it has determined.

Specifically, after the user voice is matched against the audio in the audio library and the matched participles are obtained, the part of speech corresponding to each matched participle is determined according to the semantic slot; the processing unit 1510 then derives the part-of-speech relative positions of the matched participles from their relative positions with respect to one another.
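As a rough illustration of this step, the sketch below assumes that each matched participle comes with a position in the user voice (for example a start time in seconds) and that the semantic slot maps a participle to a single part of speech. Both assumptions, and the function name `pos_relative_positions`, are hypothetical.

```python
def pos_relative_positions(matches, semantic_slot):
    """Order matched participles by position and replace each by its part of speech.

    matches: list of (participle, position) pairs found in the user voice.
    semantic_slot: dict mapping participle -> part of speech.
    Returns a list of (rank, part_of_speech), rank 0 being the earliest match.
    """
    ordered = sorted(matches, key=lambda match: match[1])
    return [(rank, semantic_slot[participle])
            for rank, (participle, _) in enumerate(ordered)]


slot = {"I": "pronoun", "music": "noun"}
positions = pos_relative_positions([("music", 2.4), ("I", 0.1)], slot)
```

The result keeps only the ordering of the parts of speech, which is what the comparison against the regular expression library operates on.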

Selecting unit 1520 compares the part-of-speech relative positions determined by the processing unit 1510 with the regular expression library and selects a preset number of regular expressions whose matching ratio is greater than or equal to a preset ratio as target regular expressions.

Specifically, the part-of-speech relative positions are compared one by one with the regular expressions in the regular expression library to obtain the matching ratio of each; the selecting unit 1520 chooses a preset number of regular expressions whose matching ratio is greater than or equal to the preset ratio as target regular expressions. The preset number and the preset ratio are set intelligently by the system or chosen by the user.
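The selection step might look like the following sketch. The patent does not define how the matching ratio is computed, so the similarity ratio of Python's `difflib.SequenceMatcher` over part-of-speech sequences is swapped in here as an assumption, and each regular expression is represented simply as a list of parts of speech. `select_target_patterns` is a hypothetical name.

```python
from difflib import SequenceMatcher


def select_target_patterns(observed_pos, pattern_library, preset_ratio, preset_quantity):
    """Return up to preset_quantity patterns whose matching ratio is >= preset_ratio.

    observed_pos: ordered parts of speech of the matched participles.
    pattern_library: list of part-of-speech sequences from complete sentences.
    """
    scored = []
    for pattern in pattern_library:
        ratio = SequenceMatcher(None, observed_pos, pattern).ratio()
        if ratio >= preset_ratio:
            scored.append((ratio, pattern))
    # Best-matching patterns first; sort is stable for equal ratios.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pattern for _, pattern in scored[:preset_quantity]]


library = [
    ["verb", "noun"],
    ["pronoun", "verb", "noun"],
    ["pronoun", "verb", "verb", "noun"],
]
targets = select_target_patterns(["pronoun", "noun"], library,
                                 preset_ratio=0.6, preset_quantity=1)
```

In this example the observed sequence lacks a verb, and the three-element pattern scores highest, so it is chosen as the single target.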

Completion unit 1530 completes the incomplete components in the user voice according to the target regular expressions selected by the selecting unit 1520 to obtain the completed text.

Specifically, the part-of-speech relative positions of the matched participles are compared with the selected target regular expressions to determine the part of speech of the incomplete part. The most probable semantics is then judged from the part of the user voice that could be recognized and parsed, and the completion unit 1530 selects, from the words having the part of speech determined for the incomplete part, a word with that semantics to complete the incomplete components in the user voice, obtaining the completed text.
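A minimal sketch of the completion step follows. It aligns the matched participles with one target pattern from left to right and fills each missing slot from a candidate table. The table `candidates`, which stands in for the semantic judgment described above, is a hypothetical simplification, as is the function name `complete`.

```python
def complete(matched, target_pattern, candidates):
    """Fill the slots of target_pattern that the user voice left empty.

    matched: ordered list of (participle, part_of_speech) from the user voice.
    target_pattern: parts of speech of a semantically complete sentence.
    candidates: dict part_of_speech -> most likely word for a missing slot.
    """
    completed, i = [], 0
    for pos in target_pattern:
        if i < len(matched) and matched[i][1] == pos:
            # The user actually said a word for this slot: keep it.
            completed.append(matched[i][0])
            i += 1
        else:
            # Slot is missing: insert a candidate, or a visible placeholder.
            completed.append(candidates.get(pos, "<" + pos + ">"))
    # Participles that did not fit the pattern are appended unchanged.
    completed.extend(word for word, _ in matched[i:])
    return completed


text = complete([("I", "pronoun"), ("music", "noun")],
                ["pronoun", "verb", "noun"],
                {"verb": "play"})
```

Here the missing verb slot is filled, giving the word sequence `I`, `play`, `music`. A real system would choose the candidate from context rather than from a fixed table.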

It should be noted that the above embodiments can be freely combined as required. The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. A method for completing a semantically incomplete corpus, characterized by comprising:

Acquiring a corpus sample library with complete semantics, and establishing an audio library, a semantic slot and a regular expression library according to the corpus sample library;

Acquiring user voice;

Matching the user voice with the audio library;

When the matching result is consistent, determining the part of speech corresponding to a matched participle according to the semantic slot, the matched participle being a participle in the user voice that matches the audio library;

Comparing the part of speech of the matched participle with the regular expression library, and completing the incomplete components in the user voice according to the regular expressions in the regular expression library to obtain a completed text;

Performing semantic analysis according to the completed text.

2. the method for the semantic incomplete corpus of completion according to claim 1, which is characterized in that the acquisition semanteme is complete Whole corpus sample database is established audio repository, semantic slot and regular expression library according to the corpus sample database and is specifically included:

Acquiring the corpus sample library with complete semantics, and segmenting the corpus samples in the corpus sample library according to a word segmentation technique to obtain the participles contained in the corpus samples and their corresponding parts of speech;

Analyzing and summarizing the corpus samples to obtain regular expressions, and establishing the regular expression library according to the regular expressions.

3. the method for the semantic incomplete corpus of completion according to claim 2, which is characterized in that the analysis institute predicate Material sample summary obtains regular expression, establishes the regular expression library according to the regular expression and specifically includes:

Analyzing the association relationships between the participles in the corpus samples;

Summarizing the parts of speech and the association relationships to obtain regular expressions, and establishing the regular expression library according to the regular expressions.

4. the method for the semantic incomplete corpus of completion according to claim 1, which is characterized in that acquisition user's language After sound, it is described the user speech and the audio repository are matched before include:

When the components of the recognition text are incomplete, completing the recognition text according to the audio library, the semantic slot and the regular expression library.

5. the method for the semantic incomplete corpus of completion according to claim 1-4, which is characterized in that it is described general The part of speech of the matching participle and the regular expression library compare, according to the regular expressions in the regular expression library Incomplete ingredient completion in the user speech is obtained completion text and specifically included by formula:

Determining the relative positions of all matched participles in the user voice;

Comparing the part-of-speech relative positions with the regular expression library, and selecting a preset number of regular expressions whose matching ratio is greater than or equal to a preset ratio as target regular expressions;

6. A system for completing a semantically incomplete corpus, characterized by comprising:

A database establishing module, which acquires a corpus sample library with complete semantics and establishes an audio library, a semantic slot and a regular expression library according to the corpus sample library;

An acquisition module, which acquires user voice;

A matching module, which matches the user voice acquired by the acquisition module with the audio library established by the database establishing module;

An analysis module, which, when the matching result is consistent, determines the part of speech corresponding to a matched participle according to the semantic slot established by the database establishing module, the matched participle being a participle in the user voice that matches the audio library;

A processing module, which compares the part of speech of the matched participle determined by the analysis module with the regular expression library established by the database establishing module, and completes the incomplete components in the user voice according to the regular expressions in the regular expression library to obtain a completed text;

7. The system for completing a semantically incomplete corpus according to claim 6, characterized in that the database establishing module specifically comprises:

An acquiring unit, which acquires the corpus sample library with complete semantics;

A word segmentation unit, which segments the corpus samples in the corpus sample library acquired by the acquiring unit according to a word segmentation technique to obtain the participles contained in the corpus samples and their corresponding parts of speech;

A database establishing unit, which establishes the semantic slot according to the participles and the parts of speech obtained by the word segmentation unit;

The acquiring unit acquires the audio corresponding to the participles;

An analysis unit, which analyzes and summarizes the corpus samples in the corpus sample library acquired by the acquiring unit to obtain regular expressions;

The database establishing unit establishes the regular expression library according to the regular expressions obtained by the analysis unit.

8. The system for completing a semantically incomplete corpus according to claim 7, characterized in that the analysis unit specifically comprises:

A generation subunit, which summarizes the parts of speech and the association relationships obtained by the analysis subunit to obtain regular expressions.

9. The system for completing a semantically incomplete corpus according to claim 6, characterized by further comprising:

A completion module, which, when the conversion module determines by parsing that the components of the recognition text are incomplete, completes the recognition text according to the audio library, the semantic slot and the regular expression library.

10. The system for completing a semantically incomplete corpus according to any one of claims 6-9, characterized in that the processing module specifically comprises:

The processing unit determines the corresponding part-of-speech relative positions according to the relative positions that it has determined;

A selecting unit, which compares the part-of-speech relative positions determined by the processing unit with the regular expression library and selects a preset number of regular expressions whose matching ratio is greater than or equal to a preset ratio as target regular expressions;

A completion unit, which completes the incomplete components in the user voice according to the target regular expressions selected by the selecting unit to obtain the completed text.