WO2011096015A1 - Recognition dictionary creation device and speech recognition device - Google Patents
Recognition dictionary creation device and speech recognition device
- Publication number
- WO2011096015A1 PCT/JP2010/000709
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- language
- unit
- reading
- text
- registered
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Definitions
- the present invention relates to a recognition dictionary creation device for creating a dictionary of vocabulary to be subjected to speech recognition and a speech recognition device using the recognition dictionary creation device.
- Patent Document 1 discloses a speech recognition apparatus that can perform speech recognition corresponding to multiple languages by simultaneously using acoustic models of a plurality of languages that are targets of speech recognition.
- In the invention of Patent Document 1, however, a multilingual acoustic model covering all of the plural languages subject to speech recognition is required, so there was a problem that the invention could not be applied to speech recognition that supports only one language, which is the common case. The invention of Patent Document 1 also requires that the language in which each recognition vocabulary item is written be specified in advance so that a reading can be assigned. In speech recognition that supports only one language, on the other hand, a reading in the recognition target language is automatically generated for registration target text to which no reading is attached, and recognition is performed with it; at that point, no reading can be assigned to text written in a language different from the recognition target language.
- The present invention has been made to solve the above-described problems. It is an object of the present invention to obtain a recognition dictionary creation device capable of creating a recognition dictionary in which the readings of vocabulary are converted into the phoneme system of the speech recognition language even when the language of the vocabulary to be registered in the recognition dictionary is unknown, and a speech recognition device using the recognition dictionary creation device.
- A recognition dictionary creation device according to the present invention includes: a language identification unit that identifies the reading language of input text to be registered; a reading assigning unit that assigns a reading to the text to be registered using phonemes of the language identified by the language identification unit; a reading conversion unit that converts the reading of the text to be registered from phonemes of the language identified by the language identification unit into phonemes of the recognition target language handled by speech recognition; and a recognition dictionary generation unit that generates a recognition dictionary in which the readings of the text to be registered converted by the reading conversion unit are registered.
- According to the present invention, the reading language of the input registration target text is identified, a reading is assigned to the registration target text with phonemes of the identified language, and a recognition dictionary is generated in which readings converted from the identified language into the recognition target language handled by speech recognition are registered. In this way, even if it is unclear in which language the text (vocabulary) to be registered in the recognition dictionary is written, there is an effect that a recognition dictionary that conforms to the phoneme system of the speech recognition language can be obtained.
- FIG. 2 is a flowchart showing the flow of the recognition dictionary creation operation by the recognition dictionary creation device of the first embodiment.
- FIG. 3 is a diagram showing an example of a correspondence table of phonemes whose pronunciations are similar in German and English.
- FIG. 4 is a flowchart showing the flow of the recognition dictionary creation operation by the recognition dictionary creation device of the first embodiment.
- FIG. 5 is a block diagram showing the configuration of a registration type speech recognition apparatus using the recognition dictionary creation apparatus according to Embodiment 2 of the present invention.
- FIG. 6 is a flowchart showing the flow of the recognition dictionary creation operation by the recognition dictionary creation device of the second embodiment.
- FIG. 1 is a block diagram showing a configuration of a registration type speech recognition apparatus using a recognition dictionary creation apparatus according to Embodiment 1 of the present invention.
- a speech recognition apparatus 100 includes a language identification unit 101, a reading imparting unit 102, a reading conversion unit 103, a recognition dictionary generation unit 104, a recognition dictionary storage unit 105, and a speech recognition unit 106.
- The language identification unit 101, the reading imparting unit 102, the reading conversion unit 103, the recognition dictionary generation unit 104, and the recognition dictionary storage unit 105 constitute the recognition dictionary creation apparatus according to the first embodiment.
- the language identification unit 101 is a component that identifies the language of a vocabulary text character string (hereinafter referred to as registration target text) to be registered in the recognition dictionary.
- The text to be registered includes vocabulary text strings whose language is difficult to specify. Examples include bibliographic data such as song titles and artist names registered in a portable music player, and place names and personal names registered in a cellular phone.
- the reading imparting unit 102 is a constituent unit that imparts a phoneme to a registration target text in the language identified by the language identifying unit 101.
- the reading conversion unit 103 is a component that converts the reading given by the reading giving unit 102 into a phoneme that is used in voice recognition performed by the voice recognition unit 106.
- The recognition dictionary generation unit 104 registers the phoneme sequence converted by the reading conversion unit 103 in the recognition dictionary of the recognition dictionary storage unit 105 as a vocabulary item that is a target of speech recognition (hereinafter referred to as a recognition target vocabulary).
- the recognition dictionary storage unit 105 is a storage unit that can be read and written by the recognition dictionary generation unit 104 and the voice recognition unit 106, and stores a recognition dictionary that registers the recognition target vocabulary generated by the recognition dictionary generation unit 104.
- the speech recognition unit 106 is a component that performs speech recognition using the recognition target vocabulary of the recognition dictionary stored in the recognition dictionary storage unit 105 and outputs a recognition result.
- The language identification unit 101, the reading imparting unit 102, the reading conversion unit 103, the recognition dictionary generation unit 104, the recognition dictionary storage unit 105, and the speech recognition unit 106 can be realized on a computer, as specific means in which hardware and software cooperate, by storing a recognition dictionary creation program according to the gist of the present invention in the computer and having the CPU execute it. Further, the storage areas used by the recognition dictionary storage unit 105 and the speech recognition unit 106 are constructed in a storage device mounted on the computer, such as a hard disk device, or in an external storage medium.
- FIG. 2 is a flowchart showing the flow of a recognition dictionary creation operation performed by the recognition dictionary creation device according to the first embodiment.
- Details of the operation performed for one registration target text by the language identification unit 101, the reading imparting unit 102, the reading conversion unit 103, and the recognition dictionary generation unit 104 will be described below.
- the language identification unit 101 starts language identification processing for a character string of a registration target text, and determines in which language the character string is described (step ST201). Specifically, it is determined which language the character string of the input registration target text corresponds to among a plurality of languages set in the language identification unit 101.
- For example, when six European languages (English, German, French, Italian, Spanish, and Dutch) are set as language identification targets in the language identification unit 101 and the character string of the input registration target text is "Guten Morgen", the language identification unit 101 outputs a language identification result indicating that the language of the character string is German. When the language cannot be identified because language identification fails, the language identification unit 101 outputs, as the identification result, a language that the speech recognition unit 106 can accept as a speech recognition target.
- For language identification by the language identification unit 101, for example, character N-grams are used. The N-gram itself is an analysis model for language information established by Claude Elwood Shannon, and is used in language models for speech recognition, full-text search, and the like. General methods of using N-grams are described in Reference 1 below. (Reference 1) C. E. Shannon, "A Mathematical Theory of Communication", The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October, 1948.
- The language identification unit 101 has, for each language that is a language identification target, a text string for learning written in that language, and obtains the appearance probability of each chain of three characters appearing in the learning character string. For example, the character string "MITSUBISHI" can be decomposed into the chains "$$M", "$MI", "MIT", "ITS", "TSU", "SUB", "UBI", "BIS", "ISH", "SHI", "HI$", and "I$$". Note that "$" is a character representing the beginning or end of a word.
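The decomposition described above can be sketched as follows; a minimal illustration with "$" padding, where the function name is ours, not the patent's:

```python
def char_trigrams(word: str) -> list[str]:
    # Pad with "$" so word-initial and word-final chains are captured,
    # as in the "MITSUBISHI" example above.
    padded = "$$" + word + "$$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(char_trigrams("MITSUBISHI"))
# ['$$M', '$MI', 'MIT', 'ITS', 'TSU', 'SUB', 'UBI', 'BIS', 'ISH', 'SHI', 'HI$', 'I$$']
```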
- Of these chains, the top n are adopted as the model, and the character chains and their appearance frequencies are stored in the language identification model of each language.
- For example, the language identification model of language i stores a character chain (trigram) such as "$$M" together with its appearance probability Pi($, $, M).
- For the character string of the registration target text, the language identification unit 101 obtains the chain probability of its character chains (trigrams) for each language using the language identification model of that language, and adopts the language with the largest chain probability value as the language identification result. That is, the language i with the maximum chain probability Pi is the language identification result. When a trigram is not stored in the language identification model, the language identification unit 101 performs the calculation by assigning a predetermined constant probability as its appearance probability.
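The chain-probability scoring and the constant fallback for unseen trigrams can be sketched as follows. The toy learning corpora, the floor value, and the function names are illustrative assumptions, not data from the patent:

```python
import math
from collections import Counter

FLOOR = 1e-6  # predetermined constant probability for unseen trigrams

def char_trigrams(text):
    padded = "$$" + text.upper() + "$$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train_model(words):
    # Appearance probability of each three-character chain in the learning strings.
    counts = Counter(t for w in words for t in char_trigrams(w))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def score(model, text):
    # Log chain probability of the text under one language's model.
    return sum(math.log(model.get(t, FLOOR)) for t in char_trigrams(text))

def identify(models, text):
    # Adopt the language whose model gives the largest chain probability.
    return max(models, key=lambda lang: score(models[lang], text))

models = {
    "de": train_model(["GUTEN", "MORGEN", "STRASSE", "SCHOEN"]),  # toy corpora
    "en": train_model(["GOOD", "MORNING", "STREET", "BEAUTIFUL"]),
}
print(identify(models, "Guten Morgen"))  # "de"
```

Real models would be estimated from much larger learning character strings per language and keep only the top-n chains, as described above.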
- Characters that are described in common across the multiple languages subject to language identification and that do not contribute to identification (for example, numbers, parentheses, and periods) may be replaced in advance with a special character before the N-gram is obtained; for example, special characters such as "#" and "@" are used. Conversely, for a character whose use is limited to certain languages (for example, a character with an umlaut), the language with the highest likelihood among the languages in which that character is used may be output as the identification result.
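A minimal sketch of this replacement, assuming "#" as the special character; the exact set of characters replaced is an illustrative choice:

```python
import re

def normalize_for_langid(text: str) -> str:
    # Replace digits and punctuation shared across languages with the special
    # character "#" before trigrams are computed, as described above.
    return re.sub(r"[0-9().,]", "#", text)

print(normalize_for_langid("Disc 1 (2010)."))  # -> "Disc # #######"
```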
- Here, a task means the application in which the recognition target vocabulary is used, such as music search or address recognition.
- the language identification unit 101 includes a learning character string for each task, and uses a learning character string corresponding to a task in which the registration target text is used for language identification.
- Next, the reading assigning unit 102 determines which of the plurality of languages set in the speech recognition apparatus 100 the identification result corresponds to (step ST202), and assigns a reading to the input character string of the registration target text using phonemes of the language of the determination result (step ST203). If the language of the determination result is the language currently targeted by the speech recognition unit 106, the reading is assigned with phonemes of this recognition target language. Likewise, even if the language of the determination result is any of languages 1, 2, ..., N other than the recognition target language, the reading is assigned with phonemes of that language, as shown in FIG. 2. For assigning phonemes, G2P (Grapheme to Phoneme) conversion, for example, is used.
- language-dependent processing such as abbreviation determination and symbol processing is also performed.
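A toy sketch of G2P-style reading assignment: look up whole words in a per-language pronunciation lexicon and fall back to naive letter-to-phoneme rules for unknown words. The lexicon entries and rules are illustrative assumptions, not data from the patent; production G2P uses trained models or full rule sets:

```python
# Hypothetical German lexicon and rules in X-SAMPA-like notation.
GERMAN_LEXICON = {"guten": "g u: t @ n", "morgen": "m O 6 g @ n"}
GERMAN_LETTER_RULES = {"a": "a", "e": "@", "g": "g", "m": "m", "n": "n",
                       "o": "O", "r": "r", "t": "t", "u": "u"}

def g2p(word: str) -> str:
    word = word.lower()
    if word in GERMAN_LEXICON:
        return GERMAN_LEXICON[word]
    # Fallback: one phoneme per letter (a crude stand-in for rule-based G2P).
    return " ".join(GERMAN_LETTER_RULES.get(ch, ch) for ch in word)

print(g2p("Guten"))  # "g u: t @ n" (lexicon hit)
print(g2p("Gut"))    # "g u t" (letter-rule fallback)
```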
- Next, for registration target text that has been given a reading in phonemes of a language other than the recognition target language, the reading conversion unit 103 converts the phoneme reading in that language into a phoneme reading in the recognition target language (step ST204). The reason for converting the phoneme system in this way is that the only phoneme system the speech recognition unit 106 can accept is that of the recognition target language, and some phonemes of a different language's phoneme system cannot be accepted. As a phoneme (reading) conversion method, for example, the reading conversion unit 103 prepares in advance a correspondence table that maps phonemes or phoneme sequences of other languages that the speech recognition unit 106 cannot accept to phonemes or phoneme sequences of the recognition target language, and performs reading conversion (phoneme mapping) on the reading of the text character string obtained in step ST203 in accordance with this correspondence table.
- FIG. 3 is a diagram showing an example of the correspondence table as described above, and shows the correspondence between German and English.
- For example, the German pronunciations /a/ (an unrounded open front vowel) and /Y/ (a rounded near-close front vowel) do not exist in the British English pronunciation system. For this reason, when the speech recognition unit 106 accepts British English, it cannot handle such readings. Therefore, as shown in the correspondence table of FIG. 3, the German phonemes /a/ and /Y/ are each associated with an acoustically similar English phoneme. The pronunciation notation here uses X-SAMPA notation. This correspondence table may associate linguistically similar phonemes, but the correspondences may also be determined based on, for example, which phoneme notation makes the pronunciation of each language easy to recognize.
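The table-driven phoneme mapping can be sketched as follows. The specific German-to-English correspondences and the accepted phoneme set below are illustrative assumptions (the patent's actual pairs are those in FIG. 3):

```python
# Hypothetical correspondence table in X-SAMPA-like notation: phonemes of the
# identified language that the recognizer cannot accept -> target phonemes.
DE_TO_EN = {"a": "{", "Y": "U", "C": "k"}

def map_reading(phonemes: list[str], table: dict[str, str],
                accepted: set[str]) -> list[str]:
    out = []
    for p in phonemes:
        if p in accepted:
            out.append(p)                 # already a recognition-target phoneme
        else:
            out.append(table.get(p, p))   # rewrite via the correspondence table
    return out

EN_PHONEMES = {"{", "U", "k", "g", "t", "n", "m"}  # assumed accepted set
print(map_reading(["g", "a", "t", "Y"], DE_TO_EN, EN_PHONEMES))
# ['g', '{', 't', 'U']
```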
- Next, the recognition dictionary generation unit 104 takes as input the phonemes assigned to the character string of the registration target text by the reading assigning unit 102 in step ST203, or the phonemes converted by the reading conversion unit 103 in step ST204, and generates a recognition dictionary in a format that the speech recognition unit 106 can reference (step ST205). For example, in addition to converting the recognition vocabulary into binary data, the recognition dictionary is obtained by performing morphological analysis and word division as necessary to create language constraints. When there are a plurality of vocabulary items as registration target texts, the above processing is repeated for each registration target text. Note that the recognition dictionary may also be generated collectively after readings have been assigned to the vocabulary of all the registration target texts, rather than by additional registration for each vocabulary item.
- the recognition dictionary generated by the recognition dictionary generation unit 104 is stored in the recognition dictionary storage unit 105.
- the speech recognition unit 106 performs speech recognition of the input speech with reference to the recognition vocabulary and grammar described in the recognition dictionary stored in the recognition dictionary storage unit 105, and outputs a recognition result.
- the speech recognition unit 106 reads a recognition dictionary described with phonemes of a phoneme system in a specific language, and recognizes input speech in a specific language.
- a speech recognition algorithm for example, HMM (Hidden Markov Model), DP (Dynamic Programming) matching or the like is used.
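As a sketch of the DP matching mentioned above, dynamic time warping between an input feature sequence and stored templates can look like the following. Real recognizers compare multi-dimensional acoustic feature vectors; scalar features and the toy templates here are assumptions to keep the sketch short:

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    # Classic dynamic-programming alignment cost between two sequences.
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# The template with the smallest DTW distance to the input is the result.
templates = {"yes": [1.0, 3.0, 2.0], "no": [5.0, 4.0, 4.0]}
observed = [1.1, 2.9, 2.1]
print(min(templates, key=lambda w: dtw_distance(observed, templates[w])))  # "yes"
```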
- FIG. 4 is a flowchart showing the flow of a recognition dictionary creation operation by the recognition dictionary creation device of the first embodiment, and shows a case where N languages are identified by the language identification unit 101.
- First, the language identification unit 101 starts language identification processing for the character string of the registration target text, determines in which language the character string may be described, and sets the N most likely languages as the language identification results (step ST301). The language corresponding to the first identification result is then set in the reading assigning unit 102.
- Step ST302 is the same processing as step ST202 shown in FIG. 2, step ST303 is the same as step ST203, step ST304 is the same as step ST204, and step ST305 is the same as step ST205.
- In step ST306, the language identification unit 101 increments the counter i by 1 and repeats the above series of processes for the language of the next identification result. When the language identification unit 101 determines in step ST307, based on the count value of the counter i, that the above series of processing has been completed for all the identified languages (i ≧ N + 1), the registration process for the input registration target text is terminated.
- As a result, even if one registration target text is described in a plurality of languages, those languages are identified, readings are assigned with their phonemes, the readings are then converted into phoneme readings of the recognition target language, and the text can be registered in the recognition dictionary as recognition vocabulary. Therefore, even if the user utters the text character string in any of the languages identified by the language identification unit 101, speech recognition can be performed using the corresponding recognition vocabulary registered in the recognition dictionary. When there are a plurality of vocabulary items as registration target texts, the above-described processing is repeated for each registration target text, as in the case where one language is obtained as the identification result. The recognition dictionary may also be generated not by additionally registering each language obtained as a language identification result for one registration target text, but by additionally registering all the languages identified for the vocabulary of one registration target text at once; alternatively, generation may be performed collectively after readings have been assigned to the vocabulary of all the registration target texts.
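The per-language registration loop described above can be sketched end to end. The identifier, G2P, and mapping below are toy stand-ins for the language identification unit 101, reading assigning unit 102, and reading conversion unit 103; only the loop structure follows the flow of steps ST301 to ST306:

```python
def register_text(text, identify_top_n, g2p, to_target, dictionary):
    for lang in identify_top_n(text):      # loop over the N identification results
        reading = g2p(lang, text)          # step ST303: phonemes of language i
        reading = to_target(lang, reading) # step ST304: convert to target phonemes
        dictionary.setdefault(text, []).append((lang, reading))  # step ST305

dictionary = {}
register_text(
    "Guten Morgen",
    identify_top_n=lambda t: ["de", "nl"],                    # toy identifier
    g2p=lambda lang, t: [f"{lang}:{c}" for c in t if c.isalpha()],  # toy G2P
    to_target=lambda lang, ph: [p.split(":")[1] for p in ph], # toy mapping
    dictionary=dictionary,
)
print(len(dictionary["Guten Morgen"]))  # 2 readings, one per identified language
```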
- As described above, according to the first embodiment, the reading language of the input registration target text is identified, a reading is assigned to the registration target text with phonemes of the identified language, and a recognition dictionary is generated in which readings obtained by converting the readings of the identified language into the recognition target language handled by speech recognition are registered.
- That is, the language identification model using N-grams identifies the language of the text to be registered, phonemes of the identified language are assigned, and these are converted into phonemes of a language that the speech recognition can accept, so that the text can be registered as recognition vocabulary to be referenced in speech recognition.
- Since the registration target text may correspond to a plurality of languages, readings may be assigned in each of the candidate languages and registered as recognition vocabulary. In this way, speech recognition is possible regardless of the language in which the user pronounces the text.
- The language identification unit 101 may also set a score representing reliability for each language in the language identification result, compare this score with a predetermined threshold value, and output a language with high reliability as the final identification result.
- Since language identification is performed using N-grams, stable language identification performance can be obtained compared with the case where language determination is performed by preparing a word dictionary or the like for each language.
- the dictionary size can be reduced, and the amount of calculation and memory consumption can be reduced.
- Furthermore, an N-gram in which characters that do not contribute to language identification (for example, symbols such as numbers, parentheses, and periods) have been replaced with special characters is used. This reduces the size of the storage area for the language identification model and reduces the search time and memory consumption of the language identification model, making the present invention easy to apply.
- Furthermore, for a character whose use is limited to certain languages, the language is identified from among the languages in which that character is used. By doing so, the accuracy of language identification can be improved.
- Furthermore, a language identification model is created using vocabulary of the same task (the application in which the recognition target vocabulary is used) as the recognition target vocabulary. This makes it possible to improve the accuracy of language identification.
- FIG. 5 is a block diagram showing a configuration of a registration type speech recognition apparatus using the recognition dictionary creation apparatus according to Embodiment 2 of the present invention.
- The speech recognition device 100A according to the second embodiment includes a language identification preprocessing unit 107, a fixed character string storage unit (exclusion target storage unit) 108, a divided character string storage unit (division target storage unit) 109, and a conversion processing storage unit (processing content storage unit) 110.
- The language identification preprocessing unit 107 is a component that is arranged in the stage preceding the language identification unit 101 and receives the registration target text. It excludes specific characters or character strings (hereinafter referred to as fixed characters or fixed character strings) from the language identification targets, or divides the registration target text with predetermined characters or character strings (hereinafter referred to as divided characters or divided character strings) as the reference.
- the fixed character string storage unit 108 is a storage unit that stores fixed characters or fixed character strings to be excluded from language identification targets, their description languages, and readings.
- the divided character string storage unit 109 is a storage unit that stores a divided character or a divided character string, a description language, and a reading as a division position when dividing the registration target text.
- the conversion processing storage unit 110 is a storage unit that stores the contents of preprocessing (character string exclusion and division) performed on the registration target text by the language identification preprocessing unit 107.
- The language identification unit 101 identifies the language of the character string of the registration target text that has been preprocessed by the language identification preprocessing unit 107. Further, when generating the recognition dictionary, the recognition dictionary generation unit 104 uses the preprocessing contents stored in the conversion processing storage unit 110, that is, the connection relationships between the divided parts of the registration target text and the character strings excluded from the registration target text together with their readings, to generate a recognition dictionary suitable for the character string of the registration target text.
- The language identification preprocessing unit 107, the fixed character string storage unit 108, the divided character string storage unit 109, and the conversion processing storage unit 110 can be realized on a computer, as specific means in which hardware and software cooperate, by storing a recognition dictionary creation program according to the gist of the present invention in the computer and having the CPU execute it.
- the storage areas used by the fixed character string storage unit 108, the divided character string storage unit 109, and the conversion processing storage unit 110 are constructed in a storage device mounted on the computer, such as a hard disk device or an external storage medium.
- FIG. 6 is a flowchart showing the flow of a recognition dictionary creation operation performed by the recognition dictionary creation device according to the second embodiment.
- Details of the operation performed by the language identification preprocessing unit 107, the language identification unit 101, the reading assigning unit 102, the reading conversion unit 103, and the recognition dictionary generation unit 104 will be described below.
- First, the language identification preprocessing unit 107 refers to the stored contents of the fixed character string storage unit 108, detects any fixed character or fixed character string included in the character string of the registration target text, and excludes it from the language identification targets (step ST501).
- the fixed character or the fixed character string includes a description in a specific language that appears in common in a plurality of languages.
- Next, the language identification preprocessing unit 107 refers to the stored contents of the divided character string storage unit 109 to detect a divided character or divided character string included in the character string of the registration target text, and divides the character string of the registration target text with that divided character or divided character string as the reference (step ST502). Examples of divided characters or divided character strings include "(", ")", and similar delimiter characters that separate parts of the description of the registration target text. For example, for the registration target text "Je vivrai sans toi (I Will Say Goodbye)", the language identification preprocessing unit 107 detects the parentheses "(" and ")" included in the character string and divides the character string of the registration target text based on these characters. As a result, it is divided into the two character strings "Je vivrai sans toi" and "I Will Say Goodbye".
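The preprocessing of steps ST501 and ST502 can be sketched as follows: strip fixed character strings that appear across languages, then split the remaining text at divided characters. The fixed-string list and delimiter set below are illustrative assumptions:

```python
import re

FIXED_STRINGS = ["Disc 1", "Best of"]  # assumed exclusion targets (step ST501)
SPLIT_PATTERN = r"[()\-]"              # assumed divided characters (step ST502)

def preprocess(text: str):
    # Exclude fixed character strings from the language identification targets.
    excluded = [s for s in FIXED_STRINGS if s in text]
    for s in excluded:
        text = text.replace(s, " ")
    # Divide the remaining text at the divided characters.
    parts = [p.strip() for p in re.split(SPLIT_PATTERN, text) if p.strip()]
    return parts, excluded

parts, excluded = preprocess("Je vivrai sans toi (I Will Say Goodbye)")
print(parts)     # ['Je vivrai sans toi', 'I Will Say Goodbye']
print(excluded)  # []
```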
- Further, the language identification preprocessing unit 107 refers to the contents stored in the divided character string storage unit 109 to identify character string portions, such as uppercase character strings (spelled-out letter sequences) and numbers included in the character string of the registration target text, that are read out in the recognition target language regardless of the written language, and excludes those character strings from the registration target text and divides it (step ST503). Since the language of such a character string portion is not specified as a language identification result and it is treated as the recognition target language, a reading in the recognition target language is assigned to it. A reading in the recognition target language may be assigned in advance to such character string portions (uppercase character strings (spellings), numeric character strings, and the like) and stored in the divided character string storage unit 109, while the reading assigning unit 102 assigns readings to the character strings before and after the character string portion in the language identified by the language identification unit 101.
- Alternatively, two types of readings, one in the recognition target language and one in the language of the identification result, may be assigned to a character string portion such as an uppercase character string (spelling) or a number. In this way, regardless of the language describing the parts of the registration target text other than this character string portion (the language of the identification result), a character string portion uttered in the recognition target language can be read correctly.
- Subsequently, the language identification preprocessing unit 107 stores the contents of the preprocessing from step ST501 to step ST503 in the conversion processing storage unit 110. The stored preprocessing contents include the fixed characters or fixed character strings excluded from the registration target text together with their description language and readings in the phonemes of that language, the divided characters or divided character strings serving as the division positions of the registration target text, and the connection relationships between the divided portions.
- Next, the language identification unit 101 starts language identification processing for the character string of the j-th divided portion input from the language identification preprocessing unit 107, in the same procedure as in the first embodiment, and takes the top N languages likely to be the language of the character string (the N languages with the highest likelihood) as the language identification results (step ST506).
- Step ST508 is the same processing as step ST202 shown in FIG. 2, step ST509 is the same as step ST203, and step ST510 is the same as step ST204.
- in step ST511, the language identification unit 101 increments the counter i by 1 and repeats the series of processes for the language of the next identification result.
- in step ST512, when it is determined that the series of processes has been completed for all languages of the identification result (i ≥ N + 1), the counter j is incremented by 1 (step ST513).
- in step ST514, the series of processes from step ST505 to step ST514 is repeated for each of the divided-portion character strings until the language identification unit 101 determines, based on the count value of the counter j, that the processing has been completed for all divided-portion character strings (j ≥ K).
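The nested loops above, with counter j running over divided portions (steps ST505 to ST514) and counter i over the identified languages of each portion (steps ST508 to ST511), can be sketched as follows. The function names and the toy stand-ins for the identification, reading assignment, and reading conversion units are illustrative assumptions, not the patent's actual interfaces.

```python
def build_entries(divided_portions, identify, assign_reading, convert_reading):
    """Loop over each divided portion (counter j) and, within it, over the
    top-N identified languages (counter i); all callables are assumed stubs."""
    entries = []
    for j, portion in enumerate(divided_portions, start=1):
        languages = identify(portion)                       # top-N result (cf. ST506)
        for i, lang in enumerate(languages, start=1):
            reading = assign_reading(portion, lang)         # cf. ST202/ST509
            entries.append(convert_reading(reading, lang))  # cf. ST204/ST510
    return entries

# Toy stand-ins for the real units, for illustration only
identify = lambda p: ["de", "en"]
assign = lambda p, lang: f"{lang}/{p}"
convert = lambda r, lang: r.upper()
print(build_entries(["Hamburg"], identify, assign, convert))  # ['DE/HAMBURG', 'EN/HAMBURG']
```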
- the recognition dictionary generation unit 104 refers to the preprocessing contents stored in the conversion processing storage unit 110, identifies the readings for the character strings that were excluded from the character string of the registration target text and thus excluded from language identification, combines those readings with the readings assigned to each divided character string input from the reading conversion unit 103, and generates a recognition dictionary in a format that can be referred to by the speech recognition unit 106 (step ST515).
- the recognition dictionary is created by performing morphological analysis and word division as necessary to create language constraints.
- as described above, preprocessing is performed on the registration target text by a language identification preprocessing unit 107, and the contents of the preprocessing by the language identification preprocessing unit 107 are stored in a conversion processing storage unit 110; based on the stored contents of the conversion processing storage unit 110, the recognition dictionary generation unit 104 obtains the phonemes representing the reading of the registration target text preprocessed by the language identification preprocessing unit 107 and generates a recognition dictionary in which those phonemes are registered.
- as a result, it becomes possible to correctly identify the language of each divided portion of the registration target text.
- words / phrases in a specific language used in a plurality of languages are stored in advance as specific character strings and excluded from language identification targets, so that phrases of a language used regardless of language (for example, Even when the music album title includes “Disc 1”, “Best of”, etc., the language of the character string part other than the phrase is used for language identification to correctly identify the language of the reading of each character string part. Is possible.
- by dividing the portion to be read depending on the recognition target language it is possible to correctly read the portion as well.
- a recognition dictionary in the phoneme system of speech recognition can be created from a vocabulary whose description language is unknown, portable music that handles data in which a vocabulary of a plurality of languages is mixed is handled. It is suitable for voice recognition devices such as players, mobile phones, and in-vehicle navigation systems.
Abstract
Description
In addition, the invention of Patent Document 1 requires that the language in which the recognition vocabulary is written be specified in advance and that readings be assigned beforehand.
On the other hand, in speech recognition that supports only a single language, recognition is performed by automatically generating a reading in the recognition target language for registration target text to which no reading has been assigned. In this case, no reading can be assigned to text written in a language other than the recognition target language.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a registration-type speech recognition device using a recognition dictionary creation device according to Embodiment 1 of the present invention. In FIG. 1, a speech recognition device 100 according to Embodiment 1 includes a language identification unit 101, a reading assignment unit 102, a reading conversion unit 103, a recognition dictionary generation unit 104, a recognition dictionary storage unit 105, and a speech recognition unit 106. Of these components, the language identification unit 101, the reading assignment unit 102, the reading conversion unit 103, the recognition dictionary generation unit 104, and the recognition dictionary storage unit 105 constitute the recognition dictionary creation device according to Embodiment 1.
FIG. 2 is a flowchart showing the flow of the recognition dictionary creation operation by the recognition dictionary creation device of Embodiment 1; the details of the operations of the language identification unit 101, the reading assignment unit 102, the reading conversion unit 103, and the recognition dictionary generation unit 104 for one registration target text are described below.
First, the language identification unit 101 starts language identification processing on the character string of the registration target text and determines in which language the character string is written (step ST201). Specifically, it is determined to which of the plural languages set in the language identification unit 101 the character string of the input registration target text corresponds.
For example, when the six European languages English, German, French, Italian, Spanish, and Dutch are set as language identification targets in the language identification unit 101 and the input registration target text is "Guten Morgen", the language identification unit 101 outputs a language identification result indicating that the language of the character string is German.
When the language cannot be identified, for example because identification fails, the language identification unit 101 outputs, as the identification result, a language that the speech recognition unit 106 can accept as a target of speech recognition.
(Reference 1) "A Mathematical Theory of Communication", C. E. Shannon, The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October, 1948.
The language identification unit 101 holds training text strings written in each language subject to language identification, and obtains the occurrence probability of each three-character chain appearing in the training strings. For example, the character string "MITSUBISHI" can be decomposed into the chains "$$M", "$MI", "MIT", "ITS", "TSU", "SUB", "UBI", "BIS", "ISH", "SHI", "HI$", and "I$$". Here, "$" denotes a character representing the beginning or end of a word.
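The trigram decomposition and likelihood scoring described above can be sketched as follows. The helper names (`trigrams`, `train_model`, `log_likelihood`) and the smoothing floor for unseen chains are illustrative assumptions, not the patent's implementation.

```python
import math
from collections import Counter

def trigrams(word):
    """Decompose a word into character trigram chains, padding with '$'
    to mark the beginning and end of the word."""
    padded = "$$" + word + "$$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train_model(training_text):
    """Estimate trigram occurrence probabilities from a training string."""
    counts = Counter(trigrams(training_text))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def log_likelihood(word, model, floor=1e-6):
    """Score a word against one language's model by summing the log chain
    probabilities; unseen chains fall back to a small floor probability."""
    return sum(math.log(model.get(g, floor)) for g in trigrams(word))

print(trigrams("MITSUBISHI")[:3])  # ['$$M', '$MI', 'MIT']
```

In use, the language with the highest `log_likelihood` over its model would be taken as the identification result.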
When obtaining the N-grams, characters that are written in common across the plural languages subject to language identification and do not contribute to language identification (for example, digits and symbols such as parentheses and periods) may be replaced in advance with characters representing special characters before the N-grams are obtained. For example, special characters such as "#" and "@" are used.
Note that the language identification accuracy can be improved by training the text strings used for N-gram learning (the text strings used for the language identification model) on vocabulary from the same task as the recognition target vocabulary. A task here means processing in which the recognition target vocabulary is used, for example music search or address recognition. The language identification unit 101 holds a training string for each task, and uses for language identification the training string corresponding to the task in which the registration target text is used.
For assigning phonemes, for example, G2P (Grapheme to Phoneme) conversion is used. In this reading assignment processing, language-dependent processing such as determining abbreviations and handling symbols is also performed.
The reason for converting the phoneme system in this way is that the only phoneme system the speech recognition unit 106 can accept is that of the recognition target language, and the phoneme systems of other languages contain phonemes that cannot be accepted.
As such a phoneme (reading) conversion method, for example, the reading conversion unit 103 prepares in advance a correspondence table that maps each phoneme or phoneme sequence of a language the speech recognition unit 106 cannot accept to the closest phoneme or phoneme sequence in the recognition target language, and converts the reading of the text string obtained in step ST203 according to this correspondence table (phoneme mapping).
For German pronunciations such as /a/ and /Y/, as in the correspondence table shown in FIG. 3, they are associated with the phonetically closest phonemes existing in the British English that the speech recognition unit 106 can accept, for example /{/ (a near-open front unrounded vowel) and /}/ (a rounded front close-mid vowel). The phonetic notation used here is X-SAMPA.
This correspondence table may associate phonemes that are linguistically close, but the correspondences may also be determined based on, for example, which phoneme notation makes each language's pronunciation easiest to recognize.
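The correspondence-table conversion can be sketched as below. The pairs /a/ → /{/ and /Y/ → /}/ follow the FIG. 3 example described above; the function name, the pass-through behavior for already-acceptable phonemes, and the input reading are illustrative assumptions.

```python
# Correspondence table from phonemes of a non-target language to the closest
# phonemes of the recognition target language, in X-SAMPA notation.
# Only the two pairs named in the text are given; a real table would be larger.
PHONEME_MAP = {
    "a": "{",   # German open vowel -> nearest accepted British English vowel
    "Y": "}",   # German rounded front vowel -> nearest accepted phoneme
}

def convert_reading(phonemes, mapping=PHONEME_MAP):
    """Replace each phoneme the recognizer cannot accept with its mapped
    equivalent; phonemes already in the target phoneme set pass through."""
    return [mapping.get(p, p) for p in phonemes]

# Assumed phoneme sequence for illustration
print(convert_reading(["h", "a", "m", "b", "U", "r", "k"]))  # ['h', '{', 'm', 'b', 'U', 'r', 'k']
```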
When there are multiple vocabulary items as registration target texts, the above processing is repeated for each registration target text. The recognition dictionary may be generated not by additionally registering one vocabulary item at a time, but collectively after readings have been assigned to the vocabulary of all registration target texts.
The recognition dictionary generated by the recognition dictionary generation unit 104 is stored in the recognition dictionary storage unit 105.
FIG. 4 is a flowchart showing the flow of the recognition dictionary creation operation by the recognition dictionary creation device of Embodiment 1, for the case where N languages are identified by the language identification unit 101.
The language identification unit 101 starts language identification processing on the character string of the registration target text, determines in which language the character string is written, and takes the top N languages most likely to be the language of the character string as the language identification result (step ST301).
Here, N may be a fixed value; alternatively, the language identification unit 101 may output a score representing confidence, and N may be the number of candidates whose score is at or above a predetermined confidence threshold, or the number of candidates within a fixed confidence margin of the top candidate.
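The candidate selection just described might be sketched as follows; the function name and the numeric threshold and margin values are illustrative assumptions.

```python
def top_n_languages(scores, threshold=0.2, margin=0.15):
    """Select identification-result languages from per-language confidence
    scores: keep candidates scoring at or above `threshold` and within
    `margin` of the best candidate. Returns [] when even the best candidate
    falls below the threshold (the caller would then fall back to the
    recognition target language)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < threshold:
        return []
    best = ranked[0][1]
    return [lang for lang, s in ranked if s >= threshold and best - s <= margin]

scores = {"de": 0.6, "en": 0.5, "fr": 0.1}
print(top_n_languages(scores))  # ['de', 'en']
```

With these toy scores the result mirrors the "Hamburg" example, where German and English are both retained (N = 2).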
For example, when "Hamburg" is input as the registration target text, the language identification unit 101 outputs a result indicating that the languages identified from this text are German and English (N = 2). When the language cannot be identified, for example because identification fails, or when the confidence score is below the threshold, the language identification unit 101 outputs a language acceptable to the speech recognition unit 106 (the recognition target language) as the language identification result.
The recognition dictionary may be generated not by additionally registering each language obtained as the language identification result for one registration target text, but by additionally registering together all languages identified for the vocabulary of one registration target text. Alternatively, the generation may be performed collectively after readings have been assigned to the vocabulary of all registration target texts.
In this way, even when the language of the registration target text is unknown (for example, bibliographic data such as titles and artist names of songs registered in a portable music player, or place names and personal names registered in a cellular phone), the language of the registration target text can be identified with a language identification model using N-grams, phonemes can be assigned in the identified language, and those phonemes can be converted into phonemes of a language acceptable to speech recognition, so that the text can be registered as recognition vocabulary referred to in speech recognition.
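Putting the Embodiment 1 steps together, the registration flow can be sketched as one function. All callables (`identify`, `g2p`) and the `phoneme_map` table are assumed interfaces standing in for units 101 to 104, not the patent's actual APIs.

```python
def register(text, identify, g2p, phoneme_map, dictionary):
    """End-to-end sketch: identify the language(s) of the text, assign a
    reading in each identified language, convert the reading to
    recognition-target phonemes, and register it in the dictionary."""
    for lang in identify(text):                    # language identification
        reading = g2p(text, lang)                  # reading in identified language
        converted = [phoneme_map.get(p, p) for p in reading]  # phoneme mapping
        dictionary.setdefault(text, []).append(converted)
    return dictionary

# Toy stand-ins: one identified language, character-level "G2P", one mapping
d = register("Hamburg",
             identify=lambda t: ["de"],
             g2p=lambda t, lang: list(t.lower()),
             phoneme_map={"a": "{"},
             dictionary={})
print(d)  # {'Hamburg': [['h', '{', 'm', 'b', 'u', 'r', 'g']]}
```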
FIG. 5 is a block diagram showing the configuration of a registration-type speech recognition device using a recognition dictionary creation device according to Embodiment 2 of the present invention. In FIG. 5, a speech recognition device 100A according to Embodiment 2 includes, in addition to the configuration of the speech recognition device of Embodiment 1, a language identification preprocessing unit 107, a fixed character string storage unit (exclusion target storage unit) 108, a division character string storage unit (division target storage unit) 109, and a conversion processing storage unit (processing content storage unit) 110.
The language identification preprocessing unit 107 is a component placed before the language identification unit 101 that receives the registration target text. As preprocessing before the language identification processing, it excludes specific characters or character strings (hereinafter, fixed characters or fixed character strings) in the character string of the input registration target text from language identification, or divides the registration target text using predetermined characters or character strings (hereinafter, division characters or division character strings) as boundaries.
FIG. 6 is a flowchart showing the flow of the recognition dictionary creation operation by the recognition dictionary creation device of Embodiment 2; the details of the operations of the language identification preprocessing unit 107, the language identification unit 101, the reading assignment unit 102, the reading conversion unit 103, and the recognition dictionary generation unit 104 are described below.
First, when the registration target text is input, the language identification preprocessing unit 107 refers to the contents stored in the fixed character string storage unit 108, detects fixed characters or fixed character strings contained in the character string of the registration target text, and excludes them from language identification (step ST501). Examples of fixed characters or fixed character strings include descriptions in a specific language that appear in common across multiple languages.
For example, when the registration target text "Je vivrai sans toi (I Will Say Goodbye)" is input, the language identification preprocessing unit 107 refers to the contents stored in the division character string storage unit 109, detects the parentheses "(" and ")" contained in the character string of the registration target text, and divides the character string of the registration target text using these characters as boundaries. The text is thereby divided into the two character strings "Je vivrai sans toi" and "I Will Say Goodbye".
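A minimal sketch of this preprocessing, assuming illustrative stand-ins for the contents of the fixed character string storage unit 108 and the division character string storage unit 109:

```python
import re

# Illustrative stand-ins: fixed strings excluded from language identification
# and the division characters used as split boundaries.
FIXED_STRINGS = ["Disc 1", "Best of"]
DIVISION_CHARS = r"[()]"

def preprocess(text):
    """Exclude any stored fixed strings from the text, then split what
    remains at the division characters; returns (excluded, divided_parts)."""
    excluded = [s for s in FIXED_STRINGS if s in text]
    for s in excluded:
        text = text.replace(s, "")
    parts = [p.strip() for p in re.split(DIVISION_CHARS, text) if p.strip()]
    return excluded, parts

print(preprocess("Je vivrai sans toi (I Will Say Goodbye)"))
# ([], ['Je vivrai sans toi', 'I Will Say Goodbye'])
```

Each returned part would then go through language identification separately, while the excluded strings and the split positions would be recorded in the conversion processing storage unit 110 so the full reading can be reassembled later.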
In this way, by assigning two kinds of readings, that of the recognition target language and that of the identified language, to character string portions such as capital letter strings (spellings) and numbers, readings can be correctly assigned to character string portions uttered in the recognition target language, regardless of the language (the language of the identification result) describing the portions of the registration target text other than those character string portions.
In this way, the reading assignment unit 102 and the reading conversion unit 103 execute the processing from step ST508 to step ST510 in the language corresponding to the i-th (i = 0 to N) identification result. Step ST508 is the same processing as step ST202 shown in FIG. 2, step ST509 is the same processing as step ST203 shown in FIG. 2, and step ST510 is the same processing as step ST204 shown in FIG. 2.
In this way, by dividing the input text at specific character strings and performing the language identification processing and the reading assignment processing on each divided portion, each divided portion of the registration target text can be correctly language-identified even when the registration target text contains multiple languages.
Also, by storing in advance words/phrases of a specific language that are used within multiple languages as specific character strings and excluding them from language identification, even when a phrase used regardless of language is included (for example, "Disc 1" or "Best of" in a music album title), the language of the reading of each character string portion can be correctly identified by performing language identification on the character string portions other than that phrase.
Also, by dividing out the portions whose reading depends on the recognition target language, readings can be correctly assigned to those portions as well.
Claims (12)
- A recognition dictionary creation device comprising: a language identification unit that identifies the language of the reading of input registration target text; a reading assignment unit that assigns a reading to the registration target text using phonemes of the language identified by the language identification unit; a reading conversion unit that converts the reading of the registration target text from phonemes of the language identified by the language identification unit into phonemes of a recognition target language handled in speech recognition; and a recognition dictionary generation unit that generates a recognition dictionary in which the reading of the registration target text converted by the reading conversion unit is registered.
- The recognition dictionary creation device according to claim 1, wherein the language identification unit outputs, as the identification result, a predetermined number of languages ranked from the top of a score indicating the likelihood of being the language of the reading of the registration target text among the plural languages subject to language identification; the reading assignment unit assigns readings to the registration target text using the phonemes of each of the predetermined number of languages identified by the language identification unit; and the reading conversion unit converts the readings of the registration target text from the phonemes of the predetermined number of languages identified by the language identification unit into the phonemes of the recognition target language.
- The recognition dictionary creation device according to claim 2, wherein the language identification unit outputs the recognition target language as the identification result when the score is below a predetermined threshold.
- The recognition dictionary creation device according to claim 1, comprising: an exclusion target storage unit that stores characters or character strings to be excluded from language identification; a language identification preprocessing unit that excludes, from the registration target text, the portions corresponding to the exclusion target characters or character strings stored in the exclusion target storage unit; and a processing content storage unit that stores the content of the exclusion processing applied to the registration target text by the language identification preprocessing unit, wherein the language identification unit identifies the language of the reading of the registration target text from which the exclusion target characters or character strings have been excluded by the language identification preprocessing unit, and the recognition dictionary generation unit refers to the content of the exclusion processing stored in the processing content storage unit, obtains the reading of the registration target text from the reading of the exclusion target characters or character strings and the reading of the registration target text from which the exclusion target characters or character strings were excluded, and generates a recognition dictionary in which that reading is registered.
- The recognition dictionary creation device according to claim 4, comprising a division target storage unit that stores characters or character strings to be used for division, wherein the language identification preprocessing unit divides the character string of the registration target text at the division target characters or character strings stored in the division target storage unit; the processing content storage unit stores the content of the division processing applied to the registration target text by the language identification preprocessing unit; the language identification unit identifies the language of the reading for each divided portion of the registration target text divided by the language identification preprocessing unit; and the recognition dictionary generation unit refers to the content of the division processing stored in the processing content storage unit, obtains the reading of the registration target text from the readings of the divided portions, and generates a recognition dictionary in which that reading is registered.
- The recognition dictionary creation device according to claim 5, wherein the division target storage unit stores division target characters or character strings including digits or capital letter strings, together with their readings in the recognition target language; the language identification preprocessing unit excludes the division target characters or character strings including digits or capital letter strings from the registration target text and divides the character string of the registration target text; the language identification unit identifies the language of the reading for each divided portion of the registration target text divided by the language identification preprocessing unit; the reading assignment unit assigns, to the division target characters or character strings including digits or capital letter strings, the reading in the language of the divided portion identified by the language identification unit; and the recognition dictionary generation unit refers to the content of the exclusion processing stored in the processing content storage unit, obtains the reading of the registration target text from the readings of the divided portions, the reading of the division target characters or character strings in the recognition target language, and the reading converted from the language of the divided portion into the recognition target language, and generates a recognition dictionary in which that reading is registered.
- The recognition dictionary creation device according to claim 1, wherein the language identification unit uses a language identification model including, for each language subject to language identification, N-grams and their occurrence probabilities, calculates for each language the chain probability of the N-grams of the registration target text, and identifies the language of the reading from a likelihood based on the chain probability values.
- The recognition dictionary creation device according to claim 7, wherein the language identification unit generates the N-grams after replacing with special characters those characters or character strings of the registration target text that are written in common across the plural languages subject to language identification and do not contribute to language identification.
- The recognition dictionary creation device according to claim 7, wherein, when the registration target text includes characters or character strings whose languages of use are limited, the language identification unit outputs, as the identification result, the language with the highest likelihood among those languages of use.
- The recognition dictionary creation device according to claim 7, wherein the language identification unit includes a language identification model for each process in which recognition target vocabulary is used, and uses for language identification the language identification model corresponding to the process in which the registration target text is used.
- A speech recognition device comprising: a language identification unit that identifies the language of the reading of input registration target text; a reading assignment unit that assigns a reading to the registration target text using phonemes of the language identified by the language identification unit; a reading conversion unit that converts the reading of the registration target text from phonemes of the language identified by the language identification unit into phonemes of a recognition target language handled in speech recognition; a recognition dictionary generation unit that generates a recognition dictionary in which the reading of the registration target text converted by the reading conversion unit is registered; and a speech recognition unit that performs speech recognition of input speech by referring to the recognition dictionary generated by the recognition dictionary generation unit.
- A speech recognition device comprising: an exclusion target storage unit that stores characters or character strings to be excluded from language identification; a division target storage unit that stores characters or character strings to be used for division; a language identification preprocessing unit that, based on the contents stored in the exclusion target storage unit and the division target storage unit, excludes the exclusion target characters or character strings from input registration target text and divides the text at the division target characters or character strings; a processing content storage unit that stores the content of the processing applied to the registration target text by the language identification preprocessing unit; a language identification unit that identifies the language of the reading of the registration target text to which the processing has been applied by the language identification preprocessing unit; a reading assignment unit that assigns a reading to the registration target text using phonemes of the language identified by the language identification unit; a reading conversion unit that converts the reading of the registration target text from phonemes of the language identified by the language identification unit into phonemes of the recognition target language handled in speech recognition; a recognition dictionary generation unit that refers to the content of the processing stored in the processing content storage unit, obtains the reading of the registration target text to which the processing was applied by the language identification preprocessing unit, and generates a recognition dictionary in which that reading is registered; and a speech recognition unit that performs speech recognition of input speech by referring to the recognition dictionary generated by the recognition dictionary generation unit.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE112010005226T DE112010005226T5 (de) | 2010-02-05 | 2010-02-05 | Erkennungswörterbuch-Erzeugungsvorrichtung und Spracherkennungsvorrichtung |
PCT/JP2010/000709 WO2011096015A1 (ja) | 2010-02-05 | 2010-02-05 | 認識辞書作成装置及び音声認識装置 |
US13/505,243 US8868431B2 (en) | 2010-02-05 | 2010-02-05 | Recognition dictionary creation device and voice recognition device |
JP2011552580A JP5318230B2 (ja) | 2010-02-05 | 2010-02-05 | 認識辞書作成装置及び音声認識装置 |
CN201080062593.4A CN102725790B (zh) | 2010-02-05 | 2010-02-05 | 识别词典制作装置及声音识别装置 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2010/000709 WO2011096015A1 (ja) | 2010-02-05 | 2010-02-05 | 認識辞書作成装置及び音声認識装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011096015A1 true WO2011096015A1 (ja) | 2011-08-11 |
Family
ID=44355045
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/000709 WO2011096015A1 (ja) | 2010-02-05 | 2010-02-05 | 認識辞書作成装置及び音声認識装置 |
Country Status (5)
Country | Link |
---|---|
US (1) | US8868431B2 (ja) |
JP (1) | JP5318230B2 (ja) |
CN (1) | CN102725790B (ja) |
DE (1) | DE112010005226T5 (ja) |
WO (1) | WO2011096015A1 (ja) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004053979A (ja) * | 2002-07-22 | 2004-02-19 | Alpine Electronics Inc | 音声認識辞書の作成方法及び音声認識辞書作成システム |
JP2004271895A (ja) * | 2003-03-07 | 2004-09-30 | Nec Corp | 複数言語音声認識システムおよび発音学習システム |
JP2005241952A (ja) * | 2004-02-26 | 2005-09-08 | Gap Kk | 知識処理装置、知識処理方法および知識処理プログラム |
JP2006059105A (ja) * | 2004-08-19 | 2006-03-02 | Mitsubishi Electric Corp | 言語モデル作成装置及び方法並びにプログラム |
JP2006106775A (ja) * | 2005-11-25 | 2006-04-20 | Nippon Telegr & Teleph Corp <Ntt> | 多言語話者適応方法、装置、プログラム |
JP2008262279A (ja) * | 2007-04-10 | 2008-10-30 | Mitsubishi Electric Corp | 音声検索装置 |
JP2009037633A (ja) * | 2002-10-22 | 2009-02-19 | Nokia Corp | 規模調整可能なニューラルネットワーク・ベースの、文書テキストからの言語同定 |
JP2009169113A (ja) * | 2008-01-16 | 2009-07-30 | Nec Corp | 言語モデル作成装置、言語モデル作成方法および言語モデル作成プログラム |
JP2009300573A (ja) * | 2008-06-11 | 2009-12-24 | Nippon Syst Wear Kk | 多言語対応音声認識装置、システム、音声の切り替え方法およびプログラム |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5913185A (en) * | 1996-08-19 | 1999-06-15 | International Business Machines Corporation | Determining a natural language shift in a computer document |
US6085162A (en) * | 1996-10-18 | 2000-07-04 | Gedanken Corporation | Translation system and method in which words are translated by a specialized dictionary and then a general dictionary |
US6275789B1 (en) * | 1998-12-18 | 2001-08-14 | Leo Moser | Method and apparatus for performing full bidirectional translation between a source language and a linked alternative language |
US6167369A (en) * | 1998-12-23 | 2000-12-26 | Xerox Company | Automatic language identification using both N-gram and word information |
US6442524B1 (en) * | 1999-01-29 | 2002-08-27 | Sony Corporation | Analyzing inflectional morphology in a spoken language translation system |
GB2366940B (en) * | 2000-09-06 | 2004-08-11 | Ericsson Telefon Ab L M | Text language detection |
EP1217610A1 (de) | 2000-11-28 | 2002-06-26 | Siemens Aktiengesellschaft | Verfahren und System zur multilingualen Spracherkennung |
EP1466317B1 (de) | 2002-01-17 | 2007-04-18 | Siemens Aktiengesellschaft | Betriebsverfahren eines automatischen spracherkenners zur sprecherunabhängigen spracherkennung von worten aus verschiedenen sprachen und automatischer spracherkenner |
JP2004053742A (ja) * | 2002-07-17 | 2004-02-19 | Matsushita Electric Ind Co Ltd | 音声認識装置 |
JP3776391B2 (ja) | 2002-09-06 | 2006-05-17 | 日本電信電話株式会社 | 多言語音声認識方法、装置、プログラム |
CN100559463C (zh) | 2002-11-11 | 2009-11-11 | 松下电器产业株式会社 | 声音识别用辞典编制装置和声音识别装置 |
US20050267755A1 (en) | 2004-05-27 | 2005-12-01 | Nokia Corporation | Arrangement for speech recognition |
US7840399B2 (en) | 2005-04-07 | 2010-11-23 | Nokia Corporation | Method, device, and computer program product for multi-lingual speech recognition |
US8583418B2 (en) * | 2008-09-29 | 2013-11-12 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
US8224641B2 (en) * | 2008-11-19 | 2012-07-17 | Stratify, Inc. | Language identification for documents containing multiple languages |
DE112009003930B4 (de) | 2009-01-30 | 2016-12-22 | Mitsubishi Electric Corporation | Spracherkennungsvorrichtung |
US8326602B2 (en) * | 2009-06-05 | 2012-12-04 | Google Inc. | Detecting writing systems and languages |
CN102770910B (zh) * | 2010-03-30 | 2015-10-21 | 三菱电机株式会社 | 声音识别装置 |
CN103038816B (zh) * | 2010-10-01 | 2015-02-25 | 三菱电机株式会社 | 声音识别装置 |
- 2010
- 2010-02-05 US US13/505,243 patent/US8868431B2/en active Active
- 2010-02-05 WO PCT/JP2010/000709 patent/WO2011096015A1/ja active Application Filing
- 2010-02-05 JP JP2011552580A patent/JP5318230B2/ja active Active
- 2010-02-05 CN CN201080062593.4A patent/CN102725790B/zh active Active
- 2010-02-05 DE DE112010005226T patent/DE112010005226T5/de not_active Ceased
Non-Patent Citations (1)
Title |
---|
TORU SUMITOMO ET AL.: "Trie Kozo o Mochiita Tagengo Taiyaku Jisho no Koritsuteki Asshuku Shuho", DAI 61 KAI (HEISEI 12 NENDO KOKI) ZENKOKU TAIKAI KOEN RONBUNSHU (2), 3 October 2000 (2000-10-03), pages 2-133 - 2-134 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9239829B2 (en) | 2010-10-01 | 2016-01-19 | Mitsubishi Electric Corporation | Speech recognition device |
DE112010005918B4 (de) * | 2010-10-01 | 2016-12-22 | Mitsubishi Electric Corp. | Spracherkennungsvorrichtung |
Also Published As
Publication number | Publication date |
---|---|
US8868431B2 (en) | 2014-10-21 |
JP5318230B2 (ja) | 2013-10-16 |
CN102725790B (zh) | 2014-04-16 |
US20120226491A1 (en) | 2012-09-06 |
DE112010005226T5 (de) | 2012-11-08 |
JPWO2011096015A1 (ja) | 2013-06-06 |
CN102725790A (zh) | 2012-10-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080062593.4 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10845153 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2011552580 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13505243 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 112010005226 Country of ref document: DE Ref document number: 1120100052263 Country of ref document: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 10845153 Country of ref document: EP Kind code of ref document: A1 |