JPH09258776A

JPH09258776A - Voice recognition method, voice recognition device, and translating method and translating device

Info

Publication number: JPH09258776A
Application number: JP8065922A
Authority: JP
Inventor: Miyuki Tanaka (田中 幸)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1996-03-22
Filing date: 1996-03-22
Publication date: 1997-10-03

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy and processing speed of translation. SOLUTION: An intention key 3D consists of keys that are operated when the speech to be input is intended as, for example, a request, a desire, a question, or an explanation. The user operates the key corresponding to the intention expressed by the speech about to be input. The corresponding operation signal is supplied from the control key group 3 to the system control part 4, which receives it and thereby recognizes the intention (talk intention) expressed by the speech the user is about to utter. When speech spoken in Japanese is input to a microphone 1, it is recognized according to the talk intention recognized in the system control part 4. The system control part 4 then translates the recognition result into English, again according to the talk intention.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech recognition method, a speech recognition apparatus, a translation method, and a translation apparatus. In particular, it relates to methods and apparatus that translate an input sentence according to the intention the sentence expresses, so that the input sentence can be translated accurately and quickly.

[0002]

2. Description of the Related Art
In recent years, various companies and research institutions have actively researched and developed automatic translation devices that translate a sentence input in one language into another language. More recently, so-called speech translators have also been researched and developed. These incorporate a speech recognition device into the translator so that the sentence to be translated can be input by voice.

[0003]

Problems to Be Solved by the Invention
With current technology, however, translation accuracy is limited to some extent even when the input sentence is typed, for example on a keyboard. The recognition accuracy of speech recognition is likewise limited.

[0004] As a result, a translation obtained by recognizing a spoken input sentence and then translating the recognition result could be even less accurate.

[0005] The present invention was made in view of this situation. It improves the accuracy of both translation and speech recognition, and so improves the accuracy with which a sentence input by voice is translated.

[0006]

Means for Solving the Problems
The speech recognition method according to claim 1 comprises an input step of inputting the utterance intention expressed by speech. It is characterized in that, when the speech is recognized on the basis of its feature parameters, it is recognized according to that utterance intention.

[0007] The speech recognition device according to claim 3 comprises input means for inputting the utterance intention expressed by speech. It is characterized in that the speech recognition means, which recognizes the speech on the basis of its feature parameters, recognizes the speech according to the utterance intention.

[0008] The translation method according to claim 4 comprises an input step of inputting the intention expressed by an input sentence, and a translation step of translating the input sentence into another language according to that intention.

[0009] The translation device according to claim 7 comprises input means for inputting the intention expressed by an input sentence, and translation means for translating the input sentence into another language according to the intention input through the input means.

[0010] In the speech recognition method according to claim 1, the utterance intention expressed by the speech is input, and the speech is recognized according to that utterance intention.

[0011] In the speech recognition device according to claim 3, the input means allows the utterance intention expressed by the speech to be input. The speech recognition means recognizes the speech according to that utterance intention.

[0012] In the translation method according to claim 4, the intention expressed by the input sentence is input, and the input sentence is translated into another language according to that intention.

[0013] In the translation device according to claim 7, the input means allows the intention expressed by the input sentence to be input. The translation means translates the input sentence into another language according to the intention input through the input means.

[0014]

DESCRIPTION OF THE PREFERRED EMBODIMENTS
Embodiments of the present invention are described below. First, to clarify how each means recited in the claims corresponds to the embodiments, the features of the invention are stated with a corresponding element of the embodiments given in parentheses after each means (as one example only).

[0015] The speech recognition device according to claim 3 comprises two means. The first is acoustic analysis means (for example, the acoustic analysis unit 13 shown in FIG. 2), which acoustically analyzes the input speech and outputs its feature parameters. The second is speech recognition means (for example, the recognition processing unit 14 shown in FIG. 2), which recognizes the speech based on those feature parameters. The device further comprises input means (for example, the intention key 3D shown in FIG. 1) for inputting the utterance intention expressed by the speech. It is characterized in that the speech recognition means recognizes the speech according to the utterance intention.

[0016] The translation device according to claim 7 translates an input sentence given in a predetermined language into another language. It comprises input means (for example, the intention key 3D shown in FIG. 1) for inputting the intention expressed by the input sentence. It also comprises translation means (for example, the system control unit 4 shown in FIG. 1) for translating the input sentence into another language according to the intention input through the input means.

[0017] This description is not, of course, intended to limit each means to the examples given above.

[0018] FIG. 1 shows the configuration of an embodiment of a translation apparatus (speech translator) to which the present invention is applied. The apparatus translates speech input in, for example, Japanese into, for example, English.

[0019] The microphone 1 converts the speech input to it into an electrical speech signal and outputs that signal to the speech recognition unit 2. The speech recognition unit 2 recognizes the speech input to the microphone 1, for example word by word. As shown in FIG. 2, the speech recognition unit 2 consists of, for example, an A/D converter 12, an acoustic analysis unit 13, a recognition processing unit 14, and a large-scale word dictionary 15.

[0020] The A/D converter 12 samples the analog speech signal output from the microphone 1 at the timing of a predetermined clock, quantizes it, and converts it into a digital speech signal (digital data).
The acoustic analysis unit 13 acoustically analyzes the speech signal output from the A/D converter 12. From it, the unit extracts feature parameters of the speech, such as the speech power in each predetermined frequency band, linear prediction coefficients (LPC), or cepstrum coefficients. For example, the acoustic analysis unit 13 filters the speech signal into predetermined bands with a filter bank, then rectifies and smooths each filter output to obtain the speech power in each band. Alternatively, it applies linear prediction analysis to the input speech to obtain linear prediction coefficients, and derives cepstrum coefficients from those coefficients.
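The linear-prediction route described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation. It assumes the autocorrelation method with the Levinson-Durbin recursion, together with the standard recursion from predictor coefficients to LPC cepstrum. The patent names neither technique, and all function names are hypothetical. Windowing and pre-emphasis, which a practical front end would normally apply, are omitted.

```python
def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] (x[n] ~ sum_k a[k] * x[n-k]).

    Returns (coefficients, residual prediction-error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection (PARCOR) coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """LPC cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum_k a[k] z^-k)."""
    p = len(a)

    def coef(m):
        # Predictor coefficient a[m], treated as zero beyond the model order.
        return a[m - 1] if 1 <= m <= p else 0.0

    c = []
    for n in range(1, n_ceps + 1):
        total = coef(n)
        for k in range(1, n):
            total += (k / n) * c[k - 1] * coef(n - k)
        c.append(total)
    return c


# First-order AR signal x[n] = 0.9 * x[n-1]: the order-1 predictor is about 0.9.
frame = [0.9 ** n for n in range(200)]
coeffs, _ = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs)                     # close to [0.9, 0.0]
print(lpc_to_cepstrum([0.5], 3))  # c_n = 0.5**n / n
```

For a single-pole model with coefficient a, the recursion reproduces the closed form c_n = a^n / n. This gives a quick check on the cepstrum conversion.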

[0021] The feature parameters obtained by the acoustic analysis unit 13 go to the recognition processing unit 14. They are output either as they are or, where necessary, after vector quantization in the acoustic analysis unit 13.
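Vector quantization, mentioned above as an optional step, replaces each feature vector with the index (symbol) of its nearest codeword. The following is a minimal sketch. It assumes a codebook that has already been trained (for example with k-means or the LBG algorithm) and uses squared Euclidean distance; the patent specifies neither, and the function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def vector_quantize(features, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda i: squared_distance(f, codebook[i]))
            for f in features]


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(vector_quantize([[0.1, -0.2], [4.0, 4.5], [0.9, 1.2]], codebook))  # [0, 2, 1]
```

Passing symbols rather than raw vectors shrinks the data the recognizer handles, at the cost of some quantization error.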

[0022] The recognition processing unit 14 works from the feature parameters supplied by the acoustic analysis unit 13 (or the symbols obtained by vector-quantizing them). It performs speech recognition with reference to the large-scale word dictionary 15, described below, following a speech recognition algorithm such as dynamic programming (DP) matching or a hidden Markov model (HMM). It then outputs, as the recognition result, one or more words contained in the speech input to the microphone 1.
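As an illustration of the DP matching option (the HMM alternative is not shown), the following sketch compares an input feature sequence against each word's stored standard pattern using dynamic time warping, then picks the closest word. It assumes one template sequence per dictionary word and a caller-supplied local distance. The symmetric step pattern, the absence of slope constraints, and all names are illustrative assumptions, not details from the patent.

```python
def dtw_distance(seq_a, seq_b, dist):
    """Cumulative DP-matching distance between two feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j]: best cumulative distance aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance input only
                                     cost[i][j - 1],      # advance template only
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize_word(features, templates, dist):
    """Return the word whose standard pattern is closest to the input."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word], dist))


# Usage with scalar "features" and absolute difference as the local distance.
templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
print(recognize_word([1, 1, 2, 3, 3], templates, lambda x, y: abs(x - y)))  # up
```

The time warping lets a slowly spoken input (five frames here) match a shorter template exactly. This is the point of DP matching for words spoken at varying speeds.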

[0023] The large-scale word dictionary 15 stores standard patterns (or models, etc.) of the words to be recognized. In this embodiment these are Japanese words. The recognition processing unit 14 performs speech recognition on the words stored in the large-scale word dictionary 15.

[0024] Returning to FIG. 1, the control key group 3 consists of various keys and is operated to give predetermined inputs to the system control unit 4. It includes a confirmation/conversion key 3A and a re-search/non-conversion key 3B, which tell the system control unit 4 whether a sentence output from the output unit 7 (described below) is correct or incorrect. It also includes a clear key 3C for clearing the output (display) state of the output unit 7, and an intention key 3D, described below.

[0025] The system control unit 4 controls each block of the apparatus. It translates from Japanese to English by searching the sentences stored in the first language sentence storage unit 5 and the second language sentence storage unit 6 (described below). It also supplies sentences to the output unit 7 for output.

[0026] The first language sentence storage unit 5 stores a large number of sentences in the language of the speech input to the microphone 1, that is, Japanese sentences in this embodiment. The second language sentence storage unit 6 stores translations of those Japanese sentences into the target language, that is, English sentences in this embodiment.

[0027] The output unit 7 consists of, for example, a speech synthesizer and a speaker, a monitor, and so on. Under the control of the system control unit 4, it outputs the Japanese sentences stored in the first language sentence storage unit 5 and the English sentences stored in the second language sentence storage unit 6. Output is either as synthesized speech or on the display.

[0028] Next, the operation of the apparatus is described with reference to the flowchart of FIG. 3. The user inputs by voice the Japanese sentence (input sentence) to be translated into English, for example as a sequence of content words (independent words). The apparatus recognizes the word sequence and then forms it into a sentence that is still in Japanese. This Japanese sentence is first presented to the user, and once the user confirms that it is correct, it is translated into English. The apparatus also allows the user to input the intention expressed by the spoken input sentence, that is, the intention of the speaker (user) of the speech corresponding to the input sentence. When this intention is input, speech recognition and translation are performed according to it.

[0029] Specifically, in step S0, the user operates the intention key 3D to input the intention (utterance intention) expressed by the speech (input sentence) about to be input. The intention key 3D consists of keys that are operated when the speech is intended as, for example, a request, a desire, a question, or an explanation. The user operates the key corresponding to the intention of the speech to be input.

[0030] The operation signal corresponding to this operation is supplied from the control key group 3 to the system control unit 4, which receives it. The system control unit 4 thereby recognizes the intention expressed by the speech the user is about to utter (hereinafter referred to as the utterance intention where appropriate).

[0031] In step S1, speech corresponding to the utterance intention input in step S0 is input to the microphone 1. It is converted into an electrical speech signal and output to the speech recognition unit 2. The speech may be a fully spoken sentence but, as noted above, may also be a sequence of that sentence's content words (independent words). For example, to obtain an English translation of a sentence such as 「あなたの好きなレストランを教えて下さい」 ("Please tell me your favorite restaurant"), the user can input the word sequence 「あなた、好き、レストラン、教える」 ("you, like, restaurant, tell"). Inputting speech is therefore not much of a burden for the user. The input words also need not follow their order in the sentence. In the example above, the user could equally input 「教える、あなた、好き、レストラン」 ("tell, you, like, restaurant").

[0032] In step S2, the speech recognition unit 2 (FIG. 2) recognizes each word of the word sequence input by voice to the microphone 1. The speech signal from the microphone 1 passes through the A/D converter 12 and the acoustic analysis unit 13, becomes feature parameters (or symbols), and is output to the recognition processing unit 14. Based on the output of the acoustic analysis unit 13, the recognition processing unit 14 performs speech recognition using the words stored in the large-scale word dictionary 15 as the recognition vocabulary.

[0033] Because the input speech is a sequence of content words, short silences generally occur between words. The recognition processing unit 14 detects these silent portions and treats them as word boundaries, so that it can recognize each word of the input sequence separately. The unit therefore only needs to perform isolated word recognition, which is simpler than continuous speech recognition. This keeps the unit, and consequently the whole translation apparatus, small and inexpensive.
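The silence-based word segmentation described above can be sketched with a simple short-time energy threshold. This is an illustrative assumption: the patent does not say how silence is detected. The frame length, energy threshold, and minimum silence length below are hypothetical tuning parameters.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def split_on_silence(energies, threshold, min_silence_frames):
    """Return (start, end) frame ranges (end exclusive) of voiced segments.

    A segment ends only after at least min_silence_frames consecutive frames
    fall below threshold, so brief dips inside a word do not split it.
    """
    segments = []
    start = None
    silent_run = 0
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:  # speech ran to the end of the input
        segments.append((start, len(energies) - silent_run))
    return segments


energies = [0, 5, 5, 0, 0, 0, 6, 6, 0]
print(split_on_silence(energies, threshold=1, min_silence_frames=2))  # [(1, 3), (6, 8)]
```

Each returned segment would then be passed to an isolated-word recognizer, such as the DP matcher of paragraph [0022].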

[0034] The recognition processing unit 14 also receives the utterance intention recognized by the system control unit 4 and performs speech recognition according to it. The words stored in the large-scale word dictionary 15 are classified by what they express, for example a request, a desire, a question, or an explanation. The recognition processing unit 14 recognizes the speech input to the microphone 1 using as its vocabulary only those dictionary words classified under the utterance intention supplied by the system control unit 4.
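Restricting the recognition vocabulary by utterance intention amounts to filtering the dictionary before scoring. The following is a minimal sketch. It assumes each dictionary word carries a set of intention labels; a set also covers the multiple classification allowed in paragraph [0036]. It also assumes a word-scoring function is available, for example one derived from a DP-matching distance or an HMM likelihood. Here `score` is a hypothetical callable that returns higher values for better matches, and the words and scores are made up for illustration.

```python
def recognize_with_intention(score, dictionary, intention):
    """Pick the best-scoring word among those classified under `intention`.

    dictionary maps each word to the set of intentions it is classified under;
    score(word) is a hypothetical callable, higher meaning a closer match.
    """
    candidates = [word for word, intents in dictionary.items() if intention in intents]
    if not candidates:
        raise ValueError(f"no dictionary words classified under {intention!r}")
    return max(candidates, key=score)


dictionary = {
    "tell": {"request"},
    "like": {"request", "desire"},  # a word may belong to several intentions
    "where": {"question"},
}
scores = {"tell": 0.2, "like": 0.5, "where": 0.9}
print(recognize_with_intention(scores.get, dictionary, "request"))  # like
```

Although "where" has the highest score overall, it is excluded because it is not classified under "request". This pruning is what the text credits with both higher accuracy and faster recognition.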

[0035] The speech input to the microphone 1 is recognized against only the words used to express its utterance intention, so recognition accuracy improves. Fewer words are candidates for recognition, so recognition speed improves as well.

[0036] Each word stored in the large-scale word dictionary 15 may be classified under a single utterance intention, such as a request, a desire, a question, or an explanation. It may also be classified under two or more utterance intentions, that is, redundantly under several intentions.

[0037] The sequence of one or more words obtained as the recognition result is output to the system control unit 4. On receiving these words from the speech recognition unit 2 (recognition processing unit 14), the system control unit 4 proceeds to step S3. There it searches the Japanese sentences stored in the first language sentence storage unit 5 (hereinafter referred to as Japanese sentences where appropriate) for the one most similar to that combination of words.

[0038] The search in step S3 is performed, for example, as follows. The words obtained as the recognition result are hereinafter referred to as recognized words where appropriate.
First, the system control unit 4 searches the first language sentence storage unit 5 for a Japanese sentence containing all the recognized words. If one exists, the unit reads it out as the sentence most similar to the combination of recognized words.
If no stored Japanese sentence contains all the recognized words, the unit searches for one that contains all of them except any single word. If such a sentence exists, it is read out as the most similar sentence. If not, the unit searches for a Japanese sentence containing all the recognized words except any two.
The search continues in this way until the Japanese sentence most similar to the combination of recognized words is found.

[0039] The first language sentence storage unit 5 stores each Japanese sentence together with the intention that sentence expresses. The system control unit 4 performs the above search only over the stored Japanese sentences that correspond to the utterance intention input in step S0.
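The step S3 search with its intention filter can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes each stored sentence carries a pre-extracted set of its content words and one intention label; the patent does not say how word containment is checked in unsegmented Japanese text. Ties at the same level are resolved by corpus order, which the patent does not specify. English glosses stand in for the stored Japanese sentences, and all names are hypothetical.

```python
from itertools import combinations


def search_sentence(recognized, corpus, intention):
    """Return the stored sentence most similar to the recognized words.

    corpus is a list of (sentence, content_word_set, intention) tuples.
    Sentences containing all recognized words are tried first, then those
    containing all but one, all but two, and so on.
    """
    # Only sentences labelled with the current utterance intention are searched.
    pool = [(sentence, words) for sentence, words, intent in corpus if intent == intention]
    unique = list(dict.fromkeys(recognized))  # drop duplicates, keep input order
    for keep in range(len(unique), 0, -1):
        for subset in combinations(unique, keep):
            for sentence, words in pool:
                if set(subset) <= words:
                    return sentence
    return None


corpus = [
    ("Please tell me your favorite restaurant.", {"you", "like", "restaurant", "tell"}, "request"),
    ("Do you like this restaurant?", {"you", "like", "restaurant"}, "question"),
    ("Please tell me the way.", {"way", "tell"}, "request"),
]
print(search_sentence(["tell", "you", "like", "restaurant"], corpus, "request"))
```

Enumerating subsets grows combinatorially with the number of words. That cost is acceptable for the short content-word lists this apparatus expects.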

[0040] The search for the Japanese sentence most similar to the recognized words covers only the stored sentences that correspond to the utterance intention, so search accuracy improves. Fewer Japanese sentences are searched, so search speed improves as well.

[0041] After the Japanese sentence most similar to the combination of recognized words has been found, the process proceeds to step S4. The system control unit 4 supplies the Japanese sentence to the output unit 7 for output. Each Japanese sentence in the first language sentence storage unit 5 has attached data, for example the character codes of its characters or its phonological and prosodic information, and these data are supplied to the output unit 7. The output unit 7 then either displays the characters corresponding to the character codes on the monitor, or uses its speech synthesizer to generate synthesized speech from the phonological and prosodic information and play it through the speaker. In this specification, both displaying and outputting synthesized speech are referred to simply as "output" where appropriate.

[0042] After the output unit 7 outputs the Japanese sentence, the user checks whether it is correct. If it is correct, the user operates the confirmation/conversion key 3A of the control key group 3. If it is incorrect, the user operates the re-search/non-conversion key 3B. In step S5, the system control unit 4 determines which key has been operated.

[0043] If step S5 determines that the re-search/non-conversion key 3B has been operated, meaning the Japanese sentence output in step S4 is incorrect, the process returns to step S1. The user then inputs the speech again, and the processing described above is repeated.

[0044] When step S5 determines that the re-search/non-conversion key 3B has been operated, the process may instead return to step S2 or S3, as shown by the dotted lines in FIG. 3.
If the process returns to step S2, the speech recognition unit 2 outputs a new recognition result: the candidate with the next highest likelihood among those obtained when recognizing the speech input in step S1. In the speech recognition unit 2, the recognition processing unit 14 computes, for each word in the large-scale word dictionary 15, a likelihood that expresses how close that word is to the input speech. Normally the word with the highest likelihood is output as the recognition result. An incorrect Japanese sentence in step S4 therefore suggests that this highest-likelihood word may have been wrong. The word with the next highest likelihood is output instead, and the processing from step S3 onward is repeated.
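The fallback to the next most likely candidate can be sketched as a cursor over a ranked candidate list. The patent does not say which word's candidate is advanced when an input containing several words is rejected, so this sketch handles a single input word only. The likelihood values are made up for illustration, and the class name is hypothetical.

```python
class CandidateCursor:
    """Steps through recognition candidates for one input word, best first."""

    def __init__(self, likelihoods):
        # likelihoods: dict mapping dictionary word -> likelihood for this input
        self._ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
        self._pos = 0

    def current(self):
        return self._ranked[self._pos]

    def reject(self):
        """Advance to the next most likely word after the user rejects the result."""
        if self._pos + 1 >= len(self._ranked):
            raise LookupError("no remaining recognition candidates")
        self._pos += 1
        return self.current()


cursor = CandidateCursor({"restaurant": 0.7, "rest": 0.2, "ranch": 0.1})
print(cursor.current())  # restaurant
print(cursor.reject())   # rest
```

Keeping the full ranking from the original recognition pass means a retry only re-runs the sentence search of step S3, not the acoustic matching.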

[0045] If the process returns to step S3, the Japanese sentences are searched again, this time excluding those already output. An incorrect Japanese sentence in step S4 suggests that the search in step S3 may have gone wrong. The search is therefore repeated over the Japanese sentences other than those it has already returned.
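Returning to step S3 means searching again while skipping sentences that have already been output. The following is a minimal sketch. It assumes the stored sentences have already been narrowed to the current utterance intention and carry pre-extracted content-word sets. The ranking ("all words, then all but one, ...") follows the search order described in paragraph [0038], and the function names are hypothetical.

```python
from itertools import combinations


def ranked_matches(recognized, corpus):
    """Yield stored sentences from most to least similar, each at most once.

    corpus is a list of (sentence, content_word_set) pairs, assumed to be
    pre-filtered to the current utterance intention.
    """
    unique = list(dict.fromkeys(recognized))
    seen = set()
    for keep in range(len(unique), 0, -1):
        for subset in combinations(unique, keep):
            for sentence, words in corpus:
                if sentence not in seen and set(subset) <= words:
                    seen.add(sentence)
                    yield sentence


def search_again(recognized, corpus, already_output):
    """Return the best match that has not been output before, or None."""
    return next((s for s in ranked_matches(recognized, corpus) if s not in already_output), None)


corpus = [("A", {"x", "y"}), ("B", {"x"}), ("C", {"y"})]
print(search_again(["x", "y"], corpus, {"A"}))  # B
```

Because the generator is lazy, each retry does only as much searching as needed to find the next unseen sentence.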

[0046] In this embodiment, both speech recognition and the Japanese sentence search are performed according to the utterance intention. Cases in which the Japanese sentence is incorrect are therefore expected to be quite rare.

[0047] If step S5 instead determines that the confirmation/conversion key 3A has been operated, the Japanese sentence output in step S4 is correct, meaning its meaning matches that of the speech input in step S1. The process then proceeds to step S6. There the system control unit 4, for example, converts each Japanese word of the sentence output in step S4 into an English word. It then searches the English sentences stored in the second language sentence storage unit 6 (hereinafter referred to as English sentences where appropriate) for the one most similar to that combination of English words.

[0048] This search is performed, for example, in the same manner as the Japanese sentence search in step S3. That is, as in the first language sentence storage unit 5, each English sentence is stored in the second language sentence storage unit 6 together with the intention that the sentence expresses, and the system control unit 4 searches only those English sentences in the second language sentence storage unit 6 that correspond to the utterance intention input in step S0.
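Storing an intention label with each sentence and filtering on it before scoring could be sketched as below. The `StoredSentence` record, the word-overlap score, and the sample entries are assumptions; the intention labels ("request", "question", and so on) follow the categories named in the abstract.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class StoredSentence:
    text: str
    words: FrozenSet[str]
    intention: str  # intention label stored with the sentence, e.g. "question"


def search_by_intention(
    query_words: List[str], store: List[StoredSentence], intention: str
) -> Optional[StoredSentence]:
    """Search only the sentences whose stored intention matches the input one."""
    pool = [s for s in store if s.intention == intention]  # intention filter first
    if not pool:
        return None
    query = set(query_words)
    return max(pool, key=lambda s: len(s.words & query))


store = [
    StoredSentence("Where is the station?", frozenset({"where", "station"}), "question"),
    StoredSentence("Please take me to the station.", frozenset({"please", "take", "station"}), "request"),
]
hit = search_by_intention(["station"], store, "request")
print(hit.text if hit else None)  # Please take me to the station.
```

Filtering first shrinks the candidate pool before any scoring, which is the mechanism the text credits for the gains in accuracy and speed.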

[0049] Therefore, in this case too, the search targets only English sentences that match the utterance intention, so the search accuracy and search processing speed can be improved. In other words, because the Japanese sentence is translated into an English sentence according to the utterance intention, the translation accuracy and translation processing speed can be improved.

[0050] Once the English sentence has been found as described above (that is, the English sentence corresponding to the Japanese sentence that was being output from the output unit 7 when the confirmation/conversion key 3A was operated), the system control unit 4 supplies it to the output unit 7 in step S7, where it is output, and the processing ends. Specifically, as in the first language sentence storage unit 5 described above, each English sentence is stored in the second language sentence storage unit 6 together with the character codes of the characters forming it, or together with its phonological and prosodic information, and these data are supplied to the output unit 7. Accordingly, in step S7, as in step S4, the English sentence is displayed on the monitor or output as synthesized speech.

[0051] The case where the present invention is applied to a translation device that translates Japanese input sentences into English has been described above. The present invention is also applicable to, for example, translation devices that translate languages other than Japanese into English, or that translate Japanese into languages other than English.

[0052] In this embodiment, only independent words (content words) are targets of voice recognition, but words other than independent words can also be included in the voice recognition targets.

[0053] Further, the voice recognition algorithm in the voice recognition unit 2, the method by which the output unit 7 generates synthesized speech, the method of searching for a Japanese sentence similar to the recognized words, and the method of searching for an English sentence similar to a combination of English words are not limited to those described in this embodiment.

[0054] Further, in this embodiment, a Japanese sentence is translated into English by converting the Japanese words that make it up into English words, but the translation method is not particularly limited to this either. For example, English sentences may be stored in the second language sentence storage unit 6 in association with the Japanese sentences stored in the first language sentence storage unit 5, and in step S6 the English sentence associated with the Japanese sentence output in step S4 may be retrieved, thereby translating the Japanese sentence into English. In this case, because each Japanese sentence is associated with its English translation, no translation errors occur.
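The alternative in [0054], where each Japanese sentence is stored in association with its English translation, reduces step S6 to a lookup. The sketch below assumes, hypothetically, that the two storage units share a sentence ID; the sample sentence pairs are invented.

```python
from typing import Dict, Optional

# Hypothetical aligned stores: the same ID links a Japanese sentence
# (first language sentence storage) to its English translation (second).
JAPANESE: Dict[int, str] = {1: "駅はどこですか。", 2: "ホテルまで連れて行ってください。"}
ENGLISH: Dict[int, str] = {1: "Where is the station?", 2: "Please take me to the hotel."}

# Reverse index so the sentence confirmed in step S4 can be looked up directly.
ID_BY_JAPANESE: Dict[str, int] = {text: sid for sid, text in JAPANESE.items()}


def translate_by_alignment(japanese_sentence: str) -> Optional[str]:
    # Return the stored translation, or None if the sentence is not in the store.
    sid = ID_BY_JAPANESE.get(japanese_sentence)
    return ENGLISH.get(sid) if sid is not None else None


print(translate_by_alignment("駅はどこですか。"))  # Where is the station?
```

Because the output is always a stored, pre-translated sentence, the translation itself cannot be wrong; the only remaining error source is picking the wrong Japanese sentence, which the user's confirmation in step S5 guards against.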

[0055] Further, in this embodiment, the voice recognition unit 2 performs isolated word speech recognition, but it may instead perform, for example, continuous speech recognition or word spotting.

[0056] Further, in this embodiment, the utterance intention is input by operating the intention key 3D, but it can also be input by other means, for example by voice. In this case, the translation device no longer needs the intention key 3D, so the device can be made smaller.

[0057] Further, in this embodiment, the first language sentence storage unit 5 and the second language sentence storage unit 6 are configured separately, but they may also be configured as a single unit.

[0058] Further, in this embodiment, the utterance intention represented by the voice is input before the voice is input in step S1, but the utterance intention may also be input, for example, after the voice is input.

[0059] Further, in this embodiment, an input sentence entered by voice is translated, but it is also possible to translate an input sentence entered by other means, for example by operating a keyboard.

[0060]

[Effects of the Invention] According to the voice recognition method of claim 1 and the voice recognition device of claim 3, the utterance intention represented by the voice is input, and the voice is recognized according to that utterance intention. Therefore, the voice recognition accuracy and processing speed can be improved.

[0061] According to the translation method of claim 4 and the translation device of claim 7, the intention represented by the input sentence is input, and the input sentence is translated into another language according to that intention. Therefore, the translation accuracy and translation processing speed can be improved.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of a translation device to which the present invention is applied.

FIG. 2 is a block diagram showing a detailed configuration example of the voice recognition unit 2 in FIG. 1.

FIG. 3 is a flowchart for explaining the operation of the translation device of FIG. 1.

[Explanation of symbols]

1 microphone, 2 voice recognition unit, 3 control key group, 3D intention key, 4 system control unit, 5 first language sentence storage unit, 6 second language sentence storage unit, 7 output unit, 12 A/D converter, 13 acoustic analysis unit, 14 recognition processing unit, 15 large-scale word dictionary

Claims

[Claims]

1. A voice recognition method comprising: an acoustic analysis step of acoustically analyzing input voice and outputting a characteristic parameter of the voice; and a voice recognition step of recognizing the voice based on the characteristic parameter, the method further comprising an input step of inputting an utterance intention represented by the voice, wherein the voice recognition step recognizes the voice according to the utterance intention.

2. The voice recognition method according to claim 1, wherein the utterance intention is input by voice.

3. A voice recognition device comprising: an acoustic analysis unit that acoustically analyzes input voice and outputs a characteristic parameter of the voice; and a voice recognition unit that recognizes the voice based on the characteristic parameter, the device further comprising an input unit for inputting an utterance intention represented by the voice, wherein the voice recognition unit recognizes the voice according to the utterance intention.

4. A translation method for translating an input sentence input in a predetermined language into another language, comprising: an input step of inputting an intention expressed by the input sentence; and a translation step of translating the input sentence into the other language according to the intention.

5. The translation method according to claim 4, wherein the input sentence is input by voice, the method further comprising a voice recognition step of performing speech recognition on the input sentence.

6. The translation method according to claim 4, wherein the intention expressed by the input sentence is input by voice, the method further comprising a voice recognition step of recognizing the voice by which that intention is input.

7. A translation device for translating an input sentence input in a predetermined language into another language, comprising: an input unit for inputting an intention expressed by the input sentence; and a translation unit that translates the input sentence into the other language according to the intention input by the input unit.