JP2010026220A

JP2010026220A - Voice translation device and voice translation method

Info

Publication number: JP2010026220A
Application number: JP2008187011A
Authority: JP
Inventors: Shinichi Tsuchiya; 慎一土谷
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2008-07-18
Filing date: 2008-07-18
Publication date: 2010-02-04

Abstract

PROBLEM TO BE SOLVED: To provide a voice translation device that makes it easy to perform voice translation of a two-way dialogue with a single device.

SOLUTION: The voice translation device includes: a first microphone for collecting voice in a first language; a first translation section that translates the voice represented by the signal output by the first microphone into a second language; a first speaker section that outputs the voice of the signal output by the first translation section; a second microphone for collecting voice in the second language; a second translation section that translates the voice represented by the signal output by the second microphone into the first language; and a second speaker section that outputs the voice of the signal output by the second translation section. The direction of the sound source from which the first microphone collects voice most effectively is approximately the same as the direction in which the second speaker section outputs voice, and the direction of the sound source from which the second microphone collects voice most effectively is approximately the same as the direction in which the first speaker section outputs voice.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a speech translation apparatus and a speech translation method.

A conventional speech translation apparatus includes a sound collection microphone in addition to a translation input microphone. It determines the surrounding environment from the sound input through the sound collection microphone, uses the determined environment when translating the speech input through the translation input microphone, and outputs the translated speech from a translation output speaker (for example, Patent Document 1).

Patent Document 2 describes a super-directional speaker technique in which an audio signal in the audible band is modulated into a signal in a frequency band higher than the audible band (the ultrasonic band) and output from a speaker, so that the sound is output with directivity. It also describes a television receiver that displays a plurality of programs and uses this technique to output the sound of each program with directivity in a different direction. There is also a double-sided video display device provided with a super-directional speaker for each surface of a double-sided display (for example, Patent Document 3).
In addition, some mobile terminal devices, such as mobile phone terminals, include a plurality of super-directional speakers so that the contents of a call cannot be heard by anyone other than the user (for example, Patent Document 4).
JP 2003-108551 A
JP H11-262084 A
JP 2007-251331 A
JP 2006-067386 A

However, with a conventional speech translation apparatus, a two-way dialogue requires either one speech translation apparatus per participant or a single apparatus handed back and forth between the participants, which is cumbersome.
The television receiver of Patent Document 2 and the double-sided display of Patent Document 3 output sound individually to several people, but they support neither dialogue nor speech translation. The mobile phone terminal of Patent Document 4 supports two-way dialogue, although not speech translation, but requires one device per participant.

The present invention has been made in view of these circumstances, and its object is to provide a speech translation apparatus that can easily perform speech translation of a two-way dialogue with a single device.

(1) The present invention has been made to solve the problem described above. The speech translation apparatus of the present invention comprises: a first microphone that collects speech in a first language, converts the collected speech into a signal, and outputs the signal; a first translation unit that receives the signal output by the first microphone, performs translation processing to generate a signal representing speech obtained by converting the speech represented by that signal into a second language, and outputs the resulting signal; a first speaker unit that outputs the speech of the signal output by the first translation unit; a second microphone that collects speech in the second language, converts the collected speech into a signal, and outputs the signal; a second translation unit that receives the signal output by the second microphone, performs translation processing to generate a signal representing speech obtained by converting the speech represented by that signal into the first language, and outputs the resulting signal; and a second speaker unit that outputs the speech of the signal output by the second translation unit. The direction of the sound source from which the first microphone collects sound most efficiently substantially coincides with the direction in which the second speaker unit outputs sound, and the direction of the sound source from which the second microphone collects sound most efficiently substantially coincides with the direction in which the first speaker unit outputs sound.

(2) The speech translation apparatus of the present invention is the speech translation apparatus described above, wherein the first speaker unit and the second speaker unit are directional speakers that output sound with directivity.

(3) The speech translation apparatus of the present invention is the speech translation apparatus described above, further comprising: a first translation state notification unit that notifies, in the direction in which the second speaker unit outputs sound, that the second translation unit is receiving a signal to be translated from the second microphone; and a second translation state notification unit that notifies, in the direction in which the first speaker unit outputs sound, that the first translation unit is receiving a signal to be translated from the first microphone.

(4) The speech translation apparatus of the present invention is the speech translation apparatus described above, further comprising: a first translation state notification unit that notifies, in the direction in which the second speaker unit outputs sound, that the first translation unit is performing translation processing; and a second translation state notification unit that notifies, in the direction in which the first speaker unit outputs sound, that the second translation unit is performing translation processing.

(5) The speech translation apparatus of the present invention is the speech translation apparatus described above, further comprising: a first translation state notification unit that notifies, in the direction in which the second speaker unit outputs sound, that the first translation unit is outputting the speech signal resulting from translation processing; and a second translation state notification unit that notifies, in the direction in which the first speaker unit outputs sound, that the second translation unit is outputting the speech signal resulting from translation processing.

(6) The speech translation apparatus of the present invention is the speech translation apparatus described above, wherein the first speaker unit outputs the speech of the signal output by the second translation unit in addition to the speech of the signal output by the first translation unit, and the second speaker unit outputs the speech of the signal output by the first translation unit in addition to the speech of the signal output by the second translation unit.

(7) The speech translation apparatus of the present invention is the speech translation apparatus described above, wherein the first translation unit further receives the signal resulting from translation processing by the second translation unit, performs translation processing to generate a signal representing speech obtained by converting the speech represented by that signal into the second language, and outputs the resulting signal; and the second translation unit further receives the signal resulting from translation processing by the first translation unit, performs translation processing to generate a signal representing speech obtained by converting the speech represented by that signal into the first language, and outputs the resulting speech signal.

(8) The speech translation apparatus of the present invention is the speech translation apparatus described above, further comprising: a first translation start instruction unit that accepts an instruction to start translation processing by the first translation unit; and a second translation start instruction unit that accepts an instruction to start translation processing by the second translation unit. When the first translation start instruction unit receives a translation start instruction, the first translation unit starts the translation processing in which it receives the signal output by the first microphone and converts it into the second language. When the second translation start instruction unit receives a translation start instruction, the second translation unit starts the translation processing in which it receives the signal output by the second microphone and converts it into the first language.

(9) The speech translation apparatus of the present invention is the speech translation apparatus described above, wherein the second translation start instruction unit includes an imaging unit that images the output direction of the first speaker unit, and detects the translation start instruction from the image captured by the imaging unit.

(10) The speech translation apparatus of the present invention is the speech translation apparatus described above, wherein the second translation start instruction unit receives the signal output by the second microphone and detects the translation start instruction from that signal.

(11) The speech translation method of the present invention is a speech translation method in a speech translation apparatus, comprising: a first step in which a first microphone of the speech translation apparatus collects speech in a first language, converts the collected speech into a signal, and outputs the signal; a second step in which the speech translation apparatus receives the signal output in the first step, performs translation processing to generate a signal representing speech obtained by converting the speech represented by that signal into a second language, and outputs the resulting signal; a third step in which the speech translation apparatus outputs the speech of the signal output in the second step in a direction that substantially coincides with the direction of the sound source from which a second microphone of the speech translation apparatus collects sound most efficiently; a fourth step in which the second microphone collects speech in the second language, converts the collected speech into a signal, and outputs the signal; a fifth step in which the speech translation apparatus receives the signal output in the fourth step, performs translation processing to generate a signal representing speech obtained by converting the speech represented by that signal into the first language, and outputs the resulting signal; and a sixth step in which the speech translation apparatus outputs the speech of the signal output in the fifth step in a direction that substantially coincides with the direction of the sound source from which the first microphone collects sound most efficiently.
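The routing in the six steps of the method can be illustrated with a minimal Python sketch. The names `ROUTES` and `run_turn` and the toy translator functions are hypothetical and introduced only for illustration; real recognition, translation, and synthesis are replaced by string tagging.

```python
from typing import Callable, Dict, List, Tuple

# Routing assumed from the method: each microphone feeds one translation
# direction, and the result is played by the speaker facing the other participant.
ROUTES: Dict[str, Tuple[str, str]] = {
    "microphone_1": ("first_to_second", "speaker_1"),  # steps 1 to 3
    "microphone_2": ("second_to_first", "speaker_2"),  # steps 4 to 6
}


def run_turn(microphone: str, utterance: str,
             translators: Dict[str, Callable[[str], str]],
             played: List[Tuple[str, str]]) -> None:
    """Translate one utterance and record which speaker plays the result."""
    translator_name, speaker = ROUTES[microphone]
    played.append((speaker, translators[translator_name](utterance)))


# Toy translators that only tag the text with the target language.
translators = {
    "first_to_second": lambda text: f"[L2] {text}",
    "second_to_first": lambda text: f"[L1] {text}",
}
played: List[Tuple[str, str]] = []
run_turn("microphone_1", "hello", translators, played)
run_turn("microphone_2", "hi", translators, played)
```

The table makes the crossing explicit: speech picked up by the first microphone leaves through the first speaker unit, which points toward the second microphone's sound source, and vice versa.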

According to the present invention, the direction of the sound source from which the first microphone collects sound most efficiently substantially coincides with the direction in which the second speaker unit outputs sound, and the direction of the sound source from which the second microphone collects sound most efficiently substantially coincides with the direction in which the first speaker unit outputs sound. Speech translation of a two-way dialogue can therefore be performed easily with a single device.

[First Embodiment]
A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a schematic block diagram showing the configuration of a speech translation apparatus 1 according to the first embodiment. The speech translation apparatus 1 includes a first microphone 10, a first translation unit 11, a first speech storage unit 12, a first directional speaker unit 13, a first translation state notification unit 16, a display unit 17, a first translation start instruction unit 18, a second microphone 21, a second translation unit 22, a second speech storage unit 23, a second directional speaker unit 24, a second translation state notification unit 27, an LED (Light Emitting Diode) 28, and a second translation start instruction unit 29. Depending on its settings, the speech translation apparatus 1 can translate speech between various languages. The following describes the case of translating a two-way dialogue between dialogue participant A, who understands the first language, and dialogue participant B, who understands the second language.

The first translation start instruction unit 18 accepts, from outside, an instruction to start translation processing by the first translation unit 11, that is, translation of participant A's speech. In this embodiment, the first translation start instruction unit 18 includes an operation unit (detection unit) 20 and a first instruction detection unit 19. The operation unit 20 consists of key buttons (specific areas of the outer surface) that are provided on the surface of the speech translation apparatus 1 and detect being pressed. The first instruction detection unit 19 detects a press of a key button of the operation unit 20 as an instruction to start translation processing. There are a plurality of key buttons, assigned to the up, down, left, and right directions, the digits 0 to 9, "#", "*", "Enter", and so on. The first instruction detection unit 19 may treat a press of any one of these as the start instruction, or only a press of a specific one, for example the "Enter" key button. Although the operation unit 20 has been described here as key buttons that detect being pressed, it may instead detect contact with a specific area of the outer surface, as a touch panel does.
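The two detection policies described for the first instruction detection unit 19 (any key, or one specific key) can be sketched in Python as follows. The key labels and the function name `is_start_instruction` are illustrative assumptions, not part of the description.

```python
from typing import Optional

# Key set taken from the description; the string labels are illustrative.
KEYS = {"up", "down", "left", "right", "#", "*", "enter"} | {str(d) for d in range(10)}


def is_start_instruction(pressed_key: str, start_key: Optional[str] = None) -> bool:
    """Return True if the key press counts as a translation start instruction.

    start_key=None  -> a press of any key on the operation unit starts translation.
    start_key="..." -> only a press of that specific key starts translation.
    """
    if pressed_key not in KEYS:
        return False
    return start_key is None or pressed_key == start_key
```

For example, `is_start_instruction("5")` accepts any key, while `is_start_instruction("5", start_key="enter")` rejects everything except the designated key.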

The first microphone 10 is a microphone whose collectable frequency band includes the frequency band of the human voice. It collects external speech in the first language spoken by participant A, converts the collected speech into a signal, and outputs the signal.
The first translation unit 11 receives the signal output by the first microphone 10, that is, the signal of participant A's speech, performs translation processing to generate a signal representing speech obtained by converting the speech represented by that signal into the second language, and outputs the resulting signal. In more detail, when the first translation start instruction unit 18 receives a translation start instruction, the first translation unit 11 receives the signal output by the first microphone 10, performs speech recognition processing to convert the speech represented by the signal into a character string in the first language, and stores the character string in the first speech storage unit 12. Next, the first translation unit 11 performs translation processing to convert the first-language character string stored in the first speech storage unit 12 into a character string in the second language. The first translation unit 11 then performs speech synthesis processing to generate a second-language speech signal from this second-language character string, and outputs it. The first speech storage unit 12 is a readable and writable memory such as a RAM (Random Access Memory) or a hard disk.
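The three stages described for the first translation unit 11 (speech recognition, text translation, speech synthesis) can be sketched as a small Python pipeline. The class `TranslationUnit` and the toy stand-in functions are hypothetical; the description does not specify any particular engine, so each stage is passed in as a callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TranslationUnit:
    """Hypothetical model of a translation unit such as the first translation unit 11."""
    recognize: Callable[[bytes], str]    # speech signal -> source-language text
    translate: Callable[[str], str]      # source-language text -> target-language text
    synthesize: Callable[[str], bytes]   # target-language text -> speech signal
    speech_store: List[str] = field(default_factory=list)  # stands in for the speech storage unit

    def process(self, mic_signal: bytes) -> bytes:
        # 1. Speech recognition: convert the microphone signal to source-language
        #    text and keep it in the storage unit.
        source_text = self.recognize(mic_signal)
        self.speech_store.append(source_text)
        # 2. Translation: convert the stored text into the target language.
        target_text = self.translate(self.speech_store[-1])
        # 3. Speech synthesis: generate the target-language speech signal.
        return self.synthesize(target_text)


# Toy stand-ins so the sketch runs; real engines would replace these.
_DICTIONARY = {"hello": "konnichiwa"}

first_translation_unit = TranslationUnit(
    recognize=lambda signal: signal.decode("utf-8"),
    translate=lambda text: _DICTIONARY.get(text, text),
    synthesize=lambda text: text.encode("utf-8"),
)

output_signal = first_translation_unit.process(b"hello")
```

The second translation unit 22 would be a second instance of the same structure with the language direction reversed.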

The first directional speaker unit (first speaker unit) 13 outputs the speech of the signal output by the first translation unit 11, that is, participant A's first-language speech translated into the second language. In this embodiment, the first directional speaker unit 13 includes a first directivity control unit 14 and a first speaker 15, and outputs sound with directivity. The first directivity control unit 14 modulates the speech signal received from the first translation unit 11 (an audio signal in the audible band) into a signal in a frequency band higher than the audible band (the ultrasonic band). The first speaker 15 is an ultrasonic generating element capable of outputting ultrasonic waves in the frequency band produced by the first directivity control unit 14. By outputting the ultrasonic wave of the modulated signal, the first speaker 15 outputs the speech modulated by the first directivity control unit 14 with directivity. Outputting sound with directivity by modulating an audio signal into the ultrasonic band and emitting the resulting ultrasonic wave from a speaker is a known technique. The modulation scheme may be amplitude modulation, frequency modulation, phase modulation, or the like, and any of them may be used.
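As one illustration of the modulation step performed by a directivity control unit, the following Python sketch applies plain double-sideband amplitude modulation, one of the schemes the description allows. The 40 kHz carrier, the modulation depth, and the 192 kHz sample rate are assumptions chosen for the example; the description does not give these values.

```python
import math
from typing import List


def amplitude_modulate(audio: List[float], sample_rate: float,
                       carrier_hz: float = 40_000.0,
                       depth: float = 0.8) -> List[float]:
    """Double-sideband AM: (1 + depth * audio[n]) * cos(2*pi*fc*n/fs).

    `audio` is assumed to be normalised to [-1, 1]. The sample rate must
    exceed twice the carrier frequency for the carrier to be representable.
    """
    if sample_rate <= 2 * carrier_hz:
        raise ValueError("sample_rate must be more than twice carrier_hz")
    return [
        (1.0 + depth * sample) * math.cos(2.0 * math.pi * carrier_hz * n / sample_rate)
        for n, sample in enumerate(audio)
    ]


# A 1 kHz test tone, 1 ms long, sampled at 192 kHz (assumed values).
fs = 192_000.0
tone = [math.sin(2.0 * math.pi * 1_000.0 * n / fs) for n in range(192)]
modulated = amplitude_modulate(tone, fs)
```

The audible signal rides on the envelope of the ultrasonic carrier; a practical parametric speaker would add preprocessing and drive electronics that are outside the scope of this sketch.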

The second translation start instruction unit 29 accepts, from outside, an instruction to start translation processing by the second translation unit 22, that is, translation of participant B's speech. In this embodiment, the second translation start instruction unit 29 includes a camera (imaging unit) 31 and a second instruction detection unit 30. The camera 31 is provided on the surface of the speech translation apparatus 1 and images the direction in which the first speaker outputs sound (that is, it images participant B). The second instruction detection unit 30 treats the detection of a predetermined shape or a movement of the subject in the image captured by the camera 31 as detection of a start instruction. As a result, while participant A is carrying the speech translation apparatus 1, participant B can instruct the start of translation by striking a specific pose (a predetermined shape) or making a specific movement.
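The movement-based variant of the second instruction detection unit 30 can be approximated by frame differencing, sketched below in Python. This is a deliberately crude stand-in: it treats any sufficiently large change between two frames as the start instruction, whereas the description calls for a predetermined shape or a specific movement. The function names and the threshold value are assumptions.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of 0-255 pixel values


def motion_score(previous: Frame, current: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for prev_row, cur_row in zip(previous, current):
        for p, c in zip(prev_row, cur_row):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def detects_start_instruction(previous: Frame, current: Frame,
                              threshold: float = 20.0) -> bool:
    """Treat a frame-to-frame change at or above the (assumed) threshold as a start instruction."""
    return motion_score(previous, current) >= threshold
```

A real detector would match the captured image against the predetermined pose or gesture rather than reacting to arbitrary motion.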

The second microphone 21 is a microphone whose collectable frequency band includes the frequency band of the human voice. It collects external speech in the second language spoken by participant B, converts the collected speech into a signal, and outputs the signal.
The second translation unit 22 receives the signal output by the second microphone 21, that is, the signal of participant B's speech, performs translation processing to generate a signal representing speech obtained by converting the speech represented by that signal into the first language, and outputs the resulting signal. In more detail, when the second translation start instruction unit 29 receives a translation start instruction, the second translation unit 22 receives the signal output by the second microphone 21, performs speech recognition processing to convert the speech represented by the signal into a character string in the second language, and stores the character string in the second speech storage unit 23. Next, the second translation unit 22 performs translation processing to convert the second-language character string stored in the second speech storage unit 23 into a character string in the first language. The second translation unit 22 then performs speech synthesis processing to generate a first-language speech signal from this first-language character string, and outputs it. The second speech storage unit 23 is a readable and writable memory such as a RAM (Random Access Memory) or a hard disk.

The second directional speaker unit (second speaker unit) 24 outputs the speech of the signal output by the second translation unit 22, that is, participant B's second-language speech translated into the first language. In this embodiment, the second directional speaker unit 24 includes a second directivity control unit 25 and a second speaker 26, and outputs sound with directivity. Like the first directivity control unit 14, the second directivity control unit 25 modulates the speech signal received from the second translation unit 22 (an audio signal in the audible band) into a signal in a frequency band higher than the audible band (the ultrasonic band). The second speaker 26 is an ultrasonic generating element, similar to the first speaker 15, capable of outputting ultrasonic waves in the frequency band produced by the second directivity control unit 25. By outputting the ultrasonic wave of the modulated signal, the second speaker 26 outputs the speech modulated by the second directivity control unit 25 with directivity. As noted in the description of the first directional speaker unit 13, outputting sound with directivity in this way is a known technique. The modulation scheme may be amplitude modulation, frequency modulation, phase modulation, or the like, and any of them may be used.

The first translation state notification unit 16 is notified by the first translation unit 11 and the second translation unit 22 and thereby acquires their operating states. Depending on the acquired state, it notifies that the second translation unit 22 is receiving a signal to be translated from the second microphone 21, that the first translation unit 11 is performing translation processing, or that the first translation unit 11 is outputting the speech signal of the translation result, that is, that the first speaker 15 is outputting the translated speech. It does so by displaying characters, figures, or the like representing each state on the display unit 17, which is a display such as a liquid crystal or organic EL (Organic Electro-Luminescence) display.

The display unit 17 is a display that faces the direction in which the second directional speaker unit 24 outputs sound. That is, the display surface of the display unit 17 is provided on the surface of the speech translation apparatus 1 so as to face the same direction as the second speaker 26 of the second directional speaker unit 24, so that the contents of the display unit 17 can be seen from a position where the sound output by the second directional speaker unit 24 can be heard.

The second translation state notification unit 27 is notified by the first translation unit 11 and the second translation unit 22 and thereby acquires their operating states. Depending on the acquired state, it notifies that the first translation unit 11 is receiving a signal to be translated from the first microphone 10, that the second translation unit 22 is performing translation processing, or that the second translation unit 22 is outputting the speech signal of the translation result, that is, that the second speaker 26 is outputting the translated speech. It does so by lighting the LED 28 in a color or in a light emission pattern, such as blinking, that represents the state. The form of this notification is not limited to this.
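The mapping from translation unit states to the two kinds of notification can be sketched in Python as follows. The enum `UnitState`, the message texts, the LED pattern names, and the priority order among states are all hypothetical; the description only says that characters or figures are displayed and that the LED emits a color or pattern representing the state.

```python
from enum import Enum, auto
from typing import List


class UnitState(Enum):
    IDLE = auto()
    RECEIVING = auto()    # receiving the signal to be translated from its microphone
    TRANSLATING = auto()  # translation processing in progress
    OUTPUTTING = auto()   # outputting the translated speech signal


def display_messages(first_unit: UnitState, second_unit: UnitState) -> List[str]:
    """Text for the display unit 17, driven by the first translation state notification unit."""
    messages = []
    if second_unit is UnitState.RECEIVING:
        messages.append("Partner is speaking")
    if first_unit is UnitState.TRANSLATING:
        messages.append("Translating your speech")
    if first_unit is UnitState.OUTPUTTING:
        messages.append("Playing your translation")
    return messages


def led_pattern(first_unit: UnitState, second_unit: UnitState) -> str:
    """Light pattern for the LED 28, driven by the second translation state notification unit."""
    if second_unit is UnitState.OUTPUTTING:
        return "green-steady"
    if second_unit is UnitState.TRANSLATING:
        return "amber-blink"
    if first_unit is UnitState.RECEIVING:
        return "blue-steady"
    return "off"
```

Each notification unit reads the states of both translation units, which matches the description that both units report their operating state to both notification units.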

The LED 28 is provided on the surface of the speech translation apparatus 1 so that it displays in the direction in which the first directional speaker unit 13 outputs sound, that is, so that it can be seen from a position where the sound output by the first directional speaker unit 13 can be heard.
In addition, the direction of the sound source from which the first microphone 10 collects sound most efficiently substantially coincides with the direction in which the second speaker 26 of the second directional speaker unit 24 outputs sound, and the direction of the sound source from which the second microphone 21 collects sound most efficiently substantially coincides with the direction in which the first speaker 15 of the first directional speaker unit 13 outputs sound.
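The condition that two directions "substantially coincide" can be expressed as a bound on the angle between them, as in the following Python sketch. The 15-degree tolerance and the example layout, with the two participants on opposite sides of the apparatus, are assumptions; the description gives no numerical tolerance or coordinates.

```python
import math
from typing import Tuple

Vector = Tuple[float, float, float]


def angle_between_deg(a: Vector, b: Vector) -> float:
    """Angle in degrees between two non-zero direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


def substantially_coincide(a: Vector, b: Vector, tolerance_deg: float = 15.0) -> bool:
    """True if the two directions differ by no more than the (assumed) tolerance."""
    return angle_between_deg(a, b) <= tolerance_deg


# Assumed layout: one participant on the +z side, the other on the -z side.
MIC_1_AXIS: Vector = (0.0, 0.0, 1.0)       # most sensitive direction of the first microphone
SPEAKER_2_AXIS: Vector = (0.0, 0.0, 1.0)   # output direction of the second speaker
MIC_2_AXIS: Vector = (0.0, 0.0, -1.0)      # most sensitive direction of the second microphone
SPEAKER_1_AXIS: Vector = (0.0, 0.0, -1.0)  # output direction of the first speaker
```

With this layout, each microphone is paired with the speaker that plays the other participant's translated speech, so each participant speaks and listens from one position.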

Because the first microphone 10, the second speaker 26, the second microphone 21, and the first speaker 15 are arranged in this way, the interlocutor A need only speak in the first language toward the first microphone 10 from a position where the speech translated into the first language and output from the second speaker 26 can be heard. Likewise, the interlocutor B need only speak in the second language toward the second microphone 21 from a position where the speech translated into the second language and output from the first speaker 15 can be heard. Speech translation of a two-way conversation can therefore be performed easily with a single device.

In addition, the display unit 17 displays in the direction in which the second speaker 26 of the second directional speaker unit 24 outputs sound. Therefore, when positioned where the speech translated into the first language and output from the second speaker 26 can be heard, the interlocutor A can see the notification by the first translation state notification unit 16 shown on the display unit 17 and recognize the state of the speech translation. Specifically, by recognizing the notification that the second translation unit 22 is receiving a signal to be translated from the second microphone 21, the interlocutor A can recognize that what the conversation partner, the interlocutor B, is currently saying will be translated and later output from the second speaker 26.

Also, by recognizing the notification that the first translation unit 11 is performing translation processing, the interlocutor A can recognize that what he or she said earlier is still being translated and has not yet been conveyed to the interlocutor B. Further, by recognizing the notification that the second speaker 26 is outputting the speech of the translation result, the interlocutor A can recognize that the interlocutor B is in the middle of listening to what the interlocutor A said earlier. Thus, even when the person speaking cannot hear the translation result because directional speakers are used, as in this embodiment, that person can tell the operating state of the speech translation apparatus 1 and whether the translation result has been delivered to the conversation partner.

The LED 28 displays in the direction in which the first speaker 15 of the first directional speaker unit 13 outputs sound. Therefore, when positioned where the speech translated into the second language and output from the first speaker 15 can be heard, the interlocutor B can recognize the state of the speech translation in the same way as the interlocutor A described above.
In addition, because the first directional speaker unit 13 and the second directional speaker unit 24 output the speech of the translation result with directivity, it is possible to prevent malfunctions in which the first microphone 10 or the second microphone 21 picks up this output speech and the picked-up speech is translated again. This also reduces the likelihood that the translation result is overheard by a third party other than the conversation partner. Furthermore, it avoids the situation in which the person speaking hears the translation result, becomes preoccupied with it, and finds it harder to speak.

FIG. 2 is a schematic external view showing the appearance of the speech translation apparatus 1 in this embodiment. In this embodiment, the speech translation apparatus 1 is provided in a clamshell-type mobile phone terminal. FIG. 2(a) is an external view of the inner side of the mobile phone terminal including the speech translation apparatus 1 in the opened state, and FIG. 2(b) is an external view of the outer side in the opened state. As shown in FIG. 2(a), the second speaker 26, the display unit 17, and the first microphone 10 face the same direction, and the operation unit 20 is also provided on this inner surface. As shown in FIG. 2(b), the LED 28, the camera 31, the first speaker 15, and the second microphone 21 face the same direction.

That is, the direction of the sound source from which the first microphone 10 can collect sound most efficiently substantially coincides with the direction in which the second speaker 26 of the second directional speaker unit 24 outputs sound, and the direction of the sound source from which the second microphone 21 can collect sound most efficiently substantially coincides with the direction in which the first speaker 15 of the first directional speaker unit 13 outputs sound. Note that, for each of the first microphone 10 and the second microphone 21, the direction of the sound source from which sound can be collected most efficiently is the direction of the opening provided to guide sound waves to that microphone. The display unit 17 displays in the direction in which the second speaker 26 outputs sound, and the LED 28 displays in the direction in which the first speaker 15 outputs sound. Furthermore, the first speaker 15 and the second speaker 26 output sound in opposite directions.
With this arrangement, the speech translation apparatus 1 is easy to use when the interlocutor A, who holds the speech translation apparatus 1, turns the inner side shown in FIG. 2(a) toward himself or herself and the outer side shown in FIG. 2(b) toward the conversation partner, the interlocutor B.
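The directional relationships described for FIG. 2 can be expressed as a simple geometric check. The following is a minimal sketch under stated assumptions: each component is given a facing direction as a 3D vector, "substantially coincide" is interpreted as an angle within a tolerance, and the 15 degree tolerance, the vectors, and the component keys are all hypothetical values chosen for illustration.

```python
import math


def angle_between_deg(a, b):
    """Angle in degrees between two non-zero 3D direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


def substantially_coincide(a, b, tolerance_deg=15.0):
    """Assumed reading of 'substantially coincide': angle within a tolerance."""
    return angle_between_deg(a, b) <= tolerance_deg


# Hypothetical facing directions for the opened clamshell terminal.
INNER = (0.0, 0.0, 1.0)   # side facing the interlocutor A
OUTER = (0.0, 0.0, -1.0)  # side facing the interlocutor B
LAYOUT = {
    "microphone_10": INNER, "speaker_26": INNER, "display_17": INNER,
    "microphone_21": OUTER, "speaker_15": OUTER, "led_28": OUTER,
}


def layout_is_valid(layout, tolerance_deg=15.0):
    """Check the microphone/speaker pairings and the opposed speaker directions."""
    return (
        substantially_coincide(layout["microphone_10"], layout["speaker_26"], tolerance_deg)
        and substantially_coincide(layout["microphone_21"], layout["speaker_15"], tolerance_deg)
        and angle_between_deg(layout["speaker_15"], layout["speaker_26"]) >= 180.0 - tolerance_deg
    )
```

With the example layout, `layout_is_valid(LAYOUT)` holds because each microphone faces the same way as the speaker of the opposite translation path and the two speakers face away from each other.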

FIG. 3 is a flowchart explaining the operation of the speech translation apparatus 1 in this embodiment. The description takes as an example the case of translating from the first language, spoken by the interlocutor A, into the second language. First, the first instruction detection unit 19 performs processing to detect whether a key button of the operation unit 20, for example the "Enter" key button, is pressed, that is, processing to detect a translation start instruction (S1). If the interlocutor A has not pressed the key button of the operation unit 20 and the first instruction detection unit 19 determines that there is no translation start instruction (S2: No), the process returns to step S1 and the detection processing is repeated.

On the other hand, if the interlocutor A has pressed the key button of the operation unit 20 and the first instruction detection unit 19 determines in step S2 that there is a translation start instruction (S2: Yes), the first translation unit 11 causes the second translation state notification unit 27 to notify, using the LED 28, that reception of the signal to be translated from the first microphone 10 is starting (S3), and starts accepting the speech signal to be translated (the speech signal in the first language) from the first microphone 10 (S4). The first translation unit 11 performs speech recognition on the received speech signal, converts it into a character string in the first language, and stores the character string in the first speech storage unit 12. When the first translation unit 11 detects a break in the conversation, for example on the condition that a silent portion continues for longer than a predetermined time, it stops accepting the speech signal to be translated and instructs the first translation state notification unit 16 and the second translation state notification unit 27 to notify that translation is in progress (S5).
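One possible way to detect a break in the conversation from a sustained silent portion is sketched below. This is only an illustration under assumptions: the input is taken to be a sequence of per-frame level values, and the threshold and the permitted number of silent frames are hypothetical parameters standing in for the "predetermined time" of the description.

```python
def find_utterance_end(frame_levels, silence_threshold, max_silent_frames):
    """Return the index of the frame at which reception should stop.

    A frame counts as silent when its level is below silence_threshold.
    Reception stops once the silent run becomes longer than max_silent_frames.
    Returns None if no sufficiently long silent run is found.
    """
    silent_run = 0
    for index, level in enumerate(frame_levels):
        if level < silence_threshold:
            silent_run += 1
            if silent_run > max_silent_frames:
                return index
        else:
            silent_run = 0  # speech resumed, so the silent run starts over
    return None
```

Short pauses inside an utterance reset the counter, so only a pause longer than the permitted length ends reception.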

Next, the first translation unit 11 reads the character string in the first language stored in the first speech storage unit 12 and performs translation processing into the second language (S6). When the translation processing is finished, the first translation unit 11 instructs the first translation state notification unit 16 to notify that the translation result is being output (S7). The first translation unit 11 then causes the first directional speaker unit 13 to output the speech signal of the translation result (S8).
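The sequence of steps S1 to S8 can be summarized as the following sketch. It is not the implementation of the embodiment: speech recognition, translation, playback, and notification are passed in as hypothetical callables, and the unit and state labels given to `notify` are names assumed for this example.

```python
from typing import Callable


def run_translation_cycle(
    start_requested: Callable[[], bool],
    record_until_break: Callable[[], bytes],
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    play: Callable[[str], None],
    notify: Callable[[str, str], None],
) -> str:
    """Run one first-language to second-language cycle following S1 to S8."""
    # S1/S2: repeat detection until a translation start instruction is found.
    while not start_requested():
        pass
    # S3: notify via unit 27 that reception is starting.
    notify("unit_27", "receiving")
    # S4: accept the speech signal until a break in the conversation.
    audio = record_until_break()
    stored_text = recognize(audio)  # stands in for storage in unit 12
    # S5: both notification units report that translation is in progress.
    notify("unit_16", "translating")
    notify("unit_27", "translating")
    # S6: translate the stored character string into the second language.
    translated = translate(stored_text)
    # S7: notify via unit 16 that the translation result is being output.
    notify("unit_16", "outputting")
    # S8: output the translated speech from the first directional speaker unit.
    play(translated)
    return translated
```

Passing the processing stages as parameters keeps the control flow of the flowchart separate from any particular recognition or translation engine.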

Note that, for cases such as when a person wants to check the translation result of what he or she said, the first speaker 15 may output the speech of the signal output by the second translation unit 22 in addition to the speech of the signal output by the first translation unit 11, and the second speaker 26 may output the speech of the signal output by the first translation unit 11 in addition to the speech of the signal output by the second translation unit 22.

Alternatively, the first translation unit 11 may further receive from the second translation unit 22 the signal resulting from translation of the signal output by the second microphone 21, perform translation processing that generates a signal representing speech obtained by converting the speech represented by that signal into the second language, and output the resulting signal to the first directional speaker unit 13. Likewise, the second translation unit 22 may further receive from the first translation unit 11 the signal resulting from translation of the signal output by the first microphone 10, perform translation processing that generates a signal representing speech obtained by converting the speech represented by that signal into the first language, and output the resulting signal to the second directional speaker unit 24. This makes it possible to hear the result of retranslating the translation result and thereby judge the validity of the translation.
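The retranslation described above amounts to a round trip through both translation directions. A minimal sketch follows, assuming that the two translation directions are available as functions; the toy word tables are invented solely so that the example runs and do not represent any real translation engine.

```python
def round_trip(text, translate_1_to_2, translate_2_to_1):
    """Translate into the second language, then back into the first.

    Returns (forward, back) so that the person who spoke can compare
    the retranslated text with what was originally said.
    """
    forward = translate_1_to_2(text)
    back = translate_2_to_1(forward)
    return forward, back


# Toy word tables, invented for illustration only.
_TABLE_1_TO_2 = {"hello": "bonjour", "thanks": "merci"}
_TABLE_2_TO_1 = {value: key for key, value in _TABLE_1_TO_2.items()}


def toy_1_to_2(text):
    """Word-by-word lookup; unknown words are passed through unchanged."""
    return " ".join(_TABLE_1_TO_2.get(word, word) for word in text.split())


def toy_2_to_1(text):
    """Reverse lookup using the inverted table."""
    return " ".join(_TABLE_2_TO_1.get(word, word) for word in text.split())
```

If the retranslated text differs noticeably from the original utterance, the person who spoke has a cue that the translation may not have conveyed the intended meaning.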

In this embodiment, the second translation start instruction unit 29 has been described as treating the detection of a predetermined shape or subject motion in an image captured by the camera 31 as the translation start instruction. However, it may instead receive the signal output by the second microphone 21 and treat the detection that the signal satisfies a predetermined condition (for example, a predetermined word or frequency) as the translation start instruction.
Also, in this embodiment, the first translation unit 11 has been described as starting to accept the signal from the first microphone 10 upon receiving a translation start instruction from the first translation start instruction unit 18, and then performing translation processing. However, the first translation unit 11 may instead accept the signal from the first microphone 10 and store the speech signal, or a character string converted from it by speech recognition, in the first speech storage unit 12 in advance, and then, upon receiving a translation start instruction from the first translation start instruction unit 18, read the speech signal or character string from the first speech storage unit 12 and start translation processing. Similarly to the first translation unit 11, the second translation unit 22 may, upon receiving a translation start instruction from the second translation start instruction unit 29, read the speech signal or character string from the second speech storage unit 23 and start translation processing.
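The variation in which speech is stored first and translated only when a start instruction arrives could be sketched as follows. The class name, the callables, and the bounded capacity of the store are assumptions for illustration; the description does not specify how much speech the storage unit holds.

```python
from collections import deque


class BufferedTranslator:
    """Stores recognized text continuously and translates it on instruction."""

    def __init__(self, recognize, translate, max_segments=8):
        self._recognize = recognize
        self._translate = translate
        # Stands in for a speech storage unit; the capacity limit is an assumption.
        self._storage = deque(maxlen=max_segments)

    def on_audio(self, segment):
        """Called for every incoming speech segment, before any instruction."""
        self._storage.append(self._recognize(segment))

    def on_start_instruction(self):
        """Read the stored text, clear the store, and start translation."""
        text = " ".join(self._storage)
        self._storage.clear()
        return self._translate(text)
```

In this arrangement the start instruction can be given after speaking rather than before, because the speech has already been captured when the instruction arrives.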

[Second Embodiment]
A second embodiment of the present invention will be described below with reference to the drawings. FIG. 4 is a schematic block diagram showing the configuration of the speech translation apparatus 1a according to this embodiment. In the figure, parts corresponding to those in FIG. 1 are given the same reference numerals (10 to 15, 18 to 26, 29 to 31), and their description is omitted. The speech translation apparatus 1a includes a first microphone 10, a first translation unit 11, a first speech storage unit 12, a first directional speaker unit 13, a first translation state notification unit 16a, a first translation start instruction unit 18, a second microphone 21, a second translation unit 22, a second speech storage unit 23, a second directional speaker unit 24, a second translation state notification unit 27a, and a second translation start instruction unit 29.

The speech translation apparatus 1a differs from the speech translation apparatus 1 of FIG. 1 in that it does not include the display unit 17 or the LED 28, and in that it includes the first translation state notification unit 16a and the second translation state notification unit 27a in place of the first translation state notification unit 16 and the second translation state notification unit 27. The first translation state notification unit 16a receives notifications from the first translation unit 11 and the second translation unit 22, acquires their operating states, generates a sound signal representing the acquired state, and outputs it to the second directivity control unit 25. The first translation state notification unit 16a thereby causes the second speaker 26 to output, with directivity, a sound representing the acquired state. Similarly, the second translation state notification unit 27a receives notifications from the first translation unit 11 and the second translation unit 22, acquires their operating states, generates a sound signal representing the acquired state, and outputs it to the first directivity control unit 14. The second translation state notification unit 27a thereby causes the first speaker 15 to output, with directivity, a sound representing the acquired state.
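A sound signal representing an operating state could, for example, be a short tone whose pitch depends on the state. The sketch below makes that assumption explicit: the state names, frequencies, duration, sample rate, and amplitude are hypothetical, and the description does not say what kind of sound is used.

```python
import math

# Hypothetical tone assignment: one frequency per operating state.
STATE_TONES_HZ = {
    "receiving": 440.0,
    "translating": 660.0,
    "outputting": 880.0,
}


def state_tone(state, duration_s=0.2, sample_rate=16000, amplitude=0.3):
    """Generate sine-wave samples in [-amplitude, amplitude] for a state."""
    frequency = STATE_TONES_HZ[state]
    sample_count = round(duration_s * sample_rate)
    return [
        amplitude * math.sin(2.0 * math.pi * frequency * n / sample_rate)
        for n in range(sample_count)
    ]
```

The resulting samples would then be handed to the directivity control stage in the same way as translated speech, so that only the intended listener hears the state tone.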

By outputting sounds representing the states of the first translation unit 11 and the second translation unit 22 with directivity from the first speaker 15 or the second speaker 26 in this way, a user of the speech translation apparatus 1a can grasp these states even though the apparatus lacks the display unit 17 and the LED 28 of the speech translation apparatus 1 in the first embodiment.
Note that, regarding the operating states of the first translation unit 11 and the second translation unit 22, it has been described that in the first embodiment the first translation state notification unit 16 notifies the interlocutor A by displaying them on the display unit 17 and the second translation state notification unit 27 notifies the interlocutor B by lighting the LED 28, and that in the second embodiment the first translation state notification unit 16a notifies the interlocutor A by causing the second speaker 26 to output sound and the second translation state notification unit 27a notifies the interlocutor B by causing the first speaker 15 to output sound. However, the notification by the first translation state notification units 16 and 16a need only be made toward the direction in which the second directional speaker unit 24 outputs sound, and the notification by the second translation state notification units 27 and 27a need only be made toward the direction in which the first directional speaker unit 13 outputs sound; the mode of notification is not limited to the forms described above.

Note that the first translation unit 11, the first directivity control unit 14, the first translation state notification unit 16, the first instruction detection unit 19, the second translation unit 22, the second directivity control unit 25, the second translation state notification unit 27, and the second instruction detection unit 30 in FIG. 1, and the first translation state notification unit 16a and the second translation state notification unit 27a in FIG. 4, may be realized by dedicated hardware. Alternatively, each of these units may be constituted by a memory and a CPU (central processing unit), with its function realized by loading a program for realizing the function into the memory and executing it.

Also, a program for realizing the functions of the first translation unit 11, the first directivity control unit 14, the first translation state notification unit 16, the first instruction detection unit 19, the second translation unit 22, the second directivity control unit 25, the second translation state notification unit 27, and the second instruction detection unit 30 in FIG. 1, and of the first translation state notification unit 16a and the second translation state notification unit 27a in FIG. 4, may be recorded on a computer-readable recording medium, and the processing of each unit may be performed by having a computer system read and execute the program recorded on the recording medium. The "computer system" here includes an OS and hardware such as peripheral devices.

The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" also includes media that hold the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold the program for a certain period of time, such as the volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

Embodiments of the present invention have been described above in detail with reference to the drawings. However, the specific configuration is not limited to these embodiments and also includes design changes and the like within a scope that does not depart from the gist of the invention.

The present invention is suitable for use in portable information devices such as mobile phone terminals, PDAs (Personal Digital Assistants), and electronic dictionary devices, but is not limited to these.

FIG. 1 is a block diagram showing the configuration of the speech translation apparatus 1 according to the first embodiment of the present invention.
FIG. 2 is a schematic external view showing the appearance of the speech translation apparatus 1 in the same embodiment.
FIG. 3 is a flowchart explaining the operation of the speech translation apparatus 1 in the same embodiment.
FIG. 4 is a block diagram showing the configuration of the speech translation apparatus 1a according to the second embodiment of the present invention.

Explanation of symbols

1, 1a … Speech translation apparatus
10 … First microphone
11 … First translation unit
12 … First speech storage unit
13 … First directional speaker unit
14 … First directivity control unit
15 … First speaker
16, 16a … First translation state notification unit
17 … Display unit
18 … First translation start instruction unit
19 … First instruction detection unit
20 … Operation unit
21 … Second microphone
22 … Second translation unit
23 … Second speech storage unit
24 … Second directional speaker unit
25 … Second directivity control unit
26 … Second speaker
27, 27a … Second translation state notification unit
28 … LED
29 … Second translation start instruction unit
30 … Second instruction detection unit
31 … Camera

Claims

A speech translation apparatus comprising:
a first microphone that collects speech in a first language, converts the collected speech into a signal, and outputs the signal;
a first translation unit that receives the signal output by the first microphone, performs translation processing that generates a signal representing speech obtained by converting the speech represented by that signal into a second language, and outputs the signal resulting from the translation processing;
a first speaker unit that outputs the speech of the signal output by the first translation unit;
a second microphone that collects speech in the second language, converts the collected speech into a signal, and outputs the signal;
a second translation unit that receives the signal output by the second microphone, performs translation processing that generates a signal representing speech obtained by converting the speech represented by that signal into the first language, and outputs the signal resulting from the translation processing; and
a second speaker unit that outputs the speech of the signal output by the second translation unit,
wherein the direction of the sound source from which the first microphone can collect sound most efficiently substantially coincides with the direction in which the second speaker unit outputs speech, and
the direction of the sound source from which the second microphone can collect sound most efficiently substantially coincides with the direction in which the first speaker unit outputs speech.

The speech translation apparatus according to claim 1, wherein the first speaker unit and the second speaker unit are directional speakers that output speech with directivity.

The speech translation apparatus according to claim 2, further comprising:
a first translation state notifying unit that notifies, in the direction in which the second speaker unit outputs speech, that the second translation unit is receiving a signal to be translated from the second microphone; and
a second translation state notifying unit that notifies, in the direction in which the first speaker unit outputs speech, that the first translation unit is receiving a signal to be translated from the first microphone.

The speech translation apparatus according to claim 2, further comprising:
a first translation state notifying unit that notifies, in the direction in which the second speaker unit outputs speech, that the first translation unit is performing translation processing; and
a second translation state notifying unit that notifies, in the direction in which the first speaker unit outputs speech, that the second translation unit is performing translation processing.

The speech translation apparatus according to claim 2, further comprising:
a first translation state notifying unit that notifies, in the direction in which the second speaker unit outputs speech, that the first translation unit is outputting the speech signal of the translation processing result; and
a second translation state notifying unit that notifies, in the direction in which the first speaker unit outputs speech, that the second translation unit is outputting the speech signal of the translation processing result.

The speech translation apparatus according to claim 2, wherein the first speaker unit outputs the speech of the signal output by the second translation unit in addition to the speech of the signal output by the first translation unit, and
the second speaker unit outputs the speech of the signal output by the first translation unit in addition to the speech of the signal output by the second translation unit.

The speech translation apparatus according to claim 2, wherein the first translation unit further receives the signal resulting from the translation processing by the second translation unit, performs translation processing that generates a signal representing speech obtained by converting the speech represented by that signal into the second language, and outputs the signal resulting from that translation processing, and
the second translation unit further receives the signal resulting from the translation processing by the first translation unit, performs translation processing that generates a signal representing speech obtained by converting the speech represented by that signal into the first language, and outputs the signal resulting from that translation processing.

The speech translation apparatus according to claim 1, further comprising:
a first translation start instruction unit that receives an instruction to start translation processing by the first translation unit; and
a second translation start instruction unit that receives an instruction to start translation processing by the second translation unit,
wherein, when the first translation start instruction unit receives a translation start instruction, the first translation unit starts translation processing in which it receives the signal output by the first microphone and converts it into the first language, and
when the second translation start instruction unit receives a translation start instruction, the second translation unit starts translation processing in which it receives the signal output by the first microphone and converts it into the first language.

The speech translation apparatus, wherein the second translation start instruction unit includes an imaging unit that images the output direction of the first speaker unit, and detects the translation start instruction from the imaging result of the imaging unit.

The speech translation apparatus according to claim 8, wherein the second translation start instruction unit receives a signal output from the second microphone and detects the instruction to start translation from the signal.

A speech translation method in a speech translation apparatus, comprising:
a first step in which a first microphone of the speech translation apparatus collects speech in a first language, converts the collected speech into a signal, and outputs the signal;
a second step in which the speech translation apparatus receives the signal output in the first step, performs translation processing that generates a signal representing speech obtained by converting the speech represented by that signal into a second language, and outputs the signal resulting from the translation processing;
a third step in which the speech translation apparatus outputs the speech of the signal output in the second step in a direction that substantially coincides with the direction of the sound source from which a second microphone of the speech translation apparatus can collect sound most efficiently;
a fourth step in which the second microphone of the speech translation apparatus collects speech in the second language, converts the collected speech into a signal, and outputs the signal;
a fifth step in which the speech translation apparatus receives the signal output in the fourth step, performs translation processing that generates a signal representing speech obtained by converting the speech represented by that signal into the first language, and outputs the signal resulting from the translation processing; and
a sixth step in which the speech translation apparatus outputs the speech of the signal output in the fifth step in a direction that substantially coincides with the direction of the sound source from which the first microphone can collect sound most efficiently.