JP2019191490A

JP2019191490A - Voice interaction terminal, and voice interaction terminal control method

Info

Publication number: JP2019191490A
Application number: JP2018086985A
Authority: JP
Inventors: 隆信向出; Takanobu Mukaide
Original assignee: Toshiba Visual Solutions Corp
Current assignee: Toshiba Visual Solutions Corp
Priority date: 2018-04-27
Filing date: 2018-04-27
Publication date: 2019-10-31

Abstract

To provide a voice interactive terminal and a voice interactive terminal control method in which a user can adjust the sensitivity with which the voice interactive terminal recognizes a specific word. SOLUTION: An embodiment of the present invention is a voice interactive terminal that includes a microphone, a communication control unit that controls communication with a network, an output processing unit that outputs voice input from the microphone to the network via the communication control unit, a detection unit that detects a specific word, which is a predetermined word, in the voice input from the microphone, and a setting unit that sets, in the detection unit, a sensitivity setting value that is input from the network via the communication control unit and is the sensitivity with which the detection unit detects the specific word. SELECTED DRAWING: Figure 1

Description

The present embodiment relates to a voice interactive terminal and a method for controlling the voice interactive terminal.

There are voice dialogue systems that collect voice uttered by a user with, for example, a microphone, determine what the user said by analyzing the collected voice through voice recognition processing, and provide the user with a response according to the determined content. Such a voice dialogue system consists broadly of two parts: a voice dialogue service part and a voice interactive terminal part.

The voice dialogue service part takes what the user uttered as input, analyzes the input through voice recognition processing, and provides the user with a response according to the analysis result.

The voice interactive terminal part inputs the user's utterance collected by the microphone to the voice dialogue service as voice data, outputs the content of the response from the voice dialogue service to the user as voice, and controls peripheral devices.

In addition, rather than responding to input of what the user uttered, the voice dialogue service may spontaneously provide information to the voice interactive terminal.

JP 2017-537361 A

When collecting the user's utterances with the microphone, the voice interactive terminal begins collecting the user's subsequent utterance once it recognizes a predetermined specific word.

In the place where a voice interactive terminal is installed, there is generally room noise besides the user uttering the specific word, such as everyday conversation and sound from AV equipment such as a television. The voice interactive terminal may therefore misrecognize such room noise as the user having uttered the specific word. It is thus desirable that the user of the voice dialogue system be able to adjust the sensitivity with which the voice interactive terminal recognizes the specific word according to the room noise conditions where the terminal is installed.

However, current voice dialogue systems have the problem that there is no mechanism for the user to adjust the sensitivity with which the voice interactive terminal recognizes the specific word.

Accordingly, an object of the present embodiment is to provide a voice interactive terminal and a voice interactive terminal control method that allow the user to adjust the sensitivity with which the voice interactive terminal recognizes a specific word.

A voice interactive terminal comprising:
a microphone;
a communication control unit that controls communication with a network;
an output processing unit that outputs voice input from the microphone to the network via the communication control unit;
a detection unit that detects a specific word, which is a predetermined word, in the voice input from the microphone; and
a setting unit that sets, in the detection unit, a sensitivity setting value that is input from the network via the communication control unit and is the sensitivity with which the detection unit detects the specific word.

FIG. 1 is a diagram showing an overview of a voice dialogue system including a voice interactive terminal 1 to which an embodiment of the present invention is applied.
FIG. 2 is a block diagram of the voice interactive terminal 1 shown in FIG. 1.
FIG. 3 is a diagram showing the overall configuration and the flow of data related to sensitivity setting when sensitivity is set in the voice interactive terminal of the first embodiment.
FIG. 4 is an example of the setting screen, used to set the sensitivity setting value, of the sensitivity setting application for controlling the voice interactive terminal 1 that is installed on the mobile terminal 20 or the mobile terminal 21.
FIG. 5A is a processing flow of the voice interactive terminal 1 in the first embodiment when the user changes the sensitivity setting value for trigger word detection of the voice interactive terminal 1 by operating the application installed on the mobile terminal 20 or the mobile terminal 21.
FIG. 5B is a processing flow of the mobile terminal 20 or the mobile terminal 21 in the first embodiment when the user changes the sensitivity setting value for trigger word detection of the voice interactive terminal 1 by operating the application installed on the mobile terminal 20 or the mobile terminal 21.
FIG. 6 is a diagram showing an example of the overall configuration and the flow of data related to sensitivity setting when sensitivity is set in the voice interactive terminal of the second embodiment.
FIG. 7 is an example of a setting screen for starting and ending measurement of ambient noise in the application for controlling the voice interactive terminal 1 that is installed on the mobile terminal 20 or the mobile terminal 21.
FIG. 8A is a processing flow of the voice interactive terminal 1 in the second embodiment when the voice interactive terminal transmits the surrounding sound collected during a period set on the mobile terminal 21 to the voice dialogue service 2, and the sensitivity setting value is updated with a sensitivity setting value that the voice dialogue service 2 calculates from the voice data collected by the voice interactive terminal 1.
FIG. 8B is a processing flow of the mobile terminal 21 in the second embodiment in the same case as FIG. 8A.
FIG. 8C is a processing flow of the detection sensitivity calculation unit 2-4 in the second embodiment in the same case as FIG. 8A.
FIG. 9 is a diagram showing an example of the overall configuration and the flow of data related to sensitivity setting when sensitivity is set in the voice interactive terminal of the third embodiment.
FIG. 10A is a processing flow of the voice interactive terminal 1 in the third embodiment when the voice interactive terminal transmits the surrounding sound collected during a period set by the content of the user's utterance to the voice dialogue service 2, and the sensitivity setting value is updated with a sensitivity setting value that the voice dialogue service 2 calculates from the voice data collected by the voice interactive terminal 1.
FIG. 10B is a processing flow of the detection sensitivity calculation unit 2-4 in the third embodiment in the same case as FIG. 10A.
FIG. 11 is a diagram showing an example of the overall configuration and the flow of data related to sensitivity setting when sensitivity is set in the voice interactive terminal of the fourth embodiment.
FIG. 12 is a diagram showing another example of the overall configuration and the flow of data related to sensitivity setting when sensitivity is set in the voice interactive terminal of the fourth embodiment.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a diagram showing an overview of a voice dialogue system including a voice interactive terminal 1 to which an embodiment of the present invention is applied. The voice dialogue system consists of, for example, the voice interactive terminal 1 placed in a house 4 and a voice dialogue service 2 hosted on a server 9 in the cloud. The voice interactive terminal 1 and the voice dialogue service 2 can communicate with each other over a network 3 via a home gateway 5 placed in the house 4.

The voice interactive terminal 1 can also communicate with lighting 10, an air conditioner 11, and a recording/playback device 12 installed in the house 4 via a short-range wireless communication system 7 such as Bluetooth (registered trademark), ZigBee (registered trademark), or Wi-Fi. The voice interactive terminal 1 can also control peripheral devices through a communication method that does not require pairing, such as infrared communication, and can communicate with electronic devices other than those shown here. Although a short-range wireless communication system is used as an example in this description, it need not be one; an ordinary wireless system or a wired communication system may be used instead.

Various controls of the voice interactive terminal 1, such as function settings, can be performed from a mobile terminal 20 via the short-range wireless communication system 7 such as Bluetooth, ZigBee, or Wi-Fi. They can also be performed from a mobile terminal 21 over the network 3 by way of the server 9 that contains the voice dialogue service 2. An application for controlling the voice interactive terminal 1 (also called the sensitivity setting application) is installed on the mobile terminal 20 and the mobile terminal 21.

The voice dialogue service 2 may include two voice dialogue services, for example a voice dialogue service A 2-1 (not shown) and a voice dialogue service B 2-2 (not shown). In that case, which of the two is used is determined by the specific word the user utters. The voice dialogue service 2 may also include three or more voice dialogue services.

The voice interactive terminal 1 collects words uttered by the user with its built-in microphone 1-1. When it recognizes the specific word, it continues to collect the user's subsequent utterance through the microphone 1-1 and sends the voice data of the collected utterance to the voice dialogue service 2 over the network 3.

The voice dialogue service 2 consists of a voice recognition unit 2-1, a dialogue processing unit 2-2, a voice synthesis unit 2-3, and a detection sensitivity calculation unit 2-4.

On receiving voice data sent from the voice interactive terminal 1, the voice dialogue service 2 analyzes the received voice data in the voice recognition unit 2-1 and generates a response according to the analyzed content in the dialogue processing unit 2-2.

The responses generated by the voice dialogue service 2 are of two types: responses by voice and responses by command. Which type is generated depends on the result of analyzing, in the voice recognition unit 2-1, the voice data sent from the voice interactive terminal 1. Both types are described in detail below.

Once a response is generated, the voice synthesis unit 2-3 converts it into voice data according to its content, and the converted voice data is transmitted to the voice interactive terminal 1 over the network 3.

The specific word is a predetermined keyword that the user utters to start dialogue processing with the voice interactive terminal 1. It may also be called a trigger word, a wake expression, a voice trigger, a voice command, or an activation command. In the following description, the specific word is called the trigger word.

When the trigger word detection processing unit 203 shown in FIG. 2 recognizes the user's utterance from the microphone 1-1 as the trigger word, the voice interactive terminal 1 treats the user's utterance after the trigger word as addressed to the voice interactive terminal 1 and continues to collect it. When no user utterance is input from the microphone 1-1 for a certain period, the voice interactive terminal 1 determines that the user's utterance has ended and transitions to a trigger word input waiting state, in which it waits for the trigger word to be input again.
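
The transition between the trigger word input waiting state and continued collection described above can be illustrated with a small state machine. This is only a sketch under stated assumptions: the class and method names, the boolean inputs, and the 5-second silence timeout are hypothetical and do not appear in the specification, which only says "a certain period".

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_TRIGGER = auto()  # trigger word input waiting state
    COLLECTING = auto()           # collecting the utterance after the trigger word


@dataclass
class DialogueState:
    silence_timeout: float = 5.0  # assumed value, in seconds
    state: State = State.WAITING_FOR_TRIGGER
    last_speech_time: float = 0.0

    def on_audio(self, now: float, is_trigger: bool, has_speech: bool) -> bool:
        """Advance the state; return True if this audio should be forwarded."""
        if self.state is State.WAITING_FOR_TRIGGER:
            if is_trigger:
                self.state = State.COLLECTING
                self.last_speech_time = now
            return False  # only audio after the trigger word is forwarded
        if has_speech:
            self.last_speech_time = now
            return True
        if now - self.last_speech_time >= self.silence_timeout:
            # No utterance for the timeout period: wait for the trigger again.
            self.state = State.WAITING_FOR_TRIGGER
        return False
```

In this sketch the caller supplies the current time, so the timeout behaviour can be exercised without real audio or a real clock.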

The trigger word may be a predetermined one registered in advance, or may be registered freely by the user during initial setup of the voice interactive terminal 1.

The responses generated by the dialogue processing unit 2-2 of the voice dialogue service 2 are of two types: responses by voice and responses by command.

A response by voice is a response that the voice dialogue service 2 generates as voice according to the voice data input from the voice interactive terminal 1.

A response by command is a response that the voice dialogue service 2 generates as a control command according to the voice data input from the voice interactive terminal 1. A control command is a command for controlling an electronic device that the voice interactive terminal 1 itself has, or a peripheral device connected to the voice interactive terminal 1 via the short-range wireless communication system or the like. An electronic device of the voice interactive terminal 1 is, for example, an attached camera. Peripheral devices connected to the voice interactive terminal 1 via the short-range wireless communication system or the like are, for example, the lighting 10 and the air conditioner 11.

The content of a response by voice data is a reply that corresponds to what the user said to the voice interactive terminal 1, such as "Good morning. How are you today?" in reply to "Good morning". It may also be an answer that corresponds to the user's question, such as "If you leave 30 minutes from now, you will arrive at Osaka Station by 8 p.m." in answer to "If I take the Shinkansen to Osaka now, what time will I arrive?"

When the voice interactive terminal 1 receives a response from the voice dialogue service 2 and the response is a response by voice data, it can output the content of the response as voice, for example from its built-in speaker 1-2. This lets the user hear the voice dialogue system's response to his or her utterance.

The content of a response by command is a command such as "device = air conditioner 11, operation = ON, mode = cooling, setting = temperature 26 degrees, air volume maximum" in response to the user saying, for example, "Turn on the air conditioner" to the voice interactive terminal 1. Another example is the command "device = lighting 10, operation = ON" in response to "Turn on the light for a moment".

When the voice interactive terminal 1 receives a response from the voice dialogue service 2 and the response is a response by command, it controls the device to be controlled that is named in the command. For example, when the content of the command is "device = air conditioner 11, operation = ON, mode = cooling, setting = temperature 26 degrees, air volume maximum", the voice interactive terminal 1 controls the air conditioner 11, via its internal short-range wireless communication system such as Wi-Fi, ZigBee, or Bluetooth, so that it starts with a temperature setting of 26 degrees and maximum air volume.
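
The handling of the two response types on the terminal side might be sketched as follows. The specification does not define a wire format, so everything here is an assumption for illustration: the response is taken to be a dictionary with a hypothetical "type" field, the command body is taken to be comma-separated "key = value" pairs like the quoted examples, and `play_audio` and `control_device` are hypothetical callbacks standing in for the speaker and the short-range wireless control path.

```python
from typing import Any, Callable, Dict


def parse_command(body: str) -> Dict[str, str]:
    """Parse 'key = value, key = value' text into a dict (assumed format).

    Parts without '=' (such as a trailing 'air volume maximum') are skipped
    in this sketch rather than guessed at.
    """
    fields: Dict[str, str] = {}
    for part in body.split(","):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def handle_response(
    response: Dict[str, Any],
    play_audio: Callable[[bytes], None],
    control_device: Callable[[Dict[str, str]], None],
) -> None:
    """Route a service response to the speaker or to device control."""
    if response["type"] == "voice":
        play_audio(response["audio"])
    elif response["type"] == "command":
        control_device(parse_command(response["body"]))
    else:
        raise ValueError(f"unknown response type: {response['type']!r}")
```

Passing the output actions in as callbacks keeps the dispatch logic independent of any particular audio or radio library.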

A response by command may also consist of a command part, "play", and a part converted into text data based on the content of the user's utterance, "www.xxxxxx.co.jp/musicBBB.wav", as in "play from www.xxxxxx.co.jp/musicBBB.wav" in response to the user saying, for example, "Play the BBB content of the AAA video service" to the voice interactive terminal 1.

When the voice interactive terminal 1 receives a response from the voice dialogue service 2 and the response is a response by a command that includes text data, it interprets the text data part in addition to the command and controls the device to be controlled. For example, when the content of the command is "play from www.xxxxxx.co.jp/musicBBB.wav", the voice interactive terminal 1 may acquire the data at www.xxxxxx.co.jp/musicBBB.wav and play the acquired data within the voice interactive terminal 1.
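
Splitting such a response into its command part and its text data part could look like the sketch below. It assumes, purely for illustration, that the parts are separated by spaces and that an optional "from" precedes the text data; the specification does not state the delimiter, and the names `TextCommand` and `parse_text_command` are hypothetical.

```python
from typing import NamedTuple


class TextCommand(NamedTuple):
    verb: str      # command part, e.g. "play"
    argument: str  # text data part, e.g. a content location


def parse_text_command(body: str) -> TextCommand:
    """Split '<verb> [from] <text data>' into its two parts (assumed format)."""
    verb, _, rest = body.strip().partition(" ")
    rest = rest.strip()
    if rest.startswith("from "):
        rest = rest[len("from "):].strip()
    if not verb or not rest:
        raise ValueError(f"not a command with text data: {body!r}")
    return TextCommand(verb, rest)
```

Fetching and playing the data at the parsed location is left out, since it depends on the terminal's network and audio facilities.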

In this way, the voice dialogue service 2 can provide information based on dialogue with the user.

The voice dialogue service 2 may also spontaneously provide information to the voice interactive terminal 1 even when no voice data is input from the voice interactive terminal 1.

Information that the voice dialogue service 2 provides spontaneously may be tailored to the individual user's needs, such as the approach of a bus to the bus stop near the user or the approach of rain clouds to the area where the user lives, or it may be highly public information such as an earthquake early warning or a tsunami warning.

FIG. 2 is a block diagram of the voice interactive terminal 1 shown in FIG. 1.

The voice interactive terminal 1 includes the microphone 1-1, which collects the user's utterances; an acoustic processing unit 201, which performs acoustic processing such as noise cancellation on the collected utterance; a trigger word detection processing unit 203, which detects the trigger word in the collected utterance; a voice data output processing unit 202, which, once the trigger word detection processing unit 203 has detected the trigger word, performs processing for transmitting the user's subsequent utterance to the voice dialogue service 2; and a communication control unit 204, which exchanges the output voice data with the voice dialogue service 2. The voice interactive terminal 1 further includes a trigger word detection sensitivity setting processing unit 205, which sets the detection sensitivity used when the trigger word detection processing unit 203 detects the trigger word.

The detection sensitivity indicates the degree of similarity between the voice data input from the microphone 1-1 and the trigger word. The trigger word detection processing unit 203 determines that the voice data input from the microphone 1-1 is the trigger word when the similarity between that voice data and the trigger word exceeds a preset threshold.

The threshold set in the trigger word detection processing unit 203 is updated with the sensitivity setting value set by the trigger word detection sensitivity setting processing unit 205.
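
The threshold comparison and its update can be sketched in a few lines. This is a minimal illustration, not the specification's implementation: it assumes the similarity is a score between 0 and 1 computed elsewhere, and that the sensitivity setting value is used directly as the new threshold. The class name, method names, and default value are hypothetical.

```python
class TriggerWordDetector:
    """Stands in for the trigger word detection processing unit 203 (sketch)."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # assumed default

    def set_sensitivity(self, sensitivity_setting_value: float) -> None:
        """Update the threshold with a received sensitivity setting value."""
        self.threshold = sensitivity_setting_value

    def is_trigger(self, similarity: float) -> bool:
        """Treat the input as the trigger word only if similarity exceeds the threshold."""
        return similarity > self.threshold
```

Under these assumptions, raising the threshold makes false detections from room noise less likely at the cost of missing some genuine utterances of the trigger word, which is the trade-off the user is meant to tune.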

When no user utterance is input from the microphone 1-1 for a certain period, the voice interactive terminal 1 transitions to the trigger word input waiting state, in which it waits for the trigger word to be input. In the trigger word detection processing unit 203, the voice interactive terminal 1 constantly checks whether the voice input from the microphone 1-1 matches the trigger word registered in advance. When the trigger word detection processing unit 203 detects the trigger word, the voice interactive terminal 1 thereafter transmits the user's utterance to the voice dialogue service 2 over the network 3. Even after the trigger word has been detected, if no user utterance is input from the microphone 1-1 for a certain period, the voice interactive terminal 1 transitions to the trigger word input waiting state again.
(First embodiment)
The voice interactive terminal of the first embodiment is a voice interactive terminal in which the detection sensitivity of the trigger word detection processing unit 203 is changed by the user operating the application (sensitivity setting application) installed on the mobile terminal 20 or the mobile terminal 21 and entering an arbitrary sensitivity setting value for the detection sensitivity.

FIG. 3 is a diagram showing the overall configuration and the flow of data related to sensitivity setting when sensitivity is set in the voice interactive terminal of the first embodiment.

A sensitivity setting value set by the user on the mobile terminal 20 can be transmitted to the voice interactive terminal 1 via the short-range wireless communication system 7. A sensitivity setting value set by the user on the mobile terminal 21 can be transmitted to the voice interactive terminal 1 over the network 3 by way of the server 9. The detection sensitivity of the trigger word detection processing unit 203 of the voice interactive terminal 1 can thus be changed freely by user operation of the mobile terminal 20 or the mobile terminal 21.

FIG. 4 is an example of the setting screen, used to set the sensitivity setting value, of the sensitivity setting application for controlling the voice interactive terminal 1 that is installed on the mobile terminal 20 or the mobile terminal 21.

The user of the mobile terminal 20 or the mobile terminal 21 can raise or lower the sensitivity setting value of the detection sensitivity by moving a slide bar (slider) 401 left or right. When the user sets the slide bar 401 to an arbitrary position and presses a setting button 402, the sensitivity setting value based on the position of the slide bar 401 is sent to the voice interactive terminal 1 via the short-range wireless communication system 7 or the network 3 and is set in the trigger word detection processing unit 203.

The example in FIG. 4 raises or lowers the detection sensitivity setting value by moving the slide bar 401 left or right, but the user interface may instead let the user enter the numerical setting value (for example, 0 to 100) directly or adjust it with up/down controls.
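
Converting a slide bar position into a setting value in the 0 to 100 range mentioned above might be done as in the following sketch. The linear mapping, the pixel-based `position` and `track_width` parameters, and the function name are assumptions for illustration; the specification only says that the value is based on the slider position.

```python
def slider_to_sensitivity(
    position: float, track_width: float, lo: int = 0, hi: int = 100
) -> int:
    """Map a slider position along its track to an integer setting value.

    Positions outside the track are clamped to the nearest end.
    """
    if track_width <= 0:
        raise ValueError("track_width must be positive")
    fraction = min(max(position / track_width, 0.0), 1.0)
    return lo + round(fraction * (hi - lo))
```

For example, on a track 200 units wide, a slider at 50 units maps to 25 under this linear assumption.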

図５Ａは、第１の実施形態における音声対話端末１のトリガワードの検出感度の感度設定値を、ユーザが携帯端末２０あるいは携帯端末２１にインストールされているアプリケーションを操作して変更する場合の、音声対話端末１の処理フローである。 FIG. 5A shows the case where the user changes the sensitivity setting value of the trigger word detection sensitivity of the voice interactive terminal 1 in the first embodiment by operating an application installed in the mobile terminal 20 or the mobile terminal 21. It is a processing flow of the voice interactive terminal 1. FIG.

音声対話端末１の電源をＯＮすると、音声対話端末１は音声対話端末処理を開始する（Ｓ５００）。音声対話端末１は、起動後初期化処理を行い（Ｓ５０１）、マイク１−１から音声が入力されるのを待つ。 When the power of the voice interactive terminal 1 is turned on, the voice interactive terminal 1 starts a voice interactive terminal process (S500). The voice interactive terminal 1 performs an initialization process after activation (S501), and waits for a voice to be input from the microphone 1-1.

When voice is input from the microphone 1-1 (S502), the voice interactive terminal 1 performs acoustic processing such as noise removal on the input voice data in the acoustic processing unit 201 (S503). Next, the voice interactive terminal 1 determines whether a sensitivity setting value has been received (S504).

If the determination in S504 shows that a sensitivity setting value has been received (Yes in S504), the voice interactive terminal 1 sets the detection sensitivity of the trigger word detection sensitivity setting processing unit 205 to the received sensitivity setting value. Once that detection sensitivity has been set, the voice interactive terminal 1 performs the trigger word detection processing (S506).

If the determination in S504 shows that no sensitivity setting value has been received (No in S504), the voice interactive terminal 1 proceeds directly to the trigger word detection processing (S506).

The voice interactive terminal 1 calculates the similarity between the voice input from the microphone 1-1 in S502 and the content registered in advance as the trigger word, and determines whether the calculated similarity exceeds a predetermined threshold (S506).
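The threshold comparison described here can be illustrated with a short sketch. The description does not say how a sensitivity setting value is converted into the threshold, so the linear mapping in `threshold_from_sensitivity` (higher sensitivity gives a lower threshold) and the similarity range of 0.0 to 1.0 are assumptions introduced purely for illustration.

```python
def threshold_from_sensitivity(sensitivity: int) -> float:
    """Assumed mapping: sensitivity 0..100 -> similarity threshold 1.0..0.0."""
    if not 0 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 0 and 100")
    # Higher sensitivity means the trigger word is accepted more readily,
    # which is modeled here as a lower similarity threshold.
    return 1.0 - sensitivity / 100.0


def is_trigger_word(similarity: float, threshold: float) -> bool:
    """Decide as in S506/S507: detected only if similarity exceeds the threshold."""
    return similarity > threshold
```

Under this assumed mapping, a setting of 50 accepts an utterance with similarity 0.8 and rejects one whose similarity is exactly 0.5, because the comparison is strict.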

If the determination in S506 shows that the threshold is exceeded (Yes in S507), the voice interactive terminal 1 recognizes the voice input from the microphone 1-1 in S502 as the trigger word and transmits the voice data input after the trigger word to the voice dialogue service 2 via the network 3 (S508).

If the determination in S506 shows that the threshold is not exceeded (No in S507), the voice interactive terminal 1 judges that no trigger word could be recognized in the voice input from the microphone 1-1 in S502 and does not transmit the subsequently input voice data to the voice dialogue service 2.

As long as the power is on (Yes in S509), the voice interactive terminal 1 returns to S502 and repeats the processing from S503 onward.
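The loop of FIG. 5A can be summarized in a simplified sketch. Several things here are assumptions rather than details from the description: each audio frame is reduced to a single value, `poll_setting` stands in for the check in S504 and returns a threshold directly, and forwarding of the audio that follows the trigger word (S508) is reduced to a callback `on_trigger`.

```python
from typing import Callable, Iterable, Optional


def run_terminal_loop(frames: Iterable[float],
                      poll_setting: Callable[[], Optional[float]],
                      similarity: Callable[[float], float],
                      on_trigger: Callable[[float], None],
                      threshold: float = 0.5) -> float:
    """Run the S502..S509 loop over a finite frame sequence; return the final threshold."""
    for frame in frames:                        # S502: voice input
        processed = frame                       # S503: acoustic processing (placeholder)
        new_threshold = poll_setting()          # S504: setting value received?
        if new_threshold is not None:
            threshold = new_threshold           # apply the received setting
        if similarity(processed) > threshold:   # S506/S507: threshold exceeded?
            on_trigger(processed)               # S508: hand over to the service
    return threshold
```

The loop ends when the frame source is exhausted, which plays the role of the power-off branch of S509 in this sketch.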

FIG. 5B is a processing flow of the mobile terminal 20 or the mobile terminal 21 in the first embodiment for the case where the user changes the sensitivity setting value of the trigger word detection sensitivity of the voice interactive terminal 1 by operating an application installed on the mobile terminal 20 or the mobile terminal 21. Since the processing flows of the mobile terminal 20 and the mobile terminal 21 are identical, the processing flow of the mobile terminal 21 is described as an example.

When the user of the mobile terminal 21 taps the icon that launches the sensitivity setting application for setting the sensitivity setting value, the sensitivity setting application starts the mobile terminal processing (S520). Once launched by the tap on the icon (S521), the sensitivity setting application displays, for example, the content shown in FIG. 4 (a GUI screen) on the display screen of the mobile terminal 21.

The user adjusts and sets the sensitivity setting value using the content of FIG. 4 displayed on the display screen of the mobile terminal 21 (S522). After setting the sensitivity setting value, the user presses the setting button 402 in FIG. 4, whereupon the sensitivity setting value is sent to the voice interactive terminal 1 via the network 3 (S523), and the trigger word detection sensitivity setting processing unit 205 updates the threshold of the trigger word detection processing unit 203.
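The exchange in S523 can be pictured as a small message passed from the application to the terminal. The description does not define a message format, so the JSON layout, the field names `type`, `terminal_id` and `value`, and both function names below are hypothetical.

```python
import json


def build_sensitivity_message(value: int, terminal_id: str) -> str:
    """App side (S523): serialize the chosen setting value. Format is assumed."""
    if not 0 <= value <= 100:
        raise ValueError("value must be between 0 and 100")
    return json.dumps({"type": "sensitivity_setting",
                       "terminal_id": terminal_id,
                       "value": value})


def apply_sensitivity_message(raw: str, current: int) -> int:
    """Terminal side: return the new setting, or keep the current one
    when the message is not a sensitivity setting."""
    message = json.loads(raw)
    if message.get("type") != "sensitivity_setting":
        return current
    return int(message["value"])
```

Keeping the current value for unrelated messages mirrors the "No" branch of S504, where the terminal continues with the setting it already has.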

The first embodiment has been described using the case where the user enters the sensitivity setting value by operating an application (the sensitivity setting application) installed on the mobile terminal 20 or the mobile terminal 21. However, the sensitivity setting application may instead be installed on the voice interactive terminal 1, for example. In that case, the user may enter the sensitivity setting value by directly operating the sensitivity setting application installed on the voice interactive terminal 1.

(Second Embodiment)

The voice interactive terminal of the second embodiment transmits the ambient sound collected during a period set on the mobile terminal 21 to the voice dialogue service 2, and updates its sensitivity setting value with the value that the voice dialogue service 2 calculates from the voice data collected by the voice interactive terminal 1. The voice interactive terminal 1 collects the ambient sound in response to instructions from the mobile terminal 21.

FIG. 6 shows the overall configuration and the flow of data related to sensitivity setting when sensitivity is set in the voice interactive terminal of the second embodiment.

By operating the sensitivity setting application installed on the mobile terminal 21, the user can specify the period during which the voice interactive terminal 1 collects ambient sound with the microphone 1-1. Using the sensitivity setting application on the mobile terminal 21, the user sets a sensitivity setting start as the start of the period and a sensitivity setting end as the end of the period.

The sensitivity setting start and the sensitivity setting end set in the sensitivity setting application are transmitted as a sensitivity setting start event and a sensitivity setting end event to the voice interactive terminal 1 over the network 3 by way of the server 9 (S61).

On receiving the sensitivity setting start event and the sensitivity setting end event, the voice interactive terminal 1 sends the voice data collected by the microphone 1-1 within the set period, as determined from the received events, to the detection sensitivity calculation unit 2-4 (S62). The transmission of voice data within the set period is described with reference to FIG. 8A. During the set period, it is desirable that the user not utter the trigger word to the voice interactive terminal 1.

Using the received voice data, the detection sensitivity calculation unit 2-4 calculates a trigger word detection sensitivity such that the number of false detections, in which the trigger word is judged to have been uttered although it was not, is at or below a fixed number (a fixed rate). When the calculation is complete, the detection sensitivity calculation unit 2-4 transmits the calculated sensitivity setting value to the voice interactive terminal 1 (S63). The calculation of the sensitivity setting is described with reference to FIG. 8C.
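One way to picture this calculation is sketched below. The description does not specify the algorithm, so this is an assumed approach: every frame of ambient audio is scored for similarity to the trigger word, and the threshold is chosen from the observed scores so that the fraction of frames exceeding it stays at or below a target rate. The function name and the per-frame scoring are hypothetical.

```python
import math
from typing import Sequence


def threshold_for_false_rate(scores: Sequence[float], max_false_rate: float) -> float:
    """Pick a threshold from trigger-free ambient scores.

    Returns an observed score such that the share of scores strictly
    above it does not exceed max_false_rate.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 <= max_false_rate <= 1.0:
        raise ValueError("max_false_rate must be between 0 and 1")
    ordered = sorted(scores)
    # Number of frames that may still exceed the threshold.
    allowed = math.floor(max_false_rate * len(ordered))
    # Skip the `allowed` largest scores and take the next one down.
    index = max(len(ordered) - allowed - 1, 0)
    return ordered[index]
```

Because the measurement period is meant to contain no trigger word, every frame that exceeds the chosen threshold counts as a false detection, which is why the request that the user stay silent during the period matters for this kind of estimate.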

The voice interactive terminal 1 sets the sensitivity setting value sent from the detection sensitivity calculation unit 2-4 in the trigger word detection processing unit 203.

FIG. 7 is an example of a setting screen for starting and ending the measurement of ambient noise in the application, installed on the mobile terminal 20 or the mobile terminal 21, for controlling the voice interactive terminal 1. The user of the mobile terminal 20 or the mobile terminal 21 can press the start button 702 and the end button 703 at any time. When the user presses the start button 702, a measurement start notification (sensitivity setting start event) is sent to the voice interactive terminal 1 via the network 3, and the trigger word detection processing unit 203 starts measuring the ambient noise. When the user then presses the end button 703, a measurement end notification (sensitivity setting end event) is sent to the voice interactive terminal 1 via the network 3, and the trigger word detection processing unit 203 ends the measurement of the ambient noise.

The setting screen of the setting application shown in FIG. 7 is an example in which the sensitivity setting start and the sensitivity setting end are set, but a setting screen on which two items, the sensitivity setting start and the length of the period, can be set may be displayed instead. Alternatively, even if the user does not press the end button 703 after pressing the start button 702, the voice data of the ambient noise acquired for a predetermined number of seconds (for example, 10 seconds) may be transmitted to the voice dialogue service 2, followed by a sensitivity setting end event sent to the voice dialogue service 2. This makes it possible to omit the end button 703.
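The fixed-length variant without an end button can be sketched as a simple time window. The representation of audio as `(timestamp, frame)` pairs and the function name are assumptions for illustration; only the 10-second example duration comes from the description.

```python
from typing import Any, Iterable, List, Tuple


def collect_for_duration(timed_frames: Iterable[Tuple[float, Any]],
                         start_time: float,
                         duration_s: float = 10.0) -> List[Any]:
    """Keep the frames captured in [start_time, start_time + duration_s)."""
    end_time = start_time + duration_s
    # The end of the window takes the place of the end button 703.
    return [frame for t, frame in timed_frames if start_time <= t < end_time]
```

After the window closes, the caller would send the collected frames and then the sensitivity setting end event, in that order.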

FIG. 8A is a processing flow of the voice interactive terminal 1 in the second embodiment for the case where the voice interactive terminal transmits the ambient sound collected during the period set on the mobile terminal 21 to the voice dialogue service 2 and sets its sensitivity setting value using the value that the voice dialogue service 2 calculates from the voice data collected by the voice interactive terminal 1.

When the voice interactive terminal 1 is powered on, it starts the voice interactive terminal processing (S800). After startup, the voice interactive terminal 1 performs initialization (S801) and waits for voice input from the microphone 1-1.

When voice is input from the microphone 1-1 (S802), the voice interactive terminal 1 performs acoustic processing such as noise removal on the input voice data in the acoustic processing unit 201 (S803). Next, the voice interactive terminal 1 determines whether a sensitivity setting start event has been received (S804).

If the determination in S804 shows that a sensitivity setting start event has been received (Yes in S804), the voice interactive terminal 1 transmits the voice input from the microphone 1-1 to the voice dialogue service 2 as voice data (S812). The voice interactive terminal 1 then determines whether a sensitivity setting end event has been received (S813).

If the determination in S813 shows that no sensitivity setting end event has been received (No in S813), the voice interactive terminal 1 returns to S802 and continues the processing from S802 onward.

If the determination in S804 shows that no sensitivity setting start event has been received (No in S804), the voice interactive terminal 1 performs the same processing as S504 and the subsequent steps of the processing flow in FIG. 5A. That is, while the power is on, the voice interactive terminal 1 repeatedly determines whether a sensitivity setting value has been received (S805) and whether the trigger word has been detected (S808).

Likewise, if the determination in S813 shows that a sensitivity setting end event has been received (Yes in S813), the voice interactive terminal 1 performs the same processing as S504 and the subsequent steps of the processing flow in FIG. 5A. That is, while the power is on, the voice interactive terminal 1 repeatedly determines whether a sensitivity setting value has been received (S805) and whether the trigger word has been detected (S808).
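The branching of FIG. 8A amounts to a two-mode state machine, sketched below under simplifying assumptions: events arrive attached to individual frames, the class and enum names are hypothetical, and the frame that carries the end event is still uploaded before the terminal returns to normal trigger word detection.

```python
from enum import Enum
from typing import Optional


class Event(Enum):
    START = "sensitivity_setting_start"
    END = "sensitivity_setting_end"


class TerminalMode:
    """Tracks whether the terminal is inside the measurement period."""

    def __init__(self) -> None:
        self.collecting = False

    def route(self, event: Optional[Event] = None) -> str:
        """Return where the current frame goes: 'upload' or 'detect'."""
        if event is Event.START:        # S804: start event received
            self.collecting = True
        if self.collecting:
            if event is Event.END:      # S813: end event received
                self.collecting = False
            return "upload"             # S812: send audio to the service
        return "detect"                 # S805 onward: normal detection
```

Feeding the events none, start, none, end, none through `route` yields detect, upload, upload, upload, detect.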

FIG. 8B is a processing flow of the mobile terminal 21 in the second embodiment for the case where the voice interactive terminal transmits the ambient sound collected during the period set on the mobile terminal 21 to the voice dialogue service 2 and updates its sensitivity setting value using the value that the voice dialogue service 2 calculates from the voice data collected by the voice interactive terminal 1.

When the user of the mobile terminal 21 taps the icon that launches the sensitivity setting application, the sensitivity setting application starts the mobile terminal processing (S820). Once launched by the tap on the icon (S821), the sensitivity setting application displays, for example, the content shown in FIG. 7 on the display screen of the mobile terminal 21.

When the user presses the start button 702 in the content of FIG. 7 displayed on the display screen of the mobile terminal 21, a sensitivity setting start event is transmitted to the voice interactive terminal 1 via the network 3. On receiving the sensitivity setting start event, the voice interactive terminal 1 determines Yes in S804 of FIG. 8A and starts transmitting the acquired voice data of the ambient noise to the voice dialogue service 2.

When the user then presses the end button 703 in the content of FIG. 7 displayed on the display screen of the mobile terminal 21, a sensitivity setting end event is sent to the voice interactive terminal 1 via the network 3. On receiving the sensitivity setting end event, the voice interactive terminal 1 determines Yes in S813 of FIG. 8A and stops transmitting the voice data of the ambient noise to the voice dialogue service 2.

FIG. 8C is a processing flow of the detection sensitivity calculation unit 2-4 in the second embodiment for the case where the voice interactive terminal transmits the ambient sound collected during the period set on the mobile terminal 21 to the voice dialogue service 2 and updates its sensitivity setting value using the value that the voice dialogue service 2 calculates from the voice data collected by the voice interactive terminal 1.

The detection sensitivity calculation unit 2-4 of the voice dialogue service 2 starts the cloud processing when it receives an event from outside (S830). The detection sensitivity calculation unit 2-4 determines whether the received event is a sensitivity setting start (S831).

If the determination in S831 shows that the received event is a sensitivity setting start event (Yes in S831), the detection sensitivity calculation unit 2-4 starts adjusting the detection sensitivity (S832). At the same time, the detection sensitivity calculation unit 2-4 forwards the received sensitivity setting start event to the voice interactive terminal 1 (S832).

Having sent the sensitivity setting start event to the voice interactive terminal 1, the detection sensitivity calculation unit 2-4 waits for voice data to be sent from the voice interactive terminal 1 (S812 in FIG. 8A) (S833). Once voice data arrives from the voice interactive terminal 1 (S833), the detection sensitivity calculation unit 2-4 keeps receiving the incoming voice data until it receives a sensitivity setting end event (No in S834).

On receiving the sensitivity setting end event (Yes in S834), the detection sensitivity calculation unit 2-4 ends the detection sensitivity adjustment (S835). At the same time, the detection sensitivity calculation unit 2-4 forwards the received sensitivity setting end event to the voice interactive terminal 1 (S835).

Next, the detection sensitivity calculation unit 2-4 calculates the sensitivity setting value using the received voice data (S836). The detection sensitivity calculation unit 2-4 transmits the calculated sensitivity setting value to the voice interactive terminal 1 (S837) and ends the processing.
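The service-side flow of FIG. 8C can be summarized as an event handler that relays events, buffers audio and finally returns a setting value. This is a sketch under assumptions: the class name, the tuple-shaped messages passed to `send_to_terminal`, the event name strings and the pluggable `calculate` function are all hypothetical, and the actual calculation is not specified by the description.

```python
from typing import Any, Callable, List, Tuple


class DetectionSensitivityCalculator:
    """Hypothetical model of the detection sensitivity calculation unit 2-4."""

    def __init__(self,
                 send_to_terminal: Callable[[Tuple[str, Any]], None],
                 calculate: Callable[[List[Any]], Any]) -> None:
        self._send = send_to_terminal
        self._calculate = calculate
        self._adjusting = False
        self._audio: List[Any] = []

    def on_event(self, name: str) -> None:
        if name == "sensitivity_setting_start" and not self._adjusting:
            self._adjusting = True                   # S832: start adjustment
            self._audio = []
            self._send(("event", name))              # S832: relay to terminal
        elif name == "sensitivity_setting_end" and self._adjusting:
            self._adjusting = False                  # S835: end adjustment
            self._send(("event", name))              # S835: relay to terminal
            value = self._calculate(self._audio)     # S836: compute setting
            self._send(("sensitivity", value))       # S837: send setting value

    def on_audio(self, chunk: Any) -> None:
        if self._adjusting:                          # S833/S834: keep receiving
            self._audio.append(chunk)
```

Audio that arrives outside the adjustment period is ignored here, so only data from between the two events influences the calculated value.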

As described above, the voice interactive terminal of the second embodiment sets the sensitivity setting value for detecting the trigger word using the optimum sensitivity setting value calculated from the sound conditions around the place where the terminal is located. The voice interactive terminal 1 can therefore detect the trigger word with a sensitivity suited to its environment, which improves usability for the user.

(Third Embodiment)

The voice interactive terminal of the third embodiment transmits the ambient sound collected during a period set by the content of the user's utterances to the voice dialogue service 2, and sets its sensitivity setting value using the value that the voice dialogue service 2 calculates from the voice data collected by the voice interactive terminal 1. The voice interactive terminal 1 collects the ambient sound in response to instructions spoken by the user.

FIG. 9 shows the overall configuration and the flow of data related to sensitivity setting when sensitivity is set in the voice interactive terminal of the third embodiment.

By speaking to the voice interactive terminal 1, the user can specify the period during which ambient sound is collected by the microphone 1-1. The utterances may be, for example, "trigger word detection sensitivity start" as the sensitivity setting start indicating the start of the period, and "trigger word detection sensitivity end" as the sensitivity setting end indicating the end of the period. The utterances "trigger word detection sensitivity start" and "trigger word detection sensitivity end" are sent as voice data to the detection sensitivity calculation unit 2-4 (S91), where they are recognized as the start and the end of the period. The detection sensitivity calculation unit 2-4 transmits the recognized start of the period to the voice interactive terminal 1 as a sensitivity setting start event (S91), and likewise transmits the recognized end of the period to the voice interactive terminal 1 as a sensitivity setting end event (S91).

The voice interactive terminal 1 also transmits the voice input from the microphone 1-1 to the detection sensitivity calculation unit 2-4 as voice data (S92).

Having recognized "trigger word detection sensitivity start", the voice data indicating the start of the period, and "trigger word detection sensitivity end", the voice data indicating the end of the period, both sent from the voice interactive terminal 1, the detection sensitivity calculation unit 2-4 uses the voice data received during this period to calculate a trigger word detection sensitivity such that the number of false detections, in which the trigger word is judged to have been uttered although it was not, is at or below a fixed number. When the calculation is complete, the detection sensitivity calculation unit 2-4 transmits the calculated sensitivity setting value to the voice interactive terminal 1 (S93).

FIG. 10A is a processing flow of the voice interactive terminal 1 in the third embodiment for the case where the voice interactive terminal transmits the ambient sound collected during the period set by the content of the user's utterances to the voice dialogue service 2 and updates its sensitivity setting value using the value that the voice dialogue service 2 calculates from the voice data collected by the voice interactive terminal 1. The processing from S1000 to S1013 in FIG. 10A is identical to the processing from S800 to S813 in FIG. 8A.

FIG. 10B is a processing flow of the detection sensitivity calculation unit 2-4 in the third embodiment for the case where the voice interactive terminal transmits the ambient sound collected during the period set by the content of the user's utterances to the voice dialogue service 2 and updates its sensitivity setting value using the value that the voice dialogue service 2 calculates from the voice data collected by the voice interactive terminal 1.

The processing flow differs from that of FIG. 8C in two respects. Instead of receiving a sensitivity setting start event indicating the start of the period, the unit receives and recognizes "trigger word detection sensitivity start", the voice data indicating the start of the period. Instead of receiving a sensitivity setting end event indicating the end of the period, the unit receives and recognizes "trigger word detection sensitivity end", the voice data indicating the end of the period.

Assume that the voice interactive terminal 1 has recognized the trigger word and is transmitting the voice input from the microphone 1-1 to the voice dialogue service 2 as voice data. On receiving the voice data sent from the voice interactive terminal 1, the voice dialogue service 2 starts the cloud processing (S1020).

When the detection sensitivity calculation unit 2-4 recognizes "trigger word detection sensitivity start", the voice data indicating the start of the period, in the received voice data (Yes in S1023), it starts adjusting the detection sensitivity (S1024). At the same time, the detection sensitivity calculation unit 2-4 transmits the sensitivity setting start event to the voice interactive terminal 1 (S1024).

The detection sensitivity calculation unit 2-4 keeps receiving voice data until it recognizes "trigger word detection sensitivity end", the voice data indicating the end of the period, in the received voice data (S1025).

When the detection sensitivity calculation unit 2-4 recognizes "trigger word detection sensitivity end", the voice data indicating the end of the period, in the received voice data (Yes in S1026), it ends the detection sensitivity adjustment (S1027). Having ended the detection sensitivity adjustment (S1027), the detection sensitivity calculation unit 2-4 calculates the sensitivity setting value using the received voice data (S1028). The detection sensitivity calculation unit 2-4 transmits the calculated sensitivity setting value to the voice interactive terminal 1 (S1029) and ends the processing (S1030).
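The utterance-driven period of FIG. 10B can be sketched as a scan over recognized speech. It is assumed here that a speech recognizer has already turned each stretch of audio into a transcript paired with its audio chunk; the exact-match comparison, the lower-casing and the function name are illustrative choices, while the two phrases are the examples given in the description.

```python
from typing import Any, Iterable, List, Optional, Tuple

START_PHRASE = "trigger word detection sensitivity start"
END_PHRASE = "trigger word detection sensitivity end"


def extract_measurement_period(
        utterances: Iterable[Tuple[str, Any]]) -> Optional[List[Any]]:
    """Return the audio chunks between the start and end phrases.

    Returns None when the period never starts or never ends.
    """
    collected: List[Any] = []
    adjusting = False
    for transcript, chunk in utterances:
        text = transcript.strip().lower()
        if not adjusting:
            if text == START_PHRASE:    # Yes in S1023: start adjusting (S1024)
                adjusting = True
            continue
        if text == END_PHRASE:          # Yes in S1026: end adjusting (S1027)
            return collected
        collected.append(chunk)         # S1025: keep receiving audio
    return None
```

The returned chunks would then be passed to the sensitivity calculation (S1028), and the result sent to the terminal (S1029).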

As described above, the voice interactive terminal of the third embodiment can update the sensitivity setting value for detecting the trigger word using the optimum sensitivity setting value calculated from the sound conditions around the place where the terminal is located. In addition, because the calculation of the optimum sensitivity setting value can be triggered by the user's utterance, the user is spared the effort of operating a mobile terminal, which further improves usability. In this way, the voice interactive terminal 1 can detect the trigger word with a sensitivity suited to its environment, which improves usability for the user.

As described above, the voice interactive terminal of the present embodiment can update the sensitivity setting for detecting the trigger word according to the situation in which the terminal is set up. For example, in a relatively noisy environment, in addition to setting the detection sensitivity based on the surrounding environment using the function of the second embodiment, the detection sensitivity may also be set by user operation as in the first embodiment.

The function of the first embodiment and the function of the second embodiment may also be used together. For example, consider a scene in which the detection sensitivity is adjusted for the case where a user at one position utters the trigger word to the voice interactive terminal 1 placed at another position. In this case, determining the detection sensitivity with the function of the first embodiment alone is expected to increase false detections. By additionally using the function of the second embodiment, an optimum detection sensitivity that takes the surrounding environment into account can be set for the relative positions of the voice interactive terminal 1 and the user.

(Fourth Embodiment)

In the first to third embodiments, the detection sensitivity calculation unit 2-4 of the voice dialogue service 2 is located in the server 9, but it may instead be located in the voice interactive terminal 1.

The voice interactive terminal of the fourth embodiment is a voice interactive terminal that further includes the detection sensitivity calculation unit 2-4.

FIG. 11 and FIG. 12 show examples of the overall configuration and the flow of data related to sensitivity setting when the detection sensitivity calculation unit 2-4 is located in the voice interactive terminal 1.

FIG. 11 corresponds to FIG. 6 for the voice interaction terminal 1 of the second embodiment. It shows the overall configuration and the flow of data related to sensitivity setting when the detection sensitivity calculation unit 2-4 resides in the voice interaction terminal 1.

The sensitivity setting start and the sensitivity setting end set with the sensitivity setting application are transmitted to the voice interaction terminal 1 via the network 3 as a sensitivity setting start event and a sensitivity setting end event (S111). Between the sensitivity setting start event and the sensitivity setting end event, the voice interaction terminal 1 collects the surrounding sound with the microphone 1-1, as in the example shown in FIG. 6. The example shown in FIG. 11 differs from the example shown in FIG. 6 in that all the processing corresponding to S62 and S63 shown in FIG. 6 is performed inside the voice interaction terminal 1.
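
The event-bounded collection described above can be sketched as follows. This is a minimal illustration only: the class and function names are hypothetical, and the mapping from ambient level to a sensitivity value (RMS level scaled into a similarity threshold, with louder surroundings giving a stricter threshold) is an assumption, since this passage does not state how the detection sensitivity calculation unit 2-4 derives its value.

```python
import math
from dataclasses import dataclass, field
from typing import List


def rms(frames: List[List[float]]) -> float:
    """Root-mean-square level over all samples (0.0 when nothing was collected)."""
    samples = [s for frame in frames for s in frame]
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def compute_threshold(frames: List[List[float]], base: float = 0.5,
                      gain: float = 0.4, max_threshold: float = 0.9) -> float:
    """Assumed mapping from ambient level to a similarity threshold.

    Samples are assumed to be normalized to [-1.0, 1.0].
    """
    level = min(rms(frames), 1.0)
    return min(base + gain * level, max_threshold)


@dataclass
class SensitivityCalibrator:
    """Collects microphone frames between a start event and an end event."""
    collecting: bool = False
    frames: List[List[float]] = field(default_factory=list)

    def on_start_event(self) -> None:
        # Sensitivity setting start event: discard old data and begin collecting.
        self.frames.clear()
        self.collecting = True

    def on_audio_frame(self, frame: List[float]) -> None:
        # Frames that arrive outside the start/end window are ignored.
        if self.collecting:
            self.frames.append(list(frame))

    def on_end_event(self) -> float:
        # Sensitivity setting end event: stop collecting and compute locally.
        self.collecting = False
        return compute_threshold(self.frames)
```

In this sketch the value returned by `on_end_event` would be handed to the setting unit on the terminal itself, so no collected audio has to leave the terminal for the calculation.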

Similarly, FIG. 12 corresponds to FIG. 9 for the voice interaction terminal 1 of the third embodiment. It shows the overall configuration and the flow of data related to sensitivity setting when the detection sensitivity calculation unit 2-4 resides in the voice interaction terminal 1.

As in the example shown in FIG. 9, the user can specify, by speaking to the voice interaction terminal 1, the period during which the surrounding sound is collected with the microphone 1-1. During the specified period, the voice interaction terminal 1 collects the surrounding sound with the microphone 1-1, as in the example shown in FIG. 6. The example shown in FIG. 12 differs from the example shown in FIG. 9 in that all the processing corresponding to S91, S92 and S93 shown in FIG. 9 is performed inside the voice interaction terminal 1.
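
The voice-bounded collection period can be sketched as below. The phrases used for the start and end words, the stream representation (one recognized phrase or `None` per audio frame), and the function name are all hypothetical choices for this example. The specification only says that the user indicates the period by speaking to the terminal.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical phrases; the actual specific words are not given in this passage.
START_WORD = "start sensitivity setting"
END_WORD = "end sensitivity setting"

Frame = List[float]


def frames_between_words(
    stream: Iterable[Tuple[Optional[str], Frame]]
) -> List[Frame]:
    """Return the frames captured after the start word and before the end word.

    Each stream item pairs the phrase recognized in that frame (or None) with
    the frame's samples. The frames carrying the start and end words are
    excluded, since they contain the user's command rather than ambient sound.
    If the end word never arrives, everything collected so far is returned.
    """
    collecting = False
    collected: List[Frame] = []
    for word, frame in stream:
        if not collecting:
            if word == START_WORD:
                collecting = True
            continue
        if word == END_WORD:
            break
        collected.append(frame)
    return collected


stream = [
    (None, [0.9]),          # before the start word: ignored
    (START_WORD, [0.5]),    # command frame: excluded
    (None, [0.1]),
    (None, [0.2]),
    (END_WORD, [0.5]),      # command frame: excluded
    (None, [0.8]),          # after the end word: ignored
]
print(frames_between_words(stream))
```

The collected frames would then be passed to the sensitivity calculation inside the terminal, in place of the server-side processing of FIG. 9.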

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents. Furthermore, a constituent element of the claims remains within the scope of the present invention whether it is expressed as divided into several elements, as several elements combined into one, or as a combination of these. A plurality of embodiments may also be combined, and examples formed by such combinations are likewise within the scope of the invention.

In this specification and the drawings, components that perform the same or similar functions as those already described with reference to earlier drawings are denoted by the same reference numerals, and duplicate detailed description may be omitted as appropriate. The apparatus of the present invention also applies when the claims are expressed as control logic, as a program including instructions to be executed by a computer, or as a computer-readable recording medium on which the instructions are recorded. The names and terms used are not limiting, and other expressions are included in the present invention as long as they have substantially the same content and intent.

DESCRIPTION OF SYMBOLS 1 ... Voice interaction terminal, 2 ... Voice interaction service, 2-1 ... Voice recognition unit, 2-2 ... Dialogue processing unit, 2-3 ... Voice synthesis unit, 2-4 ... Detection sensitivity calculation unit, 3 ... Network, 203 ... Trigger word detection processing unit, 205 ... Trigger word detection sensitivity setting processing unit.

Claims

A voice interaction terminal comprising:
a microphone;
a communication control unit that controls communication with a network;
an output processing unit that outputs voice input from the microphone to the network via the communication control unit;
a detection unit that detects a specific word, which is a predetermined word, in the voice input from the microphone; and
a setting unit that sets, in the detection unit, a sensitivity setting value that is input from the network via the communication control unit and is a sensitivity value with which the detection unit detects the specific word.

The voice interaction terminal according to claim 1, wherein the sensitivity is a similarity between a voice input from the microphone and the specific word.

The voice interaction terminal according to claim 1, wherein the output processing unit outputs, to the network via the communication control unit, a first voice input from the microphone between a sensitivity setting start event and a sensitivity setting end event that are input from the network via the communication control unit, and
the setting unit sets, in the detection unit, the sensitivity setting value that is calculated based on the first voice and input from the network via the communication control unit.

The voice interaction terminal according to claim 1, wherein the output processing unit outputs, to the network via the communication control unit, a second voice input from the microphone between a first specific word indicating the start of sensitivity setting and a second specific word indicating the end of sensitivity setting, both input from the microphone, and
the setting unit sets, in the detection unit, the sensitivity setting value that is calculated based on the second voice and input from the network via the communication control unit.

The voice interaction terminal according to claim 3, wherein the sensitivity setting value is transmitted from a mobile terminal associated in advance.

A voice interaction terminal comprising:
a microphone;
a detection unit that detects a specific word, which is a predetermined word, in the voice input from the microphone;
a sensitivity calculation unit that calculates a sensitivity setting value, which is a sensitivity value with which the detection unit detects the specific word, based on a first voice input from the microphone between a start event indicating the start of sensitivity setting and an end event indicating the end of sensitivity setting; and
a setting unit that sets, in the detection unit, the sensitivity setting value calculated by the sensitivity calculation unit.

The voice interaction terminal according to claim 6, further comprising a communication control unit that controls communication with the mobile terminal, wherein the start event and the end event are input from the mobile terminal via the communication control unit.

The voice interaction terminal according to claim 6, wherein the sensitivity calculation unit recognizes a first specific word input from the microphone as the start event indicating the start of sensitivity setting, and recognizes a second specific word input from the microphone as the end event indicating the end of sensitivity setting.

A voice interaction terminal control method comprising:
outputting voice input from a microphone to a network;
detecting a specific word, which is a predetermined word, in the voice input from the microphone; and
updating a sensitivity setting value that is input from the network and is a sensitivity value for detecting the specific word.