JP6226911B2

JP6226911B2 - Server apparatus, system, method for managing voice recognition function, and program for controlling information communication terminal

Info

Publication number: JP6226911B2
Application number: JP2015113874A
Authority: JP
Inventors: 豊川　卓; 泰貴畠山
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2015-06-04
Filing date: 2015-06-04
Publication date: 2017-11-08
Anticipated expiration: 2035-06-04
Also published as: JP2017003608A

Description

The present disclosure relates to control of a speech recognition function, and more specifically to control of a function for recognizing a speaker.

Techniques for recognizing a speaker are known in speech recognition. For example, Japanese Patent Laid-Open No. 2010-217319 (Patent Document 1) discloses a technique for "improving the accuracy of speaker identification in a speaker identification device that identifies a speaker from an audio signal" (see [Abstract]). Japanese Patent Laid-Open No. 7-261781 (Patent Document 2) discloses a "learning method for creating phoneme models for speaker recognition with high speaker recognition accuracy" (see [Abstract]).

Patent Document 1: JP 2010-217319 A
Patent Document 2: JP H07-261781 A

Speech recognition functions such as speaker recognition are used in home appliances, automobiles, computer devices, and other equipment. The speaker recognition function is sometimes unnecessary, however. For example, the device may have only one user, or recognition accuracy may be low enough that the user finds the function bothersome. In such cases the user may want to turn the speaker recognition function off.

In a device with a speaker recognition function, an individual is identified by information associated with that function. If speaker recognition is not running, that association is not used. The device then cannot identify the individual user's name or favorite topics (preferences). A technique is therefore needed that identifies the device's user and that user's preferences even when the speaker recognition function is turned off.

The present disclosure has been made to solve the problems described above. All of the following objects apply even when the speaker recognition function is turned off:

- In one aspect, the object is to provide a server that can identify the user of a device or that user's preferences.
- In another aspect, the object is to provide a system that can identify the user of a device or that user's preferences.
- In another aspect, the object is to provide a method for managing a speech recognition function so that the user of a device or that user's preferences can still be identified.
- In still another aspect, the object is to provide a program for controlling a device so that the device's user or that user's preferences can still be identified.

A server according to an embodiment includes:

- a memory that stores data for recognizing a speaker;
- a processor that performs speech recognition processing by executing a speaker recognition function based on the data held in the memory;
- an input interface that accepts instructions to the server.

When the processor receives an instruction to turn off the speaker recognition function, it turns the function off.

The above and other objects, features, aspects, and advantages of the present invention will become apparent from the following detailed description of the invention, taken in conjunction with the accompanying drawings.

FIG. 1 is a block diagram showing the schematic configuration of a system.
FIG. 2 is a sequence chart showing part of the processing between the terminal 100 and the server 110 in Example 1.
FIG. 3 is a sequence chart showing processing in which the server 110 confirms with the terminal 100 before turning off the speaker recognition function.
FIG. 4 is a diagram showing processing for registering a speaker after the speaker recognition function has been turned off.
FIG. 5 is a flowchart showing part of the processing executed by a CPU (not shown) of the server 110.
FIG. 6 is a flowchart showing the processing procedure of the server 110 according to Example 5.
FIG. 7 is a diagram showing that the terminal 100 and the server 110 can connect to a network 700.
FIG. 8 is a diagram showing screen transitions on a monitor 900 of the terminal 100 or the server 110.
FIG. 9 is a diagram showing a screen for changing the on/off settings of functions for a plurality of devices.
FIG. 10 is a diagram showing screen transitions when devices and users are associated with each other.
FIG. 11 is a diagram showing a screen on which registered users can be displayed as a list.

Embodiments of the present invention are described below with reference to the drawings. In the following description, identical parts are denoted by the same reference numerals and have the same names and functions. Their detailed description is therefore not repeated.

FIG. 1 shows the configuration of the system according to the present embodiment as a schematic block diagram. The system includes a terminal 100 and a server 110.

The terminal 100 has a speech recognition function. It may be a cleaning robot, a television, a smartphone, a refrigerator, a microwave oven, an electric vehicle, an electric bicycle, a walking-assist cart, or another interactive home appliance. The terminal 100 includes a communication interface 101, a voice input unit 102, and a voice output unit 103.

The server 110 is implemented by a general-purpose computer with a speech recognition function. The server 110 includes the following units:

- a communication interface 111;
- a control unit 112;
- a determination unit 113;
- a fixed recognition unit 114;
- a speaker recognition unit 115;
- a speech recognition unit 116;
- a dialogue analysis/generation unit 117.

The configuration of a general-purpose computer is well known, so the hardware of the server 110 is not described in detail.

In the terminal 100, the communication interface 101 communicates with the communication interface 111 of the server 110. The connection may be wireless or wired. Wireless communication uses a known technology such as Wi-Fi (Wireless Fidelity), LTE (Long Term Evolution), NFC (Near Field Communication), or Bluetooth (registered trademark). Wired communication uses, for example, a LAN (Local Area Network).

The voice input unit 102 accepts an utterance from the user of the terminal 100. It transmits the resulting voice signal to the server 110 via the communication interface 101. The voice output unit 103 outputs speech based on voice data received from the server 110 via the communication interface 101.

In the server 110, the communication interface 111 can communicate with the terminal 100. Communication between the two starts, for example, when the server 110 sends a communication request to the terminal 100 or receives one from it. A request is generated in either of two cases:

- a predefined condition is satisfied in the server 110 or the terminal 100;
- a user of the server 110 or the terminal 100 instructs that device to request the communication.

The control unit 112 controls the operation of the server 110. It is implemented by a CPU (Central Processing Unit) or another processing device.

The determination unit 113 decides whether the fixed recognition unit 114 or the speaker recognition unit 115 performs recognition. It bases this decision on a signal received via the communication interface 111 or on the settings held in the server 110. Recognition by the fixed recognition unit 114 is speaker-independent. In that mode, per-user settings registered in the server 110 (for example, preferences) affect neither what is recognized nor the response based on the recognition result.
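The selection made by the determination unit 113 might look like the following Python sketch. The source says only that the choice depends on a received signal or on settings held in the server. Letting an explicit flag in the received signal override the stored setting is an assumption, and `RecognitionMode` and `determine_mode` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class RecognitionMode(Enum):
    FIXED = "fixed"      # speaker-independent recognition (fixed recognition unit 114)
    SPEAKER = "speaker"  # speaker-dependent recognition (speaker recognition unit 115)


def determine_mode(signal_flag: Optional[bool], speaker_recognition_on: bool) -> RecognitionMode:
    """Choose which recognizer handles an utterance.

    signal_flag: hypothetical per-request flag carried in the received signal,
                 or None when the signal does not specify a mode.
    speaker_recognition_on: the on/off setting currently held by the server.
    """
    # Assumption: an explicit flag in the signal takes precedence over the stored setting.
    enabled = speaker_recognition_on if signal_flag is None else signal_flag
    return RecognitionMode.SPEAKER if enabled else RecognitionMode.FIXED
```

For example, `determine_mode(None, False)` falls back to the stored "off" setting and selects speaker-independent recognition.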

When the speech recognition unit 116 recognizes speech given to the terminal 100 or the server 110, the fixed recognition unit 114 generates the recognition result. The result is represented, for example, by a character string. It may include an answer to a query from the server 110, an instruction to the server 110, and so on.

When the speaker recognition unit 115 detects a voice signal input to the server 110, it uses data held in the server 110 to recognize two things: the content of the speech and the speaker who produced it. This data includes, for example:

- voiceprint information;
- identification information for that voiceprint information;
- the identifier and name of the user whose voiceprint is registered.

The data is held in a storage device of the server 110 or is sent to the server 110 from another information communication device.
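One possible shape for the speaker data described above is the hypothetical Python sketch below. The field names, the list-of-floats voiceprint, and the `preferences` field are illustrative assumptions, not a format given in the source. The `preferences` field stands in for the per-user preferences mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpeakerRecord:
    voiceprint_id: str       # identification information of the voiceprint
    user_id: str             # identifier of the user whose voiceprint is registered
    name: str                # that user's name
    voiceprint: List[float]  # voiceprint features (a plain vector, for illustration)
    preferences: List[str] = field(default_factory=list)  # hypothetical: favorite topics


class SpeakerStore:
    """In-memory stand-in for the server's storage of speaker data."""

    def __init__(self) -> None:
        self._by_voiceprint: Dict[str, SpeakerRecord] = {}

    def register(self, record: SpeakerRecord) -> None:
        # Records are keyed by voiceprint identification information.
        self._by_voiceprint[record.voiceprint_id] = record

    def find_by_name(self, name: str) -> Optional[SpeakerRecord]:
        # Linear scan is enough for a handful of household users.
        return next((r for r in self._by_voiceprint.values() if r.name == name), None)

    def __len__(self) -> int:
        return len(self._by_voiceprint)
```

A name lookup of this kind is what a server would need when it asks for a user's name instead of using voiceprints.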

The speech recognition unit 116 recognizes speech given to the server 110, based on data sent from the control unit 112. The recognition technique is not particularly limited.

The dialogue analysis/generation unit 117 works on utterances by the user of the terminal 100 or the server 110. It either analyzes the content of an utterance or generates a response to it. Its input is the data that the control unit 112 outputs as the result of recognition by the speech recognition unit 116.

The terminal 100 and the server 110 can be implemented in any of three ways: by software modules that provide the functions described above, by circuit elements or other hardware modules that provide them, or by a combination of the two.

[Configuration]
(1) The server 110 includes a memory, a processor, and a communication interface. The memory stores data for recognizing a speaker. The communication interface receives a first voice signal. When this happens, the processor performs three steps:

- it executes a speaker recognition function based on the received first voice signal and the data for recognizing a speaker;
- it creates a second voice signal based on the first voice signal and the result of the speaker recognition function;
- it outputs the second voice signal via the communication interface.

The communication interface also receives instructions for the server 110. When it receives an instruction to turn off the speaker recognition function, the processor turns the function off.

(2) The processor also outputs, via the communication interface, a signal indicating that the speaker recognition function has been turned off.

(3) When the communication interface receives an instruction to turn off the speaker recognition function, the processor outputs, via the communication interface, a signal asking the user whether the function may be turned off.

(4) When the communication interface receives an instruction to turn off the speaker recognition function, the processor asks for the name of the speaker who gave the instruction. The communication interface then receives that name. The processor creates the second voice signal based on the personal information identified by the received name.

(5) The processor outputs the names of registered speakers via the communication interface.

(6) The processor uses the voiceprint information of the speaker recognition function to identify the speaker who gave the instruction to turn it off. If the speaker is identified, the processor creates the second voice signal from that speaker's personal information, which is linked to the speaker's voiceprint information.

(7) The processor outputs, via the communication interface, a signal asking the identified speaker to confirm that the speaker's personal information will be used.

(8) When the server 110 holds voiceprint information for only one person, the processor outputs, via the communication interface, a signal asking whether the server's current settings should be used for speech recognition processing.

(9) A system according to another aspect includes the server 110 and an interactive home appliance. The appliance includes a voice input unit that accepts voice input and a communication unit that transmits a signal based on that voice to the server 110. The server 110 has the configuration described above.

(10) A system according to another aspect further includes an information processing terminal that can communicate with the server 110. The terminal includes a monitor, a memory, a communication interface, and a processor. The terminal's processor displays two screens on the monitor: one for accepting an instruction to turn off the speaker recognition function, and one for managing the users of the server 110's speaker recognition function.

(11) In another aspect, a method for managing the speech recognition function of the server 110 includes the following steps:

- accessing data for recognizing a speaker;
- receiving a first voice signal;
- in response to receiving the first voice signal, executing a speaker recognition function based on that signal and the data for recognizing a speaker;
- creating a second voice signal based on the first voice signal and the result of the speaker recognition function;
- outputting the second voice signal via a communication interface;
- further receiving instructions for the server 110;
- in response to receiving an instruction to turn off the speaker recognition function, turning the function off.

(12) In another aspect, a program for controlling an information communication terminal causes the terminal to execute the following steps:

- establishing communication with the server 110, which has a speaker recognition function;
- displaying, on the terminal's monitor, a screen for accepting an instruction to turn off the speaker recognition function;
- displaying on the monitor a screen for managing the users of the server 110's speaker recognition function.
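Items (1), (2), and (11) can be sketched as a small Python class. This is a minimal illustration under several assumptions that are not in the source:

- the first voice signal arrives already transcribed to text, together with a voiceprint key;
- the speaker data is a plain dictionary;
- the "second voice signal" is represented by the text it would speak.

The names `Server`, `handle`, and `OFF_COMMAND` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

OFF_COMMAND = "turn off speaker recognition"  # hypothetical recognized command text


@dataclass
class Server:
    # Data for recognizing a speaker: voiceprint key -> speaker name (toy stand-in).
    speaker_data: Dict[str, str]
    speaker_recognition_on: bool = True

    def recognize_speaker(self, voiceprint: str) -> Optional[str]:
        """Speaker recognition function; returns None when it is off or nothing matches."""
        if not self.speaker_recognition_on:
            return None
        return self.speaker_data.get(voiceprint)  # exact match, for illustration only

    def handle(self, text: str, voiceprint: str) -> str:
        """Take a first voice signal (already transcribed) and return the second one."""
        if text == OFF_COMMAND:
            self.speaker_recognition_on = False
            # Item (2): report that the function has been turned off.
            return "Speaker recognition has been turned off."
        speaker = self.recognize_speaker(voiceprint)
        if speaker is None:
            return f"You said: {text}"  # speaker-independent reply
        return f"{speaker}, you said: {text}"  # reply tailored to the recognized speaker
```

After the off command, the same voiceprint no longer produces a personalized reply, which is the behavior change the examples below describe.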

[Example 1]
Examples of the system according to the present disclosure are described below. FIG. 2 is a sequence chart showing part of the processing between the terminal 100 and the server 110 in Example 1.

As shown in FIG. 2, the terminal 100 and the server 110 establish communication, for example in response to an operation by the user of the terminal 100.

In step 210, in response to an utterance by its user, the terminal 100 transmits to the server 110 an instruction to turn off the speaker recognition function. The utterance contains a command such as "Turn off speaker recognition." The instruction may include, for example:

- the IP (Internet Protocol) address of the server 110 with which communication has been established;
- the IP address of the terminal 100;
- a terminal identifier or a user identifier;
- a character string obtained by speech recognition of the instruction, or a voice signal based on the utterance that gave it.

The user identifier may identify an individual user or a group of several individuals, such as a family.
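The optional contents of the step 210 instruction can be modeled as a message type. The sketch below is hypothetical: the field names, the JSON encoding, and the base64 audio field are assumptions, because the source lists only the kinds of information the instruction may carry. The `needs_server_recognition` helper reflects the server behavior described next: recognition is skipped if the terminal has already analyzed the speech.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SpeakerRecognitionOffCommand:
    server_ip: str
    terminal_ip: str
    terminal_id: Optional[str] = None
    user_id: Optional[str] = None    # an individual or a group such as a family
    text: Optional[str] = None       # recognized text, if the terminal already recognized it
    audio_b64: Optional[str] = None  # utterance audio, base64-encoded (assumed encoding)

    def to_json(self) -> str:
        # Omit fields that were not provided.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})

    def needs_server_recognition(self) -> bool:
        # If the terminal already produced text, the server can skip recognition.
        return self.text is None
```

The example addresses used in the checks below are from the 192.0.2.0/24 documentation range.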

In the server 110, when the control unit 112 receives the instruction from the terminal 100, the speech recognition unit 116 analyzes the speech in the instruction and recognizes the speaker. In another aspect, the terminal 100 has already analyzed that speech, and the speech recognition unit 116 skips this recognition processing.

The control unit 112 turns off the speaker recognition function of the speaker recognition unit 115, for example by changing a control setting in the server 110.

In step 220, the server 110 sends the terminal 100 a reply indicating that the speaker recognition function has been turned off.

As a result, the processing changes as follows. The user of the terminal 100 makes an utterance (for example, "Tell me the traffic information"), and the terminal 100 transmits a voice signal to the server 110. This voice signal may also include, for example, the IP (Internet Protocol) address of the server 110 with which communication has been established, the IP address of the terminal 100, and a terminal identifier or a user identifier. In the server 110, the communication interface 111 receives the voice signal.

Before the above sequence, the server 110 used the speaker recognition unit 115 to recognize the speaker and created a reply tailored to that speaker. For example, it fetched train delay information from another server for each individual's commuting route and replied "There are no train delays." After the sequence, the server 110 uses the fixed recognition unit 114 instead. It creates a reply that does not depend on the individual speaker (for example, "There are no train delays"). The reply signal is sent to the terminal 100 via the communication interface 111.
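The difference between the two reply paths in the traffic example can be illustrated as follows. The commute-route table and the set of delayed lines are made-up stand-ins for the per-user data and the external delay server mentioned in the source. The function name and reply wording are hypothetical.

```python
from typing import Dict, Optional, Set

COMMUTE_ROUTES: Dict[str, str] = {"Gonbei": "Line A"}  # hypothetical per-user data
DELAYED_LINES: Set[str] = {"Line B"}                    # stand-in for the delay server


def traffic_reply(speaker: Optional[str], speaker_recognition_on: bool) -> str:
    if speaker_recognition_on and speaker in COMMUTE_ROUTES:
        # Speaker-dependent path: answer only about this user's commuting route.
        route = COMMUTE_ROUTES[speaker]
        return f"{route} is delayed." if route in DELAYED_LINES else "There are no train delays."
    # Speaker-independent path: no per-user route is used.
    if DELAYED_LINES:
        return "Delays on: " + ", ".join(sorted(DELAYED_LINES))
    return "There are no train delays."
```

With this toy data, the same user hears "There are no train delays." while speaker recognition is on, and a general report on all lines once it is off.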

[Example 2]
Example 2 is described below. FIG. 3 is a sequence chart showing processing in which the server 110 confirms with the terminal 100 before turning off the speaker recognition function.

As shown in FIG. 3, in step 210 the terminal 100 transmits to the server 110 an instruction to turn off the speaker recognition function.

In step 310, the server 110 sends the terminal 100 a confirmation message such as "Speaker recognition will be turned off. Are you sure?"

In step 320, the user of the terminal 100 responds to the confirmation message. The terminal 100 then sends the server 110 an answer to the message received in step 310 (for example, a signal representing "OK"). This answer may include:

- the IP (Internet Protocol) address of the server 110 with which communication has been established;
- the IP address of the terminal 100;
- a terminal identifier or a user identifier;
- a character string obtained by speech recognition of the answer, or a voice signal based on the utterance that gave it.

In step 220, the server 110 sends the terminal 100 a response indicating that the speaker recognition function has been turned off. The user of the terminal 100 can thus confirm that the function is off.
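The confirmation exchange of FIG. 3 (steps 210, 310, 320, and 220) amounts to a two-step state machine, sketched here in Python. The exact phrases are placeholders. The branch that keeps the function on when the user does not answer "OK" is an assumption; the source does not describe it.

```python
OFF_COMMAND = "turn off speaker recognition"  # placeholder command text


class ConfirmingServer:
    """Turns speaker recognition off only after the user confirms."""

    def __init__(self) -> None:
        self.speaker_recognition_on = True
        self._awaiting_confirmation = False

    def receive(self, utterance: str) -> str:
        if self._awaiting_confirmation:
            # Step 320: the terminal relays the user's answer.
            self._awaiting_confirmation = False
            if utterance == "OK":
                self.speaker_recognition_on = False
                return "Speaker recognition has been turned off."  # step 220
            return "Speaker recognition remains on."  # assumed cancel path
        if utterance == OFF_COMMAND:
            # Step 210 received; step 310: ask for confirmation before changing anything.
            self._awaiting_confirmation = True
            return "Speaker recognition will be turned off. Are you sure?"
        return "(normal dialogue processing)"
```

The setting changes only on the confirmed answer, so an accidental command alone never disables speaker recognition.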

As a result, the processing changes as follows. In response to its user's utterance, the terminal 100 transmits a voice signal to the server 110. This voice signal may also include, for example, the IP (Internet Protocol) address of the server 110 with which communication has been established, the IP address of the terminal 100, and a terminal identifier or a user identifier. In the server 110, the communication interface 111 receives the voice signal from the terminal 100.

Before the above series of processing, the server 110 uses the speaker recognition unit 115 to recognize the speaker and creates a reply tailored to that speaker. Afterward, it uses the fixed recognition unit 114 instead and creates a reply that does not depend on the individual speaker. The reply signal is sent to the terminal 100 via the communication interface 111.

[Example 3]
Example 3 is described below. FIG. 4 is a diagram showing processing for registering a speaker after the speaker recognition function has been turned off. As shown in FIG. 4, in step 210 the terminal 100 transmits to the server 110 an instruction to turn off the speaker recognition function. In step 220, the server 110 sends the terminal 100 a reply indicating that the function has been turned off.

In step 410, the server 110 detects that the speaker recognition function has been turned off, or that an utterance was made while it was off. The server 110 then sends the terminal 100 a message asking who the speaker is (for example, "Please tell me your name."). The terminal 100 receives the message and outputs it as speech. The user hears it and speaks a reply to the terminal 100.

In step 420, the terminal 100 accepts the user's utterance (for example, "It's Gonbei.") and sends a corresponding signal to the server 110. This signal may include:

- the IP (Internet Protocol) address of the server 110 with which communication has been established;
- the IP address of the terminal 100;
- a terminal identifier or a user identifier;
- a character string obtained by speech recognition of the utterance, or a voice signal based on the utterance.

When the server 110 detects the voice signal from the terminal 100, it performs speech recognition and recognizes that the user of the terminal 100 is "Gonbei." It then performs speech synthesis to generate a greeting for "Gonbei."

In step 430, the server 110 sends the greeting (for example, "Pleased to meet you, Gonbei.") to the terminal 100. The terminal 100 receives the greeting and outputs it as speech.
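The source does not say how the server extracts the name from the reply in step 420. One hypothetical approach for English-language replies is a simple regular expression that pulls out the name and builds the greeting. A real system would more likely use the dialogue analysis/generation unit 117. The reply patterns and the greeting wording are assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern for replies such as "It's Gonbei." or "I am Gonbei".
NAME_PATTERN = re.compile(r"^(?:It's|I'm|I am)\s+([A-Z][\w-]*)\.?$")


def extract_name(reply: str) -> Optional[str]:
    m = NAME_PATTERN.match(reply.strip())
    return m.group(1) if m else None


def greeting_for(reply: str) -> str:
    name = extract_name(reply)
    if name is None:
        return "Please tell me your name."  # ask again, as in step 410
    return f"Nice to meet you, {name}."     # greeting, as in step 430
```

Re-asking on a failed match keeps the exchange in the name-query step until a usable name arrives.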

As a result, the processing changes as follows. In response to its user's utterance, the terminal 100 transmits a voice signal to the server 110. This voice signal may also include, for example, the IP (Internet Protocol) address of the server 110 with which communication has been established, the IP address of the terminal 100, and a terminal identifier or a user identifier. In the server 110, the communication interface 111 receives the voice signal.

Before the above sequence, the server 110 used the speaker recognition unit 115 to recognize the speaker and created a reply tailored to that speaker. After the sequence, it uses the fixed recognition unit 114 instead and creates a reply that does not depend on the individual speaker. The reply signal is sent to the terminal 100 via the communication interface 111.

[Example 4]
The control structure of the server 110 according to Example 4 is described with reference to FIG. 5. FIG. 5 is a flowchart showing part of the processing executed by a CPU (not shown) of the server 110.

In this example, the server 110 holds speaker-recognition information in memory: user data that identifies each speaker and voiceprint information associated with that user data. The server 110 and the terminal 100 can communicate with each other. When the terminal 100 receives a spoken instruction from the user to turn off the speaker recognition function, it forwards the instruction to the server 110.

In step 510, the CPU of server 110 receives, from terminal 100, a command to turn off the speaker recognition function. The command may include the IP address of server 110, with which communication has been established, the IP address of terminal 100, a terminal identifier or a user identifier, and either a character string obtained by speech recognition of the command or an audio signal of the utterance that gave the command. The CPU analyzes the audio signal of the command and extracts voiceprint information from the voice.

In step 520, the CPU determines whether the voiceprint information of the voice command received in step 510 matches voiceprint information currently held by server 110. The held voiceprint information may be managed per terminal identifier or per user identifier. If the CPU determines that the voiceprints match (YES in step 520), it switches control to step 530. Otherwise (NO in step 520), it switches control to step 560.
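The patent does not say how voiceprints are compared. As one common approach, the sketch below assumes each voiceprint is a fixed-length embedding vector and compares them by cosine similarity against a threshold; the 0.8 threshold and the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_voiceprint(query: Sequence[float],
                     enrolled: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching enrolled user, or None if nobody reaches the threshold."""
    best_user, best_score = None, threshold
    for user, voiceprint in enrolled.items():
        score = cosine(query, voiceprint)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


def step_520(query: Sequence[float], enrolled: Dict[str, Sequence[float]]) -> int:
    # YES in step 520 -> step 530; NO -> step 560.
    return 530 if match_voiceprint(query, enrolled) is not None else 560
```

Passing a separate `enrolled` dictionary per terminal identifier or user identifier corresponds to managing the held voiceprints per identifier.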

In step 530, the CPU asks the user whether the information of the matched user may be used. More specifically, the CPU generates an audio signal for the inquiry and transmits it to terminal 100. On receiving the audio signal from server 110, terminal 100 outputs speech based on it. The user of terminal 100 answers the inquiry by speaking to terminal 100, and the answer is sent from terminal 100 to server 110. The answer may include the IP address of server 110, with which communication has been established, the IP address of terminal 100, a terminal identifier or a user identifier, and either a character string obtained by speech recognition of the answer or an audio signal of the answering utterance. In server 110, the CPU checks whether an answer to the inquiry has been received from terminal 100 within a predetermined time. If the CPU determines that an answer permitting use of the user's information was received from the user within that time (YES in step 530), it switches control to step 540. Otherwise (NO in step 530), it switches control to step 560.
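The "answer within a predetermined time" check can be sketched with a blocking queue read that has a timeout. This assumes answers arriving from terminal 100 have already been converted to text and placed on a `queue.Queue`; the set of affirmative words is hypothetical.

```python
import queue

AFFIRMATIVE = {"yes", "ok", "sure"}  # hypothetical set of accepted answers


def wait_for_consent(answers: "queue.Queue[str]", timeout_s: float) -> bool:
    """Step 530 sketch: True only if an affirmative answer arrives within timeout_s."""
    try:
        answer = answers.get(timeout=timeout_s)
    except queue.Empty:
        return False  # no answer within the predetermined time -> step 560
    return answer.strip().lower() in AFFIRMATIVE
```

Both a timeout and a negative answer lead to step 560, so the function only needs to return a boolean.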

In step 540, the CPU switches to fixed recognition that uses that person's existing data as-is. Server 110 performs speech recognition processing based on the voiceprint information it already holds, so subsequent utterances can also be recognized as utterances by the user registered in server 110.

In step 550, the CPU outputs a notification that the speaker recognition function has been turned off in favor of fixed recognition using that person's existing data. Specifically, server 110 transmits a signal indicating this to terminal 100, and on receiving the signal, terminal 100 announces it via voice output unit 103. The user of terminal 100 can thus tell that speech recognition based on his or her already-registered voiceprint information will continue.

In step 560, the CPU asks the user of terminal 100 for his or her name. Specifically, the CPU generates a message asking for the name and transmits a signal based on that message to terminal 100. On receiving the signal, terminal 100 outputs the message via voice output unit 103. When the user hears the message and speaks his or her name, terminal 100 accepts the speech at voice input unit 102 and transmits a signal corresponding to it to server 110.

In step 570, the CPU sets up fixed recognition under the received user name. Specifically, in server 110, fixed recognition unit 114 creates new user profile data. The user profile data may include, for example, a code identifying the user, the user's name, and voiceprint information extracted from the utterance.
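The user profile data of step 570 can be sketched as a record plus a store that issues identifying codes. Sequential integer codes and the names `UserProfile` and `ProfileStore` are assumptions for illustration; the patent only lists the kinds of data a profile may contain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class UserProfile:
    code: int                                               # code identifying the user
    name: str                                               # name received after step 560
    voiceprint: List[float] = field(default_factory=list)   # extracted from the utterance


class ProfileStore:
    """Hypothetical stand-in for the profile storage used by fixed recognition unit 114."""

    def __init__(self) -> None:
        self._next_code = 1
        self.profiles: Dict[int, UserProfile] = {}

    def create(self, name: str, voiceprint: Sequence[float]) -> UserProfile:
        profile = UserProfile(self._next_code, name, list(voiceprint))
        self.profiles[profile.code] = profile
        self._next_code += 1
        return profile
```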

In step 580, the CPU outputs a notification, together with the user name, that the speaker recognition function has been turned off in favor of fixed recognition. Specifically, server 110 transmits a signal indicating this to terminal 100, and on receiving the signal, terminal 100 outputs it as a spoken message via voice output unit 103.
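Taken together, steps 510 to 580 form a two-way branch. The sketch below traces which steps run, assuming the voiceprint match result is already known and that the consent check and name inquiry are supplied as callables; the function name and signature are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def turn_off_speaker_recognition(
    matched_user: Optional[str],         # result of steps 510-520 (None = no match)
    consent: Callable[[str], bool],      # step 530: permission received in time?
    ask_name: Callable[[], str],         # step 560: name spoken by the user
) -> Tuple[List[int], str]:
    """Return the steps executed and the user name that fixed recognition will use."""
    steps = [510, 520]
    if matched_user is not None:
        steps.append(530)
        if consent(matched_user):
            steps += [540, 550]          # reuse the existing data as-is
            return steps, matched_user
    steps += [560, 570, 580]             # ask the name and create a new profile
    return steps, ask_name()
```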

As a result, processing changes as follows. Terminal 100 transmits an audio signal to server 110 in response to an utterance by its user. The audio signal may further include, for example, the IP address of server 110, with which communication has been established, the IP address of terminal 100, and a terminal identifier or a user identifier. In server 110, communication interface 111 receives the audio signal transmitted by terminal 100. Before the above series of processing is performed, server 110 recognizes the speaker using speaker recognition unit 115 and generates a response signal to the audio signal according to that speaker. After the series of processing has been performed, server 110 instead recognizes the speaker using fixed recognition unit 114 and generates the response signal according to that speaker. The generated response signal is transmitted to terminal 100 via communication interface 111.

[Example 5]
Example 5 will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the processing procedure of server 110 according to Example 5. Terminal 100 and server 110 are able to communicate with each other. In one situation, the user of terminal 100 tells terminal 100 by voice that the speaker recognition function should be turned off, and terminal 100 transmits an audio signal corresponding to the utterance to server 110.

In step 610, the CPU of server 110 receives, from terminal 100, a command to turn off the speaker recognition function.

In step 620, the CPU determines whether the voiceprint information it currently holds is for exactly one person. This determination is based on data stored in a storage device (for example, a hard disk) of server 110. If the CPU confirms that it holds voiceprint information for one person only (YES in step 620), it switches control to step 630. Otherwise (NO in step 620), it switches control to step 660.

In step 630, the CPU asks the user of terminal 100 whether the held voiceprint information is his or hers. More specifically, server 110 generates a signal for the inquiry and transmits it to terminal 100, which on receiving it outputs the inquiry message via voice output unit 103.

In step 640, the CPU determines whether a YES answer to the inquiry has been received within a predetermined time. If so (YES in step 640), it switches control to step 650. Otherwise (NO in step 640), it switches control to step 660.

In step 650, the CPU decides to perform fixed recognition using that person's existing data as-is. As a result, fixed recognition unit 114 becomes operative.

In step 655, the CPU notifies terminal 100 that the speaker recognition function has been turned off, and terminal 100 outputs a message to that effect via voice output unit 103.

In step 660, the CPU determines whether the voiceprint information of the received command to turn off the speaker recognition function matches voiceprint information it currently holds. If so (YES in step 660), it switches control to step 665. Otherwise (NO in step 660), it switches control to step 680.

In step 665, the CPU decides to perform fixed recognition using that person's existing data as-is.

In step 670, the CPU notifies terminal 100 that the speaker recognition function has been turned off in favor of fixed recognition using that person's existing data. On receiving this signal, terminal 100 synthesizes speech based on it and outputs the speech via voice output unit 103.

In step 680, the CPU asks the user of terminal 100 for his or her name.

In step 685, the CPU decides to perform fixed recognition under the user name received from terminal 100. More specifically, the CPU creates new voiceprint information associated with the received user name.

In step 690, the CPU of server 110 transmits to terminal 100 a signal indicating, together with the user name, that speaker recognition has been turned off in favor of fixed recognition.
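The branching of FIG. 6 differs from FIG. 5 mainly in the extra single-person check of steps 620 to 640. The sketch below traces the executed steps, assuming the three decisions (how many voiceprints are held, whether the user confirmed in time, whether the command's voiceprint matched) are already known; the function name and parameters are hypothetical.

```python
from typing import List


def example5_steps(num_voiceprints: int, confirmed_in_time: bool,
                   command_matches: bool) -> List[int]:
    """Return the FIG. 6 steps executed for the given decisions."""
    steps = [610, 620]
    if num_voiceprints == 1:              # YES in step 620
        steps += [630, 640]
        if confirmed_in_time:             # YES in step 640
            return steps + [650, 655]
    steps.append(660)                     # compare the command's voiceprint
    if command_matches:                   # YES in step 660
        return steps + [665, 670]
    return steps + [680, 685, 690]        # ask the name, fix recognition to it
```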

As a result, processing changes as follows. Terminal 100 transmits an audio signal to server 110 in response to an utterance by its user. The audio signal may further include, for example, the IP address of server 110, with which communication has been established, the IP address of terminal 100, and a terminal identifier or a user identifier. In server 110, communication interface 111 receives the audio signal transmitted by terminal 100. Before the above series of processing is executed, server 110 recognizes the speaker using speaker recognition unit 115 and generates a response signal to the audio signal according to that speaker. After the series of processing has been executed, server 110 instead recognizes the speaker using fixed recognition unit 114 and generates the response signal according to that speaker. The generated response signal is transmitted to terminal 100 via communication interface 111.

[Usage]
The manner in which terminal 100 and server 110 according to the present embodiment are used will be described with reference to FIG. 7. FIG. 7 shows terminal 100 and server 110 in a state where they can connect to network 700.

In one aspect, terminal 100 may be implemented as any device having a speech recognition function and a communication function, such as cleaning robot 710 or PC 720. Cleaning robot 710 and PC 720 can each connect to the Internet or another network 700; the timing of connection is not particularly limited. Although the figure shows only one set of terminals 100, the devices that can connect to the server are not limited to a single terminal. In practice, a plurality of terminals may be connected to one server.

Server 110 can also connect to the network to which terminal 100 is connected. Server 110's connection to the network does not depend on terminal 100's connection to it.

[Screen display mode]
The display mode of the screen will be described with reference to FIG. 8. FIG. 8 shows how the screen changes on monitor 900 of terminal 100 or server 110.

(State A)
As shown in state A, in one aspect, monitor 900 displays a screen asking whether to turn off the voiceprint function. This screen is created by server 110 when, for example, terminal 100 transmits to server 110 information including the IP address of server 110, with which communication has been established, the IP address of terminal 710, and a terminal identifier or a user identifier. When the user inputs "OK", the screen on monitor 900 may change according to the settings in server 110. For example, if prompting for a name is set as the default in server 110, the screen switches to state (B) (step 801). If the setting for listing registered names is enabled, the screen switches to state (C) (step 802). If only one user is registered, the screen switches to state (D) (step 803). If confirming with the user before taking over a registered user's settings is enabled, the screen switches to state (E).

(State B)
As shown in state B, monitor 900 may display a screen prompting the user to enter a name. The user enters the name by operating the device or by speaking. The screen then switches according to the settings that define transitions after state (B). For example, if a setting to continue processing without confirming the entered name with the user is enabled, the screen switches to state (D) (step 809). If confirmation by the user is required, the screen switches to state (E) (step 808).

(State C)
As shown in state C, monitor 900 may display a screen for selecting the user's name. For example, monitor 900 lists one or more users registered in server 110 as users for whom speaker recognition has already been performed. The user selects his or her own name from the list. When the user presses the "OK" button, the screen switches according to the settings in server 110.

For example, if the setting for confirming the selected name is enabled, the screen switches to state (D) (step 806). If asking about continuing the settings associated with an already-registered name is enabled, the screen switches to state (E) (step 807). If the user selects "Other" and presses the "OK" button, the screen switches to state (B) (step 805).

(State D)
As shown in state D, monitor 900 may display a screen for confirming the entered or selected user name. When the user inputs "OK", the screen switches according to the settings in server 110. For example, if taking over the existing settings is enabled, the screen switches to state (E) (step 810). If no such setting is specified, server 110 responds to the "OK" input by continuing processing according to its other settings.

(State E)
As shown in state E, monitor 900 may display a message for confirming the finalized input. For example, the message may ask whether the settings already configured for the entered user may be taken over. When the user inputs "OK", server 110 continues processing according to those settings. When "Cancel" is pressed, server 110 discards the processing so far and returns to a preset standby state.

The screen transitions on monitor 900 in another situation will be described with reference to FIG. 9. FIG. 9 shows screens for changing the on/off settings of functions for a plurality of devices.

(State A)
As shown in state A, monitor 900 may display screen 910. Screen 910 includes one or more settings used in a user's residence, each corresponding to the settings of a particular device. When the user selects a setting on screen 910, the screen switches. For example, when the user selects "Speaker recognition function settings", the screen switches to state (B) (step 920).

(State B)
As shown in state B, in another aspect, monitor 900 may display screen 930. Screen 930 accepts input for configuring the speaker recognition function; for example, it accepts selection of one of a plurality of devices that a particular user can use. Monitor 900 switches screens according to the settings in server 110. For example, if confirmation is required before turning off the speaker recognition function for the selected device, monitor 900 switches to state (C) (step 940). If no such confirmation is required, monitor 900 switches to state (D) (step 945).

(State C)
As shown in state C, monitor 900 may display screen 950 in addition to screen 930. Screen 950 asks, for example, whether the speaker recognition function of the selected device may be turned off. When monitor 900 of terminal 100 displays screen 950, the user can confirm that the speaker recognition function will be turned off for device A, which he or she selected. When the user presses "ON" on screen 950, monitor 900 switches to state (D) (step 960).

(State D)
As shown in state D, monitor 900 may display screen 970 in addition to screen 930. Screen 970 displays a message notifying the user that the speaker recognition function of the selected device has been turned off. On seeing this message, the user can tell that the speaker recognition function of device A, which he or she selected, has been turned off.

Yet another aspect will be described with reference to FIG. 10. FIG. 10 shows the screen transitions when a device and a user are associated with each other.

(State A)
As shown in state A, monitor 900 displays screen 950 in addition to screen 930. When the user presses "OK" on screen 950, monitor 900 switches to state (B) (step 1010).

(State B)
As shown in state B, monitor 900 may display screen 1020 in addition to screen 930. Screen 1020 prompts the user to enter the name of the person who uses the selected device. For example, the user enters "Taro Hayakawa" by key operation, touch operation, or voice input. When the user then presses "OK", the screen switches to state (C) (step 1025).

(State C)
As shown in state C, monitor 900 may display screen 1030 in addition to screen 930. Screen 1030 displays a greeting addressed to the user, using the name entered as the user of the selected device. On seeing this message, the user can confirm what was entered.

Yet another aspect will be described with reference to FIG. 11. FIG. 11 shows screens used when registered users can be displayed as a list.

(State A)
As shown in state A, monitor 900 may display screen 950 in addition to screen 930. If a plurality of users are registered, monitor 900 switches to state (B) (step 1110). If only one user is registered, monitor 900 switches to state (C) (step 1130).

(State B)
As shown in state B, monitor 900 may display screen 1120 in addition to screen 930. Screen 1120 prompts the user to select the user of the device for which the speaker recognition function is set; for example, it displays a list of users. When a name is selected, monitor 900 switches screens according to the settings in server 110. For example, if server 110 specifies that the user is asked whether the existing settings should be taken over once the speaker recognition function is turned off, monitor 900 switches to state (C) (step 1140). If there is no such specification, monitor 900 switches to state (D) (step 1145).

(State C)
As shown in state C, monitor 900 displays screen 1150 in addition to screen 930. Screen 1150 includes, for example, a message asking the user with the selected name whether the existing settings may be taken over after the speaker recognition function is turned off. When the user selects "OK" on screen 1150, monitor 900 switches to state (D) (step 1160).

(State D)
As shown in state D, monitor 900 displays screen 1170 in addition to screen 930. Screen 1170 includes a greeting message for the selected user. On seeing this message, the user can confirm that the speaker recognition function has been turned off for the selected device A and that only the ordinary speech recognition function is executed.

[Summary of embodiment]
As described above, in the system according to the present embodiment, the speaker recognition function is turned off in response to a user command. Even when the speaker recognition function is off, the ordinary speech recognition function remains enabled. Therefore, when a device having a speech recognition function has only one user, the speech recognition function can be used without the speaker recognition function, which eliminates cumbersome processing and operations for speaker recognition. Because the user can choose whether to use the speaker recognition function according to preference, the convenience of the system may improve. Moreover, even when the speaker recognition function is turned off, the speech recognition function and the settings associated with the user remain available, so functions suited to that user's preferences and topics of interest can still be used.

The embodiments disclosed herein are to be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

100 terminal, 110 server, 101, 111 communication interface, 102 voice input unit, 103 voice output unit, 112 control unit, 113 determination unit, 114 fixed recognition unit, 115 speaker recognition unit, 116 speech recognition unit, 117 dialogue analysis/generation unit, 700 network, 710 cleaning robot, 900 monitor.

Claims

A server device comprising:
A memory;
A processor; and
A communication interface, wherein
The memory stores data for recognizing a speaker;
The communication interface receives a first audio signal;
When the communication interface receives the first audio signal, the processor:
Performs a speaker recognition function based on the received first audio signal and the data for recognizing the speaker;
Creates a second audio signal based on the first audio signal and the result of executing the speaker recognition function; and
Outputs the second audio signal via the communication interface;
The communication interface further receives a command to the server device;
When the communication interface receives a command to turn off the speaker recognition function, the processor turns off the speaker recognition function;
When the communication interface receives the command to turn off the speaker recognition function:
The processor asks for the name of the speaker who gave the command;
The communication interface receives the name of the speaker; and
The processor is configured to generate the second audio signal based on personal information identified by the received name of the speaker.

The server device according to claim 1 , wherein the processor is configured to output the registered name of the speaker via the communication interface.

A server device, comprising:
a memory;
a processor; and
a communication interface, wherein
the memory stores data for recognizing a speaker,
the communication interface receives a first audio signal,
when the communication interface receives the first audio signal, the processor
executes a speaker recognition function based on the received first audio signal and the data for recognizing the speaker,
creates a second audio signal based on the first audio signal and a result of executing the speaker recognition function, and
outputs the second audio signal via the communication interface,
the communication interface further receives a command directed to the server device,
when the communication interface receives a command to turn off the speaker recognition function, the processor turns off the speaker recognition function, and
the processor is configured to
identify the speaker who issued the command to turn off the speaker recognition function based on voiceprint information used for the speaker recognition function, and,
when the speaker is identified, generate the second audio signal based on personal information of the speaker identified by that voiceprint information.
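Claim 3 reuses the enrolled voiceprints to identify only the speaker who issued the OFF command. One common way to compare voiceprints is cosine similarity between fixed-length feature vectors; the sketch below assumes that technique and an acceptance threshold of 0.8, neither of which the patent specifies. The function names and the `favorite_topic` field are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    features: Sequence[float],
    voiceprints: Dict[str, List[float]],
    threshold: float = 0.8,  # assumed acceptance threshold
) -> Optional[str]:
    """Return the enrolled name whose voiceprint matches best, or None if none clears the threshold."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, voiceprint in voiceprints.items():
        score = cosine_similarity(features, voiceprint)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def reply_after_off(
    features: Sequence[float],
    voiceprints: Dict[str, List[float]],
    profiles: Dict[str, Dict[str, str]],
) -> str:
    """Build the response text from the personal information of the identified speaker."""
    speaker = identify_speaker(features, voiceprints)
    if speaker is None:
        return "Speaker recognition is off."
    topic = profiles.get(speaker, {}).get("favorite_topic", "anything")
    return f"Speaker recognition is off, {speaker}. Shall we talk about {topic}?"
```

Returning `None` when no score clears the threshold mirrors the conditional wording of claim 3 ("when the speaker is identified"): an unmatched speaker simply receives a generic reply.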

The server device according to claim 3, wherein the processor is configured to output, via the communication interface, a signal asking the speaker to confirm that the personal information of the identified speaker is to be used.

A server device, comprising:
a memory;
a processor; and
a communication interface, wherein
the memory stores data for recognizing a speaker,
the communication interface receives a first audio signal,
when the communication interface receives the first audio signal, the processor
executes a speaker recognition function based on the received first audio signal and the data for recognizing the speaker,
creates a second audio signal based on the first audio signal and a result of executing the speaker recognition function, and
outputs the second audio signal via the communication interface,
the communication interface further receives a command directed to the server device,
when the communication interface receives a command to turn off the speaker recognition function, the processor turns off the speaker recognition function, and
when voiceprint information for a single person is held in the server device, the processor is configured to output, via the communication interface, a signal asking whether the current settings of the server device are to be used for voice recognition processing.
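The single-enrollee check of claim 5 reduces to counting the stored voiceprints. A hypothetical sketch, assuming voiceprints are kept in a dictionary keyed by user name and that the confirmation signal is a text prompt; including the user's name in the prompt is an illustrative choice, not something the claim requires.

```python
from typing import Dict, List, Optional


def single_user_prompt(voiceprints: Dict[str, List[float]]) -> Optional[str]:
    """Return a confirmation prompt when exactly one voiceprint is held, else None."""
    if len(voiceprints) != 1:
        return None
    (name,) = voiceprints  # unpack the only enrolled user name
    return (
        f"Only {name}'s voiceprint is registered. "
        "Shall I keep using the current settings for voice recognition?"
    )
```

Returning `None` for zero or several enrollees leaves those cases to other flows, such as the name query of claim 1.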

The server device according to any one of claims 1 to 5, wherein the processor is further configured to output, via the communication interface, a signal indicating that the speaker recognition function has been turned off.

The server device according to any one of claims 1 to 6, wherein, when the communication interface receives an instruction to turn off the speaker recognition function, the processor is configured to output, via the communication interface, a signal asking the user whether the speaker recognition function may be turned off.
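Claims 6 and 7 together describe a confirm-then-notify exchange around the OFF command. The sketch below assumes a two-turn dialogue in which the user's answer arrives as a boolean; the class name, method names, and prompt wording are hypothetical.

```python
class OffConfirmationFlow:
    """Hypothetical dialogue: ask before turning speaker recognition off (claim 7),
    then announce that it is off (claim 6)."""

    def __init__(self) -> None:
        self.recognition_on = True
        self.pending_off = False

    def on_off_command(self) -> str:
        """Receive the OFF instruction and ask for confirmation instead of acting at once."""
        self.pending_off = True
        return "Is it OK to turn off speaker recognition?"

    def on_user_reply(self, yes: bool) -> str:
        """Apply or cancel the pending OFF instruction based on the user's answer."""
        if not self.pending_off:
            return "There is nothing to confirm."
        self.pending_off = False
        if yes:
            self.recognition_on = False
            return "Speaker recognition has been turned off."
        return "Speaker recognition remains on."
```

Deferring the state change until the reply arrives means an accidental OFF request leaves recognition untouched.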

A system, comprising:
the server device according to any one of claims 1 to 7;
an interactive home appliance,
the interactive home appliance including:
a voice input unit configured to accept voice input, and
a communication unit configured to transmit a signal based on the input voice to the server device; and
an information processing terminal capable of communicating with the server device, wherein
the server device includes:
a memory,
a processor, and
a communication interface,
the memory stores data for recognizing a speaker,
the communication interface receives a first audio signal,
when the communication interface receives the first audio signal, the processor
executes a speaker recognition function based on the received first audio signal and the data for recognizing the speaker,
creates a second audio signal based on the first audio signal and a result of executing the speaker recognition function, and
outputs the second audio signal via the communication interface,
the communication interface further receives a command directed to the server device,
when the communication interface receives a command to turn off the speaker recognition function, the processor is configured to turn off the speaker recognition function,
the information processing terminal includes:
a monitor,
a memory,
a communication interface, and
a processor, and
the processor of the information processing terminal is configured to display, on the monitor, a screen for receiving an instruction to turn off the speaker recognition function and a screen for managing users of the speaker recognition function of the server device.

A method for managing a voice recognition function by a server device, the method comprising:
accessing data for recognizing a speaker;
receiving a first audio signal;
executing, upon receiving the first audio signal, a speaker recognition function based on the received first audio signal and the data for recognizing the speaker;
creating a second audio signal based on the first audio signal and a result of executing the speaker recognition function;
outputting the second audio signal via a communication interface;
further receiving a command directed to the server device;
turning off the speaker recognition function upon receiving a command to turn off the speaker recognition function;
querying, upon receiving the command to turn off the speaker recognition function, the name of the speaker who issued the command;
receiving the name of the speaker; and
generating the second audio signal based on personal information identified by the received name of the speaker.

A method for managing a voice recognition function by a server device, the method comprising:
accessing data for recognizing a speaker;
receiving a first audio signal;
executing, upon receiving the first audio signal, a speaker recognition function based on the received first audio signal and the data for recognizing the speaker;
creating a second audio signal based on the first audio signal and a result of executing the speaker recognition function;
outputting the second audio signal via a communication interface;
further receiving a command directed to the server device;
turning off the speaker recognition function upon receiving a command to turn off the speaker recognition function;
identifying the speaker who issued the command to turn off the speaker recognition function based on voiceprint information used for the speaker recognition function; and
generating the second audio signal based on personal information of the speaker identified by that voiceprint information.

A method for managing a voice recognition function by a server device, the method comprising:
accessing data for recognizing a speaker;
receiving a first audio signal;
executing, upon receiving the first audio signal, a speaker recognition function based on the received first audio signal and the data for recognizing the speaker;
creating a second audio signal based on the first audio signal and a result of executing the speaker recognition function;
outputting the second audio signal via a communication interface;
further receiving a command directed to the server device;
turning off the speaker recognition function upon receiving a command to turn off the speaker recognition function; and
outputting, via the communication interface, a signal asking whether the current settings of the server device are to be used for voice recognition processing when voiceprint information for a single person is held in the server device.

A program for controlling an information communication terminal, the program causing the information communication terminal to execute:
a step of establishing communication with a server device having a speaker recognition function;
a step of displaying, on a monitor of the information communication terminal, a screen for receiving an instruction to turn off the speaker recognition function; and
a step of displaying, on the monitor, a screen for managing users of the speaker recognition function of the server device.
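The terminal-side program of the final claim can be modeled with the two screens rendered as lists of text lines. In this hypothetical sketch, `ServerStub` stands in for the server device and "establishing communication" is reduced to holding a reference to it; a real terminal would use some network transport that the claims do not specify, and the screen layouts are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServerStub:
    """Stand-in for the server device (hypothetical; no real transport)."""

    recognition_on: bool = True
    users: List[str] = field(default_factory=list)


class TerminalApp:
    """Hypothetical terminal program that renders the two claimed screens."""

    def __init__(self, server: ServerStub) -> None:
        # "Establishing communication" is modeled as keeping a server reference.
        self.server = server

    def off_screen(self) -> List[str]:
        """Screen for receiving an instruction to turn speaker recognition off."""
        if self.server.recognition_on:
            return ["Speaker recognition: ON", "[Turn off]"]
        return ["Speaker recognition: OFF"]

    def press_turn_off(self) -> None:
        """Forward the OFF instruction to the server."""
        self.server.recognition_on = False

    def user_management_screen(self) -> List[str]:
        """Screen for managing users registered for speaker recognition."""
        lines = ["Registered users:"]
        lines += [f"- {user} [Delete]" for user in self.server.users]
        lines.append("[Add user]")
        return lines

    def delete_user(self, name: str) -> None:
        """Remove a registered user, if present."""
        if name in self.server.users:
            self.server.users.remove(name)
```

Keeping screen rendering separate from the server state makes each screen easy to check without a display.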