JP6522679B2 - Speech control apparatus, method, speech system, and program

Info

Publication number: JP6522679B2
Application number: JP2017047738A
Authority: JP
Inventors: 靖典山下
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2017-03-13
Filing date: 2017-03-13
Publication date: 2019-05-29
Anticipated expiration: 2033-10-31
Also published as: JP2017122930A

Description

The present invention relates to a speech control apparatus, a method, a speech system, and a program for determining the content of speech to be uttered by a speech device.

In recent years, methods have become known in which a device provides information desired by a user by emitting speech. One kind of speech used for this purpose is synthetic speech, produced by applying signal processing to speech stored in the device in advance.

For example, Patent Document 1 discloses a playback device that includes speech synthesis databases in which the voices of a plurality of speakers are recorded separately, and that, in response to an instruction from the user, switches to the speech synthesis database the user desires and plays back speech.

JP 2005-321706 A (published November 17, 2005)

In general, in conversation between people, different phrases are used to convey the same meaning depending on the speaker's mood and personality. For example, a person greeted with "ohayou" ("good morning") may reply "ohayou", may reply with a phrase other than "ohayou", or may vary the intonation of "ohayou" to express his or her mood at that moment.

With the technology described in Patent Document 1, on the other hand, the timbre of the played-back human voice can be changed to one the user desires, but the phrase uttered and its intonation cannot be changed. It has therefore been difficult to achieve smooth communication between the user and the device.

The present invention has been made in view of the above problem, and an object thereof is to provide a speech control apparatus capable of communicating with a user more smoothly than the prior art.

In order to solve the above problem, a speech control apparatus according to one aspect of the present invention is a speech control apparatus that determines the utterance content to be uttered by a speech device, and includes: a voice information acquisition unit that acquires input voice information from the speech device; a voice information recognition unit that recognizes the input voice information; an utterance content determination unit that determines the utterance content by referring to one or more databases according to a mode set in the speech control apparatus and recognition information recognized by the voice information recognition unit; and a voice output unit that outputs the utterance content determined by the utterance content determination unit to the speech device.

In order to solve the above problem, a method according to one aspect of the present invention is a method of determining the utterance content to be uttered by a speech device, and includes: a voice information acquisition step of acquiring input voice information from the speech device; a voice information recognition step of recognizing the input voice information; an utterance content determination step of determining the utterance content by referring to one or more databases according to a set mode and recognition information recognized in the voice information recognition step; and a voice output step of outputting the utterance content determined in the utterance content determination step to the speech device.

In order to solve the above problem, a speech system according to one aspect of the present invention is a speech system including a speech device and a speech control apparatus. The speech control apparatus includes: a voice information acquisition unit that acquires input voice information from the speech device; a voice information recognition unit that recognizes the input voice information; an utterance content determination unit that determines the utterance content by referring to one or more databases according to a mode set in the speech control apparatus and recognition information recognized by the voice information recognition unit; and a voice output unit that outputs the utterance content determined by the utterance content determination unit to the speech device. The speech device includes: an utterance content acquisition unit that acquires the utterance content output from the speech control apparatus; and an utterance unit that utters the acquired utterance content.

According to one aspect of the present invention, the speech control apparatus can communicate with the user more smoothly than the prior art.

A block diagram showing the configuration of the speech system according to Embodiment 1 of the present invention.
An example of the databases that the utterance content determination unit refers to in order to determine the utterance content in the speech system according to Embodiment 1 of the present invention.
An example of recognition phrases and answer phrases in the speech system according to Embodiment 1 of the present invention.
An example of the ambiguous database that the utterance content determination unit refers to in order to determine the utterance content in the speech system according to Embodiment 5 of the present invention.
An example of ambiguous phrases in the speech system according to Embodiment 5 of the present invention.
An example of the databases that the utterance content determination unit refers to in order to determine the utterance content in the speech system according to Embodiment 6 of the present invention.
A flowchart showing the flow of the process of changing the mode set in the server in the speech system according to Embodiment 7 of the present invention.

[Embodiment 1]
Embodiment 1 of the present invention is described in detail below.

(Configuration of speech system 1)
FIG. 1 is a block diagram showing the configuration of the speech system 1 according to Embodiment 1 of the present invention. As shown in FIG. 1, the speech system 1 consists of a speech device 10 and a server (speech control apparatus) 20.

In the speech system 1, the server 20 has a plurality of modes. The server 20 determines the utterance content according to the mode that has been selected from among the plurality of modes and set (hereinafter, the set mode is also referred to as the "speech mode"), and causes the speech device 10 to utter that utterance content.

Here, the utterance content is information that includes the phrase to be uttered by the speech device 10 and its intonation. The utterance content also includes the timbre, volume, speech rate, and pitch with which the speech device 10 utters the phrase in that intonation.

A phrase need only consist of one or more words and is not limited in the number of words or in sentence structure. It may be a single word, a sentence made up of a plurality of words, or a plurality of sentences.

Intonation is not limited to pauses (the length of the silent period between sounds) and the rise and fall in pitch of a sentence; it also includes word accent. For example, uttering "ohayou" flatly with no accent, as in the Tokyo dialect, and uttering it with an accent on "yo", as in the Kansai dialect, are different intonations. A phrase always has some intonation, but in this embodiment and in the other embodiments, mention of intonation is omitted wherever any intonation would do.

A mode is associated with a phrase, or with a phrase and an intonation. More specifically, examples include mode 1, in which the phrase is "ohayou" and the intonation is flat; mode 2, in which the phrase is "ohayou" and the intonation places an accent on "yo"; and mode 3, in which the phrase is "ohayou gozaimasu" (a more polite "good morning") and the intonation is flat. Accordingly, if a different mode is selected, at least one of the phrase and the intonation of the corresponding utterance content differs.

A mode may simply be a parameter that the server 20 uses to determine the utterance content, or it may be matched to a character of the speech device 10. For example, if the server 20 has an energetic mode and the energetic mode is selected, the server 20 may determine utterance content as though the speech device 10 were an energetic character. Likewise, if the server 20 has a Kansai dialect mode and the Kansai dialect mode is selected, the server 20 may determine utterance content as though the speech device 10 were a character from the Kansai region.

The modes that can be set in the server 20 are, for example, as follows.

・Tokyo dialect mode
・Kansai dialect mode
・Standard mode
・Energetic mode
・Polite mode
A plurality of modes may also be selected from the settable modes above and set in the server 20 in superimposed fashion. For example, a mode such as an energetic Kansai dialect mode, in which the Kansai dialect mode and the energetic mode are superimposed, can be set in the server 20. The mode may also be changeable by a user operation.
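
One way to picture the superimposed setting is as a set of combinable flags. The following Python sketch is illustrative only: the names `Mode`, `is_active`, and `energetic_kansai` are hypothetical, and the patent does not specify how the server 20 stores its modes.

```python
from enum import Flag, auto


class Mode(Flag):
    """Settable modes listed in the text (hypothetical representation)."""
    TOKYO_DIALECT = auto()
    KANSAI_DIALECT = auto()
    STANDARD = auto()
    ENERGETIC = auto()
    POLITE = auto()


def is_active(setting: Mode, mode: Mode) -> bool:
    """Return True if `mode` is part of the (possibly superimposed) setting."""
    return bool(setting & mode)


# Superimpose the Kansai dialect mode and the energetic mode.
energetic_kansai = Mode.KANSAI_DIALECT | Mode.ENERGETIC
```

With this representation, `is_active(energetic_kansai, Mode.ENERGETIC)` holds while `is_active(energetic_kansai, Mode.POLITE)` does not, so a lookup routine could consult each active mode in turn.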

For each mode, the server 20 may also manage a corresponding character name. For example, the server 20 may associate character names such as "Edokko Koro-chan" with the Tokyo dialect mode, "Genki na Ai-chan" with the energetic mode, and "Nonbiri Nao-chan" with the polite mode, and use these as the names of the modes.

Furthermore, a character image (graphics, color, and so on) may be associated with each mode, and when a mode is set in the server 20, the speech device 10 may display the character image corresponding to the set mode on the display unit described later.

(Configuration of speech device 10)
As shown in FIG. 1, the speech device 10 includes a communication unit (utterance content acquisition unit) 11, a voice information acquisition unit 12, an utterance unit 13, an operation reception unit 14, and a display unit 15.

The communication unit 11 is connected to the server 20 and can transmit data to and receive data from the server 20.

The voice information acquisition unit 12 acquires sound from around the speech device 10 and outputs the acquired sound as input voice information.

The utterance unit 13 acquires the utterance content and utters the phrase included in the acquired utterance content with the intonation included in the utterance content.

The operation reception unit 14 receives information that the user inputs by operating a hard key, a switch, a touch sensor, or the like. The operation reception unit 14 can also receive speech uttered by the user as information entered by the user. The operation reception unit 14 outputs the received information as operation information.

The display unit 15 has a function of notifying the user of various kinds of information. The display unit 15 preferably includes at least one of a display panel, typified by a liquid crystal display panel, and a light emitting unit configured to emit light, and notifies the user of the information via the display panel or the light emitting unit. When the display unit 15 includes a light emitting unit, it can notify the user of information by causing the light emitting unit to emit light.

The display unit 15 may also be configured to show the user which mode is set. For example, when the display unit 15 includes a display panel, the display unit 15 displays an image (graphics) of the character corresponding to the set mode ("Edokko Koro-chan", "Genki na Ai-chan", "Nonbiri Nao-chan", and so on). This image may be a moving image, and more preferably moves in time with the speech. At the time of mode setting, described later, the image may be downloaded from the server 20 according to the set mode and displayed on the display unit 15, or it may be stored after download in a storage unit (not shown) built into the speech device 10 and displayed on the display unit 15 by reading it from that storage unit. Alternatively, images may be stored in the storage unit (not shown) in advance, and the image corresponding to the set mode may be read out and displayed on the display unit 15.

When the display unit 15 includes a light emitting unit, the display unit 15 may cause the light emitting unit to emit light in a color corresponding to the set mode: for example, blue for the Tokyo dialect mode, red for the energetic mode, and green for the polite mode. The display unit 15 may also make the light emitting unit blink in time with the speech.
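
The mode-to-color correspondence amounts to a lookup table. The Python sketch below is only an illustration: `MODE_COLORS` and `led_color` are hypothetical names, the string keys are an assumed encoding of the modes, and only the three color assignments come from the text. Returning `None` for a mode with no stated color is likewise an assumption.

```python
from typing import Optional

# Colors stated in the text; the mode keys are hypothetical identifiers.
MODE_COLORS = {
    "tokyo_dialect": "blue",
    "energetic": "red",
    "polite": "green",
}


def led_color(mode: str) -> Optional[str]:
    """Return the color for the set mode, or None if the text gives none."""
    return MODE_COLORS.get(mode)
```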

(Configuration of server 20)
As shown in FIG. 1, the server 20 includes a server communication unit (voice output unit) 21, a control unit 22, a storage unit 23, and an external information acquisition unit (voice information acquisition unit) 24.

The server communication unit 21 is connected to the speech device 10 and can transmit data to and receive data from the speech device 10.

The control unit 22 performs overall control of the components of the server 20. The functions of the control unit 22 are realized by a CPU (Central Processing Unit) executing a program stored in a storage device such as a RAM (Random Access Memory) or a flash memory. As shown in FIG. 1, the control unit 22 also functions as a voice information recognition unit (recognition means) 25 and an utterance content determination unit (utterance content determination means, judgment means) 26.

The storage unit 23 stores various databases, such as databases containing recognition phrases and the answer phrases corresponding to them, and an ambiguous database containing ambiguous answer phrases. The details of each database are described later.

The external information acquisition unit 24 is connected to a network 2 and can acquire external information from the network 2 in response to an instruction from the control unit 22.

(Configuration of control unit 22)
As described above, the control unit 22 also functions as the voice information recognition unit 25 and the utterance content determination unit 26.

The voice information recognition unit 25 recognizes the acquired input voice information and outputs the result as recognition information. Here, recognizing the input voice information means that the voice information recognition unit 25 identifies the phrase and the intonation contained in the input voice information. For example, when the input voice information contains a flat "ohayou" with no accent, the voice information recognition unit 25 determines that the phrase contained in the acquired input voice information is "ohayou" and that the intonation is flat with no accent. The voice information recognition unit 25 then outputs recognition information indicating that the recognition phrase is "ohayou" and that the recognition intonation is "flat with no accent".

The utterance content determination unit 26 determines the utterance content to be uttered by the speech device 10 according to the set mode and the recognition information output by the voice information recognition unit 25. The details of the process by which the utterance content determination unit 26 determines the utterance content are described later.

(Processing of control unit 22)
The processing of the control unit 22 is described below with reference to FIG. 2.

FIG. 2 is an example of the databases that the utterance content determination unit 26 refers to in order to determine the utterance content in the speech system 1 according to Embodiment 1 of the present invention.

As shown in FIG. 2, each database contains recognition phrases together with the corresponding answer phrases and intonations. Each database contains recognition phrases and answer phrases belonging to the same categories. For example, as phrases in the category "morning greeting", the recognition phrases of database 1 and database 2 include "ohayou", and the recognition phrases of database 3 include "ohayou gozaimasu". As also shown in FIG. 2, databases 1 to 3 contain "tadaima" ("I'm home"), "tadaima", and "tadaima modorimashita" ("I have just returned"), respectively, as phrases in the category "greeting on returning home", and contain "oyasumi" and "oyasuminasai" ("good night") as phrases in the category "greeting at bedtime". Each database corresponds to one of the modes of the server 20. For example, suppose that database 1 corresponds to the standard mode, database 2 to the energetic mode, and database 3 to the polite mode. In this embodiment, all intonations contained in databases 1 to 3 are flat, with no accent.

The following describes the processing that the control unit 22 performs when the databases shown in FIG. 2 are stored in the storage unit 23 and the user says "ohayou" to the speech device 10 with a flat intonation and no accent.

First, the voice information acquisition unit 12 outputs input voice information containing "ohayou" with a flat, unaccented intonation to the server 20 via the communication unit 11. The control unit 22 of the server 20 acquires the input voice information via the server communication unit 21. The voice information recognition unit 25 then recognizes the acquired input voice information. In this case, the voice information recognition unit 25 outputs recognition information indicating the recognition phrase "ohayou" and the recognition intonation "flat with no accent" to the utterance content determination unit 26.

The utterance content determination unit 26 determines, from the acquired recognition information, the utterance content that the speech device 10 should utter.

For example, when the standard mode is set in the server 20, the utterance content determination unit 26 refers to database 1, which corresponds to the standard mode, and selects the answer phrase "ohayou" corresponding to the recognition phrase "ohayou". The utterance content determination unit 26 then determines the selected phrase "ohayou" and a flat, unaccented intonation as the utterance content.

As another example, when the energetic mode is set in the server 20, the utterance content determination unit 26 switches the database it refers to to database 2, which corresponds to the energetic mode, and selects the answer phrase "kyou mo kiai irete ikou!" ("Let's get fired up again today!") corresponding to the recognition phrase "ohayou". The utterance content determination unit 26 then determines the selected phrase "kyou mo kiai irete ikou!" and a flat, unaccented intonation as the utterance content.

The utterance content determination unit 26 then outputs the determined utterance content to the speech device 10 via the server communication unit 21. The utterance unit 13 of the speech device 10 acquires the utterance content via the communication unit 11 and utters the phrase included in the acquired utterance content with the intonation included in the utterance content.

As a further example, when the polite mode is set in the server 20, the utterance content determination unit 26 switches the database it refers to to database 3, which corresponds to the polite mode. In database 3, the recognition phrase in the category "morning greeting" is "ohayou gozaimasu", which differs from the recognition phrase "ohayou" that the utterance content determination unit 26 acquired from the voice information recognition unit 25, so the utterance content determination unit 26 selects no answer phrase. Because no utterance content is determined, the server 20 does not cause the speech device 10 to utter anything.
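
The mode-dependent lookup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Recognition`, `Utterance`, `DATABASES`, and `decide_utterance` are hypothetical, the mode keys are an assumed encoding, and the answer phrase stored for the polite database is a placeholder because the text does not state it.

```python
from dataclasses import dataclass
from typing import Dict, Optional

FLAT = "flat"  # flat intonation with no accent


@dataclass(frozen=True)
class Recognition:
    """Recognition information: recognized phrase and intonation."""
    phrase: str
    intonation: str


@dataclass(frozen=True)
class Utterance:
    """Utterance content: phrase to utter and its intonation."""
    phrase: str
    intonation: str


# One database per mode, keyed by recognition phrase (hypothetical layout).
DATABASES: Dict[str, Dict[str, Utterance]] = {
    # Database 1: "ohayou" -> "ohayou"
    "standard": {"おはよう": Utterance("おはよう", FLAT)},
    # Database 2: "ohayou" -> "kyou mo kiai irete ikou!"
    "energetic": {"おはよう": Utterance("今日も気合入れていこう！", FLAT)},
    # Database 3: recognizes only "ohayou gozaimasu".
    # ASSUMPTION: the answer phrase here is a placeholder, not from the text.
    "polite": {"おはようございます": Utterance("おはようございます", FLAT)},
}


def decide_utterance(mode: str, recognition: Recognition) -> Optional[Utterance]:
    """Switch to the database for `mode` and look up the recognized phrase.

    Returns None when the database has no matching recognition phrase,
    in which case nothing is uttered.
    """
    database = DATABASES[mode]
    return database.get(recognition.phrase)
```

For the flat "ohayou" input, the standard and energetic modes each yield an `Utterance`, while the polite mode yields `None`, matching the case in which the speech device 10 stays silent.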

The server 20 may instead be configured to refer to phrase databases that do not contain recognition phrases. In this case, the utterance content determination unit 26 can determine the utterance content by referring to the phrase databases and switching among them.

In this case, the server 20 may also be configured to refer to recognition databases that are associated with the phrase databases and that contain recognition phrases but no answer phrases. In this configuration, the voice information recognition unit 25 can recognize the input voice information by referring to the recognition databases and switching among them according to the phrase contained in the input voice information.

Next, the process of setting the intonation of the phrase to be uttered according to the mode set in the server is described.

First, database 10, which has the same recognition phrases and answer phrases as database 1 of FIG. 2 but in which each answer phrase carries an intonation different from that in database 1, is stored in the storage unit 23. In this embodiment, database 10 is assumed to contain, for the recognition phrase "ohayou", the answer phrase "ohayou" with an intonation that accents "yo". Database 10 is then associated with the Kansai dialect mode. As one way of accenting a specific sound, the answer phrase may be divided into individual characters and an accent placed on a specific character. Alternatively, a person actually uttering the answer phrase with the accent on the specific character may be recorded in advance, and the recording used.

The following describes the processing that the utterance content determination unit 26 performs when the user says "ohayou" to the speech device 10 with a flat intonation and no accent.

First, the utterance content determination unit 26 acquires, from the voice information recognition unit 25, recognition information indicating the recognition phrase "ohayou" and the recognition intonation "flat with no accent". The utterance content determination unit 26 then determines, from the acquired recognition information, the utterance content that the speech device 10 should utter.

For example, when the Kansai dialect mode is set in the server 20, the utterance content determination unit 26 refers to database 10, which corresponds to the Kansai dialect mode, and selects the answer phrase "ohayou" corresponding to the recognition phrase "ohayou". The utterance content determination unit 26 also sets the intonation of "ohayou" to the intonation of the answer phrase in database 10, namely the intonation that accents "yo". The utterance content determination unit 26 then determines the selected phrase "ohayou" and the intonation that accents "yo" as the utterance content.
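
The character-by-character accenting mentioned for database 10 can be illustrated with a short Python sketch. The function name `mark_accents` and the list-of-pairs representation are hypothetical, chosen only to show the idea of splitting an answer phrase into characters and flagging the one to accent; the text does not prescribe a data format.

```python
from typing import List, Tuple


def mark_accents(phrase: str, accented_chars: str = "") -> List[Tuple[str, bool]]:
    """Split a phrase into characters and flag each one that is to be accented.

    An empty `accented_chars` yields a flat intonation (no character flagged).
    """
    return [(ch, ch in accented_chars) for ch in phrase]


# Flat "ohayou", as in database 1 (standard mode).
flat = mark_accents("おはよう")

# "ohayou" with an accent on "yo", as in database 10 (Kansai dialect mode).
kansai = mark_accents("おはよう", "よ")
```

Note that this simple version flags every occurrence of an accented character; a phrase in which the same character appears more than once would need position-based marking instead.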

The server 20 may be configured to refer to a plurality of phrase databases in which different intonations are set for one or more answer phrases. In such a configuration, the utterance content determination unit 26 determines the utterance content by switching the phrase database it refers to.

In this case, the server 20 may also be configured to refer to recognition databases that are associated with the phrase databases and that contain recognition phrases with intonations set but no answer phrases. In this configuration, the voice information recognition unit 25 can recognize the input voice information by referring to the recognition databases and switching among them according to the phrase contained in the input voice information.

このように、実施形態１に係る発話システム１では、ユーザからの「おはよう」に対して、「おはよう」または「今日も気合入れていこう！」を、発話装置１０から発話させることができる。また、ユーザからの「おはよう」に対して、アクセントのないフラットなイントネーションの「おはよう」や、「よ」にアクセントのついたイントネーションの「おはよう」を、発話装置１０から発話させることができる。したがって、発話システム１は、同一カテゴリ（「朝のあいさつ」とういカテゴリ）に含まれる複数のフレーズ（「おはよう」及び「今日も気合入れていこう！」）から発話すべきフレーズを選択する選択処理と、発話すべきフレーズのイントネーションを設定する設定処理と、の少なくとも何れかの処理を行うことにより、ユーザに合わせた発話内容を発話装置１０から発話させることができるので、従来に比べてユーザと円滑なコミュニケーションを図ることができる。なお、認識フレーズと回答フレーズとの例を、図３に示す。図３は、本発明の実施形態１に係る発話システムにおける認識フレーズと回答フレーズとの例である。 As described above, in the speech system 1 according to the first embodiment, the speech device 10 can be made to utter "Good morning" (ohayou) or "Let's get fired up again today!" in response to "Good morning" from the user. The speech device 10 can also be made to utter "Good morning" with a flat, unaccented intonation, or "Good morning" with the accent on "yo", in response to "Good morning" from the user. The speech system 1 thus performs at least one of a selection process, which selects the phrase to be uttered from a plurality of phrases ("Good morning" and "Let's get fired up again today!") included in the same category (the "morning greeting" category), and a setting process, which sets the intonation of the phrase to be uttered. Because this allows the speech device 10 to utter content tailored to the user, communication with the user is smoother than with conventional techniques. Examples of recognition phrases and answer phrases are shown in FIG. 3. FIG. 3 shows examples of recognition phrases and answer phrases in the speech system according to the first embodiment of the present invention.
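The selection process and the setting process summarized above can be sketched as a mode-keyed lookup. This is only an illustrative sketch: the names `Utterance`, `PHRASE_DATABASES`, and `determine_utterance` are hypothetical, the table holds only the entries mentioned in the text, and the flat intonation assigned to the fine-mode answer is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Utterance:
    phrase: str
    accent_on: Optional[str]  # None stands for a flat intonation


# Hypothetical stand-ins for the per-mode databases; only entries named in the text.
PHRASE_DATABASES: Dict[str, Dict[str, Utterance]] = {
    "standard": {"おはよう": Utterance("おはよう", None)},
    # Flat intonation for this answer is an assumption, not stated in the text.
    "fine": {"おはよう": Utterance("今日も気合入れていこう！", None)},
    "kansai": {"おはよう": Utterance("おはよう", "よ")},
}


def determine_utterance(mode: str, recognized_phrase: str) -> Optional[Utterance]:
    """Switch to the database for the current mode, then pick phrase and intonation."""
    return PHRASE_DATABASES[mode].get(recognized_phrase)
```

Calling `determine_utterance("kansai", "おはよう")` yields the phrase together with the accent on 「よ」, while an unregistered phrase yields `None`.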

なお、サーバ２０が備える制御部２２及び記憶部２３は、発話装置１０が備える構成であってもよい。この場合、発話装置１０の発話内容決定部２６において、同一カテゴリから発話すべきフレーズを選択する選択処理と、発話すべきフレーズのイントネーションを設定する設定処理と、の少なくとも何れかの処理を行うことにより発話内容を決定する。そして、発話部１３は、当該発話内容を取得し、当該発話内容に含まれるフレーズを、当該発話内容に含まれるイントネーションで発話する。したがって、ユーザに合わせた発話内容を発話装置１０は発話することができるので、ユーザと円滑なコミュニケーションを図ることができる。 The control unit 22 and the storage unit 23 included in the server 20 may instead be included in the speech device 10. In this case, the utterance content determination unit 26 of the speech device 10 determines the utterance content by performing at least one of a selection process for selecting the phrase to be uttered from the same category and a setting process for setting the intonation of the phrase to be uttered. The utterance unit 13 then obtains the utterance content and utters the phrase included in it with the intonation included in it. The speech device 10 can therefore utter content tailored to the user, enabling smooth communication with the user.

〔実施形態２〕 Second Embodiment
実施形態１では、発話システム１は、入力音声情報に含まれるフレーズによって、発話装置１０に発話させるフレーズを選択または発話装置１０に発話させるフレーズのイントネーションを設定したが、入力音声情報に含まれるイントネーションによって、発話装置１０に発話させるフレーズを選択する、または発話装置１０に発話させるフレーズのイントネーションを設定する構成としてもよい。 In the first embodiment, the speech system 1 selects the phrase to be uttered by the speech device 10, or sets the intonation of that phrase, according to the phrase included in the input voice information. Alternatively, the speech system 1 may select the phrase to be uttered by the speech device 10, or set the intonation of that phrase, according to the intonation included in the input voice information.

まず、図２に示すデータベース１、データベース２、及びデータベース３の認識フレーズと共にフラットなイントネーションを含め、データベース１の各認識フレーズの特定の音にアクセントのあるイントネーションを含めたデータベース４を、記憶部２３が記憶している場合について、説明する。本実施形態では、データベース４のイントネーションとして、「おはよう」の「よ」にアクセントをつけたイントネーションとし、各データベースに対応するモードは、実施形態１と同じとする。そして、入力音声情報が、「おはよう」の「よ」にアクセントをつけたイントネーションを含む場合について、説明する。 First, a case will be described in which a flat intonation is included together with the recognition phrases of database 1, database 2, and database 3 shown in FIG. 2, and the storage unit 23 stores a database 4 in which each recognition phrase of database 1 is given an intonation that accents a specific sound. In the present embodiment, the intonation in database 4 is one that accents the "yo" of "Good morning" (ohayou), and the mode corresponding to each database is the same as in the first embodiment. The description below covers the case where the input voice information includes an intonation that accents the "yo" of "ohayou".

まず、音声情報認識部２５は、入力音声情報を認識し、認識フレーズが「およよう」、認識イントネーションが「おはよう」の「よ」にアクセントをつけたイントネーションであることを示す認識情報を発話内容決定部２６に出力する。 First, the voice information recognition unit 25 recognizes the input voice information and outputs, to the utterance content determination unit 26, recognition information indicating that the recognition phrase is "Good morning" (ohayou) and that the recognition intonation is one that accents the "yo" of "ohayou".

例えば、サーバ２０に関西弁モードが設定されている場合、発話内容決定部２６は、関西弁モードに対応するデータベース４を参照し、データベース４の認識フレーズと、認識フレーズのイントネーションとが、取得した認識情報と一致するか否かを判定する。本実施形態では、データベース４の認識フレーズと、認識フレーズのイントネーションとが、取得した認識情報と一致するので、発話内容決定部２６は、認識情報に対応した「おはよう」というフレーズと、「よ」にアクセントをつけるイントネーションとを、発話内容として決定する。 For example, when the Kansai dialect mode is set in the server 20, the utterance content determination unit 26 refers to the database 4 corresponding to the Kansai dialect mode and determines whether the recognition phrase in database 4 and its intonation match the acquired recognition information. In the present embodiment, the recognition phrase in database 4 and its intonation match the acquired recognition information, so the utterance content determination unit 26 determines, as the utterance content, the phrase "Good morning" (ohayou) corresponding to the recognition information and the intonation that accents "yo".

また、例えば、サーバ２０に標準モードが設定されている場合、発話内容決定部２６は、標準モードに対応するデータベース１を参照し、データベース１の認識フレーズと、認識フレーズのイントネーションとが、取得した認識情報と一致するか否かを判定する。本実施形態では、データベース４の認識フレーズのイントネーションと、認識情報のイントネーションとは一致しないので、発話内容決定部２６は、発話内容を決定しない。 Further, for example, when the standard mode is set in the server 20, the utterance content determination unit 26 refers to the database 1 corresponding to the standard mode, and the recognition phrase of the database 1 and the intonation of the recognition phrase are acquired. It is determined whether or not it matches the recognition information. In the present embodiment, since the intonation of the recognition phrase of the database 4 does not match the intonation of the recognition information, the speech content determination unit 26 does not determine the speech content.

このように実施形態２に係る発話システム１では、入力音声に含まれるイントネーションに応じて、発話内容を決定することができる。したがって、発話システム１は、ユーザのイントネーションに応じた発話内容を発話装置１０から発話させることができるので、従来に比べてユーザとより円滑なコミュニケーションを図ることができる。 As described above, in the speech system 1 according to the second embodiment, the contents of speech can be determined according to intonation included in the input speech. Therefore, since the speech system 1 can cause the speech device 10 to utter the speech contents according to the intonation of the user, smoother communication with the user can be achieved as compared with the conventional case.
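The intonation-dependent matching of the second embodiment can be sketched by keying each database on the pair of recognition phrase and intonation. This is a minimal sketch under assumptions: `INTONATION_DATABASES` and `determine_by_intonation` are hypothetical names, an intonation is reduced to "which sound is accented" (`None` for flat), and only the entries described in the text are included.

```python
from typing import Dict, Optional, Tuple

# (phrase, accented sound); None means a flat intonation.
Key = Tuple[str, Optional[str]]

# Hypothetical stand-ins for database 1 (standard mode) and database 4 (Kansai dialect mode).
INTONATION_DATABASES: Dict[str, Dict[Key, Key]] = {
    "standard": {("おはよう", None): ("おはよう", None)},
    "kansai": {("おはよう", "よ"): ("おはよう", "よ")},
}


def determine_by_intonation(
    mode: str, phrase: str, accent_on: Optional[str]
) -> Optional[Key]:
    """Return (answer phrase, answer intonation) only if phrase AND intonation match."""
    return INTONATION_DATABASES[mode].get((phrase, accent_on))
```

With the accent on 「よ」, the Kansai dialect database yields an answer, whereas the standard-mode database yields `None`, which corresponds to the case where no utterance content is determined.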

〔実施形態３〕 Third Embodiment
上述した実施形態では、発話システム１は、認識した入力音声情報に対応した発話内容を、発話装置１０から発話させたが、入力音声情報を取得しない構成であってもよく、発話内容決定部２６は、予め定められた条件が満たされた場合に、予め定められた処理を実行してもよい。 In the embodiments described above, the speech system 1 causes the speech device 10 to utter content corresponding to the recognized input voice information. However, the speech system 1 may be configured not to acquire input voice information, in which case the utterance content determination unit 26 may execute a predetermined process when a predetermined condition is satisfied.

例えば、予め定められた条件が「朝の７時」であり、予め定められた処理が「朝のあいさつをする」であった場合、発話内容決定部２６は、朝の７時になると、朝のあいさつに含まれるフレーズを選択し、発話内容を決定する。例えば、サーバ２０に標準モードが設定されている場合、朝の７時になると、発話内容決定部２６は、標準モードに対応するデータベース１を参照し、朝のあいさつである「おはよう」というフレーズと、アクセントのないフラットなイントネーションとを、発話内容として決定する。また、例えば、サーバ２０に丁寧モードが設定されている場合、朝の７時になると、発話内容決定部２６は、参照するデータベースを、丁寧モードに対応するデータベース３に切り替える。そして、発話内容決定部２６は、朝のあいさつである「おはようございます」というフレーズと、アクセントのないフラットなイントネーションとを、発話内容として決定する。 For example, if the predetermined condition is "7 o'clock in the morning" and the predetermined process is "give a morning greeting", then at 7 o'clock in the morning the utterance content determination unit 26 selects a phrase included in the morning greeting category and determines the utterance content. For example, when the standard mode is set in the server 20, at 7 o'clock in the morning the utterance content determination unit 26 refers to the database 1 corresponding to the standard mode and determines, as the utterance content, the morning greeting "Good morning" (ohayou) and a flat, unaccented intonation. When the polite mode is set in the server 20, at 7 o'clock in the morning the utterance content determination unit 26 switches the database to be referred to over to the database 3 corresponding to the polite mode. The utterance content determination unit 26 then determines, as the utterance content, the morning greeting in its polite form (ohayou gozaimasu) and a flat, unaccented intonation.

このように、実施形態３に係る発話システム１は、予め設定された条件が満たされた場合に、予め定められた処理を実行することができる。したがって、発話システム１は、ユーザから入力音声情報を取得しなくても、発話装置１０が自ら発話するので、従来に比べてユーザとより円滑なコミュニケーションを図ることができる。 Thus, the speech system 1 according to the third embodiment can execute a predetermined process when a preset condition is satisfied. Therefore, the speech system 1 can communicate more smoothly with the user as compared with the related art because the speech device 10 speaks itself without acquiring the input speech information from the user.
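The condition-triggered behavior of the third embodiment can be sketched as a check against a trigger time. This is a hedged sketch: `TRIGGER_TIME`, `MORNING_GREETINGS`, and `scheduled_utterance` are hypothetical names, and comparing at minute granularity is an assumption made only for illustration.

```python
from datetime import time
from typing import Dict, Optional, Tuple

TRIGGER_TIME = time(7, 0)  # predetermined condition: 7 o'clock in the morning

# Hypothetical stand-ins for database 1 (standard mode) and database 3 (polite mode).
MORNING_GREETINGS: Dict[str, str] = {
    "standard": "おはよう",
    "polite": "おはようございます",
}


def scheduled_utterance(mode: str, now: time) -> Optional[Tuple[str, Optional[str]]]:
    """Return (phrase, intonation) when the condition holds; None otherwise.

    No input voice information is involved. The intonation None stands for flat.
    """
    if now.replace(second=0, microsecond=0) != TRIGGER_TIME:
        return None
    return (MORNING_GREETINGS[mode], None)
```

In this sketch the caller is expected to invoke `scheduled_utterance` periodically with the current time; how the server actually monitors the condition is not specified in the text.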

また、実施形態３では、認識フレーズと回答フレーズとが対応する必要はなく、認識フレーズと回答フレーズとが別々になったデータベースを参照する構成であってもよい。このような構成において、認識フレーズを含まないフレーズデータベースが記憶部２３に複数記憶されており、発話内容決定部２６は、これらのフレーズデータベース切り替えることにより、発話内容を決定することができる。 Further, in the third embodiment, the recognition phrase and the answer phrase do not need to correspond to each other, and the configuration may be such that a database in which the recognition phrase and the answer phrase are separated is referred to. In such a configuration, a plurality of phrase databases not including a recognition phrase are stored in the storage unit 23, and the utterance content determination unit 26 can determine the utterance content by switching the phrase databases.

〔実施形態４〕 Fourth Embodiment
実施形態３において説明したように、記憶部２３に記憶されるデータベースは、認識フレーズと回答フレーズとが別々になったデータベースであってもよい。したがって、回答フレーズを含まない認識データベースを切り替えることにより、音声情報認識部２５は、入力音声情報を認識するという構成であってもよい。 As described in the third embodiment, the database stored in the storage unit 23 may be a database in which the recognition phrase and the answer phrase are separated. Therefore, the voice information recognition unit 25 may be configured to recognize the input voice information by switching between recognition databases that do not include answer phrases.

例えば、図２に示すデータベース１、データベース２、及びデータベース３の回答フレーズを含まない認識データベース１、認識データベース２、及び認識データベース３を記憶部２３が記憶している場合について、説明する。この場合、入力音声情報がフラットなイントネーションの「おはようございます」を含む場合、音声情報認識部２５は、認識データベースを切り替えることにより、入力音声情報に含まれるフレーズが認識データベース１及び認識データベース２の認識フレーズとは異なり、認識データベース３の認識フレーズと一致することを認識することができる。 For example, a case will be described in which the storage unit 23 stores recognition database 1, recognition database 2, and recognition database 3, which correspond to database 1, database 2, and database 3 shown in FIG. 2 without their answer phrases. In this case, when the input voice information includes the polite form of "Good morning" (ohayou gozaimasu) with a flat intonation, the voice information recognition unit 25 can recognize, by switching the recognition database, that the phrase included in the input voice information differs from the recognition phrases in recognition database 1 and recognition database 2 and matches the recognition phrase in recognition database 3.

また、実施形態２において説明したように、認識データベースに認識フレーズと共にイントネーションを含めておくことにより、イントネーションによって認識データベースを切り替える構成であってもよい。 Further, as described in the second embodiment, the recognition database may be switched by the intonation by including the intonation together with the recognition phrase in the recognition database.

例えば、上述した認識データベース１、認識データベース２、及び認識データベース３の認識フレーズと共にフラットなイントネーションを含め、データベース１の各認識フレーズの特定の音にアクセントがついたイントネーションを含めた認識データベース４を、記憶部２３が記憶している場合について、説明する。本実施形態では、データベース４のイントネーションとして、「おはよう」の「よ」にアクセントをつけたイントネーションとする。 For example, a recognition database 4 including a flat intonation together with the recognition phrases of the recognition database 1, the recognition database 2 and the recognition database 3 described above, and an intonation in which a specific sound of each recognition phrase in the database 1 is accented The case where the storage unit 23 stores data will be described. In the present embodiment, the intonation of the database 4 is an intonation in which “Yo” of “Good morning” is accented.

この場合、入力音声情報が「おはよう」の「よ」にアクセントをつけたイントネーションを含んでいる場合、音声情報認識部２５は、認識データベースを切り替えることにより、認識データベース１、認識データベース２、及び認識データベース３の認識フレーズのイントネーションとは異なり、認識データベース４の認識フレーズのイントネーションと一致することを認識することができる。 In this case, when the input voice information includes an intonation that accents the "yo" of "Good morning" (ohayou), the voice information recognition unit 25 can recognize, by switching the recognition database, that the intonation differs from the intonations of the recognition phrases in recognition database 1, recognition database 2, and recognition database 3 and matches the intonation of the recognition phrase in recognition database 4.
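Switching among recognition databases, as described in the fourth embodiment, can be sketched as trying each database in turn and reporting those whose recognition entry matches the input. The names `RECOGNITION_DATABASES` and `matching_recognition_databases` are hypothetical, and the database contents are limited to the entries the text describes.

```python
from typing import Dict, List, Optional, Set, Tuple

# (recognition phrase, accented sound); None means a flat intonation.
Entry = Tuple[str, Optional[str]]

# Hypothetical recognition databases holding recognition phrases only (no answer phrases).
RECOGNITION_DATABASES: Dict[str, Set[Entry]] = {
    "recognition_db_1": {("おはよう", None)},
    "recognition_db_2": {("おはよう", None)},
    "recognition_db_3": {("おはようございます", None)},
    "recognition_db_4": {("おはよう", "よ")},
}


def matching_recognition_databases(phrase: str, accent_on: Optional[str]) -> List[str]:
    """Switch through the recognition databases and list those that match the input."""
    return [
        name
        for name, entries in RECOGNITION_DATABASES.items()
        if (phrase, accent_on) in entries
    ]
```

A flat 「おはようございます」 matches only recognition database 3, and 「おはよう」 accented on 「よ」 matches only recognition database 4, mirroring the two examples in the text.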

〔実施形態５〕 Fifth Embodiment
上述した実施形態において、取得した認識フレーズと一致する認識フレーズが参照するデータベースに含まれている場合、または取得した認識イントネーションが参照するデータベースに含まれている場合、発話内容決定部２６は、認識フレーズが含まれるカテゴリと一致するカテゴリに含まれるフレーズを、発話内容として決定している。一方、取得した認識フレーズと一致する認識フレーズが参照するデータベースに含まれていない場合、及び取得した認識イントネーションと一致するイントネーションが参照するデータベースに含まれていない場合、の少なくとも何れかの場合に、発話内容決定部２６は発話内容を決定しないとした。しかしながら、ユーザが発したフレーズに対して発話装置１０が何も反応しないと、ユーザは、自らが発したフレーズを発話装置が音声情報として取得していないのか、それとも、自らが発したフレーズに対応する回答がないのか、それとも故障しているのか、を判断することができない。そこで、本実施形態では、ユーザが発したフレーズに対応する回答が、参照するデータベースに存在しない場合でも、発話内容決定部２６が発話内容を決定する処理について、図４を用いて説明する。 In the embodiments described above, when a recognition phrase matching the acquired recognition phrase is included in the referenced database, or when the acquired recognition intonation is included in the referenced database, the utterance content determination unit 26 determines, as the utterance content, a phrase included in the category that matches the category of the recognition phrase. Conversely, in at least one of the case where no recognition phrase matching the acquired recognition phrase is included in the referenced database and the case where no intonation matching the acquired recognition intonation is included in the referenced database, the utterance content determination unit 26 does not determine any utterance content. However, if the speech device 10 gives no response to a phrase uttered by the user, the user cannot tell whether the speech device failed to acquire the phrase as voice information, whether there is no answer corresponding to the phrase, or whether the device is malfunctioning. The present embodiment therefore describes, with reference to FIG. 4, a process in which the utterance content determination unit 26 determines utterance content even when no answer corresponding to the phrase uttered by the user exists in the referenced database.

図４は、本発明の実施形態５に係る発話システム１において、発話内容決定部２６が発話内容を決定するために参照するあいまいデータベースの例である。あいまいデータベースとは、あいまいなフレーズ（以下、あいまいフレーズとも呼ぶ）を含んでいるデータベースである。ここで、あいまいフレーズとは、入力音声情報が含むフレーズのカテゴリとは異なるカテゴリに含まれるフレーズであると表現することもできる。換言すると、あいまいフレーズとは、入力音声情報に含まれるフレーズ及びイントネーションが、所定のフレーズ及びイントネーションと一致する場合に選択されるフレーズのカテゴリとは異なるカテゴリに含まれるフレーズであると表現することもできる。また、あいまいフレーズとは、入力音声情報が認識できない、または認識できたが対応する回答フレーズがないということを暗示するフレーズであると表現することもできる。 FIG. 4 shows an example of an ambiguous database that the utterance content determination unit 26 refers to in order to determine the utterance content in the speech system 1 according to the fifth embodiment of the present invention. An ambiguous database is a database containing phrases that are ambiguous (hereinafter also called ambiguous phrases). Here, an ambiguous phrase can be described as a phrase belonging to a category different from the category of the phrase included in the input voice information. In other words, an ambiguous phrase can be described as a phrase belonging to a category different from the category of the phrase that would be selected if the phrase and intonation included in the input voice information matched a predetermined phrase and intonation. An ambiguous phrase can also be described as a phrase implying that the input voice information could not be recognized, or that it was recognized but has no corresponding answer phrase.

本実施形態において、データベース４は標準モードに対応し、データベース５は元気モードに対応する。なお、あいまいフレーズの例を、図５に示す。図５は、本発明の実施形態５に係る発話システム１における、あいまいフレーズの例である。 In the present embodiment, database 4 corresponds to the standard mode and database 5 corresponds to the fine mode. Examples of ambiguous phrases are shown in FIG. 5. FIG. 5 shows examples of ambiguous phrases in the speech system 1 according to the fifth embodiment of the present invention.

続いて、発話内容決定部２６が、取得した認識フレーズと一致する認識フレーズが参照するデータベースに含まれていない場合にあいまいデータベースを参照する例について、説明する。 Next, an example will be described in which the utterance content determination unit 26 refers to the ambiguous database when no recognition phrase matching the acquired recognition phrase is included in the referenced database.

まず、音声情報認識部２５は、入力音声情報を認識し、認識情報を発話内容決定部２６に出力する。発話内容決定部２６は、認識情報に基づく発話内容を、サーバ２０に設定されているモードに応じて決定する。ここで、認識情報に含まれる認識フレーズが、参照するデータベースの認識フレーズと一致しない場合、発話内容決定部２６は、あいまいデータベースを参照し、あいまいフレーズを選択する。 First, the voice information recognition unit 25 recognizes the input voice information and outputs the recognition information to the utterance content determination unit 26. The utterance content determination unit 26 determines the utterance content based on the recognition information according to the mode set in the server 20. Here, when the recognition phrase included in the recognition information does not match any recognition phrase in the referenced database, the utterance content determination unit 26 refers to the ambiguous database and selects an ambiguous phrase.

例えば、サーバ２０に標準モードが設定されており、入力音声情報が「おはようございます」というフレーズを含んでいる場合、発話内容決定部２６は、まずデータベース１を参照し、「おはようございます」に対応する回答フレーズを選択する。ここで、データベース１には、「おはようございます」に対応する回答フレーズはないため、続いて、データベース４を参照し、「うんうん」を回答フレーズとして選択する。なお、発話内容決定部２６は、データベース４から回答フレーズを選択する場合に、所定の条件（例えば、データベース４に含まれる回答フレーズを、上から順番に選択する、など）に基づいて選択してもよいし、ランダムに回答フレーズを選択してもよい。ランダムに回答フレーズを選択する構成とすれば、ユーザに対して、より自然なコミュニケーションの印象を与えることができる。 For example, when the standard mode is set in the server 20 and the input voice information includes the polite form of "Good morning" (ohayou gozaimasu), the utterance content determination unit 26 first refers to database 1 and attempts to select an answer phrase corresponding to "ohayou gozaimasu". Because database 1 contains no answer phrase corresponding to "ohayou gozaimasu", the unit then refers to database 4 and selects "Uh-huh" (un un) as the answer phrase. When selecting an answer phrase from database 4, the utterance content determination unit 26 may select based on a predetermined condition (for example, selecting the answer phrases in database 4 in order from the top) or may select an answer phrase at random. Selecting an answer phrase at random can give the user the impression of more natural communication.

また、例えば、サーバ２０に元気モードが設定されており、入力音声情報が「おはようございます」というフレーズを含んでいる場合、発話内容決定部２６は、まずデータベース２を参照し、「おはようございます」に対応する回答フレーズを選択する。ここで、データベース２には、「おはようございます」に対応する回答フレーズはないため、続いて、データベース５を参照し、「いいことありそう！」を回答フレーズとして選択する。 Similarly, when the fine mode is set in the server 20 and the input voice information includes the polite form of "Good morning" (ohayou gozaimasu), the utterance content determination unit 26 first refers to database 2 and attempts to select an answer phrase corresponding to "ohayou gozaimasu". Because database 2 contains no answer phrase corresponding to "ohayou gozaimasu", the unit then refers to database 5 and selects "Something good might happen!" as the answer phrase.

このように、実施形態５に係る発話システム１では、ユーザからの入力音声情報に含まれるフレーズが、所定のフレーズと一致しない場合（音声情報に含まれるフレーズ対応する回答フレーズがない場合）、発話装置１０は、あいまいフレーズを発話する。したがって、ユーザは、あたかも人と会話しているかのように発話装置と会話することができるので、発話システム１では、ユーザとより円滑なコミュニケーションを図ることができる。 As described above, in the speech system 1 according to the fifth embodiment, when the phrase included in the input voice information from the user does not match a predetermined phrase (that is, when there is no answer phrase corresponding to the phrase included in the voice information), the speech device 10 utters an ambiguous phrase. The user can therefore converse with the speech device as if conversing with a person, so the speech system 1 achieves smoother communication with the user.

次に、発話内容決定部２６が、取得した認識フレーズと一致する認識フレーズが参照するデータベースに含まれている場合であっても、取得した認識イントネーションが参照するデータベースと一致しない場合にあいまいデータベースを参照する例について、説明する。 Next, an example will be described in which the utterance content determination unit 26 refers to the ambiguous database when the acquired recognition intonation does not match the referenced database, even though a recognition phrase matching the acquired recognition phrase is included in the referenced database.

まず、音声情報認識部２５は、入力音声情報を認識し、認識情報を発話内容決定部２６に出力する。発話内容決定部２６は、認識情報に基づく発話内容を、サーバ２０に設定されているモードに応じて決定する。ここで、認識情報に含まれる認識フレーズが、参照するデータベースの認識フレーズと一致しているが、認識イントネーションが、参照するデータベースの認識フレーズに設定されたイントネーションと一致しない場合、発話内容決定部２６は、あいまいデータベースを参照し、あいまいフレーズを選択する。 First, the voice information recognition unit 25 recognizes the input voice information and outputs the recognition information to the utterance content determination unit 26. The utterance content determination unit 26 determines the utterance content based on the recognition information according to the mode set in the server 20. Here, when the recognition phrase included in the recognition information matches a recognition phrase in the referenced database but the recognition intonation does not match the intonation set for that recognition phrase, the utterance content determination unit 26 refers to the ambiguous database and selects an ambiguous phrase.

例えば、サーバ２０に標準モードが設定されており、入力音声情報が、フレーズ「おはよう」の「よ」にアクセントをつけたイントネーションを含む場合、発話内容決定部２６は、認識フレーズ「おはよう」及び「よ」にアクセントをつけた認識イントネーションに対応する回答フレーズを、データベース１から選択する。ここで、データベース１には、認識フレーズ「おはよう」に対応する回答フレーズ「おはよう」はあるが、「おはよう」及び「よ」にアクセントをつけた認識イントネーションに対応する回答フレーズはないため、発話内容決定部２６は、データベース４を参照し、「もう１回言って」を回答フレーズとして選択する。なお、発話内容決定部２６は、データベース４から回答フレーズを選択する場合に、上述したように、所定の条件（例えば、データベース４に含まれる回答フレーズを、上から順番に選択する、など）に基づいて選択してもよいし、ランダムに回答フレーズを選択してもよい。ランダムに回答フレーズを選択する構成とすれば、ユーザに対して、より自然なコミュニケーションの印象を与えることができる。 For example, when the standard mode is set in the server 20 and the input voice information includes the phrase "Good morning" (ohayou) with an intonation that accents "yo", the utterance content determination unit 26 attempts to select, from database 1, an answer phrase corresponding to the recognition phrase "ohayou" and the recognition intonation that accents "yo". Database 1 contains the answer phrase "ohayou" corresponding to the recognition phrase "ohayou", but contains no answer phrase corresponding to "ohayou" combined with the recognition intonation that accents "yo". The utterance content determination unit 26 therefore refers to database 4 and selects "Say that again" as the answer phrase. As described above, when selecting an answer phrase from database 4, the utterance content determination unit 26 may select based on a predetermined condition (for example, selecting the answer phrases in database 4 in order from the top) or may select an answer phrase at random. Selecting an answer phrase at random can give the user the impression of more natural communication.

また、例えば、サーバ２０に元気モードが設定されており、入力音声情報がフレーズ「おはよう」の「よ」にアクセントをつけたイントネーションを含む場合、発話内容決定部２６は、認識フレーズ「おはよう」及び「よ」にアクセントをつけた認識イントネーションに対応する回答フレーズを、データベース２から選択する。ここで、データベース２には、認識フレーズ「おはよう」に対応する回答フレーズ「今日も気合入れていこう！」はあるが、「おはよう」及び「よ」にアクセントをつけた認識イントネーションに対応する回答フレーズはないため、発話内容決定部２６は、データベース５を参照し、「声が小さい！」を回答フレーズとして選択する。 Similarly, when the fine mode is set in the server 20 and the input voice information includes the phrase "Good morning" (ohayou) with an intonation that accents "yo", the utterance content determination unit 26 attempts to select, from database 2, an answer phrase corresponding to the recognition phrase "ohayou" and the recognition intonation that accents "yo". Database 2 contains the answer phrase "Let's get fired up again today!" corresponding to the recognition phrase "ohayou", but contains no answer phrase corresponding to "ohayou" combined with the recognition intonation that accents "yo". The utterance content determination unit 26 therefore refers to database 5 and selects "Your voice is too quiet!" as the answer phrase.

このように、実施形態５に係る発話システム１では、ユーザからの入力音声情報に含まれるフレーズが、所定のフレーズと一致する場合（音声情報に含まれるフレーズ対応するフレーズがある場合）であっても、所定のイントネーションと一致しない場合、発話装置１０は、あいまいフレーズを発話する。したがって、ユーザは、あたかも人と会話しているかのように発話装置と会話することができるので、発話システム１では、ユーザとより円滑なコミュニケーションを図ることができる。 As described above, in the speech system 1 according to the fifth embodiment, even when the phrase included in the input voice information from the user matches a predetermined phrase (that is, when there is a phrase corresponding to the phrase included in the voice information), the speech device 10 utters an ambiguous phrase if the intonation does not match the predetermined intonation. The user can therefore converse with the speech device as if conversing with a person, so the speech system 1 achieves smoother communication with the user.
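Both fallback cases of the fifth embodiment (phrase mismatch and intonation mismatch) can be sketched with one lookup keyed on phrase and intonation, followed by a fallback to the ambiguous database of the current mode. The names below are hypothetical, the tables contain only the phrases quoted in the text, and returning the top entry when no random source is supplied is a simplification of the "in order from the top" condition.

```python
import random
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, Optional[str]]  # (phrase, accented sound); None means flat

# Hypothetical stand-ins for database 1 (standard) and database 2 (fine).
ANSWER_DATABASES: Dict[str, Dict[Key, str]] = {
    "standard": {("おはよう", None): "おはよう"},
    "fine": {("おはよう", None): "今日も気合入れていこう！"},
}

# Hypothetical stand-ins for ambiguous database 4 (standard) and 5 (fine).
AMBIGUOUS_DATABASES: Dict[str, List[str]] = {
    "standard": ["うんうん", "もう１回言って"],
    "fine": ["いいことありそう！", "声が小さい！"],
}


def determine_with_fallback(
    mode: str,
    phrase: str,
    accent_on: Optional[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Answer from the mode's database, or fall back to an ambiguous phrase."""
    answer = ANSWER_DATABASES[mode].get((phrase, accent_on))
    if answer is not None:
        return answer
    candidates = AMBIGUOUS_DATABASES[mode]
    # Simplification: without a random source, take the top entry.
    return candidates[0] if rng is None else rng.choice(candidates)
```

Passing a `random.Random` instance selects the ambiguous phrase at random, which corresponds to the variant the text describes as giving a more natural impression.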

〔実施形態６〕 Sixth Embodiment
上述した実施形態では、サーバ２０は、入力音声情報に対応した回答フレーズを発話装置１０から発話させたが、入力音声情報に基づいて外部情報を取得し、外部情報に対応した回答フレーズを発話装置１０に発話させる構成としてもよい。外部情報を取得する構成である実施形態６について、図６を用いて説明する。 In the embodiments described above, the server 20 causes the speech device 10 to utter an answer phrase corresponding to the input voice information. Alternatively, the server 20 may acquire external information based on the input voice information and cause the speech device 10 to utter an answer phrase corresponding to the external information. A sixth embodiment, which is configured to acquire external information, is described with reference to FIG. 6.

図６は、本発明の実施形態６に係る発話システム１において、発話内容決定部２６が発話内容を決定するために参照するデータベースの例である。図６のデータベース６は、認識フレーズに対応する指示が含まれている。指示とは、発話内容決定部２６が実行する処理のことである。例えば、図６のデータベース６には、「ネットワークから天気情報を取得する」という指示が含まれている。続いて、データベース７及びデータベース８は、外部情報に対応した回答フレーズが含まれている。外部情報とは、発話内容決定部２６が外部情報取得部２４を介して取得した情報である。例えば、図６のデータベース７及び８には、「晴れ」「雨」という天気に関する天気情報が外部情報として含まれている。本実施形態では、データベース７は標準モード、データベース８は元気モードにそれぞれ対応しており、入力音声情報が「今日の天気は？」というフレーズであった場合を例に挙げ、説明する。 FIG. 6 shows an example of the databases that the utterance content determination unit 26 refers to in order to determine the utterance content in the speech system 1 according to the sixth embodiment of the present invention. Database 6 in FIG. 6 contains instructions corresponding to recognition phrases. An instruction is a process executed by the utterance content determination unit 26. For example, database 6 in FIG. 6 contains the instruction "acquire weather information from the network". Database 7 and database 8 contain answer phrases corresponding to external information. External information is information that the utterance content determination unit 26 acquires via the external information acquisition unit 24. For example, databases 7 and 8 in FIG. 6 contain weather information such as "sunny" and "rain" as external information. In the present embodiment, database 7 corresponds to the standard mode and database 8 corresponds to the fine mode, and the description takes as an example the case where the input voice information is the phrase "What is the weather today?".

まず、音声情報認識部２５は、入力音声情報を認識し、認識フレーズが「今日の天気は？」であることを示す認識情報を、発話内容決定部２６に出力する。発話内容決定部２６は、参照するデータベースを切り替え、取得した認識情報に対応する「ネットワークから天気情報を取得する」という指示を実行する。そして、発話内容決定部２６は、外部情報取得部２４を介して、天気情報を取得する。 First, the voice information recognition unit 25 recognizes the input voice information, and outputs recognition information indicating that the recognition phrase is "What is the weather today?" To the utterance content determination unit 26. The utterance content determination unit 26 switches the database to be referred to, and executes an instruction “acquire weather information from the network” corresponding to the acquired recognition information. Then, the utterance content determination unit 26 acquires weather information via the external information acquisition unit 24.

続いて、発話内容決定部２６は、取得した天気情報に対応する回答フレーズを選択する。 Subsequently, the utterance content determination unit 26 selects an answer phrase corresponding to the acquired weather information.

例えば、取得した天気情報が「晴れ」であり、サーバ２０に標準モードが設定されている場合、発話内容決定部２６は、標準モードに対応するデータベース７を参照し、「晴れだよ」というフレーズを発話内容として決定する。 For example, when the acquired weather information is "sunny" and the standard mode is set in the server 20, the utterance content determination unit 26 refers to the database 7 corresponding to the standard mode and determines the phrase "It's sunny" as the utterance content.

また、例えば、取得した天気情報が「雨」であり、サーバ２０に元気モードが設定されている場合、発話内容決定部２６は、元気モードに対応するデータベース８を参照し、「雨だー！」というフレーズを発話内容として決定する。 Similarly, when the acquired weather information is "rain" and the fine mode is set in the server 20, the utterance content determination unit 26 refers to the database 8 corresponding to the fine mode and determines the phrase "It's raining!" as the utterance content.

このように、実施形態６に係る発話システム１では、入力音声情報に対応した指示、及び外部情報に対応した回答フレーズを含むデータベースを備えることにより、外部情報及びサーバ２０に設定されているモードに応じた発話内容を、発話装置１０に発話させることができる。したがって、発話システム１は、リアルタイムに取得した外部情報に応じた発話内容を、発話装置１０から発話させることができる。 As described above, the speech system 1 according to the sixth embodiment includes databases containing instructions corresponding to input voice information and answer phrases corresponding to external information, and can thereby cause the speech device 10 to utter content that depends on both the external information and the mode set in the server 20. The speech system 1 can therefore cause the speech device 10 to utter content that reflects external information acquired in real time.
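The two-stage lookup of the sixth embodiment (recognition phrase to instruction, then external information to answer phrase) can be sketched as follows. All names are hypothetical, the weather source is passed in as a plain callable standing in for the external information acquisition unit 24, and the tables hold only the entries quoted in the text.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for database 6: recognition phrase -> instruction.
INSTRUCTION_DATABASE: Dict[str, str] = {"今日の天気は？": "get_weather"}

# Hypothetical stand-ins for database 7 (standard) and database 8 (fine):
# external information -> answer phrase. Only entries quoted in the text.
WEATHER_ANSWER_DATABASES: Dict[str, Dict[str, str]] = {
    "standard": {"晴れ": "晴れだよ"},
    "fine": {"雨": "雨だー！"},
}


def answer_with_external_info(
    mode: str, recognized_phrase: str, fetch_weather: Callable[[], str]
) -> Optional[str]:
    """Run the instruction tied to the phrase, then answer based on its result."""
    if INSTRUCTION_DATABASE.get(recognized_phrase) != "get_weather":
        return None
    weather = fetch_weather()  # e.g. "晴れ" or "雨", obtained from the network
    return WEATHER_ANSWER_DATABASES[mode].get(weather)
```

Injecting `fetch_weather` keeps the sketch free of any particular weather service; for instance, `answer_with_external_info("standard", "今日の天気は？", lambda: "晴れ")` yields 「晴れだよ」.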

〔実施形態７〕 Seventh Embodiment
本実施形態では、サーバ２０に設定されている発話モードを、ユーザが発する音声によってサーバ２０が変更する処理について、図７を用いて説明する。なお、本実施形態において、サーバ２０から送信された発話内容を、発話装置１０の通信部（確認案内フレーズ受信手段）１１を介して発話部（確認案内フレーズ発話手段）１３が取得し、該発話内容を発話部１３が発話する処理、及び、音声情報取得部（回答受付手段）１２が音声情報を取得し、通信部（回答送信手段）１１を介してサーバ２０に送信する処理については、上述の実施形態において既に説明したためここでは説明を省略する。 The present embodiment describes, with reference to FIG. 7, a process in which the server 20 changes the speech mode set in the server 20 in response to a voice uttered by the user. In the present embodiment, the process in which the utterance unit (confirmation guidance phrase utterance means) 13 acquires the utterance content transmitted from the server 20 via the communication unit (confirmation guidance phrase reception means) 11 of the speech device 10 and utters that content, and the process in which the voice information acquisition unit (answer reception means) 12 acquires voice information and transmits it to the server 20 via the communication unit (answer transmission means) 11, have already been described in the embodiments above, so their description is omitted here.

図７は、本発明の実施形態７に係る発話システム１において、サーバ２０に設定されているモードを変更する処理の流れを示すフローチャートである。サーバ２０は、発話装置１０から、モードを変更することを示す操作情報を取得すると、発話内容決定部２６は、モードを変更するか否かをユーザに確認する確認案内フレーズ（例えば、「元気モードに変更します。よろしいですか？」など）を選択する。そして、発話内容決定部２６は、選択した確認案内フレーズを含む発話内容を、サーバ通信部２１（確認案内フレーズ送信手段）を介して発話装置１０に出力する（ステップＳ１）。 FIG. 7 is a flowchart showing the flow of a process for changing the mode set in the server 20 in the speech system 1 according to the seventh embodiment of the present invention. When the server 20 acquires, from the speech device 10, operation information indicating that the mode is to be changed, the utterance content determination unit 26 selects a confirmation guidance phrase that asks the user whether to change the mode (for example, "Switching to fine mode. Is that OK?"). The utterance content determination unit 26 then outputs utterance content including the selected confirmation guidance phrase to the speech device 10 via the server communication unit 21 (confirmation guidance phrase transmission means) (step S1).

When a plurality of speech devices 10 are connected to the server 20, information identifying the speech device that is the target of the mode change may be included in the operation information indicating that the mode is to be changed. The server 20 may also acquire that operation information via an input device connected to the server 20. Alternatively, the speech device 10 may transmit to the server 20 the operation information received by the operation reception unit 14 of the speech device 10, so that the server 20 acquires the operation information indicating that the mode is to be changed.

The mode may also be changed by a voice that the user utters to the speech device 10. More specifically, the voice information acquisition unit 12 of the speech device 10 acquires voice information containing either a characteristic part of the character name corresponding to each mode (for example, "Ai-chan" or "Nao-chan") or the whole character name (for example, "Energetic Ai-chan" or "Laid-back Nao-chan"), and the server 20 may change the set mode upon receiving that voice information. In this case, the speech device 10 may transmit the voice information acquired by the voice information acquisition unit 12 to the server 20 as it is, and the server 20 may identify the corresponding mode. Alternatively, the speech device 10 may hold the correspondence between modes and character names, identify the mode corresponding to the acquired voice information itself, and transmit mode information indicating the identified mode to the server 20.
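The name-based lookup described above can be sketched as follows. This is only a minimal sketch under stated assumptions: the table `CHARACTER_NAMES`, the mode keys, and the function `resolve_mode` are hypothetical names, and the input is assumed to be text already produced by speech recognition.

```python
# Hypothetical table: mode -> (full character name, characteristic part).
# The source names the characters but does not specify any data layout.
CHARACTER_NAMES = {
    "energetic": ("Energetic Ai-chan", "Ai-chan"),
    "laid_back": ("Laid-back Nao-chan", "Nao-chan"),
}


def resolve_mode(recognized_text):
    """Return the mode whose full or partial character name appears in the text."""
    for mode, (full_name, short_name) in CHARACTER_NAMES.items():
        if full_name in recognized_text or short_name in recognized_text:
            return mode
    return None  # no character name found; the set mode stays unchanged
```

The same function could run on either side: on the server 20 when raw voice information is forwarded, or on the speech device 10 when only mode information is sent.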

In step S1, the utterance content determination unit 26 may select, as the confirmation guidance phrase, a phrase that uses the character name corresponding to the mode. One example is "I'm going to become Energetic Ai-chan. Is that OK?" Furthermore, when uttering a confirmation guidance phrase that uses a character name, the speech device 10 may display the character image of "Energetic Ai-chan" on the display panel of the display unit 15. The utterance content determination unit 26 then starts a timer at the same time as it outputs the utterance content (step S2).

Next, the utterance content determination unit 26 determines whether the started timer has reached a predetermined time, that is, whether a timeout has occurred (step S3). Here, the predetermined time is the period during which the user's answer to the output confirmation guidance phrase is accepted.

If it is determined in step S3 that a timeout has occurred (step S3: YES), the utterance content determination unit 26 selects a timeout guidance phrase indicating that the mode change process is being canceled because of the timeout (for example, "Time ran out, so the mode change has been canceled"). The utterance content determination unit 26 then outputs utterance content including the selected timeout guidance phrase to the speech device 10 via the server communication unit 21 (step S4).

On the other hand, if it is determined in step S3 that no timeout has occurred (step S3: NO), the voice information recognition unit (acquisition means) 25 determines whether voice information that is an answer to the confirmation guidance phrase has been acquired via the server communication unit (answer reception means) 21 (step S5).

If it is determined in step S5 that no input voice information has been acquired (step S5: NO), the process of the utterance content determination unit 26 returns to step S3, in which it is determined whether a timeout has occurred.

On the other hand, if it is determined in step S5 that input voice information has been acquired (step S5: YES), the voice information recognition unit 25 recognizes the input voice information, which is the answer from the user (step S6). The utterance content determination unit (mode change determination means) 26 then determines whether to change the mode according to the phrase recognized by the voice information recognition unit 25.

If it is determined in step S6 that the acquired voice information is a confirmation phrase indicating that the user approves the change (for example, "Sure") (step S6: confirmation phrase), the utterance content determination unit 26 decides to change the mode. The utterance content determination unit 26 then changes the set mode and selects a confirmation completion phrase indicating that the mode change is complete (for example, "The mode has been changed"). Finally, it outputs utterance content including the selected confirmation completion phrase to the speech device 10 via the server communication unit 21 (step S7).

If it is determined in step S6 that the acquired voice information is an erroneous phrase, that is, a phrase other than the confirmation phrase (for example, "Good morning") (step S6: erroneous phrase), the utterance content determination unit 26 decides not to change the mode and determines whether this is the third time an erroneous phrase has been acquired (step S8).

If it is determined in step S8 that this is not the third erroneous phrase (step S8: NO), the utterance content determination unit 26 selects a reconfirmation guidance phrase that prompts the user to say the confirmation phrase again (for example, "Please say that once more"). The utterance content determination unit 26 outputs utterance content including the selected reconfirmation guidance phrase to the speech device 10 via the server communication unit 21 (step S9), and then returns to step S2 to restart the timer from the beginning.

On the other hand, if it is determined in step S8 that this is the third erroneous phrase (step S8: YES), the process of the utterance content determination unit 26 proceeds to step S10, which ends the mode change.

If it is determined in step S6 that the acquired voice information is an end phrase indicating that the user is canceling the mode change (for example, "Stop") (step S6: end phrase), the utterance content determination unit 26 decides not to change the mode and selects an end guidance phrase indicating that the mode change process is ending (for example, "The mode change has been canceled"). The utterance content determination unit 26 then outputs utterance content including the selected end guidance phrase to the speech device 10 via the server communication unit 21 (step S10).

If it is determined in step S6 that the acquired voice information is noise (step S6: noise), the utterance content determination unit 26 decides not to change the mode, and the process returns to step S3, in which it is determined whether a timeout has occurred. One way to determine whether voice information is noise is to check whether the loudness of the sound contained in the input voice information falls within a predetermined range. For example, if the predetermined range is the range of loudness that the user does not produce in conversation, the utterance content determination unit 26 determines that the input voice information is noise when its loudness is either lower or higher than the loudness of the sounds the user utters in conversation.
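The loudness-based noise check can be illustrated with a short sketch. The function name `is_noise`, the decibel unit, and both threshold values are assumptions made for illustration; the source only states that loudness outside the conversational range is treated as noise.

```python
def is_noise(level_db, speech_min_db=40.0, speech_max_db=85.0):
    """Treat input as noise when its loudness lies outside the conversational range.

    The two thresholds are illustrative assumptions, not values from the source.
    """
    return level_db < speech_min_db or level_db > speech_max_db
```

With these assumed thresholds, a very quiet input such as 20 dB and a very loud input such as 100 dB are both classified as noise, while 60 dB is passed on to phrase recognition.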

As described above, in the speech system 1 according to Embodiment 7, when the user changes the mode set in the server 20, the system can, after accepting the mode change operation from the user, additionally confirm whether the mode should really be changed. This prevents the mode set in the server 20 from being changed by mistake. In addition, in the speech system 1, the setting of the server 20 can be changed by having the speech device 10 utter the confirmation guidance phrase and the user reply to it by voice. In other words, because the setting of the server 20 can be changed through a conversation between the user and the speech device 10, the speech system 1 can communicate smoothly with the user.
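The branching of steps S1 to S10 can be summarized in a small simulation. This is a sketch under stated assumptions, not the patented implementation: the names `Reply`, `TIMEOUT`, and `confirm_mode_change` are hypothetical, the timer is abstracted into a `TIMEOUT` event, and each reply is assumed to have been classified already.

```python
from enum import Enum


class Reply(Enum):
    CONFIRM = "confirmation phrase"
    WRONG = "erroneous phrase"
    END = "end phrase"
    NOISE = "noise"


TIMEOUT = object()   # hypothetical marker: the timer expired before any reply
MAX_WRONG = 3        # the third erroneous phrase ends the mode change (S8)


def confirm_mode_change(events, new_mode="energetic"):
    """Walk through S1 to S10 for a sequence of Reply values or TIMEOUT.

    Returns (mode_changed, list_of_phrases_uttered).
    """
    spoken = [f"Changing to {new_mode} mode. Is that all right?"]  # S1
    wrong_count = 0
    for event in events:              # each item is the outcome of one wait (S2 to S6)
        if event is TIMEOUT:          # S3: YES
            spoken.append("Time ran out, so the mode change has been canceled.")  # S4
            return False, spoken
        if event is Reply.NOISE:      # S6: noise, keep waiting (back to S3)
            continue
        if event is Reply.CONFIRM:    # S6: confirmation phrase
            spoken.append("The mode has been changed.")  # S7
            return True, spoken
        if event is Reply.WRONG:      # S6: erroneous phrase
            wrong_count += 1
            if wrong_count < MAX_WRONG:                      # S8: NO
                spoken.append("Please say that once more.")  # S9, timer restarts
                continue
        # S6: end phrase, or S8: YES (third erroneous phrase)
        spoken.append("The mode change has been canceled.")  # S10
        return False, spoken
    return False, spoken  # no more events: nothing was decided
```

For example, `confirm_mode_change([Reply.NOISE, Reply.CONFIRM])` changes the mode, whereas three `Reply.WRONG` events in a row produce two reconfirmation prompts and then cancel the change.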

In step S5 of FIG. 7, the utterance content determination unit 26 executes the next process based on the phrase recognized by the voice information recognition unit 25; however, it may instead execute the next process based on both the recognized phrase and the recognized intonation.

For example, in step S1, upon acquiring operation information indicating that the mode is to be changed, the utterance content determination unit 26 may switch the database it refers to when determining utterance content to the database corresponding to the mode after the change. More specifically, upon acquiring operation information indicating a change to the energetic mode, the utterance content determination unit 26 switches the referenced database to the database for the energetic mode, which contains "OK!" and "Yoroshiku!" as confirmation phrases (step S1). If the operation information includes information identifying the speech device that is the target of the mode change, the database is switched for the utterances of that speech device.

Steps S2 to S4 executed by the utterance content determination unit 26 and step S5 executed by the voice information recognition unit 25 have already been described, so their description is omitted.

Subsequently, in step S6, if the voice information recognition unit 25 acquires from the user voice information containing the phrase "Sure", that phrase is not a confirmation phrase for the energetic mode, so it is treated as an erroneous phrase and the utterance content determination unit 26 proceeds to step S8.

In this way, the speech system 1 according to Embodiment 7 can check, before the mode is changed, whether the user is able to utter as input voice information the recognition phrases that correspond to the mode after the change.
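The mode-specific confirmation check can be sketched as a lookup against the database of the target mode. The dictionary `CONFIRMATION_PHRASES` and the function `is_confirmation` are hypothetical; only the energetic-mode phrases and the phrase "Sure" come from the text, and assigning "Sure" to a standard mode is an assumption.

```python
# Hypothetical per-mode confirmation phrases. Only the energetic entries and
# "Sure" appear in the source; assigning "Sure" to a standard mode is an assumption.
CONFIRMATION_PHRASES = {
    "standard": {"Sure"},
    "energetic": {"OK!", "Yoroshiku!"},
}


def is_confirmation(phrase, target_mode):
    """Check the reply against the database of the mode being switched to."""
    return phrase in CONFIRMATION_PHRASES[target_mode]
```

Under these assumptions, `is_confirmation("Sure", "energetic")` is false, so the reply would be handled as an erroneous phrase in step S8.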

The speech system 1 according to Embodiment 7 may also be configured so that the mode set in the server 20 is changed when the server 20 acquires input voice information that satisfies a predetermined condition. For example, the server 20 may change to the energetic mode when it acquires input voice information whose volume exceeds a predetermined value several times in a row (for example, three times). When the user is uttering input voice information louder than the predetermined volume, the user can be judged to be in high spirits, so switching the server 20 to the energetic mode lets the user enjoy the conversation with the speech device 10 even more.
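The consecutive-loud-input condition can be sketched as a simple streak counter. The class name `LoudnessStreak`, the decibel unit, and the default threshold are assumptions for illustration; the source gives only "three times" as an example count.

```python
class LoudnessStreak:
    """Signals when enough consecutive inputs exceed a volume threshold."""

    def __init__(self, threshold_db=70.0, required=3):
        # Both defaults are illustrative; the source only gives "three times" as an example.
        self.threshold_db = threshold_db
        self.required = required
        self.streak = 0

    def observe(self, level_db):
        """Record one input and return True when the mode should switch."""
        self.streak = self.streak + 1 if level_db > self.threshold_db else 0
        return self.streak >= self.required
```

A quieter input resets the streak, so only uninterrupted runs of loud inputs trigger the change to the energetic mode.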

In the speech system 1 according to Embodiment 7, the speech device 10 may itself execute the process of FIG. 7 by including the control unit 22 and the storage unit 23. In this case, the speech unit (confirmation guidance phrase speech means) 13 utters the confirmation guidance phrase determined by the utterance content determination unit (determination means) 26 of the speech device. The voice information acquisition unit (answer reception means) 12 then acquires input voice information that is the answer to the confirmation guidance phrase and outputs it to the voice information recognition unit 25. The utterance content determination unit 26 determines whether to change the mode according to the phrase recognized by the voice information recognition unit 25. The speech device 10 can therefore communicate smoothly with the user.

[Embodiment 8]
The utterance content determination unit 26 may include a change unit (change means) that can change, in addition to the phrase and intonation, the voice timbre, volume, speech rate, and pitch of the determined utterance content according to the mode set in the server 20.

For example, when the energetic mode is set in the server 20, the utterance content determination unit 26 may use the change unit to change the voice timbre to an energetic one, the volume to 1.2 times that of the standard mode, and the speech rate to 1.3 times that of the standard mode.

Similarly, when the polite mode is set in the server 20, the utterance content determination unit 26 may use the change unit to change the voice timbre to a polite one, the volume to 0.9 times, the speech rate to 0.8 times, and the pitch to 1.2 times those of the standard mode.

As described above, in the speech system 1 according to Embodiment 8, the voice timbre, volume, speech rate, and pitch of the determined utterance content can be changed according to the mode set in the server 20. This makes it possible to realize a speech device 10 that feels more human.
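The per-mode multipliers given in this embodiment can be captured in a small table. The names `VoiceScale`, `MODE_VOICE`, and `synthesis_settings` are hypothetical; the multipliers are the example values from the text, and any parameter the text does not mention for a mode is assumed to stay at 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceScale:
    """Multipliers relative to the standard mode."""
    timbre: str = "standard"
    volume: float = 1.0
    rate: float = 1.0
    pitch: float = 1.0


MODE_VOICE = {
    "standard": VoiceScale(),
    # Pitch is not mentioned for the energetic mode, so it is assumed unchanged.
    "energetic": VoiceScale(timbre="energetic", volume=1.2, rate=1.3),
    "polite": VoiceScale(timbre="polite", volume=0.9, rate=0.8, pitch=1.2),
}


def synthesis_settings(mode, base_volume=1.0, base_rate=1.0, base_pitch=1.0):
    """Scale the standard-mode settings by the multipliers of the given mode."""
    scale = MODE_VOICE[mode]
    return {
        "timbre": scale.timbre,
        "volume": base_volume * scale.volume,
        "rate": base_rate * scale.rate,
        "pitch": base_pitch * scale.pitch,
    }
```

Keeping the multipliers in one table means a new mode can be added without touching the code that drives speech synthesis.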

[Embodiment 9]
The control unit 22 of the server 20 may be realized by a logic circuit (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).

In the latter case, the server 20 includes a CPU that executes the instructions of a program, which is software that implements each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or the CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" can be used, for example a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. The program may also be supplied to the computer via any transmission medium capable of transmitting it (a communication network, a broadcast wave, or the like). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Summary]
The speech control device (20) according to aspect 1 of the present invention is a speech control device that determines the utterance content to be uttered by a speech device (10), and includes utterance content determination means (utterance content determination unit 26) that determines the utterance content by performing at least one of a selection process of selecting the phrase to be uttered from a plurality of phrases included in the same category and a setting process of setting the intonation of the phrase to be uttered.

According to the above configuration, the speech control device determines the utterance content to be uttered by the speech device by performing at least one of selecting the phrase to be uttered from a plurality of phrases in the same category and setting the intonation of the phrase to be uttered. The speech control device can therefore have the speech device utter content suited to the user, which enables smoother communication with the user than before.
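The two processes of aspect 1 can be illustrated together in a minimal sketch. The names `CATEGORY_PHRASES` and `decide_utterance`, the sample phrases, and the use of `random.choice` as the default chooser are assumptions; aspect 1 itself does not prescribe how the phrase is picked, and this sketch always performs the selection process and performs the setting process only when an intonation is given.

```python
import random

# Hypothetical category table; the phrases are placeholders for illustration.
CATEGORY_PHRASES = {
    "morning_greeting": ["Good morning", "Morning", "Hello there"],
}


def decide_utterance(category, intonation=None, chooser=random.choice):
    """Build utterance content from a phrase and, optionally, an intonation."""
    phrase = chooser(CATEGORY_PHRASES[category])   # selection process
    content = {"phrase": phrase}
    if intonation is not None:                     # setting process
        content["intonation"] = intonation
    return content
```

Passing a deterministic `chooser`, such as one that always returns the first element, makes the selection reproducible for testing.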

The speech control device according to aspect 2 of the present invention may, in aspect 1, be configured to refer to a plurality of phrase databases each containing one or more phrases, and the utterance content determination means may determine the utterance content by switching the phrase database it refers to.

According to the above configuration, the speech control device refers to a plurality of phrase databases and determines the utterance content by switching between them, so a speech control device capable of smoother communication with the user than before can be realized easily.

The speech control device according to aspect 3 of the present invention may, in aspect 1 or 2, be configured to refer to a plurality of phrase databases in which different intonations are set for one or more phrases, and the utterance content determination means may determine the utterance content by switching the phrase database it refers to.

According to the above configuration, the speech control device refers to a plurality of phrase databases in which different intonations are set for one or more phrases, and determines a phrase with its intonation set as the utterance content. A speech control device capable of smooth communication with the user can therefore be realized easily.
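Switching between phrase databases, as in aspects 2 and 3, can be sketched as follows. The structure of `PHRASE_DATABASES`, the class `PhraseSelector`, and the sample phrases and intonation labels are all hypothetical.

```python
# Hypothetical phrase databases: mode -> category -> list of (phrase, intonation).
PHRASE_DATABASES = {
    "standard": {"morning_greeting": [("Good morning", "flat")]},
    "energetic": {
        "morning_greeting": [("Good morning!", "rising"), ("Morning!", "rising")],
    },
}


class PhraseSelector:
    """Holds the currently referenced database and exposes its phrases."""

    def __init__(self, mode="standard"):
        self.mode = mode

    def switch_database(self, mode):
        """Point the selector at the database of another mode."""
        if mode not in PHRASE_DATABASES:
            raise KeyError(f"unknown mode: {mode}")
        self.mode = mode

    def candidates(self, category):
        """Return the (phrase, intonation) pairs for a category in the current database."""
        return PHRASE_DATABASES[self.mode][category]
```

Because each entry carries its own intonation, switching the database changes both the wording and the intonation of subsequent utterances.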

The speech control device according to aspect 4 of the present invention may, in aspect 2 or 3, refer to a recognition database that corresponds to the phrase database and is used for recognizing input voice information.

According to the above configuration, the speech control device can recognize input voice information by referring to the recognition database corresponding to the phrase database.

The speech control device according to aspect 5 of the present invention may, in aspect 4, further include recognition means (voice information recognition unit 25) that switches the recognition database according to at least one of the phrase and the intonation contained in the input voice information, and the utterance content determination means may determine the utterance content according to at least one of the phrase and the intonation recognized by the recognition means.

According to the above configuration, the speech control device causes the speech device to utter content determined according to at least one of the phrase and the intonation contained in the input voice information recognized by the recognition means. The user can therefore converse with the speech device as if talking with a person, so the speech control device enables smoother communication with the user than before.

The speech control device according to aspect 6 of the present invention may, in any of aspects 2 to 5, further include determination means (utterance content determination unit 26) that determines whether the phrase contained in the input voice information matches a predetermined phrase. When the determination means determines that the phrase contained in the input voice information does not match the predetermined phrase, the utterance content determination means may select, as the phrase to be uttered, a phrase in a category different from the category of the phrase that would be selected if the phrase contained in the input voice information matched the predetermined phrase.

According to the above configuration, when it is determined that the phrase contained in the input voice information does not match the predetermined phrase, the speech control device can have the speech device utter a phrase in a category different from the category of the phrase that would be selected on a match. The user can therefore converse with the speech device as if talking with a person, so the speech control device enables smoother communication with the user than before.

The speech control device according to aspect 7 of the present invention may, in any of aspects 3 to 5, further include determination means (utterance content determination unit 26) that determines whether the phrase and intonation contained in the input voice information match a predetermined phrase and intonation. When the determination means determines that the intonation contained in the input voice information does not match the predetermined intonation, even if the phrase contained in the input voice information matches the predetermined phrase, the utterance content determination means may select, as the phrase to be uttered, a phrase in a category different from the category of the phrase that would be selected if the intonation matched the predetermined intonation.

According to the above configuration, even when the phrase contained in the input voice information matches the predetermined phrase, if the intonation contained in the input voice information does not match the predetermined intonation, the speech control device can have the speech device utter a phrase in a category different from the category of the phrase that would be selected when both the phrase and the intonation match. The user can therefore converse with the speech device as if talking with a person, so the speech control device enables smoother communication with the user than before.

In the speech control device according to aspect 8 of the present invention, in aspect 6 or 7, the utterance content determination means may, in the selection process, randomly select the phrase to be uttered from a database containing a plurality of phrases included in the different category.

According to the above configuration, the speech control device can randomly select a phrase from the plurality of phrases included in the different category and have the speech device utter it. The speech device therefore does not keep answering with the same phrase, and the user can converse with the speech device as if talking with a person, so the speech control device enables smoother communication with the user than before.
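Aspects 6 to 8 can be combined in one short sketch: a mismatch in either the phrase or the intonation leads to a different category, from which a reply is picked at random. The names `EXPECTED`, `REPLY_CATEGORIES`, and `choose_reply`, the category labels, and every sample phrase are hypothetical.

```python
import random

EXPECTED = ("Good morning", "rising")  # hypothetical predetermined phrase and intonation

# Hypothetical reply categories; the phrases are placeholders for illustration.
REPLY_CATEGORIES = {
    "greeting": ["Good morning!", "Morning! Did you sleep well?"],
    "concern": ["Are you feeling all right?", "You sound a little tired."],
}


def choose_reply(phrase, intonation, rng=random):
    """Pick a reply category by match result, then a random phrase within it."""
    matched = (phrase, intonation) == EXPECTED
    # A mismatch in the phrase (aspect 6) or only in the intonation (aspect 7)
    # both lead to a category other than the one used on a full match.
    category = "greeting" if matched else "concern"
    return category, rng.choice(REPLY_CATEGORIES[category])  # aspect 8: random pick
```

Passing a seeded `random.Random` instance as `rng` keeps the random pick reproducible when the behavior needs to be checked.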

In the speech control device according to aspect 9 of the present invention, in any of aspects 1 to 8, the utterance content determination means may further include change means that changes at least one of the voice timbre, volume, speech rate, and pitch of the utterance content to be uttered by the speech device.

According to the above configuration, the speech control device can change at least one of the voice timbre, volume, speech rate, and pitch of the utterance content and have the speech device utter it. The speech device can therefore utter the content with a voice timbre, volume, speech rate, and pitch suited to that content.

The method according to aspect 10 of the present invention is a method of determining the utterance content to be uttered by a speech device, and includes an utterance content determination step of determining the utterance content by performing at least one of a selection process of selecting the phrase to be uttered from a plurality of phrases included in the same category and a setting process of setting the intonation of the phrase to be uttered. In the utterance content determination step, the utterance content is determined by switching between a plurality of phrase databases each containing a plurality of phrases.

According to the above configuration, the method makes it possible to realize a speech device that communicates smoothly with the user.
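The utterance content determination step of aspect 10 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the method as implemented by the applicant: the database names, the categories, the phrases, the `"default"` intonation label, and the function `determine_utterance` are all hypothetical. The sketch shows switching between phrase databases, selecting one phrase from a single category, and attaching an intonation.

```python
import random

# Hypothetical phrase databases; switching the database changes which
# phrases are available for the same category.
PHRASE_DATABASES = {
    "cheerful": {"greeting": ["Good morning!", "Morning! Did you sleep well?"]},
    "calm": {"greeting": ["Good morning.", "Hello."]},
}


def determine_utterance(database_name, category, intonation=None, rng=random):
    """Pick one phrase from a category of the chosen database.

    `rng` may be the `random` module or a `random.Random` instance, so that
    callers can make the selection reproducible.
    """
    phrases = PHRASE_DATABASES[database_name][category]
    phrase = rng.choice(phrases)                 # selection process
    return {
        "phrase": phrase,
        "intonation": intonation or "default",   # setting process
    }


result = determine_utterance(
    "calm", "greeting", intonation="rising", rng=random.Random(0)
)
```

Passing the random source as a parameter is an assumption of this sketch that makes the random selection testable; the publication itself does not say how randomness is obtained.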

A speech system according to aspect 11 of the present invention includes the speech control device of aspect 5 above and a speech device. The speech device transmits input voice information to the speech control device and utters the utterance content received from the speech control device.

According to the above configuration, it is possible to realize a speech system that achieves the same effects as the speech control device of aspect 5.

The speech control device according to each aspect of the present invention may be realized by a computer. In that case, a control program that realizes the speech control device on the computer by causing the computer to operate as each of the means of the speech control device, and a computer-readable recording medium on which that program is recorded, also fall within the scope of the present invention.

A speech device according to aspect 12 of the present invention includes: utterance content determination means (utterance content determination unit 26) for determining utterance content by performing at least one of a selection process of selecting a phrase to be uttered from a plurality of phrases included in the same category and a setting process of setting the intonation of the phrase to be uttered; and utterance means (utterance unit 13) for uttering the utterance content determined by the utterance content determination means.

According to the above configuration, a speech device that achieves the same effects as the speech control device according to aspect 1 can be realized.

A speech control device according to aspect 13 of the present invention includes: confirmation guidance phrase transmission means (server communication unit 21) for transmitting, to a speech device, a confirmation guidance phrase for asking the user whether to change a speech mode associated with a phrase, or with a phrase and an intonation; answer reception means (server communication unit 21) for receiving the user's answer to the confirmation guidance phrase; and mode change determination means (utterance content determination unit 26) for determining whether to change the mode according to the answer received by the answer reception means.

According to the above configuration, the speech control device can cause the speech device to utter a confirmation guidance phrase asking the user whether to change the mode. Because the mode can thus be changed through conversation with the user, the speech control device can communicate smoothly with the user.
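The confirmation flow of aspect 13 can be sketched as a small state holder. This is a hedged illustration only: the `ModeController` class, the wording of the confirmation phrase, and the set of answers treated as affirmative are assumptions made for this example and are not specified in the publication.

```python
# Hypothetical set of recognized answers that count as agreement.
AFFIRMATIVE_ANSWERS = {"yes", "ok", "sure"}


class ModeController:
    """Holds the current speech mode and a pending mode-change proposal."""

    def __init__(self, mode):
        self.mode = mode
        self.pending_mode = None

    def confirmation_phrase(self, new_mode):
        """Build the confirmation guidance phrase to send to the speech device."""
        self.pending_mode = new_mode
        return f"Shall I switch to {new_mode} mode?"

    def receive_answer(self, answer):
        """Decide whether to change the mode based on the user's answer."""
        if self.pending_mode is None:
            raise RuntimeError("no mode change is awaiting confirmation")
        new_mode, self.pending_mode = self.pending_mode, None
        if answer.strip().lower() in AFFIRMATIVE_ANSWERS:
            self.mode = new_mode
            return True
        return False


controller = ModeController("polite")
prompt = controller.confirmation_phrase("friendly")
changed = controller.receive_answer("Yes")
```

Any answer outside the affirmative set leaves the mode unchanged in this sketch, which is one conservative reading of "determining whether to change the mode according to the answer".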

A speech device according to aspect 14 of the present invention includes: confirmation guidance phrase utterance means (utterance unit 13) for uttering a confirmation guidance phrase for asking the user whether to change a speech mode associated with a phrase, or with a phrase and an intonation; answer reception means (voice information acquisition unit 12) for receiving the user's answer to the confirmation guidance phrase; and mode change determination means (utterance content determination unit 26) for determining whether to change the speech mode according to the answer received by the answer reception means.

According to the above configuration, it is possible to realize a speech device that achieves the same effects as the speech control device according to aspect 11.

A speech system according to aspect 15 of the present invention includes the speech control device described in aspect 14 above and a speech device. The speech device includes: confirmation guidance phrase reception means (communication unit 11) for receiving the confirmation guidance phrase from the speech control device; confirmation guidance phrase utterance means (utterance unit 13) for uttering the confirmation guidance phrase; answer reception means (voice information acquisition unit 12) for receiving the user's answer to the confirmation guidance phrase; and answer transmission means (communication unit 11) for transmitting the answer received by the answer reception means to the speech control device.

According to the above configuration, it is possible to realize a speech system that achieves the same effects as the speech control device according to aspect 11.

A method according to aspect 16 of the present invention is a method of controlling the speech of a speech device. The method includes: a confirmation guidance phrase transmission step of transmitting, to the speech device, a confirmation guidance phrase for asking the user whether to change a speech mode associated with a phrase, or with a phrase and an intonation; an answer reception step of receiving the user's answer to the confirmation guidance phrase; and a determination step of determining whether to change the speech mode according to the answer received in the answer reception step.

(Additional notes)
In order to solve the above problem, a speech control device according to one aspect of the present invention is a speech control device that determines utterance content to be uttered by a speech device, and includes utterance content determination means for determining the utterance content by performing at least one of a selection process of selecting a phrase to be uttered from a plurality of phrases included in the same category and a setting process of setting the intonation of the phrase to be uttered.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

The present invention can be used in a system including a speech device that provides information to a user.

1 speech system
2 network
10 speech device
11 communication unit (utterance content acquisition unit)
13 utterance unit
20 server (speech control device)
21 server communication unit (voice output unit)
22 control unit
23 storage unit
24 external information acquisition unit (voice information acquisition unit)
25 voice information recognition unit
26 utterance content determination unit

Claims

A speech control device for determining utterance content to be uttered by a speech device, comprising:
a voice information acquisition unit that acquires input voice information from the speech device;
a voice information recognition unit that recognizes the input voice information;
an utterance content determination unit that determines the utterance content by referring to one or more databases according to a mode set in the speech control device and recognition information recognized by the voice information recognition unit; and
a voice output unit that outputs the utterance content determined by the utterance content determination unit to the speech device,
wherein the utterance content determination unit determines whether the recognition information corresponds to the mode, and does not determine the utterance content when the recognition information does not correspond to the mode.

The speech control device according to claim 1, wherein
in the database, a recognition phrase indicated by the recognition information and an answer phrase corresponding to the recognition phrase are associated with each other for each mode, and
the utterance content determination unit selects the answer phrase by referring to the one or more databases according to the set mode.

The speech control device according to claim 2, wherein, in the database, the answer phrase associated with the recognition phrase indicated by the recognition information differs for each mode.

The speech control device according to claim 2 or 3, wherein the utterance content determination unit determines the utterance content according to the mode regardless of the settings of the speech device.

A method of determining utterance content to be uttered by a speech device, comprising:
a voice information acquisition step of acquiring input voice information from the speech device;
a voice information recognition step of recognizing the input voice information;
an utterance content determination step of determining the utterance content by referring to one or more databases according to a set mode and recognition information recognized in the voice information recognition step; and
a voice output step of outputting the utterance content determined in the utterance content determination step to the speech device,
wherein, in the utterance content determination step, it is determined whether the recognition information corresponds to the mode, and the utterance content is not determined when the recognition information does not correspond to the mode.

A speech system comprising a speech device and a speech control device, wherein
the speech control device comprises:
a voice information acquisition unit that acquires input voice information from the speech device;
a voice information recognition unit that recognizes the input voice information;
an utterance content determination unit that determines utterance content by referring to one or more databases according to a mode set in the speech control device and recognition information recognized by the voice information recognition unit; and
a voice output unit that outputs the utterance content determined by the utterance content determination unit to the speech device,
the speech device comprises:
an utterance content acquisition unit that acquires the utterance content output from the speech control device; and
an utterance unit that utters the acquired utterance content, and
the utterance content determination unit determines whether the recognition information corresponds to the mode, and does not determine the utterance content when the recognition information does not correspond to the mode.

A program for causing a computer to function as the speech control device according to any one of claims 1 to 4, the program causing the computer to function as each of the units.
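For readers who prefer code, the mode-dependent lookup recited in the claims above can be sketched as follows. This is an informal illustration and not part of the claims: the mode names, the phrases, the `ANSWER_DATABASES` table, and the function `determine_utterance_content` are hypothetical. The sketch assumes that "the recognition information does not correspond to the mode" means the recognized phrase has no entry in the database for the set mode, in which case no utterance content is determined.

```python
from typing import Optional

# Hypothetical per-mode databases associating a recognition phrase with an
# answer phrase; the answer for the same recognition phrase differs by mode.
ANSWER_DATABASES = {
    "friendly": {
        "good morning": "Morning! Great to see you!",
    },
    "polite": {
        "good morning": "Good morning. I hope you slept well.",
        "thank you": "You are most welcome.",
    },
}


def determine_utterance_content(mode: str, recognized: str) -> Optional[str]:
    """Return the answer phrase for the set mode, or None if none applies."""
    database = ANSWER_DATABASES.get(mode, {})
    # A recognition phrase without an entry for this mode yields None,
    # meaning that no utterance content is determined.
    return database.get(recognized.strip().lower())


friendly_answer = determine_utterance_content("friendly", "Good morning")
polite_answer = determine_utterance_content("polite", "Good morning")
no_answer = determine_utterance_content("friendly", "thank you")
```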