JP2019128384A

JP2019128384A - System, method, and program for processing information

Info

Publication number: JP2019128384A
Application number: JP2018008210A
Authority: JP
Inventors: Tatsuaki Suzuki; Ikuo Kitagishi; Kensuke Takada; Hiroyuki Anai
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2018-01-22
Filing date: 2018-01-22
Publication date: 2019-08-01
Anticipated expiration: 2038-01-22
Also published as: JP6788620B2

Abstract

PROBLEM TO BE SOLVED: To provide information in a way that does not feel unnatural to the user. SOLUTION: One aspect of the present invention is an information processing system including a response unit that causes an output unit to output response content to a voice uttered by a user and specific information different from the response content, and a control unit that changes the output mode of the specific information to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, causes the output unit to output the specific information, and then, when an instruction from the user is received, changes the output mode of the specific information to a second output mode, which is easier for the user to hear than the first output mode, and causes the output unit to output the specific information. SELECTED DRAWING: FIG. 1

Description

The present invention relates to an information processing system, an information processing method, and a program.

Conventionally, an information processing apparatus has been disclosed that includes output control means for causing voice output means to output a voice advertisement, or a questionnaire about a voice advertisement, based on dialogue with the user during navigation processing that searches for a route to a destination and provides guidance along the route found (see, for example, Patent Document 1).

Japanese Unexamined Patent Application Publication No. 2017-58315 (JP 2017-58315 A)

However, with the conventional technology, the voice output sometimes felt unnatural to the user.

The present invention has been made in view of these circumstances, and one of its objects is to provide an information processing system, an information processing method, and a program capable of providing information in a way that does not feel unnatural to the user.

One aspect of the present invention is an information processing system including: a response unit that causes an output unit to output response content to a voice uttered by a user and specific information different from the response content; and a control unit that changes the output mode of the specific information to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, causes the output unit to output the specific information, and then, when an instruction from the user is received, changes the output mode of the specific information to a second output mode, which is easier for the user to hear than the first output mode, and causes the output unit to output the specific information.

According to one aspect of the present invention, information can be provided in a way that does not feel unnatural to the user.

FIG. 1 is a diagram showing the configuration of the information processing system 1.
A flowchart showing an example of the flow of processing executed by the information processing system 1.
A diagram showing an example of the contents of the environment pattern information 64.
A diagram showing an example of the contents of the advertisement information 92.
A diagram showing an example of the contents of the output degree information 72.
A diagram showing an example of a conversation between a user and the automatic response device 40.
A flowchart showing an example of the flow of processing executed by the terminal device 10 and the automatic response device 40.
A diagram showing an example of the contents of the usage information 74.
A diagram showing an example of the functional configuration of the automatic response device 40A included in the information processing system 1A of the second embodiment.
A flowchart showing an example of the flow of processing executed by the terminal device 10 and the automatic response device 40A of the second embodiment.
A diagram showing an example of the contents of the instruction correspondence information 76.
A diagram showing an example of a conversation between a user and the automatic response device 40 in the second embodiment.
A diagram showing the change in volume when advertisement information is output.
A diagram showing an example of the functional configuration of the information processing system 1B of the third embodiment.
A flowchart showing an example of the flow of processing executed by the automatic response device 40B.
A diagram (part 1) showing an example of a conversation and of an image displayed on the display unit 15 in the third embodiment.
A diagram (part 2) showing an example of a conversation and of an image displayed on the display unit 15 in the third embodiment.
A diagram (part 3) showing an example of a conversation and of an image displayed on the display unit 15 in the third embodiment.
A diagram (part 4) showing an example of a conversation and of an image displayed on the display unit 15 in the third embodiment.

Embodiments of the information processing system, information processing method, and program of the present invention are described below with reference to the drawings.

<Overview (common matters)>
The information processing system is implemented by one or more processors. The information processing system causes an output unit to output response content to a voice uttered by a user, and specific information that differs from the response content. The “response content” is, for example, information determined by an automatic response device that operates on AI (artificial intelligence) or on a model trained by machine learning such as deep learning. The “specific information” is information that is not a response to a voice uttered by the user, such as an advertisement, a greeting, an utterance that starts a conversation, or a notice (for example, a recommendation or a request to change a password).

[Overview (Part 1)]
The information processing system controls the output mode of the specific information according to the usage degree of a user device (for example, a microphone or a speaker) through which voice is input or output. The “usage degree” is, for example, a value based on the number of times or the frequency with which voice has been input to the user device, or a value based on the number of times or the frequency with which the user device has been made to output voice. For example, the higher the usage degree of the user device, the larger the amount of specific information that is output. In other words, users who routinely make heavy use of voice input or output hear more prompts and voice advertisements from the automatic response device. In addition, the higher the usage degree of the user device, the more the output mode of the specific information is controlled to be easy for the user to hear. The “output mode” is, for example, the loudness of the sound, the pitch of the sound, or the tempo at which information is output. Overview (Part 1) is described mainly in the first embodiment below.
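The relationship between usage degree and output mode described above can be sketched as follows. This is only an illustration under stated assumptions: the `OutputMode` fields, the per-day definition of `usage_degree`, and the thresholds are hypothetical and do not come from the specification, which leaves the concrete mapping open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputMode:
    volume: float   # relative loudness: 0.0 (silent) to 1.0 (full); assumed scale
    max_items: int  # pieces of specific information allowed per conversation


def usage_degree(voice_inputs: int, days: int) -> float:
    """Assumed definition: average number of voice inputs per day."""
    if days <= 0:
        raise ValueError("days must be positive")
    return voice_inputs / days


def output_mode_for(degree: float) -> OutputMode:
    """Higher usage degree -> more specific information, easier to hear."""
    if degree >= 10.0:   # heavy voice user (illustrative threshold)
        return OutputMode(volume=1.0, max_items=3)
    if degree >= 2.0:    # moderate voice user (illustrative threshold)
        return OutputMode(volume=0.7, max_items=1)
    return OutputMode(volume=0.4, max_items=0)  # light user: nothing is pushed


mode = output_mode_for(usage_degree(voice_inputs=84, days=7))
print(mode)
```

A step function is used here only for readability; a continuous mapping from usage degree to volume would satisfy the same description.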

[Overview (Part 2)]
The information processing system changes the output mode of the specific information to a first output mode, which is harder for the user to hear than the third output mode used for the response content, and causes the output unit to output the specific information. If the system then receives an instruction from the user, it changes the output mode of the specific information to a second output mode and causes the output unit to output the specific information. The “second output mode” is an output mode that is easier for the user to hear than the first output mode. In other words, in a dialogue with the automatic response device, the volume is lowered only for the specific information (for example, a voice advertisement) and is raised in response to a request or operation from the user. Overview (Part 2) is described mainly in the second embodiment below.
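The switch from the first to the second output mode can be pictured as a small state holder. The sketch below assumes that output modes differ only in volume and that an instruction is detected by simple phrase matching; the volume values, the trigger phrases, and the class name `SpecificInfoController` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

RESPONSE_VOLUME = 1.0  # third output mode: used for ordinary responses
FIRST_VOLUME = 0.3     # first output mode: harder to hear than responses
SECOND_VOLUME = 0.8    # second output mode: easier to hear than the first

# Hypothetical phrases treated as a user instruction to hear the information.
TRIGGER_PHRASES = ("louder", "what was that", "tell me more")


@dataclass
class SpecificInfoController:
    volume: float = FIRST_VOLUME  # specific information starts out quiet

    def on_user_instruction(self, utterance: str) -> None:
        """Switch to the second output mode when an instruction is recognized."""
        lowered = utterance.lower()
        if any(phrase in lowered for phrase in TRIGGER_PHRASES):
            self.volume = SECOND_VOLUME

    def render(self, text: str) -> Tuple[str, float]:
        """Return the specific information paired with its current volume."""
        return (text, self.volume)


controller = SpecificInfoController()
quiet = controller.render("Cafe X has a lunch special today.")
controller.on_user_instruction("Can you say that louder?")
audible = controller.render("Cafe X has a lunch special today.")
```

Only the specific information passes through the controller; ordinary responses would keep `RESPONSE_VOLUME` throughout.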

[Overview (Part 3)]
The information processing system causes the output unit to output the response content in an output mode corresponding to a first character, and causes the output unit to output the specific information in an output mode corresponding to a second character. The information processing system further causes the output unit to output a conversation between the first character and the second character. The “first character” is, for example, a character that talks with the user and responds to the user's utterances in everyday use. The “second character” is, for example, a character that differs from the first character and is associated with the specific information (for example, an advertisement). In this way, the first character, which talks with the user, and the second character, which corresponds to the voice advertisement, converse with each other and thereby arouse the user's interest in the advertisement. Overview (Part 3) is described mainly in the third embodiment below.

<First Embodiment>
[Overall configuration]
FIG. 1 is a diagram showing the configuration of the information processing system 1. The information processing system 1 includes, for example, a terminal device 10, an automatic response device 40, and an advertisement providing device 80. These devices communicate with one another via a network NW. The network NW includes, for example, a WAN (Wide Area Network), a LAN (Local Area Network), the Internet, a dedicated line, a wireless base station, a provider, and the like. In the present embodiment, the automatic response device 40 is an example of the “information processing system”. The “information processing system” may also include the terminal device 10 and/or the advertisement providing device 80.

[Functional configuration of the terminal device]
The terminal device 10 is, for example, a smart speaker (AI speaker), a smartphone, a tablet terminal, a personal computer, or the like. In the first embodiment, the terminal device 10 is described as a smart speaker.

The terminal device 10 includes, for example, a microphone 12, a speaker 14, a voice recognition unit 16, a voice generation unit 18, a terminal control unit 20, a terminal-side communication unit 22, and a storage unit 30. The voice recognition unit 16, the voice generation unit 18, and the terminal control unit 20 are realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing an application program (application 32) stored in the storage unit 30, which is a flash memory or the like. The application 32 may be downloaded from a server device or the like via a network, or may be preinstalled on the terminal device 10. Instead of the application program, a browser having the same functions as those described below may be used as a UA (User Agent). Some or all of the functions of the terminal device 10 may be included in the automatic response device 40.

The microphone 12 acquires the voice uttered by the user or the environmental sound of the environment in which the terminal device 10 is located. The speaker 14 outputs sound corresponding to the information generated by the voice generation unit 18.

The voice recognition unit 16 converts the sound acquired by the microphone 12 into digital data (voice data). The voice generation unit 18 generates, based on the information transmitted by the automatic response device 40, information corresponding to the sound to be output by the speaker 14.

The terminal control unit 20 transmits the digital data converted by the voice recognition unit 16 to the automatic response device 40 using the terminal-side communication unit 22. The terminal control unit 20 acquires the information transmitted by the automatic response device 40 via the terminal-side communication unit 22.

The terminal-side communication unit 22 is, for example, a wireless communication interface. The terminal-side communication unit 22 acquires information transmitted by the automatic response device 40 and transmits the results of processing performed in the terminal device 10 to the automatic response device 40.

[Functional configuration of the automatic response device]
The automatic response device 40 includes, for example, a user identification unit 42, an environment analysis unit 43, a pattern identification unit 44, an interpretation unit 46, a response unit 48, a provision control unit 50, a learning unit 52, a response device-side communication unit 54, a first storage unit 60, and a second storage unit 70. The user identification unit 42, the environment analysis unit 43, the pattern identification unit 44, the interpretation unit 46, the response unit 48, the provision control unit 50, and the learning unit 52 are realized, for example, by a hardware processor such as a CPU executing a program stored in a storage device (for example, the first storage unit 60). These functional units may instead be realized by hardware such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation of software and hardware. The program may be stored in the storage device in advance, or may be stored on a removable storage medium such as a DVD or CD-ROM and installed in the storage device when the storage medium is mounted in a drive device of the automatic response device 40. The first storage unit 60 and the second storage unit 70 are realized by, for example, a ROM (Read Only Memory), a flash memory, an SD card, a RAM (Random Access Memory), a register, or the like.

The first storage unit 60 stores, for example, user identification information 62, environment identification information 63, environment pattern information 64, regular expression information 66, and scenario information 68, all described later. The second storage unit 70 stores, for example, output degree information 72 and usage information 74, both described later. The first storage unit 60 and the second storage unit 70 need not be realized by separate storage devices and may be different storage areas of a single storage device.

The user identification unit 42 extracts, for example, the component of the voice data transmitted by the terminal device 10 that is estimated to represent a human voice (hereinafter, the utterance component). The user identification unit 42 collates the extracted utterance component with the information contained in the user identification information 62 and identifies the person who uttered the voice represented by the utterance component. The user identification information 62 associates identification information of each user with information indicating the characteristics of that user's voice (for example, a voiceprint pattern or a frequency pattern).
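The collation step can be illustrated with a nearest-match search over stored voice features. The specification does not say how collation is performed, so this sketch substitutes cosine similarity over fixed-length feature vectors; the `registered` table stands in for the user identification information 62, and the threshold and all names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_user(
    features: Sequence[float],
    registered: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # illustrative acceptance threshold
) -> Optional[str]:
    """Return the ID of the best-matching registered user, or None if no match."""
    best_id: Optional[str] = None
    best_score = threshold
    for user_id, reference in registered.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id


# Hypothetical stand-in for user identification information 62.
registered = {"user_a": [1.0, 0.0, 0.2], "user_b": [0.0, 1.0, 0.1]}
speaker = identify_user([0.9, 0.05, 0.2], registered)
```

Returning `None` below the threshold lets the caller treat an unknown voice differently from a registered user.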

The user identification unit 42 may also refer to the user identification information 62 to identify the type of person present near the user who uttered the voice. In this case, the user identification information 62 contains, in advance, information indicating the voice characteristics of the user's family members, friends, and so on. The user identification unit 42 may also acquire, via the terminal-side communication unit 22, information indicating the connection state between a Wi-Fi router and a terminal device owned by a family member or the like, and determine from that information whether the owner of the terminal device is near the position where the Wi-Fi router is installed.

The environment analysis unit 43 extracts, for example, the component of the voice data transmitted by the terminal device 10 that is estimated to represent environmental sound other than human voices (hereinafter, the environmental sound component). The environment analysis unit 43 collates the extracted environmental sound component with the information contained in the environment identification information 63 and identifies the loudness of the environmental sound represented by the component and the cause of that sound. The environment identification information 63 associates identification information of each cause of environmental sound with the sound characteristics of that cause.
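The two results of this analysis, loudness and cause, can be sketched as follows. The specification gives no formulas, so this example assumes that loudness is the RMS level of normalized samples and that the cause is chosen by nearest reference features; the feature vectors and cause names are hypothetical stand-ins for the environment identification information 63.

```python
import math
from typing import Dict, Sequence


def rms_level_db(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def nearest_cause(
    features: Sequence[float], references: Dict[str, Sequence[float]]
) -> str:
    """Return the cause whose reference features are closest (squared distance)."""
    if not references:
        raise ValueError("references must not be empty")

    def squared_distance(name: str) -> float:
        return sum((f - r) ** 2 for f, r in zip(features, references[name]))

    return min(references, key=squared_distance)


# Hypothetical stand-in for environment identification information 63.
references = {"television": [0.8, 0.2], "traffic": [0.1, 0.9]}
cause = nearest_cause([0.7, 0.3], references)
level = rms_level_db([0.1, -0.1, 0.1, -0.1])
```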

The pattern identification unit 44 identifies an environment pattern based on, for example, the environment pattern information 64, the processing result of the user identification unit 42, and the processing result of the environment analysis unit 43. An environment pattern is a pattern into which the environment where the user is present is classified according to predetermined criteria. Details are given later.

The interpretation unit 46, for example, converts the voice data corresponding to a human voice into text information and then collates the text information with the regular expression information 66 to interpret the meaning of the user's utterance. Suppose, for example, that the user says “Tell me how to get from Shinjuku to Shibuya.” The interpretation unit 46 performs morphological analysis on this utterance and divides it into parts of speech. The interpretation unit 46 then generates a search key in which Shinjuku and Shibuya, which are proper nouns and place names, are converted into codes, and searches the regular expression information 66. The regular expression information 66 registers information (regular expressions) in which proper nouns have been converted into abstract codes. For example, entries carrying text such as “Tell me the way from 〇〇 to ××” and “Tell me how to get from 〇〇 as far as ××” are registered as regular expressions.

The response unit 48 acquires, for example, the text information corresponding to “Tell me how to get from (proper noun, place) to (proper noun, place)” contained in the regular expression information 66, and recognizes that it should provide directions from 〇〇 to ××.

The response unit 48 then embeds “Shinjuku” and “Shibuya”, the original information that was encoded, into the (proper noun, place) slots and thereby recognizes the user's intention: “I want to know how to get from Shinjuku to Shibuya.” The response unit 48 performs a network search or the like and obtains directions from Shinjuku to Shibuya. The response unit 48 then refers to, for example, the scenario information 68 and generates voice source information, to be output by the terminal device 10, that gives the directions from Shinjuku to Shibuya. The scenario information 68 holds in advance, for example, the content with which to respond to the user's utterances. That is, it holds the response content for an utterance expressing the intention “I want to know how to get from 〇〇 to ××.” The scenario information 68 is prepared for each user, for example, so that the response content matches the user's preferences.
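The abstraction-and-lookup flow of the interpretation unit 46 and the response unit 48 can be sketched as below. The specification uses morphological analysis to find proper nouns; this sketch replaces that step with a lookup in a small place-name list, which is a deliberate simplification. `KNOWN_PLACES`, `TEMPLATES` (a stand-in for the regular expression information 66), and the intent name `route_search` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

KNOWN_PLACES = ["Shinjuku", "Shibuya", "Ueno"]  # hypothetical place-name list
PLACE_CODE = "<place>"                          # abstract code for a place name

# Hypothetical stand-in for regular expression information 66:
# abstracted utterance text -> intent name.
TEMPLATES: Dict[str, str] = {
    "tell me how to get from <place> to <place>": "route_search",
}

PLACE_PATTERN = re.compile("|".join(re.escape(p) for p in KNOWN_PLACES))


def abstract_places(utterance: str) -> Tuple[str, List[str]]:
    """Replace each known place name with PLACE_CODE and remember the originals."""
    places: List[str] = []

    def to_code(match: "re.Match[str]") -> str:
        places.append(match.group(0))
        return PLACE_CODE

    return PLACE_PATTERN.sub(to_code, utterance), places


def interpret(utterance: str) -> Optional[Tuple[str, List[str]]]:
    """Return (intent, original place names), or None if no template matches."""
    abstracted, places = abstract_places(utterance)
    key = abstracted.strip().rstrip(".?!").lower()  # normalized search key
    intent = TEMPLATES.get(key)
    return (intent, places) if intent else None


result = interpret("Tell me how to get from Shinjuku to Shibuya.")
```

Keeping the original place names alongside the matched template mirrors the step in which the response unit 48 embeds them back into the slots before searching for a route.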

Some or all of the functions of the automatic response device 40, such as the response unit 48 described above, may be provided in the terminal device 10. Information such as the regular expression information 66 and the scenario information 68 may also be stored in the storage device of the terminal device 10.

To have the terminal device 10 output the voice source information generated by the response unit 48, the provision control unit 50 transmits that voice source information to the terminal device 10 using the response device-side communication unit 54. Likewise, to have the terminal device 10 output the voice source information transmitted by the advertisement providing device 80, the provision control unit 50 transmits that voice source information to the terminal device 10 using the response device-side communication unit 54.

The provision control unit 50 also designates an output mode for the response content or the specific information. To have the speaker 14 of the terminal device 10 output the response content or the specific information in the designated output mode, the provision control unit 50 transmits information associating the designated output mode with the response content or the specific information to the terminal device 10 using the response device-side communication unit 54. This function of the provision control unit 50 may be provided in the terminal device 10.
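One way to picture the information that associates an output mode with content is a small message built on the automatic response device side and parsed on the terminal side. The specification does not define a wire format, so the JSON layout, the field names, and `build_message` are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class OutputInstruction:
    kind: str      # "response" or "specific" (assumed labels)
    text: str      # content to be synthesized by the terminal device
    volume: float  # designated output mode: relative loudness
    tempo: float   # designated output mode: speaking-rate multiplier


def build_message(instructions: List[OutputInstruction]) -> str:
    """Serialize content together with its designated output mode."""
    payload = {"outputs": [asdict(item) for item in instructions]}
    return json.dumps(payload, ensure_ascii=False)


message = build_message([
    OutputInstruction("response", "Take the JR Yamanote Line.", 1.0, 1.0),
    OutputInstruction("specific", "Cafe X has a lunch special today.", 0.3, 1.0),
])
decoded = json.loads(message)
```

Sending the mode next to each piece of content lets the terminal device apply different modes to a response and to specific information within the same reply.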

The learning unit 52 learns the response content or specific information that was output by the speaker 14 of the terminal device 10, the output mode in which it was output, the user's reaction, and the environment pattern. The learning is, for example, learning using artificial intelligence or machine learning such as deep learning.

The response device-side communication unit 54 includes a communication interface such as a network interface card (NIC). The response device-side communication unit 54 acquires information transmitted by the terminal device 10 or the advertisement providing device 80, and transmits the results of processing performed in the automatic response device 40 to the terminal device 10 or the advertisement providing device 80.

[Advertisement providing device]
The advertisement providing device 80 includes, for example, an information providing unit 82, an advertisement providing device-side communication unit 84, and an advertisement providing device-side storage unit 90. The information providing unit 82 extracts an advertisement to provide to the user based on the information input by the user's utterance or on the response content of the automatic response device 40, and provides information on the extracted advertisement (for example, voice source information and the output mode in which the voice is to be output) to the automatic response device 40.

The advertisement providing device-side communication unit 84 includes a communication interface such as a network interface card. The advertisement providing device-side communication unit 84 acquires information transmitted by the automatic response device 40 and transmits the results of processing performed in the advertisement providing device 80 to the automatic response device 40. The advertisement providing device-side storage unit 90 stores advertisement information 92, described later. The advertisement providing device 80 and the automatic response device 40 may be provided as a single device.

[Flowchart (processing to determine the output degree)]
FIG. 2 is a flowchart showing an example of the flow of processing executed by the information processing system 1. This processing controls the amount of voice output from an artifact according to how much the user uses the voice UI (user interface / user device). The voice UI here is voice recognition.

First, the terminal device 10 determines whether voice has been input by the user (S10). When voice has been input by the user (that is, when a conversation between the user and the automatic response device 40 has started), the input voice data (the utterance component and the environmental sound component) is transmitted to the automatic response device 40.

The automatic response device 40 acquires the utterance component and identifies the user based on the utterance component and the user identification information 62 (S20). The automatic response device 40 acquires the environmental sound component and identifies the environment pattern based on the environmental sound component and the environment pattern information 64 (S22).

図３は、環境パターン情報６４の内容の一例を示す図である。環境パターン情報６４は、複数の環境パターンと、分類基準とが対応付けられた情報である。環境パターンの分類基準は、例えば、曜日や、時間、利用者の周囲に存在している人物の数、人物の種別、利用者が存在している環境音の大きさ、利用者が存在している環境（自宅、オフィス、街）、利用者が存在している位置、および利用者のスケジュール（事前に登録された現在の予定）等のうち、少なくとも一以上の項目に基づいて、分類されるパターンである。 FIG. 3 is a diagram showing an example of the contents of the environment pattern information 64. The environment pattern information 64 is information in which a plurality of environment patterns are associated with classification criteria. An environment pattern is classified based on at least one of, for example, the day of the week, the time, the number of persons around the user, the types of those persons, the loudness of the environmental sound where the user is, the environment in which the user is (home, office, street), the position of the user, and the user's schedule (the current plan registered in advance).

利用者が存在している環境、利用者が存在している位置、または利用者のスケジュールは、例えば予め利用者により設定された情報である。また、利用者が存在している環境、または利用者が存在している位置は、不図示のＧＰＳ（Global Positioning System）を利用した位置測位装置により測位された情報に基づいて特定されてもよい。また、利用者のスケジュールは、端末装置１０が他の装置からネットワークＮＷを介して取得した情報であってもよい。 The environment in which the user is, the position of the user, and the user's schedule are, for example, information set in advance by the user. The environment or position of the user may instead be identified based on information measured by a positioning device (not shown) that uses GPS (Global Positioning System). The user's schedule may be information that the terminal device 10 acquires from another device via the network NW.

次に、自動応答装置４０は、特定した利用者に提供する広告の内容を決定するように広告提供装置８０に依頼する（Ｓ２４）。この際、自動応答装置４０は、端末装置１０に入力された音声に含まれる情報をテキスト情報に変換したテキスト情報を広告提供装置８０に送信する。 Next, the automatic response device 40 requests the advertisement providing device 80 to determine the content of the advertisement to be provided to the identified user (S24). At this time, the automatic response device 40 converts the information contained in the voice input to the terminal device 10 into text information and transmits that text information to the advertisement providing device 80.

広告提供装置８０は、自動応答装置４０の依頼に応じて、広告情報９２を参照して、テキスト情報に対応する利用者に提供する広告の内容を決定する（Ｓ３０）。なお、広告提供装置８０は、利用者に提供する広告が存在しない場合、その旨を自動応答装置４０に送信する。 In response to the request from the automatic response device 40, the advertisement providing device 80 refers to the advertisement information 92 and determines the content of the advertisement, corresponding to the text information, to be provided to the user (S30). If there is no advertisement to provide to the user, the advertisement providing device 80 notifies the automatic response device 40 to that effect.

図４は、広告情報９２の内容の一例を示す図である。広告情報９２は、広告ＩＤに対して、キャラクター、商品（またはサービス）、シナリオ、およびキーワードが対応付けられた情報である。「キャラクター」とは、所定の特徴を有する人物や、人に見立てた動物、植物、創作物、人工物などである。キャラクターは、商品ごとに設けられてもよいし、複数の商品ごとや、キャンペーンごとに設けられてもよい。 FIG. 4 is a diagram showing an example of the contents of the advertisement information 92. The advertisement information 92 is information in which a character, a product (or service), a scenario, and keywords are associated with an advertisement ID. A "character" is a person having predetermined features, or an animal, plant, creative work, artificial object, or the like that is likened to a person. A character may be provided for each product, for each group of products, or for each campaign.

「シナリオ」とは、キャラクターが発する言葉（または言動）の内容や順序を規定したものである。シナリオは、例えば、キャラクターごとに設けられている。また、広告情報９２には、シナリオに加え、音声のトーンや、テンポ等のキャラクターの特徴がキャラクターに対して対応付けられている。商品やキャンペーンごとのキャラクターは、シナリオ（行動ルール）を基に自律的に行動する。 A “scenario” defines the content and order of words (or actions) issued by a character. A scenario is provided, for example, for each character. Further, in the advertisement information 92, in addition to the scenario, the characteristics of the character such as voice tone and tempo are associated with the character. Characters for each product and campaign act autonomously based on scenarios (action rules).

「キーワード」は、広告に関連付けられた言葉である。［キーワード］は、商品を示す言葉の意味（意味情報）と同一の意味を有する言葉、または商品を示す言葉の意味に関連する言葉である。関連する言葉とは、商品を示す言葉から一般的に想起される言葉である。例えば、広告提供装置８０は、利用者により入力された言葉または自動応答装置４０により発せられた音声に含まれる言葉と、広告情報９２のキーワードとが合致する場合に、合致するキーワードに対応付けられた広告ＩＤに対応する情報（キャラクターが発話する音声元情報等）を自動応答装置４０に送信する。なお、広告提供装置８０は、人工知能や、深層学習などの機械学習されたモデルにより利用者に提供する情報を決定してもよい。 A "keyword" is a word associated with an advertisement. A keyword is a word having the same meaning (semantic information) as the word denoting the product, or a word related to that meaning. A related word is a word generally evoked by the word denoting the product. For example, when a word input by the user, or a word contained in the voice uttered by the automatic response device 40, matches a keyword in the advertisement information 92, the advertisement providing device 80 transmits the information corresponding to the advertisement ID associated with the matching keyword (such as the source information for the voice the character utters) to the automatic response device 40. The advertisement providing device 80 may also determine the information to provide to the user using artificial intelligence or a machine-learned model such as one trained by deep learning.
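The keyword match described above can be sketched as follows. This is a minimal illustration only: the class `Advertisement`, the table `AD_INFO`, the function `select_advertisement`, and the sample rows are hypothetical names and data, and plain substring matching stands in for whatever matching the advertisement providing device 80 actually performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Advertisement:
    ad_id: str
    character: str
    product: str
    scenario: str
    keywords: tuple


# Hypothetical rows standing in for the advertisement information 92.
AD_INFO = (
    Advertisement("ad-001", "second character", "Car A",
                  "Car A has good fuel economy.", ("fuel economy", "mileage")),
    Advertisement("ad-002", "third character", "Apartment B",
                  "Apartment B is close to the station.", ("apartment", "rent")),
)


def select_advertisement(text: str, ads=AD_INFO) -> Optional[Advertisement]:
    """Return the first advertisement whose keyword occurs in the text, else None."""
    lowered = text.lower()
    for ad in ads:
        if any(keyword in lowered for keyword in ad.keywords):
            return ad
    return None  # corresponds to "there is no advertisement to provide"
```

Returning `None` mirrors the case in S30 where no advertisement exists for the user; a machine-learned model, which the text also allows, would replace the loop.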

次に、自動応答装置４０は、後述する出力度合情報７２を参照して、環境パターンに応じた広告の出力度合を決定し、決定した出力度合で広告を出力するように端末装置１０に指示をする（Ｓ２６）。次に、端末装置１０は、自動応答装置４０の指示に基づいて、広告を出力する（Ｓ１２）。これにより本フローチャートの１ルーチンの処理が終了する。 Next, the automatic response device 40 refers to the output degree information 72 described later, determines the output degree of the advertisement according to the environment pattern, and instructs the terminal device 10 to output the advertisement at the determined output degree (S26). The terminal device 10 then outputs the advertisement based on the instruction from the automatic response device 40 (S12). This completes one routine of this flowchart.

図５は、出力度合情報７２の内容の一例を示す図である。出力度合情報７２は、例えば、環境パターンごとに用意されている。また、出力度合情報７２は、利用者ＩＤに対して、環境パターンにおける過去の利用度合および広告を出力する出力度合が対応付けられた情報である。 FIG. 5 is a diagram showing an example of the contents of the output degree information 72. The output degree information 72 is prepared, for example, for each environment pattern. The output degree information 72 is information in which the past usage degree in that environment pattern and the output degree at which advertisements are output are associated with a user ID.

「過去の利用度合」とは、利用者が過去にスピーカ１４から音声による情報（例えば広告）の提供を受けた度合、または利用者が過去にマイク１２に音声を用いて情報を入力した度合である。「出力度合」とは、スピーカ１４を用いて利用者に情報を出力する場合に、出力される音の大きさである。「出力度合」は、「出力態様」の一例である。出力度合は、例えば、過去の利用度合が多いほど、出力される音の大きさは大きくなるように設定されている。なお、「スピーカ１４から音声による情報の提供を受けた度合」において、音楽を出力させた度合は除かれてもよい。 The "past usage degree" is the degree to which the user has in the past received information by voice (for example, advertisements) from the speaker 14, or the degree to which the user has in the past input information by voice to the microphone 12. The "output degree" is the loudness of the sound output when information is output to the user using the speaker 14. The "output degree" is an example of an "output mode". The output degree is set, for example, so that the greater the past usage degree, the louder the output sound. The degree to which music was output may be excluded from the "degree to which information by voice was received from the speaker 14".
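One way to picture "the greater the past usage degree, the louder the output" is a clamped linear mapping, sketched below. The function name `output_degree`, the volume range, and the saturation count are assumptions chosen for illustration; the source only requires that the output degree increase with past usage.

```python
def output_degree(past_usage_count: int,
                  min_volume: float = 0.2,
                  max_volume: float = 1.0,
                  saturation: int = 50) -> float:
    """Map a past usage count to a volume in [min_volume, max_volume].

    The mapping is linear up to `saturation` uses and flat afterwards.
    All parameter values are illustrative assumptions.
    """
    clamped = min(max(past_usage_count, 0), saturation)
    return min_volume + (max_volume - min_volume) * clamped / saturation
```

A user who has rarely used the voice UI would therefore hear advertisements quietly, while a frequent user would hear them closer to full volume.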

また、出力度合情報７２において、出力度合に代えて、他の出力に関する態様が対応付けられていてもよい。出力に関する態様とは、例えば、音の大きさ加え、音の高低、広告の内容が出力されるテンポ等である。出力に関する態様は、例えば、過去の利用度合が多いほど、利用者が聞き取りやすいように設定されている。 In the output degree information 72, another output-related mode may be associated in place of the output degree. Output-related modes include, for example, the pitch of the sound and the tempo at which the advertisement content is output, in addition to the loudness of the sound. An output-related mode is set, for example, so that the greater the past usage degree, the easier the output is for the user to hear.

また、利用者が存在する環境の環境音が所定の大きさ以上の場合、環境音が所定の大きさ未満の場合よりも、特定情報の出力態様の変化度合を小さくしてもよい。すなわち、もともと環境音が大きい環境においては、特定情報の出力を大きくさせなくてもよい。 Further, when the environmental sound of the environment where the user exists is greater than or equal to a predetermined level, the degree of change in the output mode of the specific information may be smaller than when the environmental sound is less than the predetermined level. That is, in an environment where the environmental sound is originally large, the output of the specific information may not be increased.
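The damping of the change in loud environments can be sketched as below. The function `adjusted_volume`, the 60 dB threshold, and the damping factor are hypothetical values introduced only for this illustration; the source states just that the degree of change may be smaller when the environmental sound is at or above a predetermined level.

```python
def adjusted_volume(current: float,
                    target: float,
                    ambient_db: float,
                    threshold_db: float = 60.0,
                    damping: float = 0.5) -> float:
    """Move from `current` toward `target`, with a smaller step in loud environments."""
    change = target - current
    if ambient_db >= threshold_db:
        change *= damping  # environmental sound is already loud: change less
    return current + change
```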

上述したように、自動応答装置４０が、出力度合情報７２を参照することにより、利用者に違和感を与えないように情報を提供することができる。 As described above, by referring to the output degree information 72, the automatic response apparatus 40 can provide information so as not to give a sense of discomfort to the user.

なお、上述した説明では、一例として、利用者が音声を入力した場合に、利用度合に基づいて出力態様を制御する例について説明したが、単に自動応答装置４０が発話したり、情報を出力したりする場合において利用度合に基づいて出力態様を制御してもよい。 The above description gave, as one example, a case in which the output mode is controlled based on the usage degree when the user inputs voice. The output mode may also be controlled based on the usage degree when the automatic response device 40 simply speaks or outputs information.

［具体例（その１）］ [Specific example (1)]
図６は、利用者と自動応答装置４０との会話の一例を示す図である。例えば、図６（Ａ）に示すように、（１）利用者が「新しい車が欲しいな。」とマイク１２に入力する。（２）自動応答装置４０は、第１キャラクターの出力態様で、「どんな車が欲しいの？」と応答する。 FIG. 6 is a diagram showing an example of a conversation between the user and the automatic response device 40. For example, as shown in FIG. 6(A), (1) the user says "I want a new car." into the microphone 12. (2) The automatic response device 40 responds "What kind of car do you want?" in the output mode of the first character.

次に、図６（Ｂ）に示すように、（３）利用者が「燃費のいい車がいいな。」とマイク１２に入力する。（４）自動応答装置４０は、第１キャラクターの出力態様で、「節約できるからいいよね。」と応答する。そして、（５）自動応答装置４０は、第２キャラクターの出力態様で、「車Ａが燃費いいよ。」と発話する。この第２キャラクターの出力態様は、ユーザデバイスの利用度合に応じた出力態様である。 Next, as shown in FIG. 6(B), (3) the user says "I'd like a car with good fuel economy." into the microphone 12. (4) The automatic response device 40 responds "That's nice, since you can save money." in the output mode of the first character. Then, (5) the automatic response device 40 says "Car A has good fuel economy." in the output mode of the second character. This output mode of the second character is an output mode that depends on the usage degree of the user device.

次に、図６（Ｃ）に示すように、（６）利用者が「詳しく教えて。」とマイク１２に入力する。（７）自動応答装置４０は、第２キャラクターの出力態様で、「車Ａは電気自動車だよ。フル充電で〇〇キロ走行可能だよ。」と応答する。 Next, as shown in FIG. 6(C), (6) the user says "Tell me more." into the microphone 12. (7) The automatic response device 40 responds "Car A is an electric vehicle. It can travel 〇〇 km on a full charge." in the output mode of the second character.

このように、第１キャラクターと利用者との会話において、キーワードが出現した場合、自動応答装置４０は、ユーザデバイスの利用度合に応じた出力態様で、キーワードに基づく広告を第２キャラクターの出力態様で、利用者に提供する。この結果、利用者に違和感を与えないように情報を提供することができる。 In this way, when a keyword appears in the conversation between the first character and the user, the automatic response device 40 provides the user with an advertisement based on the keyword, in the output mode of the second character and in an output mode that depends on the usage degree of the user device. As a result, information can be provided without giving the user a sense of incongruity.

なお、上記の（６）で、車Ａに興味を示さなかった場合、第２キャラクターは、その後、発話しなくてもよい。また、車Ａに興味を示さなかった場合、他の車に対応するキャラクターの出力態様で、他の車を紹介してもよい。 If the user shows no interest in car A at (6) above, the second character need not speak afterwards. Alternatively, if the user shows no interest in car A, another car may be introduced in the output mode of the character corresponding to that other car.

また、車の広告を提供したい場合、自動応答装置４０は、第１キャラクターに車の話題で会話するような発話や応答を行ってもよい。この場合、例えば、自動応答装置４０は、上述したキーワード、キーワードを誘導するような発話を行う。例えば、出力したい特定情報に基づいて、キャラクターの会話が選択される。 When a car advertisement is to be provided, the automatic response device 40 may have the first character make utterances or responses that turn the conversation to the topic of cars. In this case, for example, the automatic response device 40 utters the keyword described above, or makes an utterance that leads to the keyword. For example, the character's conversation is selected based on the specific information to be output.

また、上述した例では、第２キャラクターの発話の出力度合を変更するものとしたが、第１キャラクターの発話の出力度合が変更されてもよい。また、出力度合は、利用者とキャラクターとの会話の度合に基づいて変更されてもよい。例えば、第１キャラクターと利用者との会話の度合が、第Ｎキャラクター（Ｎは任意の自然数）と利用者との会話の度合よりも高い場合、第１キャラクターが利用者に話し掛ける度合を、第Ｎキャラクターが利用者に話しかける度合よりも多くする。 In the above example, the output degree of the second character's utterances is changed, but the output degree of the first character's utterances may also be changed. The output degree may also be changed based on the degree of conversation between the user and a character. For example, when the degree of conversation between the first character and the user is higher than that between the Nth character (N is an arbitrary natural number) and the user, the first character is made to speak to the user more often than the Nth character does.

［フローチャート（学習する処理）］ [Flowchart (learning process)]
図７は、端末装置１０および自動応答装置４０により実行される処理の流れの一例を示すフローチャートである。図７のフローチャートのＳ４０、Ｓ５０、およびＳ５２の処理は、図２のフローチャートのＳ１０、Ｓ２０、およびＳ２２の処理と同様のため説明を省略する。 FIG. 7 is a flowchart showing an example of the flow of processing executed by the terminal device 10 and the automatic response device 40. The processes of S40, S50, and S52 in the flowchart of FIG. 7 are the same as the processes of S10, S20, and S22 in the flowchart of FIG. 2, so their description is omitted.

Ｓ５２の処理後に、自動応答装置４０は、自装置が情報を利用者に提供したか否かを判定する（Ｓ５４）。情報を利用者に提供した場合、自動応答装置４０は、提供した情報の内容、および情報の提供後の利用者の反応を取得し、取得した反応を利用情報７４として第２記憶部７０に記憶させる（Ｓ５６）。 After the processing of S52, the automatic response device 40 determines whether or not the own device has provided information to the user (S54). When the information is provided to the user, the automatic response device 40 acquires the content of the provided information and the user's reaction after the information is provided, and stores the acquired response as the usage information 74 in the second storage unit 70. (S56).

図８は、利用情報７４の内容の一例を示す図である。利用情報７４は、利用者ごとに、過去に利用者により入力された情報、または過去に利用者に対して出力された情報と、入力された情報、または出力された情報の出力態様と、環境パターンと、出力された情報に対する利用者の反応（例えば指示）とが互いに対応付けられた情報である。 FIG. 8 is a diagram showing an example of the contents of the usage information 74. The usage information 74 is information in which, for each user, the following are associated with one another: information the user input in the past or information output to the user in the past; the output mode of that input or output information; the environment pattern; and the user's reaction (for example, an instruction) to the output information.

次に、自動応答装置４０は、所定のタイミングに到達したか否かを判定する（Ｓ５８）。所定のタイミングに到達していない場合、本フローチャートの１ルーチンの処理が終了する。所定のタイミングに到達した場合、自動応答装置４０は、利用情報７４を学習データとして学習する（Ｓ６０）。これにより本フローチャートの１ルーチンの処理が終了する。 Next, the automatic response device 40 determines whether or not a predetermined timing has been reached (S58). If the predetermined timing has not been reached, the process of one routine of this flowchart is terminated. When the predetermined timing is reached, the automatic response device 40 learns the usage information 74 as learning data (S60). Thereby, the process of one routine of this flowchart is completed.

上述したように、利用者に情報を提供した際の利用者の反応や、環境パターン、情報の出力態様、情報の内容が学習されることにより、利用者の好みを把握することができる。そして、学習部５２は、利用者の好みを反映させて出力度合情報７２を生成したり、更新したりすることができる。 As described above, the user's preference can be grasped by learning the reaction of the user when providing information to the user, the environmental pattern, the output mode of the information, and the content of the information. Then, the learning unit 52 can generate or update the output degree information 72 by reflecting the preference of the user.

例えば、土曜日や、時間帯が７時〜８時、利用者の周囲に親が存在している場合、利用者が自宅にいる場合、またはプライベートのスケジュールが予定されている時間帯において、他の状況の場合よりも抑制するように特定情報が出力されるように指示されたことを示す情報が、利用情報７４に含まれているものとする。この場合、学習部５２は、上述した状況に対応する環境パターンでは、特定情報の出力を抑制するように、出力度合情報７２を生成する。 For example, suppose the usage information 74 contains information indicating that the user instructed that specific information be output in a more restrained manner than in other situations when it is Saturday, when the time is between 7:00 and 8:00, when a parent is present near the user, when the user is at home, or during a time slot for which a private schedule is planned. In this case, the learning unit 52 generates the output degree information 72 so that output of the specific information is suppressed in the environment patterns corresponding to those situations.
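As a rough sketch of this kind of learning, the snippet below counts how often the user asked for restrained output in each environment pattern and flags the patterns that cross a threshold. It is a simple frequency count, not the learning method of the learning unit 52, which the source does not specify. The record fields, the reaction label `"suppress"`, and the threshold `min_count` are assumptions made for illustration.

```python
from collections import Counter


def patterns_to_suppress(usage_records, min_count: int = 2) -> set:
    """Return environment patterns in which the user repeatedly asked for quieter output.

    Each record is a dict with hypothetical keys "environment_pattern" and "reaction".
    """
    counts = Counter(
        record["environment_pattern"]
        for record in usage_records
        if record["reaction"] == "suppress"
    )
    return {pattern for pattern, n in counts.items() if n >= min_count}
```

The returned set could then be used to lower the output degree stored for those patterns in the output degree information 72.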

上述したように、利用者と音声インタラクションするスマートスピーカなどの人工物が、音声広告や話し掛けを過剰に行うと利用者は煩雑に感じる場合があるが、本実施形態では、利用者の音声インタラクションの利用度合や、インタラクションが行われた状況に応じて、音声広告や話し掛けを調整するため、利用者に違和感を与えないように情報を提供することができる。 As described above, when an artificial object such as a smart speaker that interacts with the user by voice delivers voice advertisements or speaks to the user excessively, the user may find it bothersome. In this embodiment, voice advertisements and speaking to the user are adjusted according to the user's usage degree of voice interaction and the situation in which the interaction takes place, so information can be provided without giving the user a sense of incongruity.

なお、上述した例では、提供制御部５０が、音声が入力または出力の対象とされたユーザデバイスの利用度合に応じて、特定情報の出力態様を制御するものとして説明したが、これに代えて（或いは加えて）、以下のように変更されてもよい。すなわち、提供制御部５０は、ユーザデバイスの利用度合に応じて、第２応答内容の出力態様を制御する。この「第２応答内容」は、利用者により発せられた音声に対する応答内容であって広告を含む内容である。例えば、この場合、自動応答装置４０は、広告を含む応答内容を決定し、決定した応答内容をユーザデバイスの利用度合に応じた出力態様で端末装置１０に出力させる。このように、応答内容そのものが広告となり、且つ応答内容の制御態様が制御されるため、利用者に違和感を与えないように情報を提供することができる。 In the above example, the provision control unit 50 controls the output mode of the specific information according to the usage degree of the user device to or from which voice is input or output. Instead of this (or in addition to it), the following modification may be made: the provision control unit 50 controls the output mode of second response content according to the usage degree of the user device. The "second response content" is response content to the voice uttered by the user that includes an advertisement. In this case, for example, the automatic response device 40 determines response content that includes an advertisement and causes the terminal device 10 to output the determined response content in an output mode that depends on the usage degree of the user device. Because the response content itself serves as the advertisement and the way it is output is controlled, information can be provided without giving the user a sense of incongruity.

以上説明した第１実施形態によれば、提供制御部５０が、音声が入力または出力の対象とされたユーザデバイスの利用度合に応じて、特定情報の出力態様を制御することにより、利用者に違和感を与えないように情報を提供することができる。 According to the first embodiment described above, the provision control unit 50 controls the output mode of the specific information according to the usage degree of the user device to or from which voice is input or output, so that information can be provided without giving the user a sense of incongruity.

＜第２実施形態＞ <Second Embodiment>
以下、第２実施形態について説明する。提供制御部５０は、特定情報の出力態様を、応答内容の第３出力態様よりも利用者が聞き取りにくい第１出力態様に変更して出力部に出力させた後、利用者の指示を受け付けた場合に、特定情報の出力態様を、第１出力態様よりも利用者が聞き取りやすい第２出力態様に変更して、特定情報を出力部に出力させる。第１実施形態との相違点を中心に説明する。 The second embodiment is described below. The provision control unit 50 changes the output mode of the specific information to a first output mode, which is harder for the user to hear than the third output mode used for the response content, and causes the output unit to output it. If a user instruction is then received, the provision control unit 50 changes the output mode of the specific information to a second output mode, which is easier for the user to hear than the first output mode, and causes the output unit to output the specific information. The description focuses on the differences from the first embodiment.

図９は、第２実施形態の情報処理システム１Ａに含まれる自動応答装置４０Ａの機能構成の一例を示す図である。自動応答装置４０Ａは、第２記憶部７０に代えて、第２記憶部７０Ａを備える。第２記憶部７０Ａは、例えば、出力度合情報７２および利用情報７４に加え、更に指示対応情報７６（詳細は後述する）を備える。 FIG. 9 is a diagram illustrating an example of a functional configuration of an automatic response device 40A included in the information processing system 1A according to the second embodiment. The automatic response device 40A includes a second storage unit 70A in place of the second storage unit 70. For example, in addition to the output degree information 72 and the use information 74, the second storage unit 70A further includes instruction correspondence information 76 (details will be described later).

第２実施形態の応答部４８は、特定情報を端末装置１０に出力させる場合、特定情報の出力態様を、応答内容の第３出力態様よりも利用者が聞き取りにくい第１態様に変更して、特定情報を端末装置１０に出力させる。 When causing the terminal device 10 to output the specific information, the response unit 48 of the second embodiment changes the output mode of the specific information to the first output mode, which is harder for the user to hear than the third output mode used for the response content, and causes the terminal device 10 to output the specific information.

上記のように特定情報を端末装置１０に出力させた後、自動応答装置４０Ａは、利用者の指示を受け付けた場合に、特定情報の出力態様を、第１出力態様よりも利用者が聞き取りやすい第２出力態様に変更して、特定情報を端末装置１０に出力させる。第２出力態様は、例えば、第１出力態様よりも、音量が大きい、音の周波数帯が利用者にとって聞き取りやすい、情報が出力されるテンポが適切である態様である。 After causing the terminal device 10 to output the specific information as described above, the automatic response device 40A, on receiving a user instruction, changes the output mode of the specific information to the second output mode, which is easier for the user to hear than the first output mode, and causes the terminal device 10 to output the specific information. Compared with the first output mode, the second output mode is, for example, a mode in which the volume is higher, the frequency band of the sound is easier for the user to hear, or the tempo at which the information is output is more appropriate.
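The relation between the three output modes can be sketched as a small state holder: specific information is first presented more quietly than the response content, and is repeated more audibly only after the user gives an instruction. The names `OutputMode`, `first_mode`, `second_mode`, and `ProvisionController`, and the attenuation and boost factors, are hypothetical; volume is used here as the only varying attribute, although the source also allows frequency band and tempo to differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutputMode:
    volume: float  # 0.0 (silent) to 1.0 (maximum)
    tempo: float   # 1.0 = normal speaking rate


def first_mode(third: OutputMode, attenuation: float = 0.4) -> OutputMode:
    """Harder to hear than the third mode (the one used for response content)."""
    return replace(third, volume=third.volume * attenuation)


def second_mode(first: OutputMode, boost: float = 2.0) -> OutputMode:
    """Easier to hear than the first mode, capped at maximum volume."""
    return replace(first, volume=min(1.0, first.volume * boost))


class ProvisionController:
    """Remembers the last specific information so it can be repeated more audibly."""

    def __init__(self, third: OutputMode):
        self.third = third
        self.last = None

    def present(self, specific_info: str):
        mode = first_mode(self.third)
        self.last = (specific_info, mode)
        return specific_info, mode

    def on_user_instruction(self):
        if self.last is None:
            return None  # nothing has been presented yet
        info, mode = self.last
        louder = second_mode(mode)
        self.last = (info, louder)
        return info, louder
```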

なお、利用者が聞き取りにくい第１態様に変更する処理において、利用者が存在する環境の環境音が所定の大きさ以上の場合、環境音が所定の大きさ未満の場合よりも、特定情報の出力態様を変化させなくてもよいし、出力態様の変化度合を小さくしてもよい。もともと環境音が大きい環境で出力態様を変更しても利用者に対する影響が小さいためである。 In the process of changing to the first output mode, which is harder for the user to hear, when the environmental sound where the user is present is at or above a predetermined level, the output mode of the specific information may be left unchanged, or the degree of change may be made smaller than when the environmental sound is below the predetermined level. This is because changing the output mode has little effect on the user in an environment where the environmental sound is already loud.

［フローチャート］ [Flowchart]
図１０は、端末装置１０および第２実施形態の自動応答装置４０Ａにより実行される処理の流れの一例を示すフローチャートである。本処理は、第１出力態様で特定情報が出力された後に実行される処理である。図１０のフローチャートのＳ６０、Ｓ７０、およびＳ７２の処理は、図２のフローチャートのＳ１０、Ｓ２０、およびＳ２２の処理と同様のため説明を省略する。 FIG. 10 is a flowchart showing an example of the flow of processing executed by the terminal device 10 and the automatic response device 40A of the second embodiment. This process is executed after the specific information has been output in the first output mode. The processes of S60, S70, and S72 in the flowchart of FIG. 10 are the same as the processes of S10, S20, and S22 in the flowchart of FIG. 2, so their description is omitted.

次に、自動応答装置４０Ａは、指示対応情報７６を参照し、特定された利用者と、特定された環境パターンと、入力された音声に含まれる情報（指示の内容）との組み合わせに合致する広告の情報の出力態様を決定する（Ｓ７４）。指示の内容とは、利用者が情報の出力に関して求めた指示の情報である。指示の内容とは、例えば、ボリュームを上げることや、ゆっくりと情報を出力させること、高い音で情報を出力させること、数秒前に出力された情報を出力すること等、またはこれらの組み合わせである。 Next, the automatic response device 40A refers to the instruction correspondence information 76 and determines the output mode of the advertisement information that matches the combination of the identified user, the identified environment pattern, and the information contained in the input voice (the content of the instruction) (S74). The content of the instruction is what the user has requested regarding the output of information, for example, raising the volume, outputting the information slowly, outputting the information at a higher pitch, outputting again the information that was output a few seconds earlier, or a combination of these.

図１１は、指示対応情報７６の内容の一例を示す図である。指示対応情報７６は、利用者によって行われた指示に対して、どのような出力態様で情報を出力するかを決定するのに用いられる情報である。指示対応情報７６は、例えば、環境パターンごとに、利用者ＩＤ、指示の内容、および出力態様が互いに対応付けられた情報である。 FIG. 11 is a diagram showing an example of the contents of the instruction correspondence information 76. The instruction correspondence information 76 is used to determine the output mode in which information is output in response to an instruction given by the user. The instruction correspondence information 76 is, for example, information in which a user ID, the content of an instruction, and an output mode are associated with one another for each environment pattern.
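The lookup in S74 can be pictured as a table keyed by environment pattern, user ID, and instruction content. The sketch below uses a plain dictionary; the table `INSTRUCTION_TABLE`, its keys and values, and the function `decide_output_mode` are hypothetical examples, not the actual layout of the instruction correspondence information 76.

```python
# Hypothetical rows: (environment pattern, user ID, instruction) -> output mode.
INSTRUCTION_TABLE = {
    ("pattern-1", "user-001", "volume_up"): {"volume": 0.9, "tempo": 1.0},
    ("pattern-1", "user-001", "slower"): {"volume": 0.6, "tempo": 0.8},
    ("pattern-2", "user-001", "volume_up"): {"volume": 0.7, "tempo": 1.0},
}


def decide_output_mode(pattern: str, user_id: str, instruction: str,
                       table=INSTRUCTION_TABLE, default=None):
    """Return the output mode registered for this combination, or `default` if none."""
    return table.get((pattern, user_id, instruction), default)
```

Keying on the environment pattern lets the same instruction (for example "volume_up") resolve to different output modes at home and at the office.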

次に、自動応答装置４０Ａは、端末装置１０に決定した出力態様で広告の情報を出力するように指示する（Ｓ７６）。次に、端末装置１０は、自動応答装置４０Ａの指示に基づいて、決定された出力態様で広告の情報をスピーカ１４に出力させる（Ｓ６２）。これにより本フローチャートの１ルーチンの処理が終了する。 Next, the automatic response device 40A instructs the terminal device 10 to output the advertisement information in the determined output mode (S76). Next, the terminal device 10 causes the speaker 14 to output the advertisement information in the determined output mode based on the instruction of the automatic response device 40A (S62). Thereby, the process of one routine of this flowchart is completed.

上述したように、自動応答装置４０が、利用者の求めに応じて出力態様を変更するため、利用者に違和感を与えないように情報を提供することができる。 As described above, since the automatic response device 40 changes the output mode according to the user's request, information can be provided so as not to give the user a sense of incongruity.

［具体例（その２−１）］ [Specific example (2-1)]
図１２は、第２実施形態の利用者と自動応答装置４０Ａとの会話の一例を示す図である。例えば、図１２（Ａ）に示すように、（１）利用者が「新しい車が欲しいな。」とマイク１２に入力する。（２）自動応答装置４０Ａは、第１キャラクターの出力態様で、「どんな車が欲しいの？」と応答する。 FIG. 12 is a diagram showing an example of a conversation between the user and the automatic response device 40A in the second embodiment. For example, as shown in FIG. 12(A), (1) the user says "I want a new car." into the microphone 12. (2) The automatic response device 40A responds "What kind of car do you want?" in the output mode of the first character.

次に、図１２（Ｂ）に示すように、（３）利用者が「燃費のいい車がいいな。」とマイク１２に入力する。（４）自動応答装置４０Ａは、第１キャラクターの出力態様で、「節約できるからいいよね。」と応答する。 Next, as shown in FIG. 12(B), (3) the user says "I'd like a car with good fuel economy." into the microphone 12. (4) The automatic response device 40A responds "That's nice, since you can save money." in the output mode of the first character.

次に、例えば、数秒程度、利用者によって発話がされない場合、図１２（Ｃ）に示すように、（５）自動応答装置４０Ａは、第２キャラクターの出力態様であり、且つ第１出力態様で、「車Ａをおすすめします。・・・・」と発話する。 Next, for example, if the user does not speak for several seconds, then as shown in FIG. 12(C), (5) the automatic response device 40A says "We recommend car A. ..." in the output mode of the second character and in the first output mode.

（６）利用者は、上記（５）で出力された情報に興味を持っていたが音量が小さいため聞こえなかったことから、「聞こえないよ。」と発話する。そうすると、（７）自動応答装置４０Ａは、第２キャラクターの出力態様であり、且つ音量を上げて、上記（５）で出力させた情報を端末装置１０に出力させる。すなわち、第２キャラクターが「車Ａをおすすめします。・・・」と、再度、発話する。 (6) The user was interested in the information output in (5) above but could not hear it because the volume was low, and so says "I can't hear you." Then, (7) the automatic response device 40A causes the terminal device 10 to output the information output in (5) again, in the output mode of the second character and at a higher volume. That is, the second character says "We recommend car A. ..." once more.

このように、第２キャラクターが情報を出力する場合の出力態様を、第１キャラクターが情報を出力する場合の出力態様よりも、利用者が聞き取りにくくすることにより、利用者に煩わしさを感じさせることを抑制することができる。また、利用者の求めに応じ、第２キャラクターが情報を出力する場合の出力態様を、利用者が聞き取りやすいようにすることにより、利用者にとっての利便性を向上させることができる。 In this way, the output mode when the second character outputs information is made more difficult for the user to hear than the output mode when the first character outputs information. Can be suppressed. Further, convenience for the user can be improved by making it easy for the user to hear the output mode when the second character outputs information in response to the user's request.

なお、上述した説明では、一例として、利用者が音声を入力した場合に、特定情報が出力される例について説明したが、単に自動応答装置４０Ａが特定情報を出力する場合において、上記のように出力態様が制御されてもよい。また、例えば、出力したい特定情報に基づいて、第１のキャラクターと第２のキャラクターの会話が選択されてもよい。 In the above description, the example in which the specific information is output when the user inputs a voice has been described as an example. However, when the automatic response device 40A simply outputs the specific information, as described above. The output mode may be controlled. Also, for example, the conversation between the first character and the second character may be selected based on the specific information to be output.

［具体例（その２−２）］ [Specific example (2-2)]
図１３は、広告の情報が出力される際の音量の変化を示す図である。図１３の縦軸は音の大きさを示し、図１３の横軸は時間を示している。以下で説明する広告Ａ〜Ｃの各広告の長さ（時間）は、例えば所定秒（例えば１５秒程度）である。広告Ａ〜Ｃの順で広告の情報が出力される予定であるものとする。この場合において、例えば、広告Ａが出力され、広告Ｂが出力され、広告Ｂの内容が出力されている途中（図１３の時刻Ｔ）で、利用者が音量を上げることを指示した。自動応答装置４０Ａは、時刻Ｔにおいて、広告Ｂの内容を最初から端末装置１０に出力させる。すなわち、所定時間遡った部分や音量を絞った部分から、広告Ｂが再出力される。また、その後、自動応答装置４０Ａは、図示するように広告Ｂの内容が出力された後、音量を上げる前の音量に下げてもよいし、音量を上げた状態を維持してもよい。 FIG. 13 is a diagram showing the change in volume when advertisement information is output. The vertical axis in FIG. 13 indicates loudness and the horizontal axis indicates time. Each of the advertisements A to C described below has a length of, for example, a predetermined number of seconds (for example, about 15 seconds). Assume that the advertisement information is scheduled to be output in the order A, B, C. Suppose that advertisement A is output, advertisement B starts, and the user instructs that the volume be raised while the content of advertisement B is being output (time T in FIG. 13). At time T, the automatic response device 40A causes the terminal device 10 to output the content of advertisement B from the beginning. That is, advertisement B is output again starting from a portion a predetermined time back, or from the portion that was output at reduced volume. After the content of advertisement B has been output as illustrated, the automatic response device 40A may lower the volume to its level before it was raised, or may keep the volume raised.
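Finding which advertisement is playing at time T, and where it started, can be sketched with cumulative durations. The function `advertisement_at` and the 15-second durations are illustrative assumptions based on the "about 15 seconds" example; the source does not prescribe how the restart point is computed.

```python
from bisect import bisect_right
from itertools import accumulate


def advertisement_at(time_s: float, durations):
    """Return (index, start_time) of the advertisement playing at `time_s`.

    `durations` lists each advertisement's length in seconds, in playback order.
    """
    ends = list(accumulate(durations))
    if not ends or not 0 <= time_s < ends[-1]:
        raise ValueError("time is outside the scheduled playback")
    index = bisect_right(ends, time_s)
    start = ends[index - 1] if index else 0
    return index, start


# Advertisements A, B, C, each 15 seconds long (illustrative values).
DURATIONS = [15, 15, 15]
```

With these durations, an instruction at T = 22 s falls inside advertisement B (index 1), so playback would restart from 15 s at the raised volume.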

上述したように、自動応答装置４０Ａが、利用者により指示がされた場合に、指示された際に出力していた広告を最初から出力させるため、利用者は所望の情報を取得することができる。 As described above, when the user gives an instruction, the automatic response device 40A outputs from the beginning the advertisement that was being output at the time of the instruction, so the user can obtain the desired information.

なお、上述した例では、利用者の指示に基づいて、内容Ｂを最初から出力するものとしたが、広告Ａの最初から出力してもよいし、利用者の指示がされたときから所定時間前に出力されていた情報から出力してもよい。また、利用者の発話の内容（例えば切迫度）に基づいて、再出力させる情報が決定されてもよい。また、自動応答装置４０Ａは、過去の利用者の指示の傾向または予め設定された条件に基づいて、利用者の指示がされたときから、どの程度前から広告を再度再生するかを決定してもよい。 In the above example, the content of advertisement B is output from the beginning based on the user's instruction, but output may instead start from the beginning of advertisement A, or from the information that was being output a predetermined time before the instruction was given. The information to be output again may also be determined based on the content of the user's utterance (for example, its urgency). The automatic response device 40A may also determine how far back from the time of the instruction to replay the advertisement, based on the tendency of the user's past instructions or on preset conditions.

［その他］ [Others]
提供制御部５０は、特定情報の属性に基づいて、特定情報の出力態様を、第１出力態様に変更して特定情報を出力部に出力させてもよい。特定情報の属性とは、広告に関する情報、機器の操作に関する情報、楽曲、およびユーザに関連する期限に関する情報（パスワードの変更期限などの情報）のうち、少なくとも一つを含む。例えば、提供制御部５０は、広告に関する情報の出力態様を第１出力態様に変更し、他の属性の特定情報は出力態様を変更しなくてもよい。 The provision control unit 50 may change the output mode of the specific information to the first output mode based on an attribute of the specific information and cause the output unit to output the specific information. The attributes of the specific information include at least one of: information about an advertisement, information about device operation, music, and information about a deadline relevant to the user (such as a password change deadline). For example, the provision control unit 50 may change the output mode of advertisement-related information to the first output mode while leaving the output mode of specific information with other attributes unchanged.

提供制御部５０は、広告の種別に基づいて特定情報の出力態様を、第１出力態様に変更して特定情報を出力部に出力させてもよい。広告の種別とは、例えば、広告に対応する商品の種別である。例えば、提供制御部５０は、車の広告の出力態様については、第１出力態様に変更するが、不動産の広告の出力態様については、第１出力態様に変更せずに、出力部に出力させてもよい。 The provision control unit 50 may change the output mode of the specific information to the first output mode based on the type of the advertisement and cause the output unit to output the specific information. The type of advertisement is, for example, the type of product to which the advertisement corresponds. For example, the provision control unit 50 may change the output mode of a car advertisement to the first output mode, while causing the output unit to output a real estate advertisement without changing it to the first output mode.

また、提供制御部５０は、広告の種別と、過去に行われた利用者の指示の結果とに基づいて、特定情報の出力態様を、第１出力態様に変更して特定情報を出力部に出力させてもよい。例えば、学習部５２が、広告の種別と、過去に行われた利用者の指示の結果とを学習する。例えば、学習部５２は、車の広告が出力された場合、利用者はボリュームのアップを指示したが、不動産の広告が出力された場合、利用者はボリュームのダウンを指示したことを学習する。この場合、例えば、提供制御部５０は、車の広告の出力態様については、第１出力態様に変更するが、不動産の広告の出力態様については、第１出力態様に変更せずに、出力部に出力させてもよい。 Further, the provision control unit 50 may change the output mode of the specific information to the first output mode, and cause the output unit to output the specific information, based on the type of advertisement and the results of instructions the user gave in the past. For example, the learning unit 52 learns the types of advertisements and the results of the user's past instructions. For example, the learning unit 52 learns that the user instructed a volume increase when a car advertisement was output, but instructed a volume decrease when a real estate advertisement was output. In this case, for example, the provision control unit 50 may change the output mode of a car advertisement to the first output mode, while causing the output unit to output a real estate advertisement without changing it to the first output mode.
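One way to picture the decision above is a per-type tally of past instructions. The sketch below is an assumption-laden illustration: the class name `InstructionHistory`, the labels `"volume_up"` and `"volume_down"`, and the rule "use the first output mode when volume-up instructions outnumber volume-down instructions" are hypothetical; the specification only gives the car and real estate example.

```python
from collections import Counter, defaultdict


class InstructionHistory:
    """Tally of past user instructions per advertisement type (hypothetical)."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, ad_type, instruction):
        # instruction is an assumed label: "volume_up" or "volume_down"
        self._counts[ad_type][instruction] += 1

    def use_first_mode(self, ad_type):
        # Assumed rule: apply the first output mode only to advertisement
        # types for which the user has mostly asked for a volume increase.
        counts = self._counts[ad_type]
        return counts["volume_up"] > counts["volume_down"]


history = InstructionHistory()
history.record("car", "volume_up")
history.record("car", "volume_up")
history.record("real_estate", "volume_down")
```

Here `history.use_first_mode("car")` is true and `history.use_first_mode("real_estate")` is false, matching the example in the text; an advertisement type with no history falls back to not using the first output mode.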

また、提供制御部５０は、上記の考え方を採用して、利用者に対応する環境パターンに基づいて、特定情報の出力態様を、第１出力態様に変更してもよい。例えば、ある環境においては、第１出力態様で特定情報が出力されることが利用者にとって好ましいことが学習部５２により学習される。提供制御部５０は、学習結果に基づいて、特定情報を第１出力態様で出力する。 Further, the provision control unit 50 may adopt the above-described concept and change the output mode of the specific information to the first output mode based on the environment pattern corresponding to the user. For example, in a certain environment, the learning unit 52 learns that it is preferable for the user that the specific information is output in the first output mode. The providing control unit 50 outputs the specific information in the first output mode based on the learning result.

また、利用者により指定された情報（例えば所定の属性の情報）の出力態様については、第１出力態様に変更し、指定されていない情報の出力態様については第１出力態様に変更しなくてもよい。 Further, the output mode of information designated by the user (for example, information with a predetermined attribute) may be changed to the first output mode, while the output mode of information that has not been designated need not be changed to the first output mode.

また、指示対応情報７６は、学習部５２により生成される。例えば、学習部５２は、第１出力態様で特定情報が出力部に出力された後、環境パターンごとに、利用者により受けた指示の内容および指示に基づいて変更された特定情報の出力態様を学習する。そして、学習部５２は、所定の環境パターンにおいて、特定情報の出力態様をどのように変更させたかを学習して、利用者の嗜好に合致する指示対応情報７６を生成する。 The instruction correspondence information 76 is generated by the learning unit 52. For example, after the specific information has been output to the output unit in the first output mode, the learning unit 52 learns, for each environment pattern, the content of the instruction received from the user and the output mode to which the specific information was changed based on that instruction. The learning unit 52 thus learns how the output mode of the specific information was changed in a given environment pattern, and generates instruction correspondence information 76 that matches the user's preferences.

例えば、学習部５２は、土曜日や、時間帯が７時〜８時、利用者の周囲に親が存在している場合、利用者が自宅にいる場合、またはプライベートのスケジュールが予定されている時間帯において、利用者により受けた指示の内容および指示に基づいて変更した特定情報の出力態様を学習し、学習結果に基づいて、指示対応情報７６を生成する。例えば、利用者が、所定の環境パターンにおいてボリューム「１０」で特定情報の出力させる傾向にある場合、指示対応情報７６において、ボリュームの変更指示がされた場合の第２出力態様はボリューム「１０」に設定される。 For example, the learning unit 52 learns the content of the instructions received from the user, and the output mode of the specific information as changed based on those instructions, for conditions such as Saturdays, the time slot from 7:00 to 8:00, a parent being present near the user, the user being at home, or a time slot in which a private schedule is planned, and generates the instruction correspondence information 76 based on the learning result. For example, when the user tends to have the specific information output at volume "10" in a given environment pattern, the instruction correspondence information 76 sets the second output mode used when a volume change instruction is given to volume "10".
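The generation of the instruction correspondence information can be illustrated as a frequency table keyed by environment pattern and instruction. This is a minimal sketch under stated assumptions: the function name, the tuple layout of `observations`, the pattern label, and the use of "most frequent volume" as the learned tendency are all hypothetical; the specification does not say how the learning unit 52 models the tendency.

```python
from collections import Counter, defaultdict


def build_instruction_table(observations):
    """Map (environment_pattern, instruction) to the most frequent volume.

    observations: iterable of (environment_pattern, instruction, volume)
    tuples, an assumed record format for what the learning unit observes.
    """
    tallies = defaultdict(Counter)
    for pattern, instruction, volume in observations:
        tallies[(pattern, instruction)][volume] += 1
    # The most frequently chosen volume becomes the second output mode.
    return {key: counts.most_common(1)[0][0] for key, counts in tallies.items()}


observations = [
    ("saturday_7_to_8_at_home", "volume_change", 10),
    ("saturday_7_to_8_at_home", "volume_change", 10),
    ("saturday_7_to_8_at_home", "volume_change", 8),
]
table = build_instruction_table(observations)
```

In this example the table maps a volume change instruction in the Saturday-morning-at-home pattern to volume 10, which corresponds to the volume "10" example in the text.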

以上説明した第２実施形態によれば、提供制御部５０は、特定情報の出力態様を、応答内容の出力態様よりも利用者が聞き取りにくい第１出力態様に変更して出力部に出力させた後、利用者の指示を受け付けた場合に、特定情報の出力態様を、第１出力態様よりも利用者が聞き取りやすい第２出力態様に変更して、特定情報を出力部に出力させることにより、利用者に違和感を与えないように情報を提供することができる。 According to the second embodiment described above, the provision control unit 50 changes the output mode of the specific information to the first output mode, which is harder for the user to hear than the output mode of the response content, and causes the output unit to output it; when an instruction from the user is then received, it changes the output mode of the specific information to the second output mode, which is easier for the user to hear than the first output mode, and causes the output unit to output the specific information. Information can thereby be provided without giving the user a sense of discomfort.
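The relationship between the three output modes can be sketched as follows. The sketch treats an output mode as a volume level only, which is one of the options the claims mention; the concrete volume numbers, the `OutputMode` type, and the `plan_output` function are hypothetical and serve only to make the ordering of modes concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputMode:
    name: str
    volume: int  # assumed scale; larger is easier to hear


# Hypothetical values: only their ordering reflects the text.
THIRD_MODE = OutputMode("third", volume=8)    # response content
FIRST_MODE = OutputMode("first", volume=3)    # harder to hear than third
SECOND_MODE = OutputMode("second", volume=8)  # easier to hear than first


def plan_output(response, specific_info, instruction_received):
    """Return the (text, mode) pairs output in order."""
    plan = [(response, THIRD_MODE), (specific_info, FIRST_MODE)]
    if instruction_received:
        # After the user's instruction, output the specific information
        # again in the easier-to-hear second output mode.
        plan.append((specific_info, SECOND_MODE))
    return plan


plan = plan_output("It will be sunny today.", "car advertisement", True)
```

Without an instruction the specific information stays in the quiet first output mode, which is the behavior the embodiment relies on to avoid bothering the user.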

例えば、自動応答装置と利用者との対話の延長にそのまま音声広告を出力すると、煩わしく思われたり、ステルスマーケティング（ステマ）とみなされてしまったりする場合があるが、本実施形態のように、特定情報を利用者が聞き取りにくい第１出力態様に変更して出力部に出力させた後、利用者の指示によって特定情報を第２出力態様に変更して出力部に出力させることにより、煩わしいと感じさせることを抑制したり、ステルスマーケティングとみなされること抑制する。 For example, if a voice advertisement is output as a direct continuation of the dialogue between the automatic response device and the user, it may be found bothersome or be regarded as stealth marketing ("sutema"). In this embodiment, however, the specific information is first output in the first output mode, which is hard for the user to hear, and is changed to the second output mode and output only in response to the user's instruction. This keeps the user from feeling bothered and keeps the output from being regarded as stealth marketing.

＜第３実施形態＞ <Third Embodiment>
以下、第３実施形態について説明する。提供制御部５０は、応答内容を出力する第１のキャラクターと、特定情報を出力する第２のキャラクターとの会話を出力部に出力させる。第１実施形態との相違点を中心に説明する。 The third embodiment will be described below. The provision control unit 50 causes the output unit to output a conversation between a first character that outputs the response content and a second character that outputs the specific information. The description focuses on the differences from the first embodiment.

図１４は、第３実施形態の情報処理システム１Ｂの機能構成の一例を示す図である。情報処理システム１Ｂは、例えば、端末装置１０Ｂと、自動応答装置４０Ｂと、広告提供装置８０Ｂとを備える。 FIG. 14 is a diagram illustrating an example of a functional configuration of the information processing system 1B according to the third embodiment. The information processing system 1B includes, for example, a terminal device 10B, an automatic response device 40B, and an advertisement providing device 80B.

端末装置１０Ｂは、第１実施形態の端末装置１０の機能構成に加え、更に表示部１５と、画像生成部１９とを備える。表示部１５は、画像生成部１９の制御に基づいて、画像を表示する。画像生成部１９は、自動応答装置４０Ｂにより送信された情報に基づいて、表示部１５に画像を表示させる。例えば、音声生成部１８と画像生成部１９とは、自動応答装置４０Ｂにより送信された情報に基づいて、表示部１５に表示される画像の内容と、スピーカ１４に出力される音声の内容とが意図したタイミングになるように協調して、スピーカ１４および表示部１５を制御する。以下、音声生成部１８と画像生成部１９とを合わせたものを、「生成部１７」と称する。 The terminal device 10B includes a display unit 15 and an image generation unit 19 in addition to the functional configuration of the terminal device 10 of the first embodiment. The display unit 15 displays an image under the control of the image generation unit 19. The image generation unit 19 causes the display unit 15 to display an image based on the information transmitted by the automatic response device 40B. For example, based on the information transmitted by the automatic response device 40B, the sound generation unit 18 and the image generation unit 19 cooperate to control the speaker 14 and the display unit 15 so that the content of the image displayed on the display unit 15 and the content of the sound output from the speaker 14 occur at the intended timing. Hereinafter, the sound generation unit 18 and the image generation unit 19 together are referred to as the "generation unit 17".

自動応答装置４０Ｂは、第１実施形態の自動応答装置４０の機能構成に加え、更に画像提供部４９を備え、第１実施形態の第１記憶部６０に代えて、第１記憶部６０Ｂを備える。第１記憶部６０Ｂは、例えば、第１実施形態の第１記憶部６０に記憶された情報に加え、更にモーション情報６９が記憶されている。モーション情報６９は、利用者と会話するキャラクターの動きが規定された情報である。画像提供部４９は、モーション情報６９に含まれる情報、または広告提供装置８０Ｂにより提供された情報に基づいて、端末装置１０Ｂに表示される画像を生成するための情報を端末装置１０に提供する。画像を生成するための情報には、スピーカ１４に出力される発話に対して、画像を変化させるタイミングが対応付けられている。以下、応答部４８と画像提供部４９とを合わせたものを、「応答提供部４７」と称する。 In addition to the functional configuration of the automatic response device 40 of the first embodiment, the automatic response device 40B further includes an image providing unit 49, and includes a first storage unit 60B in place of the first storage unit 60 of the first embodiment. The first storage unit 60B stores, for example, motion information 69 in addition to the information stored in the first storage unit 60 of the first embodiment. The motion information 69 is information that defines the movements of the character conversing with the user. The image providing unit 49 provides the terminal device 10 with information for generating the image to be displayed on the terminal device 10B, based on the information included in the motion information 69 or the information provided by the advertisement providing device 80B. In the information for generating the image, the timing at which the image is changed is associated with the utterance output from the speaker 14. Hereinafter, the response unit 48 and the image providing unit 49 together are referred to as the "response providing unit 47".

広告提供装置８０Ｂは、第１実施形態の広告提供装置側記憶部９０に代えて、広告提供装置側記憶部９０Ｂを備える。広告提供装置側記憶部９０は、例えば、広告情報９２Ｂを備える。広告情報９２Ｂは、第１実施形態の広告情報９２の情報に加え、更に広告モーション情報９３を備える。広告モーション情報９３は、広告ＩＤに対応付けられたキャラクターの動きが規定された情報である。 The advertisement providing device 80B includes an advertisement providing device side storage unit 90B in place of the advertisement providing device side storage unit 90 of the first embodiment. The advertisement provision device side storage unit 90 includes, for example, advertisement information 92B. The advertisement information 92B further includes advertisement motion information 93 in addition to the information of the advertisement information 92 of the first embodiment. The advertisement motion information 93 is information that defines the movement of the character associated with the advertisement ID.

［フローチャート］ [Flowchart]
図１５は、自動応答装置４０Ｂにより実行される処理の流れの一例を示すフローチャートである。まず、応答提供部４７が、第１キャラクターと第２キャラクターとを会話させる（Ｓ８０）。次に、広告提供部４７は、第２キャラクターに広告の情報を出力させる（Ｓ８２）。 FIG. 15 is a flowchart showing an example of the flow of processing executed by the automatic response device 40B. First, the response providing unit 47 causes the first character and the second character to converse (S80). Next, the response providing unit 47 causes the second character to output advertisement information (S82).

次に、自動応答装置４０Ｂは、出力された広告の情報（第１の特定情報）に応じて利用者が音声を入力したか否かを判定する（Ｓ８４）。なお、音声に代えて、所定の操作がされたか否かが判定されてもよい。利用者が音声を入力していない場合、本フローチャートの１ルーチンの処理が終了する。 Next, the automatic response device 40B determines whether or not the user has input voice according to the output advertisement information (first specific information) (S84). Note that, instead of the voice, it may be determined whether a predetermined operation has been performed. When the user has not input a voice, the process of one routine of this flowchart ends.

利用者が音声を入力した場合、自動応答装置４０Ｂは、利用者が広告の情報の出力に対して煩わしいと感じているか否かを判定する（Ｓ８６）。「煩わしいと感じている」とは、例えば、入力された音声に含まれる情報が広告の情報の出力に関して、否定的な意味を有していることである。より具体的には、例えば、「静かにして」、「やめて」、「音を下げて」などの意味を有する発話がされた場合、利用者が煩わしいと感じていると判定される。利用者が煩わしいと感じていない場合、本フローチャートの１ルーチンの処理が終了する。なお、Ｓ８６で煩わしいと感じていない場合、自動応答装置４０Ｂは、第１の特定情報よりも詳細な情報である第２の特定情報を出力部に出力させる。詳細な情報とは、例えば、第１の特定情報が商品名や商品の属性である場合、その説明的な内容である。 When the user has input a voice, the automatic response device 40B determines whether the user finds the output of the advertisement information bothersome (S86). "Finds it bothersome" means, for example, that the information contained in the input voice has a negative meaning regarding the output of the advertisement information. More specifically, when an utterance meaning, for example, "be quiet", "stop", or "turn the sound down" is made, it is determined that the user finds the output bothersome. If the user does not find it bothersome, the processing of one routine of this flowchart ends. Note that when the user is determined in S86 not to find the output bothersome, the automatic response device 40B causes the output unit to output second specific information, which is more detailed than the first specific information. When the first specific information is, for example, a product name or a product attribute, the detailed information is explanatory content about it.

利用者が煩わしいと感じている場合、応答提供部４７は、広告の情報を出力させることを停止する（Ｓ８８）。なお、停止に代えて、利用者の反応に基づいて出力態様を変更させてもよい。例えば、利用者が「音を下げて」と入力した場合、広告の情報が出力される音が小さく制御される。これにより本フローチャートの１ルーチンの処理が終了する。 When the user finds it bothersome, the response providing unit 47 stops the output of the advertisement information (S88). Instead of stopping, the output mode may be changed based on the user's reaction. For example, when the user inputs "turn the sound down", the volume at which the advertisement information is output is reduced. The processing of one routine of this flowchart then ends.
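Steps S80 to S88 can be traced with a small function that returns the actions taken in one pass. This is a rough sketch, not the device's actual logic: the substring matching in `is_bothered`, the English phrase list, and the action strings are assumptions made for illustration; the specification only says that utterances with a negative meaning count as the user being bothered.

```python
# Assumed English stand-ins for the negative utterances named in the text.
NEGATIVE_PHRASES = ("be quiet", "stop", "turn the sound down")


def is_bothered(utterance):
    """Very rough stand-in for the S86 judgment (substring match only)."""
    text = utterance.lower()
    return any(phrase in text for phrase in NEGATIVE_PHRASES)


def run_routine(user_utterance):
    """Return the actions taken in one pass of the flowchart."""
    actions = [
        "S80: first and second characters converse",
        "S82: second character outputs advertisement information",
    ]
    if user_utterance is None:       # S84: no voice input, routine ends
        return actions
    if is_bothered(user_utterance):  # S86: negative reaction
        actions.append("S88: stop advertisement output")
    else:
        # Note in the text: output more detailed second specific information.
        actions.append("output second specific information")
    return actions
```

For example, `run_routine("Please stop")` ends with the S88 action, while `run_routine(None)` contains only S80 and S82. A real system would need a proper intent classifier instead of substring matching.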

上述したように、キャラクター同士が会話をして広告の情報を出力させることにより、利用者に対して、より情報に対する興味を持たせることができる。また、利用者の反応に応じて、情報の出力を抑制するため、利用者にとっての利便性が向上する。 As described above, it is possible for the user to have more interest in information by causing the characters to have a conversation and output the information of the advertisement. In addition, since the output of information is suppressed according to the reaction of the user, the convenience for the user is improved.

［具体例（その３−１）］ [Specific example (3-1)]
図１６は、第３実施形態の会話および表示部１５に表示される画像の一例を示す図（その１）である。提供制御部は、利用者に提供した情報に基づいて、第１キャラクターと第２キャラクターとを会話させる。例えば、図１６に示すように、（１）第２キャラクターＣＲ２が「今日の天気はどう？」と発話する。（２）第１キャラクターＣＲ１が、「予報では快晴だよ。」と応答する。 FIG. 16 is a diagram (part 1) showing an example of the conversation in the third embodiment and of the image displayed on the display unit 15. The provision control unit causes the first character and the second character to converse based on the information provided to the user. For example, as shown in FIG. 16, (1) the second character CR2 says, "How's the weather today?" (2) The first character CR1 responds, "The forecast says clear skies."

次に、（３）第２キャラクターＣＲ２が「ドライブ日和だね。」と発話する。次に、（４）第１キャラクターＣＲ１が、「そうだね。」と応答する。次に、（５）第２キャラクターＣＲ２が、「そういえば、ドライブするのに最適な車が発売されたよ。」と発話する。 Next, (3) the second character CR2 says, "Perfect weather for a drive, isn't it?" Next, (4) the first character CR1 responds, "It sure is." Next, (5) the second character CR2 says, "Speaking of which, a car that's ideal for driving has just gone on sale."

このように、キャラクター同士で会話させて、商品を紹介することにより、利用者により自然に商品に興味を持たせることができる。 In this way, by letting the characters talk and introduce the product, the user can naturally have an interest in the product.

［具体例（その３−２）］ [Specific example (3-2)]
例えば、自動応答装置４０Ｂは、第１キャラクターと利用者との会話に基づいて、利用者の好みや、嗜好、行動予定等の嗜好情報を取得する。嗜好情報とは、例えば、利用者の趣味や、利用頻度が高い施設または場所、購入頻度が高い商品、購入を希望している商品またはサービス等の情報である。 For example, based on the conversation between the first character and the user, the automatic response device 40B acquires preference information such as the user's likes, tastes, and planned activities. The preference information is, for example, information such as the user's hobbies, frequently used facilities or places, frequently purchased products, and products or services the user wishes to purchase.

図１７は、第３実施形態の会話および表示部１５に表示される画像の一例を示す図（その２）である。提供制御部５０は、例えば、利用者と第１キャラクターとの会話に含まれる会話情報を第２キャラクターにより出力される特定情報の内容に反映させるか否かを利用者に問い合わせ、利用者に許諾を得た場合、会話情報を特定情報の内容に反映させる。 FIG. 17 is a diagram (part 2) showing an example of the conversation in the third embodiment and of the image displayed on the display unit 15. The provision control unit 50, for example, asks the user whether conversation information contained in the conversation between the user and the first character may be reflected in the content of the specific information output by the second character, and reflects the conversation information in that content when the user grants permission.

例えば、図１７に示すように、（１）第１キャラクターＣＲ１が「利用者Ａさん。利用者Ａさんが車の購入を考えていること他の人に教えていい？」と発話する。この発話に対して、利用者Ａさんが「いいよ。」と回答したものとする。（２）第１キャラクターＣＲ１が、「いいんだね。他の人に教えておくね。きっといい車が見つかるよ！」と応答する。このように、第１キャラクターが利用者の興味関心、傾向などの情報を第２キャラクターに提供することで、第２キャラクターが出力する情報を最適化する。 For example, as shown in FIG. 17, (1) the first character CR1 says, "User A, may I tell other people that you are thinking of buying a car?" Assume that user A answers, "Sure." (2) The first character CR1 responds, "All right, I'll let them know. I'm sure you'll find a good car!" In this way, the first character provides the second character with information such as the user's interests and tendencies, which optimizes the information output by the second character.

図１８は、第３実施形態の会話および表示部１５に表示される画像の一例を示す図（その３）である。前述した図１７の（２）の応答後、所定のタイミングで以下の会話が行われる。（１）第２キャラクターＣＲ２が、例えば表示部１５に表示されていない状態で「ごめんください。」と発話する。次に、（２）第１キャラクターＣＲ１が、「どなたですか？」と応答する。次に、（３）第２キャラクターＣＲ２が、「少しお時間よろしいでしょうか？」と発話する。次に、（４）第１キャラクターＣＲ１が、「利用者Ａさん、どなたか尋ねてきましたよ。入れてもいいですか？」と発話する。この発話に対して、利用者Ａさんが、「入れていいよ。」と回答したものとする。次に、（５）第１キャラクターＣＲ１が、利用者Ａさんの発話に応じて、「お入りください。」と発話する。その後、表示部１５に図１９に示す画像が表示される。 FIG. 18 is a diagram (part 3) showing an example of the conversation in the third embodiment and of the image displayed on the display unit 15. After the response in (2) of FIG. 17 described above, the following conversation takes place at a predetermined timing. (1) The second character CR2, while for example not yet displayed on the display unit 15, says, "Hello, is anyone home?" Next, (2) the first character CR1 responds, "Who is it?" Next, (3) the second character CR2 says, "May I have a moment of your time?" Next, (4) the first character CR1 says, "User A, someone has come to visit. May I let them in?" Assume that user A answers, "Yes, let them in." Next, (5) the first character CR1 says, "Please come in," in response to user A's utterance. The image shown in FIG. 19 is then displayed on the display unit 15.

図１９は、第３実施形態の会話および表示部１５に表示される画像の一例を示す図（その４）である。（１）第２キャラクターＣＲ２が、例えば表示部１５に表示された状態で「お車をお探しであることをお伺いしたので、ご紹介に参りました。」と発話する。次に、（２）第１キャラクターＣＲ１が、「利用者Ａさん、お話聞いてみますか？」と応答する。この応答に対して、利用者が肯定的な発話を行った場合、例えば、第２キャラクターＣＲ２は、商品を紹介する。この応答に対して、利用者が否定的な発話を行った場合、例えば、第２キャラクターＣＲ２は、商品の紹介を行わず、姿を消す。 FIG. 19 is a diagram (part 4) showing an example of the conversation in the third embodiment and of the image displayed on the display unit 15. (1) The second character CR2, while for example displayed on the display unit 15, says, "I heard that you are looking for a car, so I have come to introduce one." Next, (2) the first character CR1 responds, "User A, would you like to hear what they have to say?" If the user gives an affirmative reply, the second character CR2, for example, introduces the product. If the user gives a negative reply, the second character CR2, for example, disappears without introducing the product.

このように、嗜好情報の取扱いについて、許可が得られた場合に、利用者の嗜好情報に応じた広告の情報が出力されるため、利用者に煩わしさを感じさせることを抑制しつつ、利用者にとっての利便性を向上させることができる。 In this way, advertisement information matching the user's preference information is output when permission has been obtained for handling that preference information, which improves convenience for the user while keeping the user from feeling bothered.
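The permission step can be sketched as a gate between what the first character has learned and what the second character may use. Everything named here is hypothetical: the functions `specific_info_context` and `choose_advertisement`, the `"wants_to_buy"` key, and the catalog contents are illustrative assumptions, not structures from the specification.

```python
def specific_info_context(conversation_info, user_permitted):
    """Return the conversation details the second character may use.

    Nothing is passed on unless the user has granted permission.
    """
    return dict(conversation_info) if user_permitted else {}


def choose_advertisement(catalog, context, default_ad):
    """Pick an advertisement matching the permitted context, if any."""
    interest = context.get("wants_to_buy")  # assumed key name
    return catalog.get(interest, default_ad)


conversation_info = {"wants_to_buy": "car"}
catalog = {"car": "new car introduction"}

with_permission = choose_advertisement(
    catalog, specific_info_context(conversation_info, True), "generic advertisement"
)
without_permission = choose_advertisement(
    catalog, specific_info_context(conversation_info, False), "generic advertisement"
)
```

Copying the dictionary before passing it on keeps the second character from modifying the first character's record; whether the real system separates the two this way is not stated in the text.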

なお、上述した例では、第１キャラクターＣＲ１と第２キャラクターＣＲ２とが会話する例について説明したが、これに代えて（または加えて）第２キャラクターＣＲ２と、第３キャラクターとが会話してもよい。第３キャラクターは、例えば、第２キャラクターＣＲ２がおすすめする商品（またはサービス）と競合する（または関連する）商品（またはサービス）を宣伝するキャラクターである。 In the example described above, the first character CR1 and the second character CR2 converse, but instead of (or in addition to) this, the second character CR2 may converse with a third character. The third character is, for example, a character that promotes a product (or service) that competes with (or is related to) the product (or service) recommended by the second character CR2.

以上説明した第３実施形態によれば、提供制御部５０は、第１のキャラクターに応じた出力態様によって応答内容を出力部に出力させ、第２のキャラクターに応じた出力態様によって特定情報を出力部に出力させ、第１のキャラクターと第２のキャラクターとの会話を出力部に出力させることにより、よりユーザに情報に対する興味を喚起させることができる。 According to the third embodiment described above, the provision control unit 50 causes the output unit to output the response content in the output mode according to the first character, and outputs the specific information in the output mode according to the second character. By causing the unit to output and causing the output unit to output the conversation between the first character and the second character, it is possible to make the user more interested in the information.

なお、上述した各実施形態の情報処理システム１では、端末装置１０は一台であるものとして説明したが、二以上の端末装置１０が設けられてもよい。この場合、自動応答装置４０は、例えば、第１の端末装置１０または第２の端末装置１０から、その装置の識別情報と共に端末装置１０に入力された音声データを取得する。そして、自動応答装置４０は、取得した識別情報を参照して、第１の端末装置１０に第１キャラクターの出力態様で応答内容を出力させ、第２の端末装置１０に第２キャラクターの出力態様で特定情報を出力させる。 In the information processing system 1 of each embodiment described above, a single terminal device 10 was assumed, but two or more terminal devices 10 may be provided. In this case, the automatic response device 40 acquires, for example, the voice data input to a terminal device 10 from the first terminal device 10 or the second terminal device 10, together with the identification information of that device. The automatic response device 40 then refers to the acquired identification information, causes the first terminal device 10 to output the response content in the output mode of the first character, and causes the second terminal device 10 to output the specific information in the output mode of the second character.
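The two-terminal arrangement can be pictured as a routing table keyed by terminal identification information. The sketch below is hypothetical throughout: the function name, the string identifiers, and the choice to reject voice data from an unknown terminal are assumptions added for illustration.

```python
def route_outputs(source_id, first_id, second_id, response, specific_info):
    """Decide which terminal outputs what, keyed by terminal identifier.

    source_id is the identification information received with the voice
    data; rejecting unknown terminals is an assumed safeguard.
    """
    if source_id not in (first_id, second_id):
        raise ValueError(f"unknown terminal: {source_id}")
    return {
        first_id: ("first character", response),
        second_id: ("second character", specific_info),
    }


outputs = route_outputs(
    "terminal-2", "terminal-1", "terminal-2", "It will be sunny.", "car advertisement"
)
```

In this sketch the assignment of characters to terminals does not depend on which terminal captured the voice, which is one reading of the passage above.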

以上説明した実施形態によれば、利用者により発せられた音声に対する応答内容と、前記応答内容とは異なる特定情報とを出力部に出力させる応答部と、前記特定情報の出力態様を、前記応答内容の出力態様である第３出力態様よりも利用者が聞き取りにくい第１出力態様に変更して出力部に出力させた後、利用者の指示を受け付けた場合に、前記特定情報の出力態様を、前記第１出力態様よりも前記利用者が聞き取りやすい第２出力態様に変更して、前記特定情報を出力部に出力させる制御部と備えることにより、利用者に違和感を与えないように情報を提供することができる。 According to the embodiments described above, the system includes a response unit that causes an output unit to output response content for a voice uttered by a user and specific information different from the response content, and a control unit that changes the output mode of the specific information to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, causes the output unit to output it, and, when an instruction from the user is then received, changes the output mode of the specific information to a second output mode, which is easier for the user to hear than the first output mode, and causes the output unit to output the specific information. Information can thereby be provided without giving the user a sense of discomfort.

以上、本発明を実施するための形態について実施形態を用いて説明したが、本発明はこうした実施形態に何等限定されるものではなく、本発明の要旨を逸脱しない範囲内において種々の変形及び置換を加えることができる。 Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

１，１Ａ、１Ｂ…情報処理システム、１０…端末装置、１２…マイク、１４…スピーカ、１５…表示部、１６…音声認識部、１８…音声生成部、１９…画像生成部、４０、４０Ａ、４０Ｂ…自動応答装置、４２…利用者特定部、４３…環境解析部、４６…解釈部、４８…応答部、４９…画像提供部、５０…提供制御部、５２…学習部、８０…広告提供装置、８２…情報提供部 1, 1A, 1B: information processing system; 10: terminal device; 12: microphone; 14: speaker; 15: display unit; 16: voice recognition unit; 18: voice generation unit; 19: image generation unit; 40, 40A, 40B: automatic response device; 42: user identification unit; 43: environment analysis unit; 46: interpretation unit; 48: response unit; 49: image providing unit; 50: provision control unit; 52: learning unit; 80: advertisement providing device; 82: information providing unit

Claims

A response unit that causes an output unit to output response content for a voice uttered by a user and specific information different from the response content; and
A control unit that changes the output mode of the specific information to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, and causes the output unit to output it, and that, when an instruction from the user is subsequently received, changes the output mode of the specific information to a second output mode, which is easier for the user to hear than the first output mode, and causes the output unit to output the specific information;
An information processing system comprising the above.

When the control unit receives an instruction from the user, the control unit changes the output mode of the specific information based on the mode of the received instruction.
The information processing system according to claim 1.

When the control unit receives an instruction from the user while the output unit is outputting the specific information in the first output mode, the control unit changes the output mode of the specific information that was output in the first output mode to the second output mode and causes the output unit to output it again,
The information processing system according to claim 1.

The first output mode is an output mode in which, compared with the third output mode, the sound volume is lower, the sound is in a frequency band that is harder for the user to hear, or the tempo at which information is output is faster.
The information processing system according to any one of claims 1 to 3.

The control unit changes the output mode of the specific information to the first output mode based on the attribute of the specific information, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 4.

The attribute of the specific information includes at least one of information on an advertisement, information on operation of a device, music, and information on a deadline related to a user,
The information processing system according to any one of claims 1 to 5.

The specific information includes information on an advertisement,
The control unit changes the output mode of the specific information to the first output mode based on the type of the advertisement, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 6.

The specific information includes information on an advertisement,
The control unit changes the output mode of the specific information to the first output mode based on the type of the advertisement and the results of instructions given by the user in the past, and causes the output unit to output the specific information,
The information processing system according to any one of claims 1 to 7.

The control unit changes the output mode of the specific information to the first output mode based on the attribute of the specific information and an attribute of specific information designated in advance by the user, and causes the output unit to output the specific information,
The information processing system according to any one of claims 1 to 8.

The control unit changes the output mode of the specific information to the first output mode based on a time zone, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 9.

The control unit changes the output mode of the specific information to the first output mode based on an environment in which the user exists, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 10.

The control unit changes the output mode of the specific information to the first output mode based on the position where the user exists, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 11.

The control unit changes the output mode of the specific information to the first output mode based on a person present around the user and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 12.

The control unit changes the output mode of the specific information to the first output mode based on the user's schedule information, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 13.

The control unit changes the output mode of the specific information to the first output mode based on the environmental sound of the environment in which the user is present, and causes the output unit to output the specific information.
The information processing system according to any one of claims 1 to 14.

The user's instruction is given by voice input to a user device that accepts voice input, or by a predetermined operation.
The information processing system according to any one of claims 1 to 15.

The control unit causes the output unit to output the response content in an output mode according to the first character, and causes the output unit to output the specific information in an output mode according to a second character.
The information processing system according to any one of claims 1 to 16.

An information processing method in which one or more computers:
cause an output unit to output response content for a voice uttered by a user and specific information different from the response content; and
change the output mode of the specific information to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, and cause the output unit to output it, and, when an instruction from the user is subsequently received, change the output mode of the specific information to a second output mode, which is easier for the user to hear than the first output mode, and cause the output unit to output the specific information.

A program that causes one or more computers to:
cause an output unit to output response content for a voice uttered by a user and specific information different from the response content; and
change the output mode of the specific information to a first output mode, which is harder for the user to hear than a third output mode that is the output mode of the response content, and cause the output unit to output it, and, when an instruction from the user is subsequently received, change the output mode of the specific information to a second output mode, which is easier for the user to hear than the first output mode, and cause the output unit to output the specific information.