JP2021047507A - Notification system, notification control device, notification control method, and notification control program

Info

Publication number: JP2021047507A
Application number: JP2019168321A
Authority: JP
Inventors: 航遠藤; Ko Endo
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2019-09-17
Filing date: 2019-09-17
Publication date: 2021-03-25

Abstract

To provide a notification system, a notification control device, a notification control method, and a notification control program that make it possible to provide content preferred by an occupant of a vehicle according to the ambient environment of the occupant. SOLUTION: A notification system comprises: a first acquisition unit that acquires voice data indicating a voice uttered by an occupant; an utterance content interpretation unit that interprets the content of the occupant's utterance on the basis of the voice data; a second acquisition unit that acquires ambient environment information; a third acquisition unit that outputs the content of the utterance and the ambient environment information to recommendation systems that recommend content to the occupant, and acquires recommendation information indicating one or more pieces of content; a notification control unit that causes an output unit provided in a moving body to give notification of a plurality of pieces of the recommendation information acquired by the third acquisition unit from each of the recommendation systems; and a providing unit that, when a selection operation is accepted by an operation unit, causes the output unit to output the content corresponding to the selected recommendation information. SELECTED DRAWING: Figure 1

Description

The present invention relates to a notification system, a notification control device, a notification control method, and a notification control program.

Conventionally, a technique has been disclosed for selecting a content system that provides content matching a user's preferences according to information relating to the user (for example, Patent Document 1).

Japanese Unexamined Patent Publication No. 2014-115897

A user's preferences may change depending on the user's surrounding environment and the place where the user is. With the conventional technique, however, it has been difficult to go as far as providing content that matches the user's preferences according to the user's surrounding environment.

The present invention has been made in consideration of such circumstances, and one of its objects is to provide a notification system, a notification control device, a notification control method, and a notification control program capable of providing content preferred by a vehicle occupant according to the surrounding environment of the vehicle occupant.

The notification system, the notification control device, the notification control method, and the notification control program according to the present invention adopt the following configurations.
(1) A notification system according to one aspect of the present invention includes: a first acquisition unit that acquires voice data indicating a voice uttered by an occupant on board a moving body; an utterance content interpretation unit that interprets the content of the occupant's utterance based on the voice data; a second acquisition unit that acquires ambient environment information relating to the occupant's surrounding environment; a third acquisition unit that outputs the content of the utterance and the ambient environment information to one or more recommendation systems that recommend content to the occupant, and acquires, from each of the recommendation systems, recommendation information indicating one or more pieces of content; a notification control unit that causes an output unit provided in the moving body to give notification of a plurality of pieces of the recommendation information acquired by the third acquisition unit for each recommendation system; and a providing unit that, when an operation of selecting any of the recommendation information notified by the output unit is accepted by an operation unit provided in the moving body, causes the output unit to output the content corresponding to the selected recommendation information.
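The flow of aspect (1) can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the `Recommendation` type, the callable interface for a recommendation system, and the example systems `video_a` and `music_b` are all hypothetical names introduced only for illustration, and the utterance is assumed to have already been interpreted into text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Recommendation:
    system: str  # name of the recommendation system that produced the entry
    title: str   # content that the system recommends


# Hypothetical interface: a recommendation system is modeled as a callable
# that takes the interpreted utterance and the ambient environment information.
Recommender = Callable[[str, Dict[str, str]], List[Recommendation]]


def collect_recommendations(
    utterance: str, ambient: Dict[str, str], systems: List[Recommender]
) -> List[Recommendation]:
    """Role of the third acquisition unit: query every system and gather results."""
    results: List[Recommendation] = []
    for recommend in systems:
        results.extend(recommend(utterance, ambient))
    return results


def provide(
    notified: List[Recommendation], selected_index: Optional[int]
) -> Optional[Recommendation]:
    """Role of the providing unit: return the selected entry, or None if none was selected."""
    if selected_index is None or not 0 <= selected_index < len(notified):
        return None
    return notified[selected_index]


# Two stand-in recommendation systems (assumptions, not real services).
def video_a(utterance: str, ambient: Dict[str, str]) -> List[Recommendation]:
    return [Recommendation("video_a", f"{ambient['weather']} day movie about {utterance}")]


def music_b(utterance: str, ambient: Dict[str, str]) -> List[Recommendation]:
    return [Recommendation("music_b", f"{utterance} playlist")]


notified = collect_recommendations("jazz", {"weather": "rainy"}, [video_a, music_b])
chosen = provide(notified, 1)  # the occupant selects the second notified entry
```

In this sketch the notification control unit corresponds to presenting the `notified` list, and the operation unit corresponds to whatever supplies `selected_index`.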

Aspect (2) is the notification system according to aspect (1), in which the notification control unit causes the output unit to give notification of, among the one or more pieces of content corresponding to the recommendation information, content that matches the occupant's preferences, based on the recommendation information and preference information indicating the occupant's preferences.

Aspect (3) is the notification system according to aspect (1) or (2), further including: a generation unit that generates history information associating the recommendation information notified by the output unit with the ambient environment information relating to that recommendation information; and a selection unit that newly selects content to recommend to the occupant based on the ambient environment information newly acquired by the second acquisition unit and the history information generated by the generation unit, in which the notification control unit causes the output unit to newly give notification of the content selected by the selection unit.

Aspect (4) is the notification system according to aspect (3), in which, when an operation of not selecting the one or more pieces of content notified by the output unit is accepted by the operation unit, the providing unit does not cause the output unit to provide the one or more pieces of content indicated by the recommendation information, and the generation unit generates history information indicating that the output unit did not provide the content.
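As a rough illustration of aspects (3) and (4), the following Python sketch records history entries that pair notified content with ambient environment information, and then skips content that was declined under the same conditions. The field names, the use of weather alone as the ambient condition, and the "first candidate not previously declined" rule are assumptions made for this sketch; the source does not specify a selection rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HistoryEntry:
    title: str      # content that was notified
    weather: str    # ambient environment at notification time (simplified)
    provided: bool  # False when the occupant chose not to select the content


def record(history: List[HistoryEntry], title: str, weather: str, provided: bool) -> None:
    """Role of the generation unit: append one history entry."""
    history.append(HistoryEntry(title, weather, provided))


def select_new(candidates: List[str], weather: str, history: List[HistoryEntry]) -> Optional[str]:
    """Role of the selection unit: pick the first candidate that was not
    declined earlier under the same ambient condition (assumed rule)."""
    declined = {e.title for e in history if e.weather == weather and not e.provided}
    for title in candidates:
        if title not in declined:
            return title
    return None


history: List[HistoryEntry] = []
record(history, "Drama A", "rainy", provided=False)  # occupant declined
record(history, "Drama B", "rainy", provided=True)   # occupant selected

rainy_pick = select_new(["Drama A", "Drama C"], "rainy", history)
sunny_pick = select_new(["Drama A", "Drama C"], "sunny", history)
```

Because the history is keyed by ambient condition, content declined on a rainy day remains a candidate on a sunny day in this sketch.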

Aspect (5) is the notification system according to any one of aspects (1) to (4), in which the ambient environment information includes information on the environment around the moving body and information on the environment around the occupant; the information on the environment around the moving body includes the weather, climate, temperature, or humidity at the point where the moving body is located, or POI information on points around the moving body; and the information on the environment around the occupant includes the date and time or the day of the week of the utterance, the number of occupants by age group, the number of occupants by gender, or the total number of occupants, or the situation of the occupants.
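The items listed in aspect (5) could be held in a structure such as the following Python sketch. The class names, field names, and units are assumptions chosen for illustration; every field is optional because the aspect lists the items as alternatives rather than as a required set.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VehicleSurroundings:
    """Information on the environment around the moving body."""
    weather: Optional[str] = None
    climate: Optional[str] = None
    temperature_c: Optional[float] = None
    humidity_pct: Optional[float] = None
    poi: List[str] = field(default_factory=list)


@dataclass
class OccupantSurroundings:
    """Information on the environment around the occupant."""
    utterance_time: Optional[str] = None  # e.g. an ISO 8601 timestamp
    day_of_week: Optional[str] = None
    count_by_age_group: Dict[str, int] = field(default_factory=dict)
    count_by_gender: Dict[str, int] = field(default_factory=dict)
    total_occupants: Optional[int] = None
    situation: Optional[str] = None


@dataclass
class AmbientEnvironment:
    vehicle: VehicleSurroundings
    occupant: OccupantSurroundings


def known_fields(env: AmbientEnvironment) -> Dict[str, Any]:
    """Flatten to 'group.field' keys, dropping items that were not acquired."""
    flat: Dict[str, Any] = {}
    for group, values in asdict(env).items():
        for key, value in values.items():
            if value not in (None, [], {}):
                flat[f"{group}.{key}"] = value
    return flat


env = AmbientEnvironment(
    VehicleSurroundings(weather="rainy", temperature_c=18.5),
    OccupantSurroundings(total_occupants=3),
)
query = known_fields(env)
```

A flattened dictionary like `query` is one possible form in which the ambient environment information could be passed to a recommendation system.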

Aspect (6) is the notification system according to any one of aspects (1) to (5), in which the third acquisition unit acquires recommendation information in which the recommendation system has recommended the content further based on a query similar to the content of the utterance.

Aspect (7) is the notification system according to any one of aspects (1) to (6), in which the one or more recommendation systems are systems operated by operators different from each other.

(8) A notification control device according to another aspect of the present invention includes: a first acquisition unit that acquires voice data indicating a voice uttered by an occupant on board a moving body; an utterance content interpretation unit that interprets the content of the occupant's utterance based on the voice data; a second acquisition unit that acquires ambient environment information relating to the occupant's surrounding environment; a third acquisition unit that outputs the content of the utterance and the ambient environment information to one or more recommendation systems that recommend content to the occupant, and acquires, from each of the recommendation systems, recommendation information indicating one or more pieces of content; a notification control unit that causes an output unit provided in the moving body to give notification of a plurality of pieces of the recommendation information acquired by the third acquisition unit for each recommendation system; and a providing unit that, when an operation of selecting any of the recommendation information notified by the output unit is accepted by an operation unit provided in the moving body, causes the output unit to output the content corresponding to the selected recommendation information.

(9) A notification control method according to another aspect of the present invention is executed by one or more computers and includes: a first acquisition process of acquiring voice data indicating a voice uttered by an occupant on board a moving body; an utterance content interpretation process of interpreting the content of the occupant's utterance based on the voice data; a second acquisition process of acquiring ambient environment information relating to the occupant's surrounding environment; a third acquisition process including an output step of outputting the content of the utterance and the ambient environment information to one or more recommendation systems that recommend content to the occupant, and an acquisition step of acquiring, from each of the recommendation systems, recommendation information indicating one or more pieces of content; a notification process of causing an output unit provided in the moving body to give notification of a plurality of pieces of the recommendation information acquired by the third acquisition process for each recommendation system; and an output process of, when an operation of selecting any of the recommendation information notified by the output unit is accepted by an operation unit provided in the moving body, causing the output unit to output the content corresponding to the selected recommendation information.

(10) A notification control program according to another aspect of the present invention is a program to be installed on one or more computers, and causes the computer to execute: a first acquisition process of acquiring voice data indicating a voice uttered by an occupant on board a moving body; an utterance content interpretation process of interpreting the content of the occupant's utterance based on the voice data; a second acquisition process of acquiring ambient environment information relating to the occupant's surrounding environment; a third acquisition process including an output step of outputting the content of the utterance and the ambient environment information to one or more recommendation systems that recommend content to the occupant, and an acquisition step of acquiring, from each of the recommendation systems, recommendation information indicating one or more pieces of content; a notification process of causing an output unit provided in the moving body to give notification of a plurality of pieces of the recommendation information acquired by the third acquisition process for each recommendation system; and an output process of, when an operation of selecting any of the recommendation information notified by the output unit is accepted by an operation unit provided in the moving body, causing the output unit to output the content corresponding to the selected recommendation information.

According to (1) to (10), content preferred by the vehicle occupant can be provided according to the surrounding environment of the vehicle occupant.

According to (2), content that better matches the preferences of the vehicle occupant can be provided.

According to (3), content can be provided in line with the preferences the vehicle occupant has shown so far.

According to (4), content that does not suit the preferences of the vehicle occupant can be kept from being provided.

According to (5) to (6), many content options can be presented to the vehicle occupant.

A configuration diagram of the agent system 1 including the agent device 100.
A diagram showing the configuration of the agent device 100 according to the embodiment and the equipment mounted on the vehicle M.
A diagram showing the configuration of the agent server 200 according to the embodiment and a part of the configuration of the agent device 100.
A diagram showing an example of the configuration of the content server 300-1.
A diagram showing an example of a scene with the first notification image IMa1 used for notification of recommendation information.
A diagram showing an example of a scene with the second notification image IMa2 used for notification of recommendation information.
A flowchart showing an example of a series of operations of the agent system 1 according to the embodiment.
A flowchart showing an example of a series of operations of the agent system 1 according to the embodiment.
A diagram showing an example of the contents of the history information 152.
A diagram showing an example of the configuration of an agent device 100A having the functions of the agent server 200.

Hereinafter, embodiments of the notification system, the notification control device, the notification control method, and the notification control program of the present invention will be described with reference to the drawings.

<Embodiment>
The agent device is a device that realizes a part or all of the agent system 1 including the notification system of the present embodiment. In the following, as an example of the agent device, an agent device that is mounted on a vehicle (hereinafter, vehicle M) carrying an occupant (an example of a user) and that has an agent function will be described. For the application of the present invention, the agent device does not necessarily have to have an agent function. The agent device may also be a portable terminal device (general-purpose terminal) such as a smartphone, but the following description assumes an agent device that is mounted on a vehicle and has an agent function. The agent function is, for example, a function of providing various kinds of information and controlling various devices based on requests (commands) included in the occupant's utterances, and of mediating network services, while interacting with the occupant of the vehicle M. When the agent device has a plurality of agent functions, the agent functions may differ from one another in the function performed, the processing procedure, the control, and the output mode and content. Some of the agent functions may have a function of controlling devices in the vehicle (for example, devices related to driving control and vehicle body control). The vehicle M is an example of a "moving body".

The agent function is realized by using, in an integrated manner, for example, a voice recognition function that recognizes the occupant's voice (a function that converts voice into text), a natural language processing function (a function that understands the structure and meaning of text), a dialogue management function, and a network search function that searches other devices via a network or searches a predetermined database held by the device itself. Some or all of these functions may be realized by AI (Artificial Intelligence) technology. A part of the configuration for performing these functions (in particular, the voice recognition function and the natural language processing and interpretation function) may be mounted on an agent server (external device) capable of communicating with the in-vehicle communication device of the vehicle M or with a general-purpose communication device brought into the vehicle M. The following description assumes that a part of the configuration is mounted on the agent server and that the agent device and the agent server cooperate to realize the agent system. The service provider (service entity) that the agent device and the agent server cooperate to make appear virtually is referred to as an agent.

<Overall configuration>
FIG. 1 is a configuration diagram of the agent system 1 including the agent device 100. The agent system 1 includes, for example, the agent device 100 and one or more agent servers 200. Examples of providers of the agent system 1 in the present embodiment include automobile manufacturers, network service providers, electronic commerce businesses, and sellers and manufacturers of mobile terminals, and any entity (a corporation, an organization, an individual, or the like) can be a provider of the agent system 1. Although FIG. 1 illustrates the case where there is one agent server 200, the configuration is not limited to this, and the agent system 1 may include two or more agent servers 200. In that case, the agent servers 200 may be provided by arbitrary entities different from each other.

The agent device 100 communicates with the agent server 200 via the network NW. The network NW includes, for example, a part or all of communication networks such as the Internet, a cellular network, a Wi-Fi network, a WAN (Wide Area Network), a LAN (Local Area Network), a public line, a telephone line, and a wireless base station. One or more content servers 300 (content servers 300-1 to 300-3 in the figure) are connected to the network NW, and the agent server 200 or the agent device 100 can acquire content from the content servers 300 via the network NW.

In the following, a case will be described in which the content servers 300-1 and 300-2 are server devices that provide video content and the content server 300-3 is a server device that provides music content. The content server 300-1 and the content server 300-2 may be provided by arbitrary entities (operators) different from each other. In the following, it is assumed that the content server 300-1 is operated by "○○Video", the content server 300-2 is operated by "△△Video", and the content server 300-3 is operated by "○○Music". The content server 300 is an example of a "recommendation system".

The agent device 100 interacts with the occupant of the vehicle M, transmits the occupant's voice to the agent server 200, and presents the answer obtained from the agent server 200 to the occupant in the form of voice output or image display.

[Vehicle]
FIG. 2 is a diagram showing the configuration of the agent device 100 according to the embodiment and the equipment mounted on the vehicle M. The vehicle M is equipped with, for example, one or more microphones 10, a speaker 20, a display/operation device 30, an in-vehicle communication device 40, and the agent device 100. These devices are connected to each other by a multiplex communication line such as a CAN (Controller Area Network) communication line, a serial communication line, a wireless communication network, or the like. The configuration shown in FIG. 2 is merely an example; a part of the configuration may be omitted, and other configurations may be added.

The microphone 10 is a sound collecting unit that collects sound emitted in the vehicle interior. The speaker 20 includes, for example, a speaker (sound output unit) arranged in the vehicle interior. The display/operation device 30 is a device (or a group of devices) capable of displaying images and accepting input operations. The display/operation device 30 includes, for example, a display device configured as a touch panel. The display/operation device 30 may further include a HUD (Head Up Display) or a mechanical input device. The display/operation device 30 may be shared by the agent device 100 and a navigation device (not shown). The in-vehicle communication device 40 is, for example, a wireless communication device that can access the network NW using a cellular network or a Wi-Fi network. The touch panel of the display/operation device 30 is an example of an "operation unit".

[Agent device]
The agent device 100 includes a management unit 110, an agent function unit 130, an in-vehicle communication unit 140, and a storage unit 150. The management unit 110 includes, for example, an acoustic processing unit 111, an ambient environment information acquisition unit 112, an agent WU (Wake Up) determination unit 113, a communication control unit 114, and an output control unit 120. The software arrangement shown in FIG. 2 is simplified for the sake of explanation; in practice it can be modified arbitrarily so that, for example, the management unit 110 is interposed between the agent function unit 130 and the in-vehicle communication device 40. In the following, the agent that the agent function unit 130 and the agent server 200 cooperate to make appear may be referred to simply as the "agent".

Each component of the agent device 100 is realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may be realized by hardware (a circuit unit; including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or may be realized by the cooperation of software and hardware. The storage unit 150 may be realized by a storage device (a storage device including a non-transitory storage medium) such as an HDD (Hard Disk Drive) or a flash memory, may be realized by a removable storage medium (non-transitory storage medium) such as a DVD or a CD-ROM, or may be a storage medium mounted in a drive device. A part or all of the storage unit 150 may be an external device accessible by the agent device 100, such as a NAS or an external storage server. The storage unit 150 stores, for example, the programs executed in the agent device 100 as well as information such as the history information 152. The details of the history information 152 will be described later.

The management unit 110 functions when a program such as an OS (Operating System) or middleware is executed.

The acoustic processing unit 111 of the management unit 110 receives the sound collected by the microphone 10 and performs acoustic processing on the received sound so that it is in a state suitable for recognizing the wake-up word preset for each agent and for recognizing other utterance content. A wake-up word is, for example, a word or phrase for activating the target agent. A wake-up word may activate a single agent or may activate a plurality of agents. The acoustic processing is, for example, noise removal by filtering such as a band-pass filter, sound amplification, and the like. The acoustic processing unit 111 outputs the acoustically processed voice (hereinafter, voice stream) to the agent WU determination unit 113 and to any active agent function unit 130. The acoustic processing unit 111 is an example of the "first acquisition unit". The voice stream is an example of "voice data".
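The statement that a wake-up word may activate a single agent or a plurality of agents can be illustrated with the following Python sketch. It is a simplification under stated assumptions: the matching is done on a text transcript rather than on the voice stream itself, and the phrases and agent names in `WAKE_WORDS` are hypothetical.

```python
from typing import Dict, List

# Hypothetical table: each wake-up phrase maps to the agents it activates.
WAKE_WORDS: Dict[str, List[str]] = {
    "hey agent one": ["agent1"],
    "hello everyone": ["agent1", "agent2"],
}


def agents_to_wake(transcript: str, wake_words: Dict[str, List[str]] = WAKE_WORDS) -> List[str]:
    """Return the agents whose wake-up phrase appears in the transcript."""
    # Normalize case and whitespace before matching (assumed preprocessing).
    text = " ".join(transcript.lower().split())
    woken: List[str] = []
    for phrase, agents in wake_words.items():
        if phrase in text:
            for agent in agents:
                if agent not in woken:
                    woken.append(agent)
    return woken


single = agents_to_wake("Hey  Agent One, play some music")
several = agents_to_wake("Hello everyone")
none = agents_to_wake("play some music")
```

Substring matching on text is used here only to keep the sketch short; it is not presented as the method by which the embodiment recognizes wake-up words.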

The ambient environment information acquisition unit 112 acquires information relating to the place where the occupant (vehicle M) is located (hereinafter, ambient environment information) from an external device (not shown) connected to the network NW via the in-vehicle communication device 40. Specifically, the ambient environment information acquisition unit 112 transmits to the external device, for example, information indicating the position of the vehicle M identified by a GNSS (Global Navigation Satellite System) receiver (not shown) or the like provided in the vehicle M. Based on the received information indicating the position of the vehicle M, the external device then transmits to the agent device 100 ambient environment information indicating the weather, climate (season), temperature, and humidity at the position of the vehicle M, POI (Point of Interest) information on points around the position of the vehicle M, and the like.

なお、周囲環境情報取得部１１２は、車両Ｍの外気を検出するセンサ（不図示）の検出結果に基づいて、車両Ｍが存在する位置の天気、気候、温度、湿度等を特定し、周囲環境情報を生成してもよい。また、周囲環境情報取得部１１２は、車両Ｍが備えるナビゲーション装置（不図示）からＰＯＩ情報を取得することにより、周囲環境情報を生成してもよい。 Note that the ambient environment information acquisition unit 112 may generate the ambient environment information by identifying the weather, climate, temperature, humidity, and the like at the position of the vehicle M based on the detection result of a sensor (not shown) that detects the outside air of the vehicle M. The ambient environment information acquisition unit 112 may also generate the ambient environment information by acquiring POI information from a navigation device (not shown) provided in the vehicle M.

また、車両Ｍが、車両Ｍの周囲環境を撮像し、画像を生成するカメラを備え、周囲環境情報取得部１１２は、車両Ｍの周囲環境を示す画像を用いたディープラーニングにより学習された学習モデルを用いて、車両Ｍが存在する位置の天気、気候（季節）、温度、湿度、車両Ｍが存在する位置の周辺に存在するＰＯＩ情報等を含む周囲環境情報を導出してもよい。 Alternatively, the vehicle M may include a camera that images the environment around the vehicle M and generates an image, and the ambient environment information acquisition unit 112 may use a learning model trained by deep learning on images showing the environment around the vehicle M to derive ambient environment information including the weather, climate (season), temperature, and humidity at the position of the vehicle M, POI information for the vicinity of that position, and the like.

また、周囲環境情報に含まれる情報は一例であってこれに限られない。周囲環境情報取得部１１２は、例えば、車両Ｍの車室内の環境に係る情報を周囲環境情報として生成してもよい。この場合、車両Ｍは、車両Ｍの車室内を撮像するカメラ（不図示）を備えてもよい。カメラは、例えば、ＣＣＤ（Charge Coupled Device）やＣＭＯＳ（Complementary Metal Oxide Semiconductor）等の固体撮像素子を利用したデジタルカメラである。カメラは、車両Ｍの車室内を撮像可能な任意の箇所に取り付けられる。そして、カメラは、例えば、周期的に繰り返し車両Ｍの周辺を撮像する。例えば、カメラは、ステレオカメラであってもよい。周囲環境情報取得部１１２は、カメラが車両Ｍの車室内を撮像し、生成した画像を画像処理し、車両Ｍの乗員の人数、乗員の年代、乗員の状況（盛り上がっているか、話に熱中しているか、退屈しているか、乗員の気分等）等を特定する。例えば、周囲環境情報取得部１１２は、画像処理によって乗員の顔を認識し、認識された顔の個数によって乗員の人数を特定し、認識された顔領域の特徴量に基づいて乗員の年代を特定する。また、周囲環境情報取得部１１２は、画像処理によって乗員の顔を認識し、認識された顔の表情や口の動きによって乗員の状況を特定する。そして、周囲環境情報取得部１１２は、特定した情報を含めた周囲環境情報を生成する。 The items included in the ambient environment information are examples, and the information is not limited to them. For example, the ambient environment information acquisition unit 112 may generate information on the environment inside the cabin of the vehicle M as ambient environment information. In this case, the vehicle M may include a camera (not shown) that images the cabin of the vehicle M. The camera is, for example, a digital camera using a solid-state image sensor such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) sensor. The camera is attached at any location from which the cabin of the vehicle M can be imaged. The camera, for example, periodically and repeatedly images the surroundings of the vehicle M. The camera may be a stereo camera. The ambient environment information acquisition unit 112 performs image processing on the image that the camera generates by imaging the cabin of the vehicle M, and identifies the number of occupants of the vehicle M, the occupants' age groups, the occupants' state (whether they are lively, absorbed in conversation, or bored, the occupants' mood, and so on), and the like. For example, the ambient environment information acquisition unit 112 recognizes the occupants' faces by image processing, identifies the number of occupants from the number of recognized faces, and identifies each occupant's age group from the feature values of the recognized face region. The ambient environment information acquisition unit 112 also identifies the occupants' state from the recognized facial expressions and mouth movements. The ambient environment information acquisition unit 112 then generates ambient environment information that includes the identified information.

また、周囲環境情報取得部１１２は、例えば、車両Ｍの車室内を示す画像を用いたディープラーニングにより学習された学習モデルを用いて、車両Ｍの乗員の人数、乗員の年代、乗員の状況等を含む周囲環境情報を導出してもよい。 The ambient environment information acquisition unit 112 may also derive ambient environment information including the number of occupants of the vehicle M, the occupants' age groups, the occupants' state, and the like by using, for example, a learning model trained by deep learning on images showing the cabin of the vehicle M.

以下、周囲環境情報には、車両Ｍが存在する位置の天気、気候（季節）、温度、湿度、車両Ｍが存在する位置の周辺に存在するＰＯＩ情報、車両Ｍの乗員の人数、乗員の年代、乗員の状況等の情報うち、少なくとも一以上の情報が含まれるものとする。周囲環境情報取得部１１２は、例えば、エージェントサーバ２００からの指示に応じて、周囲環境情報を取得（生成）し、車載通信装置４０によってエージェントサーバ２００に送信する。周囲環境情報取得部１１２は、「第２取得部」の一例である。 In the following, the ambient environment information is assumed to include at least one of the following: the weather, climate (season), temperature, and humidity at the position of the vehicle M, POI information for the vicinity of that position, the number of occupants of the vehicle M, the occupants' age groups, the occupants' state, and the like. The ambient environment information acquisition unit 112 acquires (generates) the ambient environment information, for example, in response to an instruction from the agent server 200, and transmits it to the agent server 200 via the vehicle-mounted communication device 40. The ambient environment information acquisition unit 112 is an example of the "second acquisition unit".
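
One way to picture the ambient environment information is as a record in which every item is optional but at least one must be present. The sketch below is only an illustration; the class name, field names, and units are assumptions and do not appear in the publication.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional

@dataclass
class AmbientEnvironmentInfo:
    # Every field is optional; names and units are illustrative assumptions.
    weather: Optional[str] = None
    season: Optional[str] = None
    temperature_c: Optional[float] = None
    humidity_pct: Optional[float] = None
    nearby_poi: Optional[List[str]] = None
    occupant_count: Optional[int] = None
    occupant_age_groups: Optional[List[str]] = None
    occupant_state: Optional[str] = None

    def to_payload(self) -> dict:
        """Keep only the items that were obtained; require at least one."""
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        if not payload:
            raise ValueError("ambient environment information needs at least one item")
        return payload
```

For example, `AmbientEnvironmentInfo(weather="rain", occupant_count=3).to_payload()` yields a dictionary with just those two items, which could then be sent to the agent server.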

エージェントＷＵ判定部１１３は、エージェントに予め定められているウエイクアップワードを認識する。エージェントＷＵ判定部１１３は、音響処理部１１１によって音響処理が行われた音声ストリームから発話された音声を認識する。まず、エージェントＷＵ判定部１１３は、音声ストリームにおける音声波形の振幅と零交差に基づいて音声区間を検出する。エージェントＷＵ判定部１１３は、混合ガウス分布モデル（ＧＭＭ；Gaussian mixture model)に基づくフレーム単位の音声識別、及び非音声識別に基づく区間検出を行ってもよい。 The agent WU determination unit 113 recognizes the wake-up word predetermined for an agent. The agent WU determination unit 113 recognizes uttered speech in the voice stream that has undergone acoustic processing by the sound processing unit 111. First, the agent WU determination unit 113 detects a voice section based on the amplitude and zero crossings of the speech waveform in the voice stream. The agent WU determination unit 113 may instead detect sections by frame-by-frame speech/non-speech discrimination based on a Gaussian mixture model (GMM).
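
The amplitude and zero-crossing approach to voice section detection can be sketched as follows. This is a simplified illustration under assumed values: the frame length, the thresholds, and the function name are hypothetical, and a practical detector would tune them and typically add smoothing.

```python
def detect_voice_sections(samples, frame_len=160, amp_threshold=0.1, zc_max=60):
    """Return (start, end) sample indices of sections judged to contain speech.

    A frame counts as speech when its mean absolute amplitude is high enough
    and its zero-crossing count is not so high that it looks like noise.
    """
    sections = []
    start = None
    for pos in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[pos:pos + frame_len]
        mean_amp = sum(abs(s) for s in frame) / frame_len
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        is_voice = mean_amp >= amp_threshold and zero_crossings <= zc_max
        if is_voice and start is None:
            start = pos                      # section begins
        elif not is_voice and start is not None:
            sections.append((start, pos))    # section ends
            start = None
    if start is not None:
        # Close a section that runs to the last complete frame.
        sections.append((start, len(samples) - len(samples) % frame_len))
    return sections
```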

次に、エージェントＷＵ判定部１１３は、検出した音声区間における音声をテキスト化し、文字情報とする。そして、エージェントＷＵ判定部１１３は、テキスト化した文字情報がウエイクアップワードに該当するか否かを判定する。ウエイクアップワードであると判定した場合、エージェントＷＵ判定部１１３は、ウエイクアップワードに対応するエージェント機能部１３０を起動させる。なお、エージェントＷＵ判定部１１３に相当する機能が、エージェントサーバ２００に搭載されてもよい。この場合、管理部１１０は、音響処理部１１１によって音響処理が行われた音声ストリームをエージェントサーバ２００に送信し、エージェントサーバ２００がウエイクアップワードであると判定した場合、エージェントサーバ２００からの指示に従ってエージェント機能部１３０が起動する。また、各エージェント機能部１３０は、常時起動しており且つウエイクアップワードの判定を自ら行うものであってよい。この場合、管理部１１０がエージェントＷＵ判定部１１３を備える必要はない。 Next, the agent WU determination unit 113 converts the speech in the detected voice section into text to obtain character information, and determines whether the character information corresponds to a wake-up word. When it determines that the text is a wake-up word, the agent WU determination unit 113 activates the agent function unit 130 corresponding to that wake-up word. A function corresponding to the agent WU determination unit 113 may instead be provided in the agent server 200. In this case, the management unit 110 transmits the voice stream processed by the sound processing unit 111 to the agent server 200, and when the agent server 200 determines that the utterance is a wake-up word, the agent function unit 130 is activated in accordance with an instruction from the agent server 200. Alternatively, each agent function unit 130 may be always active and determine the wake-up word by itself. In this case, the management unit 110 need not include the agent WU determination unit 113.

また、エージェントＷＵ判定部１１３は、上述した手順と同様の手順で、発話された音声に含まれる終了ワードを認識した場合であり、且つ、終了ワードに対応するエージェントが起動している状態（以下、必要に応じて「起動中」と称する）である場合、起動中のエージェント機能部を終了（停止）させる。なお、エージェントの起動、及び終了は、例えば、表示・操作装置３０から所定の操作を受け付けることによって実行されてもよいが、以下では、音声による起動、及び停止の例を説明する。また、起動中のエージェントは、音声の入力を所定時間以上受け付けなかった場合に停止させてもよい。 In addition, when the agent WU determination unit 113 recognizes, by the same procedure as described above, an end word included in the uttered speech, and the agent corresponding to the end word is in an activated state (hereinafter referred to as "active" where necessary), it terminates (stops) the active agent function unit. An agent may also be started and terminated by, for example, accepting a predetermined operation on the display/operation device 30, but the examples below describe starting and stopping by voice. An active agent may also be stopped when no voice input has been received for a predetermined time or longer.
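
The start and stop behaviour described above (a wake-up word that activates one or several agents, an end word, and an idle timeout) might be organized roughly as in the sketch below. The class, its method names, the example phrases, and the timeout value are all assumptions for illustration; the text passed in is assumed to have already been produced by speech recognition.

```python
import time

class AgentWUDeterminer:
    """Tracks which agents are active, based on recognized text."""

    def __init__(self, wake_words, end_words, idle_timeout_s=30.0, clock=time.monotonic):
        self.wake_words = wake_words      # phrase -> list of agent names
        self.end_words = end_words        # phrase -> agent name
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock
        self.active = {}                  # agent name -> time of last voice input

    def stop_idle_agents(self, now=None):
        """Stop agents that have had no voice input for the timeout period."""
        now = self.clock() if now is None else now
        for agent, last in list(self.active.items()):
            if now - last >= self.idle_timeout_s:
                del self.active[agent]

    def on_text(self, text):
        """Handle one recognized utterance and return the set of active agents."""
        now = self.clock()
        self.stop_idle_agents(now)
        phrase = text.strip().lower()
        for agent in self.active:
            self.active[agent] = now      # any voice input resets the idle timer
        for agent in self.wake_words.get(phrase, []):
            self.active[agent] = now      # a wake-up word may start several agents
        ended = self.end_words.get(phrase)
        if ended in self.active:
            del self.active[ended]        # end word stops only an active agent
        return set(self.active)
```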

通信制御部１１４は、エージェント機能部１３０を、ネットワークＮＷに接続可能にするための制御を行う。例えば、通信制御部１１４は、エージェント機能部１３０がネットワークを介して外部装置（例えば、エージェントサーバ２００）と通信を行う場合の接続状態等を制御する。また、通信制御部１１４は、通信が途切れた場合の再接続や、接続状態の切り替え等の制御を行う。 The communication control unit 114 controls the agent function unit 130 so that it can be connected to the network NW. For example, the communication control unit 114 controls the connection state and the like when the agent function unit 130 communicates with an external device (for example, the agent server 200) via the network. Further, the communication control unit 114 controls such as reconnection when communication is interrupted and switching of the connection state.

出力制御部１２０は、通信制御部１１４またはエージェント機能部１３０等からの指示に応じて表示部またはスピーカ２０に応答内容等の情報を出力させることで、乗員にサービス等の提供を行う。具体的には、出力制御部１２０は、後述するエージェント機能部１３０によってコンテンツサーバ３００から取得した（レコメンドされた）一以上のコンテンツを示す情報（以下、レコメンド情報）を、スピーカ２０や、表示・操作装置３０に通知させる。コンテンツとは、例えば、動画や音楽等である。 The output control unit 120 provides services and the like to the occupant by causing the display unit or the speaker 20 to output information such as response content in response to an instruction from the communication control unit 114, the agent function unit 130, or the like. Specifically, the output control unit 120 causes the speaker 20 or the display/operation device 30 to give notification of information indicating one or more pieces of content (hereinafter, recommendation information) that the agent function unit 130, described later, has acquired from (had recommended by) the content server 300. The content is, for example, video or music.

また、出力制御部１２０は、通知したレコメンド情報に含まれる一以上のコンテンツのうち、いずれかのコンテンツを選択する操作が表示・操作装置３０のタッチパネルによって受け付けられた場合、操作に応じてエージェント機能部１３０が取得したコンテンツの音声をスピーカ２０に出力させたり、コンテンツの画像を表示・操作装置３０の表示装置に表示させたりすることにより、コンテンツを乗員に提供させる。 In addition, when the touch panel of the display/operation device 30 accepts an operation selecting one of the one or more pieces of content included in the notified recommendation information, the output control unit 120 provides the content to the occupant by causing the speaker 20 to output the audio of the content that the agent function unit 130 acquired in response to the operation, or by causing the display of the display/operation device 30 to show the images of the content.

音声制御部１２２は、エージェント機能部１３０がエージェントサーバ２００から取得した情報に基づいて、エージェントが車両Ｍの乗員の発話に応答する応答内容を、車両Ｍの乗員に通知するために用いられる音声をスピーカ２０に出力させる。 Based on the information that the agent function unit 130 has acquired from the agent server 200, the voice control unit 122 causes the speaker 20 to output the voice used to notify the occupant of the vehicle M of the agent's response to the occupant's utterance.

表示制御部１２４は、エージェント機能部１３０がエージェントサーバ２００から取得した情報に基づいて、エージェントが車両Ｍの乗員の発話に応答する応答内容を、車両Ｍの乗員に通知するために用いられる画像を表示・操作装置３０のディスプレイ装置に表示させる。 Based on the information that the agent function unit 130 has acquired from the agent server 200, the display control unit 124 causes the display of the display/operation device 30 to show the image used to notify the occupant of the vehicle M of the agent's response to the occupant's utterance.

エージェント機能部１３０は、エージェントサーバ２００と協働して、車両の乗員の発話に応じて、音声、及び画像による応答を含むサービスを提供する。エージェント機能部１３０には、例えば、車両Ｍ、又は車両Ｍに搭載される車載機器を制御する権限が付与されており、後述する処理によりエージェントサーバ２００によって認識された車両Ｍの発話内容が、車両Ｍに搭載される車両機器の動作を指示するコマンドである場合、エージェント機能部１３０は、コマンドに基づいてそれらの車両機器を制御する。エージェント機能部１３０は、通信制御部１１４の制御に基づいて、車載通信部１４０によって車載通信装置４０を介してエージェントサーバ２００と通信する。 The agent function unit 130 cooperates with the agent server 200 to provide a service that includes voice and image responses to utterances of the vehicle's occupant. The agent function unit 130 is granted, for example, the authority to control the vehicle M or in-vehicle equipment mounted on the vehicle M. When the utterance content from the vehicle M recognized by the agent server 200 through the processing described later is a command instructing the operation of vehicle equipment mounted on the vehicle M, the agent function unit 130 controls that vehicle equipment based on the command. Under the control of the communication control unit 114, the agent function unit 130 communicates with the agent server 200 through the vehicle-mounted communication unit 140 and the vehicle-mounted communication device 40.

なお、エージェント機能部１３０には、法律や条例、エージェントを提供する事業者同士の契約等に応じて、車両機器を制御する権限が割り振られるものであってもよい。 The agent function unit 130 may be assigned the authority to control vehicle equipment according to laws, ordinances, contracts between businesses that provide agents, and the like.

車載通信部１４０は、例えば、エージェント機能部１３０がネットワークＮＷに接続する場合に、車載通信装置４０を介して通信させる。車載通信部１４０は、エージェント機能部１３０からの情報を、車載通信装置４０を介してエージェントサーバ２００やその他の外部装置に出力する。また、車載通信部１４０は、車載通信装置４０を介して入力された情報をエージェント機能部１３０に出力する。 The vehicle-mounted communication unit 140 communicates via the vehicle-mounted communication device 40, for example, when the agent function unit 130 connects to the network NW. The vehicle-mounted communication unit 140 outputs the information from the agent function unit 130 to the agent server 200 and other external devices via the vehicle-mounted communication device 40. Further, the vehicle-mounted communication unit 140 outputs the information input via the vehicle-mounted communication device 40 to the agent function unit 130.

エージェント機能部１３０は、エージェントＷＵ判定部１１３による起動指示に基づいて起動し、乗員の発話に対して、エージェントサーバ２００を介して乗員の発話の音声に含まれる要求に対する応答内容を生成し、生成した応答内容を出力制御部１２０に出力する。また、エージェント機能部１３０は、エージェントサーバ２００と通信を行う場合には、通信制御部１１４により制御された接続状態によって通信を行う。また、エージェント機能部１３０は、エージェントＷＵ判定部１１３による制御に基づいて、エージェントを停止させてもよい。 The agent function unit 130 starts in response to an activation instruction from the agent WU determination unit 113, generates, via the agent server 200, response content for the request included in the occupant's uttered speech, and outputs the generated response content to the output control unit 120. When communicating with the agent server 200, the agent function unit 130 uses the connection state controlled by the communication control unit 114. The agent function unit 130 may also stop the agent under the control of the agent WU determination unit 113.

［エージェントサーバ］ [Agent server]
図３は、実施形態に係るエージェントサーバ２００の構成と、エージェント装置１００の構成の一部とを示す図である。以下、エージェントサーバ２００の構成とともに、エージェント機能部１３０等の動作について説明する。ここでは、エージェント装置１００からネットワークＮＷまでの物理的な通信についての説明を省略する。 FIG. 3 is a diagram showing the configuration of the agent server 200 and part of the configuration of the agent device 100 according to the embodiment. The operation of the agent function unit 130 and the like is described below together with the configuration of the agent server 200. A description of the physical communication from the agent device 100 to the network NW is omitted here.

エージェントサーバ２００は、通信部２１０を備える。通信部２１０は、例えば、ＮＩＣ（Network Interface Card）等のネットワークインターフェースである。更に、エージェントサーバ２００は、例えば、音声認識部２２０と、自然言語処理部２２１と、対話管理部２２２と、ネットワーク検索部２２３と、応答内容生成部２２４との機能部を備える。これらの構成要素は、例えば、ＣＰＵ等のハードウェアプロセッサがプログラム（ソフトウェア）を実行することにより実現される。これらの構成要素のうち一部または全部は、ＬＳＩやＡＳＩＣ、ＦＰＧＡ、ＧＰＵ等のハードウェア（回路部；circuitryを含む）によって実現されてもよいし、ソフトウェアとハードウェアの協働によって実現されてもよい。プログラムは、予めＨＤＤやフラッシュメモリ等の記憶装置（非一過性の記憶媒体を備える記憶装置）に格納されていてもよいし、ＤＶＤやＣＤ−ＲＯＭ等の着脱可能な記憶媒体（非一過性の記憶媒体）に格納されており、記憶媒体がドライブ装置に装着されることでインストールされてもよい。音声認識部２２０と、自然言語処理部２２１とを組み合わせたものは、「発話内容解釈部」の一例である。 The agent server 200 includes a communication unit 210. The communication unit 210 is, for example, a network interface such as a NIC (Network Interface Card). The agent server 200 further includes functional units such as a voice recognition unit 220, a natural language processing unit 221, a dialogue management unit 222, a network search unit 223, and a response content generation unit 224. These components are realized, for example, by a hardware processor such as a CPU executing a program (software). Some or all of these components may be realized by hardware (including circuitry) such as an LSI, ASIC, FPGA, or GPU, or by software and hardware working together. The program may be stored in advance in a storage device such as an HDD or flash memory (a storage device having a non-transitory storage medium), or it may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or CD-ROM and installed when the storage medium is mounted in a drive device. The combination of the voice recognition unit 220 and the natural language processing unit 221 is an example of the "utterance content interpretation unit".

また、エージェントサーバ２００は、記憶部２５０を備える。記憶部２５０は、上記の記憶部１５０を実現する各種記憶装置と同様の装置により実現される。記憶部２５０には、例えば、辞書ＤＢ２５２、パーソナルプロファイル２５４、知識ベースＤＢ２５６、応答規則ＤＢ２５８等のデータやプログラムが格納される。 Further, the agent server 200 includes a storage unit 250. The storage unit 250 is realized by the same device as the various storage devices that realize the storage unit 150. The storage unit 250 stores, for example, data and programs such as a dictionary DB 252, a personal profile 254, a knowledge base DB 256, and a response rule DB 258.

エージェント装置１００において、エージェント機能部１３０は、例えば、音響処理部１１１等から入力される音声ストリーム、或いは圧縮や符号化等の処理を行った音声ストリームを、エージェントサーバ２００に送信する。エージェント機能部１３０は、ローカル処理（エージェントサーバ２００を介さない処理）が可能なコマンド（要求内容）が認識できた場合には、コマンドで要求された処理を実行してもよい。ローカル処理が可能なコマンドとは、例えば、エージェント装置１００が備える記憶部１５０を参照することで応答可能なコマンドである。より具体的には、ローカル処理が可能なコマンドとは、例えば、記憶部１５０内に存在する電話帳データ（不図示）から特定者の名前を検索し、合致した名前に対応付けられた電話番号に電話をかける（相手を呼び出す）コマンドである。したがって、エージェント機能部１３０は、エージェントサーバ２００が備える機能の一部を有してもよい。 In the agent device 100, the agent function unit 130 transmits to the agent server 200, for example, the voice stream input from the sound processing unit 111 or the like, or a voice stream that has undergone processing such as compression or encoding. When the agent function unit 130 recognizes a command (request) that can be processed locally (without going through the agent server 200), it may execute the processing requested by the command. A command that can be processed locally is, for example, one that can be answered by referring to the storage unit 150 of the agent device 100. More specifically, it is, for example, a command that searches the phone book data (not shown) in the storage unit 150 for a particular person's name and places a call to (calls the party at) the telephone number associated with the matching name. The agent function unit 130 may therefore have some of the functions of the agent server 200.
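
The split between local processing and server processing could look roughly like the sketch below, using the phone book example from the text. The phone book contents, the "call <name>" phrasing, and the function name are hypothetical, and the utterance is assumed to be available as text already.

```python
import re

# Hypothetical phone book data standing in for the data in the storage unit 150.
PHONE_BOOK = {"alice": "090-0000-0001", "bob": "090-0000-0002"}

def handle_utterance(text, phone_book=PHONE_BOOK):
    """Return ("local", number) when the command can be handled on the device,
    otherwise ("server", text) to indicate it should go to the agent server."""
    match = re.fullmatch(r"call (\w+)", text.strip().lower())
    if match and match.group(1) in phone_book:
        return ("local", phone_book[match.group(1)])
    return ("server", text)
```

Here "Call Alice" resolves locally to a number, while an unknown name or any other request falls through to the server path.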

音声ストリームを取得すると、音声認識部２２０が音声認識を行ってテキスト化された文字情報を出力し、自然言語処理部２２１が文字情報に対して辞書ＤＢ２５２を参照しながら意味解釈を行う。辞書ＤＢ２５２は、例えば、文字情報に対して抽象化された意味情報が対応付けられたものである。辞書ＤＢ２５２は、例えば、機能辞書２５２Ａと、汎用辞書２５２Ｂとを含む。 When the voice stream is acquired, the voice recognition unit 220 performs voice recognition and outputs character information converted to text, and the natural language processing unit 221 interprets the meaning of the character information while referring to the dictionary DB 252. The dictionary DB 252 associates, for example, character information with abstracted semantic information. The dictionary DB 252 includes, for example, a function dictionary 252A and a general-purpose dictionary 252B.

機能辞書２５２Ａは、エージェントサーバ２００がエージェント機能部１３０と協働して実現するエージェントが提供する機能（サービス）をカバーするための辞書である。例えば、エージェントが車載エアコンを制御する機能を提供する場合、機能辞書２５２Ａには、「エアコン」、「空調」、「つける」、「消す」、「温度」、「上げる」、「下げる」、「内気」、「外気」等の単語が、動詞、目的語等の単語種別、及び抽象化された意味と対応付けられて登録されている。また、機能辞書２５２Ａには、同時に使用可能であることを示す単語間リンク情報が含まれてよい。 The function dictionary 252A is a dictionary covering the functions (services) provided by the agent that the agent server 200 realizes in cooperation with the agent function unit 130. For example, when the agent provides a function for controlling the in-vehicle air conditioner, words such as "air conditioner", "air conditioning", "turn on", "turn off", "temperature", "raise", "lower", "inside air", and "outside air" are registered in the function dictionary 252A in association with a word type, such as verb or object, and an abstracted meaning. The function dictionary 252A may also include inter-word link information indicating which words can be used together.

汎用辞書２５２Ｂは、エージェントの提供する機能に限らず、一般的な物事の事象を抽象化された意味と対応付けた辞書である。機能辞書２５２Ａと汎用辞書２５２Ｂのそれぞれは、同義語や類義語の一覧情報を含んでもよい。機能辞書２５２Ａと汎用辞書２５２Ｂとは、複数の言語のそれぞれに対応して用意されてよく、その場合、音声認識部２２０及び自然言語処理部２２１は、予め設定されている言語設定に応じた機能辞書２５２Ａ及び汎用辞書２５２Ｂ、並びに文法情報（不図示）を使用する。音声認識部２２０の処理と、自然言語処理部２２１の処理は、段階が明確に分かれるものではなく、自然言語処理部２２１の処理結果を受けて音声認識部２２０が認識結果を修正する等、相互に影響し合って行われてよい。 The general-purpose dictionary 252B is a dictionary that associates general things and events, not only the functions provided by the agent, with abstracted meanings. Each of the function dictionary 252A and the general-purpose dictionary 252B may include lists of synonyms and near-synonyms. The function dictionary 252A and the general-purpose dictionary 252B may be prepared for each of a plurality of languages, in which case the voice recognition unit 220 and the natural language processing unit 221 use the function dictionary 252A, the general-purpose dictionary 252B, and grammar information (not shown) that match the preset language setting. The processing of the voice recognition unit 220 and that of the natural language processing unit 221 are not clearly separated into stages and may influence each other; for example, the voice recognition unit 220 may correct its recognition result in light of the processing result of the natural language processing unit 221.

自然言語処理部２２１は、音声認識部２２０による認識結果に基づく意味解析の一つとして、音声に含まれるサービスの要求に対応するために必要な機能に関する情報（以下、機能必要情報）を取得する。例えば、認識結果として、車両Ｍの車載機器の制御を指示する「窓を開けて」、「空調の温度を上げて」等のテキストが認識された場合、自然言語処理部２２１は、辞書ＤＢ２５２等を参照し、「車両機器制御」という対象機器・機能種別を取得する。そして、自然言語処理部２２１は、取得した機能必要情報をエージェント機能部１３０に出力する。自然言語処理部２２１は、機能必要情報に基づきサービス要求に対する実行可否の判定結果を取得する。自然言語処理部２２１は、要求された機能が実行可能である場合に、サービスの要求に対応できるものとして、解釈された発話内容に対応したコマンドを生成する。 As part of the semantic analysis based on the recognition result of the voice recognition unit 220, the natural language processing unit 221 acquires information on the function needed to respond to the service request contained in the speech (hereinafter, function necessary information). For example, when text instructing control of in-vehicle equipment of the vehicle M, such as "open the window" or "raise the air conditioner temperature", is recognized, the natural language processing unit 221 refers to the dictionary DB 252 and the like and acquires the target device/function type "vehicle device control". The natural language processing unit 221 then outputs the acquired function necessary information to the agent function unit 130, and obtains a determination of whether the service request can be executed based on the function necessary information. When the requested function can be executed, the natural language processing unit 221 treats the service request as one that can be handled and generates a command corresponding to the interpreted utterance content.

また、認識結果として、コンテンツサーバ３００にコンテンツの提供を求める「ポップスのコンテンツを紹介して」、「何かクラッシックのコンテンツを再生して」等の意味が認識された場合、自然言語処理部２２１は、辞書ＤＢ２５２等を参照し、「コンテンツレコメンド制御」という機能種別を取得し、対話管理部２２２に、「コンテンツレコメンド制御」という機能種別の情報を送信する。 When, as the recognition result, a meaning requesting content from the content server 300 is recognized, such as "introduce some pop content" or "play some classical content", the natural language processing unit 221 refers to the dictionary DB 252 and the like, acquires the function type "content recommendation control", and sends information on that function type to the dialogue management unit 222.
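
A dictionary lookup that maps recognized text to a function type, in the spirit of the function dictionary 252A, can be sketched as follows. The dictionary entries, the substring matching, and the function name are simplifying assumptions; an actual implementation would rely on proper morphological analysis and on the synonym and link information described above.

```python
FUNCTION_DICTIONARY = {
    # word -> (word type, function type); entries are illustrative only
    "window": ("object", "vehicle device control"),
    "air conditioner": ("object", "vehicle device control"),
    "temperature": ("object", "vehicle device control"),
    "content": ("object", "content recommendation control"),
    "open": ("verb", None),
    "raise": ("verb", None),
    "introduce": ("verb", None),
    "play": ("verb", None),
}

def interpret(text):
    """Return (function type, verbs, objects), or None when nothing usable matches."""
    lowered = text.lower()
    verbs, objects, function_type = [], [], None
    for word, (word_type, ftype) in FUNCTION_DICTIONARY.items():
        if word in lowered:
            (verbs if word_type == "verb" else objects).append(word)
            function_type = function_type or ftype   # first matching type wins
    if function_type is None or not verbs:
        return None
    return function_type, verbs, objects
```

With these entries, "Open the window" maps to "vehicle device control" and "Introduce pop content" maps to "content recommendation control".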

対話管理部２２２は、自然言語処理部２２１により生成されたコマンドに基づいて、パーソナルプロファイル２５４や知識ベースＤＢ２５６、応答規則ＤＢ２５８を参照しながら車両Ｍの乗員に対する応答内容（例えば、乗員への発話内容や出力部から出力する画像、音声）を決定する。知識ベースＤＢ２５６は、物事の関係性を規定した情報である。応答規則ＤＢ２５８は、コマンドに対してエージェントが行うべき動作（回答や機器制御の内容等）を規定した情報である。対話管理部２２２は自然言語処理部２２１から、「コンテンツレコメンド制御」という機能種別の情報が送信されれば、エージェント装置１００から周囲環境情報を取得し、取得した周囲環境情報をコンテンツサーバ３００に出力する。コンテンツサーバ３００は、後述する処理によってレコメンド情報をエージェントサーバ２００に出力する。対話管理部２２２は、コンテンツサーバ３００から取得したレコメンド情報をエージェント装置１００に送信する。レコメンド情報の取得に係る処理について、対話管理部２２２は、「第３取得部」の一例である。 Based on the command generated by the natural language processing unit 221, the dialogue management unit 222 determines the content of the response to the occupant of the vehicle M (for example, what is said to the occupant and the images and sounds output from the output unit) while referring to the personal profile 254, the knowledge base DB 256, and the response rule DB 258. The knowledge base DB 256 is information defining relationships between things. The response rule DB 258 is information defining the actions (such as answers and device control details) that the agent should take in response to a command. When information on the function type "content recommendation control" is sent from the natural language processing unit 221, the dialogue management unit 222 acquires the ambient environment information from the agent device 100 and outputs it to the content server 300. The content server 300 outputs recommendation information to the agent server 200 through the processing described later. The dialogue management unit 222 transmits the recommendation information acquired from the content server 300 to the agent device 100. With respect to the processing for acquiring recommendation information, the dialogue management unit 222 is an example of the "third acquisition unit".

また、対話管理部２２２は、音声ストリームから得られる特徴情報を用いて、パーソナルプロファイル２５４と照合を行うことで、乗員を特定してもよい。この場合、パーソナルプロファイル２５４には、例えば、音声の特徴情報が更に応付けられている。音声の特徴情報とは、例えば、声の高さ、イントネーション、リズム（音の高低のパターン）等の喋り方の特徴や、メル周波数ケプストラム係数（Mel Frequency Cepstrum Coefficients）等による特徴量に関する情報である。音声の特徴情報は、例えば、乗員の初期登録時に所定の単語や文章等を乗員に発声させ、発声させた音声を認識することで得られる情報である。 The dialogue management unit 222 may also identify the occupant by matching feature information obtained from the voice stream against the personal profile 254. In this case, the personal profile 254 additionally has, for example, voice feature information associated with it. Voice feature information is, for example, information on characteristics of the way of speaking, such as voice pitch, intonation, and rhythm (the pattern of rising and falling pitch), and on feature values such as Mel-frequency cepstral coefficients (MFCCs). The voice feature information is obtained, for example, by having the occupant utter predetermined words or sentences at initial registration and recognizing the uttered speech.
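
Matching voice features against registered profiles can be illustrated with a simple similarity comparison. The publication does not state how the matching is done, so cosine similarity, the threshold, and the short feature vectors below are assumptions; real feature vectors (for example, averaged MFCCs) would be much longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify_occupant(features, personal_profiles, threshold=0.9):
    """Return the best-matching registered occupant, or None if none is close enough."""
    best_name, best_score = None, threshold
    for name, enrolled in personal_profiles.items():
        score = cosine_similarity(features, enrolled)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```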

対話管理部２２２は、コマンドがネットワークＮＷを介して検索可能な情報を要求するものである場合、ネットワーク検索部２２３に検索を行わせる。ネットワーク検索部２２３は、ネットワークＮＷを介してコンテンツサーバ３００等の外部機器にアクセスし、所望の情報を取得する。 The dialogue management unit 222 causes the network search unit 223 to perform a search when the command requests information that can be searched via the network NW. The network search unit 223 accesses an external device such as the content server 300 via the network NW and acquires desired information.

応答内容生成部２２４は、対話管理部２２２により決定された発話の内容が車両Ｍの乗員に理解されるように、応答文を生成し、生成した応答文をエージェント装置１００に送信する。また、応答内容生成部２２４は、カメラが車室内を撮像した画像に基づいて車両Ｍの乗員を認識した認識結果をエージェント装置１００から取得し、取得した認識結果によりコマンドを含む発話を行った乗員がパーソナルプロファイル２５４に登録された乗員であることが特定されている場合に、乗員の名前を呼んだり、乗員の話し方に似せた話し方にしたりした応答文を生成してもよい。本実施形態において、応答文は、例えば、「あなたにおすすめのコンテンツをご用意しました。」等の文章である。 The response content generation unit 224 generates a response sentence so that the utterance content determined by the dialogue management unit 222 can be understood by the occupant of the vehicle M, and transmits the generated response sentence to the agent device 100. The response content generation unit 224 may also acquire from the agent device 100 the result of recognizing the occupant of the vehicle M based on an image of the cabin captured by the camera, and, when that result identifies the occupant who made the utterance containing the command as an occupant registered in the personal profile 254, generate a response sentence that calls the occupant by name or imitates the occupant's way of speaking. In this embodiment, the response sentence is, for example, "We have prepared content recommended for you."

エージェント機能部１３０は、応答文を取得すると、音声合成を行って音声を出力するように音声制御部１２２に指示する。また、エージェント機能部１３０は、応答文を含む画像等を表示するように表示制御部１２４に指示する。 When the agent function unit 130 acquires the response sentence, the agent function unit 130 instructs the voice control unit 122 to perform voice synthesis and output the voice. Further, the agent function unit 130 instructs the display control unit 124 to display an image or the like including a response sentence.

［コンテンツサーバ３００］ [Content Server 300]
図４は、コンテンツサーバ３００−１の構成の一例を示す図である。コンテンツサーバ３００−１〜３００−３は、同様の構成を有するため、以降は、コンテンツサーバ３００−１について説明し、コンテンツサーバ３００−２〜３００−３の説明については、省略する。 FIG. 4 is a diagram showing an example of the configuration of the content server 300-1. Because the content servers 300-1 to 300-3 have the same configuration, only the content server 300-1 is described below, and descriptions of the content servers 300-2 and 300-3 are omitted.

コンテンツサーバ３００−１は、通信部３１０を備える。通信部３１０は、例えば、ＮＩＣ等のネットワークインターフェースである。更に、コンテンツサーバ３００−１は、取得部３２０と、レコメンド情報取得部３２２との機能部を備える。これらの構成要素は、例えば、ＣＰＵ等のハードウェアプロセッサがプログラム（ソフトウェア）を実行することにより実現される。これらの構成要素のうち一部または全部は、ＬＳＩやＡＳＩＣ、ＦＰＧＡ、ＧＰＵ等のハードウェア（回路部；circuitryを含む）によって実現されてもよいし、ソフトウェアとハードウェアの協働によって実現されてもよい。プログラムは、予めＨＤＤやフラッシュメモリ等の記憶装置（非一過性の記憶媒体を備える記憶装置）に格納されていてもよいし、ＤＶＤやＣＤ−ＲＯＭ等の着脱可能な記憶媒体（非一過性の記憶媒体）に格納されており、記憶媒体がドライブ装置に装着されることでインストールされてもよい。 The content server 300-1 includes a communication unit 310. The communication unit 310 is, for example, a network interface such as a NIC. The content server 300-1 further includes functional units consisting of an acquisition unit 320 and a recommendation information acquisition unit 322. These components are realized, for example, by a hardware processor such as a CPU executing a program (software). Some or all of these components may be realized by hardware (including circuitry) such as an LSI, ASIC, FPGA, or GPU, or by software and hardware working together. The program may be stored in advance in a storage device such as an HDD or flash memory (a storage device having a non-transitory storage medium), or it may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or CD-ROM and installed when the storage medium is mounted in a drive device.

また、コンテンツサーバ３００−１は、記憶部３５０と、コンテンツＤＢ３６０とを備える。記憶部３５０と、コンテンツＤＢ３６０とは、それぞれ、上記の記憶部１５０や記憶部２５０を実現する各種記憶装置と同様の装置により実現される。記憶部３５０には、例えば、学習モデル３５２等のデータやプログラムが格納される。また、コンテンツＤＢ３６０には、コンテンツサーバ３００−１が提供するコンテンツが格納される。 Further, the content server 300-1 includes a storage unit 350 and a content DB 360. The storage unit 350 and the content DB 360 are realized by the same devices as the various storage devices that realize the storage unit 150 and the storage unit 250, respectively. The storage unit 350 stores, for example, data or a program such as a learning model 352. Further, the content DB 360 stores the content provided by the content server 300-1.

The learning model 352 is, for example, a trained model obtained by machine learning with ambient environment information as input data and recommendation information as output data. The learning model 352 has an input layer, a hidden layer, and an output layer. The input layer receives the items included in the ambient environment information: the date and time of the occupant's utterance, the day of the week, the weather, climate, temperature, and humidity at the point where the vehicle M is located, POI information for the surroundings of the vehicle M, the total number of occupants of the vehicle M, the number of occupants by age group, the number of occupants by gender, and information indicating the occupants' situation. The output layer outputs recommendation information. The recommendation information includes one or more pieces of information indicating the storage location (link) of content in the content DB 360. The hidden layer has a multi-layer neural network connecting the input layer and the output layer. The parameters of the hidden layer are optimized by machine learning in which the input to the input layer is used as training data and the data that should be output from the output layer is used as teacher data. Here, the training data is the ambient environment information, and the teacher data is the content that an occupant of the vehicle M, present in the environment indicated by that ambient environment information, selected in response to the recommendation information.
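The input/output relationship of such a model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `AmbientInfo` fields are a small subset of the items listed above, the `WEATHER` and `LINKS` vocabularies, the scaling constants, the single ReLU hidden layer, and the softmax output are not specified by the embodiment.

```python
import math
from dataclasses import dataclass

# Hypothetical vocabularies; the embodiment does not define any encoding.
WEATHER = ["sunny", "cloudy", "rain", "snow"]
LINKS = ["db://content/sea-songs", "db://content/rain-jazz", "db://content/news"]


@dataclass
class AmbientInfo:
    hour: int             # hour of the utterance, 0-23
    weekday: int          # 0 = Monday ... 6 = Sunday
    weather: str          # one of WEATHER
    temperature_c: float
    humidity_pct: float
    occupants: int        # total number of occupants


def encode(info):
    """Turn ambient environment information into a fixed-length input vector."""
    one_hot = [1.0 if info.weather == w else 0.0 for w in WEATHER]
    return [info.hour / 23.0, info.weekday / 6.0, *one_hot,
            info.temperature_c / 40.0, info.humidity_pct / 100.0,
            info.occupants / 5.0]


def dense(x, weights, bias):
    """One fully connected layer: weights is a list of rows, one per output."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer (ReLU) -> output layer (softmax over LINKS)."""
    h = [max(0.0, v) for v in dense(x, hidden_w, hidden_b)]
    logits = dense(h, out_w, out_b)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recommend(info, params, top_k=2):
    """Return the top_k storage-location links as the recommendation information."""
    scores = forward(encode(info), *params)
    ranked = sorted(zip(LINKS, scores), key=lambda p: p[1], reverse=True)
    return [link for link, _ in ranked[:top_k]]
```

In a trained model the parameters would come from the optimization described above; here `params` is simply a tuple `(hidden_w, hidden_b, out_w, out_b)` supplied by the caller.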

The content server 300-1 may include a learning unit that trains a model that is still at the learning stage. In that case, the learning unit trains the model based on ambient environment information and information indicating the content that an occupant of the vehicle M, present in the environment indicated by that ambient environment information, selected in response to the recommendation information. Once the model has been trained appropriately, the learning unit stores it in the storage unit 350 as the learning model 352.

The acquisition unit 320 acquires (receives) the ambient environment information from the agent server 200 via the communication unit 310. The recommendation information acquisition unit 322 inputs the ambient environment information acquired by the acquisition unit 320 into the learning model 352 and acquires recommendation information as the output. The recommendation information acquisition unit 322 transmits the acquired recommendation information to the agent server 200. The agent server 200 receives recommendation information from each of the content servers 300 and transmits the received recommendation information to the agent device 100.
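The exchange between the agent server 200 and the content servers 300 can be sketched as follows. This is only an illustration under stated assumptions: each content server is modelled as a plain callable, and the names `collect_recommendations` and `flatten_for_notification` are hypothetical, not part of the embodiment.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a content server 300: it takes ambient environment
# information and returns the storage-location links produced by its own model.
ContentServer = Callable[[dict], List[str]]


def collect_recommendations(ambient: dict,
                            servers: Dict[str, ContentServer]) -> Dict[str, List[str]]:
    """Send the same ambient information to every server and keep results per server."""
    return {name: server(ambient) for name, server in servers.items()}


def flatten_for_notification(per_server: Dict[str, List[str]]) -> List[str]:
    """Produce one display row per server, as a notification screen might group them."""
    return [f"{name}: {', '.join(links)}" for name, links in per_server.items()]
```

Keeping the results keyed by server preserves the per-server grouping that the agent device needs when it later builds the notification image.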

[Example of notification of recommendation information]
FIG. 5 is a diagram showing an example of a first notification image IMa1 used to notify recommendation information. The agent function unit 130 receives the recommendation information from the agent server 200 and causes the output control unit 120 to notify the occupant of the vehicle M. The output control unit 120, for example, generates the first notification image IMa1 including the recommendation information and displays it on the display/operation device 30, thereby notifying the occupant of the vehicle M of the recommendation information. As shown in FIG. 5, the first notification image IMa1 includes one or more images (hereinafter, content images IMc), each showing a piece of content included in the recommendation information. The first notification image IMa1 also includes a button B1 with which the occupant of the vehicle M chooses not to receive the content corresponding to the content images IMc. In the processing related to notification of the recommendation information, the combination of the output control unit 120 and the agent function unit 130 is an example of a "notification control unit".

In the example of FIG. 5, the first notification image IMa1 presents multiple pieces of recommendation information, grouped by content server 300. Specifically, the first notification image IMa1 includes content images IMc1 (the illustrated content images IMc1-1 to IMc1-3) showing the content included in the recommendation information output from the content server 300-1 operated by "○○ Video", content images IMc2 (the illustrated content images IMc2-1 to IMc2-3) showing the content included in the recommendation information output from the content server 300-2 operated by "△△ Video", and content images IMc3 (the illustrated content images IMc3-1 to IMc3-3) showing the content included in the recommendation information output from the content server 300-3 operated by "○○ Music".

In response to the first notification image IMa1 being displayed, the agent function unit 130 determines whether an operation by the occupant of the vehicle M selecting one of the one or more content images IMc included in the first notification image IMa1 (that is, an operation of selecting content) has been accepted. When the operation has been accepted, the agent function unit 130 identifies, based on the recommendation information, information related to the content corresponding to the selected content image IMc (for example, the storage location of the content in the content DB 360). The agent function unit 130 acquires the content stored at the identified storage location and causes the output control unit 120 to provide it to the occupant of the vehicle M. Specifically, the voice control unit 122 causes the speaker 20 to output the audio of the content acquired by the agent function unit 130, and the display control unit 124 causes the display device of the display/operation device 30 to display the image of the content acquired by the agent function unit 130. In the processing related to providing content, the combination of the output control unit 120 and the agent function unit 130 is an example of a "provision unit".

When, in response to the notification image IMa being displayed, an operation by the occupant of the vehicle M selecting the button B1 is accepted, the agent function unit 130 does not acquire content from the content DB 360 and does not cause the output control unit 120 to provide content.
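The two outcomes described above (a content image is selected, or the button B1 is selected) can be sketched as a single handler. The identifier strings, the `links_by_image` mapping, and the `fetch` callable are hypothetical names introduced only for this illustration.

```python
from typing import Callable, Dict, Optional

DECLINE = "B1"  # assumed identifier for the "do not provide content" button


def handle_selection(selected_id: str,
                     links_by_image: Dict[str, str],
                     fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Return the selected content, or None when the occupant declines.

    links_by_image maps a content image identifier (e.g. "IMc1-1") to the
    storage location carried in the recommendation information.
    """
    if selected_id == DECLINE:
        # Button B1: nothing is fetched and nothing is provided.
        return None
    link = links_by_image[selected_id]
    return fetch(link)
```

Note that the fetch happens only after a content image is chosen, which matches the description that no content is acquired when B1 is selected.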

[When acquiring recommendation information from a selected content server 300]
The above description covers the case where the dialogue management unit 222 of the agent server 200 outputs the ambient environment information to every one of the one or more content servers 300 connected to the network NW and acquires recommendation information from each of them, but the configuration is not limited to this. For example, when one or more content servers 300 are connected to the network NW, the dialogue management unit 222 may select an appropriate content server 300 (for example, only one) from among them and output the ambient environment information to the selected content server 300.

In this case, as the appropriate content server 300, the dialogue management unit 222 may select a content server 300 that provides content matching the preferences of the occupant of the vehicle M, a content server 300 with a history of having provided content to the occupant of the vehicle M, a content server 300 that the occupant of the vehicle M has registered in advance as a favorite, or a content server 300 that matches the content of the occupant's utterance.

When selecting a content server 300 that matches the preferences of the occupant of the vehicle M, the dialogue management unit 222 refers to the personal profile 254, identifies the occupant's preferences (or tastes, tendencies, and the like), and selects a content server 300 that provides a large amount of content matching the identified preferences. When selecting a content server 300 with a history of having provided content to the occupant of the vehicle M, the storage unit 250 (or the storage unit 150) stores provision history information in which information identifying the occupant who received the content and information identifying the content server 300 that provided the content are associated with each other, and the dialogue management unit 222 selects the content server 300 by referring to the provision history information. When selecting a content server 300 that the occupant of the vehicle M has registered in advance as a favorite, the storage unit 250 (or the storage unit 150) stores favorite information in which information identifying the occupant and information identifying that occupant's favorite content server 300 are associated as a pair, and the dialogue management unit 222 selects the content server 300 by referring to the favorite information. The information indicating the content server 300 registered in advance as a favorite by the occupant of the vehicle M and the personal profile 254 are examples of "preference information".
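One way the three stored-information strategies could be combined is sketched below. The embodiment presents them as alternatives and does not define any ordering, so the priority used here (favorite first, then provision history, then preference match) is purely an assumption, as are the data shapes and the function name `select_server`.

```python
from collections import Counter


def select_server(occupant, servers, favorites, history, preferences):
    """Pick one content server for an occupant.

    servers:     {server name: list of tag sets, one set per content item}
    favorites:   {occupant: favorite server name}
    history:     list of (occupant, server name) provision records
    preferences: {occupant: set of liked tags}, standing in for a personal profile
    """
    # 1. A registered favorite wins if that server is available.
    fav = favorites.get(occupant)
    if fav in servers:
        return fav
    # 2. Otherwise use the server that has provided content to this occupant most often.
    counts = Counter(s for o, s in history if o == occupant and s in servers)
    if counts:
        return counts.most_common(1)[0][0]
    # 3. Otherwise pick the server offering the most items that match the liked tags.
    liked = preferences.get(occupant, set())

    def matches(name):
        return sum(1 for tags in servers[name] if liked & tags)

    return max(servers, key=matches)
```

Any other ordering, or using only one of the strategies, would be equally consistent with the description above.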

FIG. 6 is a diagram showing an example of a second notification image IMa2 used to notify recommendation information. The agent function unit 130 receives recommendation information acquired from a single content server 300 whose storage unit 350 stores the learning model 352, and causes the output control unit 120 to notify the occupant of the vehicle M. In the example of FIG. 6, the second notification image IMa2 includes only the content images IMc1 (the illustrated content images IMc1-1 to IMc1-6) showing the content included in the recommendation information output from the content server 300-1 operated by "○○ Video".

[Operation flow]
FIGS. 7 and 8 are flowcharts showing an example of a series of operations of the agent system 1 according to the embodiment. The processing of these flowcharts may be executed repeatedly, for example, at a predetermined cycle or at predetermined timings. The example of FIGS. 7 and 8 mainly describes, among the processing executed by the agent device 100, the processing of activating the agent by a wake-up word and responding to a request included in the occupant's utterance. The description also covers the processing of the agent server 200 that is executed in cooperation with the agent function unit 130 of the agent device 100.

First, the agent WU determination unit 113 activates the agent (agent function unit 130) associated with the wake-up word (step S100). Next, the sound processing unit 111 determines whether voice has been received by the microphone 10 or the like (step S102). When voice has been received, the sound processing unit 111 performs sound processing on the received voice so that it is in a state suitable for recognizing the utterance content (step S104). The processed voice is transmitted from the agent function unit 130 to the agent server 200.

The voice recognition unit 220 of the agent server 200 recognizes the received voice and converts it into text (step S106). Next, the natural language processing unit 221 executes natural language processing on the resulting character information and interprets its meaning (step S108). Next, the natural language processing unit 221 determines whether the interpreted content of the utterance of the occupant of the vehicle M is a request for the provision of content (step S110). When the natural language processing unit 221 determines that the utterance is not a request for the provision of content, the processing proceeds to step S118. When the natural language processing unit 221 determines that the utterance is a request for the provision of content, it transmits information on the function type "content recommendation control" to the dialogue management unit 222, and the processing proceeds to step S112.

The dialogue management unit 222 acquires the ambient environment information acquired by the ambient environment information acquisition unit 112 of the agent device 100 (step S112). The dialogue management unit 222 outputs the acquired ambient environment information to the content server 300 and acquires recommendation information (step S114). The acquisition unit 320 of the content server 300 acquires (receives) the ambient environment information from the agent server 200 via the communication unit 310. The recommendation information acquisition unit 322 inputs the ambient environment information acquired by the acquisition unit 320 into the learning model 352 and acquires recommendation information as the output. The recommendation information acquisition unit 322 transmits the acquired recommendation information to the agent server 200. The agent function unit 130 causes the output control unit 120 to notify the occupant of the vehicle M of the recommendation information that the agent server 200 acquired from the content server 300 (step S116). Specifically, the agent function unit 130 causes the display device of the display/operation device 30 to display the first notification image IMa1 or the second notification image IMa2 showing the recommendation information.

The response content generation unit 224 of the agent server 200 generates response content so that the content of the utterance determined by the dialogue management unit 222 can be understood by the occupant of the vehicle M (step S118). The agent function unit 130 causes the output unit to output the response result acquired from the agent server 200 (step S120). Specifically, the agent function unit 130 causes the voice control unit 122 to output the response content acquired from the agent server 200 as audio from the speaker 20, and causes the display control unit 124 to display it as an image on the display device of the display/operation device 30. In this case, the response content is a message such as "We have prepared content recommended for you."

Next, in response to the first notification image IMa1 or the second notification image IMa2 being displayed, the agent function unit 130 determines whether an operation by the occupant of the vehicle M has been accepted by the touch panel of the display/operation device 30 (step S124). The agent function unit 130 waits until an operation is accepted. When the agent function unit 130 determines that an operation by the occupant of the vehicle M has been accepted, it determines whether the accepted operation selects one of the one or more content images IMc included in the first notification image IMa1 or the second notification image IMa2 (that is, whether it is an operation of selecting content) (step S126). The agent function unit 130 waits until an operation is accepted. When an operation of selecting content has been accepted, the agent function unit 130 acquires the content corresponding to the content image IMc from the content server 300 based on the recommendation information (step S128). The agent function unit 130 causes the output control unit 120 to provide the acquired content to the occupant of the vehicle M (step S130). When an operation of selecting content has not been accepted (that is, when an operation choosing not to receive content has been accepted), the agent function unit 130 does not acquire content from the content server 300 and does not cause the output control unit 120 to provide content (step S132).

After the processing of step S130 or S132, the agent function unit 130 determines whether to terminate the processing of the agent (step S134). When it is determined that the agent is not to be terminated, the processing returns to step S102. When it is determined that the agent is to be terminated, the management unit 110 terminates the agent (step S136). Cases in which the agent is terminated include, for example, a case where the voice of an end word for terminating the agent is received, a case where an in-vehicle switch for terminating the agent is pressed, and a case where the microphone 10 has received no voice for a predetermined time or longer. The processing of this flowchart then ends.
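The overall control flow can be summarized in a simplified sketch. It is not a faithful transcription of the flowcharts: speech recognition, the response message, and the waiting steps are collapsed into injected callables, termination is checked only through an assumed end word at the start of each iteration, and all function and parameter names are hypothetical. The step numbers in the comments indicate only a rough correspondence.

```python
from typing import Callable, Iterable, List, Optional


def run_agent(utterances: Iterable[str],
              is_content_request: Callable[[str], bool],
              recommend: Callable[[], List[str]],
              choose: Callable[[List[str]], Optional[str]],
              end_word: str = "goodbye") -> List[str]:
    """Return a log of what the agent did for each utterance."""
    log: List[str] = []
    for text in utterances:                   # S102: voice received
        if text == end_word:                  # S134: end word heard
            log.append("terminated")          # S136
            break
        if not is_content_request(text):      # S110: not a content request
            log.append("responded")           # S118 to S120
            continue
        options = recommend()                 # S112 to S116
        choice = choose(options)              # S124 to S126
        if choice is None:
            log.append("declined")            # S132
        else:
            log.append(f"provided {choice}")  # S128 to S130
    return log
```

Injecting the recognition, recommendation, and selection steps as callables keeps the loop itself small and makes each branch easy to exercise in isolation.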

[Summary of embodiment]
The preferences of the occupant of the vehicle M may change depending on the occupant's surroundings inside the vehicle M and on the place where the vehicle M is located. For example, the occupant of the vehicle M may want to receive sea-related content when the vehicle M is at the seaside, or rain-related content when the vehicle M is in an area where it is raining. The agent function unit 130 and the output control unit 120 recommend content based on the recommendation information derived by the content server 300 using the learning model 352, which was trained on ambient environment information. Therefore, the agent system 1 of the present embodiment can recommend content using recommendation information obtained according to the ambient environment information, and can provide content that the vehicle occupant prefers in accordance with the occupant's ambient environment.

[Example of notification without recommendation information]
The above description covers the case where the agent function unit 130 recommends content to the occupant of the vehicle M using the recommendation information acquired from the natural language processing unit 221, but the configuration is not limited to this. For example, the agent function unit 130 may select the content to recommend to the occupant of the vehicle M based on history information. In this case, the agent function unit 130 refers to the history information 152 stored in the storage unit 150 and determines the content to recommend.

FIG. 9 is a diagram showing an example of the contents of the history information 152. The history information 152 is, for example, information in which the ambient environment information acquired by the ambient environment information acquisition unit 112 when the occupant of the vehicle M was provided with content (or even when the occupant was not), information identifying that content, and information indicating the storage location of that content in the content DB 360 are associated with one another. When content is provided to the occupant of the vehicle M based on recommendation information, the agent function unit 130 generates (updates) the history information 152 with a record that associates the ambient environment information related to that recommendation information, the information identifying the content, and the information indicating the storage location of the content shown in the recommendation information, and stores it in the storage unit 150.

When the content of the utterance of the occupant of the vehicle M, as interpreted by the natural language processing unit 221, is recognized as a request for the provision of content, the agent function unit 130 searches the history information 152 using the ambient environment information acquired by the ambient environment information acquisition unit 112 as a search key, and selects the content associated with ambient environment information that matches (or closely matches) the search key. The agent function unit 130 then causes the output control unit 120 to notify the occupant that the selected content is available. In the processing of selecting content based on the history information 152, the agent function unit 130 is an example of a "selection unit".
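A history search of this kind can be sketched as a nearest-match lookup. The embodiment does not define how the degree of matching is computed, so the measure below (the fraction of shared fields with equal values), the `min_score` threshold, and the record layout are assumptions made for illustration.

```python
from typing import List, Optional


def similarity(a: dict, b: dict) -> float:
    """Fraction of shared fields whose values are equal (assumed matching measure)."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(1 for k in keys if a[k] == b[k]) / len(keys)


def select_from_history(current: dict, history: List[dict],
                        min_score: float = 0.5) -> Optional[dict]:
    """Return the history record whose ambient information best matches `current`.

    Each record is assumed to look like
    {"ambient": {...}, "content_id": "...", "link": "..."}.
    Returns None when no record reaches min_score.
    """
    scored = [(similarity(current, rec["ambient"]), rec) for rec in history]
    scored = [pair for pair in scored if pair[0] >= min_score]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

Returning `None` below the threshold leaves room for a fallback, such as requesting recommendation information from a content server instead.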

[Filtering recommendation information]
The above description covers the case where the agent function unit 130 causes the output control unit 120 to notify all of the one or more pieces of content included in the recommendation information acquired from the content server 300, but the configuration is not limited to this. The agent function unit 130 may extract some of the one or more pieces of content included in the recommendation information acquired from the content server 300 and cause the output control unit 120 to notify only those. In this case, the agent function unit 130 extracts content from the recommendation information (in other words, excludes content that does not suit the preferences of the occupant of the vehicle M) based on the history information 152 described above, the personal profile 254, a query (inquiry phrase) included in the content of the utterance of the occupant of the vehicle M, or the like, and causes the output control unit 120 to notify the extracted content. For example, when the occupant's utterance includes a query, as in "play some 'music'" or "play some 'movie'", the agent function unit 130 extracts, from the content included in the recommendation information, the content that matches the query such as "music" or "movie", and notifies and provides that content.

The agent function unit 130 may also extract, from the one or more pieces of content included in the recommendation information acquired from the content server 300, the content that matches the query or a query similar to it (in other words, exclude all other content), and cause the output control unit 120 to notify the extracted content. In this case, the agent function unit 130 outputs the query to the agent server 200. The natural language processing unit 221 then identifies queries similar to that query based on a synonym dictionary (not shown) stored in the storage unit 250. The synonym dictionary is a dictionary in which words and their synonyms are associated with each other. Based on the query and the similar queries identified by the natural language processing unit 221, the agent function unit 130 extracts, from the content included in the recommendation information, the content that matches the query or a similar query, and notifies and provides that content.
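Query-based filtering with synonym expansion can be sketched as follows. The `SYNONYMS` table stands in for the synonym dictionary, and the `category` field on each recommended item is an assumption; the embodiment does not specify how content is matched against a query.

```python
from typing import Dict, List, Set

# Hypothetical synonym dictionary: word -> set of synonyms.
SYNONYMS: Dict[str, Set[str]] = {
    "music": {"song", "tune"},
    "movie": {"film"},
}


def expand(query: str) -> Set[str]:
    """Return the query together with its synonyms."""
    return {query} | SYNONYMS.get(query, set())


def filter_recommendations(items: List[dict], query: str) -> List[dict]:
    """Keep only the recommended items whose category matches the query or a synonym."""
    terms = expand(query)
    return [item for item in items if item["category"] in terms]
```

A query with no entry in the table is matched literally, so unknown words narrow the result rather than raising an error.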

Instead of the configuration in which the agent function unit 130 excludes content included in the recommendation information, the learning model 352 may be one that receives as input the query included in the content of the utterance of the occupant of the vehicle M. For example, when the occupant's utterance includes a query, as in "play a song by 'singer A'", the agent function unit 130 outputs the ambient environment information and information indicating the query to the content server 300. The recommendation information acquisition unit 322 inputs the ambient environment information and the query into the learning model 352 and acquires the recommendation information that is output. In this case, the learning model 352 is a trained model that takes the ambient environment information and the query as input data and produces recommendation information as output data.

In this case, the input layer of the learning model 352 receives the items included in the ambient environment information (the date and time of the occupant's utterance, the day of the week, the weather, climate, temperature, and humidity at the point where the vehicle M is located, POI information for the surroundings of the vehicle M, the total number of occupants of the vehicle M, the number of occupants by age group, the number of occupants by gender, and information indicating the occupants' situation) together with information indicating the query (and information indicating entities that match the query). The output layer outputs recommendation information. The method of training the learning model 352 is the same as the method described above, so its description is omitted.

[Configuration combining the agent device 100 and the agent server 200]
The above description covers the case where the agent device 100 and the agent server 200 are configured as separate bodies, but the configuration is not limited to this. The agent device 100 and the agent server 200 may be configured integrally. FIG. 10 is a diagram showing an example of the configuration of an agent device 100A having the functions of the agent server 200. The agent device 100A includes an agent function unit 130A in place of (or in addition to) the agent function unit 130 of the agent device 100. The agent function unit 130A includes, for example, the voice recognition unit 220, the natural language processing unit 221, the dialogue management unit 222, the network search unit 223, and the response content generation unit 224 as its functional units. The processing executed by these functional units is the same as the processing described above, so its description is omitted. The agent device 100A also includes a storage unit 150A in place of (or in addition to) the storage unit 150 of the agent device 100. The storage unit 150A stores, for example, in addition to the programs executed in the agent device 100A, the history information 152, the dictionary DB 252 (including the function dictionary 252A, the general-purpose dictionary 252B, and the alias dictionary 252C), the personal profile 254, the knowledge base DB 256, and the response rule DB 258. With the agent device 100A, the processing described above can be executed without communication via the network.

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

1 ... agent system, 10 ... microphone, 20 ... speaker, 30 ... display/operation device, 40 ... in-vehicle communication device, 100 ... agent device, 110 ... management unit, 111 ... sound processing unit, 112 ... surrounding environment information acquisition unit, 113 ... agent WU determination unit, 114 ... communication control unit, 120 ... output control unit, 122 ... voice control unit, 124 ... display control unit, 130 ... agent function unit, 140 ... in-vehicle communication unit, 150 ... storage unit, 152 ... history information, 200 ... agent server, 210 ... communication unit, 220 ... voice recognition unit, 221 ... natural language processing unit, 222 ... dialogue management unit, 223 ... network search unit, 224 ... response content generation unit, 250 ... storage unit, 252A ... functional dictionary, 252B ... general-purpose dictionary, 252 ... dictionary DB, 256 ... knowledge base DB, 258 ... response rule DB, 254 ... personal profile, 300, 300-1, 300-2, 300-3 ... content server, 310 ... communication unit, 320 ... acquisition unit, 322 ... recommendation information acquisition unit, 350 ... storage unit, 352 ... learning model, 360 ... content DB, IMa ... notification image, IMa1 ... first notification image, IMa2 ... second notification image

Claims

A notification system comprising:
a first acquisition unit that acquires voice data indicating a voice uttered by an occupant of a moving body;
an utterance content interpretation unit that interprets the content of the occupant's utterance based on the voice data;
a second acquisition unit that acquires surrounding environment information relating to the surrounding environment of the occupant;
a third acquisition unit that outputs the content of the utterance and the surrounding environment information to one or more recommendation systems that recommend content to the occupant, and acquires, from each of the recommendation systems, recommendation information indicating one or more pieces of content;
a notification control unit that causes an output unit provided in the moving body to give notification of the plurality of pieces of recommendation information acquired by the third acquisition unit for the respective recommendation systems; and
a provision unit that, when an operation unit provided in the moving body accepts an operation selecting one of the pieces of recommendation information notified by the output unit, causes the output unit to output the content corresponding to the selected recommendation information.

The notification system according to claim 1, wherein the notification control unit causes the output unit to give notification of content that, among the one or more pieces of content corresponding to the recommendation information, matches the occupant's preference, based on the recommendation information and preference information indicating the occupant's preference.

The notification system according to claim 1 or 2, further comprising:
a generation unit that generates history information in which the recommendation information notified by the output unit is associated with the surrounding environment information relating to that recommendation information; and
a selection unit that newly selects content to recommend to the occupant based on surrounding environment information newly acquired by the second acquisition unit and the history information generated by the generation unit,
wherein the notification control unit causes the output unit to newly give notification of the content selected by the selection unit.

The notification system according to claim 3, wherein
when none of the one or more pieces of content notified by the output unit is selected via the operation unit, the provision unit does not cause the output unit to provide the one or more pieces of content indicated by the recommendation information, and
the generation unit generates history information indicating that the output unit did not provide the content.

The notification system according to any one of claims 1 to 4, wherein
the surrounding environment information includes information on the environment around the moving body and information on the environment around the occupant,
the information on the environment around the moving body includes the weather, climate, temperature, or humidity at the point where the moving body is located, or POI information on points around the moving body, and
the information on the environment around the occupant includes the date and time or day of the week at the time of the utterance, the number of occupants by age group, the number of occupants by gender, the total number of occupants, or the situation of the occupant.

The notification system according to any one of claims 1 to 5, wherein the third acquisition unit acquires recommendation information for content that the recommendation system recommends based on a query similar to the content of the utterance.

The notification system according to any one of claims 1 to 6, wherein the one or more recommendation systems are systems operated by mutually different operators.

A notification control device comprising:
a first acquisition unit that acquires voice data indicating a voice uttered by an occupant of a moving body;
an utterance content interpretation unit that interprets the content of the occupant's utterance based on the voice data;
a second acquisition unit that acquires surrounding environment information relating to the surrounding environment of the occupant;
a third acquisition unit that outputs the content of the utterance and the surrounding environment information to one or more recommendation systems that recommend content to the occupant, and acquires, from each of the recommendation systems, recommendation information indicating one or more pieces of content;
a notification control unit that causes an output unit provided in the moving body to give notification of the plurality of pieces of recommendation information acquired by the third acquisition unit for the respective recommendation systems; and
a provision unit that, when an operation unit provided in the moving body accepts an operation selecting one of the pieces of recommendation information notified by the output unit, causes the output unit to output the content corresponding to the selected recommendation information.

A notification control method executed by one or more computers, the method comprising:
a first acquisition process of acquiring voice data indicating a voice uttered by an occupant of a moving body;
an utterance content interpretation process of interpreting the content of the occupant's utterance based on the voice data;
a second acquisition process of acquiring surrounding environment information relating to the surrounding environment of the occupant;
a third acquisition process including an output step of outputting the content of the utterance and the surrounding environment information to one or more recommendation systems that recommend content to the occupant, and an acquisition step of acquiring, from each of the recommendation systems, recommendation information indicating one or more pieces of content;
a notification process of causing an output unit provided in the moving body to give notification of the plurality of pieces of recommendation information acquired in the third acquisition process for the respective recommendation systems; and
an output process of, when an operation unit provided in the moving body accepts an operation selecting one of the pieces of recommendation information notified by the output unit, causing the output unit to output the content corresponding to the selected recommendation information.

A notification control program to be installed on one or more computers, the program causing the one or more computers to execute:
a first acquisition process of acquiring voice data indicating a voice uttered by an occupant of a moving body;
an utterance content interpretation process of interpreting the content of the occupant's utterance based on the voice data;
a second acquisition process of acquiring surrounding environment information relating to the surrounding environment of the occupant;
a third acquisition process including an output step of outputting the content of the utterance and the surrounding environment information to one or more recommendation systems that recommend content to the occupant, and an acquisition step of acquiring, from each of the recommendation systems, recommendation information indicating one or more pieces of content;
a notification process of causing an output unit provided in the moving body to give notification of the plurality of pieces of recommendation information acquired in the third acquisition process for the respective recommendation systems; and
an output process of, when an operation unit provided in the moving body accepts an operation selecting one of the pieces of recommendation information notified by the output unit, causing the output unit to output the content corresponding to the selected recommendation information.