JP7178983B2 - Agent device, agent method and program

Info

Publication number: JP7178983B2
Application number: JP2019219255A
Authority: JP
Inventors: 幸治石井; 昌宏暮橋
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2019-12-04
Filing date: 2019-12-04
Publication date: 2022-11-28
Anticipated expiration: 2039-12-04
Also published as: CN112908320A; CN112908320B; JP2021089360A

Description

The present invention relates to an agent device, an agent method, and a program.

In recent years, a technique has become known in which, instead of the operator manually entering instructions for a device to be operated, the operator speaks and the instructions contained in the utterance are recognized by speech recognition, so that input operations can be performed easily by voice (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent Application Laid-Open No. 2001-147134

An operator may give a plurality of instructions by speaking. With the conventional technique, however, when a plurality of instructions are given by utterance, it is difficult to determine which instruction should be executed first.

Aspects of the present invention have been made in view of these circumstances, and one of their objects is to provide an agent device, an agent method, and a program capable of executing a plurality of instructions given by utterance in an appropriate order.

The agent device, agent method, and program according to the present invention employ the following configurations.
(1) An agent device according to one aspect of the present invention includes: an acquisition unit that acquires data representing speech uttered by a user; a speech recognition unit that recognizes the content of the user's utterance based on the data acquired by the acquisition unit; an identification unit that identifies an instruction included in the utterance content; an output control unit that causes an information output device including a display unit to output information responding to the instruction identified by the identification unit; a determination unit that, when the identification unit identifies a plurality of instructions, determines whether the identified instructions can be executed simultaneously; and a priority addition unit that, when the determination unit determines that the instructions cannot be executed simultaneously, adds a priority to each of the instructions included in the utterance content based on its relevance to the content displayed on the display unit. When the identification unit identifies a plurality of instructions, the output control unit causes the information output device to output information corresponding to the instructions in descending order of the priority added by the priority addition unit.

(2) In the agent device according to aspect (1), the determination unit refers to list information of instructions that cannot be executed simultaneously and determines whether the identified instructions can be executed simultaneously.

(3) In the agent device according to aspect (1) or (2), when the determination unit determines that the instructions can be executed simultaneously, the output control unit causes the information output device to output information responding to the plurality of instructions.

(4) An agent device according to another aspect of the present invention includes: an acquisition unit that acquires data representing speech uttered by a user; a speech recognition unit that recognizes the content of the user's utterance based on the data acquired by the acquisition unit; an identification unit that identifies an instruction included in the utterance content; an output control unit that causes an information output device including a display unit to output information responding to the instruction identified by the identification unit; and a priority addition unit that adds a priority to each of a plurality of instructions included in the utterance content based on its relevance to the content displayed on the display unit. When the identification unit identifies a plurality of instructions, the output control unit causes the information output device to output information corresponding to the instructions in descending order of the priority added by the priority addition unit.

(5) In the agent device according to any one of aspects (1) to (4), when none of the instructions included in the utterance content relates to the content displayed on the display unit, or when all of them relate to it, the priority addition unit adds the priority based on the order in which the instructions were uttered.

(6) In the agent device according to any one of aspects (1) to (5), when none of the instructions included in the utterance content relates to the content displayed on the display unit, or when all of them relate to it, the priority addition unit adds the priority based on the conjunction that connects the instructions.

(7) In the agent device according to any one of aspects (1) to (6), when none of the instructions included in the utterance content relates to the content displayed on the display unit, or when all of them relate to it, the priority addition unit adds the priority based on a word or phrase in the utterance content that indicates the order of the instructions.

(8) In the agent device according to any one of aspects (1) to (7), when none of the instructions included in the utterance content relates to the content displayed on the display unit, or when all of them relate to it, the priority addition unit adds the priority based on a word or phrase in the utterance content that indicates the timing of the instructions.

(9) In an agent method according to another aspect of the present invention, a computer: acquires data representing speech uttered by a user; recognizes the content of the user's utterance based on the acquired data; identifies an instruction included in the utterance content; causes an information output device including a display unit to output information responding to the identified instruction; when a plurality of instructions are identified, determines whether the identified instructions can be executed simultaneously; when it is determined that the instructions cannot be executed simultaneously, adds a priority to each of the instructions included in the utterance content based on its relevance to the content displayed on the display unit; and, when a plurality of instructions are identified, causes the information output device to output information corresponding to the instructions in descending order of the added priority.

(10) A program according to another aspect of the present invention causes a computer to: acquire data representing speech uttered by a user; recognize the content of the user's utterance based on the acquired data; identify an instruction included in the utterance content; cause an information output device including a display unit to output information responding to the identified instruction; when a plurality of instructions are identified, determine whether the identified instructions can be executed simultaneously; when it is determined that the instructions cannot be executed simultaneously, add a priority to each of the instructions included in the utterance content based on its relevance to the content displayed on the display unit; and, when a plurality of instructions are identified, cause the information output device to output information corresponding to the instructions in descending order of the added priority.

According to aspects (1) to (10), a plurality of instructions given by utterance can be executed in an appropriate order.

According to aspect (2), the instructions can be executed in a more appropriate order.

According to aspects (5) to (8), the order in which the instructions are executed can be determined more accurately.

A diagram showing an example of the configuration of the agent system 1 according to the embodiment.
A diagram showing an example of the configuration of the agent device 100 according to the embodiment.
A diagram showing an example of the vehicle interior as seen from the driver's seat.
A diagram showing an example of the vehicle interior with the vehicle M seen from above.
A diagram showing an example of the contents of the list information 154.
A diagram showing an example of the configuration of the server device 200 according to the embodiment.
A diagram showing an example of the contents of the answer information 232.
A diagram showing an example of a scene in which priorities are added to instructions.
A diagram showing an example of a scene in which the information output device is caused to output information based on the added priorities.
A flowchart showing the flow of a series of processes of the agent device 100 according to the embodiment.
A flowchart showing an example of the flow of processing of the server device 200 according to the embodiment.
A diagram showing an example of an agent device 100A according to a modification.

Embodiments of the agent device, agent method, and program of the present invention are described below with reference to the drawings.

<Embodiment>
[System configuration]
FIG. 1 is a diagram showing an example of the configuration of an agent system 1 according to the embodiment. The agent system 1 includes, for example, an agent device 100 mounted on a vehicle M and a server device 200 located outside the vehicle M. The vehicle M is, for example, a two-wheeled, three-wheeled, or four-wheeled vehicle. The drive source of such a vehicle may be an internal combustion engine such as a diesel engine or a gasoline engine, an electric motor, or a combination of these. The electric motor operates on electric power generated by a generator coupled to the internal combustion engine, or on electric power discharged from a secondary battery or a fuel cell.

The agent device 100 and the server device 200 are communicably connected via a network NW. The network NW includes a LAN (Local Area Network), a WAN (Wide Area Network), and the like. The network NW may include, for example, a network that uses wireless communication such as Wi-Fi or Bluetooth (registered trademark; this notice is omitted hereinafter).

The agent system 1 may be composed of a plurality of agent devices 100 and a plurality of server devices 200. The following description covers the case in which the agent system 1 includes one agent device 100 and one server device 200.

The agent device 100 uses an agent function to acquire speech from an occupant of the vehicle M and transmits the acquired speech to the server device 200. Based on data obtained from the server device (hereinafter, agent data) and the like, the agent device 100 also converses with the occupant, provides information such as images and video, and controls the in-vehicle devices VE mounted on the vehicle M and other devices.

The server device 200 communicates with the agent device 100 mounted on the vehicle M and acquires various data from the agent device 100. Based on the acquired data, the server device 200 generates agent data suitable as a response to the occupant of the vehicle M and provides the generated agent data to the agent device 100.

[Agent device configuration]
FIG. 2 is a diagram showing an example of the configuration of the agent device 100 according to the embodiment. The agent device 100 includes, for example, a communication unit 102, a microphone 106, a speaker 108, a display unit 110, a control unit 120, and a storage unit 150. These devices and equipment may be connected to one another by a multiplex communication line such as a CAN (Controller Area Network) communication line, a serial communication line, a wireless communication network, or the like. The configuration of the agent device 100 shown in FIG. 2 is merely an example; part of the configuration may be omitted, and other components may be added.

The communication unit 102 includes a communication interface such as a NIC (Network Interface Controller). The communication unit 102 communicates with the server device 200 and the like via the network NW.

The microphone 106 is an audio input device that picks up sound in the vehicle interior and converts it into an electrical signal. The microphone 106 outputs the data of the picked-up sound (hereinafter, voice data) to the control unit 120. The microphone 106 is installed, for example, near the front of an occupant seated in a seat in the vehicle interior, such as near a mat lamp, the steering wheel, the instrument panel, or a seat. A plurality of microphones 106 may be installed in the vehicle interior.

The speaker 108 is installed, for example, near a seat in the vehicle interior or near the display unit 110. The speaker 108 outputs sound based on information output by the control unit 120.

The display unit 110 includes a display device such as an LCD (Liquid Crystal Display) or an organic EL (electroluminescence) display. The display unit 110 displays images based on information output by the control unit 120. The combination of the speaker 108 and the display unit 110 is an example of the "information output device".

FIG. 3 is a diagram showing an example of the vehicle interior as seen from the driver's seat. In the illustrated example, microphones 106A to 106C, speakers 108A to 108C, and display units 110A to 110C are installed in the vehicle interior. The microphone 106A is provided, for example, on the steering wheel and mainly picks up speech uttered by the driver. The microphone 106B is provided, for example, on the instrument panel (dashboard or garnish) IP in front of the front passenger seat and mainly picks up speech uttered by the occupant of the front passenger seat. The microphone 106C is installed, for example, near the center of the instrument panel (between the driver's seat and the front passenger seat).

The speaker 108A is installed, for example, in the lower part of the door on the driver's seat side, and the speaker 108B is installed, for example, in the lower part of the door on the front passenger seat side. The speaker 108C is installed, for example, near the display unit 110C, that is, near the center of the instrument panel IP.

The display unit 110A is, for example, a HUD (Head-Up Display) device that displays a virtual image ahead of the driver's line of sight as the driver looks outside the vehicle. The HUD device lets an occupant see a virtual image by projecting light onto, for example, the front windshield of the vehicle M or a transparent, light-transmitting member called a combiner. The occupant is mainly the driver but may be an occupant other than the driver.

The display unit 110B is provided on the instrument panel IP near the front of the driver's seat (the seat closest to the steering wheel), at a position where the occupant can see it through the gaps in the steering wheel or over the steering wheel. The display unit 110B is, for example, an LCD or an organic EL display device. The display unit 110B displays images of, for example, the speed of the vehicle M, the engine speed, the remaining fuel, the radiator water temperature, the distance traveled, and other information.

The display unit 110C is installed near the center of the instrument panel IP. Like the display unit 110B, the display unit 110C is, for example, an LCD or an organic EL display device. The display unit 110C displays content such as television programs and movies.

The vehicle M may further be provided with microphones and speakers near the rear seats. FIG. 4 is a diagram showing an example of the vehicle interior with the vehicle M seen from above. In addition to the microphones and speakers illustrated in FIG. 3, microphones 106D and 106E and speakers 108D and 108E may be installed in the vehicle interior.

The microphone 106D is provided, for example, near the rear seat ST3 located behind the front passenger seat ST2 (for example, on the rear surface of the front passenger seat ST2) and mainly picks up speech uttered by an occupant seated in the rear seat ST3. The microphone 106E is provided, for example, near the rear seat ST4 located behind the driver's seat ST1 (for example, on the rear surface of the driver's seat ST1) and mainly picks up speech uttered by an occupant seated in the rear seat ST4.

The speaker 108D is installed, for example, in the lower part of the door on the rear seat ST3 side, and the speaker 108E is installed, for example, in the lower part of the door on the rear seat ST4 side.

The vehicle M illustrated in FIG. 1 has been described as a vehicle with a steering wheel that can be operated by the driver, who is an occupant, as illustrated in FIG. 3 or FIG. 4, but the vehicle M is not limited to this. For example, the vehicle M may be a vehicle with no roof, that is, a vehicle with no vehicle interior (or no clearly delimited one).

In the example of FIG. 3 or FIG. 4, the driver's seat, in which the driver who operates the vehicle M sits, and the front passenger seat and rear seats, in which the other occupants who do not operate the vehicle sit, are described as being in a single cabin, but the vehicle M is not limited to this. For example, the vehicle M may be a saddle-type motorcycle equipped with handlebars instead of a steering wheel.

In the example of FIG. 3 or FIG. 4, the vehicle M is described as a vehicle with a steering wheel, but the vehicle M is not limited to this. For example, the vehicle M may be an autonomous vehicle that has no driving operation device such as a steering wheel. An autonomous vehicle is, for example, a vehicle that executes driving control by controlling one or both of steering and acceleration/deceleration without relying on occupant operation.

Returning to FIG. 2, the control unit 120 includes, for example, an acquisition unit 121, a speech synthesis unit 122, a communication control unit 123, an identification unit 124, a determination unit 125, a priority addition unit 126, and an output control unit 127. These components are implemented, for example, by a processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit) executing a program (software). Some or all of these components may instead be implemented by hardware (circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or by software and hardware working together. The program may be stored in advance in the storage unit 150 (a storage device having a non-transitory storage medium), or it may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or CD-ROM and installed in the storage unit 150 when the storage medium is loaded into a drive device.

The storage unit 150 is implemented by an HDD, flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), ROM (Read Only Memory), RAM (Random Access Memory), or the like. The storage unit 150 stores, for example, programs referenced by the processor, in-vehicle device information 152, and list information 154. The in-vehicle device information 152 is information listing the in-vehicle devices VE mounted on the vehicle M.

FIG. 5 is a diagram showing an example of the contents of the list information 154. The list information 154 is, for example, information indicating instructions that cannot be executed simultaneously. An instruction here is, for example, an instruction contained in speech uttered by an occupant and relating to the operation of an in-vehicle device VE. The list information 154 shown in FIG. 5 includes the following as instructions that cannot be executed simultaneously: "instructions to the same control target", "instructions that specify two destinations", "instructions to in-vehicle device VE1 and in-vehicle device VE2, which cannot be controlled simultaneously", and "instructions to in-vehicle device VE3 and in-vehicle device VE4, which cannot be controlled simultaneously".

Returning to FIG. 2, the acquisition unit 121 acquires voice data from the microphone 106, as well as other information.

When the agent data that the communication unit 102 receives from the server device 200 includes voice control content, the speech synthesis unit 122 generates artificial synthesized speech corresponding to the voice data whose utterance is instructed by that voice control. The artificial synthesized speech generated by the speech synthesis unit 122 is hereinafter also referred to as agent voice.

The communication control unit 123 causes the communication unit 102 to transmit the voice data acquired by the acquisition unit 121 to the server device 200. The communication control unit 123 also causes the communication unit 102 to receive the agent data transmitted from the server device 200.

The identification unit 124 identifies instructions to the in-vehicle devices VE that are included in the agent data. For example, to identify the in-vehicle devices VE mentioned in the agent data, the identification unit 124 searches the agent data using each in-vehicle device VE listed in the in-vehicle device information 152 as a search key. The identification unit 124 identifies the one or more in-vehicle devices VE found in the agent data by this search as the in-vehicle devices VE targeted by the instructions. The agent data is an example of the "utterance content".
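The search described above can be illustrated with a minimal Python sketch. The function name, the plain substring match, and the sample device names are assumptions made for illustration; the embodiment does not specify how the search is implemented.

```python
from typing import List


def identify_target_devices(agent_text: str, device_names: List[str]) -> List[str]:
    """Return the device names that appear in the agent data text.

    Each entry of device_names (standing in for the in-vehicle device
    information 152) is used as a search key against agent_text.
    Substring matching is an assumption, not the embodiment's method.
    """
    return [name for name in device_names if name in agent_text]
```

For instance, with the hypothetical device list `["audio", "air conditioner", "window"]` and the text "turn on the air conditioner and open the window", the function returns the two devices named in the text, which would then be treated as the instruction targets.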

When the identification unit 124 identifies a plurality of in-vehicle devices VE as instruction targets, the determination unit 125 determines, based on the list information 154, whether these instructions can be executed simultaneously.

For instructions that clearly cannot be executed simultaneously, the determination unit 125 may make this determination without using the list information 154. Instructions that clearly cannot be executed simultaneously are, for example, contradictory instructions to the same target. Specific examples are "turn the audio volume up (instruction A) and turn the volume down (instruction B)" and "stop the vehicle M (instruction A) and increase the speed of the vehicle M (instruction B)".
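A minimal Python sketch of such a determination follows. The `Instruction` type, the `set_destination` action label, and the encoding of the list information 154 as a set of device pairs are all assumptions for illustration; only the four kinds of conflict are taken from the example in FIG. 5.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Instruction:
    target: str  # name of the in-vehicle device or control target (hypothetical)
    action: str  # action label, e.g. "set_destination" (hypothetical)


# Device pairs that cannot be controlled simultaneously (assumed encoding
# of the VE1/VE2 and VE3/VE4 entries of the list information 154).
EXCLUSIVE_DEVICE_PAIRS = {frozenset({"VE1", "VE2"}), frozenset({"VE3", "VE4"})}


def conflicts(a: Instruction, b: Instruction) -> bool:
    """Return True if the two instructions cannot be executed simultaneously."""
    if a.target == b.target:
        return True  # instructions to the same control target
    if a.action == "set_destination" and b.action == "set_destination":
        return True  # instructions that specify two destinations
    return frozenset({a.target, b.target}) in EXCLUSIVE_DEVICE_PAIRS


def can_execute_simultaneously(instructions: List[Instruction]) -> bool:
    """Return True only if no pair of instructions conflicts."""
    return not any(conflicts(a, b) for a, b in combinations(instructions, 2))
```

Checking every pair keeps the sketch simple; a utterance rarely contains more than a few instructions, so the quadratic cost is not a concern under this assumption.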

When the determination unit 125 determines that the plurality of instructions included in the agent data cannot be executed simultaneously, the priority addition unit 126 adds a priority to each of those instructions. The details of how the priority addition unit 126 adds priorities are described later.

When the voice synthesizing unit 122 generates an agent voice in accordance with an instruction included in the agent data, the output control unit 127 causes the speaker 108 to output that agent voice. The output control unit 127 also causes the display unit 110 to display image data in accordance with an instruction included in the agent data. The output control unit 127 may further cause the display unit 110 to display an image of the recognition result of the voice data (text data such as phrases).

Here, when the priority adding unit 126 has added a priority to each of the plurality of instructions, the output control unit 127 handles the instructions in descending order of the added priority, causing the speaker 108 to output the agent voice corresponding to each instruction and causing the display unit 110 to display the instructed image data.

[Configuration of server device]
FIG. 6 is a diagram showing an example of the configuration of the server device 200 according to the embodiment. The server device 200 according to the embodiment includes, for example, a communication unit 202, a control unit 210, and a storage unit 230.

The communication unit 202 includes a communication interface such as a NIC. The communication unit 202 communicates, via the network NW, with the agent device 100 and the like mounted on each vehicle M.

The control unit 210 includes, for example, an acquisition unit 211, an utterance segment extraction unit 212, a speech recognition unit 213, an agent data generation unit 214, and a communication control unit 215. These components are implemented, for example, by a processor such as a CPU or GPU executing a program (software). Some or all of these components may be implemented by hardware (a circuit unit; including circuitry) such as an LSI, ASIC, or FPGA, or by software and hardware working together. The program may be stored in advance in the storage unit 230 (a storage device having a non-transitory storage medium), or may be stored on a removable storage medium (non-transitory storage medium) such as a DVD or CD-ROM and installed in the storage unit 230 when the storage medium is loaded into a drive device.

The storage unit 230 is implemented by an HDD, flash memory, EEPROM, ROM, RAM, or the like. The storage unit 230 stores, for example, answer information 232 in addition to programs referred to by the processor.

FIG. 7 is a diagram showing an example of the content of the answer information 232. In the answer information 232, for example, semantic information is associated with control content to be executed by the control unit 120. The semantic information is, for example, the meaning that the speech recognition unit 213 recognizes from the utterance content as a whole. The control content includes, for example, vehicle-mounted device control relating to instructions (control) for the vehicle-mounted devices VE, voice control for outputting the agent voice, and image control for display on the display unit 110. For example, in the answer information 232, the semantic information "start the air conditioner" is associated with the vehicle-mounted device control "start the air conditioner", the voice control "The air conditioner has been started", and display control for displaying the cabin temperature and the set temperature.
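One possible way to hold such associations, given purely as an illustrative sketch, is a mapping keyed by semantic information. The dictionary layout, the key names, and the `lookup_control_content` helper are assumptions made for this example; the embodiment does not specify a storage format for the answer information 232.

```python
# Hypothetical in-memory form of answer information 232.
# Keys are semantic information; values group the associated control content.
ANSWER_INFO = {
    "start the air conditioner": {
        "device_control": "start the air conditioner",
        "voice_control": "The air conditioner has been started",
        "display_control": "show cabin temperature and set temperature",
    },
}


def lookup_control_content(semantic_info):
    """Return the control content for the given semantic information, or None."""
    return ANSWER_INFO.get(semantic_info)
```

Returning `None` for unknown semantic information is a design choice of this sketch, which leaves it to the caller to decide how an unmatched utterance is handled.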

Returning to FIG. 6, the acquisition unit 211 acquires, via the communication unit 202, the voice data transmitted from the agent device 100.

The utterance segment extraction unit 212 extracts, from the voice data acquired by the acquisition unit 121, the period during which the occupant is speaking (hereinafter referred to as an utterance segment). For example, the utterance segment extraction unit 212 may use the zero-crossing method to extract the utterance segment based on the amplitude of the voice signal included in the voice data. The utterance segment extraction unit 212 may also extract the utterance segment from the voice data based on a Gaussian mixture model (GMM), or by performing template matching against a database of templates of voice signals characteristic of utterance segments.
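By way of illustration only, a simplified frame-based variant of zero-crossing detection might look like the following. The frame length, the amplitude threshold, the minimum crossing count, and the function name are all assumptions chosen for this sketch; the embodiment does not give concrete parameters, and a practical detector would typically add smoothing and hangover handling.

```python
def extract_utterance_segments(samples, frame_len=160, amp_threshold=0.1, min_crossings=5):
    """Return (start, end) sample index pairs for frames judged to contain speech.

    A frame counts as speech when it has at least `min_crossings` sign changes
    between consecutive samples whose amplitude reaches `amp_threshold`.
    """
    segments = []
    start = None
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        # Count zero crossings, ignoring low-amplitude noise around zero.
        crossings = sum(
            1
            for prev, cur in zip(frame, frame[1:])
            if prev * cur < 0 and max(abs(prev), abs(cur)) >= amp_threshold
        )
        is_speech = crossings >= min_crossings
        if is_speech and start is None:
            start = offset                    # segment begins
        elif not is_speech and start is not None:
            segments.append((start, offset))  # segment ends
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```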

The speech recognition unit 213 recognizes the voice data for each utterance segment extracted by the utterance segment extraction unit 212 and converts the recognized voice data into text, thereby generating text data containing the utterance content. For example, the speech recognition unit 213 separates the voice signal of the utterance segment into a plurality of frequency bands, such as low and high frequencies, and applies a Fourier transform to each separated voice signal to generate a spectrogram. The speech recognition unit 213 inputs the generated spectrogram to a recurrent neural network to obtain a character string from the spectrogram. The recurrent neural network may be trained in advance using, for example, training data in which a spectrogram generated from training speech is associated, as a teacher label, with the known character string corresponding to that training speech. The speech recognition unit 213 then outputs the character string data obtained from the recurrent neural network as text data.

The speech recognition unit 213 also performs syntactic analysis of the natural-language text data, divides the text data into morphemes, and recognizes from the morphemes the meaning of the wording contained in the text data.

Based on the meaning of the utterance content recognized by the speech recognition unit 213, the agent data generation unit 214 refers to the semantic information in the answer information 232 and acquires the control content associated with the matching semantic information. When meanings such as "turn on the air conditioner" or "please switch on the air conditioner" are recognized as the recognition result, the agent data generation unit 214 replaces them with standard character information such as "start the air conditioner". This makes it easier to acquire the control content that matches the instruction even when the wording of the instruction varies.
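The replacement with standard character information can be pictured, as a minimal sketch, as a lookup from recognized wording to a canonical form. The `CANONICAL_MEANINGS` table and the `normalize_meaning` function are hypothetical; an actual implementation would more likely operate on the recognized meaning rather than on exact strings.

```python
# Hypothetical table mapping wording variants to standard character information.
CANONICAL_MEANINGS = {
    "turn on the air conditioner": "start the air conditioner",
    "please switch on the air conditioner": "start the air conditioner",
}


def normalize_meaning(recognized):
    """Map a recognized phrase to its standard form, or return it unchanged."""
    key = recognized.strip().lower()
    return CANONICAL_MEANINGS.get(key, key)
```

With this normalization in place, both wording variants resolve to the same semantic information, so a single entry in the answer information 232 can serve them.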

The agent data generation unit 214 also generates agent data for causing processing corresponding to the acquired control content (for example, at least one of vehicle-mounted device control, voice control, and display control) to be executed.

The communication control unit 215 causes the communication unit 202 to transmit the agent data generated by the agent data generation unit 214 to the agent device 100. This allows the agent device 100 to execute, by means of the control unit 120, the control corresponding to the agent data.

[Process of adding priority]
The process by which the priority adding unit 126 adds priorities is described in detail below with reference to FIGS. 8 and 9. FIG. 8 is a diagram showing an example of a scene in which priorities are added to instructions. FIG. 9 is a diagram showing an example of a scene in which the information output device is caused to output information based on the added priorities.

In FIG. 8, the occupant says "Go to the zoo" as an utterance SP1 instructing that a destination be set. Based on the agent data generated in response to the utterance SP1, the output control unit 127 causes the navigation device to determine a travel route to the "zoo" and causes the display unit 110 to display an image IM1 showing the travel route determined by the navigation device. The occupant then says "Go to a convenience store" as an utterance SP2 instructing that a destination be set.

In this case, the identifying unit 124 recognizes that the utterance SP1 and the utterance SP2 contain a plurality of instructions for the navigation device, which is a vehicle-mounted device VE, each of which sets a destination. Because the identifying unit 124 has identified a plurality of instructions for the vehicle-mounted device VE, the determination unit 125 determines, based on the list information 154, whether or not these instructions can be executed simultaneously. As described above, the list information 154 includes "instructions for the same control target" as instructions that cannot be executed simultaneously. The determination unit 125 therefore determines that the plurality of instructions for the vehicle-mounted device VE identified by the identifying unit 124 cannot be executed simultaneously. When the determination unit 125 determines that the plurality of instructions for the vehicle-mounted device VE cannot be executed simultaneously, the priority adding unit 126 adds a priority to each of the instructions.

[(1) Adding priority based on the content displayed on the display unit]
The priority adding unit 126 adds priorities based on, for example, relevance to the content displayed on the display unit 110 in the scene in which the occupant's utterances (utterances SP1 and SP2) were made. As described above, in the scene in which the utterance SP1 was made, the display unit 110 displays the image IM1 showing the travel route to the "zoo". Consequently, of the instruction indicated by the utterance SP1 and the instruction indicated by the utterance SP2, the former is more relevant to the content of the display unit 110 (in this case, the route to the "zoo"). The priority adding unit 126 therefore adds a high priority to the instruction included in the utterance SP1 and adds to the instruction included in the utterance SP2 a priority lower than that of the instruction included in the utterance SP1.

For example, based on the agent data for the utterance SP1 and the agent data for the utterance SP2, the priority adding unit 126 identifies the instruction included in each. The priority adding unit 126 acquires information indicating the control history of the output control unit 127, or information on the control state of the vehicle-mounted device VE (in this case, the navigation device), and identifies the content displayed on the display unit 110. Based on the information thus identified, the priority adding unit 126 then determines which of the instruction of the utterance SP1 and the instruction of the utterance SP2 is more relevant to the content displayed on the display unit 110, and adds priorities accordingly.

Based on the priorities added by the priority adding unit 126, the output control unit 127 executes the control corresponding to each instruction in descending order of priority. Likewise, based on those priorities, the output control unit 127 causes the information output device to output the information corresponding to each instruction in descending order of priority.
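The relevance-based ordering can be sketched as follows, purely as an example. Measuring relevance by counting keywords of the displayed content that appear in the instruction text is an assumption of this sketch, as are the function and parameter names; the embodiment does not specify how relevance is computed.

```python
def add_priorities(instructions, displayed_keywords):
    """Rank instructions by relevance to the displayed content.

    `instructions` is a list of instruction texts in utterance order.
    Returns (priority, instruction) pairs, where 1 is the highest priority.
    """
    def relevance(text):
        # Hypothetical relevance measure: number of displayed keywords in the text.
        return sum(1 for keyword in displayed_keywords if keyword in text)

    # sorted() is stable, so instructions with equal relevance keep utterance order.
    ranked = sorted(instructions, key=lambda text: -relevance(text))
    return [(rank, text) for rank, text in enumerate(ranked, start=1)]
```

Because the sort is stable, instructions that are equally relevant, or equally irrelevant, to the display simply retain the order in which they were uttered.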

In FIG. 9, the output control unit 127 causes the navigation device to determine the travel route to the "zoo" based on the agent data generated in response to the utterance SP1 and, based on the agent data generated in response to the utterance SP2, to set a "convenience store" located along the travel route to the "zoo" as a waypoint. The output control unit 127 then causes the display unit 110 to display an image IM2 that shows the travel route to the "zoo" and marks the "convenience store" along that route as a waypoint. The output control unit 127 causes the speaker 108 to output an agent voice SD1 about the travel route to the "zoo", generated by the voice synthesizing unit 122, and then to output an agent voice SD2 about the "convenience store", also generated by the voice synthesizing unit 122. The agent voice SD1 is, for example, a message such as "The destination has been set to the zoo. I will guide you along the route." The agent voice SD2 is, for example, a message such as "A convenience store on the route to the zoo has been added as a waypoint."

Here, as the images IM1 and IM2 show, in the vicinity of the vehicle M there are a convenience store CS1, located along the route to the zoo, and a convenience store CS2, which is off the route to the zoo but closest to the current position of the vehicle M. When the priority adding unit 126 does not add priorities, the output control unit 127 processes the instructions corresponding to the utterances SP sequentially. The output control unit 127 therefore causes the navigation device to determine the travel route to the zoo based on the instruction of the utterance SP1 and then, based on the instruction of the utterance SP2, causes the navigation device to cancel the travel route to the zoo and determine a travel route to the nearest convenience store CS2. In this case, the vehicle M heads for the zoo only after stopping at the convenience store CS2, which is off the travel route to the zoo, so the trip to the zoo becomes a detour.

In contrast, when the priority adding unit 126 adds priorities, the output control unit 127 processes the instructions in descending order of priority. In this case, the vehicle M stops at the convenience store CS1, located on the route to the zoo, on its way to the zoo, and can therefore travel to the zoo efficiently. The agent device 100 of the present embodiment can thus execute a plurality of instructions given by utterance in an order that is appropriate for the occupant of the vehicle M.

[(2) Adding priority based on the order of utterances]
The description above covers the case in which the priority adding unit 126 adds priorities based on relevance to the content displayed on the display unit 110, but the process is not limited to this. For example, when none of the plurality of instructions included in the utterances SP relates to the content displayed on the display unit 110, or when all of them do, the priority adding unit 126 may add priorities based on the order of the utterances SP. In this case, the priority adding unit 126 adds a high priority to the instruction included in the utterance SP1, which was made first, and adds to the instruction included in the utterance SP2, which was made after the utterance SP1, a priority lower than that of the instruction included in the utterance SP1.

[(3) Adding priority based on conjunctions]
When none of the plurality of instructions included in the utterance SP relates to the content displayed on the display unit 110, or when all of them do, the priority adding unit 126 may also add priorities based on a conjunction that connects the phrases expressing the instructions. For example, occupant utterances SP such as "Do (instruction A), 'and then' do (instruction B)." (Example 1) or "Do (instruction A), and 'likewise' do (instruction B)." (Example 2) contain conjunctions, such as "and then" and "likewise", that indicate the order in which the instructions are to be executed. In this case, the storage unit 150 stores conjunction information (not shown) in which information indicating a conjunction is associated with information indicating the order (or priority) of the phrase before the conjunction and the phrase after it, and the priority adding unit 126 adds a priority to each of the plurality of instructions based on the conjunction information.

When, as in (Example 1) or (Example 2), the utterance contains a conjunction indicating that instruction A is to be executed before instruction B, the priority adding unit 126 adds a high priority to instruction A and adds to instruction B a priority lower than that of instruction A.

[(4) Adding priority based on phrases indicating order]
When none of the plurality of instructions included in the utterance SP relates to the content displayed on the display unit 110, or when all of them do, the priority adding unit 126 may also add priorities based on a phrase indicating the order of the instructions. For example, occupant utterances SP such as "'After' doing (instruction A), do (instruction B)." (Example 3) or "'Before' doing (instruction A), do (instruction B)." (Example 4) contain phrases, such as "after" and "before", that indicate the order in which the instructions are to be executed. In this case, the storage unit 150 stores order phrase information (not shown) in which information indicating an order phrase is associated with information indicating the order of the phrase before the order phrase and the phrase after it, and the priority adding unit 126 adds a priority to each of the plurality of instructions based on the order phrase information.

When, as in (Example 3), the utterance contains a phrase indicating that instruction A is to be executed before instruction B, the priority adding unit 126 adds a high priority to instruction A and adds to instruction B a priority lower than that of instruction A. When, as in (Example 4), the utterance contains a phrase indicating that instruction B is to be executed before instruction A, the priority adding unit 126 adds a high priority to instruction B and adds to instruction A a priority lower than that of instruction B.
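As a non-limiting sketch, the conjunction information and the order phrase information can both be pictured as a table that records, for each connective, whether the instruction preceding it is executed first. Merging the two kinds of information into a single `CONNECTIVE_ORDER` table, the boolean encoding, and the fallback to utterance order for unknown connectives are assumptions of this example; the connectives themselves are those of Examples 1 to 4.

```python
# Hypothetical combined form of the conjunction information and the order
# phrase information. True means the instruction before the connective
# is executed first.
CONNECTIVE_ORDER = {
    "それから": True,    # "and then" (Example 1)
    "同じように": True,  # "likewise" (Example 2)
    "した後に": True,    # "after doing" (Example 3)
    "する前に": False,   # "before doing" (Example 4)
}


def prioritize_by_connective(instruction_a, connective, instruction_b):
    """Return {instruction: priority}, where 1 is the highest priority.

    `instruction_a` precedes the connective in the utterance and
    `instruction_b` follows it. Unknown connectives fall back to
    utterance order (an assumption of this sketch).
    """
    a_first = CONNECTIVE_ORDER.get(connective, True)
    ordered = [instruction_a, instruction_b] if a_first else [instruction_b, instruction_a]
    return {instruction: rank for rank, instruction in enumerate(ordered, start=1)}
```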

[(5) Adding priority based on phrases indicating timing]
When none of the plurality of instructions included in the utterance SP relates to the content displayed on the display unit 110, or when all of them do, the priority adding unit 126 may also add priorities based on a phrase indicating the timing of an instruction. For example, an occupant utterance SP such as "Arrive at the elementary school at '17:00' (instruction A), and arrive at the swimming school at '18:00' (instruction B)." (Example 5) contains phrases indicating the timing at which each instruction is to be carried out. In this case, the priority adding unit 126 adds a priority to each of the plurality of instructions, based on the phrases indicating the timing of each instruction, so that an instruction with an earlier timing has a higher priority.

When, as in (Example 5), the utterance contains phrases indicating that instruction A is to be executed before instruction B, the priority adding unit 126 adds a high priority to instruction A and adds to instruction B a priority lower than that of instruction A.
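The timing-based ordering can be illustrated with the following minimal sketch, which assumes that the timing phrase of each instruction has already been parsed into a `datetime.time` value. The function name and the input format are hypothetical, and parsing the spoken time itself is outside the scope of the example.

```python
from datetime import time


def prioritize_by_time(timed_instructions):
    """Rank (time, instruction) pairs so that earlier timings come first.

    Returns (priority, instruction) pairs, where 1 is the highest priority.
    """
    ordered = sorted(timed_instructions, key=lambda item: item[0])
    return [(rank, text) for rank, (_, text) in enumerate(ordered, start=1)]
```

Applied to Example 5, the arrival at the elementary school at 17:00 receives priority 1 and the arrival at the swimming school at 18:00 receives priority 2, regardless of the order in which they were uttered.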

[Processing flow]
Next, the flow of processing of the agent system 1 according to the embodiment is described using flowcharts. In the following, the processing of the agent device 100 and the processing of the server device 200 are described separately. The flow of processing described below may be executed repeatedly at a predetermined timing. The predetermined timing is, for example, the timing at which a specific word that activates the agent device (for example, a wake-up word) is extracted from the voice data, or the timing at which selection of the switch that activates the agent device 100, among the various switches mounted on the vehicle M, is accepted.

FIG. 10 is a flowchart showing the flow of a series of processes of the agent device 100 according to the embodiment. First, the acquisition unit 121 determines whether or not voice data of the occupant has been collected by the microphone 106 after the wake-up word is recognized (that is, whether or not the occupant has spoken) (step S100). The acquisition unit 121 waits until voice data of the occupant is collected. Next, the communication control unit 123 causes the communication unit 102 to transmit the voice data to the server device 200 (step S102). Next, the communication control unit 123 causes the communication unit 102 to receive agent data from the server device 200 (step S104).

The identifying unit 124 identifies the instructions for the vehicle-mounted devices VE that are included in the received agent data (step S106). The determination unit 125 determines whether or not the identifying unit 124 has identified an instruction for a vehicle-mounted device VE (that is, whether or not the utterance content includes an instruction for a vehicle-mounted device VE) (step S108). When the identifying unit 124 has not identified any instruction for a vehicle-mounted device VE, the determination unit 125 ends the process.

When the determination unit 125 determines that an instruction for a vehicle-mounted device VE is included, it determines whether or not the agent data includes a plurality of instructions for the vehicle-mounted devices VE (step S110). When the determination unit 125 determines that the agent data includes an instruction for a vehicle-mounted device VE but not a plurality of instructions (that is, a single instruction), it advances the process to step S114. When the determination unit 125 determines that the agent data includes a plurality of instructions for the vehicle-mounted devices VE, it determines, based on the list information 154, whether or not these instructions can be executed simultaneously (step S112). When the determination unit 125 determines that the plurality of instructions for the vehicle-mounted devices VE can be executed simultaneously, it advances the process to step S114.

The output control unit 127 causes the information output device to simultaneously output the information corresponding to the one or more instructions for the vehicle-mounted devices VE identified by the identifying unit 124 (step S114).
The output control unit 127 controls the vehicle-mounted device VE, for example, in accordance with an instruction included in the agent data. When the voice synthesizing unit 122 generates an agent voice in accordance with an instruction included in the agent data, the output control unit 127 causes the speaker 108 to output that agent voice. The output control unit 127 also causes the display unit 110 to display the instructed image data in accordance with an instruction included in the agent data.

Causing the information output device to simultaneously output information corresponding to a plurality of instructions means, for example, that the output control unit 127 causes any of the speakers 108A to 108C to output the agent voices for the agent data of the respective instructions at the same time, or causes any of the display units 110A to 110C to display the images for the agent data of the respective instructions. Alternatively, regardless of priority, the output control unit 127 may cause any (for example, one) of the speakers 108A to 108C to output the agent voices for the agent data of the respective instructions one after another, or may cause any (for example, one) of the display units 110A to 110C to display the images for the agent data of the respective instructions one after another.

When the determination unit 125 determines that the plurality of instructions for the vehicle-mounted devices VE cannot be executed simultaneously, the priority adding unit 126 adds a priority to each of the instructions (step S116). For example, the priority adding unit 126 may add a priority to each of the instructions based on relevance to the content displayed on the display unit 110, based on the order of the utterances SP, based on a conjunction connecting the phrases that express the instructions, based on a phrase indicating the order of the instructions, or based on a phrase indicating the timing of the instructions.

The output control unit 127 causes the information output device to output the information corresponding to the plurality of instructions to the in-vehicle device VE specified by the specifying unit 124, in order from the information corresponding to the instruction with the highest priority (step S118).
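Steps S116 and S118 can be illustrated with a minimal Python sketch. It assumes only one of the criteria listed above (relevance to the displayed content) and falls back on utterance order to break ties. The `Instruction` class, the keyword-based `is_related` test, and the response strings are hypothetical stand-ins introduced for illustration and do not appear in the embodiment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Instruction:
    text: str             # phrase specified as an instruction
    utterance_index: int  # position of the instruction within the utterance
    priority: int = 0     # filled in by add_priorities; 1 is the highest


def is_related(instruction: Instruction, displayed_keywords: set[str]) -> bool:
    """Hypothetical relevance test: the instruction mentions a displayed keyword."""
    return any(word in instruction.text for word in displayed_keywords)


def add_priorities(
    instructions: list[Instruction], displayed_keywords: set[str]
) -> list[Instruction]:
    """Step S116 (sketch): related instructions first, ties broken by utterance order."""
    ranked = sorted(
        instructions,
        key=lambda ins: (not is_related(ins, displayed_keywords), ins.utterance_index),
    )
    for rank, ins in enumerate(ranked, start=1):
        ins.priority = rank
    return ranked


def output_in_priority_order(instructions: list[Instruction]) -> list[str]:
    """Step S118 (sketch): produce responses from the highest priority downward."""
    ordered = sorted(instructions, key=lambda ins: ins.priority)
    return [f"response to: {ins.text}" for ins in ordered]
```

With `{"map"}` as the displayed keywords, an instruction "zoom the map" is ranked ahead of "turn on the air conditioner" even if it was uttered second. When no instruction or every instruction is related to the display, the ordering reduces to the order of utterance.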

FIG. 11 is a flowchart showing an example of the flow of processing of the server device 200 according to the embodiment. First, the communication unit 202 acquires voice data from the agent device 100 (step S200). Next, the speech segment extraction unit 212 extracts the speech segments included in the voice data (step S202). Next, the speech recognition unit 213 recognizes the utterance content from the voice data in the extracted speech segments. Specifically, the speech recognition unit 213 converts the voice data into text data and finally recognizes the wording included in the text data (step S204). The agent data generation unit 214 generates agent data based on the meaning of the entire utterance content (step S206). Next, the communication control unit 215 transmits the agent data to the agent device 100 via the communication unit 202 (step S208).
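The flow of steps S200 to S208 can be sketched as a small Python pipeline. This is an illustration under strong simplifying assumptions, not the server's actual processing: voice data is modeled as a list of frames in which `None` marks silence and each non-silent frame already carries its word, and the `answers` dictionary is a hypothetical stand-in for a lookup of responses. All function names are invented for the sketch.

```python
def extract_speech_segments(frames):
    """Step S202 (sketch): group consecutive non-silent frames into segments."""
    segments, current = [], []
    for frame in frames:
        if frame is None:          # silence closes the current segment
            if current:
                segments.append(current)
                current = []
        else:
            current.append(frame)
    if current:
        segments.append(current)
    return segments


def recognize(segment):
    """Step S204 (sketch): stand-in recognizer that joins the words of a segment."""
    return " ".join(segment)


def generate_agent_data(text, answers):
    """Step S206 (sketch): build agent data from the whole utterance content."""
    response = answers.get(text, "Sorry, I did not understand.")
    return {"utterance": text, "response": response}


def handle_voice_data(frames, answers):
    """Steps S200 to S206; the caller would transmit the result (step S208)."""
    segments = extract_speech_segments(frames)
    text = " ".join(recognize(segment) for segment in segments)
    return generate_agent_data(text, answers)
```

Keeping each step as its own function mirrors the division into the speech segment extraction unit 212, the speech recognition unit 213, and the agent data generation unit 214, so that any one stage can be replaced without touching the others.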

[Summary of embodiment]
As described above, the agent system 1 of the present embodiment includes: the acquisition unit 121 that acquires voice data representing the voice uttered by a user (in this example, an occupant); the speech recognition unit 213 that recognizes the utterance content of the occupant based on the voice data acquired by the acquisition unit 121; the specifying unit 124 that specifies an instruction included in the utterance content; the output control unit 127 that causes an information output device including the display unit 110 to output information responding to the instruction specified by the specifying unit 124; the determination unit 125 that, when the specifying unit 124 specifies a plurality of instructions, determines whether the specified plurality of instructions can be executed simultaneously; and the priority adding unit 126 that, when the determination unit 125 determines that the plurality of instructions cannot be executed simultaneously, adds a priority to the plurality of instructions included in the utterance content based on their relevance to the content displayed by the display unit 110. When the specifying unit 124 specifies a plurality of instructions, the output control unit 127 causes the information output device to output the information corresponding to the instructions in order from the instruction with the highest priority added by the priority adding unit 126. As a result, the agent system 1 of the present embodiment can execute a plurality of uttered instructions in an order appropriate for the occupant of the vehicle M.

<Modification>
In the embodiment described above, the agent device 100 mounted on the vehicle M and the server device 200 are described as separate devices, but the present invention is not limited to this. For example, the components of the server device 200 related to the agent function may be included in the components of the agent device 100. In this case, the server device 200 may be made to function as a virtual machine virtually realized by the control unit 120 of the agent device 100. An agent device 100A that includes the components of the server device 200 is described below as a modification. In the modification, components similar to those of the embodiment described above are denoted by the same reference signs, and their specific description is omitted here.

FIG. 12 is a diagram showing an example of the agent device 100A according to the modification. The agent device 100A includes, for example, a communication unit 102, a microphone 106, a speaker 108, a display unit 110, a control unit 120a, and a storage unit 150a. The control unit 120a includes, for example, an acquisition unit 121, a speech synthesis unit 122, a communication control unit 123, a specifying unit 124, a determination unit 125, a priority adding unit 126, an output control unit 127, a speech segment extraction unit 212, a speech recognition unit 213, and an agent data generation unit 214.

The storage unit 150a stores, for example, in-vehicle device information 152, list information 154, and answer information 232 in addition to the programs referenced by a processor. The answer information 232 may be updated with the latest information acquired from the server device 200.

The processing of the agent device 100A is, for example, processing in which, after step S100 of the flowchart shown in FIG. 10, steps S202 to S206 of the flowchart shown in FIG. 11 are executed, and then the processing from step S106 onward of the flowchart shown in FIG. 10 is executed.
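The ordering described above can be sketched in Python as follows. The sketch only fixes the sequence of steps and makes no claim about what each step does internally: `acquire_voice`, `local_steps`, and `handle_agent_data` are hypothetical callables supplied by the caller, standing in for step S100, steps S202 to S206, and step S106 onward respectively.

```python
from typing import Any, Callable, Iterable


def run_on_device(
    acquire_voice: Callable[[], Any],
    local_steps: Iterable[Callable[[Any], Any]],
    handle_agent_data: Callable[[Any], Any],
) -> Any:
    """Order of processing in the agent device 100A (sketch)."""
    data = acquire_voice()          # step S100 (FIG. 10)
    for step in local_steps:        # steps S202, S204, S206 (FIG. 11), run locally
        data = step(data)
    return handle_agent_data(data)  # step S106 onward (FIG. 10)
```

Because the three server-side steps are passed in as ordinary functions, no request to the server device 200 appears anywhere in the sequence, which is the point of the modification.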

According to the agent device 100A of the modification described above, in addition to providing the same effects as the first embodiment, the utterance content can be recognized more quickly because there is no need to communicate with the server device 200 via the network NW each time voice from the occupant is acquired. Furthermore, even when the vehicle M cannot communicate with the server device 200, agent data can be generated and information can be provided to the occupant.

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

Reference Signs List
1: agent system; 100, 100A: agent device; 102, 202: communication unit; 106, 106A, 106B, 106C, 106D, 106E: microphone; 108, 108A, 108B, 108C, 108D, 108E: speaker; 110, 110A, 110B, 110C: display unit; 120, 120a, 210: control unit; 121: acquisition unit; 211: acquisition unit; 122: speech synthesis unit; 123: communication control unit; 215: communication control unit; 124: specifying unit; 125: determination unit; 126: priority adding unit; 127: output control unit; 150, 150a, 230: storage unit; 152: in-vehicle device information; 154: list information; 200: server device; 212: speech segment extraction unit; 213: speech recognition unit; 214: agent data generation unit; 232: answer information; M: vehicle; SD1, SD2: agent voice; SP, SP1, SP2: utterance; VE, VE1, VE2, VE3, VE4: in-vehicle device

Claims

An agent device comprising:
an acquisition unit that acquires data indicating a voice uttered by a user;
a speech recognition unit that recognizes utterance content of the user based on the data acquired by the acquisition unit;
a specifying unit that specifies an instruction included in the utterance content;
an output control unit that causes an information output device including a display unit to output information responding to the instruction specified by the specifying unit;
a determination unit that, when the specifying unit specifies a plurality of the instructions, determines whether the specified plurality of instructions can be executed simultaneously; and
a priority adding unit that, when the determination unit determines that the plurality of instructions cannot be executed simultaneously, adds a priority to the plurality of instructions included in the utterance content based on relevance to the content displayed by the display unit,
wherein, when the specifying unit specifies a plurality of the instructions, the output control unit causes the information output device to output information corresponding to the instructions in order from the instruction with the highest priority added by the priority adding unit.

The agent device according to claim 1, wherein the determination unit refers to list information of instructions that cannot be executed simultaneously and determines whether the specified plurality of instructions can be executed simultaneously.

The agent device according to claim 1 or 2, wherein, when the determination unit determines that the plurality of instructions can be executed simultaneously, the output control unit causes the information output device to output information responding to the plurality of instructions.

An agent device comprising:
an acquisition unit that acquires data indicating a voice uttered by a user;
a speech recognition unit that recognizes utterance content of the user based on the data acquired by the acquisition unit;
a specifying unit that specifies an instruction included in the utterance content;
an output control unit that causes an information output device including a display unit to output information responding to the instruction specified by the specifying unit; and
a priority adding unit that adds a priority to a plurality of the instructions included in the utterance content based on relevance to the content displayed by the display unit,
wherein, when the specifying unit specifies a plurality of the instructions, the output control unit causes the information output device to output information corresponding to the instructions in order from the instruction with the highest priority added by the priority adding unit.

The agent device according to any one of claims 1 to 4, wherein, when none of the plurality of instructions included in the utterance content is related to the content displayed by the display unit, or when all of the instructions are related to the content displayed by the display unit, the priority is added based on the order in which the plurality of instructions were uttered.

The agent device according to any one of claims 1 to 5, wherein, when none of the plurality of instructions included in the utterance content is related to the content displayed by the display unit, or when all of the instructions are related to the content displayed by the display unit, the priority is added based on a conjunction connecting the plurality of instructions.

The agent device according to any one of claims 1 to 6, wherein, when none of the plurality of instructions included in the utterance content is related to the content displayed by the display unit, or when all of the instructions are related to the content displayed by the display unit, the priority is added based on a phrase, included in the utterance content, that indicates the order of the instructions.

The agent device according to any one of claims 1 to 7, wherein, when none of the plurality of instructions included in the utterance content is related to the content displayed by the display unit, or when all of the instructions are related to the content displayed by the display unit, the priority is added based on a phrase, included in the utterance content, that indicates the timing of the instructions.

An agent method in which a computer:
acquires data indicating a voice uttered by a user;
recognizes utterance content of the user based on the acquired data;
specifies an instruction included in the utterance content;
causes an information output device including a display unit to output information responding to the specified instruction;
when a plurality of the instructions are specified, determines whether the specified plurality of instructions can be executed simultaneously;
when it is determined that the plurality of instructions cannot be executed simultaneously, adds a priority to the plurality of instructions included in the utterance content based on relevance to the content displayed by the display unit; and
when a plurality of the instructions are specified, causes the information output device to output information corresponding to the instructions in order from the instruction with the highest added priority.

A program that causes a computer to:
acquire data indicating a voice uttered by a user;
recognize utterance content of the user based on the acquired data;
specify an instruction included in the utterance content;
cause an information output device including a display unit to output information responding to the specified instruction;
when a plurality of the instructions are specified, determine whether the specified plurality of instructions can be executed simultaneously;
when it is determined that the plurality of instructions cannot be executed simultaneously, add a priority to the plurality of instructions included in the utterance content based on relevance to the content displayed by the display unit; and
when a plurality of the instructions are specified, cause the information output device to output information corresponding to the instructions in order from the instruction with the highest added priority.