WO2025027877A1

WO2025027877A1 - Information processing device, information processing method, and program

Info

Publication number: WO2025027877A1
Application number: PCT/JP2023/046256
Authority: WO
Inventors: 悠西村; 晃介椋本
Original assignee: ソニーグループ株式会社
Priority date: 2023-08-03
Filing date: 2023-12-22
Publication date: 2025-02-06

Abstract

Provided is an information processing device of an information processing system in which a voice input from a first user present in a first space and image data obtained by imaging the first space are presented to a second user present in a second space away from the first space, and a voice input from the second user is presented to the first user, said information processing device comprising a control circuit which controls the display, by a display, of (i) display information corresponding to information pertaining to the viewpoint of the second user and to the image data, and (ii) information pertaining to an object present in the first space.

Description

Information processing device, information processing method, and program

This disclosure relates to an information processing device, an information processing method, and a program.

In recent years, technologies have become known that allow a user located away from a managed space to manage the status of that space. For example, Patent Document 1 discloses a technology in which the managed space is a work site, the user is an administrator who manages the status of the work site, and the administrator manages the status of the work site.

JP 2022-42303 A

However, it is desirable for a user located away from the managed space to be able to manage the status of the managed space more efficiently.

In order to solve the above problem, according to one aspect of the present disclosure, there is provided an information processing device in an information processing system in which a voice input from a first user present in a first space and image data obtained by imaging the first space are presented to a second user present in a second space separate from the first space, and a voice input from the second user is presented to the first user, the information processing device including a control circuit that controls the display, on a display, of display information corresponding to information relating to the viewpoint of the second user and to the image data, and of information relating to an object present in the first space.

According to another aspect of the present disclosure, there is provided an information processing method in an information processing system in which a voice input from a first user present in a first space and image data obtained by imaging the first space are presented to a second user present in a second space separate from the first space, and a voice input from the second user is presented to the first user, the information processing method including controlling, by a processor, the display, on a display, of display information corresponding to information relating to the viewpoint of the second user and to the image data, and of information relating to an object present in the first space.

According to another aspect of the present disclosure, there is provided a program that causes a computer to function as a control circuit of an information processing device in an information processing system in which a voice input from a first user present in a first space and image data obtained by imaging the first space are presented to a second user present in a second space separate from the first space, and a voice input from the second user is presented to the first user, the control circuit controlling the display, on a display, of display information corresponding to information relating to the viewpoint of the second user and to the image data, and of information relating to an object present in the first space.

A diagram illustrating a configuration example of an information processing system according to an embodiment of the present disclosure.
A diagram illustrating an example of data transmitted and received between the devices constituting the information processing system 1 according to the embodiment of the present disclosure.
A diagram illustrating an example of the overall processing performed by the information processing system 1 according to the embodiment of the present disclosure.
A diagram illustrating a functional configuration example of the on-site worker device 10 according to the embodiment of the present disclosure.
A diagram illustrating a functional configuration example of the remote participant device 20 according to the embodiment of the present disclosure.
A diagram illustrating a functional configuration example of the server 30 according to the embodiment of the present disclosure.
A diagram for explaining details of the multimodal processing performed when a remote participant makes an indication.
A diagram for explaining details of the multimodal processing performed when an on-site worker makes an indication.
A diagram for explaining an example of voice recognition performed by the control unit 340.
A diagram for explaining an example of recognition of indication information performed by the control unit 340.
A diagram for explaining an example of recognition of an object to which a remote participant is paying attention.
A diagram for explaining an example of recognition of an object to which an on-site worker is paying attention.
A diagram for explaining an example of recognition of response information performed by the control unit 340.
A diagram illustrating an example of meta information in a tag recorded in the storage unit 350 by the control unit 340.
A diagram for explaining a modified example of the recognition model.
A diagram for explaining an example of an association start operation.
A diagram for explaining a modified example of recognition of indication information.
A diagram for explaining a modified example in which a speaker identification result is used for tagging.
A diagram for explaining an example of collaboration with AI.
A diagram for explaining an example in which meta information in a tag is reflected in the real-time video.
A diagram for explaining a case where meta information in a tag is attached to a timeline.
A diagram for explaining a case where meta information in a tag is attached to a map of the work site and to the display image.
A diagram illustrating an example of tag management.
A diagram illustrating an example of a report.
A diagram illustrating an example of a work site to which two-dimensional codes are attached.
A diagram for explaining hardware processing executed by the on-site worker device 10.
A diagram for explaining an example of recognition of a two-dimensional code based on the 360-degree video and a camera image.
A diagram for explaining software processing executed by the server 30.
A diagram illustrating a first display example of a map by the remote participant device 20-1.
A diagram illustrating an example of a details screen displayed by the remote participant device 20-1 when the details screen display button b2 is selected.
A diagram illustrating a second display example of a map by the remote participant device 20-1.
A diagram illustrating an example of a details screen displayed by the remote participant device 20-1 when the details screen display button b8 is selected.
A diagram illustrating an example of a details screen after filtering by a search key.
A block diagram illustrating a hardware configuration example of an information processing device 900.

A preferred embodiment of the present disclosure is described in detail below with reference to the attached drawings. Note that, in this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

The description will be given in the following order.
0. Overview
1. Details of the embodiment
  1.1. Configuration example of the information processing system
  1.2. Functional configuration example of the on-site worker device
  1.3. Functional configuration example of the remote participant device
  1.4. Functional configuration example of the server
  1.5. Details of multimodal processing
  1.6. Use of two-dimensional codes
2. Hardware configuration example
3. Summary

<0. Overview>
First, an overview of the embodiment of the present disclosure will be described.

Generally, the status of a work site (for example, a construction site) where on-site workers perform work is managed. A site manager (hereinafter also simply referred to as the "manager"), who is responsible for managing the status of the work site, may visit the work site daily to carry out safety patrols and construction progress management, and may compile the work progress into reports. However, the cost of the manager traveling to the work site, the cost of managing matters to be corrected, and the cost of creating reports can be substantial.

Therefore, to reduce the cost of the manager traveling to the work site, a technology that allows the manager to manage the status of the work site from a remote location (hereinafter also referred to as "remote presence") is known. In addition, so that the manager can manage the overall status of the work site, a system may be introduced that records the status of the work site using image data (for example, still image data or moving image data) captured at the work site. As an example of the image data, moving image data in which all directions are captured with the imaging point as a reference (hereinafter also referred to as "360-degree video") may be used.

However, safety patrols by remote presence, recording of the work site status using 360-degree video, management of matters to be corrected, and report creation are usually each performed separately. Completing all of these tasks therefore incurs considerable cost.

Therefore, there is a demand for a system that uses remote presence to complete the recording of the work site status using 360-degree video, the management of matters to be corrected, and report creation, while linking these tasks to one another. Such a system is expected to enable a manager in a remote location to manage the status of the work site more efficiently.

Note that the work site is an example of the managed space and an example of the first space. The remote location is an example of a place away from the managed space and an example of the second space separate from the first space. The on-site worker is an example of the first user present in the managed space. The manager is merely one example of a remote participant. The remote participant may therefore be someone other than the manager (for example, a site supervisor, a site technical staff member, a head office staff member, or the client). The remote participant is an example of the second user who manages the status of the managed space.

The overview of the embodiment of the present disclosure has been described above.

<1. Details of the embodiment>
Next, the embodiment of the present disclosure will be described in detail.

(1.1. Configuration example of the information processing system)
First, a configuration example of the information processing system according to the embodiment of the present disclosure will be described with reference to FIG. 1.

FIG. 1 is a diagram illustrating a configuration example of the information processing system according to the embodiment of the present disclosure. As shown in FIG. 1, the information processing system 1 according to the embodiment of the present disclosure includes an on-site worker device 10, remote participant devices 20-1 to 20-3, a server 30, and a network 40.

The on-site worker device 10, the remote participant devices 20-1 to 20-3, and the server 30 are each connected to the network 40 and are configured to be able to communicate with one another via the network 40. Note that, in the following description, the remote participant devices 20-1 to 20-3 may be referred to as the "remote participant device 20" when they need not be distinguished from one another.

(On-site worker device 10)
The on-site worker device 10 is an information processing device realized by a computer and used by an on-site worker. For example, the on-site worker device 10 may be realized by a wearable device worn on the head of the on-site worker. The on-site worker device 10 has a camera 110 (FIG. 4) capable of capturing 360-degree video of the work site. The on-site worker device 10 also has a microphone 120 (FIG. 4) that detects sound at the work site (hereinafter also referred to as "on-site audio"). The on-site audio may include environmental sound at the site and the voice of the on-site worker.

The on-site worker device 10 also has a speaker 170 (FIG. 4) capable of outputting the voices of the remote participants received from each of the remote participant devices 20-1 to 20-3 via the server 30. In the example shown in FIG. 1, the on-site worker device 10 does not have a display. However, the on-site worker device 10 may have a display, and that display may be capable of displaying the same kind of information that the remote participant device 20 can display.

(Remote participant device 20)
The remote participant device 20 is an information processing device realized by a computer and used by a remote participant. In the example shown in FIG. 1, the remote participant device 20-1 is a PC (Personal Computer), the remote participant device 20-2 is an HMD (Head Mounted Display), and the remote participant device 20-3 is a smartphone. Thus, the specific form of the remote participant device 20 is not particularly limited.

The HMD may be realized by a VR (Virtual Reality) device, an AR (Augmented Reality) device, an MR (Mixed Reality) device, or the like. In the following description, it is assumed that the remote participant using the remote participant device 20-1 is the manager. In the example shown in FIG. 1, the information processing system 1 includes three remote participant devices 20. However, the number of remote participant devices 20 included in the information processing system 1 may be one or two, or may be four or more.

The remote participant device 20 has a display 280 (FIG. 5) capable of displaying display information that corresponds to the 360-degree video of the work site, which originates from the on-site worker device 10, and to information related to the viewpoint of the remote participant. The remote participant device 20 also has a microphone 220 (FIG. 5) that detects sound at the remote location (hereinafter also referred to as "remote audio"). The remote audio may include the voice of the remote participant. The remote participant device 20 further has a speaker 270 (FIG. 5) capable of outputting the voice of the on-site worker received from the on-site worker device 10 via the server 30 and the voices of other remote participants received from the other remote participant devices 20 via the server 30.

(Server 30)
The server 30 is realized by a computer and functions as an example of the information processing device. The functions of the on-site worker device 10, the remote participant device 20, and the server 30 are each described in more detail below. However, some or all of the functions of the server 30 may instead be provided in the on-site worker device 10 or the remote participant device 20.

(Network 40)
The network 40 relays communication between the devices connected to it. The type of the network 40 is not limited. As one specific example, the network 40 may be a wireless network, such as a network based on the Wi-Fi (registered trademark) standard. As another example, the network 40 may be the Internet, a dedicated line, a LAN (Local Area Network), a WAN (Wide Area Network), or the like. The network 40 may include multiple networks, and at least part of it may be a wired network.

(Data transmitted and received between devices)
FIG. 2 is a diagram illustrating an example of data transmitted and received between the devices constituting the information processing system 1 according to the embodiment of the present disclosure.

As shown in FIG. 2, the on-site worker device 10 performs control such that display information (hereinafter also referred to as the "display image") corresponding to the 360-degree video of the work site obtained by the camera 110 (FIG. 4) and to information related to the viewpoint of the remote participant is transmitted in real time, via the server 30, to each of the remote participant devices 20-1 to 20-3. Each of the remote participant devices 20-1 to 20-3 receives the display image and displays it on the display 280 (FIG. 5).

By viewing the display image, a remote participant can grasp the situation at the work site.

Furthermore, dialogue may take place among any combination of the on-site worker using the on-site worker device 10 and the remote participants using the remote participant devices 20-1 to 20-3. For reasons of space, however, the dialogue between the remote participant using the remote participant device 20-1 and the remote participant using the remote participant device 20-3 is omitted from FIG. 2.

Such dialogue can facilitate smooth communication about the situation at the work site. More specifically, the dialogue can be realized as follows: among the on-site worker using the on-site worker device 10 and the remote participants using the remote participant devices 20-1 to 20-3, one person acts as the speaker and the others act as listeners; the speaker's voice is input to the microphone of the speaker's device and is presented to each listener through the speaker of that listener's device.
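The speaker-to-listener delivery described above can be sketched as a simple fan-out. This is only an illustrative sketch: the `Device` and `Relay` classes, the device identifiers, and the byte-string audio chunks are hypothetical names introduced here, and the `Relay` class merely stands in for the relaying role that the description assigns to the server 30.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A participant device with a simulated speaker output (hypothetical)."""
    device_id: str
    played: list = field(default_factory=list)  # (speaker_id, chunk) pairs output so far

    def play(self, speaker_id: str, chunk: bytes) -> None:
        self.played.append((speaker_id, chunk))


class Relay:
    """Hypothetical stand-in for the relaying role of the server 30."""

    def __init__(self) -> None:
        self.devices: dict[str, Device] = {}

    def register(self, device: Device) -> None:
        self.devices[device.device_id] = device

    def forward(self, speaker_id: str, chunk: bytes) -> list[str]:
        """Deliver one audio chunk to every device except the speaker's own."""
        listeners = [d for d in self.devices.values() if d.device_id != speaker_id]
        for listener in listeners:
            listener.play(speaker_id, chunk)
        return [listener.device_id for listener in listeners]


relay = Relay()
for name in ("worker-10", "remote-20-1", "remote-20-2", "remote-20-3"):
    relay.register(Device(name))

# The participant on device 20-1 speaks; everyone else hears the chunk.
delivered = relay.forward("remote-20-1", b"\x00\x01")
```

The same `forward` call covers every speaker and listener combination, including dialogue between two remote participants.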

(Example of overall processing)
FIG. 3 is a diagram illustrating an example of the overall processing performed by the information processing system 1 according to the embodiment of the present disclosure. As shown in FIG. 3, the overall processing is divided into "real-time processing" and "post-processing". "Real-time processing" is processing performed at the same time as the 360-degree video of the work site is distributed in real time. "Post-processing" is processing performed after the 360-degree video of the work site has been saved in the server 30.

The "real-time processing" includes transmission of the 360-degree video from the on-site worker device 10 to the server 30 (S31), transmission of the on-site audio from the on-site worker device 10 to the server 30 (S32), transmission of the position information of the on-site worker from the on-site worker device 10 to the server 30 (S33), and the like.

The "real-time processing" also includes transmission of the remote audio from the remote participant device 20 to the server 30 (S34), transmission of the remote participant's viewpoint direction, FoV (Field of View), and cursor position from the remote participant device 20 to the server 30 (S35), and the like. The viewpoint direction and the FoV may be included in the information related to the viewpoint described above.

In this specification, the viewpoint is the position in the 360-degree video toward which the remote participant directs their line of sight. Accordingly, the viewpoint direction can be expressed by the angles (roll angle, pitch angle, and yaw angle) about three axes (X axis, Y axis, and Z axis) of the direction in which the remote participant is looking in the 360-degree video.
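As a rough illustration of how the viewpoint-related values sent in S35 might be held together, the sketch below bundles roll, pitch, yaw, FoV, and cursor position and converts the angles into a line-of-sight unit vector. The class name `ViewpointInfo`, its field names, and the axis convention (X forward, Z vertical, positive pitch looking up) are assumptions made for this example; the specification does not fix them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewpointInfo:
    """Viewpoint-related values a remote participant device might report (hypothetical)."""
    roll_deg: float   # rotation about the X axis (assumed to be the forward axis)
    pitch_deg: float  # rotation about the Y axis (assumed: positive looks up)
    yaw_deg: float    # rotation about the Z axis (assumed to be the vertical axis)
    fov: float        # FoV value as defined in this specification
    cursor_xy: tuple[int, int]  # cursor position on the display image, in pixels

    def line_of_sight(self) -> tuple[float, float, float]:
        """Unit vector of the viewing direction; roll does not change it."""
        pitch = math.radians(self.pitch_deg)
        yaw = math.radians(self.yaw_deg)
        return (
            math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
        )


info = ViewpointInfo(roll_deg=0.0, pitch_deg=0.0, yaw_deg=90.0, fov=1.0, cursor_xy=(640, 360))
x, y, z = info.line_of_sight()  # looking along the +Y axis under the assumed convention
```

Roll is kept in the record because it affects how the display image is oriented, even though it leaves the line-of-sight vector unchanged.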

The FoV is the magnification ratio of the display image relative to the 360-degree video. Accordingly, the display image is the portion of the 360-degree video within the range defined by the FoV, with the viewpoint direction of the remote participant as a reference. The FoV increases when the remote participant performs an operation to raise the magnification ratio, and decreases when the remote participant performs an operation to lower it.
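One way to picture the relationship between the viewpoint direction, the FoV, and the display image is to compute which region of a 360-degree frame is shown. The sketch below assumes an equirectangular frame layout and a base angular extent of 90 by 60 degrees at a magnification of 1.0; neither is stated in the specification, and the function name `display_region` is hypothetical. Following the definition above, the FoV is treated as a magnification ratio, so a larger value yields a smaller region.

```python
def display_region(yaw_deg, pitch_deg, magnification, frame_w, frame_h,
                   base_h_deg=90.0, base_v_deg=60.0):
    """Return (center_x, center_y, crop_w, crop_h) in pixels.

    Assumes an equirectangular frame: the width spans 360 degrees of yaw
    (-180..180) and the height spans 180 degrees of pitch (+90 at the top).
    A real renderer would also reproject the crop to a perspective view.
    """
    if magnification <= 0:
        raise ValueError("magnification must be positive")
    center_x = (yaw_deg + 180.0) * frame_w / 360.0
    center_y = (90.0 - pitch_deg) * frame_h / 180.0
    # Higher magnification means a narrower angular extent is displayed.
    crop_w = (base_h_deg / magnification) * frame_w / 360.0
    crop_h = (base_v_deg / magnification) * frame_h / 180.0
    return center_x, center_y, crop_w, crop_h


normal = display_region(0.0, 0.0, 1.0, frame_w=3600, frame_h=1800)
zoomed = display_region(0.0, 0.0, 2.0, frame_w=3600, frame_h=1800)
```

Doubling the magnification halves both crop dimensions while leaving the center, which depends only on the viewpoint direction, unchanged.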

The cursor position is the position of a cursor superimposed on the display image. The cursor can be moved over the display image by a cursor movement operation performed by the remote participant. When the remote participant performs a confirmation operation, processing corresponding to the cursor position can be executed.

The "real-time processing" also includes multimodal processing executed by the server 30 (S36). For example, the multimodal processing includes a process in which the server 30 generates the display image based on the 360-degree video, the viewpoint direction, and the FoV, a process in which the server 30 transmits the display image to the remote participant device 20, and a process in which the remote participant device 20 displays the display image (S37). In the following, a display image that is displayed in real time before the 360-degree video is saved by the server 30 is also referred to as "real-time video."

The multimodal processing also includes a process in which the server 30 extracts the voice of the on-site worker from the on-site audio and transmits it to the remote participant device 20, and a process in which the remote participant device 20 outputs the voice of the on-site worker (S38).

Furthermore, the multimodal processing includes a process in which the server 30 extracts the voice of the remote participant from the remote audio and transmits it to the on-site worker device 10, and a process in which the on-site worker device 10 outputs the voice of the remote participant (S39). Although not shown in FIG. 3, the multimodal processing also includes a process in which the server 30 transmits the voice of the remote participant to the remote participant devices 20 used by other remote participants, and a process in which those remote participant devices 20 output the voice of the remote participant.

The multimodal processing also includes a process of acquiring meta information such as time information indicating the current time, the position information of the on-site worker, and the viewpoint direction, FoV, and cursor position of the remote participant. The meta information is incorporated into a tag associated with the video, the on-site audio, and the remote audio. As described later, other meta information may also be incorporated into the tag afterwards. In the following description, incorporating meta information into a tag is also referred to as "tagging."
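Tagging as described above can be pictured as merging key-value meta information into a tag record, with further items merged in later. The sketch below is illustrative only: the `Tag` class, its field names, and all sample values are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Tag:
    """A tag associated with the video, on-site audio, and remote audio (hypothetical layout)."""
    tag_id: int
    meta: dict[str, Any] = field(default_factory=dict)

    def add_meta(self, **items: Any) -> None:
        """'Tagging': incorporate meta information into this tag."""
        self.meta.update(items)


tag = Tag(tag_id=1)

# Meta information acquired during multimodal processing (sample values only).
tag.add_meta(
    time="2023-12-22T10:15:00+09:00",
    worker_position=(12.5, 3.0),
    view_direction={"roll": 0.0, "pitch": -10.0, "yaw": 45.0},
    fov=1.5,
    cursor=(640, 360),
)

# Other meta information can be incorporated into the same tag later.
tag.add_meta(note="placeholder for meta information added later")
```

Keeping the meta information in a single mapping makes it straightforward to add new kinds of items without changing the tag structure.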

The "real-time processing" also includes a process, executed by the server 30, of saving various data (the 360-degree video, the dialogue history, the meta information, and the like) (S40). The dialogue history includes the voice of the on-site worker and the voice of the remote participant.

The "post-processing" includes a process in which the server 30 transmits, to the remote participant device 20, a display image obtained by performing video processing on the saved 360-degree video, and a process in which the server 30 transmits, to the remote participant device 20, audio obtained by performing audio processing on the saved voice of the on-site worker and the saved voice of the remote participant (S41). The "post-processing" also includes a process in which the remote participant device 20 displays the display image as a past display image, and a process in which the remote participant device 20 outputs the audio as past audio (S42).

　なお、以下では、サーバ３０によって一旦保存されてから取り出された３６０度映像に基づいて表示される表示映像を「アーカイブ映像」とも言う。 Note that below, the display image that is displayed based on the 360-degree image that has been temporarily stored by the server 30 and then retrieved is also referred to as "archived image."

　「ポスト処理」は、サーバ３０が、保存されたタグに基づくレポートの作成（レポート化）を自動的に行う処理（Ｓ４３）を含む。また、「ポスト処理」は、サーバ３０が、レポートを遠隔参加者デバイス２０に送信する処理、および、遠隔参加者デバイス２０が、レポートを表示する処理（Ｓ４４）を含む。 The "post-processing" includes a process (S43) in which the server 30 automatically creates a report based on the saved tags (report generation). The "post-processing" also includes a process in which the server 30 transmits the report to the remote participant device 20 and a process in which the remote participant device 20 displays the report (S44).
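The report creation in S43 could be sketched as follows. This is only an illustration under stated assumptions: tags are modeled as plain dictionaries, the key names, the function `build_report`, and the sample entries are hypothetical, and times are assumed to be zero-padded "HH:MM" strings so that string ordering matches chronological ordering. The disclosure does not specify the report format.

```python
from typing import Dict, List


def build_report(tags: List[Dict[str, str]]) -> str:
    """Render saved tags as a plain-text report, one numbered entry per tag, ordered by time."""
    lines = ["Site report"]
    for i, tag in enumerate(sorted(tags, key=lambda t: t["time"]), start=1):
        lines.append(
            f"{i}. [{tag['time']}] {tag['object_type']} at {tag['position']}: "
            f"{tag['indication']} -> {tag['response']}"
        )
    return "\n".join(lines)


# Hypothetical saved tags, for illustration only.
saved_tags = [
    {"time": "10:45", "position": "zone B", "object_type": "ladder",
     "indication": "secure the base", "response": "done"},
    {"time": "10:30", "position": "zone A", "object_type": "scaffold",
     "indication": "handrail missing", "response": "will fix today"},
]
report = build_report(saved_tags)
print(report)
```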

　以上、本開示の実施形態に係る情報処理システム１の構成例について説明した。 The above describes an example configuration of the information processing system 1 according to an embodiment of the present disclosure.

　（１．２．現場作業者デバイスの機能構成例）
　続いて、図４を参照しながら、本開示の実施形態に係る現場作業者デバイス１０の機能構成例について説明する。 (1.2. Example of Functional Configuration of Field Worker Device)
Next, an example of the functional configuration of the field worker device 10 according to the embodiment of the present disclosure will be described with reference to FIG. 4.

　図４は、本開示の実施形態に係る現場作業者デバイス１０の機能構成例を示す図である。図４に示されるように、現場作業者デバイス１０は、カメラ１１０と、マイクロフォン１２０と、検出部１３０と、制御部１４０と、記憶部１５０と、通信部１６０と、スピーカ１７０とを備える。 FIG. 4 is a diagram showing an example of the functional configuration of the field worker device 10 according to an embodiment of the present disclosure. As shown in FIG. 4, the field worker device 10 includes a camera 110, a microphone 120, a detection unit 130, a control unit 140, a storage unit 150, a communication unit 160, and a speaker 170.

　（カメラ１１０）
　カメラ１１０は、作業者が存在する作業現場を撮像することにより３６０度映像を得る。ここでは、カメラ１１０が二つの撮像装置を有しており、二つの撮像装置それぞれによって撮像されて得られた映像が結合されることにより、３６０度映像が得られる場合を主に想定する。しかし、カメラ１１０が有する撮像装置の数は、三つ以上であってもよいし、一つであってもよい。 (Camera 110)
The camera 110 obtains 360-degree video by imaging the work site where the worker is present. Here, it is mainly assumed that the camera 110 has two imaging devices and that the 360-degree video is obtained by combining the videos captured by each of them. However, the camera 110 may have three or more imaging devices, or only one.

　例えば、撮像装置が備えるレンズは、魚眼レンズであってもよい。魚眼レンズを通して撮像装置による撮像が行われる場合には、撮像によって得られる画像に歪みが生じてしまうものの、広範囲の画像が得られる。一例として、魚眼レンズの画角は、１８０度よりも大きな値（例えば、１８５度など）であってもよい。しかし、撮像装置が備えるレンズは、魚眼レンズ以外のレンズであってもよい。 For example, the lens included in the imaging device may be a fisheye lens. When the imaging device captures an image through a fisheye lens, a wide range of image is obtained, although distortion occurs in the image obtained by the capture. As an example, the angle of view of the fisheye lens may be a value greater than 180 degrees (e.g., 185 degrees). However, the lens included in the imaging device may be a lens other than a fisheye lens.

　例えば、撮像装置が備えるレンズは、超広角レンズであってもよい。超広角レンズを通して撮像装置による撮像が行われる場合にも、（魚眼レンズを通して撮像が行われる場合と比較して狭い範囲の画像が得られる可能性があるが）、広範囲の画像が得られる。さらに、超広角レンズを通して撮像装置による撮像が行われる場合には、撮像によって得られる画像に生じる歪みを抑えることが可能となる。 For example, the lens of the imaging device may be an ultra-wide-angle lens. Imaging through an ultra-wide-angle lens also yields a wide-range image, although the range may be narrower than with a fisheye lens. Furthermore, imaging through an ultra-wide-angle lens makes it possible to reduce the distortion that occurs in the captured image.

　（マイクロフォン１２０）
　マイクロフォン１２０は、現場音声を検出する。上記したように、現場音声には、現場環境音および現場作業者の音声が含まれ得る。 (Microphone 120)
The microphone 120 detects on-site sounds. As described above, the on-site sounds may include on-site environmental sounds and the voices of on-site workers.

　（検出部１３０）
　検出部１３０は、現場作業者の位置情報を検出する。例えば、検出部１３０は、位置検出センサを有しており、位置検出センサによって現場作業者の位置情報を検出する。例えば、位置検出センサは、ＧＰＳ（Ｇｌｏｂａｌ　Ｐｏｓｉｔｉｏｎｉｎｇ　Ｓｙｓｔｅｍ）センサであってもよい。あるいは、検出部１３０は、カメラ１１０による撮像によって得られた映像に基づいて自己位置推定技術により現場作業者の位置情報を検出してもよい。あるいは、カメラ１１０による撮像によって得られた映像が、通信部１６０によってサーバ３０に送信され、サーバ３０が、その映像に基づいて現場作業者の位置情報を検出してもよい。また、現場作業者の位置情報を検出する処理は、リアルタイム処理にて行われてもよいし、ポスト処理にて行われてもよい。 (Detection Unit 130)
The detection unit 130 detects the position information of the on-site worker. For example, the detection unit 130 has a position detection sensor and uses it to detect the position information of the on-site worker. The position detection sensor may be, for example, a GPS (Global Positioning System) sensor. Alternatively, the detection unit 130 may detect the position information of the on-site worker by a self-position estimation technique based on the video captured by the camera 110. Alternatively, the video captured by the camera 110 may be transmitted to the server 30 by the communication unit 160, and the server 30 may detect the position information of the on-site worker based on that video. The process of detecting the position information of the on-site worker may be performed in the real-time processing or in the post-processing.

　なお、現場作業者の位置情報の表現形式は限定されない。一例として、現場作業者の位置情報は、作業現場のマップ上における座標によって表現されてもよい。 The format of the location information of the on-site worker is not limited. As an example, the location information of the on-site worker may be expressed by coordinates on a map of the work site.

　（制御部１４０）
　制御部１４０は、１または複数のプロセッサによって構成される。プロセッサは、ＣＰＵ（Ｃｅｎｔｒａｌ　Ｐｒｏｃｅｓｓｉｎｇ　Ｕｎｉｔ；中央演算処理装置）などであってもよい。制御部１４０がプロセッサによって構成される場合、かかるプロセッサは、電子回路によって構成されてよい。制御部１４０は、かかるプロセッサによってプログラムが実行されることによって実現され得る。 (Control unit 140)
The control unit 140 is configured with one or more processors. The processor may be a CPU (Central Processing Unit) or the like. When the control unit 140 is configured with a processor, the processor may be configured with an electronic circuit. The control unit 140 can be realized by the processor executing a program.

　（記憶部１５０）
　記憶部１５０は、メモリを含んで構成され、制御部１４０によって実行されるプログラムを記憶したり、このプログラムの実行に必要なデータを記憶したりする記録媒体である。また、記憶部１５０は、制御部１４０による演算のためにデータを一時的に記憶する。記憶部１５０は、磁気記憶部デバイス、半導体記憶デバイス、光記憶デバイス、または、光磁気記憶デバイスなどにより構成される。 (Storage unit 150)
The storage unit 150 is a recording medium including a memory, which stores a program executed by the control unit 140 and stores data necessary for executing the program. The storage unit 150 also temporarily stores data for calculations by the control unit 140. The storage unit 150 is configured by a magnetic storage device, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.

　（通信部１６０）
　通信部１６０は、通信インターフェースを含んで構成されており、ネットワーク４０を介して、遠隔参加者デバイス２０－１～２０－３およびサーバ３０それぞれと通信を行う。例えば、通信部１６０は、マイクロフォン１２０によって検出された現場音声をサーバ３０に送信する。また、通信部１６０は、サーバ３０から送信された遠隔参加者の音声を受信する。 (Communication unit 160)
The communication unit 160 includes a communication interface and communicates with each of the remote participant devices 20-1 to 20-3 and the server 30 via the network 40. For example, the communication unit 160 transmits on-site audio detected by the microphone 120 to the server 30. The communication unit 160 also receives audio of the remote participants transmitted from the server 30.

　（スピーカ１７０）
　スピーカ１７０は、制御部１４０による制御に従って、音の出力を行う。例えば、スピーカ１７０は、通信部１６０によって受信された遠隔参加者の音声を出力する。 (Speaker 170)
The speaker 170 outputs sound under the control of the control unit 140. For example, the speaker 170 outputs the voice of a remote participant received by the communication unit 160.

　以上、本開示の実施形態に係る現場作業者デバイス１０の機能構成例について説明した。 The above describes an example of the functional configuration of the field worker device 10 according to an embodiment of the present disclosure.

　（１．３．遠隔参加者デバイスの機能構成例）
　続いて、図５を参照しながら、本開示の実施形態に係る遠隔参加者デバイス２０の機能構成例について説明する。 (1.3. Example of Functional Configuration of Remote Participant Device)
Next, an example of the functional configuration of the remote participant device 20 according to an embodiment of the present disclosure will be described with reference to FIG. 5.

　図５は、本開示の実施形態に係る遠隔参加者デバイス２０の機能構成例を示す図である。図５に示されるように、遠隔参加者デバイス２０は、視点検出部２１０と、マイクロフォン２２０と、操作検出部２３０と、制御部２４０と、記憶部２５０と、通信部２６０と、スピーカ２７０と、ディスプレイ２８０とを備える。 FIG. 5 is a diagram showing an example of the functional configuration of a remote participant device 20 according to an embodiment of the present disclosure. As shown in FIG. 5, the remote participant device 20 includes a viewpoint detection unit 210, a microphone 220, an operation detection unit 230, a control unit 240, a storage unit 250, a communication unit 260, a speaker 270, and a display 280.

　（視点検出部２１０）
　視点検出部２１０は、遠隔参加者の視点方向を検出する。上記したように、遠隔参加者の視点方向は、３６０度映像に対して遠隔参加者が視線を向けている方向の３軸周りの角度（ロール角、ピッチ角およびヨ―角）によって表現され得る。例えば、視点検出部２１０は、加速度センサおよび方位センサを有してもよい。このとき、視点検出部２１０は、加速度センサによって、ロール角およびピッチ角を検出し、方位センサによってヨ―角を検出してもよい。 (Viewpoint detection unit 210)
The viewpoint detection unit 210 detects the viewpoint direction of the remote participant. As described above, the viewpoint direction can be expressed by the angles around three axes (roll, pitch, and yaw angles) of the direction in which the remote participant is looking within the 360-degree video. For example, the viewpoint detection unit 210 may have an acceleration sensor and an orientation sensor. In this case, the viewpoint detection unit 210 may detect the roll and pitch angles with the acceleration sensor and the yaw angle with the orientation sensor.
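One common way to obtain roll and pitch from an acceleration sensor is to treat a reading taken at rest as the gravity vector. The sketch below illustrates that approach and takes yaw from a heading reported by the orientation sensor. The function names, the axis convention, and the sign conventions are assumptions for illustration; the disclosure does not specify how the angles are computed, and real sensors differ in their axis definitions.

```python
import math
from typing import Tuple


def roll_pitch_from_gravity(ax: float, ay: float, az: float) -> Tuple[float, float]:
    """Estimate roll and pitch in degrees from an accelerometer reading taken at rest.

    Assumes the reading is dominated by gravity and that a level device reports
    (0, 0, +g). Other axis conventions change the signs.
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return math.degrees(roll), math.degrees(pitch)


def view_direction(accel: Tuple[float, float, float],
                   heading_deg: float) -> Tuple[float, float, float]:
    """Combine accelerometer-based roll and pitch with a sensor-reported heading as yaw."""
    roll, pitch = roll_pitch_from_gravity(*accel)
    yaw = heading_deg % 360.0   # normalize the heading to the range [0, 360)
    return roll, pitch, yaw
```

Gravity carries no information about rotation around the vertical axis, which is why a separate sensor is needed for the yaw angle.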

　（マイクロフォン２２０）
　マイクロフォン２２０は、遠隔音声を検出する。上記したように、遠隔音声には、遠隔参加者の音声が含まれ得る。 (Microphone 220)
The microphone 220 detects remote audio. As described above, the remote audio may include the voices of remote participants.

　（操作検出部２３０）
　操作検出部２３０は、遠隔参加者によって入力される各種操作を検出する。操作検出部２３０の具体的な形態は限定されない。例えば、遠隔参加者デバイス２０－１が備える操作検出部２３０は、マウスおよびキーボードを有しており、マウスおよびキーボードによって操作を検出する。また、遠隔参加者デバイス２０－２が備える操作検出部２３０は、ボタンを有しており、ボタンによって操作を検出する。遠隔参加者デバイス２０－３が備える操作検出部２３０は、タッチパネルを有しており、タッチパネルによって操作を検出する。 (Operation detection unit 230)
The operation detection unit 230 detects various operations input by remote participants. The specific form of the operation detection unit 230 is not limited. For example, the operation detection unit 230 provided in the remote participant device 20-1 has a mouse and a keyboard and detects operations by the mouse and the keyboard. The operation detection unit 230 provided in the remote participant device 20-2 has a button and detects operations by the button. The operation detection unit 230 provided in the remote participant device 20-3 has a touch panel and detects operations by the touch panel.

　（制御部２４０）
　制御部２４０は、１または複数のプロセッサによって構成される。プロセッサは、ＣＰＵ（Ｃｅｎｔｒａｌ　Ｐｒｏｃｅｓｓｉｎｇ　Ｕｎｉｔ；中央演算処理装置）などであってもよい。制御部２４０がプロセッサによって構成される場合、かかるプロセッサは、電子回路によって構成されてよい。制御部２４０は、かかるプロセッサによってプログラムが実行されることによって実現され得る。 (Control unit 240)
The control unit 240 is configured with one or more processors. The processor may be a CPU (Central Processing Unit) or the like. When the control unit 240 is configured with a processor, the processor may be configured with an electronic circuit. The control unit 240 can be realized by the processor executing a program.

　（記憶部２５０）
　記憶部２５０は、メモリを含んで構成され、制御部２４０によって実行されるプログラムを記憶したり、このプログラムの実行に必要なデータを記憶したりする記録媒体である。また、記憶部２５０は、制御部２４０による演算のためにデータを一時的に記憶する。記憶部２５０は、磁気記憶部デバイス、半導体記憶デバイス、光記憶デバイス、または、光磁気記憶デバイスなどにより構成される。 (Storage unit 250)
The storage unit 250 is a recording medium including a memory, which stores a program executed by the control unit 240 and stores data necessary for executing the program. The storage unit 250 also temporarily stores data for calculations by the control unit 240. The storage unit 250 is configured by a magnetic storage device, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.

　（通信部２６０）
　通信部２６０は、通信インターフェースを含んで構成されており、ネットワーク４０を介して、現場作業者デバイス１０、他の遠隔参加者デバイス２０およびサーバ３０それぞれと通信を行う。例えば、通信部２６０は、マイクロフォン２２０によって検出された遠隔音声をサーバ３０に送信する。また、通信部２６０は、サーバ３０から送信された現場作業者の音声および表示映像を受信する。 (Communication unit 260)
The communication unit 260 includes a communication interface and communicates with the on-site worker device 10, the other remote participant devices 20, and the server 30 via the network 40. For example, the communication unit 260 transmits the remote audio detected by the microphone 220 to the server 30. The communication unit 260 also receives the on-site worker's voice and the display video transmitted from the server 30.

　（スピーカ２７０）
　スピーカ２７０は、制御部２４０による制御に従って、音の出力を行う。例えば、スピーカ２７０は、通信部２６０によって受信された現場作業者の音声を出力する。 (Speaker 270)
The speaker 270 outputs sound under the control of the control unit 240. For example, the speaker 270 outputs the voice of the on-site worker received by the communication unit 260.

　（ディスプレイ２８０）
　ディスプレイ２８０は、制御部２４０による制御に従って、各種映像を表示する。例えば、ディスプレイ２８０は、通信部２６０によって受信された表示映像を表示する。なお、上記したように、表示映像は、３６０度映像と視点方向とＦｏＶとに基づいてサーバ３０によって生成された映像であってよい。 (Display 280)
The display 280 displays various videos under the control of the control unit 240. For example, the display 280 displays the display video received by the communication unit 260. As described above, the display video may be generated by the server 30 based on the 360-degree video, the viewpoint direction, and the FoV.
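To make concrete how a display video can depend on the 360-degree video, the viewpoint direction, and the FoV, the sketch below maps one display pixel to source coordinates. It assumes the 360-degree frame is stored in equirectangular form, treats the FoV as the horizontal angle of view, and ignores roll. These choices, and the function name `viewport_to_equirect`, are assumptions for illustration and are not stated in the disclosure.

```python
import math
from typing import Tuple


def viewport_to_equirect(px: float, py: float,
                         out_size: Tuple[int, int],
                         src_size: Tuple[int, int],
                         yaw_deg: float, pitch_deg: float,
                         fov_deg: float) -> Tuple[float, float]:
    """Map a display pixel (px, py) to (u, v) coordinates in an equirectangular frame."""
    out_w, out_h = out_size
    src_w, src_h = src_size
    # Focal length in pixels for a pinhole view with the given horizontal FoV.
    focal = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)

    # Ray through the pixel in camera space: x right, y up, z forward.
    x = px - (out_w - 1) / 2
    y = (out_h - 1) / 2 - py
    z = focal

    # Rotate by pitch (about the x axis), then by yaw (about the y axis).
    p, w = math.radians(pitch_deg), math.radians(yaw_deg)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)
    x, z = x * math.cos(w) + z * math.sin(w), -x * math.sin(w) + z * math.cos(w)

    lon = math.atan2(x, z)                  # longitude, 0 at the frame centre
    lat = math.atan2(y, math.hypot(x, z))   # latitude, positive is up
    u = ((lon / (2 * math.pi) + 0.5) % 1.0) * src_w
    v = (0.5 - lat / math.pi) * src_h
    return u, v
```

Evaluating this mapping for every display pixel and sampling the source frame at the resulting coordinates would produce one frame of the display video for the given viewpoint direction and FoV.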

　以上、本開示の実施形態に係る遠隔参加者デバイス２０の機能構成例について説明した。 The above describes an example of the functional configuration of the remote participant device 20 according to an embodiment of the present disclosure.

　（１．４．サーバの機能構成例）
　続いて、図６を参照しながら、本開示の実施形態に係るサーバ３０の機能構成例について説明する。 (1.4. Example of Server Functional Configuration)
Next, an example of a functional configuration of the server 30 according to an embodiment of the present disclosure will be described with reference to FIG. 6.

　図６は、本開示の実施形態に係るサーバ３０の機能構成例を示す図である。図６に示されるように、サーバ３０は、制御部３４０と、記憶部３５０と、通信部３６０とを備える。 FIG. 6 is a diagram showing an example of the functional configuration of the server 30 according to an embodiment of the present disclosure. As shown in FIG. 6, the server 30 includes a control unit 340, a storage unit 350, and a communication unit 360.

　（制御部３４０）
　制御部３４０は、１または複数のプロセッサによって構成される。プロセッサは、ＣＰＵ（Ｃｅｎｔｒａｌ　Ｐｒｏｃｅｓｓｉｎｇ　Ｕｎｉｔ；中央演算処理装置）などであってもよい。制御部３４０がプロセッサによって構成される場合、かかるプロセッサは、電子回路によって構成されてよい。電子回路は、制御回路とも換言され得る。制御部３４０は、かかるプロセッサによってプログラムが実行されることによって実現され得る。 (Control unit 340)
The control unit 340 is configured with one or more processors. The processor may be a CPU (Central Processing Unit) or the like. When the control unit 340 is configured with a processor, the processor may be configured with an electronic circuit. The electronic circuit may also be referred to as a control circuit. The control unit 340 may be realized by the processor executing a program.

　（記憶部３５０）
　記憶部３５０は、メモリを含んで構成され、制御部３４０によって実行されるプログラムを記憶したり、このプログラムの実行に必要なデータを記憶したりする記録媒体である。また、記憶部３５０は、制御部３４０による演算のためにデータを一時的に記憶する。記憶部３５０は、磁気記憶部デバイス、半導体記憶デバイス、光記憶デバイス、または、光磁気記憶デバイスなどにより構成される。 (Storage unit 350)
The storage unit 350 is a recording medium including a memory, which stores a program executed by the control unit 340 and stores data necessary for executing the program. The storage unit 350 also temporarily stores data for calculations by the control unit 340. The storage unit 350 is configured by a magnetic storage device, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.

　（通信部３６０）
　通信部３６０は、通信インターフェースを含んで構成されており、ネットワーク４０を介して、現場作業者デバイス１０および遠隔参加者デバイス２０－１～２０－３それぞれと通信を行う。 (Communication unit 360)
The communication unit 360 includes a communication interface, and communicates with the on-site worker device 10 and each of the remote participant devices 20-1 to 20-3 via the network 40.

　例えば、通信部３６０は、３６０度映像および現場音声を現場作業者デバイス１０から受信する。また、通信部３６０は、表示映像および現場作業者の音声を遠隔参加者デバイス２０－１～２０－３それぞれに送信する。また、通信部３６０は、遠隔音声を話者が使用する遠隔参加者デバイス２０から受信する。また、通信部３６０は、遠隔参加者の音声を現場作業者デバイス１０および聴者が使用する遠隔参加者デバイス２０それぞれに送信する。 For example, the communication unit 360 receives 360-degree video and on-site audio from the on-site worker device 10. The communication unit 360 also transmits the display video and the audio of the on-site worker to each of the remote participant devices 20-1 to 20-3. The communication unit 360 also receives remote audio from the remote participant device 20 used by the speaker. The communication unit 360 also transmits the audio of the remote participant to each of the on-site worker device 10 and the remote participant device 20 used by the listener.
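The audio routing described above can be sketched as a small recipient-selection function: a remote participant's voice goes to the on-site worker device and to every remote participant device other than the speaker's, while on-site audio goes to all remote participant devices. The identifiers below reuse the reference numerals as strings; the function names are hypothetical and the fixed device list is an assumption for illustration.

```python
from typing import List

ON_SITE_DEVICE = "10"
REMOTE_DEVICES = ["20-1", "20-2", "20-3"]


def remote_audio_recipients(speaker_device: str) -> List[str]:
    """Devices that receive a remote participant's voice: the on-site device and the listeners."""
    if speaker_device not in REMOTE_DEVICES:
        raise ValueError(f"unknown remote participant device: {speaker_device}")
    listeners = [d for d in REMOTE_DEVICES if d != speaker_device]
    return [ON_SITE_DEVICE] + listeners


def on_site_audio_recipients() -> List[str]:
    """Devices that receive the on-site worker's voice: every remote participant device."""
    return list(REMOTE_DEVICES)
```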

　以上、本開示の実施形態に係るサーバ３０の機能構成例について説明した。 The above describes an example of the functional configuration of the server 30 according to an embodiment of the present disclosure.

　（１．５．マルチモーダル処理の詳細）
　続いて、本開示の実施形態に係るマルチモーダル処理の詳細について説明する。上記したように、現場作業者は、作業現場に存在しており、作業現場を直接的に視認し得る。一方、遠隔参加者は、３６０度映像と遠隔参加者の視点方向とＦｏＶとに応じた表示映像を視認し得る。現場作業者および遠隔参加者は、それぞれ自身の視野範囲を確認しながら、対話を行い得る。 (1.5. Details of multimodal processing)
Next, details of the multimodal processing according to the embodiment of the present disclosure will be described. As described above, the on-site worker is present at the work site and can view it directly. The remote participant, by contrast, can view a display video that depends on the 360-degree video, the remote participant's viewpoint direction, and the FoV. The on-site worker and the remote participant can converse while each checks their own field of view.

　例えば、作業現場に存在するオブジェクトに対して現場作業者が指摘（例えば、質問など）を行いたいと考える場合が生じ得る。このとき、遠隔参加者が、指摘に対して返答を行いたいと考える場合が生じ得る。あるいは、作業現場に存在するオブジェクトに対して遠隔参加者が指摘（例えば、是正要請など）を行いたいと考える場合が生じ得る。このとき、現場作業者が、指摘に対して返答（例えば、対応状況の返答など）を行いたいと考える場合が生じ得る。 For example, an on-site worker may want to make an indication (e.g., ask a question) about an object present at the work site, and a remote participant may then want to reply to that indication. Alternatively, a remote participant may want to make an indication (e.g., a request for correction) about an object present at the work site, and an on-site worker may then want to reply to that indication (e.g., report the status of the corrective action).

　以下では、図７を参照しながら、遠隔参加者が指摘を行う場合に行われるマルチモーダル処理の詳細について説明し、図８を参照しながら、現場作業者が指摘を行う場合に行われるマルチモーダル処理の詳細について説明する。 Below, we will explain the details of the multimodal processing performed when a remote participant makes an indication, with reference to Figure 7, and explain the details of the multimodal processing performed when an on-site worker makes an indication, with reference to Figure 8.

　（遠隔参加者が指摘を行う場合）
　図７は、遠隔参加者が指摘を行う場合に行われるマルチモーダル処理の詳細について説明するための図である。 (When a remote participant makes a comment)
FIG. 7 is a diagram for explaining details of the multimodal processing performed when a remote participant makes an indication.

　図７に示されるように、サーバ３０において、制御部３４０は、遠隔参加者の音声（Ｄ１１）に基づいて音声認識（Ｓ１１）を行って音声認識結果を得る。音声認識結果は、テキストデータとして得られる。制御部３４０は、遠隔参加者の音声から得られた音声認識結果に基づいて、指摘を示す指摘情報を認識する（Ｓ１２）。 As shown in FIG. 7, in the server 30, the control unit 340 performs voice recognition (S11) based on the voice of the remote participant (D11) to obtain a voice recognition result. The voice recognition result is obtained as text data. The control unit 340 recognizes indication information indicating an indication based on the voice recognition result obtained from the voice of the remote participant (S12).

　一方、制御部３４０は、現場作業者の音声（Ｄ１２）に基づいて音声認識（Ｓ１３）を行って音声認識結果を得る。音声認識結果は、テキストデータとして得られる。制御部３４０は、現場作業者の音声から得られた音声認識結果に基づいて、指摘に対する返答を示す返答情報を認識する（Ｓ１４）。 Meanwhile, the control unit 340 performs voice recognition (S13) based on the voice of the on-site worker (D12) to obtain a voice recognition result. The voice recognition result is obtained as text data. The control unit 340 recognizes response information indicating a response to the indication based on the voice recognition result obtained from the voice of the on-site worker (S14).
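The two voice branches (S11 to S12 for the remote participant, S13 to S14 for the on-site worker) can be pictured as the small pipeline below. The recognizers are passed in as callables because the disclosure uses trained models for these steps. The toy stand-ins here (decoding bytes as text, a "please" prefix check) are placeholders invented so the sketch runs, and do not represent the actual recognition method.

```python
from typing import Callable, Dict, Optional

SpeechToText = Callable[[bytes], str]         # stands in for voice recognition (S11, S13)
TextToInfo = Callable[[str], Optional[str]]   # stands in for information recognition (S12, S14)


def recognize_exchange(remote_voice: bytes, site_voice: bytes,
                       speech_to_text: SpeechToText,
                       find_indication: TextToInfo,
                       find_response: TextToInfo) -> Dict[str, Optional[str]]:
    """Run both voice branches and return the recognized indication and response."""
    indication = find_indication(speech_to_text(remote_voice))   # S11 -> S12
    response = find_response(speech_to_text(site_voice))         # S13 -> S14
    return {"indication": indication, "response": response}


# Toy stand-ins (assumptions): a real system would plug in trained models here.
def toy_speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")   # pretends the audio bytes are already text


def toy_find_indication(text: str) -> Optional[str]:
    return text if text.lower().startswith("please") else None


def toy_find_response(text: str) -> Optional[str]:
    return text if text else None


result = recognize_exchange(b"Please fix the handrail", b"Fixed this morning",
                            toy_speech_to_text, toy_find_indication, toy_find_response)
```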

　制御部３４０は、３６０度映像（Ｄ１３）に基づいて作業現場に存在するオブジェクトを認識することによりオブジェクトに関する情報を得る（Ｓ１５）。オブジェクトに関する情報には、オブジェクトの種類と３６０度映像におけるオブジェクトの位置情報とが含まれ得る。なお、３６０度映像の代わりに、遠隔参加者の音声から得られた音声認識結果に基づいて、オブジェクトが認識されてもよい。また、制御部３４０は、オブジェクトの認識に、現場環境音（Ｄ１５）を加味してもよい。 The control unit 340 obtains information about the objects by recognizing the objects present at the work site based on the 360-degree video (D13) (S15). The information about the object may include the type of object and the position information of the object in the 360-degree video. Note that instead of the 360-degree video, the object may be recognized based on the voice recognition results obtained from the voice of the remote participant. The control unit 340 may also take into account the site environmental sounds (D15) when recognizing the object.

　このようにして制御部３４０によって認識されたオブジェクトが遠隔参加者によって注意されているオブジェクトであるとみなされてもよい。しかし、ここでは、遠隔参加者によって注意されているオブジェクトの認識精度を高めるため、制御部３４０は、認識したオブジェクトを遠隔参加者によって注意されているオブジェクト候補とする。ここでは、オブジェクト候補が１つ存在する場合を主に想定する。しかし、オブジェクト候補は、複数であってもよい。 The object recognized by the control unit 340 in this manner may be regarded as the object that the remote participant is paying attention to. Here, however, in order to increase the accuracy of recognizing the object that the remote participant is paying attention to, the control unit 340 treats the recognized object as a candidate for that object. It is mainly assumed here that there is one object candidate, but there may be multiple object candidates.

　制御部３４０は、遠隔参加者によって注意されているオブジェクト候補から、遠隔参加者の視点方向、ＦｏＶおよびカーソル位置（Ｄ１４）に基づいて、遠隔参加者によって注意されているオブジェクト（注意点）を認識する（Ｓ１６）。なお、カーソル位置は、遠隔参加者によって注意されているオブジェクトの認識精度をより高めるために加味される。そのため、カーソル位置が存在しない場合などには、カーソル位置は加味されなくてもよい。 The control unit 340 recognizes the object (point of attention) that the remote participant is paying attention to from among the object candidates that the remote participant is paying attention to, based on the viewpoint direction, FoV, and cursor position (D14) of the remote participant (S16). Note that the cursor position is taken into consideration in order to further improve the accuracy of recognizing the object that the remote participant is paying attention to. Therefore, in cases where there is no cursor position, the cursor position does not need to be taken into consideration.

　Ｓ１２における指摘情報の認識のときに、遠隔参加者の音声から得られた音声認識結果に基づいて、オブジェクトの種類が認識される場合も想定され得る。かかる場合には、制御部３４０は、遠隔参加者によって注意されているオブジェクトを、指摘情報とともに認識された種類のオブジェクトからのみ認識してもよい。 When the indication information is recognized in S12, the type of object may also be recognized based on the voice recognition result obtained from the remote participant's voice. In that case, the control unit 340 may recognize the object that the remote participant is paying attention to only from among objects of the type recognized together with the indication information.
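The selection in S16, together with the optional restriction to a recognized object type, might look like the following sketch. It assumes that each candidate and the cursor have already been converted to a (yaw, pitch) direction in the 360-degree video, keeps only candidates within half the FoV of the viewpoint direction, and then picks the one closest to the cursor (or to the viewpoint direction when there is no cursor). The selection rule, the class `Candidate`, and the function names are assumptions for illustration; the disclosure does not fix a particular rule.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Direction = Tuple[float, float]   # (yaw_deg, pitch_deg) in the 360-degree video


@dataclass
class Candidate:
    kind: str             # object type
    direction: Direction  # where the object appears in the 360-degree video


def angle_between(a: Direction, b: Direction) -> float:
    """Great-circle angle in degrees between two (yaw, pitch) directions."""
    ya, pa, yb, pb = (math.radians(v) for v in (*a, *b))
    cos_angle = (math.sin(pa) * math.sin(pb)
                 + math.cos(pa) * math.cos(pb) * math.cos(ya - yb))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def select_attended(candidates: List[Candidate], gaze: Direction, fov_deg: float,
                    cursor: Optional[Direction] = None,
                    kind: Optional[str] = None) -> Optional[Candidate]:
    """Pick the candidate the remote participant is most likely paying attention to."""
    # Optional restriction to the object type recognized with the indication information.
    pool = [c for c in candidates if kind is None or c.kind == kind]
    # Keep only candidates inside the field of view around the viewpoint direction.
    visible = [c for c in pool if angle_between(c.direction, gaze) <= fov_deg / 2]
    if not visible:
        return None
    # The cursor, when present, refines the choice; otherwise use the viewpoint direction.
    target = cursor if cursor is not None else gaze
    return min(visible, key=lambda c: angle_between(c.direction, target))
```

Returning `None` when no candidate is in view leaves the decision of whether to create a tag to the caller.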

　制御部３４０は、遠隔参加者によって注意されているオブジェクトを認識したという条件が満たされた場合に、既にタグＴ１１に組み込まれたメタ情報である、時刻情報（Ｄ１７）、および、現場作業者の位置情報（Ｄ１６）に、新たにタグＴ１１に組み込まれるメタ情報である、指摘者が遠隔参加者であることを示す情報、遠隔参加者によって注意されているオブジェクトの種類、遠隔参加者による指摘情報、および、現場作業者による返答情報を関連付けてタグＴ１１を生成し、生成したタグＴ１１を記憶部３５０に記録する。なお、図示はされていないが、遠隔参加者の視点方向、ＦｏＶおよびカーソル位置などのメタ情報も既にタグＴ１１に組み込まれている。 When the condition that the object the remote participant is paying attention to has been recognized is satisfied, the control unit 340 generates tag T11 and records it in the storage unit 350. To generate the tag, the control unit 340 associates the meta-information already incorporated in tag T11, namely the time information (D17) and the position information of the on-site worker (D16), with the meta-information newly incorporated in tag T11, namely information indicating that the person making the indication is the remote participant, the type of the object the remote participant is paying attention to, the indication information from the remote participant, and the response information from the on-site worker. Although not shown, meta-information such as the remote participant's viewpoint direction, FoV, and cursor position has also already been incorporated in tag T11.

　なお、遠隔参加者によって注意されているオブジェクトを認識したという条件は、遠隔参加者に関する関連付け開始条件の例に該当する。遠隔参加者に関する関連付け開始条件は、制御部３４０が、既にタグに組み込まれているメタ情報に、新たなメタ情報を関連付けるための条件であり、制御部３４０は、遠隔参加者に関する関連付け開始条件が満たされたことに基づいて、既にタグに組み込まれているメタ情報に、新たなメタ情報を関連付ける。ただし、後にも説明するように、遠隔参加者に関する関連付け開始条件は、遠隔参加者によって注意されているオブジェクトを認識したという条件以外の条件であってもよい。 Note that the condition of recognizing an object that is being called out by a remote participant corresponds to an example of an association start condition for a remote participant. The association start condition for a remote participant is a condition for the control unit 340 to associate new meta information with meta information already incorporated in a tag, and the control unit 340 associates the new meta information with the meta information already incorporated in a tag based on the association start condition for a remote participant being satisfied. However, as will be explained later, the association start condition for a remote participant may be a condition other than the condition of recognizing an object that is being called out by a remote participant.

　図７に示された例では、タグＴ１１に、時刻情報として「〇時〇分」が組み込まれ、位置情報として「位置□」が組み込まれ、オブジェクトの種類として「オブジェクトＡ」が組み込まれ、指摘情報として「△△（是正要請などの注意）」が組み込まれ、返答情報として「対応状況●●」が組み込まれている。なお、図示されていないが、タグＴ１１には、遠隔参加者によって注意されているオブジェクトの位置情報が追加的に組み込まれてもよい。 In the example shown in FIG. 7, tag T11 incorporates "xx:xx" as time information, "position □" as location information, "object A" as the type of object, "△△ (attention such as a request for correction)" as pointed out information, and "response status ●●" as response information. Although not shown, tag T11 may also incorporate additional location information of the object being pointed out by the remote participant.
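The association step can be sketched as a function that adds the new meta-information to the existing tag only when the association start condition holds, here modeled as "an attended object was recognized". The dictionary keys, the function name `complete_tag`, and the concrete sample values are hypothetical stand-ins for the placeholders shown in FIG. 7.

```python
from typing import Dict, Optional


def complete_tag(base: Dict[str, object],
                 attended_object: Optional[str],
                 indication: str,
                 response: str,
                 pointer: str = "remote participant") -> Optional[Dict[str, object]]:
    """Associate new meta-information with the tag once an attended object is recognized."""
    if attended_object is None:   # association start condition not satisfied
        return None
    tag = dict(base)              # keep the already incorporated meta-information
    tag.update({
        "pointer": pointer,               # who made the indication
        "object_type": attended_object,   # type of the attended object
        "indication": indication,         # indication information
        "response": response,             # response information
    })
    return tag


# Hypothetical values standing in for the placeholders of FIG. 7.
base_tag = {"time": "10:30", "worker_position": "position 1"}
tag_t11 = complete_tag(base_tag, "object A", "request for correction", "response status")
```

Copying the base tag before updating it keeps the original meta-information unchanged if the caller still needs it.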

　（現場作業者が指摘を行う場合）
　図８は、現場作業者が指摘を行う場合に行われるマルチモーダル処理の詳細について説明するための図である。 (When a worker on-site makes a comment)
FIG. 8 is a diagram for explaining details of the multimodal processing performed when a site worker issues an indication.

　図８に示されるように、サーバ３０において、制御部３４０は、現場作業者の音声（Ｄ２３）に基づいて音声認識（Ｓ２２）を行って音声認識結果を得る。音声認識結果は、テキストデータとして得られる。制御部３４０は、現場作業者の音声から得られた音声認識結果に基づいて、指摘を示す指摘情報を認識する（Ｓ２４）。 As shown in FIG. 8, in the server 30, the control unit 340 performs voice recognition (S22) based on the voice of the on-site worker (D23) to obtain a voice recognition result. The voice recognition result is obtained as text data. The control unit 340 recognizes indication information indicating an indication based on the voice recognition result obtained from the voice of the on-site worker (S24).

　一方、制御部３４０は、遠隔参加者の音声（Ｄ２４）に基づいて音声認識（Ｓ２５）を行って音声認識結果を得る。音声認識結果は、テキストデータとして得られる。制御部３４０は、遠隔参加者の音声から得られた音声認識結果に基づいて、指摘に対する返答を示す返答情報を認識する（Ｓ２６）。 Meanwhile, the control unit 340 performs voice recognition (S25) based on the voice of the remote participant (D24) to obtain a voice recognition result. The voice recognition result is obtained as text data. The control unit 340 recognizes response information indicating a response to the indication based on the voice recognition result obtained from the voice of the remote participant (S26).

　なお、返答情報の認識には、遠隔参加者の視点方向、ＦｏＶおよびカーソル位置（Ｄ２５）が加味されてもよい。なお、カーソル位置は、遠隔参加者による返答情報の認識精度をより高めるために加味される。そのため、カーソル位置が存在しない場合などには、カーソル位置は加味されなくてもよい。 Note that the viewpoint direction, FoV, and cursor position (D25) of the remote participant may be taken into account when recognizing the response information. The cursor position is taken into account to further increase the accuracy of recognizing the response information from the remote participant. Therefore, when no cursor position exists, for example, the cursor position does not need to be taken into account.

　制御部３４０は、３６０度映像（Ｄ２１）に基づいて作業現場に存在するオブジェクトを認識することによりオブジェクトに関する情報を得る（Ｓ２１）。オブジェクトに関する情報には、オブジェクトの種類と３６０度映像におけるオブジェクトの位置情報とが含まれ得る。なお、３６０度映像の代わりに、現場作業者の音声から得られた音声認識結果に基づいて、オブジェクトが認識されてもよい。また、制御部３４０は、オブジェクトの認識に、現場環境音（Ｄ２２）を加味してもよい。 The control unit 340 obtains information about the objects by recognizing the objects present at the work site based on the 360-degree video (D21) (S21). The information about the object may include the type of object and the position information of the object in the 360-degree video. Note that instead of the 360-degree video, the object may be recognized based on the voice recognition results obtained from the voice of the on-site worker. The control unit 340 may also take into account the site environmental sounds (D22) when recognizing the object.

　このようにして制御部３４０によって認識されたオブジェクトが現場作業者によって注意されているオブジェクトであるとみなされてもよい。しかし、ここでは、現場作業者によって注意されているオブジェクトの認識精度を高めるため、制御部３４０は、認識したオブジェクトを現場作業者によって注意されているオブジェクト候補とする。ここでは、オブジェクト候補が１つ存在する場合を主に想定する。しかし、オブジェクト候補は、複数であってもよい。 The object recognized by the control unit 340 in this manner may be regarded as the object that the on-site worker is paying attention to. Here, however, in order to increase the accuracy of recognizing the object that the on-site worker is paying attention to, the control unit 340 treats the recognized object as a candidate for that object. It is mainly assumed here that there is one object candidate, but there may be multiple object candidates.

　制御部３４０は、現場作業者によって注意されているオブジェクト候補から、現場作業者によって指示されている位置に基づいて、現場作業者によって注意されているオブジェクト（注意点）を認識する（Ｓ２３）。なお、現場作業者によって指示されている位置の例については、後に詳細に説明する。 The control unit 340 recognizes the object (point of attention) that the on-site worker is paying attention to from among the candidate objects that the on-site worker is paying attention to, based on the position indicated by the on-site worker (S23). Note that examples of the position indicated by the on-site worker will be described in detail later.

　Ｓ２４における指摘情報の認識のときに、現場作業者の音声から得られた音声認識結果に基づいて、オブジェクトの種類が認識される場合も想定され得る。かかる場合には、制御部３４０は、現場作業者によって注意されているオブジェクトを、指摘情報とともに認識された種類のオブジェクトからのみ認識してもよい。 When the indication information is recognized in S24, the type of object may also be recognized based on the voice recognition result obtained from the on-site worker's voice. In that case, the control unit 340 may recognize the object that the on-site worker is paying attention to only from among objects of the type recognized together with the indication information.

　制御部３４０は、現場作業者によって注意されているオブジェクトを認識したという条件が満たされた場合に、既にタグＴ２１に組み込まれたメタ情報である、時刻情報（Ｄ２７）、および、現場作業者の位置情報（Ｄ２６）に、新たにタグＴ２１に組み込まれるメタ情報である、指摘者が現場作業者であることを示す情報、現場作業者によって注意されているオブジェクトの種類、現場作業者による指摘情報、および、遠隔参加者による返答情報を関連付けてタグＴ２１を生成し、生成したタグＴ２１を記憶部３５０に記録する。なお、図示はされていないが、遠隔参加者の視点方向、ＦｏＶおよびカーソル位置などのメタ情報も既にタグＴ２１に組み込まれている。 When the condition that the object the on-site worker is paying attention to has been recognized is satisfied, the control unit 340 generates tag T21 and records it in the storage unit 350. To generate the tag, the control unit 340 associates the meta-information already incorporated in tag T21, namely the time information (D27) and the position information of the on-site worker (D26), with the meta-information newly incorporated in tag T21, namely information indicating that the person making the indication is the on-site worker, the type of the object the on-site worker is paying attention to, the indication information from the on-site worker, and the response information from the remote participant. Although not shown, meta-information such as the remote participant's viewpoint direction, FoV, and cursor position has also already been incorporated in tag T21.

　なお、現場作業者によって注意されているオブジェクトを認識したという条件は、現場作業者に関する関連付け開始条件の例に該当する。現場作業者に関する関連付け開始条件は、制御部３４０が、既にタグに組み込まれているメタ情報に、新たなメタ情報を関連付けるための条件であり、制御部３４０は、現場作業者に関する関連付け開始条件が満たされたことに基づいて、既にタグに組み込まれているメタ情報に、新たなメタ情報を関連付ける。ただし、後にも説明するように、現場作業者に関する関連付け開始条件は、現場作業者によって注意されているオブジェクトを認識したという条件以外の条件であってもよい。 The condition of recognizing an object that is being called out by a field worker corresponds to an example of an association start condition related to a field worker. The association start condition related to a field worker is a condition for the control unit 340 to associate new meta information with meta information already incorporated in a tag, and the control unit 340 associates the new meta information with the meta information already incorporated in a tag based on the fact that the association start condition related to a field worker is satisfied. However, as will be explained later, the association start condition related to a field worker may be a condition other than the condition of recognizing an object that is being called out by a field worker.

　図８に示された例では、タグＴ２１に、時刻情報として「〇時〇分」が組み込まれ、位置情報として「位置□」が組み込まれ、オブジェクトの種類として「オブジェクトＡ」が組み込まれ、指摘情報として「●●」が組み込まれ、返答情報として「△△」が組み込まれている。なお、図示されていないが、タグＴ２１には、現場作業者によって注意されているオブジェクトの位置情報が追加的に組み込まれてもよい。 In the example shown in FIG. 8, tag T21 incorporates "xx:xx" as time information, "position □" as location information, "object A" as object type, "●●" as pointed out information, and "△△" as response information. Although not shown, tag T21 may also incorporate additional location information of an object that has been pointed out by a site worker.

　（音声認識） (Voice Recognition)
　続いて、図９を参照しながら、制御部３４０によって行われる音声認識の例について説明する。 Next, an example of voice recognition performed by the control unit 340 will be described with reference to FIG. 9.

　図９は、制御部３４０によって行われる音声認識の例について説明するための図である。図９を参照すると、音声認識モデルＭ１１が示されている。音声認識モデルＭ１１は、機械学習によりあらかじめ生成された学習済みモデルである。例えば、制御部３４０は、現場作業者または遠隔参加者から入力された音声を、音声認識モデルＭ１１に入力することに基づいて、音声認識モデルＭ１１から出力される文字を音声認識結果として得ればよい。 FIG. 9 is a diagram for explaining an example of voice recognition performed by the control unit 340. Referring to FIG. 9, a voice recognition model M11 is shown. The voice recognition model M11 is a trained model that has been generated in advance by machine learning. For example, the control unit 340 may input voice input from an on-site worker or a remote participant into the voice recognition model M11, and obtain characters output from the voice recognition model M11 as the voice recognition result.

　図９に示された例では、音声認識モデルＭ１１に入力される「音声」が、遠隔参加者から入力された音声である。また、音声認識モデルＭ１１から出力される文字が「近くで見せて」という文字である。なお、文字はテキストデータに該当し得る。 In the example shown in FIG. 9, the "voice" input to the voice recognition model M11 is the voice input from a remote participant. Furthermore, the characters output from the voice recognition model M11 are the characters "let me see it up close." Note that the characters may correspond to text data.

　（指摘情報の認識） (Recognition of pointed out information)
　続いて、図１０を参照しながら、制御部３４０によって行われる指摘情報の認識の例について説明する。 Next, an example of recognition of pointed out information performed by the control unit 340 will be described with reference to FIG. 10.

　図１０は、制御部３４０によって行われる指摘情報の認識の例について説明するための図である。図１０を参照すると、指摘情報認識モデルＭ２１が示されている。指摘情報認識モデルＭ２１は、機械学習によりあらかじめ生成された学習済みモデルである。 FIG. 10 is a diagram for explaining an example of the recognition of pointed out information performed by the control unit 340. Referring to FIG. 10, a pointed out information recognition model M21 is shown. The pointed out information recognition model M21 is a trained model that has been generated in advance by machine learning.

　例えば、制御部３４０は、文字の集合によって構成される文章を、指摘情報認識モデルＭ２１に入力することに基づいて、指摘情報認識モデルＭ２１から出力される指摘情報を得ればよい。このとき、制御部３４０は、文字の集合によって構成される文章を、指摘情報認識モデルＭ２１に入力することに基づいて、指摘情報認識モデルＭ２１からオブジェクトに関する情報が追加的に出力される場合には、指摘情報に加えてオブジェクトに関する情報を得てもよい。 For example, the control unit 340 may obtain the indication information output from the indication information recognition model M21 based on inputting a sentence composed of a set of characters to the indication information recognition model M21. In this case, if information about an object is additionally output from the indication information recognition model M21 based on inputting a sentence composed of a set of characters to the indication information recognition model M21, the control unit 340 may obtain information about the object in addition to the indication information.

　より詳細に、指摘情報認識モデルＭ２１は、続いている複数の文章から注目する文章を抜き出す処理（以下、「文章抽出処理」とも言う。）を行い、抜き出した文章が、何に対して何を言う文章であるのかを分類する処理（以下、「文章分類処理」とも言う。）を行う。 More specifically, the pointed-out information recognition model M21 performs a process of extracting a sentence of interest from a series of consecutive sentences (hereinafter also referred to as the "sentence extraction process"), and then performs a process of classifying what the extracted sentence says and about what (hereinafter also referred to as the "sentence classification process").

　例えば、文章抽出処理は、所定の時間幅の窓を設けて、窓の中での注意レベルを数値化し、数値化した注意レベルが閾値以上の文章を抜き出す処理であってもよい。注意レベルの数値化としては、ＲＮＮ（Ｒｅｃｕｒｒｅｎｔ　Ｎｅｕｒａｌ　Ｎｅｔｗｏｒｋ）などといった、再帰的な関数を用いた数値化が採用されてもよい。 For example, the sentence extraction process may be a process of setting a window of a predetermined time width, quantifying the attention level within the window, and extracting sentences whose quantified attention level is equal to or exceeds a threshold value. The attention level may be quantified using a recursive function such as a recurrent neural network (RNN).

　あるいは、文章抽出処理は、アテンション機構を採用した文章の抽出であってもよい。また、文章抽出処理は、作業が属する分野に特化した文章をあらかじめ用意しておき、あらかじめ用意した文章を用いてルールベースで文章を抽出する処理であってもよい。 Alternatively, the sentence extraction process may be a process of extracting sentences using an attention mechanism. Furthermore, the sentence extraction process may be a process in which sentences specialized in the field to which the work belongs are prepared in advance, and sentences are extracted on a rule-based basis using the prepared sentences.
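
The windowed variant of the sentence extraction process can be sketched as follows. This is a minimal sketch under stated assumptions: a simple recursive update `level = decay * level + score` stands in for the RNN mentioned above, and the keyword list, the function names `keyword_score` and `extract_sentences`, and all parameter values are hypothetical.

```python
from typing import Callable, List, Tuple

Sentence = Tuple[float, str]  # (timestamp in seconds, recognized text)


def keyword_score(text: str) -> float:
    """Hypothetical per-sentence score: number of attention keywords found."""
    keywords = ("dangerous", "put it back", "fix")
    return float(sum(k in text.lower() for k in keywords))


def extract_sentences(
    sentences: List[Sentence],
    window_sec: float = 10.0,
    decay: float = 0.5,
    threshold: float = 1.0,
    score: Callable[[str], float] = keyword_score,
) -> List[str]:
    """Return sentences whose windowed attention level is at or above the threshold."""
    extracted = []
    for i, (t, text) in enumerate(sentences):
        level = 0.0
        # Recursive update over the sentences inside the window ending at time t.
        for u, other in sentences[: i + 1]:
            if t - u <= window_sec:
                level = decay * level + score(other)
        if level >= threshold:
            extracted.append(text)
    return extracted
```

Because the level is carried forward recursively, a sentence that follows a high-scoring sentence within the window can also be extracted. Whether that is desirable depends on the chosen `decay` and `threshold`.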

　文章分類処理は、係り受け解析のような方法を用いて、主語および述語を抽出し、抽出した主語および述語を用いて分類を行う処理であってもよい。あるいは、文章分類処理は、アテンション機構などによる機械学習を活用して分類を行う処理であってもよい。あるいは、文章分類処理は、指摘情報ごとにラベルを設け、文章がどのラベルに分類されるかを判断することにより指摘情報を決めてもよい。 The sentence classification process may be a process that uses a method such as dependency analysis to extract subjects and predicates, and then performs classification using the extracted subjects and predicates. Alternatively, the sentence classification process may be a process that uses machine learning, such as an attention mechanism, to perform classification. Alternatively, the sentence classification process may determine the pointed-out information by setting a label for each piece of pointed-out information and judging which label the sentence is classified into.

　なお、図１０に示された例では、指摘情報認識モデルＭ２１に入力される「文章」が、遠隔参加者から入力された音声に基づいて認識された「右の方に進んでください。ここの袋は危険なので、後で戻しておくように」という文章である。また、指摘情報認識モデルＭ２１から出力される分類結果が「オブジェクトの種類：袋、指摘情報：危険なので戻しておくこと」という分類結果である。 In the example shown in FIG. 10, the "sentence" input to the pointed-out information recognition model M21 is the sentence "Please go to the right. This bag is dangerous, so put it back later" that is recognized based on the voice input from the remote participant. The classification result output from the pointed-out information recognition model M21 is "Object type: bag, pointed-out information: dangerous, put it back."
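
The label-based variant of the sentence classification process can be illustrated with a small rule-based sketch. It is not the trained model M21: the vocabulary `OBJECT_TYPES`, the label table `INDICATION_LABELS`, and the function `classify` are assumptions made only to show the shape of the input and output.

```python
from typing import Dict, Optional

# Hypothetical vocabulary of object types and indication labels with keywords.
OBJECT_TYPES = ("bag", "box")
INDICATION_LABELS = {
    "dangerous, put it back": ("dangerous", "put it back"),
    "show it up close": ("up close", "closer"),
}


def classify(sentence: str) -> Dict[str, Optional[str]]:
    """Return the object type and the best-matching indication label, if any."""
    text = sentence.lower()
    object_type = next((o for o in OBJECT_TYPES if o in text), None)
    best_label, best_hits = None, 0
    for label, keywords in INDICATION_LABELS.items():
        hits = sum(k in text for k in keywords)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return {"object_type": object_type, "indication": best_label}
```

For the second sentence of the example in FIG. 10, this sketch yields the object type "bag" and the label "dangerous, put it back". The first sentence yields no object type and no label.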

　（遠隔参加者によって注意されているオブジェクトの認識） (Recognition of objects attended to by remote participants)
　続いて、図１１を参照しながら、遠隔参加者によって注意されているオブジェクトの認識の例について説明する。 Next, an example of recognition of an object that is being attended to by a remote participant will be described with reference to FIG. 11.

　図１１は、遠隔参加者によって注意されているオブジェクトの認識の例について説明するための図である。図１１を参照すると、オブジェクト認識モデルＭ３１が示されている。オブジェクト認識モデルＭ３１は、機械学習によりあらかじめ生成された学習済みモデルである。 FIG. 11 is a diagram for explaining an example of recognition of an object that is being noted by a remote participant. Referring to FIG. 11, an object recognition model M31 is shown. The object recognition model M31 is a trained model that has been generated in advance by machine learning.

　３６０度映像Ｇ１１は、作業現場が撮像されて得られた映像であり、現場作業者デバイス１０から送信され、サーバ３０によって受信される。３６０度映像Ｇ１１には、オブジェクトＢ１１およびオブジェクトＢ１２が写っている。オブジェクトＢ１１およびオブジェクトＢ１２それぞれは、袋である。また、３６０度映像Ｇ１１には、オブジェクトＢ１３およびオブジェクトＢ１４が写っている。オブジェクトＢ１３およびオブジェクトＢ１４それぞれは、現場作業者である。 The 360-degree video G11 is a video obtained by capturing an image of the work site, and is transmitted from the on-site worker device 10 and received by the server 30. Objects B11 and B12 are captured in the 360-degree video G11. Each of objects B11 and B12 is a bag. Objects B13 and B14 are also captured in the 360-degree video G11. Each of objects B13 and B14 is an on-site worker.

　表示映像Ｈ１１は、３６０度映像Ｇ１１のうち、指摘を行った遠隔参加者の視点方向とＦｏＶとに応じた範囲の映像である。表示映像Ｈ１１には、オブジェクトＢ１１およびオブジェクトＢ１２が写っており、オブジェクトＢ１３およびオブジェクトＢ１４は、写っていない。例えば、制御部３４０は、３６０度映像Ｇ１１と、指摘を行った遠隔参加者の視点方向と、ＦｏＶと、カーソル位置とを、オブジェクト認識モデルＭ３１に入力し、オブジェクト認識モデルＭ３１から出力されるオブジェクトに関する情報を得てもよい。 Displayed image H11 is an image of the 360-degree image G11 in a range that corresponds to the viewpoint direction and FoV of the remote participant who made the point. Objects B11 and B12 are shown in displayed image H11, but objects B13 and B14 are not. For example, control unit 340 may input 360-degree image G11, the viewpoint direction of the remote participant who made the point, FoV, and cursor position to object recognition model M31, and obtain information about the object that is output from object recognition model M31.

　図１１に示された例では、オブジェクトに関する情報として、「オブジェクトの種類：袋、オブジェクトの位置情報：□」が得られている。 In the example shown in Figure 11, the information obtained about the object is "Object type: bag, Object position information: □".
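
The relationship between the 360-degree video, the viewpoint direction, the FoV, and the cursor position can be illustrated geometrically. The sketch below is a simplified stand-in for the trained object recognition model M31, not a description of it. It assumes that each object is already detected and reduced to a single horizontal angle (yaw) in the 360-degree video; the function names and the yaw values are hypothetical.

```python
from typing import Dict, List, Optional


def angular_offset(a_deg: float, b_deg: float) -> float:
    """Signed smallest difference a - b in degrees, in the range [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def visible_objects(
    object_yaw: Dict[str, float], view_yaw_deg: float, fov_deg: float
) -> List[str]:
    """Objects whose yaw lies inside the displayed range (viewpoint +/- FoV/2)."""
    half = fov_deg / 2.0
    return [
        name
        for name, yaw in object_yaw.items()
        if abs(angular_offset(yaw, view_yaw_deg)) <= half
    ]


def attended_object(
    object_yaw: Dict[str, float],
    view_yaw_deg: float,
    fov_deg: float,
    cursor_yaw_deg: float,
) -> Optional[str]:
    """Among the visible objects, pick the one closest to the cursor direction."""
    visible = visible_objects(object_yaw, view_yaw_deg, fov_deg)
    if not visible:
        return None
    return min(visible, key=lambda n: abs(angular_offset(object_yaw[n], cursor_yaw_deg)))
```

With the assumed yaw values B11 = 350, B12 = 20, B13 = 100, and B14 = 120 degrees, a viewpoint direction of 0 degrees and an FoV of 90 degrees leave only B11 and B12 in the displayed range, which mirrors the displayed image H11.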

　なお、上記したように、３６０度映像の代わりに、遠隔参加者の音声から得られた音声認識結果に基づいて、オブジェクトが認識されてもよい。また、制御部３４０は、オブジェクトの認識に、現場環境音を加味してもよい。あるいは、現場環境音の代わりに、現場環境音の特徴（例えば、どちらの方向から現場作業者デバイス１０に現場環境音が到来しているかを示す情報など）が加味されてもよい。 As described above, instead of the 360-degree video, objects may be recognized based on voice recognition results obtained from the voices of remote participants. The control unit 340 may also take into account on-site environmental sounds when recognizing objects. Alternatively, instead of on-site environmental sounds, characteristics of the on-site environmental sounds (such as information indicating the direction from which the on-site environmental sounds are coming to the on-site worker device 10) may be taken into account.

　また、上記したように、遠隔参加者の音声から得られた音声認識結果に基づいて、オブジェクトの種類が認識される場合も想定され得る。かかる場合には、制御部３４０は、遠隔参加者によって注意されているオブジェクトを、指摘情報とともに認識された種類のオブジェクトからのみ認識してもよい。 Furthermore, as described above, it is conceivable that the type of object may be recognized based on the voice recognition results obtained from the voice of the remote participant. In such a case, the control unit 340 may recognize the object that is being attended to by the remote participant only from among objects of the type recognized together with the pointed-out information.

　（現場作業者によって注意されているオブジェクトの認識） (Recognition of objects attended to by field workers)
　続いて、図１２を参照しながら、現場作業者によって注意されているオブジェクトの認識の例について説明する。 Next, an example of recognition of an object that is being attended to by a field worker will be described with reference to FIG. 12.

　図１２は、現場作業者によって注意されているオブジェクトの認識の例について説明するための図である。図１２を参照すると、３６０度映像Ｇ２１が示されている。３６０度映像Ｇ２１には、オブジェクトＢ２１およびオブジェクトＢ２２が写っている。オブジェクトＢ２１およびオブジェクトＢ２２それぞれは、袋である。また、３６０度映像Ｇ２１には、オブジェクトＢ２３が写っている。オブジェクトＢ２３は、箱である。 FIG. 12 is a diagram for explaining an example of recognizing an object that is being noted by a site worker. Referring to FIG. 12, a 360-degree video G21 is shown. Objects B21 and B22 are shown in the 360-degree video G21. Objects B21 and B22 are each a bag. Furthermore, object B23 is shown in the 360-degree video G21. Object B23 is a box.

　オブジェクトＢ２１～Ｂ２３それぞれは、現場作業者によって注意されているオブジェクト候補であるとする。ここで、現場作業者が指摘したいオブジェクトは、オブジェクトＢ２３である場合を想定する。このとき、現場作業者は、オブジェクトＢ２３の位置を手Ｆ１１の位置によって指示する。 Each of the objects B21 to B23 is assumed to be a candidate for the object to which the on-site worker is drawing attention. Here, it is assumed that the object that the on-site worker wishes to point out is object B23. In this case, the on-site worker indicates the position of object B23 by the position of his/her hand F11.

　そこで、制御部３４０は、現場作業者によって注意されているオブジェクト候補から、現場作業者によって指示されている位置に基づいて、現場作業者によって注意されているオブジェクト（注意点）を認識してもよい。図１２に示された例では、現場作業者によって注意されているオブジェクト（注意点）として、オブジェクトＢ２３が認識される。 The control unit 340 may then recognize the object (point of attention) that the on-site worker is drawing attention to from among the candidate objects that the on-site worker is drawing attention to, based on the position indicated by the on-site worker. In the example shown in FIG. 12, object B23 is recognized as the object (point of attention) that the on-site worker is drawing attention to.
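
One simple way to realize the selection described above is to choose the candidate whose position is closest to the indicated position. The sketch below assumes that each candidate and the hand F11 have already been located as 2D image coordinates; the function name `select_pointed_object` and the coordinate values are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def select_pointed_object(candidates: Dict[str, Point], hand_xy: Point) -> str:
    """Return the candidate object nearest to the position indicated by the hand."""
    if not candidates:
        raise ValueError("no candidate objects")
    return min(candidates, key=lambda name: math.dist(candidates[name], hand_xy))
```

For example, with candidates B21 at (100, 200), B22 at (300, 220), and B23 at (520, 240), and the hand at (500, 260), the nearest candidate is B23.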

　（返答情報の認識） (Recognition of response information)
　続いて、図１３を参照しながら、制御部３４０によって行われる返答情報の認識の例について説明する。 Next, an example of response information recognition performed by the control unit 340 will be described with reference to FIG. 13.

　図１３は、制御部３４０によって行われる返答情報の認識の例について説明するための図である。図１３を参照すると、返答情報認識モデルＭ４１が示されている。返答情報認識モデルＭ４１は、機械学習によりあらかじめ生成された学習済みモデルである。 FIG. 13 is a diagram for explaining an example of response information recognition performed by the control unit 340. Referring to FIG. 13, a response information recognition model M41 is shown. The response information recognition model M41 is a trained model that has been generated in advance by machine learning.

　例えば、制御部３４０は、文字の集合によって構成される文章を、返答情報認識モデルＭ４１に入力することに基づいて、返答情報認識モデルＭ４１から出力される返答情報を得ればよい。 For example, the control unit 340 may obtain response information output from the response information recognition model M41 based on inputting a sentence composed of a set of characters into the response information recognition model M41.

　より詳細に、返答情報認識モデルＭ４１は、指摘情報認識モデルＭ２１（図１０）と同様に、文章抽出処理および文章分類処理を行う。 More specifically, the response information recognition model M41 performs sentence extraction processing and sentence classification processing, similar to the pointed-out information recognition model M21 (Figure 10).

　なお、図１３に示された例では、遠隔参加者による指摘情報が、遠隔参加者から入力された音声に基づいて認識された「右の方に進んでください。ここの袋は危険なので、後で戻しておくように」という文章である。 In the example shown in FIG. 13, the instruction information from the remote participant is the sentence "Please go to the right. The bag here is dangerous, so please put it back later" that is recognized based on the voice input from the remote participant.

　また、返答情報認識モデルＭ４１に入力される文章が、「かしこまりました。後ほど対応します。」である場合に、返答情報認識モデルＭ４１から出力される分類結果は「要是正」という分類結果である。また、返答情報認識モデルＭ４１に入力される文章が、「今戻しました。」である場合に、返答情報認識モデルＭ４１から出力される分類結果は「対応済み」という分類結果である。 Also, if the sentence input to the response information recognition model M41 is "Understood. I will deal with it later," the classification result output from the response information recognition model M41 is "Needs correction." Also, if the sentence input to the response information recognition model M41 is "I have just put it back," the classification result output from the response information recognition model M41 is "Resolved."

　（タグ内のメタ情報） (Meta information in tags)
　続いて、図１４を参照しながら、制御部３４０によって記憶部３５０に記録されるタグ内のメタ情報の例について説明する。 Next, an example of meta information in a tag recorded in the storage unit 350 by the control unit 340 will be described with reference to FIG. 14.

　図１４は、制御部３４０によって記憶部３５０に記録されるタグ内のメタ情報の例を示す図である。図１４を参照すると、制御部３４０によって記憶部３５０に記録されるタグの例としての、タグＴ３１が示されている。 FIG. 14 is a diagram showing an example of meta information in a tag recorded in storage unit 350 by control unit 340. Referring to FIG. 14, tag T31 is shown as an example of a tag recorded in storage unit 350 by control unit 340.

　タグＴ３１に、時刻情報として「〇時〇分」が組み込まれ、現場作業者の位置情報として「位置□」が組み込まれ、遠隔参加者によって注意されているオブジェクトの種類として「袋」が組み込まれ、指摘情報として「危険なので戻しておくようにとの指摘」が組み込まれ、返答情報として「対応状況：未対応」が組み込まれている。また、タグＴ３１には、その他のメタ情報も組み込まれる。 In tag T31, "xx:xx" is incorporated as time information, "position □" is incorporated as the position information of the on-site worker, "bag" is incorporated as the type of object warned about by the remote participant, "it's dangerous so it should be put back" is incorporated as the indication information, and "response status: not yet responded" is incorporated as the response information. Other meta information is also incorporated in tag T31.

　その他のメタ情報としては、指摘者を示す情報、３６０度映像におけるオブジェクトの位置情報、遠隔参加者の視点方向、ＦｏＶおよびカーソル位置などが挙げられる。図１４に示されたタグＴ３１が記憶部３５０に記録されれば、後にタグＴ３１を参照することにより、指摘されたオブジェクトがどの位置にあるのかを把握したり、時刻情報に基づいてオブジェクトに対する指摘があったときの遠隔音声および現場音声を再生したりすることが可能となる。 Other meta information includes information indicating the person who pointed out the object, position information of the object in the 360-degree video, the viewpoint direction of the remote participant, FoV, and cursor position. If tag T31 shown in FIG. 14 is recorded in storage unit 350, it will be possible to later refer to tag T31 to determine the location of the pointed out object and to play back remote audio and on-site audio at the time the object was pointed out based on time information.
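
How a recorded tag might later be used can be sketched as follows. The sketch assumes that the time information is stored as a timestamp and that the remote audio and on-site audio were recorded from a known start time; the class `RecordedTag`, the functions `unresolved_tags` and `playback_offset_seconds`, and the status strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class RecordedTag:
    time: datetime          # time information
    worker_position: str    # position information of the on-site worker
    object_type: str        # type of the object that was pointed out
    indication: str         # pointed-out information
    response_status: str    # e.g. "not yet handled" or "resolved" (assumed labels)


def unresolved_tags(tags: List[RecordedTag], object_type: str) -> List[RecordedTag]:
    """Tags for the given object type whose response status is not yet resolved."""
    return [
        t for t in tags
        if t.object_type == object_type and t.response_status != "resolved"
    ]


def playback_offset_seconds(recording_start: datetime, tag: RecordedTag) -> float:
    """Offset into the recorded audio at which the object was pointed out."""
    return max(0.0, (tag.time - recording_start).total_seconds())
```

The offset can then be passed to whatever audio player is used to start playback at the moment the object was pointed out.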

　（認識モデルの変形例） (Modifications of the recognition model)
　上記においては、音声認識モデルＭ１１と、指摘情報認識モデルＭ２１と、オブジェクト認識モデルＭ３１と、返答情報認識モデルＭ４１とが、別々に設けられる例について説明した。しかし、これらのモデルが有する機能と同等の機能を有する一つの認識モデルが採用されてもよい。 In the above, an example has been described in which the voice recognition model M11, the pointed-out information recognition model M21, the object recognition model M31, and the response information recognition model M41 are provided separately. However, a single recognition model having functions equivalent to those of these models may be adopted.

　図１５は、認識モデルの変形例について説明するための図である。図１５を参照すると、認識モデルＭ５１が示されている。例えば、制御部３４０は、遠隔参加者または現場作業者の音声と、３６０度映像とを、認識モデルＭ５１に入力し、認識モデルＭ５１から出力される、オブジェクトの種類と、オブジェクトの位置情報と、指摘情報と、返答情報とを得てもよい。 FIG. 15 is a diagram for explaining a modified example of the recognition model. Referring to FIG. 15, recognition model M51 is shown. For example, the control unit 340 may input the voice of a remote participant or an on-site worker and a 360-degree video to the recognition model M51, and obtain the type of object, object position information, indication information, and response information that are output from the recognition model M51.

　（関連付け開始条件の変形例） (Modification of Association Start Condition)
　上記においては、遠隔参加者に関する関連付け開始条件が、遠隔参加者によって注意されているオブジェクトを認識したという条件である場合を主に想定した。しかし、遠隔参加者に関する関連付け開始条件は、遠隔参加者によって注意されているオブジェクトを認識したという条件以外の条件であってもよい。例えば、遠隔参加者に関する関連付け開始条件は、遠隔参加者から所定の関連付け開始操作が入力されたという条件であってもよい。 In the above, it is mainly assumed that the association start condition for the remote participant is a condition that an object that is being attended to by the remote participant is recognized. However, the association start condition for the remote participant may be a condition other than the condition that an object that is being attended to by the remote participant is recognized. For example, the association start condition for the remote participant may be a condition that a predetermined association start operation is input from the remote participant.

　図１６は、関連付け開始操作の例について説明するための図である。図１６を参照すると、遠隔参加者デバイス２０－１が示されている。遠隔参加者デバイス２０－１が備えるディスプレイ２８０は、表示映像Ｗ１１をリアルタイムに表示している。また、ディスプレイ２８０は、関連付け開始ボタンＷ１２を表示している。例えば、遠隔参加者に関する関連付け開始操作は、遠隔参加者が関連付け開始ボタンＷ１２を押下する操作であってもよい。 FIG. 16 is a diagram for explaining an example of an association start operation. Referring to FIG. 16, a remote participant device 20-1 is shown. The display 280 of the remote participant device 20-1 displays a display image W11 in real time. The display 280 also displays an association start button W12. For example, the association start operation for a remote participant may be an operation in which the remote participant presses the association start button W12.

　（指摘情報の認識の変形例） (Modification of the recognition of pointed out information)
　上記においては、遠隔参加者による指摘情報が、遠隔参加者の音声から得られた音声認識結果に基づいて認識される場合を主に想定した。しかし、遠隔参加者による指摘情報は、操作検出部２３０に入力されてもよい。 In the above description, it is mainly assumed that the information pointed out by the remote participant is recognized based on the voice recognition result obtained from the voice of the remote participant. However, the information pointed out by the remote participant may be input to the operation detection unit 230.

　図１７は、指摘情報の認識の変形例について説明するための図である。図１７を参照すると、遠隔参加者デバイス２０に表示される指摘情報入力画面Ｗ２１が示されている。指摘情報入力画面Ｗ２１は、指摘情報入力欄Ｗ２１１と、対応状況入力欄Ｗ２１２とを含んでいる。 FIG. 17 is a diagram for explaining a modified example of the recognition of pointed out information. Referring to FIG. 17, a pointed out information input screen W21 displayed on the remote participant device 20 is shown. The pointed out information input screen W21 includes a pointed out information input field W211 and a response status input field W212.

　例えば、制御部３４０は、指摘情報入力欄Ｗ２１１に指摘情報を入力する操作が操作検出部２３０に対して行われたことに基づいて、指摘情報入力欄Ｗ２１１に入力された指摘情報を認識してもよい。さらに、制御部３４０は、対応状況入力欄Ｗ２１２に対応状況を入力する操作が操作検出部２３０に対して行われたことに基づいて、対応状況入力欄Ｗ２１２に入力された対応状況を認識してもよい。 For example, the control unit 340 may recognize the indication information input into the indication information input field W211 based on an operation performed on the operation detection unit 230 to input indication information into the indication information input field W211. Furthermore, the control unit 340 may recognize the response status input into the response status input field W212 based on an operation performed on the operation detection unit 230 to input a response status into the response status input field W212.

　なお、遠隔参加者の音声から得られた音声認識結果に基づく指摘情報の認識と、遠隔参加者によって入力された指摘情報の認識とが、併用されてもよい。例えば、遠隔参加者の音声から得られた音声認識結果に基づいて認識された指摘情報が、遠隔参加者によって入力された指摘情報によって修正されてもよい。 Note that the recognition of pointed out information based on the voice recognition results obtained from the voice of the remote participant and the recognition of pointed out information input by the remote participant may be used together. For example, pointed out information recognized based on the voice recognition results obtained from the voice of the remote participant may be corrected by pointed out information input by the remote participant.

　（話者の特定） (Speaker Identification)
　上記においては、現場作業者デバイス１０を使用する現場作業者が作業現場に存在する場合を主に想定した。しかし、実際には、現場作業者デバイス１０を使用する現場作業者の近くに、現場作業者デバイス１０を使用していない現場作業者も存在し得る。さらに、上記したように、遠隔参加者デバイス２０を使用する遠隔参加者も複数存在する場合が想定され得る。 In the above, it is mainly assumed that there is a field worker using the field worker device 10 at the work site. However, in reality, there may be a field worker who is not using the field worker device 10 near the field worker who uses the field worker device 10. Furthermore, as described above, it may be assumed that there are a plurality of remote participants using the remote participant device 20.

　このとき、一例として、指摘情報とオブジェクトに関する情報との対応付けが正常に行われない可能性がある。そのため、音声を発している話者が特定され、話者の特定結果がタグ付けに用いられるのが望ましい。図１８を参照しながら、話者の特定結果がタグ付けに用いられる変形例について説明する。 In this case, as an example, there is a possibility that the indication information and the information related to the object may not be properly associated. Therefore, it is desirable to identify the speaker who is making the sound, and use the result of identifying the speaker for tagging. With reference to FIG. 18, a modified example in which the result of identifying the speaker is used for tagging will be described.

　図１８は、話者の特定結果がタグ付けに用いられる変形例について説明するための図である。現場作業者Ｕ１２は、現場作業者デバイス１０を使用する現場作業者である。また、現場作業者Ｕ１１は、現場作業者Ｕ１２の近くに存在し、現場作業者デバイス１０を使用していない現場作業者である。現場作業者Ｕ１１と現場作業者Ｕ１２とは、別のオブジェクトを視認している可能性がある。しかし、現場作業者Ｕ１１による指摘情報と現場作業者Ｕ１２の視点方向に基づくオブジェクトに関する情報との対応付けが行われてしまう可能性がある。 FIG. 18 is a diagram for explaining a modified example in which the speaker identification result is used for tagging. On-site worker U12 is a field worker who uses the field worker device 10. On-site worker U11 is a field worker who is near the field worker U12 and is not using the field worker device 10. There is a possibility that the field worker U11 and the field worker U12 are visually recognizing different objects. However, there is a possibility that the pointed-out information by the field worker U11 will be associated with information about the object based on the viewpoint direction of the field worker U12.

　そこで、制御部３４０は、現場作業者Ｕ１２が話者であるか否かを特定するのがよい。ここでは、現場作業者Ｕ１２が話者であると特定された場合を想定する。かかる場合には、制御部３４０は、現場作業者Ｕ１２が話者であると特定されたことに基づいて、現場作業者Ｕ１２の音声から得られた音声認識結果に基づく指摘情報の認識を行うのがよい。 The control unit 340 should therefore determine whether or not the on-site worker U12 is the speaker. Here, it is assumed that the on-site worker U12 has been identified as the speaker. In such a case, the control unit 340 should recognize the indication information based on the voice recognition results obtained from the voice of the on-site worker U12, based on the fact that the on-site worker U12 has been identified as the speaker.

　これにより、現場作業者Ｕ１１による指摘情報と現場作業者Ｕ１２の視点方向に基づくオブジェクトに関する情報との対応付けが行われてしまう可能性が低減され得る。換言すると、作業現場に複数の現場作業者が存在する場合であっても、現場作業者Ｕ１２による指摘情報と現場作業者Ｕ１２の視点方向に基づくオブジェクトに関する情報とが、正常に対応付けられる可能性が高まる。 This can reduce the possibility of matching the information pointed out by on-site worker U11 with information about an object based on the viewpoint direction of on-site worker U12. In other words, even if there are multiple on-site workers at the work site, the possibility of correctly matching the information pointed out by on-site worker U12 with information about an object based on the viewpoint direction of on-site worker U12 increases.

　なお、現場作業者が話者であるか否かは、現場作業者が使用するマイクロフォン１２０による音の検出結果に基づいて特定されてよい。一例として、現場作業者が使用するマイクロフォン１２０によって検出された音量が閾値以上である場合に、その現場作業者が話者であると特定されてもよい。あるいは、現場作業者が話者であるか否かは、現場作業者が使用するマイクロフォン１２０による音の検出結果と機械学習モデルとに基づいて特定されてもよい。あるいは、マイクロフォン１２０によって検出された音の到来方向が推定可能な場合には、音の到来方向が特定の方向（例えば、現場作業者の口の位置からマイクロフォン１２０に向かう方向）であるか否かにより、現場作業者が話者であるか否かが判定されてもよい。 Whether or not a field worker is a speaker may be determined based on the sound detection results by the microphone 120 used by the field worker. As an example, if the volume detected by the microphone 120 used by the field worker is equal to or greater than a threshold, the field worker may be determined to be a speaker. Alternatively, whether or not a field worker is a speaker may be determined based on the sound detection results by the microphone 120 used by the field worker and a machine learning model. Alternatively, if the direction from which the sound detected by the microphone 120 comes can be estimated, whether or not the field worker is a speaker may be determined based on whether or not the direction from which the sound comes is a specific direction (for example, the direction from the position of the field worker's mouth toward the microphone 120).

　また、遠隔参加者Ａの視点方向と、遠隔参加者Ｂの視点方向とが示されている。遠隔参加者Ａの視点方向と、遠隔参加者Ｂの視点方向とを参照すると、遠隔参加者Ａと遠隔参加者Ｂとが別のオブジェクトを視認している可能性がある。しかし、遠隔参加者Ａによる指摘情報と遠隔参加者Ｂの視点方向に基づくオブジェクトに関する情報との対応付けが行われてしまう可能性がある。 Furthermore, the viewpoint direction of remote participant A and the viewpoint direction of remote participant B are shown. When referring to the viewpoint direction of remote participant A and the viewpoint direction of remote participant B, it is possible that remote participant A and remote participant B are viewing different objects. However, there is a possibility that the information pointed out by remote participant A will be associated with information about the object based on the viewpoint direction of remote participant B.

　そこで、制御部３４０は、遠隔参加者Ａおよび遠隔参加者Ｂがそれぞれ話者であるか否かを特定するのがよい。ここでは、遠隔参加者Ａが話者であると特定された場合を想定する。かかる場合には、制御部３４０は、遠隔参加者Ａが話者であると特定されたことに基づいて、遠隔参加者Ａの音声から得られた音声認識結果に基づく指摘情報の認識を行うのがよい。 The control unit 340 should therefore determine whether or not each of remote participant A and remote participant B is a speaker. Here, it is assumed that remote participant A has been identified as the speaker. In such a case, the control unit 340 should recognize pointed-out information based on the voice recognition results obtained from the voice of remote participant A, based on the fact that remote participant A has been identified as the speaker.

　これにより、遠隔参加者Ａによる指摘情報と遠隔参加者Ｂの視点方向に基づくオブジェクトに関する情報との対応付けが行われてしまう可能性が低減され得る。換言すると、遠隔地に複数の遠隔参加者が存在する場合であっても、遠隔参加者Ａによる指摘情報と遠隔参加者Ａの視点方向に基づくオブジェクトに関する情報とが、正常に対応付けられる可能性が高まり、遠隔参加者Ｂによる指摘情報と遠隔参加者Ｂの視点方向に基づくオブジェクトに関する情報とが、正常に対応付けられる可能性が高まる。 This can reduce the possibility of matching the information pointed out by remote participant A with information about an object based on the viewpoint direction of remote participant B. In other words, even when there are multiple remote participants in remote locations, the possibility of correctly matching the information pointed out by remote participant A with information about an object based on the viewpoint direction of remote participant A increases, and the possibility of correctly matching the information pointed out by remote participant B with information about an object based on the viewpoint direction of remote participant B increases.

　なお、遠隔参加者が話者であるか否かは、遠隔参加者が使用するマイクロフォン２２０による音の検出結果に基づいて特定されてよい。一例として、遠隔参加者が使用するマイクロフォン２２０によって検出された音量が閾値以上である場合に、その遠隔参加者が話者であると特定されてもよい。 Whether or not a remote participant is a speaker may be determined based on the results of sound detection by the microphone 220 used by the remote participant. As an example, if the volume detected by the microphone 220 used by the remote participant is equal to or greater than a threshold, the remote participant may be determined to be a speaker.
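
The volume-threshold variant of speaker identification can be sketched as follows. It assumes that each user's microphone (the microphone 120 or the microphone 220) delivers a short frame of audio samples normalized to the range -1.0 to 1.0, and that volume is measured as the root mean square (RMS) of the frame. The function names and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square volume of one audio frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def identify_speakers(
    frames: Dict[str, Sequence[float]], threshold: float = 0.1
) -> List[str]:
    """Return the users whose microphone volume is at or above the threshold."""
    return [user for user, samples in frames.items() if rms(samples) >= threshold]
```

Only the users returned by `identify_speakers` would then have their voice recognition results used for recognizing pointed-out information, so that each indication is paired with the viewpoint direction of the user who actually spoke.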

　（ＡＩとの連携例） (Example of collaboration with AI)
　上記においては、遠隔参加者が何を指摘するかを考える場合を主に想定した。しかし、ＡＩ（Ａｒｔｉｆｉｃｉａｌ　Ｉｎｔｅｌｌｉｇｅｎｃｅ）との連携により、遠隔参加者は、ＡＩから提供された情報に基づいて、何を指摘するかを考えることが可能になる。図１９を参照しながら、ＡＩとの連携例について説明する。 In the above, it is mainly assumed that the remote participant thinks about what to point out. However, by cooperating with AI (Artificial Intelligence), the remote participant can think about what to point out based on information provided by the AI. An example of cooperation with AI will be described with reference to FIG. 19.

　図１９は、ＡＩとの連携例について説明するための図である。図１９を参照すると、予測モデルＭ６１は、機械学習によりあらかじめ生成された学習済みモデルである。予測モデルＭ６１は、３６０度映像から遠隔参加者が指摘を行う可能性（以下、「指摘可能性」とも言う。）を、オブジェクトごとに推定するモデルである。 FIG. 19 is a diagram for explaining an example of collaboration with AI. Referring to FIG. 19, prediction model M61 is a trained model that has been generated in advance by machine learning. Prediction model M61 is a model that estimates, for each object, the possibility that a remote participant will point out something from the 360-degree video (hereinafter, also referred to as "pointing possibility").

　制御部３４０は、３６０度映像Ｇ１１を、予測モデルＭ６１に入力することに基づいて、予測モデルＭ６１から出力されるオブジェクトごとの指摘可能性を、遠隔参加者によるオブジェクトごとの指摘可能性として推定すればよい。 The control unit 340 may input the 360-degree video G11 into the prediction model M61 and use the pointing possibility output from the prediction model M61 for each object as the estimated possibility that a remote participant will point out that object.

　図１９に示された例では、オブジェクトＢ１１に対する遠隔参加者による指摘可能性が「９０％」であり、オブジェクトＢ１２に対する遠隔参加者による指摘可能性が「６０％」であり、オブジェクトＢ１３に対する遠隔参加者による指摘可能性が「７５％」であり、オブジェクトＢ１４に対する遠隔参加者による指摘可能性が「７０％」である。 In the example shown in FIG. 19, the probability that a remote participant will point out object B11 is "90%", the probability that a remote participant will point out object B12 is "60%", the probability that a remote participant will point out object B13 is "75%", and the probability that a remote participant will point out object B14 is "70%".

　制御部３４０は、オブジェクトごとの指摘可能性に応じた表示映像を生成する。そして、制御部３４０は、生成した表示映像を、通信部３６０を介して、遠隔参加者によって使用される遠隔参加者デバイス２０に送信する。そして、遠隔参加者デバイス２０において、制御部２４０は、通信部２６０を介して表示映像を取得し、表示映像のディスプレイ２８０による表示を制御する。 The control unit 340 generates a display image according to the pointing possibility for each object. The control unit 340 then transmits the generated display image via the communication unit 360 to the remote participant device 20 used by the remote participant. In the remote participant device 20, the control unit 240 then obtains the display image via the communication unit 260 and controls the display of the display image on the display 280.

　一例として、制御部３４０は、指摘可能性が閾値より高いオブジェクトが存在する場合に、そのオブジェクトの位置に応じた付加画像を表示映像に重畳してもよい。ここで、閾値は具体的にどのような値であってもよいが、閾値が５０％である場合を想定する。このとき、制御部３４０は、オブジェクトＢ１１～Ｂ１４それぞれの指摘可能性が閾値より高いと判定する。 As an example, when an object exists whose pointing probability is higher than a threshold, the control unit 340 may superimpose an additional image corresponding to the position of the object onto the displayed image. Here, the threshold may be any specific value, but it is assumed that the threshold is 50%. In this case, the control unit 340 determines that the pointing probability of each of objects B11 to B14 is higher than the threshold.

　表示映像Ｈ１１は、３６０度映像Ｇ１１のうち、指摘を行った遠隔参加者の視点方向とＦｏＶとに応じた範囲の映像である。表示映像Ｈ１１には、オブジェクトＢ１１およびオブジェクトＢ１２が写っている。制御部３４０は、３６０度映像Ｇ１１における、表示映像Ｈ１１の位置を基準とした、指摘可能性が閾値より高いオブジェクトＢ１１～Ｂ１４の相対的な位置に応じた位置に、付加画像Ｗ３１および付加画像Ｗ３２を重畳する。 Displayed image H11 is an image of the 360-degree image G11 in a range that corresponds to the viewing direction and FoV of the remote participant who made the pointing. Objects B11 and B12 are shown in displayed image H11. The control unit 340 superimposes additional image W31 and additional image W32 at positions in the 360-degree image G11 that correspond to the relative positions of objects B11 to B14, which have a higher possibility of being pointed out than a threshold, based on the position of displayed image H11.

　ここでは、３６０度映像Ｇ１１において、表示映像Ｈ１１の位置を基準として、指摘可能性が閾値より高いオブジェクトＢ１３～Ｂ１４が右に位置する。そのため、制御部３４０は、表示映像Ｈ１１に写るオブジェクトＢ１１～Ｂ１２よりも右側に、付加画像Ｗ３２を重畳する。また、３６０度映像Ｇ１１における表示映像Ｈ１１の内部に、指摘可能性が閾値より高いオブジェクトＢ１１～Ｂ１２が位置する。そのため、制御部３４０は、表示映像Ｈ１１の内部に、付加画像Ｗ３２とは異なる付加画像Ｗ３１を重畳する。 Here, in the 360-degree video G11, objects B13-B14, whose pointing possibility is higher than the threshold, are located to the right of the position of the display video H11. Therefore, the control unit 340 superimposes the additional image W32 to the right of the objects B11-B12 appearing in the display video H11. Also, objects B11-B12, whose pointing possibility is higher than the threshold, are located inside the display video H11 in the 360-degree video G11. Therefore, the control unit 340 superimposes an additional image W31, which is different from the additional image W32, inside the display video H11.
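
The placement rule described above can be sketched as follows. The sketch assumes that each object is reduced to a horizontal angle (yaw) in the 360-degree video, that a positive offset from the viewpoint direction means "to the right", and that the pointing possibility is given as a value between 0 and 1. The function names, the yaw values, and the placement labels are hypothetical.

```python
from typing import Dict


def angular_offset(a_deg: float, b_deg: float) -> float:
    """Signed smallest difference a - b in degrees, in the range [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def place_additional_images(
    probabilities: Dict[str, float],
    object_yaw: Dict[str, float],
    view_yaw_deg: float,
    fov_deg: float,
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Decide, per object above the threshold, where its additional image goes."""
    placements = {}
    for name, probability in probabilities.items():
        if probability <= threshold:
            continue  # only objects whose possibility is higher than the threshold
        offset = angular_offset(object_yaw[name], view_yaw_deg)
        if abs(offset) <= fov_deg / 2.0:
            placements[name] = "inside"  # e.g. additional image W31
        elif offset > 0:
            placements[name] = "right"   # e.g. additional image W32
        else:
            placements[name] = "left"
    return placements
```

With the possibilities of FIG. 19 (B11 = 0.90, B12 = 0.60, B13 = 0.75, B14 = 0.70) and assumed yaw values that put B13 and B14 to the right of the displayed range, B11 and B12 are marked inside the displayed image and B13 and B14 are marked at its right side.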

For example, the remote participant may be able to input an operation that selects additional image W32. In that case, when the operation selecting additional image W32 is input to the operation detection unit 230 by the remote participant, the control unit 340 may change display image H11 to a range in which the corresponding object appears.

Taking the likelihood for each object into account, the remote participant can consider whether to point out each of objects B11 to B14, whose likelihood of being pointed out is higher than the threshold. In addition, by instructing the on-site worker by voice to approach an object, the remote participant can have the on-site worker check that object quickly.

(Using tags in real-time video)
As described above, tags are generated by the control unit 340, and the meta information in a tag generated by the control unit 340 may be reflected in real-time video. An example in which the meta information in a tag is reflected in real-time video is described with reference to FIG. 20.

FIG. 20 is a diagram for explaining an example in which the meta information in a tag is reflected in real-time video.

As shown in FIG. 20, the control unit 340 generates display image H11, which is the range of the 360-degree video G11 that corresponds to the viewpoint direction and FoV of the remote participant who did the pointing out. The control unit 340 transmits the generated display image H11 via the communication unit 360 to the remote participant device 20 used by the remote participant. In the remote participant device 20, the control unit 240 acquires display image H11 via the communication unit 260 and controls the display of display image H11 on the display 280.

Furthermore, the control unit 340 may add pin P11 at the position in display image H11 that corresponds to the position of object B11. When an operation selecting pin P11 is input by the remote participant, the control unit 340 may also add tag T41 at the position in display image H11 that corresponds to the position of object B11. Tag T41 is the same as the first half of tag T31 (FIG. 14).

Here, "bag" in tag T41 is the type of object B11, the object to which the remote participant is paying attention. The type of object B11 is an example of information relating to object B11.

As the remote participant's viewpoint direction moves, the position of the display image within the 360-degree video changes, so object B11 may move within the display image. The control unit 340 therefore moves pin P11 and tag T41 so that they follow object B11. Because the position of the object relative to the 360-degree video is incorporated in the tag, pin P11 and tag T41 can keep following the object even when the remote participant moves the viewpoint direction.
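How a pin can follow an object from a position stored relative to the 360-degree video can be sketched as below. This is only an illustration under simplifying assumptions that are not stated in the disclosure: the stored position is a single yaw angle, the display image is a horizontal crop of an equirectangular frame, and the mapping from angle to pixel column is linear. The function name `pin_x` and its parameters are hypothetical.

```python
def pin_x(object_yaw_deg, view_yaw_deg, fov_deg, width_px):
    """Return the pin's horizontal pixel position, or None if the object is out of view.

    object_yaw_deg: object position stored in the tag, relative to the 360-degree video
    view_yaw_deg:   current viewpoint direction of the remote participant
    fov_deg:        horizontal field of view of the display image
    width_px:       width of the display image in pixels
    """
    # Signed angle from the view center to the object, wrapped to [-180, 180).
    offset = (object_yaw_deg - view_yaw_deg + 180.0) % 360.0 - 180.0
    if abs(offset) > fov_deg / 2:
        return None  # the object is outside the display image
    # Linear mapping: the view center lands on the middle column.
    return (offset / fov_deg + 0.5) * width_px


# The object stays at yaw 90 while the viewpoint moves from 90 to 110 degrees,
# so the pin shifts to the left in the display image.
before = pin_x(90.0, 90.0, 90.0, 1000)
after = pin_x(90.0, 110.0, 90.0, 1000)
```

Recomputing the pin position from the stored angle on every viewpoint change is what lets the pin and tag track the object without re-detecting it in each display image.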

Referring to FIG. 20, the 360-degree video G11 changes to the 360-degree video G31 as the on-site worker's viewpoint direction moves, and display image H11 changes to display image H21 as the remote participant's viewpoint direction moves. Because pin P11 and tag T41 follow the movement of the remote participant's viewpoint direction, both tag T41 and pin P11 are added at the position in display image H21 that corresponds to the position of object B11.

(Using tags in archived video)
The meta information in a tag generated by the control unit 340 may be reflected in archived video. Examples are described with reference to FIG. 21 and FIG. 22. A case in which the meta information in a tag is attached to a timeline is described with reference to FIG. 21, and a case in which it is attached to a map of the work site and to a display image is described with reference to FIG. 22.

FIG. 21 is a diagram for explaining a case in which the meta information in a tag is attached to a timeline. FIG. 21 shows display image H11 as an example of archived video.

When a predetermined output start condition is satisfied, the control unit 240 controls the display, on the display 280, of display image H11, which is generated according to the 360-degree video retrieved from the storage unit 350 and the viewpoint direction and FoV of the remote participant. Because the remote participant's viewpoint direction and FoV are retained in the tag, the display image H11 that is shown reflects them, which makes it possible to access the pointed-out information efficiently from display image H11.

Here, the predetermined output start condition may include the condition that an operation to start output of display image H11 has been input by the remote participant. The output start operation is not particularly limited. For example, if an output start button is displayed on the display 280, the output start operation may be an operation selecting that button.

Timeline H41 is added to display image H11. The control unit 340 may add pin P11 and pin P21 at positions on timeline H41 that correspond to the playback positions in display image H11 at which object B11, to which the remote participant paid attention, was recognized. When an operation selecting pin P21 is input by the remote participant, the control unit 340 may add a tag at the position in display image H11 that corresponds to the position of the object corresponding to pin P21.

For example, when the remote participant moves the playback position to the position corresponding to pin P11, the display image, the on-site worker's voice, and the remote participant's voice for the playback position at which object B11, corresponding to pin P11, was recognized can be output. Similarly, when the remote participant moves the playback position to the position corresponding to pin P21, the display image, the on-site worker's voice, and the remote participant's voice for the playback position at which the object corresponding to pin P21 was recognized can be output.

FIG. 22 is a diagram for explaining a case in which the meta information in a tag is attached to a map of the work site and to a display image. FIG. 22 shows map W13 of the work site. The control unit 340 may add pin P21 at the position on map W13 that corresponds to the position information of the object. FIG. 22 also shows display image W11. The control unit 340 may add pin P21 at the position in display image W11 that corresponds to the position information of the object.

In addition, because the various data obtained through multimodal processing (360-degree video, dialogue history, meta information, and so on) are stored, the remote participant can use post-processing to access desired information more efficiently.

(Automatic highlighting of archived video)
If the entire 360-degree video were stored in the storage unit 350 of the server 30, the cost of managing the server 30 would become very large, and the communication load when accessing archived video would also increase.

The control unit 340 can therefore greatly reduce the data volume of the 360-degree video by, for example, highlighting the period from a few seconds before object B11 was recognized to a few seconds after, and lowering the frame rate of the 360-degree video in all other periods.
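One way to picture this frame-rate reduction is the sketch below. It is an illustration under assumptions that are not in the disclosure: frames are addressed by index at a fixed frame rate, the margin around each recognition time is a fixed number of seconds, and the frame rate outside highlighted periods is lowered by keeping every N-th frame. The names `merge_windows` and `select_frames` and the default values are hypothetical.

```python
def merge_windows(event_times, margin_s):
    """Build highlight windows around recognition times and merge overlapping ones."""
    windows = []
    for t in sorted(event_times):
        start, end = max(0.0, t - margin_s), t + margin_s
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


def select_frames(n_frames, fps, event_times, margin_s=3.0, keep_every=10):
    """Keep every frame inside highlight windows and every N-th frame elsewhere."""
    windows = merge_windows(event_times, margin_s)
    kept = []
    for i in range(n_frames):
        t = i / fps
        highlighted = any(start <= t <= end for start, end in windows)
        if highlighted or i % keep_every == 0:
            kept.append(i)
    return kept


# 10 seconds of video at 10 fps with one recognition at t = 5 s.
kept = select_frames(n_frames=100, fps=10, event_times=[5.0], margin_s=1.0)
```

In this example 28 of 100 frames are kept: all 21 frames between 4 s and 6 s, plus one frame per second elsewhere.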

(Higher-definition video of pointed-out locations)
There is a need to make an object pointed out by the on-site worker or the remote participant appear more clearly. The control unit 340 may therefore use super-resolution to raise the resolution of only the area of the display image around the object pointed out by the on-site worker or the remote participant.

The control unit 340 may also correct blur caused by shaking of the display image and color problems such as blown-out highlights. For this purpose, the control unit 340 can use stabilization based on image features and super-resolution processing based on machine learning.

(Recognizing and displaying overlooked locations)
The above describes a technique that estimates, in real time, the likelihood of each object being pointed out and displays, in real time, information about objects whose likelihood is higher than a threshold. However, when displaying archived video, the remote participant device 20 may display locations that might warrant being pointed out but that the remote participant overlooked in real time.

In that case, it is also assumed that computationally expensive methods, feedback using information from future times, and the like are applied. This makes it possible to perform the various kinds of recognition with higher accuracy than during real-time processing.

(Tag management)
The tags obtained by the control unit 340 can be used to manage the situation at the work site. For example, the remote participant can manage the situation at the work site using the Scrum method. An example of managing the situation at the work site using a general Scrum method is described with reference to FIG. 23.

FIG. 23 is a diagram showing an example of tag management. In FIG. 23, the tag changes in the order of tag T51, tag T52, and tag T53. The remote participant can manage tags as tickets and revise the information in a tag as it progresses through stages such as a change in the situation at the work site, feedback to and review by the person who pointed out the issue, and completion.

(Report output)
Furthermore, tags can be output as a report showing the progress of the work. For example, a site manager needs to report daily progress to senior managers, the client, and others. According to the embodiment of the present disclosure, the site manager can have a report showing the progress of the work output automatically.

FIG. 24 is a diagram showing an example of a report. FIG. 24 shows output information R11 as an example of a report showing the progress of the work. Output information R11 includes map W13 of the work site with pin P21 added, display image W11 with pin P21 added, and tag T41. The control unit 340 in the server 30 automatically creates and outputs output information R11. For example, the control unit 340 may do so in response to a report output instruction from the site manager. Output information R12 and output information R13 are each the same kind of information as output information R11.

For example, suppose that an operation by the remote participant specifying a period (for example, a day or a month) is input to the operation detection unit 230. Assume that tag T41 contains time information belonging to that period, that display image W11 corresponds to the viewpoint direction and FoV associated with time information belonging to that period, and that map W13 of the work site with pin P21 added corresponds to the position information of an object associated with time information belonging to that period.

In that case, the time information included in output information R11 belongs to the period specified by the remote participant. Assume likewise that the time information included in each of output information R12 and output information R13 belongs to the period specified by the remote participant.

When an operation specifying a period is input, the control unit 340 controls the display, on the display 280, of output information R11, output information R12, and output information R13, each of which includes time information belonging to that period.
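Selecting the report entries for a specified period amounts to filtering tags by their time information. The sketch below illustrates this under assumptions: the tag fields shown are a hypothetical subset of the meta information, the period is given as inclusive start and end dates, and the names `Tag` and `tags_in_period` are invented for this example. Only the object type "bag" is taken from the description; the other values are made up.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Tag:
    tag_id: str
    object_type: str
    recognized_at: datetime   # time information held in the tag
    map_position: tuple       # assumed (x, y) position on the work-site map


def tags_in_period(tags, start: date, end: date):
    """Return tags whose time information falls in [start, end], oldest first."""
    selected = [t for t in tags if start <= t.recognized_at.date() <= end]
    return sorted(selected, key=lambda t: t.recognized_at)


tags = [
    Tag("T41", "bag", datetime(2023, 8, 3, 10, 0), (1.0, 2.0)),
    Tag("T42", "ladder", datetime(2023, 8, 5, 9, 0), (3.0, 4.0)),
    Tag("T43", "cone", datetime(2023, 8, 3, 8, 0), (5.0, 6.0)),
]
report_entries = tags_in_period(tags, date(2023, 8, 3), date(2023, 8, 3))
```

Each selected tag would then be rendered together with its map and display image, in the manner of output information R11 to R13.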

This concludes the detailed description of multimodal processing according to the embodiment of the present disclosure.

(1.6. Use of two-dimensional codes)
Next, the use of two-dimensional codes according to the embodiment of the present disclosure is described.

(Overview of two-dimensional code use)
As described above, the remote participant can easily access a desired display image by using the meta information that the server 30 incorporates into tags through tagging. In particular, when the position information of the on-site worker is incorporated into a tag as meta information, the remote participant can use that position information to access the desired display image efficiently.

As described above, the position information of the on-site worker may be detected in the on-site worker device 10 by a GPS sensor or by self-position estimation technology. Alternatively, it may be detected by the server 30 based on video of the work site. The format in which the position information of the on-site worker is expressed is not limited. As described above, it may be expressed as coordinates on a map of the work site. For example, a map may be prepared for each floor of the work site, and a single map may be divided into multiple areas.

Depending on where the on-site worker is located, the detection accuracy of the GPS sensor may be low. Self-position estimation technology could be used to improve detection accuracy regardless of the on-site worker's location. However, detection by self-position estimation tends to incur a high processing cost and is unlikely to be highly immediate. Furthermore, work sites often have a layered structure consisting of multiple floors, and understanding that structure takes a great deal of time.

The following description therefore proposes attaching two-dimensional codes to the work site. First, an example of a work site to which a two-dimensional code is attached is described with reference to FIG. 25.

FIG. 25 is a diagram showing an example of a work site to which a two-dimensional code is attached. In FIG. 25, object B21 and object B22 are present at work site J, as is an on-site worker using the on-site worker device 10. A two-dimensional code C1 is attached at a predetermined place in work site J. The position information of the place where two-dimensional code C1 is attached is associated with two-dimensional code C1 in advance.

When two-dimensional code C1 enters the imaging range of the camera 110 (FIG. 4) of the on-site worker device 10, the camera 110 reads two-dimensional code C1, and the time at which it was read is associated with that position information as time information. This makes it possible to know that the on-site worker reached the vicinity of the place where two-dimensional code C1 is attached at the time indicated by the time information.

Furthermore, if the position information of the place where two-dimensional code C1 is attached incorporates information indicating the floor and information indicating the area on that floor, it is possible to know which area of which floor the on-site worker reached and when. As an example, if the association of time information with the position information of that place is confirmed in real time, the arrival of the on-site worker near the place where two-dimensional code C1 is attached can be known immediately.
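The association between a code, its pre-registered location, and the read time can be sketched as a simple lookup. This is a minimal illustration with assumed data shapes: the registry contents (floor 2, area "A"), the dictionary layout, and the function name `record_visit` are hypothetical and are not taken from the disclosure.

```python
from datetime import datetime

# Assumed registry: each two-dimensional code is associated in advance with
# the location where it is attached (floor and area within that floor).
code_locations = {
    "C1": {"floor": 2, "area": "A"},
}


def record_visit(visit_log, registry, code_id, read_at):
    """Associate the read time with the code's registered location.

    Returns the new log entry, or None if the code is not registered.
    """
    location = registry.get(code_id)
    if location is None:
        return None
    entry = {"code": code_id, "time": read_at, **location}
    visit_log.append(entry)
    return entry


visit_log = []
entry = record_visit(visit_log, code_locations, "C1", datetime(2023, 8, 3, 9, 30))
unknown = record_visit(visit_log, code_locations, "C9", datetime(2023, 8, 3, 9, 31))
```

Because the location is looked up rather than estimated, knowing which floor and area the worker reached requires only that the code be read, which is the advantage over GPS or self-position estimation noted above.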

The 360-degree video, on-site audio, and the like may also be associated with the position information of the place where two-dimensional code C1 is attached. By checking the 360-degree video and on-site audio, the remote participant can also periodically check the situation near that place.

In the example shown in FIG. 25, two-dimensional code C1 is a QR (Quick Response) code (registered trademark). However, as described later, two-dimensional code C1 need not be limited to a QR code (registered trademark).

(Hardware processing)
FIG. 26 is a diagram for explaining the hardware processing executed by the on-site worker device 10. As shown in FIG. 26, the control unit 140 of the on-site worker device 10 includes a GPU (Graphics Processing Unit) 141 and a CPU 145. The GPU 141 includes a video acquisition unit 142, an image processing unit 143, and an encoder 144.

The multiple camera images obtained by the camera 110 (FIG. 4) of the on-site worker device 10 are input to the GPU 141 and acquired by the video acquisition unit 142. Meanwhile, IMU information obtained by an IMU (Inertial Measurement Unit), not shown, of the on-site worker device 10 is input to the CPU 145. The CPU 145 acquires the IMU information and filters it to obtain a rotation angle (S147). The CPU 145 outputs the rotation angle to the GPU 141 (S146).

The image processing unit 143 stitches the multiple camera images to combine them into a composite image (S141), and generates the 360-degree video by rotating the composite image based on the rotation angle (S142). The image processing unit 143 outputs the 360-degree video to both the encoder 144 and the CPU 145. The encoder 144 encodes the 360-degree video. The encoded 360-degree video is distributed in real time to the remote participant device 20 via the server 30, or is stored in the server 30.

The CPU 145 performs code recognition, in which it recognizes a two-dimensional code based on the 360-degree video (S148). Because encoding by the GPU 141 and code recognition by the CPU 145 run in parallel, code recognition can be performed without lowering the processing speed of the GPU 141. The two-dimensional code recognized by the CPU 145 is transmitted to the server 30.

If the on-site worker device 10 is equipped with a lamp, the CPU 145 may notify the on-site worker that a two-dimensional code has been recognized by lighting the lamp. Alternatively, the CPU 145 may notify the on-site worker by causing a predetermined sound to be output from the microphone 120. In either case, the CPU 145 may change the color of the lamp or the sound that is output for each recognized two-dimensional code.

The CPU 145 also executes hardware control based on the recognition of a two-dimensional code (S149). The hardware control here may be mainly hardware control relating to the 360-degree video. That is, the CPU 145 may perform control relating to the 360-degree video based on the recognition of a two-dimensional code. For example, the control relating to the 360-degree video may vary depending on which two-dimensional code was recognized.

For example, a work site may contain dangerous places. In that case, a two-dimensional code may be attached near the dangerous place (for example, at its entrance). When the CPU 145 recognizes the two-dimensional code attached near the dangerous place, the GPU 141 may perform control relating to the 360-degree video by automatically starting transmission of the 360-degree video to the server 30.

In the server 30, the control unit 340 may automatically start distributing, to the remote participant device 20, a display image generated according to the 360-degree video and the remote participant's viewpoint direction and FoV. In the remote participant device 20, the control unit 240 may then start displaying the display image on the display 280. This allows the remote participant to grasp easily the situation of an on-site worker who is in a dangerous place. In the server 30, the control unit 340 may also automatically start saving the 360-degree video.

Alternatively, a work site may contain private places (for example, a toilet). In that case, a two-dimensional code may be attached near the private place (for example, at its entrance). When the CPU 145 begins to recognize the two-dimensional code attached near the private place (that is, for example, when the on-site worker enters the toilet), the GPU 141 may perform control relating to the 360-degree video by automatically stopping transmission of the 360-degree video to the server 30.

In the server 30, the control unit 340 may automatically stop distributing, to the remote participant device 20, the display image generated according to the 360-degree video and the remote participant's viewpoint direction and FoV. In the remote participant device 20, the control unit 240 may then stop displaying the display image on the display 280. This prevents the remote participant from seeing the situation of an on-site worker who is in a private place. In the server 30, the control unit 340 may also automatically stop saving the 360-degree video.

On the other hand, when recognition by the CPU 145 of the two-dimensional code attached near the private place is interrupted and then resumes (that is, for example, when the on-site worker leaves the toilet), the GPU 141 may perform control relating to the 360-degree video by automatically starting transmission of the 360-degree video to the server 30, in the same way as when a two-dimensional code attached near a dangerous place is recognized.
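The start and stop behavior described for dangerous and private places can be sketched as a small state machine. This is a simplified illustration, not the disclosed control logic: it assumes each call to `on_code` represents one distinct recognition episode (repeated reads of the same code already collapsed into one), and that codes carry a kind label of "danger" or "private". The class name `TransmissionController` and these labels are hypothetical.

```python
class TransmissionController:
    """Tracks whether the 360-degree video should be sent to the server."""

    def __init__(self):
        self.sending = False
        self.inside_private = False

    def on_code(self, kind: str) -> bool:
        """Update state for one recognition episode and return the sending flag."""
        if kind == "danger":
            # Code near a dangerous place: start transmission automatically.
            self.sending = True
        elif kind == "private":
            if not self.inside_private:
                # First recognition: the worker is entering, so stop sending.
                self.inside_private = True
                self.sending = False
            else:
                # Recognition resumed after an interruption: the worker is
                # leaving, so start sending again.
                self.inside_private = False
                self.sending = True
        return self.sending


controller = TransmissionController()
states = [controller.on_code(kind) for kind in ["danger", "private", "private"]]
```

Treating the second recognition of a private-place code as an exit relies on the worker passing the same code in both directions, which is why the description places the code at the entrance.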

Alternatively, a work site may contain places where imaging conditions are poor (for example, a dark room). In that case, a two-dimensional code may be attached near such a place (for example, at its entrance). When the CPU 145 recognizes the two-dimensional code attached near the place with poor imaging conditions, the GPU 141 may perform control relating to the 360-degree video by changing a predetermined parameter relating to the 360-degree video.

More specifically, the GPU 141 may perform control relating to the 360-degree video by changing a predetermined parameter of the camera 110 (FIG. 4), which captures the multiple camera images used to generate the 360-degree video. For example, the parameter change may be raising the EV (Exposure Value) to a predetermined value. This is expected to increase the brightness of the 360-degree video.

There is a concern that the on-site worker device 10 may recognize the same two-dimensional code multiple times within a short time. The CPU 145 may therefore determine one of the multiple two-dimensional codes recognized within a predetermined time as the two-dimensional code that represents them (hereinafter also referred to as the "representative two-dimensional code"), and transmit the representative two-dimensional code to the server 30. The predetermined time may be, for example, one minute, and may be changeable by the on-site worker or others.

　また、複数の二次元コードのうちのいずれが代表の二次元コードとして決定されてもよい。例えば、複数の二次元コードのうち認識された順番が真ん中である二次元コードが代表の二次元コードとして決定されてもよい。 Also, any one of the multiple two-dimensional codes may be determined as the representative two-dimensional code. For example, the two-dimensional code that is recognized in the middle of the multiple two-dimensional codes may be determined as the representative two-dimensional code.

　あるいは、私的な場所の近くに付された二次元コードの認識が開始された場合など（例えば、現場作業者がトイレに入るときなど）も想定され得る。かかる場合などには、複数の二次元コードのうち最初に認識された二次元コードが代表の二次元コードとして決定されてもよい。 Alternatively, it may be the case that recognition of a two-dimensional code placed near a private location is initiated (for example, when a field worker enters a restroom). In such a case, the first two-dimensional code to be recognized among multiple two-dimensional codes may be determined to be the representative two-dimensional code.

　あるいは、私的な場所の近くに付された二次元コードの認識が中断されてから再開された場合など（例えば、現場作業者がトイレから出るときなど）も想定され得る。かかる場合などには、複数の二次元コードのうち最後に認識された二次元コードが代表の二次元コードとして決定されてもよい。 Alternatively, it may be possible that recognition of a two-dimensional code placed near a private location is interrupted and then resumed (for example, when a field worker leaves a bathroom). In such a case, the last two-dimensional code recognized among multiple two-dimensional codes may be determined as the representative two-dimensional code.
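
The selection of a representative two-dimensional code described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the disclosure: the names `Recognition`, `group_by_window`, and `pick_representative`, the policy strings, and the way recognitions are grouped into windows are all assumptions introduced here. The 60-second default mirrors the "one minute" example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognition:
    code: str         # decoded payload of the two-dimensional code
    timestamp: float  # time of recognition, in seconds


def group_by_window(recognitions, window_s=60.0):
    """Split recognitions into runs whose items lie within window_s of the run's first item."""
    groups = []
    for r in sorted(recognitions, key=lambda r: r.timestamp):
        if groups and r.timestamp - groups[-1][0].timestamp <= window_s:
            groups[-1].append(r)
        else:
            groups.append([r])
    return groups


def pick_representative(recognitions, policy="middle"):
    """Return the one recognition that stands in for all recognitions in a window."""
    if not recognitions:
        return None
    ordered = sorted(recognitions, key=lambda r: r.timestamp)
    if policy == "first":    # e.g. the worker is entering a private place
        return ordered[0]
    if policy == "last":     # e.g. the worker is leaving a private place
        return ordered[-1]
    if policy == "middle":   # default: middle of the recognition order
        return ordered[len(ordered) // 2]
    raise ValueError(f"unknown policy: {policy}")
```

Only the representative returned for each window would then be sent to the server 30, so repeated reads of the same code within the window do not produce repeated registrations.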

　ここで、二次元コードは、３６０度映像から直接認識されてもよいが、３６０度映像には、二次元コードが歪んで写ってしまっている可能性がある。そのため、二次元コードが、３６０度映像から直接認識される場合には、二次元コードの認識精度が高まらない可能性もある。そこで、二次元コードは、３６０度映像を参照しながら歪みの少ないカメラ映像から直接認識されてもよい。図２７を参照しながら、かかる３６０度映像とカメラ映像とに基づく二次元コードの認識の例について説明する。 Here, the two-dimensional code may be recognized directly from the 360-degree video, but it may appear distorted there. Consequently, when the two-dimensional code is recognized directly from the 360-degree video, the recognition accuracy may be low. The two-dimensional code may therefore be recognized from camera images with less distortion, with the 360-degree video used as a reference. An example of recognition of a two-dimensional code based on such a 360-degree video and camera images will be described with reference to FIG. 27.

　図２７は、３６０度映像とカメラ映像とに基づく二次元コードの認識の例について説明するための図である。図２７を参照すると、３６０度映像Ｇ５１とカメラ映像Ｈ５１とカメラ映像Ｈ５２とが示されている。 FIG. 27 is a diagram for explaining an example of recognition of a two-dimensional code based on a 360-degree image and a camera image. Referring to FIG. 27, a 360-degree image G51, a camera image H51, and a camera image H52 are shown.

　カメラ映像Ｈ５１は、現場作業者の視点方向のカメラ映像であり、３６０度映像Ｇ５１の領域Ｇ５１１に対応している。また、カメラ映像Ｈ５２は、カメラ映像Ｈ５１から水平方向に移動した位置のカメラ映像であり、３６０度映像Ｇ５１の領域Ｇ５１２に対応している。３６０度映像Ｇ５１には、歪みの多い二次元コードＣ１が写っている。一方、カメラ映像Ｈ５１には、歪みの少ない二次元コードＣ１が写っている。 Camera image H51 is a camera image from the viewpoint of the on-site worker, and corresponds to area G511 of the 360-degree image G51. Camera image H52 is a camera image from a position moved horizontally from camera image H51, and corresponds to area G512 of the 360-degree image G51. The 360-degree image G51 shows a two-dimensional code C1 with a lot of distortion. On the other hand, camera image H51 shows a two-dimensional code C1 with little distortion.

　そこで、ＣＰＵ１４５は、領域Ｇ５１１に対応するカメラ映像Ｈ５１から二次元コードの認識を試み、領域Ｇ５１２に対応するカメラ映像Ｈ５２から二次元コードの認識を試みる。このようにして、ＣＰＵ１４５は、３６０度映像Ｇ５１における領域を水平方向に移動させながら、全方位の領域に対応するカメラ映像に対して、二次元コードの認識を試みる。これによって、二次元コードの認識精度が高まることが期待される。 The CPU 145 then attempts to recognize the two-dimensional code from the camera image H51 corresponding to area G511, and attempts to recognize the two-dimensional code from the camera image H52 corresponding to area G512. In this way, the CPU 145 attempts to recognize the two-dimensional code in the camera images corresponding to areas in all directions while moving the area in the 360-degree image G51 horizontally. This is expected to improve the accuracy of recognizing the two-dimensional code.
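
The horizontal sweep described above can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `get_camera_view` and `decode` are hypothetical callables supplied by the caller (the disclosure does not name a decoder or a region-to-camera mapping), and the 30-degree step is an arbitrary illustrative value.

```python
def scan_all_directions(get_camera_view, decode, step_deg=30):
    """Try to decode a two-dimensional code in every horizontal direction.

    get_camera_view(yaw) -> low-distortion camera image for the region of the
                            360-degree video centred on `yaw` (hypothetical)
    decode(image)        -> decoded payload, or None if no code is visible
                            (hypothetical)
    """
    found = []
    for yaw in range(0, 360, step_deg):
        view = get_camera_view(yaw)
        payload = decode(view)
        if payload is not None:
            found.append((yaw, payload))
    return found
```

Decoding from the per-region camera image rather than from the stitched 360-degree frame is what is expected to improve accuracy; the loop itself only decides which regions are tried.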

　（ソフトウェア処理） (Software processing)
　図２８は、サーバ３０によって実行されるソフトウェア処理を説明するための図である。図２８に示されるように、サーバ３０が備える記憶部３５０には、「ＩＤ」と「マップ」とマップにおける「座標」と「時刻情報」と「到達者名」とが関連付けられたデータベースが記憶されている。 FIG. 28 is a diagram for explaining software processing executed by the server 30. As shown in FIG. 28, the storage unit 350 included in the server 30 stores a database in which "ID", "map", "coordinates" on the map, "time information", and "arrival name" are associated with each other.

　図２８に示された例では、「マップ」には、「マップＭ１」が登録されており、「座標」には、マップＭ１における座標（ｘ１，ｙ１）が登録されている。なお、「マップ」と「座標」との組み合わせは、現場作業者の位置情報の例に該当する。また、「時刻情報」および「到達者名」には、データが登録されていなくてもよいし、既にデータが登録されていてもよい。 In the example shown in FIG. 28, "Map M1" is registered in "Map," and the coordinates (x1, y1) on Map M1 are registered in "Coordinates." The combination of "Map" and "Coordinates" corresponds to an example of location information for a field worker. Furthermore, data may not be registered in "Time Information" and "Arrival Name," or data may already be registered.

　現場作業者デバイス１０によって二次元コードが認識されると、現場作業者デバイス１０からサーバ３０に「二次元コード」が送信される。このとき、二次元コードがサーバ３０に送信されるだけではなく、現場作業者デバイス１０にあらかじめ設定された現場作業者の名前が「到達者名」として、現場作業者デバイス１０からサーバ３０に送信されてもよい。図２８に示された例では、「二次元コード」が「１」であり、「到達者名」が「技術職員Ａ」である。 When the two-dimensional code is recognized by the field worker device 10, the "two-dimensional code" is transmitted from the field worker device 10 to the server 30. At this time, not only is the two-dimensional code transmitted to the server 30, but the name of the field worker pre-set in the field worker device 10 may also be transmitted from the field worker device 10 to the server 30 as the "arrival name." In the example shown in FIG. 28, the "two-dimensional code" is "1" and the "arrival name" is "Technical staff A."

　サーバ３０においては、通信部３６０によって二次元コードが受信されたことに基づいて、制御部３４０は、二次元コードに対応するＩＤに関連付けられた時刻情報に、現在時刻を示す時刻情報を登録する。図２８に示された例では、「時刻情報」が「９／２４　１４：１５」である。なお、現在時刻は、現場作業者デバイス１０によって二次元コードが認識された時刻とも換言され得る。また、制御部３４０は、二次元コードに対応するＩＤに関連付けられた「到達者名」に、「技術職員Ａ」を登録する。 In the server 30, upon receipt of the two-dimensional code by the communication unit 360, the control unit 340 registers time information indicating the current time in the time information associated with the ID corresponding to the two-dimensional code. In the example shown in FIG. 28, the "time information" is "9/24 14:15." The current time can also be expressed as the time when the two-dimensional code was recognized by the field worker device 10. The control unit 340 also registers "Technical Staff A" in the "Arrival Name" associated with the ID corresponding to the two-dimensional code.

　制御部３４０は、マップＭ１と、マップＭ１における座標（ｘ，ｙ）と、時刻情報「１４：１５」と、「到達者名：技術職員Ａ」とを、通信部３６０を介して遠隔参加者デバイス２０に送信し得る。 The control unit 340 can transmit the map M1, the coordinates (x, y) on the map M1, the time information "14:15", and "Arrival name: Technical staff A" to the remote participant device 20 via the communication unit 360.

　このように、現場作業者の位置情報の例であるマップおよびマップにおける座標に対して、現場作業者デバイス１０によって二次元コードが認識された時刻情報が関連付けられる。これにより、現場作業者の位置情報に関連付けられた３６０度映像と時刻情報とが関連付けられるため、二次元コードが認識された時刻情報に対応する３６０度映像の特定が容易となる。 In this way, the time information at which the two-dimensional code was recognized by the field worker device 10 is associated with the map and the coordinates on the map, which are examples of the field worker's location information. This associates the 360-degree image associated with the field worker's location information with the time information, making it easy to identify the 360-degree image that corresponds to the time information at which the two-dimensional code was recognized.
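
The server-side registration described above can be sketched with an in-memory dictionary. This is an illustrative sketch only: the dictionary layout, the field names, and `register_arrival` are assumptions introduced here, and the coordinate values are placeholders for (x1, y1).

```python
# Hypothetical in-memory stand-in for the database held in the storage unit 350.
# Keys are IDs corresponding to two-dimensional codes.
database = {
    "1": {"map": "M1", "coordinates": ("x1", "y1"),
          "time": None, "arrival_name": None},
}


def register_arrival(db, code, arrival_name, recognized_at):
    """Record when a code was recognized and by whom; return the data to forward.

    Returns None when the code does not correspond to any registered ID.
    """
    record = db.get(code)
    if record is None:
        return None
    record["time"] = recognized_at
    record["arrival_name"] = arrival_name
    # Payload that could be sent on to the remote participant device 20.
    return {
        "map": record["map"],
        "coordinates": record["coordinates"],
        "time": record["time"],
        "arrival_name": record["arrival_name"],
    }
```

With the values of FIG. 28, `register_arrival(database, "1", "Technical Staff A", "9/24 14:15")` fills in the time and arrival name for ID 1 and returns the map, coordinates, time, and name together.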

　（現場作業者の位置検出に関して） (Regarding location detection of field workers)
　上記したように、現場作業者の位置は、ＧＰＳセンサによって検出されてもよいし、自己位置推定技術によって検出されてもよいし、二次元コードの読み取りによって検出されてもよい。特に、ＧＰＳセンサは屋外での位置検出に適している。そのため、屋外が作業現場である場合などには、現場作業者の位置は、ＧＰＳセンサによって検出されるのが望ましい。 As described above, the position of the on-site worker may be detected by a GPS sensor, by self-location estimation technology, or by reading a two-dimensional code. In particular, a GPS sensor is suitable for detecting a position outdoors. Therefore, when the work site is outdoors, it is desirable to detect the position of the on-site worker by a GPS sensor.

　一方、屋内の作業現場は、複数のフロアからなる階層構造を有している場合がある。このような複数のフロアからなる屋内が作業現場である場合などには、現場作業者の位置は、二次元コードの読み取りによって検出されるのが望ましい。 On the other hand, indoor work sites may have a hierarchical structure consisting of multiple floors. In such cases where the work site is an indoor location consisting of multiple floors, it is desirable to detect the location of on-site workers by reading a two-dimensional code.

　また、複数の位置検出が併用されてもよい。例えば、屋外が作業現場である場合には、ＧＰＳセンサによる位置検出と、二次元コードの読み取りによる位置検出とが併用されてもよい。さらに、屋内が作業現場である場合には、自己位置推定技術による位置検出と、二次元コードの読み取りによる位置検出とが併用されてもよい。 Furthermore, multiple position detection methods may be used in combination. For example, if the work site is outdoors, position detection using a GPS sensor and position detection using reading a two-dimensional code may be used in combination. Furthermore, if the work site is indoors, position detection using self-location estimation technology and position detection using reading a two-dimensional code may be used in combination.

　あるいは、現場作業者が重要な場所に到達したことは、二次元コードの読み取りによって検出され、一の重要な場所と他の重要な場所との間に存在する現場作業者の位置は、ＧＰＳセンサまたは自己位置推定技術によって検出されてもよい。 Alternatively, the arrival of a field worker at a key location may be detected by reading a two-dimensional code, and the location of the field worker between one key location and another key location may be detected by a GPS sensor or self-location estimation technology.

　より具体的に、現場作業者が存在するフロアまたはエリアは、二次元コードの読み取りによって検出されてもよい。そして、フロア内における現場作業者の詳細な位置は、ＧＰＳセンサによって検出されてもよい。また、屋外のエリアにおける現場作業者の詳細な位置は、ＧＰＳセンサによって検出され、屋内のエリアにおける現場作業者の詳細な位置は、自己位置推定技術によって検出されてもよい。 More specifically, the floor or area where the field worker is present may be detected by reading a two-dimensional code. The detailed location of the field worker within the floor may then be detected by a GPS sensor. Also, the detailed location of the field worker in an outdoor area may be detected by a GPS sensor, and the detailed location of the field worker in an indoor area may be detected by self-location estimation technology.

　（遠隔参加者デバイスによる表示） (Display by remote participant device)
　上記したように、タグ付けによって、作業現場の３６０度映像に対してタグが関連付けられ、３６０度映像とタグとがサーバ３０における記憶部３５０に保存される。このとき、３６０度映像に関連する現場音声および遠隔音声も記憶部３５０に保存されてよい。タグには、メタ情報が含まれる。メタ情報の例としては、時刻情報、現場作業者の位置情報、オブジェクトの種類、指摘者名、指摘情報、返答者名および返答情報などが挙げられる。例えば、現場作業者の位置情報は、マップとマップにおける座標との組み合わせによって表現され得る。 As described above, by tagging, a tag is associated with a 360-degree video of a work site, and the 360-degree video and the tag are stored in the storage unit 350 of the server 30. At this time, on-site audio and remote audio related to the 360-degree video may also be stored in the storage unit 350. The tag includes meta information. Examples of meta information include time information, location information of the on-site worker, the type of object, the name of the person who pointed out a problem, the pointed-out information, the name of the respondent, and the response information. For example, the location information of the on-site worker may be expressed by a combination of a map and coordinates on the map.

　上記したように、現場作業者が指摘者である場合、かつ、遠隔参加者が返答者である場合もあり得るし、遠隔参加者が指摘者である場合、かつ、現場作業者が返答者である場合もあり得る。すなわち、現場作業者と遠隔作業者とのうち、一方が指摘者であり得るし、他方が返答者であり得る。ただし、指摘者および返答者が、同一のまたは異なる現場作業者である場合もあり得るし、指摘者および返答者が、同一のまたは異なる遠隔作業者である場合もあり得る。 As described above, an on-site worker may be the person who points out a problem while a remote participant is the respondent, or a remote participant may be the person who points out a problem while an on-site worker is the respondent. In other words, one of the on-site worker and the remote participant may be the person who points out a problem, and the other may be the respondent. However, the person who points out a problem and the respondent may also be the same or different on-site workers, or the same or different remote participants.

　また、作業現場に付された二次元コードが現場作業者デバイス１０によって読み取られたことに基づいて、その二次元コードに関連付けられた位置情報に、二次元コードが読み取られた時刻情報と、現場作業者デバイス１０を使用する現場作業者の名前（すなわち、到達者名）とが関連付けられる。位置情報に関連付けられた時刻情報および到達者名は、サーバ３０における記憶部３５０に保存される。 In addition, based on the fact that the two-dimensional code attached to the work site is read by the field worker device 10, the location information associated with the two-dimensional code is associated with the time information when the two-dimensional code was read and the name of the field worker using the field worker device 10 (i.e., the name of the person who arrived). The time information and the name of the person who arrived associated with the location information are stored in the memory unit 350 in the server 30.

　さらに、二次元コードが現場作業者デバイス１０によって読み取られたことに基づいて、作業現場の３６０度映像がサーバ３０を介して遠隔参加者デバイス２０に配信されてもよいし、サーバ３０における記憶部３５０に保存されてもよい。以下では、３６０度映像の遠隔参加者デバイス２０への配信を単に「配信」とも言い、３６０度映像のサーバ３０における記憶部３５０への保存を単に「録画」とも言う。 Furthermore, based on the two-dimensional code being read by the on-site worker device 10, the 360-degree video of the work site may be distributed to the remote participant device 20 via the server 30, or may be stored in the memory unit 350 of the server 30. Hereinafter, the distribution of the 360-degree video to the remote participant device 20 is also simply referred to as "distribution", and the storage of the 360-degree video in the memory unit 350 of the server 30 is also simply referred to as "recording".

　上記では、３６０度映像の配信または録画が、現場作業者デバイス１０による二次元コードの読み取りに基づいて開始される場合を主に想定した。しかし、３６０度映像の配信は、現場作業者デバイス１０に対する現場作業者からの所定の配信開始操作の入力に基づいて開始されてもよい。また、３６０度映像の録画は、現場作業者デバイス１０に対する現場作業者からの所定の録画開始操作の入力に基づいて開始されてもよい。 In the above, it has been mainly assumed that the distribution or recording of 360-degree video is started based on the field worker device 10 reading a two-dimensional code. However, the distribution of 360-degree video may also be started based on the field worker inputting a predetermined operation to start distribution to the field worker device 10. Also, the recording of 360-degree video may also be started based on the field worker inputting a predetermined operation to start recording to the field worker device 10.

　同様に、３６０度映像の配信は、現場作業者デバイス１０に対する現場作業者からの所定の配信停止操作の入力に基づいて停止されてもよい。あるいは、３６０度映像の録画は、現場作業者デバイス１０に対する現場作業者からの所定の録画停止操作の入力に基づいて停止されてもよい。 Similarly, the distribution of the 360-degree video may be stopped based on a predetermined operation to stop distribution being input from the field worker to the field worker device 10. Alternatively, the recording of the 360-degree video may be stopped based on a predetermined operation to stop recording being input from the field worker to the field worker device 10.

　例えば、遠隔参加者デバイス２０－１がマップを選択する操作を遠隔参加者デバイス２０に対して入力すると、選択されたマップが遠隔参加者デバイス２０－１によって表示される。 For example, when the remote participant using the remote participant device 20-1 inputs an operation to select a map to the remote participant device 20-1, the selected map is displayed by the remote participant device 20-1.

　このとき、マップにタグが関連付けられている場合が想定される。かかる場合には、当該タグに含まれるメタ情報が表示されてもよい。例えば、メタ情報は、指摘情報、指摘者名などであってもよい。また、マップに、マップにおける座標と、二次元コードの読み取りがなされた時刻情報と、到達者名とが関連付けられている場合も想定され得る。かかる場合には、マップにおける座標と、時刻情報と、到達者名とが表示されてもよい。 In this case, it is assumed that a tag is associated with the map. In such a case, meta information included in the tag may be displayed. For example, the meta information may be pointed out information, the name of the person who pointed it out, etc. It is also assumed that coordinates on the map, time information when the two-dimensional code was read, and the name of the person who arrived may be associated with the map. In such a case, the coordinates on the map, time information, and the name of the person who arrived may be displayed.

　（遠隔参加者デバイスによる第１の表示例） (First display example by remote participant device)
　図２９は、遠隔参加者デバイス２０－１によるマップの第１の表示例を示す図である。図２９に示されるように、遠隔参加者デバイス２０－１によってマップが表示されている。 FIG. 29 is a diagram showing a first example of a map displayed by the remote participant device 20-1. As shown in FIG. 29, a map is displayed by the remote participant device 20-1.

　ここで、マップには、タグが関連付けられており、タグには、マップにおける座標ｐ１、指摘者名「所長」および指摘情報「○○について指摘」が含まれている場合を想定する。かかる場合には、遠隔参加者デバイス２０－１は、マップにおける座標ｐ１を吹き出し表示Ｆ１によって表示してもよい。吹き出し表示Ｆ１には、指摘者名「所長」および指摘情報「○○について指摘」が含まれている。 Here, it is assumed that a tag is associated with the map, and that the tag includes the coordinate p1 on the map, the name of the person who pointed out the problem "Director", and the pointed out information "Pointed out about XX". In such a case, the remote participant device 20-1 may display the coordinate p1 on the map using a balloon display F1. The balloon display F1 includes the name of the person who pointed out the problem "Director" and the pointed out information "Pointed out about XX".

　また、吹き出し表示Ｆ１には、詳細画面表示ボタンｂ１が含まれている。詳細画面表示ボタンｂ１が選択されたときの動作は、図３１における詳細画面表示ボタンｂ８が選択されたときの動作とほぼ同様であるため、詳細画面表示ボタンｂ１についての詳細な説明は省略する。 The balloon display F1 also includes a details screen display button b1. The operation when the details screen display button b1 is selected is almost the same as the operation when the details screen display button b8 in FIG. 31 is selected, so a detailed explanation of the details screen display button b1 will be omitted.

　また、マップにおける座標ｐ１には、指摘者名と指摘情報との組み合わせが他にも関連付けられている場合を想定するが、最新の時刻情報に関連付けられた指摘者名「所長」と指摘情報「○○について指摘」との組み合わせのみが表示されている。 It is also assumed that other combinations of the name of the person who pointed out the problem and the pointed out information are associated with coordinate p1 on the map, but only the combination of the name of the person who pointed out the problem "Director" and the pointed out information "Pointed out about XX" associated with the most recent time information is displayed.

　さらに、マップには、マップにおける座標ｐ２と、時刻情報「１４：１５」と、到達者名「技術職員Ａ」とが関連付けられている場合を想定する。かかる場合には、遠隔参加者デバイス２０－１は、マップにおける座標ｐ２を吹き出し表示Ｆ２によって表示してもよい。吹き出し表示Ｆ２には、時刻情報「１４：１５」および到達者名「技術職員Ａ」が含まれている。また、吹き出し表示Ｆ２には、詳細画面表示ボタンｂ２が含まれている。詳細画面表示ボタンｂ２については後に詳細に説明する。 Furthermore, it is assumed that the map coordinate p2, the time information "14:15", and the name of the person who arrived "Technical Staff A" are associated with each other on the map. In such a case, the remote participant device 20-1 may display the map coordinate p2 using a balloon display F2. The balloon display F2 includes the time information "14:15" and the name of the person who arrived "Technical Staff A". The balloon display F2 also includes a details screen display button b2. The details screen display button b2 will be explained in detail later.

　なお、図２９には示されていないが、座標ｐ２が属するエリアを「エリアＲｃ」とする。また、マップにおける座標ｐ２には、時刻情報と到着者名との組み合わせが他にも関連付けられている場合を想定するが、最新の時刻情報「１４：１５」と到達者名「技術職員Ａ」との組み合わせのみが表示されている。 Note that, although not shown in FIG. 29, the area to which coordinate p2 belongs is referred to as "Area Rc." It is also assumed that other combinations of time information and arriving person names are associated with coordinate p2 on the map, but only the most recent combination of time information "14:15" and arriving person name "Technical Staff A" is displayed.
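
The rule that only the most recent entry is shown in the balloon for each coordinate can be sketched as below. This is a hedged sketch: `latest_per_coordinate` and the tuple layout are assumptions introduced here, and times are written as `(month, day, hour, minute)` tuples because the examples in the text give no year.

```python
def latest_per_coordinate(entries):
    """Keep, for each map coordinate, only the entry with the most recent time.

    entries: iterable of (coordinate, time, name), where time is a
             (month, day, hour, minute) tuple so that tuples compare in
             chronological order.
    """
    latest = {}
    for coord, time, name in entries:
        if coord not in latest or time > latest[coord][0]:
            latest[coord] = (time, name)
    return latest


# Entries associated with coordinate p2, taken from the details screen example.
p2_entries = [
    ("p2", (9, 22, 16, 35), "Director"),
    ("p2", (9, 23, 11, 15), "Technical Staff B"),
    ("p2", (9, 24, 14, 15), "Technical Staff A"),
]
balloons = latest_per_coordinate(p2_entries)
```

Here `balloons["p2"]` holds the 9/24 14:15 arrival of Technical Staff A; the older entries remain available for the details screen.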

　以下の説明においては、現場作業者デバイス１０を使用する現場作業者が「技術職員Ａ」であり、遠隔参加者デバイス２０－１を使用する遠隔参加者が「所長」である場合を想定する。さらに、現場作業者デバイス１０を使用する現場作業者「技術職員Ａ」とは異なる現場作業者「技術職員Ｂ」も作業現場に存在している。現場作業者「技術職員Ｂ」は、現場作業者「技術職員Ａ」が使用する現場作業者デバイス１０とは異なる現場作業者デバイス１０を使用している。 In the following explanation, it is assumed that the on-site worker using the on-site worker device 10 is "Technical Staff A" and the remote participant using the remote participant device 20-1 is the "Director." Furthermore, an on-site worker "Technical Staff B," who is different from the on-site worker "Technical Staff A" using the on-site worker device 10, is also present at the work site. The on-site worker "Technical Staff B" uses an on-site worker device 10 that is different from the on-site worker device 10 used by the on-site worker "Technical Staff A."

　「技術職員Ａ」の現在位置は、マップにおける座標ｐ３であり、マップにおける座標ｐ３が吹き出し表示Ｆ３により表示されている。吹き出し表示Ｆ３には、「技術職員Ａ」の現在のステータス「録画中」が含まれている。また、吹き出し表示Ｆ３には、配信依頼ボタンｂ３が含まれている。 The current location of "Technical Staff A" is coordinate p3 on the map, and coordinate p3 on the map is displayed by balloon display F3. Balloon display F3 includes "Technical Staff A's" current status "Recording". Balloon display F3 also includes a distribution request button b3.

　遠隔参加者デバイス２０－１を使用する遠隔参加者が配信依頼ボタンｂ３を選択する操作を入力すると、当該遠隔参加者は、技術職員Ａに配信依頼を行うことが可能である。例えば、配信依頼は、現場作業者デバイス１０にランプが備えられている場合には、ランプを発光させることによって実現され得る。あるいは、配信依頼は、マイクロフォン１２０から所定の音を出力させることによって実現されてもよい。なお、座標ｐ３が属するエリアを「エリアＲａ」とした場合、「技術職員Ａ」の現在位置として、エリア「エリアＲａ」が現在位置ｐに表示されてもよい。あるいは、「技術職員Ａ」の現在位置として、エリアＲａとともにマップに対応するフロア（例えば、１階など）が表示されてもよい。 When a remote participant using the remote participant device 20-1 inputs an operation to select the distribution request button b3, the remote participant can make a distribution request to technical staff A. For example, if the on-site worker device 10 is equipped with a lamp, the distribution request can be realized by turning on the lamp. Alternatively, the distribution request can be realized by outputting a specified sound from the microphone 120. If the area to which the coordinate p3 belongs is "area Ra," the area "area Ra" may be displayed at the current position p as the current location of "technical staff A." Alternatively, the corresponding floor (e.g., the first floor) on the map may be displayed together with area Ra as the current location of "technical staff A."

　「技術職員Ｂ」の現在位置は、マップにおける座標ｐ４であり、マップにおける座標ｐ４が吹き出し表示Ｆ４により表示されている。吹き出し表示Ｆ４には、「技術職員Ｂ」の現在のステータス「配信中」が含まれている。また、吹き出し表示Ｆ４には、対話参加ボタンｂ４が含まれている。 The current location of "Technical Staff B" is coordinate p4 on the map, and coordinate p4 on the map is displayed by balloon display F4. Balloon display F4 includes "Technical Staff B's" current status, "Streaming." Balloon display F4 also includes a dialogue participation button b4.

　なお、ステータスの種類としては、「録画中」および「配信中」以外にも、「録画配信なし」および「対応不可」などが挙げられる。「録画配信なし」は、３６０度映像が配信も保存もされていない状態である。「対応不可」は、遠隔参加者からの配信依頼を受けたとしても、配信依頼に応じられない状態である。ステータスは、現場作業者名に関連付けられている。 In addition to "recording" and "streaming," other types of status include "no recording or streaming" and "unavailable." "No recording or streaming" means that the 360-degree video is neither streamed nor saved. "Unavailable" means that a streaming request cannot be fulfilled, even if one is received from a remote participant. The status is associated with the name of the on-site worker.

　なお、現場作業者のステータスは、「録画中」から「対応不可」に自動的に切り替わってもよい。例えば、切り替えのタイミングは、現場作業者デバイス１０によって所定の二次元コードが認識された場合であってもよい。あるいは、切り替えのタイミングは、現場作業者デバイス１０が備えるマイクロフォン１２０によって検出された音量が閾値より大きい場合であってもよいし、カメラ１１０によって検出された画像の明るさが閾値より低い場合であってもよい。あるいは、切り替えのタイミングは、通信部１６０による通信品質が所定の品質よりも低下した場合であってもよい。 The status of the on-site worker may automatically switch from "recording" to "unavailable". For example, the timing of the switch may be when a specified two-dimensional code is recognized by the on-site worker device 10. Alternatively, the timing of the switch may be when the volume detected by the microphone 120 provided in the on-site worker device 10 is greater than a threshold value, or when the brightness of the image detected by the camera 110 is lower than a threshold value. Alternatively, the timing of the switch may be when the communication quality by the communication unit 160 falls below a specified quality.
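
The automatic switch from "recording" to "unavailable" can be sketched as a single function over the trigger conditions listed above. This is an illustrative sketch: the function name, the parameter names, the normalized 0-to-1 scales, and all threshold values are assumptions introduced here, since the text only speaks of unspecified thresholds and a predetermined quality.

```python
def next_status(status, *, switch_code_seen=False, volume=0.0, brightness=1.0,
                link_quality=1.0, volume_max=0.8, brightness_min=0.2,
                quality_min=0.5):
    """Return the worker's status after checking the automatic-switch triggers.

    Only the "recording" -> "unavailable" transition is modelled; any other
    status is returned unchanged.
    """
    if status != "recording":
        return status
    triggered = (
        switch_code_seen                 # a predetermined code was recognized
        or volume > volume_max           # microphone volume above its threshold
        or brightness < brightness_min   # image darker than its threshold
        or link_quality < quality_min    # communication quality has degraded
    )
    return "unavailable" if triggered else status
```

For example, `next_status("recording", brightness=0.1)` yields `"unavailable"`, whereas the same call with the status `"streaming"` leaves the status as it is.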

　遠隔参加者デバイス２０－１を使用する遠隔参加者が対話参加ボタンｂ４を選択する操作を入力すると、当該遠隔参加者は、技術職員Ｂが行っている対話に参加することが可能である。例えば、対話への参加は、遠隔参加者が発した音声が技術職員Ｂに提示され、技術職員Ｂが発した音声が遠隔参加者に提示されることにより実現され得る。なお、座標ｐ４が属するエリアを「エリアＲｂ」とした場合、「技術職員Ｂ」の現在位置として、エリア「エリアＲｂ」が現在位置ｐに表示されてもよい。あるいは、「技術職員Ｂ」の現在位置として、エリアＲｂとともにマップに対応するフロア（例えば、１階など）が表示されてもよい。 When a remote participant using the remote participant device 20-1 inputs an operation to select the dialogue participation button b4, the remote participant can join the dialogue being conducted by technical staff B. For example, participation in a dialogue can be achieved by the remote participant's voice being presented to technical staff B, and the voice spoken by technical staff B being presented to the remote participant. If the area to which coordinate p4 belongs is "area Rb," then area "area Rb" may be displayed at current position p as the current location of "technical staff B." Alternatively, the corresponding floor (e.g., the first floor) on the map may be displayed together with area Rb as the current location of "technical staff B."

　映像リストＬ１は、遠隔参加者デバイス２０－１によって表示されているマップに関連付けられた３６０度映像のリストである。指摘情報リストＬ２は、マップに関連付けられたタグに含まれる指摘情報のリストである。 The video list L1 is a list of 360-degree videos associated with the map displayed by the remote participant device 20-1. The pointed-out information list L2 is a list of pointed-out information included in tags associated with the map.

　また、遠隔参加者デバイス２０－１によって表示される情報（例えば、吹き出し表示Ｆ１および吹き出し表示Ｆ２など）が多くなってしまうことも想定され得る。そのため、表示される情報に対するフィルタリングが可能であってもよい。検索キー入力欄Ｅ１、緊急ボタンｂ５、指摘済みボタンｂ６および実行ボタンｂ７などは、フィルタリングに用いられる。しかし、これらを用いたフィルタリングは、詳細画面５１（図３０）および詳細画面５２（図３２）におけるフィルタリングと同様であるため、ここでのフィルタリングの詳細な説明は省略する。 It is also conceivable that the amount of information displayed by the remote participant device 20-1 (for example, speech bubble display F1 and speech bubble display F2) will increase. For this reason, filtering of the displayed information may be possible. The search key input field E1, emergency button b5, pointed out button b6, and execute button b7 are used for filtering. However, filtering using these is similar to filtering on the details screen 51 (FIG. 30) and the details screen 52 (FIG. 32), so a detailed description of filtering will be omitted here.

　遠隔参加者が詳細画面表示ボタンｂ２を選択する操作を入力した場合を想定する。かかる場合には、遠隔参加者デバイス２０－１は、吹き出し表示Ｆ２によって表示される座標ｐ２をサーバ３０に送信し、サーバ３０は、座標ｐ２に関連付けられた時刻情報と到達者名とに基づく詳細画面を遠隔参加者デバイス２０－１に送信する。遠隔参加者デバイス２０－１において、制御部２４０は、ディスプレイ２８０への詳細画面の表示を制御する。 Let us assume that a remote participant inputs an operation to select the details screen display button b2. In such a case, the remote participant device 20-1 transmits the coordinate p2 displayed by the speech bubble display F2 to the server 30, and the server 30 transmits a details screen based on the time information and the name of the person who arrived associated with the coordinate p2 to the remote participant device 20-1. In the remote participant device 20-1, the control unit 240 controls the display of the details screen on the display 280.

　図３０は、詳細画面表示ボタンｂ２が選択された場合に遠隔参加者デバイス２０－１によって表示される詳細画面の例を示す図である。図３０を参照すると、かかる詳細画面の例として詳細画面５１が示されている。 FIG. 30 shows an example of a details screen displayed by the remote participant device 20-1 when the details screen display button b2 is selected. Referring to FIG. 30, details screen 51 is shown as an example of such a details screen.

　詳細画面５１には、座標ｐ２に関連付けられた、時刻情報「９／２２　１６：３５」と到達者名「所長」との組み合わせと、時刻情報「９／２３　１１：１５」と到達者名「技術職員Ｂ」との組み合わせと、時刻情報「９／２４　１４：１５」と到達者名「技術職員Ａ」との組み合わせとが含まれている。また、それぞれの時刻情報に対応する位置に、選択対象時刻ｔ１１～ｔ１３が表示されている。 Detail screen 51 includes a combination of time information "9/22 16:35" and the name of the person who arrived "Director," a combination of time information "9/23 11:15" and the name of the person who arrived "Technical Staff B," and a combination of time information "9/24 14:15" and the name of the person who arrived "Technical Staff A," all of which are associated with coordinate p2. In addition, selectable times t11 to t13 are displayed at the positions corresponding to each piece of time information.

　選択対象時刻ｔ１１～ｔ１３のいずれかを選択する操作が遠隔参加者によって入力されると、遠隔参加者によって選択された選択対象時刻を示す時刻情報と、遠隔参加者の視点方向およびＦｏＶとが、遠隔参加者デバイス２０－１からサーバ３０に送信される。 When a remote participant inputs an operation to select one of the selection target times t11 to t13, time information indicating the selection target time selected by the remote participant, as well as the viewpoint direction and FoV of the remote participant, are transmitted from the remote participant device 20-1 to the server 30.

　サーバ３０は、遠隔参加者デバイス２０－１から受信した時刻情報に関連付けられた３６０度映像における所定のフレームと、受信した遠隔参加者の視点方向およびＦｏＶとに応じた静止画像を遠隔参加者デバイス２０－１に送信する。遠隔参加者デバイス２０－１は、サーバ３０から送信された静止画像を表示する。遠隔参加者は、このように表示された静止画像を見ることにより、作業現場の同一のエリアの状況を時系列に沿って確認することが可能である。 The server 30 transmits to the remote participant device 20-1 a still image that corresponds to a predetermined frame of the 360-degree video associated with the time information received from the remote participant device 20-1 and to the received viewpoint direction and FoV of the remote participant. The remote participant device 20-1 displays the still image transmitted from the server 30. By viewing the still images displayed in this way, the remote participant can check the situation of the same area of the work site in chronological order.

　なお、所定のフレームは、３６０度映像を構成する複数のフレームのいずれのフレームであってもよい。例えば、所定のフレームは、３６０度映像を構成する複数のフレームの最後のフレームであってもよいし、最初のフレームであってもよい。 Note that the specified frame may be any of the multiple frames that make up the 360-degree video. For example, the specified frame may be the last frame or the first frame of the multiple frames that make up the 360-degree video.

　図３０に示された例では、選択対象時刻ｔ１３が遠隔参加者によって選択されたことに基づいて、選択対象時刻ｔ１３を示す時刻情報「９／２４　１４：１５」に関連付けられた３６０度映像と遠隔参加者の視点方向およびＦｏＶとに応じた静止画像Ｈ３１が表示されている。 In the example shown in FIG. 30, based on the selection of the selection target time t13 by the remote participant, a still image H31 is displayed that corresponds to the 360-degree video associated with the time information "9/24 14:15" indicating the selection target time t13 and to the viewpoint direction and FoV of the remote participant.

　なお、遠隔参加者が動画表示ボタンｂ１１を選択する操作を入力することによって、選択対象時刻ｔ１３を示す時刻情報「９／２４　１４：１５」に関連付けられた３６０度映像と遠隔参加者の視点方向およびＦｏＶとに応じた動画像が表示されてもよい。遠隔参加者は、このように表示された動画像を見ることにより、作業現場の状況の時間変化を詳細に確認することが可能である。 In addition, when the remote participant inputs an operation to select the video display button b11, a moving image may be displayed that corresponds to the 360-degree video associated with the time information "9/24 14:15" indicating the selection target time t13 and to the viewpoint direction and FoV of the remote participant. By viewing the moving image displayed in this way, the remote participant can check in detail how the situation at the work site changes over time.

　また、遠隔参加者が選択対象時刻ｔ１１～ｔ１３のいずれか二つを選択する操作を入力し、並べて比較ボタンｂ１２を選択する操作を入力することによって、選択された二つの選択対象時刻に関連する静止画像が並べて表示されてもよい。遠隔参加者は、このように表示された二つの静止画像を見ることにより、互いに異なる二つの時刻における作業現場の状況を比較しながら確認することが可能である。 Also, a remote participant may input an operation to select any two of the selection target times t11 to t13 and select the side-by-side comparison button b12, so that still images related to the two selected selection target times are displayed side-by-side. By looking at the two still images displayed in this way, the remote participant can compare and confirm the situation at the work site at the two different times.

　詳細画面５１には、フィルタリングに用いられる、検索キー入力欄Ｅ１、期間入力欄Ｅ２および実行ボタンｂ７などが含まれている。 The details screen 51 includes a search key input field E1, a period input field E2, and an execute button b7, which are used for filtering.

　例えば、検索キーによるフィルタリングが可能であってもよい。例えば、遠隔参加者が検索キー入力欄Ｅ１に「所長」を入力し、実行ボタンｂ７を選択する操作を操作検出部２３０に入力した場合を想定する。かかる場合には、遠隔参加者デバイス２０－１において、制御部２４０は、検索キー入力欄Ｅ１に入力された「所長」を、指定された検索キーとしてサーバ３０に送信されるように通信部２６０を制御する。 For example, filtering by search key may be possible. For example, assume that a remote participant inputs "Director" into the search key input field E1 and inputs an operation to select the execute button b7 to the operation detection unit 230. In such a case, in the remote participant device 20-1, the control unit 240 controls the communication unit 260 to transmit "Director" input into the search key input field E1 to the server 30 as the specified search key.

　サーバ３０において、通信部３６０は、検索キー「所長」を受信し、制御部３４０は、座標ｐ２に関連付けられた到達者の中から、検索キー「所長」を含んだ到達者を特定する。そして、制御部３４０は、検索キー「所長」を含んだ到達者「所長」と時刻情報「９／２２　１６：３５」との組み合わせのみが、遠隔参加者デバイス２０－１に送信されるように通信部３６０を制御する。 In the server 30, the communication unit 360 receives the search key "Director", and the control unit 340 identifies, from among the persons who arrived that are associated with the coordinate p2, those whose names contain the search key "Director". The control unit 340 then controls the communication unit 360 so that only the combination of the person who arrived "Director", whose name contains the search key "Director", and the time information "9/22 16:35" is transmitted to the remote participant device 20-1.

　これによって、遠隔参加者デバイス２０－１においては、時刻情報「９／２２　１６：３５」と到達者名「所長」との組み合わせと、当該組み合わせに対応する選択対象時刻ｔ１１のみが表示されるようになる。このようなフィルタリングによって、遠隔参加者は、到達者名「所長」に関連付けられた静止画像および動画像のみを確認したい場合などに、確認したい静止画像および動画像へのアクセスが容易となる。 As a result, on the remote participant device 20-1, only the combination of the time information "9/22 16:35" and the arrival person's name "Director", and the selection target time t11 corresponding to this combination, are displayed. This type of filtering makes it easy for remote participants to access the still images and videos they wish to view, for example, when they wish to view only the still images and videos associated with the arrival person's name "Director".

　あるいは、遠隔参加者が期間入力欄Ｅ２に「９／２４～」を入力し、実行ボタンｂ７を選択する操作を操作検出部２３０に入力した場合を想定する。かかる場合には、遠隔参加者デバイス２０－１において、制御部２４０は、期間入力欄Ｅ２に入力された「９／２４～」を、指定された期間としてサーバ３０に送信されるように通信部２６０を制御する。 Alternatively, assume that a remote participant inputs "9/24~" into the period input field E2 and inputs an operation to select the execute button b7 to the operation detection unit 230. In such a case, in the remote participant device 20-1, the control unit 240 controls the communication unit 260 so that "9/24~" input into the period input field E2 is sent to the server 30 as the specified period.

　サーバ３０において、通信部３６０は、期間「９／２４～」を受信し、制御部３４０は、座標ｐ２に関連付けられた時刻情報の中から、期間「９／２４～」に属する時刻情報を特定する。そして、制御部３４０は、期間「９／２４～」に属する時刻情報「９／２４　１４：１５」と到達者「技術職員Ａ」の組み合わせのみが、遠隔参加者デバイス２０－１に送信されるように通信部３６０を制御する。 In the server 30, the communication unit 360 receives the period "9/24~", and the control unit 340 identifies time information belonging to the period "9/24~" from the time information associated with the coordinate p2. The control unit 340 then controls the communication unit 360 so that only the combination of time information "9/24 14:15" belonging to the period "9/24~" and the arrival person "Technical Staff A" is transmitted to the remote participant device 20-1.

　これによって、遠隔参加者デバイス２０－１においては、時刻情報「９／２４　１４：１５」と到達者名「技術職員Ａ」との組み合わせに対応する選択対象時刻ｔ１３のみが表示されるようになる。このようなフィルタリングによって、遠隔参加者は、期間「９／２４～」に属する時刻情報「９／２４　１４：１５」に関連付けられた静止画像および動画像のみを確認したい場合などに、確認したい静止画像および動画像へのアクセスが容易となる。 As a result, only the selection target time t13 corresponding to the combination of the time information "9/24 14:15" and the arrival person's name "Technical Staff A" is displayed on the remote participant device 20-1. This type of filtering makes it easy for remote participants to access the still images and videos they wish to view, for example, when they wish to view only the still images and videos associated with the time information "9/24 14:15" that belongs to the period "9/24~".
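The search-key and period filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `ArrivalRecord` type, the `filter_arrivals` function, the year 2023, and the middle record for "Technical Staff B" are hypothetical. Only the "Director" and "Technical Staff A" records are taken from the example above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ArrivalRecord:
    """One (time information, arrival person) pair associated with a map coordinate."""
    coordinate: str
    arrived_at: datetime
    person: str


def filter_arrivals(
    records: List[ArrivalRecord],
    coordinate: str,
    search_key: Optional[str] = None,
    period_start: Optional[datetime] = None,
) -> List[ArrivalRecord]:
    """Keep records at `coordinate` that match the optional search key and period."""
    result = []
    for record in records:
        if record.coordinate != coordinate:
            continue
        # Search key: substring match against the arrival person's name.
        if search_key is not None and search_key not in record.person:
            continue
        # Open-ended period such as "9/24~": keep records at or after the start.
        if period_start is not None and record.arrived_at < period_start:
            continue
        result.append(record)
    return result


# Hypothetical records for coordinate p2 (the year is assumed).
RECORDS = [
    ArrivalRecord("p2", datetime(2023, 9, 22, 16, 35), "Director"),
    ArrivalRecord("p2", datetime(2023, 9, 23, 10, 0), "Technical Staff B"),
    ArrivalRecord("p2", datetime(2023, 9, 24, 14, 15), "Technical Staff A"),
]

by_key = filter_arrivals(RECORDS, "p2", search_key="Director")
by_period = filter_arrivals(RECORDS, "p2", period_start=datetime(2023, 9, 24))
```

In this sketch the server side would return `by_key` or `by_period`, and the remote participant device would display only the selection target times for the returned records.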

　（遠隔参加者デバイスによる第２の表示例） (Second Display Example by the Remote Participant Device)
　図３１は、遠隔参加者デバイス２０－１によるマップの第２の表示例を示す図である。図３１に示されるように、遠隔参加者デバイス２０－１によってマップが表示されている。図２９に示された例と同様に、マップにおける座標ｐ１が吹き出し表示Ｆ１によって表示されている。 FIG. 31 is a diagram showing a second display example of a map by the remote participant device 20-1. As shown in FIG. 31, a map is displayed by the remote participant device 20-1. As in the example shown in FIG. 29, the coordinate p1 on the map is displayed by a balloon display F1.

　さらに、マップにおける座標ｐ５、返答者名「技術職員Ａ」および返答情報「××について対応不可」が含まれたタグがマップに関連付けられている場合を想定する。かかる場合には、遠隔参加者デバイス２０－１は、マップにおける座標ｐ５を吹き出し表示Ｆ８によって表示してもよい。吹き出し表示Ｆ８には、返答者名「技術職員Ａ」および返答情報「××について対応不可」が含まれている。また、吹き出し表示Ｆ８には、詳細画面表示ボタンｂ８が含まれている。なお、図３１には示されていないが、座標ｐ５が属するエリアを「エリアＲｄ」とする。 Furthermore, assume that a tag including the coordinate p5 on the map, the responder name "Technical Staff A", and the response information "Cannot handle XX" is associated with the map. In such a case, the remote participant device 20-1 may display the coordinate p5 on the map using a balloon display F8. The balloon display F8 includes the responder name "Technical Staff A" and the response information "Cannot handle XX". The balloon display F8 also includes a details screen display button b8. Although not shown in FIG. 31, the area to which the coordinate p5 belongs is referred to as "area Rd".

　また、マップにおける座標ｐ５には、指摘者名と指摘情報との組み合わせ、および、返答者名と返答情報との組み合わせが他にも関連付けられている場合を想定するが、最新の時刻情報に関連付けられた返答者名「技術職員Ａ」と返答情報「××について対応不可」との組み合わせのみが表示されている。 It is assumed that other combinations of the name of the person who pointed out the problem and the pointed out information, and the name of the person who responded and the response information may be associated with coordinate p5 on the map, but only the combination of the responder name "Technical Staff A" and the response information "Cannot handle XX" associated with the most recent time information is displayed.
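The rule that the balloon display shows only the combination associated with the most recent time information can be sketched as below. The function name `latest_entry`, the dictionary layout, and the timestamps are assumptions made for illustration.

```python
from datetime import datetime


def latest_entry(entries):
    """Return the entry with the most recent time information, or None if empty."""
    return max(entries, key=lambda entry: entry["time"], default=None)


# Hypothetical entries associated with coordinate p5 (timestamps are assumed).
entries_at_p5 = [
    {"time": datetime(2023, 9, 22, 16, 40), "name": "Director",
     "text": "Pointed out about XX"},
    {"time": datetime(2023, 9, 24, 14, 20), "name": "Technical Staff A",
     "text": "Cannot handle XX"},
]

# Only this entry would be shown in the balloon display.
balloon = latest_entry(entries_at_p5)
```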

　遠隔参加者が詳細画面表示ボタンｂ８を選択する操作を入力した場合を想定する。かかる場合には、遠隔参加者デバイス２０－１は、吹き出し表示Ｆ８によって表示される座標ｐ５をサーバ３０に送信し、サーバ３０は、座標ｐ５が属するエリアＲｄ内の座標に関連付けられた時刻情報と指摘者名または返答者名と指摘情報または返答情報とに基づく詳細画面を遠隔参加者デバイス２０－１に送信する。遠隔参加者デバイス２０－１において、制御部２４０は、ディスプレイ２８０への詳細画面の表示を制御する。 Assume that a remote participant inputs an operation to select the details screen display button b8. In this case, the remote participant device 20-1 transmits the coordinate p5 displayed by the balloon display F8 to the server 30, and the server 30 transmits to the remote participant device 20-1 a details screen based on the time information, the name of the person who pointed out the problem or the name of the person who responded, and the pointed out information or the response information, all of which are associated with coordinates in the area Rd to which the coordinate p5 belongs. In the remote participant device 20-1, the control unit 240 controls the display of the details screen on the display 280.

　図３２は、詳細画面表示ボタンｂ８が選択された場合に遠隔参加者デバイス２０－１によって表示される詳細画面の例を示す図である。図３２を参照すると、かかる詳細画面の例として詳細画面５２が示されている。 FIG. 32 is a diagram showing an example of a details screen displayed by the remote participant device 20-1 when the details screen display button b8 is selected. Referring to FIG. 32, details screen 52 is shown as an example of such a details screen.

　詳細画面５２には、座標ｐ５が属するエリアＲｄ内の座標に関連付けられた、指摘者名「所長」と指摘情報「××について指摘」との組み合わせと、指摘者名「所長」と指摘情報「△△について指摘」との組み合わせと、返答者名「技術職員Ａ」と返答情報「△△について状況確認」との組み合わせと、返答者名「技術職員Ｂ」と返答情報「△△について対応済み」との組み合わせとが含まれている。 The details screen 52 includes the following combinations, all of which are associated with coordinates in the area Rd to which the coordinate p5 belongs: the name of the person who pointed out the problem "Director" and the pointed out information "Pointed out about XX"; the name of the person who pointed out the problem "Director" and the pointed out information "Pointed out about △△"; the responder name "Technical Staff A" and the response information "Status check on △△"; and the responder name "Technical Staff B" and the response information "△△ dealt with".

　紙面の都合上、座標ｐ５に関連付けられた時刻情報は、詳細画面５２から省略されている。それぞれの時刻情報に対応する位置には、選択対象時刻ｔ２４～ｔ２８が表示されている。選択対象時刻ｔ２４～ｔ２８の使用方法は、図３０に示された選択対象時刻ｔ１１～ｔ１３の使用方法と同様である。 Due to space limitations, the time information associated with coordinate p5 has been omitted from the details screen 52. Selection target times t24 to t28 are displayed at positions corresponding to the respective time information. The method of using selection target times t24 to t28 is the same as the method of using selection target times t11 to t13 shown in FIG. 30.

　図３２に示された例では、選択対象時刻ｔ２８が遠隔参加者によって選択されたことに基づいて、選択対象時刻ｔ２８を示す時刻情報に関連付けられた３６０度映像と遠隔参加者の視点方向およびＦｏＶとに応じた静止画像Ｈ６１が表示されている。動画表示ボタンｂ１１および並べて比較ボタンｂ１２の使用方法は、図３０を参照しながら既に説明した通りである。 In the example shown in FIG. 32, the selection target time t28 has been selected by the remote participant, and a still image H61 is therefore displayed. The still image H61 corresponds to the 360-degree video associated with the time information indicating the selection target time t28 and to the viewpoint direction and FoV of the remote participant. The method of using the video display button b11 and the side-by-side comparison button b12 has already been described with reference to FIG. 30.

　詳細画面５２には、フィルタリングに用いられる、検索キー入力欄Ｅ１、緊急ボタンｂ５、指摘済みボタンｂ６、期間入力欄Ｅ２および実行ボタンｂ７などが含まれている。期間入力欄Ｅ２の使用方法は、図３０を参照しながら既に説明した通りである。また、図３２に示された検索キー入力欄Ｅ１の使用方法は、図３０を参照しながら説明した検索キー入力欄Ｅ１の使用方法とほぼ同様であるが、到達者名の代わりに、指摘者名、指摘情報、返答者名および返答情報などが検索対象とされてよい。ここでは、指摘情報および返答情報が検索対象とされる場合を例として説明する。 The details screen 52 includes a search key input field E1, an emergency button b5, an "Indicated" button b6, a period input field E2, and an execute button b7, which are used for filtering. The method of using the period input field E2 has already been described with reference to FIG. 30. The method of using the search key input field E1 shown in FIG. 32 is almost the same as that described with reference to FIG. 30, except that the search targets may be the name of the person who pointed out the problem, the pointed out information, the responder name, and the response information instead of the arrival person's name. Here, a case in which the pointed out information and the response information are the search targets is described as an example.

　例えば、遠隔参加者が検索キー入力欄Ｅ１にオブジェクトの種類「××」を入力し、実行ボタンｂ７を選択する操作を操作検出部２３０に入力した場合を想定する。かかる場合には、遠隔参加者デバイス２０－１において、制御部２４０は、検索キー入力欄Ｅ１に入力された「××」を、指定された検索キーとしてサーバ３０に送信されるように通信部２６０を制御する。 For example, assume that a remote participant inputs the object type "XX" into the search key input field E1 and inputs an operation to select the execute button b7 to the operation detection unit 230. In such a case, in the remote participant device 20-1, the control unit 240 controls the communication unit 260 so that the "XX" input into the search key input field E1 is sent to the server 30 as the specified search key.

　サーバ３０において、通信部３６０は、検索キー「××」を受信し、制御部３４０は、座標ｐ５が属するエリアＲｄ内の座標に関連付けられた指摘情報の中から、検索キー「××」を含んだ指摘情報および返答情報を特定する。そして、制御部３４０は、検索キー「××」を含んだ指摘情報と、当該指摘情報に関連付けられた指摘者名および時刻情報、検索キー「××」を含んだ返答情報と、当該返答情報に関連付けられた返答者名および時刻情報のみが、遠隔参加者デバイス２０－１に送信されるように通信部３６０を制御する。 In the server 30, the communication unit 360 receives the search key "XX", and the control unit 340 identifies the indication information and response information that contain the search key "XX" from among the indication information associated with coordinates in area Rd to which coordinate p5 belongs. The control unit 340 then controls the communication unit 360 so that only the indication information that contains the search key "XX", the name of the person who indicated the indication and the time information associated with the indication information, the response information that contains the search key "XX", and the name of the person who responded and the time information associated with the response information are transmitted to the remote participant device 20-1.

　図３３は、検索キーによるフィルタリングが行われた後の詳細画面の例を示す図である。図３３を参照すると、かかる詳細画面の例として詳細画面５３が示されている。図３３に示されるように、遠隔参加者デバイス２０－１においては、検索キー「××」を含んだ指摘情報「××について指摘」に関連付けられた時刻情報に対応する選択対象時刻ｔ２４、検索キー「××」を含んだ返答情報「××について対応不可」に関連付けられた時刻情報に対応する選択対象時刻ｔ２８のみが表示されるようになる。また、選択対象時刻ｔ２４が選択され、選択対象時刻ｔ２４を示す時刻情報に対応する、タグＴ４１および静止画像Ｈ６２が表示されている。 FIG. 33 is a diagram showing an example of a details screen after filtering using a search key. Referring to FIG. 33, details screen 53 is shown as an example of such a details screen. As shown in FIG. 33, on the remote participant device 20-1, only the selection target time t24 corresponding to the time information associated with the pointed out information "Pointed out about XX" containing the search key "XX" and the selection target time t28 corresponding to the time information associated with the response information "Cannot handle XX" containing the search key "XX" are displayed. In addition, the selection target time t24 is selected, and a tag T41 and a still image H62 corresponding to the time information indicating the selection target time t24 are displayed.
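The area-scoped search-key filtering that leads from details screen 52 to details screen 53 can be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: modelling an area as an axis-aligned rectangle, the `Area` and `Note` types, all coordinates, and the assignment of t25 to t27 to particular entries are hypothetical. Only the pairing of t24 and t28 with the two "XX" entries follows the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Area:
    """Axis-aligned rectangular area on the map (a simplifying assumption)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class Note:
    """Pointed out or response information tied to a coordinate and a selectable time."""
    x: float
    y: float
    kind: str        # "indication" or "response"
    author: str
    text: str
    time_label: str


def filter_notes_in_area(notes: List[Note], area: Area, search_key: str) -> List[Note]:
    """Keep notes whose coordinate lies in `area` and whose text contains the key."""
    return [n for n in notes if area.contains(n.x, n.y) and search_key in n.text]


AREA_RD = Area("Rd", 0.0, 0.0, 10.0, 10.0)
NOTES = [
    Note(2.0, 3.0, "indication", "Director", "Pointed out about XX", "t24"),
    Note(2.5, 3.5, "indication", "Director", "Pointed out about △△", "t25"),
    Note(2.5, 3.5, "response", "Technical Staff A", "Status check on △△", "t26"),
    Note(2.5, 3.5, "response", "Technical Staff B", "△△ dealt with", "t27"),
    Note(2.0, 3.0, "response", "Technical Staff A", "Cannot handle XX", "t28"),
    # Same text but outside area Rd, so it must not be returned.
    Note(50.0, 50.0, "indication", "Director", "Pointed out about XX", "t99"),
]

hits = filter_notes_in_area(NOTES, AREA_RD, "XX")
```

With these sample notes, only the entries labelled t24 and t28 remain, which matches the two selection target times shown on details screen 53.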

　さらに、指摘情報が属するカテゴリによるフィルタリングが可能であってもよい。例えば、カテゴリは、オブジェクトへの対応が必要であるか否かを示す情報（是正の要否）であってもよい。一例として、サーバ３０において、制御部３４０は、指摘情報に対応が必要であることを示す所定の用語（例えば、「してください」「お願いします」など）が含まれているか否かによって、オブジェクトへの対応が必要であるか否かを判断してもよい。 Furthermore, filtering by the category to which the pointed out information belongs may be possible. For example, the category may be information indicating whether or not action needs to be taken on the object (whether or not correction is required). As one example, in the server 30, the control unit 340 may determine whether or not action needs to be taken on the object based on whether or not the pointed out information contains a predetermined term indicating a need for action (for example, 「してください」 or 「お願いします」, both of which roughly correspond to "please").

　あるいは、カテゴリは、オブジェクトへの対応の緊急度合いを示す情報であってもよい。一例として、サーバ３０において、制御部３４０は、指摘情報に緊急を示す所定の用語（例えば、「至急」など）が含まれているか否かによって、オブジェクトへの対応が緊急であるか否かを判断してもよい。カテゴリは、制御部３４０によって指摘情報に関連付けられてよい。 Alternatively, the category may be information indicating the degree of urgency of responding to the object. As an example, in the server 30, the control unit 340 may determine whether or not responding to the object is urgent based on whether or not the indication information contains a predetermined term indicating urgency (e.g., "urgent"). The category may be associated with the indication information by the control unit 340.
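The keyword-based category determination described above can be sketched as follows. The term lists reuse the examples given in the text. The category labels `"action_required"` and `"urgent"`, the function name `categorize`, and the sample sentences are assumptions for illustration.

```python
from typing import Set

# Predetermined terms, taken from the examples in the description.
ACTION_TERMS = ("してください", "お願いします")
URGENT_TERMS = ("至急",)


def categorize(indication_text: str) -> Set[str]:
    """Return the set of categories implied by terms found in the pointed out information."""
    categories = set()
    if any(term in indication_text for term in ACTION_TERMS):
        categories.add("action_required")
    if any(term in indication_text for term in URGENT_TERMS):
        categories.add("urgent")
    return categories


urgent_request = categorize("至急、××を修正してください")
plain_remark = categorize("××について指摘")
```

A simple substring test is used here. A real system might need morphological analysis or a recognition model to avoid false matches, which the description leaves open.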

　例えば、遠隔参加者が緊急ボタンｂ５を選択し、実行ボタンｂ７を選択する操作を操作検出部２３０に入力した場合を想定する。かかる場合には、遠隔参加者デバイス２０－１において、制御部２４０は、カテゴリ「緊急」を、指定されたカテゴリとしてサーバ３０に送信されるように通信部２６０を制御する。 For example, assume that a remote participant selects the emergency button b5 and then inputs an operation to select the execute button b7 to the operation detection unit 230. In such a case, in the remote participant device 20-1, the control unit 240 controls the communication unit 260 so that the category "emergency" is sent to the server 30 as the specified category.

　サーバ３０において、通信部３６０は、カテゴリ「緊急」を受信し、制御部３４０は、座標ｐ５が属するエリアＲｄ内の座標に関連付けられた指摘情報の中から、カテゴリ「緊急」に属する指摘情報を特定する。そして、制御部３４０は、カテゴリ「緊急」に属する指摘情報と指摘者名と時刻情報との組み合わせのみが、遠隔参加者デバイス２０－１に送信されるように通信部３６０を制御する。 In the server 30, the communication unit 360 receives the category "emergency", and the control unit 340 identifies the indication information belonging to the category "emergency" from the indication information associated with coordinates in the area Rd to which the coordinate p5 belongs. The control unit 340 then controls the communication unit 360 so that only the combination of the indication information belonging to the category "emergency", the name of the person who indicated the indication, and time information is transmitted to the remote participant device 20-1.

　これによって、遠隔参加者デバイス２０－１においては、サーバ３０から送信された指摘情報と指摘者名と時刻情報との組み合わせと、当該組み合わせに対応する選択対象時刻のみが表示されるようになる。なお、指摘情報の代わりに返答情報が属するカテゴリが用いられてもよい。例えば、カテゴリは、遠隔参加者から現場作業者に対する称賛を示す情報であるか否かを示す情報などであってもよい。 As a result, on the remote participant device 20-1, only the combination of the pointed out information, the name of the person who pointed out the problem, and the time information sent from the server 30, and the selection target time corresponding to that combination, are displayed. Note that a category to which the response information belongs may be used instead of a category to which the pointed out information belongs. For example, the category may be information indicating whether or not the information expresses praise from the remote participant to the on-site worker.

　さらに、オブジェクトへの対応状況によるフィルタリングが可能であってもよい。例えば、対応状況は、オブジェクトに対する指摘が済んだか否かを示す情報（指摘済みまたは未指摘）であってもよい。一例として、サーバ３０において、制御部３４０は、指摘情報認識モデルＭ２１によって指摘情報が認識されたか否かによって、オブジェクトに対する指摘が済んだか否かを判断してもよい。 Furthermore, filtering may be possible based on the response status of the object. For example, the response status may be information indicating whether or not the object has been pointed out (pointed out or not pointed out). As an example, in the server 30, the control unit 340 may determine whether or not the object has been pointed out based on whether or not the pointed out information has been recognized by the pointed out information recognition model M21.

　あるいは、対応状況は、オブジェクトが確認されたか否かを示す情報（確認済みまたは未確認）であってもよい。一例として、サーバ３０において、制御部３４０は、返答情報に確認済みを示す所定の用語（例えば、「確認しました」など）が含まれているか否かによって、オブジェクトが確認されたか否かを判断してもよい。 Alternatively, the response status may be information indicating whether or not the object has been confirmed (confirmed or unconfirmed). As an example, in the server 30, the control unit 340 may determine whether or not the object has been confirmed based on whether or not the response information contains a predetermined term indicating confirmation (e.g., "confirmed").

　あるいは、対応状況は、オブジェクトへの対応が済んだか否かを示す情報（対応済みまたは未対応）であってもよい。一例として、サーバ３０において、制御部３４０は、返答情報に対応済みを示す所定の用語（例えば、「対応しました」など）が含まれているか否かによって、オブジェクトへの対応が済んだか否かを判断してもよい。対応状況は、制御部３４０によってオブジェクトに関する情報に関連付けられてよい。 Alternatively, the response status may be information indicating whether or not an object has been responded to (responded to or not responded to). As an example, in the server 30, the control unit 340 may determine whether or not an object has been responded to based on whether or not the response information contains a predetermined term indicating that a response has been made (e.g., "responded to"). The response status may be associated with information about the object by the control unit 340.
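The three response statuses described above (pointed out, confirmed, and responded to) can be sketched as follows. The result of the pointed out information recognition model M21 is represented here simply as a boolean input, and the function name `response_status`, the dictionary keys, and the sample sentences are assumptions for illustration.

```python
from typing import Dict, List

# Predetermined terms, taken from the examples in the description.
CONFIRMED_TERMS = ("確認しました",)
HANDLED_TERMS = ("対応しました",)


def response_status(indication_recognized: bool, responses: List[str]) -> Dict[str, bool]:
    """Derive the response status of one object from its responses.

    `indication_recognized` stands in for the output of the recognition model.
    """
    return {
        "pointed_out": indication_recognized,
        "confirmed": any(term in r for r in responses for term in CONFIRMED_TERMS),
        "handled": any(term in r for r in responses for term in HANDLED_TERMS),
    }


status = response_status(True, ["××を確認しました"])
```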

　例えば、遠隔参加者が指摘済みボタンｂ６を選択し、実行ボタンｂ７を選択する操作を操作検出部２３０に入力した場合を想定する。かかる場合には、遠隔参加者デバイス２０－１において、制御部２４０は、対応状況「指摘済み」を、指定された対応状況としてサーバ３０に送信されるように通信部２６０を制御する。 For example, assume that a remote participant selects the "Indicated" button b6 and then inputs an operation to select the "Execute" button b7 to the operation detection unit 230. In such a case, in the remote participant device 20-1, the control unit 240 controls the communication unit 260 to send the response status "Indicated" to the server 30 as the specified response status.

　サーバ３０において、通信部３６０は、対応状況「指摘済み」を受信し、制御部３４０は、座標ｐ５が属するエリアＲｄ内の座標に関連付けられたオブジェクトに関する情報の中から、対応状況「指摘済み」に関連付けられたオブジェクトに関する情報を特定する。そして、制御部３４０は、オブジェクトに関する情報に関連付けられた、指摘情報と指摘者名と時刻情報との組み合わせのみが、遠隔参加者デバイス２０－１に送信されるように通信部３６０を制御する。 In the server 30, the communication unit 360 receives the response status "pointed out", and the control unit 340 identifies information about the object associated with the response status "pointed out" from information about objects associated with coordinates in the area Rd to which the coordinate p5 belongs. The control unit 340 then controls the communication unit 360 so that only the combination of the pointing out information, the name of the person who pointed out the problem, and time information associated with the information about the object is sent to the remote participant device 20-1.

　これによって、遠隔参加者デバイス２０－１においては、サーバ３０から送信された指摘情報と指摘者名と時刻情報との組み合わせと、当該組み合わせに対応する選択対象時刻のみが表示されるようになる。なお、指摘情報には、オブジェクトに関する情報の例としてのオブジェクトの種類が含まれ得る。例えば、「××について指摘」という指摘情報に含まれる「××」は、オブジェクトの種類に該当し得る。 As a result, on the remote participant device 20-1, only the combination of the pointed out information, the name of the person who pointed out the problem, and the time information sent from the server 30, and the selection target time corresponding to that combination are displayed. The pointed out information may include the type of object as an example of information about the object. For example, the "XX" included in the pointed out information "Pointed out about XX" may correspond to the type of object.

　同様にして、指摘者名によるフィルタリングが可能であってもよい。すなわち、制御部３４０は、座標ｐ５が属するエリアＲｄ内の座標に関連付けられた指摘者名の中から、遠隔参加者によって指定された指摘者名に合致する指摘者名を特定してよい。そして、遠隔参加者デバイス２０－１において、制御部２４０は、特定された指摘者名に関連付けられた、指摘情報と時刻情報との、ディスプレイ２８０による表示を制御してもよい。 Similarly, filtering by the name of the person who pointed out the problem may be possible. That is, the control unit 340 may identify the name of the person who pointed out the problem that matches the name of the person who pointed out the problem specified by the remote participant from among the names of the people who pointed out the problem that are associated with coordinates in the area Rd to which the coordinate p5 belongs. Then, in the remote participant device 20-1, the control unit 240 may control the display on the display 280 of the pointed out information and time information associated with the identified name of the person who pointed out the problem.

　同様にして、返答者名によるフィルタリングが可能であってもよい。すなわち、制御部３４０は、座標ｐ５が属するエリアＲｄ内の座標に関連付けられた返答者名の中から、遠隔参加者によって指定された返答者名に合致する返答者名を特定してよい。そして、遠隔参加者デバイス２０－１において、制御部２４０は、特定された返答者名に関連付けられた、返答情報と時刻情報との、ディスプレイ２８０による表示を制御してもよい。 Similarly, filtering by responder name may be possible. That is, the control unit 340 may identify a responder name that matches the responder name specified by the remote participant from among the responder names associated with coordinates in area Rd to which coordinate p5 belongs. Then, in the remote participant device 20-1, the control unit 240 may control the display on the display 280 of the response information and time information associated with the identified responder name.

　ステータスによるフィルタリングが可能であってもよい。すなわち、制御部３４０は、現場作業者名に関連付けられたステータスの中から、遠隔参加者によって指定されたステータスに合致するステータスを特定してよい。そして、遠隔参加者デバイス２０－１において、制御部２４０は、特定されたステータスに関連付けられた、現場作業者名のディスプレイ２８０による表示を制御してもよい。 Filtering by status may be possible. That is, the control unit 340 may identify a status that matches the status specified by the remote participant from among the statuses associated with the names of on-site workers. Then, in the remote participant device 20-1, the control unit 240 may control the display 280 to display the names of on-site workers associated with the identified status.

　以上、本開示の実施形態の詳細について説明した。 The above describes the details of the embodiment of the present disclosure.

　＜２．ハードウェア構成例＞ <2. Hardware configuration example>
　続いて、図３４を参照して、本開示の実施形態に係るサーバ３０の例としての情報処理装置９００のハードウェア構成例について説明する。図３４は、情報処理装置９００のハードウェア構成例を示すブロック図である。なお、サーバ３０は、必ずしも図３４に示したハードウェア構成の全部を有している必要はなく、サーバ３０の中に、図３４に示したハードウェア構成の一部は存在しなくてもよい。 Next, a hardware configuration example of an information processing device 900 as an example of the server 30 according to an embodiment of the present disclosure will be described with reference to FIG. 34. FIG. 34 is a block diagram showing a hardware configuration example of the information processing device 900. Note that the server 30 does not necessarily have to have all of the hardware configuration shown in FIG. 34, and some of the hardware configuration shown in FIG. 34 may not be present in the server 30.

　図３４に示すように、情報処理装置９００は、ＣＰＵ（Ｃｅｎｔｒａｌ　Ｐｒｏｃｅｓｓｉｎｇ　ｕｎｉｔ）９０１、ＲＯＭ（Ｒｅａｄ　Ｏｎｌｙ　Ｍｅｍｏｒｙ）９０２、およびＲＡＭ（Ｒａｎｄｏｍ　Ａｃｃｅｓｓ　Ｍｅｍｏｒｙ）９０３を含む。また、情報処理装置９００は、ホストバス９０７、ブリッジ９０９、外部バス９１１、インターフェース９１３、入力装置９１５、出力装置９１７、ストレージ装置９１９、ドライブ９２１、接続ポート９２３、通信装置９２５を含んでもよい。情報処理装置９００は、ＣＰＵ９０１に代えて、またはこれとともに、ＤＳＰ（Ｄｉｇｉｔａｌ　Ｓｉｇｎａｌ　Ｐｒｏｃｅｓｓｏｒ）またはＡＳＩＣ（Ａｐｐｌｉｃａｔｉｏｎ　Ｓｐｅｃｉｆｉｃ　Ｉｎｔｅｇｒａｔｅｄ　Ｃｉｒｃｕｉｔ）と呼ばれるような処理回路を有してもよい。 As shown in FIG. 34, the information processing device 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, and a RAM (Random Access Memory) 903. The information processing device 900 may also include a host bus 907, a bridge 909, an external bus 911, an interface 913, an input device 915, an output device 917, a storage device 919, a drive 921, a connection port 923, and a communication device 925. The information processing device 900 may have a processing circuit such as a DSP (Digital Signal Processor) or an ASIC (Application Specific Integrated Circuit) instead of or in addition to the CPU 901.

　ＣＰＵ９０１は、演算処理装置および制御装置として機能し、ＲＯＭ９０２、ＲＡＭ９０３、ストレージ装置９１９、またはリムーバブル記録媒体９２７に記録された各種プログラムに従って、情報処理装置９００内の動作全般またはその一部を制御する。ＲＯＭ９０２は、ＣＰＵ９０１が使用するプログラムや演算パラメータなどを記憶する。ＲＡＭ９０３は、ＣＰＵ９０１の実行において使用するプログラムや、その実行において適宜変化するパラメータなどを一時的に記憶する。ＣＰＵ９０１、ＲＯＭ９０２、およびＲＡＭ９０３は、ＣＰＵバスなどの内部バスにより構成されるホストバス９０７により相互に接続されている。さらに、ホストバス９０７は、ブリッジ９０９を介して、ＰＣＩ（Ｐｅｒｉｐｈｅｒａｌ　Ｃｏｍｐｏｎｅｎｔ　Ｉｎｔｅｒｃｏｎｎｅｃｔ／Ｉｎｔｅｒｆａｃｅ）バスなどの外部バス９１１に接続されている。 The CPU 901 functions as an arithmetic processing device and control device, and controls all or part of the operations within the information processing device 900 in accordance with various programs recorded in the ROM 902, RAM 903, storage device 919, or removable recording medium 927. The ROM 902 stores programs and arithmetic parameters used by the CPU 901. The RAM 903 temporarily stores programs used in the execution of the CPU 901 and parameters that change appropriately during the execution. The CPU 901, ROM 902, and RAM 903 are interconnected by a host bus 907 that is composed of an internal bus such as a CPU bus. Furthermore, the host bus 907 is connected to an external bus 911 such as a PCI (Peripheral Component Interconnect/Interface) bus via a bridge 909.

　入力装置９１５は、例えば、ボタンなど、ユーザによって操作される装置である。入力装置９１５は、マウス、キーボード、タッチパネル、スイッチおよびレバーなどを含んでもよい。また、入力装置９１５は、ユーザの音声を検出するマイクロフォンを含んでもよい。入力装置９１５は、例えば、赤外線やその他の電波を利用したリモートコントロール装置であってもよいし、情報処理装置９００の操作に対応した携帯電話などの外部接続機器９２９であってもよい。入力装置９１５は、ユーザが入力した情報に基づいて入力信号を生成してＣＰＵ９０１に出力する入力制御回路を含む。ユーザは、この入力装置９１５を操作することによって、情報処理装置９００に対して各種のデータを入力したり処理動作を指示したりする。 The input device 915 is a device operated by a user, such as a button. The input device 915 may include a mouse, a keyboard, a touch panel, a switch, a lever, and the like. The input device 915 may also include a microphone that detects the user's voice. The input device 915 may be, for example, a remote control device that uses infrared rays or other radio waves, or an externally connected device 929 such as a mobile phone that supports operation of the information processing device 900. The input device 915 includes an input control circuit that generates an input signal based on information input by the user and outputs it to the CPU 901. The user operates the input device 915 to input various data to the information processing device 900 and instruct processing operations.

　出力装置９１７は、取得した情報をユーザに対して視覚的または聴覚的に通知することが可能な装置で構成される。出力装置９１７は、例えば、ＬＣＤ（Ｌｉｑｕｉｄ　Ｃｒｙｓｔａｌ　Ｄｉｓｐｌａｙ）、有機ＥＬ（Ｅｌｅｃｔｒｏ－Ｌｕｍｉｎｅｓｃｅｎｃｅ）ディスプレイなどの表示装置、スピーカおよびヘッドホンなどの音出力装置などであり得る。また、出力装置９１７は、ＰＤＰ（Ｐｌａｓｍａ　Ｄｉｓｐｌａｙ　Ｐａｎｅｌ）、プロジェクタ、ホログラム、プリンタ装置などを含んでもよい。出力装置９１７は、情報処理装置９００の処理により得られた結果を、テキストまたは画像などの映像として出力したり、音声または音響などの音として出力したりする。また、出力装置９１７は、周囲を明るくするためライトなどを含んでもよい。 The output device 917 is configured with a device capable of visually or audibly notifying the user of acquired information. The output device 917 may be, for example, a display device such as an LCD (Liquid Crystal Display) or an organic EL (Electro-Luminescence) display, or a sound output device such as a speaker or a headphone. The output device 917 may also include a PDP (Plasma Display Panel), a projector, a hologram, a printer device, or the like. The output device 917 outputs the results obtained by the processing of the information processing device 900 as a video such as text or an image, or as a sound such as voice or audio. The output device 917 may also include a light to brighten the surroundings.

　ストレージ装置９１９は、情報処理装置９００の記憶部の一例として構成されたデータ格納用の装置である。ストレージ装置９１９は、例えば、ＨＤＤ（Ｈａｒｄ　Ｄｉｓｋ　Ｄｒｉｖｅ）などの磁気記憶デバイス、半導体記憶デバイス、光記憶デバイス、または光磁気記憶デバイスなどにより構成される。このストレージ装置９１９は、ＣＰＵ９０１が実行するプログラムや各種データ、および外部から取得した各種のデータなどを格納する。 The storage device 919 is a device for storing data, configured as an example of a storage unit of the information processing device 900. The storage device 919 is configured, for example, with a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device. This storage device 919 stores the programs and various data executed by the CPU 901, as well as various data acquired from the outside.

　ドライブ９２１は、磁気ディスク、光ディスク、光磁気ディスク、または半導体メモリなどのリムーバブル記録媒体９２７のためのリーダライタであり、情報処理装置９００に内蔵、あるいは外付けされる。ドライブ９２１は、装着されているリムーバブル記録媒体９２７に記録されている情報を読み出して、ＲＡＭ９０３に出力する。また、ドライブ９２１は、装着されているリムーバブル記録媒体９２７に記録を書き込む。 The drive 921 is a reader/writer for a removable recording medium 927 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, and is built into the information processing device 900 or is externally attached. The drive 921 reads out information recorded on the attached removable recording medium 927 and outputs it to the RAM 903. The drive 921 also writes information to the attached removable recording medium 927.

　接続ポート９２３は、機器を情報処理装置９００に直接接続するためのポートである。接続ポート９２３は、例えば、ＵＳＢ（Ｕｎｉｖｅｒｓａｌ　Ｓｅｒｉａｌ　Ｂｕｓ）ポート、ＩＥＥＥ１３９４ポート、ＳＣＳＩ（Ｓｍａｌｌ　Ｃｏｍｐｕｔｅｒ　Ｓｙｓｔｅｍ　Ｉｎｔｅｒｆａｃｅ）ポートなどであり得る。また、接続ポート９２３は、ＲＳ－２３２Ｃポート、光オーディオ端子、ＨＤＭＩ（登録商標）（Ｈｉｇｈ－Ｄｅｆｉｎｉｔｉｏｎ　Ｍｕｌｔｉｍｅｄｉａ　Ｉｎｔｅｒｆａｃｅ）ポートなどであってもよい。接続ポート９２３に外部接続機器９２９を接続することで、情報処理装置９００と外部接続機器９２９との間で各種のデータが交換され得る。 The connection port 923 is a port for directly connecting a device to the information processing device 900. The connection port 923 may be, for example, a Universal Serial Bus (USB) port, an IEEE 1394 port, or a Small Computer System Interface (SCSI) port. The connection port 923 may also be an RS-232C port, an optical audio terminal, or a High-Definition Multimedia Interface (HDMI) (registered trademark) port. By connecting an external device 929 to the connection port 923, various types of data may be exchanged between the information processing device 900 and the external device 929.

　通信装置９２５は、例えば、ネットワーク９３１に接続するための通信デバイスなどで構成された通信インターフェースである。通信装置９２５は、例えば、有線または無線ＬＡＮ（Ｌｏｃａｌ　Ａｒｅａ　Ｎｅｔｗｏｒｋ）、Ｂｌｕｅｔｏｏｔｈ（登録商標）、またはＷＵＳＢ（Ｗｉｒｅｌｅｓｓ　ＵＳＢ）用の通信カードなどであり得る。また、通信装置９２５は、光通信用のルータ、ＡＤＳＬ（Ａｓｙｍｍｅｔｒｉｃ　Ｄｉｇｉｔａｌ　Ｓｕｂｓｃｒｉｂｅｒ　Ｌｉｎｅ）用のルータ、または、各種通信用のモデムなどであってもよい。通信装置９２５は、例えば、インターネットや他の通信機器との間で、ＴＣＰ／ＩＰなどの所定のプロトコルを用いて信号などを送受信する。また、通信装置９２５に接続されるネットワーク９３１は、有線または無線によって接続されたネットワークであり、例えば、インターネット、家庭内ＬＡＮ、赤外線通信、ラジオ波通信または衛星通信などである。 The communication device 925 is, for example, a communication interface configured with a communication device for connecting to the network 931. The communication device 925 may be, for example, a communication card for a wired or wireless LAN (Local Area Network), Bluetooth (registered trademark), or WUSB (Wireless USB). The communication device 925 may also be a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various types of communication. The communication device 925 transmits and receives signals and the like to and from the Internet and other communication devices using a predetermined protocol such as TCP/IP. The network 931 connected to the communication device 925 is a network connected by wire or wirelessly, for example, the Internet, a home LAN, infrared communication, radio wave communication, or satellite communication.

　＜３．まとめ＞ <3. Summary>
　本開示の実施形態によれば、第１の空間に存在する第１のユーザから入力された音声と、前記第１の空間が撮像されて得られた画像データとが、前記第１の空間とは離れた第２の空間に存在する第２のユーザに提示されるとともに、前記第２のユーザから入力された音声が、前記第１のユーザに提示されており、前記第２のユーザの視点に関する情報と前記画像データとに応じた表示情報の、ディスプレイによる表示を制御する制御部を備える、情報処理装置が提供される。 According to an embodiment of the present disclosure, an information processing device is provided in which voice input from a first user present in a first space and image data obtained by capturing an image of the first space are presented to a second user present in a second space separated from the first space, and voice input from the second user is presented to the first user. The information processing device includes a control unit that controls the display, on a display, of display information corresponding to information relating to the viewpoint of the second user and the image data.

　かかる構成によれば、第１の空間とは離れた第２の空間に存在するユーザが第１の空間の状況をより効率良く管理することが可能となる。 This configuration allows a user in a second space separate from the first space to more efficiently manage the situation in the first space.

　以上、添付図面を参照しながら本開示の好適な実施形態について詳細に説明したが、本開示の技術的範囲はかかる例に限定されない。本開示の技術分野における通常の知識を有する者であれば、請求の範囲に記載された技術的思想の範疇内において、各種の変更例または修正例に想到し得ることは明らかであり、これらについても、当然に本開示の技術的範囲に属するものと了解される。 The above describes in detail preferred embodiments of the present disclosure with reference to the attached drawings, but the technical scope of the present disclosure is not limited to such examples. It is clear that a person with ordinary knowledge in the technical field of the present disclosure can conceive of various modified or revised examples within the scope of the technical ideas described in the claims, and it is understood that these also naturally fall within the technical scope of the present disclosure.

　例えば、上記では、現場作業者によって行われる作業が建設作業である場合を主に想定した。しかし、上記したように、現場作業者によって行われる作業は、建設作業に限定されない。例えば、現場作業者によって行われる作業は、電子機器などを点検する点検作業であってもよい。このとき、点検対象の電子機器がオブジェクトとなり得るが、電子機器が設置型である場合などには、オブジェクトの位置はあらかじめ登録され得る。そして、作業員がオブジェクトに近づいたことが検出された場合には、作業員に何らかの通知がなされてもよい。 For example, in the above, it is mainly assumed that the work performed by the on-site worker is construction work. However, as mentioned above, the work performed by the on-site worker is not limited to construction work. For example, the work performed by the on-site worker may be inspection work to check electronic devices, etc. In this case, the electronic device to be inspected may be the object, but if the electronic device is a stationary type, the position of the object may be registered in advance. Then, if it is detected that the worker is approaching the object, some kind of notification may be given to the worker.
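The idea of registering object positions in advance and notifying the worker on approach can be sketched as below. The description does not specify how proximity is detected, so the Euclidean distance test, the 3.0 m threshold, the object names, and the function name `objects_nearby` are all assumptions for illustration.

```python
import math

# Hypothetical pre-registered positions of stationary inspection targets (metres).
REGISTERED_OBJECTS = {
    "switchboard-1": (5.0, 5.0),
    "server-rack-2": (20.0, 8.0),
}
NOTIFY_DISTANCE_M = 3.0  # assumed notification threshold


def objects_nearby(worker_pos, objects=REGISTERED_OBJECTS, threshold=NOTIFY_DISTANCE_M):
    """Return the names of registered objects within `threshold` of the worker."""
    wx, wy = worker_pos
    return sorted(
        name
        for name, (ox, oy) in objects.items()
        if math.hypot(ox - wx, oy - wy) <= threshold
    )


near = objects_nearby((6.0, 5.0))
far = objects_nearby((100.0, 100.0))
```

Each name returned by `objects_nearby` would trigger some notification to the worker. The form of that notification is left open in the description.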

　また、現場作業者によって行われる作業は、農業に必要な作業（すなわち、農作業）であってもよい。このとき、農場が作業現場に該当してもよく、作物が栽培される空間的な単位（例えば、作物エリア、ハウス、畝など）が作業現場におけるエリアに該当してもよい。そして、作物が栽培される空間的な単位ごとに、上記と同様のフィルタリングが可能であってもよい。あるいは、現場作業者によって行われる作業は、所定の監視対象（例えば、監視対象エリアの危険性、不審な人物など）を監視する監視作業であってもよい。 The work performed by the field worker may also be work necessary for agriculture (i.e., farm work). In this case, a farm may correspond to the work site, and the spatial unit in which the crops are grown (e.g., crop area, greenhouse, ridge, etc.) may correspond to the area at the work site. Filtering similar to that described above may be possible for each spatial unit in which the crops are grown. Alternatively, the work performed by the field worker may be surveillance work to monitor a specified surveillance target (e.g., dangers in the surveillance area, suspicious people, etc.).

The above mainly described an example in which location information is associated with a two-dimensional code in advance and, based on the two-dimensional code being read by the camera 110 of the field worker device 10, the server 30 associates time information indicating when the two-dimensional code was read with the location information. However, the two-dimensional code is not limited to a QR code (registered trademark), and another type of two-dimensional code may be used. Alternatively, a code having a number of dimensions other than two (e.g., a one-dimensional code) may be used instead of the two-dimensional code. Alternatively, an AR (Augmented Reality) marker may be used instead of the two-dimensional code.

Furthermore, reading a two-dimensional code is merely one example of receiving a predetermined signal. That is, based on a predetermined signal being received by the field worker device 10, the server 30 may associate time information indicating the time at which the predetermined signal was received with the location information associated in advance with the two-dimensional code. For example, the predetermined signal may be a signal emitted by a beacon. Alternatively, the predetermined signal may be a signal emitted by a Wi-Fi (registered trademark) base station.
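
As a non-limiting illustration, the association of a reception time with pre-registered location information may be sketched as follows. The lookup table `SIGNAL_LOCATIONS`, its identifiers and area names, and the function `record_reception` are assumptions introduced for this sketch. The disclosure does not specify how signals are identified or how the association is stored.

```python
from datetime import datetime, timezone

# Assumed registry: signal identifier -> location information registered in advance.
SIGNAL_LOCATIONS = {
    "qr-001": "Area A",
    "beacon-17": "Area B",
}


def record_reception(log, signal_id, received_at=None):
    """Associate the reception time of a signal with its registered location.

    Appends {"location": ..., "time": ...} to log and returns the entry,
    or returns None when the signal has no registered location.
    """
    location = SIGNAL_LOCATIONS.get(signal_id)
    if location is None:
        return None
    # Fall back to the current UTC time when no reception time is supplied.
    timestamp = received_at or datetime.now(timezone.utc)
    entry = {"location": location, "time": timestamp}
    log.append(entry)
    return entry
```

The same function covers a two-dimensional code, a beacon, or a Wi-Fi base station, because only the identifier of the received signal is used.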

The above mainly described the case where the field worker device 10 takes a form worn on the head of the field worker. However, the form of the field worker device 10 need not be limited to one worn on the head of the field worker.

For example, the field worker device 10 may be smart glasses. Smart glasses may include a scouter type, in which the display is viewed with one eye. Alternatively, the field worker device 10 may be a body-worn camera attached to the body of the field worker. Alternatively, the field worker device 10 may be a shoulder-worn device or a neck-worn device.

Furthermore, the effects described in this specification are merely explanatory or illustrative and are not limiting. That is, the technology according to the present disclosure may achieve, in addition to or in place of the above effects, other effects that are apparent to a person skilled in the art from the description in this specification.

Note that the following configurations also fall within the technical scope of the present disclosure.
(1)
An information processing device in an information processing system in which a voice input from a first user present in a first space and image data obtained by capturing an image of the first space are presented to a second user present in a second space separated from the first space, and a voice input from the second user is presented to the first user, the information processing device comprising
a control circuit that controls display, on a display, of display information corresponding to information regarding a viewpoint of the second user and to the image data, and of information regarding an object present in the first space.
(2)
A pointing possibility for the object is estimated based on the image data, and
the control circuit controls the display of the display information on the display in accordance with the pointing possibility.
The information processing device according to (1).
(3)
When an object whose pointing possibility is higher than a threshold value is present, the control circuit controls the display on the display such that an additional image corresponding to the position of the object is superimposed on the display information.
The information processing device according to (2).
(4)
Based on the additional image being selected, the control circuit controls the display such that the display information is changed to a range in which the object appears.
The information processing device according to (3).
(5)
The information about the object is recognized based on a voice input from the first user or the second user.
The information processing device according to any one of (1) to (4).
(6)
The information about the object is recognized based on the image data.
The information processing device according to any one of (1) to (4).
(7)
Based on an association start condition for the second user, information about the object is associated with information about the viewpoint, and
The association start condition for the second user is a condition that the object is recognized based on a voice input from the second user, a condition that the object is recognized based on the image data, or a condition that a predetermined association start operation is input from the second user.
The information processing device according to any one of (1) to (6).
(8)
The information about the object includes a type of the object and position information of the object in the image data.
The information processing device according to any one of (1) to (7).
(9)
The information about the object is associated with a response status for the object, and
the control circuit controls the display of the information about the object on the display based on the information about the object being identified as associated with a designated response status.
The information processing device according to any one of (1) to (7).
(10)
The response status is information indicating whether or not a problem has been pointed out regarding the object, information indicating whether or not the object has been confirmed, or information indicating whether or not a response has been made to the object.
The information processing device according to (9) above.
(11)
The control circuit controls the display of time information associated with the information about the viewpoint on the display.
The information processing device according to any one of (1) to (10).
(12)
The time information is information indicating a time when a predetermined signal is received by the device of the first user.
The information processing device according to (11) above.
(13)
Based on the reception of the predetermined signal, the time information is associated with position information previously associated with the predetermined signal.
The information processing device according to (12).
(14)
The control circuit controls, based on an operation designating a period, the display on the display of the display information corresponding to the information about the viewpoint associated with time information belonging to the period.
The information processing device according to any one of (11) to (13).
(15)
The control circuit controls the display of the first user's location information on the display.
The information processing device according to any one of (1) to (14).
(16)
The control circuit controls the display, on the display, of indication information from one of the first user and the second user, the indication information being recognized based on a voice input from the one user.
The information processing device according to any one of (1) to (15).
(17)
The control circuit controls the display of the indication information on the display based on the one user being identified as matching a designated indicator.
The information processing device according to (16).
(18)
The control circuit controls display of the indication information on the display based on the fact that the indication information is identified as including a specified search key.
The information processing device according to (16).
(19)
The indication information is associated with a category to which the indication information belongs, and
The control circuit controls display of the indication information on the display based on the fact that the indication information is identified as being associated with a designated category.
The information processing device according to (16).
(20)
The category is information indicating whether or not a response to the object is required, or information indicating a degree of urgency of a response to the object.
The information processing device according to (19).
(21)
The recognition of the indication information based on the voice input from the one user is determined based on the fact that the one user is identified as a speaker.
The information processing device according to any one of (16) to (20).
(22)
The control circuit controls the display, on the display, of the display information corresponding to the information about the viewpoint and of a map corresponding to position information of the object present in the first space.
The information processing device according to any one of (1) to (21).
(23)
The control circuit controls the display, on the display, of indication information from one of the first user and the second user, which is recognized based on a voice input from the one user, and of response information from the other user, which is recognized based on a voice input from the other user.
The information processing device according to any one of (1) to (22).
(24)
The control circuit controls the display of the response information from the other user on the display based on the other user being identified as matching a designated respondent.
The information processing device according to (23).
(25)
The control circuit controls the display of the display information on the display based on a predetermined output start condition being satisfied, and
the predetermined output start condition includes a condition that an operation to start output of the display information has been input.
The information processing device according to any one of (1) to (24).
(26)
Control is performed on the image data based on a predetermined signal being received by the device of the first user.
The information processing device according to any one of (1) to (25).
(27)
The control circuit performs the control related to the image data by starting or stopping the display of the display information.
The information processing device according to (26).
(28)
The control related to the image data is a change of a predetermined parameter related to the image data.
The information processing device according to (26).
(29)
The information regarding the viewpoint includes a viewpoint direction of the second user and a magnification ratio of the display information with respect to the image data.
The information processing device according to any one of (1) to (28).
(30)
The control circuit stores the information about the object, and
automatically generates a report on work progress in the first space based on the stored information.
The information processing device according to any one of (1) to (29).
(31)
The report includes the information about the object, the display information, and a map of the vicinity of the position where the object is present.
The information processing device according to (30).
(32)
An information processing method in an information processing system in which a voice input from a first user present in a first space and image data obtained by capturing an image of the first space are presented to a second user present in a second space separated from the first space, and a voice input from the second user is presented to the first user, the information processing method comprising
controlling, by a processor, display, on a display, of display information corresponding to information regarding a viewpoint of the second user and to the image data, and of information regarding an object present in the first space.
(33)
A program that causes a computer to function as an information processing device in an information processing system in which a voice input from a first user present in a first space and image data obtained by capturing an image of the first space are presented to a second user present in a second space separated from the first space, and a voice input from the second user is presented to the first user, the information processing device comprising
a control circuit that controls display, on a display, of display information corresponding to information regarding a viewpoint of the second user and to the image data, and of information regarding an object present in the first space.
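
As a non-limiting illustration of configurations (3), (9), and (14) above, the threshold-based overlay and the filtering by response status and period may be sketched as follows. The record layout `ObjectRecord`, the score range, the status strings, and the function names are assumptions introduced for this sketch. The disclosure does not prescribe a data model or an estimation method for the pointing possibility.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ObjectRecord:
    kind: str                    # type of the object
    position: tuple              # assumed (x, y) position of the object in the image data
    pointing_possibility: float  # assumed estimated score in [0, 1]
    response_status: str         # assumed labels, e.g. "pointed_out", "confirmed", "handled"
    time: datetime               # time information associated with the viewpoint


def overlay_targets(records, threshold=0.5):
    """Return positions where an additional image would be superimposed.

    Only objects whose pointing possibility is higher than the threshold qualify.
    """
    return [r.position for r in records if r.pointing_possibility > threshold]


def filter_records(records, status=None, start=None, end=None):
    """Keep records matching a designated response status and/or period."""
    kept = []
    for r in records:
        if status is not None and r.response_status != status:
            continue
        if start is not None and r.time < start:
            continue
        if end is not None and r.time > end:
            continue
        kept.append(r)
    return kept
```

Filters for a designated indicator, search key, or category, as in configurations (17) to (19), could follow the same pattern with additional optional arguments.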

REFERENCE SIGNS LIST
1 Information processing system
10 Field worker device
110 Camera
120 Microphone
130 Detection unit
140 Control unit
150 Storage unit
160 Communication unit
170 Speaker
20 Remote participant device
210 Viewpoint detection unit
220 Microphone
230 Operation detection unit
240 Control unit
250 Storage unit
260 Communication unit
270 Speaker
280 Display
30 Server
340 Control unit
350 Storage unit
360 Communication unit

Claims

An information processing device in an information processing system in which a voice input from a first user present in a first space and image data obtained by capturing an image of the first space are presented to a second user present in a second space separated from the first space, and a voice input from the second user is presented to the first user, the information processing device comprising
a control circuit that controls display, on a display, of display information corresponding to information regarding a viewpoint of the second user and to the image data, and of information regarding an object present in the first space.

A pointing possibility for the object is estimated based on the image data, and
the control circuit controls the display of the display information on the display in accordance with the pointing possibility.
The information processing device according to claim 1.

When an object whose pointing possibility is higher than a threshold value is present, the control circuit controls the display on the display such that an additional image corresponding to the position of the object is superimposed on the display information.
The information processing device according to claim 2.

Based on the additional image being selected, the control circuit controls the display such that the display information is changed to a range in which the object appears.
The information processing device according to claim 3.

The information about the object is recognized based on a voice input from the first user or the second user.
The information processing device according to claim 1.

The information about the object is recognized based on the image data.
The information processing device according to claim 1.

Based on an association start condition for the second user, information about the object is associated with information about the viewpoint, and
the association start condition for the second user is a condition that the object is recognized based on a voice input from the second user, a condition that the object is recognized based on the image data, or a condition that a predetermined association start operation is input from the second user.
The information processing device according to claim 1.

The information about the object includes a type of the object and position information of the object in the image data.
The information processing device according to claim 1.

The information about the object is associated with a response status for the object, and
the control circuit controls the display of the information about the object on the display based on the information about the object being identified as associated with a designated response status.
The information processing device according to claim 1.

The response status is information indicating whether or not a problem has been pointed out regarding the object, information indicating whether or not the object has been confirmed, or information indicating whether or not a response has been made to the object.
The information processing device according to claim 9.

The control circuit controls the display of time information associated with the information about the viewpoint on the display.
The information processing device according to claim 1.

The time information is information indicating a time when a predetermined signal is received by the device of the first user.
The information processing device according to claim 11.

Based on the reception of the predetermined signal, the time information is associated with position information previously associated with the predetermined signal.
The information processing device according to claim 12.

The control circuit controls, based on an operation designating a period, the display on the display of the display information corresponding to the information about the viewpoint associated with time information belonging to the period.
The information processing device according to claim 11.

The control circuit controls the display of the first user's location information on the display.
The information processing device according to claim 1.

The control circuit controls the display, on the display, of indication information from one of the first user and the second user, the indication information being recognized based on a voice input from the one user.
The information processing device according to claim 1.

The control circuit controls the display of the indication information on the display based on the one user being identified as matching a designated indicator.
The information processing device according to claim 16.

The control circuit controls display of the indication information on the display based on the fact that the indication information is identified as including a specified search key.
The information processing device according to claim 16.

The indication information is associated with a category to which the indication information belongs, and
The control circuit controls the display of the indication information on the display based on the fact that the indication information is identified as being associated with a designated category.
The information processing device according to claim 16.

The category is information indicating whether or not a response to the object is required, or information indicating a degree of urgency of a response to the object.
The information processing device according to claim 19.

The recognition of the indication information based on the voice input from the one user is determined based on the fact that the one user is identified as a speaker.
The information processing device according to claim 16.

The control circuit controls the display, on the display, of the display information corresponding to the information about the viewpoint and of a map corresponding to position information of the object present in the first space.
The information processing device according to claim 1.

The control circuit controls the display, on the display, of indication information from one of the first user and the second user, which is recognized based on a voice input from the one user, and of response information from the other user, which is recognized based on a voice input from the other user.
The information processing device according to claim 1.

The control circuit controls the display of the response information from the other user on the display based on the other user being identified as matching a designated respondent.
The information processing device according to claim 23.

The control circuit controls the display of the display information on the display based on a predetermined output start condition being satisfied, and
the predetermined output start condition includes a condition that an operation to start output of the display information has been input.
The information processing device according to claim 1.

Control is performed on the image data based on a predetermined signal being received by the device of the first user.
The information processing device according to claim 1.

The control circuit performs the control related to the image data by starting or stopping the display of the display information.
The information processing device according to claim 26.

The control related to the image data is a change of a predetermined parameter related to the image data.
The information processing device according to claim 26.

The information regarding the viewpoint includes a viewpoint direction of the second user and a magnification ratio of the display information with respect to the image data.
The information processing device according to claim 1.

The control circuit stores the information about the object, and
automatically generates a report on work progress in the first space based on the stored information.
The information processing device according to claim 1.

The report includes the information about the object, the display information, and a map of the vicinity of the position where the object is present.
The information processing device according to claim 30.

An information processing method in an information processing system in which a voice input from a first user present in a first space and image data obtained by capturing an image of the first space are presented to a second user present in a second space separated from the first space, and a voice input from the second user is presented to the first user, the information processing method comprising
controlling, by a processor, display, on a display, of display information corresponding to information regarding a viewpoint of the second user and to the image data, and of information regarding an object present in the first space.

A program that causes a computer to function as an information processing device in an information processing system in which a voice input from a first user present in a first space and image data obtained by capturing an image of the first space are presented to a second user present in a second space separated from the first space, and a voice input from the second user is presented to the first user, the information processing device comprising
a control circuit that controls display, on a display, of display information corresponding to information regarding a viewpoint of the second user and to the image data, and of information regarding an object present in the first space.