JP2023117157A

JP2023117157A - Medium processor, transmission device, and reception device

Info

Publication number: JP2023117157A
Application number: JP2022019714A
Authority: JP
Inventors: Shuichi Aoki (青木 秀一)
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2022-02-10
Filing date: 2022-02-10
Publication date: 2023-08-23

Abstract

To provide a media processing device, a transmitting device, and a receiving device that can properly display a 3D object in specific content while suppressing transmission traffic. SOLUTION: The media processing device includes: a receiving unit that receives viewpoint information from a user terminal; a renderer that generates, based on the viewpoint information, specific content including at least content having a degree of freedom of viewpoint; and a transmitting unit that transmits the specific content generated by the renderer to the user terminal. The receiving unit receives from the transmitting device, as quality information of a stream regarding a three-dimensional object included in the specific content, quality information on each of two or more streams whose quality differs depending on the orientation of the 3D object based on the viewpoint information. SELECTED DRAWING: Figure 8

Description

The present invention relates to a media processing device, a transmitting device, and a receiving device.

Conventionally, mechanisms for transmitting content such as 360° video and 3D objects have been proposed (for example, Non-Patent Document 1). Known examples include 3DoF+ (Degree of Freedom), in which the viewpoint moves within the range of the head movement of a seated user, and 6DoF, in which the viewpoint moves within the range over which the user moves freely. In such mechanisms, the positional relationship between the 360° video and the 3D objects is indicated by a scene description.

Non-Patent Document 1: 3GPP TR 26.928 V16.1.0, December 2020

Against this background, for specific content that includes content having a degree of freedom of viewpoint, a conceivable case is one in which the specific content is generated by a media processing device and then transmitted from the media processing device to a user terminal.

As a result of diligent study, the inventors focused on the fact that part of a 3D object is not displayed depending on the viewpoint information, and found that the quality of the faces constituting the bounding box of the 3D object need not be uniform.

Accordingly, the present invention has been made to solve the above-described problem, and an object thereof is to provide a media processing device, a transmitting device, and a receiving device that can appropriately display a 3D object while suppressing transmission traffic.

One aspect of the disclosure is a media processing device including: a receiving unit that receives viewpoint information from a user terminal; a renderer that generates, based on the viewpoint information, specific content including at least content having a degree of freedom of viewpoint; and a transmitting unit that transmits the specific content generated by the renderer to the user terminal, wherein the receiving unit receives from the transmitting device, as quality information of a stream regarding a three-dimensional object included in the specific content, quality information on each of two or more streams whose quality differs depending on the orientation of the three-dimensional object based on the viewpoint information.

One aspect of the disclosure is a transmitting device including a transmitting unit that transmits a configuration of content having a degree of freedom of viewpoint, wherein the transmitting unit transmits quality information of a stream regarding a three-dimensional object included in specific content that includes at least the content, and the quality information includes quality information on each of two or more streams whose quality differs depending on the orientation of the three-dimensional object.

One aspect of the disclosure is a receiving device including a receiving unit that receives a configuration of content having a degree of freedom of viewpoint, wherein the receiving unit receives quality information of a stream regarding a three-dimensional object included in specific content that includes at least the content, and the quality information includes quality information on each of two or more streams whose quality differs depending on the orientation of the three-dimensional object.

According to the present invention, it is possible to provide a media processing device, a transmitting device, and a receiving device that can appropriately display a 3D object included in specific content while suppressing transmission traffic.

FIG. 1 is a diagram showing a transmission system 10 according to an embodiment.
FIG. 2 is a block diagram showing a media processing device 200 and a user terminal 300 according to the embodiment.
FIG. 3 is a diagram for explaining the second content according to the embodiment.
FIG. 4 is a diagram showing a method for viewing specific content according to the embodiment.
FIG. 5 is a diagram for explaining Operation Example 1.
FIG. 6 is a diagram for explaining Operation Example 2.
FIG. 7 is a diagram for explaining Operation Example 2.
FIG. 8 is a diagram for explaining Operation Example 3.
FIG. 9 is a diagram for explaining Operation Example 3.
FIG. 10 is a diagram for explaining Operation Example 3.
FIG. 11 is a diagram for explaining Operation Example 3.
FIG. 12 is a diagram for explaining Operation Example 4.
FIG. 13 is a diagram for explaining Operation Example 4.
FIG. 14 is a diagram for explaining Operation Example 4.
FIG. 15 is a diagram for explaining Operation Example 5.
FIG. 16 is a diagram for explaining Operation Example 5.
FIG. 17 is a diagram for explaining Operation Example 5.
FIG. 18 is a diagram for explaining Operation Example 5.
FIG. 19 is a diagram for explaining the first method according to Modification 1.
FIG. 20 is a diagram for explaining the second method according to Modification 1.

Next, embodiments of the present invention will be described. In the following description of the drawings, the same or similar parts are denoted by the same or similar reference numerals. Note, however, that the drawings are schematic and that the ratios between dimensions may differ from the actual ones.

Specific dimensions should therefore be determined with reference to the following description. The drawings may also, of course, differ from one another in their dimensional relationships and ratios.

[Summary of Disclosure]
A media processing device according to the summary of the disclosure includes: a receiving unit that receives viewpoint information from a user terminal; a renderer that generates, based on the viewpoint information, specific content including at least content having a degree of freedom of viewpoint; and a transmitting unit that transmits the specific content generated by the renderer to the user terminal. The receiving unit receives from the transmitting device, as quality information of a stream regarding a three-dimensional object included in the specific content, quality information on each of two or more streams whose quality differs depending on the orientation of the three-dimensional object based on the viewpoint information.

In the summary of the disclosure, the media processing device receives from the transmitting device, as quality information of a stream regarding a 3D object included in the specific content, quality information on each of two or more streams whose quality differs depending on the orientation of the 3D object based on the viewpoint information. With this configuration, based on the new finding that the quality of the faces constituting the bounding box of the 3D object need not be uniform, the 3D object can be displayed appropriately while transmission traffic is suppressed.
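The finding that bounding-box faces need not be uniform in quality can be illustrated with a minimal sketch. It assumes an axis-aligned bounding box with six named faces, and candidate streams that each advertise a quality level per face plus a bitrate. The names `FACE_NORMALS`, `StreamQuality`, `visible_faces`, and `pick_stream` are hypothetical, and the selection rule (best worst-case quality on the visible faces, then lowest bitrate) is an assumption; this section does not define the format of the quality information.

```python
from dataclasses import dataclass

# Outward normals of an axis-aligned bounding box (hypothetical face names).
FACE_NORMALS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


def visible_faces(view_dir):
    """Return faces whose outward normal points back toward the viewer.

    view_dir is the direction from the viewer toward the object.
    """
    return {
        name
        for name, normal in FACE_NORMALS.items()
        if sum(a * b for a, b in zip(normal, view_dir)) < 0
    }


@dataclass
class StreamQuality:
    stream_id: str
    bitrate_kbps: int
    face_quality: dict  # face name -> quality level (higher is better)


def pick_stream(streams, view_dir):
    """Pick the stream with the best quality on visible faces, then the lowest bitrate."""
    faces = visible_faces(view_dir)
    if not faces:
        # Degenerate view direction: fall back to the cheapest stream.
        return min(streams, key=lambda s: s.bitrate_kbps)

    def key(stream):
        worst_visible = min(stream.face_quality[f] for f in faces)
        return (-worst_visible, stream.bitrate_kbps)

    return min(streams, key=key)
```

With a viewer looking along the negative z axis, only the `+z` face is visible, so a stream that is high quality on that face alone can be preferred over a uniformly high-quality stream with a higher bitrate.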

Note that the specific content generated by the media processing device is generated based on the viewpoint information, so the user terminal can handle the video included in the specific content as 2D video.

[Embodiment]
(Transmission system)
A transmission system according to an embodiment will be described below. FIG. 1 is a diagram showing a transmission system 10 according to the embodiment. As shown in FIG. 1, the transmission system 10 includes a transmitting device 100, a media processing device 200, and a user terminal 300.

In the embodiment, the transmitting device 100 transmits, to the media processing device 200, first content that has no degree of freedom of viewpoint and second content that has a degree of freedom of viewpoint. The transmitting device 100 also transmits, to the media processing device 200, first control information accompanying the first content and second control information accompanying the second content.

The first content may include at least one of 2D video and audio. The first content and the first control information may be transmitted by a first method. The first method may conform to ISO/IEC 23008-1 (hereinafter, MMT (MPEG Media Transport)). The following description takes as an example the case where the first method is MMTP (MMT Protocol), which conforms to MMT. The first control information may be referred to as MMT-SI (Signaling Information).

The second content may include 360° video and 3D objects. The second content and the second control information may be transmitted by a second method. Alternatively, they may be transmitted using a protocol such as HTTP (Hyper Text Transfer Protocol). The second content may conform to 3DoF+ (Degree of Freedom), in which the viewpoint moves within the range of the head movement of a seated user, or to 6DoF, in which the viewpoint moves within the range over which the user moves freely. Because the second content has a degree of freedom of viewpoint, it may include two or more 360° videos and two or more 3D objects at the same time (frame). The second control information may be referred to as a scene description.

Here, the second control information may be transmitted by the first method described above. That is, the second control information may be transmitted by the same first method (for example, MMTP) as the first control information. Alternatively, it may be transmitted using a protocol such as HTTP.

Transmission from the transmitting device 100 to the media processing device 200 is not particularly limited; it may use satellite broadcasting, the Internet, or a mobile communication network.

Although not particularly limited, the transmission system may be a digital wireless transmission system. The digital wireless transmission system may be a system used for 4K and 8K satellite broadcasting.

The media processing device 200 generates specific content including at least the second content described above, based on the viewpoint information received from the user terminal 300, and transmits the generated specific content to the user terminal 300. Although not particularly limited, the specific content may be transmitted over the Internet or over a mobile communication network.

The user terminal 300 may be a smartphone, a tablet terminal, a head-mounted display, or the like. As shown in FIG. 1, two or more user terminals 300 may be provided. In other words, two or more user terminals 300 may request the media processing device 200 to generate specific content. Each user terminal 300 may transmit its own viewpoint information to the media processing device 200.

(Media processing device and user terminal)
A media processing device and a user terminal according to the embodiment will be described below. FIG. 2 is a block diagram showing the media processing device 200 and the user terminal 300 according to the embodiment.

First, the media processing device 200 has a reception unit 210, a renderer 220, and an encoding processing unit 230.

The reception unit 210 accepts viewpoint information. In the embodiment, the reception unit 210 constitutes a receiving unit that receives the viewpoint information from the user terminal 300. The viewpoint information includes an information element indicating the viewpoint position of the user of the user terminal 300 and an information element indicating the line-of-sight direction of that user.
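As an illustration of the two information elements, the following sketch models viewpoint information as a small record and serializes it. The class name `ViewpointInfo`, the field names, the 3-component vector representation, and the use of JSON are all assumptions; the source does not define the encoding here (elsewhere it notes that initial viewpoint information may be sent in MMT-SI form).

```python
import json
import math
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ViewpointInfo:
    position: tuple   # (x, y, z) viewpoint position of the user
    direction: tuple  # (dx, dy, dz) line-of-sight direction of the user

    def normalized(self):
        """Return a copy whose direction vector has unit length."""
        length = math.sqrt(sum(c * c for c in self.direction))
        if length == 0:
            raise ValueError("line-of-sight direction must be non-zero")
        unit = tuple(c / length for c in self.direction)
        return ViewpointInfo(self.position, unit)


def to_message(info):
    """Serialize viewpoint information (JSON is an assumed wire format)."""
    return json.dumps(asdict(info))


def from_message(message):
    """Parse a message produced by to_message()."""
    data = json.loads(message)
    return ViewpointInfo(tuple(data["position"]), tuple(data["direction"]))
```

A terminal-side detector could build a `ViewpointInfo`, call `to_message`, and send the result; the receiving side would recover the same record with `from_message`.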

The renderer 220 generates specific content including at least the second content, based on the viewpoint information. Because the specific content is generated based on the viewpoint information, it may include one 360° video and one 3D object at a given time (frame). The following description takes as an example the case where the specific content includes the first content in addition to the second content.

As shown in FIG. 2, the renderer 220 generates, based on the first control information (MMT-SI), the first content including 2D video and audio as part of the specific content. Viewpoint information is not needed to generate the first content.

Specifically, the renderer 220 acquires the 2D video, audio, and MMT-SI in the form of MMTP packets into which they have been packetized.

For example, the MMTP packets are carried in IP (Internet Protocol) packets. The IP packets may be transmitted using UDP (User Datagram Protocol) or TCP (Transmission Control Protocol).

Here, the first content is processed in units delimited by a fixed time width (hereinafter, MPU; Media Processing Unit). An MPU includes one or more access units. An access unit may also be handled as an MFU (Media Fragment Unit). An MFU for 2D video may be referred to as a NAL (Network Abstraction Layer) unit, and an MFU for audio may be referred to as an MHAS (MPEG-H 3D Audio Stream) packet.

The MMT-SI includes a PA (Package Access) message, and the PA message includes an MPT (MMT Package Table) listing the first content. The MMT-SI further includes an MPU timestamp descriptor indicating the presentation time of the first content. The MPU timestamp descriptor may indicate the presentation time of the MPU, that is, the time of the access unit presented first in the MPU.

The MPU timestamp descriptor may be generated using UTC (Coordinated Universal Time) as the reference time. TAI (International Atomic Time) or the time provided by GPS (Global Positioning System) may also be used as the reference time. The reference time may be the time provided by an NTP (Network Time Protocol) server or by a PTP (Precision Time Protocol) server.
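One common way to represent a UTC-referenced time is the 64-bit NTP timestamp format (32-bit seconds since 1900-01-01 and a 32-bit binary fraction). The sketch below converts a timezone-aware UTC `datetime` into that format. Whether the descriptor discussed here uses exactly this layout is not stated in this section, so treat the mapping as an assumption; `utc_to_ntp64` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2_208_988_800
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_to_ntp64(t):
    """Convert an aware datetime to a 64-bit NTP-style timestamp (integer)."""
    if t.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    delta = t - UNIX_EPOCH
    unix_seconds = delta.days * 86_400 + delta.seconds
    # Scale microseconds to a 32-bit binary fraction using integer arithmetic.
    fraction = (delta.microseconds << 32) // 1_000_000
    seconds = (unix_seconds + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    return (seconds << 32) | fraction
```

Integer arithmetic is used throughout so that the fractional part is not affected by floating-point rounding.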

Second, the renderer 220 generates, based on the second control information (scene description), the second content including 360° video and 3D objects as part of the specific content. Viewpoint information is used to generate the second content.

Specifically, the renderer 220 may acquire the scene description in the form of MMTP packets into which the scene description has been packetized. The method of acquiring the 360° video and 3D objects is not particularly limited.

The 360° video may have been converted into 2D video by a projection such as ERP (Equirectangular projection) or a cube map. Metadata indicating the type of projection applied to the 360° video may be attached. A 3D object may be encoded in a mesh format, for which ISO/IEC 14496-16 "Animation framework extension (AFX)" may be used. A 3D object may also be encoded in a point cloud format, for which ISO/IEC 23090-5 "Video-based Point Cloud Compression" may be used.

Here, the second content is grouped into one file per unit delimited by a fixed time width. The fixed time width may be 500 ms. For example, when the frame rate is 60 fps (frames per second), one file contains 30 frames.
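The frame count in the example above follows directly from frame rate multiplied by duration. A minimal sketch, with `frames_per_file` as a hypothetical helper name, uses exact rational arithmetic and rejects combinations that do not give a whole number of frames:

```python
from fractions import Fraction


def frames_per_file(frame_rate_fps, duration_ms):
    """Number of frames in one file covering duration_ms at frame_rate_fps."""
    frames = Fraction(frame_rate_fps) * Fraction(duration_ms, 1000)
    if frames.denominator != 1:
        raise ValueError("duration does not hold a whole number of frames")
    return int(frames)
```

For the values given in the text, `frames_per_file(60, 500)` evaluates to 30.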

A scene description is generated for each file and includes, for each frame, information specifying the 360° video and the 3D objects. For example, the scene description includes an information element indicating the name of a 3D object in the frame (object_name), an information element indicating the frame number (frame_number), an information element indicating the position of the 3D object in the frame (translation_object), an information element indicating the rotation of the 3D object in the frame (rotation_object), and an information element indicating the size of the 3D object in the frame (scale_object).
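The listed information elements map naturally onto a per-frame record. The sketch below keeps the element names from the text (`object_name`, `frame_number`, `translation_object`, `rotation_object`, `scale_object`); the class name `ObjectEntry`, the helper `entries_for_frame`, and the tuple-based value types are assumptions, since the section does not specify how each element is encoded.

```python
from dataclasses import dataclass


@dataclass
class ObjectEntry:
    object_name: str           # name of the 3D object in the frame
    frame_number: int          # frame the entry applies to
    translation_object: tuple  # position (x, y, z); representation assumed
    rotation_object: tuple     # rotation; representation assumed (e.g. Euler angles)
    scale_object: tuple        # size (sx, sy, sz); representation assumed


def entries_for_frame(scene_description, frame_number):
    """Return all object entries that apply to the given frame."""
    return [e for e in scene_description if e.frame_number == frame_number]
```

A renderer could call `entries_for_frame` once per frame and place each returned object over the 360° video.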

Third, the renderer 220 outputs the specific content, including the first content and the second content, to the encoding processing unit 230. The renderer 220 may output the presentation time of the specific content to the encoding processing unit 230 together with the specific content.

Here, the presentation time of the specific content may be corrected based on the delay time between the media processing device 200 and the user terminal 300. Specifically, based on the presentation time (T) of the specific content provided from the transmitting device 100 to the media processing device 200 and the delay time (ΔT), the renderer 220 may calculate the presentation time (T' = T + ΔT) of the specific content provided from the media processing device 200 to the user terminal 300. The delay time (ΔT) may be a value predetermined in the media processing device 200, or may differ for each user terminal 300.
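The correction T' = T + ΔT, with either a predetermined delay or a per-terminal delay, can be sketched as follows. The function name, the `delay_by_terminal` mapping, and the 200 ms default are hypothetical values chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical predetermined delay used when no per-terminal value is known.
DEFAULT_DELAY = timedelta(milliseconds=200)


def adjusted_presentation_time(t, terminal_id, delay_by_terminal,
                               default_delay=DEFAULT_DELAY):
    """Return T' = T + delta_T for the given user terminal."""
    delta_t = delay_by_terminal.get(terminal_id, default_delay)
    return t + delta_t
```

Here `t` is the presentation time received with the content, and the result is the presentation time passed on to the user terminal.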

Fourth, the renderer 220 may constitute a transmitting unit that transmits, to the user terminal 300, the viewpoint information used to generate the specific content. The viewpoint information used to generate the specific content may be transmitted from the encoding processing unit 230 to the user terminal 300.

For example, the transmission method for the viewpoint information and the specific content may be MMTP or HTTP. When MMTP is used as the transmission method for the specific content, the viewpoint information may be stored as metadata in OMAF (Omnidirectional Media Format) defined in ISO/IEC 23090-2.

The encoding processing unit 230 encodes the specific content generated by the renderer 220. In the embodiment, the encoding processing unit 230 may be an example of a transmitting unit that transmits the specific content to the user terminal 300.

The encoding processing unit 230 may also encode the presentation time of the specific content. The encoding processing unit 230 may transmit an information element indicating the presentation time to the user terminal 300 together with the specific content.

Here, any compression encoding method can be used by the encoding processing unit 230. For example, the compression encoding method may be HEVC (High Efficiency Video Coding) or VVC (Versatile Video Coding).

As described above, the second content included in the specific content is generated based on the viewpoint information, so the video included in the specific content can be handled as 2D video without a degree of freedom of viewpoint.

For example, the transmission control method used to start and end viewing of the specific content may include RTSP (Real Time Streaming Protocol). The transmission method may be MMTP or HTTP. When MMTP is used as the transmission method, the specific content may be stored in OMAF defined in ISO/IEC 23090-2.

As shown in FIG. 2, the user terminal 300 has a detection unit 310, a decoding processing unit 320, and a renderer 330.

The detection unit 310 detects the viewpoint position and line-of-sight direction of the user. The detection unit 310 may include an acceleration sensor or a GPS (Global Positioning System) sensor. The detection unit 310 may include a user I/F (for example, a touch sensor, keyboard, mouse, or controller) through which the user provides input manually. The detection unit 310 may transmit the viewpoint information (viewpoint position and line-of-sight direction) to the media processing device 200. The detection unit 310 may output the viewpoint information (viewport) to the renderer 330.

The decoding processing unit 320 decodes the specific content received from the media processing device 200. The decoding processing unit 320 may decode the presentation time received from the media processing device 200. The decoding processing unit 320 may output the specific content and the presentation time to the renderer 330.

The renderer 330 outputs the specific content decoded by the decoding processing unit 320. The renderer 330 may output the specific content based on the presentation time decoded by the decoding processing unit 320. For example, the renderer 330 may output video content included in the specific content to a display and audio content included in the specific content to a speaker.

Here, the renderer 330 may generate specific content whose viewpoint position and line-of-sight direction have been corrected, based on the difference between the viewpoint information received from the media processing device 200 and the viewpoint information input from the detection unit 310.
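A minimal sketch of the difference computation follows. It only derives the positional offset and the yaw and pitch offsets between the viewpoint used for rendering and the currently detected viewpoint; how the renderer 330 applies the correction is not described in this section. The dictionary keys, the function names, and the convention that yaw 0° looks along the negative z axis are assumptions.

```python
import math


def yaw_pitch(direction):
    """Yaw and pitch in degrees of a direction vector (yaw 0 looks along -z)."""
    dx, dy, dz = direction
    yaw = math.degrees(math.atan2(dx, -dz))
    pitch = math.degrees(math.atan2(dy, math.hypot(dx, dz)))
    return yaw, pitch


def viewpoint_delta(rendered, current):
    """Difference between the rendered viewpoint and the current one.

    Both arguments are dicts with 'position' and 'direction' 3-tuples.
    Returns (position offset, yaw offset in degrees, pitch offset in degrees).
    """
    d_pos = tuple(c - r for c, r in zip(current["position"], rendered["position"]))
    r_yaw, r_pitch = yaw_pitch(rendered["direction"])
    c_yaw, c_pitch = yaw_pitch(current["direction"])
    # Wrap the yaw difference into the range [-180, 180).
    d_yaw = (c_yaw - r_yaw + 180.0) % 360.0 - 180.0
    return d_pos, d_yaw, c_pitch - r_pitch
```

A terminal-side renderer could use small yaw and pitch offsets to shift the displayed region within a frame that was rendered with some margin around the viewport.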

(Second content)
The second content according to the embodiment will be described below, taking the second content at t=0, t=1, and t=2 as an example. The time intervals between t=0, t=1, and t=2 are not particularly limited.

For example, as shown in FIG. 3, at t=0 the 360° video may be displayed without any 3D object. The 360° video may be regarded as the background video for the 3D object. At t=1, the 3D object may be displayed superimposed on the 360° video. Furthermore, at t=1, the position and rotation of the 3D object superimposed on the 360° video may be changed.

The scene description described above includes information elements indicating the position, rotation, and size of the 3D object for each of t=0, t=1, and t=2, so the 3D object can be superimposed appropriately on the 360° video.
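To make the role of position, rotation, and size concrete, the sketch below places a single vertex of a 3D object into the scene. It assumes the order scale, then rotate, then translate, and it reduces rotation to a single yaw angle about the y axis; neither the order nor the rotation representation is specified in this section, and `place_vertex` is a hypothetical helper.

```python
import math


def place_vertex(vertex, translation, yaw_deg, scale):
    """Apply scale, then yaw rotation about the y axis, then translation."""
    x, y, z = (c * s for c, s in zip(vertex, scale))
    angle = math.radians(yaw_deg)
    # Right-handed rotation about the y axis.
    x_rot = x * math.cos(angle) + z * math.sin(angle)
    z_rot = -x * math.sin(angle) + z * math.cos(angle)
    return (x_rot + translation[0], y + translation[1], z_rot + translation[2])
```

Changing the translation or yaw passed in for each time step moves or turns the object while the background video stays fixed.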

(Viewing method)
A viewing method according to the embodiment will be described below, taking as an example the viewing of specific content that includes the first content and the second content.

As shown in FIG. 4, in step S11, the user terminal 300 transmits an RTSP SETUP to the media processing device 200. The RTSP SETUP is a message indicating that viewing of the specific content is to start.

Here, the RTSP SETUP includes the IP address of the user terminal 300, a listening port number, content identification information (content ID), and the like. The RTSP SETUP may include capability information of the user terminal 300 for viewing the specific content. The capability information may include the frame rate, display resolution, and the like. The display resolution may include the viewing angle (FoV: Field of View). The capability information may include an information element indicating the encoding methods and compression methods supported by the user terminal 300.
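The parameters carried with the SETUP message can be grouped as in the sketch below. This models only the information listed in the text; it does not reproduce real RTSP header syntax, and the class names `Capability` and `SetupRequest`, the field names, and the validation rules are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capability:
    frame_rate_fps: int
    display_width: int
    display_height: int
    fov_deg: tuple               # (horizontal, vertical) field of view
    codecs: tuple = ("HEVC",)    # supported encoding/compression methods


@dataclass
class SetupRequest:
    terminal_ip: str             # IP address of the user terminal
    listen_port: int             # port on which the terminal waits for content
    content_id: str              # identifies the content to view
    capability: Optional[Capability] = None

    def __post_init__(self):
        # Raises ValueError if the address is not a valid IPv4/IPv6 address.
        ipaddress.ip_address(self.terminal_ip)
        if not 0 < self.listen_port < 65536:
            raise ValueError("listen_port must be in the range 1..65535")
```

The capability record is optional because the text allows the capability information to reach the media processing device by another route.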

ここでは、ユーザ端末300の能力情報がメディア処理装置200に直接的に通知されるケースが例示されているが、実施形態はこれに限定されるものではない。ユーザ端末300の能力情報は、送信装置100に通知された上で、送信装置100からメディア処理装置200に通知されてもよい。 Here, a case is exemplified where the capability information of the user terminal 300 is directly notified to the media processing device 200, but the embodiment is not limited to this. The capability information of the user terminal 300 may be notified to the transmission device 100 and then transmitted from the transmission device 100 to the media processing device 200 .

ステップS12において、メディア処理装置200は、RTSP SETUPに対する応答を送信する。ここでは、RTSP SETUPを受け付けた旨を示すACKが応答として送信される。 At step S12, the media processing device 200 sends a response to the RTSP SETUP. Here, an ACK indicating that the RTSP SETUP has been accepted is sent as a response.

ステップS21において、ユーザ端末300は、初期視点情報をメディア処理装置200に送信する。初期視点情報は、MMT-SIの形式で送信されてもよい。 In step S21, the user terminal 300 transmits initial viewpoint information to the media processing device 200. The initial viewpoint information may be transmitted in the MMT-SI format.

ステップS22において、メディア処理装置200は、初期視点情報に基づいて初期特定コンテンツを生成する（レンダリング処理）。例えば、メディア処理装置200は、初期視点情報及びシーン記述に基づいて、初期特定コンテンツに含める第2コンテンツを生成する。 In step S22, the media processing device 200 generates initial specific content based on the initial viewpoint information (rendering process). For example, the media processing device 200 generates second content to be included in the initial specific content based on the initial viewpoint information and the scene description.

ここで、メディア処理装置200は、ユーザ端末300の表示解像度よりも広い範囲をビューポートとして初期特定コンテンツを生成してもよい。例えば、表示解像度よりも広い範囲は、水平方向において表示解像度+20%、垂直方向において表示解像度+20%の範囲であってもよい。 Here, the media processing device 200 may generate the initial specific content using a viewport that is wider than the display resolution of the user terminal 300 . For example, the range wider than the display resolution may be a range of +20% display resolution in the horizontal direction and +20% display resolution in the vertical direction.
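The enlarged viewport can be illustrated with a minimal sketch. It assumes that "+20%" means the rendered picture is 1.2 times the display size in each direction; the text does not say whether the margin is split between both sides. The function name `expanded_viewport` and the 1920x1080 display are hypothetical.

```python
def expanded_viewport(width_px: int, height_px: int, margin: float = 0.20) -> tuple[int, int]:
    """Return the render size enlarged by `margin` horizontally and vertically.

    Assumption: the margin is applied to the total width and height,
    i.e. a 20% margin yields a picture 1.2 times the display size.
    """
    return round(width_px * (1 + margin)), round(height_px * (1 + margin))


# Hypothetical 1920x1080 display reported in the capability information.
render_w, render_h = expanded_viewport(1920, 1080)
```

Rendering slightly more than the display shows leaves room for the user terminal 300 to correct for head movement that occurs after the viewpoint information was sent.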

メディア処理装置200は、初期特定コンテンツに圧縮符号化方式を適用する。特に限定されるものではないが、圧縮符号化方式は、HEVCであってもよく、VVCであってもよい。 The media processing device 200 applies the compression encoding method to the initial specific content. Although not particularly limited, the compression encoding method may be HEVC or VVC.

ステップS23において、メディア処理装置200は、初期視点情報に対応する初期特定コンテンツをユーザ端末300に送信する。メディア処理装置200は、初期特定コンテンツの提示時刻をユーザ端末300に送信する。上述したように、ユーザ端末300に提供される提示時刻（T’）は、遅延時間（ΔT）に基づいて定められてもよい。 In step S23, the media processing device 200 transmits the initial specific content corresponding to the initial viewpoint information to the user terminal 300. The media processing device 200 also transmits the presentation time of the initial specific content to the user terminal 300. As described above, the presentation time (T') provided to the user terminal 300 may be determined based on the delay time (ΔT).

なお、遅延時間（ΔT）としてユーザ端末300毎に異なる値を用いる場合には、上述したRTSP SETUPにRTSP SETUPの送信時刻を含めることによって、メディア処理装置200側で特定することが可能である。 When a different delay time (ΔT) is used for each user terminal 300, the media processing device 200 can determine the value if the RTSP SETUP described above carries its own transmission time.

ユーザ端末300は、提示時刻（T’）に基づいて特定コンテンツを出力する。ユーザ端末300は、メディア処理装置200から受信する視点情報と検出部310から入力される視点情報との差異に基づいて、視点位置及び視線方向が修正された特定コンテンツを生成してもよい。 The user terminal 300 outputs specific content based on the presentation time (T'). Based on the difference between the viewpoint information received from the media processing device 200 and the viewpoint information input from the detection unit 310, the user terminal 300 may generate specific content in which the viewpoint position and line-of-sight direction are corrected.

ステップS31において、ユーザ端末300は、視点情報をメディア処理装置200に送信する。視点情報は、MMT-SIの形式で送信されてもよい。ここで、ユーザ端末300は、所定周期（例えば、500ms）で視点情報を送信してもよく、視点位置及び視線方向の少なくともいずれか1つの変更に応じて視点情報を送信してもよい。 In step S31, the user terminal 300 transmits viewpoint information to the media processing device 200. The viewpoint information may be transmitted in the MMT-SI format. Here, the user terminal 300 may transmit the viewpoint information at a predetermined cycle (e.g., 500 ms), or may transmit it in response to a change in at least one of the viewpoint position and the line-of-sight direction.
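The two transmission triggers in step S31 can be sketched as follows. The text presents periodic and change-driven transmission as alternatives; this sketch combines them as one possible policy, which is an assumption. The names `Viewpoint` and `ViewpointReporter` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Viewpoint:
    """Viewpoint position and line-of-sight direction (hypothetical container)."""
    pos: tuple[float, float, float]
    yaw: float
    pitch: float
    roll: float


class ViewpointReporter:
    """Decides when the user terminal should send viewpoint information."""

    def __init__(self, period_ms: int = 500) -> None:
        self.period_ms = period_ms
        self._last_sent: Optional[Viewpoint] = None
        self._last_time_ms: Optional[int] = None

    def should_send(self, now_ms: int, current: Viewpoint) -> bool:
        # Send on the first call, when the period has elapsed,
        # or when the position or direction has changed.
        if self._last_sent is None or self._last_time_ms is None:
            due = True
        else:
            due = (now_ms - self._last_time_ms >= self.period_ms
                   or current != self._last_sent)
        if due:
            self._last_sent = current
            self._last_time_ms = now_ms
        return due
```

A real terminal would probably apply a threshold to the change test so that sensor noise does not trigger a message on every frame.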

ステップS32において、メディア処理装置200は、ステップS31で受信する視点情報に基づいて特定コンテンツを生成する（レンダリング処理）。 In step S32, the media processing device 200 generates specific content based on the viewpoint information received in step S31 (rendering processing).

ステップS33において、メディア処理装置200は、ステップS31で受信する視点情報に対応する特定コンテンツをユーザ端末300に送信する。 In step S33, the media processing device 200 transmits to the user terminal 300 the specific content corresponding to the viewpoint information received in step S31.

ステップS31～ステップS33の処理は、初期視点情報に代えてステップS31で受信する視点情報を用いる点を除いて、ステップS21～ステップS23の処理と同様である。従って、ステップS31～ステップS33の処理の詳細については省略する。ステップS31～ステップS33の処理は、所定周期で繰り返されてもよく、ユーザの視点位置又は視線方向の変更毎に繰り返されてもよい。 The processing of steps S31 to S33 is the same as the processing of steps S21 to S23, except that the viewpoint information received in step S31 is used instead of the initial viewpoint information. Therefore, the details of the processing in steps S31 to S33 are omitted. The processing of steps S31 to S33 may be repeated at a predetermined cycle, or may be repeated each time the user's viewpoint position or line-of-sight direction is changed.

ステップS41において、ユーザ端末300は、RTSP TEARDOWNをメディア処理装置に送信する。RTSP TEARDOWNは、特定コンテンツの視聴を終了する旨のメッセージである。 In step S41, the user terminal 300 sends RTSP TEARDOWN to the media processing device. RTSP TEARDOWN is a message to the effect that viewing of specific content is finished.

ステップS42において、メディア処理装置200は、RTSP TEARDOWNに対する応答を送信する。ここでは、RTSP TEARDOWNを受け付けた旨を示すACKが応答として送信される。 At step S42, the media processing device 200 sends a response to the RTSP TEARDOWN. Here, an ACK indicating that RTSP TEARDOWN has been accepted is sent as a response.

図４では、ステップS11及びステップS12がRTSPベースで実行されるケースについて例示したが、実施形態はこれに限定されるものではない。ステップS11及びステップS12は、MMTPベースで実行されてもよく、HTTPベースで実行されてもよい。 FIG. 4 exemplifies a case where steps S11 and S12 are executed based on RTSP, but the embodiment is not limited to this. Steps S11 and S12 may be executed based on MMTP or HTTP.

同様に、ステップS41及びステップS42がRTSPベースで実行されるケースについて例示したが、実施形態はこれに限定されるものではない。ステップS41及びステップS42は、MMTPベースで実行されてもよく、HTTPベースで実行されてもよい。 Similarly, the case where steps S41 and S42 are executed based on RTSP has been exemplified, but embodiments are not limited to this. Steps S41 and S42 may be executed based on MMTP or HTTP.

図４では、ステップS31～ステップS33がMMTPベースで実行されるケースについて例示したが、実施形態はこれに限定されるものではない。ステップS31～ステップS33は、他の方式（例えば、HTTP）ベースで実行されてもよい。 FIG. 4 illustrates the case where steps S31 to S33 are executed on an MMTP basis, but the embodiment is not limited to this. Steps S31 to S33 may be executed on the basis of another scheme (e.g., HTTP).

同様に、ステップS41～ステップS43がMMTPベースで実行されるケースについて例示したが、実施形態はこれに限定されるものではない。ステップS41～ステップS43は、他の方式（例えば、HTTP）ベースで実行されてもよい。 Similarly, the case where steps S41 to S43 are executed on an MMTP basis has been illustrated, but the embodiment is not limited to this. Steps S41 to S43 may be executed on the basis of another scheme (e.g., HTTP).

（動作例1） (Operation example 1)
上述した実施形態は、以下に示す動作例1を含んでもよい。動作例1では、メディア処理装置200は、特定コンテンツに付加されるシーケンス番号と対応付けて、特定コンテンツの生成で用いた視点情報をユーザ端末300に送信する。 The embodiment described above may include Operation Example 1 below. In Operation Example 1, the media processing device 200 transmits the viewpoint information used to generate the specific content to the user terminal 300 in association with the sequence number added to the specific content.

具体的には、メディア処理装置200（レンダラ220）は、上述した実施形態と同様に、第2制御情報（シーン記述）に基づいて、特定コンテンツの一部として、360°映像及び3Dオブジェクトを含む第2コンテンツを生成する。第2コンテンツの生成において、ユーザ端末300から受信する視点情報が用いられる。 Specifically, the media processing device 200 (renderer 220) includes 360° video and 3D objects as part of the specific content based on the second control information (scene description), as in the above-described embodiment. Generate secondary content. Viewpoint information received from the user terminal 300 is used in generating the second content.

動作例1では、レンダラ220は、特定コンテンツ（ここでは、第2コンテンツ）の生成で用いた視点情報をシーケンス番号と対応付ける。レンダラ220は、特定コンテンツに付加されるシーケンス番号と対応付けて、特定コンテンツの生成で用いた視点情報をユーザ端末300に送信する。視点情報は、VP（View Port）メッセージに格納されてもよい。VPメッセージは、ISO/IEC 23008-1、ARIB STD-B60、ARIB TR-B39などで規定されたMMT-SIの形式を有してもよい。VPメッセージは、フレーム毎に送信されてもよい。VPメッセージは、MMTPに関するメッセージ（MMT-SI）としてユーザ端末300に送信されてもよい。 In operation example 1, the renderer 220 associates the viewpoint information used in generating the specific content (here, the second content) with the sequence number. The renderer 220 transmits viewpoint information used in generating the specific content to the user terminal 300 in association with the sequence number added to the specific content. Viewpoint information may be stored in a VP (View Port) message. The VP message may have the format of MMT-SI defined by ISO/IEC 23008-1, ARIB STD-B60, ARIB TR-B39, and so on. A VP message may be sent every frame. The VP message may be sent to the user terminal 300 as a message related to MMTP (MMT-SI).

特に限定されるものではないが、VPメッセージは、図５に示すデータ構造を有してもよい。図５に示すように、VPメッセージは、message_id、version、length、fov、viewpoint_pos_x、viewpoint_pos_y、viewpoint_pos_z、viewpoint_yaw、viewpoint_pitch、viewpoint_roll、viewport_width、viewport_height、mpu_sequence_number_flag、mpu_sequence_numberなどを含んでもよい。 Although not particularly limited, the VP message may have the data structure shown in FIG. 5. As shown in FIG. 5, the VP message may include message_id, version, length, fov, viewpoint_pos_x, viewpoint_pos_y, viewpoint_pos_z, viewpoint_yaw, viewpoint_pitch, viewpoint_roll, viewport_width, viewport_height, mpu_sequence_number_flag, mpu_sequence_number, and so on.

message_idは、VPメッセージを示す識別情報である。message_idは、0x0204であってもよい。 message_id is identification information indicating a VP message. message_id may be 0x0204.

versionは、MMTPプロトコルのバージョンを示す情報である。versionは、0x00であってもよい。 version is information indicating the version of the MMTP protocol. version may be 0x00.

lengthは、VPメッセージの長さを示す情報である。 length is information indicating the length of the VP message.

fovは、視野角（field of view）を示す情報である。 fov is information indicating a field of view.

viewpoint_pos_xは、視点位置のx座標を示す情報である。viewpoint_pos_xは、特定コンテンツの生成で用いた視点情報の一例である。 viewpoint_pos_x is information indicating the x-coordinate of the viewpoint position. viewpoint_pos_x is an example of viewpoint information used in generating specific content.

viewpoint_pos_yは、視点位置のy座標を示す情報である。viewpoint_pos_yは、特定コンテンツの生成で用いた視点情報の一例である。 viewpoint_pos_y is information indicating the y-coordinate of the viewpoint position. viewpoint_pos_y is an example of viewpoint information used in generating specific content.

viewpoint_pos_zは、視点位置のz座標を示す情報である。viewpoint_pos_zは、特定コンテンツの生成で用いた視点情報の一例である。 viewpoint_pos_z is information indicating the z-coordinate of the viewpoint position. viewpoint_pos_z is an example of viewpoint information used in generating specific content.

viewpoint_yawは、視点位置のヨーを示す情報である。viewpoint_yawは、特定コンテンツの生成で用いた視点情報の一例である。 viewpoint_yaw is information indicating the yaw of the viewpoint position. viewpoint_yaw is an example of viewpoint information used in generating specific content.

viewpoint_pitchは、視点位置のピッチを示す情報である。viewpoint_pitchは、特定コンテンツの生成で用いた視点情報の一例である。 viewpoint_pitch is information indicating the pitch of viewpoint positions. viewpoint_pitch is an example of viewpoint information used in generating specific content.

viewpoint_rollは、視点位置のロールを示す情報である。viewpoint_rollは、特定コンテンツの生成で用いた視点情報の一例である。 viewpoint_roll is information indicating the roll of the viewpoint position. viewpoint_roll is an example of viewpoint information used in generating specific content.

viewport_widthは、表示領域（特定コンテンツ）の幅を示す情報である。 viewport_width is information indicating the width of the display area (specific content).

viewport_heightは、表示領域（特定コンテンツ）の高さを示す情報である。 viewport_height is information indicating the height of the display area (specific content).

mpu_sequence_number_flagは、mpu_sequence_numberのフィールドが存在するか否かを示す情報である。例えば、mpu_sequence_number_flagが1である場合に、mpu_sequence_numberのフィールドが存在し、mpu_sequence_number_flagが0である場合に、mpu_sequence_numberのフィールドが存在しなくてもよい。 mpu_sequence_number_flag is information indicating whether or not the mpu_sequence_number field exists. For example, if the mpu_sequence_number_flag is 1, the mpu_sequence_number field may exist, and if the mpu_sequence_number_flag is 0, the mpu_sequence_number field may not exist.

mpu_sequence_numberは、VPメッセージが示す特定コンテンツに対応する映像のMPUシーケンス番号である。mpu_sequence_numberは、特定コンテンツに付加されるシーケンス番号の一例である。 mpu_sequence_number is the MPU sequence number of the video corresponding to the specific content indicated by the VP message. mpu_sequence_number is an example of a sequence number added to specific content.

ここで、viewpoint_pos_x、viewpoint_pos_y、viewpoint_pos_zは、シーン記述によって構成される3次元空間におけるユーザの視点位置を示す情報要素の一例である。viewpoint_yaw、viewpoint_pitch、viewpoint_rollは、シーン記述によって構成される3次元空間におけるユーザの視線方向を示す情報要素の一例である。viewport_width、viewport_heightは、特定コンテンツに含まれる映像の画素数を示す情報要素の一例である。 Here, viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z are examples of information elements that indicate the user's viewpoint position in the three-dimensional space configured by the scene description. viewpoint_yaw, viewpoint_pitch, and viewpoint_roll are examples of information elements that indicate the direction of the user's line of sight in the three-dimensional space configured by the scene description. viewport_width and viewport_height are examples of information elements indicating the number of pixels of video included in specific content.
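To make the field list concrete, the following is a minimal serialization sketch of the VP message. The field names, `message_id` = 0x0204, and `version` = 0x00 come from the text. Everything else is an assumption made for illustration: the bit widths (16-bit `message_id`, 8-bit `version`, 16-bit `length`, 32-bit floats, 16-bit pixel counts, 8-bit flag, 32-bit sequence number), big-endian byte order, and `length` counting the bytes that follow it. The actual layout is the one defined in FIG. 5.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class VPMessage:
    fov: float
    viewpoint_pos: tuple[float, float, float]   # viewpoint_pos_x / _y / _z
    viewpoint_yaw: float
    viewpoint_pitch: float
    viewpoint_roll: float
    viewport_width: int
    viewport_height: int
    mpu_sequence_number: Optional[int] = None   # absent when the flag is 0
    message_id: int = 0x0204
    version: int = 0x00

    def pack(self) -> bytes:
        # mpu_sequence_number_flag signals whether the sequence number follows.
        flag = 1 if self.mpu_sequence_number is not None else 0
        body = struct.pack(
            ">f3f3fHHB",                          # assumed field widths
            self.fov, *self.viewpoint_pos,
            self.viewpoint_yaw, self.viewpoint_pitch, self.viewpoint_roll,
            self.viewport_width, self.viewport_height, flag,
        )
        if flag:
            body += struct.pack(">I", self.mpu_sequence_number)
        # Assumed header: message_id, version, length of the remaining bytes.
        return struct.pack(">HBH", self.message_id, self.version, len(body)) + body
```

Carrying `mpu_sequence_number` in the same message is what lets the receiver match each decoded picture with the viewpoint that produced it.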

このような前提下において、メディア処理装置200及びユーザ端末300は、以下に示す動作を実行してもよい。 Under this premise, the media processing device 200 and the user terminal 300 may perform the following operations.

第1に、メディア処理装置200（レンダラ220）は、特定コンテンツの生成に用いた視点位置に基づいて、viewpoint_pos_x、viewpoint_pos_y、viewpoint_pos_zを特定してもよい。レンダラ220は、特定コンテンツの生成に用いた視線方向に基づいて、viewpoint_yaw、viewpoint_pitch、viewpoint_rollを特定してもよい。レンダラ220は、特定コンテンツの横方向の画素数に基づいてviewport_widthを特定し、特定コンテンツの縦方向の画素数に基づいてviewport_heightを特定してもよい。 First, the media processing device 200 (renderer 220) may identify viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z based on the viewpoint position used to generate the specific content. The renderer 220 may identify the viewpoint_yaw, viewpoint_pitch, and viewpoint_roll based on the viewing direction used to generate the particular content. The renderer 220 may determine viewport_width based on the number of pixels in the horizontal direction of the specific content, and determine viewport_height based on the number of pixels in the vertical direction of the specific content.

メディア処理装置200（符号化処理部230）は、レンダラ220によって生成された特定コンテンツの圧縮符号化を実行し、特定コンテンツを送信してもよい。ここで、符号化処理部230は、図５に示すVPメッセージをユーザ端末300に送信してもよい。すなわち、符号化処理部230は、特定コンテンツに付加されるシーケンス番号と対応付けて、特定コンテンツの生成で用いた視点情報をユーザ端末300に送信してもよい。 The media processing device 200 (encoding processing unit 230) may compress and encode the specific content generated by the renderer 220 and transmit the specific content. Here, the encoding processing unit 230 may transmit the VP message shown in FIG. 5 to the user terminal 300. That is, the encoding processing unit 230 may transmit the viewpoint information used to generate the specific content to the user terminal 300 in association with the sequence number added to the specific content.

第2に、ユーザ端末300（復号処理部320）は、特定コンテンツに付加されるシーケンス番号と対応付けて、特定コンテンツの生成で用いた視点情報をメディア処理装置200から受信する受信部を構成してもよい。すなわち、復号処理部320は、図５に示すVPメッセージ（MMT-SI）をメディア処理装置200から受信してもよい。 Second, the user terminal 300 (decoding processing unit 320) may constitute a receiving unit that receives, from the media processing device 200, the viewpoint information used to generate the specific content in association with the sequence number added to the specific content. That is, the decoding processing unit 320 may receive the VP message (MMT-SI) shown in FIG. 5 from the media processing device 200.

ユーザ端末300（レンダラ330）は、シーケンス番号（mpu_sequence_number）に基づいて、復号された特定コンテンツを構成する映像とVPメッセージに含まれる視点情報とを対応付けてもよい。レンダラ330は、検出部310によって検出される視点情報とVPメッセージに含まれる視点情報との差異に基づいて、VPメッセージに含まれる情報（viewport_width及びviewport_height）によって定義される表示領域から、検出部310によって検出される視点情報によって特定される範囲の映像を特定してもよい。レンダラ330は、特定された映像を表示してもよい。 Based on the sequence number (mpu_sequence_number), the user terminal 300 (renderer 330) may associate the video constituting the decoded specific content with the viewpoint information included in the VP message. Based on the difference between the viewpoint information detected by the detection unit 310 and the viewpoint information included in the VP message, the renderer 330 may identify, within the display area defined by the information included in the VP message (viewport_width and viewport_height), the portion of the video that corresponds to the viewpoint information detected by the detection unit 310. The renderer 330 may display the identified video.
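The correction performed by the renderer 330 can be sketched as a crop inside the received picture. This is a simplified illustration under stated assumptions, not the method of the embodiment: only yaw and pitch differences are handled, the angle-to-pixel mapping is linear (`px_per_deg` is a hypothetical parameter), and the sign conventions (positive yaw shifts the window right, positive pitch shifts it up) are chosen arbitrarily. The function names and the `received_vp` table are hypothetical.

```python
# mpu_sequence_number -> viewpoint information carried in the VP message
received_vp: dict[int, dict] = {}


def on_vp_message(seq: int, yaw: float, pitch: float, width: int, height: int) -> None:
    """Remember the viewpoint the renderer 220 used for picture `seq`."""
    received_vp[seq] = {"yaw": yaw, "pitch": pitch, "width": width, "height": height}


def crop_origin(vp_w: int, vp_h: int, disp_w: int, disp_h: int,
                d_yaw_deg: float, d_pitch_deg: float, px_per_deg: float) -> tuple[int, int]:
    """Top-left corner of the display-sized window inside the received picture."""
    x = (vp_w - disp_w) / 2 + d_yaw_deg * px_per_deg
    y = (vp_h - disp_h) / 2 - d_pitch_deg * px_per_deg
    # Clamp so the window never leaves the received picture.
    x = min(max(x, 0), vp_w - disp_w)
    y = min(max(y, 0), vp_h - disp_h)
    return int(x), int(y)


def on_decoded_frame(seq: int, detected_yaw: float, detected_pitch: float,
                     disp_w: int, disp_h: int, px_per_deg: float) -> tuple[int, int]:
    """Match the decoded picture to its VP message and locate the crop window."""
    vp = received_vp.pop(seq)
    return crop_origin(vp["width"], vp["height"], disp_w, disp_h,
                       detected_yaw - vp["yaw"], detected_pitch - vp["pitch"],
                       px_per_deg)
```

When the detected viewpoint equals the transmitted one, the window is centered; the margin rendered around the display area bounds how much movement can be absorbed.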

（動作例2） (Operation example 2)
上述した実施形態は、以下に示す動作例2を含んでもよい。動作例2では、図６に示すように、視点情報をフィードバックする第1ユーザ端末400及び視点情報をフィードバックしない第2ユーザ端末500が混在するケースが想定される。第1ユーザ端末400は、ヘッドマウントディスプレイなどの端末であってもよい。第1ユーザ端末は、上述したユーザ端末300と同様の機能を有していてもよい。第2ユーザ端末500は、ボリュメトリックディスプレイなどの端末であってもよい。 The embodiment described above may include Operation Example 2 below. Operation Example 2 assumes, as shown in FIG. 6, a case in which a first user terminal 400 that feeds back viewpoint information and a second user terminal 500 that does not feed back viewpoint information coexist. The first user terminal 400 may be a terminal such as a head-mounted display. The first user terminal 400 may have the same functions as the user terminal 300 described above. The second user terminal 500 may be a terminal such as a volumetric display.

具体的には、動作例2では、図６に示すように、メディア処理装置200（レンダラ220）は、コンテンツの構成及び推奨ビューポート情報を送信装置100から受信する受信部を構成してもよい。コンテンツの構成は、2D映像、音声、360°映像、3Dオブジェクトを含むと考えてもよい。コンテンツの構成は、MMT-SI及びシーン記述を含むと考えてもよい。推奨ビューポート情報は、特定視点情報の一例であると考えてもよい。推奨ビューポート情報は、特定コンテンツによって構成される3次元空間（シーン記述で構築される3次元空間）において、どの位置、方向及び画角で映像を見るのかを定義（推奨）する情報であってもよい。推奨ビューポート情報は、特定コンテンツによって構成される3次元空間における視点位置を示す情報要素及び特定コンテンツによって構成される3次元空間における視線方向を示す情報要素の少なくてもいずか1つを示す情報要素を含んでもよい。推奨ビューポート情報は、主として、第2ユーザ端末500で用いられる視点情報であると考えてもよい。 Specifically, in Operation Example 2, as shown in FIG. 6, the media processing device 200 (renderer 220) may constitute a receiving unit that receives the content configuration and recommended viewport information from the transmission device 100. The content configuration may be considered to include 2D video, audio, 360° video, and 3D objects. The content configuration may also be considered to include MMT-SI and the scene description. The recommended viewport information may be regarded as an example of specific viewpoint information. The recommended viewport information may be information that defines (recommends) the position, direction, and angle of view from which the video is viewed in the three-dimensional space constituted by the specific content (the three-dimensional space constructed by the scene description). The recommended viewport information may include at least one of an information element indicating the viewpoint position in that three-dimensional space and an information element indicating the line-of-sight direction in that three-dimensional space. The recommended viewport information may be regarded as viewpoint information used mainly by the second user terminal 500.

以下において、明示的に記載しない限りにおいて、特定コンテンツの生成で用いる視点情報は、送信装置100から受信する特定視点情報（推奨ビューポート情報）を含んでもよく、ユーザ端末300から受信する視点情報を含んでもよい。 In the following, unless explicitly stated otherwise, the viewpoint information used to generate the specific content may include the specific viewpoint information (recommended viewport information) received from the transmission device 100, and may include the viewpoint information received from the user terminal 300.

特に限定されるものではないが、推奨ビューポート情報は、図７に示す態様でシーン記述に含まれてもよい。図７に示すように、推奨ビューポート情報は、camera_orientation、frame_number、translation、yfovを含んでもよい。 Although not particularly limited, the recommended viewport information may be included in the scene description in the manner shown in FIG. 7. As shown in FIG. 7, the recommended viewport information may include camera_orientation, frame_number, translation, and yfov.

camera_orientationは、シーン記述で構築される3次元空間において映像を見る方向を示す情報である。camera_orientationは、視線方向と同義であると考えてもよい。 camera_orientation is information indicating the direction in which an image is viewed in a three-dimensional space constructed by the scene description. camera_orientation may be considered synonymous with viewing direction.

frame_numberは、camera_orientation、translation及びyfovが適用される映像のフレーム番号を示す情報である。camera_orientationは、特定視点情報の一例であると考えてもよい。 frame_number is information indicating the frame number of the video to which camera_orientation, translation, and yfov are applied. camera_orientation may be considered as an example of specific viewpoint information.

translationは、シーン記述で構築される3次元空間において映像を見る位置を示す情報である。translationは、視点位置と同義であると考えてもよい。translationは、特定視点情報の一例であると考えてもよい。例えば、図７では、フレーム番号が0である場合に、視点位置が[0,0,-50]であり、フレーム番号が2505である場合に、視点位置が[0,0,-75]に移動するケースが例示されている。 translation is information indicating the position from which the video is viewed in the three-dimensional space constructed by the scene description. translation may be considered synonymous with the viewpoint position. translation may be regarded as an example of specific viewpoint information. For example, FIG. 7 illustrates a case in which the viewpoint position is [0,0,-50] at frame number 0 and moves to [0,0,-75] at frame number 2505.

yfovは、シーン記述で構築される3次元空間において映像を見る画角を示す情報である。 yfov is information indicating the angle of view for viewing an image in a three-dimensional space constructed by scene description.

特に限定されるものではないが、camera_orientation及びtranslationは、コンテンツの制作者が付与してもよい。或いは、コンテンツがカメラによって撮像されるケースを想定した場合に、カメラに設けられるGPS及びセンサによってcamera_orientation及びtranslationが自動的に付与されてもよい。 Although not particularly limited, camera_orientation and translation may be given by the creator of the content. Alternatively, assuming a case where content is captured by a camera, camera_orientation and translation may be automatically given by the GPS and sensor provided in the camera.

ここで、translationは、特定コンテンツによって構成される3次元空間における視点位置を示す情報要素の一例である。camera_orientationは、特定コンテンツによって構成される3次元空間における視線方向を示す情報要素の一例である。 Here, "translation" is an example of an information element indicating a viewpoint position in a three-dimensional space configured by specific content. camera_orientation is an example of an information element that indicates the line-of-sight direction in a three-dimensional space configured by specific content.
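The recommended viewport information can be pictured as a list of entries keyed by frame number. In the sketch below, the field names and the two `translation` values are taken from the description of FIG. 7. The `camera_orientation` and `yfov` values are placeholders, since the text does not give them, and the rule that an entry stays in effect until the next listed frame number is an assumption. The names `RECOMMENDED_VIEWPORT` and `viewport_for_frame` are hypothetical.

```python
import bisect

RECOMMENDED_VIEWPORT = [
    {"frame_number": 0,
     "translation": [0, 0, -50],
     "camera_orientation": [0.0, 0.0, 0.0, 1.0],  # placeholder value
     "yfov": 0.8},                                # placeholder value
    {"frame_number": 2505,
     "translation": [0, 0, -75],
     "camera_orientation": [0.0, 0.0, 0.0, 1.0],  # placeholder value
     "yfov": 0.8},                                # placeholder value
]

_FRAME_NUMBERS = [entry["frame_number"] for entry in RECOMMENDED_VIEWPORT]


def viewport_for_frame(frame: int) -> dict:
    """Return the entry in effect at `frame` (the latest entry not after it)."""
    index = bisect.bisect_right(_FRAME_NUMBERS, frame) - 1
    if index < 0:
        raise ValueError("no recommended viewport defined before this frame")
    return RECOMMENDED_VIEWPORT[index]
```

A renderer that wants a smooth camera move could interpolate between neighboring entries instead of holding the previous value; the text does not say which behavior is intended.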

第1に、メディア処理装置200（レンダラ220）は、特定視点情報（推奨ビューポート情報）に基づいて特定コンテンツを生成してもよい。メディア処理装置200（符号化処理部230）は、レンダラ220によって生成された特定コンテンツの圧縮符号化を実行し、特定コンテンツを送信してもよい。ここで、符号化処理部230は、特定視点情報に基づいて生成された特定コンテンツを第1ユーザ端末400に送信してもよく、特定視点情報に基づいて生成された特定コンテンツを第2ユーザ端末500に送信してもよい。 First, the media processing device 200 (renderer 220) may generate specific content based on the specific viewpoint information (recommended viewport information). The media processing device 200 (encoding processing unit 230) may compress and encode the specific content generated by the renderer 220 and transmit the specific content. Here, the encoding processing unit 230 may transmit the specific content generated based on the specific viewpoint information to the first user terminal 400, and may transmit it to the second user terminal 500.

第2に、メディア処理装置200（受付部210）は、ユーザ端末が視点情報をフィードバックする第1ユーザ端末400である場合に、第1ユーザ端末400から視点情報を受信してもよい。メディア処理装置（レンダラ220）は、第1ユーザ端末400から受信する視点情報に基づいて特定コンテンツを生成してもよい。このようなケースにおいて、メディア処理装置200（受付部210）は、リセット信号を第1ユーザ端末400から受信してもよい。メディア処理装置（レンダラ220）は、リセット信号に応じて、特定視点情報（推奨ビューポート情報）に基づいて特定コンテンツを生成してもよい。 Second, when the user terminal is the first user terminal 400 that feeds back viewpoint information, the media processing device 200 (accepting unit 210) may receive viewpoint information from the first user terminal 400. The media processing device 200 (renderer 220) may generate specific content based on the viewpoint information received from the first user terminal 400. In such a case, the media processing device 200 (accepting unit 210) may receive a reset signal from the first user terminal 400. In response to the reset signal, the media processing device 200 (renderer 220) may generate specific content based on the specific viewpoint information (recommended viewport information).

このようなケースにおいて、第1ユーザ端末400は、ユーザ端末300と同様の構成を有していてもよい。但し、第1ユーザ端末400（検出部310）は、リセット信号を検出する機能を有してもよい。検出部310は、リセット信号を入力するユーザ操作を検出してもよい。検出部310は、リセット信号をメディア処理装置200に送信してもよい。 In such a case, the first user terminal 400 may have the same configuration as the user terminal 300. However, the first user terminal 400 (detection unit 310) may have a function of detecting a reset signal. The detection unit 310 may detect a user operation that inputs the reset signal. The detection unit 310 may transmit the reset signal to the media processing device 200.

第3に、メディア処理装置200（レンダラ220）は、ユーザ端末が視点情報をフィードバックしない第2ユーザ端末500である場合に、特定視点情報（推奨ビューポート情報）に基づいて特定コンテンツを生成してもよい。 Third, the media processing device 200 (renderer 220) generates specific content based on specific viewpoint information (recommended viewport information) when the user terminal is the second user terminal 500 that does not feed back viewpoint information. good too.

このようなケースにおいて、第2ユーザ端末500は、視点情報を検出する検出部310を有していなくてもよい。第2ユーザ端末500は、レンダラ330を有していなくてもよい。第2ユーザ端末500は、検出部310及びレンダラ330を有していない点を除いて、ユーザ端末300と同様の構成を有してもよい。 In such a case, the second user terminal 500 may not have the detection unit 310 that detects viewpoint information. Second user terminal 500 may not have renderer 330 . Second user terminal 500 may have the same configuration as user terminal 300 except that it does not have detector 310 and renderer 330 .
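The three cases of Operation Example 2 reduce to a choice of which viewpoint the renderer 220 uses. The sketch below is one hedged reading of that choice; the function name `choose_viewpoint` and its parameters are hypothetical, and falling back to the recommended viewport when no feedback has arrived yet is an assumption.

```python
from typing import Optional


def choose_viewpoint(feeds_back: bool,
                     terminal_viewpoint: Optional[dict],
                     recommended_viewpoint: dict,
                     reset_requested: bool = False) -> dict:
    """Pick the viewpoint used to generate the specific content.

    feeds_back: True for a first user terminal 400, False for a second
    user terminal 500.
    """
    if not feeds_back:
        # Second user terminal 500: always the recommended viewport.
        return recommended_viewpoint
    if reset_requested or terminal_viewpoint is None:
        # Reset signal received, or (assumption) no feedback received yet.
        return recommended_viewpoint
    # First user terminal 400: follow the viewpoint it fed back.
    return terminal_viewpoint
```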

（動作例3） (Operation example 3)
上述した実施形態は、以下に示す動作例3を含んでもよい。動作例3では、メディア処理装置200（例えば、後述する選択部260）は、特定コンテンツに含まれる3Dオブジェクトに関するストリームの品質情報として、視点情報に基づいた3Dオブジェクトの向きによって品質が異なる2以上のストリームの各々に関する品質情報を送信装置100から受信する受信部を構成してもよい。 The embodiment described above may include Operation Example 3 below. In Operation Example 3, the media processing device 200 (for example, the selection unit 260 described later) may constitute a receiving unit that receives, from the transmission device 100, quality information for each of two or more streams whose quality differs depending on the orientation of the 3D object based on the viewpoint information, as quality information of the streams relating to the 3D object included in the specific content.

具体的には、動作例3では、図８に示すように、メディア処理装置200は、図２に示す構成に加えて、選択部260を有する。選択部260は、シーン記述及び3Dオブジェクトを送信装置100から受信する。選択部260は、2以上のストリームの中から選択されたストリーム（3Dオブジェクト）をレンダラ220に入力する。選択部260は、選択されたストリームの送信を送信装置100に要求してもよい。なお、メディア処理装置200が複数のユーザ端末300に特定コンテンツを送信するケースを想定した場合には、選択部260は、複数のユーザ端末300の各々で必要とされるストリームの送信を送信装置100に要求してもよく、全てのストリームの送信を送信装置100に要求してもよい。 Specifically, in Operation Example 3, as shown in FIG. 8, the media processing device 200 has a selection unit 260 in addition to the configuration shown in FIG. 2. The selection unit 260 receives the scene description and the 3D objects from the transmission device 100. The selection unit 260 inputs the stream (3D object) selected from the two or more streams to the renderer 220. The selection unit 260 may request the transmission device 100 to transmit the selected stream. When the media processing device 200 transmits specific content to a plurality of user terminals 300, the selection unit 260 may request the transmission device 100 to transmit the streams required by each of the user terminals 300, or may request it to transmit all of the streams.

ここで、選択部260は、3Dオブジェクトに関するストリームの品質情報として、視点情報に基づいた3Dオブジェクトの向きによって品質が異なる2以上のストリームの各々に関する品質情報を送信装置100から受信する。 Here, the selection unit 260 receives, from the transmitting device 100, quality information about each of two or more streams whose quality differs depending on the orientation of the 3D object based on the viewpoint information, as the quality information about the stream about the 3D object.

品質情報は、3Dオブジェクトに関するバウンディングボックスを構成する各面の相対品質を示す情報であってもよい。バウンディングボックスは、3Dオブジェクトを射影する3次元の矩形によって表されてもよい。例えば、図９に示すように、バウンディングボックスは、頂点A～頂点Hによって定義されてもよい。このようなケースにおいて、バウンディングボックスの各面は、頂点A,B,F,Eで表される面#1、頂点B,C,G,Fで表される面#2、頂点A,B,C,Dで表される面#3、頂点E,F,G,Hで表される面#4、頂点A,D,H,Eで表される面#5、頂点D,C,G,Hで表される面#6を含む。 The quality information may be information indicating the relative quality of each facet that makes up the bounding box for the 3D object. A bounding box may be represented by a three-dimensional rectangle that projects the 3D object. For example, a bounding box may be defined by vertex A through vertex H, as shown in FIG. In such a case, each face of the bounding box is face #1 represented by vertices A, B, F, E, face #2 represented by vertices B, C, G, F, vertices A, B, Face #3 represented by C, D, Face #4 represented by vertices E, F, G, H, Face #5 represented by vertices A, D, H, E, Vertices D, C, G, Includes face #6, denoted by H.

このようなケースにおいて、シーン記述によって構築される3次元空間において3Dオブジェクトを見るケースを想定すると、3つの面について主として観察されると想定される。言い換えると、残りの3つの面についてはあまり観察されないと想定される。 In such a case, assuming a case of viewing a 3D object in a 3D space constructed by the scene description, it is assumed that three planes are mainly observed. In other words, the remaining three faces are assumed to be less observed.

動作例3では、特定コンテンツに含まれる3Dオブジェクトに関するストリームとして、視点情報に基づいた3Dオブジェクトの向きによって品質が異なる2以上のストリームが準備される。 In operation example 3, two or more streams with different quality depending on the orientation of the 3D object based on the viewpoint information are prepared as streams relating to the 3D object included in the specific content.

特に限定されるものではないが、品質情報は、図１０に示す態様でシーン記述に含まれてもよい。図１０では、3Dオブジェクトの向きが異なるストリームとして、6つのストリームが例示されている。品質情報は、”quality” [#1,#2,#3,#4,#5,#6]の形式で表されてもよい。なお、[ ]内において、#1～#6は、面#1～面#6の品質インデックスを意味している。品質インデックスは、1～9の範囲の値を取り得てもよい。品質インデックスの値が大きいほど、品質が高いことを意味してもよい。例えば、”id”=”1”で識別されるストリームでは、#1,#2,#3の品質（”8”）が高く、#4,#5,#6の品質（”3”）が低い。”id”=”2”で識別されるストリームでは、#1,#2,#3の品質（”3”）が低く、#4,#5,#6の品質（”8”）が高い。 Although not particularly limited, the quality information may be included in the scene description in the manner shown in FIG. 10. FIG. 10 illustrates six streams that differ in the orientation of the 3D object. The quality information may be expressed in the form "quality" [#1,#2,#3,#4,#5,#6], where #1 to #6 inside the brackets are the quality indexes of face #1 to face #6. The quality index may take a value in the range of 1 to 9, and a larger value may indicate higher quality. For example, in the stream identified by "id"="1", faces #1, #2, and #3 have high quality ("8") and faces #4, #5, and #6 have low quality ("3"). In the stream identified by "id"="2", faces #1, #2, and #3 have low quality ("3") and faces #4, #5, and #6 have high quality ("8").

このような前提下において、メディア処理装置200は、以下に示す動作を実行してもよい。以下においては、2以上のストリームの中から選択されたストリーム（3Dオブジェクト）の選択について主として説明する。 Under this premise, the media processing device 200 may perform the following operations. In the following, selection of a stream (3D object) selected from two or more streams will be mainly described.

モード1では、図１１の上段に示すように、メディア処理装置200（選択部260）は、ユーザの視点位置に最も近い頂点（例えば、頂点B）を特定した上で、最も近い頂点を有する3つの面（例えば、頂点A,B,F,Eで表される面#1、頂点B,C,G,Fで表される面#2、頂点A,B,C,Dで表される面#3）を特定してもよい。選択部260は、特定された3つの面の品質インデックスの総和が最大となるストリーム（図１０に示す例では、”id”=”１”で識別されるストリーム）を選択してもよい。 In mode 1, as shown in the upper part of FIG. 11, the media processing device 200 (selection unit 260) may identify the vertex closest to the user's viewpoint position (for example, vertex B) and then identify the three faces that share that vertex (for example, face #1 represented by vertices A, B, F, E, face #2 represented by vertices B, C, G, F, and face #3 represented by vertices A, B, C, D). The selection unit 260 may select the stream for which the sum of the quality indexes of the three identified faces is largest (in the example shown in FIG. 10, the stream identified by "id"="1").

なお、モード1では、視点情報に基づいてユーザの視点位置に最も近い頂点が特定されることから、選択部260は、視点情報及び品質情報に基づいて、2以上のストリームの中からユーザ端末300に送信すべきストリームを選択すると考えてもよい。 In mode 1, the vertex closest to the user's viewpoint position is identified based on the viewpoint information. The selection unit 260 may therefore be regarded as selecting, from the two or more streams, the stream to be transmitted to the user terminal 300 based on the viewpoint information and the quality information.
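The mode 1 selection can be sketched in a few lines. The face-to-vertex grouping and the two quality arrays follow the text; the unit-cube coordinates assigned to vertices A to H are hypothetical, chosen only so that each listed face is planar, and the function names are illustrative.

```python
import math

# Hypothetical coordinates for vertices A to H (assumption, not from FIG. 9).
VERTICES = {
    "A": (0, 0, 1), "B": (1, 0, 1), "C": (1, 1, 1), "D": (0, 1, 1),
    "E": (0, 0, 0), "F": (1, 0, 0), "G": (1, 1, 0), "H": (0, 1, 0),
}

# Face number -> vertices, as listed for faces #1 to #6.
FACES = {1: "ABFE", 2: "BCGF", 3: "ABCD", 4: "EFGH", 5: "ADHE", 6: "DCGH"}

# Two of the streams described for FIG. 10; "quality" lists faces #1 to #6.
STREAMS = [
    {"id": "1", "quality": [8, 8, 8, 3, 3, 3]},
    {"id": "2", "quality": [3, 3, 3, 8, 8, 8]},
]


def facing_faces(viewpoint):
    """Faces that share the bounding-box vertex nearest to the viewpoint."""
    nearest = min(VERTICES, key=lambda name: math.dist(VERTICES[name], viewpoint))
    return [face for face, corners in FACES.items() if nearest in corners]


def select_stream_mode1(viewpoint, streams=STREAMS):
    """Pick the stream whose three facing faces have the largest quality sum."""
    faces = facing_faces(viewpoint)
    return max(streams, key=lambda s: sum(s["quality"][f - 1] for f in faces))["id"]
```

With a viewpoint nearest to vertex B, faces #1, #2, and #3 are selected and stream "1" scores 24 against 9 for stream "2", which matches the example in the text.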

Mode 2 may be a mode applied when a 3D object is displayed reduced or enlarged. For example, as shown in the middle part of FIG. 11, the media processing device 200 (selection unit 260) may identify the vertex closest to the user's viewpoint position (for example, vertex B) and then identify the three faces that share that vertex (for example, face #1 defined by vertices A, B, F and E, face #2 defined by vertices B, C, G and F, and face #3 defined by vertices A, B, C and D). When the 3D object is displayed reduced, its pixels are thinned out. The selection unit 260 may therefore select the stream for which the sum of the quality indexes of the three identified faces is smallest (in the example shown in FIG. 10, the stream identified by "id"="2"). When the 3D object is displayed enlarged, on the other hand, its pixels are interpolated. The selection unit 260 may therefore select the stream for which the sum of the quality indexes of the three identified faces is largest (in the example shown in FIG. 10, the stream identified by "id"="1").
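A small sketch of the mode 2 rule, assuming the per-stream sums over the three near faces have already been computed. The `scale` parameter and the treatment of a scale of exactly 1 as "not reduced" are assumptions of this sketch; the description only distinguishes reduced and enlarged display.

```python
def select_stream_mode2(face_sums, scale):
    """face_sums: stream id -> quality sum over the three near faces.
    scale: hypothetical display scale factor (< 1 reduced, > 1 enlarged)."""
    if scale < 1.0:
        # Reduced display thins pixels out, so the lowest sum is sufficient.
        return min(face_sums, key=face_sums.get)
    # Enlarged display interpolates pixels, so the highest sum is preferred.
    return max(face_sums, key=face_sums.get)


sums = {"1": 24, "2": 9}  # sums for faces #1..#3 in the FIG. 10 example
print(select_stream_mode2(sums, 0.5))  # reduced display, prints 2
print(select_stream_mode2(sums, 2.0))  # enlarged display, prints 1
```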

Note that in mode 2, because the vertex closest to the user's viewpoint position is identified based on the viewpoint information, the selection unit 260 may be regarded as selecting the stream to be transmitted to the user terminal 300 from among the two or more streams based on the viewpoint information and the quality information.

Mode 3 may be a mode applied when two 3D objects (3D object #1 and 3D object #2) overlap in the user's line-of-sight direction. The selection of a stream for 3D object #1 is described here. For example, as shown in the lower part of FIG. 11, the media processing device 200 (selection unit 260) may identify the vertex closest to the user's viewpoint position (for example, vertex B) and then identify the three faces that share that vertex (for example, face #1 defined by vertices A, B, F and E, face #2 defined by vertices B, C, G and F, and face #3 defined by vertices A, B, C and D). Here, 3D object #2 lies on the line segment connecting the user's viewpoint position and the closest vertex (for example, vertex B), so the three identified faces are occluded by 3D object #2. The selection unit 260 may therefore select the stream for which the sum of the quality indexes of the three identified faces is smallest (in the example shown in FIG. 10, the stream identified by "id"="2").
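One way to sketch the mode 3 check is a segment-versus-box test between the viewpoint and the closest vertex. The axis-aligned box standing in for 3D object #2, the function names, and the fallback to the largest sum when nothing occludes are assumptions of this sketch; the description does not specify how the overlap is computed.

```python
def segment_hits_box(p0, p1, box_min, box_max):
    """Slab test: does the segment p0 -> p1 pass through the axis-aligned box?"""
    t_enter, t_exit = 0.0, 1.0
    for a, b, lo, hi in zip(p0, p1, box_min, box_max):
        d = b - a
        if d == 0:
            if a < lo or a > hi:
                return False  # parallel to this slab and outside it
            continue
        t0, t1 = (lo - a) / d, (hi - a) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True


def select_stream_mode3(face_sums, viewpoint, nearest_vertex, occluder_box):
    """Lowest quality sum when 3D object #2 blocks the near faces, else highest."""
    occluded = segment_hits_box(viewpoint, nearest_vertex, *occluder_box)
    pick = min if occluded else max
    return pick(face_sums, key=face_sums.get)


sums = {"1": 24, "2": 9}
blocking_box = ((2, 2, 2), (3, 3, 3))  # hypothetical bounds of 3D object #2
print(select_stream_mode3(sums, (5, 5, 5), (1, 1, 1), blocking_box))  # prints 2
```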

Note that in mode 3, because the vertex closest to the user's viewpoint position is identified based on the viewpoint information, the selection unit 260 may be regarded as selecting the stream to be transmitted to the user terminal 300 from among the two or more streams based on the viewpoint information and the quality information. Furthermore, in mode 3, because the overlap of the two 3D objects is identified based on the viewpoint information and the arrangement information of the 3D objects, the selection unit 260 may be regarded as selecting the stream to be transmitted to the user terminal 300 from among the two or more streams based on the viewpoint information, the quality information and the arrangement information. The arrangement information of a 3D object (for example, "rotation_object", "scale_object" and "translation_object" shown in FIG. 10) may be included in the scene description. The arrangement information of a 3D object may also be regarded as "link_area" shown in FIG. 10. That is, the selection unit 260 may receive the arrangement information of the 3D object in the three-dimensional space from the transmission device 100.

Note that in the quality information shown in FIG. 10, the sum of the quality indexes of the six faces is equal in every stream. However, the embodiment is not limited to this. The sum of the quality indexes of the six faces may differ between the two or more streams.

(Operation Example 4)
The above-described embodiment may include Operation Example 4 described below. Here, Operation Example 4 includes the following operations in addition to those of Operation Example 3. In Operation Example 4, the media processing device 200 (for example, the selection unit 270 described later) may constitute a reception unit that receives, from the transmission device 100, importance information on each of two or more objects included in the specific content.

Specifically, in Operation Example 4, as shown in FIG. 12, the media processing device 200 has a selection unit 270 in addition to the configuration shown in FIG. 2. The selection unit 270 receives the scene description, the 3D objects and the 360° video from the transmission device 100. The selection unit 270 inputs the stream (3D object) selected from among the two or more streams to the renderer 220.

Here, the selection unit 270 receives importance information on each of the two or more objects from the transmission device 100. The objects may include 3D objects and 360° video.

For example, the importance information may be information indicating the relative importance among the two or more objects. For example, as shown in FIG. 13, consider a case where object A (background), object B (person) and object C (dog) exist in the three-dimensional space constructed by the scene description. Object A (background) is an example of 360° video, and object B (person) and object C (dog) are examples of 3D objects. In such a case, the importance information may be information indicating the relative importance among object A (background), object B (person) and object C (dog).

Although not particularly limited, the importance information may be included in the scene description in the manner shown in FIG. 14. In FIG. 14, the importance information may be expressed by weight. The weight may take a value in the range of 1 to 9, and a larger weight may mean higher importance. FIG. 14 illustrates a case where object A (background), identified by "object_id"="0", has the highest weight ("9"); object B (person), identified by "object_id"="1", has the lowest weight ("3"); and object C (dog), identified by "object_id"="2", has a weight ("8") that is higher than that of object B (person) and lower than that of object A (background).

First, the media processing device 200 (selection unit 270) selects the highest-quality stream for the 3D object with the highest importance. The stream selection method may be the same as in Operation Example 3. For example, because the importance of object C (dog) is higher than that of object B (person), the selection unit 270 selects, for object C (dog), the stream for which the sum of the quality indexes of the three faces sharing the vertex closest to the user's viewpoint position is largest.

Second, the media processing device 200 (selection unit 270) selects the lowest-quality stream for each 3D object other than the 3D object with the highest importance. The selection unit 270 then replaces the lowest-quality stream with a higher-quality stream, in descending order of 3D object importance, within the range in which a specific condition is satisfied. The specific condition may include a first condition that the bandwidth of the line from the transmission device 100 to the media processing device 200 is at or below a threshold, and may include a second condition that the processing load of the media processing device 200 is at or below a threshold. The specific condition may be defined by a combination of the first condition and the second condition. For example, because the importance of object B (person) is lower than that of object C (dog), the selection unit 270 selects, for object B (person), a high-quality stream within the range in which the specific condition is satisfied.
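The two-step procedure above can be sketched as a greedy allocation. This is an illustration under stated assumptions: each stream is modelled as a hypothetical `(quality, cost)` pair, where `quality` stands for the quality sum of the near faces and `cost` for a bandwidth figure, and the specific condition is reduced to a single `budget` on total cost. None of these names or numbers appear in the description.

```python
def select_by_importance(objects, budget):
    """objects: list of {"id", "weight", "streams": [(quality, cost), ...]}.
    Returns {object id: chosen (quality, cost)}."""
    quality = lambda s: s[0]
    ranked = sorted(objects, key=lambda o: o["weight"], reverse=True)
    # Step 1: best stream for the most important object, worst for the rest.
    chosen = {ranked[0]["id"]: max(ranked[0]["streams"], key=quality)}
    for obj in ranked[1:]:
        chosen[obj["id"]] = min(obj["streams"], key=quality)
    used = sum(cost for _, cost in chosen.values())
    # Step 2: upgrade the remaining objects in descending importance
    # for as long as the total cost stays within the budget.
    for obj in ranked[1:]:
        current = chosen[obj["id"]]
        for cand in sorted(obj["streams"], key=quality, reverse=True):
            new_used = used - current[1] + cand[1]
            if cand[0] > current[0] and new_used <= budget:
                chosen[obj["id"]], used = cand, new_used
                break
    return chosen


# 3D objects B (person, weight 3) and C (dog, weight 8) as in FIG. 14;
# the stream lists are made-up values for illustration.
objects = [
    {"id": "B", "weight": 3, "streams": [(24, 10), (15, 6), (9, 4)]},
    {"id": "C", "weight": 8, "streams": [(24, 10), (9, 4)]},
]
print(select_by_importance(objects, budget=17))
```

In this sketch the most important object always receives its best stream, even if that alone exceeds the budget, which mirrors the order of the two steps in the text.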

As described above, the media processing device 200 (selection unit 270) may be regarded as selecting the stream to be transmitted to the user terminal 300 from among the two or more streams based on the viewpoint information, the quality information and the importance information. The media processing device 200 (selection unit 270) may also be regarded as selecting the stream to be transmitted to the user terminal 300 from among the two or more streams based on the viewpoint information, the quality information, the arrangement information and the importance information.

Note that Operation Example 4 illustrated a case where one stream exists for the 360° video. However, the embodiment is not limited to this. Two or more streams of different quality may also exist for the 360° video.

Operation Example 4 illustrated a case where, for a 3D object, there are two or more streams whose quality differs depending on the orientation of the 3D object based on the viewpoint information. However, the embodiment is not limited to this. For a 3D object, there may be two or more streams of different quality regardless of the orientation of the 3D object.

(Operation Example 5)
The above-described embodiment may include Operation Example 5 described below. In Operation Example 5, the media processing device 200 (for example, the renderer 220) may constitute a reception unit that receives, from the transmission device 100, an information element that defines the movement range of the user's viewpoint position in the three-dimensional space formed by the specific content (the three-dimensional space constructed by the scene description).

First, in Operation Example 5, the information element may include an information element that restricts movement of the user's viewpoint position into the inside of a 3D object included in the specific content (hereinafter, the first information element). For example, as shown in FIG. 15, when a 3D object is placed in the three-dimensional space constructed by the scene description, movement of the viewpoint position into the inside of the 3D object may be restricted. However, movement of the viewpoint position into the inside of the 3D object may be permitted in cases such as where the 3D object is a building or where another scene exists inside the 3D object.

Second, in Operation Example 5, the information element may include an information element that restricts movement of the user's viewpoint position to the outside of the three-dimensional space (hereinafter, the second information element). For example, as shown in FIG. 16, the three-dimensional space may be defined by a combination of cuboids and spheroids. The number of cuboids defining the three-dimensional space may be two or more, and the number of spheroids defining the three-dimensional space may be two or more. However, there may be cases where movement of the user's viewpoint position to the outside of the three-dimensional space is permitted.

Although not particularly limited, the first information element may be included in the scene description in the manner shown in FIG. 17. In FIG. 17, the first information element may be expressed by viewing_inside_object_flag. The viewing_inside_object_flag may be set for each 3D object. For example, movement of the viewpoint position into the 3D object may be restricted when viewing_inside_object_flag is "0" and permitted when viewing_inside_object_flag is "1".

Although not particularly limited, the second information element may be included in the scene description in the manner shown in FIG. 18. In FIG. 18, the second information element may include information elements that define a cuboid defining the three-dimensional space (cuboid_center_x, cuboid_center_y, cuboid_center_z, cuboid_size_x, cuboid_size_y, cuboid_size_z). Here, cuboid_center_x, cuboid_center_y and cuboid_center_z indicate the center position of the cuboid, and cuboid_size_x, cuboid_size_y and cuboid_size_z indicate the size of the cuboid. The second information element may include information elements that define a spheroid defining the three-dimensional space (spheroid_center_x, spheroid_center_y, spheroid_center_z, spheroid_size_x, spheroid_size_y, spheroid_size_z). Here, spheroid_center_x, spheroid_center_y and spheroid_center_z indicate the center position of the spheroid, and spheroid_size_x, spheroid_size_y and spheroid_size_z indicate the size of the spheroid. Note that cuboid_enable may be an information element indicating whether the three-dimensional space is defined by a cuboid, and spheroid_enable may be an information element indicating whether the three-dimensional space is defined by a spheroid. FIG. 18 illustrates a case where the three-dimensional space is defined by two cuboids and two spheroids.
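A membership test for the movement range might look like the sketch below. Several points are assumptions rather than statements of the description: the size elements are read as full extents along each axis (so half of each is the half-width or semi-axis), the shapes are axis-aligned, sizes are positive, and the "combination" of shapes is taken to be their union. The function names are hypothetical.

```python
def in_cuboid(p, center, size):
    """True if p lies inside an axis-aligned cuboid (size = full edge lengths)."""
    return all(abs(pi - ci) <= si / 2 for pi, ci, si in zip(p, center, size))


def in_spheroid(p, center, size):
    """True if p lies inside an axis-aligned ellipsoid (size = full axis lengths)."""
    return sum(((pi - ci) / (si / 2)) ** 2
               for pi, ci, si in zip(p, center, size)) <= 1.0


def in_movement_range(p, cuboids, spheroids):
    """cuboids / spheroids: lists of (center, size) for the enabled shapes."""
    return (any(in_cuboid(p, c, s) for c, s in cuboids)
            or any(in_spheroid(p, c, s) for c, s in spheroids))


cuboids = [((0, 0, 0), (4, 2, 2))]    # hypothetical cuboid_center_* / cuboid_size_*
spheroids = [((3, 0, 0), (4, 4, 4))]  # hypothetical spheroid_center_* / spheroid_size_*
print(in_movement_range((2.1, 0, 0), cuboids, spheroids))  # outside the cuboid, inside the spheroid
```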

Under this premise, the media processing device 200 (renderer 220) may perform the following operations.

First, when the user's viewpoint position would move outside the movement range, the renderer 220 may generate the specific content using, as the viewpoint position, the intersection of the trajectory of the user's viewpoint position and the boundary of the movement range. That is, the renderer 220 may fix the viewpoint position at the position it had at the moment it was about to leave the movement range (the boundary position).
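The boundary position can be approximated numerically when only an inside/outside test for the movement range is available. The sketch below bisects along the straight segment between the previous and the requested viewpoint. It assumes that the previous viewpoint is inside the range and that the segment crosses the boundary once; `clamp_to_boundary` and the `inside` callback are hypothetical names.

```python
def clamp_to_boundary(prev, new, inside, iters=40):
    """Return `new` if it is inside the range, otherwise the last inside point
    on the segment prev -> new (found by bisection)."""
    if inside(new):
        return new
    lo, hi = 0.0, 1.0  # prev (t = 0) is assumed inside, new (t = 1) is outside
    for _ in range(iters):
        mid = (lo + hi) / 2
        point = tuple(a + (b - a) * mid for a, b in zip(prev, new))
        if inside(point):
            lo = mid
        else:
            hi = mid
    return tuple(a + (b - a) * lo for a, b in zip(prev, new))


# Hypothetical movement range: a cube of half-width 1 around the origin.
inside_cube = lambda p: all(abs(c) <= 1 for c in p)
print(clamp_to_boundary((0, 0, 0), (2, 0, 0), inside_cube))
```

Bisection is used here only because it works with any inside test; for a single cuboid or spheroid the intersection could be computed in closed form instead.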

Second, when the user's viewpoint position would move outside the movement range, the renderer 220 may notify the user that movement of the viewpoint position is restricted. For example, the renderer 220 may display a message such as "You cannot move beyond this point."

(Action and Effect)
In the embodiment, the media processing device 200 generates the specific content based on the viewpoint information and then transmits the specific content to the user terminal 300. With this configuration, the specific content including the second content, which has a degree of freedom of viewpoint, need not be generated on the user terminal 300 side; the user terminal 300 can present the specific content simply by providing the viewpoint information to the media processing device 200. Therefore, although a delay arises between the media processing device 200 and the user terminal 300, the processing load on the user terminal 300 can be reduced.

In Operation Example 1, the media processing device 200 transmits the viewpoint information used to generate the specific content to the user terminal 300 in association with the sequence number added to the specific content. With this configuration, the user terminal 300 can determine the viewpoint information used to generate the specific content and the corresponding sequence number. The specific content can therefore be displayed appropriately even when the viewpoint information used by the media processing device 200 to generate the specific content differs from the viewpoint information used by the user terminal 300 to display it.

In Operation Example 2, the media processing device 200 receives the content configuration and the specific viewpoint information (recommended viewport information) from the transmission device 100. With this configuration, the media processing device 200 can generate the specific content based on the specific viewpoint information, and the specific content can be displayed appropriately even when first user terminals 400, which feed back viewpoint information, and second user terminals 500, which do not feed back viewpoint information, coexist.

In Operation Example 2, even when the user terminal is a first user terminal 400, the media processing device 200 generates the specific content based on the specific viewpoint information in response to a reset signal. With this configuration, even when the viewpoint position and line-of-sight direction in the three-dimensional space constructed by the scene description become unclear at the first user terminal 400 (that is, the user gets lost in the three-dimensional space), the reset signal makes it possible to return to the specific content based on the specific viewpoint information.

In Operation Example 3, the media processing device 200 receives from the transmission device 100, as quality information of the streams for a 3D object included in the specific content, quality information on each of two or more streams whose quality differs depending on the orientation of the 3D object based on the viewpoint information. This configuration rests on the new finding that the faces forming the bounding box of a 3D object need not all have uniform quality, and it allows the 3D object to be displayed appropriately while suppressing transmission traffic.

In Operation Example 4, the media processing device 200 receives from the transmission device 100 importance information on each of two or more objects included in the specific content. With this configuration, introducing a mechanism for setting the importance of each object, such as 360° video and 3D objects, allows each object included in the specific content to be displayed appropriately while suppressing transmission traffic.

In Operation Example 5, the media processing device 200 receives from the transmission device 100 an information element that defines the movement range of the user's viewpoint position in the three-dimensional space formed by the specific content. With this configuration, the specific content, including content that has a degree of freedom of viewpoint, can be displayed appropriately without the specific content displayed on the user terminal 300 breaking down.

[Modification 1]
Modification 1 of the embodiment is described below. The description focuses mainly on the differences from the embodiment.

Modification 1 describes methods of synchronizing the first content and the second content when the specific content includes both.

In the following, synchronization means that the presentation times of the first content (for example, MPUs) and the second content (files) are properly aligned. Synchronization may therefore include aligning the presentation times of 2D video and a 3D object, and may include aligning the presentation times of audio and a 3D object. Similarly, synchronization may include aligning the presentation times of 2D video and 360° video, and may include aligning the presentation times of audio and 360° video.

The first method covers the case where the media processing device 200 synchronizes the first content and the second content based on the first control information (MMT-SI). Using MMT-SI as the entry point, the media processing device 200 checks whether a scene description (second content) exists and, if one exists, reuses the MPU timestamp descriptor to identify the presentation time of the specific content including the first content and the second content.

Specifically, as shown in FIG. 19, the 2D video and audio are presented based on the MPU timestamp descriptor (shown simply as timestamp in FIG. 19), so the 2D video and audio are synchronized.

On the other hand, the presentation time of the first frame included in the scene description is identified by referring to the MPU timestamp descriptor included in MMT-SI. The presentation times of the second and subsequent frames included in the scene description can be identified from the frame number included in the scene description and the frame rate of the second content. For example, when the frame rate is 30 fps, the presentation time of the n-th frame is identified by adding 1/30 × n (in seconds) to the time identified by the MPU timestamp descriptor. Here, the frame number of the first frame included in the scene description is "0".
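The frame-time rule above amounts to a single addition. In the sketch below, the function name and the example start time are hypothetical; the offset n / fps is taken directly from the 30 fps example in the text, with frame numbering starting at 0.

```python
from datetime import datetime, timedelta, timezone


def presentation_time(first_frame_time, frame_number, fps=30):
    """Presentation time of frame `frame_number` (the first frame is number 0)."""
    offset_us = round(frame_number * 1_000_000 / fps)
    return first_frame_time + timedelta(microseconds=offset_us)


# Hypothetical time carried by the MPU timestamp descriptor for frame 0.
t0 = datetime(2022, 2, 10, 12, 0, 0, tzinfo=timezone.utc)
print(presentation_time(t0, 45))  # 45 frames at 30 fps is 1.5 s after t0
```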

The first method illustrated a case where the scene description does not contain the presentation time of its first frame, but the scene description may contain the presentation time of its first frame.

The second method covers the case where the media processing device 200 synchronizes the first content and the second content based on the second control information (scene description). Using the scene description as the entry point, the media processing device 200 checks whether MMT-SI (first content) exists and, if MMT-SI exists, identifies the presentation time of the specific content including the first content and the second content based on the presentation time included in the scene description.

In such a case, the scene description includes absolute time information indicating the presentation time of the second content. The absolute time information may be the presentation time of the first frame included in the scene description.

For example, the absolute time information may be generated using UTC as the reference time. TAI may be used as the reference time, or the time provided by GPS may be used. The reference time may be the time provided by an NTP server or the time provided by a PTP server. Furthermore, the absolute time information may be generated based on the same reference time as the MPU timestamp descriptor.

Furthermore, the scene description includes reference information for identifying the first content. The reference information may be information for identifying the MPUs that make up the first content. That is, the reference information is information for handling the first content (MPUs) as an object included in the scene description.

Specifically, as shown in FIG. 20, the presentation time of the first frame included in the scene description is identified by the absolute time information included in the scene description. The presentation times of the second and subsequent frames included in the scene description can be identified from the frame number included in the scene description and the frame rate of the second content. For example, when the frame rate is 30 fps, the presentation time of the n-th frame is identified by adding 1/30 × n (in seconds) to the time identified by the MPU timestamp descriptor. Here, the frame number of the first frame included in the scene description is "0".

On the other hand, the 2D video and audio are presented based on the MPU timestamp descriptor (shown simply as timestamp in FIG. 20), so the 2D video and audio are synchronized. Because the above-described reference information is included in the scene description, the media processing device 200 can check, based on that reference information, whether there is first content to be presented together with the second content.

In the second method, the 2D video and audio are synchronized based on the MPU timestamp descriptor included in MMT-SI. In Modification 1, however, the 2D video and audio may also be synchronized based on the information elements included in the scene description (the absolute time information and the reference information). In such a case, at least the MPU timestamp descriptor included in MMT-SI may be omitted. Furthermore, MMT-SI itself may be omitted.

If the reference time of the MPU timestamp descriptor included in the MMT-SI (hereinafter, the first reference time) differs from the reference time of the absolute time information included in the scene description (the second reference time), at least one of the first control information (MMT-SI) and the second control information (scene description) may include conversion information between the first reference time and the second reference time. For example, the MMT-SI may include, in addition to an MPU timestamp descriptor expressed in the first reference time (e.g., UTC), an MPU timestamp descriptor expressed in the second reference time (e.g., a reference time other than UTC). The scene description may include, in addition to absolute time information expressed in the second reference time (e.g., a reference time other than UTC), absolute time information expressed in the first reference time (e.g., UTC).
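As a rough illustration of such conversion information, the sketch below assumes that the two reference times differ by a fixed offset. This is only one possible form; the disclosure does not specify the format of the conversion information, and the class `TimeConversion` and its field `offset` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TimeConversion:
    """Hypothetical conversion information: second_ref = first_ref + offset."""

    offset: timedelta

    def to_second_reference(self, t_first):
        # Convert a time expressed in the first reference time.
        return t_first + self.offset

    def to_first_reference(self, t_second):
        # Convert a time expressed in the second reference time.
        return t_second - self.offset


# The 10-second offset is an arbitrary value chosen for illustration.
conversion = TimeConversion(offset=timedelta(seconds=10))
mpu_timestamp = datetime(2022, 2, 10, 12, 0, 0, tzinfo=timezone.utc)
scene_time = conversion.to_second_reference(mpu_timestamp)
```

With conversion information of this kind, a receiver could compare a time taken from the MMT-SI with a time taken from the scene description on a common time axis.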

Note that the MPU timestamp descriptor included in the MMT-SI may be referred to as first absolute time information, and the absolute time information included in the scene description may be referred to as second absolute time information.

[Other embodiments]
Although the present invention has been described by way of the above disclosure, the statements and drawings forming part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will become apparent to those skilled in the art from this disclosure.

The above disclosure illustrates the case where the specific content includes both the first content and the second content, but the disclosure is not limited to this. The specific content only needs to include at least the second content.

Although not specifically mentioned in the above disclosure, terms related to MMT may be interpreted based on the content defined in ISO/IEC 23008-1, ARIB STD-B60, ARIB TR-B39, and the like.

In the above disclosure, the MPU timestamp descriptor is given as an example of the first absolute time information included in the MMT-SI. However, the disclosure is not limited to this. The first absolute time information included in the MMT-SI may be an MPU extended timestamp descriptor.

Although not specifically mentioned in the above disclosure, the media processing device 200 may request part of the second content from the transmitting device 100 as necessary. Such a configuration saves the bandwidth required to transmit the second content and suppresses an increase in the processing load of the media processing device 200.

In the above disclosure, MMTP is given as an example of the transmission scheme of the first content. However, the disclosure is not limited to this. The transmission scheme of the first content may be a scheme conforming to ISO/IEC 23009-1 (hereinafter, MPEG-DASH (Dynamic Adaptive Streaming over HTTP)). In such a case, the first control information may be an MPD (Media Presentation Description). That is, in the above disclosure, MMT-SI may be read as MPD.

Although not specifically mentioned in the above disclosure, "acquisition" may be read as "reception".

Although not particularly limited, Operation Example 2 may be expressed as follows. The transmitting device 100 includes a transmitting unit that transmits a configuration of content having a degree of freedom of viewpoint, and the transmitting unit transmits specific viewpoint information used to generate specific content including at least the content. The receiving device includes a receiving unit that receives a configuration of content having a degree of freedom of viewpoint, and the receiving unit receives specific viewpoint information used to generate specific content including at least the content. In such a case, the receiving device may be the media processing device 200 or the user terminal 300.

Although not particularly limited, Operation Example 3 may be expressed as follows. The transmitting device 100 includes a transmitting unit that transmits a configuration of content having a degree of freedom of viewpoint. The transmitting unit transmits quality information of a stream relating to a three-dimensional object included in specific content including at least the content, and the quality information includes quality information on each of two or more streams whose quality differs depending on the orientation of the three-dimensional object. The receiving device includes a receiving unit that receives a configuration of content having a degree of freedom of viewpoint. The receiving unit receives quality information of a stream relating to a three-dimensional object included in specific content including at least the content, and the quality information includes quality information on each of two or more streams whose quality differs depending on the orientation of the three-dimensional object. In such a case, the receiving device may be the media processing device 200 or the user terminal 300.
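One way a receiving device might use such quality information is sketched below. The sketch assumes that the quality information gives a relative quality value for each face of an axis-aligned bounding box around the three-dimensional object, and that the object is not rotated. The names `FACE_NORMALS`, `facing_face`, and `select_stream`, as well as the face labels and quality values, are hypothetical.

```python
# Outward normals of the six faces of an axis-aligned bounding box.
FACE_NORMALS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


def facing_face(object_position, viewpoint_position):
    """Return the label of the face that most directly faces the viewpoint."""
    # Vector from the object toward the viewpoint.
    direction = tuple(v - o for v, o in zip(viewpoint_position, object_position))

    def alignment(face):
        # Dot product of the face normal with the viewing direction.
        return sum(n * d for n, d in zip(FACE_NORMALS[face], direction))

    return max(FACE_NORMALS, key=alignment)


def select_stream(quality_info, object_position, viewpoint_position):
    """Pick the stream whose relative quality is highest on the facing face.

    quality_info maps a stream name to a dict of face label -> relative quality.
    """
    face = facing_face(object_position, viewpoint_position)
    return max(quality_info, key=lambda stream: quality_info[stream][face])


# Two illustrative streams: one favours the +z face, the other the -z face.
quality_info = {
    "stream_a": {"+x": 1, "-x": 1, "+y": 1, "-y": 1, "+z": 3, "-z": 1},
    "stream_b": {"+x": 1, "-x": 1, "+y": 1, "-y": 1, "+z": 1, "-z": 3},
}
chosen = select_stream(quality_info, (0.0, 0.0, 0.0), (0.0, 0.0, 5.0))
```

A real implementation would also need to account for the orientation of the object in the three-dimensional space, which this sketch deliberately omits.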

Although not particularly limited, Operation Example 4 may be expressed as follows. The transmitting device 100 includes a transmitting unit that transmits a configuration of content having a degree of freedom of viewpoint, and the transmitting unit transmits importance information on each of two or more objects included in specific content including at least the content. The receiving device includes a receiving unit that receives a configuration of content having a degree of freedom of viewpoint, and the receiving unit receives importance information on each of two or more objects included in specific content including at least the content. In such a case, the receiving device may be the media processing device 200 or the user terminal 300.

Although not particularly limited, Operation Example 4 may be expressed as follows. The transmitting device 100 includes a transmitting unit that transmits a configuration of content having a degree of freedom of viewpoint, and the transmitting unit transmits an information element that defines the movement range of the user's viewpoint position in a three-dimensional space formed by specific content including at least the content. The receiving device includes a receiving unit that receives a configuration of content having a degree of freedom of viewpoint, and the receiving unit receives an information element that defines the movement range of the user's viewpoint position in a three-dimensional space formed by specific content including at least the content. In such a case, the receiving device may be the media processing device 200 or the user terminal 300.
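The following sketch shows one way a receiving device might apply an information element that defines the movement range of the viewpoint position. It assumes, purely for illustration, that the range is an axis-aligned box given by minimum and maximum coordinates; the class `MovementRange` and its fields are hypothetical and are not defined in the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class MovementRange:
    """Hypothetical movement range: an axis-aligned box in the 3D space."""

    minimum: Point
    maximum: Point

    def contains(self, position: Point) -> bool:
        # True when every coordinate lies within the box.
        return all(lo <= c <= hi
                   for c, lo, hi in zip(position, self.minimum, self.maximum))

    def clamp(self, position: Point) -> Point:
        # Move an out-of-range viewpoint to the nearest point inside the box.
        return tuple(min(max(c, lo), hi)
                     for c, lo, hi in zip(position, self.minimum, self.maximum))


movement_range = MovementRange(minimum=(-1.0, 0.0, -1.0), maximum=(1.0, 2.0, 1.0))
clamped = movement_range.clamp((3.0, 1.0, -2.0))
```

Clamping is only one possible policy. A device could instead reject viewpoint information that falls outside the defined range.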

Although not specifically mentioned in the above disclosure, a program may be provided that causes a computer to execute each process performed by the transmitting device 100, the media processing device 200, and the user terminal 300. The program may be recorded on a computer-readable medium. Using the computer-readable medium, the program can be installed on a computer. The computer-readable medium on which the program is recorded may be a non-transitory recording medium. The non-transitory recording medium is not particularly limited, and may be, for example, a recording medium such as a CD-ROM or DVD-ROM.

Alternatively, a chip may be provided that includes a memory storing a program for executing each process performed by the transmitting device 100, the media processing device 200, and the user terminal 300, and a processor that executes the program stored in the memory.

DESCRIPTION OF SYMBOLS
10: transmission system, 100: transmitting device, 200: media processing device, 210: reception unit, 220: renderer, 230: encoding processing unit, 260: selection unit, 270: selection unit, 300: user terminal, 310: detection unit, 320: decoding processing unit, 330: renderer, 400: first user terminal, 500: second user terminal

Claims

1. A media processing device comprising:
a receiving unit that receives viewpoint information from a user terminal;
a renderer that generates specific content including at least content having a degree of freedom of viewpoint based on the viewpoint information; and
a transmitting unit that transmits the specific content generated by the renderer to the user terminal,
wherein the receiving unit receives, from a transmitting device, as quality information of a stream relating to a three-dimensional object included in the specific content, quality information on each of two or more streams whose quality differs depending on the orientation of the three-dimensional object based on the viewpoint information.

2. The media processing device according to claim 1, wherein said renderer selects a stream to be transmitted to said user terminal from among said two or more streams based on said viewpoint information and said quality information.

3. The media processing device according to claim 1, wherein said quality information is information indicating relative quality of each surface forming a bounding box for said three-dimensional object.

4. The media processing device according to any one of claims 1 to 3, wherein said receiving unit receives arrangement information of a three-dimensional object in a three-dimensional space from said transmitting device.

5. The media processing device according to claim 4, wherein said renderer selects a stream to be transmitted to said user terminal from among said two or more streams based on said viewpoint information, said quality information and said arrangement information.

6. A transmitting device comprising a transmitting unit that transmits a configuration of content having a degree of freedom of viewpoint,
wherein the transmitting unit transmits quality information of a stream relating to a three-dimensional object included in specific content including at least the content, and
the quality information includes quality information on each of two or more streams whose quality differs depending on the orientation of the three-dimensional object.

7. A receiving device comprising a receiving unit that receives a configuration of content having a degree of freedom of viewpoint,
wherein the receiving unit receives quality information of a stream relating to a three-dimensional object included in specific content including at least the content, and
the quality information includes quality information on each of two or more streams whose quality differs depending on the orientation of the three-dimensional object.