JP7268107B2

JP7268107B2 - Visual media data processing method

Info

Publication number: JP7268107B2
Application number: JP2021151936A
Authority: JP
Inventors: ワンイェ－クイ
Original assignee: Lemon Inc Cayman Island
Current assignee: Lemon Inc Cayman Island
Priority date: 2020-09-17
Filing date: 2021-09-17
Publication date: 2023-05-02
Anticipated expiration: 2041-09-17
Also published as: US12206879B2; EP3972265A1; US20220086473A1; JP2022050369A; JP2022050367A; EP3972275A1; US11902552B2; KR20220037396A; CN114205600A; US20220086497A1; KR20220037387A; US12143611B2; JP7209062B2; JP2022050368A; CN114205601A; KR20220037388A; CN114205598A; US20220086385A1; EP3972274A1; JP7268106B2

Description

Cross-Reference to Related Applications
Under the applicable patent law and/or rules pursuant to the Paris Convention, this application is made to timely claim the priority to and benefits of U.S. Provisional Patent Application No. 63/079,892, filed on September 17, 2020. For all purposes under the law, the entire disclosure of the aforementioned application is incorporated by reference as part of the disclosure of this application.

Technical Field
This patent document relates to the generation, storage, and consumption of digital audio-video media information in a file format.

Background
Digital video accounts for the largest bandwidth use on the Internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, the bandwidth demand for digital video usage is expected to continue to grow.

This document discloses techniques that can be used by video encoders and decoders to process a coded representation of a video or an image according to a file format.

In one example aspect, a video processing method is disclosed. The method includes performing a conversion between a visual media file and a bitstream of visual media data according to a format rule. The bitstream includes one or more output layer sets and one or more parameter sets that include one or more profile-tier-level syntax structures, at least one of which includes a general constraints information syntax structure. The format rule specifies that a syntax element is included in a configuration record of the visual media file, and the syntax element indicates the profile, tier, or level to which the output layer set identified by an output layer set index indicated in the configuration record conforms.

In another example aspect, a video processing method is disclosed. The method includes performing a conversion between a visual media file and a bitstream of visual media data according to a format rule. The format rule specifies a characteristic of a syntax element in the visual media file, where the syntax element has a value indicating the number of bytes used to indicate constraint information associated with the bitstream.
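
As a rough illustration of a byte-count syntax element, the sketch below reads a length-prefixed constraint-information field from a byte buffer. The layout (a single length byte followed by the payload) and the name `read_constraint_info` are assumptions made for this example only; the actual field widths and ordering in the configuration record are not given in this passage.

```python
def read_constraint_info(buf: bytes, offset: int = 0):
    """Read a length-prefixed constraint-information field.

    Hypothetical layout (an assumption, not the normative record syntax):
    one byte holding the byte count, followed by that many payload bytes.
    Returns (payload, offset_of_next_field).
    """
    if offset >= len(buf):
        raise ValueError("buffer too short for the length field")
    num_bytes = buf[offset]          # value of the byte-count syntax element
    start = offset + 1
    end = start + num_bytes
    if end > len(buf):
        raise ValueError("buffer too short for the constraint information")
    return buf[start:end], end
```

A reader that knows the byte count can skip or copy the constraint information without parsing its individual flags, which is the usual motivation for carrying an explicit length.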

In another example aspect, a video processing method is disclosed. The method includes performing a conversion between a visual media file and a bitstream of visual media data according to a format rule. The format rule specifies a characteristic of a syntax element in the visual media file: a syntax element whose value indicates a level identification is coded using 8 bits in either or both of a subpicture common group box and a subpicture multiple groups box.
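
A minimal sketch of what coding a level identification in 8 bits implies for a writer and a reader. The helper names `pack_level_idc` and `unpack_level_idc` are hypothetical, and the sketch places the field at a byte boundary for simplicity; the position of the field inside the boxes is not specified in this passage.

```python
def pack_level_idc(level_idc: int) -> bytes:
    """Serialize a level identification as one unsigned 8-bit field."""
    if not 0 <= level_idc <= 0xFF:
        raise ValueError("level_idc does not fit in 8 bits")
    return bytes([level_idc])


def unpack_level_idc(buf: bytes, offset: int = 0) -> int:
    """Read an unsigned 8-bit level identification at the given byte offset."""
    if offset >= len(buf):
        raise ValueError("buffer too short for the level field")
    return buf[offset]
```

An 8-bit field covers the values 0 to 255, so any wider level value has to be rejected at write time rather than silently truncated.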

In another example aspect, a video processing method is disclosed. The method includes performing a conversion between visual media data and a file storing a bitstream of the visual media data according to a format rule. The format rule specifies a constraint on information included in the file with respect to a profile, a tier, constraints, or a level associated with the bitstream representation identified in the file.

In yet another example aspect, a video encoder apparatus is disclosed. The video encoder comprises a processor configured to implement the above-described methods.

In yet another example aspect, a video decoder apparatus is disclosed. The video decoder comprises a processor configured to implement the above-described methods.

In yet another example aspect, a computer-readable medium having code stored thereon is disclosed. The code embodies one of the methods described herein in the form of processor-executable code.

In yet another example aspect, a computer-readable medium having a bitstream stored thereon is disclosed. The bitstream is generated or processed using a method described in this document.

These and other features are described throughout this document.

FIG. 1 is a block diagram of an example video processing system.

FIG. 2 is a block diagram of a video processing apparatus.

FIG. 3 is a flowchart of an example video processing method.

FIG. 4 is a block diagram illustrating a video coding system according to some embodiments of the present disclosure.

FIG. 5 is a block diagram illustrating an encoder according to some embodiments of the present disclosure.

FIG. 6 is a block diagram illustrating a decoder according to some embodiments of the present disclosure.

FIG. 7 shows an example of a block diagram of an encoder.

FIGS. 8 to 10 are flowcharts of example methods of video processing.

Section headings are used in this document for ease of understanding and do not limit the applicability of the techniques and embodiments disclosed in each section to that section only. Furthermore, H.266 terminology is used in some descriptions only for ease of understanding and not to limit the scope of the disclosed techniques. As such, the techniques described herein are applicable to other video codec protocols and designs as well. In this document, editing changes relative to the current draft of the VVC specification or the ISOBMFF file format specification are shown as follows: cancelled text is enclosed in opening and closing double brackets (e.g., [[ ]]), and added text is shown in bold italics.

1. General Description
This document relates to video file formats. Specifically, it relates to the signaling of decoder configuration information and subpicture entity groups in media files that carry Versatile Video Coding (VVC) video bitstreams based on the ISO base media file format (ISOBMFF). The ideas may be applied, individually or in various combinations, to video bitstreams coded by any codec, e.g., the VVC standard, and to any video file format, e.g., the VVC video file format under development.
2. Abbreviations
ACT (adaptive color transform)
ALF (adaptive loop filter)
AMVR (adaptive motion vector resolution)
APS (adaptation parameter set)
AU (access unit)
AUD (access unit delimiter)
AVC (advanced video coding) (Rec. ITU-T H.264 | ISO/IEC 14496-10)
B (bi-predictive)
BCW (bi-prediction with CU-level weights)
BDOF (bi-directional optical flow)
BDPCM (block-based delta pulse code modulation)
BP (buffering period)
CABAC (context-based adaptive binary arithmetic coding)
CB (coding block)
CBR (constant bit rate)
CCALF (cross-component adaptive loop filter)
CPB (coded picture buffer)
CRA (clean random access)
CRC (cyclic redundancy check)
CTB (coding tree block)
CTU (coding tree unit)
CU (coding unit)
CVS (coded video sequence)
DPB (decoded picture buffer)
DCI (decoding capability information)
DRAP (dependent random access point)
DU (decoding unit)
DUI (decoding unit information)
EG (exponential-Golomb)
EGk (k-th order exponential-Golomb)
EOB (end of bitstream)
EOS (end of sequence)
FD (filler data)
FIFO (first-in, first-out)
FL (fixed-length)
GBR (green, blue, and red)
GCI (general constraints information)
GDR (gradual decoding refresh)
GPM (geometric partitioning mode)
HEVC (high efficiency video coding) (Rec. ITU-T H.265 | ISO/IEC 23008-2)
HRD (hypothetical reference decoder)
HSS (hypothetical stream scheduler)
I (intra)
IBC (intra block copy)
IDR (instantaneous decoding refresh)
ILRP (inter-layer reference picture)
IRAP (intra random access point)
LFNST (low frequency non-separable transform)
LPS (least probable symbol)
LSB (least significant bit)
LTRP (long-term reference picture)
LMCS (luma mapping with chroma scaling)
MIP (matrix-based intra prediction)
MPS (most probable symbol)
MSB (most significant bit)
MTS (multiple transform selection)
MVP (motion vector prediction)
NAL (network abstraction layer)
OLS (output layer set)
OP (operation point)
OPI (operating point information)
P (predictive)
PH (picture header)
POC (picture order count)
PPS (picture parameter set)
PROF (prediction refinement with optical flow)
PT (picture timing)
PU (picture unit)
QP (quantization parameter)
RADL (random access decodable leading (picture))
RASL (random access skipped leading (picture))
RBSP (raw byte sequence payload)
RGB (red, green, and blue)
RPL (reference picture list)
SAO (sample adaptive offset)
SAR (sample aspect ratio)
SEI (supplemental enhancement information)
SH (slice header)
SLI (subpicture level information)
SODB (string of data bits)
SPS (sequence parameter set)
STRP (short-term reference picture)
STSA (step-wise temporal sublayer access)
TR (truncated rice)
VBR (variable bit rate)
VCL (video coding layer)
VPS (video parameter set)
VSEI (versatile supplemental enhancement information) (Rec. ITU-T H.274 | ISO/IEC 23002-7)
VUI (video usability information)
VVC (versatile video coding) (Rec. ITU-T H.266 | ISO/IEC 23090-3)

3. Video Coding Introduction
3.1. Video Coding Standards
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on a hybrid video coding structure in which temporal prediction and transform coding are used. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded jointly by VCEG and MPEG in 2015. Since then, many new methods have been adopted by JVET and put into the reference software named the Joint Exploration Model (JEM). When the Versatile Video Coding (VVC) project officially started, the JVET was renamed the Joint Video Experts Team (JVET). VVC, finalized at the 19th JVET meeting on July 1, 2020, is a new coding standard that targets a 50% bitrate reduction compared to HEVC.

The Versatile Video Coding (VVC) standard (ITU-T H.266 | ISO/IEC 23090-3) and the associated Versatile Supplemental Enhancement Information (VSEI) standard (ITU-T H.274 | ISO/IEC 23002-7) are designed for use in the broadest possible range of applications. These include traditional uses such as television broadcast, video conferencing, and playback from storage media, as well as newer and more advanced use cases such as adaptive bit rate streaming, video region extraction, composition and merging of content from multiple coded video bitstreams, multiview video, scalable layered coding, and viewport-adaptive 360° immersive media.

3.2. File Format Standards
Media streaming applications are typically based on the IP, TCP, and HTTP transport methods and typically rely on a file format such as the ISO base media file format (ISOBMFF). One such streaming system is dynamic adaptive streaming over HTTP (DASH). To use a video format with ISOBMFF and DASH, a file format specification specific to the video format, such as the AVC file format or the HEVC file format, is needed to encapsulate the video content in ISOBMFF tracks and in DASH representations and segments. Important information about the video bitstream, e.g., the profile, tier, and level, among much else, needs to be exposed as file-format-level metadata and/or in the DASH media presentation description (MPD) for content selection purposes, e.g., for selecting appropriate media segments both for initialization at the beginning of a streaming session and for stream adaptation during the session.
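
The content-selection role of exposed profile, tier, and level metadata can be sketched as follows. This is only an illustration under stated assumptions: the `Representation` type, its field names, and the selection policy (highest bandwidth that the decoder and the network can both handle) are hypothetical and are not taken from the DASH or ISOBMFF specifications.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Representation:
    """Hypothetical summary of the metadata a client sees for one stream."""
    rep_id: str
    profile: int        # profile indicator exposed by the file or manifest
    tier: int           # tier indicator (larger means a more demanding tier)
    level: int          # level indicator (larger means a more demanding level)
    bandwidth_bps: int  # advertised bandwidth in bits per second


def select_representation(
    reps: Iterable[Representation],
    supported_profiles: Set[int],
    max_tier: int,
    max_level: int,
    available_bps: int,
) -> Optional[Representation]:
    """Pick the highest-bandwidth stream the decoder and network can handle."""
    candidates = [
        r for r in reps
        if r.profile in supported_profiles
        and r.tier <= max_tier
        and r.level <= max_level
        and r.bandwidth_bps <= available_bps
    ]
    # Return None when nothing is playable instead of guessing.
    return max(candidates, key=lambda r: r.bandwidth_bps, default=None)
```

The same filter can be run at session start for initialization and again whenever the measured bandwidth changes, which matches the two selection points mentioned above.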

Similarly, to use an image format with ISOBMFF, a file format specification specific to the image format, such as the AVC image file format or the HEVC image file format, is needed.

The VVC video file format, the file format for storage of VVC video content based on ISOBMFF, is currently being developed by MPEG.

FIG. 1 is a block diagram showing an example video processing system 1900 in which the various techniques disclosed herein may be implemented. Various implementations may include some or all of the components of the system 1900. The system 1900 may include an input 1902 for receiving video content. The video content may be received in a raw or uncompressed format, e.g., 8- or 10-bit multi-component pixel values, or in a compressed or encoded format. The input 1902 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as Ethernet and passive optical network (PON), and wireless interfaces such as Wi-Fi or cellular interfaces.

The system 1900 may include a coding component 1904 that may implement the various coding or encoding methods described in this document. The coding component 1904 may reduce the average bitrate of the video from the input 1902 to the output of the coding component 1904 to produce a coded representation of the video. The coding techniques are therefore sometimes called video compression or video transcoding techniques. The output of the coding component 1904 may be either stored or transmitted via a connected communication link, as represented by the component 1906. The stored or communicated bitstream (or coded) representation of the video received at the input 1902 may be used by the component 1908 to generate pixel values or displayable video that is sent to a display interface 1910. The process of generating user-viewable video from the bitstream representation is sometimes called video decompression. Furthermore, while certain video processing operations are referred to as "coding" operations or tools, it will be appreciated that the coding tools or operations are used at an encoder, and the corresponding decoding tools or operations that reverse the results of the coding are performed by a decoder.

Examples of a peripheral bus interface or a display interface include universal serial bus (USB), high definition multimedia interface (HDMI), DisplayPort, and so on. Examples of storage interfaces include serial advanced technology attachment (SATA), PCI, and IDE interfaces. The techniques described in this document may be embodied in various electronic devices, such as mobile phones, laptops, smartphones, or other devices capable of performing digital data processing and/or video display.

FIG. 2 is a block diagram of a video processing apparatus 3600. The apparatus 3600 may be used to implement one or more of the methods described herein. The apparatus 3600 may be embodied in a smartphone, a tablet, a computer, an Internet of Things (IoT) receiver, and so on. The apparatus 3600 may include one or more processors 3602, one or more memories 3604, and video processing hardware 3606. The processor(s) 3602 may be configured to implement one or more of the methods described in this document. The memory (memories) 3604 may be used for storing data and code used to implement the methods and techniques described herein. The video processing hardware 3606 may be used to implement, in hardware circuitry, some of the techniques described in this document. In some embodiments, the video processing hardware 3606 may be at least partly included in the processor 3602, e.g., a graphics co-processor.

FIG. 4 is a block diagram illustrating an example video coding system 100 that may utilize the techniques of this disclosure.

As shown in FIG. 4, the video coding system 100 may include a source device 110 and a destination device 120. The source device 110, which may be referred to as a video encoding device, generates encoded video data. The destination device 120, which may be referred to as a video decoding device, may decode the encoded video data generated by the source device 110.

The source device 110 may include a video source 112, a video encoder 114, and an input/output (I/O) interface 116.

The video source 112 may include a source such as a video capture device, an interface for receiving video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources. The video data may comprise one or more pictures. The video encoder 114 encodes the video data from the video source 112 to generate a bitstream. The bitstream may include a sequence of bits that form a coded representation of the video data. The bitstream may include coded pictures and associated data. A coded picture is a coded representation of a picture. The associated data may include sequence parameter sets, picture parameter sets, and other syntax structures. The I/O interface 116 may include a modulator/demodulator (modem) and/or a transmitter. The encoded video data may be transmitted directly to the destination device 120 via the I/O interface 116 through the network 130a. The encoded video data may also be stored on a storage medium/server 130b for access by the destination device 120.

The destination device 120 may include an I/O interface 126, a video decoder 124, and a display device 122.

The I/O interface 126 may include a receiver and/or a modem. The I/O interface 126 may acquire encoded video data from the source device 110 or the storage medium/server 130b. The video decoder 124 may decode the encoded video data. The display device 122 may display the decoded video data to a user. The display device 122 may be integrated with the destination device 120, or may be external to the destination device 120, in which case the destination device 120 is configured to interface with the external display device.

The video encoder 114 and the video decoder 124 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard, the Versatile Video Coding (VVC) standard, and other current and/or future standards.

FIG. 5 is a block diagram illustrating an example of a video encoder 200, which may be the video encoder 114 in the system 100 illustrated in FIG. 4.

The video encoder 200 may be configured to perform any or all of the techniques of this disclosure. In the example of FIG. 5, the video encoder 200 includes a plurality of functional components. The techniques described in this disclosure may be shared among the various components of the video encoder 200. In some examples, a processor may be configured to perform any or all of the techniques described in this disclosure.

The functional components of the video encoder 200 may include a partition unit 201; a prediction unit 202, which may include a mode selection unit 203, a motion estimation unit 204, a motion compensation unit 205, and an intra prediction unit 206; a residual generation unit 207; a transform unit 208; a quantization unit 209; an inverse quantization unit 210; an inverse transform unit 211; a reconstruction unit 212; a buffer 213; and an entropy encoding unit 214.
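
To make the data flow between these units concrete, the sketch below runs one 4-sample block through residual generation, transform, quantization, inverse quantization, inverse transform, and reconstruction. It is a toy model under stated assumptions: a 4-point Walsh-Hadamard matrix stands in for the transform, quantization is uniform with a single step size, and the function name `encode_block` is hypothetical. The text does not specify any of these choices, and entropy encoding is omitted.

```python
# 4-point Walsh-Hadamard matrix, used here only as a stand-in transform.
# It is symmetric and satisfies H4 * H4 = 4 * I, so it inverts itself up to scale.
H4 = (
    (1, 1, 1, 1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
    (1, -1, 1, -1),
)


def _apply(matrix, vec):
    """Multiply a 4x4 matrix by a length-4 vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def encode_block(block, prediction, qstep):
    """Return (quantized_levels, reconstructed_block) for one 4-sample block.

    qstep must be a positive number; larger values discard more detail.
    """
    residual = [b - p for b, p in zip(block, prediction)]       # residual generation
    coeffs = _apply(H4, residual)                                # transform
    levels = [round(c / qstep) for c in coeffs]                  # quantization
    dequant = [lv * qstep for lv in levels]                      # inverse quantization
    rec_residual = [round(s / 4) for s in _apply(H4, dequant)]   # inverse transform
    recon = [p + r for p, r in zip(prediction, rec_residual)]    # reconstruction
    return levels, recon
```

The encoder reconstructs the block from the quantized levels, not from the original samples, so that its reference data matches what a decoder can rebuild. In this toy model a step size of 1 reproduces the input exactly, and larger step sizes trade accuracy for smaller levels.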

In other examples, video encoder 200 may include more, fewer, or different functional components. In one example, prediction unit 202 may include an intra block copy (IBC) unit. The IBC unit may perform prediction in an IBC mode, in which at least one reference picture is the picture in which the current video block is located.

Furthermore, some components, such as motion estimation unit 204 and motion compensation unit 205, may be highly integrated, but are represented separately in the example of FIG. 5 for purposes of explanation.

Partition unit 201 may partition a picture into one or more video blocks. Video encoder 200 and video decoder 300 may support various video block sizes.

Mode selection unit 203 may select one of the coding modes, intra or inter, for example based on error results, and provide the resulting intra-coded or inter-coded block to residual generation unit 207 to generate residual block data and to reconstruction unit 212 to reconstruct the encoded block for use as a reference picture. In some examples, mode selection unit 203 may select a combination of intra and inter prediction (CIIP) mode, in which the prediction is based on an inter prediction signal and an intra prediction signal. In the case of inter prediction, mode selection unit 203 may also select a resolution for the motion vector of the block (e.g., sub-pixel or integer-pixel precision).

To perform inter prediction on the current video block, motion estimation unit 204 may generate motion information for the current video block by comparing one or more reference frames from buffer 213 with the current video block. Motion compensation unit 205 may determine a predicted video block for the current video block based on the motion information and decoded samples of pictures from buffer 213 other than the picture associated with the current video block.

Motion estimation unit 204 and motion compensation unit 205 may perform different operations on the current video block depending on, for example, whether the current video block is in an I slice, a P slice, or a B slice.

In some examples, motion estimation unit 204 may perform uni-directional prediction for the current video block, and motion estimation unit 204 may search the reference pictures of list 0 or list 1 for a reference video block for the current video block. Motion estimation unit 204 may then generate a reference index that indicates the reference picture in list 0 or list 1 that contains the reference video block, and a motion vector that indicates the spatial displacement between the current video block and the reference video block. Motion estimation unit 204 may output the reference index, a prediction direction indicator, and the motion vector as the motion information of the current video block. Motion compensation unit 205 may generate the predicted video block of the current video block based on the reference video block indicated by the motion information of the current video block.

In other examples, motion estimation unit 204 may perform bi-directional prediction for the current video block. Motion estimation unit 204 may search the reference pictures in list 0 for a reference video block for the current video block, and may also search the reference pictures in list 1 for another reference video block for the current video block. Motion estimation unit 204 may then generate reference indexes that indicate the reference pictures in list 0 and list 1 containing the reference video blocks, and motion vectors that indicate the spatial displacements between the reference video blocks and the current video block. Motion estimation unit 204 may output the reference indexes and the motion vectors of the current video block as the motion information of the current video block. Motion compensation unit 205 may generate the predicted video block of the current video block based on the reference video blocks indicated by the motion information of the current video block.
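
The search described above can be illustrated with a minimal sketch. It assumes a full search over a small window with a sum-of-absolute-differences (SAD) cost; the text does not say which search strategy or cost motion estimation unit 204 uses, and the names `sad`, `crop`, `motion_search`, and `search_range` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(frame, top, left, height, width):
    """Extract a height x width block whose top-left sample is (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def motion_search(current_block, top, left, reference_list, search_range=2):
    """Full search (an assumed strategy) over one reference picture list.

    Returns (reference index, (dy, dx) motion vector, SAD cost) of the best
    matching reference block, or None if no candidate fits in the pictures.
    """
    height, width = len(current_block), len(current_block[0])
    best = None
    for ref_idx, ref in enumerate(reference_list):
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                # Skip candidates that fall outside the reference picture.
                if y < 0 or x < 0 or y + height > len(ref) or x + width > len(ref[0]):
                    continue
                cost = sad(current_block, crop(ref, y, x, height, width))
                if best is None or cost < best[2]:
                    best = (ref_idx, (dy, dx), cost)
    return best
```

For bi-directional prediction, the same routine could be called once with list 0 and once with list 1, yielding one reference index and one motion vector per list.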

In some examples, motion estimation unit 204 may output a complete set of motion information for the decoding process of a decoder.

In some examples, motion estimation unit 204 may not output a complete set of motion information for the current video block. Rather, motion estimation unit 204 may signal the motion information of the current video block with reference to the motion information of another video block. For example, motion estimation unit 204 may determine that the motion information of the current video block is sufficiently similar to the motion information of a neighboring video block.

In one example, motion estimation unit 204 may indicate, in a syntax structure associated with the current video block, a value that tells video decoder 300 that the current video block has the same motion information as another video block.

In another example, motion estimation unit 204 may identify, in a syntax structure associated with the current video block, another video block and a motion vector difference (MVD). The motion vector difference indicates the difference between the motion vector of the current video block and the motion vector of the indicated video block. Video decoder 300 may use the motion vector of the indicated video block and the motion vector difference to determine the motion vector of the current video block.
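
The motion vector difference relationship described above amounts to a component-wise subtraction at the encoder and a component-wise addition at the decoder. The following is a minimal sketch; the function names are hypothetical, and motion vectors are assumed to be `(x, y)` integer pairs.

```python
def encode_mvd(current_mv, indicated_mv):
    """Encoder side: MVD = motion vector of the current block minus the
    motion vector of the indicated (predictor) block."""
    return (current_mv[0] - indicated_mv[0], current_mv[1] - indicated_mv[1])


def decode_mv(indicated_mv, mvd):
    """Decoder side: recover the current motion vector from the indicated
    block's motion vector and the signaled difference."""
    return (indicated_mv[0] + mvd[0], indicated_mv[1] + mvd[1])
```

When neighboring blocks move similarly, the difference is small, which is why signaling it can cost fewer bits than signaling the full motion vector.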

As discussed above, video encoder 200 may predictively signal the motion vector. Two examples of predictive signaling techniques that may be implemented by video encoder 200 are advanced motion vector prediction (AMVP) and merge mode signaling.

Intra prediction unit 206 may perform intra prediction on the current video block. When intra prediction unit 206 performs intra prediction on the current video block, intra prediction unit 206 may generate prediction data for the current video block based on decoded samples of other video blocks in the same picture. The prediction data for the current video block may include a predicted video block and various syntax elements.

Residual generation unit 207 may generate residual data for the current video block by subtracting (e.g., as indicated by the minus sign) the predicted video block of the current video block from the current video block. The residual data of the current video block may include residual video blocks that correspond to different sample components of the samples in the current video block.

In other examples, for instance in skip mode, there may be no residual data for the current video block, and residual generation unit 207 may not perform the subtraction operation.

Transform processing unit 208 may generate one or more transform coefficient video blocks for the current video block by applying one or more transforms to a residual video block associated with the current video block.

After transform processing unit 208 generates a transform coefficient video block associated with the current video block, quantization unit 209 may quantize that transform coefficient video block based on one or more quantization parameter (QP) values associated with the current video block.

Inverse quantization unit 210 and inverse transform unit 211 may apply inverse quantization and an inverse transform, respectively, to the transform coefficient video block to reconstruct a residual video block from the transform coefficient video block. Reconstruction unit 212 may add the reconstructed residual video block to the corresponding samples from one or more predicted video blocks generated by prediction unit 202 to produce a reconstructed video block associated with the current block for storage in buffer 213.
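
The path from residual generation through quantization and back to a reconstructed block can be sketched as follows. This is an illustration under stated assumptions, not the encoder's actual arithmetic: the transform is omitted (treated as identity), quantization is a uniform scalar quantizer with a hypothetical integer `step` standing in for whatever a QP value maps to, and blocks are flat lists of samples.

```python
def residual(current, predicted):
    """Residual generation: current samples minus predicted samples."""
    return [c - p for c, p in zip(current, predicted)]


def quantize(coeffs, step):
    """Uniform scalar quantization with rounding to nearest (assumed scheme)."""
    return [((abs(c) + step // 2) // step) * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(levels, step):
    """Inverse quantization: scale the levels back by the step size."""
    return [level * step for level in levels]


def reconstruct(predicted, residual_hat):
    """Reconstruction: predicted samples plus the reconstructed residual."""
    return [p + r for p, r in zip(predicted, residual_hat)]


def encode_and_reconstruct(current, predicted, step):
    """Return (quantized levels, reconstructed samples) for one block."""
    levels = quantize(residual(current, predicted), step)
    return levels, reconstruct(predicted, dequantize(levels, step))
```

Because the encoder reconstructs from the quantized levels rather than from the original samples, the block it stores for later reference matches what a decoder would produce from the same levels.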

After reconstruction unit 212 reconstructs the video block, a loop filtering operation may be performed to reduce video blocking artifacts in the video block.

Entropy encoding unit 214 may receive data from other functional components of video encoder 200. When entropy encoding unit 214 receives the data, entropy encoding unit 214 may perform one or more entropy encoding operations to generate entropy-encoded data and output a bitstream that includes the entropy-encoded data.

FIG. 6 is a block diagram illustrating an example of video decoder 300, which may be video decoder 124 in system 100 shown in FIG. 4.

Video decoder 300 may be configured to perform any or all of the techniques of this disclosure. In the example of FIG. 6, video decoder 300 includes a plurality of functional components. The techniques described in this disclosure may be shared among the various components of video decoder 300. In some examples, a processor may be configured to perform any or all of the techniques described in this disclosure.

In the example of FIG. 6, video decoder 300 includes an entropy decoding unit 301, a motion compensation unit 302, an intra prediction unit 303, an inverse quantization unit 304, an inverse transform unit 305, a reconstruction unit 306, and a buffer 307. Video decoder 300 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 200 (FIG. 5).

Entropy decoding unit 301 may retrieve an encoded bitstream. The encoded bitstream may include entropy-coded video data (e.g., encoded blocks of video data). Entropy decoding unit 301 may decode the entropy-coded video data, and from the entropy-decoded video data, motion compensation unit 302 may determine motion information including motion vectors, motion vector precision, reference picture list indexes, and other motion information. Motion compensation unit 302 may determine such information by, for example, performing AMVP and merge mode.

Motion compensation unit 302 may produce motion-compensated blocks, possibly performing interpolation based on interpolation filters. Identifiers for the interpolation filters to be used with sub-pixel precision may be included in the syntax elements.

Motion compensation unit 302 may use the interpolation filters used by video encoder 200 during encoding of the video block to calculate interpolated values for sub-integer pixels of a reference block. Motion compensation unit 302 may determine the interpolation filters used by video encoder 200 according to received syntax information and use those interpolation filters to produce predictive blocks.

Motion compensation unit 302 may use some of the syntax information to determine the sizes of blocks used to encode the frames and/or slices of the encoded video sequence, partition information that describes how each macroblock of a picture of the encoded video sequence is partitioned, modes indicating how each partition is encoded, one or more reference frames (and reference frame lists) for each inter-coded block, and other information for decoding the encoded video sequence.

Intra prediction unit 303 may use, for example, intra prediction modes received in the bitstream to form a prediction block from spatially adjacent blocks. Inverse quantization unit 304 inverse quantizes, i.e., de-quantizes, the quantized video block coefficients provided in the bitstream and decoded by entropy decoding unit 301. Inverse transform unit 305 applies an inverse transform.

Reconstruction unit 306 may sum the residual blocks with the corresponding prediction blocks generated by motion compensation unit 302 or intra prediction unit 303 to form decoded blocks. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. The decoded video blocks are then stored in buffer 307, which provides reference blocks for subsequent motion compensation/intra prediction and also produces decoded video for presentation on a display device.

A listing of solutions preferred by some embodiments is provided next.

The following solutions show example embodiments of the techniques discussed in the previous section (e.g., items 1-4).

1. A visual media processing method (e.g., method 3000 shown in FIG. 3), comprising: performing (3002) a conversion between visual media data and a file that stores a bitstream representation of the visual media data according to a format rule, wherein the format rule specifies a constraint on information included in the file relating to a profile, a tier, a constraint, or a level associated with the bitstream representation identified in the file.

2. The method of solution 1, wherein the format rule specifies that the file includes an identification of the profile to which the output layer set of the bitstream representation identified in the file conforms.

3. The method of any of solutions 1-2, wherein the format rule specifies that the tier identified in the file is equal to or higher than the highest tier indicated in all syntax structures to which the output layer set included in the file conforms.

4. The method of any of solutions 1-3, wherein the format rule specifies that the constraint identified in the file is aligned with the corresponding values indicated by one or more constraint fields of the syntax structures that indicate the constraints to which the output layer set in the file conforms.

5. The method of any of solutions 1-4, wherein the format rule specifies that the level identified in the file is aligned with the corresponding values indicated by one or more level fields of the syntax structures that indicate the levels to which the output layer set in the file conforms.

6. The method of any of solutions 1-5, wherein the conversion comprises generating a bitstream representation of the visual media data and storing the bitstream representation in the file according to the format rule.

7. The method of any of solutions 1-5, wherein the conversion comprises parsing the file according to the format rule to recover the visual media data.

8. A video decoding apparatus comprising a processor configured to implement a method recited in one or more of solutions 1-7.

9. A video encoding apparatus comprising a processor configured to implement a method recited in one or more of solutions 1-7.

10. A computer program product having computer code stored thereon, wherein the code, when executed by a processor, causes the processor to implement a method recited in any of solutions 1-7.

11. A computer-readable medium, wherein a bitstream representation on the medium complies with a file format that is generated according to any of solutions 1-7.

12. A method, apparatus, or system described in the present document.

In the solutions described herein, an encoder may conform to the format rule by producing a coded representation according to the format rule. In the solutions described herein, a decoder may use the format rule to parse syntax elements in the coded representation, with knowledge of the presence and absence of syntax elements according to the format rule, to produce decoded video.

Technique 1. A method of processing visual media data (e.g., method 8000 shown in FIG. 8), comprising: performing (8002) a conversion between a visual media file and a bitstream of visual media data according to a format rule, wherein the bitstream includes one or more output layer sets and one or more parameter sets that include one or more profile-tier-level syntax structures, wherein at least one of the profile-tier-level syntax structures includes a general constraints information syntax structure, wherein the format rule specifies that a syntax element is included in a configuration record of the visual media file, and wherein the syntax element indicates a profile, a tier, or a level to which the output layer set identified by an output layer set index indicated in the configuration record conforms.

Technique 2. The method of Technique 1, wherein the syntax element indicates the profile to which the output layer set identified by the output layer set index conforms.

Technique 3. The method of Technique 1, wherein the syntax element is a general tier syntax element that indicates a tier greater than or equal to the highest tier indicated in all profile-tier-level syntax structures to which the output layer set identified by the output layer set index conforms.

Technique 4. The method of Technique 1, wherein the syntax element is a general tier syntax element that indicates the highest tier indicated in all profile-tier-level syntax structures to which the output layer set identified by the output layer set index conforms.

Technique 5. The method of Technique 1, wherein the syntax element is a general tier syntax element that indicates the highest tier to which the stream associated with the configuration record conforms.

Technique 6. The method of Technique 1, wherein the syntax element is a general tier syntax element that indicates a tier to which the stream associated with the configuration record conforms.

Technique 7. The method of Technique 1, wherein the configuration record includes a general constraints information syntax element, wherein the format rule specifies that a first bit in the general constraints information syntax element corresponds to a second bit in all general constraints information syntax structures in all profile-tier-level syntax structures to which the output layer set identified by the output layer set index conforms, and wherein the format rule specifies that the first bit is set equal to 1 only if the second bit in all of the general constraints information syntax structures is set equal to 1.

Technique 8. The method of Technique 1, wherein the syntax element is a general level syntax element, and wherein the value of the general level syntax element indicates a level of capability greater than or equal to the highest level indicated in all profile-tier-level syntax structures to which the output layer set identified by the output layer set index conforms.

Technique 9. The method of Technique 1, wherein the format rule specifies that the syntax element is not allowed to be associated with one or more other output layer sets included in the stream stored in the visual media file.

Technique 10. The method of any of Techniques 1-9, wherein the conversion comprises generating the visual media file and storing the bitstream in the visual media file according to the format rule.

Technique 11. The method of any of Techniques 1-9, wherein the conversion comprises generating the visual media file, and the method further comprises storing the visual media file in a non-transitory computer-readable recording medium.

Technique 12. The method of any of Techniques 1-9, wherein the conversion comprises parsing the visual media file according to the format rule to reconstruct the bitstream.

Technique 13. The method of any of Techniques 1-12, wherein the visual media file is processed using Versatile Video Coding (VVC).

Technique 14. An apparatus for processing visual media data, comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, upon execution by the processor, cause the processor to implement a method recited in one or more of Techniques 1-13.

Technique 15. A non-transitory computer-readable storage medium storing instructions that cause a processor to implement a method recited in any of Techniques 1-13.

Technique 16. A video decoding apparatus comprising a processor configured to implement a method recited in any one or more of Techniques 1-13.

Technique 17. A video encoding apparatus comprising a processor configured to implement a method recited in any one or more of Techniques 1-13.

Technique 18. A computer program product having computer code stored thereon, wherein the code, when executed by a processor, causes the processor to implement a method recited in any of Techniques 1-13.

Technique 19. A computer-readable medium, wherein a visual media file on the medium complies with a file format that is generated according to any of Techniques 1-13.

Technique 20. A method of visual media file generation, comprising: generating a visual media file according to a method recited in any of Techniques 1-13, and storing the visual media file on a computer-readable program medium.

Technique 21. A non-transitory computer-readable recording medium storing a bitstream of a visual media file that is generated by a method performed by a video processing apparatus, wherein the method is recited in any of Techniques 1-13. In some embodiments, a non-transitory computer-readable storage medium stores a bitstream of a visual media file that is generated by a method performed by a video processing apparatus, the method comprising generating the visual media file based on visual media data according to a format rule, wherein the bitstream includes one or more output layer sets and one or more parameter sets that include one or more profile-tier-level syntax structures, wherein at least one of the profile-tier-level syntax structures includes a general constraints information syntax structure, wherein the format rule specifies that a syntax element is included in a configuration record of the visual media file, and wherein the syntax element indicates a profile, a tier, or a level to which the output layer set identified by an output layer set index indicated in the configuration record conforms.
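
The tier, level, and constraint-bit rules of Techniques 3, 4, 7, and 8 can be read as a small derivation over the profile-tier-level syntax structures that apply to the selected output layer set. The sketch below is an illustration only. The `PtlStruct` class, its field names, and the integer encodings (larger value means higher tier or level; constraint flags packed into one integer) are assumptions made for this example and are not taken from the file format or the bitstream syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PtlStruct:
    """Hypothetical view of one profile-tier-level syntax structure."""
    tier: int      # assumed encoding: larger value = higher tier
    level: int     # assumed encoding: larger value = higher level
    gci_bits: int  # general constraints information flags packed as an int


def derive_record_ptl(ptl_structs):
    """Derive (tier, level, gci_bits) for a configuration record.

    Tier and level are the highest values over all structures; a constraint
    bit is 1 only if it is 1 in every structure (bitwise AND).
    """
    if not ptl_structs:
        raise ValueError("at least one profile-tier-level structure is required")
    tier = max(p.tier for p in ptl_structs)
    level = max(p.level for p in ptl_structs)
    gci_bits = ptl_structs[0].gci_bits
    for p in ptl_structs[1:]:
        gci_bits &= p.gci_bits
    return tier, level, gci_bits


def record_is_consistent(record_tier, record_level, record_gci, ptl_structs):
    """Check a record against the 'greater than or equal' and 'only if' rules."""
    tier, level, gci_bits = derive_record_ptl(ptl_structs)
    no_extra_bits = (record_gci & ~gci_bits) == 0
    return record_tier >= tier and record_level >= level and no_extra_bits
```

Note that the "only if" wording of Technique 7 permits a record bit to be 0 even when every structure sets it to 1, which is why the check only rejects record bits that are set without support from all structures.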

Implementation 1. A method of processing visual media data (e.g., method 9000 shown in FIG. 9), comprising: performing (9002) a conversion between a visual media file and a bitstream of visual media data according to a format rule, wherein the format rule specifies a characteristic of a syntax element in the visual media file, and wherein the syntax element has a value indicative of the number of bytes used for indicating constraint information associated with the bitstream.

Implementation 2. The method of Implementation 1, wherein the format rule specifies that the syntax element is coded in the visual media file using six bits.

Implementation 3. The method of Implementation 1, wherein the format rule specifies that the syntax element is coded in the visual media file immediately after a profile-tier-level multilayer enabled flag syntax element in the visual media file.

Implementation 4. The method of Implementation 1, wherein the syntax element specifies the number of bytes in a general constraint information syntax element in the visual media file, and the format rule specifies that a value of the syntax element equal to 1 indicates that the general constraint information flag in the general constraint information syntax element is equal to 0 and that the general constraint information syntax element is not allowed to be included in the profile-tier-level record of the visual media file.

Implementation 5. The method of Implementation 1, wherein the format rule specifies that whether a general constraint information syntax element is included in the visual media file depends on whether the value of the syntax element is greater than 1.

Implementation 6. The method of Implementation 1, wherein the format rule specifies that the number of bits used to code the general constraint information syntax element in the visual media file is the value representing the number of bytes used to specify the constraint information multiplied by 8, and that 2 is not subtracted from that product.
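The rules in Implementations 2, 5, and 6 can be combined into a small parsing sketch. It assumes, purely for illustration, that the byte-count field is read as 6 bits and is immediately followed by the general constraint information; the names `BitReader`, `read_constraint_info`, and `num_bytes` are hypothetical and are not taken from this document.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative only)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, n: int) -> int:
        """Read n bits and return them as an unsigned integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_constraint_info(reader: BitReader):
    """Read the byte count (6 bits), then the constraint info if present.

    Assumed layout, not a normative one:
      - 6 bits: number of bytes of constraint information
      - if that value is greater than 1: 8 * num_bytes bits of constraint
        information (no subtraction of 2 from the bit count)
    """
    num_bytes = reader.read(6)
    gci = reader.read(8 * num_bytes) if num_bytes > 1 else None
    return num_bytes, gci
```

For example, the three bytes `0x0A 0xAF 0x34` carry the 6-bit value 2 followed by the 16 bits `0xABCD` (plus two padding bits), so the sketch returns `(2, 0xABCD)`; a byte count of 1 yields no constraint information.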

Implementation 7. A method of processing visual media data, comprising: performing a conversion between a bitstream of visual media data and a visual media file according to a format rule, wherein the format rule specifies that 5 bits are used for a syntax element in the visual media file, and the syntax element has a value indicating a network abstraction layer unit type in a decoder configuration record of the visual media file. In some embodiments, the format rule specifies that 5 bits are used for another syntax element in the visual media file, and the other syntax element has another value indicating a network abstraction layer unit type in the decoder configuration record of the visual media file.
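A 5-bit field such as the one in Implementation 7 is typically packed into a byte together with other bits. The sketch below assumes a hypothetical layout of a 1-bit flag, 2 reserved bits, and the 5-bit network abstraction layer unit type in the low bits; the layout and the function names are assumptions for illustration, not something this document specifies.

```python
def pack_array_header(completeness: int, nal_unit_type: int, reserved: int = 0) -> int:
    """Pack a 1-bit flag, 2 reserved bits, and a 5-bit NAL unit type into one byte."""
    if not 0 <= nal_unit_type <= 0x1F:
        raise ValueError("NAL unit type must fit in 5 bits")
    if completeness not in (0, 1) or not 0 <= reserved <= 0x3:
        raise ValueError("flag or reserved bits out of range")
    return (completeness << 7) | (reserved << 5) | nal_unit_type


def unpack_array_header(byte: int):
    """Inverse of pack_array_header: return (completeness, reserved, nal_unit_type)."""
    return (byte >> 7) & 0x1, (byte >> 5) & 0x3, byte & 0x1F
```

With 5 bits the field can carry values 0 through 31; anything larger is rejected by the sketch rather than silently truncated.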

Implementation 8. A method of processing visual media data, comprising: performing a conversion between a bitstream of visual media data and a visual media file according to a format rule, wherein a track of the visual media file includes a video bitstream that includes one or more output layer sets, the format rule specifies that a syntax element is indicated for the track, and the syntax element indicates whether the track includes a video bitstream corresponding to a specific output layer set among the one or more output layer sets. In some embodiments, a track of the visual media file includes a video bitstream that includes one or more output layer sets, the format rule specifies that another syntax element is indicated for the track, and the other syntax element indicates whether the track includes a video bitstream corresponding to a specific output layer set among the one or more output layer sets.

Implementation 9. The method of Implementation 8, wherein the syntax element indicates that the track includes a video bitstream corresponding to multiple output layer sets. In some embodiments, another syntax element indicates that the track includes a video bitstream corresponding to multiple output layer sets.

Implementation 10. The method of Implementation 8, wherein the syntax element indicates that the track includes a video bitstream that does not correspond to a specific output layer set among the one or more output layer sets. In some embodiments, another syntax element indicates that the track includes a video bitstream that does not correspond to a specific output layer set among the one or more output layer sets.

Implementation 11. A method of processing visual media data, comprising: performing a conversion between a bitstream of visual media data and a visual media file according to a format rule, wherein the format rule specifies that the visual media file includes a syntax element whose value indicates an output layer set index used to indicate an output layer set. In some embodiments, the format rule specifies whether the visual media file includes another syntax element whose value indicates the output layer set index used to indicate the output layer set.

Implementation 12. The method of Implementation 11, wherein the format rule specifies that the visual media file selectively indicates the syntax element in response to another value of a profile-tier present flag syntax element in the visual media file being equal to 1, or in response to a profile-tier-level multilayer enabled flag being equal to 1, and the value of the syntax element indicates the output layer set index in the decoder configuration record. In some embodiments, the format rule specifies that the visual media file selectively indicates another syntax element in response to another value of the profile-tier present flag syntax element in the visual media file being equal to 1, or in response to the profile-tier-level multilayer enabled flag being equal to 1, and the value of that syntax element indicates the output layer set index in the decoder configuration record.

Implementation 13. The method of Implementation 11, wherein the format rule specifies that the visual media file is not allowed to include the syntax element whose value indicates the output layer set index, and the format rule specifies that, in response to the profile-tier present flag syntax element being equal to 1 in the visual media file, the value of the output layer set index is inferred to be equal to a second value of a second output layer index of a second output layer set that contains only the layer carried in the track. In some embodiments, the format rule specifies that the visual media file is not allowed to include another syntax element whose value indicates the output layer set index, and the format rule specifies that, in response to the profile-tier present flag syntax element being equal to 1 in the visual media file, the value of the output layer set index is inferred to be equal to the second value of the second output layer index of the second output layer set that contains only the layer carried in the track.
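The interplay between a signalled and an inferred output layer set index in Implementations 11-13 can be sketched as a small resolution function. This is a simplified reading under stated assumptions: the function name, its parameters, and the choice to return `None` when nothing can be determined are all hypothetical and are not defined by this document.

```python
from typing import Optional


def resolve_ols_index(ptl_present_flag: int,
                      signalled_ols_idx: Optional[int],
                      single_layer_ols_idx: int) -> Optional[int]:
    """Return the output layer set index a reader would use (illustrative).

    signalled_ols_idx:    index carried in the file, or None if not present.
    single_layer_ols_idx: index of the output layer set that contains only
                          the layer carried in the track (used for inference).
    """
    if signalled_ols_idx is not None:
        # The file carries the index explicitly (Implementations 11 and 12).
        return signalled_ols_idx
    if ptl_present_flag == 1:
        # The index is absent, so it is inferred (Implementation 13).
        return single_layer_ols_idx
    # No profile-tier information and no index: nothing to resolve.
    return None
```

Keeping inference in one place like this means the rest of a reader never has to distinguish between a signalled and an inferred index.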

Implementation 14. The method of any of Implementations 1-13, wherein the conversion includes generating the visual media file and storing the bitstream in the visual media file according to the format rule.

Implementation 15. The method of any of Implementations 1-13, wherein the conversion includes generating the visual media file, and the method further includes storing the visual media file in a non-transitory computer-readable recording medium.

Implementation 16. The method of any of Implementations 1-13, wherein the conversion includes parsing the visual media file according to the format rule to reconstruct the bitstream.
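Implementations 14-16 describe the conversion in both directions: storing a bitstream in a file, and parsing the file to recover the bitstream. A minimal sketch of that round trip follows, assuming the file is built from ISO base media file format style boxes (a 32-bit big-endian size, a four-character type, then the payload). The helper names are hypothetical, and the sketch ignores 64-bit box sizes and everything else a real file would contain.

```python
import struct


def write_box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap a payload in a box: 32-bit size, 4-byte type, then the payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly 4 bytes")
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def read_box(buf: bytes, offset: int = 0):
    """Parse one box at offset; return (type, payload, offset of next box)."""
    (size,) = struct.unpack_from(">I", buf, offset)
    box_type = buf[offset + 4:offset + 8]
    payload = buf[offset + 8:offset + size]
    return box_type, payload, offset + size
```

Writing a bitstream into a media data box and reading it back returns the original bytes unchanged, which is the property that both directions of the conversion rely on.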

Implementation 17. The method of any of Implementations 1-16, wherein the visual media file is processed by Versatile Video Coding (VVC).

Implementation 18. An apparatus for processing visual media data, comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform the method described in one or more of Implementations 1-17.

Implementation 19. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform the method described in any of Implementations 1-13.

Implementation 20. A video decoding apparatus comprising a processor configured to implement the method described in one or more of Implementations 1-17.

Implementation 21. A video encoding apparatus comprising a processor configured to implement the method described in one or more of Implementations 1-17.

Implementation 22. A computer program product having computer code stored thereon, wherein the code, when executed by a processor, causes the processor to perform the method described in any of Implementations 1-17.

Implementation 23. A computer-readable medium on which a visual media file conforms to a file format generated according to any of Implementations 1-17.

Implementation 24. A method of generating a visual media file, comprising: generating the visual media file according to the method described in any of Implementations 1-17; and storing the visual media file on a computer-readable program medium.

Implementation 25. A non-transitory computer-readable recording medium storing a bitstream of a visual media file generated by a method performed by a video processing apparatus, wherein the method is the method described in any of Implementations 1-17. In some embodiments, a non-transitory computer-readable recording medium stores a bitstream of a visual media file generated by a method performed by a video processing apparatus, wherein the method includes generating the visual media file based on visual media data according to a format rule, the format rule specifies a characteristic of a syntax element in the visual media file, and the syntax element has a value indicating the number of bytes used to indicate constraint information associated with the bitstream.

Action 1. A method of processing visual media data (e.g., method 10002 shown in FIG. 10), comprising: performing (10002) a conversion between a bitstream of visual media data and a visual media file according to a format rule, wherein the format rule specifies a characteristic of a syntax element of the visual media file, and the format rule specifies that the syntax element, which has a value representing a level identification, is coded using 8 bits in either or both of a subpicture common group box and a subpicture multiple groups box.

Action 2. The method of Action 1, wherein the format rule specifies the absence of reserved bits immediately after the syntax element whose value represents the level identification.

Action 3. The method of Action 1, wherein the format rule specifies that the 24 bits immediately after the syntax element whose value represents the level identification are reserved bits.

Action 4. The method of Action 1, wherein the format rule specifies that the 8 bits immediately after the syntax element whose value represents the level identification are reserved bits.
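Actions 1-4 differ only in how many reserved bits follow the 8-bit level identification (none, 24, or 8). A minimal sketch of writing and reading such a field is given below; the function names and the choice of zero-valued reserved bits are assumptions made for illustration.

```python
def pack_level(level_id: int, reserved_bits: int = 0) -> bytes:
    """Serialize an 8-bit level identification followed by zeroed reserved bits."""
    if not 0 <= level_id <= 0xFF:
        raise ValueError("level identification must fit in 8 bits")
    if reserved_bits not in (0, 8, 24):
        raise ValueError("this sketch only models 0, 8, or 24 reserved bits")
    return bytes([level_id]) + bytes(reserved_bits // 8)


def unpack_level(buf: bytes, reserved_bits: int = 0):
    """Return (level_id, number of bytes consumed), skipping the reserved bits."""
    return buf[0], 1 + reserved_bits // 8
```

With 24 reserved bits the field occupies a full 32-bit word, which keeps whatever follows aligned; with no reserved bits the field is a single byte.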

Action 5. A method of processing visual media data, comprising: performing a conversion between a bitstream of visual media data and a visual media file according to a format rule, wherein the format rule specifies a characteristic associated with a first syntax element, a second syntax element, or a third set of syntax elements in the visual media file, the first syntax element has a first value indicating the number of active tracks in the visual media file, the second syntax element has a second value indicating the number of subgroup identifiers in the visual media file, and each syntax element of the third set of syntax elements has a third value indicating the number of active tracks in the visual media file. In some embodiments, the format rule specifies a characteristic associated with a first syntax element, a second syntax element, or a third set of syntax elements in the visual media file, the first syntax element has a first value indicating the number of active tracks in the visual media file, the second syntax element has a second value indicating the number of subgroup identifiers in the visual media file, and each syntax element of the third set of syntax elements has a third value indicating the number of active tracks in the visual media file.

Action 6. The method of Action 5, wherein the format rule specifies that 16 bits are used to indicate the first syntax element, which has the first value indicating the number of active tracks in a subpicture common group box of the visual media file.

Action 7. The method of Action 5, wherein the format rule specifies that 16 bits are used to indicate the second syntax element, which has the second value indicating the number of subgroup identifiers in a subpicture multiple groups box of the visual media file, and the format rule specifies that 16 bits are used to indicate each syntax element of the third set of syntax elements, each having the third value indicating the number of active tracks in the subpicture multiple groups box of the visual media file.

Action 8. The method of Action 5, wherein the format rule specifies that the 16 bits immediately after the first syntax element having the first value indicating the number of active tracks, the second syntax element indicating the number of subgroup identifiers, or each syntax element of the third set of syntax elements having the third value indicating the number of active tracks are reserved.

Action 9. The method of Action 5, wherein the format rule specifies the absence of reserved bits immediately after the first syntax element having the first value indicating the number of active tracks, the second syntax element indicating the number of subgroup identifiers, or each syntax element of the third set of syntax elements having the third value indicating the number of active tracks.
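The 16-bit counts of Actions 6-9 can be illustrated with Python's `struct` module. The sketch assumes big-endian byte order, which is an assumption here rather than something this passage states, and the function names are hypothetical. The `reserved_bits` parameter covers both variants: 16 reserved bits after the count (Action 8) or none (Action 9).

```python
import struct


def pack_count(count: int, reserved_bits: int = 0) -> bytes:
    """Serialize a 16-bit count, optionally followed by zeroed reserved bits."""
    if not 0 <= count <= 0xFFFF:
        raise ValueError("count must fit in 16 bits")
    if reserved_bits not in (0, 16):
        raise ValueError("this sketch only models 0 or 16 reserved bits")
    return struct.pack(">H", count) + bytes(reserved_bits // 8)


def unpack_count(buf: bytes, offset: int = 0, reserved_bits: int = 0):
    """Return (count, offset of the next field), skipping any reserved bits."""
    (count,) = struct.unpack_from(">H", buf, offset)
    return count, offset + 2 + reserved_bits // 8
```

A 16-bit field allows up to 65535 active tracks or subgroup identifiers, so the sketch rejects larger values instead of wrapping them.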

Action 10. The method of any of Actions 1-9, wherein the conversion includes generating the visual media file and storing the bitstream in the visual media file according to the format rule.

Action 11. The method of any of Actions 1-9, wherein the conversion includes generating the visual media file, and the method further includes storing the visual media file in a non-transitory computer-readable recording medium.

Action 12. The method of any of Actions 1-9, wherein the conversion includes parsing the visual media file according to the format rule to reconstruct the bitstream.

Action 13. The method of any of Actions 1-12, wherein the visual media file is processed by Versatile Video Coding (VVC).

Action 14. An apparatus for processing visual media data, comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform the method described in one or more of Actions 1-13.

Action 15. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform the method described in any of Actions 1-13.

Action 16. A video decoding apparatus comprising a processor configured to implement the method described in one or more of Actions 1-13.

Action 17. A video encoding apparatus comprising a processor configured to implement the method described in one or more of Actions 1-13.

Action 18. A computer program product having computer code stored thereon, wherein the code, when executed by a processor, causes the processor to perform the method described in any of Actions 1-13.

Action 19. A computer-readable medium on which a visual media file conforms to a file format generated according to any of Actions 1-13.

Action 20. A method of generating a visual media file, comprising: generating the visual media file according to the method described in any of Actions 1-13; and storing the visual media file on a computer-readable program medium.

Action 21. A non-transitory computer-readable recording medium storing a bitstream of a visual media file generated by a method performed by a video processing apparatus, wherein the method is the method described in any of Actions 1-13. In some embodiments, a non-transitory computer-readable storage medium stores a bitstream of a visual media file generated by a method performed by a video processing apparatus, wherein the method includes generating the visual media file based on visual media data according to a format rule, the format rule specifies a characteristic of a syntax element of the visual media file, and the format rule specifies that the syntax element, which has a value representing a level identification, is coded using 8 bits in either or both of a subpicture common group box and a subpicture multiple groups box.

In this document, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during conversion from a pixel representation of a video to a corresponding bitstream representation, or vice versa. The bitstream representation of a current video block may, for example, correspond to bits that are either co-located or spread across different places within the bitstream, as defined by the syntax. For example, a macroblock may be coded in terms of transformed and coded error residual values and also using bits in headers and other fields in the bitstream. Furthermore, during conversion, a decoder may parse the bitstream with the knowledge that some fields may be present or absent, based on the determination, as described in the solutions above. Similarly, an encoder may determine whether certain syntax fields are to be included and generate the coded representation accordingly, by including the syntax fields in the coded representation or excluding them from it.

The disclosed and other solutions, examples, embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, a data processing apparatus. The computer-readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including, by way of example, a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the apparatus can include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to a suitable receiver apparatus.

A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special-purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general-purpose and special-purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random-access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, by way of example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.

While this document contains many specifics, these should not be construed as limitations on the scope of any subject matter or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular techniques. Certain features that are described in this document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.

Only a few implementations and examples are described, and other implementations, enhancements, and variations can be made based on what is described and illustrated in this patent document.

Claims

1. A method of processing visual media data, comprising:
performing a conversion between a bitstream of visual media data and a visual media file according to a format rule, wherein the format rule specifies a characteristic of a syntax element in the visual media file,
wherein the syntax element has a value indicating the number of bytes used to indicate constraint information associated with the bitstream, and
wherein the characteristic of the syntax element includes that the syntax element is coded in the visual media file using 6 bits.

2. The method of claim 1, wherein the format rule specifies that the syntax element is coded in the visual media file immediately after a profile tier level multilayer enabled flag syntax element in the visual media file.

3. The method of claim 1, wherein the constraint information includes a number of bytes in a general constraint information syntax element of the visual media file,
wherein the syntax element specifies the number of bytes in the general constraint information syntax element in the visual media file, and
wherein the format rule specifies that the value of the syntax element being equal to 1 indicates that a general constraint information flag in the general constraint information syntax element is equal to 0 and that [...] the profile tier level record of the visual media file.

4. The method of claim 1, wherein the format rule specifies that a condition for including a general constraint information syntax element in the visual media file depends on whether the value of the syntax element is greater than 1.

5. The method of claim 1, wherein the format rule specifies that 5 bits are used for another syntax element in the visual media file, and
wherein the other syntax element has another value indicating a network abstraction layer unit type in a decoder configuration record of the visual media file.

6. The method of claim 1, wherein a track of the visual media file includes a video bitstream comprising one or more output layer sets,
wherein the format rule specifies that another syntax element is indicated for the track, and
wherein the other syntax element indicates whether the track includes a video bitstream corresponding to a particular output layer set of the one or more output layer sets.

7. The method of claim 6, wherein the other syntax element indicates that the track includes the video bitstream corresponding to multiple output layer sets.

8. The method of claim 6, wherein the syntax element indicates that the track includes the video bitstream that does not correspond to the particular output layer set of the one or more output layer sets.

9. The method of any one of claims 1-8, wherein the conversion comprises generating the visual media file and storing the bitstream in the visual media file according to the format rule.

10. The method of any one of claims 1-8, wherein the conversion comprises generating the visual media file, and the method further comprises storing the visual media file in a non-transitory computer-readable recording medium.

11. The method of any one of claims 1-8, wherein the conversion comprises parsing the visual media file according to the format rule to reconstruct the bitstream.

12. The method of any one of claims 1-11, wherein the visual media file is processed by Versatile Video Coding (VVC).

13. An apparatus for processing visual media data, comprising a processor and a non-transitory memory with instructions, wherein the instructions, when executed by the processor, cause the processor to perform the method of any one of claims 1-12.

14. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform the method of any one of claims 1-12.
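
As a purely illustrative aid, the kind of format rule recited in the claims above (a 6-bit byte-count field coded immediately after a one-bit flag, with the constraint information included only when the count is greater than 1) might be sketched in Python as follows. The record layout, the trailing padding bit, the rule that the payload occupies exactly `num_bytes` bytes, and all names (`BitWriter`, `BitReader`, `write_record`, `parse_record`) are assumptions made for this sketch. They are not taken from the specification or from any file format standard.

```python
class BitWriter:
    """Collects values MSB-first and packs them into bytes."""

    def __init__(self):
        self._bits = []

    def write(self, value, width):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        for shift in range(width - 1, -1, -1):
            self._bits.append((value >> shift) & 1)

    def to_bytes(self):
        # Zero-pad to a whole number of bytes.
        bits = self._bits + [0] * (-len(self._bits) % 8)
        out = bytearray()
        for i in range(0, len(bits), 8):
            byte = 0
            for bit in bits[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


class BitReader:
    """Reads values MSB-first from a bytes object."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, width):
        value = 0
        for _ in range(width):
            byte = self._data[self._pos // 8]
            value = (value << 1) | ((byte >> (7 - self._pos % 8)) & 1)
            self._pos += 1
        return value


def write_record(multilayer_flag, constraint_info):
    """Hypothetical layout: 1-bit flag, 6-bit byte count, 1 padding bit, payload."""
    if len(constraint_info) == 1 or len(constraint_info) > 63:
        raise ValueError("payload must be empty or 2..63 bytes in this sketch")
    # Assumption: a count of 1 signals that no constraint payload follows.
    num_bytes = len(constraint_info) if constraint_info else 1
    w = BitWriter()
    w.write(int(multilayer_flag), 1)
    w.write(num_bytes, 6)          # the 6-bit syntax element
    w.write(0, 1)                  # hypothetical padding for byte alignment
    if num_bytes > 1:              # payload is conditional on the count
        for b in constraint_info:
            w.write(b, 8)
    return w.to_bytes()


def parse_record(data):
    r = BitReader(data)
    multilayer_flag = bool(r.read(1))
    num_bytes = r.read(6)
    r.read(1)                      # skip the hypothetical padding bit
    info = bytes(r.read(8) for _ in range(num_bytes)) if num_bytes > 1 else b""
    return multilayer_flag, num_bytes, info


record = write_record(True, b"\xab\xcd")
print(record.hex(), parse_record(record))
```

Writing and then parsing the same record mirrors the two directions of the conversion described in the claims (generating the file and parsing it to reconstruct the bitstream). A real implementation would follow the field order and widths defined by the applicable file format.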