JP2023553839A

JP2023553839A - Bidirectional optical flow in video coding

Info

Publication number: JP2023553839A
Application number: JP2023532594A
Authority: JP
Inventors: Zhi Zhang; Han Huang; Chun-Chi Chen; Yan Zhang; Vadim Seregin; Marta Karczewicz
Original assignee: Qualcomm Incorporated
Priority date: 2020-12-22
Filing date: 2021-12-21
Publication date: 2023-12-26
Also published as: KR20230123951A; WO2022140377A1; EP4268452A1; TW202243475A

Abstract

A method for decoding video data includes: determining that bidirectional optical flow (BDOF) is enabled for a block of video data; dividing the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; determining a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; determining, for each sub-block of the one or more sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed, based on the respective distortion value; determining prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstructing the block based on the prediction samples.

Description

This application claims priority to U.S. Application No. 17/645,233, filed December 20, 2021, and U.S. Provisional Application No. 63/129,190, filed December 22, 2020, the entire contents of each of which are incorporated herein by reference. U.S. Application No. 17/645,233 claims the benefit of U.S. Provisional Application No. 63/129,190, filed December 22, 2020.

This disclosure relates to video encoding and video decoding.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smartphones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), and ITU-T H.265/High Efficiency Video Coding (HEVC), and in extensions of such standards. By implementing such video coding techniques, video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

"Series H: Audiovisual and Multimedia Systems, Infrastructure of Audiovisual Services - Coding of Moving Video, High Efficiency Video Coding", International Telecommunication Union, December 2016, 664 pages.
VVC Test Model 10 (VTM 10.0), https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM
Bross et al., "Versatile Video Coding (Draft 10)", Joint Video Experts Team (JVET) of ITU-T SG16 WP 3 and ISO/IEC JTC 1/SC29/WG11, 18th Meeting, by teleconference, June 22 to July 1, 2020, JVET-S2001-vA.
Bross et al., "Versatile Video Coding Editorial Refinements on Draft 10", Joint Video Experts Team (JVET) of ITU-T SG16 WP 3 and ISO/IEC JTC 1/SC29/WG11, 20th Meeting, by teleconference, October 7 to 16, 2020, JVET-T2001-v2.
J. Chen, Y. Ye, and S. Kim, "Algorithm description for Versatile Video Coding and Test Model 11 (VTM 11)", JVET-T2002, December 2020.

In general, this disclosure describes techniques for decoder-side motion vector derivation (e.g., template matching, bilateral matching, decoder-side motion vector (MV) refinement, and/or bidirectional optical flow (BDOF)). The techniques of this disclosure may be applied to any existing video codec, such as HEVC (High Efficiency Video Coding), VVC (Versatile Video Coding), or Essential Video Coding (EVC), or may serve as an efficient coding tool in any future video coding standard.

In one or more examples, for BDOF, a video encoder and a video decoder (e.g., a video coder) may be configured to selectively determine whether per-pixel BDOF is performed for a sub-block of a block or whether BDOF is bypassed. That is, the video coder may select one of two options: per-pixel BDOF is performed, or per-pixel BDOF (or BDOF in general) is bypassed. In this way, the example techniques may promote selection between coding modes that, in combination, may yield better coding performance (e.g., where the video coder determines, for a sub-block, one of per-pixel BDOF being performed or BDOF being bypassed).

Moreover, in some examples, determining whether to perform per-pixel BDOF or to bypass BDOF for a sub-block may be based on determining a distortion value and comparing the distortion value to a threshold. In some examples, the video coder may be configured to determine the distortion value in such a way that the calculations used to determine it can be reused by the video coder when performing per-pixel BDOF. For example, if the video coder is to perform per-pixel BDOF, the video coder may reuse results from the calculations performed to determine the distortion value when performing per-pixel BDOF.

In one example, this disclosure describes a method of decoding video data, the method comprising: determining that bidirectional optical flow (BDOF) is enabled for a block of video data; dividing the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; determining a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed, based on the respective distortion value; determining prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstructing the block based on the prediction samples.

In one example, this disclosure describes a device for decoding video data, the device comprising a memory configured to store video data and processing circuitry coupled to the memory, the processing circuitry being configured to: determine that bidirectional optical flow (BDOF) is enabled for a block of the video data; divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed, based on the respective distortion value; determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstruct the block based on the prediction samples.

In one example, this disclosure describes a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to: determine that bidirectional optical flow (BDOF) is enabled for a block of video data; divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed, based on the respective distortion value; determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstruct the block based on the prediction samples.

In one example, this disclosure describes a device for decoding video data, the device comprising: means for determining that bidirectional optical flow (BDOF) is enabled for a block of video data; means for dividing the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; means for determining a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; means for determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed, based on the respective distortion value; means for determining prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and means for reconstructing the block based on the prediction samples.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may perform the techniques of this disclosure.
A conceptual diagram illustrating an example quadtree binary tree (QTBT) structure and a corresponding coding tree unit (CTU).
A conceptual diagram illustrating an example quadtree binary tree (QTBT) structure and a corresponding coding tree unit (CTU).
A block diagram illustrating an example video encoder that may perform the techniques of this disclosure.
A block diagram illustrating an example video decoder that may perform the techniques of this disclosure.
A conceptual diagram illustrating examples of spatial neighboring motion vector candidates for merge mode.
A conceptual diagram illustrating examples of spatial neighboring motion vector candidates for advanced motion vector predictor (AMVP) mode.
A conceptual diagram illustrating an example of a temporal motion vector predictor (TMVP) candidate.
A conceptual diagram illustrating an example of motion vector scaling.
A conceptual diagram illustrating template matching performed on a search area around an initial motion vector (MV).
A conceptual diagram illustrating an example of motion vector differences that are proportional based on temporal distance.
A conceptual diagram illustrating an example of motion vector differences that are mirrored regardless of temporal distance.
A conceptual diagram illustrating an example of a 3×3 square search pattern within a search range of [-8,8].
A conceptual diagram illustrating an example of decoder-side motion vector refinement.
A conceptual diagram illustrating an extended coding unit (CU) used in bidirectional optical flow (BDOF).
A flowchart illustrating an example process of per-pixel BDOF with sub-block bypass.
A conceptual diagram illustrating an example of per-pixel BDOF for an 8×8 sub-block.
A flowchart illustrating an example method for decoding a current block in accordance with the techniques of this disclosure.
A flowchart illustrating an example method for encoding a current block in accordance with the techniques of this disclosure.

A video encoder may be configured to generate a prediction block from one or more reference blocks in one or more reference pictures using one or more motion vectors for a block. The video encoder determines a residual between the prediction block and the block, and signals information indicative of the residual and information used to determine the motion vectors. A video decoder receives the information indicative of the residual and the information used to determine the motion vectors. The video decoder determines the motion vectors, determines the reference blocks from the motion vectors, and generates the prediction block. The video decoder then adds the prediction block to the residual to reconstruct the block.
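The encoder/decoder round trip described above can be sketched in a few lines. This is a minimal illustration that assumes a lossless residual (no transform or quantization); the helper names `compute_residual` and `reconstruct` are hypothetical and do not come from the disclosure.

```python
def compute_residual(block, pred):
    """Encoder side: residual = original block minus prediction block."""
    return [[o - p for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(block, pred)]


def reconstruct(pred, residual):
    """Decoder side: reconstructed block = prediction block plus residual."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, residual)]


# Toy 2x2 block and its prediction (illustrative values only).
original = [[100, 102], [98, 101]]
prediction = [[99, 100], [98, 103]]

residual = compute_residual(original, prediction)
recon = reconstruct(prediction, residual)
```

Because the residual is kept lossless in this sketch, `recon` equals `original`; in a real codec the residual is typically transformed and quantized, so the reconstruction is only an approximation.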

In some cases, the reference block and the prediction block are the same block. However, the reference block and the prediction block need not be the same in all examples. In some examples, such as in bi-prediction, the video encoder and the video decoder may determine a first reference block based on a first motion vector and a second reference block based on a second motion vector. The video encoder and the video decoder may blend the first reference block and the second reference block to generate the prediction block.

Moreover, in some examples, the video encoder and the video decoder may generate the prediction block based on adjustments to the sample values of the first reference block and the second reference block. One example method of adjusting sample values to generate the samples of a prediction block is called bidirectional optical flow (BDOF). For example, assume that I⁽⁰⁾(x,y) refers to the first reference block and I⁽¹⁾(x,y) refers to the second reference block. In BDOF, the prediction block may be considered to be I⁽⁰⁾(x,y) plus I⁽¹⁾(x,y). As described below, the video encoder and the video decoder may determine an adjustment factor (i.e., b(x,y)) as part of the process of determining the prediction samples, and may add the adjustment factor to the prediction block (i.e., I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b(x,y)). There may be additional scaling and offsetting of the result of I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b(x,y) to determine the prediction samples.
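As a rough sketch of the expression I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b(x,y) followed by scaling and offsetting, the function below combines one sample from each reference block with an adjustment value. The right shift of 1, the rounding offset, and the clipping to an 8-bit range are assumptions chosen for illustration; the text above only states that additional scaling and offsetting may occur.

```python
def bdof_sample(i0, i1, b, shift=1, bit_depth=8):
    """Combine two reference samples and an adjustment b for one position.

    shift and bit_depth are illustrative assumptions, not values taken
    from the disclosure.
    """
    offset = (1 << shift) >> 1               # rounding offset: half the divisor
    value = (i0 + i1 + b + offset) >> shift  # scale the sum back to sample range
    max_value = (1 << bit_depth) - 1
    return max(0, min(max_value, value))     # clip to the valid sample range


no_adjustment = bdof_sample(100, 104, 0)
with_adjustment = bdof_sample(100, 104, 3)
clipped_high = bdof_sample(255, 255, 10)
clipped_low = bdof_sample(0, 0, -5)
```

With `b = 0` the result reduces to a rounded average of the two reference samples, which is why bypassing BDOF can be viewed as dropping the adjustment term.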

In BDOF, the video encoder and the video decoder use motion vectors to determine an adjustment factor (e.g., a factor that is multiplied or added) for adjusting the sample values of the prediction block to generate the prediction samples. As one example, the video encoder and the video decoder may generate a prediction sample by adding the corresponding sample of the first reference block, the corresponding sample of the second reference block, and the corresponding value generated from the motion refinement.

There may be various types of BDOF techniques. One example is sub-block BDOF, and another example is per-pixel BDOF. In sub-block BDOF, the video encoder and the video decoder determine a motion refinement (also called a refined motion) for a sub-block. For sub-block BDOF, the video encoder and the video decoder use the same motion refinement to adjust the samples from the prediction block, where the prediction block may be generated using the first reference block and the second reference block (e.g., the sum of the first reference block and the second reference block, or a weighted average of the first reference block and the second reference block). In per-pixel BDOF, the video encoder and the video decoder may determine motion refinements that may differ for two or more samples in the current block. For per-pixel BDOF, the video encoder and the video decoder may adjust the samples from the prediction block using the motion refinement (also called the refined motion) determined for each sample, where the prediction block may be generated using the first reference block and the second reference block.
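The structural difference between the two variants can be illustrated as follows. This is a simplified sketch, not the disclosure's formula: it assumes the adjustment at each sample is the refinement (vx, vy) multiplied by hypothetical horizontal and vertical gradient differences between the two reference blocks, and all function names are invented for illustration.

```python
def adjust(vx, vy, gx_diff, gy_diff):
    """Simplified adjustment: refinement times gradient differences (assumed form)."""
    return vx * gx_diff + vy * gy_diff


def subblock_adjustments(refinement, gx_diff, gy_diff):
    """Sub-block BDOF: one (vx, vy) pair is shared by every sample."""
    vx, vy = refinement
    return [[adjust(vx, vy, gx, gy) for gx, gy in zip(gx_row, gy_row)]
            for gx_row, gy_row in zip(gx_diff, gy_diff)]


def per_pixel_adjustments(refinements, gx_diff, gy_diff):
    """Per-pixel BDOF: each sample carries its own (vx, vy) pair."""
    return [[adjust(vx, vy, gx, gy)
             for (vx, vy), gx, gy in zip(v_row, gx_row, gy_row)]
            for v_row, gx_row, gy_row in zip(refinements, gx_diff, gy_diff)]


# Toy 2x2 gradient differences (illustrative values only).
gx = [[2, 4], [6, 8]]
gy = [[1, 1], [1, 1]]

shared = subblock_adjustments((1, 0), gx, gy)
varied = per_pixel_adjustments([[(1, 0), (0, 1)], [(1, 1), (0, 0)]], gx, gy)
```

The only change between the two functions is where the refinement comes from: a single pair for the whole sub-block, or a grid with one pair per sample.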

BDOF or other refinement techniques may be selectively enabled at the block level, while whether BDOF is applied at the sub-block level may be inferred based on a distortion value. For example, the video encoder may enable BDOF for a block and may signal information indicating that BDOF is enabled for the block.

In response, the video decoder may divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block. Although BDOF is enabled for the block, the video decoder may determine, sub-block by sub-block, whether BDOF is actually to be performed or bypassed. For example, the video decoder determines a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks.

According to one or more examples described in this disclosure, the video decoder may determine, for each sub-block of one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed, based on the respective distortion value. For example, the video decoder may determine a first distortion value for a first sub-block and may determine, based on the first distortion value, that per-pixel BDOF is performed for the first sub-block. The video decoder may determine a second distortion value for a second sub-block and may determine, based on the second distortion value, that BDOF is bypassed for the second sub-block, and so on.
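The per-sub-block decision can be sketched as a threshold test. The sketch below makes two assumptions that this passage does not state: the distortion value is a sum of absolute differences (SAD) between the co-located reference sub-blocks, and BDOF is bypassed when the distortion is below the threshold. The names `sad` and `choose_mode` and the threshold value are hypothetical.

```python
def sad(block0, block1):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row0, row1 in zip(block0, block1)
               for a, b in zip(row0, row1))


def choose_mode(ref0_sub, ref1_sub, threshold):
    """Return the mode for one sub-block together with its distortion value."""
    distortion = sad(ref0_sub, ref1_sub)
    # Assumed rule: low distortion means refinement is unlikely to help.
    mode = "bypass" if distortion < threshold else "per_pixel_bdof"
    return mode, distortion


# First sub-block: the references differ noticeably, so BDOF is performed.
first = choose_mode([[10, 10], [10, 10]], [[14, 6], [12, 9]], threshold=4)
# Second sub-block: the references nearly match, so BDOF is bypassed.
second = choose_mode([[10, 10], [10, 10]], [[10, 11], [10, 10]], threshold=4)
```

Each sub-block is evaluated independently, so one block can contain a mix of refined and bypassed sub-blocks.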

In one or more examples, when the video decoder determines that BDOF is to be performed, the video decoder may perform per-pixel BDOF, and other BDOF techniques may be unavailable to the video decoder. That is, the video decoder may determine, sub-block by sub-block, one of per-pixel BDOF being performed or BDOF being bypassed. When BDOF is performed, per-pixel BDOF may be the only BDOF technique available to the video decoder.

In one or more examples, as described above, the video decoder may determine, for each sub-block, a distortion value used to determine whether per-pixel BDOF is performed or whether BDOF is bypassed. In some examples, as described in more detail below, the video decoder may reuse the calculations used to determine the distortion value when determining the per-pixel motion refinement for per-pixel BDOF. For example, the video decoder may determine a first distortion value for a first sub-block. Assume that the video decoder determines that per-pixel BDOF is enabled for the first sub-block. In some examples, rather than recalculating all of the values needed to determine the per-pixel motion refinement, the video decoder may be configured to reuse, when determining the per-pixel motion refinement, results from the calculations it performed to determine that per-pixel BDOF is performed.
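The reuse idea can be sketched by computing per-sample intermediate values once and passing them on to the refinement stage. Everything specific here is an assumption for illustration: the intermediate values are differences of down-shifted reference samples, the distortion is the sum of their absolute values, bypass happens below the threshold, and `refine` is a hypothetical stand-in for the actual per-pixel motion refinement derivation.

```python
def sample_differences(ref0_sub, ref1_sub, shift=4):
    """Per-sample differences of down-shifted reference samples (computed once)."""
    return [[(b >> shift) - (a >> shift) for a, b in zip(row0, row1)]
            for row0, row1 in zip(ref0_sub, ref1_sub)]


def distortion_from(diffs):
    """Distortion value as the sum of absolute per-sample differences."""
    return sum(abs(d) for row in diffs for d in row)


def process_subblock(ref0_sub, ref1_sub, threshold, refine):
    """Decide bypass or per-pixel BDOF, reusing the differences if BDOF runs."""
    diffs = sample_differences(ref0_sub, ref1_sub)
    if distortion_from(diffs) < threshold:
        return None            # bypass: no refinement is derived
    return refine(diffs)       # reuse diffs instead of recomputing them


ref0 = [[160, 320], [480, 640]]
ref1 = [[176, 320], [448, 672]]

# Placeholder refinement stage that just returns what it was handed.
performed = process_subblock(ref0, ref1, threshold=3, refine=lambda d: d)
bypassed = process_subblock(ref0, ref1, threshold=10, refine=lambda d: d)
```

The point of the design is that the difference array is built once per sub-block and serves both the bypass decision and, when needed, the refinement stage.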

The video decoder may be configured to determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed. For example, assume that per-pixel BDOF is performed for a sub-block. In this example, the video decoder may generate the prediction samples for the sub-block by refining the samples of the prediction block (e.g., the block generated by combining the two reference blocks) based on the per-pixel motion refinement. As another example, assume that BDOF is bypassed for a sub-block. In this example, the video decoder need not refine the samples of the prediction block to generate the prediction samples. Rather, the samples of the prediction block may be the same as the prediction samples (or, in some cases, may undergo some adjustment that is not based on BDOF). For example, when BDOF is bypassed, the video encoder and the video decoder may generate the prediction samples by determining a weighted average of the corresponding samples in the first reference block and the second reference block.
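For the bypass case, the weighted average of corresponding reference samples can be sketched as below. The integer weights and the round-to-nearest rule are assumptions for illustration, and `weighted_average` is a hypothetical name.

```python
def weighted_average(ref0, ref1, w0=1, w1=1):
    """Blend two reference blocks sample by sample with integer weights.

    Adding total // 2 before the integer division rounds to nearest.
    """
    total = w0 + w1
    return [[(w0 * a + w1 * b + total // 2) // total
             for a, b in zip(row0, row1)]
            for row0, row1 in zip(ref0, ref1)]


ref0 = [[100, 50]]
ref1 = [[104, 53]]

equal_weights = weighted_average(ref0, ref1)
unequal_weights = weighted_average(ref0, ref1, w0=3, w1=1)
```

With equal weights this reduces to a rounded average of the two reference blocks; unequal weights pull the result toward the more heavily weighted reference.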

The video decoder may reconstruct the block based on the prediction samples. For example, the video decoder may receive residual values indicating the differences between the prediction samples and the samples of the block, and may add the residual values to the prediction samples to reconstruct the block. The above examples are described from the perspective of the video decoder. The video encoder may be configured to perform similar techniques. For example, the prediction samples generated by the video decoder should be the same as the prediction samples generated by the video encoder. Accordingly, the video encoder may perform techniques similar to those described above to determine the prediction samples in the same manner as the video decoder.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure are generally directed to coding (encoding and/or decoding) video data. In general, video data includes any data for processing video. Thus, video data may include raw, unencoded video, encoded video, decoded (e.g., reconstructed) video, and video metadata such as signaling data.

As shown in FIG. 1, system 100 includes, in this example, a source device 102 that provides encoded video data to be decoded and displayed by a destination device 116. In particular, source device 102 provides the video data to destination device 116 via a computer-readable medium 110. Source device 102 and destination device 116 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, mobile devices, tablet computers, set-top boxes, telephone handsets such as smartphones, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, broadcast receiver devices, and the like. In some cases, source device 102 and destination device 116 may be equipped for wireless communication and may therefore be referred to as wireless communication devices.

In the example of FIG. 1, source device 102 includes video source 104, memory 106, video encoder 200, and output interface 108. Destination device 116 includes input interface 122, video decoder 300, memory 120, and display device 118. In accordance with this disclosure, video encoder 200 of source device 102 and video decoder 300 of destination device 116 may be configured to apply the techniques for decoder-side motion vector derivation, such as template matching, bilateral matching, decoder-side motion vector (MV) refinement, and bidirectional optical flow. Thus, source device 102 represents an example of a video encoding device, and destination device 116 represents an example of a video decoding device. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 102 may receive video data from an external video source, such as an external camera. Likewise, destination device 116 may interface with an external display device rather than include an integrated display device.

System 100 as shown in FIG. 1 is merely one example. In general, any digital video encoding and/or decoding device may perform techniques for decoder-side motion vector derivation, such as template matching, bilateral matching, decoder-side motion vector (MV) refinement, and bidirectional optical flow (BDOF). Source device 102 and destination device 116 are merely examples of such coding devices, in which source device 102 generates coded video data for transmission to destination device 116. This disclosure refers to a "coding" device as a device that performs coding (encoding and/or decoding) of data. Thus, video encoder 200 and video decoder 300 represent examples of coding devices, in particular a video encoder and a video decoder, respectively. In some examples, source device 102 and destination device 116 may operate in a substantially symmetrical manner, such that each of source device 102 and destination device 116 includes video encoding and decoding components. Hence, system 100 may support one-way or two-way video transmission between source device 102 and destination device 116, e.g., for video streaming, video playback, video broadcasting, or video telephony.

In general, video source 104 represents a source of video data (i.e., raw, unencoded video data) and provides a sequential series of pictures (also referred to as "frames") of the video data to video encoder 200, which encodes data for the pictures. Video source 104 of source device 102 may include a video capture device, such as a video camera, a video archive containing previously captured raw video, and/or a video feed interface for receiving video from a video content provider. As a further alternative, video source 104 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In each case, video encoder 200 encodes the captured, pre-captured, or computer-generated video data. Video encoder 200 may rearrange the pictures from the order in which they were received (sometimes referred to as "display order") into a coding order for coding. Video encoder 200 may generate a bitstream including the encoded video data. Source device 102 may then output the encoded video data via output interface 108 onto computer-readable medium 110 for reception and/or retrieval by, e.g., input interface 122 of destination device 116.
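
The rearrangement from display order to coding order mentioned above can be illustrated with a minimal sketch. The grouping rule used here (code the last picture of each group first, then the pictures that precede it in display order) is an assumption chosen for illustration, and the function name `display_to_coding_order` and the parameter `group_size` are hypothetical; the disclosure does not prescribe a particular order.

```python
def display_to_coding_order(num_pictures, group_size):
    """Return display indices listed in a hypothetical coding order.

    Assumption: picture 0 is coded first; after that, pictures are taken in
    groups of `group_size`, and within each group the last picture (a
    possible reference for the others) is coded before the earlier ones.
    """
    if num_pictures <= 0:
        return []
    order = [0]
    start = 1
    while start < num_pictures:
        end = min(start + group_size, num_pictures)  # exclusive bound
        order.append(end - 1)                # anchor picture of the group
        order.extend(range(start, end - 1))  # remaining pictures, display order
        start = end
    return order


coding_order = display_to_coding_order(7, 3)
```

With seven pictures and groups of three, the sketch codes pictures 3 and 6 ahead of the pictures displayed just before them, which is the kind of reordering that lets later pictures serve as references.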

Memory 106 of source device 102 and memory 120 of destination device 116 represent general-purpose memories. In some examples, memories 106, 120 may store raw video data, e.g., raw video from video source 104 and raw, decoded video data from video decoder 300. Additionally or alternatively, memories 106, 120 may store software instructions executable by, e.g., video encoder 200 and video decoder 300, respectively. Although memory 106 and memory 120 are shown separately from video encoder 200 and video decoder 300 in this example, it should be understood that video encoder 200 and video decoder 300 may also include internal memories for functionally similar or equivalent purposes. Furthermore, memories 106, 120 may store encoded video data, e.g., output from video encoder 200 and input to video decoder 300. In some examples, portions of memories 106, 120 may be allocated as one or more video buffers, e.g., to store raw, decoded, and/or encoded video data.

Computer-readable medium 110 may represent any type of medium or device capable of transporting the encoded video data from source device 102 to destination device 116. In one example, computer-readable medium 110 represents a communication medium that enables source device 102 to transmit encoded video data directly to destination device 116 in real time, e.g., via a radio frequency network or a computer-based network. Output interface 108 may modulate a transmission signal including the encoded video data, and input interface 122 may demodulate the received transmission signal, according to a communication standard such as a wireless communication protocol. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 102 to destination device 116.

In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 may access encoded data from storage device 112 via input interface 122. Storage device 112 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.

In some examples, source device 102 may output encoded video data to file server 114 or another intermediate storage device that may store the encoded video data generated by source device 102. Destination device 116 may access stored video data from file server 114 via streaming or download.

File server 114 may be any type of server device capable of storing encoded video data and transmitting that encoded video data to destination device 116. File server 114 may represent a web server (e.g., for a website), a server configured to provide a file transfer protocol service (such as File Transfer Protocol (FTP) or File Delivery over Unidirectional Transport (FLUTE) protocol), a content delivery network (CDN) device, a hypertext transfer protocol (HTTP) server, a Multimedia Broadcast Multicast Service (MBMS) or Enhanced MBMS (eMBMS) server, and/or a network attached storage (NAS) device. File server 114 may, additionally or alternatively, implement one or more HTTP streaming protocols, such as Dynamic Adaptive Streaming over HTTP (DASH), HTTP Live Streaming (HLS), Real Time Streaming Protocol (RTSP), HTTP Dynamic Streaming, or the like.

Destination device 116 may access encoded video data from file server 114 through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on file server 114. Input interface 122 may be configured to operate according to any one or more of the various protocols discussed above for retrieving or receiving media data from file server 114, or other such protocols for retrieving media data.

Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components that operate according to any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 may be configured to transfer data, such as encoded video data, according to a cellular communication standard, such as 4G, 4G-LTE (Long-Term Evolution), LTE Advanced, 5G, or the like. In some examples where output interface 108 comprises a wireless transmitter, output interface 108 and input interface 122 may be configured to transfer data, such as encoded video data, according to other wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15 specification (e.g., ZigBee(TM)), a Bluetooth(TM) standard, or the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-a-chip (SoC) devices. For example, source device 102 may include an SoC device to perform the functionality attributed to video encoder 200 and/or output interface 108, and destination device 116 may include an SoC device to perform the functionality attributed to video decoder 300 and/or input interface 122.

The techniques of this disclosure may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions such as Dynamic Adaptive Streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications.

Input interface 122 of destination device 116 receives an encoded video bitstream from computer-readable medium 110 (e.g., a communication medium, storage device 112, file server 114, or the like). The encoded video bitstream may include signaling information defined by video encoder 200, which is also used by video decoder 300, such as syntax elements having values that describe characteristics and/or processing of video blocks or other coded units (e.g., slices, pictures, groups of pictures, sequences, or the like). Display device 118 displays decoded pictures of the decoded video data to a user. Display device 118 may represent any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Although not shown in FIG. 1, in some examples, video encoder 200 and video decoder 300 may each be integrated with an audio encoder and/or audio decoder, and may include appropriate MUX-DEMUX units, or other hardware and/or software, to handle multiplexed streams including both audio and video in a common data stream. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

Video encoder 200 and video decoder 300 each may be implemented as any of a variety of suitable encoder and/or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. That is, there may be a computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to perform the example techniques described in this disclosure. Each of video encoder 200 and video decoder 300 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. A device including video encoder 200 and/or video decoder 300 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

The following describes video coding standards. Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. In addition, High Efficiency Video Coding (HEVC), or ITU-T H.265, including its range extension, multiview extension (MV-HEVC), and scalable extension (SHVC), has been developed by the Joint Collaboration Team on Video Coding (JCT-VC) as well as the Joint Collaboration Team on 3D Video Coding Extension Development (JCT-3V) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). The HEVC specification is available as ITU-T H.265, "Series H: Audiovisual and Multimedia Systems, Infrastructure of Audiovisual Services-Coding of Moving Video, High efficiency Video Coding," International Telecommunication Union, December 2016, 664 pp.

ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC29/WG11) are studying the standardization of future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard (including its current extensions and near-term extensions for screen content coding and high-dynamic-range coding). The groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by their experts in this area. The latest version of the reference software, i.e., VVC Test Model 10 (VTM10.0), can be downloaded from https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM.

Video encoder 200 and video decoder 300 may operate according to a video coding standard, such as ITU-T H.265, also referred to as High Efficiency Video Coding (HEVC), or extensions thereto, such as the multi-view and/or scalable video coding extensions. Alternatively, video encoder 200 and video decoder 300 may operate according to other proprietary or industry standards, such as ITU-T H.266, also referred to as Versatile Video Coding (VVC). A draft of the VVC standard is described in Bross et al., "Versatile Video Coding (Draft 10)," Joint Video Experts Team (JVET) of ITU-T SG16 WP 3 and ISO/IEC JTC 1/SC29/WG11, 18th Meeting, by teleconference, June 22 to July 1, 2020, JVET-S2001-vA (hereinafter "VVC Draft 10"). Editorial refinements of VVC Draft 10 are described in Bross et al., "Versatile Video Coding Editorial Refinements on Draft 10," Joint Video Experts Team (JVET) of ITU-T SG16 WP 3 and ISO/IEC JTC 1/SC29/WG11, 20th Meeting, by teleconference, October 7 to 16, 2020, JVET-T2001-v2. The algorithm description of Versatile Video Coding and Test Model 10 (VTM10.0) may be referred to as J. Chen, Y. Ye, and S. Kim, "Algorithm description for Versatile Video Coding and Test Model 11 (VTM 11)," JVET-T2002, December 2020 (hereinafter "JVET-T2002"). The techniques of this disclosure, however, are not limited to any particular coding standard.

In general, video encoder 200 and video decoder 300 may perform block-based coding of pictures. The term "block" generally refers to a structure including data to be processed (e.g., encoded, decoded, or otherwise used in the encoding and/or decoding process). For example, a block may include a two-dimensional matrix of samples of luminance and/or chrominance data. In general, video encoder 200 and video decoder 300 may code video data represented in a YUV (e.g., Y, Cb, Cr) format. That is, rather than coding red, green, and blue (RGB) data for samples of a picture, video encoder 200 and video decoder 300 may code luminance and chrominance components, where the chrominance components may include both red hue and blue hue chrominance components. In some examples, video encoder 200 converts received RGB-formatted data to a YUV representation prior to encoding, and video decoder 300 converts the YUV representation to the RGB format. Alternatively, pre-processing and post-processing units (not shown) may perform these conversions.
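
As an illustration of the RGB-to-YUV conversion described above, the following is a minimal sketch for a single 8-bit sample. It assumes full-range BT.601 weights (0.299, 0.587, 0.114), which are one common choice and are not specified by this disclosure; the function names `rgb_to_ycbcr` and `ycbcr_to_rgb` are hypothetical, and a real converter would also round and clip to the sample bit depth.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB sample to (Y, Cb, Cr), assuming full-range BT.601 weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 + (b - y) / 1.772  # blue-difference chroma, centered on 128
    cr = 128.0 + (r - y) / 1.402  # red-difference chroma, centered on 128
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Invert rgb_to_ycbcr under the same assumed weights."""
    r = y + 1.402 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


# A gray sample carries no color difference, so both chroma values sit at 128.
gray = rgb_to_ycbcr(128, 128, 128)
restored = ycbcr_to_rgb(*rgb_to_ycbcr(200, 30, 90))
```

Separating luminance from the two color-difference components is what allows the chroma arrays to be stored at reduced resolution in formats such as 4:2:0.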

This disclosure may generally refer to coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data of the picture. Similarly, this disclosure may refer to coding of blocks of a picture to include the process of encoding or decoding data for the blocks, e.g., prediction and/or residual coding. An encoded video bitstream generally includes a series of values for syntax elements representative of coding decisions (e.g., coding modes) and partitioning of pictures into blocks. Thus, references to coding a picture or block should generally be understood as coding values for syntax elements forming the picture or block.

HEVC defines various blocks, including coding units (CUs), prediction units (PUs), and transform units (TUs). According to HEVC, a video coder (such as video encoder 200) partitions a coding tree unit (CTU) into CUs according to a quadtree structure. That is, the video coder partitions CTUs and CUs into four equal, non-overlapping squares, and each node of the quadtree has either zero or four child nodes. Nodes without child nodes may be referred to as "leaf nodes," and CUs of such leaf nodes may include one or more PUs and/or one or more TUs. The video coder may further partition PUs and TUs. For example, in HEVC, a residual quadtree (RQT) represents partitioning of TUs. In HEVC, PUs represent inter-prediction data, while TUs represent residual data. CUs that are intra-predicted include intra-prediction information, such as an intra-mode indication.
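
The quadtree rule above (each node has either zero or four children, and each split yields four equal, non-overlapping squares) can be sketched as a short recursion. This is an illustrative sketch only: the function name `quadtree_partition`, the `should_split` callback, and the `min_size` default are hypothetical, and an actual encoder would typically decide splits by rate-distortion cost.

```python
def quadtree_partition(x, y, size, should_split, min_size=8):
    """Return leaf blocks (x, y, size) of a quadtree rooted at a square CTU.

    `should_split(x, y, size)` is a caller-supplied decision function
    (hypothetical stand-in for the encoder's mode decision).
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):          # top row of children, then bottom row
            for dx in (0, half):      # left child, then right child
                leaves.extend(
                    quadtree_partition(x + dx, y + dy, half, should_split, min_size)
                )
        return leaves
    return [(x, y, size)]             # leaf node: becomes a CU


# Split the 64x64 CTU once, then split only its top-left 32x32 child again.
cus = quadtree_partition(
    0, 0, 64,
    lambda x, y, size: size == 64 or (size == 32 and x == 0 and y == 0),
)
```

The example yields four 16x16 CUs in the top-left quadrant and three 32x32 CUs elsewhere, and the leaf areas always sum to the CTU area because the children tile their parent exactly.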

As another example, video encoder 200 and video decoder 300 may be configured to operate according to VVC. According to VVC, a video coder (such as video encoder 200) partitions a picture into a plurality of coding tree units (CTUs). Video encoder 200 may partition a CTU according to a tree structure, such as a quadtree-binary tree (QTBT) structure or a multi-type tree (MTT) structure. The QTBT structure removes the concepts of multiple partition types, such as the separation between CUs, PUs, and TUs of HEVC. A QTBT structure includes two levels: a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. A root node of the QTBT structure corresponds to a CTU. Leaf nodes of the binary trees correspond to coding units (CUs).

In an MTT partitioning structure, blocks may be partitioned using a quadtree (QT) partition, a binary tree (BT) partition, and one or more types of triple tree (TT) (also called ternary tree (TT)) partitions. A triple or ternary tree partition is a partition where a block is split into three sub-blocks. In some examples, a triple or ternary tree partition divides a block into three sub-blocks without dividing the original block through the center. The partitioning types in MTT (e.g., QT, BT, and TT) may be symmetrical or asymmetrical.
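
The binary and ternary splits described above can be sketched as follows. The 1:2:1 ratio used for the ternary split is an assumption for illustration (it is the ratio commonly associated with VVC); the text above only requires three sub-blocks with no boundary through the center. The function name `split_block` and the mode strings are hypothetical.

```python
def split_block(x, y, w, h, mode):
    """Split block (x, y, w, h) into sub-blocks for a hypothetical split mode.

    BT_* halves the block; TT_* uses an assumed 1:2:1 ratio, so no
    sub-block boundary passes through the center of the parent block.
    """
    if mode == "BT_VER":
        half = w // 2
        return [(x, y, half, h), (x + half, y, w - half, h)]
    if mode == "BT_HOR":
        half = h // 2
        return [(x, y, w, half), (x, y + half, w, h - half)]
    if mode == "TT_VER":
        q = w // 4
        return [(x, y, q, h), (x + q, y, w - 2 * q, h), (x + w - q, y, q, h)]
    if mode == "TT_HOR":
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, h - 2 * q), (x, y + h - q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")


ternary = split_block(0, 0, 32, 16, "TT_VER")
binary = split_block(0, 0, 32, 16, "BT_HOR")
```

For the 32-wide block, the ternary split places boundaries at x=8 and x=24, leaving the center column at x=16 inside the middle sub-block, whereas a vertical binary split would cut exactly there.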

In some examples, video encoder 200 and video decoder 300 may use a single QTBT or MTT structure to represent each of the luminance and chrominance components, while in other examples, video encoder 200 and video decoder 300 may use two or more QTBT or MTT structures, such as one QTBT/MTT structure for the luminance component and another QTBT/MTT structure for both chrominance components (or two QTBT/MTT structures for respective chrominance components).

Video encoder 200 and video decoder 300 may be configured to use quadtree partitioning per HEVC, QTBT partitioning, MTT partitioning, or other partitioning structures. For purposes of explanation, the description of the techniques of this disclosure is presented with respect to QTBT partitioning. However, it should be understood that the techniques of this disclosure may also be applied to video coders configured to use quadtree partitioning, or other types of partitioning as well.

In some examples, a CTU includes a coding tree block (CTB) of luma samples and two corresponding CTBs of chroma samples of a picture that has three sample arrays, or a CTB of samples of a monochrome picture or of a picture that is coded using three separate color planes and syntax structures used to code the samples. A CTB may be an N×N block of samples for some value of N such that the division of a component into CTBs is a partitioning. A component is an array or single sample from one of the three arrays (luma and two chroma) that compose a picture in 4:2:0, 4:2:2, or 4:4:4 color format, or the array or a single sample of the array that composes a picture in monochrome format. In some examples, a coding block is an M×N block of samples for some values of M and N such that a division of a CTB into coding blocks is a partitioning.
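
To make the relationship between the luma CTB and the two chroma CTBs concrete, the following sketch computes CTB dimensions for the color formats named above. The subsampling factors are the conventional ones for these formats (4:2:0 halves both chroma dimensions, 4:2:2 halves only the width, 4:4:4 halves neither); the function name `ctb_sizes` and the dictionary layout are hypothetical.

```python
# (horizontal, vertical) chroma subsampling factors per color format
CHROMA_SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def ctb_sizes(luma_ctb_size, chroma_format):
    """Return (width, height) of each CTB in a CTU for the given color format."""
    n = luma_ctb_size
    if chroma_format == "monochrome":
        return {"luma": (n, n)}  # a single sample array, no chroma CTBs
    sub_w, sub_h = CHROMA_SUBSAMPLING[chroma_format]
    chroma = (n // sub_w, n // sub_h)
    return {"luma": (n, n), "cb": chroma, "cr": chroma}


sizes_420 = ctb_sizes(64, "4:2:0")
sizes_422 = ctb_sizes(64, "4:2:2")
```

For a 64×64 luma CTB, the sketch gives 32×32 chroma CTBs in 4:2:0 and 32×64 chroma CTBs (width × height) in 4:2:2.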

The blocks (e.g., CTUs or CUs) may be grouped in various ways in a picture. As one example, a brick may refer to a rectangular region of CTU rows within a particular tile in a picture. A tile may be a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. A tile column refers to a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements (e.g., such as in a picture parameter set). A tile row refers to a rectangular region of CTUs having a height specified by syntax elements (e.g., such as in a picture parameter set) and a width equal to the width of the picture.
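
The tile geometry above can be sketched by intersecting tile columns with tile rows. In this sketch the column widths and row heights are given directly in CTU units, standing in for the values that would be conveyed by syntax elements; the function name `tile_rectangles` and its parameters are hypothetical.

```python
def tile_rectangles(col_widths, row_heights):
    """Return tiles as (x, y, width, height) in CTU units, in raster order.

    Each tile is the intersection of one tile column (full picture height)
    and one tile row (full picture width).
    """
    tiles = []
    y = 0
    for h in row_heights:
        x = 0
        for w in col_widths:
            tiles.append((x, y, w, h))
            x += w
        y += h
    return tiles


# A picture 5 CTUs wide and 4 CTUs high, with 2 tile columns and 2 tile rows.
tiles = tile_rectangles([3, 2], [2, 2])
```

Because the column widths sum to the picture width and the row heights sum to the picture height, the resulting tiles cover every CTU exactly once.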

In some examples, a tile may be partitioned into multiple bricks, each of which may include one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks may also be referred to as a brick. However, a brick that is a proper subset of a tile may not be referred to as a tile.

The bricks in a picture may also be arranged in a slice. A slice may be an integer number of bricks of a picture that may be exclusively contained in a single network abstraction layer (NAL) unit. In some examples, a slice includes either a number of complete tiles or only a consecutive sequence of complete bricks of one tile.

This disclosure may use "N×N" and "N by N" interchangeably to refer to the sample dimensions of a block (such as a CU or other video block) in terms of vertical and horizontal dimensions, e.g., 16×16 samples or 16 by 16 samples. In general, a 16×16 CU has 16 samples in the vertical direction (y=16) and 16 samples in the horizontal direction (x=16). Likewise, an N×N CU generally has N samples in the vertical direction and N samples in the horizontal direction, where N represents a non-negative integer value. The samples in a CU may be arranged in rows and columns. Moreover, a CU need not have the same number of samples in the horizontal direction as in the vertical direction. For example, a CU may comprise N×M samples, where M is not necessarily equal to N.

Video encoder 200 encodes video data for CUs representing prediction and/or residual information, and other information. The prediction information indicates how the CU is to be predicted in order to form a prediction block for the CU. The residual information generally represents sample-by-sample differences between the samples of the CU prior to encoding and the samples of the prediction block.

To predict a CU, video encoder 200 may generally form a prediction block for the CU through inter-prediction or intra-prediction. Inter-prediction generally refers to predicting the CU from data of a previously coded picture, whereas intra-prediction generally refers to predicting the CU from previously coded data of the same picture. To perform inter-prediction, video encoder 200 may generate the prediction block using one or more motion vectors. Video encoder 200 may generally perform a motion search to identify a reference block that closely matches the CU, e.g., in terms of differences between the CU and the reference block. Video encoder 200 may calculate a difference metric using a sum of absolute differences (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared difference (MSD), or other such difference calculations to determine whether a reference block closely matches the current CU. In some examples, video encoder 200 may predict the current CU using uni-directional prediction or bi-directional prediction.
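
The four difference metrics named above can be sketched in a few lines. This is a minimal illustration on blocks stored as lists of rows; the function names and the sample values are hypothetical, and a real motion search would evaluate such a metric over many candidate reference blocks.

```python
def _diffs(cur, ref):
    """Flatten the sample-by-sample differences between two equal-size blocks."""
    return [c - r for row_c, row_r in zip(cur, ref) for c, r in zip(row_c, row_r)]


def sad(cur, ref):
    """Sum of absolute differences."""
    return sum(abs(d) for d in _diffs(cur, ref))


def ssd(cur, ref):
    """Sum of squared differences."""
    return sum(d * d for d in _diffs(cur, ref))


def mad(cur, ref):
    """Mean absolute difference: SAD divided by the number of samples."""
    return sad(cur, ref) / len(_diffs(cur, ref))


def msd(cur, ref):
    """Mean squared difference: SSD divided by the number of samples."""
    return ssd(cur, ref) / len(_diffs(cur, ref))


cur_block = [[10, 12], [14, 16]]   # samples of the current CU (illustrative)
ref_block = [[11, 12], [12, 19]]   # samples of a candidate reference block
```

A lower value of any of these metrics indicates a closer match between the candidate reference block and the current CU.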

Some examples of VVC also provide an affine motion compensation mode, which may be considered an inter-prediction mode. In affine motion compensation mode, video encoder 200 may determine two or more motion vectors that represent non-translational motion, such as zoom in or out, rotation, perspective motion, or other irregular motion types.

To perform intra-prediction, video encoder 200 may select an intra-prediction mode to generate the prediction block. Some examples of VVC provide 67 intra-prediction modes, including various directional modes as well as planar mode and DC mode. In general, video encoder 200 selects an intra-prediction mode that describes the samples neighboring a current block (e.g., a block of a CU) from which to predict the samples of the current block. Assuming that video encoder 200 codes CTUs and CUs in raster scan order (left to right, top to bottom), such samples may generally be above, above and to the left of, or to the left of the current block, in the same picture as the current block.

Video encoder 200 encodes data representing the prediction mode for the current block. For example, for inter-prediction modes, video encoder 200 may encode data representing which of the various available inter-prediction modes is used, as well as motion information for the corresponding mode. For uni-directional or bi-directional inter-prediction, for example, video encoder 200 may encode motion vectors using advanced motion vector prediction (AMVP) mode or merge mode. Video encoder 200 may use similar modes to encode motion vectors for affine motion compensation mode.

Following prediction, such as intra-prediction or inter-prediction of a block, video encoder 200 may calculate residual data for the block. The residual data, such as a residual block, represents sample-by-sample differences between the block and a prediction block for the block formed using the corresponding prediction mode. Video encoder 200 may apply one or more transforms to the residual block to produce transformed data in a transform domain instead of the sample domain. For example, video encoder 200 may apply a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to the residual video data. Additionally, video encoder 200 may apply a secondary transform following the first transform, such as a mode-dependent non-separable secondary transform (MDNSST), a signal-dependent transform, a Karhunen-Loeve transform (KLT), or the like. Video encoder 200 produces transform coefficients following application of the one or more transforms.
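
The residual calculation and a first transform can be sketched as follows. This is a minimal illustration that uses a floating-point, orthonormal DCT-II applied separably to rows and columns; practical codecs use integer approximations of such transforms, and the function names here are hypothetical.

```python
import math


def residual_block(block, pred):
    """Sample-by-sample difference between a block and its prediction block."""
    return [[b - p for b, p in zip(row_b, row_p)] for row_b, row_p in zip(block, pred)]


def dct_1d(v):
    """Orthonormal DCT-II of a 1-D sequence (floating point, for illustration)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]   # cols[j] is transformed column j
    return [list(r) for r in zip(*cols)]           # transpose back to row-major


block = [[13, 13], [13, 13]]       # original samples (illustrative values)
pred = [[10, 10], [10, 10]]        # prediction block
resid = residual_block(block, pred)
coeffs = dct_2d(resid)
```

For this flat residual, all of the energy lands in the single top-left (DC) coefficient, which is the kind of compaction that makes the transform domain easier to compress than the sample domain.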

As noted above, following any transforms to produce transform coefficients, video encoder 200 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. By performing the quantization process, video encoder 200 may reduce the bit depth associated with some or all of the transform coefficients. For example, video encoder 200 may round an n-bit value down to an m-bit value during quantization, where n is greater than m. In some examples, to perform quantization, video encoder 200 may perform a bitwise right-shift of the value to be quantized.
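
Quantization by a bitwise right shift can be sketched as follows. This is a minimal illustration under one assumption that the text does not specify: the shift is applied to the magnitude so that positive and negative coefficients are treated symmetrically. The function names are hypothetical.

```python
def quantize_shift(coeff, shift):
    """Quantize by right-shifting the magnitude by `shift` bits.

    Shifting the magnitude (an assumption of this sketch) truncates toward
    zero for both signs; Python's >> on a negative int would round toward
    negative infinity instead.
    """
    mag = abs(coeff) >> shift
    return -mag if coeff < 0 else mag


def dequantize_shift(level, shift):
    """Approximate inverse: scale the quantized level back up."""
    return level * (1 << shift)


level = quantize_shift(100, 3)        # 100 >> 3
recon = dequantize_shift(level, 3)    # coarser value than the original 100
```

The reconstructed value differs from the original coefficient, which is the loss that buys the reduction in bit depth.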

Following quantization, video encoder 200 may scan the transform coefficients, producing a one-dimensional vector from the two-dimensional matrix that includes the quantized transform coefficients. The scan may be designed to place higher-energy (and therefore lower-frequency) transform coefficients at the front of the vector and lower-energy (and therefore higher-frequency) transform coefficients at the back of the vector. In some examples, video encoder 200 may use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector, and then entropy encode the quantized transform coefficients of the vector. In other examples, video encoder 200 may perform an adaptive scan. After scanning the quantized transform coefficients to form the one-dimensional vector, video encoder 200 may entropy encode the one-dimensional vector, e.g., according to context-adaptive binary arithmetic coding (CABAC). Video encoder 200 may also entropy encode values for syntax elements that describe metadata associated with the encoded video data, for use by video decoder 300 in decoding the video data.
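
A scan that places low-frequency coefficients first can be sketched as follows. This is a minimal illustration using a simple anti-diagonal order starting at the top-left (DC) position; it is not claimed to be the scan order of any particular standard, and the function name is hypothetical.

```python
def diagonal_scan(block):
    """Serialize a 2-D coefficient matrix along anti-diagonals, top-left first."""
    h, w = len(block), len(block[0])
    out = []
    for d in range(h + w - 1):         # d = x + y indexes the anti-diagonal
        for y in range(h):
            x = d - y
            if 0 <= x < w:
                out.append(block[y][x])
    return out


# Quantized coefficients with energy concentrated at low frequencies.
quantized = [
    [9, 5, 1, 0],
    [4, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = diagonal_scan(quantized)
```

With the non-zero values grouped at the front, the trailing run of zeros is cheap for an entropy coder to represent.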

To perform CABAC, video encoder 200 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are zero-valued or not. The probability determination may be based on the context assigned to the symbol.

Video encoder 200 may further generate syntax data for video decoder 300, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, e.g., in a picture header, a block header, a slice header, or other syntax data such as a sequence parameter set (SPS), picture parameter set (PPS), or video parameter set (VPS). Video decoder 300 may likewise decode such syntax data to determine how to decode the corresponding video data.

In this manner, video encoder 200 may generate a bitstream including encoded video data, e.g., syntax elements describing the partitioning of a picture into blocks (e.g., CUs) and prediction and/or residual information for the blocks. Ultimately, video decoder 300 may receive the bitstream and decode the encoded video data.

In general, video decoder 300 performs a process reciprocal to that performed by video encoder 200 to decode the encoded video data of the bitstream. For example, video decoder 300 may decode values for syntax elements of the bitstream using CABAC in a manner substantially similar to, albeit reciprocal to, the CABAC encoding process of video encoder 200. The syntax elements may define partitioning information for partitioning a picture into CTUs, and for partitioning each CTU according to a corresponding partition structure, such as a QTBT structure, to define the CUs of the CTU. The syntax elements may further define prediction and residual information for blocks (e.g., CUs) of video data.

The residual information may be represented by, for example, quantized transform coefficients. Video decoder 300 may inverse quantize and inverse transform the quantized transform coefficients of a block to reproduce a residual block for the block. Video decoder 300 uses the signaled prediction mode (intra- or inter-prediction) and related prediction information (e.g., motion information for inter-prediction) to form a prediction block for the block. Video decoder 300 may then combine the prediction block and the residual block (on a sample-by-sample basis) to reproduce the original block. Video decoder 300 may perform additional processing, such as performing a deblocking process to reduce visual artifacts along the boundaries of the block.
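
The sample-by-sample combination of the prediction block and the residual block can be sketched as follows. This is a minimal illustration; clipping the result to the valid sample range for an assumed bit depth is an addition of this sketch rather than something stated in the text, and the function name is hypothetical.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add the residual to the prediction, sample by sample.

    Clipping to [0, 2**bit_depth - 1] is an assumption of this sketch; it
    keeps each reconstructed sample within the representable range.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(pred, resid)
    ]


pred_block = [[250, 10], [100, 128]]
resid_block = [[10, -20], [5, -3]]
recon_block = reconstruct(pred_block, resid_block)
```

Because the residual was quantized at the encoder, the reconstructed block generally approximates the original block rather than matching it exactly.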

In accordance with the techniques of this disclosure, video encoder 200 and video decoder 300 may be configured to perform bi-directional optical flow (BDOF). For example, video encoder 200 may be configured to perform BDOF as part of encoding a current block, and video decoder 300 may be configured to perform BDOF as part of decoding a current block.

As described in more detail, in some examples, a video coder (e.g., video encoder 200 and/or video decoder 300) may be configured to: divide an input block into a plurality of sub-blocks, where the size of the input block is less than or equal to the size of a coding unit; determine, based on a condition being satisfied, that bi-directional optical flow (BDOF) is to be applied to a sub-block of the plurality of sub-blocks; divide the sub-block into a plurality of sub-sub-blocks; determine refined motion vectors for one or more of the sub-sub-blocks, where the refined motion vector for a sub-sub-block of the one or more sub-sub-blocks is the same for a plurality of samples in the sub-sub-block; and perform BDOF for the sub-block based on the refined motion vectors for the one or more sub-sub-blocks.

As another example, a video coder may be configured to: divide an input block into a plurality of sub-blocks, where the size of the input block is less than or equal to the size of a coding unit; determine, based on a condition being satisfied, that bi-directional optical flow (BDOF) is to be applied to a sub-block of the plurality of sub-blocks; divide the sub-block into a plurality of sub-sub-blocks; determine a refined motion vector for each of one or more samples in the sub-block; and perform BDOF for the sub-block based on the refined motion vector for each of the one or more samples in the sub-block.

For example, as described above, video encoder 200 or video decoder 300 may determine a refined motion vector for each of one or more samples in a sub-block, and may perform BDOF based on the refined motion vector for each of the one or more samples in the sub-block. In this disclosure, performing BDOF based on the refined motion vector for each of one or more samples in a sub-block is referred to as "per-pixel BDOF." For example, in per-pixel BDOF, rather than having one refined motion vector that is the same for all samples in the sub-block, the refined motion vector for each sample in the sub-block is determined separately.

A refined motion vector does not necessarily mean that the motion vector for the sub-block is changed. Rather, the refined motion vector for a sample may be used to determine the amount by which a sample in a prediction block is adjusted to generate a predicted sample. For example, for a first sample of a first sub-block, a first refined motion vector may indicate how much to adjust a first sample in the prediction block to generate a first predicted sample; for a second sample of the first sub-block, a second refined motion vector may indicate how much to adjust a second sample in the prediction block to generate a second predicted sample; and so forth.

In accordance with one or more examples described in this disclosure, video encoder 200 and video decoder 300 may determine, for each sub-block of one or more sub-blocks of a block (e.g., an input block), one of per-pixel BDOF being performed or BDOF being bypassed, based on a respective distortion value. For example, as described above, video encoder 200 and video decoder 300 may perform per-pixel BDOF based on a condition being satisfied. The condition may be whether the distortion value for the sub-block is greater than a threshold.

Accordingly, in some examples, the options available to video encoder 200 and video decoder 300 may be limited to either performing per-pixel BDOF for a sub-block or bypassing BDOF for the sub-block, based on whether the distortion value for the sub-block is greater than, or is less than or equal to, a threshold. For example, in some techniques, video encoder 200 and video decoder 300 may be able to perform per-pixel BDOF but not determine, sub-block by sub-block, whether BDOF is bypassed. In some techniques in which BDOF can be bypassed sub-block by sub-block, per-pixel BDOF may not have been available. With the example techniques described in this disclosure, video encoder 200 and video decoder 300 may be configured to selectively perform per-pixel BDOF or bypass BDOF, which may result in better video compression that appropriately balances decoding overhead.

In one or more examples, to encode or decode video data, respectively, video encoder 200 and video decoder 300 may be configured to determine that BDOF is enabled for a block of the video data and to divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block or, more generally, when BDOF is enabled for the block. Video encoder 200 and video decoder 300 may determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks. Example ways of determining the respective distortion values are described in more detail below. Video encoder 200 and video decoder 300 may determine, for each sub-block of the one or more sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values, and may determine predicted samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed.
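
The per-sub-block decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: the distortion metric used here (a sum of absolute differences between two prediction signals, called `pred0` and `pred1`) is an assumption chosen for the sketch, since this passage defers the actual distortion calculation to a later part of the description; the function names, the decision labels, and the threshold value are likewise hypothetical.

```python
def subblock_origins(width, height, sub_w, sub_h):
    """Yield the top-left (x0, y0) of each sub-block in raster order.

    Assumes width and height are multiples of sub_w and sub_h.
    """
    for y0 in range(0, height, sub_h):
        for x0 in range(0, width, sub_w):
            yield x0, y0


def subblock_distortion(pred0, pred1, x0, y0, sub_w, sub_h):
    """Assumed metric: SAD between the two prediction signals over one sub-block."""
    return sum(
        abs(pred0[y][x] - pred1[y][x])
        for y in range(y0, y0 + sub_h)
        for x in range(x0, x0 + sub_w)
    )


def bdof_decisions(pred0, pred1, sub_w, sub_h, threshold):
    """Choose per-pixel BDOF or bypass for every sub-block of the block."""
    height, width = len(pred0), len(pred0[0])
    decisions = {}
    for x0, y0 in subblock_origins(width, height, sub_w, sub_h):
        d = subblock_distortion(pred0, pred1, x0, y0, sub_w, sub_h)
        # Distortion above the threshold: refine per pixel; otherwise bypass BDOF.
        decisions[(x0, y0)] = "per_pixel_bdof" if d > threshold else "bypass"
    return decisions


pred0 = [[10, 10, 50, 50], [10, 10, 50, 50]]
pred1 = [[10, 11, 40, 60], [10, 10, 45, 55]]
decisions = bdof_decisions(pred0, pred1, sub_w=2, sub_h=2, threshold=8)
```

In this toy block, the left sub-block's two predictions nearly agree, so BDOF is bypassed there, while the right sub-block exceeds the threshold and is marked for per-pixel BDOF.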

Video encoder 200 may determine residual values indicative of the differences between the predicted samples and the samples of the block, and may signal the residual values. Video decoder 300 may receive residual values indicative of the differences between the predicted samples and the samples of the block, and may add the residual values to the predicted samples to reconstruct the block. In some examples, to receive the residual values, video decoder 300 may be configured to receive information indicative of the residual values, from which video decoder 300 determines the residual values.

This disclosure may generally refer to "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the communication of values for syntax elements and/or other data used to decode encoded video data. That is, video encoder 200 may signal values for syntax elements in the bitstream. In general, signaling refers to generating a value in the bitstream. As noted above, source device 102 may transport the bitstream to destination device 116 substantially in real time, or not in real time, such as might occur when storing syntax elements to storage device 112 for later retrieval by destination device 116.

FIGS. 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure 130 and a corresponding coding tree unit (CTU) 132. The solid lines represent quadtree splitting, and the dotted lines indicate binary tree splitting. In each split (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used, where, in this example, 0 indicates horizontal splitting and 1 indicates vertical splitting. For quadtree splitting, there is no need to indicate the splitting type, because a quadtree node splits a block horizontally and vertically into four sub-blocks of equal size. Accordingly, video encoder 200 may encode, and video decoder 300 may decode, syntax elements (such as splitting information) for the region tree level of QTBT structure 130 (i.e., the solid lines) and syntax elements (such as splitting information) for the prediction tree level of QTBT structure 130 (i.e., the dashed lines). Video encoder 200 may encode, and video decoder 300 may decode, video data, such as prediction and transform data, for the CUs represented by the terminal leaf nodes of QTBT structure 130.

In general, CTU 132 of FIG. 2B may be associated with parameters defining the sizes of blocks corresponding to nodes of QTBT structure 130 at the first and second levels. These parameters may include a CTU size (representing the size of CTU 132 in samples), a minimum quadtree size (MinQTSize, representing the minimum allowed quadtree leaf node size), a maximum binary tree size (MaxBTSize, representing the maximum allowed binary tree root node size), a maximum binary tree depth (MaxBTDepth, representing the maximum allowed binary tree depth), and a minimum binary tree size (MinBTSize, representing the minimum allowed binary tree leaf node size).

The root node of a QTBT structure corresponding to a CTU may have four child nodes at the first level of the QTBT structure, each of which may be partitioned according to quadtree partitioning. That is, nodes of the first level are either leaf nodes (having no child nodes) or have four child nodes. The example of QTBT structure 130 represents such nodes as including the parent node and child nodes having solid lines for branches. If nodes of the first level are not larger than the maximum allowed binary tree root node size (MaxBTSize), the nodes can be further partitioned by respective binary trees. The binary tree splitting of one node can be iterated until the nodes resulting from the split reach the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The example of QTBT structure 130 represents such nodes as having dashed lines for branches. A binary tree leaf node is referred to as a coding unit (CU), which is used for prediction (e.g., intra-picture or inter-picture prediction) and transform without any further partitioning. As discussed above, CUs may also be referred to as "video blocks" or "blocks."

In one example of the QTBT partitioning structure, the CTU size is set as 128×128 (luma samples and two corresponding 64×64 chroma samples), MinQTSize is set as 16×16, MaxBTSize is set as 64×64, MinBTSize (for both width and height) is set as 4, and MaxBTDepth is set as 4. Quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes. The quadtree leaf nodes may have a size from 16×16 (i.e., MinQTSize) to 128×128 (i.e., the CTU size). If a quadtree leaf node is 128×128, it is not further split by the binary tree, because its size exceeds MaxBTSize (i.e., 64×64 in this example). Otherwise, the quadtree leaf node is further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (4 in this example), no further splitting is permitted. A binary tree node having a width equal to MinBTSize (4 in this example) implies that no further vertical splitting (i.e., dividing of the width) is permitted for that binary tree node. Similarly, a binary tree node having a height equal to MinBTSize implies that no further horizontal splitting (i.e., dividing of the height) is permitted for that binary tree node. As noted above, the leaf nodes of the binary tree are referred to as CUs and are further processed according to prediction and transform without further partitioning.
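
The splitting constraints in this example can be sketched as a small helper. This is a minimal illustration using the example parameter values from the text as defaults (MinQTSize 16, MaxBTSize 64, MinBTSize 4, MaxBTDepth 4); the function names and the string labels for split types are hypothetical.

```python
def can_quadtree_split(size, min_qt_size=16):
    """A square quadtree node may be split while its children stay >= MinQTSize."""
    return size > min_qt_size


def allowed_binary_splits(width, height, bt_depth,
                          max_bt_size=64, min_bt_size=4, max_bt_depth=4):
    """List the binary split types still permitted for a node.

    "vertical" divides the width; "horizontal" divides the height.
    """
    if max(width, height) > max_bt_size:
        return []                      # too large to be a binary tree root
    if bt_depth >= max_bt_depth:
        return []                      # MaxBTDepth reached
    splits = []
    if width > min_bt_size:
        splits.append("vertical")      # width == MinBTSize forbids vertical splits
    if height > min_bt_size:
        splits.append("horizontal")    # height == MinBTSize forbids horizontal splits
    return splits


root_128 = allowed_binary_splits(128, 128, bt_depth=0)
leaf_64 = allowed_binary_splits(64, 64, bt_depth=0)
narrow = allowed_binary_splits(4, 16, bt_depth=1)
```

A node for which no quadtree split is taken and no binary split remains is a leaf, i.e., a CU in the terminology above.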

FIG. 3 is a block diagram illustrating an example video encoder 200 that may perform the techniques of this disclosure. FIG. 3 is provided for purposes of explanation and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video encoder 200 according to the techniques of VVC (ITU-T H.266, under development) and HEVC (ITU-T H.265). However, the techniques of this disclosure may be performed by video encoding devices that are configured for other video coding standards.

In the example of FIG. 3, video encoder 200 includes video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, decoded picture buffer (DPB) 218, and entropy encoding unit 220. Any or all of video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, DPB 218, and entropy encoding unit 220 may be implemented in one or more processors or in processing circuitry. For instance, the units of video encoder 200 may be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Moreover, video encoder 200 may include additional or alternative processors or processing circuitry to perform these and other functions.

Video data memory 230 may store video data to be encoded by the components of video encoder 200. Video encoder 200 may receive the video data stored in video data memory 230 from, for example, video source 104 (FIG. 1). DPB 218 may act as a reference picture memory that stores reference video data for use in prediction of subsequent video data by video encoder 200. Video data memory 230 and DPB 218 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 230 and DPB 218 may be provided by the same memory device or by separate memory devices. In various examples, video data memory 230 may be on-chip with other components of video encoder 200, as illustrated, or off-chip relative to those components.

In this disclosure, reference to video data memory 230 should not be interpreted as being limited to memory internal to video encoder 200, unless specifically described as such, or to memory external to video encoder 200, unless specifically described as such. Rather, reference to video data memory 230 should be understood as reference memory that stores video data that video encoder 200 receives for encoding (e.g., video data for a current block that is to be encoded). Memory 106 of FIG. 1 may also provide temporary storage of outputs from the various units of video encoder 200.

The various units of FIG. 3 are illustrated to assist with understanding the operations performed by video encoder 200. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits are circuits that provide particular functionality and are preset in the operations that can be performed. Programmable circuits are circuits that can be programmed to perform various tasks and that provide flexible functionality in the operations that can be performed. For example, programmable circuits may execute software or firmware that causes the programmable circuits to operate in the manner defined by the instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

Video encoder 200 may include arithmetic logic units (ALUs), elementary function units (EFUs), digital circuits, analog circuits, and/or programmable cores formed from programmable circuits. In examples where the operations of video encoder 200 are performed using software executed by the programmable circuits, memory 106 (FIG. 1) may store the instructions (e.g., object code) of the software that video encoder 200 receives and executes, or another memory within video encoder 200 (not shown) may store such instructions.

Video data memory 230 is configured to store received video data. Video encoder 200 may retrieve a picture of the video data from video data memory 230 and provide the video data to residual generation unit 204 and mode selection unit 202. The video data in video data memory 230 may be raw video data that is to be encoded.

Mode selection unit 202 includes motion estimation unit 222, motion compensation unit 224, and intra prediction unit 226. Mode selection unit 202 may include additional functional units to perform video prediction in accordance with other prediction modes. As examples, mode selection unit 202 may include a palette unit, an intra block copy unit (which may be part of motion estimation unit 222 and/or motion compensation unit 224), an affine unit, a linear model (LM) unit, or the like.

Mode selection unit 202 generally coordinates multiple encoding passes to test combinations of encoding parameters and the resulting rate-distortion values for such combinations. The encoding parameters may include the partitioning of CTUs into CUs, prediction modes for the CUs, transform types for residual data of the CUs, quantization parameters for residual data of the CUs, and so on. Mode selection unit 202 may ultimately select the combination of encoding parameters having a rate-distortion value that is better than those of the other tested combinations.
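
The selection among tested parameter combinations can be sketched as follows. This is a minimal illustration, not the encoder's actual procedure: it assumes a Lagrangian cost J = D + λR as the rate-distortion value, which the text above does not specify, and the names `Candidate`, `rd_cost`, and `select_best` as well as the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # label for one combination of encoding parameters
    distortion: float  # e.g. squared error between source and reconstruction
    rate_bits: float   # bits needed to code the block with these parameters


def rd_cost(candidate: Candidate, lam: float) -> float:
    # Assumed cost model: Lagrangian J = D + lambda * R (lower is better).
    return candidate.distortion + lam * candidate.rate_bits


def select_best(candidates, lam: float) -> Candidate:
    # Keep the combination whose rate-distortion value beats all others tested.
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Hypothetical results of three encoding passes.
candidates = [
    Candidate("intra_dc", distortion=900.0, rate_bits=20.0),
    Candidate("inter_uni", distortion=400.0, rate_bits=60.0),
    Candidate("inter_bi", distortion=250.0, rate_bits=95.0),
]
best = select_best(candidates, lam=4.0)
```

Under this assumed model, a larger λ penalizes rate more heavily, so the selected combination shifts toward cheaper modes as λ grows.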

Video encoder 200 may partition a picture retrieved from video data memory 230 into a series of CTUs and encapsulate one or more CTUs within a slice. Mode selection unit 202 may partition a CTU of the picture in accordance with a tree structure, such as the QTBT structure or the quadtree structure of HEVC described above. As described above, video encoder 200 may form one or more CUs by partitioning a CTU according to the tree structure. Such a CU may also be referred to generally as a "video block" or "block."

In general, mode selection unit 202 also controls its components (e.g., motion estimation unit 222, motion compensation unit 224, and intra prediction unit 226) to generate a prediction block for a current block (e.g., a current CU or, in HEVC, the overlapping portion of a PU and a TU). For inter prediction of a current block, motion estimation unit 222 may perform a motion search to identify one or more closely matching reference blocks in one or more reference pictures (e.g., one or more previously coded pictures stored in DPB 218). In particular, motion estimation unit 222 may calculate a value representing how similar a potential reference block is to the current block, e.g., according to sum of absolute differences (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared difference (MSD), or the like. Motion estimation unit 222 may generally perform these calculations using sample-by-sample differences between the current block and the reference block being considered. Motion estimation unit 222 may identify the reference block having the lowest value resulting from these calculations, which indicates the reference block that most closely matches the current block.
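
As a rough illustration of such a motion search, the sketch below exhaustively tests integer displacements within a small window and keeps the candidate with the lowest SAD. The exhaustive search pattern, the window size, and the helper names (`sad`, `extract`, `full_search`) are assumptions made for this example; practical encoders typically use faster search strategies.

```python
def sad(block_a, block_b):
    # Sum of absolute differences over co-located samples.
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract(picture, x, y, w, h):
    # Copy the w x h block whose top-left sample is at (x, y).
    return [row[x:x + w] for row in picture[y:y + h]]


def full_search(current, reference, x0, y0, search_range):
    # Test every integer displacement (dx, dy) around (x0, y0) and return
    # (lowest SAD, motion vector) for the best-matching reference block.
    h, w = len(current), len(current[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + w > len(reference[0]) or y + h > len(reference):
                continue  # candidate falls outside the reference picture
            cost = sad(current, extract(reference, x, y, w, h))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Hypothetical 6x6 reference picture and a 2x2 current block copied from (3, 2).
reference = [[10 * y + x for x in range(6)] for y in range(6)]
current = extract(reference, 3, 2, 2, 2)
best_cost, best_mv = full_search(current, reference, x0=2, y0=2, search_range=2)
```

Swapping `sad` for a squared-difference sum would give an SSD-based search with the same structure.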

Motion estimation unit 222 may form one or more motion vectors (MVs) that define the position of a reference block in a reference picture relative to the position of the current block in the current picture. Motion estimation unit 222 may then provide the motion vectors to motion compensation unit 224. For example, for uni-directional inter prediction, motion estimation unit 222 may provide a single motion vector, whereas for bi-directional inter prediction, motion estimation unit 222 may provide two motion vectors. Motion compensation unit 224 may then generate a prediction block using the motion vectors. For example, motion compensation unit 224 may retrieve data of the reference block using the motion vector. As another example, if the motion vector has fractional sample precision, motion compensation unit 224 may interpolate values for the prediction block according to one or more interpolation filters. Moreover, for bi-directional inter prediction, motion compensation unit 224 may retrieve data for the two reference blocks identified by the respective motion vectors and combine the retrieved data, e.g., through sample-by-sample averaging or weighted averaging.
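
The combination step for bi-directional inter prediction can be sketched as below. The integer weights, the rounding offset, and the right shift are assumptions chosen for illustration (equal weights with a shift of 1 give a rounded average); the function name `bi_predict` is hypothetical.

```python
def bi_predict(pred0, pred1, w0=1, w1=1, shift=1):
    # Sample-by-sample weighted average of two prediction blocks.
    # The weights are assumed to satisfy w0 + w1 == 1 << shift, with shift >= 1.
    offset = 1 << (shift - 1)  # rounding offset
    return [[(w0 * a + w1 * b + offset) >> shift for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]


# Hypothetical 2x2 prediction blocks fetched with the two motion vectors.
pred0 = [[100, 102], [104, 106]]
pred1 = [[110, 108], [100, 107]]

average = bi_predict(pred0, pred1)                     # plain averaging
weighted = bi_predict(pred0, pred1, w0=3, w1=5, shift=3)  # favors pred1
```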

As another example, for intra prediction, or intra-prediction coding, intra prediction unit 226 may generate the prediction block from samples neighboring the current block. For example, for directional modes, intra prediction unit 226 may generally mathematically combine values of neighboring samples and populate these calculated values in the defined direction across the current block to produce the prediction block. As another example, for DC mode, intra prediction unit 226 may calculate an average of the neighboring samples of the current block and generate the prediction block to include this resulting average for each sample of the prediction block.
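
A minimal sketch of DC-mode prediction follows. It assumes that the neighbors are the row of samples above and the column of samples to the left of the current block and that the average is rounded to the nearest integer; real codecs add further rules (for example, for non-square blocks or unavailable neighbors) that are omitted here. The name `dc_predict` is hypothetical.

```python
def dc_predict(above, left, width, height):
    # Average the neighboring samples (with rounding) and fill the whole
    # prediction block with that single value.
    neighbors = list(above) + list(left)
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * width for _ in range(height)]


# Hypothetical neighbors of a 4x4 block.
above = [100, 102, 104, 106]
left = [98, 100, 102, 104]
prediction = dc_predict(above, left, width=4, height=4)
```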

Mode selection unit 202 provides the prediction block to residual generation unit 204. Residual generation unit 204 receives a raw, unencoded version of the current block from video data memory 230 and the prediction block from mode selection unit 202. Residual generation unit 204 calculates sample-by-sample differences between the current block and the prediction block. The resulting sample-by-sample differences define a residual block for the current block. In some examples, residual generation unit 204 may also determine differences between sample values in the residual block to generate a residual block using residual differential pulse code modulation (RDPCM). In some examples, residual generation unit 204 may be formed using one or more subtractor circuits that perform binary subtraction.
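
The residual computation can be illustrated as follows. The second function sketches one possible RDPCM variant in which each residual sample is replaced by its difference from the left neighbor; the horizontal direction and the function names are assumptions made for this example only.

```python
def residual_block(current, prediction):
    # Sample-by-sample difference: current minus prediction.
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, prediction)]


def rdpcm_horizontal(residual):
    # Assumed horizontal RDPCM: keep the first sample of each row and code
    # every other sample as the difference from its left neighbor.
    return [[row[i] - (row[i - 1] if i > 0 else 0) for i in range(len(row))]
            for row in residual]


# Hypothetical 2x2 current block and prediction block.
current = [[105, 98], [101, 110]]
prediction = [[102, 102], [102, 102]]

residual = residual_block(current, prediction)
rdpcm = rdpcm_horizontal(residual)
```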

In examples where mode selection unit 202 partitions CUs into PUs, each PU may be associated with a luma prediction unit and corresponding chroma prediction units. Video encoder 200 and video decoder 300 may support PUs having various sizes. As indicated above, the size of a CU may refer to the size of the luma coding block of the CU, and the size of a PU may refer to the size of a luma prediction unit of the PU. Assuming that the size of a particular CU is 2N×2N, video encoder 200 may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PU sizes of 2N×2N, 2N×N, N×2N, N×N, or similar for inter prediction. Video encoder 200 and video decoder 300 may also support asymmetric partitioning for PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.

In examples where mode selection unit 202 does not further partition a CU into PUs, each CU may be associated with a luma coding block and corresponding chroma coding blocks. As above, the size of a CU may refer to the size of the luma coding block of the CU. Video encoder 200 and video decoder 300 may support CU sizes of 2N×2N, 2N×N, or N×2N.

For other video coding techniques, such as intra block copy mode coding, affine mode coding, and linear model (LM) mode coding, as some examples, mode selection unit 202 generates a prediction block for the current block being encoded via the respective units associated with those coding techniques. In some examples, such as palette mode coding, mode selection unit 202 may not generate a prediction block and may instead generate syntax elements that indicate the manner in which to reconstruct the block based on a selected palette. In such modes, mode selection unit 202 may provide these syntax elements to entropy encoding unit 220 to be encoded.

As described above, residual generation unit 204 receives the video data for the current block and the corresponding prediction block. Residual generation unit 204 then generates a residual block for the current block. To generate the residual block, residual generation unit 204 calculates sample-by-sample differences between the prediction block and the current block.

Transform processing unit 206 applies one or more transforms to the residual block to generate a block of transform coefficients (referred to herein as a "transform coefficient block"). Transform processing unit 206 may apply various transforms to a residual block to form the transform coefficient block. For example, transform processing unit 206 may apply a discrete cosine transform (DCT), a directional transform, a Karhunen-Loeve transform (KLT), or a conceptually similar transform to a residual block. In some examples, transform processing unit 206 may perform multiple transforms on a residual block, e.g., a primary transform and a secondary transform, such as a rotational transform. In some examples, transform processing unit 206 does not apply transforms to a residual block.

Quantization unit 208 may quantize the transform coefficients in a transform coefficient block to produce a quantized transform coefficient block. Quantization unit 208 may quantize the transform coefficients of the transform coefficient block according to a quantization parameter (QP) value associated with the current block. Video encoder 200 (e.g., via mode selection unit 202) may adjust the degree of quantization applied to the transform coefficient blocks associated with the current block by adjusting the QP value associated with the CU. Quantization may introduce loss of information, and thus, quantized transform coefficients may have lower precision than the original transform coefficients produced by transform processing unit 206.
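
The effect of the QP on precision can be illustrated with a simple uniform scalar quantizer. The sketch assumes the commonly cited approximation in which the step size doubles for every increase of 6 in QP, i.e. a step of 2^((QP − 4)/6); actual codecs use integer scaling tables and additional rounding controls, and the function names here are hypothetical.

```python
def qstep(qp):
    # Assumed step-size model: doubles every 6 QP, equals 1.0 at QP 4.
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    # Divide by the step and round half away from zero.
    step = qstep(qp)
    return [[(1 if c >= 0 else -1) * int(abs(c) / step + 0.5) for c in row]
            for row in coeffs]


def dequantize(levels, qp):
    # Scale the quantized levels back; the rounding error is not recoverable.
    step = qstep(qp)
    return [[level * step for level in row] for row in levels]


# Hypothetical 2x2 transform coefficient block.
coeffs = [[100, -37], [5, 0]]
levels = quantize(coeffs, qp=22)    # step size is 8.0 at QP 22
restored = dequantize(levels, qp=22)
```

Comparing `restored` with `coeffs` shows the information loss described above: the restored values are multiples of the step size rather than the original coefficients.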

Inverse quantization unit 210 and inverse transform processing unit 212 may apply inverse quantization and inverse transforms, respectively, to a quantized transform coefficient block to reconstruct a residual block from the transform coefficient block. Reconstruction unit 214 may produce a reconstructed block corresponding to the current block (albeit potentially with some degree of distortion) based on the reconstructed residual block and the prediction block generated by mode selection unit 202. For example, reconstruction unit 214 may add samples of the reconstructed residual block to corresponding samples from the prediction block generated by mode selection unit 202 to produce the reconstructed block.
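
The addition performed by the reconstruction unit can be sketched as below. Clipping the sum to the valid sample range for the bit depth is an assumption added for this example (the text above only describes the addition), and the name `reconstruct` is hypothetical.

```python
def reconstruct(prediction, residual, bit_depth=8):
    # Add each reconstructed residual sample to the co-located prediction
    # sample, then clip to [0, 2^bit_depth - 1] (clipping is assumed here).
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prediction, residual)]


# Hypothetical 2x2 prediction block and reconstructed residual block.
prediction = [[102, 250], [3, 128]]
residual = [[3, 10], [-5, 0]]
reconstructed = reconstruct(prediction, residual)
```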

Filter unit 216 may perform one or more filter operations on reconstructed blocks. For example, filter unit 216 may perform deblocking operations to reduce blockiness artifacts along the edges of CUs. In some examples, the operations of filter unit 216 may be skipped.

Video encoder 200 stores reconstructed blocks in DPB 218. For instance, in examples where the operations of filter unit 216 are not performed, reconstruction unit 214 may store the reconstructed blocks to DPB 218. In examples where the operations of filter unit 216 are performed, filter unit 216 may store the filtered reconstructed blocks to DPB 218. Motion estimation unit 222 and motion compensation unit 224 may retrieve from DPB 218 a reference picture, formed from the reconstructed (and potentially filtered) blocks, to inter-predict blocks of subsequently encoded pictures. In addition, intra prediction unit 226 may use reconstructed blocks of the current picture in DPB 218 to intra-predict other blocks in the current picture.

In general, entropy encoding unit 220 may entropy encode syntax elements received from other functional components of video encoder 200. For example, entropy encoding unit 220 may entropy encode quantized transform coefficient blocks from quantization unit 208. As another example, entropy encoding unit 220 may entropy encode prediction syntax elements (e.g., motion information for inter prediction or intra-mode information for intra prediction) from mode selection unit 202. Entropy encoding unit 220 may perform one or more entropy encoding operations on the syntax elements, which are another example of video data, to generate entropy-encoded data. For example, entropy encoding unit 220 may perform a context-adaptive variable length coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a probability interval partitioning entropy (PIPE) coding operation, an exponential-Golomb encoding operation, or another type of entropy encoding operation on the data. In some examples, entropy encoding unit 220 may operate in a bypass mode in which syntax elements are not entropy encoded.
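
Of the operations listed, exponential-Golomb coding is simple enough to show in a few lines. The sketch below covers only the order-0 code for unsigned integers and represents bits as a string of "0" and "1" characters for readability; both choices, and the function names, are assumptions made for this example.

```python
def exp_golomb_encode(value):
    # Order-0 unsigned Exp-Golomb: write value + 1 in binary, preceded by
    # as many zero bits as there are bits after the leading one.
    if value < 0:
        raise ValueError("unsigned code: value must be non-negative")
    bits = bin(value + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def exp_golomb_decode(bitstring, pos=0):
    # Count leading zeros, then read that many bits after the leading one.
    # Returns (decoded value, position of the next unread bit).
    zeros = 0
    while bitstring[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bitstring[pos + zeros:end], 2) - 1, end


# Two codewords back to back: values 1 and 4.
stream = exp_golomb_encode(1) + exp_golomb_encode(4)
first, next_pos = exp_golomb_decode(stream)
second, end_pos = exp_golomb_decode(stream, next_pos)
```

Small values get short codewords, which suits syntax elements whose small values are the most likely.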

Video encoder 200 may output a bitstream that includes the entropy-encoded syntax elements needed to reconstruct blocks of a slice or picture. In particular, entropy encoding unit 220 may output the bitstream.

The operations described above are described with respect to a block. Such description should be understood as describing operations for a luma coding block and/or chroma coding blocks. As described above, in some examples, the luma coding block and chroma coding blocks are the luma and chroma components of a CU. In some examples, the luma coding block and the chroma coding blocks are the luma and chroma components of a PU.

In some examples, operations performed with respect to a luma coding block need not be repeated for the chroma coding blocks. As one example, the operations to identify a motion vector (MV) and reference picture for a luma coding block need not be repeated to identify an MV and reference picture for the chroma blocks. Rather, the MV for the luma coding block may be scaled to determine the MV for the chroma blocks, and the reference picture may be the same. As another example, the intra-prediction process may be the same for the luma coding block and the chroma coding blocks.

Video encoder 200 represents an example of a device configured to encode video data, the device including a memory configured to store video data and one or more processing units implemented in circuitry. The one or more processing units are configured to: divide an input block into a plurality of sub-blocks, where the size of the input block is less than or equal to the size of a coding unit; determine, based on a condition being satisfied, that bidirectional optical flow (BDOF) is to be applied to a sub-block of the plurality of sub-blocks; divide the sub-block into a plurality of sub-sub-blocks; determine a refined motion vector for one or more of the sub-sub-blocks, where the refined motion vector for a given sub-sub-block is the same for the plurality of samples in that sub-sub-block; and perform BDOF on the sub-block based on the refined motion vectors for the one or more sub-sub-blocks.

As another example, the one or more processing units implemented in circuitry may be configured to: divide an input block into a plurality of sub-blocks, where the size of the input block is less than or equal to the size of a coding unit; determine, based on a condition being satisfied, that bidirectional optical flow (BDOF) is to be applied to a sub-block of the plurality of sub-blocks; divide the sub-block into a plurality of sub-sub-blocks; determine a refined motion vector for each of one or more samples in the sub-block; and perform BDOF on the sub-block based on the refined motion vector for each of the one or more samples in the sub-block.

As yet another example, the processing circuitry of video encoder 200 may be configured to: determine that bidirectional optical flow (BDOF) is enabled for a block of video data; divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; determine, for each sub-block of the one or more sub-blocks and based on the respective distortion value, one of that per-pixel BDOF is performed or that BDOF is bypassed; determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or that BDOF is bypassed; determine residual values indicative of a difference between the prediction samples and the block; and signal information indicative of the residual values.
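
The per-sub-block decision in this example can be sketched as follows. Several details are assumptions made for illustration and are not stated in the paragraph above: the distortion value is taken to be the SAD between the two prediction signals of the sub-block, BDOF is bypassed when that value is below a threshold, and the sub-block size and threshold are arbitrary. The per-pixel BDOF refinement itself is not implemented; the sketch only returns which path each sub-block would take, and all function names are hypothetical.

```python
def split_into_subblocks(width, height, sub_w, sub_h):
    # Return (x, y, w, h) for each sub-block covering the block.
    return [(x, y, min(sub_w, width - x), min(sub_h, height - y))
            for y in range(0, height, sub_h)
            for x in range(0, width, sub_w)]


def subblock_sad(pred0, pred1, x, y, w, h):
    # Assumed distortion measure: SAD between the two prediction signals.
    return sum(abs(pred0[j][i] - pred1[j][i])
               for j in range(y, y + h)
               for i in range(x, x + w))


def decide_bdof(pred0, pred1, sub_w, sub_h, threshold):
    # Map each sub-block origin to "bypass" or "per_pixel_bdof".
    height, width = len(pred0), len(pred0[0])
    decisions = {}
    for x, y, w, h in split_into_subblocks(width, height, sub_w, sub_h):
        distortion = subblock_sad(pred0, pred1, x, y, w, h)
        # Assumed rule: a small difference between the two predictions means
        # refinement is unlikely to help, so BDOF is bypassed.
        decisions[(x, y)] = "bypass" if distortion < threshold else "per_pixel_bdof"
    return decisions


# Hypothetical 4x4 block with 2x2 sub-blocks.
pred0 = [[100] * 4 for _ in range(4)]
pred1 = [[100, 100, 110, 110],
         [100, 100, 110, 110],
         [101, 100, 100, 100],
         [100, 100, 100, 130]]
decisions = decide_bdof(pred0, pred1, sub_w=2, sub_h=2, threshold=8)
```

In a full implementation, the prediction samples of a bypassed sub-block might simply be the combined bi-prediction, while the other sub-blocks would go through the per-pixel refinement.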

FIG. 4 is a block diagram illustrating an example video decoder 300 that may perform the techniques of this disclosure. FIG. 4 is provided for purposes of explanation and is not limiting on the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 300 according to the techniques of VVC (ITU-T H.266, under development) and HEVC (ITU-T H.265). However, the techniques of this disclosure may be performed by video coding devices that are configured for other video coding standards.

In the example of FIG. 4, video decoder 300 includes coded picture buffer (CPB) memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and decoded picture buffer (DPB) 314. Any or all of CPB memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and DPB 314 may be implemented in one or more processors or in processing circuitry. For example, the units of video decoder 300 may be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Moreover, video decoder 300 may include additional or alternative processors or processing circuitry to perform these and other functions.

Prediction processing unit 304 includes motion compensation unit 316 and intra prediction unit 318. Prediction processing unit 304 may include additional units to perform prediction in accordance with other prediction modes. As examples, prediction processing unit 304 may include a palette unit, an intra block copy unit (which may form part of motion compensation unit 316), an affine unit, a linear model (LM) unit, or the like. In other examples, video decoder 300 may include more, fewer, or different functional components.

CPB memory 320 may store video data, such as an encoded video bitstream, to be decoded by the components of video decoder 300. The video data stored in CPB memory 320 may be obtained, for example, from computer-readable medium 110 (FIG. 1). CPB memory 320 may include a CPB that stores encoded video data (e.g., syntax elements) from an encoded video bitstream. CPB memory 320 may also store video data other than syntax elements of a coded picture, such as temporary data representing outputs from the various units of video decoder 300. DPB 314 generally stores decoded pictures, which video decoder 300 may output and/or use as reference video data when decoding subsequent data or pictures of the encoded video bitstream. CPB memory 320 and DPB 314 may be formed by any of a variety of memory devices, such as DRAM, including SDRAM, MRAM, RRAM, or other types of memory devices. CPB memory 320 and DPB 314 may be provided by the same memory device or by separate memory devices. In various examples, CPB memory 320 may be on-chip with other components of video decoder 300, or off-chip relative to those components.

Additionally or alternatively, in some examples, video decoder 300 may retrieve coded video data from memory 120 (FIG. 1). That is, memory 120 may store data as discussed above with respect to CPB memory 320. Likewise, memory 120 may store instructions to be executed by video decoder 300 when some or all of the functionality of video decoder 300 is implemented in software to be executed by processing circuitry of video decoder 300.

The various units shown in FIG. 4 are illustrated to assist with understanding the operations performed by video decoder 300. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. As with FIG. 3, fixed-function circuits refer to circuits that provide particular functionality and are preset with respect to the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and that provide flexible functionality in the operations that can be performed. For instance, programmable circuits may execute software or firmware that causes the programmable circuits to operate in the manner defined by instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

Video decoder 300 may include ALUs, EFUs, digital circuits, analog circuits, and/or programmable cores formed from programmable circuits. In examples where the operations of video decoder 300 are performed by software executing on the programmable circuits, on-chip or off-chip memory may store the instructions (e.g., object code) of the software that video decoder 300 receives and executes.

Entropy decoding unit 302 may receive encoded video data from the CPB and entropy decode the video data to reproduce syntax elements. Prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, and filter unit 312 may generate decoded video data based on the syntax elements extracted from the bitstream.

In general, video decoder 300 reconstructs a picture on a block-by-block basis. Video decoder 300 may perform a reconstruction operation on each block individually (where the block currently being reconstructed, i.e., decoded, may be referred to as a "current block").

Entropy decoding unit 302 may entropy decode syntax elements defining quantized transform coefficients of a quantized transform coefficient block, as well as transform information, such as a quantization parameter (QP) and/or transform mode indication(s). Inverse quantization unit 306 may use the QP associated with the quantized transform coefficient block to determine a degree of quantization and, likewise, a degree of inverse quantization for inverse quantization unit 306 to apply. Inverse quantization unit 306 may, for example, perform a bitwise left-shift operation to inverse quantize the quantized transform coefficients. Inverse quantization unit 306 may thereby form a transform coefficient block including transform coefficients.
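As a rough illustration of the left-shift operation mentioned above, the following sketch scales quantized coefficients by a power of two derived from the QP. The function name `dequantize_by_shift` and the rule that the shift amount equals `qp // 6` are simplifying assumptions made for illustration only; they are not the scaling process of any particular codec.

```python
def dequantize_by_shift(quantized_coeffs, qp):
    """Scale each quantized coefficient by 2**(qp // 6) using a left shift.

    The QP-to-shift mapping is a hypothetical simplification.
    """
    shift = qp // 6
    # In Python, left-shifting a negative integer multiplies it by 2**shift
    # as well, so the sign of each coefficient is preserved.
    return [[coeff << shift for coeff in row] for row in quantized_coeffs]


if __name__ == "__main__":
    block = [[1, -2], [0, 3]]
    print(dequantize_by_shift(block, qp=12))
```

A real inverse quantizer would typically also apply per-frequency scaling and rounding, which this sketch leaves out.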

After inverse quantization unit 306 forms the transform coefficient block, inverse transform processing unit 308 may apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block. For example, inverse transform processing unit 308 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the transform coefficient block.

Furthermore, prediction processing unit 304 generates a prediction block according to prediction information syntax elements that were entropy decoded by entropy decoding unit 302. For example, if the prediction information syntax elements indicate that the current block is inter-predicted, motion compensation unit 316 may generate the prediction block. In this case, the prediction information syntax elements may indicate a reference picture in DPB 314 from which to retrieve a reference block, as well as a motion vector identifying a location of the reference block in the reference picture relative to the location of the current block in the current picture. Motion compensation unit 316 may generally perform the inter-prediction process in a manner substantially similar to that described with respect to motion compensation unit 224 (FIG. 3).

As another example, if the prediction information syntax elements indicate that the current block is intra-predicted, intra prediction unit 318 may generate the prediction block according to an intra-prediction mode indicated by the prediction information syntax elements. Again, intra prediction unit 318 may generally perform the intra-prediction process in a manner substantially similar to that described with respect to intra prediction unit 226 (FIG. 3). Intra prediction unit 318 may retrieve data of samples neighboring the current block from DPB 314.

Reconstruction unit 310 may reconstruct the current block using the prediction block and the residual block. For example, reconstruction unit 310 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the current block.
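The sample-wise addition described above can be sketched as follows. The function name `reconstruct_block`, the list-of-rows block layout, and the clipping of the sum to the valid sample range for an assumed bit depth are illustrative assumptions; the text itself only states that residual samples are added to the corresponding prediction samples.

```python
def reconstruct_block(pred_block, residual_block, bit_depth=8):
    """Add residual samples to co-located prediction samples.

    Clipping to [0, 2**bit_depth - 1] is an assumption added for illustration.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred_block, residual_block)
    ]


if __name__ == "__main__":
    pred = [[100, 250], [0, 10]]
    resid = [[5, 10], [-3, -2]]
    print(reconstruct_block(pred, resid))
```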

Filter unit 312 may perform one or more filter operations on reconstructed blocks. For example, filter unit 312 may perform deblocking operations to reduce blockiness artifacts along edges of the reconstructed blocks. The operations of filter unit 312 are not necessarily performed in all examples.

Video decoder 300 may store the reconstructed blocks in DPB 314. For instance, in examples where the operations of filter unit 312 are not performed, reconstruction unit 310 may store reconstructed blocks in DPB 314. In examples where the operations of filter unit 312 are performed, filter unit 312 may store the filtered reconstructed blocks in DPB 314. As discussed above, DPB 314 may provide reference information, such as samples of a current picture for intra prediction and samples of previously decoded pictures for subsequent motion compensation, to prediction processing unit 304. Moreover, video decoder 300 may output decoded pictures (e.g., decoded video) from DPB 314 for subsequent presentation on a display device, such as display device 118 of FIG. 1.

In this manner, video decoder 300 represents an example of a video decoding device including a memory configured to store video data, and one or more processing units implemented in circuitry and configured to: divide an input block into a plurality of sub-blocks, where a size of the input block is less than or equal to a size of a coding unit; determine, based on a condition being satisfied, that bidirectional optical flow (BDOF) is to be applied to a sub-block of the plurality of sub-blocks; divide the sub-block into a plurality of sub-sub-blocks; determine refined motion vectors for one or more of the sub-sub-blocks, where the refined motion vector for a sub-sub-block of the one or more sub-sub-blocks is the same for a plurality of samples in the sub-sub-block; and perform BDOF on the sub-block based on the refined motion vectors for the one or more sub-sub-blocks.

As another example, processing circuitry of video decoder 300 (e.g., motion compensation unit 316) may be configured to: determine that bidirectional optical flow (BDOF) is enabled for a block of video data; divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; determine, for each sub-block of one or more sub-blocks of the plurality of sub-blocks, a respective distortion value; determine, for each sub-block of the one or more sub-blocks, that one of per-pixel BDOF is performed or BDOF is bypassed based on the respective distortion values; determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or that BDOF is bypassed; and reconstruct the block based on the prediction samples. For example, the processing circuitry may receive residual values indicative of a difference between the prediction samples and samples of the block, and may add the residual values to the prediction samples to reconstruct the block.
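The per-sub-block decision described above can be sketched as follows. This is a minimal sketch under stated assumptions: the distortion value is taken to be the sum of absolute differences (SAD) between the two reference predictions of the sub-block, and BDOF is bypassed when that value is below a threshold. The function names, the SAD choice, and the threshold rule are all hypothetical; the paragraph above only says that the decision is based on a respective distortion value.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def split_into_subblocks(block, sub_h, sub_w):
    """Return sub-blocks in raster order (left to right, top to bottom)."""
    height, width = len(block), len(block[0])
    return [[row[x:x + sub_w] for row in block[y:y + sub_h]]
            for y in range(0, height, sub_h)
            for x in range(0, width, sub_w)]


def bdof_decisions(pred0, pred1, sub_h, sub_w, threshold):
    """Decide per sub-block between bypassing BDOF and per-pixel BDOF.

    Assumption: a small distortion between the two predictions means BDOF
    would change little, so it is bypassed.
    """
    subs0 = split_into_subblocks(pred0, sub_h, sub_w)
    subs1 = split_into_subblocks(pred1, sub_h, sub_w)
    return ["bypass" if sad(s0, s1) < threshold else "per_pixel_bdof"
            for s0, s1 in zip(subs0, subs1)]


if __name__ == "__main__":
    p0 = [[10] * 4 for _ in range(4)]
    p1 = [[10, 10, 20, 20], [10, 10, 20, 20], [10] * 4, [10] * 4]
    print(bdof_decisions(p0, p1, sub_h=2, sub_w=2, threshold=8))
```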

The following describes the CU structure and motion vector prediction in HEVC. The following may provide additional context to the above description of CUs and motion vector prediction, and may repeat some of the above description to aid understanding.

In HEVC, the largest coding unit in a slice is called a coding tree block (CTB) or coding tree unit (CTU). A CTB contains a quadtree, the nodes of which are coding units. The size of a CTB can range from 16×16 to 64×64 in the HEVC main profile (although, technically, 8×8 CTB sizes can be supported). A coding unit (CU) can be the same size as a CTB and as small as 8×8. Each coding unit is coded with one mode, i.e., inter mode or intra mode. When a CU is inter-coded, the CU may be further partitioned into 2 or 4 prediction units (PUs), or may become just one PU when further partitioning does not apply. When two PUs are present in one CU, they can be half-size rectangles, or two rectangles with 1/4 and 3/4 of the size of the CU. When the CU is inter-coded, each PU has one set of motion information, which is derived with a unique inter-prediction mode.

The following describes motion vector prediction. In the HEVC standard, there are two inter-prediction modes for a prediction unit (PU), named merge mode (with skip mode considered a special case of merge mode) and advanced motion vector prediction (AMVP) mode, respectively.

In either AMVP mode or merge mode, a motion vector (MV) candidate list is maintained for multiple motion vector predictors. The motion vector(s) of the current PU, as well as the reference indices in merge mode, are generated by taking one candidate from the MV candidate list.

The MV candidate list contains up to 5 candidates for merge mode and only 2 candidates for AMVP mode. A merge candidate may contain a set of motion information, e.g., motion vectors corresponding to both reference picture lists (list 0 and list 1) and the reference indices. If a merge candidate is identified by a merge index, the reference pictures used for the prediction of the current block, as well as the associated motion vectors, are determined. Under AMVP mode, by contrast, for each potential prediction direction from either list 0 or list 1, a reference index is explicitly signaled together with an MV predictor (MVP) index into the MV candidate list, because the AMVP candidate contains only a motion vector. In AMVP mode, the predicted motion vectors can be further refined. The candidates for both modes are derived similarly from the same spatial and temporal neighboring blocks.

The following describes spatial neighboring candidates. For example, FIGS. 5A and 5B are conceptual diagrams illustrating examples of spatial neighboring motion vector candidates for merge mode and advanced motion vector predictor (AMVP) mode, respectively.

Spatial MV candidates are derived from the neighboring blocks shown in FIGS. 5A and 5B for a specific PU (PU0) 500, although the methods for generating the candidates from the blocks differ for merge mode and AMVP mode. In merge mode, up to four spatial MV candidates can be derived in the order shown with numbers in FIG. 5A, which is as follows: left (0, A1), above (1, B1), above-right (2, B0), below-left (3, A0), and above-left (4, B2).

In AMVP mode, the neighboring blocks are divided into two groups, as shown for PU0 502 in FIG. 5B: a left group consisting of blocks 0 and 1, and an above group consisting of blocks 2, 3, and 4. For each group, a potential candidate in a neighboring block that refers to the same reference picture as that indicated by the signaled reference index has the highest priority to be chosen to form the final candidate of the group. It is possible that none of the neighboring blocks contains a motion vector pointing to the same reference picture. Therefore, if no such candidate can be found, the first available candidate may be scaled to form the final candidate, so that the temporal distance difference can be compensated.

The following describes temporal motion vector prediction in HEVC. A temporal motion vector predictor (TMVP) candidate, if enabled and available, is added to the MV candidate list after the spatial motion vector candidates. The process of motion vector derivation for the TMVP candidate is the same for both merge mode and AMVP mode, but the target reference index for the TMVP candidate in merge mode is always set to 0.

The primary block location for TMVP candidate derivation is the bottom-right block outside of the collocated PU, shown in FIG. 6A as block "T" (block 602), which compensates for the bias toward the above and left blocks used to generate the spatial neighboring candidates. However, if that block is located outside of the current CTB row, or if motion information is not available, the block is substituted with a center block of the PU, shown as block 604.

The motion vector for the TMVP candidate is derived from the collocated PU of the collocated picture, which is indicated at the slice level. The motion vector for the collocated PU is called the collocated MV. Similar to temporal direct mode in AVC, to derive the TMVP candidate motion vector, the collocated MV is scaled to compensate for the temporal distance differences, as shown in FIG. 6B.

The following describes additional aspects of motion prediction in HEVC. Several aspects of merge mode and AMVP mode are worth mentioning, as follows. Motion vector scaling: It is assumed that the value of a motion vector is proportional to the distance between pictures in presentation time. A motion vector associates two pictures: the reference picture and the picture containing the motion vector (i.e., the containing picture). When a motion vector is used to predict another motion vector, the distance between the containing picture and the reference picture is calculated based on picture order count (POC) values.

For a motion vector to be predicted, both its associated containing picture and its reference picture may be different. Therefore, a new distance (based on POC) is calculated, and the motion vector is scaled based on these two POC distances. For a spatial neighboring candidate, the containing pictures for the two motion vectors are the same, while the reference pictures are different. In HEVC, motion vector scaling applies to both TMVP and AMVP, for spatial and temporal neighboring candidates.
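The POC-based scaling described above can be sketched as follows. The function and parameter names are hypothetical, and the sketch uses plain rounding of a ratio of POC distances; an actual codec would typically use a clipped fixed-point approximation instead, so this is an illustration of the proportionality assumption rather than a normative process.

```python
def scale_motion_vector(mv, poc_current, poc_target_ref,
                        poc_containing, poc_candidate_ref):
    """Scale a candidate MV by the ratio of two POC distances.

    tb: distance from the current picture to its target reference picture.
    td: distance from the candidate's containing picture to its reference.
    """
    tb = poc_current - poc_target_ref
    td = poc_containing - poc_candidate_ref
    if td == 0 or tb == td:
        return mv  # nothing to compensate (or no usable distance)
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))


if __name__ == "__main__":
    # Spatial neighbor: the containing picture is the current picture
    # (POC 10), but the neighbor refers to POC 6 while the target is POC 8.
    print(scale_motion_vector((8, -4), 10, 8, 10, 6))
```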

Artificial motion vector candidate generation: If a motion vector candidate list is not complete, artificial motion vector candidates are generated and inserted at the end of the list until the list is full.

In merge mode, there are two types of artificial MV candidates: combined candidates, which are derived only for B-slices, and zero candidates, which are used only for AMVP if the first type does not provide enough artificial candidates. For each pair of candidates that are already in the candidate list and have the necessary motion information, bidirectional combined motion vector candidates are derived by combining the motion vector of the first candidate referring to a picture in list 0 with the motion vector of the second candidate referring to a picture in list 1.
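The pairing described above can be sketched as follows. The candidate representation (a dict with optional `"l0"` and `"l1"` entries, each holding a motion vector and a reference index), the function name, and the pair ordering produced by `itertools.permutations` are all assumptions for illustration; a real codec uses a predefined pair order and additional checks.

```python
from itertools import permutations


def combined_bi_candidates(candidates, max_new):
    """Combine list-0 motion of one candidate with list-1 motion of another.

    Each candidate is a dict {"l0": (mv, ref_idx) or None,
                              "l1": (mv, ref_idx) or None}.
    """
    combined = []
    for first, second in permutations(candidates, 2):
        if len(combined) >= max_new:
            break
        # Both halves of the required motion information must be present.
        if first.get("l0") is not None and second.get("l1") is not None:
            combined.append({"l0": first["l0"], "l1": second["l1"]})
    return combined


if __name__ == "__main__":
    existing = [
        {"l0": ((1, 2), 0), "l1": None},
        {"l0": None, "l1": ((3, 4), 1)},
    ]
    print(combined_bi_candidates(existing, max_new=2))
```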

Pruning process for candidate insertion: Candidates from different blocks may happen to be the same, which decreases the efficiency of a merge/AMVP candidate list. A pruning process is applied to address this problem. The pruning process compares one candidate against the others in the current candidate list to avoid, to a certain extent, inserting identical candidates. To reduce complexity, only a limited number of pruning comparisons are applied, instead of comparing each potential candidate with all of the other existing candidates.
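A minimal sketch of limited pruning follows. The function name, the choice to compare only against the first `max_comparisons` entries, and the tuple representation of a candidate are assumptions for illustration; the text only states that the number of comparisons is limited.

```python
def insert_with_pruning(candidate_list, new_candidate, max_comparisons, max_size):
    """Append new_candidate unless a limited duplicate check rejects it.

    Only the first max_comparisons existing entries are checked, so some
    duplicates may still slip through; that is the complexity trade-off.
    """
    if len(candidate_list) >= max_size:
        return False
    for existing in candidate_list[:max_comparisons]:
        if existing == new_candidate:
            return False
    candidate_list.append(new_candidate)
    return True


if __name__ == "__main__":
    mv_list = [(1, 1), (2, 2), (3, 3)]
    print(insert_with_pruning(mv_list, (2, 2), max_comparisons=2, max_size=5))
    print(insert_with_pruning(mv_list, (3, 3), max_comparisons=2, max_size=5))
    print(mv_list)
```

In the usage above, the second insertion succeeds even though (3, 3) is already present, because the duplicate lies beyond the comparison limit.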

The following describes template matching prediction. Template matching (TM) prediction is a special merge mode based on frame-rate up-conversion (FRUC) techniques. With this mode, motion information of a block is not signaled but is derived at the decoder side (e.g., by video decoder 300). TM prediction applies to both AMVP mode and regular merge mode. In AMVP mode, MVP candidate selection is determined based on template matching, to pick the candidate that reaches the minimum difference between the current block template and the reference block template. In regular merge mode, a TM mode flag is signaled to indicate the use of TM, and TM is then applied to the merge candidate indicated by the merge index for MV refinement.

As shown in FIG. 7, template matching is used to derive motion information of the current CU by finding the closest match between a template (the above and/or left neighboring blocks of the current CU) in current frame 700 and a block (of the same size as the template) in reference frame 702. With an AMVP candidate selected based on the initial matching error, the MVP of the AMVP candidate is refined by template matching. With a merge candidate indicated by the signaled merge index, the merged MVs of the merge candidate corresponding to L0 and L1 are refined independently by template matching, and the less accurate MV is then refined once more, with the better MV used as a prior.

Regarding the cost function, when a motion vector points to a fractional sample position, motion-compensated interpolation may be used. To reduce complexity, bilinear interpolation is used instead of the regular 8-tap DCT-IF interpolation for both template matchings to generate the templates on the reference pictures. The matching cost C of template matching is calculated as follows.
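The expression itself does not appear in this text. Based on the symbols defined in the next paragraph (the weighting factor w, the tested MV, the initial MV^s, and SAD as the matching cost), it presumably takes the following form, in which the subscripts x and y denote the horizontal and vertical MV components; this reconstruction is an assumption:

$$C = \mathrm{SAD} + w \cdot \left( \left| MV_x - MV_x^{s} \right| + \left| MV_y - MV_y^{s} \right| \right)$$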

In the above equation, w is a weighting factor that is empirically set to 4, and MV and MV^s indicate the MV currently being tested and the initial MV (i.e., an MVP candidate in AMVP mode or the merged motion in merge mode), respectively. The sum of absolute differences (SAD) is used as the matching cost of template matching.
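A worked example of such a cost may help. The sketch below assumes that the cost is the template SAD plus w times the component-wise absolute difference between the tested MV and the initial MV; that combination, the flat-list template layout, and the function name are assumptions, since only the roles of w, MV, MV^s, and SAD are given in the text.

```python
def template_matching_cost(cur_template, ref_template, mv, mv_start, w=4):
    """Template SAD plus a penalty on the distance from the starting MV.

    cur_template / ref_template: flat sequences of luma samples.
    mv / mv_start: (x, y) tuples in the same MV units.
    """
    sad = sum(abs(c - r) for c, r in zip(cur_template, ref_template))
    mv_penalty = abs(mv[0] - mv_start[0]) + abs(mv[1] - mv_start[1])
    return sad + w * mv_penalty


if __name__ == "__main__":
    cost = template_matching_cost([10, 12, 14], [11, 12, 10],
                                  mv=(3, -1), mv_start=(1, 0))
    print(cost)
```

With these inputs the SAD is 5 and the MV penalty is 3, so the cost is 5 + 4 × 3 = 17.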

When TM is used, motion is refined using luma samples only. The derived motion may be used for both luma and chroma for motion compensation (MC) inter prediction. After the MV is determined, the final MC is performed using an 8-tap interpolation filter for luma and a 4-tap interpolation filter for chroma.

Regarding the search method, MV refinement is a pattern-based MV search with the template matching cost as the criterion. Two search patterns are supported for MV refinement: a diamond search and a cross search. The MV is searched directly at quarter-luma-sample MVD accuracy with the diamond pattern, followed by quarter-luma-sample MVD accuracy with the cross pattern, which is in turn followed by one-eighth-luma-sample MVD refinement with the cross pattern. The search range of MV refinement is set equal to (-8, +8) luma samples around the initial MV.

The following describes bilateral matching prediction. Bilateral matching (also called bilateral merge) (BM) prediction is another merge mode based on frame-rate up-conversion (FRUC) techniques. When it is determined that BM mode is to be applied to a block, two initial motion vectors MV0 and MV1 are derived by using a signaled merge candidate index to select a merge candidate in the constructed merge list. The bilateral matching search may be performed around MV0 and MV1. The final MV0' and MV1' are derived based on the minimum bilateral matching cost.

The motion vector differences MVD0 800 (denoted by MV0' - MV0) and MVD1 802 (denoted by MV1' - MV1), which point to the two reference blocks, may be proportional to the temporal distances (TD) between the current picture and the two reference pictures, e.g., TD0 and TD1. FIG. 8 shows an example of MVD0 and MVD1 in which TD1 is 4 times TD0.

However, there is an optional design in which MVD0 and MVD1 are mirrored regardless of the temporal distances TD0 and TD1. FIG. 9 shows an example of mirrored MVD0 900 and MVD1 902 in which TD1 is 4 times TD0.
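The two designs described above, with MVDs proportional to the temporal distances or mirrored regardless of them, can be contrasted with a small sketch. The function name is hypothetical, and the sign convention (MVD1 pointing in the opposite direction to MVD0, as when the two reference pictures lie on opposite temporal sides of the current picture) is an assumption not stated in the text.

```python
from fractions import Fraction


def derive_mvd1(mvd0, td0, td1, mirrored=False):
    """Derive MVD1 from MVD0.

    mirrored=False: MVD1 is MVD0 scaled by TD1/TD0 with opposite sign.
    mirrored=True:  MVD1 is simply the negation of MVD0, ignoring TD0/TD1.
    """
    if mirrored:
        return (-mvd0[0], -mvd0[1])
    scale = Fraction(td1, td0)  # exact ratio, avoids floating-point error
    return (-mvd0[0] * scale, -mvd0[1] * scale)


if __name__ == "__main__":
    # TD1 is four times TD0, as in the examples of FIGS. 8 and 9.
    print(derive_mvd1((1, -2), td0=1, td1=4))
    print(derive_mvd1((1, -2), td0=1, td1=4, mirrored=True))
```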

Bilateral matching performs a local search around the initial MV0 and MV1 to derive the final MV0' and MV1'. The local search applies a 3×3 square search pattern in a loop over the search range [-8, 8]. In each search iteration, the bilateral matching costs of the eight surrounding MVs in the search pattern are calculated and compared with the bilateral matching cost of the center MV. The MV with the minimum bilateral matching cost becomes the new center MV in the next search iteration. The local search terminates when the current center MV has the minimum cost within the 3×3 square search pattern, or when the local search reaches a predefined maximum number of search iterations. FIG. 10 shows an example of 3×3 square search pattern 1000 within the search range [-8, 8].
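The iterative 3×3 square search described above can be sketched as follows. The cost function is supplied by the caller and stands in for the bilateral matching cost; the function name, the representation of a search position as an integer offset from the initial MV, and the default of 8 iterations are assumptions for illustration.

```python
def bilateral_local_search(cost_fn, search_range=8, max_iterations=8):
    """Iterative 3x3 square search around offset (0, 0).

    cost_fn maps an (x, y) offset to a matching cost.
    Returns the final center offset and its cost.
    """
    center = (0, 0)
    center_cost = cost_fn(center)
    for _ in range(max_iterations):
        best, best_cost = center, center_cost
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue
                cand = (center[0] + dx, center[1] + dy)
                # Stay inside the search range [-search_range, search_range].
                if max(abs(cand[0]), abs(cand[1])) > search_range:
                    continue
                cost = cost_fn(cand)
                if cost < best_cost:
                    best, best_cost = cand, cost
        if best == center:
            break  # the center already has the minimum cost in the pattern
        center, center_cost = best, best_cost
    return center, center_cost


if __name__ == "__main__":
    # Toy cost with a single minimum at offset (3, -2).
    print(bilateral_local_search(lambda o: abs(o[0] - 3) + abs(o[1] + 2)))
```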

The following describes decoder-side motion vector refinement. To increase the accuracy of the MVs of merge mode, decoder-side motion vector refinement (DMVR) is applied in VVC. In the bi-prediction operation, a refined MV is searched around the initial MVs in reference picture list L0 and reference picture list L1. The DMVR method calculates the distortion between the two candidate blocks in reference picture list L0 and list L1. As shown in FIG. 11, the SAD between blocks 1102 and 1100 is calculated based on each MV candidate around the initial MV. The MV candidate with the lowest SAD becomes the refined MV and is used to generate the bi-predicted signal.

The refined MV derived by the DMVR process is used to generate the inter-prediction samples and is also used in temporal motion vector prediction for future picture coding. The original MV, in contrast, is used in the deblocking process and also in spatial motion vector prediction for future CU coding.

DMVRは、16×16ルーマサンプルという既定の最大処理ユニットを用いたサブブロックベースのマージモードである。CUの幅および/または高さが16ルーマサンプルよりも大きいとき、CUは、16ルーマサンプルに等しい幅および/または高さを有するサブブロックにさらに分割されてよい。 DMVR is a subblock-based merging mode with a default maximum processing unit of 16x16 luma samples. When the width and/or height of the CU is greater than 16 luma samples, the CU may be further divided into sub-blocks having a width and/or height equal to 16 luma samples.

The following describes the search scheme. In DMVR, the search points surrounding the initial MV and the MV offset obey the MV difference mirroring rule. In other words, any point checked by DMVR, denoted by the candidate MV pair (MV0, MV1), obeys the following two equations:
MV0' = MV0 + MV_offset
MV1' = MV1 - MV_offset

上の式において、MV_offsetは、参照ピクチャのうちの1つの中の初期MVと改善されるMVとの間の改善オフセットを表す。改善探索範囲は、初期MVからの2つの整数ルーマサンプルである。探索は、整数サンプルオフセット探索ステージ、および分数サンプル改善ステージを含む。 In the above equation, MV_offset represents the improvement offset between the initial MV and the improved MV in one of the reference pictures. The improvement search range is two integer luma samples from the initial MV. The search includes an integer sample offset search stage and a fractional sample improvement stage.

A 25-point full search is applied for the integer sample offset search. The SAD of the initial MV pair is calculated first. If the SAD of the initial MV pair is smaller than a threshold, the integer sample stage of DMVR terminates. Otherwise, the SADs of the remaining 24 points are calculated and checked in raster scan order. The point with the smallest SAD is selected as the output of the integer sample offset search stage. To reduce the penalty of the uncertainty of DMVR refinement, the original MV may be favored during the DMVR process: the SAD between the reference blocks referred to by the initial MV candidates is decreased by 1/4 of the SAD value.
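
The integer sample stage can be sketched as below. This is a simplified illustration under stated assumptions: MVs are integer-valued (the bilinear interpolation used for fractional MVs is omitted), reference pictures are plain 2D lists, and the names `fetch`, `dmvr_integer_search`, and `threshold` are hypothetical.

```python
def sad(block0, block1):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(abs(a - b) for r0, r1 in zip(block0, block1) for a, b in zip(r0, r1))


def fetch(ref, x, y, w, h):
    """Return the w x h block of `ref` whose top-left sample is (x, y)."""
    return [row[x:x + w] for row in ref[y:y + h]]


def dmvr_integer_search(ref0, ref1, pos, size, mv0, mv1, threshold, search_range=2):
    """Return the integer MV_offset chosen by a 25-point full search."""
    (x, y), (w, h) = pos, size

    def cost(dx, dy):
        # Mirroring rule: MV0' = MV0 + MV_offset, MV1' = MV1 - MV_offset.
        b0 = fetch(ref0, x + mv0[0] + dx, y + mv0[1] + dy, w, h)
        b1 = fetch(ref1, x + mv1[0] - dx, y + mv1[1] - dy, w, h)
        return sad(b0, b1)

    initial = cost(0, 0)
    if initial < threshold:
        return (0, 0)  # early termination of the integer sample stage
    # Favor the original MV: its SAD is decreased by 1/4 of its value.
    best, best_cost = (0, 0), initial - (initial >> 2)
    for dy in range(-search_range, search_range + 1):  # raster scan order
        for dx in range(-search_range, search_range + 1):
            if (dx, dy) == (0, 0):
                continue
            c = cost(dx, dy)
            if c < best_cost:
                best, best_cost = (dx, dy), c
    return best
```

With `search_range=2` the two loops cover the 5×5 grid of offsets, that is, the initial point plus the remaining 24 points.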

整数サンプル探索に分数サンプル改善が後続する。計算複雑度を節約するために、SAD比較を伴う追加の探索ではなくパラメトリック誤差曲面方程式を使用することによって、分数サンプル改善が導出される。分数サンプル改善は、整数サンプル探索ステージの出力に基づいて条件付きで呼び出される。第1の反復探索または第2の反復探索のいずれかの中で中心が最小SADを有して整数サンプル探索ステージが終了すると、分数サンプル改善がさらに適用される。 Integer sample search is followed by fractional sample improvement. To save computational complexity, a fractional sample improvement is derived by using a parametric error surface equation rather than an additional search with SAD comparison. Fractional sample refinement is conditionally invoked based on the output of the integer sample search stage. Once the integer sample search stage ends with the center having the minimum SAD in either the first iterative search or the second iterative search, fractional sample refinement is further applied.

In parametric error surface-based sub-pixel offset estimation, the cost at the center position and the costs at the four positions neighboring the center are used to fit a 2D parabolic error surface equation of the following form:
E(x,y) = A(x - x_min)² + B(y - y_min)² + C

In the above equation, (x_min, y_min) corresponds to the fractional position with the least cost, and C corresponds to the minimum cost value. Solving the above equation using the cost values of the five search points gives (x_min, y_min) as
x_min = (E(-1,0) - E(1,0)) / (2(E(-1,0) + E(1,0) - 2E(0,0)))
y_min = (E(0,-1) - E(0,1)) / (2(E(0,-1) + E(0,1) - 2E(0,0)))

すべてのコスト値が正であり最小値がE(0,0)であるので、x_minおよびy_minの値は、-8と8との間となるように自動的に制約される。これは、VVCにおける1/16ペルのMV確度を有する1/2ペルのオフセットに対応する。算出された分数(x_min,y_min)が整数距離改善MVに加算されて、サブピクセル確度の改善デルタMVを得る。 Since all cost values are positive and the minimum value is E(0,0), the values of x _min and y _min are automatically constrained to be between -8 and 8. This corresponds to a 1/2 pel offset with 1/16 pel MV accuracy in VVC. The calculated fraction (x _min ,y _min ) is added to the integer distance improvement MV to obtain the sub-pixel accuracy improvement delta MV.
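
The closed-form solution for (x_min, y_min) can be checked with a short sketch. The function and parameter names are illustrative, the zero-denominator guard is an added assumption, and floating-point arithmetic is used for clarity where a codec would use integer arithmetic.

```python
def error_surface_offset(e_center, e_left, e_right, e_up, e_down):
    """Fractional offset (x_min, y_min), in luma samples, from five costs.

    e_center = E(0,0), e_left = E(-1,0), e_right = E(1,0),
    e_up = E(0,-1), e_down = E(0,1).
    """
    def axis(e_minus, e_plus):
        denom = 2 * (e_minus + e_plus - 2 * e_center)
        # Assumption: a flat surface (zero denominator) yields no fractional offset.
        return 0.0 if denom == 0 else (e_minus - e_plus) / denom

    return axis(e_left, e_right), axis(e_up, e_down)


def to_sixteenth_pel(offset):
    """Convert an offset in luma samples to 1/16-pel units."""
    return round(16 * offset)
```

When E(0,0) is the smallest of the five costs, each returned component lies within half a sample of the center, which corresponds to the range -8 to 8 in 1/16-pel units.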

The following describes bilinear interpolation and sample padding. In VVC, the resolution of MVs is 1/16 luma sample. Samples at fractional positions are interpolated using an 8-tap interpolation filter. In DMVR, the search points surround the initial fractional-pel MV with integer sample offsets, so the samples at those fractional positions need to be interpolated for the DMVR search process. To reduce computational complexity, a bilinear interpolation filter is used to generate the fractional samples for the search process in DMVR. In some examples, by using the bilinear filter with a 2-sample search range, DMVR does not access more reference samples than the normal motion compensation process. After the refined MV is obtained with the DMVR search process, the normal 8-tap interpolation filter is applied to generate the final prediction. So as not to access more reference samples than the normal MC process, the samples that are not needed for the interpolation process based on the original MV but are needed for the interpolation process based on the refined MV are padded from the available samples.

The following describes example enabling conditions for DMVR. DMVR is enabled if all of the following conditions are satisfied:
a. CU-level merge mode with bi-prediction MVs.
b. One reference picture is in the past and the other reference picture is in the future with respect to the current picture.
c. The distances (e.g., POC differences) from both reference pictures to the current picture are the same.
d. The CU has more than 64 luma samples.
e. Both the CU height and the CU width are greater than or equal to 8 luma samples.
f. The BCW (bi-prediction with CU-level weights) weight index indicates equal weights.
g. WP (weighted prediction) is not enabled for the current block.
h. CIIP (combined inter and intra prediction) mode is not used for the current block.

以下のことは双方向オプティカルフローを説明する。双方向オプティカルフロー(BDOF)は、4×4のサブブロックレベルにおけるCUの中のルーマサンプルの双予測信号を改善するために使用される。その名称が示すように、BDOFモードは、物体の動きが滑らかであることを想定するオプティカルフロー概念に基づく。4×4の各サブブロックに対して、L0予測サンプルとL1予測サンプルとの間の差分を最小化することによって動き改善(v_x,v_y)が計算される。動き改善は、次いで、4×4のサブブロックの中で双予測されるサンプル値を調整するために使用される。 The following describes bidirectional optical flow. Bidirectional optical flow (BDOF) is used to improve the bi-predictive signal of the luma samples in the CU at the 4×4 sub-block level. As its name suggests, BDOF mode is based on the optical flow concept, which assumes smooth object motion. For each 4×4 sub-block, the motion improvement (v _x ,v _y ) is computed by minimizing the difference between the L0 and L1 predicted samples. Motion enhancement is then used to adjust the bi-predicted sample values within the 4x4 sub-blocks.

For example, for BDOF, video encoder 200 and video decoder 300 may determine that BDOF is enabled for a block and, when BDOF is enabled for the block, divide the block into a plurality of sub-blocks. In some examples, video encoder 200 and video decoder 300 may determine a first reference block from a first motion vector for the block and a second reference block from a second motion vector for the block. Video encoder 200 and video decoder 300 may blend (e.g., by weighted averaging) samples in the first reference block and samples in the second reference block to generate a prediction block. Video encoder 200 and video decoder 300 may determine a motion refinement and adjust the samples in the prediction block to generate the prediction samples used to encode or decode the samples of a sub-block. In some examples, video encoder 200 and video decoder 300 may determine a motion refinement that is the same for each sample in the sub-block (i.e., sub-block-level motion refinement, referred to as sub-block BDOF). In some examples, video encoder 200 and video decoder 300 may determine a motion refinement for each sample in the sub-block (i.e., sample-level motion refinement, referred to as pixel-wise BDOF).

サブブロックBDOFに適用可能であり得るBDOFプロセスの中で、以下のステップが適用される。ピクセル単位BDOFのためのステップが、以下でさらにより詳細に説明される。 In the BDOF process that may be applicable to sub-block BDOF, the following steps are applied. The steps for pixel-wise BDOF are explained in further detail below.

First, the horizontal and vertical gradients of the two prediction signals are calculated by directly computing the difference between two neighboring samples. [Gradient equation image not reproduced.]

In the above, I^(k)(i,j) is the sample value at coordinate (i,j) of the prediction signal in list k (k = 0, 1), and shift1 is calculated based on the luma bit depth, bitDepth, and is set equal to 6. That is, I⁽⁰⁾ refers to samples of the first reference block and I⁽¹⁾ refers to samples of the second reference block, where the first and second reference blocks were used to generate the prediction block whose samples are being adjusted according to the BDOF technique.
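
A gradient computation consistent with this description might look like the sketch below. It assumes that each neighboring sample is right-shifted by shift1 before the difference is taken, and that the two neighbors are the samples on either side of the current position. Both points are assumptions, since the gradient equation itself is not reproduced in this text.

```python
def bdof_gradients(pred, shift1=6):
    """Horizontal and vertical gradients of one prediction signal I^(k).

    pred: 2D list of samples extended by one row/column on every side.
    Returns (grad_x, grad_y) for the interior positions only.
    """
    h, w = len(pred), len(pred[0])
    # Assumption: shift each neighbor first, then take the difference.
    grad_x = [[(pred[j][i + 1] >> shift1) - (pred[j][i - 1] >> shift1)
               for i in range(1, w - 1)] for j in range(1, h - 1)]
    grad_y = [[(pred[j + 1][i] >> shift1) - (pred[j - 1][i] >> shift1)
               for i in range(1, w - 1)] for j in range(1, h - 1)]
    return grad_x, grad_y
```

For a 4×4 sub-block, `pred` would be a 6×6 array, and the function would be called once for each of the two prediction signals (k = 0 and k = 1).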

Then, the auto-correlations and cross-correlations of the gradients, S₁, S₂, S₃, S₅, and S₆, are calculated as
S₁ = Σ_{(i,j)∈Ω} |ψ_x(i,j)|,  S₃ = Σ_{(i,j)∈Ω} θ(i,j)·(-sign(ψ_x(i,j)))
S₂ = Σ_{(i,j)∈Ω} ψ_x(i,j)·sign(ψ_y(i,j))
S₅ = Σ_{(i,j)∈Ω} |ψ_y(i,j)|,  S₆ = Σ_{(i,j)∈Ω} θ(i,j)·(-sign(ψ_y(i,j)))   (1-6-2)
where ψ_x, ψ_y, and θ are given by [equation image not reproduced], Ω is a 6×6 window around the 4×4 sub-block, the value of shift2 is set equal to 4, and the value of shift3 is set equal to 1.

The motion refinement (v_x, v_y) is then derived from the cross-correlation and auto-correlation terms [equation image not reproduced], where th'_BIO = 1 << 4 and ⌊·⌋ is the floor function. In this example, the motion refinement is for the sub-block. The pixel-wise motion refinement calculation is described in more detail below.

Based on the motion refinement and the gradients, the following adjustment b(x,y) is calculated for each sample in the 4×4 sub-block. [Adjustment equation image not reproduced.]

Finally, the BDOF samples of the CU are calculated by adjusting the bi-prediction samples as follows:
pred_BDOF(x,y) = (I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b(x,y) + o_offset) >> shift5   (1-6-6)
where shift5 is set equal to Max(3, 15 - BitDepth) and the variable o_offset is set equal to (1 << (shift5 - 1)).

In the above example, I⁽⁰⁾ refers to the first reference block, I⁽¹⁾ refers to the second reference block, and b(x,y) is the adjustment value determined based on the motion refinement (v_x, v_y) for the sub-block. In some examples, I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) may be regarded as the prediction block, and b(x,y) may therefore be regarded as adjusting the prediction block. As shown in equation (1-6-6), generating the prediction sample pred_BDOF(x,y) may also involve adding o_offset and right-shifting by shift5.
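
Equation (1-6-6) translates directly into integer arithmetic. In the sketch below the function name `bdof_sample` is illustrative, and the adjustment value `b` is taken as an input because its derivation is not reproduced in this text.

```python
def bdof_sample(i0, i1, b, bit_depth):
    """Final BDOF prediction sample per equation (1-6-6).

    i0, i1: co-located samples of the two prediction signals I(0) and I(1).
    b: adjustment value b(x, y) for this sample (computed elsewhere).
    """
    shift5 = max(3, 15 - bit_depth)
    o_offset = 1 << (shift5 - 1)  # rounding offset for the right shift
    return (i0 + i1 + b + o_offset) >> shift5
```

For a bit depth of 10, shift5 is 5 and o_offset is 16, so the sum of the two prediction samples and the adjustment is rounded and divided by 32.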

The above describes an example of sub-block BDOF, in which video encoder 200 and video decoder 300 determine a motion refinement (v_x, v_y) that is the same for all samples in the sub-block. Because of the gradients, the adjustment value b(x,y) may differ for each sample in the sub-block, but the motion refinement may be the same.

As described in more detail below, in pixel-wise BDOF, video encoder 200 and video decoder 300 may determine a pixel-wise motion refinement (v_x', v_y'). That is, rather than there being one motion refinement for a sub-block as in sub-block BDOF, in pixel-wise BDOF there may be a different motion refinement for each sample (e.g., pixel). Video encoder 200 and video decoder 300 may determine an adjustment value b'(x,y) for each sample based on the corresponding pixel-wise motion refinement for that sample, rather than using a motion refinement that is the same for the whole sub-block.

いくつかの例では、式1-6-6からの値は、BDOFプロセスにおける乗数が15ビットを超えずBDOFプロセスにおける中間パラメータの最大ビット幅が32ビット内に保たれるように選択される。 In some examples, the values from Equation 1-6-6 are selected such that the multiplier in the BDOF process does not exceed 15 bits and the maximum bit width of intermediate parameters in the BDOF process is kept within 32 bits.

To derive the gradient values, video encoder 200 and video decoder 300 generate some prediction samples I^(k)(i,j) in list k (k = 0, 1) outside the current CU boundaries. As shown in FIG. 12, BDOF uses one extended row/column around the boundaries of CU 1200. To control the computational complexity of generating the out-of-boundary prediction samples, video encoder 200 and video decoder 300 may generate the prediction samples in the extended area (white positions) by taking the reference samples at the nearby integer positions directly (using a floor() operation on the coordinates) without interpolation, while the normal 8-tap motion compensation interpolation filter is used to generate the prediction samples within the CU (gray positions). These extended sample values may be used in the gradient calculation only. For the remaining steps in the BDOF process, if any sample and gradient values outside the CU boundaries are needed, they are padded (i.e., repeated) from their nearest neighbors.

BDOF is used to refine the bi-prediction signal of a CU (e.g., the sum of the first reference block and the second reference block) at the 4×4 sub-block level. BDOF is applied to a CU if all of the following conditions are satisfied:
a. The CU is coded using "true" bi-prediction mode, i.e., one of the two reference pictures precedes the current picture in display order and the other follows the current picture in display order.
b. The CU is not coded using affine mode or ATMVP merge mode.
c. The CU has more than 64 luma samples.
d. Both the CU height and the CU width are greater than or equal to 8 luma samples.
e. The BCW weight index indicates equal weights.
f. WP is not enabled for the current CU.
g. CIIP mode is not used for the current CU.

BDOFを伴ういくつかの問題があり得る。上記で説明したように、現在のバージョンのVVCでは、BDOF方法は、4×4サブブロックレベルにおけるコーディングブロックの中のルーマサンプルの双予測信号を改善するために使用される。動き改善(v_x,v_y)は、6×6のルーマサンプル領域の中でL0予測サンプルとL1予測サンプルとの間の差分を最小化することによって導出される。L0予測サンプルは、第1の参照ブロックのサンプルを指し、L1予測サンプルは、第2の参照ブロックのサンプルを指す。動き改善(v_x,v_y)が、次いで、4×4のサブブロックの各予測サンプルを調整するために使用される。 There can be several problems with BDOF. As explained above, in the current version of VVC, the BDOF method is used to improve the bi-predictive signal of the luma samples in the coding block at the 4×4 sub-block level. The motion improvement (v _x ,v _y ) is derived by minimizing the difference between the L0 and L1 predicted samples within the 6×6 luma sample region. L0 predicted samples refer to samples of the first reference block, and L1 predicted samples refer to samples of the second reference block. Motion enhancement (v _x ,v _y ) is then used to adjust each predicted sample of the 4x4 sub-block.

しかしながら、4×4のサブブロックの中のルーマサンプルは、4×4のサブブロックの中の他のルーマサンプルと比較して異なる動き改善特性を有することがある。ピクセルレベルにおいて動き改善(v'_x,v'_y)を計算することは、各ピクセルに対する動き改善の確度を改善することができ、したがって、サブブロックまたはブロック予測品質を改善することができる。 However, luma samples within a 4x4 sub-block may have different motion improvement characteristics compared to other luma samples within a 4x4 sub-block. Computing the motion improvement (v' _x ,v' _y ) at the pixel level can improve the accuracy of the motion improvement for each pixel and thus the sub-block or block prediction quality.

しかしながら、BDOFはデコーダ側プロセスであり、BDOFの複雑度はまた、ビデオコーディング方法を設計するときに検討されるべき重要な側面である。ピクセルレベルにおいて動き改善(v'_x,v'_y)が計算されるとき、BDOFの複雑度は、4×4サブブロックレベルにおける現在のBDOFと比較して16倍であり得る。言い換えれば、現在の4×4サブブロックBDOFは、最良の予測品質を達成しない。ピクセル単位BDOFはより良好な予測品質を有するが、ビデオコーディングにとって複雑度が問題である。 However, BDOF is a decoder-side process, and the complexity of BDOF is also an important aspect to be considered when designing a video coding method. When the motion improvement (v' _x ,v' _y ) is computed at the pixel level, the complexity of the BDOF may be 16 times higher than the current BDOF at the 4x4 sub-block level. In other words, the current 4x4 sub-block BDOF does not achieve the best prediction quality. Although pixel-wise BDOF has better prediction quality, complexity is an issue for video coding.

In VVC Draft 10, when BDOF is preceded by decoder-side motion vector refinement (DMVR), the BDOF process may be bypassed based on the minimum SAD found in the DMVR search process. The DMVR process operates at the 16×16 sub-block level. This BDOF bypass scheme can reduce complexity.

However, the prediction signal of a subarea within a 16×16 sub-block may still need to be refined by BDOF. The BDOF bypass of the VVC Draft 10 scheme cannot apply BDOF to some subareas within a 16×16 sub-block while bypassing BDOF for other subareas. In addition, VVC Draft 10 has no BDOF bypass scheme for the case where BDOF is applied to a bi-predicted coding block that is not DMVR-predicted.

The following describes example techniques that may address the above problems. However, the techniques should not be considered limited to, or required for, addressing the above problems. The following techniques may be used separately or in any combination. For simplicity, the techniques are described as various aspects, but such aspects should not be considered as required to be separate, and the various aspects may be combined. Unless otherwise specified, the example aspects may be performed by video encoder 200 and/or video decoder 300.

The first aspect relates to sub-block BDOF bypass. In this first aspect, when it is determined that bi-directional optical flow (BDOF) is to be applied to a W×H coding block, video encoder 200 and/or video decoder 300 may bypass the BDOF process for a subarea of the coding block. The BDOF process for the first aspect may be as follows.
a. The BDOF process starts with an input block (denoted S1) with dimensions W_1×H_1, where the dimensions of S1 are equal to or smaller than those of the coding block. When the preceding process is block-based, the dimensions of S1 are equal to those of the coding block. When the preceding process is sub-block-based (due to hardware constraints or sub-block partitioning from an earlier processing stage), the dimensions of S1 are smaller than those of the coding block.
b. The input block S1 is divided into N sub-blocks (denoted S2) with dimensions W_2×H_2, where the dimensions of S2 are equal to or smaller than those of S1. For each S2, whether BDOF is to be applied to S2 is determined by a condition T. In some examples, condition T checks whether the SAD between the two prediction signals in reference picture 0 and reference picture 1 is smaller than a threshold. The sub-block in this step defines the basic unit for deciding whether to apply BDOF to all samples within the unit.
c. When it is determined that BDOF is to be applied to S2, S2 is divided into M sub-blocks (denoted S3) with dimensions W_3×H_3, where the dimensions of S3 are equal to or smaller than those of S2. For each S3, the BDOF process is applied to derive a refined motion vector (v'_x, v'_y), and the prediction signal of S3 is derived using the derived motion vector (either through motion compensation or by adding an offset to the initial prediction signal). The sub-block in this step defines the unit of granularity for the refined motion vector; all samples within the unit share the same refined motion.

態様1のBDOFプロセスでは、ブロックS1、S2、およびS3が規定される。S3の寸法はS2に等しいかまたはそれよりも小さくてよく、S2の寸法はS1に等しいかまたはそれよりも小さくてよい。言い換えれば、W_3はW_2に等しいかまたはそれよりも小さく、H_3はH_2に等しいかまたはそれよりも小さく、W_2はW_1に等しいかまたはそれよりも小さく、H_2はH_1に等しいかまたはそれよりも小さい。サイズは固定されてよく、ピクチャ解像度に適合されてよく、またはビットストリームの中でシグナリングされてもよい。 In the BDOF process of aspect 1, blocks S1, S2, and S3 are defined. The dimensions of S3 may be equal to or less than S2, and the dimensions of S2 may be equal to or less than S1. In other words, W_3 is less than or equal to W_2, H_3 is less than or equal to H_2, W_2 is less than or equal to W_1, and H_2 is less than or equal to H_1. . The size may be fixed, adapted to the picture resolution, or signaled within the bitstream.
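
The S1/S2/S3 hierarchy can be illustrated with a small partitioning sketch. The function names are hypothetical, and truncating sub-blocks at the right and bottom edges when the sizes do not divide evenly is an added assumption that the text does not address.

```python
def split(x0, y0, w, h, sub_w, sub_h):
    """Split the block at (x0, y0) of size w x h into (x, y, w, h) sub-blocks."""
    return [
        (x, y, min(sub_w, x0 + w - x), min(sub_h, y0 + h - y))
        for y in range(y0, y0 + h, sub_h)
        for x in range(x0, x0 + w, sub_w)
    ]


def partition(w1, h1, w2, h2, w3, h3):
    """Map each S2 sub-block of an S1 block to its list of S3 sub-blocks."""
    if not (w3 <= w2 <= w1 and h3 <= h2 <= h1):
        raise ValueError("require W_3 <= W_2 <= W_1 and H_3 <= H_2 <= H_1")
    return {s2: split(*s2, w3, h3) for s2 in split(0, 0, w1, h1, w2, h2)}
```

For instance, `partition(16, 16, 8, 8, 1, 1)` yields four S2 units for the bypass decision, each containing 64 single-sample S3 units, which is the pixel-wise case where W_3 and H_3 both equal 1.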

1つの事例は、W_3が1に等しくH_3が1に等しいことであり、ここで、S3はピクセルベースである。この事例はピクセル単位BDOFプロセスであってよい。 One case is that W_3 equals 1 and H_3 equals 1, where S3 is pixel-based. This case may be a pixel-by-pixel BDOF process.

いくつかの例では、先に行われたサブブロックベースのプロセスがコーディングブロックに適用されるか否かにかかわらず、S1はコーディングブロックである。 In some examples, S1 is a coding block regardless of whether a previously performed subblock-based process is applied to the coding block.

第2の態様は、サブブロックBDOFバイパス方式を用いたピクセル単位BDOFに関する。第1の態様におけるように、W×Hのコーディングブロック(S1)が双方向オプティカルフロー(BDOF)を適用すべきと決定されると、コーディングブロックはN個のサブブロック(S2)に分割される。各サブブロックに対して、サブブロックにBDOFを適用すべきか否かが、参照ピクチャ0および参照ピクチャ1の中の2つの予測信号の間のSADがしきい値よりも小さいか否かをチェックすることによって、さらに決定される。サブブロックにBDOFを適用すべきと決定される場合、改善された動きベクトル(v'_x,v'_y)がサブブロック(S2)内の各ピクセル(S3)に対して計算される。改善された動きベクトル(v'_x,v'_y)は、サブブロック(S2)内のそのピクセル(S3)に対する予測信号を調整するために使用される。サブブロックバイパスプロセスを伴うピクセル単位BDOFの一例が図13に示される。 The second aspect relates to pixel-by-pixel BDOF using a sub-block BDOF bypass scheme. As in the first aspect, when it is determined that a W×H coding block (S1) should apply bidirectional optical flow (BDOF), the coding block is divided into N sub-blocks (S2). . For each subblock, check whether the SAD between the two predicted signals in reference picture 0 and reference picture 1 is smaller than a threshold whether to apply BDOF to the subblock. This is further determined by: If it is decided that BDOF should be applied to a sub-block, an improved motion vector (v' _x , v' _y ) is calculated for each pixel (S3) in the sub-block (S2). The improved motion vector (v' _x ,v' _y ) is used to adjust the prediction signal for that pixel (S3) within the sub-block (S2). An example of pixel-wise BDOF with sub-block bypass process is shown in FIG. 13.

たとえば、図13において、ビデオエンコーダ200およびビデオデコーダ300は、ビデオデータのブロックに対してBDOFが有効化されることを決定してよく、ビデオエンコーダ200およびビデオデコーダ300は、ブロックに対してBDOFが有効化されるという決定に基づいてブロックを複数のサブブロックに分割してよい。図13に示すように、サブブロックの個数およびN個のサブブロックインデックス<i = 0>を導出することは(1300)、ビデオエンコーダ200およびビデオデコーダ300がブロックをN個のサブブロックに分割し、ここで、各サブブロックがそれぞれのインデックスによって識別され、第1のインデックスが0であることを指す。したがって、インデックスは0からN-1までに及ぶ。 For example, in FIG. 13, video encoder 200 and video decoder 300 may determine that BDOF is enabled for a block of video data, and video encoder 200 and video decoder 300 may determine that BDOF is enabled for the block. A block may be divided into multiple sub-blocks based on the determination to be enabled. As shown in FIG. 13, deriving the number of subblocks and N subblock indices <i = 0> (1300) involves video encoder 200 and video decoder 300 dividing the block into N subblocks. , where each sub-block is identified by a respective index, with the first index being 0. Therefore, the index ranges from 0 to N-1.

ビデオエンコーダ200およびビデオデコーダ300は、i < Nによって表されるように、ブロックの中のすべてのサブブロックのための予測サンプルが決定されているかどうかを決定してよい(1302)。すべてのサブブロックのための予測サンプルが決定されている場合(1302のNO)、ビデオエンコーダ200およびビデオデコーダ300は、サブブロックのための予測サンプルを決定するプロセスを終えてよい。しかしながら、すべてのサブブロックのための予測サンプルが決定されていない場合(1302のYES)、ビデオエンコーダ200およびビデオデコーダ300は、ブロックが分割された先の複数のサブブロックのうちの現在のサブブロックの予測サンプルを決定するプロセスを継続してよい。 Video encoder 200 and video decoder 300 may determine whether prediction samples for all subblocks in the block have been determined (1302), as represented by i < N. If predictive samples for all sub-blocks have been determined (NO at 1302), video encoder 200 and video decoder 300 may finish the process of determining predictive samples for the sub-blocks. However, if prediction samples for all sub-blocks have not been determined (YES at 1302), video encoder 200 and video decoder 300 determine whether the current sub-block of the multiple sub-blocks into which the block has been partitioned The process of determining predictive samples of may continue.

For the current sub-block, video encoder 200 and video decoder 300 may determine a distortion value (1304). Because the distortion value may be determined sub-block by sub-block, video encoder 200 and video decoder 300 may be regarded as determining a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks (e.g., a first distortion value for a first sub-block, a second distortion value for a second sub-block, and so on).

One example way to determine the distortion value for the current sub-block is to determine the sum of absolute differences (SAD) between the first reference block (ref0) and the second reference block (ref1). However, there may be other ways to determine the distortion value. For example, as described in more detail below, in some examples video encoder 200 and video decoder 300 may determine the distortion value in such a way that the resulting values can be reused later, such as when video encoder 200 and video decoder 300 go on to perform BDOF.

As shown in FIG. 13, video encoder 200 and video decoder 300 may compare the distortion value to a threshold (1306). Based on the comparison, video encoder 200 and video decoder 300 may have two options. The first option may be to perform pixel-wise BDOF, and the second option may be to bypass BDOF. Other options, such as sub-block BDOF, may not be available to video encoder 200 and video decoder 300. Accordingly, video encoder 200 and video decoder 300 may be regarded as determining, for each sub-block of one or more sub-blocks of the plurality of sub-blocks, one of performing pixel-wise BDOF or bypassing BDOF based on the respective distortion value (e.g., based on a comparison of the respective distortion value with a fixed threshold or with a respective threshold).

For example, if the distortion value for the current sub-block is greater than the threshold (NO of 1306), video encoder 200 and video decoder 300 may perform per-pixel BDOF (1308). If the distortion value for the current sub-block is less than the threshold (YES of 1306), video encoder 200 and video decoder 300 may derive the prediction signal in the sub-block without BDOF (e.g., by bypassing BDOF for the sub-block) (1310).

In one or more examples, video encoder 200 and video decoder 300 may determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed. For example, if video encoder 200 and video decoder 300 are to perform BDOF for the current sub-block, video encoder 200 and video decoder 300 may determine the prediction samples using per-pixel BDOF techniques, whereas if they are to bypass BDOF for the current sub-block, they may determine the prediction samples without using BDOF techniques.

The example of FIG. 13 above described how the determination is made of whether per-pixel BDOF is performed or BDOF is bypassed for the current sub-block. Video encoder 200 and video decoder 300 may perform the above example techniques on a sub-block-by-sub-block basis.
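The per-sub-block decision of FIG. 13 can be sketched in a few lines of Python. This is a minimal illustration rather than the codec's implementation: the function name `decide_bdof_modes` and the string labels are hypothetical, and the case where the distortion equals the threshold is treated as "apply BDOF", following the threshold rule given later in this section for sbSAD and sbDistTh.

```python
from typing import List


def decide_bdof_modes(distortions: List[int], threshold: int) -> List[str]:
    """Return one decision per sub-block: 'bypass' or 'pixel_bdof'.

    distortions: one distortion value (e.g., an SAD) per sub-block.
    threshold:   the value each distortion is compared against (step 1306).
    """
    modes = []
    for d in distortions:
        if d < threshold:
            # YES branch of 1306: the two predictions already agree closely,
            # so BDOF is bypassed for this sub-block (1310).
            modes.append("bypass")
        else:
            # NO branch of 1306: per-pixel BDOF is performed (1308).
            modes.append("pixel_bdof")
    return modes
```

Only two outcomes exist per sub-block in this sketch, mirroring the text: there is no third "sub-block BDOF" option.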

For example, to determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks, video encoder 200 and video decoder 300 may determine, for a first sub-block of the one or more sub-blocks, a first distortion value of the respective distortion values, and may determine, for a second sub-block of the one or more sub-blocks, a second distortion value of the respective distortion values.

To determine, for each sub-block of one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values, video encoder 200 and video decoder 300 may determine, for a first sub-block of the plurality of sub-blocks, that BDOF is enabled for the first sub-block based on the first distortion value (e.g., based on the first distortion value being greater than the threshold). In this example, based on the determination that BDOF is enabled for the first sub-block, video encoder 200 and video decoder 300 may determine per-pixel motion refinements for refining a first set of prediction samples for the first sub-block (e.g., may perform per-pixel BDOF). For example, video encoder 200 and video decoder 300 may derive, for a first sample of the first sub-block, a first motion refinement for refining a first prediction sample, may derive, for a second sample of the first sub-block, a second motion refinement for refining a second prediction sample, and so on.

However, for a second sub-block of the plurality of sub-blocks, video encoder 200 and video decoder 300 may determine that BDOF is bypassed based on the second distortion value (e.g., based on the second distortion value being less than the threshold). In this example, based on the determination that BDOF is bypassed for the second sub-block, video encoder 200 and video decoder 300 may bypass determining per-pixel motion refinements for refining a second set of prediction samples for the second sub-block (e.g., may bypass BDOF). For example, video encoder 200 and video decoder 300 may bypass deriving, for a first sample of the second sub-block, a first motion refinement for refining a first prediction sample, may bypass deriving, for a second sample of the second sub-block, a second motion refinement for refining a second prediction sample, and so on.

To determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed, video encoder 200 and video decoder 300 may determine, for the first sub-block, a refined first set of prediction samples of the first sub-block based on the per-pixel motion refinements for the first sub-block. For the second sub-block, video encoder 200 and video decoder 300 may determine the second set of prediction samples without refining the second set of prediction samples based on per-pixel motion refinements.

Within the second aspect, the following describes bypassing sub-block BDOF. Given a W×H coding block for which it is determined that bidirectional optical flow (BDOF) is to be applied, the number of sub-blocks N is determined as follows.
a. numSbX = (W > thW) ? (W / thW) : 1
b. numSbY = (H > thH) ? (H / thH) : 1
c. N = numSbX * numSbY

In the above, thW represents the maximum sub-block width and thH represents the maximum sub-block height. The values of thW and thH are predetermined integer values (e.g., thW = thH = 8).
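As a worked illustration of the sub-block count, the following Python sketch evaluates numSbX, numSbY, and N directly from the expressions above. The function name `num_subblocks` is hypothetical, and the division is assumed to be integer division (block dimensions that are whole multiples of thW and thH, which the text does not state explicitly).

```python
def num_subblocks(W: int, H: int, thW: int = 8, thH: int = 8):
    """Return (numSbX, numSbY, N) for a W x H coding block.

    thW / thH are the maximum sub-block width / height (8 in the
    example value given in the text).
    """
    # Blocks no wider (taller) than the maximum are not split in that direction.
    num_sb_x = W // thW if W > thW else 1
    num_sb_y = H // thH if H > thH else 1
    return num_sb_x, num_sb_y, num_sb_x * num_sb_y


# A 16x32 block with 8x8 maximum sub-blocks splits into 2 columns and 4 rows.
print(num_subblocks(16, 32))  # (2, 4, 8)
```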

For each sub-block, video encoder 200 and/or video decoder 300 may derive prediction signals predSig0 and predSig1 from reference picture 0 and reference picture 1, respectively. The width (sbWidth) and height (sbHeight) of predSig0 and predSig1 are determined as follows.
a. sbWidth = (W > thW) ? thW : W
b. sbHeight = (H > thH) ? thH : H
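The sub-block dimensions can be sketched the same way. The helper below (a hypothetical name, `subblock_layout`) also lists the top-left corner of each sub-block; the raster-scan ordering of those corners is an assumption made purely for illustration and is not taken from the text.

```python
def subblock_layout(W: int, H: int, thW: int = 8, thH: int = 8):
    """Return (sbWidth, sbHeight, origins) for a W x H coding block."""
    # Sub-blocks are capped at the maximum size; smaller blocks stay whole.
    sb_width = thW if W > thW else W
    sb_height = thH if H > thH else H
    # Top-left corners of the sub-blocks, row by row (assumed ordering).
    origins = [(x, y)
               for y in range(0, H, sb_height)
               for x in range(0, W, sb_width)]
    return sb_width, sb_height, origins
```

For a 16×8 block this yields two 8×8 sub-blocks at (0, 0) and (8, 0); a 4×16 block yields two 4×8 sub-blocks stacked vertically.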

Whether to bypass BDOF in a sub-block is determined by checking the SAD between predSig0 and predSig1. The SAD is derived as follows.

sbSAD = Σ_{(i,j)∈Ω''} |I⁽¹⁾(i,j) - I⁽⁰⁾(i,j)| (3-1-1-1)

In the above equation, Ω'' is the sbWidth×sbHeight sub-block, and I^(k)(i,j) is the sample value at coordinate (i,j) of the prediction signal in reference picture k (k = 0, 1).
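A sum of absolute differences over the sub-block is straightforward to compute. The sketch below assumes the two prediction signals are given as equally sized lists of rows of integer samples; the name `sb_sad` is hypothetical.

```python
def sb_sad(pred_sig0, pred_sig1):
    """Sum of absolute differences between two equally sized sample arrays.

    pred_sig0 / pred_sig1: lists of rows, each row a list of integer samples
    covering the sbWidth x sbHeight sub-block.
    """
    return sum(abs(a - b)
               for row0, row1 in zip(pred_sig0, pred_sig1)
               for a, b in zip(row0, row1))


# |1-2| + |2-2| + |3-1| + |4-7| = 1 + 0 + 2 + 3
print(sb_sad([[1, 2], [3, 4]], [[2, 2], [1, 7]]))  # 6
```

Because each term is an absolute value, the result does not depend on which prediction signal is subtracted from which.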

If sbSAD is less than the threshold sbDistTh, video encoder 200 and/or video decoder 300 may determine that BDOF is to be bypassed in the sub-block; otherwise (if sbSAD is equal to or greater than sbDistTh), video encoder 200 and/or video decoder 300 may determine that BDOF is to be applied to the sub-block. The threshold sbDistTh is derived as follows.
sbDistTh = (sbWidth・sbHeight・s) << n (3-1-1-2)

In the above equation, n and s are predetermined values. For example, n may be derived as n = InternalBitDepth - bitDepth + 1. In the above equation, s represents a scale factor, e.g., s = 1. In the current version of VVC, InternalBitDepth is equal to 14 when bitDepth is 10, and therefore n is equal to 5. The scale s may be 1, 2, 3, or another predefined value, or may be signaled in the bitstream.
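The threshold arithmetic can be checked with a short sketch. The function name `sb_dist_threshold` is hypothetical; the defaults use the example values from the text (InternalBitDepth = 14, bitDepth = 10, s = 1).

```python
def sb_dist_threshold(sb_width: int, sb_height: int,
                      internal_bit_depth: int = 14,
                      bit_depth: int = 10,
                      s: int = 1) -> int:
    """sbDistTh = (sbWidth * sbHeight * s) << n, per equation (3-1-1-2)."""
    # n = InternalBitDepth - bitDepth + 1, which is 5 for the example values.
    n = internal_bit_depth - bit_depth + 1
    return (sb_width * sb_height * s) << n


# 8x8 sub-block: 64 samples, shifted left by 5 -> 64 * 32
print(sb_dist_threshold(8, 8))  # 2048
```

With these values, an 8×8 sub-block is bypassed when its SAD is below 2048, that is, when the average absolute difference per sample is below 32 at the internal precision.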

It should be understood that the above describes one example way of determining the threshold and one example way of determining the distortion value. However, the example techniques are not so limited. As described in more detail below, in some examples, video encoder 200 and video decoder 300 may determine the distortion value in such a way that, if the determination is made that per-pixel BDOF is to be performed, the calculations used to determine the distortion value can be reused to perform per-pixel BDOF.

Within the second aspect, the following describes per-pixel BDOF. If video encoder 200 and/or video decoder 300 determines that BDOF is to be applied to an sbWidth×sbHeight sub-block, the sub-block is extended to a (sbWidth + 4)×(sbHeight + 4) region. For each pixel within the sub-block, video encoder 200 and/or video decoder 300 may derive a motion refinement (v'_x, v'_y), also called a refined motion vector, based on the gradients of the 5×5 surrounding region. FIG. 14 shows an example of per-pixel BDOF for an 8×8 sub-block. Accordingly, in per-pixel BDOF, video encoder 200 and video decoder 300 may determine per-pixel motion refinements. In sub-block BDOF, the motion refinement is for the sub-block and is not determined per sample (e.g., per pixel).
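The relationship between the 5×5 window and the 4-sample extension can be made concrete with a small sketch. It assumes the window is centred on the pixel, so that a 2-sample border on every side suffices; the names `extended_size` and `window_coords` are hypothetical.

```python
def extended_size(sb_width: int, sb_height: int, pad: int = 2):
    """Size of the extended region: (sbWidth + 4) x (sbHeight + 4) for pad = 2."""
    return sb_width + 2 * pad, sb_height + 2 * pad


def window_coords(x: int, y: int, pad: int = 2):
    """Extended-region coordinates of the (2*pad+1)^2 window around pixel (x, y).

    (x, y) is given in sub-block coordinates; the pixel sits at
    (x + pad, y + pad) inside the extended region (assumed centred layout).
    """
    cx, cy = x + pad, y + pad
    return [(i, j)
            for j in range(cy - pad, cy + pad + 1)
            for i in range(cx - pad, cx + pad + 1)]
```

For an 8×8 sub-block the extended region is 12×12; the window of the corner pixel (0, 0) spans coordinates 0 to 4 and that of (7, 7) spans 7 to 11, so every window stays inside the extended region.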

Given an sbWidth×sbHeight sub-block as described above, the following steps are applied in the per-pixel BDOF process.
- The horizontal and vertical gradients of the two prediction signals, ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j), k = 0, 1, are calculated by directly computing the difference between two neighboring samples, as in the bidirectional optical flow described above, where (i,j) is the coordinate position within the (sbWidth + 4)×(sbHeight + 4) region of the prediction signals in reference picture 0 and reference picture 1.
- For each pixel within the sub-block, the following steps are applied.
・The auto-correlation and cross-correlation of the gradients, S₁, S₂, S₃, S₅, and S₆, are calculated as in the bidirectional optical flow described above, where Ω' is the 5×5 window around the pixel.
・The motion refinement (v'_x, v'_y) is then derived using the cross-correlation and auto-correlation terms.
・Based on the motion refinement and the gradients, the adjustment b'(x,y) is calculated and the prediction signal of the pixel is derived as follows.
pred_BDOF(x,y) = (I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b'(x,y) + o_offset) >> shift5 (3-1-2-1)

In the above example, I⁽⁰⁾ refers to the first reference block and I⁽¹⁾ refers to the second reference block. The adjustment value b'(x,y) is determined based on the per-pixel motion refinement (v'_x, v'_y) for each sample in the sub-block. In some examples, I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) may be considered the prediction block, and b'(x,y) may therefore be considered to adjust the prediction block. As shown in equation (3-1-2-1), generating the prediction sample pred_BDOF(x,y) may involve adding o_offset and right-shifting by shift5.
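The final combination step, pred_BDOF(x,y) = (I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b'(x,y) + o_offset) >> shift5, can be sketched for a single sample as below. The function name `pred_bdof_sample` is hypothetical, and the numeric values used in the example (shift5 = 5, o_offset = 16) are illustrative assumptions, since this section does not state them.

```python
def pred_bdof_sample(i0: int, i1: int, b_adj: int,
                     o_offset: int, shift5: int) -> int:
    """Combine two reference samples and the per-pixel adjustment b'(x, y).

    i0, i1:   samples I(0)(x, y) and I(1)(x, y) at internal precision.
    b_adj:    adjustment b'(x, y) derived from the motion refinement.
    o_offset: rounding offset added before the shift.
    shift5:   right shift that brings the sum back to output precision.
    """
    return (i0 + i1 + b_adj + o_offset) >> shift5


# Illustrative values only: (8000 + 8200 + 12 + 16) >> 5
print(pred_bdof_sample(8000, 8200, 12, o_offset=16, shift5=5))  # 507
```

Python's `>>` on integers is an arithmetic (floor) shift, so the expression also behaves sensibly when the adjustment is negative.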

A third aspect relates to an alternative sub-block SAD derivation. This example technique for deriving the SAD may be such that the values determined for the SAD derivation can be reused to perform per-pixel BDOF. That is, video encoder 200 and video decoder 300 may first determine a distortion value (e.g., an SAD value) for a sub-block in order to determine whether to perform per-pixel BDOF. If video encoder 200 and video decoder 300 determine that per-pixel BDOF is to be performed, the calculations that video encoder 200 and video decoder 300 performed to make that determination may be reused to perform per-pixel BDOF.

For example, one way to determine a distortion value for a sub-block is to determine a first reference block (e.g., identified by a first motion vector) and a second reference block (e.g., identified by a second motion vector), and to determine difference values between samples of the first reference block and samples of the second reference block. As one example, as described above, one way to determine the distortion value is to determine

sbSAD = Σ_{(i,j)∈Ω''} |I⁽¹⁾(i,j) - I⁽⁰⁾(i,j)|.

In the above equation, I⁽¹⁾(i,j) refers to samples of the first reference block, and I⁽⁰⁾(i,j) refers to samples of the second reference block. As further described above, to determine motion refinements, including per-pixel motion refinements (e.g., v'_x, v'_y), video encoder 200 and video decoder 300 may determine the auto-correlation and cross-correlation of the gradients, S₁, S₂, S₃, S₅, and S₆. As described in equation 1-6-3, part of determining the auto-correlation and cross-correlation of the gradients is determining an intermediate value θ, where θ = (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2).

Accordingly, if per-pixel BDOF is to be performed for a sub-block, video encoder 200 and video decoder 300 may need to determine (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2). In one or more examples, as part of determining the distortion value for the sub-block, video encoder 200 and video decoder 300 may determine the distortion value based on (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) instead of (or in addition to) determining it based on I⁽¹⁾(i,j) - I⁽⁰⁾(i,j). That is, to determine the distortion value for a sub-block, such as to determine whether per-pixel BDOF is to be performed, video encoder 200 and video decoder 300 may use (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) in computing sbSAD. In this way, if per-pixel BDOF is to be performed, video encoder 200 and video decoder 300 will already have determined the values of (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2), which are the values of θ used to determine the motion refinement.

Accordingly, in one or more examples, to determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks, video encoder 200 and video decoder 300 may be configured to determine, for each such sub-block, a first reference block and a second reference block. For example, I⁽⁰⁾(i,j) may be the first reference block and I⁽¹⁾(i,j) may be the second reference block.

Video encoder 200 and video decoder 300 may scale the samples of the first reference block and the samples of the second reference block. For example, video encoder 200 and video decoder 300 may perform the operation I⁽⁰⁾(i,j) >> shift2. In this example, the value of shift2 may define by how much the values of I⁽⁰⁾(i,j) are scaled to generate the scaled samples of the first reference block. Similarly, video encoder 200 and video decoder 300 may perform the operation I⁽¹⁾(i,j) >> shift2. In this example, the value of shift2 may define by how much the values of I⁽¹⁾(i,j) are scaled to generate the scaled samples of the second reference block.

Video encoder 200 and video decoder 300 may determine difference values between the scaled samples of the first reference block and the scaled samples of the second reference block in order to determine the respective distortion value. For example, video encoder 200 and video decoder 300 may determine (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2). Video encoder 200 and video decoder 300 may determine the distortion value (e.g., sbSAD) for the sub-block based on the results of (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2).
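The scale-then-subtract computation can be sketched as follows. The function returns both the per-sample differences and their absolute sum, so that the differences remain available for later reuse; the name `shifted_diff_and_sad` is hypothetical, and the default shift2 = 4 is the example value given later in this section.

```python
def shifted_diff_and_sad(ref0, ref1, shift2: int = 4):
    """Return (diff, sad) for two equally sized reference blocks.

    diff[j][i] = (ref0[j][i] >> shift2) - (ref1[j][i] >> shift2)
    sad        = sum of |diff[j][i]| over the block
    """
    diff = [[(a >> shift2) - (b >> shift2) for a, b in zip(row0, row1)]
            for row0, row1 in zip(ref0, ref1)]
    sad = sum(abs(d) for row in diff for d in row)
    return diff, sad


ref0 = [[160, 320], [48, 15]]
ref1 = [[128, 352], [16, 31]]
diff, sad = shifted_diff_and_sad(ref0, ref1)
print(diff, sad)  # [[2, -2], [2, -1]] 7
```

Note that each sample is shifted before the subtraction, matching the expression in the text; shifting the difference instead would generally give different values.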

As described above, in some examples, there may be a computational gain for video encoder 200 and video decoder 300 because the values of (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) can be reused for per-pixel BDOF. For example, assume that video encoder 200 and video decoder 300 determined that per-pixel BDOF is to be performed for a first sub-block of the one or more sub-blocks of the plurality of sub-blocks into which the block being encoded or decoded was divided.

In this example, video encoder 200 and video decoder 300 may determine a respective motion refinement for each sample in the first sub-block. That is, rather than, or in addition to, determining one motion refinement (v_x, v_y) that is the same for all samples in the first sub-block, video encoder 200 and video decoder 300 may determine a motion refinement (v'_x, v'_y) for each sample of the first sub-block.

Video encoder 200 and video decoder 300 may be configured to determine, for each sample in the first sub-block, a respective refined sample value from the samples in the prediction block for the first sub-block based on the respective motion refinement. For example, as described above, the equation for determining the prediction samples for per-pixel BDOF may be pred_BDOF(x,y) = (I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b'(x,y) + o_offset) >> shift5.

To determine pred_BDOF, video encoder 200 and video decoder 300 may determine b'(x,y), which is a per-pixel adjustment value determined from the respective per-pixel motion refinement (i.e., (v'_x, v'_y)). In some examples, the prediction block may be considered the sum of the first reference block and the second reference block (i.e., I⁽⁰⁾(i,j) + I⁽¹⁾(i,j)). As shown in the equation for determining pred_BDOF, video encoder 200 and video decoder 300 may add I⁽⁰⁾(i,j) + I⁽¹⁾(i,j) to b'(x,y). Accordingly, as part of determining pred_BDOF, video encoder 200 and video decoder 300 may determine, for the first sub-block, refined sample values (e.g., pred_BDOF) from the samples in the prediction block (e.g., where the prediction block is equal to I⁽⁰⁾(i,j) + I⁽¹⁾(i,j)) based on the respective motion refinements (e.g., the (v'_x, v'_y) used to determine b'(x,y)).

In other words, video encoder 200 and video decoder 300 may determine a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks (e.g., may determine I⁽⁰⁾(i,j)). Video encoder 200 and video decoder 300 may scale the first set of sample values with a scale factor to generate a first set of scaled sample values. That is, in performing I⁽⁰⁾(i,j) >> shift2, video encoder 200 and video decoder 300 may be considered to scale the first set of samples by a scale factor defined by ">>" and the value of "shift2".

Video encoder 200 and video decoder 300 may determine a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks (e.g., may determine I⁽¹⁾(i,j)). Video encoder 200 and video decoder 300 may scale the second set of sample values with the scale factor to generate a second set of scaled sample values. That is, in performing I⁽¹⁾(i,j) >> shift2, video encoder 200 and video decoder 300 may be considered to scale the second set of samples by a scale factor defined by ">>" and the value of "shift2".

Video encoder 200 and video decoder 300 may determine the distortion value for the first sub-block based on the first set of scaled sample values and the second set of scaled sample values (e.g., based on I⁽⁰⁾(i,j) >> shift2 and I⁽¹⁾(i,j) >> shift2). For example, video encoder 200 and video decoder 300 may determine the distortion value for the first sub-block based on (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2).

In one or more examples, as described above, assume that per-pixel BDOF is performed for the first sub-block. In this example, video encoder 200 and video decoder 300 may reuse the first set of scaled sample values and the second set of scaled sample values to determine the per-pixel motion refinements for per-pixel BDOF. For example, video encoder 200 and video decoder 300 may reuse the calculation (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2), which is needed for determining the auto-correlation and cross-correlation of the gradients, to determine the per-pixel motion refinements (e.g., (v'_x, v'_y)). As described above, video encoder 200 and video decoder 300 may use the per-pixel motion refinements to determine the adjustment value b'(x,y) that is used to determine pred_BDOF (i.e., the prediction samples for encoding or decoding the first sub-block of the block).

The above describes one example in which video encoder 200 and video decoder 300 may reuse the first set of scaled sample values and the second set of scaled sample values to determine per-pixel motion refinements for per-pixel BDOF. However, the techniques are not so limited. In some examples, video encoder 200 and video decoder 300 may reuse the first set of scaled sample values and the second set of scaled sample values to determine motion refinements for BDOF more generally. That is, the example techniques need not be limited to reusing the scaled sample values for per-pixel motion refinement in per-pixel BDOF. There may be a reduction in complexity not only for per-pixel BDOF but also for sub-block-based BDOF, such as in examples where BDOF involves a motion refinement for the entire sub-block rather than per pixel.

Accordingly, as in the second aspect, the following describes an alternative method of deriving the sub-block SAD that is used to determine whether to bypass a sub-block (i.e., whether BDOF is bypassed). As described above, the example method computes the difference diff(i,j) between the two reference signals in the same way that θ(i,j) is computed in the bidirectional optical flow described above with equations 1-6.

If it is determined that BDOF is to be applied to the sub-block, diff(i,j) can be reused in the step of computing the auto-correlation and cross-correlation of the gradients, S₃ and S₆, as in the bidirectional optical flow described above.

Equation (3-1-1-1) in the second aspect is modified as follows.

diff(i,j) = (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2)
sbSAD = Σ_{(i,j)∈Ω''} |diff(i,j)|

In the above equation, I^(k)(i,j) is the sample value at coordinate (i,j) within the (sbWidth + 4)×(sbHeight + 4) region of the prediction signal in reference picture k (k = 0, 1). shift2 is a predetermined value, e.g., shift2 is equal to 4. Ω'' is the sbWidth×sbHeight sub-block region.
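One way to picture this variant: the difference (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) is computed over the whole (sbWidth + 4)×(sbHeight + 4) region, while only the sbWidth×sbHeight sub-block region Ω'' contributes to sbSAD. The sketch below assumes Ω'' is centred in the extended region with a 2-sample border on every side; the function name `sb_sad_from_extended` is hypothetical.

```python
def sb_sad_from_extended(ext0, ext1, sb_width: int, sb_height: int,
                         shift2: int = 4, pad: int = 2):
    """Return (diff, sbSAD) from two extended-region sample arrays.

    ext0 / ext1: (sbHeight + 2*pad) rows of (sbWidth + 2*pad) samples each.
    diff covers the whole extended region so it can be reused later;
    sbSAD sums |diff| only over the inner sub-block region.
    """
    diff = [[(a >> shift2) - (b >> shift2) for a, b in zip(row0, row1)]
            for row0, row1 in zip(ext0, ext1)]
    sad = sum(abs(diff[j][i])
              for j in range(pad, pad + sb_height)
              for i in range(pad, pad + sb_width))
    return diff, sad
```

Keeping `diff` for the full extended region is what makes the later reuse possible, since the 5×5 windows of border pixels reach outside Ω''.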

Note that the alternative technique for determining the distortion value for a sub-block (e.g., for determining sbSAD) based on θ(i,j) = (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) should not be considered limited to examples in which per-pixel BDOF is performed. The alternative technique may be applicable even in examples where sub-block BDOF or some other BDOF technique is applied. For example, even for sub-block BDOF, video encoder 200 and video decoder 300 may use the alternative technique for determining the distortion value to determine whether BDOF is performed for the sub-block. If BDOF is to be performed, video encoder 200 and video decoder 300 may reuse the calculations made by the alternative technique in determining the distortion value to determine the motion refinement as part of sub-block BDOF.

As described above, the threshold to which the distortion value is compared to determine whether per-pixel BDOF is performed or BDOF is bypassed is sbDistTh, calculated as (sbWidth*sbHeight*s) << n as shown in equation 3-1-1-2 above. However, in the alternative technique for determining the distortion value, video encoder 200 and video decoder 300 may scale I⁽⁰⁾(i,j) by >> shift2 and may scale I⁽¹⁾(i,j) by >> shift2, as described above. Accordingly, in some examples, the manner in which video encoder 200 and video decoder 300 determine sbDistTh may be modified to account for the >> shift2 scaling.

In the second aspect, equation (3-1-1-2) for calculating sbDistTh is modified as follows.
sbDistTh = (sbWidth·sbHeight·s) << (n - shift2) (3-2-2)

Accordingly, to determine the threshold, video encoder 200 and video decoder 300 may be configured to multiply a width of a first sub-block of the one or more sub-blocks (i.e., sbWidth in equation 3-2-2), a height of the first sub-block (i.e., sbHeight in equation 3-2-2), and a first scale factor (i.e., "s" in equation 3-2-2) to generate an intermediate value. Video encoder 200 and video decoder 300 may be configured to perform a left-shift operation on the intermediate value based on a second scale factor to generate the threshold. For example, the second scale factor may be (n - shift2) in equation 3-2-2, and the left-shift operation is shown as "<<" in equation 3-2-2.
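As an illustration, the threshold of equation 3-2-2 can be sketched in a few lines of Python. The function name `sb_dist_threshold` and its default arguments are hypothetical; the defaults are taken from the example values given in this description (s = 1, n = 5, shift2 = 4), not from any normative text. The guard against a negative shift is an added assumption.

```python
def sb_dist_threshold(sb_width, sb_height, s=1, n=5, shift2=4):
    """Sketch of equation 3-2-2: (sbWidth * sbHeight * s) << (n - shift2)."""
    shift = n - shift2  # the "second scale factor"
    if shift < 0:
        # Assumption: the description only gives examples with n >= shift2.
        raise ValueError("n - shift2 must be non-negative")
    intermediate = sb_width * sb_height * s  # the "intermediate value"
    return intermediate << shift


print(sb_dist_threshold(8, 8))            # 8x8 sub-block, example parameters
print(sb_dist_threshold(8, 8, shift2=0))  # same block without the >> shift2 scaling
```

With shift2 = 0 the expression reduces to the unmodified form (sbWidth*sbHeight*s) << n, which shows how the modified threshold tracks the precision removed from the samples.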

In one or more examples, video encoder 200 and video decoder 300 may compare the distortion value for the first sub-block (e.g., the distortion value calculated using the alternative technique for determining distortion values) to the threshold (e.g., sbDistTh as determined in equation 3-2-2). Video encoder 200 and video decoder 300 may determine, based on the comparison, one of pixel-wise BDOF being performed or BDOF being bypassed for the first sub-block. For example, if the distortion value is less than the threshold (e.g., YES of 1306 in FIG. 13), video encoder 200 and video decoder 300 may bypass BDOF. If the distortion value is greater than the threshold (e.g., NO of 1306 in FIG. 13), video encoder 200 and video decoder 300 may perform pixel-wise BDOF.
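The comparison can be sketched as a single predicate. The name `bypass_bdof` is hypothetical. The handling of the equal case follows step 12.4 of the decoder process in the fifth aspect, where a distortion value equal to the threshold leads to pixel-wise BDOF rather than bypass.

```python
def bypass_bdof(sb_dist, sb_dist_th):
    """Return True when BDOF is bypassed for the sub-block.

    Bypass only when the distortion is strictly below the threshold;
    equal or greater means pixel-wise BDOF is performed.
    """
    return sb_dist < sb_dist_th


for dist in (100, 128, 200):
    mode = "bypass" if bypass_bdof(dist, 128) else "pixel-wise BDOF"
    print(dist, mode)
```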

A fourth aspect relates to determining the values of thW and thH. As in the aspects above, the example techniques may be applied to a bi-predicted coding block. The total number of sub-blocks is derived from the width and height of the current block and from the maximum sub-block width (thW) and height (thH).

When the current coding block applies a sub-block-based method, e.g., DMVR, the values of thW and thH should be equal to or smaller than the maximum sub-block width and height of the previously performed method (e.g., DMVR).

The values of thW and thH may be fixed, predetermined values; for example, thW is equal to 8 and thH is equal to 8. Alternatively, the values of thW and thH may be adaptive, with the values determined by information decoded from the bitstream. The following describes ways in which the values of thW and thH may be made adaptive.
a. Determined by the previously performed coding method. If the current coding block applies a sub-block-based method, thW and thH may be set to the same sub-block dimensions as the previously performed method. For example, when DMVR is applied to the current coding block, thW is set equal to the DMVR maximum sub-block width, e.g., 16, and thH is set equal to the DMVR maximum sub-block height, e.g., 16. Otherwise (the current coding block does not apply any sub-block-based method), thW and thH may be set to a predetermined value, e.g., 8.
b. Determined by the current coding block dimensions. In this example, larger values of thW and thH are set for a coding block whose total number of luma samples is greater than a threshold T (e.g., T = 128). Given a W×H coding block, if W*H is greater than T, the values of thW and thH are set equal to 16. Otherwise (W*H is equal to or less than T), the values of thW and thH are set equal to 8.
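The two adaptive options can be sketched as follows. Both function names and all default values are hypothetical; the defaults merely mirror the example numbers quoted in options a and b (16 for the DMVR maximum sub-block size, 8 otherwise, T = 128).

```python
def max_subblock_size_from_prior_method(uses_dmvr, dmvr_max_w=16, dmvr_max_h=16, default=8):
    """Option a: inherit the sub-block size of a previously applied method (DMVR here)."""
    if uses_dmvr:
        return dmvr_max_w, dmvr_max_h
    return default, default


def max_subblock_size_from_block(w, h, t=128, large=16, small=8):
    """Option b: choose thW/thH from the number of luma samples in the W x H block."""
    size = large if w * h > t else small  # strictly greater than T selects the larger size
    return size, size


print(max_subblock_size_from_prior_method(True))   # DMVR applied
print(max_subblock_size_from_block(16, 8))         # 128 samples, not greater than T
print(max_subblock_size_from_block(16, 16))        # 256 samples, greater than T
```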

A fifth aspect relates to an example decoder process for applying pixel-wise BDOF with sub-block bypass. The aspects above may be applied in an encoder (e.g., video encoder 200) and/or a decoder (e.g., video decoder 300). A decoder (e.g., video decoder 300) may perform the method described here by all or a subset of the following steps to decode an inter-predicted block in a picture from a bitstream.
1. Derive the position components (cbX, cbY) as the top-left luma position of the current block by decoding syntax elements in the bitstream.
2. Derive the size of the current block as a width value W and a height value H by decoding syntax elements in the bitstream.
3. Determine that the current block is an inter-predicted block from decoding elements in the bitstream.
4. Derive the motion vector components (mvL0 and mvL1) and reference indices (refPicL0 and refPicL1) of the current block from decoding elements in the bitstream.
5. Infer a flag from decoding elements in the bitstream, where the flag indicates whether decoder-side motion vector derivation (e.g., DMVR, bidirectional merging, template matching) is applied to the current block. The inference scheme for the flag may be, but is not limited to, the same as the example described above regarding the enabling conditions for when DMVR is enabled. In another example, this flag may be explicitly signaled in the bitstream to avoid complex condition checks at the decoder.
6. Derive refined motion vectors if it is determined that DMVR is to be applied to the current block.
7. Derive two (W + 6)×(H + 6) luma prediction sample arrays predSamplesL0 and predSamplesL1 from the decoded refPicL0, refPicL1, and the motion vectors. If it is determined that DMVR is to be applied, the motion vectors are the refined motion vectors; otherwise, the motion vectors are mvL0 and mvL1.
8. Infer a flag from decoding elements in the bitstream, where the flag indicates whether bidirectional optical flow is applied to the current block. The inference scheme for the flag may be, but is not limited to, the same as that used for bidirectional optical flow. In another example, this flag may be explicitly signaled in the bitstream to avoid complex condition checks at the decoder.
9. If, according to the flag value above, the decision is to apply BDOF to the current block, derive the number of sub-blocks in the horizontal direction numSbX, the number of sub-blocks in the vertical direction numSbY, the sub-block width sbWidth, and the sub-block height sbHeight as follows.
・numSbX = (W > thW) ? (W / thW) : 1
・numSbY = (H > thH) ? (H / thH) : 1
・sbWidth = (W > thW) ? thW : W
・sbHeight = (H > thH) ? thH : H
where thW and thH are predetermined integer values (e.g., thW = thH = 8).
10. Derive the variable sbDistTh as follows.
sbDistTh = (sbWidth * sbHeight * s) << (n - shift2)
where
shift2 is a predetermined value, e.g., shift2 is equal to 4,
n is a predetermined value, e.g., n = InternalBitDepth - bitDepth + 1 = 5, and
s is a scale factor, e.g., s = 1.
11. Set the position components (sbX, sbY) = (0, 0) as the top-left luma position of the first sub-block of the current block.
12. For each sub-block at (sbX, sbY), when sbX is less than W and sbY is less than H, the following steps apply.
12.1. For x = sbX - 2...sbX + sbWidth + 1 and y = sbY - 2...sbY + sbHeight + 1, the variable diff[x][y] is derived as follows.
・diff[x][y] = (predSamplesL0[x][y] >> shift2) - (predSamplesL1[x][y] >> shift2)
where shift2 is a predetermined value, e.g., shift2 is equal to 4.
12.2. Derive the variable sbDist as follows.
・sbDist = Σ_iΣ_jAbs(diff[sbX + i][sbY + j])
where i = 0...sbWidth - 1 and j = 0...sbHeight - 1.
12.3. If sbDist is less than sbDistTh (sub-block BDOF is bypassed), derive the prediction signal of the sub-block as follows.
12.3.1. For x = sbX...sbX + sbWidth - 1 and y = sbY...sbY + sbHeight - 1,
・predSamples[x + cbX][y + cbY] = Clip3(0, (2^BitDepth) - 1, (predSamplesL0[x][y] + predSamplesL1[x][y] + offset5) >> shift5)
where
shift5 is set equal to Max(3, 15 - BitDepth), and
offset5 is set equal to (1 << (shift5 - 1)).
12.4. Otherwise (sbDist is equal to or greater than sbDistTh), the following steps apply.
12.4.1. For x = sbX - 2...sbX + sbWidth + 1 and y = sbY - 2...sbY + sbHeight + 1, the variables gradientHL0[x][y], gradientVL0[x][y], gradientHL1[x][y], and gradientVL1[x][y] are derived as follows.
・gradientHL0[x][y] = (predSamplesL0[x + 1][y] >> shift1) - (predSamplesL0[x - 1][y] >> shift1)
・gradientVL0[x][y] = (predSamplesL0[x][y + 1] >> shift1) - (predSamplesL0[x][y - 1] >> shift1)
・gradientHL1[x][y] = (predSamplesL1[x + 1][y] >> shift1) - (predSamplesL1[x - 1][y] >> shift1)
・gradientVL1[x][y] = (predSamplesL1[x][y + 1] >> shift1) - (predSamplesL1[x][y - 1] >> shift1)
where shift1 is a predetermined value, e.g., shift1 is set equal to 6.
12.4.2. For x = sbX - 2...sbX + sbWidth + 1 and y = sbY - 2...sbY + sbHeight + 1, the variables tempH[x][y] and tempV[x][y] are derived as follows.
・tempH[x][y] = (gradientHL0[x][y] + gradientHL1[x][y]) >> shift3
・tempV[x][y] = (gradientVL0[x][y] + gradientVL1[x][y]) >> shift3
where shift3 is a predetermined value, e.g., shift3 is set equal to 1.
12.4.3. For each pixel at (piX, piY), with piX = sbX...sbX + sbWidth - 1 and piY = sbY...sbY + sbHeight - 1, the following steps apply.
12.4.3.1. The variables sGx2, sGy2, sGxGy, sGxdI, and sGydI are derived as follows.
・sGx2 = Σ_iΣ_jAbs(tempH[piX + i][piY + j])
・sGy2 = Σ_iΣ_jAbs(tempV[piX + i][piY + j])
・sGxGy = Σ_iΣ_j(Sign(tempV[piX + i][piY + j]) * tempH[piX + i][piY + j])
・sGxdI = Σ_iΣ_j(-Sign(tempH[piX + i][piY + j]) * diff[piX + i][piY + j])
・sGydI = Σ_iΣ_j(-Sign(tempV[piX + i][piY + j]) * diff[piX + i][piY + j])
where i = -2...2 and j = -2...2.
12.4.3.2. The horizontal and vertical motion offsets of the current pixel are derived as follows.
・v_x = sGx2 > 0 ? Clip3(-mvRefineThres + 1, mvRefineThres - 1, (sGxdI << 2) >> Floor(Log2(sGx2))) : 0
・v_y = sGy2 > 0 ? Clip3(-mvRefineThres + 1, mvRefineThres - 1, ((sGydI << 2) - ((v_x * sGxGy) >> 1)) >> Floor(Log2(sGy2))) : 0
where mvRefineThres is a predetermined value, e.g., mvRefineThres is set equal to (1 << 4).
12.4.3.3. The prediction signal of the current pixel is derived as follows.
・bdofOffset = v_x * (gradientHL0[piX][piY] - gradientHL1[piX][piY]) + v_y * (gradientVL0[piX][piY] - gradientVL1[piX][piY])
・predSamples[piX + cbX][piY + cbY] = Clip3(0, (2^BitDepth) - 1, (predSamplesL0[piX][piY] + predSamplesL1[piX][piY] + bdofOffset + offset5) >> shift5)
where
shift5 is set equal to Max(3, 15 - BitDepth), and
offset5 is set equal to (1 << (shift5 - 1)).
12.5. Update the top-left luma position of the sub-block as follows.
・sbX = (sbX + sbWidth) < W ? sbX + sbWidth : 0
・sbY = (sbX + sbWidth) < W ? sbY : sbY + sbHeight
13. Derive the predicted block using the derived prediction signal of each sub-block, and use the derived predicted block for video decoding.
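The sub-block layout of step 9 and the traversal of steps 11, 12, and 12.5 can be sketched as below. The function names are hypothetical. One reading is assumed and should be checked against the normative text: both update expressions in step 12.5 are taken to use the value of sbX from before the update, since applying them one after the other would skip sub-blocks.

```python
def subblock_layout(w, h, th_w=8, th_h=8):
    """Step 9: number of sub-blocks per direction and sub-block dimensions."""
    num_sb_x = w // th_w if w > th_w else 1
    num_sb_y = h // th_h if h > th_h else 1
    sb_width = th_w if w > th_w else w
    sb_height = th_h if h > th_h else h
    return num_sb_x, num_sb_y, sb_width, sb_height


def subblock_positions(w, h, sb_width, sb_height):
    """Steps 11, 12 and 12.5: visit top-left positions in raster order."""
    sb_x, sb_y = 0, 0  # step 11
    positions = []
    while sb_x < w and sb_y < h:  # loop condition of step 12
        positions.append((sb_x, sb_y))
        # Assumption: the row-end test uses the pre-update sbX for both variables.
        end_of_row = sb_x + sb_width >= w
        if end_of_row:
            sb_x, sb_y = 0, sb_y + sb_height
        else:
            sb_x = sb_x + sb_width
    return positions


num_x, num_y, sb_w, sb_h = subblock_layout(16, 8)
print(num_x, num_y, sb_w, sb_h)
print(subblock_positions(16, 8, sb_w, sb_h))
```

For each visited position, a decoder following the process above would compute sbDist (steps 12.1 and 12.2) and then take either the bypass path (step 12.3) or the pixel-wise path (step 12.4).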

FIG. 15 is a flowchart illustrating an example method for decoding video data in accordance with the techniques of this disclosure. The current block may comprise a current CU. Although described with respect to video decoder 300 (FIGS. 1 and 4), it should be understood that other devices may be configured to perform a method similar to that of FIG. 15. For example, prediction processing unit 304 and/or motion compensation unit 316 may be configured to perform the example techniques of FIG. 15. Prediction processing unit 304 and/or motion compensation unit 316 may be coupled to memory, such as DPB 314 or other memory of video decoder 300. In some examples, video decoder 300 may be coupled to memory 120, which stores information used by video decoder 300 to perform the example techniques of FIG. 15.

Video decoder 300 may determine that bidirectional optical flow (BDOF) is enabled for a block of video data (1500). For example, video decoder 300 may receive signaling indicating that BDOF is enabled for the block. In some examples, video decoder 300 may infer (e.g., determine without receiving signaling) that BDOF is enabled for the block, such as based on certain criteria being satisfied.

Video decoder 300 may divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block (1502). For example, video decoder 300 may divide the block into N sub-blocks. In some cases, two or more of the sub-blocks may be of different sizes, but it is also possible for the sub-blocks to have the same size. Video decoder 300 may determine how to divide the block based on signaled information or by inference.

Video decoder 300 may determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks (1504). There may be various ways in which video decoder 300 may determine the respective distortion values. As one example, video decoder 300 may determine a first reference block (e.g., I⁽⁰⁾(i,j)) and a second reference block (e.g., I⁽¹⁾(i,j)). Video decoder 300 may calculate a sum of absolute differences (SAD) between I⁽⁰⁾(i,j) and I⁽¹⁾(i,j).

However, the example techniques are not so limited. In some examples, video decoder 300 may perform the alternative technique for determining distortion values, as described above. For example, video decoder 300 may determine a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks (e.g., determine I⁽⁰⁾(i,j)). Video decoder 300 may scale the first set of sample values with a scale factor to generate a first set of scaled sample values (e.g., determine I⁽⁰⁾(i,j) >> shift2). Video decoder 300 may determine a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks (e.g., determine I⁽¹⁾(i,j)). Video decoder 300 may scale the second set of sample values with the scale factor to generate a second set of scaled sample values (e.g., determine I⁽¹⁾(i,j) >> shift2). In one or more examples, to determine the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, video decoder 300 may be configured to determine, for the first sub-block, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values (e.g., determine the SAD based on the first set of scaled sample values and the second set of scaled sample values).
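A minimal sketch of the scaled distortion computation follows, assuming the two prediction sub-blocks are given as equally sized lists of rows of integers. The function name `scaled_sad` and the sample values are hypothetical; shift2 = 4 is the example value used elsewhere in this description.

```python
def scaled_sad(pred_l0, pred_l1, shift2=4):
    """Sum of |(I0 >> shift2) - (I1 >> shift2)| over one sub-block."""
    total = 0
    for row0, row1 in zip(pred_l0, pred_l1):
        for a, b in zip(row0, row1):
            # Each sample is scaled before the difference is taken.
            total += abs((a >> shift2) - (b >> shift2))
    return total


l0 = [[160, 320], [480, 640]]  # hypothetical intermediate-precision samples
l1 = [[128, 352], [480, 600]]
print(scaled_sad(l0, l1))
```

Because each sample is shifted before subtraction, the per-sample differences are the same θ(i,j) values used later in the motion refinement, which is what makes the reuse described in this disclosure possible.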

Video decoder 300 may determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of pixel-wise BDOF being performed or BDOF being bypassed based on the respective distortion values (1506). For example, as described with respect to FIG. 13, there may be two options for video decoder 300: either perform pixel-wise BDOF for a sub-block or bypass BDOF. In some examples, there may be no other options for video decoder 300 when evaluating a sub-block.

In some examples, to determine whether to perform pixel-wise BDOF or to bypass BDOF, video encoder 200 and video decoder 300 may determine a threshold. One example way to determine the threshold is sbDistTh = (sbWidth·sbHeight·s) << n. However, in examples where the alternative technique for determining distortion values is utilized, video decoder 300 may determine the threshold as sbDistTh = (sbWidth·sbHeight·s) << (n - shift2).

That is, video decoder 300 may multiply a width of a first sub-block of the one or more sub-blocks (e.g., sbWidth), a height of the first sub-block (e.g., sbHeight), and a first scale factor (e.g., "s") to generate an intermediate value. Video decoder 300 may perform a left-shift operation on the intermediate value based on a second scale factor to generate the threshold (e.g., perform << (n - shift2), where (n - shift2) is the second scale factor).

Video decoder 300 may compare the distortion value, of the respective distortion values, for the first sub-block to the threshold. As illustrated in decision block 1306 of FIG. 13, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of pixel-wise BDOF being performed or BDOF being bypassed based on the respective distortion values, video decoder 300 may determine, based on the comparison, one of pixel-wise BDOF being performed or BDOF being bypassed for the first sub-block.

Video decoder 300 may be configured to determine prediction samples for each sub-block of the one or more sub-blocks based on the determination of pixel-wise BDOF being performed or BDOF being bypassed (1508). As one example, to determine the prediction samples, video decoder 300 may determine that pixel-wise BDOF is performed for a first sub-block of the one or more sub-blocks. In this example, video decoder 300 may determine a respective motion refinement for each sample in the first sub-block, and may determine, for each sample in the first sub-block, a respective refined sample value from the samples in a prediction block for the first sub-block based on the respective motion refinement.

For example, video decoder 300 may perform the operation pred_BDOF(x,y) = (I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b'(x,y) + o_offset) >> shift5. Here, pred_BDOF may represent the refined sample value. In this example, I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) may be considered the prediction block. The value of b'(x,y) may be determined from the respective motion refinement (v'_x, v'_y) for each sample in the sub-block. Accordingly, each refined sample value (e.g., pred_BDOF) is based on the prediction block and the respective motion refinement.
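The sample derivation can be sketched as below, using the shift5, offset5, and Clip3 definitions from steps 12.3.1 and 12.4.3.3 of the decoder process. The names `clip3` and `bdof_sample` are hypothetical, and the input values are made-up intermediate-precision samples. Passing a zero offset gives the bypass path, and a non-zero `bdof_offset` plays the role of b'(x,y).

```python
def clip3(lo, hi, v):
    """Clamp v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))


def bdof_sample(p0, p1, bit_depth, bdof_offset=0):
    """Combine two prediction samples, optionally with a BDOF offset."""
    shift5 = max(3, 15 - bit_depth)
    offset5 = 1 << (shift5 - 1)  # rounding offset
    value = (p0 + p1 + bdof_offset + offset5) >> shift5
    return clip3(0, (1 << bit_depth) - 1, value)


print(bdof_sample(8192, 8192, 10))       # bypass: rounded average
print(bdof_sample(8192, 8192, 10, 64))   # with a hypothetical BDOF offset
print(bdof_sample(20000, 20000, 10))     # clipped to the 10-bit maximum
```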

There may be various ways to determine the motion refinement (v'_x, v'_y). As part of determining the motion refinement, video decoder 300 may determine auto-correlations and cross-correlations, including θ(i,j) = (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2). In one or more examples, such as when the alternative technique for determining distortion values is used, video decoder 300 may have already determined (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) for determining the distortion value for the first sub-block. In such examples, video decoder 300 may reuse the first set of scaled sample values (e.g., I⁽⁰⁾(i,j) >> shift2) and the second set of scaled sample values (e.g., I⁽¹⁾(i,j) >> shift2) to determine the pixel-wise motion refinement for pixel-wise BDOF (e.g., the value of θ(i,j) can be determined without recalculating I⁽⁰⁾(i,j) >> shift2 and I⁽¹⁾(i,j) >> shift2).
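One way to picture the reuse is a per-pixel routine that takes the already computed θ values as an input. The sketch below follows the correlation sums and clipping of steps 12.4.3.1 and 12.4.3.2 in the decoder process; the function names are hypothetical, and the 5×5 windows (i, j = -2...2 around the pixel) are assumed to be passed in as lists of rows. Floor(Log2(x)) is computed with `int.bit_length`, which is exact for positive integers.

```python
def sign(v):
    return (v > 0) - (v < 0)


def clip3(lo, hi, v):
    return max(lo, min(hi, v))


def floor_log2(v):
    return v.bit_length() - 1  # valid for v > 0


def pixel_motion_offset(temp_h, temp_v, diff, mv_refine_thres=1 << 4):
    """Return (v_x, v_y) for one pixel from its 5x5 tempH, tempV and diff windows."""
    s_gx2 = s_gy2 = s_gxgy = s_gxdi = s_gydi = 0
    for h_row, v_row, d_row in zip(temp_h, temp_v, diff):
        for g_h, g_v, d in zip(h_row, v_row, d_row):
            s_gx2 += abs(g_h)
            s_gy2 += abs(g_v)
            s_gxgy += sign(g_v) * g_h
            s_gxdi += -sign(g_h) * d  # diff is the reused theta value
            s_gydi += -sign(g_v) * d
    lo, hi = -mv_refine_thres + 1, mv_refine_thres - 1
    v_x = clip3(lo, hi, (s_gxdi << 2) >> floor_log2(s_gx2)) if s_gx2 > 0 else 0
    if s_gy2 > 0:
        num = (s_gydi << 2) - ((v_x * s_gxgy) >> 1)
        v_y = clip3(lo, hi, num >> floor_log2(s_gy2))
    else:
        v_y = 0
    return v_x, v_y


window = lambda v: [[v] * 5 for _ in range(5)]  # constant 5x5 test window
print(pixel_motion_offset(window(1), window(1), window(-2)))
```

The `diff` argument is the same array that a scaled SAD would sum over, so a decoder that stores it while computing the distortion value does not need to shift the samples a second time.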

Video decoder 300 may reconstruct the block based on the prediction samples (1510). For example, reconstructing the block based on the prediction samples may include video decoder 300 receiving residual values indicative of differences between the prediction samples and the samples of the block, and adding the residual values to the prediction samples to reconstruct the block.
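A minimal sketch of this reconstruction step, assuming prediction samples and residual values are equally sized lists of rows. The function name `reconstruct_block` is hypothetical, and the clipping to the valid sample range is an added assumption that the paragraph above does not state.

```python
def reconstruct_block(pred, resid, bit_depth=10):
    """Add residual values to prediction samples, sample by sample."""
    max_val = (1 << bit_depth) - 1
    return [
        # Assumption: results are clipped to [0, 2^bit_depth - 1].
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


print(reconstruct_block([[512, 1020]], [[-12, 10]]))
```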

The above provides an example with respect to respective sub-blocks of a block. The following is an example in which there are two sub-blocks, pixel-wise BDOF is performed for one sub-block, and BDOF is bypassed for the other sub-block.

For example, for a first sub-block of the one or more sub-blocks, video decoder 300 may determine a first distortion value of the respective distortion values, and, for a second sub-block of the one or more sub-blocks, video decoder 300 may determine a second distortion value of the respective distortion values.

For the first sub-block of the plurality of sub-blocks, video decoder 300 may determine, based on the first distortion value (e.g., based on a comparison of the first distortion value to a threshold), that BDOF is enabled for the first sub-block. Based on the determination that BDOF is enabled for the first sub-block, video decoder 300 may determine a pixel-wise motion refinement for refining a first set of prediction samples for the first sub-block. For example, video decoder 300 may derive, for a first sample of the first sub-block, a first motion refinement for refining a first prediction sample, may derive, for a second sample of the first sub-block, a second motion refinement for refining a second prediction sample, and so on.

For the second sub-block of the plurality of sub-blocks, video decoder 300 may determine, based on the second distortion value (e.g., based on a comparison of the second distortion value to the threshold), that BDOF is bypassed. Based on the determination that BDOF is bypassed for the second sub-block, video decoder 300 may bypass determining a pixel-wise motion refinement for refining a second set of prediction samples for the second sub-block. For example, video decoder 300 may bypass deriving, for a first sample of the second sub-block, a first motion refinement for refining a first prediction sample, may bypass deriving, for a second sample of the second sub-block, a second motion refinement for refining a second prediction sample, and so on.

For the first sub-block, video decoder 300 may determine a refined first set of prediction samples of the first sub-block based on the per-pixel motion refinements for the first sub-block (e.g., may determine pred_BDOF using the example techniques described in this disclosure). For the second sub-block, video decoder 300 may determine the second set of prediction samples without refining the second set of prediction samples based on per-pixel motion refinements for refining the second set of prediction samples. That is, BDOF is bypassed for the second sub-block. Video decoder 300 may determine the prediction samples for the second sub-block based on various techniques, such as determining a prediction block based on a weighted average of reference blocks.
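As one non-limiting illustration, the per-sub-block choice described above may be sketched as follows. The function name `predict_subblock`, the `refine` callable, the direction of the threshold comparison, and the equal-weight rounding average used on the bypass path are all assumptions made for illustration and are not taken from this disclosure.

```python
def predict_subblock(pred0, pred1, distortion, threshold, refine):
    """Return prediction samples for one sub-block.

    pred0, pred1: 2-D lists of samples from the two reference blocks.
    refine: hypothetical callable standing in for per-pixel BDOF refinement.
    """
    if distortion < threshold:
        # Assumed bypass condition: a small distortion value skips BDOF.
        # Assumed fallback: equal-weight average of the two references,
        # with rounding, and no motion refinement derived at all.
        return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
                for row0, row1 in zip(pred0, pred1)]
    # BDOF enabled for this sub-block: derive per-pixel refinements.
    return refine(pred0, pred1)
```

In this sketch the refinement callable is never invoked on the bypass path, which mirrors the statement that deriving the motion refinements is itself bypassed for that sub-block.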

FIG. 16 is a flowchart illustrating an example method of encoding video data in accordance with the techniques of this disclosure. The current block may comprise a current CU. Although described with respect to video encoder 200 (FIGS. 1 and 3), it should be understood that other devices may be configured to perform a method similar to that of FIG. 16. For example, motion selection unit 202 and/or motion compensation unit 224 may be configured to perform the example techniques of FIG. 16. Motion selection unit 202 and/or motion compensation unit 224 may be coupled to memory, such as DPB 218, or to other memory of video encoder 200. In some examples, video encoder 200 may be coupled to memory 106, which stores information used by video encoder 200 to perform the example techniques of FIG. 16. In general, video encoder 200 may perform the same operations as video decoder 300 to generate the prediction samples.

Video encoder 200 may determine that bidirectional optical flow (BDOF) is enabled for a block of video data (1600). For example, video encoder 200 may determine rate-distortion costs associated with different coding modes and may determine, based on the rate-distortion costs, that BDOF is enabled for the block.

Video encoder 200 may divide the block into a plurality of sub-blocks when BDOF is enabled for the block (1602). Video encoder 200 may determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks (1604). To determine the respective distortion values, video encoder 200 may perform the same techniques as those described for video decoder 300.
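As a non-limiting sketch of steps 1602 and 1604, the block may be tiled into sub-blocks and one distortion value computed per sub-block. The helper names, the sub-block dimensions passed by the caller, and the use of a sum of absolute differences (SAD) between the two reference predictions as the distortion measure are assumptions made for illustration.

```python
def split_into_subblocks(width, height, sub_w, sub_h):
    """Return (x, y, w, h) rectangles tiling a width x height block."""
    rects = []
    for y in range(0, height, sub_h):
        for x in range(0, width, sub_w):
            # Edge sub-blocks are clipped to the block boundary.
            rects.append((x, y, min(sub_w, width - x), min(sub_h, height - y)))
    return rects


def subblock_sad(pred0, pred1, rect):
    """Assumed distortion measure: SAD between the two predictions."""
    x, y, w, h = rect
    return sum(abs(pred0[y + j][x + i] - pred1[y + j][x + i])
               for j in range(h) for i in range(w))
```

Because the distortion values are derived only from prediction data available to both sides, an encoder and a decoder running the same sketch would arrive at the same values.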

Video encoder 200 may determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values (1606). For example, because video encoder 200 may not signal information indicating whether per-pixel BDOF is performed or BDOF is bypassed, video encoder 200 may perform the same operations as video decoder 300 to determine, for each sub-block, whether per-pixel BDOF is performed or BDOF is bypassed.
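As a non-limiting sketch, the threshold used in the determination of step 1606 may be derived in the manner recited in clause 4 of the clauses that follow: the sub-block width, the sub-block height, and a first scale factor are multiplied, and the product is left-shifted based on a second scale factor. The function names, the example scale factor values, and the direction of the comparison are assumptions made for illustration.

```python
def bdof_threshold(sub_w, sub_h, scale1, scale2):
    """Threshold = (width * height * scale1) << scale2."""
    intermediate = sub_w * sub_h * scale1  # intermediate value
    return intermediate << scale2          # left shift by second factor


def use_pixelwise_bdof(distortion, threshold):
    # Assumed direction: perform per-pixel BDOF when the distortion value
    # is not below the threshold, and bypass BDOF otherwise.
    return distortion >= threshold
```

For instance, an 8x8 sub-block with an assumed first scale factor of 1 and an assumed second scale factor of 2 gives a threshold of 256.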

Video encoder 200 may determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed (1608). Video encoder 200 may signal residual values between the prediction samples and samples of the block (e.g., of the respective sub-block) (1610).
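As a non-limiting sketch, the residual values of step 1610 and the corresponding decoder-side reconstruction may be illustrated as follows. The function names, the 8-bit default sample range, and the clipping step are assumptions made for illustration, and the sketch omits transform, quantization, and entropy coding, so the round trip shown is lossless.

```python
def residual(block, pred):
    """Encoder side: per-sample difference between block and prediction."""
    return [[s - p for s, p in zip(srow, prow)]
            for srow, prow in zip(block, pred)]


def reconstruct(pred, res, bit_depth=8):
    """Decoder side: add residuals to prediction, clipped to sample range."""
    max_val = (1 << bit_depth) - 1  # assumed valid sample range [0, max_val]
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, res)]
```

Adding the residual values back to the same prediction samples recovers the original samples in this simplified setting, which is why the encoder and decoder are described as generating the prediction samples with the same operations.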

The following describes some example techniques that may be applied together or separately.

Clause 1. A method of decoding video data, the method comprising: determining that bidirectional optical flow (BDOF) is enabled for a block of the video data; dividing the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; determining a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values; determining prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstructing the block based on the prediction samples.

Clause 2. The method of clause 1, wherein determining the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks comprises: determining, for a first sub-block of the one or more sub-blocks, a first distortion value of the respective distortion values; and determining, for a second sub-block of the one or more sub-blocks, a second distortion value of the respective distortion values, wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values comprises: determining, for the first sub-block of the plurality of sub-blocks, that BDOF is enabled for the first sub-block based on the first distortion value; determining, based on the determination that BDOF is enabled for the first sub-block, per-pixel motion refinements for refining a first set of prediction samples for the first sub-block; determining, for the second sub-block of the plurality of sub-blocks, that BDOF is bypassed based on the second distortion value; and bypassing, based on the determination that BDOF is bypassed for the second block, determining per-pixel motion refinements for refining a second set of prediction samples for the second sub-block, and wherein determining the prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprises: determining, for the first sub-block, a refined first set of prediction samples of the first sub-block based on the per-pixel motion refinements for the first sub-block; and determining, for the second sub-block, the second set of prediction samples without refining the second set of prediction samples based on per-pixel motion refinements for refining the second set of prediction samples.

Clause 3. The method of any of clauses 1 and 2, wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values comprises determining that per-pixel BDOF is performed for a first sub-block of the one or more sub-blocks, the method further comprising determining a respective motion refinement for each sample in the first sub-block, wherein determining the prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprises determining, for each sample in the first sub-block, a respective refined sample value from a sample in a prediction block for the first sub-block based on the respective motion refinement.

Clause 4. The method of any of clauses 1 to 3, further comprising: multiplying a width of a first sub-block of the one or more sub-blocks, a height of the first sub-block of the one or more sub-blocks, and a first scale factor to generate an intermediate value; performing a left-shift operation on the intermediate value based on a second scale factor to generate a threshold value; and comparing a distortion value of the respective distortion values for the first sub-block to the threshold value, wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values comprises determining, for the first sub-block, one of per-pixel BDOF being performed or BDOF being bypassed based on the comparison.

Clause 5. The method of any of clauses 1 to 4, further comprising: determining a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks; scaling the first set of sample values with a scale factor to generate a first set of scaled sample values; determining a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks; and scaling the second set of sample values with the scale factor to generate a second set of scaled sample values, wherein determining the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks comprises determining, for the first sub-block, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.

Clause 6. The method of clause 5, wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values comprises determining that per-pixel BDOF is performed for the first sub-block, the method further comprising reusing the first set of scaled sample values and the second set of scaled sample values for determining per-pixel motion refinements for the per-pixel BDOF.

Clause 7. The method of clause 5, wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values comprises determining that per-pixel BDOF is performed for the first sub-block, the method further comprising reusing the first set of scaled sample values and the second set of scaled sample values for determining motion refinements for BDOF.

Clause 8. The method of any of clauses 1 to 7, wherein reconstructing the block comprises: receiving residual values indicative of a difference between the prediction samples and samples of the block; and adding the residual values to the prediction samples to reconstruct the block.

Clause 9. A device for decoding video data, the device comprising: memory configured to store the video data; and processing circuitry coupled to the memory, the processing circuitry configured to: determine that bidirectional optical flow (BDOF) is enabled for a block of the video data; divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values; determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstruct the block based on the prediction samples.

Clause 10. The device of clause 9, wherein, to determine the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, the processing circuitry is configured to: determine, for a first sub-block of the one or more sub-blocks, a first distortion value of the respective distortion values; and determine, for a second sub-block of the one or more sub-blocks, a second distortion value of the respective distortion values, wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values, the processing circuitry is configured to: determine, for the first sub-block of the plurality of sub-blocks, that BDOF is enabled for the first sub-block based on the first distortion value; determine, based on the determination that BDOF is enabled for the first sub-block, per-pixel motion refinements for refining a first set of prediction samples for the first sub-block; determine, for the second sub-block of the plurality of sub-blocks, that BDOF is bypassed based on the second distortion value; and bypass, based on the determination that BDOF is bypassed for the second block, determining per-pixel motion refinements for refining a second set of prediction samples for the second sub-block, and wherein, to determine the prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed, the processing circuitry is configured to: determine, for the first sub-block, a refined first set of prediction samples of the first sub-block based on the per-pixel motion refinements for the first sub-block; and determine, for the second sub-block, the second set of prediction samples without refining the second set of prediction samples based on per-pixel motion refinements for refining the second set of prediction samples.

Clause 11. The device of any of clauses 9 and 10, wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values, the processing circuitry is configured to determine that per-pixel BDOF is performed for a first sub-block of the one or more sub-blocks, wherein the processing circuitry is further configured to determine a respective motion refinement for each sample in the first sub-block, and wherein, to determine the prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed, the processing circuitry is configured to determine, for each sample in the first sub-block, a respective refined sample value from a sample in a prediction block for the first sub-block based on the respective motion refinement.

Clause 12. The device of any of clauses 9 to 11, wherein the processing circuitry is configured to: multiply a width of a first sub-block of the one or more sub-blocks, a height of the first sub-block of the one or more sub-blocks, and a first scale factor to generate an intermediate value; perform a left-shift operation on the intermediate value based on a second scale factor to generate a threshold value; and compare a distortion value of the respective distortion values for the first sub-block to the threshold value, wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values, the processing circuitry is configured to determine, for the first sub-block, one of per-pixel BDOF being performed or BDOF being bypassed based on the comparison.

Clause 13. The device of any of clauses 9 to 12, wherein the processing circuitry is configured to: determine a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks; scale the first set of sample values with a scale factor to generate a first set of scaled sample values; determine a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks; and scale the second set of sample values with the scale factor to generate a second set of scaled sample values, wherein, to determine the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, the processing circuitry is configured to determine, for the first sub-block, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.

Clause 14. The device of clause 13, wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values, the processing circuitry is configured to determine that per-pixel BDOF is performed for the first sub-block, and wherein the processing circuitry is configured to reuse the first set of scaled sample values and the second set of scaled sample values for determining per-pixel motion refinements for the per-pixel BDOF.

Clause 15. The device of clause 13, wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values, the processing circuitry is configured to determine that per-pixel BDOF is performed for the first sub-block, and wherein the processing circuitry is configured to reuse the first set of scaled sample values and the second set of scaled sample values for determining motion refinements for BDOF.

Clause 16. The device of any of clauses 9 to 15, wherein, to reconstruct the block, the processing circuitry is configured to: receive residual values indicative of a difference between the prediction samples and samples of the block; and add the residual values to the prediction samples to reconstruct the block.

Clause 17. The device of any of clauses 9 to 16, further comprising a display configured to display decoded video data.

Clause 18. The device of clauses 9 to 17, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.

Clause 19. A computer-readable storage medium storing instructions thereon that, when executed, cause one or more processors to: determine that bidirectional optical flow (BDOF) is enabled for a block of video data; divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values; determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstruct the block based on the prediction samples.

Clause 20. The computer-readable storage medium of clause 19, wherein the instructions that cause the one or more processors to determine the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks comprise instructions that cause the one or more processors to: determine, for a first sub-block of the one or more sub-blocks, a first distortion value of the respective distortion values; and determine, for a second sub-block of the one or more sub-blocks, a second distortion value of the respective distortion values, wherein the instructions that cause the one or more processors to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values comprise instructions that cause the one or more processors to: determine, for the first sub-block of the plurality of sub-blocks, that BDOF is enabled for the first sub-block based on the first distortion value; determine, based on the determination that BDOF is enabled for the first sub-block, per-pixel motion refinements for refining a first set of prediction samples for the first sub-block; determine, for the second sub-block of the plurality of sub-blocks, that BDOF is bypassed based on the second distortion value; and bypass, based on the determination that BDOF is bypassed for the second block, determining per-pixel motion refinements for refining a second set of prediction samples for the second sub-block, and wherein the instructions that cause the one or more processors to determine the prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprise instructions that cause the one or more processors to: determine, for the first sub-block, a refined first set of prediction samples of the first sub-block based on the per-pixel motion refinements for the first sub-block; and determine, for the second sub-block, the second set of prediction samples without refining the second set of prediction samples based on per-pixel motion refinements for refining the second set of prediction samples.

Clause 21. The computer-readable storage medium of any of clauses 19 and 20, wherein the instructions that cause the one or more processors to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values comprise instructions that cause the one or more processors to determine that per-pixel BDOF is performed for a first sub-block of the one or more sub-blocks, the instructions further comprising instructions that cause the one or more processors to determine a respective motion refinement for each sample in the first sub-block, wherein the instructions that cause the one or more processors to determine the prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprise instructions that cause the one or more processors to determine, for each sample in the first sub-block, a respective refined sample value from a sample in a prediction block for the first sub-block based on the respective motion refinement.

Clause 22. The computer-readable storage medium of clauses 19 to 21, further comprising instructions that cause the one or more processors to: multiply a width of a first sub-block of the one or more sub-blocks, a height of the first sub-block of the one or more sub-blocks, and a first scale factor to generate an intermediate value; perform a left-shift operation on the intermediate value based on a second scale factor to generate a threshold value; and compare a distortion value of the respective distortion values for the first sub-block to the threshold value, wherein the instructions that cause the one or more processors to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, one of per-pixel BDOF being performed or BDOF being bypassed based on the respective distortion values comprise instructions that cause the one or more processors to determine, for the first sub-block, one of per-pixel BDOF being performed or BDOF being bypassed based on the comparison.

Clause 23. The computer-readable storage medium of any of clauses 19 to 22, further comprising instructions that cause the one or more processors to: determine a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks; scale the first set of sample values with a scale factor to generate a first set of scaled sample values; determine a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks; and scale the second set of sample values with the scale factor to generate a second set of scaled sample values, wherein the instructions that cause the one or more processors to determine the respective distortion values for each sub-block of the one or more sub-blocks of the plurality of sub-blocks comprise instructions that cause the one or more processors to determine, for the first sub-block, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.
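The distortion computation in Clause 23 can be illustrated with a short sketch. Two points are assumptions, not taken from the clause: the scaling is modeled as an arithmetic right shift, and the distortion is modeled as a sum of absolute differences (SAD). The function names are hypothetical.

```python
from typing import List, Tuple

Samples = List[List[int]]


def scale_samples(samples: Samples, shift: int) -> Samples:
    """Scale every sample by a right shift (assumed form of the scale factor)."""
    return [[s >> shift for s in row] for row in samples]


def subblock_distortion(ref0: Samples, ref1: Samples,
                        shift: int) -> Tuple[int, Samples, Samples]:
    """Return the SAD of the scaled sample sets plus the scaled sets themselves.

    Returning the scaled sets lets a caller reuse them later instead of
    scaling the reference samples a second time.
    """
    s0 = scale_samples(ref0, shift)
    s1 = scale_samples(ref1, shift)
    sad = sum(abs(a - b)
              for row0, row1 in zip(s0, s1)
              for a, b in zip(row0, row1))
    return sad, s0, s1


if __name__ == "__main__":
    ref0 = [[16, 32], [48, 64]]
    ref1 = [[16, 16], [64, 64]]
    sad, s0, s1 = subblock_distortion(ref0, ref1, shift=4)
    print(sad, s0, s1)
```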

Clause 24. A device for decoding video data, the device comprising: means for determining that bidirectional optical flow (BDOF) is enabled for a block of the video data; means for dividing the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block; means for determining a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks; means for determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values; means for determining prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed; and means for reconstructing the block based on the prediction samples.

Clause 25. A method of coding video data, the method comprising: dividing an input block into a plurality of sub-blocks, wherein a size of the input block is less than or equal to a size of a coding unit; determining, based on a condition being satisfied, that bidirectional optical flow (BDOF) is to be applied to a sub-block of the plurality of sub-blocks; dividing the sub-block into a plurality of sub-sub-blocks; determining an improved motion vector for one or more of the sub-sub-blocks, wherein the improved motion vector for a sub-sub-block of the one or more sub-sub-blocks is the same for a plurality of samples in the sub-sub-block; and performing BDOF on the sub-block based on the improved motion vectors for the one or more sub-sub-blocks.

Clause 26. A method of coding video data, the method comprising: dividing an input block into a plurality of sub-blocks, wherein a size of the input block is less than or equal to a size of a coding unit; determining, based on a condition being satisfied, that bidirectional optical flow (BDOF) is to be applied to a sub-block of the plurality of sub-blocks; dividing the sub-block into a plurality of sub-sub-blocks; determining an improved motion vector for each of one or more samples in the sub-block; and performing BDOF on the sub-block based on the improved motion vector for each of the one or more samples in the sub-block.
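Clauses 25 and 26 differ in the granularity at which the improved motion vector is determined: one vector shared by all samples of a sub-sub-block, or one vector per sample. The sketch below, which uses hypothetical names and toy values, expands a sub-sub-block grid of vectors into a per-sample map. Under this view, the per-sample case corresponds to a sub-sub-block size of 1×1.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]


def expand_to_samples(mv_grid: List[List[MotionVector]],
                      sub_w: int, sub_h: int) -> List[List[MotionVector]]:
    """Expand one vector per sub-sub-block into one vector per sample.

    Every sample inside a sub_w x sub_h sub-sub-block receives the same
    vector, mirroring the shared-vector case.
    """
    per_sample: List[List[MotionVector]] = []
    for grid_row in mv_grid:
        # Repeat each vector sub_w times horizontally.
        sample_row = [mv for mv in grid_row for _ in range(sub_w)]
        # Repeat the resulting row sub_h times vertically.
        per_sample.extend(list(sample_row) for _ in range(sub_h))
    return per_sample


if __name__ == "__main__":
    grid = [[(1, 0), (0, 1)]]  # two 2x2 sub-sub-blocks side by side
    for row in expand_to_samples(grid, sub_w=2, sub_h=2):
        print(row)
```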

Clause 27. The method of any of clauses 25 and 26, further comprising bypassing BDOF for other sub-blocks of the plurality of sub-blocks.

Clause 28. The method of any of clauses 25 to 27, wherein the condition being satisfied comprises a determination of whether a sum of absolute differences (SAD) between two prediction signals in reference picture 0 and reference picture 1 is less than a threshold.

Clause 29. The method of any of clauses 25 to 28, wherein the size of the input block is thW×thH, and wherein thW and thH are based on one or more of a fixed predetermined value or a value decoded from a bitstream, or are based on a size of a block used prior to BDOF when encoding or decoding the coding unit.

Clause 30. A method of coding video data, the method comprising any one or combination of clauses 25 to 29.

Clause 31. The method of any of clauses 25 to 30, wherein performing BDOF comprises performing BDOF as part of decoding the video data.

Clause 32. The method of any of clauses 25 to 31, wherein performing BDOF comprises performing BDOF as part of encoding the video data, including in a reconstruction loop of the encoding.

Clause 33. A device for coding video data, the device comprising: a memory for storing video data; and processing circuitry coupled to the memory, wherein the processing circuitry is configured to perform the method of any one or combination of clauses 25 to 32.

Clause 34. A device for coding video data, the device comprising one or more means for performing the method of any of clauses 25 to 32.

Clause 35. The device of any of clauses 33 and 34, further comprising a display configured to display decoded video data.

Clause 36. The device of any of clauses 33 to 35, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.

Clause 37. The device of any of clauses 33 to 36, wherein the processing circuitry or the means for performing comprises a video decoder.

Clause 38. The device of any of clauses 33 to 37, wherein the processing circuitry or the means for performing comprises a video encoder.

Clause 39. A computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to perform the method of any of clauses 25 to 32.

It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more DSPs, general-purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry. Accordingly, the terms "processor" and "processing circuitry," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

100 Video encoding and decoding system
102 Source device
104 Video source
106 Memory
108 Output interface
110 Computer-readable medium
112 Storage device
114 File server
116 Destination device
118 Display device
120 Memory
122 Input interface
130 Quadtree binary tree (QTBT) structure
132 Coding tree unit (CTU)
200 Video encoder
202 Mode selection unit
204 Residual generation unit
206 Transform processing unit
208 Quantization unit
210 Inverse quantization unit
212 Inverse transform processing unit
214 Reconstruction unit
216 Filter unit
218 Decoded picture buffer (DPB)
220 Entropy encoding unit
222 Motion estimation unit
224 Motion compensation unit
226 Intra prediction unit
230 Video data memory
300 Video decoder
302 Entropy decoding unit
304 Prediction processing unit
306 Inverse quantization unit
308 Inverse transform processing unit
310 Reconstruction unit
312 Filter unit
314 Decoded picture buffer (DPB)
316 Motion compensation unit
318 Intra prediction unit
320 Coded picture buffer (CPB) memory
602, 604 Blocks
700 Current frame
702 Reference frame
1000 Square search pattern
1100, 1102 Blocks

Claims

A method of decoding video data, the method comprising:
determining that bidirectional optical flow (BDOF) is enabled for a block of the video data;
dividing the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block;
determining a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks;
determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values;
determining prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed; and
reconstructing the block based on the prediction samples.
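The per-sub-block decision flow recited above can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method itself: the distortion is modeled as a SAD between the two prediction signals, bypassed sub-blocks use a rounded average of the two predictions, and the per-sample correction comes from a hypothetical `refine(x, y)` callback that stands in for the actual motion improvement derivation.

```python
from typing import Callable, Dict, List, Tuple

Block = List[List[int]]


def predict_block(p0: Block, p1: Block, sub_w: int, sub_h: int,
                  threshold: int,
                  refine: Callable[[int, int], int]
                  ) -> Tuple[Block, Dict[Tuple[int, int], bool]]:
    """Split a block into sub-blocks and pick BDOF or bypass for each one."""
    height, width = len(p0), len(p0[0])
    out = [[0] * width for _ in range(height)]
    decisions: Dict[Tuple[int, int], bool] = {}
    for y0 in range(0, height, sub_h):
        for x0 in range(0, width, sub_w):
            ys = range(y0, min(y0 + sub_h, height))
            xs = range(x0, min(x0 + sub_w, width))
            # Distortion between the two prediction signals (assumed SAD).
            sad = sum(abs(p0[y][x] - p1[y][x]) for y in ys for x in xs)
            # Assumption: low distortion means BDOF is bypassed.
            do_bdof = sad >= threshold
            decisions[(x0, y0)] = do_bdof
            for y in ys:
                for x in xs:
                    average = (p0[y][x] + p1[y][x] + 1) >> 1
                    out[y][x] = average + (refine(x, y) if do_bdof else 0)
    return out, decisions


if __name__ == "__main__":
    pred0 = [[10, 10, 50, 50]]
    pred1 = [[10, 10, 10, 10]]
    samples, flags = predict_block(pred0, pred1, sub_w=2, sub_h=1,
                                   threshold=8, refine=lambda x, y: 1)
    print(samples, flags)
```

Making the decision per sub-block means the comparatively expensive per-sample derivation runs only where the two predictions disagree noticeably.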

The method of claim 1, wherein determining the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks comprises:
determining a first distortion value of the respective distortion values for a first sub-block of the one or more sub-blocks; and
determining a second distortion value of the respective distortion values for a second sub-block of the one or more sub-blocks,
wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values comprises:
determining, for the first sub-block of the plurality of sub-blocks, that BDOF is enabled for the first sub-block based on the first distortion value;
based on the determination that BDOF is enabled for the first sub-block, determining a pixel-wise motion improvement for improving a first set of prediction samples for the first sub-block;
determining, for the second sub-block of the plurality of sub-blocks, that BDOF is bypassed for the second sub-block based on the second distortion value; and
based on the determination that BDOF is bypassed for the second sub-block, bypassing determining a pixel-wise motion improvement for improving a second set of prediction samples for the second sub-block, and
wherein determining the prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed comprises:
for the first sub-block, determining an improved first set of prediction samples of the first sub-block based on the pixel-wise motion improvement for the first sub-block; and
for the second sub-block, determining the second set of prediction samples without improving the second set of prediction samples based on a pixel-wise motion improvement for improving the second set of prediction samples.

The method of claim 1, wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values comprises determining that pixel-wise BDOF is performed for a first sub-block of the one or more sub-blocks,
the method further comprising determining a respective motion improvement for each sample in the first sub-block,
wherein determining the prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed comprises determining, for each sample in the first sub-block, a respective improved sample value from a sample in a prediction block for the first sub-block based on the respective motion improvement.

The method of claim 1, further comprising:
multiplying a width of a first sub-block of the one or more sub-blocks, a height of the first sub-block of the one or more sub-blocks, and a first scale factor to generate an intermediate value;
performing a left-shift operation on the intermediate value based on a second scale factor to generate a threshold; and
comparing a distortion value of the respective distortion values for the first sub-block with the threshold,
wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values comprises determining, based on the comparison, that one of pixel-wise BDOF is performed or BDOF is bypassed for the first sub-block.

The method of claim 1, further comprising:
determining a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks;
scaling the first set of sample values with a scale factor to generate a first set of scaled sample values;
determining a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks; and
scaling the second set of sample values with the scale factor to generate a second set of scaled sample values,
wherein determining the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks comprises determining, for the first sub-block, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.

The method of claim 5, wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values comprises determining that pixel-wise BDOF is performed for the first sub-block,
the method further comprising reusing the first set of scaled sample values and the second set of scaled sample values to determine a pixel-wise motion improvement for the pixel-wise BDOF.

The method of claim 5, wherein determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values comprises determining that pixel-wise BDOF is performed for the first sub-block,
the method further comprising reusing the first set of scaled sample values and the second set of scaled sample values to determine a motion improvement for BDOF.

The method of claim 1, wherein reconstructing the block comprises:
receiving residual values indicative of a difference between the prediction samples and samples of the block; and
adding the residual values to the prediction samples to reconstruct the block.
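The reconstruction step can be illustrated with a brief sketch. The function name is hypothetical, and the clipping of each result to the valid sample range for a given bit depth is an assumption added for the example; the claim itself recites only the addition of residual values to prediction samples.

```python
from typing import List

Block = List[List[int]]


def reconstruct(prediction: Block, residual: Block, bit_depth: int = 8) -> Block:
    """Add residual values to prediction samples, sample by sample.

    Assumption: results are clipped to [0, 2**bit_depth - 1].
    """
    max_value = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_value) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


if __name__ == "__main__":
    print(reconstruct([[100, 250]], [[-5, 10]], bit_depth=8))
```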

A device for decoding video data, the device comprising:
a memory configured to store the video data; and
processing circuitry coupled to the memory, the processing circuitry being configured to:
determine that bidirectional optical flow (BDOF) is enabled for a block of the video data;
divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block;
determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks;
determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values;
determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed; and
reconstruct the block based on the prediction samples.

The device of claim 9, wherein, to determine the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, the processing circuitry is configured to:
determine a first distortion value of the respective distortion values for a first sub-block of the one or more sub-blocks; and
determine a second distortion value of the respective distortion values for a second sub-block of the one or more sub-blocks,
wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values, the processing circuitry is configured to:
determine, for the first sub-block of the plurality of sub-blocks, that BDOF is enabled for the first sub-block based on the first distortion value;
based on the determination that BDOF is enabled for the first sub-block, determine a pixel-wise motion improvement for improving a first set of prediction samples for the first sub-block;
determine, for the second sub-block of the plurality of sub-blocks, that BDOF is bypassed for the second sub-block based on the second distortion value; and
based on the determination that BDOF is bypassed for the second sub-block, bypass determining a pixel-wise motion improvement for improving a second set of prediction samples for the second sub-block, and
wherein, to determine the prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed, the processing circuitry is configured to:
for the first sub-block, determine an improved first set of prediction samples of the first sub-block based on the pixel-wise motion improvement for the first sub-block; and
for the second sub-block, determine the second set of prediction samples without improving the second set of prediction samples based on a pixel-wise motion improvement for improving the second set of prediction samples.

The device of claim 9, wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values, the processing circuitry is configured to determine that pixel-wise BDOF is performed for a first sub-block of the one or more sub-blocks,
wherein the processing circuitry is further configured to determine a respective motion improvement for each sample in the first sub-block, and
wherein, to determine the prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed, the processing circuitry is configured to determine, for each sample in the first sub-block, a respective improved sample value from a sample in a prediction block for the first sub-block based on the respective motion improvement.

The device of claim 9, wherein the processing circuitry is configured to:
multiply a width of a first sub-block of the one or more sub-blocks, a height of the first sub-block of the one or more sub-blocks, and a first scale factor to generate an intermediate value;
perform a left-shift operation on the intermediate value based on a second scale factor to generate a threshold; and
compare a distortion value of the respective distortion values for the first sub-block with the threshold,
wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values, the processing circuitry is configured to determine, based on the comparison, that one of pixel-wise BDOF is performed or BDOF is bypassed for the first sub-block.

The device of claim 9, wherein the processing circuitry is configured to:
determine a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks;
scale the first set of sample values with a scale factor to generate a first set of scaled sample values;
determine a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks; and
scale the second set of sample values with the scale factor to generate a second set of scaled sample values,
wherein, to determine the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, the processing circuitry is configured to determine, for the first sub-block, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.

The device of claim 13, wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values, the processing circuitry is configured to determine that pixel-wise BDOF is performed for the first sub-block, and
wherein the processing circuitry is configured to reuse the first set of scaled sample values and the second set of scaled sample values to determine a pixel-wise motion improvement for the pixel-wise BDOF.

The device of claim 13, wherein, to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values, the processing circuitry is configured to determine that pixel-wise BDOF is performed for the first sub-block, and
wherein the processing circuitry is configured to reuse the first set of scaled sample values and the second set of scaled sample values to determine a motion improvement for BDOF.

The device of claim 9, wherein, to reconstruct the block, the processing circuitry is configured to:
receive residual values indicative of a difference between the prediction samples and samples of the block; and
add the residual values to the prediction samples to reconstruct the block.

The device of claim 9, further comprising a display configured to display decoded video data.

The device of claim 9, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.

A computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to:
determine that bidirectional optical flow (BDOF) is enabled for a block of video data;
divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block;
determine a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks;
determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values;
determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed; and
reconstruct the block based on the prediction samples.

The computer-readable storage medium of claim 19, wherein the instructions that cause the one or more processors to determine the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks comprise instructions that cause the one or more processors to:
determine a first distortion value of the respective distortion values for a first sub-block of the one or more sub-blocks; and
determine a second distortion value of the respective distortion values for a second sub-block of the one or more sub-blocks,
wherein the instructions that cause the one or more processors to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, that one of pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values comprise instructions that cause the one or more processors to:
determine, for the first sub-block of the plurality of sub-blocks, that BDOF is enabled for the first sub-block based on the first distortion value;
based on the determination that BDOF is enabled for the first sub-block, determine a pixel-wise motion improvement for improving a first set of prediction samples for the first sub-block;
determine, for the second sub-block of the plurality of sub-blocks, that BDOF is bypassed for the second sub-block based on the second distortion value; and
based on the determination that BDOF is bypassed for the second sub-block, bypass determining a pixel-wise motion improvement for improving a second set of prediction samples for the second sub-block, and
wherein the instructions that cause the one or more processors to determine the prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed comprise instructions that cause the one or more processors to:
for the first sub-block, determine an improved first set of prediction samples of the first sub-block based on the pixel-wise motion improvement for the first sub-block; and
for the second sub-block, determine the second set of prediction samples without improving the second set of prediction samples based on a pixel-wise motion improvement for improving the second set of prediction samples.

20. The computer-readable storage medium of claim 19,
wherein the instructions that cause the one or more processors to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, whether pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values comprise instructions that cause the one or more processors to determine that pixel-wise BDOF is performed for a first sub-block of the one or more sub-blocks,
wherein the instructions further comprise instructions that cause the one or more processors to determine a respective motion refinement for each sample in the first sub-block, and
wherein the instructions that cause the one or more processors to determine the prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed comprise instructions that cause the one or more processors to determine, for each sample in the first sub-block, a respective refined sample value from a sample in a prediction block for the first sub-block based on the respective motion refinement.

20. The computer-readable storage medium of claim 19, further comprising instructions that cause the one or more processors to:
multiply a width of a first sub-block of the one or more sub-blocks, a height of the first sub-block of the one or more sub-blocks, and a first scale factor to generate an intermediate value;
perform a left shift operation on the intermediate value based on a second scale factor to generate a threshold; and
compare a distortion value of the respective distortion values for the first sub-block with the threshold,
wherein the instructions that cause the one or more processors to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, whether pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values comprise instructions that cause the one or more processors to determine, based on the comparison, whether pixel-wise BDOF is performed or BDOF is bypassed for the first sub-block.
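
The threshold derivation in this claim amounts to a few integer operations, sketched below. The function and parameter names are hypothetical. The claim says only that the left shift is "based on" the second scale factor and that the decision is "based on the comparison", so using the second scale factor directly as the shift amount and performing BDOF when the distortion is at or above the threshold are both assumptions.

```python
def bdof_threshold(sub_w: int, sub_h: int, first_scale: int, second_scale: int) -> int:
    """Threshold = (width * height * first scale factor) << second scale factor."""
    intermediate = sub_w * sub_h * first_scale
    # Assumption: the second scale factor is used directly as the shift amount.
    return intermediate << second_scale


def perform_pixel_wise_bdof(distortion: int, sub_w: int, sub_h: int,
                            first_scale: int, second_scale: int) -> bool:
    """Return True to perform pixel-wise BDOF, False to bypass it."""
    # Assumption: distortion below the threshold leads to bypassing BDOF.
    return distortion >= bdof_threshold(sub_w, sub_h, first_scale, second_scale)
```

For an 8x8 sub-block with a first scale factor of 1 and a second scale factor of 2, the intermediate value is 64 and the threshold is 256.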

20. The computer-readable storage medium of claim 19, further comprising instructions that cause the one or more processors to:
determine a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks;
scale the first set of sample values using a scale factor to generate a first set of scaled sample values;
determine a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks; and
scale the second set of sample values using the scale factor to generate a second set of scaled sample values,
wherein the instructions that cause the one or more processors to determine the respective distortion value for each sub-block of the one or more sub-blocks of the plurality of sub-blocks comprise instructions that cause the one or more processors to determine a distortion value of the respective distortion values for the first sub-block based on the first set of scaled sample values and the second set of scaled sample values.
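
One possible reading of this claim is sketched below. Two assumptions go beyond the claim text: scaling is modeled as an arithmetic right shift by `shift` bits, and the distortion value is a sum of absolute differences between the two sets of scaled sample values. All names are hypothetical.

```python
from typing import List

Samples = List[List[int]]  # rows of sample values for one sub-block


def scale_samples(samples: Samples, shift: int) -> Samples:
    """Scale every sample; a right shift is assumed as the scale operation."""
    return [[value >> shift for value in row] for row in samples]


def scaled_distortion(ref0: Samples, ref1: Samples, shift: int) -> int:
    """Distortion between two reference sub-blocks computed on scaled samples."""
    scaled0 = scale_samples(ref0, shift)
    scaled1 = scale_samples(ref1, shift)
    # Assumed distortion measure: sum of absolute differences.
    return sum(abs(a - b)
               for row0, row1 in zip(scaled0, scaled1)
               for a, b in zip(row0, row1))
```

Applying the same scale factor to both sets keeps them directly comparable, and the scaled values can also be reused by later BDOF steps instead of being recomputed.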

A device for decoding video data, the device comprising:
means for determining that bidirectional optical flow (BDOF) is enabled for a block of video data;
means for dividing the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block;
means for determining a respective distortion value for each sub-block of one or more sub-blocks of the plurality of sub-blocks;
means for determining, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, whether pixel-wise BDOF is performed or BDOF is bypassed based on the respective distortion values;
means for determining prediction samples for each sub-block of the one or more sub-blocks based on the determination that pixel-wise BDOF is performed or BDOF is bypassed; and
means for reconstructing the block based on the prediction samples.