
WO2024145086A1 - Content derivation for geometric partitioning mode video coding - Google Patents

Content derivation for geometric partitioning mode video coding

Info

Publication number
WO2024145086A1
Authority
WO
WIPO (PCT)
Prior art keywords
blending
content type
index
subset
indices
Prior art date
Application number
PCT/US2023/084924
Other languages
English (en)
Inventor
Xiang Li
Jingning Han
Yaowu Xu
Debargha Mukherjee
Original Assignee
Google Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google Llc filed Critical Google Llc
Publication of WO2024145086A1

Links

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: using adaptive coding
    • H04N19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: the unit being an image region, e.g. an object
    • H04N19/176: the region being a block, e.g. a macroblock
    • H04N19/70: characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • a first subset of blending indices is associated with a screen content type and a second subset of blending indices is associated with a camera content type.
  • the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size, a second blending index corresponding to a half of the blending area size, and a third blending index corresponding to the blending area size
  • the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size, and a fifth blending index corresponding to quadruple the blending area size.
  • the content type is the screen content type
  • the first blending index is signaled using one bin
  • each of the second blending index and the third blending index is signaled using more than one bin.
  • determining the content type for the video data comprises: decoding one or more syntax elements signaled within a bitstream associated with the video data; and determining the content type based on the decoded one or more syntax elements.
  • the content type is determined using one or both of a prediction sample or a motion vector for a coding unit associated with the video data.
  • the one or more processors are configured to execute the instructions to: determine the content type as one of a screen content type or a camera content type.
  • the subset of blending indices is a first subset of blending indices where the content type is the screen content type or a second subset of blending indices where the content type is the camera content type.
  • the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size and signaled using one bin, a second blending index corresponding to a half of the blending area size and signaled using more than one bin, and a third blending index corresponding to the blending area size and signaled using more than one bin
  • the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size and signaled using more than one bin, and a fifth blending index corresponding to quadruple the blending area size and signaled using more than one bin.
  • the one or more processors execute the instructions to: determine the content type based on one or more syntax elements signaled from a bitstream associated with the video data.
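The blending-index subsets described in the preceding paragraphs can be sketched as a small lookup table. This is an illustrative Python sketch, not code from the application or any codec; the function names and the representation of blending area sizes as fractional multipliers of a default size are assumptions made for clarity.

```python
# Sketch of the blending-index subsets described above. Each entry pairs a
# blending index with its multiplier of a default blending area size.
SCREEN = "screen"
CAMERA = "camera"

SUBSETS = {
    SCREEN: [(1, 0.25), (2, 0.5), (3, 1.0)],   # quarter, half, full size
    CAMERA: [(3, 1.0), (4, 2.0), (5, 4.0)],    # full, double, quadruple size
}

def blending_subset(content_type):
    """Return the blending indices available for a given content type."""
    return [idx for idx, _ in SUBSETS[content_type]]

def default_blending_index(content_type):
    """The most frequently used index is the one with the smallest area."""
    return min(SUBSETS[content_type], key=lambda pair: pair[1])[0]
```

Note how the third blending index (the default blending area size) is shared between the two subsets, matching the overlap described above.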
  • FIG. 1 is a schematic of an example of a video encoding and decoding system.
  • FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
  • FIG. 3 is a diagram of an example of a video stream to be encoded and decoded.
  • FIG. 4 is a block diagram of an example of an encoder.
  • FIG. 5 is a block diagram of an example of a decoder.
  • FIG. 6 is an illustration of examples of portions of a video frame.
  • FIG. 7 is an illustration of examples of geometric partitions of a coding unit (CU).
  • FIG. 10 is a flowchart diagram of an example of a technique for content derivation for geometric partitioning mode video coding.
  • Video compression schemes may include breaking respective images, or frames, of a video stream into smaller portions, such as blocks, or coding tree units (CTUs), and generating an encoded bitstream using techniques to limit the information included for respective CTUs thereof.
  • the bitstream can be decoded to re-create the source frames from the limited information.
  • Encoding CTUs to or decoding CTUs from a bitstream can include predicting the values of pixels or CTUs based on similarities with other pixels or CTUs which have already been coded in the same frame, using intra-prediction, or in one or more different frames, using inter-prediction.
  • the result of an intra- or inter-prediction mode performed against a CU is a prediction unit (PU).
  • a prediction residual can be determined based on a difference between the pixel values of the CU and the pixel values of the PU.
  • the prediction residual and the prediction mode used to ultimately obtain that prediction residual can then be encoded to a bitstream.
  • the prediction residual is reconstructed into a CU using a PU produced based on the prediction mode and is thereafter included in an output video stream.
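The residual and reconstruction relationship described above is simple arithmetic and can be shown directly. This sketch uses plain nested lists for pixel blocks; any actual codec operates on clipped integer sample arrays with transform and quantization in between, which are omitted here.

```python
def prediction_residual(cu, pu):
    """Per-sample difference between a coding unit and its prediction unit."""
    return [[c - p for c, p in zip(cu_row, pu_row)]
            for cu_row, pu_row in zip(cu, pu)]

def reconstruct(residual, pu):
    """Decoder side: add the decoded residual back to the prediction unit."""
    return [[r + p for r, p in zip(res_row, pu_row)]
            for res_row, pu_row in zip(residual, pu)]
```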
  • Conventional video codecs include functionality for partitioning a CTU or CU into one or more smaller CUs, such as to enable the different processing of each of the smaller CUs based on the specific video data thereof.
  • conventional partitioning schemes are generally limited such as to rectangular shapes or certain CU sizes and/or based on the type of inter- or intra-prediction being performed.
  • such conventional partitioning schemes may be suboptimal in some cases, such as where object edges do not align with rectangular CU boundaries.
  • VVC (Versatile Video Coding)
  • the geometric partitioning mode enables a CU to be partitioned other than into rectangular shapes in order to better control the prediction of objects with edges that do not align with rectangular CU boundaries.
  • VVC supports 64 different partitions using the geometric partitioning mode, in which each partition splits a given CU into two partitions.
  • the location of the splitting line for a given geometric partition of the geometric partitioning mode is mathematically derived from the angle and offset parameters of a specific partition.
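The angle-and-offset parameterization of the splitting line described above can be illustrated with a signed-distance test. This is a simplified sketch: it measures a continuous angle in degrees and an offset from the CU center, whereas a real codec derives both from small lookup tables indexed by the partition index.

```python
import math

def gpm_split_mask(width, height, angle_deg, offset):
    """Classify each sample of a width x height CU into one of two
    geometric partitions. A sample (x, y) is assigned by the sign of its
    distance d = (x - cx) * cos(a) + (y - cy) * sin(a) - offset to the
    splitting line, measured from the CU center (cx, cy). Samples with
    d < 0 fall in partition 0, the rest in partition 1."""
    a = math.radians(angle_deg)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            d = (x - cx) * math.cos(a) + (y - cy) * math.sin(a) - offset
            row.append(0 if d < 0 else 1)
        mask.append(row)
    return mask
```

With an angle of 0 degrees and zero offset this degenerates to a vertical split down the middle of the CU; a 90-degree angle gives a horizontal split.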
  • Each part of a given geometric partition in a CU is predicted (e.g., inter-predicted) using its own motion. Because each geometric partition has one motion vector and one reference index, prediction using a geometric partition is generally limited to unidirectional prediction. This constraint is applied to ensure that only two motion compensated predictions are used for each CU.
  • the sample values along the geometric partition edge are adjusted using a blending process with adaptive weighting to form the final prediction for each of those partitioned areas.
  • the transform and quantization processes are applied to the whole CU, rather than to each geometric partition individually, as is done with other prediction modes.
  • the motion field for the CU predicted using the geometric partition mode is then stored for later use.
  • a unidirectional prediction candidate list is derived directly from the merge candidate list.
  • a motion vector Mv0 is determined for the first geometric partition
  • a motion vector Mv1 is determined for the other geometric partition
  • One motion vector type is stored for each position in the motion field of a geometric partition mode coded CU.
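The derivation of a unidirectional candidate list directly from the merge candidate list can be sketched with a parity rule: the n-th merge candidate contributes its motion vector from reference list (n mod 2) when available, otherwise from the other list. This mirrors the rule used in VVC's geometric partitioning mode, but the data layout here is an assumption for illustration.

```python
def uni_prediction_candidates(merge_list):
    """Derive unidirectional candidates from a merge candidate list.

    merge_list is a list of dicts such as {"L0": mv_or_None, "L1": mv_or_None},
    one per merge candidate. For candidate n, prefer list (n % 2); fall back
    to the other list if that motion vector is absent."""
    out = []
    for n, cand in enumerate(merge_list):
        preferred = "L%d" % (n % 2)
        other = "L%d" % (1 - n % 2)
        mv = cand.get(preferred)
        if mv is None:
            mv = cand.get(other)
        if mv is not None:
            out.append(mv)
    return out
```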
  • the frame 306 may be further subdivided into CTUs 310, which can contain data corresponding to, for example, NxM pixels in the frame 306, in which N and M may refer to the same integer value or to different integer values.
  • the CTUs 310 can also be arranged to include data from one or more slices 308 of pixel data.
  • the CTUs 310 can be of any suitable size, such as 4x4 pixels, 8x8 pixels, 16x8 pixels, 8x16 pixels, 16x16 pixels, or larger up to a maximum size, which may be 128x128 pixels or another NxM pixel size.
  • the video frame 600 may include CTUs larger than 64x64 and/or CUs smaller than 4x4. Subject to features within the video frame 600 and/or other criteria, the video frame 600 may be partitioned into various arrangements. Although one arrangement of CUs is shown, any arrangement may be used. Although FIG. 6 shows NxN CTUs and CUs, in some implementations, NxM CTUs and/or CUs may be used, wherein N and M are different numbers. For example, 32x64 CTUs, 64x32 CTUs, 16x32 CUs, 32x16 CUs, or any other size may be used. In some implementations, Nx2N CTUs or CUs, 2NxN CTUs or CUs, or a combination thereof, may be used.
  • a CU is geometrically partitioned into two partitions in which a prediction signal for a first geometric partition is determined using a motion vector from a first reference picture list (e.g., L0) and a prediction signal for a second geometric partition is determined using a motion vector from a second reference picture list (e.g., L1).
  • the most frequently used blending index may be the third blending index 906, as it provides the smallest blending area size available for the blending indices of the second subset of blending indices.
  • one bin may be used to signal the use of the third blending index 906 for camera content, while two bins may be used to signal each of the fourth blending index 908 and the fifth blending index 910 for camera content.
  • a first cost (e.g., a first rate-distortion cost) can be determined for the encoded first copy and a second cost (e.g., a second rate-distortion cost) can be determined for the encoded second copy.
  • the content type for the video data may thus be determined as the one of the screen content type or the camera content type based on a lowest one of the first cost and the second cost.
  • the encoded first copy and the encoded second copy may each be produced using all blending indices instead of using only a subset of blending indices.
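The two-trial cost comparison described above can be sketched as a rate-distortion decision. The cost model D + lambda * R and the tie-breaking toward camera content are assumptions of this sketch, not details taken from the application.

```python
def rd_cost(distortion, rate_bits, lam):
    """Classic rate-distortion cost: D + lambda * R."""
    return distortion + lam * rate_bits

def derive_content_type(screen_trial, camera_trial, lam=1.0):
    """Pick the content type whose trial encoding had the lower cost.

    screen_trial and camera_trial are (distortion, rate_bits) pairs from
    encoding copies of the same video data, one per content type. Ties
    fall back to camera content here, an arbitrary choice."""
    cost_screen = rd_cost(*screen_trial, lam)
    cost_camera = rd_cost(*camera_trial, lam)
    return "screen" if cost_screen < cost_camera else "camera"
```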
  • the content type is determined as the camera content type and thus the video data is determined to be camera content.
  • the content type may be determined as the screen content type and thus the video data is determined to be screen content; otherwise, the content type may be determined as the camera content type and thus the video data is determined to be camera content.
  • a blending index is determined for geometric partitions of a CU of the video data to encode or decode based on the content type determined for the video data.
  • a first subset of blending indices is used where the content type is the screen content type
  • a second subset of blending indices is used where the content type is the camera content type.
  • the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size (e.g., of a default blending area size), a second blending index corresponding to a half of the blending area size, and a third blending index corresponding to the blending area size.
  • the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size, and a fifth blending index corresponding to quadruple the blending area size.
  • the blending index determined for the geometric partitions of the CU is the first blending index
  • the blending index determined is the third blending index. This is because the first blending index uses a smallest blending area for the first subset of blending indices corresponding to the screen content type and the third blending index uses a smallest blending area for the second subset of blending indices corresponding to the camera content type.
  • the determined content type may be verified, such as by repeating one or more of the above operations described for determining the content type using one or more other CUs.
  • the blending index determined for the current CU may be the third blending index regardless of the subset of blending indices from which it is determined (i.e., regardless of the determined content type).
  • blending is performed against prediction signals of the geometric partitions according to the blending index determined therefor to produce a prediction unit.
  • two integer blending matrices W0 and W1 are used, in which each of those matrices corresponds to a different one of the geometric partitions and thus is used for a different one of the prediction signals.
  • the weights in W0 and W1 are derived from a ramp function based on a displacement from a predicted sample position to the boundary (i.e., splitting line) between the geometric partitions.
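The ramp-function weighting described above can be sketched per sample. This version uses floating-point weights in [0, 1] for readability; real codecs use small integer weights (e.g., 0..8 with a shift), and the linear ramp shape here is an assumption of the sketch.

```python
def blend_weight(d, blend_width):
    """Weight for partition 0 as a function of the signed displacement d
    from a sample to the splitting line. The weight ramps linearly from 1
    to 0 across a blending area of the given width; w1 = 1 - w0, so only
    w0 is returned."""
    half = blend_width / 2.0
    if d <= -half:
        return 1.0
    if d >= half:
        return 0.0
    return 0.5 - d / blend_width

def blend(p0, p1, d, blend_width):
    """Blend two prediction samples near the geometric partition edge."""
    w0 = blend_weight(d, blend_width)
    return w0 * p0 + (1.0 - w0) * p1
```

A wider blending area (a larger blend_width) softens the transition between the two prediction signals, which is why camera content, with its softer object edges, favors the larger blending area sizes.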
  • the prediction unit is encoded (e.g., to a bitstream) or decoded (e.g., for output within an output video stream), based on whether the technique 1000 is performed during encoding or decoding.
  • the prediction unit may be used as reference data for the prediction of one or more other CUs, whether in a same frame as the CU which includes the geometric partitions based on which the prediction unit was produced or a different frame.
  • the technique 1000 includes signaling, based on the content type, an indication of the blending index used to perform the blending against the prediction signals of the geometric partitions. Because the first blending index provides the smallest blending area size for screen content from amongst the blending indices in the first subset and the third blending index provides the smallest blending area size for camera content from amongst the blending indices in the second subset, the first blending index is the most frequently used blending index in the first subset of blending indices and the third blending index is the most frequently used blending index in the second subset of blending indices.
  • a smallest number of bins is used to signal the first blending index as the blending index where the content type is the screen content type or the third blending index as the blending index where the content type is the camera content type
  • the other blending indices for each content type may be signaled using a larger number of bins.
  • one bin may be used to signal the use of the first blending index for screen content or the use of the third blending index for camera content
  • two bins may be used to signal the use of the second or third blending index for screen content or the use of the fourth or fifth blending index for camera content.
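The bin-count scheme described above (one bin for the most frequent index, two bins for the others) can be sketched as a truncated-unary-style binarization. The exact bin strings a codec would use are not specified in the text, so the strings "0", "10", and "11" here are an assumption that merely matches the stated bin counts.

```python
def binarize_blending_index(index, subset):
    """Map a blending index to its bin string given the active subset.
    The first (most frequent) index in the subset costs one bin; the
    remaining two indices cost two bins each."""
    pos = subset.index(index)
    return ["0", "10", "11"][pos]

def parse_blending_index(bins, subset):
    """Inverse of the binarization above."""
    return subset[["0", "10", "11"].index(bins)]
```

Because the subsets overlap only at the third blending index, the same index is signaled with one bin for camera content but two bins for screen content.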
  • a constraint or other configuration of the coder used to perform the prediction of the CU which includes the geometric partitions may limit the available blending indices for each content type to a single blending index (e.g., the first blending index for screen content and the third blending index for camera content).
  • the blending index determined for the CU is determined based on content type, and the technique 1000 may accordingly omit signaling the blending index.
  • Implementations of the transmitting station 102 and/or the receiving station 106 can be realized in hardware, software, or any combination thereof.
  • the hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit.
  • the term "processor" should be understood as encompassing any of the foregoing hardware, either singly or in combination.
  • the terms "signal" and "data" are used interchangeably. Further, portions of the transmitting station 102 and the receiving station 106 do not necessarily have to be implemented in the same manner.
  • the transmitting station 102 or the receiving station 106 can be implemented using a general purpose computer or general purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms, and/or instructions described herein.
  • a special purpose computer/processor can be utilized which can contain other hardware for carrying out any of the methods, algorithms, or instructions described herein.
  • the transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system.
  • the transmitting station 102 can be implemented on a server, and the receiving station 106 can be implemented on a device separate from the server, such as a handheld communications device.
  • the transmitting station 102 can encode content into an encoded video signal and transmit the encoded video signal to the communications device.
  • the communications device can then decode the encoded video signal.
  • the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by the transmitting station 102.
  • Other suitable transmitting and receiving implementation schemes are available.
  • the receiving station 106 can be a generally stationary personal computer rather than a portable communications device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Content derivation for geometric partitioning mode video coding is performed to determine a blending index for adaptive blending of the geometric partitions for a coding unit. The blending index may be signaled based on the content type derived for the subject video data. A content type is determined for video data to be encoded or decoded. A blending index is determined for geometric partitions of a coding unit of the video data based on the content type. Blending is performed against prediction signals of the geometric partitions according to the blending index to produce a prediction unit. The prediction unit is then encoded (e.g., to a bitstream) or decoded (e.g., for storage or display). The content type may be a screen content type or a camera content type, and the subset of blending indices from which the blending index is determined is based on that content type.
PCT/US2023/084924 2022-12-27 2023-12-19 Content derivation for geometric partitioning mode video coding WO2024145086A1

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263435440P 2022-12-27 2022-12-27
US63/435,440 2022-12-27

Publications (1)

Publication Number Publication Date
WO2024145086A1 2024-07-04

Family

ID=89834201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/084924 WO2024145086A1 2022-12-27 2023-12-19 Content derivation for geometric partitioning mode video coding

Country Status (1)

Country Link
WO (1) WO2024145086A1 (fr)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023205740A1 (fr) * 2022-04-20 2023-10-26 Tencent America LLC . mélange adaptatif pour mode de partition géométrique (gpm)


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
BIN LI ET AL: "A Fast Algorithm for Adaptive Motion Compensation Precision in Screen Content Coding", DATA COMPRESSION CONFERENCE. PROCEEDINGS, IEEE COMPUTER SOCIETY, PISCATAWAY, NJ, US, 7 April 2015 (2015-04-07), pages 243 - 252, XP032963972, ISSN: 1068-0314, [retrieved on 20150702], DOI: 10.1109/DCC.2015.17 *
GAO H ET AL: "Non-EE2: Adaptive Blending for GPM", no. JVET-Z0137 ; m59470, 14 April 2022 (2022-04-14), XP030301022, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/26_Teleconference/wg11/JVET-Z0137-v3.zip JVET-Z0137.docx> [retrieved on 20220414] *
GAO H ET AL: "Non-EE2: Adaptive Blending for GPM", no. JVET-Z0137 ; m59470, 23 April 2022 (2022-04-23), XP030301024, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/26_Teleconference/wg11/JVET-Z0137-v3.zip JVET-Z0137.pptx> [retrieved on 20220423] *
HAOMING CHEN ET AL: "On Intra Prediction for Screen Content Video Coding", ARXIV.ORG, 10 November 2015 (2015-11-10), pages 1 - 13, XP055940748, Retrieved from the Internet <URL:https://doi.org/10.48550/arXiv.1511.01862> [retrieved on 20220711], DOI: 10.48550/ARXIV.1511.01862 *
KIDANI (KDDI) Y ET AL: "EE2-2.7: GPM adaptive blending (JVET-Z0059, JVET-Z0137)", no. JVET-AA0058 ; m60028, 11 July 2022 (2022-07-11), XP030302752, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/27_Teleconference/wg11/JVET-AA0058-v2.zip JVET-AA0058-v2_clean.docx> [retrieved on 20220711] *
KIDANI (KDDI) Y ET AL: "Non-EE2: Adaptive width for GPM blending area", no. JVET-Z0059 ; m59373, 20 April 2022 (2022-04-20), XP030300817, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/26_Teleconference/wg11/JVET-Z0059-v2.zip JVET-Z0059.docx> [retrieved on 20220420] *
KIDANI (KDDI) Y ET AL: "Non-EE2: Adaptive width for GPM blending area", no. JVET-Z0059 ; m59373, 20 April 2022 (2022-04-20), XP030300818, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/26_Teleconference/wg11/JVET-Z0059-v2.zip JVET-Z0059.pptx> [retrieved on 20220420] *

Similar Documents

Publication Publication Date Title
US10798408B2 (en) Last frame motion vector partitioning
US11979564B2 (en) Adaptive filter intra prediction modes in image/video compression
EP2774360B1 (fr) Prédiction intra en modulation différentielle par impulsions et codage pour un codage vidéo de grande efficacité
US8798131B1 (en) Apparatus and method for encoding video using assumed values with intra-prediction
EP4322531A2 (fr) Codeur, décodeur et procédés correspondants d&#39;harmonisation de prédiction intra basée sur une matrice et de sélection de c ur secondaire secondaire
EP3622713A1 (fr) Codage intraprédiction directionnel
US9693066B1 (en) Object-based intra-prediction
US10567772B2 (en) Sub8×8 block processing
WO2018118153A1 (fr) Prédiction non causale de blocs chevauchés, dans un codage vidéo à taille de blocs variable
WO2024145086A1 Content derivation for geometric partitioning mode video coding
WO2024081010A1 Region-based cross-component prediction
WO2024081011A1 Simplification of filter coefficient derivation for cross-component prediction
WO2024254037A1 Limiting signaled motion vector syntax for temporally interpolated picture video coding
WO2025019204A1 Frame-level non-linear motion offset in video coding
WO2024158769A1 Hybrid skip mode with coded sub-block for video coding
WO2024254041A1 Temporally interpolated picture prediction using a frame-level motion vector
WO2022116054A1 Image processing method and system, video encoder and video decoder
US20210409705A1 (en) Extended transform partitions for video compression
WO2025064566A1 Interpolated picture frame prediction
WO2025019207A1 Reuse of cross-component weighting factors in weighted inter prediction
WO2025006167A1 Video compression with separate luma and chroma planes
WO2025010397A1 Fusion-based cross-component prediction and filtered inter prediction

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23848305

Country of ref document: EP

Kind code of ref document: A1