
WO2025149564A2 - Image and video coding and decoding - Google Patents

Image and video coding and decoding

Info

Publication number
WO2025149564A2
Authority
WO
WIPO (PCT)
Prior art keywords
current block
temporal
split
block
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2025/050422
Other languages
French (fr)
Other versions
WO2025149564A3 (en)
Inventor
Guillaume Laroche
Patrice Onno
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Europe Ltd
Canon Inc
Original Assignee
Canon Europe Ltd
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Europe Ltd, Canon Inc filed Critical Canon Europe Ltd
Publication of WO2025149564A2 publication Critical patent/WO2025149564A2/en
Publication of WO2025149564A3 publication Critical patent/WO2025149564A3/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/70: Coding characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/96: Tree coding, e.g. quad-tree coding

Definitions

  • the present invention relates to encoding and decoding of image and video data and particularly, but not exclusively, image and video partitioning data.
  • VVC Versatile Video Coding
  • JVET: since the end of the standardisation of VVC v1, JVET has launched an exploration phase by establishing exploration software (ECM). It gathers additional tools and improvements of existing tools on top of the VVC standard to target better coding efficiency.
  • ECM exploration software
  • a method of encoding or decoding image data into or from a bitstream including data indicating a partitioning of the image data into a plurality of blocks according to a coding tree, wherein blocks in the coding tree may be partitioned according to a plurality of split modes, each split mode being indicated by a respective partitioning syntax element, the method comprising: determining, for a current block to be decoded, a coding order for the partitioning syntax elements, wherein the determining is based on at least one parameter.
  • coding order refers to a given sequence of partitioning syntax elements with respect to the order in which they appear in the bitstream.
  • ABCD, where A, B, C and D each represent a respective partitioning syntax element, represents a first coding order, whereby A is coded, followed by B, followed by C, followed by D.
  • BACD represents a second coding order in which the partitioning syntax element B has been moved forward (i.e. in the direction of the beginning of the coding order) in the coding order, such that B is coded before A.
  • ABDC represents a third coding order in which the partitioning syntax element C has been moved backward (i.e. in the direction of the end of the coding order) in the coding order, such that C is coded after D.
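The reorderings described above can be illustrated with a short sketch. The helper names below are hypothetical and not part of the application; they simply demonstrate moving one element forward or backward in a coding order.

```python
# Illustrative sketch only: each letter stands for a partitioning syntax element.
BASE_ORDER = ["A", "B", "C", "D"]

def move_forward(order, elem):
    """Move elem one position toward the beginning of the coding order."""
    i = order.index(elem)
    if i == 0:
        return list(order)
    reordered = list(order)
    reordered[i - 1], reordered[i] = reordered[i], reordered[i - 1]
    return reordered

def move_backward(order, elem):
    """Move elem one position toward the end of the coding order."""
    i = order.index(elem)
    if i == len(order) - 1:
        return list(order)
    reordered = list(order)
    reordered[i], reordered[i + 1] = reordered[i + 1], reordered[i]
    return reordered

# "BACD": B moved forward, so B is coded before A.
assert move_forward(BASE_ORDER, "B") == ["B", "A", "C", "D"]
# "ABDC": C moved backward, so C is coded after D.
assert move_backward(BASE_ORDER, "C") == ["A", "B", "D", "C"]
```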
  • the at least one parameter comprises one or more variables used to determine the partitioning of another block.
  • the at least one parameter comprises a spatial parameter.
  • the at least one parameter comprises a quad tree depth, or a block size, of a block neighbouring the current block.
  • the at least one parameter comprises one or more of: an average quad tree depth, or average block size, of two or more blocks neighbouring the current block; a minimum quad tree depth, or minimum block size, of two or more blocks neighbouring the current block; and a maximum quad tree depth, or maximum block size, of two or more blocks neighbouring the current block.
  • the at least one parameter comprises a multi tree depth value of a block neighbouring the current block.
  • the method further comprises a step of comparing the multi tree depth value of the block neighbouring the current block to the multi tree depth value of the current block, wherein the coding order is determined based on the comparison.
  • the method further comprises a step of converting the multi tree depth value of the block neighbouring the current block to a converted value, wherein the method further comprises a step of comparing the converted value to the quad tree depth value of the current block, wherein the coding order is determined based on the comparison.
  • the step of converting the multi tree depth value of the block neighbouring the current block to a converted value comprises dividing the multi tree depth value of the block neighbouring the current block by two.
  • the at least one parameter comprises one or more of: an average multi tree depth of two or more blocks neighbouring the current block; a minimum multi tree depth of two or more blocks neighbouring the current block; and a maximum multi tree depth of two or more blocks neighbouring the current block.
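The neighbour-based statistics and the MTT-to-QT depth conversion described above might be sketched as follows. The function names and the use of integer division are assumptions for illustration; the application only states that the multi tree depth is divided by two.

```python
# Illustrative sketch only: spatial parameters derived from neighbouring blocks.

def neighbour_qt_stats(qt_depths):
    """Average, minimum and maximum quad tree depth of neighbouring blocks."""
    return (sum(qt_depths) / len(qt_depths), min(qt_depths), max(qt_depths))

def mtt_to_qt_depth(mtt_depth):
    """Convert a multi tree depth to a QT-comparable value by dividing by two
    (one QT split halves both dimensions, one BT split halves only one)."""
    return mtt_depth // 2

avg, lo, hi = neighbour_qt_stats([2, 3, 3])
assert (lo, hi) == (2, 3)
assert mtt_to_qt_depth(4) == 2
```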
  • the at least one parameter comprises a binary tree depth value of a block neighbouring the current block.
  • the at least one parameter comprises one or more of: an average binary tree depth of two or more blocks neighbouring the current block; a minimum binary tree depth of two or more blocks neighbouring the current block; and a maximum binary tree depth of two or more blocks neighbouring the current block.
  • the at least one parameter comprises one or more of: an average ternary tree depth of two or more blocks neighbouring the current block; a minimum ternary tree depth of two or more blocks neighbouring the current block; and a maximum ternary tree depth of two or more blocks neighbouring the current block.
  • the method further comprises a step of comparing the height and/or width of at least one block neighbouring the current block to the height and/or width of the current block, wherein the coding order is determined based on the comparison.
  • the method further comprises a step of comparing the height to width ratio of at least one block neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
  • the method further comprises a step of comparing an average of the height to width ratios of two or more blocks neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
  • the method further comprises a step of comparing a maximum of the height to width ratios of two or more blocks neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
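The ratio comparisons above can be sketched as follows. The representation of neighbours as (height, width) pairs and the summary function are assumptions for illustration only.

```python
# Illustrative sketch only: comparing height-to-width ratios of neighbouring
# blocks against the current block, which may steer the MTT split flag order.

def hw_ratio(height, width):
    return height / width

def ratio_summary(neighbours, current):
    """Return the average and maximum neighbour ratio, and the current ratio."""
    ratios = [hw_ratio(h, w) for (h, w) in neighbours]
    return sum(ratios) / len(ratios), max(ratios), hw_ratio(*current)

avg, mx, cur = ratio_summary([(32, 16), (16, 16)], (16, 32))
assert mx == 2.0 and cur == 0.5
# Neighbours taller than the current block may suggest a vertical split is
# likely, hence a different ordering of the MTT flags (illustration only).
```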
  • the or each block neighbouring the current block is also used for context derivation.
  • the or each block neighbouring the current block comprises a block at a bottom left position to the current block and/or a block diagonally below and to the left of the current block.
  • the or each block neighbouring the current block comprises a block at a top right position to the current block and/or a block diagonally above and to the right of the current block.
  • all blocks having a common border or corner with the current block are used as neighbouring blocks.
  • the at least one parameter comprises at least one temporal parameter.
  • the at least one temporal parameter comprises one or more of: an average quad tree depth, or average block size, of a temporal area; a minimum quad tree depth, or minimum block size, of a temporal area; and a maximum quad tree depth, or maximum block size, of a temporal area.
  • the at least one temporal parameter comprises a multi tree depth value of a temporal area.
  • the method further comprises a step of comparing the multi tree depth value of the temporal area to the multi tree depth value of the current block, wherein the coding order is determined based on the comparison.
  • the method further comprises a step of converting the multi tree depth value of the temporal area to a converted value, wherein the method further comprises a step of comparing the converted value to the quad tree depth value of the current block, wherein the coding order is determined based on the comparison.
  • the step of converting the multi tree depth value of the temporal area to a converted value comprises dividing the multi tree depth value of the temporal area by two.
  • the at least one temporal parameter comprises one or more of: an average multi tree depth of a temporal area; a minimum multi tree depth of a temporal area; and a maximum multi tree depth of a temporal area.
  • the at least one temporal parameter comprises a binary tree depth value of a temporal area.
  • the at least one temporal parameter comprises one or more of: an average binary tree depth of a temporal area; a minimum binary tree depth of a temporal area; and a maximum binary tree depth of a temporal area.
  • the at least one temporal parameter comprises a ternary tree depth value of a temporal area.
  • the header is a picture header or a slice header.
  • the at least one parameter comprises a parameter transmitted in a sequence parameter set and/or a picture parameter set.
  • the at least one parameter comprises a fixed number.
  • the at least one parameter comprises a parameter that is dependent on a slice type of the slice to which the current block belongs.
  • the at least one parameter comprises a parameter dependent on the quantization parameter value of the slice to which current block belongs.
  • the at least one parameter comprises a parameter dependent on the temporal ID or the hierarchy depth of the frame to which current block belongs.
  • the at least one parameter comprises a parameter dependent on maximum multi tree depth of the slice to which current block belongs.
  • the at least one parameter comprises a parameter dependent on maximum multi tree depth of the picture to which current block belongs.
  • the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to a quad tree split is coded before a flag related to no split.
  • the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which split_qt_flag is coded before split_cu_flag.
  • the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to no split is coded after a flag related to a binary tree split.
  • the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to no split is coded after a flag related to a ternary tree split.
  • the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which split_qt_flag is coded after mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag.
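A minimal sketch of such an order decision is given below. The comparison rule and threshold are assumptions; the application only states that the order is determined based on at least one parameter, and the second ordering follows the example given later in the text.

```python
# Illustrative sketch only: choose the coding order of the partitioning
# syntax elements from a neighbour-derived depth parameter.

REGULAR_ORDER = ["split_cu_flag", "split_qt_flag",
                 "mtt_split_cu_vertical_flag", "mtt_split_cu_binary_flag"]
QT_FIRST_ORDER = ["split_qt_flag", "mtt_split_cu_vertical_flag",
                  "mtt_split_cu_binary_flag", "split_cu_flag"]

def coding_order(neighbour_qt_depth, current_qt_depth):
    """If neighbours are more deeply split than the current block, a further
    split is probable, so the split flags are moved before the no-split flag."""
    if neighbour_qt_depth > current_qt_depth:
        return QT_FIRST_ORDER
    return REGULAR_ORDER

assert coding_order(3, 1) == QT_FIRST_ORDER
assert coding_order(1, 1) == REGULAR_ORDER
```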
  • the step of determining a coding order for the partitioning syntax elements is performed based on a high level syntax flag.
  • the temporal area comprises a temporal block collocated with respect to the current block.
  • the temporal area comprises one or more temporal block neighbouring a temporal block collocated with respect to the current block.
  • the temporal area comprises a coding tree unit collocated with respect to the current block.
  • the temporal area is determined based on a center of the current block.
  • the temporal area comprises all blocks of a temporal frame.
  • the frame with the same temporal ID as the temporal ID of a frame to which the current block belongs comprises a temporally closest frame with the same temporal ID.
  • Said temporally closest frame could be the closest frame based on an absolute difference in picture order count between the two frames.
  • the temporal area comprises an area of a frame, or a reference frame, with a same quantization parameter as a frame to which the current block belongs.
  • the temporal area comprises an area of a reference frame used for temporal motion vector prediction.
  • the temporal area comprises an area of a reference frame, said reference frame being a temporally closest reference frame to a frame to which the current block belongs.
  • Said temporally closest reference frame could be the closest reference frame based on an absolute difference in picture order count between the two frames.
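Selecting the temporally closest frame by absolute picture order count (POC) difference, as described above, can be sketched as follows (the function name is hypothetical):

```python
# Illustrative sketch only: pick the reference frame whose POC is closest
# to the current frame's POC, by absolute difference.

def closest_reference(current_poc, reference_pocs):
    """Return the reference POC minimising |current_poc - ref_poc|."""
    return min(reference_pocs, key=lambda poc: abs(current_poc - poc))

assert closest_reference(8, [0, 4, 16]) == 4
```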
  • the temporal area comprises a first area of a first reference frame and a second area of a second reference frame.
  • Such a binary split may include horizontal binary splitting and/or vertical binary splitting.
  • a ternary split may include horizontal ternary splitting and/or vertical ternary splitting.
  • the above embodiments refer to binary tree (horizontal and vertical), ternary tree (horizontal and vertical), quad tree and no split being possible splits of a block, it would be understood that the invention is not so limited and other modes may be considered.
  • the coding order of syntax elements relating to other geometrical splits may be determined according to one or more criteria mentioned in the aspects and embodiments above.
  • the methods described above may be disabled for screen content coded image data or video data, or for a low delay configuration, using at least one flag (e.g. transmitted in a header). Whether the image or video data to be encoded or decoded is screen content coded may be determined based on whether the number of blocks, in an area of the frame including the current block or in an (e.g. collocated or temporal) area of another frame, which are Intra block coded or palette mode coded crosses a threshold (i.e. is above a predetermined value, or alternatively is below a predetermined value).
  • whether the image or video data is screen content coded may be indicated by whether the palette mode has been enabled for the image data or video data (e.g. by the setting of a flag in a header).
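The block-counting test described above can be sketched as follows. The mode labels and the strict-greater-than comparison are assumptions; the application allows the threshold test to be either above or below a predetermined value.

```python
# Illustrative sketch only: decide whether an area is screen content by
# counting intra- or palette-coded blocks against a threshold.

def is_screen_content(block_modes, threshold):
    """block_modes: per-block coding mode strings for the area considered."""
    count = sum(1 for m in block_modes if m in ("intra", "palette"))
    return count > threshold

assert is_screen_content(["palette", "intra", "inter", "palette"], 2)
assert not is_screen_content(["inter", "inter", "intra"], 2)
```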
  • the device for encoding image data into a bitstream, the device being configured to perform the method according to any of the aspects and embodiments mentioned above.
  • the device for decoding image data from a bitstream, the device being configured to perform the method of any of the embodiments and aspects mentioned above.
  • the computer program may be provided on its own or may be carried on, by or in a carrier medium.
  • the carrier medium may be non-transitory, for example a storage medium, in particular a computer-readable storage medium.
  • the carrier medium may also be transitory, for example a signal or other transmission medium.
  • the signal may be transmitted via any suitable network, including the Internet.
  • Figure 1 is a diagram which illustrates a coding structure used in HEVC
  • Figure 2 is a block diagram schematically illustrating a data communication system in which one or more embodiments of the invention may be implemented;
  • FIG. 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the invention may be implemented;
  • Figure 4 is a schematic illustrating functional elements of an encoder according to embodiments of the invention.
  • Figure 5 is a schematic illustrating functional elements of a decoder according to embodiments of the invention.
  • Figure 6 shows blocks positioned relative to a current block, including a collocated block;
  • Figure 7 illustrates a temporal random-access GOP (group of pictures) structure for 33 frames with the related Temporal ID and POC (picture order count);
  • FIG. 8 illustrates the 6 possible split modes of VVC
  • Figure 9 illustrates an example of MinQTSize variable
  • Figure 10 illustrates the MaxBTSize and MaxMttDepth
  • FIG. 11 illustrates partitioning constraints
  • Figure 12 illustrates incomplete CTUs in the borders of a frame
  • Figure 13 illustrates an encoding by settings of MaxMttDepth based on the temporal ID
  • Figure 14 illustrates several temporal positions
  • Figure 15 is a diagram showing a system comprising an encoder or a decoder and a communication network according to embodiments;
  • Figure 16 is a schematic block diagram of a computing device for implementation of one or more embodiments;
  • Figure 17 is a diagram illustrating a network camera system
  • Figure 18 is a diagram illustrating a smart phone.
  • Figure 1 relates to a coding structure used in the High Efficiency Video Coding (HEVC) video and Versatile Video Coding (VVC) standards.
  • a video sequence 1 is made up of a succession of digital images i. Each such digital image is represented by one or more matrices. The matrix coefficients represent pixels.
  • An image 2 of the sequence may be divided into slices 3.
  • a slice may in some instances constitute an entire image.
  • These slices are divided into non-overlapping Coding Tree Units (CTUs).
  • a Coding Tree Unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) video standards and conceptually corresponds in structure to macroblock units that were used in several previous video standards.
  • a CTU is also sometimes referred to as a Largest Coding Unit (LCU).
  • a CTU has luma and chroma component parts, each of which component parts is called a Coding Tree Block (CTB). These different color components are not shown in Figure 1.
  • CTB Coding Tree Block
  • a CTU is generally of size 64 pixels x 64 pixels for HEVC, yet for VVC this size can be 128 pixels x 128 pixels.
  • Each CTU may in turn be iteratively divided into smaller variable-size Coding Units (CUs) 5 using a quadtree (QT) decomposition.
  • CUs Coding Units
  • QT quadtree
  • Coding units are the elementary coding elements and are constituted by two kinds of sub-unit called a Prediction Unit (PU) and a Transform Unit (TU).
  • the maximum size of a PU or TU is equal to the CU size.
  • a Prediction Unit corresponds to the partition of the CU for prediction of pixels values.
  • Various different partitions of a CU into PUs are possible as shown by 6 including a partition into 4 square PUs and two different partitions into 2 rectangular PUs.
  • a Transform Unit is an elementary unit that is subjected to spatial transformation using discrete cosine transform (DCT).
  • DCT discrete cosine transform
  • a CU can be partitioned into TUs based on a quadtree representation 7.
  • NAL Network Abstraction Layer
  • coding parameters of the video sequence are stored in dedicated NAL units called parameter sets.
  • SPS Sequence Parameter Set
  • PPS Picture Parameter Set
  • HEVC also includes a Video Parameter Set (VPS) NAL unit which contains parameters describing the overall structure of the bitstream.
  • the VPS is a type of parameter set defined in HEVC, and applies to all of the layers of a bitstream.
  • a layer may contain multiple temporal sub-layers, and all version 1 bitstreams are restricted to a single layer.
  • HEVC has certain layered extensions for scalability and multiview and these will enable multiple layers, with a backwards compatible version 1 base layer.
  • VVC introduces new coding structures, such as subpictures, which are independently coded groups of one or more slices.
  • FIG. 2 illustrates a data communication system in which one or more embodiments of the invention may be implemented.
  • the data communication system comprises a transmission device, in this case a server 201, which is operable to transmit data packets of a data stream to a receiving device, in this case a client terminal 202, via a data communication network 200.
  • the data communication network 200 may be a Wide Area Network (WAN) or a Local Area Network (LAN).
  • WAN Wide Area Network
  • LAN Local Area Network
  • Such a network may be for example a wireless network (WiFi / 802.11a, b or g), an Ethernet network, an Internet network or a mixed network composed of several different networks.
  • the data communication system may be a digital television broadcast system in which the server 201 sends the same data content to multiple clients.
  • the data stream 204 provided by the server 201 may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 201 using a microphone and a camera respectively. In some embodiments data streams may be stored on the server 201 or received by the server 201 from another data provider, or generated at the server 201.
  • the server 201 is provided with an encoder for encoding video and audio streams in particular to provide a compressed bitstream for transmission that is a more compact representation of the data presented as input to the encoder.
  • the compression of the video data may be for example in accordance with the HEVC format or H.264/AVC format or VVC format or the format of data generated by the ECM.
  • the client 202 receives the transmitted bitstream and decodes it to reproduce video images on a display device and the audio data via a loudspeaker.
  • a streaming scenario is considered in the example of Figure 2, it will be appreciated that in some embodiments of the invention the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
  • a video image is transmitted with data representative of compensation offsets for application to reconstructed pixels of the image to provide filtered pixels in a final image.
  • FIG. 3 schematically illustrates a processing device 300 configured to implement at least an embodiment of the present invention.
  • the processing device 300 may be a device such as a micro-computer, a workstation or a light portable device.
  • the device 300 comprises a communication bus 313 connected to:
  • central processing unit 311 such as a microprocessor, denoted CPU;
  • ROM read only memory
  • RAM random access memory 312, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to embodiments of the invention;
  • the apparatus 300 may also include the following components:
  • -a data storage means 304 such as a hard disk, for storing computer programs for implementing methods of one or more embodiments of the invention and data used or produced during the implementation of one or more embodiments of the invention;
  • the disk drive being adapted to read data from the disk 306 or to write data onto said disk;
  • Figure 7 shows a temporal random-access GOP structure for 33 consecutive frames 0 to 32.
  • the length of the vertical line representing each frame corresponds to its temporal ID (e.g. the longest lines correspond to temporal ID 0 and the shortest to temporal ID 5).
  • the frames with a temporal ID 0 are the highest in the temporal hierarchy because they can be decoded independently of all other frames with a higher temporal ID value.
  • the frames with a temporal ID 1 are second in the temporal hierarchy and they can be decoded independently of all other frames with a higher temporal ID value, and so on for the other temporal IDs.
  • MaxQTSize: there is no definition of a MaxQTSize in VVC, so it corresponds to the CTU size.
  • the minimum allowed block size for the width and the height is 4.
  • a set of depths are also defined.
  • Depth is the depth in the tree.
  • a leaf is a terminating node of a tree; the root node of a tree has depth 0. For each split this value is incremented by 1.
  • MttDepth is the depth in the multi tree.
  • the multi tree includes BT splits and TT splits.
  • MaxMttDepth is defined in the VVC specification as the maximum allowed multi tree depth; so, MttDepth must be less than or equal to MaxMttDepth.
  • Figure 10 illustrates the concept of maxMttDepth.
  • In VVC these variables are defined independently for Luma and Chroma.
  • VTM VVC Test Model
  • the variable currBtDepth is the current number of BT splits used to reach the current tree node (or the current block).
  • the variable currMttDepth is the current number of BT splits and TT splits used to reach the current tree node (or the current block).
  • the variable MaxBtDepth corresponds to the variable MaxMttDepth of the VVC specification.
  • the currQtDepth is the current number of QT splits used to reach the current tree node (or the current block).
  • MaxBtDepth is the maximum allowed binary tree depth, i.e. the lowest level at which binary splitting may occur, where the quadtree leaf node is the root (e.g. 3).
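The depth counters defined above can be illustrated with a small sketch. The variable names follow the text; the tuple representation and update rules are assumptions for illustration.

```python
# Illustrative sketch only: updating (currQtDepth, currBtDepth, currMttDepth)
# while walking down a coding tree. The multi tree covers BT and TT splits.

def apply_split(depths, split):
    """depths: (currQtDepth, currBtDepth, currMttDepth); split: 'QT'|'BT'|'TT'."""
    qt, bt, mtt = depths
    if split == "QT":
        return (qt + 1, bt, mtt)
    if split == "BT":
        return (qt, bt + 1, mtt + 1)
    if split == "TT":
        return (qt, bt, mtt + 1)
    raise ValueError(split)

# The root is at depth 0; one QT split followed by one BT split:
d = apply_split((0, 0, 0), "QT")
d = apply_split(d, "BT")
assert d == (1, 1, 1)
```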
  • the VVC partitioning has several restrictions. These restrictions are mainly to avoid the same partitioning after several consecutive splits.
  • Figure 11 illustrates some of these constraints. The idea is to avoid reaching the same partitioning with BT and TT. As depicted in Figure 11(a), two consecutive vertical BT splits are allowed, but a vertical TT followed by a vertical BT split in the center block is not allowed, as depicted in Figure 11(b).
  • In VVC there are additional constraints on the minimum chroma block size and on the maximum TT and BT block sizes for inter blocks. These constraints have been removed in the ECM software.
  • split_cu_flag equal to 0 specifies that a coding unit is not split.
  • split_cu_flag equal to 1 specifies that a coding unit is split into four coding units using a quad split as indicated by the syntax element split_qt_flag, or into two coding units using a binary split, or into three coding units using a ternary split as indicated by the syntax element mtt_split_cu_binary_flag.
  • the binary or ternary split can be either vertical or horizontal as indicated by the syntax element mtt_split_cu_vertical_flag.
  • When split_cu_flag is not present, the value of split_cu_flag is inferred as follows: if one or more of the following conditions are true, the value of split_cu_flag is inferred to be equal to 1:
  • split_qt_flag specifies whether a coding unit is split into coding units with half horizontal and vertical size.
  • split_qt_flag is inferred to be equal to 1:
  • split_qt_flag is inferred to be equal to 0.
  • mtt_split_cu_vertical_flag equal to 0 specifies that a coding unit is split horizontally.
  • mtt_split_cu_vertical_flag equal to 1 specifies that a coding unit is split vertically.
  • mtt_split_cu_binary_flag equal to 0 specifies that a coding unit is split into three coding units using a ternary split.
  • mtt_split_cu_binary_flag equal to 1 specifies that a coding unit is split into two coding units using a binary split.
  • If allowSplitBtHor is equal to TRUE and allowSplitTtVer is equal to TRUE, the value of mtt_split_cu_binary_flag is inferred to be equal to 1 - mtt_split_cu_vertical_flag. Otherwise (allowSplitBtVer is equal to TRUE and allowSplitTtHor is equal to TRUE), the value of mtt_split_cu_binary_flag is inferred to be equal to mtt_split_cu_vertical_flag.
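How the four flags above combine into one of the six VVC split modes can be sketched as follows. The function is an illustration of the flag semantics only; the signalling and inference conditions are omitted.

```python
# Illustrative sketch only: combining the VVC partitioning flags into a
# split mode (NO_SPLIT, QT, BT_HOR, BT_VER, TT_HOR, TT_VER).

def split_mode(split_cu_flag, split_qt_flag=0,
               mtt_vertical_flag=0, mtt_binary_flag=0):
    if split_cu_flag == 0:
        return "NO_SPLIT"
    if split_qt_flag == 1:
        return "QT"
    kind = "BT" if mtt_binary_flag == 1 else "TT"
    direction = "VER" if mtt_vertical_flag == 1 else "HOR"
    return f"{kind}_{direction}"

assert split_mode(0) == "NO_SPLIT"
assert split_mode(1, split_qt_flag=1) == "QT"
assert split_mode(1, 0, mtt_vertical_flag=1, mtt_binary_flag=1) == "BT_VER"
assert split_mode(1, 0, mtt_vertical_flag=0, mtt_binary_flag=0) == "TT_HOR"
```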
  • split_qt_flag specifies that the current block is split using the QT split. It is signalled when split_cu_flag is equal to 1, when the QT split is available and when at least one MTT split is available.
  • the availability of split modes is determined by some rules, including a minimum QT depth and an average QT depth obtained from a temporal area.
  • the maximum MTT depth for a current block is determined based on the value of maximum MTT depth obtained from a temporal area. This value can be incremented or decremented compared to the value for the current picture.
  • the coding order of partitioning syntax elements depends on at least one parameter.
  • the regular coding order is split_cu_flag, split_qt_flag, mtt_split_cu_vertical_flag, followed by mtt_split_cu_binary_flag. This order is fixed. Yet, one or more flags are not coded depending on the allowance of split modes and, of course, once a split mode is identified. This is not the object of the invention.
  • the embodiments of the invention change the coding order with respect to one another according to at least one parameter value.
  • the order can be split_qt_flag, mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag and split_cu_flag.
  • At least one parameter represents a variable, or several variables, used for the splitting of other blocks
  • the advantage is a coding efficiency improvement, as the coding order is adapted to take into account the spatial correlations between block partitionings.
  • the advantage is coding efficiency improvement as the MTT depth of the current block is correlated to the MTT depths of neighbouring blocks.
  • the advantage is coding efficiency improvement as the average, the minimum and the maximum are more relevant to determine the correlation.
  • Emb.P1.S.5 The BT or TT Depth values/average/minimum/maximum.
  • the BT depth and/or the TT depth, or the average, the minimum or the maximum of the BT depth and/or the TT depth, can be considered to change the order. This can be used for all changes of coding order of all flags. Yet it is more interesting for the MTT flags.
  • the advantage is coding efficiency improvement as a finer granularity to set a better coding order improves the coding efficiency.
  • the height and width of the neighbouring blocks are compared to the current height and width of the current block to determine the coding order of the current block.
  • An average, a minimum or a maximum of the ratios can also be considered.
  • the advantage is a coding efficiency improvement, as the ratio gives information on the direction of the neighbouring blocks compared to the current block. So, when it is used, it is useful for ordering the MTT split flags.
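A minimal sketch of this ratio-based derivation; the function names and the decision threshold are illustrative assumptions, not the normative derivation:

```python
def aspect_ratio_stats(neighbour_blocks):
    """Average, minimum and maximum width/height ratios of neighbouring blocks.

    Each block is given as a (width, height) pair.
    """
    ratios = [w / h for (w, h) in neighbour_blocks]
    return sum(ratios) / len(ratios), min(ratios), max(ratios)

def vertical_split_flag_first(neighbour_blocks):
    # Hypothetical rule: if the neighbours are on average wider than tall,
    # a vertical split of the current block is more likely, so the vertical
    # MTT flag value could be placed first in the coding order.
    average, _, _ = aspect_ratio_stats(neighbour_blocks)
    return average > 1.0
```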
  • the neighbouring blocks used for the derivation of the spatial parameters to change the coding order of the partitioning are the same neighbouring blocks as used for the context derivation.
  • the advantage is a reduced memory access compared to the following embodiments. Indeed, when using the same blocks as those used for the context derivation, the neighbouring blocks only need to be accessed once.
  • all neighbouring blocks which share a border or a corner with the current block are considered for the derivation of the parameters used to change the coding order of partitioning syntax elements.
  • the blocks A0, A1, A2 and B0, B1, B2, B3 are considered.
  • At least one parameter is a temporal parameter
  • At least one parameter is a temporal parameter, to exploit temporal correlations.
  • the temporal parameter or parameters are derived from a temporal area.
  • the temporal area is defined later in the description of the invention.
  • Emb.P1.T.5 The BT or TT Depth values/average/minimum/maximum.
  • the BT depth and/or the TT depth, or the average, the minimum or the maximum of the BT depth and/or the TT depth, can be considered to change the order. This can be used for all changes of coding order of all flags. Yet it is more interesting for the MTT flags.
  • the advantage is a coding efficiency improvement, as a finer granularity to set a better coding order improves the coding efficiency.
  • the height and the width or an average of the height or the width of the blocks from a temporal area are compared to the current height and width of the current block to determine the coding order of the partitioning syntax elements for the current block.
  • An average, a minimum or a maximum of a ratio between the height and width of the blocks from a temporal area can alternatively or additionally be considered.
  • the ratio between the height and width of the temporal area is compared to the ratio of the height and the width of the current block.
  • the advantage is a coding efficiency improvement, as the ratio gives information on the direction of the temporal area compared to the current block. So, when it is used, it is useful for ordering the MTT split flag values.
  • At least one parameter transmitted in a header is considered to change the order of the partitioning syntax elements for the current block.
  • the parameter or parameters can be compared to the current QT depth or the current MTT depth depending on the change order that can be used.
  • At least one parameter is transmitted in the SPS.
  • At least one parameter is transmitted in the picture header.
  • At least one parameter is a fixed number and it is considered to change the order of the partitioning syntax elements for the current block. If several parameters are needed, several fixed numbers are considered.
  • the advantage is coding efficiency improvement compared to the previous embodiment with a similar complexity.
  • At least one parameter depends on the usage of Intra mode and Inter mode in the neighbourhood.
  • the neighbourhood here includes spatial and/or temporal blocks.
  • the proportion of pixels coded in Intra compared to the pixels coded in Inter is used to determine the parameter or the parameters to change the order of the partitioning syntax elements for the current block.
  • the advantage is coding efficiency improvement compared to the previous embodiment but with additional complexity.
  • At least one parameter depends on the QP value of the current block or on the QP value of the slice.
  • At least one parameter depends on the temporal ID or on the hierarchy depth of the frame of the current block.
  • the split_qt_flag is coded before the split_cu_flag according to at least one parameter, as described in the following table of syntax elements.
  • If split_qt_flag is equal to 1 (or true), the other syntax elements do not need to be decoded. Otherwise, if split_qt_flag is equal to 0 (or false), the split_cu_flag is decoded and, if it is equal to 1, the regular decoding of the other flags is applied, except for the split_qt_flag, which is decoded only when the current QT depth is greater than or equal to the QT depth predictor. Indeed, if the current QT depth is lower than the QT depth predictor, the split_qt_flag has already been decoded. Please note that this is an example and the last condition is just a check of whether split_qt_flag has been decoded or not. For example, another variable representing whether split_qt_flag has been decoded can be used. Or, if the variable split_qt_flag is initialized to 1, the check to decode split_qt_flag can be whether split_qt_flag is equal to 1.
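The parsing flow above can be sketched as follows in Python; read_bit stands in for the entropy decoding of one flag, and the conditions are simplified relative to the full VVC syntax (the MTT decoding is abbreviated to a single flag):

```python
def decode_split_flags(read_bit, qt_depth, qt_depth_pred):
    """Parsing flow with split_qt_flag decoded before split_cu_flag."""
    split_qt_flag = 0
    if qt_depth < qt_depth_pred:
        # A QT split is still expected: decode split_qt_flag first.
        split_qt_flag = read_bit()
    if split_qt_flag:
        return {"split_qt_flag": 1}          # QT split: nothing else decoded
    split_cu_flag = read_bit()
    if not split_cu_flag:
        return {"split_cu_flag": 0}          # the block is not split
    if qt_depth >= qt_depth_pred:
        # split_qt_flag has not been decoded yet: decode it now.
        split_qt_flag = read_bit()
        if split_qt_flag:
            return {"split_qt_flag": 1}
    # Regular decoding of the MTT flags (abbreviated to one flag here).
    return {"mtt_split_cu_vertical_flag": read_bit()}
```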
  • the predictor “QTDepthPred” is a QT depth value obtained from a temporal area.
  • the temporal QT depth is correlated to the current QT depth.
  • This QT depth can be a value extracted from a temporal block or an average, a minimum or a maximum value among several temporal blocks.
  • this temporal QT depth value is an average of the QT depths of blocks in a temporal area as described later.
  • the MTT depth predictor is divided by 2 (or right shifted by 1) as follows:
  • If MttDepthPred >> 1 is greater than 0, even if the split_qt_flag is signalled before the split_cu_flag, there is a low chance that the split_cu_flag is equal to 0. So, in that case there are no additional bins signalled compared to the regular coding order.
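This check can be expressed compactly; the function name is a hypothetical label for the decision described above:

```python
def qt_flag_first_is_cheap(mtt_depth_pred):
    # When MttDepthPred >> 1 exceeds 0, the current block is very likely
    # split anyway, so coding split_qt_flag before split_cu_flag adds no
    # bins in practice (illustrative decision rule).
    return (mtt_depth_pred >> 1) > 0
```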
  • the QTDepthPred and the MttDepthPred are obtained spatially and/or temporally.
  • QTDepthPred can be set as follows:
  • the QT depth predictor is set equal to a syntax element decoded from a header.
  • the or these syntax elements can be a picture header syntax element ph_QtDepthPred, and/or a slice header syntax element sh_QtDepthPred, and/or a picture parameter set syntax element pps_QtDepthPred, and/or a sequence parameter set syntax element sps_QtDepthPred.
  • a CTU syntax element ctu_QtDepthPred can also be used.
  • the advantage of this embodiment is the simplification of the parsing/decoding, as no value for the QT depth predictor needs to be updated at block level.
  • the QTdepthpred depends on the type of the slice. For example, QTdepthpred is set equal to 3 for Intra slices and QTdepthpred is set equal to 2 for Inter slices.
  • the usage of Intra or Inter mode in the neighbourhood changes the value of QTdepthpred.
  • the value can be incremented when there are one or more Intra blocks in the neighbourhood. This can also be combined with the other embodiments based on temporal and spatial QT depth predictors.
  • the QTdepthpred depends on the QP value or on high/low bitrate. When the QP value is low (high bitrate), the QTdepthpred is higher than for high QP (low bitrate).
  • the QTdepthpred depends on the temporal ID or the hierarchy depth. For example, for a small temporal ID or hierarchy depth, the QTdepthpred is set to a higher value than for a high temporal ID or hierarchy depth. For example, 3 or 4 for the lowest level (temporal ID 0) and 1 or 0 for the highest level.
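As an illustrative sketch, such a mapping could look like the following; the end points (3 at temporal ID 0, 1 at the highest level) follow the example values above, while the linear interpolation in between and the function name are assumptions:

```python
def qt_depth_pred_from_temporal_id(temporal_id, max_temporal_id=4):
    """Map the temporal ID to a QT depth predictor (illustrative)."""
    if max_temporal_id == 0:
        return 3
    # Decrease linearly from 3 (temporal ID 0) to 1 (highest temporal ID).
    return round(3 - 2 * temporal_id / max_temporal_id)
```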
  • the QTdepthpred depends on the maximum MTT depth for the current picture/slice. For example, when the maximum MTT depth is low (1 or 0), the QTdepthpred has a higher value than when the maximum MTT depth is high (3 or 4).
  • the advantage of this embodiment is the simplification of the parsing/decoding, as no value for the QT depth predictor needs to be updated at block level.
  • the parameter is a fixed value
  • the last MTT decoded split can be, for example, the least probable MTT split among the available MTT splits.
  • the MttDepthPred can be obtained according to one or more of the following examples.
  • the predictor “MttDepthPred” is a MTT depth value obtained from a temporal area.
  • the temporal MTT depth is correlated to the current MTT depth.
  • This MTT depth can be a value extracted from a temporal block or an average, a minimum or a maximum value among several temporal blocks.
  • this temporal MTT depth value is an average of MTT depth of blocks in a temporal area as described later.
  • the MTT depth predictor is set equal to a syntax element decoded from a header.
  • the or these syntax elements can be a picture header syntax element ph_MttDepthPred, and/or a slice header syntax element sh_MttDepthPred, and/or a picture parameter set syntax element pps_MttDepthPred, and/or a sequence parameter set syntax element sps_MttDepthPred.
  • a CTU syntax element ctu_MttDepthPred can also be used.
  • the advantage of this embodiment is the simplification of the parsing/decoding, as no value for the MTT depth predictor needs to be updated at block level.
  • the MttDepthPred depends on the type of the slice. For example, MttDepthPred is set equal to 2 for intra slices and MttDepthPred is set equal to 1 for Inter slices.
  • the usage of Intra or Inter mode in the neighbourhood changes the value of MttDepthPred.
  • the value can be incremented when there are Intra blocks in the neighbourhood.
  • the MttDepthPred depends on the QP value or on high/low bitrate. When the QP value is low (high bitrate), the MttDepthPred is higher than for high QP (low bitrate).
  • the MttDepthPred depends on the temporal ID or the hierarchy depth. For example, for a small temporal ID or hierarchy depth, the MttDepthPred is set to a higher value than for a high temporal ID or hierarchy depth. For example, MttDepthPred is set equal to 3 or 4 for the lowest level (temporal ID 0) and to 1 or 0 for the highest level.
  • the MttDepthPred depends on the maximum MTT depth for the current picture/slice. For example, when the maximum MTT depth is low (1 or 0), the MttDepthPred is set to a higher value than when the maximum MTT depth is high (3 or 4). As in the previous example, the advantage of this embodiment is the simplification of the parsing/decoding, as no value for the MTT depth predictor needs to be updated at block level.
  • the parameter is a fixed value
  • the MttDepthPred is determined based on at least one other value.
  • MttDepthPred can be fixed. In that case, setting MttDepthPred equal to 1 gives beneficial results. A value of 2 has also been found to produce beneficial results.
  • the split_qt_flag is coded after the MTT split flags
  • the split_qt_flag is coded after the MTT split flags as depicted in the following table of syntax elements.
  • the advantage is a coding efficiency improvement in some cases.
  • the improvement is obtained for low bit rates or for some cases as described in the following embodiments.
  • the at least one parameter is at least one of the parameters described in Emb.P1
  • the split_cu_flag is coded after the MTT flags according to at least one parameter; otherwise it is coded in the regular coding order.
  • the current QT depth “QTDepth” is compared to a second depth predictor “QTDepthPred2”. If the regular conditions for QT split decoding are met and the current QT depth is lower than the second QT depth predictor, the split_qt_flag is decoded. In this example, the split_qt_flag is initialized to 0.
  • If split_qt_flag is equal to 0 (or false), whether it has been decoded or not, the other syntax elements are decoded. Otherwise, if split_qt_flag is equal to 1 (or true), the current block is QT split and no other split flags are decoded.
  • If split_qt_flag is equal to 0, i.e. if the split_qt_flag is decoded and equal to 0 or is not decoded, the MTT flags are decoded. If the MTT split is the last decoded split, and if the QT depth is greater than or equal to the second predictor QTDepthPred2, the split_qt_flag is decoded to determine whether the split mode for the current block is the QT split or the last decoded MTT split.
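A hedged Python sketch of this parsing flow; the one-bit-per-candidate selection of the MTT split is a simplification of the real mtt_split_cu_* syntax, and the names are illustrative:

```python
def decode_split_mode(read_bit, qt_depth, qt_depth_pred2, mtt_splits):
    """split_qt_flag may be decoded before split_cu_flag or after the MTT flags.

    mtt_splits lists the available MTT splits from most to least probable.
    """
    split_qt_flag = 0                        # initialized to 0, as in the text
    if qt_depth < qt_depth_pred2:
        split_qt_flag = read_bit()
    if split_qt_flag:
        return "QT"                          # QT split: no other flags decoded
    if not read_bit():                       # split_cu_flag
        return "NO_SPLIT"
    for split in mtt_splits[:-1]:
        if read_bit():
            return split
    # The least probable MTT split was reached: disambiguate it from a
    # QT split if split_qt_flag has not been decoded yet.
    if qt_depth >= qt_depth_pred2:
        split_qt_flag = read_bit()
    return "QT" if split_qt_flag else mtt_splits[-1]
```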
  • the last MTT decoded split can be, for example, the least probable MTT split among the available MTT splits.
  • the advantage is a coding efficiency improvement, as the QT depth is predictable. More precisely, the value of split_qt_flag is often predictable when it is equal to 0. So, it is preferable to avoid the coding of split_qt_flag in some cases.
  • the at least one parameter is at least one of the parameters described previously. This “at least one parameter” is used to obtain MttDepthPred in this example.
  • the split_qt_flag is coded before the split_cu_flag according to the condition defined previously and the split_cu_flag is coded after the MTT split flags according to some other conditions as defined previously.
  • the following table of syntax elements illustrates this embodiment.
  • the split_qt_flag is coded before the split_cu_flag according to the condition described previously and the split_qt_flag is also coded after the MTT split flags according to some other conditions as defined previously.
  • the following table of syntax elements illustrates this embodiment.
  • the split_cu_flag is coded after the MTT split flags according to the condition described previously and the split_qt_flag is not coded according to some other conditions as defined previously, as depicted in the following table of syntax elements.
  • the advantage is a coding efficiency improvement as it combines the efficiency of both change orders.
  • Emb.CO1 + Emb.CO2 + Emb.CO3 In one embodiment, the three change orders are combined as depicted in the following table of syntax elements.
  • the advantage is a coding efficiency improvement as it combines the efficiency of these three change orders.
  • the proposed invention is not limited to the VVC partitioning and can be adapted for other types of partitioning, for example if additional partitioning types are added.
  • the method is adapted to take into account additional MTT modes; for example, an asymmetric binary split can be added to the list of MTT split modes.
  • the advantage is a coding efficiency improvement as for the proposed examples described previously.
  • each change order has its own flag to enable or disable it.
  • Emb.O3 QT depth can be replaced by Block size
  • all embodiments using a QT depth or a temporal QT depth can replace this with the block size, as this represents the same principle. Indeed, the QT split occurs only at the beginning of the split process, because the QT split is not allowed after an MTT split, so the QT depth can easily be derived from the block size.
  • this embodiment increases the coding efficiency, as the determined temporal QT depth, the temporal block size, the average temporal MTT depth or the maximum MTT depth is more often reliable.
  • the collocated block or several temporal blocks or a temporal area come from the closest frame with the same temporal ID.
  • the closest frame with the same temporal ID is (generally) more correlated with the current frame than the others, so the result is better.
  • a reference frame which is the same as that used for the temporal motion vector prediction
  • this embodiment gives the best coding efficiency even if this reference frame has a lower QP. Indeed, it is closer to the current frame compared to all frames with the same temporal ID.
  • the collocated block or several temporal blocks or a temporal area come from the closest reference frame.
  • two reference frames are considered, and two temporal areas, or two sets of several blocks, or two collocated blocks are used to determine two temporal QT depths or two block sizes. These are then used to determine one QT depth, one block size or one MTT depth. For example, the minimum of the QT depths from the two temporal areas can be considered.
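The combination of the two temporal areas can be sketched as below; averaging per area and combining with min() follow the example above, while the function and argument names are illustrative:

```python
def combined_temporal_qt_depth(area_depths_ref0, area_depths_ref1, combine=min):
    """Combine the QT depths of two temporal areas into one predictor.

    Each argument is the list of QT depths of the blocks in one temporal area.
    """
    depth0 = sum(area_depths_ref0) / len(area_depths_ref0)   # average per area
    depth1 = sum(area_depths_ref1) / len(area_depths_ref1)
    return combine(depth0, depth1)                           # e.g. the minimum
```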
  • More than two reference frames can also be considered.
  • the advantage is a better compromise between encoder time reduction and coding efficiency as the QT depth value or the MTT depth value is computed from more data. This is particularly efficient when both reference frames have the same temporal distance, but it increases the number of memory accesses.
  • In VVC, the syntax elements representing the different split modes are coded, if needed, in the following order:
  • the syntax elements are coded in the following order, otherwise the syntax elements are coded as usual:
  • Figure 15 shows a system 191, 195 comprising at least one of an encoder 150 or a decoder 100 and a communication network 199 according to embodiments of the present invention.
  • the system 195 is for processing and providing a content (for example, a video and audio content for displaying/outputting or streaming video/audio content) to a user, who has access to the decoder 100, for example through a user interface of a user terminal comprising the decoder 100 or a user terminal that is communicable with the decoder 100.
  • a user terminal may be a computer, a mobile phone, a tablet or any other type of a device capable of providing/displaying the (provided/streamed) content to the user.
  • the system 195 obtains/receives a bitstream 101 (in the form of a continuous stream or a signal - e.g. while earlier video/audio are being displayed/output) via the communication network 199.
  • the system 191 is for processing a content and storing the processed content, for example a video and audio content processed for displaying/outputting/streaming at a later time.
  • the system 191 obtains/receives a content comprising an original sequence of images 151, which is received and processed (including filtering with a deblocking filter according to the present invention) by the encoder 150, and the encoder 150 generates a bitstream 101 that is to be communicated to the decoder 100 via the communication network 199.
  • the bitstream 101 is then communicated to the decoder 100 in a number of ways, for example it may be generated in advance by the encoder 150 and stored as data in a storage apparatus in the communication network 199 (e.g. on a server or a cloud storage) until a user requests the content (i.e. the bitstream data) from the storage apparatus, at which point the data is communicated/streamed to the decoder 100 from the storage apparatus.
  • the system 191 may also comprise a content providing apparatus for providing/streaming, to the user (e.g. by communicating data for a user interface to be displayed on a user terminal), content information for the content stored in the storage apparatus (e.g.
  • the encoder 150 generates the bitstream 101 and communicates/streams it directly to the decoder 100 as and when the user requests the content.
  • the decoder 100 then receives the bitstream 101 (or a signal) and performs filtering with a deblocking filter according to the invention to obtain/generate a video signal 109 and/or audio signal, which is then used by a user terminal to provide the requested content to the user.
  • the executable code may be stored either in the ROM 3603, on the HD 3606 or on a removable digital medium such as, for example a disk.
  • the executable code of the programs can be received by means of a communication network, via the NET 3604, in order to be stored in one of the storage means of the communication device 3600, such as the HD 3606, before being executed.
  • the CPU 3601 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 3601 is capable of executing instructions from main RAM memory 3602 relating to a software application after those instructions have been loaded from the program ROM 3603 or the HD 3606, for example.
  • a software application when executed by the CPU 3601, causes the steps of the method according to the invention to be performed.
  • a decoder according to an aforementioned embodiment is provided in a user terminal such as a computer, a mobile phone (a cellular phone), a tablet or any other type of device (e.g. a display apparatus) capable of providing/displaying a content to a user.
  • an encoder is provided in an image capturing apparatus which also comprises a camera, a video camera or a network camera (e.g. a closed-circuit television or video surveillance camera) which captures and provides the content for the encoder to encode. Two such examples are provided below with reference to Figures 17 and 18.
  • Figure 17 is a diagram illustrating a network camera system 3700 including a network camera 3702 and a client apparatus 202.
  • the network camera 3702 includes an imaging unit 3706, an encoding unit 3708, a communication unit 3710, and a control unit 3712.
  • the network camera 3702 and the client apparatus 202 are mutually connected to be able to communicate with each other via the network 200.
  • the imaging unit 3706 includes a lens and an image sensor (e.g., a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS)), and captures an image of an object and generates image data based on the image.
  • This image can be a still image or a video image.
  • the encoding unit 3708 encodes the image data by using said encoding methods explained above, or a combination of encoding methods described above.
  • the communication unit 3710 of the network camera 3702 transmits the encoded image data encoded by the encoding unit 3708 to the client apparatus 202.
  • the communication unit 3710 receives commands from client apparatus 202.
  • the commands include commands to set parameters for the encoding of the encoding unit 3708.
  • the client apparatus 202 includes a communication unit 3714, a decoding unit 3716, and a control unit 3718.
  • the communication unit 3714 of the client apparatus 202 transmits the commands to the network camera 3702.
  • the communication unit 3714 of the client apparatus 202 receives the encoded image data from the network camera 3702.
  • the control unit 3718 of the client apparatus 202 controls other units in the client apparatus 202 in accordance with the user operation or commands received by the communication unit 3714.
  • the control unit 3718 of the client apparatus 202 controls a display apparatus 2120 so as to display an image decoded by the decoding unit 3716.
  • the control unit 3718 of the client apparatus 202 also controls the display apparatus 2120 so as to display a GUI (Graphical User Interface) to designate values of the parameters for the network camera 3702, including the parameters for the encoding of the encoding unit 3708.
  • the control unit 3718 of the client apparatus 202 also controls other units in the client apparatus 202 in accordance with user operation input to the GUI displayed by the display apparatus 2120.
  • the control unit 3718 of the client apparatus 202 controls the communication unit 3714 of the client apparatus 202 so as to transmit the commands to the network camera 3702 which designate values of the parameters for the network camera 3702, in accordance with the user operation input to the GUI displayed by the display apparatus 2120.
  • Figure 18 is a diagram illustrating a smart phone 3800.
  • the smart phone 3800 includes a communication unit 3802, a decoding unit 3804, a control unit 3806 and a display unit 3808.
  • the communication unit 3802 receives the encoded image data via network 200.
  • the decoding unit 3804 decodes the encoded image data received by the communication unit 3802.
  • the decoding / encoding unit 3804 decodes / encodes the encoded image data by using said decoding methods explained above.
  • the control unit 3806 controls other units in the smart phone 3800 in accordance with a user operation or commands received by the communication unit 3802.
  • the control unit 3806 controls the display unit 3808 so as to display an image decoded by the decoding unit 3804.
  • the smart phone 3800 may also comprise sensors 3812 and an image recording device 3810. In this way, the smart phone 3800 may record images and encode them (using a method described above).
  • the smart phone 3800 may subsequently decode the encoded images (using a method described above) and display them via the display unit 3808 - or transmit the encoded images to another device via the communication unit 3802 and network 200.
  • any result of comparison, determination, assessment, selection, execution, performing, or consideration described above may be indicated in or determinable/inferable from data in a bitstream, for example a flag or data indicative of the result, so that the indicated or determined/inferred result can be used in the processing instead of actually performing the comparison, determination, assessment, selection, execution, performing, or consideration, for example during a decoding process.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Improvements to the processing of partitioning data for image and video data are described. Image data is encoded into or decoded from a bitstream. The bitstream includes data indicating a partitioning of the image data into a plurality of blocks according to a coding tree, wherein blocks in the coding tree may be partitioned according to a plurality of split modes, each split mode being indicated by a respective partitioning syntax element. For a current block to be decoded, a coding order for the partitioning syntax elements is determined using at least one other parameter associated with the image data.

Description

IMAGE AND VIDEO CODING AND DECODING
Field of invention
The present invention relates to encoding and decoding of image and video data and particularly, but not exclusively, image and video partitioning data.
Background
The Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16’s VCEG, released a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before). The main target applications and services include, but are not limited to, 360-degree and high-dynamic-range (HDR) videos. Particular effectiveness was shown on ultra-high-definition (UHD) video test material. Thus, compression efficiency gains well beyond the targeted 50% may be expected for the final standard.
Since the end of the standardisation of VVC v1, JVET has launched an exploration phase by establishing an exploration software (ECM). It gathers additional tools and improvements of existing tools on top of the VVC standard to target better coding efficiency.
Summary of Invention
In accordance with a first aspect of the invention there is provided a method of encoding or decoding image data into or from a bitstream, the bitstream including data indicating a partitioning of the image data into a plurality of blocks according to a coding tree, wherein blocks in the coding tree may be partitioned according to a plurality of split modes, each split mode being indicated by a respective partitioning syntax element, the method comprising: determining, for a current block to be decoded, a coding order for the partitioning syntax elements, wherein the determining is based on at least one parameter.
The term “coding order” refers to a given sequence of partitioning syntax elements with respect to the order in which they appear in the bitstream. For example, ABCD, where each of A, B, C and D each represent a respective partitioning syntax element, represents a first coding order, whereby A is coded, followed by B, followed by C, followed by D. As a further example, BACD represents a second coding order in which the partitioning syntax element B has been moved forward (i.e. in the direction of the beginning of the coding order) in the coding order, such that B is coded before A. As a further example, ABDC represents a third coding order in which the partitioning syntax element C has been moved backward (i.e. in the direction of the end of the coding order) in the coding order, such that C is coded after D.
Optionally, the at least one parameter comprises one or more variable used to determine the partitioning of another block.
Optionally, the at least one parameter comprises a spatial parameter.
Optionally, the at least one parameter comprises a quad tree depth, or a block size, of a block neighbouring the current block.
Optionally, the at least one parameter comprises one or more of: an average quad tree depth, or average block size, of two or more blocks neighbouring the current block; a minimum quad tree depth, or minimum block size, of two or more blocks neighbouring the current block; and a maximum quad tree depth, or maximum block size, of two or more blocks neighbouring the current block.
Optionally, the at least one parameter comprises a multi tree depth value of a block neighbouring the current block.
Optionally, the method further comprises a step of comparing the multi tree depth value of the block neighbouring the current block to the multi tree depth value of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of converting the multi tree depth value of the block neighbouring the current block to a converted value, wherein the method further comprises a step of comparing the converted value to the quad tree depth value of the current block, wherein the coding order is determined based on the comparison.
Optionally, the step of converting the multi tree depth value of the block neighbouring the current block to a converted value comprises dividing the multi tree depth value of the block neighbouring the current block by two.
Optionally, the at least one parameter comprises one or more of: an average multi tree depth of two or more blocks neighbouring the current block; a minimum multi tree depth of two or more blocks neighbouring the current block; and a maximum multi tree depth of two or more blocks neighbouring the current block.
Optionally, the at least one parameter comprises a binary tree depth value of a block neighbouring the current block. Optionally, the at least one parameter comprises one or more of: an average binary tree depth of two or more blocks neighbouring the current block; a minimum binary tree depth of two or more blocks neighbouring the current block; and a maximum binary tree depth of two or more blocks neighbouring the current block.
Optionally, the at least one parameter comprises a ternary tree depth value of a block neighbouring the current block.
Optionally, the at least one parameter comprises one or more of: an average ternary tree depth of two or more blocks neighbouring the current block; a minimum ternary tree depth of two or more blocks neighbouring the current block; and a maximum ternary tree depth of two or more blocks neighbouring the current block.
Optionally, the method further comprises a step of comparing the height and/or width of at least one block neighbouring the current block to the height and/or width of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of comparing the height to width ratio of at least one block neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of comparing an average of the height to width ratios of two or more blocks neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of comparing a minimum of the height to width ratios of two or more blocks neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of comparing a maximum of the height to width ratios of two or more blocks neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
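The average, minimum and maximum ratio comparisons above can be illustrated with a short sketch; the function name, input format and the greater-than decision are assumptions made for the purpose of the example:

```python
def ratio_exceeds_current(neighbour_sizes, current_size, mode="average"):
    """Compare a statistic of neighbouring height to width ratios to the
    current block's ratio (illustrative only).
    neighbour_sizes: list of (height, width); current_size: (height, width)."""
    ratios = [h / w for (h, w) in neighbour_sizes]
    if mode == "average":
        stat = sum(ratios) / len(ratios)
    elif mode == "minimum":
        stat = min(ratios)
    else:  # "maximum"
        stat = max(ratios)
    current_ratio = current_size[0] / current_size[1]
    # The coding order would then be determined based on this comparison.
    return stat > current_ratio
```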
Optionally, the or each block neighbouring the current block is also used for context derivation.
Optionally, the or each block neighbouring the current block comprises a block at a bottom left position to the current block and/or a block diagonally below and to the left of the current block.
Optionally, the or each block neighbouring the current block comprises a block at a top right position to the current block and/or a block diagonally above and to the right of the current block. Optionally, all blocks having a common border or corner with the current block are used as neighbouring blocks.
Optionally, the at least one parameter comprises at least one temporal parameter.
Optionally, the at least one temporal parameter comprises a quad tree depth value, or block size, computed from a temporal area.
Optionally, the at least one temporal parameter comprises one or more of: an average quad tree depth, or average block size, of a temporal area; a minimum quad tree depth, or minimum block size, of a temporal area; and a maximum quad tree depth, or maximum block size, of a temporal area.
Optionally, the at least one temporal parameter comprises a multi tree depth value of a temporal area.
Optionally, the method further comprises a step of comparing the multi tree depth value of the temporal area to the multi tree depth value of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of converting the multi tree depth value of the temporal area to a converted value, wherein the method further comprises a step of comparing the converted value to the quad tree depth value of the current block, wherein the coding order is determined based on the comparison.
Optionally, the step of converting the multi tree depth value of the temporal area to a converted value comprises dividing the multi tree depth value of the temporal area by two.
Optionally, the at least one temporal parameter comprises one or more of: an average multi tree depth of a temporal area; a minimum multi tree depth of a temporal area; and a maximum multi tree depth of a temporal area.
Optionally, the at least one temporal parameter comprises a binary tree depth value of a temporal area.
Optionally, the at least one temporal parameter comprises one or more of: an average binary tree depth of a temporal area; a minimum binary tree depth of a temporal area; and a maximum binary tree depth of a temporal area.
Optionally, the at least one temporal parameter comprises a ternary tree depth value of a temporal area.
Optionally, the at least one temporal parameter comprises one or more of: an average ternary tree depth of a temporal area; a minimum ternary tree depth of a temporal area; and a maximum ternary tree depth of a temporal area. Optionally, the method further comprises a step of comparing the height and/or width of at least one block of a temporal area to the height and/or width of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of comparing the height to width ratio of at least one block of a temporal area to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of comparing an average of the height to width ratios of two or more blocks of a temporal area to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of comparing a minimum of the height to width ratios of two or more blocks of a temporal area to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
Optionally, the method further comprises a step of comparing a maximum of the height to width ratios of two or more blocks of a temporal area to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
Optionally, the at least one parameter comprises a parameter transmitted in a header.
Optionally, the header is a picture header or a slice header.
Optionally, the at least one parameter comprises a parameter transmitted in a picture header and a parameter transmitted in a slice header.
Optionally, the at least one parameter comprises a parameter transmitted in a sequence parameter set and/or a picture parameter set.
Optionally, the at least one parameter comprises a fixed number.
Optionally, the at least one parameter comprises a parameter that is dependent on a slice type of the slice to which the current block belongs.
Optionally, the at least one parameter comprises a parameter that is dependent on a number of blocks in the vicinity of the current block encoded using intra prediction and/or a number of blocks in the vicinity of the current block encoded using inter prediction.
Optionally, at least one of the blocks in the vicinity of the current block belongs to a temporal area.
Optionally, the method further comprises a step of comparing a number of pixels in the vicinity of the current block encoded using intra prediction to a number of pixels in the vicinity of the current block encoded using inter prediction, wherein the coding order is determined based on the comparison. Optionally, the at least one parameter comprises a parameter dependent on the quantization parameter value of the current block.
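The pixel-count comparison mentioned above (intra-coded versus inter-coded pixels in the vicinity of the current block) might look like the following sketch, where the list of (width, height, mode) records is an assumed input format:

```python
def intra_pixels_dominate(blocks):
    """Return True when more pixels in the vicinity of the current block
    were coded with intra prediction than with inter prediction
    (illustrative sketch only)."""
    intra_pixels = sum(w * h for (w, h, mode) in blocks if mode == "intra")
    inter_pixels = sum(w * h for (w, h, mode) in blocks if mode == "inter")
    return intra_pixels > inter_pixels
```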
Optionally, the at least one parameter comprises a parameter dependent on the quantization parameter value of the slice to which the current block belongs.
Optionally, the at least one parameter comprises a parameter dependent on the temporal ID or the hierarchy depth of the frame to which the current block belongs.
Optionally, the at least one parameter comprises a parameter dependent on a maximum multi tree depth of the slice to which the current block belongs.
Optionally, the at least one parameter comprises a parameter dependent on a maximum multi tree depth of the picture to which the current block belongs.
Optionally, the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to a quad tree split is coded before a flag related to no split.
Optionally, the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which split_qt_flag is coded before split_cu_flag.
Optionally, the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to no split is coded after a flag related to a binary tree split.
Optionally, the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to no split is coded after a flag related to a ternary tree split.
Optionally, the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which split_cu_flag is coded after mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag.
Optionally, the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to quad tree split is coded after a flag related to a binary tree split.
Optionally, the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to quad tree split is coded after a flag related to a ternary tree split.
Optionally, the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which split_qt_flag is coded after mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag. Optionally, the step of determining a coding order for the partitioning syntax elements is performed based on a high level syntax flag.
Optionally, the high level syntax flag is transmitted in a sequence parameter set, a picture parameter set, a picture header or a slice header.
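As a non-limiting sketch, a high level syntax flag could switch between the conventional VVC-style order of the partitioning syntax elements and a reordered one; the flag name and exact list contents here are illustrative assumptions:

```python
def partition_syntax_order(reordering_enabled):
    """Return the order in which partitioning syntax elements are coded,
    selected by a hypothetical high level syntax flag (sketch only)."""
    if reordering_enabled:
        # No-split and quad tree flags coded after the MTT flags.
        return ["mtt_split_cu_vertical_flag",
                "mtt_split_cu_binary_flag",
                "split_cu_flag",
                "split_qt_flag"]
    # Conventional order.
    return ["split_cu_flag",
            "split_qt_flag",
            "mtt_split_cu_vertical_flag",
            "mtt_split_cu_binary_flag"]
```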
Optionally, the temporal area comprises a temporal block collocated with respect to the current block.
Optionally, the temporal area comprises one or more temporal blocks neighbouring a temporal block collocated with respect to the current block.
Optionally, the temporal area comprises a coding tree unit collocated with respect to the current block.
Optionally, the temporal area is determined based on a center of the current block.
Optionally, the temporal area comprises all blocks of a temporal frame.
Optionally, the temporal area comprises an area of a frame with a same temporal ID as the temporal ID of a frame to which the current block belongs.
Optionally, the frame with the same temporal ID as the temporal ID of a frame to which the current block belongs comprises a temporally closest frame with the same temporal ID. Said temporally closest frame could be the closest frame based on an absolute difference in picture order count between the two frames.
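The closest-frame selection based on the absolute difference in picture order count can be sketched as follows; the list of (poc, temporal_id) pairs is an assumed representation of the available decoded frames:

```python
def closest_poc_same_tid(current_poc, current_tid, frames):
    """Among frames with the same temporal ID as the current frame,
    return the POC of the temporally closest one (illustrative sketch)."""
    candidates = [poc for (poc, tid) in frames if tid == current_tid]
    # Temporal closeness measured as absolute POC difference.
    return min(candidates, key=lambda poc: abs(poc - current_poc))
```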
Optionally, the temporal area comprises an area of a frame, or a reference frame, with a same quantization parameter as a frame to which the current block belongs.
Optionally, the temporal area comprises an area of a reference frame used for temporal motion vector prediction.
Optionally, the temporal area comprises an area of a reference frame, said reference frame being a temporally closest reference frame to a frame to which the current block belongs. Said temporally closest reference frame could be the closest reference frame based on an absolute difference in picture order count between the two frames.
Optionally, the temporal area comprises a first area of a first reference frame and a second area of a second reference frame.
In the aspects and embodiments we refer to a binary split. Such a binary split may include horizontal binary splitting and/or vertical binary splitting. In the aspects and embodiments we refer to a ternary split. Such a ternary split may include horizontal ternary splitting and/or vertical ternary splitting.
Further, although the above embodiments refer to binary tree (horizontal and vertical), ternary tree (horizontal and vertical), quad tree and no split being possible splits of a block, it would be understood that the invention is not so limited and other modes may be considered. For example, the coding order of syntax elements relating to other geometrical splits may be determined according to one or more criteria mentioned in the aspects and embodiments above. In further embodiments, the methods described above may be disabled for screen content coded image data or video data, for a low delay configuration, using at least one flag (e.g. transmitted in a header). Whether the image or video data to be encoded or decoded is screen content coded image data may be determined based on whether a number of blocks in an area of the frame including the current block or an (e.g. collocated or temporal) area of another frame which are Intra block coded or are palette mode coded crosses a threshold (is above a predetermined value or alternatively is below a predetermined value). Alternatively, whether the image or video data is screen content coded may be indicated by whether the palette mode has been enabled for the image data or video data (e.g. by the setting of a flag in a header).
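By way of example only, the threshold-based screen content determination described above might be sketched as follows; the default 0.5 threshold and the mode labels are arbitrary assumptions made for this sketch:

```python
def looks_like_screen_content(block_modes, threshold=0.5):
    """Heuristically flag an area as screen content when the fraction of
    intra- or palette-coded blocks crosses a threshold (sketch only)."""
    flagged = sum(1 for mode in block_modes if mode in ("intra", "palette"))
    return flagged / len(block_modes) > threshold
```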
Other aspects of the invention relate to a corresponding encoding device, decoding device, and computer program operable to carry out the decoding and/or encoding methods of the invention.
In a further aspect according to the present invention, there is provided a device for encoding image data into a bitstream, the device being configured to perform the method according to any of the aspects and embodiments mentioned above.
In another aspect according to the present invention, there is provided a device for decoding image data from a bitstream, the device being configured to perform the method of any of the embodiments and aspects mentioned above.
In a yet further aspect, there is provided a computer program which is arranged to, upon execution, cause the method of any of the aspects and embodiments to be performed.
The computer program may be provided on its own or may be carried on, by or in a carrier medium. The carrier medium may be non-transitory, for example a storage medium, in particular a computer-readable storage medium. The carrier medium may also be transitory, for example a signal or other transmission medium. The signal may be transmitted via any suitable network, including the Internet. Further features of the invention are characterised by the independent and dependent claims.
Any feature in one aspect of the invention may be applied to other aspects of the invention, in any appropriate combination. In particular, method aspects may be applied to apparatus aspects, and vice versa.
Furthermore, features implemented in hardware may be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly. Any apparatus feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory.
It should also be appreciated that particular combinations of the various features described and defined in any aspects of the invention can be implemented and/or supplied and/or used independently.
Brief Description of the Drawings
Reference will now be made, by way of example, to the accompanying drawings, in which:
Figure 1 is a diagram which illustrates a coding structure used in HEVC;
Figure 2 is a block diagram schematically illustrating a data communication system in which one or more embodiments of the invention may be implemented;
Figure 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the invention may be implemented;
Figure 4 is a schematic illustrating functional elements of an encoder according to embodiments of the invention;
Figure 5 is a schematic illustrating functional elements of a decoder according to embodiments of the invention;
Figure 6 shows blocks positioned relative to a current block including a collocated block;
Figure 7 illustrates a temporal random-access GOP (group of pictures) structure for 33 frames with the related Temporal ID and POC (picture order count);
Figure 8 illustrates the 6 possible split modes of VVC;
Figure 9 illustrates an example of MinQTSize variable;
Figure 10 illustrates the MaxBTSize and MaxMttDepth;
Figure 11 illustrates partitioning constraints;
Figure 12 illustrates incomplete CTUs in the borders of a frame;
Figure 13 illustrates an encoding using settings of MaxMttDepth based on the temporal ID;
Figure 14 illustrates several temporal positions;
Figure 15 is a diagram showing a system comprising an encoder or a decoder and a communication network according to embodiments;
Figure 16 is a schematic block diagram of a computing device for implementation of one or more embodiments;
Figure 17 is a diagram illustrating a network camera system; and
Figure 18 is a diagram illustrating a smart phone.
Detailed description
Figure 1 relates to a coding structure used in the High Efficiency Video Coding (HEVC) video and Versatile Video Coding (VVC) standards. A video sequence 1 is made up of a succession of digital images i. Each such digital image is represented by one or more matrices. The matrix coefficients represent pixels.
An image 2 of the sequence may be divided into slices 3. A slice may in some instances constitute an entire image. These slices are divided into non-overlapping Coding Tree Units (CTUs). A Coding Tree Unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) video standards and conceptually corresponds in structure to macroblock units that were used in several previous video standards. A CTU is also sometimes referred to as a Largest Coding Unit (LCU). A CTU has luma and chroma component parts, each of which component parts is called a Coding Tree Block (CTB). These different color components are not shown in Figure 1.
A CTU is generally of size 64 pixels x 64 pixels for HEVC, while for VVC this size can be 128 pixels x 128 pixels. Each CTU may in turn be iteratively divided into smaller variable-size Coding Units (CUs) 5 using a quadtree (QT) decomposition.
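The quad tree decomposition of a CTU into CUs can be illustrated with a small recursion; the split_decision predicate stands in for the encoder's actual rate-distortion choice and is purely a placeholder:

```python
def quadtree_leaves(x, y, size, split_decision, min_size=4):
    """List the leaf CUs obtained by recursively quad-splitting a CTU
    (illustrative sketch; real codecs apply many more constraints)."""
    if size <= min_size or not split_decision(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    # A quad tree split produces four square sub-blocks of half size.
    for (dx, dy) in ((0, 0), (half, 0), (0, half), (half, half)):
        leaves += quadtree_leaves(x + dx, y + dy, half, split_decision, min_size)
    return leaves
```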
Coding units are the elementary coding elements and are constituted by two kinds of sub-unit called a Prediction Unit (PU) and a Transform Unit (TU). The maximum size of a PU or TU is equal to the CU size. A Prediction Unit corresponds to the partition of the CU for prediction of pixels values. Various different partitions of a CU into PUs are possible as shown by 6 including a partition into 4 square PUs and two different partitions into 2 rectangular PUs. A Transform Unit is an elementary unit that is subjected to spatial transformation using discrete cosine transform (DCT). A CU can be partitioned into TUs based on a quadtree representation 7.
Each slice is embedded in one Network Abstraction Layer (NAL) unit. In addition, the coding parameters of the video sequence are stored in dedicated NAL units called parameter sets. In HEVC and H.264/AVC two kinds of parameter set NAL units are employed: first, a Sequence Parameter Set (SPS) NAL unit that gathers all parameters that are unchanged during the whole video sequence. Typically, it handles the coding profile, the size of the video frames and other parameters. Second, a Picture Parameter Set (PPS) NAL unit includes parameters that may change from one image (or frame) to another of a sequence. HEVC also includes a Video Parameter Set (VPS) NAL unit which contains parameters describing the overall structure of the bitstream. The VPS is a type of parameter set defined in HEVC, and applies to all of the layers of a bitstream. A layer may contain multiple temporal sub-layers, and all version 1 bitstreams are restricted to a single layer. HEVC has certain layered extensions for scalability and multiview and these will enable multiple layers, with a backwards compatible version 1 base layer.
Other ways of splitting an image have been introduced in VVC including subpictures, which are independently coded groups of one or more slices.
Figure 2 illustrates a data communication system in which one or more embodiments of the invention may be implemented. The data communication system comprises a transmission device, in this case a server 201, which is operable to transmit data packets of a data stream to a receiving device, in this case a client terminal 202, via a data communication network 200. The data communication network 200 may be a Wide Area Network (WAN) or a Local Area Network (LAN). Such a network may be for example a wireless network (WiFi / 802.11a, b or g), an Ethernet network, an Internet network or a mixed network composed of several different networks. In a particular embodiment of the invention the data communication system may be a digital television broadcast system in which the server 201 sends the same data content to multiple clients.
The data stream 204 provided by the server 201 may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 201 using a microphone and a camera respectively. In some embodiments data streams may be stored on the server 201 or received by the server 201 from another data provider, or generated at the server 201. The server 201 is provided with an encoder for encoding video and audio streams in particular to provide a compressed bitstream for transmission that is a more compact representation of the data presented as input to the encoder.
In order to obtain a better ratio of the quality of transmitted data to quantity of transmitted data, the compression of the video data may be for example in accordance with the HEVC format, the H.264/AVC format, the VVC format, or the format of data generated by the ECM.
The client 202 receives the transmitted bitstream and decodes the received bitstream to reproduce video images on a display device and the audio data by a loud speaker. Although a streaming scenario is considered in the example of Figure 2, it will be appreciated that in some embodiments of the invention the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
In one or more embodiments of the invention a video image is transmitted with data representative of compensation offsets for application to reconstructed pixels of the image to provide filtered pixels in a final image.
Figure 3 schematically illustrates a processing device 300 configured to implement at least an embodiment of the present invention. The processing device 300 may be a device such as a micro-computer, a workstation or a light portable device. The device 300 comprises a communication bus 313 connected to:
-a central processing unit 311, such as a microprocessor, denoted CPU;
-a read only memory 307, denoted ROM, for storing computer programs for implementing the invention;
-a random access memory 312, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to embodiments of the invention; and
-a communication interface 302 connected to a communication network 303 over which digital data to be processed are transmitted or received.
Optionally, the apparatus 300 may also include the following components:
-a data storage means 304 such as a hard disk, for storing computer programs for implementing methods of one or more embodiments of the invention and data used or produced during the implementation of one or more embodiments of the invention;
-a disk drive 305 for a disk 306, the disk drive being adapted to read data from the disk 306 or to write data onto said disk;
-a screen 309 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 310 or any other pointing means.
The apparatus 300 can be connected to various peripherals, such as for example a digital camera 320 or a microphone 308, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 300.
The communication bus provides communication and interoperability between the various elements included in the apparatus 300 or connected to it. The representation of the bus is not limiting and in particular the central processing unit is operable to communicate instructions to any element of the apparatus 300 directly or by means of another element of the apparatus 300.
The disk 306 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to the invention to be implemented.
The executable code may be stored either in read only memory 307, on the hard disk 304 or on a removable digital medium such as for example a disk 306 as described previously. According to a variant, the executable code of the programs can be received by means of the communication network 303, via the interface 302, in order to be stored in one of the storage means of the apparatus 300 before being executed, such as the hard disk 304.
The central processing unit 311 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means. On powering up, the program or programs that are stored in a non-volatile memory, for example on the hard disk 304 or in the read only memory 307, are transferred into the random access memory 312, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Figure 4 illustrates a block diagram of an encoder according to at least an embodiment of the invention. The encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, at least one corresponding step of a method implementing at least an embodiment of encoding an image of a sequence of images according to one or more embodiments of the invention.
An original sequence of digital images i0 to in 401 is received as an input by the encoder 400. Each digital image is represented by a set of samples, sometimes also referred to as pixels (hereinafter, they are referred to as pixels). A bitstream 410 is output by the encoder 400 after implementation of the encoding process. The bitstream 410 comprises a plurality of encoding units or slices, each slice comprising a slice header for transmitting encoding values of encoding parameters used to encode the slice and a slice body, comprising encoded video data.
The input digital images i0 to in 401 are divided into blocks of pixels by module 402. The blocks correspond to image portions and may be of variable sizes (e.g. 4x4, 8x8, 16x16, 32x32, 64x64, 128x128 pixels and several rectangular block sizes can be also considered). A coding mode is selected for each input block. Two families of coding modes are provided: coding modes based on spatial prediction coding (Intra prediction), and coding modes based on temporal prediction (Inter coding, Merge, SKIP). The possible coding modes are tested.
Module 403 implements an Intra prediction process, in which the given block to be encoded is predicted by a predictor computed from pixels of the neighbourhood of said block to be encoded. An indication of the selected Intra predictor and the difference between the given block and its predictor is encoded to provide a residual if the Intra coding is selected.
Temporal prediction is implemented by motion estimation module 404 and motion compensation module 405. Firstly, a reference image from among a set of reference images 416 is selected, and a portion of the reference image, also called reference area or image portion, which is the closest area (closest in terms of pixel value similarity) to the given block to be encoded, is selected by the motion estimation module 404. Motion compensation module 405 then predicts the block to be encoded using the selected area. The difference between the selected reference area and the given block, also called a residual block, is computed by the motion compensation module 405. The selected reference area is indicated using a motion vector.
Thus, in both cases (spatial and temporal prediction), a residual is computed by subtracting the predictor from the original block.
In the INTRA prediction implemented by module 403, a prediction direction is encoded. In the Inter prediction implemented by modules 404, 405, 416, 418, 417, at least one motion vector or data for identifying such motion vector is encoded for the temporal prediction.
Information relevant to the motion vector and the residual block is encoded if the Inter prediction is selected. To further reduce the bitrate, assuming that motion is homogeneous, the motion vector is encoded by difference with respect to a motion vector predictor. Motion vector predictors from a set of motion information predictor candidates are obtained from the motion vectors field 418 by a motion vector prediction and coding module 417. The encoder 400 further comprises a selection module 406 for selection of the coding mode by applying an encoding cost criterion, such as a rate-distortion criterion. In order to further reduce redundancies a transform (such as DCT) is applied by transform module 407 to the residual block, the transformed data obtained is then quantized by quantization module 408 and entropy encoded by entropy encoding module 409. Finally, the encoded residual block of the current block being encoded is inserted into the bitstream 410.
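Rate-distortion mode selection of the kind performed by module 406 is conventionally expressed as minimising a Lagrangian cost J = D + λR; the following sketch shows that selection in isolation, with a hypothetical (name, distortion, rate) candidate format:

```python
def select_mode(candidates, lam):
    """Pick the coding mode minimising J = D + lambda * R, where each
    candidate is a (name, distortion, rate) tuple (illustrative only)."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]
```

A larger lambda penalises rate more heavily, shifting the choice towards cheaper but more distorted modes.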
The encoder 400 also performs decoding of the encoded image in order to produce a reference image (e.g. those in Reference images/pictures 416) for the motion estimation of the subsequent images. This enables the encoder and the decoder receiving the bitstream to have the same reference frames (reconstructed images or image portions are used). The inverse quantization (“dequantization”) module 411 performs inverse quantization (“dequantization”) of the quantized data, followed by an inverse transform by inverse transform module 412. The intra prediction module 413 uses the prediction information to determine which predictor to use for a given block and the motion compensation module 414 actually adds the residual obtained by module 412 to the reference area obtained from the set of reference images 416.
Post filtering is then applied by module 415 to filter the reconstructed frame (image or image portions) of pixels. In some embodiments of the invention a sample adaptive offset (SAO) loop filter is used in which compensation offsets are added to the pixel values of the reconstructed pixels of the reconstructed image. However, it should be understood that post filtering does not always have to be performed. Also, any other type of post filtering may also be performed in addition to, or instead of, the SAO loop filtering.
Figure 5 illustrates a block diagram of a decoder 60 which may be used to receive data from an encoder according to an embodiment of the invention. The decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, a corresponding step of a method implemented by the decoder 60.
The decoder 60 receives a bitstream 61 comprising encoded units (e.g. data corresponding to a block or a coding unit), each one being composed of a header containing information on encoding parameters and a body containing the encoded video data. As explained with respect to Figure 4, the encoded video data is entropy encoded, and the motion vector predictors’ indexes are encoded, for a given block, on a predetermined number of bits. The received encoded video data is entropy decoded by module 62. The residual data are then dequantized by module 63 and then an inverse transform is applied by module 64 to obtain pixel values. The mode data indicating the coding mode are also entropy decoded and based on the mode, an INTRA type decoding or an INTER type decoding is performed on the encoded blocks (units/sets/groups) of image data.
In the case of INTRA mode, an INTRA predictor is determined by intra prediction module 65 based on the intra prediction mode specified in the bitstream.
If the mode is INTER, the motion prediction information is extracted from the bitstream so as to find (identify) the reference area used by the encoder. The motion prediction information comprises the reference frame index and the motion vector residual. The motion vector predictor is added to the motion vector residual by motion vector decoding module 70 in order to obtain the motion vector. The various motion predictor tools used in VVC are discussed in more detail below with reference to Figures 6-10.
Motion vector decoding module 70 applies motion vector decoding for each current block encoded by motion prediction. Once an index of the motion vector predictor for the current block has been obtained, the actual value of the motion vector associated with the current block can be decoded and used to apply motion compensation by module 66. The reference image portion indicated by the decoded motion vector is extracted from a reference image 68 to apply the motion compensation 66. The motion vector field data 71 is updated with the decoded motion vector in order to be used for the prediction of subsequent decoded motion vectors.
Finally, a decoded block is obtained. Where appropriate, post filtering is applied by post filtering module 67. A decoded video signal 69 is finally obtained and provided by the decoder 60.
Random Access configuration
Figure 7 shows a temporal random-access GOP structure for 33 consecutive frames 0 to 32. The length of the vertical line representing each frame corresponds to its temporal ID (e.g. the longest line corresponds to temporal ID 0 and the shortest to temporal ID 5). The frames with a temporal ID of 0 are the highest in the temporal hierarchy because they can be decoded independently of all other frames with a higher temporal ID value. In the same way, the frames with a temporal ID of 1 are second in the temporal hierarchy and can be decoded independently of all other frames with a higher temporal ID value, and so on for the other temporal IDs. In other words, a frame with a particular temporal ID can be decoded independently of frames with temporal IDs higher in value but may be dependent on frames with lower temporal IDs. This is what is known as temporal scalability. This parameter is similar to the hierarchy depth, but the hierarchy depth does not imply independent decoding with respect to all other frames with a higher depth.
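The mapping between a frame's position in such a dyadic GOP and its temporal ID can be sketched as follows. This is an illustrative helper (not part of any standard), assuming the GOP-32 structure described above with temporal IDs 0 to 5.

```cpp
#include <cassert>

// Hypothetical helper: derives the temporal ID of a frame from its picture
// order count (POC), assuming a dyadic random-access GOP of size 32.
// Each halving of the step corresponds to one level deeper in the hierarchy.
int temporalId(int poc, int gopSize = 32)
{
    int tid = 0;
    for (int step = gopSize; poc % step != 0; step >>= 1)
        ++tid;
    return tid;
}
```

For example, frames 0 and 32 get temporal ID 0, frame 16 gets temporal ID 1, and the odd-numbered frames get temporal ID 5, matching the line lengths in Figure 7.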
VVC Partitioning
VVC has a specific block partitioning scheme. For one tree node, six splits are possible, as depicted in Figure 8:
-The quad split QT, 801, which divides a block into 4 equally sized square blocks
-The binary split BT with its two possible subdivisions 802, 803:
-the vertical binary split, 802, SPLIT BT VER
-the horizontal binary split, 803, SPLIT BT HOR
-The ternary split TT with its two possible subdivisions 804, 805 where the block is split into 3 blocks with a larger block in the middle:
-the vertical ternary split, 804, SPLIT TT VER
-the horizontal ternary split, 805, SPLIT TT HOR
-The No Split, 806, which terminates a tree node so there is no splitting.
VVC splitting control variables
For a current block, not all of the possible splits are always permitted. Which splits are available depends on several conditions, which in turn depend on several splitting control variables defined below. A first set of variables defines the maximum and the minimum block/node size:
• CTU size: it corresponds to the root node size of a quadtree (for example 256x256, 128x128, 64x64, 32x32, 16x16 luma samples);
• MaxBTSize: is the maximum allowed binary tree root node size, i.e., the maximum size of a leaf quadtree node that may be partitioned by binary splitting. A current block can be split by a BT split if both the height and the width of the current block are less than or equal to MaxBTSize. Figure 10 illustrates the concept of MaxBTSize, where the MaxBTSize is the size of the quadtree leaf nodes 1002 of a CTU 1001.
• MinBTSize: is the minimum allowed binary tree leaf node size, i.e., the minimum width or height of a binary leaf node. So, a current block can be split by a horizontal BT split if its height is greater than MinBTSize, and a current block can be split by a vertical BT split if its width is greater than MinBTSize.
• MaxTTSize: is the maximum allowed ternary tree root node size, i.e., the maximum size of a leaf quadtree node that may be partitioned by ternary splitting. A current block can be split by a TT split if both the height and the width of the current block are less than or equal to MaxTTSize.
• MinTTSize: represents the minimum allowed ternary tree (TT) leaf node size, i.e., the minimum width or height of a ternary leaf node. In contrast to the BT split, twice the minimum TT partition size is considered for a TT split to be allowed. So, a current block can be split by a horizontal TT split if its height is greater than twice the MinTTSize, and a current block can be split by a vertical TT split if its width is greater than twice the MinTTSize.
• MinQTSize: is the minimum allowed quadtree (QT) leaf node size. So, for the current block, if the current block width is not greater than the MinQTSize, the QT split mode is not allowed. Figure 9 illustrates an example of MinQTSize: considering a CTU of size 128, in the illustrated example the MinQTSize is equal to 16.
There is no definition of a MaxQTSize; it therefore corresponds to the CTU size.
The minimum allowed block size for the width and the height is 4.
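The size-based allowance conditions above can be sketched as follows. This is an illustrative sketch with names of our own choosing (not the spec's), which ignores depth limits, chroma constraints and picture-boundary handling.

```cpp
#include <cassert>

// Splitting control variables described above (sizes in luma samples).
struct SplitCtrl {
    int maxBtSize, minBtSize, maxTtSize, minTtSize, minQtSize;
};

// BT splits: both dimensions within MaxBTSize, split dimension above MinBTSize.
bool allowBtHor(int w, int h, const SplitCtrl& c)
{ return w <= c.maxBtSize && h <= c.maxBtSize && h > c.minBtSize; }

bool allowBtVer(int w, int h, const SplitCtrl& c)
{ return w <= c.maxBtSize && h <= c.maxBtSize && w > c.minBtSize; }

// TT splits: both dimensions within MaxTTSize, split dimension above 2*MinTTSize.
bool allowTtHor(int w, int h, const SplitCtrl& c)
{ return w <= c.maxTtSize && h <= c.maxTtSize && h > 2 * c.minTtSize; }

bool allowTtVer(int w, int h, const SplitCtrl& c)
{ return w <= c.maxTtSize && h <= c.maxTtSize && w > 2 * c.minTtSize; }

// QT split: not allowed once the block width is down to MinQTSize.
bool allowQt(int w, const SplitCtrl& c)
{ return w > c.minQtSize; }
```

For instance, with MaxBTSize 128, MinBTSize 8, MaxTTSize 64, MinTTSize 8 and MinQTSize 16 (illustrative values), a 64x64 block admits BT and TT splits but a 16-wide block admits no further QT split.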
A set of depths are also defined.
Depth: is the depth in the tree. In the VVC specification, the root node of a tree has a depth of 0 and a leaf is a terminating node of the tree. For each split this value is incremented by 1.
MttDepth: it is the depth of the multi-type tree. The multi-type tree includes BT splits and TT splits.
• MaxMttDepth: is defined in the VVC specification as the maximum allowed multi-type tree depth. So, MttDepth must be less than or equal to MaxMttDepth. Figure 10 illustrates the concept of MaxMttDepth.
In VVC these variables are defined independently for Luma and Chroma.
In the VVC Test Model (VTM) software and the ECM software, there are several other variables corresponding to depths.
The variable currBtDepth is the current number of BT splits used to reach the current tree node (or the current block). The variable currMttDepth is the current number of BT splits and TT splits used to reach the current tree node (or the current block). The variable currQtDepth is the current number of QT splits used to reach the current tree node (or the current block). The variable MaxBtDepth corresponds to the variable MaxMttDepth of the VVC specification: it is the maximum allowed binary tree depth, i.e., the lowest level at which binary splitting may occur, where the quadtree leaf node is the root (e.g., 3).
VVC splitting control syntax elements
To set the values of these different variables, some high-level syntax elements are transmitted in the SPS as depicted in the following table of SPS syntax elements
When the sps_partition_constraints_override_enabled flag is enabled in the SPS, some picture header syntax elements are transmitted to update the partitioning variables as depicted in the following table of PH syntax elements.
VVC splitting restrictions
The VVC partitioning has several restrictions. These restrictions mainly exist to avoid reaching the same partitioning through several different sequences of splits. Figure 11 illustrates some of these constraints. The idea is to avoid obtaining the same partitioning with BT and TT. As depicted in Figure 11(a), two consecutive vertical BT splits are allowed, but a vertical TT followed by a vertical BT split in the center block is not allowed, as depicted in Figure 11(b).
In the same way, as depicted in Figure 11(c), two consecutive horizontal BT splits are allowed, but a horizontal TT followed by a horizontal BT split in the center block is not allowed, as depicted in Figure 11(d).
In VVC there are additional constraints on the minimum chroma block size and on the maximum TT and BT block size for inter blocks. These constraints have been removed in the ECM software.
Chroma partitioning
In VVC, the Chroma partitioning may be inferred from the Luma partitioning, but this can be disabled. When the Dual Tree mode is enabled, the partitioning tree of Chroma is independent of the tree of Luma, but some restrictions exist.
The tree can also be partially dependent on the Luma partitioning for the cross-component linear model (CCLM) mode; otherwise it is independent.
Picture boundary
The frame resolution is not always equal to an integer multiple of the CTU size. Consequently, there can be incomplete CTUs at the borders of the frame, as depicted in Figure 12 where CTUs 1201-1206 are incomplete due to the bottom and right boundaries 1207, 1208 of the frame. In VVC, in contrast to the previous standards, the signalling of the split is allowed at the picture boundary. The splitting process at the boundary is applied until the coding tree node represents a CU located entirely within the picture, but some splits are inferred (not transmitted). Consequently, the different variables such as MaxMttDepth, MinQtDepth and MinQTSize are increased or decreased according to the splits possible at the boundary.
QT BT TT Encoding choice
In the VTM and ECM software several encoder side optimizations are used for the QT BT TT encoding choice.
One such optimization includes determining whether the QT split is tested before the BT split.
The condition is that at least one CU on the left of or above the current coding tree node has a QT depth larger than the QT depth of the current coding tree node, and that the CU width represented by the current coding tree node is greater than MinQTSize * 2.
If this condition is true then QT is tested before BT and the splits will be treated in the following order:
-No Split
-QT
-BT Horizontal
-BT Vertical
-TT Horizontal
-TT Vertical
Otherwise, the order will be:
-No Split
-BT Horizontal
-BT Vertical
-TT Horizontal
-TT Vertical
-QT
This order is important as, according to some optimizations, several splits will not be tested depending on the results of the first tested modes. So, when QT is tested last there are many occasions where it will not be evaluated.
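The ordering decision above can be sketched as follows. This is an illustrative sketch; the flag `neighbourDeeperQt` stands for "at least one left/above CU has a QT depth larger than that of the current node" (our naming, not VTM's).

```cpp
#include <cassert>
#include <vector>

enum Split { NO_SPLIT, QT, BT_HOR, BT_VER, TT_HOR, TT_VER };

// Returns the encoder-side test order for the current coding tree node.
// QT is moved before BT only when a neighbour is deeper in QT and the
// node is still wide enough to be quad split again.
std::vector<Split> testOrder(bool neighbourDeeperQt, int cuWidth, int minQtSize)
{
    if (neighbourDeeperQt && cuWidth > minQtSize * 2)
        return {NO_SPLIT, QT, BT_HOR, BT_VER, TT_HOR, TT_VER};
    return {NO_SPLIT, BT_HOR, BT_VER, TT_HOR, TT_VER, QT};
}
```

When the condition does not hold, QT ends up last and is often skipped by the early-termination heuristics mentioned above.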
MaxMttDepth
The maximum MTT depth has a significant impact on the encoder complexity. The common test conditions for the ECM have been updated to reduce the encoding time by setting different MaxMttDepth values as depicted in Figure 13. In this setting, the MaxMttDepth is lower for some temporal IDs for large resolutions or small QP (quantization parameter) settings.
Adaptive MaxBTSize
In the VTM and ECM, there is a frame-level encoding choice which sets the MaxBTSize according to the average block size of the previously encoded frames with the same depth (i.e. the same temporal ID within the common test conditions (CTC) random access (RA) case). The average block size is compared to thresholds as in the following pseudo code:
if( dBlkSize < AMAXBT_TH32 )
{
  newMaxBtSize = 32;
}
else if( dBlkSize < AMAXBT_TH64 )
{
  newMaxBtSize = 64;
}
else if( dBlkSize < AMAXBT_TH128 )
{
  newMaxBtSize = 128;
}
else
{
  newMaxBtSize = 256;
}
Where AMAXBT_TH32 is equal to 15, AMAXBT_TH64 is equal to 30 and AMAXBT_TH128 is equal to 60. This method decreases the maximum BT size when the average block size is small and increases it when it is large.
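The pseudo code above can be made runnable as follows, using the threshold values given in the text.

```cpp
#include <cassert>

// Adaptive MaxBTSize selection from the average block size of previously
// encoded frames with the same temporal ID (thresholds from the text).
int adaptMaxBtSize(double dBlkSize)
{
    const double AMAXBT_TH32 = 15.0;
    const double AMAXBT_TH64 = 30.0;
    const double AMAXBT_TH128 = 60.0;

    if (dBlkSize < AMAXBT_TH32)  return 32;
    if (dBlkSize < AMAXBT_TH64)  return 64;
    if (dBlkSize < AMAXBT_TH128) return 128;
    return 256;
}
```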
VVC Coding split mode
In VVC, the coding split modes are transmitted in the coding tree as depicted in the following syntax table, where the conditionally parsed flags split_cu_flag, split_qt_flag, mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag define the splitting of a CU.
The VVC specification contains the following definitions for these syntax elements: split_cu_flag equal to 0 specifies that a coding unit is not split; split_cu_flag equal to 1 specifies that a coding unit is split into four coding units using a quad split as indicated by the syntax element split_qt_flag, or into two coding units using a binary split or into three coding units using a ternary split as indicated by the syntax element mtt_split_cu_binary_flag. The binary or ternary split can be either vertical or horizontal as indicated by the syntax element mtt_split_cu_vertical_flag.
When split_cu_flag is not present, the value of split_cu_flag is inferred as follows:
- If one or more of the following conditions are true, the value of split_cu_flag is inferred to be equal to 1:
- x0 + cbWidth is greater than pps_pic_width_in_luma_samples.
- y0 + cbHeight is greater than pps_pic_height_in_luma_samples.
- Otherwise, the value of split_cu_flag is inferred to be equal to 0.
split_qt_flag specifies whether a coding unit is split into coding units with half horizontal and vertical size.
When split_qt_flag is not present, the following applies:
- If all of the following conditions are true, split_qt_flag is inferred to be equal to 1:
- split_cu_flag is equal to 1.
- allowSplitQt, allowSplitBtHor, allowSplitBtVer, allowSplitTtHor and allowSplitTtVer are equal to FALSE.
- Otherwise, if allowSplitQt is equal to TRUE, the value of split_qt_flag is inferred to be equal to 1.
- Otherwise, the value of split_qt_flag is inferred to be equal to 0.
mtt_split_cu_vertical_flag equal to 0 specifies that a coding unit is split horizontally; mtt_split_cu_vertical_flag equal to 1 specifies that a coding unit is split vertically.
When mtt_split_cu_vertical_flag is not present, it is inferred as follows:
- If allowSplitBtHor is equal to TRUE or allowSplitTtHor is equal to TRUE, the value of mtt_split_cu_vertical_flag is inferred to be equal to 0.
- Otherwise, the value of mtt_split_cu_vertical_flag is inferred to be equal to 1.
mtt_split_cu_binary_flag equal to 0 specifies that a coding unit is split into three coding units using a ternary split; mtt_split_cu_binary_flag equal to 1 specifies that a coding unit is split into two coding units using a binary split.
When mtt_split_cu_binary_flag is not present, it is inferred as follows:
- If allowSplitBtVer is equal to FALSE and allowSplitBtHor is equal to FALSE, the value of mtt_split_cu_binary_flag is inferred to be equal to 0.
- Otherwise, if allowSplitTtVer is equal to FALSE and allowSplitTtHor is equal to FALSE, the value of mtt_split_cu_binary_flag is inferred to be equal to 1.
- Otherwise, if allowSplitBtHor is equal to TRUE and allowSplitTtVer is equal to TRUE, the value of mtt_split_cu_binary_flag is inferred to be equal to 1 - mtt_split_cu_vertical_flag.
- Otherwise (allowSplitBtVer is equal to TRUE and allowSplitTtHor is equal to TRUE), the value of mtt_split_cu_binary_flag is inferred to be equal to mtt_split_cu_vertical_flag.
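The inference rules for mtt_split_cu_binary_flag can be expressed directly as code. This is a sketch; the parameter names follow the spec's allowSplit* variables.

```cpp
#include <cassert>

// Inference of mtt_split_cu_binary_flag when it is not present in the
// bitstream, following the four ordered conditions above.
int inferMttSplitCuBinaryFlag(bool allowBtVer, bool allowBtHor,
                              bool allowTtVer, bool allowTtHor,
                              int mttSplitCuVerticalFlag)
{
    if (!allowBtVer && !allowBtHor)
        return 0;                           // only TT possible -> ternary
    if (!allowTtVer && !allowTtHor)
        return 1;                           // only BT possible -> binary
    if (allowBtHor && allowTtVer)
        return 1 - mttSplitCuVerticalFlag;  // direction resolves BT vs TT
    return mttSplitCuVerticalFlag;          // allowBtVer && allowTtHor
}
```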
To summarize this description, the split_cu_flag specifies whether the current block is split or not. This flag is signalled only if at least one split is allowed for the current block and if leaving the current block unsplit is also possible.
The split_qt_flag specifies that the current block is split by the QT split. It is signalled when the split_cu_flag is equal to 1, when the QT split is available and when at least one MTT split is available.
When the current block is neither left unsplit nor split by a QT split, it is split by an MTT split. Two flags are defined to signal the four MTT splits.
When at least one horizontal split and at least one vertical split are allowed, the mtt_split_cu_vertical_flag is signalled; it signals whether the MTT split is vertical or horizontal.
According to the availability of the MTT splits and the value of mtt_split_cu_vertical_flag, the flag mtt_split_cu_binary_flag is decoded to know whether the current split is BT or TT. Of course, the checks before decoding these flags avoid an extra signalling of these flags when it is not needed.
Previous work
In previous work, the availability of split modes is determined by some rules including the minimum QT depth and an average QT depth obtained from a temporal area. In another previous work, the maximum MTT depth for a current block is determined based on the value of the maximum MTT depth obtained from a temporal area. This value can be incremented or decremented compared to the value for the current picture.
EMBODIMENTS
1/Solution 1: Change the coding order of partitioning syntax elements according to at least one parameter
Emb.Main The coding order of partitioning syntax elements depends on at least one parameter
In embodiments of the invention, the coding order of partitioning syntax elements depends on at least one parameter. For the example of VVC partitioning, the regular coding order is split_cu_flag, split_qt_flag, mtt_split_cu_vertical_flag followed by mtt_split_cu_binary_flag. This order is fixed. One or more flags are not coded depending on the allowance of split modes and, of course, once one split mode is identified; this is not the object of the invention. The embodiments of the invention change the coding order of these flags with respect to one another according to at least one parameter value. For example, the order can be split_qt_flag, mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag and split_cu_flag. The aim of changing the coding order is to set the flags corresponding to the most probable split modes at the beginning and those which have less probability of being selected at the end. So, the aim of changing the coding order is to minimize the number of flags to be coded for the current block.
The advantage of this embodiment is a coding efficiency improvement. Indeed, changing the order of these flags reduces, on average, the number of coded bins to be transmitted. Even if these flags are CABAC (context-adaptive binary arithmetic coding) context coded, changing their order is more efficient in terms of bitrate.
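The reordering idea can be sketched as follows. This is an illustrative simplification: the parameter is reduced to a single boolean "QT is the most probable split for this block", whereas the embodiments below derive it from spatial, temporal or header parameters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the coding order of the partitioning flags for the current block.
// When QT is deemed most probable, split_qt_flag is moved to the front;
// otherwise the regular VVC order is kept.
std::vector<std::string> partitioningFlagOrder(bool qtMostProbable)
{
    if (qtMostProbable)
        return {"split_qt_flag", "mtt_split_cu_vertical_flag",
                "mtt_split_cu_binary_flag", "split_cu_flag"};
    return {"split_cu_flag", "split_qt_flag",
            "mtt_split_cu_vertical_flag", "mtt_split_cu_binary_flag"};
}
```

The decoder must derive the same boolean from already decoded information so that both sides parse the flags in the same order.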
1.1/Parameters
One key point in obtaining gain by changing the order is the selection of the parameters.
Emb.P1 At least one parameter represents a variable, or several variables, used for the splitting of other blocks
In an embodiment, at least one parameter represents a variable (or several variables) used for the splitting of other blocks. Indeed, the partitioning of the current block has spatial and temporal correlations. To exploit these correlations, the parameters used to change the coding order can be a variable of the current block and the variables of one or more spatial or temporal neighbouring blocks.
The advantage is a coding efficiency improvement, as the coding order is adapted to take into account the spatial and/or temporal correlations between spatial or temporal partitionings.
Emb.P1.S At least one parameter is a spatial parameter
In an embodiment, at least one parameter is a spatial parameter, to exploit spatial correlations between neighbouring blocks.
The advantage is a coding efficiency improvement, as the coding order is adapted to take into account the spatial correlations between spatial partitionings.
Emb.P1.S.1 The QT Depth value of the neighbouring blocks
In an embodiment, the QT depth values of the neighbouring blocks are considered to change the coding order. These values are compared, for example, to the QT depth of the current block to know whether the split_qt_flag should be set early in the coding order, set at the end, or kept in its place. The QT depth can also be used to determine the position of the split_cu_flag or of all flags.
The advantage is coding efficiency improvement as the QT depth of the neighbouring blocks is correlated to the QT depth of the current block.
Emb.P1.S.2 The average/minimum/maximum QT depth of the neighbouring blocks
In an embodiment, instead of considering only the QT depth of neighbouring blocks, the average QT depth, and/or the minimum and/or the maximum are considered or also considered to determine the best position of the split qt flag or split cu flag or all flags.
The advantage is coding efficiency improvement as the average, the minimum and the maximum are more relevant to determine the spatial correlations.
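The average/minimum/maximum statistics used by this embodiment can be sketched as follows (the structure and names are ours; an integer average is assumed, and the list of neighbour depths is assumed non-empty).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct DepthStats { int avg, mn, mx; };

// Average, minimum and maximum QT depth over the neighbouring blocks.
// Precondition: depths is non-empty.
DepthStats qtDepthStats(const std::vector<int>& depths)
{
    DepthStats s{0, depths.front(), depths.front()};
    int sum = 0;
    for (int d : depths) {
        sum += d;
        s.mn = std::min(s.mn, d);
        s.mx = std::max(s.mx, d);
    }
    s.avg = sum / static_cast<int>(depths.size()); // integer average
    return s;
}
```

Comparing, for example, `s.avg` to the current block's QT depth would then decide where split_qt_flag is placed in the coding order.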
Emb.P1.S.3 The MTT Depth value of the neighbouring blocks
In an embodiment, the MTT depth value of the neighbouring blocks is considered to change the coding order. This parameter is, for example, compared to the current MTT depth, to change the order of the MTT flags but also of the other flags split_cu_flag and split_qt_flag.
In another example, its value is converted to be compared to the QT depth. In one example, the conversion is a division by 2, and, for example, it is added to the value of the spatial QT depth.
The advantage is coding efficiency improvement as the MTT depth of the current block is correlated to the MTT depths of neighbouring blocks.
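The conversion example given above can be sketched as follows; the function name is ours and the division by 2 is the example from the text (two BT splits roughly match one QT split in area).

```cpp
#include <cassert>

// Maps an MTT depth onto the QT depth scale (division by 2, as in the
// example above) and adds it to the QT depth, giving a combined depth
// that can be compared against the current block's QT depth.
int effectiveQtDepth(int qtDepth, int mttDepth)
{
    return qtDepth + mttDepth / 2;
}
```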
Emb.P1.S.4 The average/minimum/maximum MTT depth of the neighbouring blocks
In an embodiment, instead of considering only the MTT depth of neighbouring blocks, the average MTT depth, and/or the minimum and/or the maximum are considered or also considered to determine the best coding order of all flags.
The advantage is coding efficiency improvement as the average, the minimum and the maximum are more relevant to determine the correlation.
Emb.P1.S.5 The BT or TT Depth values/average/minimum/maximum
In an embodiment, the BT depth and/or the TT depth, or the average, the minimum or the maximum of the BT depth and/or the TT depth, can be considered to change the order. This can be used for all changes of coding orders of all flags, yet it is more interesting for the MTT flags.
The advantage is coding efficiency improvement as a finer granularity to set a better coding order improves the coding efficiency.
Emb.P1.S.6 The height and the width or a ratio of neighbouring blocks
In an embodiment, the height and width of the neighbouring blocks are compared to the height and width of the current block to determine the coding order for the current block. An average, a minimum or a maximum of a ratio can also be considered.
Alternatively, or additionally, the ratio between the height and width of the neighbouring blocks is compared to the ratio of the height and the width of the current block.
The advantage is a coding efficiency improvement, as the ratio gives information on the direction (shape) of the neighbouring blocks compared to the current block. So, when it is used, it is useful for ordering the MTT split flags.
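One possible use of the ratio comparison can be sketched as follows. The decision rule itself is our assumption for illustration: if the neighbouring blocks are on average flatter (wider than tall) than the current block, a horizontal split, which halves the height, moves the current block toward that shape and is deemed more probable.

```cpp
#include <cassert>

// avgNeighbourRatio: average width/height ratio of the neighbouring blocks.
// Returns true when a horizontal MTT split of the current block looks more
// probable than a vertical one (illustrative heuristic, not from the text).
bool horizontalSplitLikely(double avgNeighbourRatio,
                           int curWidth, int curHeight)
{
    double curRatio = static_cast<double>(curWidth) / curHeight;
    return avgNeighbourRatio > curRatio;
}
```

The result could then be used to place the value of mtt_split_cu_vertical_flag corresponding to the likely direction first in the coding order.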
Emb.P1.S.N.1 The same neighbouring blocks as used for the context derivation
In an embodiment, the neighbouring blocks used for the derivation of the spatial parameters to change the coding order of the partitioning are the same neighbouring blocks as used for the context derivation.
The advantage is a reduced memory access compared to the following embodiments. Indeed, when using the same blocks as those used for the context derivation, the neighbouring blocks need to be accessed only once.
Emb.P1.S.N.2 A1/A0 B1/B0
In an embodiment, the neighbouring blocks used for the determination of parameters to change the coding order of partitioning syntax elements are the blocks A1 and/or A0 and the blocks B1 and/or B0 of Figure 6.
The advantage is a coding efficiency improvement compared to the previous embodiment. Indeed, these block positions, in the top right and bottom left corners, create diversity compared to the usage of 2 block positions around the top left of the current block. Moreover, as the spatial correlations of the neighbouring blocks A2 and B3 are exploited thanks to the context increment derivation, when using these block positions (A0/A1, B0/B1) which are not used in the context derivation, the spatial correlations between neighbouring blocks are fully exploited.
Emb.P1.S.N.3 All neighbouring blocks
In an embodiment, all neighbouring blocks which share a linear frontier or a corner with the current block are considered for the determination of parameters to change the coding order of partitioning syntax elements. Alternatively, the blocks A0, A1, A2 and B0, B1, B2, B3 are considered.
The advantage is a coding efficiency improvement compared to the previous embodiments (as more spatial correlations are exploited) and an increase in the precision of the average, the minimum or the maximum, when they are used to generate the parameters to change the coding order.
Emb.P1.T At least one parameter is a temporal parameter
In an embodiment, at least one parameter is a temporal parameter, to exploit temporal correlations. The temporal parameter or parameters are derived from a temporal area. The temporal area is defined later in the description of the invention.
The advantage is a coding efficiency improvement, as the coding order is adapted to take into account the temporal correlations between a temporal partitioning and the current one. But in contrast to the spatial correlations, this cannot be exploited for the first intra frame of a sequence or a GOP.
Emb.P1.T.1 The QT Depth value of a temporal area
In an embodiment, the QT depth value computed from a temporal area is considered to change the coding order. This value is compared, for example, to the QT depth of the current block to know whether the split_qt_flag should be set early in the coding order, set at the end, or kept in its place. The QT depth can also be used to determine the position of the split_cu_flag or of all flags.
The advantage is coding efficiency improvement as the QT depth obtained from a temporal area is correlated to the QT depth of the current block.
Emb.P1.T.2 The average/minimum/maximum QT depth of a temporal area
In an embodiment, instead of considering only the QT depth of the temporal area, the average QT depth, and/or the minimum and/or the maximum are considered or also considered to determine the best position of the split_qt_flag or split_cu_flag or all flags. The advantage is a coding efficiency improvement, as the average, the minimum and the maximum are more relevant to determine the correlation.
Emb.P1.T.3 The MTT depth value of a temporal area
In an embodiment, the MTT depth value of a temporal area is considered to change the coding order. This parameter is, for example, compared to the current MTT depth, to change the order of the MTT flags but also of the other flags split_cu_flag and split_qt_flag.
In another example, its value is converted in order to be compared to the QT depth. In one example, the conversion is a division by 2, and, for example, it is added to the value of the temporal QT depth.
The advantage is coding efficiency improvement as the MTT depth of the current block is correlated to the MTT depth of a temporal area.
Emb.P1.T.4 The average/minimum/maximum MTT depth of a temporal area
In an embodiment, instead of considering only the MTT depth of a temporal area, the average MTT depth, and/or the minimum and/or the maximum are considered or also considered to determine the best coding order of all flags.
The advantage is coding efficiency improvement as the average, the minimum and the maximum are more relevant to determine the temporal correlation.
Emb.P1.T.5 The BT or TT Depth values/average/minimum/maximum
In an embodiment, the BT depth and/or the TT depth, or the average, the minimum or the maximum of the BT depth and/or the TT depth, can be considered to change the order. This can be used for all changes of the coding order of all flags, yet it is more interesting for the MTT flags.
The advantage is a coding efficiency improvement, as a finer granularity for setting a better coding order improves the coding efficiency.
Emb.P1.T.6 The height and the width or a ratio of a temporal area
In an embodiment, the height and the width, or an average of the height or the width, of the blocks from a temporal area are compared to the current height and width of the current block to determine the coding order of the partitioning syntax elements for the current block. An average, a minimum or a maximum of a ratio between the height and width of the blocks from a temporal area can alternatively or additionally be considered. Alternatively, or additionally, the ratio between the height and width of the temporal area is compared to the ratio of the height and the width of the current block.
The advantage is a coding efficiency improvement, as the ratio gives information on the direction (shape) of the temporal area compared to the current block. So, when it is used, it is useful for ordering the MTT split flag values.
Emb.H At least one parameter transmitted in a header
In an embodiment, at least one parameter transmitted in a header is considered to change the order of the partitioning syntax elements for the current block. Similarly to the previous embodiments, the parameter or parameters can be compared to the current QT depth or the current MTT depth depending on the change of order to be applied.
The advantage is a coding efficiency improvement, as the changed order can obtain a gain even if the parameter is not derived locally. Moreover, this embodiment is less complex as the parameter is transmitted and does not require a derivation of the change of the syntax order at block level.
Emb.H1 SPS
In an embodiment, at least one parameter is transmitted in the SPS. The advantage is that a different order can be adapted to the whole sequence. Even if it is less efficient than a block-level adaptation, this can be efficient for constant bit rate or low delay configurations.
Emb.H2 PPS
In an embodiment, at least one parameter is transmitted in the PPS. The advantage is that a different order can be adapted to a set of frames. Even if it is less efficient than a block-level adaptation, this can be efficient as the parameter is adapted for a set of pictures which have the same coding characteristics, and the different depths are restricted for frames with the same coding characteristics.
Emb.H3 Picture header
In an embodiment, at least one parameter is transmitted in the picture header. The advantage is that a different order can be adapted to each frame. This is efficient, even if it is less complex.
Emb.H4 Slice Header
In an embodiment, at least one parameter is transmitted in the slice header. The advantage is that a different order can be adapted to a slice. This is the most efficient granularity.
Emb.H5 In several headers
In an embodiment, several parameters can be transmitted in different headers. The advantage is the flexibility for encoder implementations.
Emb.F1 At least one parameter is a fixed number
In an embodiment, at least one parameter is a fixed number and it is considered to change the order of the partitioning syntax elements for the current block. If several parameters are needed, several fixed numbers are considered.
The advantage is a coding efficiency improvement, but less than when using parameters which depend on already decoded values. Yet, the advantage is that it is less complex.
Emb.F2 The parameter depends on the type of the slice
In an embodiment, at least one parameter is a fixed number which depends on the slice type. Indeed, the partitioning for intra or inter slices is different as the correlations exploited are different. Several parameters can also be considered.
The advantage is coding efficiency improvement compared to the previous embodiment with a similar complexity.
Emb.F3 The usage of Intra or Inter mode in the neighbourhood
In an embodiment, at least one parameter depends on the usage of Intra mode and Inter mode in the neighbourhood. The neighbourhood here includes spatial and/or temporal blocks. In one example, the proportion of pixels coded in Intra compared to the pixels coded in Inter is used to determine the parameter or the parameters to change the order of the partitioning syntax elements for the current block.
The advantage is coding efficiency improvement compared to the previous embodiment but with additional complexity.
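The proportion of Intra-coded pixels in the neighbourhood, used in the example of Emb.F3 above, can be sketched as follows. This is a minimal illustration, not from the specification; the (mode, width, height) block representation is an assumption chosen for clarity.

```python
def intra_proportion(neighbour_blocks):
    """Fraction of neighbouring pixels coded in Intra mode.

    Each block is a hypothetical (mode, width, height) tuple, where mode is
    "intra" or "inter". The embodiment does not fix a data structure; this
    is illustrative only.
    """
    intra_pixels = sum(w * h for mode, w, h in neighbour_blocks if mode == "intra")
    total_pixels = sum(w * h for _, w, h in neighbour_blocks)
    return intra_pixels / total_pixels if total_pixels else 0.0
```

The resulting proportion could then be thresholded to select the parameter(s) changing the order of the partitioning syntax elements.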
Emb.F4 The parameter depends on the QP value
In an embodiment, at least one parameter depends on the QP value of the current block or on the QP value of the slice. The advantage is coding efficiency improvement based on the QP value which has a significant impact on the partitioning, and the additional complexity is very low.
Emb.F5 The parameter depends on the temporal ID/ Hierarchy Depth
In an embodiment, at least one parameter depends on the temporal ID or on the hierarchy depth of the frame of the current block.
The advantage is a coding efficiency improvement based on the temporal ID or on the hierarchy depth value which has an impact on the partitioning, and the additional complexity is very low.
Emb.F6 The parameter depends on the Maximum MTT depth for the current picture/Slice
In an embodiment, at least one parameter depends on the Maximum MTT depth for the current picture/Slice of the current block.
The advantage is coding efficiency improvement. Indeed, the maximum MTT depth is one of the partitioning parameters which impacts significantly the partitioning. As this value is transmitted at picture level, the complexity is low.
Emb.S1.1.5 Combinations
In an embodiment, at least two of the embodiments described previously are combined.
1.2/ Changing order
Emb.CO1 The split_qt_flag is coded before the split_cu_flag
In an embodiment, the split_qt_flag is coded before the split_cu_flag as depicted in the following table of syntax elements. In the following tables, strikethrough is used to indicate deletion and underline is used to indicate insertion.
The advantage is a coding efficiency improvement in some cases, notably at high bitrates.
Emb.CO1.1 According to at least one parameter
In an embodiment, the split_qt_flag is coded before the split_cu_flag according to at least one parameter, as described in the following table of syntax elements.
In this table of syntax elements, the current QT depth “QTDepth” is compared to a QT depth predictor “QTDepthPred”. If, according to the regular conditions for QT split decoding, the QT split is allowed, and if the current QT depth is inferior to the QT depth predictor, the split_qt_flag is decoded.
If split_qt_flag is equal to 1 (or true), the other syntax elements do not need to be decoded. Otherwise, if split_qt_flag is equal to 0 (or false), the split_cu_flag is decoded and, if it is equal to 1, the regular decoding of the other flags is applied, except for the split_qt_flag, which is decoded only when the current QT depth is superior or equal to the QT depth predictor. Indeed, if the current QT depth is inferior to the QT depth predictor, the split_qt_flag has already been decoded. Please note that this is an example and the last condition is simply a check of whether split_qt_flag has been decoded or not. For example, another variable representing whether split_qt_flag has been decoded can be used. Or, if the variable split_qt_flag is initialized to 1, the check to decode split_qt_flag can be whether split_qt_flag is equal to 1.
In the same way, the proposed criterion is “if QTDepth is inferior to QTDepthPred”, yet another notation such as “inferior or equal” can be considered, as it depends on the predictor value that is considered.
The advantage is a coding efficiency improvement, as the value QTDepth is predictable. More precisely, the value of split_qt_flag is predictable when it is equal to 1 and, consequently, the split_cu_flag can also be predicted and inferred to be equal to 1. So, it is preferable to avoid the coding of split_cu_flag in that case.
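The modified parsing order of Emb.CO1.1 can be sketched as follows. This is an illustrative sketch, not the actual syntax table: next_bin() stands in for decoding one CABAC bin, the qt_allowed argument stands in for the regular QT availability conditions, and the MTT flags are omitted for brevity.

```python
def decode_split_flags_co1(next_bin, qt_depth, qt_depth_pred, qt_allowed):
    """Sketch of Emb.CO1.1: split_qt_flag is moved before split_cu_flag
    when the current QT depth is below the predictor."""
    split_qt_flag = 0
    qt_flag_decoded = False
    if qt_allowed and qt_depth < qt_depth_pred:
        split_qt_flag = next_bin()  # changed order: QT flag first
        qt_flag_decoded = True
        if split_qt_flag:
            # QT split selected: split_cu_flag is inferred to be 1
            return {"split_cu_flag": 1, "split_qt_flag": 1}
    split_cu_flag = next_bin()
    if split_cu_flag and qt_allowed and not qt_flag_decoded:
        split_qt_flag = next_bin()  # regular position
    return {"split_cu_flag": split_cu_flag, "split_qt_flag": split_qt_flag}
```

When the reordering condition holds and the block is QT split, a single bin is consumed instead of two, which is where the bitrate saving comes from.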
Emb.CO1.1.1 The at least one parameter is at least one of the parameters described in previous embodiment Emb.S.1
In an embodiment, the at least one parameter is at least one of the parameters described previously.
Example: Temporal QTDepth
For example, the predictor “QTDepthPred” is a QT depth value obtained from a temporal area. Indeed, the temporal QT depth is correlated to the current QT depth. This QT depth can be a value extracted from a temporal block or an average, a minimum or a maximum value among several temporal blocks. In a preferred embodiment, this temporal QT depth value is an average of the QT depths of blocks in a temporal area as described later.
Example: Spatial QTDepth
For example, the predictor “QTDepthPred” is a QT Depth value obtained from the neighbouring blocks. This QT depth can be a value extracted from one neighbouring block or an average, a minimum or a maximum value among several neighbouring blocks as described previously.
Example: Spatial and temporal QTDepth
To increase the coding efficiency, for example, the predictor of the QT depth can be an average of one temporal QT depth and one spatial QT depth, or a maximum or a minimum between one temporal QT depth and one spatial QT depth. The temporal QT depth and the spatial QT depth can be obtained as described previously.
Example: include an MTT depth
In one example, the QT depth predictor is the sum of a QT depth predictor and an MTT depth predictor.
QTDepthPred = QTDepthPred + MttDepthPred
To be in line with the QT depth predictor, the MTT depth predictor is divided by 2 (or right-shifted by 1) as follows:
QTDepthPred = QTDepthPred + MttDepthPred >> 1
In the case where MttDepthPred >> 1 is greater than 0, even if the split_qt_flag is signalled before the split_cu_flag, there is a low chance that the split_cu_flag is equal to 0. So, in that case, there are no additional bins signalled compared to the regular coding order.
Example: include an MTT depth (spatial and temporal)
In one example, in addition to the previous one, the QTDepthPred and the MttDepthPred are obtained spatially and/or temporally.
As an example, the QTDepthPred can be set as follows:
QTDepthPred = Max(QTDepthTempoPred, QTDepthSpatialPred) + Min(MttDepthTempoPred, MttDepthSpatialPred)
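This combination can be transcribed directly; the sketch below is illustrative only and simply mirrors the formula above.

```python
def qt_depth_pred_combined(qt_tempo, qt_spatial, mtt_tempo, mtt_spatial):
    # Direct transcription of the example formula:
    # QTDepthPred = Max(QTDepthTempoPred, QTDepthSpatialPred)
    #               + Min(MttDepthTempoPred, MttDepthSpatialPred)
    return max(qt_tempo, qt_spatial) + min(mtt_tempo, mtt_spatial)
```

Other combinations listed in the text (averages, or swapping Max/Min) follow the same pattern.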
Example: the parameter comes from a header
In one example, the QT depth predictor is set equal to a syntax element decoded from a header. So, for example, this or these syntax elements can be a picture header syntax element ph_QtDepthPred, and/or a slice header syntax element sh_QtDepthPred, and/or a picture parameter set syntax element pps_QtDepthPred, and/or a sequence parameter set syntax element sps_QtDepthPred. A CTU syntax element ctu_QtDepthPred can also be considered.
The advantage of this embodiment is the simplification of the parsing/decoding, as no value for the QT depth predictor needs to be updated at block level.
Example: the parameter is determined based on a value
In an example, the QTdepthpred depends on the type of the slice. For example, QTdepthpred is set equal to 3 for intra slices and QTdepthpred is set equal to 2 for inter slices.
In another example, the usage of Intra or Inter mode in the neighbourhood changes the value of QTdepthpred. The value can be incremented when there are one or more Intra blocks in the neighbourhood. This can also be an additional embodiment to the others based on temporal and spatial QT depth predictors.
In another example, the QTdepthpred depends on the QP value or on high/low bitrate. When the QP value is low (high bitrate), the QTdepthpred is higher than for high QP (low bitrate).
In another example, the QTdepthpred depends on the temporal ID or the hierarchy depth. For example, for a small temporal ID or hierarchy depth, the QTdepthpred is set at a higher value than for a high temporal ID or hierarchy depth. For example, 3 or 4 for the lowest level (temporal ID 0) and 1 or 0 for the highest level.
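The temporal-ID example above can be sketched as a simple mapping. The exact slope is an assumption for illustration; the text only fixes the endpoint values (3 or 4 at temporal ID 0, 1 or 0 at the highest level).

```python
def qt_depth_pred_from_temporal_id(temporal_id):
    # One possible mapping matching the example values in the text:
    # temporal ID 0 (lowest level) -> 3, decreasing towards 0 for the
    # highest levels. Illustrative only.
    return max(0, 3 - temporal_id)
```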
In another example, the QTdepthpred depends on the maximum MTT depth for the current picture/slice. For example, when the maximum MTT depth is low (1 or 0), the QTdepthpred has a higher value than when the maximum MTT depth is high (3 or 4).
As in the previous example, this embodiment is a simplification of the parsing/decoding, as no value for the QT depth predictor needs to be updated at block level.
Example: the parameter is a fixed value
In one example, rather than being determined based on another value, the QTdepthpred is fixed, as described previously. In that case, a QTdepthpred equal to 2 gives interesting results, but the value 3 is also interesting.
Emb.CO2 The split_cu_flag is coded after the MTT split flags
In an embodiment, the split_cu_flag is coded after the MTT split flags as described in the following table of syntax elements:
The advantage is a coding efficiency improvement in some cases. The improvement is obtained for high bit rates or for some cases as described in the following embodiments.
Emb.CO2.1 The at least one parameter is at least one of the parameters described in Emb.P.1
In an embodiment, the split_cu_flag is coded after the MTT flags according to at least one parameter; otherwise it is coded in the regular coding order. In this table of syntax elements, the current MTT depth “MttDepth” is compared to an MTT depth predictor “MttDepthPred”. If, according to the regular conditions for MTT split decoding, the current MTT depth is superior to the MTT depth predictor, the split_cu_flag is decoded. In this example, the split_cu_flag is initialized to 1.
If split_cu_flag is equal to 1 (or true), whether it has been decoded or not, the other syntax elements are decoded. Otherwise, if split_cu_flag is equal to 0 (or false), the current block is a CU and no other split flags are decoded. When split_cu_flag is equal to 1, the split_qt_flag is decoded and, if it is equal to 0 or not decoded, the MTT flags are decoded. If the MTT split is the last decoded split, and if the MTT depth is inferior or equal to the MttDepth predictor, the split_cu_flag is decoded to determine whether the split mode for the current block is no split or the last decoded MTT split.
The last decoded MTT split can be, for example, the least probable MTT split among the available MTT splits.
The advantage is a coding efficiency improvement, as the value “no split” is predictable. More precisely, the value of split_cu_flag is often predictable when it is equal to 1. So, it is preferable to avoid the coding of split_cu_flag in some cases.
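The deferred split_cu_flag of Emb.CO2.1 can be sketched as follows. This is illustrative, not the actual syntax table: next_bin() stands in for one CABAC bin, and the whole MTT split-flag cascade is collapsed into a single bin (1 for some other MTT split, 0 for the last decoded MTT split).

```python
def decode_split_co2(next_bin, mtt_depth, mtt_depth_pred):
    """Sketch of Emb.CO2.1: split_cu_flag is initialised to 1 and parsed up
    front only when the current MTT depth exceeds the predictor; otherwise
    it is deferred until after the MTT flags."""
    split_cu_flag = 1
    if mtt_depth > mtt_depth_pred:
        split_cu_flag = next_bin()  # regular coding order
        if split_cu_flag == 0:
            return "NO_SPLIT"
    if next_bin():                  # stands in for the MTT split flags
        return "MTT_OTHER_SPLIT"
    if mtt_depth <= mtt_depth_pred:
        split_cu_flag = next_bin()  # deferred split_cu_flag disambiguates
    return "MTT_LAST_SPLIT" if split_cu_flag else "NO_SPLIT"
```

When the split_cu_flag is deferred, the frequent "some MTT split" outcome is reached without spending a bin on split_cu_flag at all.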
In an embodiment, the at least one parameter is at least one parameter as described previously. At least one parameter is used to obtain MttDepthPred in this example.
Following the same principle as in the previous embodiment, the MttDepthPred can be obtained according to one or more of the following examples.
Example: Temporal MTTDepth
For example, the predictor “MttDepthPred” is an MTT depth value obtained from a temporal area. Indeed, the temporal MTT depth is correlated to the current MTT depth. This MTT depth can be a value extracted from a temporal block or an average, a minimum or a maximum value among several temporal blocks. In a preferred embodiment, this temporal MTT depth value is an average of the MTT depths of blocks in a temporal area as described later.
Example: Spatial MTTDepth
For example, the predictor “MttDepthPred” is an MTT depth value obtained from the neighbouring blocks. This MTT depth can be a value extracted from one neighbouring block or an average, a minimum or a maximum value among several neighbouring blocks as described previously.
Example: Spatial and temporal MTTDepth
To increase the coding efficiency, for example, the predictor of the MTT depth can be an average of one temporal MTT depth and one spatial MTT depth, or a maximum or a minimum between one temporal MTT depth and one spatial MTT depth. The temporal MTT depth and spatial MTT depth are obtained as described previously.
Example: picture/slice MTT depth
In one example, the MTT depth predictor is determined based on the picture/slice MTT depth: the maximum MTT depth for the picture or slice (PH_MaxMttDepth). In one example, the predictor “MttDepthPred” is set equal to half of this value as in the following formula:
MttDepthPred = PH_MaxMttDepth >> 1
Or alternatively:
MttDepthPred = (PH_MaxMttDepth + 1) >> 1
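Both variants of this half-depth predictor can be sketched in one helper; this is illustrative only, mirroring the two formulas above.

```python
def mtt_depth_pred_from_header(ph_max_mtt_depth, round_up=False):
    # The predictor is half the picture/slice maximum MTT depth; round_up
    # selects the alternative formula (PH_MaxMttDepth + 1) >> 1.
    return (ph_max_mtt_depth + (1 if round_up else 0)) >> 1
```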
Example: the parameter comes from a header
In one example, the MTT depth predictor is set equal to a syntax element decoded from a header. So, for example, this or these syntax elements can be a picture header syntax element ph_MttDepthPred, and/or a slice header syntax element sh_MttDepthPred, and/or a picture parameter set syntax element pps_MttDepthPred, and/or a sequence parameter set syntax element sps_MttDepthPred. A CTU syntax element ctu_MttDepthPred can also be considered.
The advantage of this embodiment is the simplification of the parsing/decoding, as no value for the MTT depth predictor needs to be updated at block level.
Example: the parameter is determined based on a value
In an example, the MttDepthPred depends on the type of the slice. For example, MttDepthPred is set equal to 2 for intra slices and MttDepthPred is set equal to 1 for Inter slices.
In another example, the usage of Intra or Inter mode in the neighbourhood changes the value of MttDepthPred. The value can be incremented when there are Intra blocks in the neighbourhood. This can also be an additional embodiment to the other embodiment based on temporal and spatial QT depth predictors.
In another example, the MttDepthPred depends on the QP value or on high/low bitrate. When the QP value is low (high bitrate), the MttDepthPred is higher than for high QP (low bitrate).
In another example, the MttDepthPred depends on the temporal ID or the hierarchy depth. For example, for a small temporal ID or hierarchy depth, the MttDepthPred is set at a higher value than for a high temporal ID or hierarchy depth. For example, MttDepthPred is set equal to 3 or 4 for the lowest level (temporal ID 0) and to 1 or 0 for the highest level.
In another example, the MttDepthPred depends on the maximum MTT depth for the current picture/slice. For example, when the maximum MTT depth is low (1 or 0), the MttDepthPred is set to a higher value than when the maximum MTT depth is high (3 or 4). As in the previous example, this embodiment is a simplification of the parsing/decoding, as no value for the MTT depth predictor needs to be updated at block level.
Example: the parameter is a fixed value
In one example, rather than being determined based on another value, the MttDepthPred is fixed, as described previously. In that case, setting MttDepthPred equal to 1 gives beneficial results. A value of 2 has also been found to produce beneficial results.
Emb.CO3 The split_qt_flag is coded after the MTT split flags
In an embodiment, the split_qt_flag is coded after the MTT split flags as depicted in the following table of syntax elements.
The advantage is a coding efficiency improvement in some cases. The improvement is obtained for low bit rates or for some cases as described in the following embodiments.
Emb.CO3.1 The at least one parameter is at least one of the parameters described in Emb.P.1
In an embodiment, the split_qt_flag is coded after the MTT flags according to at least one parameter; otherwise it is coded in the regular coding order.
In this table of syntax elements, the current QT depth “QTDepth” is compared to a second QT depth predictor “QTDepthPred2”. If, according to the regular conditions for QT split decoding, the current QT depth is inferior to the second QT depth predictor, the split_qt_flag is decoded. In this example, the split_qt_flag is initialized to 0.
If split_qt_flag is equal to 0 (or false), whether it has been decoded or not, the other syntax elements are decoded. Otherwise, if split_qt_flag is equal to 1 (or true), the current block is QT split and no other split flags are decoded. When split_qt_flag is equal to 0, i.e. if the split_qt_flag is decoded and equal to 0 or not decoded, the MTT flags are decoded. If the MTT split is the last decoded split, and if the QT depth is superior or equal to the second predictor QTDepthPred2, the split_qt_flag is decoded to determine whether the split mode for the current block is QT split or the last decoded MTT split.
The last decoded MTT split can be, for example, the least probable MTT split among the available MTT splits. The advantage is a coding efficiency improvement, as the QT depth is predictable. More precisely, the value of split_qt_flag is often predictable when it is equal to 0. So, it is preferable to avoid the coding of split_qt_flag in some cases. In an embodiment, the at least one parameter is at least one of the parameters described previously. This “at least one parameter” is used to obtain QTDepthPred2 in this example.
Following the same principle as in the previous embodiment, the values described for the QtDepthPred can be applied to this example. For example, the temporal average QT depth can be considered for QtDepthPred2 in this example.
Emb.CO4 Combinations
All these conditional change orders can be considered together. The following embodiments give examples with the related tables of syntax elements.
Emb.CO1 + Emb.CO2
In one embodiment, the split_qt_flag is coded before the split_cu_flag according to the condition defined previously and the split_cu_flag is coded after the MTT split flags according to some other conditions as defined previously. The following table of syntax elements illustrates this embodiment.
The advantage is a coding efficiency improvement as it combines the efficiency of both change orders.
Emb.CO1 + Emb.CO3
In one embodiment, the split_qt_flag is coded before the split_cu_flag according to the condition described previously and the split_qt_flag is also coded after the MTT split flags according to some other conditions as defined previously. The following table of syntax elements illustrates this embodiment.
The advantage is a coding efficiency improvement as it combines the efficiency of both change orders.
Emb.CO2 + Emb.CO3
In one embodiment, the split_cu_flag is coded after the MTT split flags according to the condition described previously and the split_qt_flag is not coded according to some other conditions as defined previously, as depicted in the following table of syntax elements. The advantage is a coding efficiency improvement as it combines the efficiency of both change orders.
Emb.CO1 + Emb.CO2 + Emb.CO3
In one embodiment, the three change orders are combined as depicted in the following table of syntax elements.
The advantage is a coding efficiency improvement as it combines the efficiency of these three change orders.
1.3/ Others
Emb.O1 Not limited to the VVC split modes
The proposed invention is not limited to the VVC partitioning and can be adapted for other types of partitioning, for example if additional partitioning types are added. In an embodiment, the method is adapted to take into account additional MTT modes; for example, an asymmetric binary split mode can be added to the list of MTT split modes.
The advantage is a coding efficiency improvement as for the proposed examples described previously.
Emb.O2 Enable/disable the proposed method
In an embodiment, the proposed change order or change orders can be enabled or disabled according to a high-level syntax flag. The syntax flag can be transmitted in the GCI (General Constraints Information), SPS, PPS, picture header or slice header.
In an additional embodiment, each change order has its own flag to enable or disable it.
The advantage is the flexibility of the usage of the method.
Emb.O3 QT depth can be replaced by Block size
In an embodiment, all embodiments using a QT depth or a temporal QT depth can use the block size instead. Indeed, this represents the same principle: the QT split occurs only at the beginning of the splitting process, because a QT split is not allowed after an MTT split, so the QT depth can easily be converted to the block size.
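This conversion is simply a right shift of the CTU size by the QT depth, since each QT split halves both dimensions; a minimal sketch:

```python
def qt_block_size(ctu_size, qt_depth):
    # Each QT split halves both dimensions, so the (square) block size is
    # the CTU size right-shifted by the QT depth.
    return ctu_size >> qt_depth
```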
For example, when the CTU size is 128, the following gives the conversion of the QT depth to the block size. Please note that all blocks resulting from QT splits are square, so only one number needs to be considered:
QtDepth = 0 => 128
QtDepth = 1 => 64
QtDepth = 2 => 32
QtDepth = 3 => 16
QtDepth = 4 => 8
QtDepth = 5 => 4
Another example when the CTU size is 256:
QtDepth = 0 => 256
QtDepth = 1 => 128
QtDepth = 2 => 64
QtDepth = 3 => 32
QtDepth = 4 => 16
QtDepth = 5 => 8
QtDepth = 6 => 4
Temporal Area
Collocated
In an embodiment, the temporal value of the QT depth or block size is the QT depth value or the block size of the temporal collocated block, or of a temporal block shifted by a motion vector value obtained from a neighbouring block, for example. Similarly, the MaxMttDepth can be the MttDepth of the temporal collocated block, or of a temporal block shifted by a motion vector value obtained from a neighbouring block, for example.
A plurality of positions
In an embodiment, several positions of blocks are considered to determine a temporal value of the QT depth or block size, or for the maximum MTT depth or the average MTT depth. For example, the positions C, TL, TR, BL, BR of Figure 14 can be considered. In this figure, the position C is the center of the temporal collocated block. Positions TL, TR, BL, BR are, respectively, the top-left, top-right, bottom-left and bottom-right positions around the temporal collocated block.
Compared to the previous embodiment, the current one increases the coding efficiency as the determined temporal QT depth, temporal block size, average temporal MTT depth or maximum MTT depth is more often reliable.
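The five positions of Figure 14 can be sketched from the current block's coordinates. This is illustrative: the exact corner offsets used here (one sample outside the collocated block) are an assumption, as the figure, not the text, defines the precise positions.

```python
def temporal_positions(x, y, w, h):
    """Center C and the four corner positions TL, TR, BL, BR around the
    collocated block at (x, y) of size (w, h), as in Figure 14.
    Corner offsets are assumed to lie one sample outside the block."""
    return {
        "C":  (x + w // 2, y + h // 2),
        "TL": (x - 1, y - 1),
        "TR": (x + w, y - 1),
        "BL": (x - 1, y + h),
        "BR": (x + w, y + h),
    }
```

The QT/MTT depths stored at these positions in the temporal frame would then be combined (average, minimum or maximum) as described above.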
A higher area
In an embodiment, a temporal value of the QT depth or block size or the average temporal MTT depth or the maximum MTT depth is determined based on a temporal area. For example, the temporal area is a collocated CTU.
Compared to the two previous embodiments, more blocks can be considered, so the coding efficiency increases.
The center of the current block is used to determine the temporal positions
In an embodiment, to determine the collocated block or several temporal blocks or a temporal area, the center of the current block is considered.
The advantage is an increase in coding efficiency as the center is the best position to represent the current block. Alternatively, when the center of the block is outside the current frame, the top left position is considered.
A whole frame
In an embodiment, a temporal value of the QT depth or the block size or the average temporal MTT depth or the maximum MTT depth is determined based on all blocks of a temporal frame.
The advantage of this embodiment is a simplification of the process to determine the temporal QT depth or block size or the average temporal MTT depth or the maximum MTT depth value, but it is less efficient because it is less adapted to the content compared to the previous embodiments.
A frame with the same temporal ID
In an embodiment, the collocated block, the several temporal blocks or the temporal area comes from a frame with the same temporal ID.
For the example of the Random Access configuration, as represented in Figure 7, if the current frame has a temporal ID equal to 4, another encoded/decoded frame with the same temporal ID equal to 4 is used to determine the values of the proposed method.
The frames with the same temporal ID often have the same coding parameters. In particular, they often have the same or similar QP and the same spatial distances to their reference frames. So, they are very useful to predict the QT depth or the average temporal MTT depth or the maximum MTT depth, as this data is correlated to the QP and the spatial distance between frames.
The closest frame with the same temporal ID
In an embodiment, the collocated block, the several temporal blocks or the temporal area comes from the closest frame with the same temporal ID.
For the example of the Random Access configuration, as represented in Figure 7, the closest frame with the same temporal ID is (generally) more correlated than the others. So, the result is better.
A frame or a reference frame with the same QP
In an embodiment, the collocated block, the several temporal blocks or the temporal area comes from a frame or a reference frame with the same QP, ideally a reference frame with the same QP. As mentioned above, the QP has an important influence on the block partitioning. So, with a frame with the same QP, the temporal QT depth or block size or the average temporal MTT depth or the maximum MTT depth is a better predictor.
A reference frame which is the same as those used for the temporal Motion vector prediction
In an embodiment, the collocated block, the several temporal blocks or the temporal area comes from the reference frame which is used for the temporal motion vector prediction. This can be the first reference frame of reference list 0 or the first reference frame of list 1, according to a flag transmitted in the picture header or in the slice header.
Surprisingly, this embodiment gives the best coding efficiency even if this reference frame has a lower QP. Yet, it is closer to the current frame compared to all frames with the same temporal ID.
The closest reference frame
In an embodiment, the collocated block, the several temporal blocks or the temporal area comes from the closest reference frame.
As explained for the previous embodiment, the distance to the current frame provides a useful compromise between encoder time reduction and coding efficiency, even if the frames with the same QP have statistically more correlation between their QT depths and MTT depths.
More than one reference frame
In an embodiment, two reference frames are considered and two temporal areas, two sets of several blocks or two collocated blocks are used to determine two temporal QT depths or two block sizes. These are then used to determine one QT depth, one block size or one MTT depth. For example, the minimum of the QT depths from the two temporal areas can be considered.
More than two reference frames can also be considered.
The advantage is a better compromise between encoder time reduction and coding efficiency as the QT depth value or the MTT depth value is computed from more data. This is particularly efficient when both reference frames have the same temporal distance, but it increases the number of memory accesses.
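The two-reference-frame combination above can be sketched as follows. This is illustrative: the per-area depth is derived here as an average (as in the preferred variant described earlier) and the minimum of the two is kept, as in the example given in the text.

```python
def temporal_qt_depth_two_refs(depths_ref0, depths_ref1):
    """One QT depth is derived per temporal area (here an integer average
    of the block depths in that area), then the minimum of the two areas
    is kept, as in the 'more than one reference frame' example."""
    avg0 = sum(depths_ref0) // len(depths_ref0)
    avg1 = sum(depths_ref1) // len(depths_ref1)
    return min(avg0, avg1)
```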
Further Embodiment(s)
In VVC, the syntax elements to represent the different split modes are coded, if needed, in the following order:
• split_cu_flag
• split_qt_flag
• mtt_split_cu_vertical_flag
• mtt_split_cu_binary_flag
According to an embodiment of the invention, when the current QT depth is strictly inferior to the average temporal QT depth, the syntax elements are coded in the following order, otherwise the syntax elements are coded as usual:
• split_qt_flag
• split_cu_flag
• mtt_split_cu_vertical_flag
• mtt_split_cu_binary_flag
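The order selection of this embodiment can be sketched as follows; this is an illustrative helper, not the actual parsing process.

```python
def split_flag_order(qt_depth, avg_temporal_qt_depth):
    # The reordering applies only when the current QT depth is strictly
    # inferior to the average temporal QT depth; otherwise the usual VVC
    # coding order is kept.
    mtt = ["mtt_split_cu_vertical_flag", "mtt_split_cu_binary_flag"]
    if qt_depth < avg_temporal_qt_depth:
        return ["split_qt_flag", "split_cu_flag"] + mtt
    return ["split_cu_flag", "split_qt_flag"] + mtt
```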
Implementation of the invention
Figure 15 shows a system 191, 195 comprising at least one of an encoder 150 or a decoder 100 and a communication network 199 according to embodiments of the present invention. According to an embodiment, the system 195 is for processing and providing a content (for example, a video and audio content for displaying/outputting or streaming video/audio content) to a user, who has access to the decoder 100, for example through a user interface of a user terminal comprising the decoder 100 or a user terminal that is communicable with the decoder 100. Such a user terminal may be a computer, a mobile phone, a tablet or any other type of a device capable of providing/displaying the (provided/streamed) content to the user. The system 195 obtains/receives a bitstream 101 (in the form of a continuous stream or a signal - e.g. while earlier video/audio are being displayed/output) via the communication network 199. According to an embodiment, the system 191 is for processing a content and storing the processed content, for example a video and audio content processed for displaying/outputting/streaming at a later time. The system 191 obtains/receives a content comprising an original sequence of images 151, which is received and processed (including filtering with a deblocking filter according to the present invention) by the encoder 150, and the encoder 150 generates a bitstream 101 that is to be communicated to the decoder 100 via the communication network 199. The bitstream 101 is then communicated to the decoder 100 in a number of ways, for example it may be generated in advance by the encoder 150 and stored as data in a storage apparatus in the communication network 199 (e.g. on a server or a cloud storage) until a user requests the content (i.e. the bitstream data) from the storage apparatus, at which point the data is communicated/streamed to the decoder 100 from the storage apparatus.
The system 191 may also comprise a content providing apparatus for providing/streaming, to the user (e.g. by communicating data for a user interface to be displayed on a user terminal), content information for the content stored in the storage apparatus (e.g. the title of the content and other meta/ storage location data for identifying, selecting and requesting the content), and for receiving and processing a user request for a content so that the requested content can be delivered/streamed from the storage apparatus to the user terminal. Alternatively, the encoder 150 generates the bitstream 101 and communicates/streams it directly to the decoder 100 as and when the user requests the content. The decoder 100 then receives the bitstream 101 (or a signal) and performs filtering with a deblocking filter according to the invention to obtain/generate a video signal 109 and/or audio signal, which is then used by a user terminal to provide the requested content to the user.
Any step of the method/process according to the invention or functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the steps/functions may be stored on or transmitted over, as one or more instructions or code or program, a computer-readable medium, and executed by one or more hardware-based processing units such as a programmable computing machine, which may be a PC (“Personal Computer”), a DSP (“Digital Signal Processor”), a circuit, a circuitry, a processor and a memory, a general purpose microprocessor or a central processing unit, a microcontroller, an ASIC (“Application-Specific Integrated Circuit”), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor” as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
Embodiments of the present invention can also be realized by a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g. a chip set). Various components, modules, or units are described herein to illustrate functional aspects of devices/apparatuses configured to perform those embodiments, but do not necessarily require realization by different hardware units. Rather, various modules/units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors in conjunction with suitable software/firmware.
Embodiments of the present invention can be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium to perform the modules/units/functions of one or more of the above-described embodiments and/or that includes one or more processing unit or circuits for performing the functions of one or more of the above-described embodiments, and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiments and/or controlling the one or more processing unit or circuits to perform the functions of one or more of the above-described embodiments. The computer may include a network of separate computers or separate processing units to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a computer-readable medium such as a communication medium via a network or a tangible storage medium. The communication medium may be a signal/bitstream/carrier wave. The tangible storage medium is a “non-transitory computer-readable storage medium” which may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like. At least some of the steps/functions may also be implemented in hardware by a machine or a dedicated component, such as an FPGA (“Field-Programmable Gate Array”) or an ASIC (“Application-Specific Integrated Circuit”).
Figure 16 is a schematic block diagram of a computing device 3600 for implementation of one or more embodiments of the invention. The computing device 3600 may be a device such as a micro-computer, a workstation or a light portable device. The computing device 3600 comprises a communication bus connected to:
- a central processing unit (CPU) 3601, such as a microprocessor;
- a random access memory (RAM) 3602 for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for encoding or decoding at least part of an image according to embodiments of the invention; the memory capacity thereof can be expanded by an optional RAM connected to an expansion port, for example;
- a read only memory (ROM) 3603 for storing computer programs for implementing embodiments of the invention;
- a network interface (NET) 3604, typically connected to a communication network over which digital data to be processed are transmitted or received. The network interface (NET) 3604 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 3601;
- a user interface (UI) 3605, which may be used for receiving inputs from a user or for displaying information to a user;
- a hard disk (HD) 3606, which may be provided as a mass storage device;
- an Input/Output module (IO) 3607, which may be used for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in the ROM 3603, on the HD 3606 or on a removable digital medium such as, for example, a disk.
According to a variant, the executable code of the programs can be received by means of a communication network, via the NET 3604, in order to be stored in one of the storage means of the computing device 3600, such as the HD 3606, before being executed. The CPU 3601 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 3601 is capable of executing instructions from the main RAM memory 3602 relating to a software application after those instructions have been loaded from the program ROM 3603 or the HD 3606, for example. Such a software application, when executed by the CPU 3601, causes the steps of the method according to the invention to be performed.
It is also understood that according to another embodiment of the present invention, a decoder according to an aforementioned embodiment is provided in a user terminal such as a computer, a mobile phone (a cellular phone), a tablet or any other type of device (e.g. a display apparatus) capable of providing/displaying content to a user. According to yet another embodiment, an encoder according to an aforementioned embodiment is provided in an image capturing apparatus which also comprises a camera, a video camera or a network camera (e.g. a closed-circuit television or video surveillance camera) which captures and provides the content for the encoder to encode. Two such examples are provided below with reference to Figures 17 and 18.
Figure 17 is a diagram illustrating a network camera system 3700 including a network camera 3702 and a client apparatus 202.
The network camera 3702 includes an imaging unit 3706, an encoding unit 3708, a communication unit 3710, and a control unit 3712.
The network camera 3702 and the client apparatus 202 are connected so as to be able to communicate with each other via the network 200.
The imaging unit 3706 includes a lens and an image sensor (e.g., a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS)), and captures an image of an object and generates image data based on the image. This image can be a still image or a video image.
The encoding unit 3708 encodes the image data using one of the encoding methods explained above, or a combination of those methods.
The communication unit 3710 of the network camera 3702 transmits the encoded image data encoded by the encoding unit 3708 to the client apparatus 202.
Further, the communication unit 3710 receives commands from the client apparatus 202. The commands include commands to set parameters for the encoding performed by the encoding unit 3708.
The control unit 3712 controls the other units in the network camera 3702 in accordance with the commands received by the communication unit 3710.
The client apparatus 202 includes a communication unit 3714, a decoding unit 3716, and a control unit 3718.
The communication unit 3714 of the client apparatus 202 transmits the commands to the network camera 3702.
Further, the communication unit 3714 of the client apparatus 202 receives the encoded image data from the network camera 3702.
The decoding unit 3716 decodes the encoded image data using one of the decoding methods explained above, or a combination of those methods.
The control unit 3718 of the client apparatus 202 controls other units in the client apparatus 202 in accordance with the user operation or commands received by the communication unit 3714.
The control unit 3718 of the client apparatus 202 controls a display apparatus 2120 so as to display an image decoded by the decoding unit 3716.
The control unit 3718 of the client apparatus 202 also controls the display apparatus 2120 so as to display a GUI (Graphical User Interface) for designating values of the parameters for the network camera 3702, including the parameters for the encoding performed by the encoding unit 3708.
The control unit 3718 of the client apparatus 202 also controls other units in the client apparatus 202 in accordance with user operation input to the GUI displayed by the display apparatus 2120.
The control unit 3718 of the client apparatus 202 controls the communication unit 3714 of the client apparatus 202 so as to transmit the commands to the network camera 3702 which designate values of the parameters for the network camera 3702, in accordance with the user operation input to the GUI displayed by the display apparatus 2120.
Figure 18 is a diagram illustrating a smart phone 3800. The smart phone 3800 includes a communication unit 3802, a decoding unit 3804, a control unit 3806 and a display unit 3808. The communication unit 3802 receives the encoded image data via the network 200.
The decoding unit 3804 decodes the encoded image data received by the communication unit 3802, using one of the decoding methods explained above or a combination of those methods. Where the smart phone 3800 also encodes image data, it does so using the encoding methods explained above.
The control unit 3806 controls the other units in the smart phone 3800 in accordance with a user operation or commands received by the communication unit 3802.
For example, the control unit 3806 controls the display unit 3808 so as to display an image decoded by the decoding unit 3804. The smart phone 3800 may also comprise sensors 3812 and an image recording device 3810. In this way, the smart phone 3800 may record images and encode them using a method described above.
The smart phone 3800 may subsequently decode the encoded images (using a method described above) and display them via the display unit 3808 - or transmit the encoded images to another device via the communication unit 3802 and network 200.
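The record, encode, decode and display flow described above for the smart phone 3800 can be sketched as follows. This is an illustrative sketch only: the function names and the byte-payload representation of image data are hypothetical stand-ins chosen for this example, not the encoding or decoding methods disclosed in the embodiments above.

```python
# Hypothetical stand-ins for the encoding and decoding units of the smart
# phone 3800; the real units would use the coding methods described above.

def encode_images(images):
    # Encode each image (here, a list of sample values) into a byte payload.
    return [bytes(img) for img in images]

def decode_images(bitstreams):
    # Decode each byte payload back into a list of sample values.
    return [list(bs) for bs in bitstreams]

# Record -> encode -> (store or transmit) -> decode -> display
recorded = [[10, 20, 30], [40, 50, 60]]   # samples from the image recording device 3810
encoded = encode_images(recorded)         # encoding step
decoded = decode_images(encoded)          # decoding step, prior to display on 3808
assert decoded == recorded                # lossless round trip in this toy sketch
```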
Alternatives and modifications
While the present invention has been described with reference to embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. It will be appreciated by those skilled in the art that various changes and modifications might be made without departing from the scope of the invention, as defined in the appended claims. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
It is also understood that any result of comparison, determination, assessment, selection, execution, performing, or consideration described above, for example a selection made during an encoding or filtering process, may be indicated in or determinable/inferable from data in a bitstream, for example a flag or data indicative of the result, so that the indicated or determined/inferred result can be used in the processing instead of actually performing the comparison, determination, assessment, selection, execution, performing, or consideration, for example during a decoding process.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.
Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.

Claims

1. A method of encoding or decoding image data into or from a bitstream, the bitstream including data indicating a partitioning of the image data into a plurality of blocks according to a coding tree, wherein blocks in the coding tree may be partitioned according to a plurality of split modes, each split mode being indicated by a respective partitioning syntax element, the method comprising: determining, for a current block to be encoded or decoded, a coding order for the partitioning syntax elements, wherein the determining is based on at least one parameter.
2. A method according to claim 1, wherein the at least one parameter comprises one or more variables used to determine the partitioning of another block.
3. A method according to claim 1 or claim 2, wherein the at least one parameter comprises a spatial parameter.
4. A method according to any preceding claim, wherein the at least one parameter comprises a quad tree depth, or a block size, of a block neighbouring the current block.
5. A method according to any preceding claim, wherein the at least one parameter comprises one or more of: an average quad tree depth, or average block size, of two or more blocks neighbouring the current block; a minimum quad tree depth, or minimum block size, of two or more blocks neighbouring the current block; and a maximum quad tree depth, or maximum block size, of two or more blocks neighbouring the current block.
6. A method according to any preceding claim, wherein the at least one parameter comprises a multi tree depth value of a block neighbouring the current block.
7. A method according to claim 6, further comprising a step of comparing the multi tree depth value of the block neighbouring the current block to the multi tree depth value of the current block, wherein the coding order is determined based on the comparison.
8. A method according to claim 6, further comprising a step of converting the multi tree depth value of the block neighbouring the current block to a converted value, wherein the method further comprises a step of comparing the converted value to the quad tree depth value of the current block, wherein the coding order is determined based on the comparison.
9. A method according to claim 8, wherein the step of converting the multi tree depth value of the block neighbouring the current block to a converted value comprises dividing the multi tree depth value of the block neighbouring the current block by two.
10. A method according to any preceding claim, wherein the at least one parameter comprises one or more of: an average multi tree depth of two or more blocks neighbouring the current block; a minimum multi tree depth of two or more blocks neighbouring the current block; and a maximum multi tree depth of two or more blocks neighbouring the current block.
11. A method according to any preceding claim, wherein the at least one parameter comprises a binary tree depth value of a block neighbouring the current block.
12. A method according to any preceding claim, wherein the at least one parameter comprises one or more of: an average binary tree depth of two or more blocks neighbouring the current block; a minimum binary tree depth of two or more blocks neighbouring the current block; and a maximum binary tree depth of two or more blocks neighbouring the current block.
13. A method according to any preceding claim, wherein the at least one parameter comprises a ternary tree depth value of a block neighbouring the current block.
14. A method according to any preceding claim, wherein the at least one parameter comprises one or more of: an average ternary tree depth of two or more blocks neighbouring the current block; a minimum ternary tree depth of two or more blocks neighbouring the current block; and a maximum ternary tree depth of two or more blocks neighbouring the current block.
15. A method according to any preceding claim, further comprising a step of comparing the height and/or width of at least one block neighbouring the current block to the height and/or width of the current block, wherein the coding order is determined based on the comparison.
16. A method according to any preceding claim, further comprising a step of comparing the height to width ratio of at least one block neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
17. A method according to claim 16, further comprising a step of comparing an average of the height to width ratios of two or more blocks neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
18. A method according to claim 16, further comprising a step of comparing a minimum of the height to width ratios of two or more blocks neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
19. A method according to claim 16, further comprising a step of comparing a maximum of the height to width ratios of two or more blocks neighbouring the current block to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
20. A method according to any of claims 4 to 19, wherein the or each block neighbouring the current block is also used for context derivation.
21. A method according to any of claims 4 to 19, wherein the or each block neighbouring the current block comprises a block at a bottom-left position relative to the current block and/or a block diagonally below and to the left of the current block.
22. A method according to any of claims 4 to 19, wherein the or each block neighbouring the current block comprises a block at a top-right position relative to the current block and/or a block diagonally above and to the right of the current block.
23. A method according to any of claims 4 to 19, wherein all blocks having a common border or corner with the current block are used as neighbouring blocks.
24. A method according to any preceding claim, wherein the at least one parameter comprises at least one temporal parameter.
25. A method according to claim 24, wherein the at least one temporal parameter comprises a quad tree depth value, or block size, computed from a temporal area.
26. A method according to claim 24 or claim 25, wherein the at least one temporal parameter comprises one or more of: an average quad tree depth, or average block size, of a temporal area; a minimum quad tree depth, or minimum block size, of a temporal area; and a maximum quad tree depth, or maximum block size, of a temporal area.
27. A method according to any of claims 24 to 26, wherein the at least one temporal parameter comprises a multi tree depth value of a temporal area.
28. A method according to claim 27, further comprising a step of comparing the multi tree depth value of the temporal area to the multi tree depth value of the current block, wherein the coding order is determined based on the comparison.
29. A method according to claim 27, further comprising a step of converting the multi tree depth value of the temporal area to a converted value, wherein the method further comprises a step of comparing the converted value to the quad tree depth value of the current block, wherein the coding order is determined based on the comparison.
30. A method according to claim 29, wherein the step of converting the multi tree depth value of the temporal area to a converted value comprises dividing the multi tree depth value of the temporal area by two.
31. A method according to any of claims 24 to 30, wherein the at least one temporal parameter comprises one or more of: an average multi tree depth of a temporal area; a minimum multi tree depth of a temporal area; and a maximum multi tree depth of a temporal area.
32. A method according to any of claims 24 to 31, wherein the at least one temporal parameter comprises a binary tree depth value of a temporal area.
33. A method according to any of claims 24 to 32, wherein the at least one temporal parameter comprises one or more of: an average binary tree depth of a temporal area; a minimum binary tree depth of a temporal area; and a maximum binary tree depth of a temporal area.
34. A method according to any of claims 24 to 33, wherein the at least one temporal parameter comprises a ternary tree depth value of a temporal area.
35. A method according to any of claims 24 to 34, wherein the at least one temporal parameter comprises one or more of: an average ternary tree depth of a temporal area; a minimum ternary tree depth of a temporal area; and a maximum ternary tree depth of a temporal area.
36. A method according to any of claims 24 to 35, further comprising a step of comparing the height and/or width of at least one block of a temporal area to the height and/or width of the current block, wherein the coding order is determined based on the comparison.
37. A method according to any of claims 24 to 36, further comprising a step of comparing the height to width ratio of at least one block of a temporal area to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
38. A method according to claim 37, further comprising a step of comparing an average of the height to width ratios of two or more blocks of a temporal area to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
39. A method according to claim 37, further comprising a step of comparing a minimum of the height to width ratios of two or more blocks of a temporal area to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
40. A method according to claim 37, further comprising a step of comparing a maximum of the height to width ratios of two or more blocks of a temporal area to the height to width ratio of the current block, wherein the coding order is determined based on the comparison.
41. A method according to any preceding claim, wherein the at least one parameter comprises a parameter transmitted in a header.
42. A method according to claim 41, wherein the header is a picture header or a slice header.
43. A method according to claim 41, wherein the at least one parameter comprises a parameter transmitted in a picture header and a parameter transmitted in a slice header.
44. A method according to any preceding claim, wherein the at least one parameter comprises a parameter transmitted in a sequence parameter set and/or a picture parameter set.
45. A method according to any preceding claim, wherein the at least one parameter comprises a fixed number.
46. A method according to any preceding claim, wherein the at least one parameter comprises a parameter that is dependent on a slice type of the slice to which the current block belongs.
47. A method according to any preceding claim, wherein the at least one parameter comprises a parameter that is dependent on a number of blocks in the vicinity of the current block encoded using intra prediction and/or a number of blocks in the vicinity of the current block encoded using inter prediction.
48. A method according to claim 47, wherein at least one of the blocks in the vicinity of the current block belongs to a temporal area.
49. A method according to claim 47 or 48, the method further comprising a step of comparing a number of pixels in the vicinity of the current block encoded using intra prediction to a number of pixels in the vicinity of the current block encoded using inter prediction, wherein the coding order is determined based on the comparison.
50. A method according to any preceding claim, wherein the at least one parameter comprises a parameter dependent on the quantization parameter value of the current block.
51. A method according to any preceding claim, wherein the at least one parameter comprises a parameter dependent on the quantization parameter value of the slice to which the current block belongs.
52. A method according to any preceding claim, wherein the at least one parameter comprises a parameter dependent on the temporal ID or the hierarchy depth of the frame to which the current block belongs.
53. A method according to any preceding claim, wherein the at least one parameter comprises a parameter dependent on the maximum multi tree depth of the slice to which the current block belongs.
54. A method according to any preceding claim, wherein the at least one parameter comprises a parameter dependent on the maximum multi tree depth of the picture to which the current block belongs.
55. A method according to any preceding claim, wherein the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to a quad tree split is coded before a flag related to no split.
56. A method according to any preceding claim, wherein the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which split_qt_flag is coded before split_cu_flag.
57. A method according to any preceding claim, wherein the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to no split is coded after a flag related to a binary tree split.
58. A method according to any preceding claim, wherein the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to no split is coded after a flag related to a ternary tree split.
59. A method according to any preceding claim, wherein the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which split_cu_flag is coded after mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag.
60. A method according to any preceding claim, wherein the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to quad tree split is coded after a flag related to a binary tree split.
61. A method according to any preceding claim, wherein the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which a flag related to quad tree split is coded after a flag related to a ternary tree split.
62. A method according to any preceding claim, wherein the step of determining a coding order for the partitioning syntax elements comprises determining a coding order in which split_qt_flag is coded after mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag.
63. A method according to any preceding claim, wherein the step of determining a coding order for the partitioning syntax elements is performed based on a high level syntax flag.
64. A method according to claim 63, wherein the high level syntax flag is transmitted in a sequence parameter set, a picture parameter set, a picture header or a slice header.
65. A method according to any of claims 25 to 40, wherein the temporal area comprises a temporal block collocated with respect to the current block.
66. A method according to any of claims 25 to 40, wherein the temporal area comprises one or more temporal blocks neighbouring a temporal block collocated with respect to the current block.
67. A method according to any of claims 25 to 40, wherein the temporal area comprises a coding tree unit collocated with respect to the current block.
68. A method according to any of claims 25 to 40, wherein the temporal area is determined based on a center of the current block.
69. A method according to any of claims 25 to 40, wherein the temporal area comprises all blocks of a temporal frame.
70. A method according to any of claims 25 to 40, wherein the temporal area comprises an area of a frame with a same temporal ID as the temporal ID of a frame to which the current block belongs.
71. A method according to claim 70, wherein the frame with the same temporal ID as the temporal ID of a frame to which the current block belongs comprises a temporally closest frame with the same temporal ID.
72. A method according to any of claims 25 to 40, wherein the temporal area comprises an area of a frame, or a reference frame, with a same quantization parameter as a frame to which the current block belongs.
73. A method according to any of claims 25 to 40, wherein the temporal area comprises an area of a reference frame used for temporal motion vector prediction.
74. A method according to any of claims 25 to 40, wherein the temporal area comprises an area of a reference frame, said reference frame being a temporally closest reference frame to a frame to which the current block belongs.
75. A method according to any of claims 25 to 40, wherein the temporal area comprises a first area of a first reference frame and a second area of a second reference frame.
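Purely by way of illustration, the determination of claim 1 may be sketched as follows, combining the multi tree depth conversion of claims 8 and 9 with the flag reordering of claims 55 and 56. The function name, the default order, and the particular decision rule (treating a split as likely when a converted neighbour depth exceeds the quad tree depth of the current block) are assumptions made for this sketch only and are not limitations of the claims.

```python
# Default coding order of the partitioning syntax elements (assumed for
# this sketch): the "no split" flag first, as in conventional signalling.
DEFAULT_ORDER = ["split_cu_flag", "split_qt_flag",
                 "mtt_split_cu_vertical_flag", "mtt_split_cu_binary_flag"]

def coding_order(current_qt_depth, neighbour_mt_depths):
    """Return the order in which the partitioning flags are coded for the
    current block, based on a spatial parameter of neighbouring blocks."""
    if not neighbour_mt_depths:
        return list(DEFAULT_ORDER)
    # Claims 8-9: convert each neighbour multi tree depth to a value
    # comparable with a quad tree depth by dividing it by two.
    converted = [d // 2 for d in neighbour_mt_depths]
    # Illustrative rule: if a neighbour is deeper than the current block,
    # a split is likely, so code split_qt_flag before split_cu_flag
    # (claims 55-56).
    if max(converted) > current_qt_depth:
        return ["split_qt_flag", "split_cu_flag",
                "mtt_split_cu_vertical_flag", "mtt_split_cu_binary_flag"]
    return list(DEFAULT_ORDER)
```

For example, with a current quad tree depth of 1 and neighbour multi tree depths of 6 and 4, the converted depths (3 and 2) exceed 1, so split_qt_flag would be coded first; with a current depth of 3 and a single neighbour depth of 4, the default order is kept.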
PCT/EP2025/050422, filed 2025-01-09 (priority date 2024-01-09): Image and video coding and decoding, published as WO2025149564A2 (en); legal status: pending.

Applications Claiming Priority (2)

GB2400290.9A (published as GB2637136A) (en), priority date 2024-01-09, filed 2024-01-09: Image and video coding and decoding
GB2400290.9, priority date 2024-01-09

Publications (2)

WO2025149564A2, published 2025-07-17
WO2025149564A3, published 2025-09-04

Family ID: 89901552




Also Published As

GB202400290D0, published 2024-02-21
GB2637136A, published 2025-07-16
WO2025149564A3, published 2025-09-04
