
HK40000117B - Video decoding method and device, and video coding method and device

Info

Publication number
HK40000117B
Authority
HK
Hong Kong
Prior art keywords
picture
layer
information
nal unit
reference picture
Prior art date
Application number
HK19123320.4A
Other languages
Chinese (zh)
Other versions
HK40000117A (en)
Inventor
姜晶媛
李河贤
崔振秀
金镇雄
Original Assignee
韩国电子通信研究院
Priority date
Filing date
Publication date
Application filed by 韩国电子通信研究院
Publication of HK40000117A
Publication of HK40000117B


Description

Video decoding method and apparatus, video encoding method and apparatus
The present application is a divisional application of the application filed on April 16, 2013, with application number 201380025824.8 and entitled 'Image information decoding method, image decoding method, and device using the methods'.
Technical Field
The present invention relates to video encoding and decoding processes, and more particularly, to a video decoding method and apparatus, a video encoding method and apparatus, a method of storing and generating a bitstream, and a computer readable medium.
Background
As broadcasting with High Definition (HD) resolution is extended and serviced nationwide and worldwide, many users are becoming accustomed to video with high resolution and high SNR. Accordingly, many organizations have made many attempts to develop next-generation video devices. In addition, since interest in Ultra High Definition (UHD), which has a resolution four times higher than that of HDTV, is increasing along with interest in HDTV, a technique for compressing and processing video with higher resolution and higher SNR is required.
For compressing video, an inter prediction technique in which values of pixels included in a current picture are predicted from temporally previous and/or subsequent pictures, an intra prediction technique in which values of pixels included in the current picture are predicted using information on the pixels included in the current picture, an entropy coding technique in which short codes are allocated to symbols having a high occurrence frequency and long codes are allocated to symbols having a low occurrence frequency, and the like may be used.
Conventional video compression techniques assume that a specific network bandwidth is provided in a limited hardware operating environment and do not take a flexible network environment into account. However, in order to compress video data for a network environment whose bandwidth changes frequently, a new compression technique is required. For this purpose, a scalable video encoding/decoding method may be used.
Disclosure of Invention
Technical problem
It is an object of the present invention to provide a method and apparatus for describing extraction and scalability information within a layered bitstream.
Another object of the present invention is to provide a method and apparatus for representing, in a flexible manner, scalability information about various bitstreams.
It is still another object of the present invention to provide a method and apparatus for providing extraction and scalability information within a layered bitstream such that the extraction and scalability information can be adaptively transformed at a packet level.
Technical solution
The video decoding apparatus according to an embodiment of the present invention includes: a parsing module for parsing a slice header of a received picture; a decoding module for decoding the received picture; and a Decoded Picture Buffer (DPB) for storing the decoded picture, wherein the decoded picture is marked in the DPB as a reference picture and is then re-marked as a reference picture or a non-reference picture based on reference picture information included in the slice header of the picture following the decoded picture.
The parsing module parses a Network Abstraction Layer (NAL) unit header that includes information about the encoded video, and the NAL unit header does not include 1 bit of flag information indicating whether the NAL unit is a non-reference picture or a reference picture in the entire bitstream at the time the video was encoded.
The NAL unit header includes: layer identification information for identifying a scalable layer in a bitstream supporting the scalable layer.
The 1 bit of flag information that would indicate whether a NAL unit is a non-reference picture or a reference picture in the entire bitstream is instead used to signal the layer identification information.
The video decoding method according to an embodiment of the present invention may include: decoding a received picture; marking the decoded picture as a reference picture in a Decoded Picture Buffer (DPB); parsing a slice header of the picture following the decoded picture; and re-marking the decoded picture as a reference picture or a non-reference picture based on the reference picture information included in the slice header.
The video decoding method may include: parsing a Network Abstraction Layer (NAL) unit header of a NAL unit including information on the encoded video, wherein the NAL unit header does not include 1 bit of flag information indicating whether the NAL unit is a non-reference picture or a reference picture in the entire bitstream at the time the video was encoded.
The NAL unit header includes: layer identification information for identifying a scalable layer in a scalable layer-supported bitstream.
The 1 bit of flag information that would indicate whether a NAL unit is a non-reference picture or a reference picture in the entire bitstream is instead used to signal the layer identification information.
A video encoding device may include: an encoding module for encoding a slice header including reference picture information referred to by a corresponding slice; and a transmission module for transmitting the bitstream including the slice header.
The encoding module encodes a Network Abstraction Layer (NAL) unit header including information on the encoded video, and the NAL unit header does not include 1 bit of flag information indicating whether the NAL unit is a non-reference picture or a reference picture in the entire bitstream at the time the video is encoded.
The NAL unit header includes: layer identification information for identifying a scalable layer in a scalable layer-supported bitstream.
The encoding module encodes the layer identification information using 1 bit of flag information indicating whether a NAL unit is a non-reference picture or a reference picture in the entire bitstream.
A video encoding method may include: encoding a slice header including reference picture information referred to by a corresponding slice; and transmitting a bitstream including the slice header.
The video encoding method may further include: encoding a Network Abstraction Layer (NAL) unit header of a NAL unit including information about the encoded video; and transmitting a bitstream including the NAL unit, wherein the NAL unit header does not include 1 bit of flag information indicating whether the NAL unit is a non-reference picture or a reference picture in the entire bitstream at the time the video is encoded.
The NAL unit header includes: layer identification information for identifying a scalable layer in a scalable layer-supported bitstream.
The 1 bit of flag information indicating whether a NAL unit is a non-reference picture or a reference picture in the entire bitstream is used to encode the layer identification information.
A video decoding apparatus may include: a parsing module to receive a Supplemental Enhancement Information (SEI) message including information on an active parameter set and parse the information on the parameter set.
The information about the set of active parameters includes: at least one of information indexing an active Video Parameter Set (VPS) and information indexing an active Sequence Parameter Set (SPS).
The information about the active parameter set includes: information indexing an active VPS, information indicating the number of SPSs that refer to the active VPS, and information indexing the SPSs.
The information on the parameter set is used to extract a sub-layer or a layer providing at least one of temporal, spatial, quality, and view scalability.
The information about the parameter set is used to refer to or activate the parameter set for decoding a video coding layer Network Abstraction Layer (NAL) unit.
The SEI message includes: at least one of information indexing a VPS and information indexing an SPS, and the video decoding apparatus further comprises: a decoding module for activating at least one of a VPS and an SPS corresponding to the information indexing the VPS and the information indexing the SPS.
A video decoding method may include: receiving a Supplemental Enhancement Information (SEI) message including information about an active parameter set; and parsing information about the parameter set.
The information about the set of active parameters includes: at least one of information indexing an active Video Parameter Set (VPS) and information indexing an active Sequence Parameter Set (SPS).
The information about the set of active parameters includes: information indexing an active VPS, information indicating the number of SPS referring to the active VPS, and information indexing SPS.
The information on the parameter set is used to extract sub-layers providing temporal, spatial, quality, and view scalability.
The information about the parameter set is used to refer to or activate the parameter set for decoding a video coding layer Network Abstraction Layer (NAL) unit.
The SEI message includes: at least one of information indexing a VPS and information indexing an SPS, and the video decoding method further comprises: activating at least one of a VPS and an SPS corresponding to the information indexing the VPS and the information indexing the SPS.
A video encoding device may include: an encoding module for encoding a Supplemental Enhancement Information (SEI) message including information on an active parameter set; and a transmission module for transmitting a bitstream including the SEI message.
The information about the set of active parameters includes: at least one of information indexing an active Video Parameter Set (VPS) and information indexing an active Sequence Parameter Set (SPS).
The information about the set of active parameters includes: information indexing an active VPS, information indicating the number of SPS referring to the active VPS, and information indexing SPS.
The information on the parameter set is used to extract sub-layers providing temporal, spatial, quality, and view scalability.
The information about the parameter set is used to refer to or activate the parameter set for decoding a video coding layer Network Abstraction Layer (NAL) unit.
A video encoding method may include: encoding a Supplemental Enhancement Information (SEI) message including information on an active parameter set; and transmitting a bitstream including the SEI message.
The information about the active parameter set includes: at least one of information indexing an active Video Parameter Set (VPS) and information indexing an active Sequence Parameter Set (SPS).
The information about the active parameter set includes: information indexing an active VPS, information indicating the number of SPSs that refer to the active VPS, and information indexing the SPSs.
The information on the parameter set is used to extract sub-layers providing temporal, spatial, quality, and view scalability.
The information about the parameter set is used to refer to or activate the parameter set for decoding a video coding layer Network Abstraction Layer (NAL) unit.
Technical effects
According to an embodiment of the present invention, a method and apparatus for describing extraction and scalability information within a layered bitstream may be provided.
According to an embodiment of the present invention, it is possible to provide a method and apparatus for representing scalability information with respect to various bit streams used in a flexible manner.
Another embodiment of the present invention may provide a method and apparatus for providing extraction and scalability information within a layered bitstream such that the extraction and scalability information may be adaptively transformed at a packet level.
Drawings
Fig. 1 is a block diagram showing an example of the structure of a video encoding apparatus according to an exemplary embodiment;
fig. 2 is a block diagram showing an example of the structure of a video decoding apparatus according to an exemplary embodiment;
fig. 3 is a conceptual diagram schematically illustrating an exemplary embodiment of a Scalable Video Coding (SVC) structure using multiple layers to which the present invention can be applied;
fig. 4 is a control flow diagram illustrating a method of encoding video information according to the present invention; and
fig. 5 is a control flow diagram illustrating a method of decoding video information according to the present invention.
Detailed Description
Some exemplary embodiments of the present invention are described in detail with reference to the accompanying drawings. In describing the embodiments of this specification, however, detailed descriptions of well-known functions and structures are omitted if they would unnecessarily obscure the gist of the present invention.
In this specification, when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or a third element may be "connected" or "coupled" between the two elements. In addition, in this specification, stating that a specific element is "included" does not exclude elements other than that element; additional elements may be included in the exemplary embodiments or within the technical scope of the present invention.
Terms such as first and second may be used to describe various elements, but the elements are not limited by the terms. The term is used merely to distinguish one element from another. For example, a first element could be termed a second element without departing from the scope of the present invention. Likewise, a second element may be termed a first element.
Further, the element units described in the exemplary embodiments of the present invention are illustrated independently to indicate their distinct, characteristic functions, and this does not mean that each element unit is formed of separate hardware or a separate piece of software. That is, the element units are arranged and included separately for convenience of explanation; at least two element units may be combined into one element unit, or one element unit may be divided into a plurality of element units that perform the corresponding functions. Embodiments in which element units are integrated and embodiments in which some element units are separated are also included in the scope of the present invention, as long as they do not depart from the essence of the present invention.
Further, in the present invention, some elements are not essential elements for performing essential functions but may be optional elements merely for improving performance. The present invention may be implemented using only the essential elements, excluding elements used merely for improving performance, and a structure including only the essential elements, excluding optional elements used merely for improving performance, is also included in the scope of the present invention.
Fig. 1 is a block diagram showing an example of the structure of a video encoding apparatus according to an exemplary embodiment. The scalable video encoding/decoding method or apparatus may be implemented by an extension of a general video encoding/decoding method or apparatus that does not provide scalability. Fig. 1 is a block diagram illustrating an exemplary embodiment of a video encoding device that may form the basis for a scalable video encoding device.
Referring to fig. 1, the video encoding apparatus 100 includes a motion prediction module 111, a motion compensation module 112, an intra prediction module 120, a switch 115, a subtractor 125, a transform module 130, a quantization module 140, an entropy encoding module 150, a dequantization module 160, an inverse transform module 170, an adder 175, a filter 180, and a reference picture buffer 190.
The video encoding apparatus 100 may perform encoding with respect to an input picture in an intra mode or an inter mode and output a bitstream as a result of the encoding. In this specification, intra prediction has the same meaning as intra-picture prediction, and inter prediction has the same meaning as inter-picture prediction. In the case of the intra mode, the switch 115 may switch to the intra mode. In the case of the inter mode, the switch 115 may switch to the inter mode. The video encoding device 100 may generate a prediction block for an input block of an input picture and then encode a residual (residual) between the input block and the prediction block.
In the case of the intra mode, the intra prediction module 120 may generate a prediction block by performing spatial prediction using pixel values of neighboring blocks of the current block that have already been encoded.
In the case of the inter mode, the motion prediction module 111 may obtain a motion vector by searching a reference picture stored in a reference picture buffer for a region that best matches an input block in the motion estimation process. The motion compensation module 112 may generate the prediction block by performing motion compensation using the motion vector and the reference picture stored in the reference picture buffer 190.
The subtractor 125 may generate a residual block based on the residual between the input block and the generated prediction block. The transform module 130 may perform a transform on the residual block and output transform coefficients of the transformed block. In addition, the quantization module 140 may output quantized coefficients by quantizing the received transform coefficients using at least one of a quantization parameter and a quantization matrix.
The entropy encoding module 150 may perform entropy encoding on the symbols according to a probability distribution based on the values calculated by the quantization module 140 or the encoding parameter values calculated in the encoding process, and output a bit stream as a result of the entropy encoding. The entropy encoding method is a method of receiving symbols having various values and representing the symbols in the form of a string of decodable binary numbers while removing statistical redundancy from the symbols.
Here, a symbol refers to a syntax element to be encoded or decoded, a coding parameter, a value of a residual signal, and the like. The encoding parameters are parameters required for encoding and decoding. The encoding parameters may include not only information encoded by the encoder and then transmitted to the decoder together with the syntax elements, but also information that may be derived in the encoding or decoding process; they refer to the information required to encode or decode video. The encoding parameters may include, for example, values or statistics of an intra/inter prediction mode, a motion vector, a reference picture index, a coded block pattern, information on whether a residual signal exists, transform coefficients, quantized transform coefficients, a quantization parameter, a block size, and block partition information. In addition, the residual signal may refer to the difference between the original signal and the predicted signal, a signal obtained by transforming that difference, or a signal obtained by transforming and quantizing that difference. In units of blocks, the residual signal may be referred to as a residual block.
If entropy encoding is used, the bit stream size of symbols to be encoded can be reduced because the symbols are represented by allocating a small number of bits to symbols having a high occurrence frequency and allocating a large number of bits to symbols having a low occurrence frequency. Accordingly, the compression performance of video encoding can be improved by entropy encoding.
For entropy encoding, coding methods such as exponential Golomb coding, context-adaptive variable length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC) may be used. For example, a table for performing entropy encoding, such as a variable length coding/code (VLC) table, may be stored in the entropy encoding module 150, and the entropy encoding module 150 may perform entropy encoding using the stored VLC table. Further, the entropy encoding module 150 may derive a binarization method for a target symbol and a probability model for the target symbol/bin, and perform entropy encoding using the derived binarization method or probability model.
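As a concrete illustration of the exponential Golomb coding mentioned above, a minimal sketch in C is given below; the bit-writer structure and the function names are not part of the present specification and are purely illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal bit writer: appends bits MSB-first into a byte buffer. */
typedef struct {
    uint8_t *buf;
    size_t   bitpos;   /* number of bits already written */
} BitWriter;

static void put_bits(BitWriter *bw, uint32_t value, int count)
{
    for (int i = count - 1; i >= 0; i--) {
        size_t  byte = bw->bitpos >> 3;
        int     off  = 7 - (int)(bw->bitpos & 7);
        uint8_t bit  = (uint8_t)((value >> i) & 1u);
        bw->buf[byte] = (uint8_t)((bw->buf[byte] & ~(1u << off)) | (bit << off));
        bw->bitpos++;
    }
}

/* Encode an unsigned value as a 0-th order exponential-Golomb codeword (ue(v)):
 * write floor(log2(value+1)) leading zeros, then the (value+1) bits themselves,
 * so small values receive short codes. */
static void write_ue(BitWriter *bw, uint32_t value)
{
    uint32_t code = value + 1;
    int len = 0;
    for (uint32_t tmp = code; tmp > 1; tmp >>= 1)
        len++;
    put_bits(bw, 0, len);           /* leading zeros               */
    put_bits(bw, code, len + 1);    /* '1' marker + len info bits  */
}
```

For example, the value 3 is written as the 5-bit codeword '00100', while the value 0 is written as the single bit '1'.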
The quantized coefficients are dequantized by the dequantization module 160 and then inverse transformed by the inverse transform module 170. The dequantized and inverse-transformed coefficients may be added to the prediction block by the adder 175, thereby generating a reconstructed block.
The reconstructed block passes through the filter 180. The filter 180 may apply one or more of a deblocking filter, a Sample Adaptive Offset (SAO), and an Adaptive Loop Filter (ALF) to the reconstructed block or the reconstructed picture. The reconstructed block that has passed through the filter 180 may be stored in the reference picture buffer 190.
Fig. 2 is a block diagram illustrating an example of the structure of a video decoding apparatus according to an exemplary embodiment. As described above with reference to fig. 1, the scalable video encoding/decoding method or apparatus can be implemented by an extension of a general video encoding/decoding method or apparatus that does not provide scalability. Fig. 2 is a block diagram illustrating an exemplary embodiment of a video decoding device that may form the basis for a scalable video decoding device.
Referring to fig. 2, the video decoding apparatus 200 includes an entropy decoding module 210, an inverse quantization module 220, an inverse transform module 230, an intra prediction module 240, a motion compensation module 250, a filter 260, and a reference picture buffer 270.
The video decoding apparatus 200 may receive a bitstream output from an encoder, perform decoding on the bitstream in an intra mode or an inter mode, and output a reconstructed picture. In the case of the intra mode, a switch may switch to the intra mode; in the case of the inter mode, the switch may switch to the inter mode. The video decoding apparatus 200 may obtain a reconstructed residual block from the received bitstream, generate a prediction block, and then generate a reconstructed block by adding the reconstructed residual block to the prediction block.
The entropy decoding module 210 may generate symbols, including symbols in the form of quantized coefficients, by performing entropy decoding on the received bitstream according to a probability distribution. The entropy decoding method is a method of receiving a string of binary numbers and generating each symbol from it, and is similar to the entropy encoding method described above.
The quantized coefficients are dequantized by an inverse quantization module 220 and inverse transformed by an inverse transform module 230. As a result of the dequantization/inverse transformation of the quantized coefficients, a residue block may be generated.
In case of the intra mode, the intra prediction module 240 may generate a prediction block by performing spatial prediction using pixel values of already decoded blocks adjacent to the current block. In case of the inter mode, the motion compensation module 250 may generate a prediction block by performing motion compensation using a motion vector and a reference picture stored in the reference picture buffer 270.
The residual block and the prediction block are added together by the adder 255. The summed block passes through the filter 260. The filter 260 may apply at least one of a deblocking filter, SAO, and ALF to the reconstructed block or the reconstructed picture, and outputs the reconstructed picture. The reconstructed picture may be stored in the reference picture buffer 270 and may be used for inter prediction.
Among the entropy decoding module 210, the inverse quantization module 220, the inverse transform module 230, the intra prediction module 240, the motion compensation module 250, the filter 260, and the reference picture buffer 270 included in the video decoding apparatus 200, the elements directly related to video decoding, for example, the entropy decoding module 210, the inverse quantization module 220, the inverse transform module 230, the intra prediction module 240, the motion compensation module 250, and the filter 260, may be collectively referred to as a decoding module to distinguish them from the other elements.
Further, the video decoding apparatus 200 may additionally include a parsing module (not shown) for parsing information about the encoded video included in the bitstream. The parsing module may include the entropy decoding module 210, or the parsing module may be included in the entropy decoding module 210. The parsing module may be represented as one of the elements of the decoding module.
Fig. 3 is a conceptual diagram schematically illustrating an exemplary embodiment of a Scalable Video Coding (SVC) structure using multiple layers to which the present invention can be applied. In fig. 3, a group of pictures (GOP) denotes a picture group.
In order to transmit video data, a transmission medium is required, and the transmission medium has different performances depending on various network environments. For application to various transmission media or network environments, a Scalable Video Coding (SVC) method may be employed.
The SVC method is an encoding method that improves encoding/decoding performance by removing redundancy between layers using texture information, motion information, a residual signal, and the like between the layers. The SVC method can provide various scalabilities from the viewpoint of space, time, and signal-to-noise ratio (SNR) depending on surrounding conditions such as a transmission bit rate, a transmission error rate, and system resources.
SVC may be performed using a multi-layer structure so that bitstreams applicable to various network conditions may be provided. For example, the SVC structure may include a base layer whose video data can be compressed and processed by using a general video coding method, and may include an enhancement layer whose video data can be compressed and processed by using both coding information of the base layer and the general video coding method.
Here, a layer refers to a set of pictures and bitstreams classified based on spatial resolution (e.g., image size), temporal resolution (e.g., coding order, picture output order, and frame rate), SNR, and complexity. In addition, a lower layer may be referred to as a base layer, and a higher layer may be referred to as an enhancement layer. Further, multiple layers may have correlation between them.
For example, referring to fig. 3, the base layer may be defined by Standard Definition (SD), a frame rate of 15Hz, and a bit rate of 1 Mbps. The first enhancement layer may be defined by High Definition (HD), a frame rate of 30Hz, a bit rate of 3.9 Mbps. The second enhancement layer may be defined by a 4K Ultra High Definition (UHD), a frame rate of 60Hz, and a bit rate of 27.2 Mbps. The format, frame rate, bit rate, etc. are only exemplary embodiments and may be determined differently, if necessary. Further, the number of layers used is not limited to the current exemplary embodiment, and may be determined differently according to circumstances.
For example, if the transmission bandwidth is 4 Mbps, data can be transmitted by reducing the frame rate of the first enhancement layer HD to 15 Hz or less. The SVC method may provide temporal, spatial, and SNR scalability according to the methods described above in connection with the exemplary embodiment of fig. 3.
SVC has the same meaning as scalable video coding from an encoding viewpoint and scalable video decoding from a decoding viewpoint.
As described above, scalability is now an important function of video formats due to heterogeneous communication networks and various terminals. SVC, an extended standard of Advanced Video Coding (AVC), was developed to generate a bitstream having a wide range of bit rates while maintaining compression efficiency to the maximum extent. In order to satisfy characteristics and changes of various devices and networks, an SVC bitstream can be easily extracted in various ways. That is, the SVC standard provides spatial, temporal, and SNR scalability.
Meanwhile, a bitstream including multiple layers is composed of Network Abstraction Layer (NAL) units, which enable adaptive transmission of video over a packet-switched network. Similarly to the relationship between multiple layers, the relationship between multiple views in multi-view video coding, which includes several views within one bitstream, is similar to the relationship between layers in video supporting multiple layers.
In order to transform a bitstream efficiently and effectively at any node in the content transmission path, scalability information on the bitstream is very important. In the current standard for single-layer video coding (i.e., High Efficiency Video Coding), two fields regarding layer information, the temporal id (temporal_id) and reserved one 5 bits (reserved_one_5bits), are present in the NAL unit (hereinafter abbreviated 'NALU') header. The field 'temporal_id', which is 3 bits long, indicates the temporal layer of the video bitstream, and the field 'reserved_one_5bits' corresponds to an area for indicating other subsequent layer information.
The temporal layer refers to a layer of a temporally scalable bitstream composed of Video Coding Layer (VCL) NALUs, and it has a specific temporal_id value.
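For reference, the following is a minimal sketch of how the fields mentioned above could be read from the two-byte NALU header; the bit layout shown (forbidden_zero_bit, nal_ref_flag, nal_unit_type in the first byte; temporal_id, reserved_one_5bits in the second byte) is an assumption based on the draft header discussed in this document, and the struct and function names are illustrative.

```c
#include <stdint.h>

/* Illustrative parse of the assumed draft two-byte NAL unit header layout:
 * forbidden_zero_bit(1) | nal_ref_flag(1) | nal_unit_type(6) in the first byte,
 * temporal_id(3) | reserved_one_5bits(5) in the second byte. The exact bit
 * positions and the struct itself are assumptions for illustration. */
typedef struct {
    uint8_t nal_ref_flag;        /* 1 bit; discussed in the next subsection */
    uint8_t nal_unit_type;       /* 6 bits */
    uint8_t temporal_id;         /* 3 bits */
    uint8_t reserved_one_5bits;  /* 5 bits */
} NaluHeader;

NaluHeader parse_nalu_header(const uint8_t hdr[2])
{
    NaluHeader h;
    h.nal_ref_flag       = (uint8_t)((hdr[0] >> 6) & 0x01);
    h.nal_unit_type      = (uint8_t)( hdr[0]       & 0x3F);
    h.temporal_id        = (uint8_t)((hdr[1] >> 5) & 0x07);
    h.reserved_one_5bits = (uint8_t)( hdr[1]       & 0x1F);
    return h;
}
```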
The present invention relates to a method of efficiently describing extraction information and scalability information regarding a picture within a bitstream supporting multiple layers and concurrently signaling the extraction information and the scalability information, and an apparatus for implementing the same.
In the present invention, the bit stream is divided into two types: a base type supporting only temporal scalability and an extension type capable of having scalability supporting spatial/SNR/multi-view.
A first type of bitstream relates to bitstreams supporting single-layer video, and a second type relates to enhancement layers in HEVC-based layered video coding. In the following, an improved method for representing scalability information for the two bitstream types is proposed. According to the present invention, in the extension type, the 5 bits of 'reserved_one_5bits' may be used as a layer id (layer_id) indicating the identifier of a scalable layer.
Removal of nal_ref_flag from the NALU header
The NAL reference flag (nal_ref_flag) is used to indicate a non-reference picture. This information indicates the approximate priority between non-reference pictures and reference pictures, but the use of nal_ref_flag is somewhat limited.
A reference picture refers to a picture including samples that can be used for inter prediction when a subsequent picture is decoded in decoding order.
A non-reference picture refers to a picture that includes samples that are not used for inter-picture prediction when a subsequent picture is decoded in decoding order.
nal_ref_flag is a flag indicating whether the corresponding NALU is a non-reference picture or a reference picture in the entire bitstream at the time of encoding.
When the value of nal_ref_flag is 1, the NALU includes a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), an Adaptation Parameter Set (APS), or a slice of a reference picture. When the value of nal_ref_flag is 0, the NALU includes a slice containing some part or all of a non-reference picture.
Here, a NALU whose nal_ref_flag has the value 1 may include slices of a reference picture. nal_ref_flag has the value 1 for NALUs of a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), or a Picture Parameter Set (PPS). If the value of nal_ref_flag is 0 in one of the VCL NALUs of a particular picture, nal_ref_flag has the value 0 for all VCL NALUs of that picture.
Meanwhile, if all non-reference pictures, in particular the non-reference pictures mostly corresponding to the highest temporal layer, are removed in extraction, the nal_ref_flag of all the pictures remaining after the extraction is 1.
However, although their nal_ref_flag value is 1, some pictures of the adaptively transformed bitstream, that is, the pictures corresponding to the highest temporal layer of the remaining bitstream, become non-reference pictures.
In other words, another syntax element (e.g., temporal_id) of the NALU header may be more effective in supporting adaptive transformation (or extraction). That is, a bitstream including a desired temporal layer may be extracted using the total number of temporal layers included in the bitstream and the temporal_id value of the NALU header.
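A minimal sketch of the kind of extraction described in the preceding paragraph is shown below; the function name and the filtering policy are illustrative assumptions, not a normative extraction process.

```c
/* Illustrative extraction rule: a NAL unit is kept only if its temporal_id
 * (taken from the NALU header) does not exceed the desired highest temporal
 * layer. nal_ref_flag is not consulted; temporal_id alone drives this kind
 * of adaptation. */
int keep_nal_unit(int temporal_id, int target_temporal_id)
{
    return temporal_id <= target_temporal_id;
}
```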
Further, when a picture formed of NALUs including nal_ref_flag is decoded (reconstructed) and the decoded picture is stored in a memory such as a Decoded Picture Buffer (DPB), nal_ref_flag may also be used to indicate whether the corresponding picture will subsequently be used as a reference picture. If the value of nal_ref_flag is 1, it indicates that the corresponding picture is subsequently used as a reference picture; if the value is 0, it indicates that the corresponding picture is not subsequently used as a reference picture.
When a decoded picture is stored in the DPB without determining, based on nal_ref_flag, whether the corresponding NALU belongs to a non-reference picture or a reference picture, the decoded picture can simply be marked as a reference picture. Even though the decoded picture may actually be a non-reference picture, marking it as a reference picture causes no problem, because when the next picture in decoding order is decoded, the corresponding picture will not be included in the reference picture list transmitted in the slice header of that next picture.
That is, when the next picture is decoded, whether a previously decoded picture is a reference picture or a non-reference picture is indicated based on the reference picture list included in the slice header of the next picture. Therefore, even if the decoded picture is marked as a reference picture without considering nal_ref_flag, there is no problem in determining whether the decoded picture is a reference picture or a non-reference picture.
The present invention proposes to delete nal_ref_flag from the NALU header or to change its semantics. An example of the deletion of nal_ref_flag is as follows.
Example 1
The flag 'nal_ref_flag' is changed to a slice reference flag (slice_ref_flag), and the position of the flag is moved from the NALU header to the slice header. The syntax of the slice header may be modified as in table 1.
< Table 1>
In table 1, when the value of slice_ref_flag is 1, it indicates that the slice is a part of a reference picture. When the value of slice_ref_flag is 0, it indicates that the slice is a part of a non-reference picture.
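Because the body of table 1 is not reproduced here, the following sketch illustrates, under stated assumptions, how a decoder might read a slice header that carries slice_ref_flag; the BitReader helpers, the position of slice_ref_flag, and the surrounding elements are assumptions rather than the actual table 1 syntax.

```c
#include <stdint.h>

/* Hypothetical bit-reader interface (not defined here): read_u(br, n) reads an
 * n-bit fixed-length value, read_ue(br) reads an exp-Golomb ue(v) value. */
typedef struct BitReader BitReader;
uint32_t read_u(BitReader *br, int nbits);
uint32_t read_ue(BitReader *br);

/* Sketch of the example 1 slice header: slice_ref_flag carries the information
 * previously conveyed by nal_ref_flag. Only the leading elements are shown. */
typedef struct {
    uint32_t first_slice_in_pic_flag;
    uint32_t slice_ref_flag;        /* 1: slice belongs to a reference picture */
    uint32_t pic_parameter_set_id;
} SliceHeaderPrefix;

SliceHeaderPrefix parse_slice_header_prefix(BitReader *br)
{
    SliceHeaderPrefix sh;
    sh.first_slice_in_pic_flag = read_u(br, 1);  /* u(1)  */
    sh.slice_ref_flag          = read_u(br, 1);  /* u(1)  */
    sh.pic_parameter_set_id    = read_ue(br);    /* ue(v) */
    /* ... remaining slice header syntax unchanged ... */
    return sh;
}
```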
Example 2
The flag 'nal_ref_flag' is changed to an AU reference flag (au_ref_flag), and the position of the flag is moved from the NALU header to the access unit delimiter. The syntax of the access unit delimiter may be the same as table 2.
< Table 2>
In table 2, when the value of au_ref_flag is 1, it indicates that the access unit includes a reference picture. When the value of au_ref_flag is 0, it indicates that the access unit includes a non-reference picture.
Example 3
Instead of being moved into another syntax structure, nal_ref_flag is deleted from the NALU header.
If nal_ref_flag, i.e., the 1-bit flag information indicating whether a picture is a non-reference picture or a reference picture in the entire bitstream, is deleted, the determination of whether a picture is a reference picture, previously made through nal_ref_flag, may be performed through another process. After a received picture is decoded, the decoded picture is unconditionally marked in the Decoded Picture Buffer (DPB) as a reference picture. That is, whether the decoded picture is a reference picture is not determined at this point; the decoded picture is simply marked as a reference picture.
Thereafter, the slice header of the picture following the decoded picture is parsed, and whether the decoded picture is a reference picture or a non-reference picture can then be indicated based on the reference picture information included in that slice header.
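The two-step marking behaviour of example 3 can be summarised with the following sketch; the DPB layout, the use of picture order counts, and all function names are illustrative assumptions.

```c
#include <stddef.h>

#define MAX_DPB_SIZE 16

typedef struct {
    int poc;            /* picture order count of the stored picture          */
    int is_reference;   /* 1: currently marked "used for reference"           */
    int in_use;
} DpbEntry;

typedef struct {
    DpbEntry pics[MAX_DPB_SIZE];
} Dpb;

/* Step 1: after decoding, the picture is unconditionally marked as a reference
 * picture when stored in the DPB (no nal_ref_flag is consulted). */
void store_decoded_picture(Dpb *dpb, int poc)
{
    for (int i = 0; i < MAX_DPB_SIZE; i++) {
        if (!dpb->pics[i].in_use) {
            dpb->pics[i].in_use = 1;
            dpb->pics[i].poc = poc;
            dpb->pics[i].is_reference = 1;
            return;
        }
    }
}

/* Step 2: when the next picture's slice header is parsed, any stored picture
 * not listed in its reference picture information is re-marked as non-reference.
 * ref_poc_list stands in for that reference picture information. */
void remark_from_slice_header(Dpb *dpb, const int *ref_poc_list, size_t num_refs)
{
    for (int i = 0; i < MAX_DPB_SIZE; i++) {
        if (!dpb->pics[i].in_use)
            continue;
        int referenced = 0;
        for (size_t j = 0; j < num_refs; j++)
            if (ref_poc_list[j] == dpb->pics[i].poc)
                referenced = 1;
        dpb->pics[i].is_reference = referenced;
    }
}
```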
Example 4
nal_ref_flag may be deleted from the NALU header, and temporal_id may be used to indicate that a NALU is a non-reference picture. To indicate a non-reference picture, temporal_id may be set to '7', to the maximum number of temporal layers included in the bitstream minus 1 (i.e., max_temporal_layers_minus1), or to a preset value other than '0'.
Example 5
nal_ref_flag may be deleted from the NALU header, and reserved_one_5bits may be used as a priority id (priority_id) element to indicate that a NALU is a non-reference picture. priority_id is an identifier indicating the priority of the corresponding NALU and is used to provide a priority-based bitstream extraction function independent of the different spatial, temporal, and SNR dimensions.
That is, if temporal_id = Ta is the identifier of the highest temporal layer, then a NALU with temporal_id = Ta and priority_id = 31 (or another specific value) is used to indicate that the NALU is a NALU of a non-reference picture.
The 1 bit used for signaling nal_ref_flag may be used in one of the following ways.
(1) The 1 bit may be used to indicate the NAL unit type (nal_unit_type). nal_unit_type may then be a 7-bit signal, and the number of possible NALU types may be doubled.
(2) The 1 bit may be used to indicate temporal_id. temporal_id may then be a 4-bit signal, and the maximum number of temporal layers may be doubled.
(3) The 1 bit may be used to indicate layer_id. layer_id refers to the identifier of a scalable layer of a layered bitstream and can be signaled by the reserved_one_5bits syntax element. The 1 bit used for signaling nal_ref_flag may be added to the 5 bits of reserved_one_5bits used to identify a scalable layer, so that layer_id becomes a 6-bit signal. If 6 bits are used, 64 scalable layers may be identified.
(4) The 1 bit may be used as a flag informing whether reserved_one_5bits indicates a priority.
(5) The 1 bit may be used as a reserved bit (reserved_bit).
If nal_ref_flag is not deleted from the NALU header, the semantics of nal_ref_flag may be modified as follows.
When the value of nal_ref_flag is 0, it indicates that the NALU includes only slices of non-reference pictures. When the value of nal_ref_flag is 1, it indicates that the NALU may include a slice of a reference picture or of a non-reference picture.
Signaling of Video Parameter Set (VPS) activation
The VPS includes the most basic information for decoding video and may include content existing in an existing SPS.
The VPS may include information on sub-layers, which denote temporal layers supporting temporal scalability, and information on multiple layers supporting spatial, quality, and view scalability. That is, the VPS may include multi-layer information, i.e., syntax for HEVC extensions.
A. Video parameter set
The syntax for VPS is the same as table 3.
< Table 3>
In table 3, most of the syntax has the same semantics as the SPS syntax applied to the bitstream including a single layer, and the other parts are as follows.
The video parameter set id (video_parameter_set_id) is the identifier of the VPS, and video_parameter_set_id can be referenced in a Sequence Parameter Set (SPS), Supplemental Enhancement Information (SEI), or an access unit delimiter.
When the value of priority_id_flag is 1, it indicates that reserved_one_5bits can be used in the same way as priority_id of the SVC standard. When the value of priority_id_flag is 0, it means that reserved_one_5bits can be used as layer_id.
When the value of the extension information flag (extension_info_flag) is 0, it indicates that the bitstream conforms to the single-layer standard of HEVC. When the value of extension_info_flag is 1, it indicates an enhancement layer for supporting scalability (i.e., HEVC extension is supported), and information on the layer is provided.
B. Modification of Sequence Parameter Set (SPS)
As in table 4, some existing syntax elements may be incorporated into the VPS and deleted from the SPS. Meanwhile, a vps_id syntax element may be added to the SPS. The SPS syntax to which vps_id has been added is the same as table 4. In table 4, the deleted syntax is indicated with a strike-through.
vps_id indicates an identifier for identifying the VPS to which the SPS can refer, and may have a range of 0 to X.
< Table 4>
VPS activation signaling
The slice header includes index information on the Picture Parameter Set (PPS) to which the corresponding slice refers, and the PPS includes index information on the Sequence Parameter Set (SPS) to which the corresponding picture refers. The SPS includes information on the Video Parameter Set (VPS) to which the corresponding sequence refers. As described above, parsing information on a parameter set and then referring to the parsed parameter set is called activation.
In order to use information about a specific parameter set, i.e., to activate the parameter set, the parameter set needs to be parsed step by step starting from the slice header. This means that all slice headers and the related PPSs need to be analyzed in order to know which SPS is activated.
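The activation chain described above can be illustrated with the following sketch; the structure layouts, array sizes, and function name are assumptions made only to show the indirection from the slice header to the VPS.

```c
/* Illustration of the activation chain: the slice header carries a PPS index,
 * the PPS carries an SPS index, and the SPS carries a VPS index. The structure
 * layouts and array sizes are assumptions for illustration only. */
typedef struct { int sps_id; /* ... other PPS fields ... */ } Pps;
typedef struct { int vps_id; /* ... other SPS fields ... */ } Sps;
typedef struct { int fields; /* ... VPS fields ...        */ } Vps;

typedef struct {
    Pps pps[64];
    Sps sps[32];
    Vps vps[16];
} ParameterSets;

/* Without separate activation signalling, identifying the active VPS requires
 * walking this chain for a pps_id obtained by parsing a slice header. */
const Vps *active_vps_from_slice(const ParameterSets *ps, int slice_pps_id)
{
    int sps_id = ps->pps[slice_pps_id].sps_id;
    int vps_id = ps->sps[sps_id].vps_id;
    return &ps->vps[vps_id];
}
```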
When extracting some part of a sub-layer (i.e., a temporal layer) from a bitstream including a single layer, an extractor needs to analyze (or parse) NALU headers and a plurality of parameter sets.
If the information for extracting NALUs is included in the VPS or SPS, the extractor needs to sequentially parse the higher parameter sets starting from the slice header. This means that the extractor needs to understand all syntax elements of the parameter sets and the slice header.
On the other hand, if the vps_id or sps_id could be found without such a complicated parsing process, only the required parameter sets would need to be activated, even in the video decoding process. In this case, if the VPS or SPS carries index information on the parameter sets to be activated, the parsing of complicated slice headers and the related PPSs can be reduced.
Meanwhile, only some of the syntax elements may carry the pieces of information required to extract the bitstream, and analyzing all syntax elements may become a large burden on the extractor. In order to solve this problem, the following methods are proposed.
In the present invention, activation of a parameter set means performing signaling so that the extractor can know which parameter set is activated without analyzing slice headers and the related Picture Parameter Sets (PPSs).
According to the present invention, which VPS, SPS, or PPS is active may be additionally signaled, so that the burden on the extractor of analyzing all slice headers and the related PPSs is reduced.
The VPS may be updated. One of the following three methods may be used so that the extractor can know the active VPS and the associated SPS or PPS without analyzing the slice header.
(1) vps_id, sps_id, and pps_id may be included in the access unit delimiter. vps_id, sps_id, and pps_id indicate the identifiers of the VPS, SPS, and PPS, respectively, used for the NALUs of the related AU.
To indicate whether each identifier is present in the access unit delimiter, a vps_id present flag (vps_id_present_flag), an sps_id present flag (sps_id_present_flag), and a pps_id present flag (pps_id_present_flag) are used. The syntax of the proposed access unit delimiter may be the same as table 5.
< Table 5>
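Because the body of table 5 is not reproduced here, the following sketch shows one possible reading of the proposed access unit delimiter; the pic_type element, the descriptors, and the BitReader helpers are assumptions consistent with the surrounding description rather than the actual table 5 syntax.

```c
#include <stdint.h>

/* Hypothetical bit-reader interface, as in the earlier sketches. */
typedef struct BitReader BitReader;
uint32_t read_u(BitReader *br, int nbits);
uint32_t read_ue(BitReader *br);

/* Sketch of the proposed access unit delimiter: each parameter set identifier
 * is present only when its presence flag is set. */
typedef struct {
    uint32_t vps_id, sps_id, pps_id;
    int has_vps_id, has_sps_id, has_pps_id;
} AudIds;

AudIds parse_access_unit_delimiter(BitReader *br)
{
    AudIds ids = {0};
    (void)read_u(br, 3);                  /* pic_type (assumed)   */
    ids.has_vps_id = (int)read_u(br, 1);  /* vps_id_present_flag  */
    ids.has_sps_id = (int)read_u(br, 1);  /* sps_id_present_flag  */
    ids.has_pps_id = (int)read_u(br, 1);  /* pps_id_present_flag  */
    if (ids.has_vps_id) ids.vps_id = read_ue(br);
    if (ids.has_sps_id) ids.sps_id = read_ue(br);
    if (ids.has_pps_id) ids.pps_id = read_ue(br);
    return ids;
}
```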
(1-1) In another method, sps_id and pps_id are excluded, and only vps_id may be included in the access unit delimiter, as in table 6.
< Table 6>
(2) Another method for activation signaling of the VPS is to use a new SEI message, 'parameter set reference' (parameter_set_reference). The SEI message includes syntax for informing whether vps_id, sps_id, and pps_id, which indicate the identifiers of the parameter sets used for the NALUs within the related AU, are present.
To indicate whether each identifier is present, the vps_id_present_flag, sps_id_present_flag, and pps_id_present_flag syntax elements may be used, and the SEI syntax is the same as table 7.
< Table 7>
(2-1) Furthermore, activation of the VPS and SPS may be signaled by excluding pps_id and including sps_id and vps_id in the SEI message, as in table 8. The sps_id and vps_id in the SEI message may include the sps_id and vps_id referred to by the video coding layer NALUs of the access unit associated with the SEI message. Thus, sps_id and vps_id may indicate information on the parameter sets that have a possibility of being activated.
< Table 8>
In table 8, vps_id indicates the video_parameter_set_id of the currently active VPS. The value of vps_id may be 0-15.
If sps_id_present_flag has the value 1, it indicates that the sequence_parameter_set_id of the currently active SPS is included in the corresponding SEI message. If sps_id_present_flag has the value 0, it indicates that the sequence_parameter_set_id of the currently active SPS is not included in the corresponding SEI message.
sps_id indicates the sequence_parameter_set_id of the currently active SPS. sps_id may have a value of 0-31, or more restrictively, 0-15.
When the value of the psr extension flag (psr_extension_flag) is 0, it indicates that no SEI message extension syntax elements are included in the SEI message. When the value of psr_extension_flag is 1, it indicates that SEI message extension syntax elements are included in the SEI message and used.
The psr extension length (psr_extension_length) indicates the length of the psr extension data (psr_extension_data). psr_extension_length may have a value ranging from 0 to 256, and each psr extension data byte (psr_extension_data_byte) may have any value.
(2-2) Furthermore, one or more sps_id values and vps_id, excluding pps_id, may be included in the SEI message and then signaled, as in table 9.
< Table 9>
In table 9, vps_id indicates the video_parameter_set_id of the active VPS. vps_id may have a value of 0-15.
The number of reference SPSs (num_reference_sps) indicates the number of SPSs that refer to the active vps_id.
sps_id(i) indicates the sequence_parameter_set_id of an active SPS, and may have a value of 0-31, or more restrictively, 0-15.
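A sketch of how the table 9 payload could be parsed is given below; the descriptors (u(4) for vps_id, ue(v) for the remaining elements) and the BitReader helpers are assumptions based on the stated value ranges, not the actual table 9 syntax.

```c
#include <stdint.h>

/* Hypothetical bit-reader interface, as in the earlier sketches. */
typedef struct BitReader BitReader;
uint32_t read_u(BitReader *br, int nbits);
uint32_t read_ue(BitReader *br);

#define MAX_REFERENCE_SPS 32

/* Sketch of the table 9 parameter_set_reference SEI payload: the active vps_id
 * followed by the list of SPS ids that refer to it. */
typedef struct {
    uint32_t vps_id;                      /* active VPS, stated range 0-15    */
    uint32_t num_reference_sps;           /* number of SPSs referring to it   */
    uint32_t sps_id[MAX_REFERENCE_SPS];   /* sequence_parameter_set_id(i)     */
} ParameterSetReference;

ParameterSetReference parse_parameter_set_reference(BitReader *br)
{
    ParameterSetReference psr = {0};
    psr.vps_id = read_u(br, 4);
    psr.num_reference_sps = read_ue(br);
    for (uint32_t i = 0; i < psr.num_reference_sps && i < MAX_REFERENCE_SPS; i++)
        psr.sps_id[i] = read_ue(br);
    return psr;
}
```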
(2-3) Furthermore, only vps_id, excluding sps_id and pps_id, may be included in the SEI message and then signaled, as in table 10.
< Table 10>
parameter_set_reference( payloadSize ) {    Descriptor
    vps_id    ue(v)
}
(3) Another method for activation signaling of the VPS is to include information informing of vps_id, sps_id, and pps_id in the buffering period SEI message. Table 11 shows a syntax including vps_id_present_flag, sps_id_present_flag, and pps_id_present_flag, which indicate whether vps_id, sps_id, and pps_id are present.
< Table 11>
(3-1) Furthermore, as in table 12, activation of the VPS may be signaled by including only vps_id, excluding sps_id and pps_id, in the buffering period SEI message.
< Table 12>
(4) Another method for activation signaling of parameter sets is to include information informing of vps_id, sps_id, and pps_id in the recovery point SEI message. Table 13 shows a syntax including vps_id_present_flag, sps_id_present_flag, and pps_id_present_flag, which indicate whether vps_id, sps_id, and pps_id are present.
< Table 13>
(4-1) Furthermore, as in table 14, there is a method of signaling vps_id by including only vps_id, excluding sps_id and pps_id, in the recovery point SEI message.
< Table 14>
A message for transmitting vps_id or sps_id may be included in an Intra Random Access Point (IRAP) access unit.
If any of the above-described signaling methods is included in an access unit and used, the extractor may identify the vps_id, sps_id, and pps_id values through the above-described signaling method in order to extract the bitstream, and may manage one or more VPSs/SPSs/PPSs.
Further, a decoding apparatus or a decoding module for performing decoding may identify the vps_id, sps_id, and pps_id values through the above-described signaling method, and may decode the associated AUs by activating the signaled parameter sets.
Representation of bit streams in extension types
Hereinafter, the extension information (extension_info()) of the VPS and a new SEI message are proposed to describe and signal information on scalable layers when a bitstream supporting layer extension is involved. To represent a bitstream in the extension type, the following information may be signaled.
Whether layer_id conveys the priority value of the layer is signaled.
Here, a spatial layer (identified by a dependency_id value), an SNR layer (identified by a quality_id value), a view (identified by a view_id value), and so on may be signaled in response to each layer_id value, and a temporal layer may be identified by the temporal_id of the NALU header.
In addition, the region of the video related to a layer_id may be signaled by a region id (region_id).
In addition, dependency information, bit rate information, and quality information of each scalable layer may be signaled.
The extension_info() syntax is the same as table 15.
< Table 15>
The semantics of the syntax of table 15 are as follows.
- The number of frame sizes minus 1 (num_frame_sizes_minus1) plus 1 indicates the maximum number of size information entries for the different types of pictures included in the coded video sequence (e.g., pic_width_in_luma_samples[i], pic_height_in_luma_samples[i], pic_cropping_flag[i], pic_cropping_left_offset[i], pic_cropping_right_offset[i], pic_cropping_top_offset[i], and pic_cropping_bottom_offset[i]). The value of num_frame_sizes_minus1 may be 0-X. Other types of pictures may include pictures having different resolutions.
- The number of rep formats minus 1 (num_rep_formats_minus1) plus 1 indicates the maximum number of different types of bit depths and chroma formats (e.g., bit_depth_luma_minus8[i], bit_depth_chroma_minus8[i], and chroma_format_idc[i] values) included in the coded video sequence. The value of num_rep_formats_minus1 may be in the range of 0-X.
- pic_width_in_luma_samples[i], pic_height_in_luma_samples[i], pic_cropping_flag[i], pic_cropping_left_offset[i], pic_cropping_right_offset[i], pic_cropping_top_offset[i], and pic_cropping_bottom_offset[i] indicate the i-th pic_width_in_luma_samples, pic_height_in_luma_samples, pic_cropping_flag, pic_cropping_left_offset, pic_cropping_right_offset, pic_cropping_top_offset, and pic_cropping_bottom_offset values of the coded video sequence.
- bit_depth_luma_minus8[i], bit_depth_chroma_minus8[i], and chroma_format_idc[i] indicate the i-th bit_depth_luma_minus8, bit_depth_chroma_minus8, and chroma_format_idc values of the coded video sequence.
- The number of layers minus 1 (num_layers_minus1) indicates the number of scalable layers available in the bitstream.
- When the value of the dependency id flag (dependency_id_flag) is 1, it indicates that there are one or more dependency_id values related to the layer_id value.
- When the value of the quality id flag (quality_id_flag) is 1, it indicates that there are one or more quality_id values related to the layer_id value.
- When the value of the view id flag (view_id_flag) is 1, it indicates that there are one or more view_id values related to the layer_id value.
- When the value of the region id flag (region_id_flag) is 1, it indicates that there are one or more region_id values related to the layer_id value.
- When the value of the layer dependency information flag (layer_dependency_info_flag) is 1, it indicates that dependency information of the scalable layer is provided.
- The frame size idx (frame_size_idx[i]) indicates the index into the set of frame sizes applied to the layer whose layer_id value is i. frame_size_idx[i] has a value ranging from 0 to X.
- The rep format idx (rep_format_idx[i]) indicates the index into the set of bit depths and chroma formats applied to the layer whose layer_id value is i. rep_format_idx[i] has a value ranging from 0 to X.
- When the value of the one dependency id flag (one_dependency_id_flag[i]) is 1, it indicates that there is only one dependency_id associated with the layer whose layer_id is i. When the value of one_dependency_id_flag[i] is 0, it indicates that there are two or more dependency_id values associated with the layer whose layer_id is i.
- dependency_id[i] indicates the dependency_id value associated with the layer whose layer_id is i.
- The dependency id min (dependency_id_min[i]) and the dependency id max (dependency_id_max[i]) indicate the minimum dependency_id value and the maximum dependency_id value, respectively, associated with the layer whose layer_id is i.
- When the value of the one quality id flag (one_quality_id_flag[i]) is 1, it indicates that there is only one quality_id associated with the layer whose layer_id is i. When the value of one_quality_id_flag[i] is 0, it indicates that there are two or more quality_id values associated with the layer whose layer_id is i.
- quality_id[i] indicates the quality_id value associated with the layer whose layer_id is i.
- The quality id min (quality_id_min[i]) and the quality id max (quality_id_max[i]) indicate the minimum quality_id value and the maximum quality_id value, respectively, associated with the layer whose layer_id is i.
- When the value of the one view id flag (one_view_id_flag[i]) is 1, it indicates that there is only one view_id associated with the layer whose layer_id is i. When the value of one_view_id_flag[i] is 0, it indicates that there are two or more view_id values associated with the layer whose layer_id is i.
- view_id[i] indicates the view_id value associated with the layer whose layer_id is i.
-when the value of the depth flag (depth _ flag [ i ]) is 1, it indicates that the scalable layer whose layer _ id is i includes depth information of the 3-D video bitstream.
View id min (view _ id _ min [ i ]) and view id max (view _ id _ max [ i ]) indicate the minimum view _ id value and the maximum view _ id value, respectively, associated with the layer whose layer _ id is i.
The number regions minus1 (num_regions_minus1) plus 1 indicates the number of regions associated with the layer whose layer_id is i.
Region _ id [ j ] indicates the identifier of the region j associated with the layer whose layer _ id is i.
The number of directly dependent layers (num_direct_dependent_layers[i]) indicates the number of scalable layers on which the current scalable layer i directly depends (i.e., the number of layers needed to generate a prediction signal when decoding is performed).
The directly dependent layer id delta minus1 (direct_dependent_layer_id_delta_minus1[i][j]) plus 1 indicates the difference between layer_id[i] of the current scalable layer and the layer identifier of the jth scalable layer on which the current scalable layer directly depends. The layer identifier of the jth scalable layer on which the current scalable layer directly depends is equal to (layer_id[i] - direct_dependent_layer_id_delta_minus1[i][j] - 1), as illustrated in the sketch below.
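The relationship between these two syntax elements can be illustrated with a short sketch. The helper below is a hypothetical illustration (the function and variable names are not part of any specification text) that recovers the layer identifiers of the directly dependent layers from layer_id[i] and direct_dependent_layer_id_delta_minus1[i][j].

```c
#include <stdio.h>

/* Hypothetical sketch: recover the layer_id of each directly dependent layer
 * from the delta syntax described above:
 *   dependent_layer_id = layer_id[i] - (direct_dependent_layer_id_delta_minus1[i][j] + 1) */
static void list_direct_dependent_layers(int layer_id_i,
                                         const int *delta_minus1,
                                         int num_direct_dependent_layers)
{
    for (int j = 0; j < num_direct_dependent_layers; j++) {
        int dep_layer_id = layer_id_i - (delta_minus1[j] + 1);
        printf("layer %d directly depends on layer %d\n", layer_id_i, dep_layer_id);
    }
}

int main(void)
{
    /* Example: layer 3 directly depends on layers 2 and 0
     * (delta_minus1 values of 0 and 2, respectively). */
    const int delta_minus1[] = { 0, 2 };
    list_direct_dependent_layers(3, delta_minus1, 2);
    return 0;
}
```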
Extension _ info () syntax according to another embodiment is the same as table 16.
< Table 16>
As shown in table 16, pic _ width _ in _ luma _ samples [ i ] and pic _ height _ in _ luma _ samples [ i ], bit _ depth _ luma _ minus8[ i ], bit _ depth _ chroma _ minus8[ i ], and chroma _ format _ idc [ i ] may be signaled by information on different representation formats.
According to another embodiment, pic _ width _ in _ luma _ samples [ i ], pic _ height _ in _ luma _ samples [ i ], bit _ depth _ luma _ minus8[ i ], bit _ depth _ chroma _ minus8[ i ], and chroma _ format _ idc [ i ] may be signaled by information on different pictures (i.e., pictures with different resolutions).
The syntax of the activation SEI message for signaling bit rate and quality information is the same as table 17.
< Table 17>
The semantics of the syntax of table 17 are as follows.
Num _ layers _ minus1 indicates the number of scalable layers that can be provided in the bitstream.
-when the value of the bit rate information flag (bit rate _ info _ flag) is 1, it indicates that bit rate information for each scalability layer is provided.
-when the value of the quality information flag (quality_info_flag) is 1, it indicates that quality value information for each scalable layer is provided.
When the value of the quality type flag (quality_type_flag) is 1, it indicates that quality type information for each scalable layer is provided.
-the maximum bit rate (max _ bit rate [ i ]) indicates the maximum bit rate of the scalability layer whose layer _ id value is i, and the average bit rate (average _ bit rate [ i ]) indicates the average bit rate of the scalability layer whose layer _ id value is i.
-the quality value (quality _ value [ i ]) indicates the quality value of scalable layer i.
-a quality type URI (quality_type_uri[QualityTypeUriIdx]) indicates the QualityTypeUriIdx-th byte of a null-terminated string encoded in UTF-8 characters, the string forming a Universal Resource Identifier (URI) that identifies the representation type of the quality values.
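To make the relationship between these fields concrete, the following sketch shows one way a decoder could hold the per-layer bit rate and quality information carried by this SEI message. The struct layout, field names, and array bounds are illustrative assumptions, not normative syntax.

```c
#include <stdint.h>

/* Illustrative (non-normative) container for the per-layer bit rate and
 * quality information described above. */
typedef struct {
    uint32_t max_bitrate;      /* max_bitrate[i]: maximum bit rate of layer i     */
    uint32_t average_bitrate;  /* average_bitrate[i]: average bit rate of layer i */
    uint32_t quality_value;    /* quality_value[i]: quality value of layer i      */
} LayerBitrateQualityInfo;

typedef struct {
    int num_layers;                     /* num_layers_minus1 + 1                    */
    int bitrate_info_flag;              /* 1: bit rate information is provided      */
    int quality_info_flag;              /* 1: quality value information is provided */
    int quality_type_flag;              /* 1: quality type URI is provided          */
    char quality_type_uri[256];         /* null-terminated UTF-8 URI, if provided   */
    LayerBitrateQualityInfo layer[64];  /* one entry per scalable layer (bound assumed) */
} BitrateQualitySei;
```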
Hereinafter, a scheme of improving a Video Parameter Set (VPS) is proposed to efficiently extract a bitstream.
Layer referencing
The method of indicating the relationship between layer_ID and the scalability dimension ID in a bitstream supporting multiple layers may include a first method and a second method. The first method signals the mapping between layer_ID and the scalability dimension ID. The second method partitions or concatenates the bits of layer_id and then signals which dimension type is present in the partitioned or concatenated bits.
In a bitstream supporting multiple layers, a dimension type may refer to a type of scalability (e.g., spatial scalability and quality scalability), and a dimension ID may refer to an index of a layer for a specific dimension type.
In a bitstream supporting multiple layers, a specific layer generally refers directly to the next lower layer in a specific dimension. For example, when temporal scalability is supported in a single-layer bitstream, temporal layer (sub-layer) 3 directly refers to temporal layer (sub-layer) 2.
For example, in case of supporting spatial scalability, this means that spatial layer 2 directly refers to the next lower spatial layer 1.
Therefore, to indicate the above, it is proposed to first describe the dimensions that have default direct dependency.
Thereafter, specific dependencies are described in detail in the description loop for the scalable layers.
The following proposes a scheme for signaling layer references using these two methods. The improved syntax for the vps extension (vps _ extension) is the same as table 18 to table 21.
< Table 18>
Table 18 shows a syntax for mapping layer _ ID to scalability dimension ID using the first method. The semantics of the syntax of table 18 are as follows.
-when the value of all default dependency flag (all _ default _ dependency _ flag) is 1, it indicates that all layer dimensions have default dependency. That is, this means that in a specific dimension i, a layer having 'dimension id (dimension _ id [ i ]) = n' directly refers to another layer having dimension _ id [ i ] = n-1 by default.
When the value of all _ default _ dependency _ flag is 0, it indicates that all layer dimensions may not have default dependencies. When the value of all _ default _ dependency _ flag is 0, the following 'num _ default _ dim _ minus1' is signaled.
-the number default dimension minus1 (num _ default _ dim _ minus 1) indicates the number of dimensions with default relevance.
Dimension[j] specifies the type of a layer dimension with default dependency. That is, the type of each layer dimension having default dependency is signaled, one dimension at a time, up to the number of dimensions with default dependency. In the corresponding dimension, a higher layer (e.g., dimension_id = n) directly refers to the next lower layer (e.g., dimension_id = n-1).
When the value of the specific dependency flag (specific dependency flag i) is 1, this means that there is a direct dependency/reference detailed for the corresponding layer. Therefore, when the value of specific _ dependency _ flag [ i ] is 1, the number of layers directly referred to by the corresponding layer and the ID of the layer are signaled.
Layer C directly referencing layer B means that the decoder needs to use the information of layer B (decoded or not) to decode layer C. If layer B directly uses the information of layer a, layer C is not considered to directly refer to layer a.
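The combination of default and explicitly signaled dependencies can be summarized in a short sketch. The code below is a hypothetical illustration (types and names are assumptions): when specific_dependency_flag is 1, the explicitly signaled reference layers are used; otherwise the default rule applies, i.e., the layer with dimension_id = n directly references the layer with dimension_id = n-1.

```c
#include <stdio.h>

#define MAX_REF_LAYERS 8

/* Hypothetical per-layer record: either the default rule applies, or an
 * explicit list of directly referenced layer IDs has been signaled
 * (specific_dependency_flag == 1). */
typedef struct {
    int dimension_id;
    int specific_dependency_flag;
    int num_ref_layers;
    int ref_layer_id[MAX_REF_LAYERS];
} LayerDependency;

static void print_direct_references(const LayerDependency *l, int layer_id)
{
    if (l->specific_dependency_flag) {
        for (int j = 0; j < l->num_ref_layers; j++)
            printf("layer %d directly references layer %d (signaled)\n",
                   layer_id, l->ref_layer_id[j]);
    } else if (l->dimension_id > 0) {
        /* default dependency: reference the next lower layer in the dimension */
        printf("layer %d directly references dimension_id %d (default)\n",
               layer_id, l->dimension_id - 1);
    }
}

int main(void)
{
    LayerDependency a = { 2, 0, 0, { 0 } };     /* default rule        */
    LayerDependency b = { 3, 1, 2, { 0, 1 } };  /* explicitly signaled */
    print_direct_references(&a, 10);
    print_direct_references(&b, 11);
    return 0;
}
```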
< Table 19>
Table 19 shows syntax in which bits of layer _ id are allocated to a scalability dimension type and the length of the allocated dimension type is signaled using the second method.
The number dimension minus1 (num _ dimensions _ minus 1) described in table 19 indicates the number of layer dimensions present in the NALU header. That is, the number of layer dimensions present in the NALU header is checked, and the number of layer types present in each respective layer dimension and the bits allocated to the dimension type are checked.
The syntax 'all _ default _ dependency _ flag, num _ default _ dim _ minus1, dimension [ j ], and specific _ dependency _ flag [ i ]' for layer reference described in table 19 have the same semantics as the syntax described in table 18.
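For the second method, the interpretation of a partitioned layer_id can be sketched as follows. The bit widths, the number of dimensions, and the assumption that the most significant bits belong to the first dimension type are all illustrative choices, not values taken from the tables.

```c
#include <stdio.h>

/* Sketch of the second method: the bits of layer_id are partitioned among the
 * signaled dimension types, and each dimension_id is read from its own bit
 * field; the most significant bits are assumed to belong to the first type. */
static void split_layer_id(unsigned layer_id, int total_bits,
                           const int *dim_bits, int num_dimensions)
{
    int shift = total_bits;
    for (int i = 0; i < num_dimensions; i++) {
        shift -= dim_bits[i];
        unsigned dim_id = (layer_id >> shift) & ((1u << dim_bits[i]) - 1u);
        printf("dimension %d: dimension_id = %u\n", i, dim_id);
    }
}

int main(void)
{
    /* Example: a 6-bit layer_id split into 2 bits (e.g. dependency) and
     * 4 bits (e.g. quality); this particular split is purely hypothetical. */
    const int dim_bits[] = { 2, 4 };
    split_layer_id(0x2Au, 6, dim_bits, 2);  /* 0b10_1010 -> 2 and 10 */
    return 0;
}
```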
Tables 20 and 21 describe alternative syntax of tables 18 and 19. Table 20 shows an alternative syntax indicating the default correlation when the first method is used, and table 21 shows an alternative syntax indicating the default correlation when the second method is used.
< TABLE 20>
< Table 21>
Among the syntaxes of tables 20 and 21, descriptions of the syntaxes described in tables 18 and 19 are omitted.
The new syntax 'default dependency flag (default_dependency_flag[i])' in tables 20 and 21 indicates whether dimension type i uses default dependency. In the corresponding dimension, a higher layer (e.g., dimension_id[i] = n) directly refers to the next lower layer (e.g., dimension_id[i] = n-1).
That is, after a specific dimension type is specified by num_dimensions_minus1 and dimension type (dimension_type[i]), whether that dimension type uses default dependency is signaled. If default dependency is not indicated, information on the layers directly referenced by the corresponding layer is signaled.
The dimension types according to the invention are listed in table 22.
< Table 22>
dimension_type[i][j]    dimension_id[i][j]
0                       View order idx (view_order_idx)
1                       Depth flag (depth_flag)
2                       Dependency ID (dependency_id)
3                       Quality ID (quality_id)
4                       Priority ID (priority_id)
5                       Region ID (region_id)
6..15                   Reserved
According to the present invention, dimension types 4 and 5, i.e., types indicating a priority ID and a region ID, have been added to existing dimension types.
The dimension _ type [ i ] [ j ] may have a value between 0 and 5. Other values may be defined later, and if the dimension _ type [ i ] [ j ] does not have a value between 0 and 5, the decoder may disregard the value of dimension _ type [ i ] [ j ].
If the dimension _ type has a value of 4, the corresponding dimension _ ID indicates an ID of a priority layer of a bitstream in the SVC standard.
If the dimension _ type has a value of 5, a corresponding dimension _ ID indicates an ID of a specific region of the bitstream. The particular region may be one or more spatio-temporal segments in the bitstream.
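A small sketch of how a decoder might interpret the dimension_type codes of table 22, ignoring the reserved values 6..15 as described above, is given below; the function name and the textual labels are illustrative.

```c
#include <stdio.h>

/* Sketch of interpreting the dimension_type codes of table 22.
 * Values outside 0..5 are reserved and are simply ignored. */
static const char *dimension_type_name(int dimension_type)
{
    switch (dimension_type) {
    case 0: return "view order idx";
    case 1: return "depth flag";
    case 2: return "dependency ID";
    case 3: return "quality ID";
    case 4: return "priority ID";
    case 5: return "region ID";
    default: return NULL;  /* 6..15: reserved, ignored by the decoder */
    }
}

int main(void)
{
    for (int t = 0; t <= 6; t++) {
        const char *name = dimension_type_name(t);
        if (name)
            printf("dimension_type %d -> %s\n", t, name);
        else
            printf("dimension_type %d -> reserved (ignored)\n", t);
    }
    return 0;
}
```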
Fig. 4 is a control flow diagram illustrating a method of encoding video information according to the present invention.
Referring to fig. 4, the encoding apparatus encodes a Network Abstraction Layer (NAL) unit including information on video in step 401.
The NALU header of the NALU does not include information indicating whether the NALU includes a slice containing at least some or the entire non-reference picture.
Meanwhile, the NALU header includes layer ID information to identify a scalable layer in a bitstream supporting the scalable layer.
Here, the bits that would otherwise be used to signal information indicating whether the NALU includes a slice containing at least some or the entire non-reference picture may be used to signal the layer ID information in the NALU header.
In addition, NALUs may include information about various parameter sets required for decoding video.
The encoding apparatus may encode a Supplemental Enhancement Information (SEI) message including information on an active parameter set as an independent NALU.
The information on the active parameter set may include at least one of information on which the active VPS is indexed and information on which the active SPS is indexed.
Further, the information on the active parameter set may include information on the active VPS indexed thereon, information on the number of SPSs referring to the active VPS, and information on the SPS indexed thereon.
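A minimal sketch of the fields such an active parameter set SEI message could carry is shown below. The struct name, field names, and the bound on the number of SPS indices are assumptions made for illustration.

```c
#include <stdint.h>

/* Illustrative payload for an active parameter set SEI message as described
 * above: the index of the active VPS, the number of SPSs that refer to it,
 * and the index of each such SPS. Field names are assumptions. */
#define MAX_ACTIVE_SPS 16

typedef struct {
    uint8_t active_vps_id;                  /* index of the active VPS             */
    uint8_t num_active_sps;                 /* number of SPSs referring to the VPS */
    uint8_t active_sps_id[MAX_ACTIVE_SPS];  /* index of each such SPS              */
} ActiveParameterSetsSei;
```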
The decoding apparatus can extract a sub-layer providing temporal scalability using information on a parameter set.
In addition, when a parameter set required for decoding a video coding layer NALU is activated, a decoding apparatus or a decoding module for performing decoding may use information on the parameter set.
The encoding apparatus transmits a NALU including information on encoded video in the form of a bitstream in step 402.
Fig. 5 is a control flow diagram illustrating a method of decoding video information according to the present invention.
Referring to fig. 5, a decoding apparatus receives a NALU including information on encoded video through a bitstream in step 501.
The decoding device parses the header and NAL payload of the NALU in step 502. The parsing of the video information may be performed by the entropy decoding module or an additional parsing module.
The decoding device may obtain various pieces of information included in the header of the NALU and the NAL payload by parsing.
The NALU header may include layer ID information for identifying a scalable layer in a bitstream supporting the scalable layer, and may not include 1-bit flag information indicating whether the NALU is a non-reference picture or a reference picture in the entire bitstream when the video data is encoded.
Here, the bits that would otherwise be used to signal information indicating whether the NALU includes a slice containing at least some or the entire non-reference picture may be used to signal the layer ID information in the NALU header.
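One possible NALU header layout consistent with this description is a two-byte header consisting of a forbidden bit, a NAL unit type, a 6-bit layer identifier, and a temporal identifier, with no separate non-reference flag. The parsing sketch below assumes exactly this layout for illustration; the field positions are assumptions rather than quoted syntax.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of parsing a two-byte NAL unit header assumed to be laid out as
 *   forbidden_zero_bit (1) | nal_unit_type (6) | layer_id (6) | temporal_id_plus1 (3).
 * No nal_ref_flag bit is present; the freed bit is spent on the layer identifier. */
typedef struct {
    unsigned nal_unit_type;
    unsigned layer_id;     /* identifies the scalable layer (6 bits) */
    unsigned temporal_id;  /* temporal sub-layer of the NAL unit     */
} NalUnitHeader;

static int parse_nal_unit_header(const uint8_t buf[2], NalUnitHeader *h)
{
    if (buf[0] & 0x80)                        /* forbidden_zero_bit must be 0  */
        return -1;
    if ((buf[1] & 0x07) == 0)                 /* temporal_id_plus1 must be > 0 */
        return -1;
    h->nal_unit_type = (buf[0] >> 1) & 0x3F;                            /* 6 bits */
    h->layer_id      = ((buf[0] & 0x01) << 5) | ((buf[1] >> 3) & 0x1F); /* 6 bits */
    h->temporal_id   = (buf[1] & 0x07) - 1;
    return 0;
}

int main(void)
{
    const uint8_t hdr[2] = { 0x40, 0x09 };    /* example header bytes */
    NalUnitHeader h;
    if (parse_nal_unit_header(hdr, &h) == 0)
        printf("type=%u layer_id=%u temporal_id=%u\n",
               h.nal_unit_type, h.layer_id, h.temporal_id);
    return 0;
}
```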
Further, the decoding apparatus may obtain information on the parameter set included in the SEI message through parsing. The obtained information for the parameter sets is needed to decode the NALUs associated with the SEI message.
The information on the active parameter set may include at least one of information on which the active VPS is indexed and information on which the active SPS is indexed.
Further, the information on the active parameter set may include information on which the active VPS is indexed, information indicating the number of SPSs referring to the active VPS, and information on which the SPSs are indexed.
The decoding apparatus can extract a sub-layer providing temporal scalability using these pieces of information on the parameter set.
In addition, the piece of information on the parameter set may be used when decoding a bitstream or in session negotiation (e.g., session negotiation at the time of streaming in an IP network).
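As a rough illustration of the sub-layer extraction mentioned above, the sketch below keeps only the NAL units whose temporal identifier does not exceed a target value. It is a simplification: a real extractor must also retain parameter sets and other non-VCL NAL units, and the type and field names are assumptions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical NAL unit record; only the field needed for the sketch is kept. */
typedef struct {
    unsigned temporal_id;
    /* ... payload pointer, size, etc. ... */
} NalUnit;

/* Keep only the NAL units whose temporal_id does not exceed target_tid and
 * compact them to the front of the array; returns the number kept. */
static size_t extract_temporal_sublayer(NalUnit *nalus, size_t count,
                                        unsigned target_tid)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (nalus[i].temporal_id <= target_tid)
            nalus[kept++] = nalus[i];
    }
    return kept;
}

int main(void)
{
    NalUnit stream[] = { {0}, {1}, {2}, {1}, {0} };
    size_t kept = extract_temporal_sublayer(stream, 5, 1);
    printf("kept %zu of 5 NAL units for target temporal_id 1\n", kept);
    return 0;
}
```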
In the above-described embodiments, although the method has been described based on the flowchart in the form of a series of steps or blocks, the present invention is not limited to the sequence of the steps, and some steps may be performed in a different order from other steps or may be performed simultaneously with other steps. Further, those skilled in the art will appreciate that the steps shown in the flowcharts are not exclusive and that the steps may include additional steps or one or more steps in the flowcharts may be deleted without changing the scope of the present invention.
The above-described embodiments include various aspects of examples. While all possible combinations that present the various aspects may not be described, those skilled in the art will appreciate that other combinations are possible. Accordingly, the invention is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.

Claims (9)

1. A video decoding apparatus, comprising:
a decoding module that decodes a picture;
a parsing module that parses a Network Abstraction Layer (NAL) unit header of a NAL unit including information about the picture; and
a decoded picture buffer (DPB) that stores the decoded picture,
wherein the NAL unit header does not include NAL reference information indicating whether the picture is a reference picture or a non-reference picture, and the picture is marked as a reference picture or a non-reference picture based on slice reference information in the slice header, and
The NAL unit header includes a layer identifier indicating a layer defined by at least one of scalable features other than temporal scalability, and a temporal identifier indicating a temporal layer of the NAL unit.
2. The video decoding apparatus of claim 1, wherein the layer identifier is represented by 6 bits.
3. A video decoding method, comprising:
decoding the picture;
parsing a Network Abstraction Layer (NAL) unit header of a NAL unit including information about the picture; and
marking the picture as one of a reference picture and a non-reference picture,
wherein,
the NAL unit header does not include NAL reference information indicating whether the picture is a reference picture or a non-reference picture, and the picture is marked as a reference picture or a non-reference picture based on slice reference information in the slice header, and
The NAL unit header includes a layer identifier indicating a layer defined by at least one of scalable features other than temporal scalability, and a temporal identifier indicating a temporal layer of the NAL unit.
4. A video encoding device comprising:
a coding unit for coding a picture, coding a slice header and a NAL unit header of a network abstraction layer NAL unit, and generating a bitstream including the coded picture and the coded slice header,
wherein,
the NAL unit header includes information on the picture,
the NAL unit header does not include NAL reference information indicating whether the picture is a reference picture or a non-reference picture, and the picture is marked as a reference picture or a non-reference picture based on slice reference information in the slice header, and
The NAL unit header includes a layer identifier indicating a layer defined by at least one of scalable features other than temporal scalability, and a temporal identifier indicating a temporal layer of the NAL unit.
5. The video encoding apparatus of claim 4, wherein for the decoding process of the picture, the picture is stored in a Decoded Picture Buffer (DPB).
6. The video encoding device of claim 4, wherein the layer identifier is represented by 6 bits.
7. A video encoding method, comprising:
encoding the picture;
encoding a slice header;
encoding a NAL unit header of a network abstraction layer NAL unit; and
generating a bitstream including the encoded picture and the encoded slice header,
wherein,
the NAL unit header includes information on the picture,
the NAL unit header does not include NAL reference information indicating whether the picture is a reference picture or a non-reference picture, and the picture is marked as a reference picture or a non-reference picture based on slice reference information in the slice header, and
The NAL unit header includes a layer identifier indicating a layer defined by at least one of scalable features other than temporal scalability, and a temporal identifier indicating a temporal layer of the NAL unit.
8. A method for storing a bitstream, comprising:
generating a bitstream including information of a picture, information of a slice header, and information of a NAL unit header of a network abstraction layer NAL unit; and
storing the bitstream in a memory,
wherein the NAL unit header includes information on the picture,
the NAL unit header does not include NAL reference information indicating whether the picture is a reference picture or a non-reference picture, and the picture is marked as a reference picture or a non-reference picture based on slice reference information in the slice header, and
The NAL unit header includes a layer identifier indicating a layer defined by at least one of scalable features other than temporal scalability, and a temporal identifier indicating a temporal layer of the NAL unit.
9. A method for generating a bitstream, comprising:
generating a bitstream including information of the picture, information of the slice header, and information of a NAL unit header of a network abstraction layer NAL unit,
wherein the NAL unit header includes information on the picture,
the NAL unit header does not include NAL reference information indicating whether the picture is a reference picture or a non-reference picture, and the picture is marked as a reference picture or a non-reference picture based on slice reference information in the slice header, and
The NAL unit header includes a layer identifier indicating a layer defined by at least one of scalable features other than temporal scalability, and a temporal identifier indicating a temporal layer of the NAL unit.

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
KR10-2012-0038870 2012-04-16
KR10-2012-0066606 2012-06-21
KR10-2012-0067925 2012-06-25
KR10-2012-0071933 2012-07-02
KR10-2012-0077012 2012-07-16
KR10-2012-0108925 2012-09-28
KR10-2012-0112598 2012-10-10

Publications (2)

Publication Number Publication Date
HK40000117A HK40000117A (en) 2020-01-31
HK40000117B true HK40000117B (en) 2023-02-10
