CN104094604A

CN104094604A - Method and apparatus for encoding and decoding video using temporal motion vector prediction

Info

Publication number: CN104094604A
Application number: CN201380005801.0A
Authority: CN
Inventors: 乃苏孟德; 袁明亮; 林宗顺; 孙海威; 温觉觉; 西孝启; 笹井寿郎; 柴原阳司; 杉尾敏康; 谷川京子; 松延彻; 寺田健吾
Original assignee: Panasonic Intellectual Property Corp of America
Current assignee: Sun Patent Trust Inc
Priority date: 2012-01-20
Filing date: 2013-01-16
Publication date: 2014-10-08
Anticipated expiration: 2033-01-16
Also published as: EP2805511A4; EP2805511B1; KR20140116871A; US10129563B2; US9591328B2; US10616601B2; US20170094307A1; JP2015504254A; US20190028734A1; ES2728146T3; EP2805511A1; WO2013108616A1; PL2805511T3; CN104094604B; JP6394966B2; KR102030205B1; US20140369415A1

Abstract

A method of encoding a video into a coded video bitstream with temporal motion vector prediction, the method comprising: determining a value of a flag for indicating whether temporal motion vector prediction is used or not used for the inter-picture prediction of a sub-picture unit of a picture; and writing the flag having said value into a header of the sub-picture unit or a header of the picture; wherein if the flag indicates that temporal motion vector prediction is used, the method further comprises: creating a first list of motion vector predictors comprising a plurality of motion vector predictors including at least one temporal motion vector predictor derived from at least one motion vector from a collocated reference picture; selecting a motion vector predictor out of the first list for a prediction unit in the sub-picture unit; and writing a first parameter into the coded video bitstream for indicating the selected motion vector predictor out of the first list; and wherein if the flag indicates that temporal motion vector prediction is not used, the method further comprises: creating a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictors; selecting a motion vector predictor out of the second list for a prediction unit in the sub-picture unit; and writing a second parameter into the coded video bitstream for indicating the selected motion vector predictor out of the second list. In addition, there is provided a method of decoding an encoded video and corresponding apparatuses for encoding and decoding a video.

Description

Method and apparatus for encoding and decoding video using temporal motion vector prediction

Technical Field

The present invention relates to a method of encoding a video and a method of decoding a video using temporal motion vector prediction, and to apparatuses therefor. The invention can be applied to any multimedia data coding and, more specifically, to the coding of image and video content that uses temporal motion vector prediction for inter-picture prediction.

Background Art

Video coding schemes such as H.264/MPEG-4 AVC and the upcoming HEVC (High Efficiency Video Coding) encode and decode image/video content using inter-picture (or simply "inter") prediction from previously encoded/decoded reference pictures, in order to exploit the information redundancy across temporally consecutive pictures.

In an encoded video bitstream, the reference picture used for the inter-picture prediction of a prediction unit (e.g., an M×N block of samples) is identified by a reference index. A reference index is an index into an ordered list (referred to as a reference picture list) comprising one or more reference pictures. Each reference index is uniquely associated with one reference picture in the reference picture list. That is, the reference index is a value used to distinguish a plurality of reference pictures from one another.

The above coding schemes support temporal prediction of motion vectors (i.e., motion vector prediction, or MVP), whereby the motion vector of a target block of samples is predicted from the motion vectors of one or more previously coded blocks of samples in a co-located reference picture. Temporal motion vector prediction further reduces the bit rate associated with motion vectors by exploiting the information redundancy between temporally adjacent motion vectors. The co-located reference picture is selected among the available reference pictures using a predetermined scheme, for example, by selecting the first reference picture in a predetermined reference picture list (e.g., reference picture list 0) as the co-located reference picture.
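
The relationship between a reference index, a reference picture list and the co-located reference picture may be sketched as follows. This is a minimal illustration only, assuming that pictures are identified by picture order count values and that the predetermined scheme picks the first entry of reference picture list 0; the names `RefPicList`, `picture_for_ref_idx` and `collocated_picture` are hypothetical and are not taken from any standard.

```python
from typing import List

# Hypothetical model: a reference picture list is an ordered list of
# picture identifiers (here, illustrative picture order count values).
RefPicList = List[int]


def picture_for_ref_idx(ref_list: RefPicList, ref_idx: int) -> int:
    """Return the reference picture uniquely associated with ref_idx."""
    return ref_list[ref_idx]


def collocated_picture(ref_list0: RefPicList) -> int:
    """Assumed predetermined scheme: the first picture of list 0."""
    return ref_list0[0]


ref_list0 = [8, 4, 0]  # illustrative values only
print(picture_for_ref_idx(ref_list0, 1), collocated_picture(ref_list0))
```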

In applications that transmit video across lossy environments, temporal motion vector prediction is susceptible to erroneous prediction of motion vectors when the co-located reference picture is lost or contains errors. In the HEVC standard under development, a technique for disabling temporal motion vector prediction for certain sub-picture units (e.g., slices) has been disclosed (JCTVC-G398, "High-level Syntax: Marking process for non-TMVP pictures", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 7th Meeting, Geneva, CH, November 2011). In this technique, it is necessary to introduce a marking flag in the picture parameter set (PPS) for marking pictures in the decoder picture buffer (DPB) as "not used for temporal motion vector prediction". This marking process is performed by the decoder when a sub-picture unit refers to a PPS having the marking flag equal to "TRUE".

Reference List

Non-Patent Literature

NPL1: ISO/IEC 14496-10, "MPEG-4 Part 10 Advanced Video Coding"

NPL2: JCTVC-G398, "High-level Syntax: Marking process for non-TMVP pictures", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 7th Meeting, Geneva, CH, November 2011.

Summary of the Invention

Technical Problem

As described in the Background Art, the disclosed technique for disabling temporal motion vector prediction for certain slices requires a marking flag in the picture parameter set (PPS) for marking pictures in the decoder picture buffer (DPB) as "not used for temporal motion vector prediction". A major problem with this technique is that, when the slice that invokes the marking process is lost or contains errors, the decoder cannot perform the intended marking process. As a result, synchronization between the encoder and the decoder is subsequently lost. The above technique for disabling temporal motion vector prediction is therefore not robust.

Solution to Problem

The present invention seeks to provide methods and apparatuses for encoding and decoding video using temporal motion vector prediction with improved error robustness. In particular, temporal motion vector prediction for a sub-picture unit (e.g., a slice) is enabled/disabled in a manner that is less susceptible to errors. For example, according to embodiments of the present invention, the above-mentioned marking process performed by the decoder (i.e., marking reference pictures as "not used for temporal motion vector prediction") is eliminated.

According to a first aspect of the present invention, there is provided a method of encoding a video into an encoded video bitstream using temporal motion vector prediction, the method comprising:

determining a value of a flag for indicating whether temporal motion vector prediction is used or not used for the inter-picture prediction of a sub-picture unit of a picture;

writing the flag into a header of the sub-picture unit or a header of the picture; and

wherein, if the flag indicates that temporal motion vector prediction is used, the method further comprises:

creating a first list of motion vector predictors comprising a plurality of motion vector predictors, the plurality of motion vector predictors including at least one temporal motion vector predictor derived from at least one motion vector from a co-located reference picture;

selecting a motion vector predictor from the first list for a prediction unit in the sub-picture unit; and

writing a first parameter into the encoded video bitstream for indicating the motion vector predictor selected from the first list.

Preferably, if the flag indicates that temporal motion vector prediction is not used, the method further comprises:

creating a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictors;

selecting a motion vector predictor from the second list for a prediction unit in the sub-picture unit; and

writing a second parameter into the encoded video bitstream for indicating the motion vector predictor selected from the second list.
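
The two list constructions above may be illustrated with a short sketch. It is not the claimed implementation: the function names, the zero-motion candidate appended to both lists, and the selection cost (sum of absolute component differences) are assumptions made only for the example.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components


def build_mvp_list(spatial: List[MV], temporal: Optional[MV],
                   tmvp_used: bool) -> List[MV]:
    """Build the first list (flag: used) or the second list (flag: not used)."""
    candidates = list(spatial)
    if tmvp_used and temporal is not None:
        candidates.append(temporal)  # temporal predictor only in the first list
    candidates.append((0, 0))  # assumed zero-motion candidate in both lists
    return candidates


def select_mvp(candidates: List[MV], actual_mv: MV) -> int:
    """Return the index of the predictor closest to the actual motion vector."""
    def cost(p: MV) -> int:
        return abs(p[0] - actual_mv[0]) + abs(p[1] - actual_mv[1])
    return min(range(len(candidates)), key=lambda i: cost(candidates[i]))


first_list = build_mvp_list([(4, 0)], (5, 1), tmvp_used=True)
second_list = build_mvp_list([(4, 0)], (5, 1), tmvp_used=False)
first_parameter = select_mvp(first_list, (5, 2))
second_parameter = select_mvp(second_list, (5, 2))
```

The returned index plays the role of the first or second parameter that is written into the bitstream.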

In one embodiment, the value of the flag is determined based on the temporal layer of the picture.

Preferably, if the temporal layer of the picture is determined to be the lowest layer or base layer, the value of the flag is set to indicate that temporal motion vector prediction is not used; otherwise, the value of the flag is set to indicate that temporal motion vector prediction is used.
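
As a hedged illustration of this rule, the decision may be written as a one-line predicate. The function name and the convention that the base layer carries temporal identifier 0 are assumptions of the sketch.

```python
def tmvp_flag_from_temporal_layer(temporal_id: int) -> bool:
    """Return True if temporal motion vector prediction is to be used.

    Assumption: the lowest (base) temporal layer has identifier 0.
    """
    return temporal_id != 0
```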

In another embodiment, the value of the flag is determined based on a picture order count (POC) value of the picture.

Preferably, if the POC value of the picture is determined to be greater than any POC value of the reference pictures in the decoder picture buffer (DPB), the value of the flag is set to indicate that temporal motion vector prediction is not used; otherwise, the value of the flag is set to indicate that temporal motion vector prediction is used.
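
A minimal sketch of the POC-based rule follows. It reads "greater than any POC value" as "greater than every POC value of the reference pictures in the DPB", which is an interpretation; the function name is hypothetical, and an empty DPB is treated as satisfying the condition.

```python
from typing import Iterable


def tmvp_flag_from_poc(current_poc: int, dpb_ref_pocs: Iterable[int]) -> bool:
    """Return True if temporal motion vector prediction is to be used."""
    # With an empty DPB, all() is True, so the flag indicates "not used".
    greater_than_all = all(current_poc > poc for poc in dpb_ref_pocs)
    return not greater_than_all
```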

In yet another embodiment, the value of the flag is determined based on the sub-picture unit type of an inter-picture sub-picture unit in the picture.

Preferably, if the sub-picture unit type is a predictive (P) type, the value of the flag is set to indicate that temporal motion vector prediction is not used; otherwise, the value of the flag is set to indicate that temporal motion vector prediction is used.
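
The type-based rule may be sketched as below. Only inter-picture types are modelled; the enumeration `SliceType` and the function name are hypothetical, and the bi-predictive (B) member is an assumed example of a non-P inter type.

```python
from enum import Enum


class SliceType(Enum):
    """Inter-picture sub-picture unit types considered by this sketch."""
    P = "P"  # predictive
    B = "B"  # bi-predictive (assumed example of a non-P inter type)


def tmvp_flag_from_slice_type(slice_type: SliceType) -> bool:
    """Return True if temporal motion vector prediction is to be used."""
    return slice_type is not SliceType.P
```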

In yet another embodiment, the value of the flag is determined based on whether the picture containing the sub-picture unit is a random access point (RAP) picture.

Preferably, if the picture is a RAP picture and the sub-picture unit belongs to a non-base layer of the picture, the value of the flag is set to indicate that temporal motion vector prediction is not used; otherwise, the value of the flag is set to indicate that temporal motion vector prediction is used.
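
The RAP-based rule combines two conditions, which the following sketch makes explicit. The function name and the convention that the base layer has layer identifier 0 are assumptions.

```python
def tmvp_flag_from_rap(is_rap_picture: bool, layer_id: int) -> bool:
    """Return True if temporal motion vector prediction is to be used.

    Assumption: layer_id 0 denotes the base layer of the picture.
    """
    in_non_base_layer = layer_id != 0
    return not (is_rap_picture and in_non_base_layer)
```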

Preferably, the flag is written into the header of the sub-picture unit.

Preferably, the method further comprises: writing one or more parameters into the header of the sub-picture unit to specify the order of the reference pictures in one or more reference picture lists used for the inter-picture prediction of the sub-picture unit.

Preferably, the method further comprises:

performing motion-compensated inter-picture prediction using the selected motion vector predictor to produce the prediction unit;

subtracting the prediction unit from an original block of samples to produce a residual block of samples; and

encoding the residual block of samples corresponding to the prediction unit into the encoded video bitstream.

In one embodiment, the second list comprises one fewer motion vector predictor than the first list, and the motion vector predictors of the first and second lists are the same except for the temporal motion vector predictor.

Preferably, the first and second parameters are represented in the encoded video bitstream using different predetermined bit representations.
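
One way in which a selection parameter could have a different bit representation for each list is a truncated unary code whose maximum value depends on the list size. The text does not name a specific binarization, so the choice of truncated unary here is an assumption made purely for illustration, as is the function name.

```python
def truncated_unary(index: int, list_size: int) -> str:
    """Binarize a predictor index with a truncated unary code.

    The largest index (list_size - 1) omits the terminating '0', so a
    shorter list yields shorter codewords for its last entry.
    """
    if not 0 <= index < list_size:
        raise ValueError("index out of range for the given list size")
    c_max = list_size - 1
    bits = "1" * index
    if index < c_max:
        bits += "0"
    return bits


# First list with three predictors, second list with two predictors.
first_codes = [truncated_unary(i, 3) for i in range(3)]
second_codes = [truncated_unary(i, 2) for i in range(2)]
```

Under this assumed binarization, the last entry of a two-entry list needs one bit, whereas the same index in a three-entry list needs two.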

In another embodiment, the first and second lists comprise the same predetermined number of motion vector predictors, and the second list comprises a motion vector predictor that is not present in the first list and that is derived without using motion vectors from any reference picture.

Preferably, the flag is used to indicate whether temporal motion vector prediction is used or not used for the inter-picture prediction of the sub-picture unit independently of the other sub-picture units in the picture.

Preferably, the sub-picture unit is a slice of the picture.

According to a second aspect of the present invention, there is provided a method of decoding an encoded video bitstream using temporal motion vector prediction, the method comprising:

parsing a flag from a header of a sub-picture unit or a header of a picture of the encoded video; and

determining whether the flag indicates that temporal motion vector prediction is used or not used;

creating a first list of motion vector predictors comprising a plurality of motion vector predictors, the plurality of motion vector predictors including at least one temporal motion vector predictor derived from at least one motion vector from a co-located reference picture;

parsing a first parameter from the encoded video bitstream, the first parameter indicating the motion vector predictor selected from the first list for a prediction unit in the sub-picture unit.

creating a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictors; and

parsing a second parameter from the encoded video bitstream, the second parameter indicating the motion vector predictor selected from the second list for a prediction unit in the sub-picture unit.
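
The decoding steps above may be sketched as follows, assuming that list creation is conditioned on the parsed flag in the same way as in the encoding method. The already-parsed syntax elements are modelled as dictionaries; the keys `tmvp_flag` and `mvp_idx`, the function name, and the convention that the value 1 means "used" are hypothetical.

```python
from typing import Dict, List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components


def decode_mvp(slice_header: Dict[str, int], prediction_unit: Dict[str, int],
               spatial: List[MV], temporal: MV) -> MV:
    """Return the motion vector predictor signalled for a prediction unit."""
    tmvp_used = slice_header["tmvp_flag"] == 1  # assumed: 1 means "used"
    if tmvp_used:
        candidates = spatial + [temporal]  # first list, with temporal predictor
    else:
        candidates = list(spatial)  # second list, no temporal predictor
    return candidates[prediction_unit["mvp_idx"]]
```

Because the flag travels in the header of the sub-picture unit itself, the decoder in this sketch needs no state from earlier sub-picture units to decide which list to build.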

According to a third aspect of the present invention, there is provided an apparatus for encoding a video into an encoded video bitstream using temporal motion vector prediction, the apparatus comprising:

a control unit operable to determine a value of a flag for indicating whether temporal motion vector prediction is used or not used for the inter-picture prediction of a sub-picture unit of a picture;

a writing unit operable to write the flag having said value into a header of the sub-picture unit or a header of the picture;

a motion vector prediction unit; and

an inter-picture prediction unit for performing inter-picture prediction based on a motion vector predictor selected by the motion vector prediction unit,

wherein the motion vector prediction unit is configured to receive the flag and, based on the flag being a first value, is operable to: create a first list of motion vector predictors comprising a plurality of motion vector predictors, the plurality of motion vector predictors including at least one temporal motion vector predictor derived from at least one motion vector from a co-located reference picture; and select a motion vector predictor from the first list for a prediction unit in the sub-picture unit; and

the writing unit is further operable to write a first parameter into the encoded video bitstream for indicating the motion vector predictor selected from the first list.

Preferably, when the flag is a second value, the motion vector prediction unit is operable to: create a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictors; and select a motion vector predictor from the second list for a prediction unit in the sub-picture unit; and

the writing unit is further operable to write a second parameter into the encoded video bitstream for indicating the motion vector predictor selected from the second list.

According to a fourth aspect of the present invention, there is provided an apparatus for decoding an encoded video bitstream using temporal motion vector prediction, the apparatus comprising:

a parsing unit operable to parse a flag from a header of a sub-picture unit or a header of a picture of the encoded video, and to determine whether the flag indicates that temporal motion vector prediction is used or not used;

a motion vector prediction unit; and

an inter-picture prediction unit for performing inter-picture prediction based on a motion vector predictor selected by the motion vector prediction unit;

wherein the motion vector prediction unit is configured to receive the flag and, based on the flag being a first value, is operable to create a first list of motion vector predictors comprising a plurality of motion vector predictors, the plurality of motion vector predictors including at least one temporal motion vector predictor derived from at least one motion vector from a co-located reference picture; and

the parsing unit is further operable to parse a first parameter from the encoded video bitstream, the first parameter indicating the motion vector predictor selected from the first list for a prediction unit in the sub-picture unit.

Preferably, when the flag is a second value, the motion vector prediction unit is operable to create a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictors; and

the parsing unit is further operable to parse a second parameter from the encoded video bitstream, the second parameter indicating the motion vector predictor selected from the second list for a prediction unit in the sub-picture unit.

Advantageous Effects of the Invention

Embodiments of the present invention provide methods and apparatuses for encoding and decoding video using temporal motion vector prediction with improved error robustness of inter-picture prediction. For example, the embodiments may also improve the flexibility and coding efficiency of inter-picture prediction, because temporal motion vector prediction can be enabled and disabled independently for multiple sub-picture units in the same picture.

Brief Description of the Drawings

FIG. 1 depicts an exploded schematic diagram of an exemplary encoded video bitstream according to an embodiment of the present invention;

FIG. 2 depicts a flowchart illustrating a method of encoding a video according to an embodiment of the present invention;

FIG. 3 depicts a schematic block diagram of an exemplary apparatus for encoding an input video/image bitstream;

FIG. 4 depicts a flowchart illustrating a method of decoding an encoded video according to an embodiment of the present invention;

FIG. 5 depicts a schematic block diagram of an exemplary apparatus for decoding an input encoded bitstream;

FIG. 6 depicts a diagram showing the different temporal layers of an exemplary group of pictures;

FIG. 7 depicts a flowchart illustrating a method of determining the value of the temporal motion vector prediction usage flag according to a first embodiment;

FIG. 8 depicts a flowchart illustrating a method of determining the value of the temporal motion vector prediction usage flag according to a second embodiment;

FIG. 9 depicts a flowchart illustrating a method of determining the value of the temporal motion vector prediction usage flag according to a third embodiment;

FIG. 10 depicts a diagrammatic representation of a NAL unit stream, i.e., a series of NAL units of an encoded video bitstream;

FIG. 11 depicts a diagrammatic representation of an exemplary RAP picture containing multiple views/layers with multiple slices;

FIG. 12 depicts a flowchart illustrating a method of determining the value of the temporal motion vector prediction usage flag according to a fourth embodiment;

FIG. 13 shows an overall configuration of a content providing system for implementing content distribution services;

FIG. 14 shows an overall configuration of a digital broadcasting system;

FIG. 15 shows a block diagram illustrating an example of a configuration of a television;

FIG. 16 shows a block diagram illustrating an example of a configuration of an information reproducing/recording unit that reads information from and writes information on a recording medium that is an optical disc;

FIG. 17 shows an example of a configuration of a recording medium that is an optical disc;

FIG. 18A shows an example of a cellular phone;

FIG. 18B is a block diagram showing an example of a configuration of a cellular phone;

FIG. 19 shows a structure of multiplexed data;

FIG. 20 schematically shows how each stream is multiplexed in the multiplexed data;

FIG. 21 shows in more detail how a video stream is stored in a stream of PES packets;

FIG. 22 shows a structure of TS packets and source packets in the multiplexed data;

FIG. 23 shows a data structure of a PMT;

FIG. 24 shows an internal structure of multiplexed data information;

FIG. 25 shows an internal structure of stream attribute information;

FIG. 26 shows steps for identifying video data;

FIG. 27 shows an example of a configuration of an integrated circuit for implementing the moving picture coding method and the moving picture decoding method according to each of the embodiments;

FIG. 28 shows a configuration for switching between driving frequencies;

FIG. 29 shows steps for identifying video data and switching between driving frequencies;

FIG. 30 shows an example of a look-up table in which video data standards are associated with driving frequencies;

FIG. 31A is a diagram showing an example of a configuration for sharing a module of a signal processing unit;

FIG. 31B is a diagram showing another example of a configuration for sharing a module of a signal processing unit.

Description of Embodiments

According to exemplary embodiments of the present invention, there are provided a method of encoding a video and a method of decoding a video using temporal motion vector prediction (TMVP), and apparatuses therefor. In particular, temporal motion vector prediction for a sub-picture unit (e.g., a slice) is enabled/disabled in a manner that is less susceptible to errors. To achieve this, according to a preferred embodiment of the present invention, a flag is introduced into the header of a picture or, more preferably, into the header of a sub-picture unit, for indicating whether temporal motion vector prediction is used for the inter-picture (or simply "inter") prediction of the sub-picture unit. This flag may also be referred to as the temporal motion vector prediction usage flag. In further aspects of the invention, preferred techniques for determining/deciding the value of the flag are disclosed in various embodiments.

For clarity and simplicity, exemplary embodiments of the present invention will now be described in further detail for the case in which the sub-picture unit is a slice of a picture. Those skilled in the art will appreciate that slice partitioning is only one possible method for dividing a picture into multiple sub-picture partitions. Therefore, the embodiments of the present invention described hereinafter are not limited to the sub-picture unit being a slice. For example, other sub-picture partitioning methods such as tiles, entropy slices and wavefront partitioning units are all within the scope of the present invention.

FIG. 1 is an exploded schematic diagram of an exemplary encoded video bitstream 100 according to an embodiment of the present invention. The encoded video bitstream 100 comprises a header 110 and a plurality of pictures 112 associated with the header 110. A picture 112 is typically partitioned into a plurality of sub-picture units (e.g., slices) 114. Each slice 114 comprises a slice header 116 and slice data 118 associated with the slice header 116. The slice data 118 comprises a plurality of prediction units 120 of the inter-picture prediction type.

In the exemplary embodiment shown in FIG. 1, the flag 122 for indicating whether temporal motion vector prediction is used for the inter-picture prediction of a slice 114 is preferably located in the slice header 116. Therefore, temporal motion vector prediction can be enabled and disabled for each slice 114 independently of the other slices 114 in the same picture 112. The slice header 116 further comprises reference picture list ordering parameters 124 for specifying the order of the reference pictures in one or more reference picture lists. These parameters 124 determine the effective or final order of the reference pictures in the reference picture lists used for the inter-picture prediction of the slice 114 associated with, or corresponding to, the slice header 116. These parameters 124 may specify a reordering process to be performed on one or more initial reference picture lists, or may specify that the initial reference picture lists are to be used without reordering. As shown in FIG. 1, the flag 122 is preferably located in the same slice header 116 as the reference picture list ordering parameters 124. A motion vector predictor selection parameter 126 is located in each prediction unit 120 for selecting a motion vector predictor out of the plurality of motion vector predictors available for the inter-picture prediction of the prediction unit 120.
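
The containment hierarchy of FIG. 1 may be mirrored by a few plain data classes. The sketch below is only a model of where each syntax element lives; the class and field names are hypothetical, and the numerals in the comments refer to the reference signs used in the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PredictionUnit:  # prediction unit 120
    mvp_selection: int  # motion vector predictor selection parameter 126


@dataclass
class SliceHeader:  # slice header 116
    tmvp_flag: bool  # temporal motion vector prediction usage flag 122
    ref_list_ordering: List[int] = field(default_factory=list)  # parameters 124


@dataclass
class Slice:  # slice 114
    header: SliceHeader
    prediction_units: List[PredictionUnit] = field(default_factory=list)  # slice data 118


@dataclass
class Picture:  # picture 112
    slices: List[Slice] = field(default_factory=list)


# Two slices of the same picture with independently chosen flag values.
picture = Picture(slices=[
    Slice(SliceHeader(tmvp_flag=True), [PredictionUnit(mvp_selection=1)]),
    Slice(SliceHeader(tmvp_flag=False), [PredictionUnit(mvp_selection=0)]),
])
```

Placing the flag in each slice header is what allows the two slices in this example to differ within one picture.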

在另一个实施例中，参考图片列表排序参数124和时间运动矢量预测使用标志122位于在同一图片112中的多个切片114之间共享的头部(未示出)中。例如，图片级别头部110可以是HEVC编码方案中的适应参数集合(APS)或公共切片片段头部。In another embodiment, the reference picture list ordering parameter 124 and the temporal motion vector prediction usage flag 122 are located in a header (not shown) that is shared between multiple slices 114 in the same picture 112 . For example, the picture level header 110 may be an adaptation parameter set (APS) or a common slice segment header in the HEVC coding scheme.

如同前文中所解释的，切片分割仅是用于将图片划分成多个子图片分区的一种可能的方法。可以使用其它可能的子图片分割方法，例如，拼贴、熵片和波阵面分割单元。在这些其它的子图片分割方法中，如前文中所述，位于切片头部116中的参数124和标志122反而可以位于子图片单元的头部中。As explained above, slice partitioning is only one possible method for dividing a picture into sub-picture partitions. Other possible sub-picture segmentation methods can be used, such as tiling, entropy slices, and wavefront segmentation units. In these other sub-picture segmentation methods, the parameters 124 and flags 122 located in the slice header 116 may instead be located in the header of the sub-picture unit as described above.

FIG. 2 depicts a flowchart illustrating a method 200 of encoding video according to an embodiment of the present invention. In step S202, one or more parameters (i.e., reference picture list ordering parameters) 124 are written into the header 116 of a slice 114 to specify the order of the reference pictures in one or more reference picture lists used for inter-picture prediction of the slice 114. A predetermined position (e.g., the first picture) in one of these reference picture lists (e.g., reference picture list 0) indicates the co-located reference picture. In step S204, the value of the flag 122, which indicates whether temporal motion vector prediction is used for the inter-picture prediction of the slice 114, is determined. Various techniques for determining the value of the flag 122 are described later in accordance with various embodiments of the invention. Then, in step S206, the flag 122 is written into the header 116 of the slice 114. In step S208, the value of the flag 122 is analyzed or judged to determine whether the flag 122 indicates that temporal motion vector prediction is used or not used. For example, a flag 122 with the value "0" may indicate that temporal motion vector prediction is not used, while a flag 122 with the value "1" may indicate that temporal motion vector prediction is used, or vice versa.

If the flag 122 indicates that temporal motion vector prediction is used, then in step S210 a list of motion vector predictors (a first list) is created. The first list comprises a plurality of motion vector predictors, including at least one temporal motion vector predictor derived from at least one motion vector of a co-located reference picture. By way of example only, the plurality of motion vector predictors may include at least one temporal motion vector predictor, one or more motion vectors derived from spatially neighboring prediction units/blocks (i.e., spatial motion vector predictors), and a zero motion vector. In step S212, a motion vector predictor is selected from the list for a target block of samples (i.e., a prediction unit) 120 in the slice 114. In step S214, a parameter (i.e., a motion vector predictor selection parameter, e.g., a first parameter) 126 is written into the encoded video bitstream 100 (i.e., into the prediction unit 120 of the slice 114) to indicate the motion vector predictor selected from the list.

On the other hand, if the flag 122 indicates that temporal motion vector prediction is not used, then in step S216 a list of motion vector predictors (e.g., a second list) is created that comprises a plurality of motion vector predictors without any temporal motion vector predictor. In step S218, a motion vector predictor is selected from the list for a target block of samples (i.e., a prediction unit) in the slice 114. In step S220, a parameter (i.e., a motion vector predictor selection parameter, e.g., a second parameter) is written into the encoded video bitstream 100 (i.e., into each prediction unit 120 of the slice data 118 associated with the slice header 116) to indicate the motion vector predictor selected from the list.
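The two branches of steps S210 to S220 can be illustrated with a minimal Python sketch. The names `build_mvp_list` and `select_mvp`, the tuple representation of a motion vector, and the sum-of-absolute-differences selection cost are assumptions made for illustration; the text does not prescribe how the encoder chooses among the candidates.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]  # hypothetical (dx, dy) motion vector representation


def build_mvp_list(tmvp_flag: int, spatial: List[MV], temporal: Optional[MV]) -> List[MV]:
    """Build the first list (flag == 1) or the second list (flag == 0)."""
    candidates = list(spatial)              # spatial predictors from neighboring blocks
    if tmvp_flag == 1 and temporal is not None:
        candidates.append(temporal)         # temporal predictor from the co-located picture
    candidates.append((0, 0))               # zero motion vector
    return candidates


def select_mvp(candidates: List[MV], actual: MV) -> int:
    """Return the index of the candidate closest to the actual motion vector.

    The cost function (sum of absolute differences) is an assumption.
    """
    def cost(mv: MV) -> int:
        return abs(mv[0] - actual[0]) + abs(mv[1] - actual[1])
    return min(range(len(candidates)), key=lambda i: cost(candidates[i]))


spatial = [(4, 0), (3, 1)]
temporal = (5, -1)
first_list = build_mvp_list(1, spatial, temporal)    # includes the temporal predictor
second_list = build_mvp_list(0, spatial, temporal)   # no temporal predictor
first_param = select_mvp(first_list, (5, -2))        # index written as the first parameter
second_param = select_mvp(second_list, (5, -2))      # index written as the second parameter
```

In this sketch the selection parameter is simply the index into whichever list the flag selects, so the encoder and decoder must build the same list for the index to be meaningful.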

After step S214 or step S220, motion-compensated inter-picture prediction is performed for the slice 114 using the selected motion vector predictor to produce a prediction sample block. Subsequently, in step S226, the prediction sample block is subtracted from the original sample block to produce a residual sample block, and the residual sample block corresponding to the target block is encoded into the encoded video bitstream 100.

Thus, in the above-described embodiments of the present invention, the flag 122 indicating whether temporal motion vector prediction is used can control one slice 114 independently of the other slices 114 in the same picture 112. The flag 122 corresponding to a first slice 114 therefore does not determine whether temporal motion vector prediction is used in a second or other slice of the same picture 112. In addition, the above-described embodiments eliminate the marking process for reference pictures in the decoder picture buffer (DPB) described in the background art. This improves the flexibility and coding efficiency of inter-picture prediction.

In an embodiment of the invention, the first and second lists of motion vector predictors comprise different numbers of motion vector predictors. Preferably, the second list comprises one fewer motion vector predictor than the first list. The motion vector predictors other than the temporal motion vector predictor may be the same or equivalent in both lists. This can increase coding efficiency, because the encoder has more candidates to choose from when selecting the best candidate from the list that includes the temporal motion vector predictor (i.e., the first list). The second list can provide better error resilience because temporal motion vector prediction is not used. In the encoded video bitstream 100, the first and second parameters indicating the selected motion vector predictor may use different bit representations, for example truncated unary representations with different maximum values in arithmetic coding binarization or in variable length codes.
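A truncated unary representation with a list-dependent maximum value can be sketched as follows. The function names and the parameter `c_max` (the largest index that can occur, i.e., list length minus one) are assumptions for illustration; the bits are shown as a string rather than being passed to an arithmetic coder.

```python
from typing import Tuple


def truncated_unary(index: int, c_max: int) -> str:
    """Binarize index as `index` ones followed by a zero; the zero is dropped at c_max."""
    if not 0 <= index <= c_max:
        raise ValueError("index out of range")
    bits = "1" * index
    if index < c_max:
        bits += "0"                 # terminating zero is only needed below the maximum
    return bits


def parse_truncated_unary(bits: str, c_max: int) -> Tuple[int, int]:
    """Return (index, number of bits consumed) for a truncated unary code word."""
    index = 0
    pos = 0
    while index < c_max and bits[pos] == "1":
        index += 1
        pos += 1
    if index < c_max:
        pos += 1                    # consume the terminating zero
    return index, pos


# First list with 3 candidates: c_max = 2. Second list with 2 candidates: c_max = 1.
code_first = truncated_unary(2, 2)
code_second = truncated_unary(1, 1)
```

With the shorter second list, the last index needs one bit instead of two, which is why the decoder must know the flag value before it parses the selection parameter in this variant.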

In another embodiment of the invention, the first and second lists comprise the same number of motion vector predictors. In place of the temporal motion vector predictor, the second list includes another unique predetermined motion vector predictor that is not present in the first list. This can increase coding efficiency, because the encoder has more candidates to choose from when selecting the best candidate from the list that includes the unique predetermined motion vector predictor (i.e., the second list). Because the maximum number of candidate motion vector predictors is the same for the first and second lists, the parsing process for the index parameter indicating the selected motion vector predictor is less complex. The unique motion vector predictor is derived without temporal dependency (i.e., without using motion vectors from any reference picture). By way of example only, the unique motion vector predictor may be a spatial motion vector predictor from a predetermined neighboring position. As another example, the unique motion vector predictor may be a zero motion vector predictor.
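The equal-length variant can be sketched as below. The function name `build_equal_size_lists` and the choice of a zero motion vector as the substitute predictor are assumptions for illustration; the text also permits a spatial predictor from a predetermined neighboring position.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # hypothetical (dx, dy) motion vector representation


def build_equal_size_lists(spatial: List[MV], temporal: MV,
                           substitute: MV = (0, 0)) -> Tuple[List[MV], List[MV]]:
    """Return (first_list, second_list) with the same number of candidates.

    The second list replaces the temporal predictor with a substitute that is
    derived without any temporal dependency.
    """
    first_list = list(spatial) + [temporal]
    second_list = list(spatial) + [substitute]
    return first_list, second_list


first_list, second_list = build_equal_size_lists([(4, 0), (3, 1)], (5, -1))
```

Because both lists have the same length, the index parameter has the same maximum value in either case, so it can be parsed without first consulting the flag.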

An exemplary apparatus 300 for encoding video according to an embodiment of the present invention will now be described.

FIG. 3 depicts a schematic block diagram of an exemplary apparatus 300 for encoding an input video/image bitstream 302 on a block-by-block basis to generate an encoded video bitstream 304. The apparatus 300 comprises: a transform unit 306 operable to transform input data into frequency coefficients; a quantization unit 308 operable to quantize input data; an inverse quantization unit 310 operable to inverse quantize input data; an inverse transform unit 312 operable to perform an inverse frequency transform on input data; a block memory 314 and a picture memory 316 operable to store data such as videos and images; an intra-picture prediction unit 318 operable to perform intra-picture prediction; an inter-picture prediction unit 320 operable to perform inter-picture prediction; an entropy encoding unit 322 operable to encode input data into the encoded video bitstream 304; a control unit 324 operable to decide whether temporal motion vector prediction is used for the inter-picture prediction of a target slice; a motion vector prediction unit 330; and a writing unit 328 operable to write data into the encoded video bitstream 304.

For clarity, an exemplary data flow through the apparatus 300 shown in FIG. 3 will now be described. The input video 302 is input to an adder, and the added value 305 is output to the transform unit 306. The transform unit 306 transforms the added value 305 into frequency coefficients and outputs the resulting frequency coefficients 307 to the quantization unit 308. The quantization unit 308 quantizes the input frequency coefficients 307 and outputs the resulting quantized values 309 to the inverse quantization unit 310 and the entropy encoding unit 322. The entropy encoding unit 322 encodes the quantized values 309 output from the quantization unit 308 and outputs the encoded video bitstream 304.

The inverse quantization unit 310 inverse quantizes the quantized values 309 output from the quantization unit 308 and outputs frequency coefficients 311 to the inverse transform unit 312. The inverse transform unit 312 performs an inverse frequency transform on the frequency coefficients 311 to transform them into sample values of the bitstream, and outputs the resulting sample values 313 to an adder. The adder adds the sample values 313 output from the inverse transform unit 312 to the predicted video/image values 319 output from the intra-picture prediction unit 318 or the inter-picture prediction unit 320, and outputs the resulting added values 315 to the block memory 314 or the picture memory 316 for further prediction. The intra-picture prediction unit 318 or the inter-picture prediction unit 320 searches within the reconstructed videos/images stored in the block memory 314 or the picture memory 316 and estimates, for example, the video/image area most similar to the input video/image for prediction.
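The encoder's reconstruction loop can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the first adder is taken to form the difference between the input and the prediction (as in a conventional hybrid encoder), the frequency transform is replaced by the identity, and a uniform scalar quantizer with a hypothetical step size stands in for the quantization unit. None of these details are specified in the text.

```python
from typing import List, Tuple


def quantize(coeffs: List[int], step: int) -> List[int]:
    """Uniform scalar quantization (assumed quantizer, not taken from the text)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels: List[int], step: int) -> List[int]:
    """Inverse of the uniform scalar quantizer above."""
    return [level * step for level in levels]


def encode_and_reconstruct(original: List[int], prediction: List[int],
                           step: int) -> Tuple[List[int], List[int]]:
    """Return (quantized levels, reconstructed samples) for one block.

    The transform and inverse transform are omitted (identity) for brevity.
    """
    residual = [o - p for o, p in zip(original, prediction)]   # cf. value 305
    levels = quantize(residual, step)                          # cf. values 309
    recon_residual = dequantize(levels, step)                  # cf. values 311 and 313
    reconstructed = [p + r for p, r in zip(prediction, recon_residual)]  # cf. values 315
    return levels, reconstructed


levels, reconstructed = encode_and_reconstruct([109, 93, 103, 100], [100] * 4, step=4)
```

The reconstructed samples, rather than the original input, are what the sketch would store for later prediction, which keeps the encoder's reference data identical to what a decoder can rebuild from the quantized levels.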

The control unit 324 decides whether temporal motion vector prediction is used for the inter-picture prediction of the target slice, and outputs a signal 325 indicating the decision to the motion vector prediction unit 330 and the writing unit 328. Various techniques for deciding or determining whether temporal motion vector prediction is used (i.e., determining the value of the flag 122) are described later in accordance with various embodiments of the invention. Based on this decision, the inter-picture prediction unit 320 performs inter-picture prediction with or without a temporal motion vector predictor. Specifically, the motion vector prediction unit 330 is configured to receive the flag 122. If the flag has a first value (e.g., "1"), the motion vector prediction unit 330 is operable to create a first list of motion vector predictors comprising a plurality of motion vector predictors, including at least one temporal motion vector predictor derived from at least one motion vector of a co-located reference picture, and to select a motion vector predictor from the first list for a prediction unit in the sub-picture unit. The writing unit 328 is further operable to write a first parameter into the encoded video bitstream to indicate the motion vector predictor 331 selected from the first list. On the other hand, if the flag 122 has a second value (e.g., "0"), the motion vector prediction unit 330 is operable to create a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictor, and to select a motion vector predictor from the second list for a prediction unit in the sub-picture unit. In this case, the writing unit 328 is further operable to write a second parameter into the encoded video bitstream 304 to indicate the motion vector predictor 331 selected from the second list. The writing unit 328 is also operable to write data 326 representing the flag 122, which has the first value or the second value (e.g., "0" or "1") indicating whether temporal motion vector prediction is used, into the encoded video bitstream 304 (e.g., into the header of the sub-picture unit or the header of the picture).

FIG. 4 depicts a flowchart illustrating a method 400 of decoding encoded video according to an embodiment of the present invention. Specifically, the method 400 is operable to decode an encoded video bitstream 100 that was encoded according to the above-described method of encoding video shown in FIG. 2. In step S402, one or more parameters (i.e., reference picture list ordering parameters) are parsed from the header 116 of a slice 114 to specify the order of the reference pictures in one or more reference picture lists used for inter-picture prediction of the slice 114. As mentioned above, a predetermined position (e.g., the first picture) in one of these reference picture lists (e.g., reference picture list 0) indicates the co-located reference picture. In step S404, a flag (i.e., the temporal motion vector prediction flag) 122 is parsed from the header 116; the flag 122 indicates whether temporal motion vector prediction is used for the inter-picture prediction of the slice 114. In step S406, the value of the flag 122 is analyzed or judged to determine whether the flag 122 indicates that temporal motion vector prediction is used or not used.

If the flag 122 indicates that temporal motion vector prediction is used, then in step S408 a list of motion vector predictors (a first list) is created. The first list comprises a plurality of motion vector predictors, including at least one temporal motion vector predictor derived from at least one motion vector of a co-located reference picture. By way of example only, the plurality of motion vector predictors may include at least one temporal motion vector predictor, one or more motion vectors derived from spatially neighboring prediction units/blocks (i.e., spatial motion vector predictors), and a zero motion vector. In step S410, a parameter (i.e., a motion vector predictor selection parameter, e.g., a first parameter) 126 is parsed from the encoded video bitstream 100 (i.e., from the prediction unit 120 of the slice 114); the parameter indicates the motion vector predictor selected from the list for a target block of samples (i.e., the prediction unit 120) in the slice 114.

On the other hand, if the flag 122 indicates that temporal motion vector prediction is not used, then in step S412 a list of motion vector predictors (e.g., a second list) is created that comprises a plurality of motion vector predictors without any temporal motion vector predictor. In step S414, a parameter (i.e., a motion vector predictor selection parameter, e.g., a second parameter) is parsed from the encoded video bitstream 100 (i.e., from the prediction unit 120 of the slice 114); the parameter indicates the motion vector predictor selected from the list for a target block of samples (i.e., the prediction unit 120) in the slice 114.

After step S410 or step S414, motion-compensated inter-picture prediction is performed in step S416 using the selected motion vector predictor to produce a prediction sample block. Subsequently, in step S418, a residual sample block is decoded from the encoded video bitstream 100. Thereafter, in step S420, the prediction sample block and the residual sample block are added together to produce a reconstructed sample block corresponding to the target block.
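The relationship between the encoder-side subtraction and the decoder-side addition in step S420 can be illustrated with a small Python sketch. The helper names and the nested-list block representation are assumptions, and the residual is assumed to survive coding without loss, which ignores quantization.

```python
from typing import List

Block = List[List[int]]  # hypothetical 2-D block of sample values


def subtract_blocks(original: Block, prediction: Block) -> Block:
    """Encoder side: residual = original - prediction, sample by sample."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]


def add_blocks(prediction: Block, residual: Block) -> Block:
    """Decoder side (cf. step S420): reconstruction = prediction + residual."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


original = [[52, 55], [61, 59]]
prediction = [[50, 54], [60, 60]]
residual = subtract_blocks(original, prediction)
reconstructed = add_blocks(prediction, residual)
```

The reconstruction matches the original only if the decoder forms the same prediction block as the encoder, which in turn requires both sides to select the same motion vector predictor from identically constructed lists.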

An exemplary apparatus 500 for decoding encoded video according to an embodiment of the present invention will now be described.

FIG. 5 depicts a schematic block diagram of an exemplary apparatus 500 for decoding an input encoded bitstream 502 on a block-by-block basis and outputting videos/images 504, for example to a display. The apparatus 500 comprises: an entropy decoding unit 506 operable to decode the input encoded bitstream 502; an inverse quantization unit 508 operable to inverse quantize input data; an inverse transform unit 510 operable to perform an inverse frequency transform on input data; a block memory 512 and a picture memory 514 operable to store data such as videos and images; an intra-picture prediction unit 516 for performing intra-picture prediction; an inter-picture prediction unit 518 for performing inter-picture prediction; a motion vector prediction unit 522; and a parsing unit 503 operable to parse the input encoded bitstream 502 and output various parameters 520, 521.

For clarity, an exemplary data flow through the apparatus 500 shown in FIG. 5 will now be described. The input encoded bitstream 502 is input to the entropy decoding unit 506, which decodes it and outputs the decoded values 507 to the inverse quantization unit 508. The inverse quantization unit 508 inverse quantizes the decoded values 507 and outputs frequency coefficients 509 to the inverse transform unit 510. The inverse transform unit 510 performs an inverse frequency transform on the frequency coefficients 509 to transform them into sample values 511, and outputs the resulting sample values 511 to an adder. The adder adds the sample values 511 to the predicted video/image values 519 output from the intra-picture prediction unit 516 or the inter-picture prediction unit 518, and outputs the resulting values 504 to, for example, a display, and to the block memory 512 or the picture memory 514 for further prediction. In addition, the intra-picture prediction unit 516 or the inter-picture prediction unit 518 searches within the videos/images stored in the block memory 512 or the picture memory 514 and estimates, for example, the video/image area most similar to the decoded video/image for prediction.

Furthermore, the parsing unit 503 parses, from the header of the slice or picture, the flag 122 indicating whether temporal motion vector prediction is used for the inter-picture prediction of the target slice, and outputs the parsed data 520 to the motion vector prediction unit 522. The inter-picture prediction unit 518 is operable to perform inter-picture prediction with or without a temporal motion vector predictor, based on the value of the flag 122 and the selected motion vector predictor from the motion vector prediction unit 522. Specifically, the motion vector prediction unit 522 is configured to receive the data 520 containing the flag 122. If the flag has a first value (e.g., "1"), the motion vector prediction unit 522 is operable to create a first list of motion vector predictors comprising a plurality of motion vector predictors, including at least one temporal motion vector predictor derived from at least one motion vector of a co-located reference picture. If the flag has a second value (e.g., "0"), the motion vector prediction unit 522 is operable to create a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictor. The parsing unit 503 is further operable to parse a first or second parameter from the encoded video bitstream 502, the first or second parameter indicating the motion vector predictor selected from the first or second list, respectively, for a prediction unit in the sub-picture unit, and to output the parsed data 521 to the motion vector prediction unit 522.

As mentioned above, various techniques for deciding or determining whether temporal motion vector prediction is used (i.e., determining the value of the flag 122) will now be described in accordance with various embodiments of the invention.

According to a first embodiment, the value of the flag 122 is determined based on the temporal layer of the current picture. FIG. 6 depicts a diagram showing the different temporal layers of a group of pictures when, for example, the group size/structure is configured as 4. In this example there are three temporal layers: temporal layer "0" 602, temporal layer "1" 604, and temporal layer "2" 606. Pictures with picture order count (POC) values of 0, 4, and 8 are in temporal layer "0" 602, pictures with POC values of 2 and 6 are in temporal layer "1" 604, and pictures with POC values of 1, 3, 5, and 7 are in temporal layer "2" 606. Temporal layers "0", "1", and "2" are associated with or represented by temporal IDs 0, 1, and 2, respectively. Accordingly, pictures in temporal layer "0" 602 have temporal ID 0, pictures in temporal layer "1" 604 have temporal ID 1, and pictures in temporal layer "2" 606 have temporal ID 2.
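The layer assignment of FIG. 6 can be reproduced with a short Python sketch. The function name `temporal_id_for_poc` and the modulo-based rule are assumptions that match this particular example with a group size of 4; other coding structures may assign temporal IDs differently.

```python
def temporal_id_for_poc(poc: int) -> int:
    """Map a POC value to a temporal ID for the group-of-4 structure of FIG. 6."""
    if poc % 4 == 0:
        return 0    # POC 0, 4, 8: temporal layer "0"
    if poc % 2 == 0:
        return 1    # POC 2, 6: temporal layer "1"
    return 2        # POC 1, 3, 5, 7: temporal layer "2"


layers = {poc: temporal_id_for_poc(poc) for poc in range(9)}
```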

FIG. 7 depicts a flowchart illustrating a method 700 of determining the value of the flag 122 according to the first embodiment. In step S702, the temporal layer of the current picture is determined based on the temporal ID associated with the current picture. Subsequently, in step S704, it is analyzed or judged whether the determined temporal layer is the lowest layer or base layer (i.e., whether temporal ID = 0). If the temporal layer is the lowest layer, then in step S706 the flag 122 is set to a value indicating that temporal motion vector prediction is not used (e.g., "0"). Otherwise, in step S708, the flag 122 is set to a value indicating that temporal motion vector prediction is used (e.g., "1"). The reason is that, in typical coding structures, pictures with higher temporal IDs usually reference pictures with temporal ID = 0. If a picture with temporal ID = 0 is lost or contains errors, the error propagates to any picture that references it. This error propagation may continue and affect the reconstruction of all subsequent pictures that depend on the picture with temporal ID = 0. This embodiment therefore improves error resilience by not using temporal motion vector prediction in pictures with temporal ID = 0.
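The decision of steps S704 to S708 reduces to a single comparison, sketched here in Python. The function name is hypothetical, and the mapping of "0" to "not used" and "1" to "used" follows the example values given in the text, which notes that the opposite convention is also possible.

```python
def tmvp_flag_from_temporal_id(temporal_id: int) -> int:
    """Return 0 (temporal motion vector prediction not used) for the base layer, else 1."""
    if temporal_id < 0:
        raise ValueError("temporal ID must be non-negative")
    return 0 if temporal_id == 0 else 1


flags = [tmvp_flag_from_temporal_id(tid) for tid in (0, 1, 2)]
```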

According to a second embodiment, the value of the flag 122 is determined based on the POC value of the current picture. FIG. 8 depicts a flowchart illustrating a method 800 of determining the value of the flag 122 according to the second embodiment. In step S802, the POC value of the current picture and the POC values of all reference pictures in the DPB are obtained or determined. In step S804, it is analyzed and judged whether the POC value of the current picture is greater than any of the POC values of the reference pictures in the DPB. If so, then in step S806 the flag 122 is set to a value indicating that temporal motion vector prediction is not used (e.g., "0"). Otherwise, in step S808, the flag 122 is set to a value indicating that temporal motion vector prediction is used (e.g., "1"). The reason is that higher-quality pictures (e.g., temporal layer 0 pictures) reference only pictures of the same or higher quality. In this embodiment, higher-quality pictures are identified by means of the POC values of the reference pictures contained in the decoded picture buffer, which stores a plurality of reference pictures. For reasons similar to those given for the first embodiment, subsequent pictures usually reference the higher-quality pictures. Therefore, to prevent or minimize error propagation and to improve error resilience, temporal motion vector prediction is disabled by the flag 122 for higher-quality pictures.
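The POC comparison of step S804 can be sketched as follows. This sketch assumes that "greater than any of the POC values" means greater than every reference picture POC in the DPB; that reading is an interpretation, and the function name and the handling of an empty DPB are likewise assumptions.

```python
from typing import Iterable


def tmvp_flag_from_poc(current_poc: int, dpb_reference_pocs: Iterable[int]) -> int:
    """Return 0 if the current POC exceeds every reference POC in the DPB, else 1.

    An empty DPB also yields 0, since there is no reference picture to take a
    temporal motion vector from (assumed behavior).
    """
    if all(current_poc > poc for poc in dpb_reference_pocs):
        return 0    # temporal motion vector prediction not used
    return 1        # temporal motion vector prediction used


flag_poc8 = tmvp_flag_from_poc(8, [0, 4])   # POC 8 follows all references in output order
flag_poc2 = tmvp_flag_from_poc(2, [0, 4])   # POC 2 lies between its references
```

In the group-of-4 structure described for the first embodiment, this rule would mark the POC 4 and POC 8 pictures as not using temporal motion vector prediction, while the in-between pictures would keep it enabled.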

According to a third embodiment, the value of the flag 122 is determined based on the slice type of an inter slice in the current picture. An inter slice is a slice that is encoded or decoded using inter-picture prediction. FIG. 9 depicts a flowchart illustrating a method 900 of determining the value of the flag 122 according to the third embodiment. In step S902, the slice type of the inter slice in the current picture is determined. Subsequently, it is analyzed and judged whether the slice type is a P slice (i.e., a predictive slice). If so, then in step S906 the flag 122 is set to a value indicating that temporal motion vector prediction is not used (e.g., "0"). On the other hand, if the determined slice type is not a P slice (e.g., it is a bi-predictive or B slice), then in step S908 the flag 122 is set to a value indicating that temporal motion vector prediction is used (e.g., "1"). The reason is that P slices use uni-directional forward prediction. Therefore, to prevent or minimize error propagation and to improve error resilience, temporal motion vector prediction is disabled by the flag 122 for P slices.
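The slice-type rule can be expressed as a minimal Python sketch. The string labels "P" and "B" and the function name are assumptions for illustration; intra slices are rejected because the flag only concerns inter slices.

```python
def tmvp_flag_from_slice_type(slice_type: str) -> int:
    """Return 0 for a P slice and 1 for a B slice (cf. steps S906 and S908)."""
    if slice_type not in ("P", "B"):
        raise ValueError("the flag is only determined for inter slices (P or B)")
    return 0 if slice_type == "P" else 1


flag_p = tmvp_flag_from_slice_type("P")
flag_b = tmvp_flag_from_slice_type("B")
```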

According to the fourth embodiment, the value of the flag 122 is determined based on whether the picture is a random access point (RAP) picture. A RAP picture is a picture for which the picture itself and all subsequent pictures in decoding order can be correctly decoded without performing the decoding process of any picture that precedes the RAP picture in decoding order. For example, the HEVC specification specifies a RAP picture as a coded picture for which each slice segment has a NAL unit type (i.e., nal_unit_type) in the range of 7 to 12, inclusive. FIG. 10 depicts a diagrammatic representation of a NAL unit stream, i.e., a series of NAL units 102 for a coded video bitstream. As known to those skilled in the art, the NAL (Network Abstraction Layer) formats the Video Coding Layer (VCL) representation of the coded video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media. Each NAL unit 102 includes a header 104 followed by a data section 106. The header 104 includes a parameter indicating the type of data in the NAL unit 102, and the data section 106 contains the data indicated by the header 104. For example, FIG. 10 shows three NAL units: a first NAL unit containing a parameter set (as indicated by the NAL unit type 108), a second NAL unit containing a base view/layer (as indicated by the NAL unit type 110), and a third NAL unit containing a non-base view/layer (as indicated by the NAL unit type 112). The header 104 of each NAL unit further includes the temporal ID described in the first embodiment, as shown in FIG. 7.
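The header-plus-data structure of a NAL unit described above can be modelled with a small data structure. This is only a sketch: the class and field names are hypothetical, the numeric type codes in the example stream are placeholders rather than values from any HEVC draft, and no bit-level header layout is implied.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NalHeader:
    """Header of a NAL unit: the type of the carried data plus the temporal ID."""
    nal_unit_type: int
    temporal_id: int


@dataclass(frozen=True)
class NalUnit:
    """A NAL unit: a header followed by the data section the header describes."""
    header: NalHeader
    payload: bytes


def summarize(stream: List[NalUnit]) -> List[Tuple[int, int, int]]:
    """Return (nal_unit_type, temporal_id, payload length) for each NAL unit."""
    return [(u.header.nal_unit_type, u.header.temporal_id, len(u.payload))
            for u in stream]


# Placeholder stream mirroring FIG. 10: parameter set, base layer, non-base layer.
example_stream = [
    NalUnit(NalHeader(nal_unit_type=100, temporal_id=0), b"\x01\x02"),
    NalUnit(NalHeader(nal_unit_type=101, temporal_id=0), b"\x03\x04\x05"),
    NalUnit(NalHeader(nal_unit_type=102, temporal_id=0), b"\x06"),
]
summary = summarize(example_stream)
```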

FIG. 11 depicts a diagrammatic representation of an exemplary RAP picture 1100 containing multiple views/layers with multiple slices. As shown, the RAP picture 1100 includes multiple slices 1102 in a base layer (intra-picture view) 1104 and multiple slices 1106 in a non-base layer (inter-picture view) 1110.

FIG. 12 depicts a flowchart illustrating a method 1200 of determining the value of the flag 122 according to the fourth embodiment. In step S1202, the picture is analyzed to determine or obtain, for each slice of the picture, the parameter that specifies the NAL unit type of the slice. Subsequently, in step S1204, it is determined based on the obtained parameters whether the picture containing the current slice is a RAP picture and whether the current slice belongs to a non-base view/layer of the picture. Whether the picture is a RAP picture 1100 can be determined by analyzing the value of the NAL unit type 1008, 1010, 1012 in the header 1004 of each NAL unit or slice 1002 in the picture. As mentioned above, a RAP picture 1100 is a picture for which the picture itself and all subsequent pictures in decoding order can be correctly decoded without performing the decoding process of any picture that precedes the RAP picture 1100 in decoding order. For example, the HEVC specification specifies a RAP picture as a coded picture for which each slice segment has a NAL unit type in the range of 7 to 12, inclusive. Thus, in this example, if the NAL unit type 1008, 1010, 1012 of every NAL unit 1002 in the picture is in the range of 7 to 12, inclusive, the picture is determined to be a RAP picture 1100. Whether the current slice belongs to a non-base layer of the picture can be determined by examining the NAL unit type 1008, 1010, 1012 of the current slice. For example, the NAL unit type 1012 indicates that the associated slice 1006 belongs to a non-base layer, and the NAL unit type 1010 indicates that the associated slice 1006 belongs to the base layer. However, those skilled in the art will appreciate that a non-base layer can be identified based on other parameters, depending on the video coding scheme. For example, in the current multi-view HEVC working draft, whether the current slice belongs to a non-base layer of the picture is determined by the layer ID. If the picture is a RAP picture 1100 and the current slice belongs to a non-base layer of the picture, then in step S1206 the flag 122 is set to a value indicating that temporal motion vector prediction is not used (e.g., "0"). Otherwise, in step S1208, the flag 122 is set to a value indicating that temporal motion vector prediction is used (e.g., "1"). The reason is that the benefit of temporal motion vector prediction lies in improving motion vector prediction temporally, that is, in predicting from other pictures at different time instances. If the intra picture and the inter picture lie at the same time instance as the current picture, however, using temporal motion vector prediction brings no benefit. Therefore, to improve encoding/decoding efficiency, the flag 122 is disabled for the slices 1106 belonging to a non-base (or inter-picture view) layer of the RAP picture 1100.
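The two checks of step S1204 and the resulting flag value can be sketched as below. This is an illustration under stated assumptions: `SliceInfo`, `is_rap_picture`, and `tmvp_flag_for_slice` are hypothetical names, the RAP range 7 to 12 is the example range quoted in the text, and the non-base layer is identified by a layer ID with 0 assumed to denote the base layer (one of the alternatives the text mentions).

```python
from dataclasses import dataclass
from typing import Sequence

RAP_NAL_TYPE_MIN = 7   # example range for RAP pictures quoted in the text
RAP_NAL_TYPE_MAX = 12


@dataclass(frozen=True)
class SliceInfo:
    nal_unit_type: int
    layer_id: int  # assumption: 0 denotes the base layer


def is_rap_picture(slices: Sequence[SliceInfo]) -> bool:
    """A picture is a RAP picture if every slice has a RAP NAL unit type."""
    return bool(slices) and all(
        RAP_NAL_TYPE_MIN <= s.nal_unit_type <= RAP_NAL_TYPE_MAX for s in slices
    )


def tmvp_flag_for_slice(picture: Sequence[SliceInfo], current: SliceInfo) -> int:
    """Return the example value of flag 122 for the current slice.

    0 (step S1206) if the picture is a RAP picture and the current slice is in
    a non-base layer; 1 (step S1208) otherwise.
    """
    if is_rap_picture(picture) and current.layer_id != 0:
        return 0
    return 1


base = SliceInfo(nal_unit_type=8, layer_id=0)
non_base = SliceInfo(nal_unit_type=8, layer_id=1)
rap_picture = [base, non_base]
non_rap_picture = [SliceInfo(nal_unit_type=1, layer_id=0),
                   SliceInfo(nal_unit_type=1, layer_id=1)]
```

With these inputs, only the non-base slice of the RAP picture has the flag cleared; the base-layer slice and every slice of the non-RAP picture keep temporal motion vector prediction enabled.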

(Embodiment A)

The processing described in each of the embodiments can be simply implemented in an independent computer system by recording, on a recording medium, a program for implementing the configurations of the moving picture encoding method (image encoding method) and the moving picture decoding method (image decoding method) described in each of the embodiments. The recording medium may be any recording medium on which the program can be recorded, such as a magnetic disk, an optical disk, a magneto-optical disk, an IC card, or a semiconductor memory.

Hereinafter, applications of the moving picture encoding method (image encoding method) and the moving picture decoding method (image decoding method) described in each of the embodiments, and systems using them, will be described. The system is characterized by having an image encoding and decoding apparatus that includes an image encoding apparatus using the image encoding method and an image decoding apparatus using the image decoding method. Other configurations in the system can be changed as appropriate depending on the case.

FIG. 13 shows an overall configuration of a content providing system ex100 for implementing content distribution services. The area for providing communication services is divided into cells of a desired size, and base stations ex106, ex107, ex108, ex109, and ex110, which are fixed wireless stations, are placed in the respective cells.

The content providing system ex100 is connected to devices such as a computer ex111, a personal digital assistant (PDA) ex112, a video camera ex113, a cellular phone ex114, and a game machine ex115, via the Internet ex101, an Internet service provider ex102, a telephone network ex104, and the base stations ex106 to ex110, respectively.

However, the configuration of the content providing system ex100 is not limited to the configuration shown in FIG. 13, and a combination in which any of these elements are connected is acceptable. In addition, each device may be connected directly to the telephone network ex104 rather than via the base stations ex106 to ex110, which are fixed wireless stations. Furthermore, the devices may be interconnected with each other via short-range wireless communication or the like.

The video camera ex113, such as a digital video camera, is capable of capturing video. A camera ex116, such as a digital camera, is capable of capturing both still images and video. The cellular phone ex114 may be a cellular phone that complies with any of the standards such as Global System for Mobile Communications (GSM) (registered trademark), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (W-CDMA), Long Term Evolution (LTE), and High Speed Packet Access (HSPA). Alternatively, the cellular phone ex114 may be a Personal Handyphone System (PHS) phone.

In the content providing system ex100, a streaming server ex103 is connected to the video camera ex113 and other devices via the telephone network ex104 and the base station ex109, which enables distribution of images of a live show and the like. In such a distribution, content captured by the user using the video camera ex113 (for example, video of a live music show) is encoded as described above in each of the embodiments (i.e., the camera functions as the image encoding apparatus according to an aspect of the present invention), and the encoded content is transmitted to the streaming server ex103. The streaming server ex103, in turn, performs stream distribution of the transmitted content data to clients upon their request. The clients include the computer ex111, the PDA ex112, the video camera ex113, the cellular phone ex114, and the game machine ex115, which are capable of decoding the above-mentioned encoded data. Each device that has received the distributed data decodes and reproduces the encoded data (i.e., functions as the image decoding apparatus according to an aspect of the present invention).

The captured data may be encoded by the video camera ex113 or by the streaming server ex103 that transmits the data, or the encoding process may be shared between the video camera ex113 and the streaming server ex103. Similarly, the distributed data may be decoded by the clients or by the streaming server ex103, or the decoding process may be shared between the clients and the streaming server ex103. Furthermore, data of still images and video captured not only by the video camera ex113 but also by the camera ex116 may be transmitted to the streaming server ex103 through the computer ex111. The encoding process may be performed by the camera ex116, the computer ex111, or the streaming server ex103, or shared among them.

Furthermore, the encoding and decoding processes may be performed by an LSI ex500 generally included in each of the computer ex111 and the devices. The LSI ex500 may be configured as a single chip or a plurality of chips. Software for encoding and decoding video may be stored on some type of recording medium (such as a CD-ROM, a flexible disk, or a hard disk) that is readable by the computer ex111 and others, and the encoding and decoding processes may be performed using the software. Furthermore, when the cellular phone ex114 is equipped with a camera, the video data obtained by the camera may be transmitted. The video data is data encoded by the LSI ex500 included in the cellular phone ex114.

Furthermore, the streaming server ex103 may be composed of multiple servers and computers, and may decentralize data and process, record, or distribute the decentralized data.

As described above, the clients can receive and reproduce the encoded data in the content providing system ex100. In other words, the clients can receive and decode information transmitted by the user and reproduce the decoded data in real time in the content providing system ex100, so that a user who has no particular rights or equipment can implement personal broadcasting.

Aside from the example of the content providing system ex100, at least one of the moving picture encoding apparatus (image encoding apparatus) and the moving picture decoding apparatus (image decoding apparatus) described in each of the embodiments may be implemented in a digital broadcasting system ex200 shown in FIG. 14. More specifically, a broadcast station ex201 communicates or transmits, via radio waves to a broadcast satellite ex202, multiplexed data obtained by multiplexing audio data and the like onto video data. The video data is data encoded by the moving picture encoding method described in each of the embodiments (i.e., data encoded by the image encoding apparatus according to an aspect of the present invention). Upon receipt of the multiplexed data, the broadcast satellite ex202 transmits radio waves for broadcasting. A home-use antenna ex204 with a satellite broadcast reception function then receives the radio waves. Next, a device such as a television (receiver) ex300 or a set-top box (STB) ex217 decodes the received multiplexed data and reproduces the decoded data (i.e., functions as the image decoding apparatus according to an aspect of the present invention).

Furthermore, a reader/recorder ex218 (i) reads and decodes the multiplexed data recorded on a recording medium ex215, such as a DVD or a BD, or (ii) encodes video signals onto the recording medium ex215 and, in some cases, writes data obtained by multiplexing an audio signal onto the encoded data. The reader/recorder ex218 can include the moving picture decoding apparatus or the moving picture encoding apparatus shown in each of the embodiments. In this case, the reproduced video signals are displayed on a monitor ex219, and can be reproduced by another device or system using the recording medium ex215 on which the multiplexed data is recorded. It is also possible to implement the moving picture decoding apparatus in the set-top box ex217 connected to the cable ex203 for cable television or to the antenna ex204 for satellite and/or terrestrial broadcasting, so as to display the video signals on the monitor ex219 of the television ex300. The moving picture decoding apparatus may be implemented in the television ex300 rather than in the set-top box.

FIG. 15 shows the television (receiver) ex300 that uses the moving picture encoding method and the moving picture decoding method described in each of the embodiments. The television ex300 includes: a tuner ex301 that obtains or provides multiplexed data, obtained by multiplexing audio data onto video data, through the antenna ex204 or the cable ex203 or the like that receives a broadcast; a modulation/demodulation unit ex302 that demodulates the received multiplexed data or modulates data into multiplexed data to be supplied outside; and a multiplexing/demultiplexing unit ex303 that demultiplexes the demodulated multiplexed data into video data and audio data, or multiplexes video data and audio data encoded by a signal processing unit ex306 into data.

The television ex300 further includes: the signal processing unit ex306, which includes an audio signal processing unit ex304 and a video signal processing unit ex305 that respectively decode audio data and video data and encode audio data and video data (and which function as the image encoding apparatus and the image decoding apparatus according to aspects of the present invention); and an output unit ex309 including a speaker ex307 that provides the decoded audio signal and a display unit ex308, such as a display, that displays the decoded video signal. Furthermore, the television ex300 includes an interface unit ex317 including an operation input unit ex312 that receives an input of a user operation. Furthermore, the television ex300 includes a control unit ex310 that controls all the constituent elements of the television ex300, and a power supply circuit unit ex311 that supplies power to each of the elements. Other than the operation input unit ex312, the interface unit ex317 may include: a bridge ex313 that is connected to an external device such as the reader/recorder ex218; a slot unit ex314 for enabling attachment of a recording medium ex216 such as an SD card; a driver ex315 to be connected to an external recording medium such as a hard disk; and a modem ex316 to be connected to a telephone network. Here, the recording medium ex216 can electrically record information using a non-volatile/volatile semiconductor memory element for storage. The constituent elements of the television ex300 are connected to each other through a synchronous bus.

First, the configuration in which the television ex300 decodes multiplexed data obtained from outside through the antenna ex204 and others and reproduces the decoded data will be described. In the television ex300, upon a user operation through a remote controller ex220 or the like, the multiplexing/demultiplexing unit ex303 demultiplexes the multiplexed data demodulated by the modulation/demodulation unit ex302, under control of the control unit ex310 including a CPU. Furthermore, in the television ex300, the audio signal processing unit ex304 decodes the demultiplexed audio data and the video signal processing unit ex305 decodes the demultiplexed video data, using the decoding method described in each of the embodiments. The output unit ex309 provides the decoded video signal and audio signal outside, respectively. When the output unit ex309 provides the video signal and the audio signal, the signals may be temporarily stored in buffers ex318 and ex319 and others so that the signals are reproduced in synchronization with each other. Furthermore, the television ex300 may read multiplexed data not from a broadcast or the like but from the recording media ex215 and ex216, such as a magnetic disk, an optical disk, and an SD card. Next, a configuration in which the television ex300 encodes an audio signal and a video signal, and transmits the data outside or writes the data on a recording medium, will be described. In the television ex300, upon a user operation through the remote controller ex220 or the like, the audio signal processing unit ex304 encodes an audio signal and the video signal processing unit ex305 encodes a video signal, under control of the control unit ex310, using the encoding method described in each of the embodiments. The multiplexing/demultiplexing unit ex303 multiplexes the encoded video signal and audio signal, and provides the resulting signal outside. When the multiplexing/demultiplexing unit ex303 multiplexes the video signal and the audio signal, the signals may be temporarily stored in buffers ex320 and ex321 and others so that the signals are reproduced in synchronization with each other. Here, the buffers ex318, ex319, ex320, and ex321 may be plural as illustrated, or at least one buffer may be shared in the television ex300. Furthermore, data may be stored in a buffer so that system overflow and underflow may be avoided, for example, between the modulation/demodulation unit ex302 and the multiplexing/demultiplexing unit ex303.

Furthermore, the television ex300 may include a configuration for receiving AV input from a microphone or a camera, in addition to the configuration for obtaining audio and video data from a broadcast or a recording medium, and may encode the obtained data. Although the television ex300 is described here as being able to encode, multiplex, and provide data outside, it may be capable only of receiving, decoding, and providing data outside, without encoding, multiplexing, and providing data outside.

Furthermore, when the reader/recorder ex218 reads multiplexed data from or writes multiplexed data on a recording medium, either the television ex300 or the reader/recorder ex218 may decode or encode the multiplexed data, or the television ex300 and the reader/recorder ex218 may share the decoding or encoding.

As an example, FIG. 16 shows a configuration of an information reproducing/recording unit ex400 when data is read from or written on an optical disk. The information reproducing/recording unit ex400 includes constituent elements ex401, ex402, ex403, ex404, ex405, ex406, and ex407, described below. The optical head ex401 irradiates a laser spot on a recording surface of the recording medium ex215, which is an optical disk, to write information, and detects reflected light from the recording surface of the recording medium ex215 to read the information. The modulation recording unit ex402 electrically drives a semiconductor laser included in the optical head ex401, and modulates the laser light according to the data to be recorded. The reproduction demodulating unit ex403 amplifies a reproduction signal obtained by electrically detecting the reflected light from the recording surface using a photodetector included in the optical head ex401, and demodulates the reproduction signal by separating a signal component recorded on the recording medium ex215 to reproduce the necessary information. The buffer ex404 temporarily holds the information to be recorded on the recording medium ex215 and the information reproduced from the recording medium ex215. The disk motor ex405 rotates the recording medium ex215. The servo control unit ex406 moves the optical head ex401 to a predetermined information track while controlling the rotational drive of the disk motor ex405 so as to follow the laser spot. The system control unit ex407 controls the information reproducing/recording unit ex400 as a whole. The reading and writing processes can be implemented by the system control unit ex407, which uses various information stored in the buffer ex404 and generates and adds new information as necessary, and by the modulation recording unit ex402, the reproduction demodulating unit ex403, and the servo control unit ex406, which record and reproduce information through the optical head ex401 while operating in a coordinated manner. The system control unit ex407 includes, for example, a microprocessor, and executes processing by causing a computer to execute a program for reading and writing.

Although the optical head ex401 is described here as irradiating a laser spot, it may perform high-density recording using near-field light.

FIG. 17 shows the recording medium ex215, which is an optical disk. On the recording surface of the recording medium ex215, guide grooves are spirally formed, and an information track ex230 records, in advance, address information indicating an absolute position on the disk according to changes in the shape of the guide grooves. The address information includes information for determining the positions of recording blocks ex231, which are the units for recording data. An apparatus that records and reproduces data can determine the positions of the recording blocks by reproducing the information track ex230 and reading the address information. Furthermore, the recording medium ex215 includes a data recording area ex233, an inner circumference area ex232, and an outer circumference area ex234. The data recording area ex233 is an area used for recording user data. The inner circumference area ex232 and the outer circumference area ex234, which lie inside and outside the data recording area ex233, respectively, are used for specific purposes other than recording user data. The information reproducing/recording unit ex400 reads and writes encoded audio data, encoded video data, or multiplexed data obtained by multiplexing the encoded audio and video data, from and on the data recording area ex233 of the recording medium ex215.

Although an optical disk having a single layer, such as a DVD or a BD, is described here as an example, the optical disk is not limited to this, and may be an optical disk having a multilayer structure and capable of being recorded on a part other than the surface. Furthermore, the optical disk may have a structure for multidimensional recording/reproduction, such as recording information using light of colors with different wavelengths in the same portion of the optical disk, or recording information in different layers from various angles.

Furthermore, in the digital broadcasting system ex200, a car ex210 having an antenna ex205 can receive data from the satellite ex202 and others, and reproduce video on a display device such as a car navigation system ex211 installed in the car ex210. Here, the configuration of the car navigation system ex211 may be, for example, the configuration shown in FIG. 15 with a GPS receiving unit added. The same applies to the configurations of the computer ex111, the cellular phone ex114, and others.

FIG. 18A shows the cellular phone ex114 that uses the moving picture encoding method and the moving picture decoding method described in the embodiments. The cellular phone ex114 includes: an antenna ex350 for transmitting and receiving radio waves through the base station ex110; a camera unit ex365 capable of capturing moving and still images; and a display unit ex358, such as a liquid crystal display, for displaying data such as decoded video captured by the camera unit ex365 or received by the antenna ex350. The cellular phone ex114 further includes: a main body unit including an operation key unit ex366; an audio output unit ex357, such as a speaker, for outputting audio; an audio input unit ex356, such as a microphone, for inputting audio; a memory unit ex367 for storing captured video or still images, recorded audio, encoded or decoded data of received video, still pictures, e-mails, and others; and a slot unit ex364, which is an interface unit for a recording medium that stores data in the same manner as the memory unit ex367.

Next, an example of a configuration of the cellular phone ex114 will be described with reference to FIG. 18B. In the cellular phone ex114, a main control unit ex360, designed to control overall each unit of the main body including the display unit ex358 and the operation key unit ex366, is connected mutually, via a synchronous bus ex370, to a power supply circuit unit ex361, an operation input control unit ex362, a video signal processing unit ex355, a camera interface unit ex363, a liquid crystal display (LCD) control unit ex359, a modulation/demodulation unit ex352, a multiplexing/demultiplexing unit ex353, an audio signal processing unit ex354, the slot unit ex364, and the memory unit ex367.

When a call-end key or a power key is turned on by a user's operation, the power supply circuit unit ex361 supplies the respective units with power from a battery pack so as to activate the cellular phone ex114.

In the cellular phone ex114, the audio signal processing unit ex354 converts the audio signals collected by the audio input unit ex356 in voice conversation mode into digital audio signals, under control of the main control unit ex360 including a CPU, ROM, and RAM. Then, the modulation/demodulation unit ex352 performs spread spectrum processing on the digital audio signals, and the transmitting and receiving unit ex351 performs digital-to-analog conversion and frequency conversion on the data, so as to transmit the resulting data via the antenna ex350. Also, in the cellular phone ex114, the transmitting and receiving unit ex351 amplifies the data received by the antenna ex350 in voice conversation mode and performs frequency conversion and analog-to-digital conversion on the data. Then, the modulation/demodulation unit ex352 performs inverse spread spectrum processing on the data, and the audio signal processing unit ex354 converts it into analog audio signals, so as to output them via the audio output unit ex357.

Furthermore, when an e-mail is transmitted in data communication mode, the text data of the e-mail, entered by operating the operation key unit ex366 and the like of the main body, is sent to the main control unit ex360 via the operation input control unit ex362. The main control unit ex360 causes the modulation/demodulation unit ex352 to perform spread spectrum processing on the text data, and the transmitting and receiving unit ex351 performs digital-to-analog conversion and frequency conversion on the resulting data, so as to transmit the data to the base station ex110 via the antenna ex350. When an e-mail is received, processing that is approximately the inverse of the processing for transmitting an e-mail is performed on the received data, and the resulting data is provided to the display unit ex358.

When video, still images, or video and audio are transmitted in data communication mode, the video signal processing unit ex355 compresses and encodes the video signals supplied from the camera unit ex365 using the moving picture encoding method shown in each of the embodiments (that is, it functions as the image encoding apparatus according to an aspect of the present invention), and transmits the encoded video data to the multiplexing/demultiplexing unit ex353. Meanwhile, while the camera unit ex365 captures video, still images, and the like, the audio signal processing unit ex354 encodes the audio signals collected by the audio input unit ex356 and transmits the encoded audio data to the multiplexing/demultiplexing unit ex353.

The multiplexing/demultiplexing unit ex353 multiplexes the encoded video data supplied from the video signal processing unit ex355 and the encoded audio data supplied from the audio signal processing unit ex354, using a predetermined method. Then, the modulation/demodulation unit (modulation/demodulation circuit unit) ex352 performs spread spectrum processing on the multiplexed data, and the transmitting and receiving unit ex351 performs digital-to-analog conversion and frequency conversion on the data, so as to transmit the resulting data via the antenna ex350.

When data of a video file linked to a web page or the like is received in data communication mode, or when an e-mail with video and/or audio attached is received, in order to decode the multiplexed data received via the antenna ex350, the multiplexing/demultiplexing unit ex353 demultiplexes the multiplexed data into a video data bitstream and an audio data bitstream, and supplies the video signal processing unit ex355 with the encoded video data and the audio signal processing unit ex354 with the encoded audio data through the synchronous bus ex370. The video signal processing unit ex355 decodes the video signal using a moving picture decoding method corresponding to the moving picture encoding method shown in each of the embodiments (that is, it functions as the image decoding apparatus according to an aspect of the present invention), and then the display unit ex358 displays, for example, the video and still images included in the video file linked to the web page via the LCD control unit ex359. Furthermore, the audio signal processing unit ex354 decodes the audio signal, and the audio output unit ex357 provides the audio.

Furthermore, similarly to the television ex300, a terminal such as the cellular phone ex114 may have three types of implementation configuration: (i) a transmitting and receiving terminal including both an encoding apparatus and a decoding apparatus, (ii) a transmitting terminal including only an encoding apparatus, and (iii) a receiving terminal including only a decoding apparatus. Although in this description the digital broadcasting system ex200 receives and transmits multiplexed data obtained by multiplexing audio data onto video data, the multiplexed data may instead be obtained by multiplexing character data related to the video, rather than audio data, onto the video data, and the data may be the video data itself rather than multiplexed data.

As such, the moving picture encoding method and the moving picture decoding method in each of the embodiments can be used in any of the devices and systems described. Thus, the advantages described in each of the embodiments can be obtained.

Furthermore, the present invention is not limited to these embodiments, and various modifications and revisions are possible without departing from the scope of the present invention.

(Embodiment B)

Video data can be generated by switching, as necessary, between (i) the moving picture encoding method or moving picture encoding apparatus shown in each of the embodiments and (ii) a moving picture encoding method or moving picture encoding apparatus conforming to a different standard, such as MPEG-2, MPEG-4 AVC, or VC-1.

Here, when a plurality of video data conforming to different standards is generated and then decoded, the decoding method needs to be selected to match each standard. However, since the standard to which each of the plurality of video data to be decoded conforms cannot be detected, there is a problem that an appropriate decoding method cannot be selected.

In order to solve this problem, multiplexed data obtained by multiplexing audio data and the like onto video data has a structure that includes identification information indicating the standard to which the video data conforms. The specific structure of the multiplexed data, including the video data generated by the moving picture encoding method or the moving picture encoding apparatus shown in each of the embodiments, is described below. The multiplexed data is a digital stream in the MPEG-2 Transport Stream format.

FIG. 19 shows the structure of the multiplexed data. As shown in FIG. 19, the multiplexed data can be obtained by multiplexing at least one of a video stream, an audio stream, a presentation graphics stream (PG), and an interactive graphics stream (IG). The video stream represents the primary video and secondary video of a movie, the audio stream represents a primary audio part and a secondary audio part to be mixed with the primary audio part, and the presentation graphics stream represents the subtitles of the movie. Here, the primary video is the normal video to be displayed on a screen, and the secondary video is video to be displayed in a smaller window within the primary video. Furthermore, the interactive graphics stream represents an interactive screen generated by arranging GUI components on a screen. The video stream is encoded by the moving picture encoding method or the moving picture encoding apparatus shown in each of the embodiments, or by a moving picture encoding method or moving picture encoding apparatus conforming to a conventional standard such as MPEG-2, MPEG-4 AVC, or VC-1. The audio stream is encoded according to a standard such as Dolby AC-3, Dolby Digital Plus, MLP, DTS, DTS-HD, or linear PCM.

Each stream included in the multiplexed data is identified by a PID. For example, 0x1011 is allocated to the video stream used for the video of a movie, 0x1100 to 0x111F are allocated to the audio streams, 0x1200 to 0x121F are allocated to the presentation graphics streams, 0x1400 to 0x141F are allocated to the interactive graphics streams, 0x1B00 to 0x1B1F are allocated to the video streams used for the secondary video of the movie, and 0x1A00 to 0x1A1F are allocated to the audio streams used for the secondary audio to be mixed with the primary audio.
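The PID allocation above can be sketched as a simple range lookup. This is only an illustration: the function name `classify_pid`, the table name `PID_RANGES`, and the stream-kind labels are assumptions introduced here, while the numeric ranges are those listed in the preceding paragraph.

```python
# PID ranges copied from the example allocation in the text.
# The labels and names below are illustrative, not part of the specification.
PID_RANGES = [
    (0x1011, 0x1011, "primary video"),
    (0x1100, 0x111F, "audio"),
    (0x1200, 0x121F, "presentation graphics"),
    (0x1400, 0x141F, "interactive graphics"),
    (0x1B00, 0x1B1F, "secondary video"),
    (0x1A00, 0x1A1F, "secondary audio"),
]


def classify_pid(pid: int) -> str:
    """Return the stream kind for a PID, or "unknown" if no range matches."""
    for low, high, kind in PID_RANGES:
        if low <= pid <= high:
            return kind
    return "unknown"
```

A demultiplexer could use such a lookup to route each TS packet to the matching decoder once the PID has been read from the TS header.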

FIG. 20 schematically shows how the data is multiplexed. First, a video stream ex235 composed of video frames and an audio stream ex238 composed of audio frames are transformed into a stream of PES packets ex236 and a stream of PES packets ex239, respectively, and further into TS packets ex237 and TS packets ex240. Similarly, the data of a presentation graphics stream ex241 and the data of an interactive graphics stream ex244 are transformed into a stream of PES packets ex242 and a stream of PES packets ex245, respectively, and further into TS packets ex243 and TS packets ex246. These TS packets are multiplexed into one stream to obtain multiplexed data ex247.

FIG. 21 shows in more detail how a video stream is stored in a stream of PES packets. The first column in FIG. 21 shows the stream of video frames in the video stream. The second column shows the stream of PES packets. As indicated by the arrows labeled yy1, yy2, yy3, and yy4 in FIG. 21, the video stream is divided into pictures, namely I pictures, B pictures, and P pictures, each of which is a video presentation unit, and the pictures are stored in the payloads of the respective PES packets. Each PES packet has a PES header, and the PES header stores a Presentation Time Stamp (PTS) indicating the display time of the picture and a Decoding Time Stamp (DTS) indicating the decoding time of the picture.

FIG. 22 shows the format of the TS packets that are finally written into the multiplexed data. Each TS packet is a 188-byte fixed-length packet consisting of a 4-byte TS header, which carries information such as the PID for identifying a stream, and a 184-byte TS payload for storing data. The PES packets are divided and stored in the TS payloads. When a BD-ROM is used, each TS packet is given a 4-byte TP_Extra_Header, resulting in a 192-byte source packet. The source packets are written into the multiplexed data. The TP_Extra_Header stores information such as an Arrival_Time_Stamp (ATS). The ATS shows the transfer start time at which each TS packet is to be transferred to a PID filter. The source packets are arranged in the multiplexed data as shown at the bottom of FIG. 22. The numbers that increment from the head of the multiplexed data are called source packet numbers (SPNs).
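The byte layout described above can be illustrated with a minimal sketch that splits one 192-byte source packet into its TP_Extra_Header, TS header, and TS payload. The function names are hypothetical. The PID extraction follows the standard MPEG-2 transport stream header layout (13-bit PID in the low 5 bits of byte 1 and all of byte 2), and numbering SPNs from 0 is an assumption, since the text only says the numbers increment from the head of the multiplexed data.

```python
TS_PACKET_SIZE = 188       # fixed TS packet length, from the text
TS_HEADER_SIZE = 4         # TS header length, from the text
EXTRA_HEADER_SIZE = 4      # TP_Extra_Header length, from the text
SOURCE_PACKET_SIZE = EXTRA_HEADER_SIZE + TS_PACKET_SIZE  # 192 bytes


def split_source_packet(packet: bytes):
    """Split a 192-byte source packet into (extra_header, ts_header, payload)."""
    if len(packet) != SOURCE_PACKET_SIZE:
        raise ValueError("a source packet must be exactly 192 bytes")
    extra_header = packet[:EXTRA_HEADER_SIZE]
    ts_packet = packet[EXTRA_HEADER_SIZE:]
    return extra_header, ts_packet[:TS_HEADER_SIZE], ts_packet[TS_HEADER_SIZE:]


def pid_from_ts_header(ts_header: bytes) -> int:
    """Extract the 13-bit PID from a 4-byte MPEG-2 TS header."""
    return ((ts_header[1] & 0x1F) << 8) | ts_header[2]


def source_packet_offset(spn: int) -> int:
    """Byte offset of source packet `spn`, assuming SPNs start at 0."""
    return spn * SOURCE_PACKET_SIZE
```

Because every source packet has the same length, a source packet number maps directly to a byte offset, which is what makes random access by SPN straightforward.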

The TS packets included in the multiplexed data carry not only streams of audio, video, subtitles, and the like, but also a Program Association Table (PAT), a Program Map Table (PMT), and a Program Clock Reference (PCR). The PAT shows what a PID in a PMT used in the multiplexed data indicates, and the PID of the PAT itself is registered as zero. The PMT stores the PIDs of the streams of audio, video, subtitles, and the like included in the multiplexed data, together with the attribute information of the streams corresponding to those PIDs. The PMT also has various descriptors relating to the multiplexed data. The descriptors carry information such as copy control information showing whether copying of the multiplexed data is permitted. The PCR stores STC time information corresponding to the ATS that shows when the PCR packet is transferred to a decoder, in order to achieve synchronization between the Arrival Time Clock (ATC), which is the time axis of the ATSs, and the System Time Clock (STC), which is the time axis of the PTSs and DTSs.

FIG. 23 shows the data structure of the PMT in detail. A PMT header is placed at the top of the PMT. The PMT header describes, among other things, the length of the data included in the PMT. A plurality of descriptors relating to the multiplexed data is placed after the PMT header. Information such as the copy control information is described in the descriptors. After the descriptors, a plurality of pieces of stream information relating to the streams included in the multiplexed data is placed. Each piece of stream information includes stream descriptors, each describing information such as a stream type for identifying the compression codec of the stream, the stream PID, and stream attribute information (such as a frame rate or an aspect ratio). The stream descriptors are equal in number to the streams in the multiplexed data.
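As a rough illustration of the PMT layout just described (header, descriptors, then one piece of stream information per stream), the structure could be modeled as follows. All class and field names are assumptions chosen for readability; this is not the binary PMT syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StreamInfo:
    """One piece of stream information (names are illustrative)."""
    stream_type: int                 # identifies the compression codec
    pid: int                         # PID of the stream
    attributes: Dict[str, object] = field(default_factory=dict)  # e.g. frame rate


@dataclass
class Pmt:
    """In-memory model of the PMT layout described in the text."""
    data_length: int                                         # carried by the PMT header
    descriptors: List[bytes] = field(default_factory=list)   # e.g. copy control information
    streams: List[StreamInfo] = field(default_factory=list)  # one entry per stream

    def pids(self) -> List[int]:
        """Return the PIDs of all streams listed in this PMT."""
        return [stream.pid for stream in self.streams]
```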

When the multiplexed data is recorded on a recording medium or the like, it is recorded together with multiplexed data information files.

Each multiplexed data information file is management information for the multiplexed data, as shown in FIG. 24. The multiplexed data information files are in one-to-one correspondence with the multiplexed data, and each file includes multiplexed data information, stream attribute information, and an entry map.

As shown in FIG. 24, the multiplexed data information includes a system rate, a reproduction start time, and a reproduction end time. The system rate indicates the maximum transfer rate at which a system target decoder, to be described later, transfers the multiplexed data to a PID filter. The intervals of the ATSs included in the multiplexed data are set so as not to exceed the system rate. The reproduction start time indicates the PTS of the video frame at the head of the multiplexed data. An interval of one frame is added to the PTS of the video frame at the end of the multiplexed data, and the result is set as the reproduction end time.
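The reproduction end time rule above (last video frame PTS plus one frame interval) can be worked through numerically. The sketch below assumes PTS values are expressed in ticks of the 90 kHz clock used for PTS and DTS in MPEG-2 systems; the function name and the rounding to whole ticks are choices made here for illustration.

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90_000  # PTS/DTS tick rate in MPEG-2 systems


def reproduction_end_time(last_frame_pts: int, frame_rate: Fraction) -> int:
    """Return the PTS of the last video frame plus one frame interval, in ticks."""
    frame_interval = Fraction(PTS_CLOCK_HZ) / frame_rate
    return last_frame_pts + round(frame_interval)
```

For a 30 frames-per-second stream, one frame interval is 3000 ticks, so a final frame with PTS 900000 gives a reproduction end time of 903000.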

As shown in FIG. 25, a piece of attribute information is registered in the stream attribute information for each PID of each stream included in the multiplexed data. Each piece of attribute information holds different information depending on whether the corresponding stream is a video stream, an audio stream, a presentation graphics stream, or an interactive graphics stream. Each piece of video stream attribute information carries information including which compression codec is used to compress the video stream, and the resolution, aspect ratio, and frame rate of the pieces of picture data included in the video stream. Each piece of audio stream attribute information carries information including which compression codec is used to compress the audio stream, how many channels are included in the audio stream, which language the audio stream supports, and how high the sampling frequency is. The video stream attribute information and the audio stream attribute information are used to initialize a decoder before the player plays back the information.

In the present embodiment, the multiplexed data to be used is of a stream type included in the PMT. Furthermore, when the multiplexed data is recorded on a recording medium, the video stream attribute information included in the multiplexed data information is used. More specifically, the moving picture encoding method or the moving picture encoding apparatus described in each of the embodiments includes a step or a unit for allocating, to the stream type included in the PMT or to the video stream attribute information, unique information indicating video data generated by the moving picture encoding method or the moving picture encoding apparatus in each of the embodiments. With this configuration, the video data generated by the moving picture encoding method or the moving picture encoding apparatus described in each of the embodiments can be distinguished from video data that conforms to another standard.

Furthermore, FIG. 26 shows the steps of the moving picture decoding method according to the present embodiment. In step exS100, the stream type included in the PMT, or the video stream attribute information included in the multiplexed data information, is obtained from the multiplexed data. Next, in step exS101, it is determined whether the stream type or the video stream attribute information indicates that the multiplexed data was generated by the moving picture encoding method or the moving picture encoding apparatus in each of the embodiments. When it is determined that the stream type or the video stream attribute information indicates that the multiplexed data was generated by the moving picture encoding method or the moving picture encoding apparatus in each of the embodiments, decoding is performed in step exS102 by the moving picture decoding method in each of the embodiments. When the stream type or the video stream attribute information indicates conformance to a conventional standard such as MPEG-2, MPEG-4 AVC, or VC-1, decoding is performed in step exS103 by a moving picture decoding method conforming to that conventional standard.
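Steps exS100 to exS103 amount to a dispatch on the identification information. The following is a minimal sketch under stated assumptions: the identifier `EMBODIMENT_ID` is hypothetical (the text only says that a unique value is allocated), the identification information is modeled as a string, and the returned decoder names are placeholders.

```python
# Hypothetical unique value marking video data generated by the embodiments.
EMBODIMENT_ID = "embodiment"
# Conventional standards named in the text.
CONVENTIONAL_STANDARDS = {"MPEG-2", "MPEG-4 AVC", "VC-1"}


def select_decoder(identification: str) -> str:
    """Choose a decoding method from the identification information.

    `identification` stands for the stream type or video stream attribute
    information obtained in step exS100.
    """
    # Step exS101: was the data generated by the method of the embodiments?
    if identification == EMBODIMENT_ID:
        return "embodiment_decoder"      # step exS102
    if identification in CONVENTIONAL_STANDARDS:
        return "conventional_decoder"    # step exS103
    raise ValueError(f"unrecognized identification: {identification!r}")
```

Raising an error for an unrecognized value is a design choice of this sketch; the text does not describe that case.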

As such, allocating a new unique value to the stream type or the video stream attribute information makes it possible to determine whether the moving picture decoding method or the moving picture decoding apparatus described in each of the embodiments can perform the decoding. Even when multiplexed data conforming to a different standard is input, an appropriate decoding method or apparatus can be selected. Thus, the information can be decoded without error. Furthermore, the moving picture encoding method or apparatus, or the moving picture decoding method or apparatus, in the present embodiment can be used in the devices and systems described above.

(Embodiment C)

Each of the moving picture encoding method, the moving picture encoding apparatus, the moving picture decoding method, and the moving picture decoding apparatus in each of the embodiments is typically implemented in the form of an integrated circuit or a Large Scale Integration (LSI) circuit. As an example of the LSI, FIG. 27 shows the configuration of an LSI ex500 that is made into one chip. The LSI ex500 includes elements ex501, ex502, ex503, ex504, ex505, ex506, ex507, ex508, and ex509, to be described below, and the elements are connected to each other through a bus ex510. When the power supply circuit unit ex505 is turned on, it is activated by supplying each of the elements with power.

For example, when encoding is performed, the LSI ex500 receives an AV signal from a microphone ex117, a camera ex113, and the like through an AV IO ex509, under the control of a control unit ex501 that includes a CPU ex502, a memory controller ex503, a stream controller ex504, and a driving frequency control unit ex512. The received AV signal is temporarily stored in an external memory ex511, such as an SDRAM. Under the control of the control unit ex501, the stored data is divided into data portions according to the processing amount and the speed at which it is to be transmitted to a signal processing unit ex507. Then, the signal processing unit ex507 encodes an audio signal and/or a video signal. Here, the encoding of the video signal is the encoding described in each of the embodiments. Furthermore, the signal processing unit ex507 sometimes multiplexes the encoded audio data and the encoded video data, and a stream IO ex506 provides the multiplexed data to the outside. The provided multiplexed data is transmitted to the base station ex107 or written on the recording medium ex215. When data sets are multiplexed, the data should be temporarily stored in the buffer ex508 so that the data sets are synchronized with each other.

Although the memory ex511 is an element outside the LSI ex500, it may be included in the LSI ex500. The buffer ex508 is not limited to one buffer and may be composed of a plurality of buffers. Furthermore, the LSI ex500 may be made into one chip or a plurality of chips.

Furthermore, although the control unit ex501 includes the CPU ex502, the memory controller ex503, the stream controller ex504, and the driving frequency control unit ex512, the configuration of the control unit ex501 is not limited to this. For example, the signal processing unit ex507 may further include a CPU. Including another CPU in the signal processing unit ex507 can improve the processing speed. As another example, the CPU ex502 may serve as, or be a part of, the signal processing unit ex507, and may include, for example, an audio signal processing unit. In such a case, the control unit ex501 includes the signal processing unit ex507, or the CPU ex502 including a part of the signal processing unit ex507.

The name used here is LSI, but it may also be called IC, system LSI, super LSI, or ultra LSI, depending on the degree of integration.

Moreover, the means of achieving integration is not limited to the LSI, and a dedicated circuit or a general-purpose processor or the like can also achieve the integration. A Field Programmable Gate Array (FPGA), which can be programmed after the LSI is manufactured, or a reconfigurable processor, which allows reconfiguration of the connections or configuration of the LSI, can be used for the same purpose.

In the future, with advances in semiconductor technology, a brand-new technology may replace LSI. The functional blocks can be integrated using such a technology. Application of the present invention to biotechnology is one possibility.

(Embodiment D)

When video data generated by the moving picture encoding method or the moving picture encoding apparatus described in each of the embodiments is decoded, the processing amount is likely to increase compared with decoding video data that conforms to a conventional standard such as MPEG-2, MPEG-4 AVC, or VC-1. Thus, the LSI ex500 needs to be set to a driving frequency higher than that of the CPU ex502 used when video data conforming to the conventional standard is decoded. However, when the driving frequency is set higher, there is a problem that the power consumption increases.

In order to solve this problem, a moving picture decoding apparatus such as the television ex300 or the LSI ex500 is configured to determine the standard to which the video data conforms and to switch between driving frequencies according to the determined standard. FIG. 28 shows a configuration ex800 in the present embodiment. When the video data is generated by the moving picture encoding method or the moving picture encoding apparatus described in each of the embodiments, a driving frequency switching unit ex803 sets the driving frequency to a higher driving frequency. Then, the driving frequency switching unit ex803 instructs a decoding processing unit ex801, which executes the moving picture decoding method described in each of the embodiments, to decode the video data. When the video data conforms to a conventional standard, the driving frequency switching unit ex803 sets the driving frequency to a driving frequency lower than that used for video data generated by the moving picture encoding method or the moving picture encoding apparatus described in each of the embodiments. Then, the driving frequency switching unit ex803 instructs a decoding processing unit ex802, which conforms to the conventional standard, to decode the video data.

More specifically, the driving frequency switching unit ex803 includes the CPU ex502 and the driving frequency control unit ex512 in FIG. 27. Here, each of the decoding processing unit ex801, which executes the moving picture decoding method described in each of the embodiments, and the decoding processing unit ex802, which conforms to the conventional standard, corresponds to the signal processing unit ex507 in FIG. 27. The CPU ex502 determines the standard to which the video data conforms. Then, the driving frequency control unit ex512 determines the driving frequency based on a signal from the CPU ex502. Furthermore, the signal processing unit ex507 decodes the video data based on a signal from the CPU ex502. For example, the identification information described in Embodiment B is likely to be used for identifying the video data. The identification information is not limited to that described in Embodiment B and may be any information, as long as it indicates the standard to which the video data conforms. For example, when the standard to which the video data conforms can be determined based on an external signal that indicates whether the video data is used for a television, a disk, or the like, the determination may be made based on such an external signal. Furthermore, the CPU ex502 selects the driving frequency based on, for example, a lookup table in which the standards of the video data are associated with the driving frequencies, as shown in FIG. 30. The driving frequency can be selected by storing the lookup table in the buffer ex508 and in an internal memory of the LSI, and having the CPU ex502 refer to the lookup table.

FIG. 29 shows the steps for executing the method in the present embodiment. First, in step exS200, the signal processing unit ex507 obtains the identification information from the multiplexed data. Next, in step exS201, the CPU ex502 determines, based on the identification information, whether the video data was generated by the encoding method and the encoding apparatus described in each of the embodiments. When the video data was generated by the moving picture encoding method and the moving picture encoding apparatus described in each of the embodiments, in step exS202 the CPU ex502 transmits, to the driving frequency control unit ex512, a signal for setting the driving frequency to a higher driving frequency. Then, the driving frequency control unit ex512 sets the driving frequency to the higher driving frequency. On the other hand, when the identification information indicates that the video data conforms to a conventional standard such as MPEG-2, MPEG-4 AVC, or VC-1, in step exS203 the CPU ex502 transmits, to the driving frequency control unit ex512, a signal for setting the driving frequency to a lower driving frequency. Then, the driving frequency control unit ex512 sets the driving frequency to a driving frequency lower than that used when the video data is generated by the moving picture encoding method and the moving picture encoding apparatus described in each of the embodiments.
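The flow of steps exS200 to exS203 can be sketched as a table lookup keyed by the identified standard. The frequency values, the key strings, and the function name below are all hypothetical and chosen only for illustration; the text states only that data generated by the embodiments is assigned a higher driving frequency than data conforming to a conventional standard.

```python
# Hypothetical lookup table associating each standard with a driving
# frequency in MHz. The numbers are placeholders, not values from the text.
DRIVING_FREQUENCY_MHZ = {
    "embodiment": 500,   # data generated by the method of the embodiments
    "MPEG-2": 350,
    "MPEG-4 AVC": 350,
    "VC-1": 350,
}


def choose_driving_frequency(identification: str) -> int:
    """Return the driving frequency for the identified standard.

    Mirrors step exS201 (determine the standard) followed by exS202
    (higher frequency) or exS203 (lower frequency).
    """
    try:
        return DRIVING_FREQUENCY_MHZ[identification]
    except KeyError:
        raise ValueError(f"unrecognized identification: {identification!r}") from None
```

Keeping the mapping in a table rather than in branching logic allows the frequency assignment to be changed without touching the selection code.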

In addition, together with the switching of the driving frequency, the power saving effect can be enhanced by changing the voltage applied to the LSI ex500 or to a device including the LSI ex500. For example, when the driving frequency is set lower, the voltage applied to the LSI ex500 or to the device including the LSI ex500 is likely to be set lower than the voltage used when the driving frequency is set higher.

Furthermore, as a method for setting the driving frequency, the driving frequency may be set higher when the processing amount for decoding is larger, and set lower when the processing amount for decoding is smaller. The setting method is therefore not limited to those described above. For example, when the processing amount for decoding video data conforming to MPEG-4 AVC is larger than the processing amount for decoding video data generated by the moving picture encoding method and the moving picture encoding apparatus described in the respective embodiments, the driving frequency is likely to be set in the reverse order to the setting described above.
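The load-based alternative described above can be illustrated with a small sketch. The threshold, the units of the processing amount, and the two frequency values are assumptions introduced purely for illustration.

```python
def frequency_by_load(processing_amount: float,
                      threshold: float,
                      high_mhz: int = 500,
                      low_mhz: int = 350) -> int:
    """Return the higher frequency when the decoding load exceeds the threshold.

    All parameters are hypothetical; the embodiments only state that a larger
    processing amount leads to a higher driving frequency.
    """
    return high_mhz if processing_amount > threshold else low_mhz
```

Under this rule the standard itself does not fix the frequency: if an MPEG-4 AVC stream happens to need more processing than a stream of the embodiments, it receives the higher frequency, which is the reversed ordering mentioned in the text.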

Furthermore, the method for setting the driving frequency is not limited to a method for setting the driving frequency lower. For example, when the identification information indicates that the video data was generated by the moving picture encoding method and the moving picture encoding apparatus described in the respective embodiments, the voltage applied to the LSI ex500 or to a device including the LSI ex500 is likely to be set higher. When the identification information indicates that the video data conforms to a conventional standard such as MPEG-2, MPEG-4 AVC, or VC-1, the voltage applied to the LSI ex500 or to the device including the LSI ex500 is likely to be set lower. As another example, when the identification information indicates that the video data was generated by the moving picture encoding method and the moving picture encoding apparatus described in the respective embodiments, the driving of the CPU ex502 probably does not need to be suspended. When the identification information indicates that the video data conforms to a conventional standard such as MPEG-2, MPEG-4 AVC, or VC-1, the driving of the CPU ex502 is likely to be suspended at a given time because the CPU ex502 has extra processing capacity. Even when the identification information indicates that the video data was generated by the moving picture encoding method and the moving picture encoding apparatus described in the respective embodiments, the driving of the CPU ex502 is likely to be suspended at a given time if the CPU ex502 has extra processing capacity. In such a case, the suspension time is likely to be set shorter than in the case where the identification information indicates that the video data conforms to a conventional standard such as MPEG-2, MPEG-4 AVC, or VC-1.
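The voltage and CPU suspension choices described above could be summarized in a sketch like the following. The `PowerPolicy` type, the function name, and the suspension durations are hypothetical; the text only states which setting is higher, lower, longer, or shorter.

```python
from dataclasses import dataclass

# Placeholder durations; only their relative order is taken from the text.
SHORT_SUSPEND_MS = 5
LONG_SUSPEND_MS = 20


@dataclass(frozen=True)
class PowerPolicy:
    voltage_high: bool   # True: higher voltage applied to the LSI
    suspend_ms: int      # 0 means the CPU driving is not suspended


def power_policy(is_embodiment_stream: bool, cpu_has_spare_capacity: bool) -> PowerPolicy:
    """Map the identification result to a voltage and a CPU suspension time."""
    if is_embodiment_stream:
        # Suspend only when spare capacity exists, and for a shorter time.
        suspend = SHORT_SUSPEND_MS if cpu_has_spare_capacity else 0
        return PowerPolicy(voltage_high=True, suspend_ms=suspend)
    # Conventional standard: lower voltage, CPU assumed to have spare capacity.
    return PowerPolicy(voltage_high=False, suspend_ms=LONG_SUSPEND_MS)
```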

Accordingly, the power saving effect can be enhanced by switching between driving frequencies according to the standard to which the video data conforms. In addition, when the LSI ex500 or a device including the LSI ex500 is driven by a battery, the battery life can be extended thanks to the power saving effect.

(Embodiment E)

There are cases where a plurality of video data conforming to different standards is supplied to devices and systems such as televisions and cellular phones. To be able to decode a plurality of video data conforming to different standards, the signal processing unit ex507 of the LSI ex500 needs to conform to the different standards. However, providing a separate signal processing unit ex507 for each standard increases the circuit scale and the cost of the LSI ex500.

To solve this problem, a configuration is conceived in which the decoding processing unit for implementing the moving picture decoding method described in the respective embodiments is partly shared with a decoding processing unit conforming to a conventional standard such as MPEG-2, MPEG-4 AVC, or VC-1. ex900 in Fig. 31A shows an example of this configuration. For example, the moving picture decoding method described in the respective embodiments and the moving picture decoding method conforming to MPEG-4 AVC have some processing details in common, such as entropy coding, inverse quantization, deblocking filtering, and motion compensated prediction. The processing details to be shared are likely to use the decoding processing unit ex902 that conforms to MPEG-4 AVC. In contrast, a dedicated decoding processing unit ex901 is likely to be used for other processing unique to an aspect of the present invention. For example, since the aspect of the present invention is characterized in particular by inverse quantization, the dedicated decoding processing unit ex901 is used for inverse quantization. Otherwise, the decoding processing unit is likely to be shared for one of entropy decoding, deblocking filtering, and motion compensation, or for all of these processes. The decoding processing unit for implementing the moving picture decoding method described in the respective embodiments may be shared for the processing to be shared, and a dedicated decoding processing unit may be used for processing unique to MPEG-4 AVC.

In addition, ex1000 in FIG. 31B shows another example in which processing is partly shared. This example uses a configuration including a dedicated decoding processing unit ex1001 that supports processing unique to an aspect of the present invention, a dedicated decoding processing unit ex1002 that supports processing unique to another conventional standard, and a decoding processing unit ex1003 that supports processing shared between the moving picture decoding method according to the aspect of the present invention and the conventional moving picture decoding method. Here, the dedicated decoding processing units ex1001 and ex1002 are not necessarily specialized for the processing according to the aspect of the present invention and the processing of the conventional standard, respectively, and may be decoding processing units capable of implementing general processing. Furthermore, the configuration of the present embodiment can be implemented by the LSI ex500.
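The routing implied by the configuration ex1000 can be sketched as a simple dispatch. The stage names and the choice of which stages count as shared are assumptions taken from the examples given for Fig. 31A; only the unit labels ex1001, ex1002, and ex1003 come from the text.

```python
# Stages assumed to be common to both decoding methods (illustrative only).
SHARED_STAGES = {"entropy_decoding", "deblocking_filtering", "motion_compensation"}


def unit_for(stage: str, is_embodiment_stream: bool) -> str:
    """Return the label of the decoding processing unit that handles a stage."""
    if stage in SHARED_STAGES:
        return "ex1003"  # shared decoding processing unit
    # Stage-specific processing goes to the dedicated unit for that method.
    return "ex1001" if is_embodiment_stream else "ex1002"
```

Because the shared stages are implemented once, only the stage-specific logic is duplicated per standard, which is the source of the circuit scale reduction the embodiment describes.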

Accordingly, by sharing the decoding processing unit for the processing common to the moving picture decoding method according to the aspect of the present invention and the moving picture decoding method conforming to the conventional standard, it is possible to reduce the circuit scale of the LSI and to reduce its cost.

It will be apparent to those skilled in the art that various changes and/or modifications may be made to the invention shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. Accordingly, the embodiments herein are to be considered in all respects as illustrative and not restrictive.

Industrial Applicability

The present invention is applicable to an encoding apparatus that encodes audio, still images, and video, and to a decoding apparatus that decodes data encoded by the encoding apparatus. For example, the present invention is applicable to various audiovisual equipment such as audio equipment, cellular phones, digital video cameras, BD recorders, and digital televisions.

Claims

1. A method of encoding video into an encoded video bitstream using temporal motion vector prediction, the method comprising:

determining the value of a flag indicating whether temporal motion vector prediction is used or not used for inter-picture prediction of a sub-picture unit of a picture;

writing the flag having the value into a header of the sub-picture unit or a header of the picture;

wherein, if the flag indicates that temporal motion vector prediction is used, the method further comprises:

creating a first list of motion vector predictors comprising a plurality of motion vector predictors comprising at least one temporal motion vector predictor derived from at least one motion vector from a collocated reference picture;

selecting a motion vector predictor from the first list for a prediction unit in the sub-picture unit; and

writing a first parameter into the encoded video bitstream to indicate the motion vector predictor selected from the first list.

2. The method of claim 1, wherein if the flag indicates that temporal motion vector prediction is not used, the method further comprises:

creating a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictors;

selecting a motion vector predictor from the second list for a prediction unit in the sub-picture unit; and

writing a second parameter into the encoded video bitstream to indicate the motion vector predictor selected from the second list.

3. The method according to claim 1 or 2, wherein the value of the flag is determined based on the temporal layer of the picture.

4. The method of claim 3, wherein if it is determined that the temporal layer of the picture is the lowest layer or base layer, the value of the flag is set to indicate that temporal motion vector prediction is not used; otherwise, the value of the flag is set to indicate that temporal motion vector prediction is used.

5. The method of claim 1 or 2, wherein the value of the flag is determined based on a picture order count (POC) value of the picture.

6. The method of claim 5, wherein if it is determined that the POC value of the picture is greater than any POC value of a reference picture in a decoder picture buffer (DPB), the value of the flag is set to indicate that temporal motion vector prediction is not used; otherwise, the value of the flag is set to indicate that temporal motion vector prediction is used.

7. The method according to claim 1 or 2, wherein the value of the flag is determined based on a sub-picture unit type of an inter-picture sub-picture unit in the picture.

8. The method of claim 7, wherein if the sub-picture unit type is a predictive (P) type, the value of the flag is set to indicate that temporal motion vector prediction is not used; otherwise, the value of the flag is set to indicate that temporal motion vector prediction is used.

9. The method of claim 1 or 2, wherein the value of the flag is determined based on whether the picture containing the sub-picture unit is a Random Access Point (RAP) picture.

10. The method of claim 9, wherein if the picture is a RAP picture and the sub-picture unit belongs to a non-base layer of the picture, the value of the flag is set to indicate that temporal motion vector prediction is not used; otherwise, the value of the flag is set to indicate that temporal motion vector prediction is used.

11. The method according to any one of claims 1 to 10, wherein the flag is written in a header of the sub-picture unit.

12. The method according to any one of claims 1 to 11, wherein the method further comprises: writing one or more parameters into the header of the sub-picture unit to specify the order of the reference pictures in one or more reference picture lists used for inter-picture prediction.

13. The method according to any one of claims 1 to 12, wherein the method further comprises:

performing motion-compensated inter-picture prediction using the selected motion vector predictor to generate the prediction unit;

subtracting the prediction unit from the original sample block to produce a residual sample block; and

encoding the residual sample block corresponding to the prediction unit into the encoded video bitstream.

14. The method according to any one of claims 1 to 13, wherein the second list includes one fewer motion vector predictor than the first list, and, apart from the temporal motion vector predictor, the motion vector predictors of the first list and the second list are the same.

15. The method according to any one of claims 1 to 14, wherein the first parameter and the second parameter are represented in the encoded video bitstream using different predetermined bit representations.

16. The method according to any one of claims 1 to 13, wherein the first list and the second list comprise the same predetermined number of motion vector predictors, and the second list comprises a motion vector predictor that is not present in the first list and that is derived without using motion vectors from any reference picture.

17. The method according to any one of claims 1 to 16, wherein the flag is used to indicate whether temporal motion vector prediction is used or not used for the inter-picture prediction of a sub-picture unit, independently of other sub-picture units in the picture.

18. The method of any one of claims 1 to 17, wherein the sub-picture unit is a slice of a picture.

19. A method of decoding an encoded video bitstream using temporal motion vector prediction, the method comprising:

parsing a flag from a header of a sub-picture unit or a header of a picture of the encoded video;

determining whether the flag indicates that temporal motion vector prediction is used or not used; and

parsing a first parameter from the encoded video bitstream, the first parameter indicating a motion vector predictor selected from the first list for a prediction unit in the sub-picture unit.

20. The method of claim 19, wherein if the flag indicates that temporal motion vector prediction is not used, the method further comprises:

creating a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictors; and

parsing a second parameter from the encoded video bitstream, the second parameter indicating a motion vector predictor selected from the second list for a prediction unit in the sub-picture unit.

21. An apparatus for encoding video into an encoded video bitstream using temporal motion vector prediction, the apparatus comprising:

a control unit operable to: determine a value of a flag indicating whether temporal motion vector prediction is used or not used for inter-picture prediction of a sub-picture unit of a picture;

a writing unit operable to: write a flag having said value into a header of said sub-picture unit or a header of said picture;

a motion vector prediction unit; and

an inter-picture prediction unit configured to: perform inter-picture prediction based on a motion vector predictor selected from the motion vector prediction unit,

wherein the motion vector prediction unit is configured to receive the flag and, based on the flag being a first value, is operable to: create a first list of motion vector predictors comprising a plurality of motion vector predictors including at least one temporal motion vector predictor derived from at least one motion vector from a collocated reference picture; and select, for a prediction unit in the sub-picture unit, a motion vector predictor from the first list; and

the writing unit is further operable to: write a first parameter into the encoded video bitstream, the first parameter indicating the motion vector predictor selected from the first list.

22. The apparatus of claim 21, wherein, when the flag is a second value, the motion vector prediction unit is operable to: create a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictors; and select, for a prediction unit in the sub-picture unit, a motion vector predictor from the second list; and

the writing unit is further operable to: write a second parameter into the encoded video bitstream, the second parameter indicating the motion vector predictor selected from the second list.

23. An apparatus for decoding an encoded video bitstream using temporal motion vector prediction, the apparatus comprising:

a parsing unit operable to: parse a flag from a header of a sub-picture unit or a header of a picture of the encoded video; and determine whether the flag indicates that temporal motion vector prediction is used or not used;

a motion vector prediction unit; and

an inter-picture prediction unit configured to: perform inter-picture prediction based on a motion vector predictor selected from the motion vector prediction unit;

wherein the motion vector prediction unit is configured to receive the flag and, based on the flag being a first value, is operable to: create a first list of motion vector predictors comprising a plurality of motion vector predictors including at least one temporal motion vector predictor derived from at least one motion vector from a collocated reference picture; and

the parsing unit is further operable to: parse a first parameter from the encoded video bitstream, the first parameter indicating a motion vector predictor selected from the first list for a prediction unit in the sub-picture unit.

24. The apparatus of claim 23, wherein, when the flag is a second value, the motion vector prediction unit is operable to: create a second list of motion vector predictors comprising a plurality of motion vector predictors without any temporal motion vector predictors; and

the parsing unit is further operable to: parse a second parameter from the encoded video bitstream, the second parameter indicating a motion vector predictor selected from the second list for a prediction unit in the sub-picture unit.
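By way of illustration only, and forming no part of the claims, the flag-dependent list construction recited in claims 1, 2, 4, and 14 might be sketched as follows. The function names, the representation of a motion vector as an integer pair, and the selection criterion (smallest sum of absolute differences to the actual motion vector) are assumptions; the claims do not prescribe how a predictor is selected.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]


def flag_from_temporal_layer(temporal_layer: int) -> bool:
    """Claim 4 style rule: no temporal prediction in the lowest (base) layer."""
    return temporal_layer != 0


def build_mvp_list(spatial: List[MotionVector],
                   temporal: MotionVector,
                   use_tmvp: bool) -> List[MotionVector]:
    """Build the first list (with the temporal predictor) or the second list (without)."""
    candidates = list(spatial)
    if use_tmvp:
        candidates.append(temporal)  # derived from the collocated reference picture
    return candidates


def select_mvp_index(candidates: List[MotionVector], actual: MotionVector) -> int:
    """Pick the candidate closest to the actual motion vector (assumed criterion)."""
    def cost(mvp: MotionVector) -> int:
        return abs(actual[0] - mvp[0]) + abs(actual[1] - mvp[1])
    return min(range(len(candidates)), key=lambda i: cost(candidates[i]))
```

The returned index plays the role of the first or second parameter written into the bitstream; a decoder that rebuilds the same list from the parsed flag can recover the selected predictor from that index without referring to the collocated picture when the flag is off.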