
CN103379349B - A View Synthesis Predictive Coding Method, Decoding Method, Corresponding Device and Code Stream


Info

Publication number: CN103379349B (application CN201210125366.2A)
Authority: CN (China)
Prior art keywords: image, viewpoint, synthesis, precision, view
Legal status: Active (granted)
Other versions: CN103379349A (in Chinese)
Inventors: 赵寅 (Yin Zhao), 虞露 (Lu Yu)
Assignee: Zhejiang University (ZJU)
Application filed by Zhejiang University; priority to CN201210125366.2A; publication of application CN103379349A, followed by grant as CN103379349B.


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The present invention provides a view synthesis predictive coding method applied in the field of multimedia communication. In the process of coding an image I in a viewpoint V1 of a three-dimensional video sequence, a synthesis precision X is adopted, and a view synthesis image P of a region H1 in viewpoint V1 is synthesized from the reconstructed image and reconstructed depth of another already-coded viewpoint V2 in the three-dimensional video sequence; the view synthesis image P is used to produce the prediction pixels required by view synthesis prediction in the process of coding image I; the synthesis precision X is written into the code stream of the three-dimensional video sequence. The invention also discloses a view synthesis predictive decoding method, a view synthesis predictive coding device, a decoding device and a corresponding code stream. The invention can improve the coding efficiency of three-dimensional video sequences.

Description

A View Synthesis Predictive Coding Method, Decoding Method, Corresponding Device and Code Stream

Technical Field

The present invention relates to the field of multimedia communication, and in particular to a view synthesis predictive coding method, a decoding method, a corresponding device and a code stream.

Background Art

A three-dimensional video (3D video) sequence comprises multiple (usually two) image sequences (containing texture information) and corresponding depth sequences (containing depth information, i.e. for each pixel in the image, the distance between the camera and the corresponding object in three-dimensional space); this is commonly known as the MVD (multi-view video plus depth) format. A three-dimensional video sequence contains a number of access units; one access unit comprises the images of multiple (two or more) viewpoints at a given moment together with their corresponding depths. Coding a three-dimensional video sequence produces a three-dimensional video code stream (bitstream), which consists of bits. The coding method may be based on video coding standards such as MPEG-2, H.264/AVC, HEVC, AVS or VC-1. It should be noted that a video coding standard specifies the syntax of the code stream and the decoding method for code streams conforming to the standard; it does not specify the encoding method used to produce the code stream. However, the encoding method adopted must match the decoding method specified by the standard and produce code streams conforming to the standard, so that a decoder can decode them; otherwise the decoding process may fail. An image obtained by decoding is called a decoded image or reconstructed image, and a depth obtained by decoding is called a decoded depth or reconstructed depth; the (uncoded) image input to the encoder is called the original image. In other words, the encoder itself also performs the decoding process that generates the reconstructed image (or reconstructed depth) from the coded information, and this process is identical to the decoding process performed on the code stream in the decoder. The currently coded image refers to the original image of the frame containing the macroblock currently being coded; the currently decoded image refers to the reconstructed image of the frame containing the macroblock currently being decoded. The current coding viewpoint is the viewpoint containing the macroblock currently being coded, and the current decoding viewpoint is the viewpoint containing the macroblock currently being decoded. An already-coded viewpoint is a viewpoint other than the one containing the macroblock currently being coded, whose frame is coded before the frame at the same moment in the current coding viewpoint; an already-decoded viewpoint is a viewpoint other than the one containing the macroblock currently being decoded, whose frame is decoded before the frame at the same moment in the current decoding viewpoint.

Most current video coding standards, such as H.264/AVC, are based on a hybrid coding framework comprising predictive coding and transform coding. The images of a video sequence are coded frame by frame; one frame is divided into a number of macroblocks, and each macroblock may be further subdivided into blocks. The usual coding process can be summarized as follows: for each macroblock, a prediction method (corresponding to a prediction mode) is used to obtain a prediction image of the macroblock (composed of a set of prediction pixels). The prediction methods usually include intra prediction (the prediction image of a block is obtained from the reconstructed pixels surrounding the currently coded block) and inter prediction (the prediction image of a block is obtained from the reconstructed image of another already-coded frame); for multi-view video sequences they also include inter-view prediction (the prediction image of a block is obtained from the reconstructed image of another already-coded viewpoint); and for three-dimensional video sequences containing depth information they may further include view synthesis prediction (the prediction image of a block is obtained from a view synthesis image generated from the reconstructed image and reconstructed depth of another already-coded viewpoint). Then the prediction image is subtracted from the original image of the macroblock to obtain a residual; the residual is transformed to obtain transform coefficients; the transform coefficients are quantized; and the quantized transform coefficients, together with side information such as the prediction mode, are entropy coded to form the code stream. Owing to the quantization, there is a certain difference between the coded image (i.e. the reconstructed image) and the uncoded input image (i.e. the original image, or currently coded image); this difference is usually called distortion.
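The predict/residual/quantize/reconstruct loop described above can be sketched as follows. This is a minimal illustration only, not the patent's method: the transform step is omitted and all function names and values are hypothetical.

```python
# Minimal sketch of the hybrid coding loop: prediction, residual,
# uniform quantization, and reconstruction (transform step omitted).

def quantize(values, step):
    """Uniform scalar quantization; this is where distortion is introduced."""
    return [round(v / step) for v in values]

def dequantize(levels, step):
    return [l * step for l in levels]

def encode_block(original, prediction, step):
    """Subtract the prediction image from the original, quantize the residual."""
    residual = [o - p for o, p in zip(original, prediction)]
    return quantize(residual, step)

def reconstruct_block(levels, prediction, step):
    """Dequantize the residual and add it back to the prediction image."""
    residual = dequantize(levels, step)
    return [p + r for p, r in zip(prediction, residual)]

original   = [100, 102, 98, 97]
prediction = [101, 101, 99, 99]   # e.g. from inter or view synthesis prediction
levels = encode_block(original, prediction, step=2)
recon  = reconstruct_block(levels, prediction, step=2)
```

The reconstruction differs from the original by at most the quantization error, which is the distortion the passage refers to.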

The decoding process can be regarded as the inverse of the encoding process and usually comprises the following steps: for each macroblock, the prediction mode information and the quantized transform coefficients corresponding to the macroblock are parsed from the code stream; according to the prediction mode, the corresponding prediction method is used to obtain the prediction image of the macroblock; the quantized transform coefficients are dequantized and inverse transformed to produce a residual image; and the residual image is added to the prediction image to obtain the reconstructed image.

View synthesis prediction (VSP) is a predictive coding technique for three-dimensional video sequences. Based on the reconstructed image of a viewpoint V2 at a given moment and the corresponding reconstructed depth, view synthesis prediction projects the reconstructed image of V2 onto another viewpoint V1 using depth-image-based rendering (DIBR), thereby generating a projected image; one or more of hole filling, filtering, and resampling are then applied to produce a view synthesis image P in viewpoint V1. The view synthesis image P has a high similarity to the corresponding original image O of viewpoint V1 at the same moment, so it is used as a prediction image when coding the original image O in viewpoint V1. The currently coded macroblock (i.e. the macroblock undergoing coding) may select some pixels from the view synthesis image as its prediction image, followed by transform, quantization and entropy coding to obtain the code stream; correspondingly, the currently decoded macroblock (i.e. the macroblock undergoing decoding) obtains its reconstructed image by adding the prediction image to the corresponding residual information. Overall, view synthesis prediction is similar to conventional inter prediction; the main difference is that the prediction image used in view synthesis prediction is a view synthesis image generated from the reconstructed image and reconstructed depth of an already-coded (or already-decoded) viewpoint different from the current coding (or decoding) viewpoint, whereas the prediction image used in inter prediction is a reconstructed image of the current coding (or decoding) viewpoint at another moment.

A key point in view synthesis prediction is to produce a high-quality view synthesis image. The closer the view synthesis image is to the original image (i.e. the higher their similarity), the higher the prediction efficiency and the smaller the corresponding residual. The projection in view synthesis may use integer-pixel precision, i.e. the pixels of viewpoint V2 are projected onto the pixel grid of the projected image in viewpoint V1, in which case the projected image has the same resolution as the image of viewpoint V2. The usual method of projecting each pixel of viewpoint V2 onto the pixel grid of the projected image in viewpoint V1 is to round the projection point of that pixel in viewpoint V1 (obtained by depth-image-based rendering, and usually lying between two pixel positions in viewpoint V1) to the horizontally nearest pixel position, producing a projected pixel. When a pixel position on the grid in V1 receives only one projected pixel, that projected pixel is taken as the pixel at that position; when several projected pixels map to the same pixel position on the grid in V1, the projected pixel with the smallest depth (closest to the camera) is taken as the pixel at that position; and when no projected pixel maps to a pixel position on the grid in V1, that position is called a hole.
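The integer-precision forward projection with depth conflict resolution and holes can be sketched for a single image row as below. This is an illustrative assumption, not the patent's implementation: the horizontal disparity is taken as already computed from the depth and camera parameters, and all names are hypothetical.

```python
# Sketch of integer-precision forward warping for one row: round each
# projection point to the nearest grid position, let the pixel with the
# smallest depth (closest to the camera) win conflicts, leave holes elsewhere.

HOLE = None  # grid positions that receive no projected pixel

def project_row(pixels, depths, disparities, width):
    out_pix = [HOLE] * width
    out_depth = [float("inf")] * width
    for x, (p, z, d) in enumerate(zip(pixels, depths, disparities)):
        tx = int(round(x + d))            # round to horizontally nearest position
        if 0 <= tx < width and z < out_depth[tx]:
            out_pix[tx] = p               # smallest depth wins the conflict
            out_depth[tx] = z
    return out_pix

row = project_row(pixels=[10, 20, 30, 40],
                  depths=[5.0, 1.0, 5.0, 5.0],
                  disparities=[0.4, 1.0, 1.0, 0.0],
                  width=4)
```

Positions that receive no projected pixel remain `None` (holes) and would be filled by the subsequent hole-filling step.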

The projection in view synthesis may also use sub-pixel synthesis precision, denoted K/L pixel precision (K and L are positive integers; L is usually a multiple of 2 and K is usually 1, e.g. 1/2-pixel and 1/4-pixel synthesis precision). With K/L pixel precision, the pixels of viewpoint V2 are projected onto the pixel grid of the projected image in viewpoint V1, and the horizontal resolution of the projected image is L/K times the horizontal resolution of the image in viewpoint V2. Projecting a pixel of viewpoint V2 onto the pixel grid of the projected image in viewpoint V1 is usually done by one of the following two methods, or a combination of them: 1) upsample the image of viewpoint V2 (and its depth) horizontally by a factor of L/K (i.e. the horizontal resolution of the upsampled image is L/K times that of the original image), and then project the upsampled image to viewpoint V1, forming a projected image whose horizontal resolution is L/K times the horizontal resolution of the image in viewpoint V2; 2) set the horizontal resolution of the projected image in V1 to L/K times that of the image in V2, and multiply the projection position of each pixel of the V2 image computed at integer-pixel precision by L/K to obtain the projection position at K/L pixel precision (i.e. a point projected to coordinates (a, b) on the integer-precision projected image has projected coordinates (aL/K, b) at K/L pixel precision); then, as in integer-precision view synthesis, round the projection position to the nearest pixel grid position in the projected image to obtain the projected pixel, thereby producing a projected image whose horizontal resolution is L/K times the horizontal resolution of the image of viewpoint V2. The projected image then undergoes hole filling and similar processing to form the view synthesis image. During encoding or decoding, a macroblock or block using view synthesis prediction obtains some pixels from the view synthesis image (whose resolution may differ from that of the currently coded image) to produce the prediction image of that macroblock or block (for example, M×K pixels are extracted from an M×L region of the view synthesis image as the prediction image of the currently coded block); alternatively, when the resolution of the projected image differs from that of the currently coded image, the hole-filled projected image is upsampled or downsampled to obtain a view synthesis image with the same resolution as the currently coded image, and the currently coded (or decoded) block obtains its prediction pixels from this resampled view synthesis image (for example, a block of pixels is taken from the view synthesis image as the prediction image of the currently coded block).
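Method 2 above, mapping an integer-precision projection coordinate onto the finer K/L-precision grid, can be sketched in a few lines. The function name is a hypothetical stand-in for illustration.

```python
# Sketch of method 2: scale the integer-precision projection coordinate a by
# L/K (the projected image is L/K times wider) and round to the nearest
# position on the finer grid.

def subpel_grid_position(a, K, L):
    return int(round(a * L / K))

# with 1/4-pixel precision (K=1, L=4), a point at coordinate 2.3 on the
# integer-precision grid lands on column round(2.3 * 4) = 9 of the finer grid
pos = subpel_grid_position(2.3, K=1, L=4)
```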

View synthesis images generated with different synthesis precisions usually differ, and their prediction efficiency when used for prediction also differs. In general, sub-pixel synthesis precision yields a higher-quality view synthesis image than integer-pixel synthesis precision, but at a higher computational complexity. To improve compression performance, the encoder can adaptively select, among several view synthesis images generated with different synthesis precisions, the one with the highest similarity to the currently coded image (i.e. the original image) as the view synthesis image used to produce the prediction pixels in view synthesis prediction, and write the corresponding synthesis precision information into the code stream, so as to inform the decoder to generate, with that synthesis precision, a view synthesis image matching the encoder's.
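The adaptive selection can be sketched as a small search loop. This assumes a `synthesize` callable standing in for the DIBR-based synthesis and uses mean absolute difference as the similarity score; both are illustrative choices, not the patent's mandated ones.

```python
# Sketch of adaptive precision selection: synthesize the region at each
# candidate precision, score against the original, keep the closest match.

def mad(a, b):
    """Mean absolute difference; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def choose_precision(original, synthesize, precisions):
    """Return the precision whose synthesized image best matches `original`."""
    return min(precisions, key=lambda p: mad(synthesize(p), original))

# toy stand-in for DIBR synthesis: pretend finer precision matches better
fake = {1: [10, 22, 29], 2: [10, 21, 30], 4: [10, 20, 30]}
best = choose_precision([10, 20, 30], lambda p: fake[p], precisions=[1, 2, 4])
```

The selected precision would then be numericalized into a syntax element and written into the code stream, as described below.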

The information produced by coding (prediction mode, transform coefficients, etc.) is converted into the code stream by entropy coding. A piece of information (for example the synthesis precision) can be numericalized, i.e. the content it describes is represented by values in some range (usually a subset of the integers); for example, a piece of information describing three cases may represent them by 0, 1 and 2 respectively. The numericalized information becomes a syntax element; according to the range and distribution of its values, a syntax element can be coded with a suitable code to form a codeword (consisting of one or more bits), i.e. the syntax element is coded into a string of bits. Commonly used codes include n-bit fixed-length codes, Exponential-Golomb codes, arithmetic codes and so on. More specifically, a 1-bit unsigned integer code has the two codewords 0 and 1; a 2-bit unsigned integer code has the four codewords 00, 01, 10 and 11; and the codewords of the 0th-order Exponential-Golomb code include 1, 010, 011, and so on. A codeword is restored to the value of its corresponding syntax element by the corresponding decoding method (for example table lookup, i.e. finding in a code table the syntax element value corresponding to the codeword). A piece of information can also be jointly numericalized with other information into a single syntax element, and thus correspond to a single codeword; for example, the combination of two pieces of information can be numbered and that number used as a syntax element. The usual method of writing information into the code stream is to numericalize the information into a syntax element and write the codeword corresponding to the syntax element into the code stream.
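The 0th-order Exponential-Golomb code named above (codewords 1, 010, 011, ...) can be sketched as follows; a real entropy coder writes these bits directly into the stream rather than building strings.

```python
# Sketch of the 0th-order Exp-Golomb code: the codeword for an unsigned
# value v is the binary form of v+1 preceded by one leading zero for each
# bit after the first, so v=0 -> '1', v=1 -> '010', v=2 -> '011'.

def exp_golomb_encode(v):
    b = bin(v + 1)[2:]
    return "0" * (len(b) - 1) + b

def exp_golomb_decode(bits):
    """Inverse: count leading zeros, then read the remaining bits as v+1."""
    zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[zeros:], 2) - 1
```

Shorter codewords go to smaller values, which suits syntax elements whose small values are most frequent.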

Summary of the Invention

To overcome the above deficiencies of the prior art, the object of the present invention is to provide a view synthesis predictive coding method, a decoding method, a corresponding device and a code stream that can improve coding prediction efficiency.

The first technical solution of the present invention provides a view synthesis predictive coding method: when coding an image I in a viewpoint V1 of a three-dimensional video sequence, a synthesis precision X is adopted, and a view synthesis image P of a region H1 in viewpoint V1 is synthesized from the reconstructed image and reconstructed depth of another already-coded viewpoint V2 in the three-dimensional video sequence; the view synthesis image P is used to produce the prediction pixels required by view synthesis prediction in the process of coding image I; at the same time, the synthesis precision X is written into the code stream of the three-dimensional video sequence, where viewpoint V1 is the current coding viewpoint.

Preferably, the synthesis precision X is obtained by one of the following methods:

1) N synthesis precisions are used to synthesize, respectively, view synthesis images Pn of a region H2 in the current coding viewpoint V1, where N ≥ 2, 1 ≤ n ≤ N, and N, n are integers; the similarity between each view synthesis image Pn and the image of the corresponding region in image I is calculated, and the synthesis precision corresponding to the view synthesis image with the highest similarity among all Pn is selected as the synthesis precision X;

2) the variances Vk of the pixels in K regions H3k of image I are calculated, and the average V of the variances Vk is taken; the synthesis precision X is derived through a function f(V) of the average V, where 1 ≤ k ≤ K and k is an integer. The function f(V) involves a constant E (e.g. E = 15) and the floor operation (rounding down), and the synthesis precision X is 1/D pixel precision.

Preferably, "calculating the similarity between each view synthesis image Pn and the image of the corresponding region in image I" in 1) comprises one of the following methods:

Method 1: for M1 pixels Q1m in the view synthesis image Pn, the difference D1m between each pixel Q1m and the pixel R1m at the corresponding position in image I is calculated, where 1 ≤ n ≤ N, 1 ≤ m ≤ M1, m and M1 are positive integers, and M1 is less than or equal to the total number of pixels in region H2; the average ValA of the absolute values of D1m, or the average ValB of the squares of D1m, is calculated; the smaller the average ValA or ValB, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in image I;

Method 2: for M2 pixels Q2m in the view synthesis image Pn, the linear correlation coefficient C between the pixels Q2m and the pixels R2m at the corresponding positions in image I is calculated; the larger the linear correlation coefficient C, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M2, n, N, m and M2 are positive integers, and M2 is less than or equal to the total number of pixels in region H2.
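The two similarity measures above can be sketched for flat lists of pixel values; this is an illustrative reading of the text (method 2 is taken as the Pearson correlation coefficient), not the patent's normative definition.

```python
# Sketch of the similarity measures: mean absolute / squared difference
# (method 1, smaller = more similar) and the linear correlation coefficient
# (method 2, larger = more similar), over corresponding pixel lists.

import math

def val_a(p, r):
    """Average of absolute differences D1m."""
    return sum(abs(q - s) for q, s in zip(p, r)) / len(p)

def val_b(p, r):
    """Average of squared differences D1m."""
    return sum((q - s) ** 2 for q, s in zip(p, r)) / len(p)

def corr(p, r):
    """Linear (Pearson) correlation coefficient C."""
    mp, mr = sum(p) / len(p), sum(r) / len(r)
    num = sum((q - mp) * (s - mr) for q, s in zip(p, r))
    den = math.sqrt(sum((q - mp) ** 2 for q in p) *
                    sum((s - mr) ** 2 for s in r))
    return num / den
```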

The second technical solution of the present invention provides a view synthesis predictive decoding method: when decoding an image I in a viewpoint V1 of a three-dimensional video sequence, the synthesis precision X corresponding to image I is parsed from the code stream of the three-dimensional video sequence; using the synthesis precision X, a view synthesis image P of a region H1 in viewpoint V1 is synthesized from the reconstructed image and reconstructed depth of another already-decoded viewpoint V2 in the three-dimensional video sequence; the view synthesis image P is used to produce the prediction pixels required by view synthesis prediction in the process of decoding image I.

The third technical solution of the present invention provides a view synthesis predictive coding device, characterized by comprising the following two modules:

a view synthesis image generation module, which generates a view synthesis image using a synthesis precision X; its input comprises the synthesis precision X, the reconstructed image and reconstructed depth of an already-coded viewpoint V2 in the three-dimensional video sequence, and the camera parameter information of the already-coded viewpoint V2 and the current coding viewpoint V1; its output comprises a view synthesis image P of a region H1 in viewpoint V1; the processing it performs comprises, when coding an image I in viewpoint V1, synthesizing the view synthesis image P of region H1 in viewpoint V1 from the reconstructed image and reconstructed depth of the already-coded viewpoint V2 using the synthesis precision X, the view synthesis image P being used to produce the prediction pixels required by view synthesis prediction in the process of coding image I;

a synthesis precision writing module, which writes the synthesis precision X into the code stream of the three-dimensional video sequence; its input comprises the synthesis precision X and the code stream of the three-dimensional video sequence; its output comprises the three-dimensional video sequence code stream containing the synthesis precision X; the processing it performs comprises writing the synthesis precision X into the code stream of the three-dimensional video sequence.

Preferably, the view synthesis predictive coding device further comprises one of the following modules:

a synthesis precision decision module based on image similarity, whose input comprises the reconstructed image and reconstructed depth of the already-coded viewpoint V2 and the camera parameter information of the already-coded viewpoint V2 and the current coding viewpoint V1, and whose output is the synthesis precision X; the processing it performs comprises synthesizing, with N synthesis precisions respectively, view synthesis images Pn of a region H2 in the current coding viewpoint V1, where N ≥ 2, n and N are integers, and 1 ≤ n ≤ N; calculating the similarity between each view synthesis image Pn and the image of the corresponding region in image I; and selecting, from the Pn, the synthesis precision corresponding to the view synthesis image with the highest similarity as the synthesis precision X;

a synthesis precision decision module based on texture complexity, whose input comprises the image I and whose output is the synthesis precision X; the processing it performs comprises calculating the variances Vk of the pixels in K regions H3k of image I, taking the average V of the Vk, and deriving the synthesis precision X through a function f(V) of the average V, where 1 ≤ k ≤ K and k, K are integers; the function f(V) involves a constant E (e.g. E = 15) and the floor operation (rounding down), and the synthesis precision X is 1/D pixel precision.

Preferably, in the synthesis precision decision module based on image similarity, calculating the similarity between each view synthesis image Pn and the image of the corresponding region in image I comprises one of the following processes:

Process 1: for M1 pixels Q1m in the view synthesis image Pn, the difference D1m between each pixel Q1m and the pixel R1m at the corresponding position in image I is calculated, where 1 ≤ n ≤ N, 1 ≤ m ≤ M1, m and M1 are positive integers, and M1 is less than or equal to the total number of pixels in region H2; the average ValA of the absolute values of D1m, or the average ValB of the squares of D1m, is calculated; the smaller the average ValA or ValB, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in image I;

Process 2: for M2 pixels Q2m of the view synthesis image Pn, compute the linear correlation coefficient C between the pixels Q2m and the co-located pixels R2m of image I; the larger the linear correlation coefficient C, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in image I, where 1≤n≤N, 1≤m≤M2, m and M2 are positive integers, and M2 is at most the total number of pixels in region H2.

A fourth technical solution of the present invention provides a view synthesis predictive decoding device, comprising the following two modules:

A synthesis-precision parsing module, which parses the synthesis precision X from the 3D video sequence bitstream. Its input is the 3D video sequence bitstream and its output is the synthesis precision X. Its function is: when decoding an image I in a view V1 of the 3D video sequence, to parse from the bitstream of the 3D video sequence the synthesis precision X corresponding to the image I.

A view synthesis image generation module, which synthesizes the view synthesis image P in view V1 with the synthesis precision X. Its input includes the synthesis precision X, the reconstructed image and reconstructed depth of a decoded view V2, and the camera parameters of the decoded view V2 and of view V1; its output is the view synthesis image P of a region H1 in view V1. Its processing includes synthesizing, from the reconstructed image and reconstructed depth of the decoded view V2, the view synthesis image P of a region H1 in view V1; the view synthesis image P is used to provide the prediction pixels required by view synthesis prediction in the process of decoding image I.

A fifth technical solution of the present invention provides a bitstream for view synthesis prediction, which contains information corresponding to a synthesis precision X; the synthesis precision is the precision with which view synthesis images are generated in view synthesis prediction.

Beneficial effects: compared with the prior art, the view synthesis predictive encoding method, decoding method, corresponding devices and bitstream of the present invention generate the view synthesis image with an optimal synthesis precision, which provides the prediction pixels required by view synthesis prediction and thereby improves coding prediction efficiency.

Description of the Drawings

Other features and advantages of the present invention will become clearer from the following description of preferred embodiments, which explains the principles of the invention by way of example with reference to the accompanying drawings.

Fig. 1 is a schematic structural diagram of an embodiment of the view synthesis predictive encoding device of the present invention;

Fig. 2 is a schematic structural diagram of another embodiment of the view synthesis predictive encoding device of the present invention;

Fig. 3 is a schematic structural diagram of a further embodiment of the view synthesis predictive encoding device of the present invention;

Fig. 4 is a schematic structural diagram of an embodiment of the view synthesis predictive decoding device of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below with reference to the accompanying drawings:

Embodiment 1

The first embodiment of the present invention relates to a view synthesis predictive encoding method. For a 3D video sequence (comprising at least two views, each view comprising an image sequence and a depth sequence), in the process of encoding an image I in one view V1 (image I being the original image of the T-th frame of the image sequence of view V1), a synthesis precision X is used to synthesize, by depth-image-based rendering from the reconstructed image and reconstructed depth at the same time instant (or another time instant, e.g. frame T-1) of another, already coded view V2 of the 3D video sequence, a view synthesis image P of a region H1 in view V1. The region H1 may be the whole image I, a designated region of image I (e.g. a rectangular window whose center coincides with the center of image I and whose size is half the size of image I), a macroblock or block of image I, and so on. The view synthesis image P is used to provide the prediction pixels required by view synthesis prediction in the process of encoding image I; that is, at least one macroblock or block coded with view synthesis prediction takes pixels from the view synthesis image P as its prediction pixels. For example, with integer-pixel synthesis precision, the pixels of a rectangular region of P with the same size as the macroblock or block can be taken as its prediction pixels; when P has sub-pixel precision, M×J pixels are extracted from an M×L region of P, or an M×L region of P is resampled (i.e. up-sampled or down-sampled) to M×J pixels, as the prediction pixels of the macroblock or block (of size M×J pixels).
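The extraction variant described above can be sketched as follows. This is an illustration only, with a hypothetical helper name, and it assumes L is a multiple of J: M×J prediction pixels are taken from an M×L sub-pixel region by keeping every (L/J)-th column.

```python
import numpy as np

def predict_from_subpel(p_region, j):
    """Take M x J prediction pixels from an M x L region of the sub-pixel
    synthesized image P by keeping every (L/J)-th column.
    Illustrative only; assumes L is a multiple of J."""
    m, l = p_region.shape
    step = l // j
    return p_region[:, ::step][:, :j]
```

With 1/2-pel precision (L = 2J) this keeps every second column, i.e. the samples that fall on the integer pixel grid.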

At the same time, the synthesis precision X is written into the bitstream of the 3D video sequence. The synthesis precision X may be integer-pixel precision or sub-pixel precision, e.g. 1/2-pixel, 1/4-pixel, 1/3-pixel, or k/l-pixel precision (k, l being positive integers).

When the synthesis precision X used in view synthesis is integer-pixel precision, the pixels of view V2 are projected onto the pixel grid of the projection image in view V1, the projection image having the same resolution as the image of view V2. Each pixel of view V2 is usually projected onto this grid by rounding its projection point in view V1 (obtained by depth-image-based rendering, and usually lying between two pixel positions of view V1) to the horizontally nearest pixel position, producing a projected pixel. When a pixel position of the grid in V1 receives only one projected pixel, that projected pixel is taken as the pixel at that position; when several projected pixels map to the same pixel position of the grid in V1, the projected pixel with the smallest depth (closest to the camera) is taken as the pixel at that position.
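A minimal sketch of this integer-precision forward projection, with nearest-pixel rounding and the smallest-depth conflict rule. `disparity_fn`, which maps a depth value to a horizontal shift, is a hypothetical stand-in for the full camera-parameter projection and is not part of the patent text:

```python
import numpy as np

def forward_warp_integer(src, depth, disparity_fn):
    """Project the pixels of view V2 onto the integer pixel grid of view V1.

    src: H x W source image (view V2); depth: H x W depth map.
    disparity_fn: hypothetical mapping from depth to a horizontal shift
    in pixels (stands in for the camera-parameter projection).
    Conflicts on the same grid position keep the pixel closest to the
    camera (smallest depth)."""
    h, w = src.shape
    warped = np.zeros_like(src)
    filled = np.zeros((h, w), dtype=bool)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            # round the (generally fractional) projection point to the
            # horizontally nearest integer pixel position
            xp = int(round(x + disparity_fn(depth[y, x])))
            if 0 <= xp < w and depth[y, x] < zbuf[y, xp]:
                zbuf[y, xp] = depth[y, x]
                warped[y, xp] = src[y, x]
                filled[y, xp] = True
    return warped, filled
```

The `filled` mask records which grid positions received a projected pixel; positions left unfilled are the holes addressed later.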

If the synthesis precision X used in view synthesis is sub-pixel precision, it is written as K/L-pixel precision (K and L positive integers; L is usually a multiple of 2 and K is usually 1, e.g. 1/2-pixel and 1/4-pixel synthesis precision). With K/L-pixel precision, the pixels of view V2 are projected onto the pixel grid of the projection image in view V1, where the horizontal resolution of the projection image is L/K times the horizontal resolution of the image of view V2. A pixel of view V2 is usually projected onto this grid by one of the following two methods, or a combination thereof: 1) up-sample the image of view V2 (and its depth) horizontally by a factor of L/K (i.e. the horizontal resolution of the up-sampled image is L/K times that of the original image), then project the up-sampled image to view V1, producing a projection image whose horizontal resolution is L/K times that of the image of view V2; 2) set the horizontal resolution of the projection image in V1 to L/K times that of the image in V2, and multiply the projection position of each pixel of the V2 image computed at integer-pixel precision by L/K (i.e. a point whose projection coordinates on the integer-precision projection image are (a, b) has projection coordinates (aL/K, b) at K/L-pixel precision), then round the projection position to the nearest pixel grid position of the projection image to obtain the projected pixel, thereby producing a projection image whose horizontal resolution is L/K times the horizontal resolution of the image of view V2.
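Method 2 can be sketched by widening the target grid by L/K and scaling the integer-precision projection positions before rounding. As above, `disparity_fn` is a hypothetical stand-in for the camera-parameter projection:

```python
import numpy as np

def forward_warp_subpel(src, depth, disparity_fn, K=1, L=2):
    """Project view-V2 pixels onto a K/L-pixel-precision grid of view V1:
    the projection image is L/K times wider, integer-precision projection
    positions are scaled by L/K and rounded to the nearest grid position;
    conflicts keep the smallest depth. Illustrative sketch only."""
    h, w = src.shape
    wp = w * L // K                            # widened projection image
    warped = np.zeros((h, wp))
    zbuf = np.full((h, wp), np.inf)
    for y in range(h):
        for x in range(w):
            pos = x + disparity_fn(depth[y, x])  # integer-precision position
            xp = int(round(pos * L / K))         # K/L-pixel grid position
            if 0 <= xp < wp and depth[y, x] < zbuf[y, xp]:
                zbuf[y, xp] = depth[y, x]
                warped[y, xp] = src[y, x]
    return warped
```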

When holes exist in the projection image (i.e. some pixel positions of the projection image receive no projected pixel), hole filling is further performed to fill in the missing pixels of the hole regions, yielding the view synthesis image.
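The text does not prescribe a particular hole-filling method. One common, simple strategy propagates the nearest filled neighbor along each row; it is sketched below as an assumption, not as the patent's procedure (practical renderers usually prefer the background, i.e. larger-depth, neighbor):

```python
def fill_holes_horizontal(row, filled):
    """Fill hole positions of one image row from the nearest filled pixel
    on the left; leading holes are filled from the right.
    A minimal illustrative strategy only."""
    out = list(row)
    done = list(filled)
    last = None
    for i in range(len(out)):              # left-to-right pass
        if filled[i]:
            last = out[i]
        elif last is not None:
            out[i], done[i] = last, True
    last = None
    for i in range(len(out) - 1, -1, -1):  # right-to-left pass for leading holes
        if filled[i]:
            last = out[i]
        elif not done[i] and last is not None:
            out[i], done[i] = last, True
    return out
```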

Embodiment 2

The second embodiment of the present invention relates to a view synthesis predictive encoding method. For a 3D video sequence, in the process of encoding an image I in one view V1 (image I being the original image of the T-th frame of the image sequence of view V1), a synthesis precision X is used to synthesize, by depth-image-based rendering from the reconstructed image and reconstructed depth at the same time instant of another, already coded view V2 of the 3D video sequence, a view synthesis image P of a region H1 in view V1.

The synthesis precision X is determined as follows. From the reconstructed image and reconstructed depth at the same time instant in view V2 (and other information such as camera parameters), depth-image-based rendering is used to synthesize, at each of N synthesis precisions (e.g. N=2: integer-pixel and 1/2-pixel synthesis precision), N view synthesis images Pn (1≤n≤N, n an integer) of a region H2 in the current coding view V1.

That is, N (N≥2, N an integer) synthesis precisions are used to synthesize view synthesis images Pn (1≤n≤N, n an integer) of a region H2 in the current coding view V1. The region H2 may be the whole image I, the region H1, or a set of several scattered regions (e.g. macroblocks) whose area is smaller than image I.

For the N view synthesis images Pn, the similarity between each view synthesis image Pn and the image of the corresponding region U in image I is computed. With integer-pixel synthesis precision, the position of region U in image I is the same as the position of the view synthesis image Pn in the projection image. With sub-pixel synthesis precision, the corresponding region U is the region, in the image obtained by resampling (up-sampling or down-sampling) image I according to the synthesis precision used, whose position is the same as that of the view synthesis image Pn in the projection image.

The synthesis precision corresponding to the view synthesis image with the highest similarity among all Pn is selected as the synthesis precision X. The similarity is computed by one of the following two methods:

Method 1: for M1 pixels Q1m (1≤m≤M1; m, M1 positive integers; M1 at most the total number of pixels in region H2) of the view synthesis image Pn (1≤n≤N), compute the difference D1m (1≤m≤M1) between each pixel Q1m and the co-located pixel R1m (i.e. the pixel with the same coordinates in region U) of image I; compute the average ValA of the absolute values of the D1m, or the average ValB of their squared values; the smaller the average ValA or ValB, the higher the similarity between the view synthesis image Pn and the image of the corresponding region U in image I.
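Method 1 amounts to a mean absolute difference (ValA) or mean squared difference (ValB); a small sketch:

```python
import numpy as np

def dissimilarity(pn, region_u, use_squared=False):
    """Mean absolute difference (ValA) or mean squared difference (ValB)
    between a synthesized image Pn and the co-located region U of image I.
    Smaller values mean higher similarity."""
    d = np.asarray(pn, dtype=float) - np.asarray(region_u, dtype=float)
    return float(np.mean(d * d) if use_squared else np.mean(np.abs(d)))
```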

Method 2: for M2 pixels Q2m (1≤m≤M2; m, M2 positive integers; M2 at most the total number of pixels in region H2) of the view synthesis image Pn (1≤n≤N), compute the linear correlation coefficient C between the pixels Q2m and the co-located pixels R2m (i.e. the pixels with the same coordinates in region U) of image I, namely:

C = \frac{\sum_{m=1}^{M}(Q_m - Q_{mean})(R_m - R_{mean})}{\sqrt{\sum_{m=1}^{M}(Q_m - Q_{mean})^2}\,\sqrt{\sum_{m=1}^{M}(R_m - R_{mean})^2}}

where Q_mean and R_mean are the means of the Qm (1≤m≤M) and of the Rm (1≤m≤M), respectively. The larger the linear correlation coefficient C, the higher the similarity between the view synthesis image Pn and the image of the corresponding region U in image I.
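This is the Pearson correlation coefficient over the selected pixels; a direct sketch:

```python
import numpy as np

def linear_correlation(q, r):
    """Linear correlation coefficient C between the synthesized pixels
    Q_m and the co-located pixels R_m; larger C means more similar."""
    q = np.asarray(q, dtype=float).ravel()
    r = np.asarray(r, dtype=float).ravel()
    qc, rc = q - q.mean(), r - r.mean()
    return float(np.sum(qc * rc) / np.sqrt(np.sum(qc ** 2) * np.sum(rc ** 2)))
```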

At the same time, the synthesis precision corresponding to the view synthesis image with the highest similarity is written into the bitstream of the 3D video sequence. The synthesis precision can be represented numerically as a syntax element taking the two values 0 and 1, where 0 and 1 denote integer-pixel and 1/2-pixel projection precision, respectively. The synthesis precision may be coded with a 0-th order Exp-Golomb code, or with a 1-bit unsigned integer code.
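For reference, the standard 0-th order Exp-Golomb code word for a non-negative syntax-element value (as used for many flags and indices in H.264/HEVC-style bitstreams) can be generated as follows:

```python
def exp_golomb_0(value):
    """0-th order Exp-Golomb (ue(v)) code word for a non-negative integer:
    0 -> '1', 1 -> '010', 2 -> '011', 3 -> '00100', ..."""
    v = value + 1
    bits = bin(v)[2:]                      # binary without the '0b' prefix
    return '0' * (len(bits) - 1) + bits    # leading-zero prefix + value
```

A 1-bit unsigned integer code for the two-valued element is simply `'0'` or `'1'`.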

Embodiment 3

The third embodiment of the present invention relates to a view synthesis predictive encoding method. For a 3D video sequence, in the process of encoding an image I in one view V1 (in this embodiment the pixels of image I are original pixels), a synthesis precision X is used to synthesize, by depth-image-based rendering from the reconstructed image and reconstructed depth at the same time instant of another, already coded view V2 of the 3D video sequence, a view synthesis image P of a region H1 in view V1.

The synthesis precision X is determined as follows.

Compute the variance Vk (1≤k≤K) of the pixels in each of K regions H3k (1≤k≤K, k an integer) of image I, take the average V of the Vk, and derive the synthesis precision X through a function f(V) of V; examples of f(V) include:

1) D = f(V) defined using a constant E, for example E = 15, where ⌊x⌋ denotes rounding x down to the nearest integer; the synthesis precision X is 1/D pixel precision;

2) D = f(V) defined using a constant F, for example F = 10; the synthesis precision X is 1/D pixel precision.
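The overall texture-complexity decision can be sketched as follows. The patent's example formulas for f (with constants E = 15 and F = 10) are given only as figures and are not reproduced here, so the default `f` below is a purely illustrative stand-in, not the patent's mapping:

```python
import numpy as np

def decide_precision_from_texture(image, block=8, f=None):
    """Texture-complexity decision: variance Vk of each block region H3k,
    average V of the Vk, then a mapping D = f(V); precision X is 1/D pel.
    The default f is a hypothetical stand-in (flat content -> finer grid)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    variances = [img[y:y + block, x:x + block].var()
                 for y in range(0, h - block + 1, block)
                 for x in range(0, w - block + 1, block)]
    v = float(np.mean(variances))
    if f is None:
        f = lambda val: 1 if val > 100 else 2   # hypothetical threshold
    d = f(v)
    return v, d                                 # synthesis precision X = 1/d pel
```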

At the same time, the synthesis precision X is written into the bitstream of the 3D video sequence. The synthesis precision can be represented numerically as a syntax element taking N non-negative integer values, a value Y denoting that the synthesis precision X is 1/Y pixel precision.

Embodiment 4

The fourth embodiment of the present invention relates to a view synthesis predictive decoding method. When decoding an image I in a view V1 of a 3D video sequence (in this embodiment the pixels of image I are reconstructed pixels), the synthesis precision X corresponding to image I is parsed from the bitstream of the 3D video sequence; with this synthesis precision X, a view synthesis image P of a region H1 in view V1 is synthesized by depth-image-based rendering from the reconstructed image and reconstructed depth (together with the corresponding camera parameters, such as focal length and camera coordinates) of another, already decoded view V2 of the 3D video sequence. The region H1 may be the whole image I, a designated region of image I (e.g. a rectangular window whose center coincides with the center of image I and whose size is half the size of image I), a macroblock or block of image I, and so on. The view synthesis image P is used to provide the prediction pixels required by view synthesis prediction in the process of decoding image I; that is, at least one macroblock or block decoded with view synthesis prediction takes pixels from the view synthesis image P as its prediction pixels. For example, with integer-pixel synthesis precision, the pixels of a rectangular region of P with the same size as the macroblock or block can be taken as its prediction pixels; when P has sub-pixel precision, M×J pixels are extracted from an M×L region of P, or an M×L region of P is resampled (i.e. up-sampled or down-sampled) to M×J pixels, as the prediction pixels of the macroblock or block (of size M×J pixels).

The synthesis precision X may be integer-pixel precision or sub-pixel precision, e.g. 1/2-pixel, 1/4-pixel, 1/3-pixel, 3/8-pixel, or k/l-pixel precision (k, l being positive integers).

Embodiment 5

The fifth embodiment of the present invention relates to a view synthesis predictive encoding device. Fig. 1 is a schematic structural diagram of an embodiment of the view synthesis predictive encoding device. The device comprises the following two modules: a view synthesis image generation module that generates a view synthesis image with a given synthesis precision, and a synthesis-precision writing module that writes the synthesis precision information into the 3D video sequence bitstream.

The input of the view synthesis image generation module includes the reconstructed image and reconstructed depth of a coded view V2 of the 3D video sequence, the synthesis precision X, and the camera parameters of the coded view V2 and of the current coding view V1; its output includes the view synthesis image P of the current coding view V1. Its function and implementation are the same as those of the step of the above view synthesis predictive encoding method that uses a synthesis precision X to synthesize, by depth-image-based rendering from the reconstructed image and reconstructed depth at the same time instant (or another time instant, e.g. frame T-1) of the coded view V2 of the 3D video sequence, the view synthesis image P of a region H1 in view V1.

The input of the synthesis-precision writing module includes the synthesis precision X and the 3D video sequence bitstream; its output includes the 3D video sequence bitstream containing the synthesis precision X. Its function and implementation are the same as those of the step of the above view synthesis predictive encoding method that writes the synthesis precision X into the bitstream of the 3D video sequence.

Embodiment 6

The sixth embodiment of the present invention relates to a view synthesis predictive encoding device. Fig. 2 is a schematic structural diagram of another embodiment of the view synthesis predictive encoding device. This device differs from the device of Embodiment 5 in that it further comprises, before the view synthesis image generation module, a synthesis-precision decision module based on image similarity. Its input includes the reconstructed image and reconstructed depth of the coded view V2, the camera parameters of the coded view V2 and of the current coding view V1, and the currently coded image I of the current coding view V1 (in this embodiment the pixels of image I are original pixels); its output is the synthesis precision X (which serves as an input of the view synthesis image generation module). Its function and implementation are the same as those of the steps of the above view synthesis predictive encoding method that generate, with different synthesis precisions, several view synthesis images Pn (1≤n≤N) of a region H2 in the current coding view V1, compute the similarity between each view synthesis image Pn and the image of the corresponding region in image I, and select from all Pn the synthesis precision corresponding to the view synthesis image with the highest similarity as the synthesis precision X.

Embodiment 7

The seventh embodiment of the present invention relates to a view synthesis predictive encoding device. Fig. 3 is a schematic structural diagram of a further embodiment of the view synthesis predictive encoding device. This device differs from the device of Embodiment 5 in that it further comprises, before the view synthesis image generation module, a synthesis-precision decision module based on texture complexity. Its input includes the currently coded image I (in this embodiment the pixels of image I are original pixels); its output is the synthesis precision X. Its function and implementation are the same as those of the steps of the above view synthesis predictive encoding method that compute the variance Vk (1≤k≤K) of the pixels in each of K regions H3k (1≤k≤K, k an integer) of image I, take the average V of the Vk, and derive the synthesis precision X through a function f(V) of V.

Embodiment 8

The eighth embodiment of the present invention relates to a view synthesis predictive decoding device. Fig. 4 is a schematic structural diagram of an embodiment of the view synthesis predictive decoding device. The device comprises the following two modules: a synthesis-precision parsing module that parses the synthesis precision X from the 3D video sequence bitstream, and a view synthesis image generation module that synthesizes the view synthesis image of view V1 with the synthesis precision X.

The input of the synthesis-precision parsing module is the 3D video sequence bitstream and its output is the synthesis precision X. Its function and implementation are the same as those of the step of the above view synthesis predictive decoding method that, when decoding an image I in a view V1 of the 3D video sequence (in this embodiment the pixels of image I are reconstructed pixels), parses from the bitstream of the 3D video sequence the synthesis precision X corresponding to image I.
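If the precision syntax element is coded as a 0-th order Exp-Golomb code word (one of the options named in Embodiment 2), the decoder-side parse can be sketched as follows; the function name and bit-string representation are illustrative:

```python
def parse_exp_golomb_0(bits, pos=0):
    """Decode one 0-th order Exp-Golomb (ue(v)) code word from a bit string,
    returning (value, next_position). A decoded value of 0/1 could then
    denote integer-pixel / 1/2-pixel synthesis precision."""
    zeros = 0
    while bits[pos + zeros] == '0':        # count the leading-zero prefix
        zeros += 1
    end = pos + 2 * zeros + 1
    value = int(bits[pos + zeros:end], 2) - 1
    return value, end
```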

The input of the view synthesis image generation module includes the synthesis precision X, the reconstructed image and reconstructed depth of a decoded view V2, and the camera parameters of the decoded view V2 and of the current decoding view V1; its output is the view synthesis image P of a region H1 in view V1. Its function and implementation are the same as those of the step of the above view synthesis predictive decoding method that synthesizes, from the reconstructed image and reconstructed depth of the decoded view V2, the view synthesis image P of a region H1 in view V1. The view synthesis image P is used to provide the prediction pixels required by view synthesis prediction in the process of decoding image I.

The view synthesis predictive encoding device and decoding device can be implemented in various ways, for example:

Method 1: a computer as the hardware, plus a software program with the same functions as the view synthesis predictive encoding and decoding methods.

Method 2: a single-chip microcomputer as the hardware, plus a software program with the same functions as the view synthesis predictive encoding and decoding methods.

Method 3: a digital signal processor as the hardware, plus a software program with the same functions as the view synthesis predictive encoding and decoding methods.

Method 4: a circuit designed to perform the same functions as the view synthesis predictive encoding and decoding methods.

Other methods of implementing the view synthesis predictive encoding device and decoding device are also possible; they are not limited to the above four.

Embodiment 9

The ninth embodiment of the invention relates to a bitstream for view synthesis prediction, which contains information corresponding to a synthesis precision; the synthesis precision is the precision with which view synthesis images are generated in view synthesis prediction.

A piece of synthesis-precision information in the bitstream may apply to the view synthesis prediction of all frames of all views of a 3D video sequence (indicating the precision with which view synthesis images are generated in view synthesis prediction), to all frames (or several frames) of one view of the sequence, to one frame of one view, or to one region (e.g. a macroblock) of one frame of one view.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, those of ordinary skill in the art can make various variations or modifications within the scope of the appended claims.

Claims (7)

1. A view synthesis predictive encoding method, characterized in that, when encoding an image I in a viewpoint V1 of a three-dimensional video sequence, a synthesis precision X is used to synthesize, from the reconstructed image and reconstructed depth of another already-encoded viewpoint V2 of the sequence, a view synthesis image P of a region H1 in the viewpoint V1; the view synthesis image P is used to produce the prediction pixels needed by view synthesis prediction in the process of encoding the image I; at the same time, the synthesis precision X of the region H1 is written into the bitstream of the three-dimensional video sequence, where the viewpoint V1 is the currently encoded viewpoint; the synthesis precision X is obtained by one of the following methods:

1) synthesize a view synthesis image Pn of a region H2 in the currently encoded viewpoint V1 with each of N synthesis precisions, where N ≥ 2, 1 ≤ n ≤ N, and N and n are integers; compute the similarity between each view synthesis image Pn and the image of the corresponding region in the image I, and select, from all view synthesis images Pn, the synthesis precision corresponding to the view synthesis image with the highest similarity as the synthesis precision X;

2) compute the variance Vk of the pixels in each of K regions H3k of the image I, take the average V of the pixel variances Vk, and derive the synthesis precision X through a function f(V) of the average V, where 1 ≤ k ≤ K and k is an integer; the function f(V) maps V to an integer value D = f(V), and the synthesis precision X is 1/D pixel precision.

2. The view synthesis predictive encoding method according to claim 1, characterized in that "computing the similarity between each view synthesis image Pn and the image of the corresponding region in the image I" in 1) comprises one of the following methods:

Method 1: for M1 pixels Q1m in the view synthesis image Pn, compute the difference D1m between each pixel Q1m and the pixel R1m at the corresponding position in the image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M1, m and M1 are positive integers, and M1 is less than or equal to the total number of pixels in the region H2; compute the average ValA of the absolute values of D1m or the average ValB of the squared values of D1m; the smaller the average ValA or ValB, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in the image I;

Method 2: for M2 pixels Q2m in the view synthesis image Pn, compute the linear correlation coefficient C between the pixels Q2m and the pixels R2m at the corresponding positions in the image I; the larger the linear correlation coefficient C, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in the image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M2, n, N, m and M2 are positive integers, and M2 is less than or equal to the total number of pixels in the region H2.
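Method 2) of claim 1 decides the precision from texture statistics of the image alone, without synthesizing candidate images. The sketch below is a minimal illustration of that idea, assuming grayscale NumPy arrays; the 16×16 block size and the thresholds inside f(V) are hypothetical, since the claim leaves the concrete form of f(V) open.

```python
import numpy as np

def decide_precision_by_texture(image, block=16):
    """Sketch of claim-1 method 2): average the per-block pixel variance Vk
    over K regions and map the average V to an integer D via a hypothetical
    f(V); the synthesis precision is then 1/D pixel precision."""
    h, w = image.shape
    variances = [image[y:y + block, x:x + block].var()
                 for y in range(0, h - block + 1, block)
                 for x in range(0, w - block + 1, block)]
    v = float(np.mean(variances))
    # Hypothetical f(V): richer texture -> finer synthesis precision.
    if v < 50.0:
        return 1      # integer-pel
    if v < 500.0:
        return 2      # half-pel
    return 4          # quarter-pel
```

A flat region thus gets coarse (integer-pel) synthesis, while highly textured content gets quarter-pel synthesis, trading synthesis cost against prediction quality.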
3. A view synthesis predictive decoding method, characterized in that, when decoding an image I in a viewpoint V1 of a three-dimensional video sequence, the synthesis precision X corresponding to the image I is parsed from the bitstream of the sequence; using the synthesis precision X, a view synthesis image P of a region H1 in the viewpoint V1 is synthesized from the reconstructed image and reconstructed depth of another already-decoded viewpoint V2 of the sequence; the view synthesis image P is used to produce the prediction pixels needed by view synthesis prediction in the process of decoding the image I, and the method of producing the prediction pixels differs for different synthesis precisions X.

4. A view synthesis predictive encoding device, characterized by comprising the following two modules:

a view synthesis image generation module, which generates a view synthesis image with a synthesis precision X; its input comprises the synthesis precision X, the reconstructed image and reconstructed depth of an already-encoded viewpoint V2 of a three-dimensional video sequence, and the camera parameter information of the encoded viewpoint V2 and the currently encoded viewpoint V1; its output comprises a view synthesis image P of a region H1 in the viewpoint V1; the processing it performs comprises, when encoding an image I in the viewpoint V1, synthesizing the view synthesis image P of the region H1 from the reconstructed image and reconstructed depth of the encoded viewpoint V2 with the synthesis precision X, the view synthesis image P being used to produce the prediction pixels needed by view synthesis prediction in the process of encoding the image I;

a synthesis precision writing module, which writes the synthesis precision X into the bitstream of the three-dimensional video sequence; its input comprises the synthesis precision X and the bitstream of the sequence, its output comprises the bitstream of the sequence containing the synthesis precision X, and the processing it performs comprises writing the synthesis precision X into the bitstream of the sequence.

5. The view synthesis predictive encoding device according to claim 4, characterized by further comprising one of the following modules:

a synthesis precision decision module based on image similarity, whose input comprises the reconstructed image and reconstructed depth of the encoded viewpoint V2 and the camera parameter information of the encoded viewpoint V2 and the currently encoded viewpoint V1, and whose output is the synthesis precision X; the processing it performs comprises synthesizing a view synthesis image Pn of a region H2 in the currently encoded viewpoint V1 with each of N synthesis precisions, where N ≥ 2, n and N are integers, and 1 ≤ n ≤ N; computing the similarity between each view synthesis image Pn and the image of the corresponding region in the image I; and selecting, from the Pn, the synthesis precision corresponding to the view synthesis image with the highest similarity as the synthesis precision X;

a synthesis precision decision module based on texture complexity, whose input comprises the image I and whose output is the synthesis precision X; the processing it performs comprises computing the variance Vk of the pixels in each of K regions H3k of the image I, taking the average V of the Vk, and deriving the synthesis precision X through a function f(V) of the average V, where 1 ≤ k ≤ K, k and K are integers, the function f(V) maps V to an integer value D = f(V), and the synthesis precision X is 1/D pixel precision.

6. The view synthesis predictive encoding device according to claim 5, characterized in that, in the synthesis precision decision module based on image similarity, the computation of the similarity between each view synthesis image Pn and the image of the corresponding region in the image I comprises one of the following processes:

Process 1: for M1 pixels Q1m in the view synthesis image Pn, compute the difference D1m between each pixel Q1m and the pixel R1m at the corresponding position in the image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M1, m and M1 are positive integers, and M1 is less than or equal to the total number of pixels in the region H2; compute the average ValA of the absolute values of D1m or the average ValB of the squared values of D1m; the smaller the average ValA or ValB, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in the image I;

Process 2: for M2 pixels Q2m in the view synthesis image Pn, compute the linear correlation coefficient C between the pixels Q2m and the pixels R2m at the corresponding positions in the image I; the larger the linear correlation coefficient C, the higher the similarity between the view synthesis image Pn and the image of the corresponding region in the image I, where 1 ≤ n ≤ N, 1 ≤ m ≤ M2, m and M2 are positive integers, and M2 is less than or equal to the total number of pixels in the region H2.

7. A view synthesis predictive decoding device, characterized by comprising the following two modules:

a synthesis precision parsing module, which parses a synthesis precision X from the bitstream of a three-dimensional video sequence; its input is the bitstream of the sequence and its output is the synthesis precision X; the function it performs is, when decoding an image I in a viewpoint V1 of the sequence, parsing from the bitstream the synthesis precision X corresponding to the image I;

a view synthesis image generation module, which synthesizes a view synthesis image P in the viewpoint V1 with the synthesis precision X; its input comprises the synthesis precision X, the reconstructed image and reconstructed depth of an already-decoded viewpoint V2, and the camera parameters of the decoded viewpoint V2 and the viewpoint V1; its output is a view synthesis image P of a region H1 in the viewpoint V1; the processing it performs comprises synthesizing the view synthesis image P of the region H1 from the reconstructed image and reconstructed depth of the decoded viewpoint V2, the view synthesis image P being used to produce the prediction pixels needed by view synthesis prediction in the process of decoding the image I.
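The similarity measures named in claims 2 and 6 (mean absolute difference ValA, mean squared difference ValB, linear correlation coefficient C) and the precision selection of claim-1 method 1) can be sketched as follows. This is an illustrative sketch assuming grayscale NumPy arrays; the function names are not from the patent, and the selection step uses ValA as its similarity measure purely as an example (ValB or C could be substituted).

```python
import numpy as np

def mean_abs_diff(pn, ref):
    """ValA: mean of |D1m|; a smaller value means higher similarity."""
    return float(np.mean(np.abs(pn.astype(np.float64) - ref.astype(np.float64))))

def mean_sq_diff(pn, ref):
    """ValB: mean of D1m squared; a smaller value means higher similarity."""
    return float(np.mean((pn.astype(np.float64) - ref.astype(np.float64)) ** 2))

def linear_correlation(pn, ref):
    """C: linear correlation coefficient; a larger value means higher similarity."""
    return float(np.corrcoef(pn.ravel().astype(np.float64),
                             ref.ravel().astype(np.float64))[0, 1])

def select_precision(candidates, ref):
    """Claim-1 method 1) sketch: candidates maps each precision D to its
    synthesized image Pn; pick the D whose Pn is most similar (here:
    smallest ValA) to the original region of image I."""
    return min(candidates, key=lambda d: mean_abs_diff(candidates[d], ref))
```

For example, if the quarter-pel candidate reproduces the original region more faithfully than the integer-pel and half-pel candidates, `select_precision` returns 4, and 1/4-pel is the precision X written into the bitstream.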
CN201210125366.2A 2012-04-25 2012-04-25 A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream Active CN103379349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210125366.2A CN103379349B (en) 2012-04-25 2012-04-25 A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210125366.2A CN103379349B (en) 2012-04-25 2012-04-25 A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream

Publications (2)

Publication Number Publication Date
CN103379349A CN103379349A (en) 2013-10-30
CN103379349B true CN103379349B (en) 2016-06-29

Family

ID=49463834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210125366.2A Active CN103379349B (en) 2012-04-25 2012-04-25 A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream

Country Status (1)

Country Link
CN (1) CN103379349B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104768013B (en) * 2014-01-02 2018-08-28 浙江大学 A kind of candidate pattern queue processing method and device
US10372968B2 (en) * 2016-01-22 2019-08-06 Qualcomm Incorporated Object-focused active three-dimensional reconstruction
EP3799433A1 (en) * 2019-09-24 2021-03-31 Koninklijke Philips N.V. Coding scheme for immersive video with asymmetric down-sampling and machine learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102326391A (en) * 2009-02-23 2012-01-18 日本电信电话株式会社 Multi-view image encoding method, multi-view image decoding method, multi-view image encoding device, multi-view image decoding device, multi-view image encoding program, and multi-view image decoding program
CN102413332A (en) * 2011-12-01 2012-04-11 武汉大学 Multi-view Video Coding Method Based on Temporal Enhancement for Predictive View Synthesis

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8854486B2 (en) * 2004-12-17 2014-10-07 Mitsubishi Electric Research Laboratories, Inc. Method and system for processing multiview videos for view synthesis using skip and direct modes
EP2541943A1 (en) * 2010-02-24 2013-01-02 Nippon Telegraph And Telephone Corporation Multiview video coding method, multiview video decoding method, multiview video coding device, multiview video decoding device, and program

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102326391A (en) * 2009-02-23 2012-01-18 日本电信电话株式会社 Multi-view image encoding method, multi-view image decoding method, multi-view image encoding device, multi-view image decoding device, multi-view image encoding program, and multi-view image decoding program
CN102413332A (en) * 2011-12-01 2012-04-11 武汉大学 Multi-view Video Coding Method Based on Temporal Enhancement for Predictive View Synthesis

Also Published As

Publication number Publication date
CN103379349A (en) 2013-10-30

Similar Documents

Publication Publication Date Title
EP3700213B1 (en) Method and apparatus for encoding or decoding an image with inter layer motion information prediction according to motion information compression scheme
US9438905B2 (en) LM mode with uniform bit-width multipliers
CN103260018B (en) Intra-frame image prediction decoding method and Video Codec
CN112789857B (en) Transformation for signal enhancement coding
CN104702962A (en) Intra-frame coding and decoding method, coder and decoder
US9060172B2 (en) Methods and systems for mixed spatial resolution video compression
US20120307897A1 (en) Video encoder, video decoder, method for video encoding and method for video decoding, separately for each colour plane
CN104618724B (en) Method and apparatus for video encoding or decoding
EP2355515B1 (en) Scalable video coding
US20150365698A1 (en) Method and Apparatus for Prediction Value Derivation in Intra Coding
CN103442228B (en) Code-transferring method and transcoder thereof in from standard H.264/AVC to the fast frame of HEVC standard
CN101729892A (en) Coding method of asymmetric stereoscopic video
US20150043636A1 (en) Method and device for processing video signal
US20180278943A1 (en) Method and apparatus for processing video signals using coefficient induced prediction
JP2022172137A (en) Method and apparatus for image filtering with adaptive multiplier coefficients
CN110677644A (en) Video coding and decoding method and video coding intra-frame predictor
CN103716629A (en) Image processing method, device, coder and decoder
CN104780383B (en) A 3D-HEVC multi-resolution video coding method
CN103379349B (en) A kind of View Synthesis predictive coding method, coding/decoding method, corresponding device and code stream
KR102163477B1 (en) Video encoding and decoding method and device using said method
CN104282030A (en) Image compression device and method
CN107770537B (en) Light field image compression method based on linear reconstruction
CN102158710A (en) Depth view encoding rate distortion judgment method for virtual view quality
Mathew et al. Wasp encoder with breakpoint adaptive dwt coding of disparity maps
CN103702121B (en) Perceptual video coding method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant