CN100565589C

CN100565589C - Apparatus and method for depth perception

Info

Publication number: CN100565589C
Application number: CNB2006800022610A
Authority: CN
Inventors: F·E·厄恩斯特; M·J·R·奥普德比克; C·瓦雷坎普
Original assignee: Koninklijke Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2005-01-12
Filing date: 2006-01-12
Publication date: 2009-12-02
Anticipated expiration: 2026-01-12
Also published as: WO2006075304A3; DE602006005785D1; JP5058820B2; EP1839267A2; JP2008527561A; ES2323287T3; CN101103380A; EP1839267B1; KR101249236B1; WO2006075304A2; US8270768B2; ATE426218T1; KR20070105994A; US20090003728A1

Abstract

The invention discloses a method of rendering a multi-view image, comprising a first output image and a second output image, on the basis of an input image (102). The method comprises: creating a modulation image (100) comprising irregularly shaped objects (106-112); modulating pixel values of a portion of the input image (102) on the basis of further pixel values of the modulation image (100), thereby forming an intermediate image (104); and generating the multi-view image by warping the intermediate image on the basis of disparity data.

Description

Apparatus and method for depth perception

Technical field

The invention relates to a method of rendering a multi-view image on the basis of an input image and disparity data.

The invention further relates to a rendering unit for rendering a multi-view image on the basis of an input image and disparity data.

The invention further relates to an image processing apparatus comprising such a rendering unit.

The invention further relates to a computer program product to be loaded by a computer arrangement comprising processing means and a memory, the computer program product comprising instructions for rendering a multi-view image on the basis of an input image and disparity data.

Background art

Ever since display devices were introduced, a realistic 3-D display device has been a dream for many. Many principles that could lead to such a display device have been investigated. Some principles attempt to create a realistic 3-D object in a certain volume. For example, in the display device disclosed in the article "Solid-state Multi-planar Volumetric Display" by A. Sullivan, in Proceedings of SID'03 (1531-1533, 2003), visual data is displaced over a series of planes by a fast projector. Each plane is a switchable diffuser. If the number of planes is sufficiently high, the human brain integrates the pictures and observes a realistic 3-D object. This principle allows a viewer to look around the object to some extent. In this display device all objects are (semi-)transparent.

Many other principles attempt to create a 3-D display device based on binocular disparity only. In these systems the left and right eye of the viewer perceive different images, and consequently the viewer perceives a 3-D image. An overview of these concepts can be found in the book "Stereo Computer Graphics and Other True 3-D Technologies", D.F. McAllister (Ed.), Princeton University Press, 1993. A first principle uses stereoscopic shutter glasses in combination with, for example, a CRT. When an odd frame is displayed, light is blocked for the left eye, and when an even frame is displayed, light is blocked for the right eye.

Display devices that show 3-D without the need for additional appliances are called auto-stereoscopic display devices.

A first glasses-free display device comprises a barrier to create cones of light aimed at the left and right eye of the viewer. The cones correspond, for example, to the odd and even sub-pixel columns. By addressing these columns with the appropriate information, the viewer obtains different images in his left and right eye if he is positioned at the correct spot, and is able to perceive a 3-D picture.

A second glasses-free display device comprises an array of lenses to image the light of the odd and even sub-pixel columns to the viewer's left and right eye.

A disadvantage of the glasses-free display devices mentioned above is that the viewer has to remain at a fixed position. To guide the viewer, indicators have been proposed that show the viewer whether he is at the correct position. See for example US patent US5986804, in which a barrier plate is combined with a red and a green LED. Depending on whether he is correctly positioned, the viewer sees a green or a red light.

To relieve the viewer of sitting at a fixed position, multi-view auto-stereoscopic display devices have been proposed. See for example US60064424 and US20000912. In the display devices disclosed in US60064424 and US20000912, a slanted lenticular is used, whereby the width of the lenticular is larger than two sub-pixels. In this way there are several images next to each other, and the viewer has some freedom to move to the left and to the right.

To generate a 3-D impression on a multi-view display device, images from different virtual viewpoints have to be rendered. This requires either multiple input views or some 3-D or depth information. This depth information can be recorded, generated from multi-view camera systems, or generated from conventional 2-D video material. For generating depth information from 2-D video, several types of depth cues can be applied, such as structure from motion, focus information, geometric shapes and dynamic occlusion. The aim is to generate a dense depth map, i.e. one depth value per pixel. This depth map is subsequently used in rendering a multi-view image to give the viewer a depth impression. The article "Synthesis of multi viewpoint images at non-intermediate positions" by P.A. Redert, E.A. Hendriks and J. Biemond, in Proceedings of International Conference on Acoustics, Speech and Signal Processing, Vol. IV, ISBN 0-8186-7919-0, pages 2749-2752, IEEE Computer Society, Los Alamitos, California, 1997, discloses a method of extracting depth information and of rendering a multi-view image on the basis of the input image and the depth map. The multi-view image is a set of images to be displayed by a multi-view display device to create a 3-D impression. Typically, the images of the set are created on the basis of an input image. One of these images is created by shifting the pixels of the input image by respective amounts of shift. These amounts of shift are called disparities. So, typically, for each pixel there is a corresponding disparity value, and together these values form a disparity map. Disparity values and depth values are typically inversely related, i.e.:

$S = \frac{C}{D} \qquad (1)$

where S is the disparity, C is a constant and D is the depth. Creating a depth map is considered to be equivalent to creating a disparity map.
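As an illustration of equation (1), the following minimal sketch converts a dense depth map into a disparity map. The function name `depth_to_disparity`, the list-of-rows image layout and the value chosen for the constant C are assumptions made for this example only; the text does not specify them.

```python
def depth_to_disparity(depth_map, c=100.0):
    """Convert a dense depth map into a disparity map using S = C / D.

    depth_map: list of rows of positive depth values (assumed layout).
    c: the constant C of equation (1); the default is an arbitrary assumption.
    """
    return [[c / d for d in row] for row in depth_map]


# One depth value per pixel gives one disparity value per pixel.
depth = [[10.0, 20.0], [50.0, 100.0]]
disparity = depth_to_disparity(depth)
```

In this sketch, pixels with a larger depth value receive a smaller disparity, which reflects the inverse relation stated above.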

For homogeneous regions of a 2-D input image, i.e. regions with substantially no texture, it is difficult or sometimes impossible for a viewer of a multi-view display device to infer at what depth they lie. Typically, the object corresponding to such a homogeneous region is perceived as lying at screen level. With a homogeneous background such as a blue sky, the depth perceived on the multi-view display device is therefore relatively small. A cloudless sky is perceived at screen level, so other objects cannot be placed behind the screen while keeping a correct depth impression, which seriously reduces the depth impression.

Summary of the invention

It is an object of the invention to provide a method of the kind described in the opening paragraph which results in an increased depth impression.

This object of the invention is achieved in that the method comprises:

- creating a modulation image comprising irregularly shaped objects;

- modulating pixel values of a portion of the input image on the basis of further pixel values of the modulation image, thereby forming an intermediate image; and

- generating the multi-view image by warping the intermediate image on the basis of the disparity data.

Giving the viewer a 3-D impression on a multi-view display device relies on showing the first output image to the left eye and the second output image to the right eye. The differences between these output images are interpreted by the human brain as a 3-D image. The output images are constructed by shifting objects of the input image relative to each other. The amount of shift is determined by the depth of the objects. The brain recognizes the correspondence between the objects in the different views, i.e. the output images, and infers the geometry from the differences. If an object is substantially textureless, such a correspondence is hard to make, as there is no feature for the eyes to "lock on" to. Imagine a homogeneous black surface. Shifting it to the left or right does not change it. Hence, the depth at which this surface resides cannot be inferred on the basis of disparity.

By modulating pixel values of a portion of the input image on the basis of further pixel values of the modulation image, features are introduced. These features, which correspond to the irregularly shaped objects, are primarily visible in the substantially homogeneous regions of the input image. It is then possible to render a first output image and a second output image that are visibly different in the regions corresponding to the portion of the input image that was substantially homogeneous before the modulation. The viewer can now make a correspondence between the irregularly shaped objects introduced in the first and in the second output image.

Preferably, the size of the irregularly shaped objects is related to the disparity data. For instance, the average size of the irregularly shaped objects has the same order of magnitude as the average value of the disparity data. Suppose the disparity data comprises values in the range of 1-15 pixels; it is then advantageous if the dimensions, i.e. the height and width, of the irregularly shaped objects are substantially in the same range. Preferably, the average diameter of the irregularly shaped objects is about 7-8 pixels for an image of 1000*1000 pixels. Average diameter means the average distance between two edges.

The modulation of pixel values of the input image may cover pixels spread over the whole input image. Preferably, the modulation covers the portion of the input image that corresponds to a substantially homogeneous region. Preferably, the modulation is such that the luminance values of a first part of the pixels of the input image are increased while the luminance values of a second part of the pixels of the input image are decreased. For instance, the first part of the pixels of the input image corresponds to the set of pixels of the modulation image that represent the irregularly shaped objects, while the second part corresponds to another set of pixels of the modulation image that represent the background. Preferably, the average luminance value is not affected by the modulation, i.e. the average luminance value of the input image and the average luminance value of the intermediate image are substantially equal to each other.

In an embodiment of the method according to the invention, creating the modulation image comprises:

- creating a first image by generating noise;

- filtering the first image with a low-pass filter, resulting in a second image; and

- classifying the pixels of the second image by means of a threshold, resulting in the modulation image.

Preferably, the noise is generated by a random noise generator. The characteristics of the low-pass filter are preferably related to the disparity data, in order to create irregularly shaped objects of appropriate dimensions. The classification is such that groups of connected pixels are labeled as belonging to respective irregularly shaped objects, while other groups of connected pixels are labeled as background.
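The three steps above (noise, low-pass filter, threshold) can be sketched as follows. This is only an illustrative reading of the text: the box filter, the use of the median as threshold, the amplitude of plus or minus 1, the fixed seed and all function names are assumptions, and a practical implementation would probably use an array library rather than nested lists.

```python
import random


def box_blur(image, radius):
    """Low-pass filter: mean over a (2*radius+1)^2 window, cropped at the image borders."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[yy][xx]
                    count += 1
            blurred[y][x] = total / count
    return blurred


def create_modulation_image(width, height, radius=3, amplitude=1, seed=0):
    """Create a modulation image of blob-like objects (+amplitude) on a background (-amplitude)."""
    rng = random.Random(seed)
    # Step 1: the first image is uniform random noise in [0, 1).
    noise = [[rng.random() for _ in range(width)] for _ in range(height)]
    # Step 2: the second image is the low-pass filtered noise; a larger radius
    # gives larger blobs, so the radius could be tied to the typical disparity.
    smooth = box_blur(noise, radius)
    # Step 3: classify each pixel against a threshold (here the median, an
    # assumption that makes objects and background cover similar areas).
    values = sorted(v for row in smooth for v in row)
    threshold = values[len(values) // 2]
    return [[amplitude if v >= threshold else -amplitude for v in row] for row in smooth]


modulation = create_modulation_image(32, 32)
```

Because neighbouring pixels of the filtered noise are correlated, pixels above the threshold tend to form connected groups, which play the role of the irregularly shaped objects.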

In an embodiment of the method according to the invention, the modulation of pixel values is based on the disparity data. Preferably, the increase and decrease of luminance values depend on the local depth value, and hence on the local disparity value, of a pixel. Preferably, the amount of increase and/or decrease is higher for objects of the input image that are further away from the viewer.

In an embodiment of the method according to the invention, the modulation image is created on the basis of a motion vector that is computed on the basis of the sequence of input images to which the input image belongs. Suppose the method according to the invention is applied to a sequence of input images representing motion, e.g. a sequence of video images corresponding to a panning camera. If every input image of the sequence were modulated with the same modulation image and displayed on a multi-view display device, the result could look as if the sequence of output images were watched through a dirty window. To prevent this, each input image is preferably modulated with its own modulation image. The modulation image for modulating a particular input image may be based on a further modulation image that was created for a previous input image, i.e. an image preceding the particular input image. Preferably, the modulation image for the particular input image is obtained by shifting the further modulation image in a direction and by an amount related to the motion in the scene. Preferably, the further modulation image is shifted with a motion vector that is computed by analyzing or modeling the motion vector field corresponding to the particular input image.
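Shifting a previous modulation image by a single motion vector can be sketched as below. The cyclic wrap-around at the image borders, the integer motion vector and the name `shift_modulation_image` are assumptions for illustration; the text does not say how pixels entering at the border are obtained.

```python
def shift_modulation_image(modulation, dx, dy):
    """Shift a modulation image by the motion vector (dx, dy), in pixels.

    Content that leaves the image on one side re-enters on the other side
    (cyclic shift); this border handling is an assumption.
    """
    height, width = len(modulation), len(modulation[0])
    return [
        [modulation[(y - dy) % height][(x - dx) % width] for x in range(width)]
        for y in range(height)
    ]


# A single object pixel moves one position to the right, following a
# hypothetical global motion vector (1, 0) estimated for a panning camera.
previous = [
    [1, 0, 0],
    [0, 0, 0],
]
current = shift_modulation_image(previous, dx=1, dy=0)
```

With this kind of shift the introduced objects move along with the scene instead of staying fixed on the screen, which is the behaviour the dirty-window remark above is meant to avoid.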

It is a further object of the invention to provide a rendering unit of the kind described in the opening paragraph which results in an increased depth impression.

This object of the invention is achieved in that the rendering unit comprises:

- creating means for creating a modulation image comprising irregularly shaped objects;

- modulating means for modulating pixel values of a portion of the input image on the basis of further pixel values of the modulation image, thereby forming an intermediate image; and

- generating means for generating the multi-view image by warping the intermediate image on the basis of the disparity data.

It is a further object of the invention to provide an image processing apparatus comprising a rendering unit of the kind described in the opening paragraph which results in an increased depth impression.

This object of the invention is achieved in that the rendering unit comprises:

- modulating means for modulating pixel values of a portion of the input image on the basis of further pixel values of the modulation image, thereby forming an intermediate image; and

It is a further object of the invention to provide a computer program product of the kind described in the opening paragraph which results in an increased depth impression.

This object of the invention is achieved in that the computer program product, after being loaded, provides the processing means with the capability to carry out:

- modulating pixel values of a portion of the input image on the basis of further pixel values of the modulation image, thereby forming an intermediate image; and

Modifications of the rendering unit, and variations thereof, may correspond to modifications and variations of the image processing apparatus, the method and the computer program product.

Brief description of the drawings

These and other aspects of the rendering unit, the image processing apparatus, the method and the computer program product according to the invention will become apparent from, and will be elucidated with respect to, the implementations and embodiments described hereinafter and with reference to the accompanying drawings, in which:

Fig. 1 shows a modulation image, an input image and an intermediate image according to the invention;

Fig. 2 schematically shows an embodiment of the rendering unit according to the invention;

Fig. 3 schematically shows a multi-view image generation unit comprising an embodiment of the rendering unit according to the invention;

Fig. 4 schematically shows an embodiment of the modulation image creating device; and

Fig. 5 schematically shows an embodiment of the image processing apparatus according to the invention.

The same reference numerals are used to denote similar parts throughout the figures.

Detailed description

Fig. 1 shows a modulation image 100, an input image 102 and an intermediate image 104 according to the invention. The input image 102 is an image from a video sequence. The modulation image 100 and the input image 102 may have the same dimensions, i.e. comprise the same number of pixels. The input image 102 can then be modulated with the modulation image 100 directly: for each pixel of the input image 102 there is a corresponding pixel in the modulation image 100 that directly determines the amount by which the respective luminance value is increased or decreased. Alternatively, the modulation image 100 and the input image 102 have mutually different dimensions. The modulation of the input image 102 is then performed by applying the modulation image 100 several times, or by applying only a portion of the modulation image 100. Alternatively, the modulation is performed on only a portion of the pixels of the input image.

Preferably, the modulation image 100 comprises a first group of connected pixels 114, which together form the background, and a second group of pixels, which form the foreground objects 106-112. These foreground objects are the irregularly shaped objects. The irregularly shaped objects 106-112 look like stains. Preferably, there is no correlation between the shapes of the irregularly shaped objects 106-112 and the shapes of the objects in the input image 102.

The average size of the irregularly shaped objects 106-112 is related to the amount of disparity, and hence to depth. Note that the individual irregularly shaped objects 106-112 may have mutually different dimensions. Moreover, the amounts of disparity of the different pixels of the input image 102, and hence of the intermediate image 104, typically show a spread. However, the average disparity and the average size of the irregularly shaped objects 106-112 preferably have the same order of magnitude.

Fig. 1 also shows the intermediate image 104 according to the invention, in which the irregularly shaped objects 106-112 are clearly visible. Note that the depicted intermediate image 104 is only an example in which the effect of the modulation is exaggerated for illustration. Preferably, the irregularly shaped objects are hardly perceptible, i.e. they should not be that prominent. Typically, the range and number of distinct luminance values in the modulation image 100 are relatively small compared with the number of luminance values in the input image 102. Suppose the range of luminance values of the input image 102 comprises 256 different values. The range of luminance values of the modulation image 100 then typically comprises the values [-2, 2]. For example, the luminance values of the first group of pixels, i.e. the background 114, are all equal to -2 or -1, while the luminance values of the second group of pixels, i.e. the irregularly shaped objects 106-112, are all equal to +2 or +1.

Fig. 2 schematically shows an embodiment of the rendering unit 200 according to the invention. The rendering unit 200 is arranged to render a multi-view image, comprising a first output image and a second output image, on the basis of the input image 102. The input image 102 is provided at the image input connector 208. The rendering unit 200 provides the first output image and the second output image at its image output connectors 210 and 212. The rendering unit 200 comprises:

- a modulation image creating device 206 for creating the modulation image 100 comprising the irregularly shaped objects 106-112;

- a modulating device 202 for modulating pixel values of a portion of the input image 102 on the basis of further pixel values of the modulation image 100, thereby forming the intermediate image 104; and

- a generating device 204 for generating the first output image and the second output image, wherein the first output image is generated by warping the intermediate image on the basis of a first transformation based on the disparity data, and the second output image is generated by warping the intermediate image on the basis of a second transformation based on the disparity data.
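The role of the generating device can be illustrated with a very simple forward warp. This sketch is not the rendering method of the cited articles: it assumes non-negative disparities, uses scaling by +0.5 and -0.5 as the hypothetical first and second transformations of the disparity data, lets the pixel with the larger disparity win when two pixels land on the same position, and fills holes by repeating the pixel to the left. All names are illustrative.

```python
def warp_row(row, disparities, scale):
    """Forward-warp one image row: pixel x moves to x + round(scale * disparity)."""
    width = len(row)
    out = [None] * width
    best = [-1.0] * width  # disparity of the pixel currently stored at each target
    for x in range(width):
        target = x + int(round(scale * disparities[x]))
        # On collisions the pixel with the larger disparity (assumed nearer) wins.
        if 0 <= target < width and disparities[x] > best[target]:
            out[target] = row[x]
            best[target] = disparities[x]
    # Fill holes by repeating the nearest filled pixel to the left,
    # falling back to the first source pixel at the left border.
    for x in range(width):
        if out[x] is None:
            out[x] = out[x - 1] if x > 0 else row[0]
    return out


def generate_views(intermediate, disparity_map):
    """Return (first, second) output images warped in opposite directions."""
    first = [warp_row(r, d, +0.5) for r, d in zip(intermediate, disparity_map)]
    second = [warp_row(r, d, -0.5) for r, d in zip(intermediate, disparity_map)]
    return first, second


intermediate = [[10, 20, 30, 40]]
disparity_map = [[0, 0, 2, 2]]
first, second = generate_views(intermediate, disparity_map)
```

In the example, the two right-hand pixels have disparity 2 and therefore move one position to the right in the first view and one position to the left in the second view, while the pixels with disparity 0 stay in place.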

The modulation image creating device 206, the modulating device 202 and the generating device 204 may be implemented using one processor. Normally, these functions are performed under the control of a software program product. During execution, the software program product is normally loaded into a memory, such as a RAM, and executed from there. The program may be loaded from a background memory, such as a ROM, a hard disk, or magnetic and/or optical storage, or may be loaded via a network such as the Internet. Optionally, an application-specific integrated circuit provides the disclosed functionality.

An embodiment of the modulation image creating device 206 is described in connection with Fig. 4.

Preferably, the modulating device 202 is arranged to perform the function specified in Equation 2:

$L_{out}(x, y) = L_{in}(x, y) + g(x, y) \cdot L_{mod}(x, y) \qquad (2)$

where:

- $L_{in}(x, y)$ is the luminance value of the pixel with coordinates (x, y) of the input image 102;

- $L_{out}(x, y)$ is the luminance value of the pixel with coordinates (x, y) of the intermediate image 104, i.e. the output of the modulating device;

- $L_{mod}(x, y)$ is the luminance value of the pixel with coordinates (x, y) of the modulation image 100; and

- $g(x, y)$ is a gain factor, which is preferably adjustable by the user. The gain g(x, y) may be equal for all pixels, but preferably each pixel has its own gain factor. The actual value of the gain g(x, y) can be provided via the gain input connector 214.
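Equation 2 can be applied per pixel as in the following minimal sketch. The clipping of the result to the range 0 to 255 is an assumption added here to keep the output a valid 8-bit luminance value; the function name and the list-of-rows layout are likewise illustrative.

```python
def modulate(l_in, l_mod, gain, l_max=255):
    """Apply L_out = L_in + g * L_mod per pixel, clipped to [0, l_max] (assumption)."""
    height, width = len(l_in), len(l_in[0])
    return [
        [
            min(l_max, max(0, l_in[y][x] + gain[y][x] * l_mod[y][x]))
            for x in range(width)
        ]
        for y in range(height)
    ]


l_in = [[100, 100], [100, 255]]   # luminance of the input image
l_mod = [[2, -2], [1, 1]]         # luminance of the modulation image
gain = [[1, 1], [0, 1]]           # per-pixel gain factor g(x, y)
l_out = modulate(l_in, l_mod, gain)
```

A gain of 0 leaves the corresponding input pixel unchanged, which is one way a per-pixel gain could restrict the modulation to selected regions.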

The generating device 204 is arranged to render the first output image and the second output image. The rendering is, for example, as described in the article "Synthesis of multi viewpoint images at non-intermediate positions" by P.A. Redert, E.A. Hendriks and J. Biemond, in Proceedings of International Conference on Acoustics, Speech and Signal Processing, Vol. IV, ISBN 0-8186-7919-0, pages 2749-2752, IEEE Computer Society, Los Alamitos, California, 1997. Alternatively, the rendering is as described in "High-quality images from 2.5D video" by R.P. Berretty and F.E. Ernst, in Proceedings Eurographics, Granada, 2003, Short Note 124. For the rendering, the generating device 204 requires disparity or depth information, which is provided via the disparity input connector 216.

The modulation image creating device 206 may comprise two optional input connectors: a sharpness input connector 220 and a motion vector input connector 218.

Preferably, the introduction of irregularly shaped objects into the input image is limited to the portion of the input image that is substantially homogeneous. This can be achieved by modulating the input image only locally, i.e. in the substantially homogeneous regions. Alternatively, the modulation image creating device 206 takes into account information about the image content of the input image, in particular the presence and location of homogeneous regions. This information may be provided by an external sharpness computation device 302, or may be computed by the rendering unit 200 itself. In both cases, the sharpness information is determined by computing sharpness values for the pixels of an image. Preferably, that image is the particular input image to which the modulation image is to be added, or with which it is to be combined. Alternatively, the sharpness information is determined by computing sharpness values for the pixels of another image from the sequence of images to which the particular input image belongs.

Preferably, the sharpness value of a particular pixel is determined by computing the difference between the luminance and/or color value of the particular pixel and the luminance and/or color value of a pixel neighboring the particular pixel. By computing the sharpness values for the respective pixels of an image, a sharpness map is formed. A relatively large difference between luminance and/or color values means a relatively high sharpness value. Subsequently, the sharpness map is analyzed and optionally adapted. That means that a first region with relatively many pixels having relatively low sharpness values is determined, and that a second region with relatively many pixels having relatively high sharpness values is determined. The first region is assumed to be a homogeneous region and the second region a textured or detailed region. On the basis of this classification, the values of the gain factor g(x, y) are determined and the modulation image 100 is created. Typically, this means that the luminance values L_mod(x, y) of the modulation image 100 corresponding to the first region are such that they have no, or hardly any, effect on the input image 102 during the modulation, e.g. L_mod(x, y) = 0, while the luminance values L_mod(x, y) of the modulation image 100 corresponding to the second region are such that they do have an effect on the input image 102 during the modulation, e.g. L_mod(x, y) = -2, -1, 1 or 2.
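A sharpness map of the kind described above can be sketched as follows. The choice of the 4-connected neighbourhood, the use of the largest absolute luminance difference as the sharpness value, the threshold value and the function names are all assumptions; the sketch only separates low-sharpness from high-sharpness pixels and leaves open how the gain factor is derived from that classification.

```python
def sharpness_map(luma):
    """Per-pixel sharpness: largest absolute luminance difference to a 4-connected neighbour."""
    height, width = len(luma), len(luma[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < height and 0 <= nx < width:
                    diff = abs(luma[y][x] - luma[ny][nx])
                    result[y][x] = max(result[y][x], diff)
    return result


def homogeneous_mask(luma, threshold=4):
    """True where the sharpness value is below the (assumed) threshold."""
    return [[s < threshold for s in row] for row in sharpness_map(luma)]


# A flat area on the left and a strong vertical edge on the right.
luma = [
    [50, 50, 50, 200],
    [50, 50, 50, 200],
]
sharp = sharpness_map(luma)
mask = homogeneous_mask(luma)
```

In the example, the two left-hand columns are classified as low-sharpness pixels, while the two columns adjacent to the edge receive a high sharpness value.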

通过清晰度输入连接器220将包括划分信息的清晰度图提供给渲染单元200。The sharpness map, including the classification information, is provided to the rendering unit 200 through the sharpness input connector 220.

对应于后续的输入图像，创建后续的调制图像可以彼此完全独立地进行。或者，在创建特定调制图像和后续调制图像之间存在关联。通过创建后续的调制图像而考虑后续输入图像之间的运动是有益的。通过分析特定输入图像及其后续者之间的动作，可以确定移位。优选地，将该移位应用于移位特定调制图像，以便获得下一调制图像。优选地，后续的输入图像之间的运动取决于在运动矢量场的基础上建立运动模型。通过运动估计器确定该运动矢量场。该运动估计器例如可见于G.de Haan等人的文章“True-Motion Estimation with 3-D Recursive Search Block Matching”(IEEE Transactions on Circuits and Systems for Video Technology，vol.3，no.5，1993年10月，368-379页)。Subsequent modulated images, corresponding to subsequent input images, can be created completely independently of each other. Alternatively, there is a relation between the creation of a particular modulated image and that of the subsequent modulated image. It is advantageous to take the motion between subsequent input images into account when creating subsequent modulated images. By analyzing the motion between a particular input image and its successor, a shift can be determined. Preferably, this shift is applied to the particular modulated image in order to obtain the next modulated image. Preferably, the motion between subsequent input images is derived from a motion model built on the basis of a motion vector field. The motion vector field is determined by a motion estimator. Such a motion estimator is described, for example, in the article "True-Motion Estimation with 3-D Recursive Search Block Matching" by G. de Haan et al. (IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 5, October 1993, pp. 368-379).
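Shifting a modulated image to obtain the next one can be sketched as follows. The sketch assumes the simplest possible motion model, a single global translation (dx, dy) in whole pixels; the function name `shift_image` and the `fill` value for uncovered pixels are hypothetical. How the shift is derived from the motion vector field is not specified here and is left out.

```python
def shift_image(image, dx, dy, fill=0):
    """Translate an image by (dx, dy) whole pixels.
    Pixels shifted in from outside the image take the value `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = image[src_y][src_x]
    return out


# Hypothetical example: one object pixel moving one step right and down.
current_modulation = [
    [1, 0, 0],
    [0, 0, 0],
]
next_modulation = shift_image(current_modulation, dx=1, dy=1)
```

Keeping the irregularly shaped objects attached to the moving scene content in this way avoids regenerating an unrelated pattern for every frame.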

通过运动矢量输入连接器218将运动信息提供给渲染单元200。The motion information is provided to the rendering unit 200 through the motion vector input connector 218 .

图3示意性地示出了包括根据本发明的渲染单元200的实施例的多视点图像生成单元300。将多视点图像生成单元300设置为在一系列视频图像的基础上生成一系列多视点图像。多视点图像生成单元300在输入连接器308处备有视频图像流，并分别在输出连接器310和312处提供两个相关的视频图像流。将这两个相关的视频图像流提供给多视点显示装置，所述多视点显示装置用于使基于该相关视频图像流中的第一个视频图像流的第一系列视图可视化，并使基于该相关视频图像流中的第二个视频图像流的第二系列视图可视化。如果用户，即观看者，用他的左眼观察第一系列视图，并用他的右眼观察第二系列视图，则他获得3-D印象。相关的视频图像流中的第一个视频图像流可以对应于接收到的视频图像序列，并且，根据本发明的方法可以基于接收到的视频图像序列对相关的视频图像流中的第二个视频图像流进行渲染。优选地，基于接收到的视频图像序列，根据本发明的方法对两个视频图像流进行渲染。Fig. 3 schematically shows a multi-viewpoint image generation unit 300 comprising an embodiment of the rendering unit 200 according to the invention. The multi-viewpoint image generation unit 300 is arranged to generate a sequence of multi-viewpoint images on the basis of a sequence of video images. The multi-viewpoint image generation unit 300 is provided with a stream of video images at the input connector 308 and provides two correlated streams of video images at the output connectors 310 and 312, respectively. These two correlated streams are provided to a multi-viewpoint display device, which is arranged to visualize a first series of views based on the first of the correlated streams and a second series of views based on the second of the correlated streams. If a user, i.e. a viewer, observes the first series of views with the left eye and the second series of views with the right eye, the viewer perceives a 3-D impression. The first of the correlated streams may correspond to the sequence of video images as received, in which case the second of the correlated streams is rendered from the received sequence by the method according to the invention. Preferably, both streams of video images are rendered by the method according to the invention on the basis of the received sequence of video images.

多视点图像生成单元300还包括：The multi-viewpoint image generating unit 300 also includes:

-清晰度计算装置302，用于确定输入图像的哪些区域是均匀的。通过清晰度输入连接器220将清晰度计算装置302的输出提供给渲染单元200。- Sharpness calculation means 302 for determining which regions of the input image are homogeneous. The output of the sharpness calculation means 302 is provided to the rendering unit 200 through the sharpness input connector 220 .

-运动估计器304，用于估计后续输入图像之间的运动。通过运动矢量输入连接器218将运动估计器304的输出提供给渲染单元200；和- A motion estimator 304 for estimating motion between subsequent input images. The output of the motion estimator 304 is provided to the rendering unit 200 through the motion vector input connector 218; and

-深度创建单元306，用于确定输入图像中各个对象的深度信息。基于该深度信息确定视差图，所述视差图通过视差输入连接器216被供给渲染单元200。- A depth creation unit 306 for determining depth information for the various objects in the input image. On the basis of this depth information a disparity map is determined, which is provided to the rendering unit 200 through the disparity input connector 216.

注意，虽然将多视点图像生成单元300设计为处理视频图像，但也可以设置多视点图像生成单元300的替代实施例，以基于单独的图像，即静止画面，来生成多视点图像。Note that although the multi-viewpoint image generation unit 300 is designed to process video images, alternative embodiments of the multi-viewpoint image generation unit 300 may also be configured to generate multi-viewpoint images based on a single image, ie, a still picture.

注意，虽然所述多视点图像生成单元300具有两个输出连接器310和312，但也可以有替代的输出方式。除此之外，形成一个多视点图像的输出图像的数量并不严格限于2个。Note that although the multi-viewpoint image generation unit 300 has two output connectors 310 and 312, alternative output methods are also possible. Besides, the number of output images forming one multi-viewpoint image is not strictly limited to two.

图4示意性地示出了根据本发明的调制图像创建装置206的实施例。所述调制图像创建装置包括：Fig. 4 schematically shows an embodiment of a modulated image creation device 206 according to the present invention. The modulation image creation device includes:

-用于创建第一图像的随机噪声发生器402；- a random noise generator 402 for creating the first image;

-低通滤波器404，用于过滤第一图像，从而形成第二图像。低通滤波器的特点与视差数据相关，以便创建具有适宜尺寸的不规则形状对象；和- A low pass filter 404 for filtering the first image to form the second image. The characteristics of the low-pass filter are correlated with the disparity data in order to create irregularly shaped objects of suitable size; and

-比较装置406，用于将第二图像的像素值与预定阈值进行比较，以便划分第二图像的像素，从而形成调制图像。这样进行划分，即将已连接的像素的组标记为属于各不规则形状对象，同时将已连接的像素的其它组标记为背景。- Comparing means 406 for comparing the pixel values of the second image with a predetermined threshold in order to classify the pixels of the second image, thereby forming the modulated image. The classification is such that groups of connected pixels are labeled as belonging to the respective irregularly shaped objects, while other groups of connected pixels are labeled as background.
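The three stages listed above (random noise, low-pass filtering, thresholding) can be sketched as follows. This is an illustrative sketch under stated assumptions: the box filter stands in for an unspecified low-pass filter, and the names `box_blur` and `create_modulation_image` as well as the `radius`, `threshold` and `seed` parameters are hypothetical. The text relates the filter characteristics to the disparity data; here that relation is reduced to the caller choosing `radius`, for instance in proportion to the mean disparity. Labeling each connected group as a separate object is omitted; pixels are only marked object (1) or background (0).

```python
import random


def box_blur(image, radius):
    """Low-pass filter: mean over a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            out[y][x] = total / count
    return out


def create_modulation_image(width, height, radius, threshold=0.5, seed=0):
    """Noise -> low-pass -> threshold. Returns a binary image in which
    1 marks object pixels and 0 marks background pixels."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    first = [[rng.random() for _ in range(width)] for _ in range(height)]
    second = box_blur(first, radius)  # larger radius gives larger blobs
    return [[1 if value > threshold else 0 for value in row] for row in second]


modulation = create_modulation_image(width=16, height=12, radius=2, seed=1)
```

Because neighboring pixels of the filtered image are strongly correlated, thresholding yields connected blobs with irregular outlines rather than isolated speckles, and a wider filter produces larger blobs.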

图5示意性地示出了根据本发明的图像处理设备500的实施例，包括：Fig. 5 schematically shows an embodiment of an image processing device 500 according to the present invention, including:

-接收单元502，用于接收表示输入图像的视频信号；- a receiving unit 502 for receiving a video signal representing an input image;

-多视点图像生成单元300，用于基于接收到的输入图像生成多视点图像，如结合图3所述；和- a multi-viewpoint image generating unit 300, configured to generate a multi-viewpoint image based on the received input image, as described in connection with FIG. 3; and

-多视点显示装置504，用于显示由多视点图像生成单元300提供的多视点图像。- a multi-viewpoint display device 504 for displaying the multi-viewpoint images provided by the multi-viewpoint image generation unit 300 .

视频信号可以是经由天线或线缆接收到的广播信号，但也可以是来自例如VCR(录像机)或数字化通用光盘(DVD)之类的存储装置的信号。在输入连接器506处提供信号。图像处理设备500例如可以是TV。或者，图像处理设备500不包括可选的显示装置，而是向包括显示装置504的设备提供输出图像。则图像处理设备500可以是例如机顶盒、卫星调谐器、VCR播放器、DVD播放器或记录器。可选地，图像处理设备500包括例如硬盘的存储器件或用于在例如光盘的可移动介质上进行存储的器件。图像处理设备500还可以是由电影公司或广播公司所应用的系统。The video signal may be a broadcast signal received via an antenna or cable, but may also be a signal from a storage device such as a VCR (video cassette recorder) or a digital versatile disc (DVD). The signal is provided at the input connector 506. The image processing device 500 may be, for example, a TV. Alternatively, the image processing device 500 does not include the optional display device but provides the output images to a device that does include a display device 504. In that case the image processing device 500 may be, for example, a set-top box, a satellite tuner, a VCR player, or a DVD player or recorder. Optionally, the image processing device 500 includes storage means such as a hard disk, or means for storage on removable media such as optical discs. The image processing device 500 may also be a system used by a film studio or a broadcaster.

应该注意，上述实施例说明而非限制本发明，并且，本领域技术人员可以设计替代实施方式，而不脱离权利要求的范围。权利要求中，括号中的任何附图标记不构成对权利要求的限制。词语“包括”不排除权利要求中未列举的元件或步骤。元件前面的词语“一”或“一个”不排除存在多个这样的元件。通过包括若干确切的元件的硬件和通过合适的编程计算机可以实现本发明。在列举了若干器件的单元权利要求中，这些器件中的一些可以由一个和相同项的硬件实现。词语第一、第二和第三等等的使用不指示任何排序。可以将这些词语解释为相同。It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any ordering. These words can be interpreted as the same.

Claims

1. A method for rendering a multi-viewpoint image based on an input image (102) and disparity data, comprising:

- creating a modulated image (100) comprising irregularly shaped objects (106-112);

- modulating pixel values of a portion of the input image (102) on the basis of corresponding pixel values of the modulated image (100), thereby forming an intermediate image (104); and

- generating the multi-viewpoint image by warping the intermediate image (104) on the basis of the disparity data.

2. The method of claim 1, wherein the size of the irregularly shaped objects (106-112) is related to the disparity data.

3. The method of claim 2, wherein the mean size of the irregularly shaped objects (106-112) and the mean value of the disparity data are of the same order of magnitude.

4. A method as claimed in any preceding claim, wherein the portion of the input image (102) is a homogeneous region.

5. The method of claim 4, wherein creating the modulated image (100) comprises:

- creating a first image by generating noise;

- filtering the first image using a low-pass filter, thereby forming a second image; and

- classifying the pixels of the second image by thresholding, thereby forming the modulated image (100).

6. The method of claim 4, wherein the pixel values are modulated on the basis of the disparity data.

7. The method of claim 4, wherein the modulated image (100) is created on the basis of motion vectors computed from the sequence of input images to which the input image (102) belongs.

8. A rendering unit (200) for rendering a multi-viewpoint image based on an input image (102) and disparity data, the rendering unit comprising:

- creating means (206) for creating a modulated image (100) comprising irregularly shaped objects (106-112);

- modulating means (202) for modulating pixel values of a portion of the input image (102) on the basis of corresponding pixel values of the modulated image (100), thereby forming an intermediate image (104); and

- generating means (204) for generating the multi-viewpoint image by warping the intermediate image (104) on the basis of the disparity data.

9. An image processing device (500), comprising:

- receiving means (502) for receiving a signal corresponding to the input image (102);

- a rendering unit (200) for rendering multi-viewpoint images according to claim 8; and

- display means (504) for displaying the multi-viewpoint images.