WO2006082541A2

WO2006082541A2 - Segmentation of an image

Info

Publication number: WO2006082541A2
Application number: PCT/IB2006/050264
Authority: WO
Inventors: Fabian E. Ernst; Patrick P. E. Meuwissen
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2005-02-07
Filing date: 2006-01-25
Publication date: 2006-08-10
Also published as: WO2006082541A3

Abstract

Segmentation for enabling conversion of a two-dimensional image to a three-dimensional image.

Description

Segmentation of an image

The present invention is generally related to the enabling of conversion of images provided in two dimensions to three or more dimensions and more particularly to a method, device and computer program product for segmenting an image in a signal as well as to a device for providing a multi-dimensional image.

There has in recent years been vast research in the area of conversion of two-dimensional images into three-dimensional images, for instance in order to provide 3D TV. When doing this it is customary to segment an image. The objective of segmenting is to group together pixels of an image having, for instance, the same color. This process, which requires quite heavy processing capabilities, can then be performed for all images of a signal. This guarantees the accuracy of the provided segments. However, the segments might not be consistent from image to image. There is furthermore a latency requirement regarding these images, which means that there is a certain time limit within which a following image has to be processed after a preceding image. This latency requirement can be hard to meet if fresh segmentations are performed for each image.

Furthermore, in many applications the images are made up of objects that move from image to image. In order to reduce processing, a segment from a previous image can then be moved into a next image, which means that earlier processing can be used. This provides a consistency between different images. However then it is possible that objects that have been blocked in previous images are not taken care of, and new objects entering the scene are not handled either.

WO 2004057460 describes the division of an image into regions or tiles. This tiling is then used for motion estimation. Motion compensation can then be used for moving a segment from image to image.

In regard to this it would be interesting to provide an approach where the different ways of providing segments in an image were combined in a way that reduces the required computational power, while at the same time providing a balance between the accuracy and consistency requirements. One object of the present invention is therefore to provide an improved segmenting selection scheme, and in particular one where segmentation for an image is provided in a way which balances the requirements of accuracy and consistency as well as limits the computational power needed.

According to a first aspect of the present invention, this object is achieved by a method of segmenting an image in a signal comprising a number of images comprising the steps of:

- dividing a present image in the signal into a number of regions,

- selecting a limited number of regions,

- applying a segmentation scheme on said limited number of regions that provides at least one fresh segment, and

- providing at least one segment created for a previous image in the signal to the other regions of the present image.

According to a second aspect of the present invention, this object is also achieved by a device for segmenting an image in a signal comprising a number of images and comprising:

- a segmenting unit arranged to apply a segmentation scheme to images, and

- a control unit arranged to:

  - divide a present image in the signal into a number of regions,

  - select a limited number of regions,

  - order said segmenting unit to apply said segmentation scheme on said selected regions that provides at least one fresh segment, and

  - provide at least one segment created for a previous image in the signal to the other regions of the present image.

According to a third aspect of the present invention, this object is also achieved by a device for providing a multi-dimensional image out of a signal comprising a number of two-dimensional images and comprising:

- an image obtaining unit arranged to obtain the signal,

- a device for segmenting an image in the signal and comprising:

  - a segmenting unit arranged to apply segmentation schemes to images, and

  - a control unit arranged to:

    - divide a present image in the signal into a number of regions,

    - select a limited number of regions,

    - order said segmenting unit to apply said segmentation scheme on said selected regions that provides at least one fresh segment, and

    - provide at least one segment created for a previous image in the signal to the other regions of the present image,

- a motion estimation unit for generating motion vectors to be applied on created segments,

- a motion compensating unit arranged to motion compensate segments of images provided by the device for segmenting, and

- a conversion unit arranged to code segmented images into a signal (X) having a format allowing the provision of multi-dimensional images.

According to a fourth aspect of the present invention, this object is also achieved by a computer program product for segmenting an image in a signal comprising a number of images and comprising computer program code, to make a computer execute, when said program code is loaded in the computer:

- divide a present image in the signal into a number of regions,

- select a limited number of regions,

- apply a segmentation scheme on said selected regions that provides at least one fresh segment, and

- provide at least one segment created for a previous image in the signal to the other regions of the present image.

With the present invention there is provided a balance between the accuracy and consistency requirements of segmentation of an image while at the same time allowing a limiting of the required processing power. The invention thus allows the provision of a good segmentation using limited computational power and fulfilling the latency requirements of the segmenting process. The present invention is furthermore scalable, which allows changing the number of selected regions for fulfilling the latency requirements. The invention furthermore allows the provision of devices for segmenting at a low cost.

According to claim 2 the selection of the limited number of regions is based on computational resource restrictions where the number is set according to the computational resource restrictions. This feature has the advantage of guaranteeing that the processing power is used as efficiently as possible while at the same time meeting the latency requirements. According to claim 3 the providing of at least one segment created for a previous image comprises motion compensating at least one segment created for said previous image and according to claim 4, a selection criterion is used that is based on the movement of segments of said previous image in relation to each other and to a region of the present image. This has the advantage of limiting the selection to regions, where there is known to have been changes in relation to a previous image.

According to claim 5, the selection criterion is based on a motion compensated segment leaving an area of a region of the present image compared with the previous image. This has the advantage of selecting regions having areas that are not occupied by segments. These regions are regions that likely need a fresh segmentation.

According to claim 6 the selection criterion is based on counting, in each region, pixels of all areas left by motion compensated segments and applying a segmentation scheme for regions with the highest count, and according to claim 7 the count is reset for regions where a fresh segmentation has been performed. This has the advantage of guaranteeing that regions where there are only small changes will also receive a fresh segmentation from time to time.

Claims 8 and 9 are directed towards alternative ways of selecting regions for fresh segmentations.

According to claim 10 segments provided at the borders of regions are stitched by combining them. This feature has the advantage of making the segments consistent from image to image and, more importantly, of reducing the effect of the region boundaries, especially where a fresh segmentation has been performed.

According to claim 11 the stitching comprises combining at least two segments in neighboring regions if a quality measure, after such a combining, is kept below a quality measure threshold. This feature has the advantage of only stitching those segments that can reasonably be expected to form the same segment, which further enhances the consistency of segments from image to image.

According to claim 12 the applying of a segmentation scheme or providing at least one segment created for a previous image is performed in parallel for all the regions of the image. This has the advantage of speeding up the segmentation processing and thus helps in meeting the latency requirements.

The basic idea of the invention is to divide a present image in a signal into a number of regions, select a limited number of regions, apply a segmentation scheme on the selected regions that provides at least one fresh segment and provide segments created for a previous image in the signal to the other regions of the present image. In this way the required computational power is reduced, while at the same time striking a balance between the accuracy and consistency requirements of the segmentation.

The above mentioned and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.

The present invention will be further described in relation to the accompanying drawings, in which:

Figs. 1A and 1B schematically show two images of a scene,

Fig. 2 schematically shows a block schematic of a device for providing a multi-dimensional image according to the invention,

Fig. 3 schematically shows a block schematic of a device for segmenting an image according to the present invention,

Fig. 4 schematically shows a flow chart of a method according to the present invention, and

Fig. 5 schematically shows a computer program medium according to the present invention in the form of a CD Rom disc.

The present invention is generally directed towards segmenting of images, which is an important step when processing an image. This is of importance for instance when converting two-dimensional images into three-dimensional images. However, if new segments are to be provided for each image, a lot of processing power is needed, which might not be at hand on all types of devices, especially if they are to be provided on a price sensitive consumer market. This might also reduce the consistency of the segmentation from image to image. It is furthermore possible to motion compensate an already provided segment from one frame to the other, which requires considerably less computational power, but then inconsistencies might also arise because of, for instance, (de-)occlusion of segments.

An example of this is outlined in Figs. 1A and 1B, where Fig. 1A shows a first image I₁ in a signal and Fig. 1B shows a second image I₂ in the signal. The images are provided after one another in the same video sequence and depict the same scene. Thus there are only small changes between the images. The first image I₁ shows a number of objects that have been segmented. A segment is normally provided by combining a number of pixel elements that have the same color. One way in which this can be done is described in the article by F. Ernst, P. Wilinski and K. van Overveld, "Dense structure-from-motion: an approach based on segment matching", Proc. European Conference on Computer Vision 2002, Springer LNCS 2531, Copenhagen, pages II-217 to II-231, which document is herein incorporated by reference.

In Fig. 1A there are shown six segments S₁, S₂, S₃, S₄, S₅ and S₆, where a first and a second segment S₁ and S₂ make up a first object, a third segment S₃ makes up a second object, a fourth segment S₄ makes up a third object and a fifth and sixth segment S₅ and S₆ make up a fourth object. The first and fourth objects are here moving, while the second and third objects are stationary. It should here be realized that only very few segments are shown in order to describe the functioning of the present invention, and that in reality there are normally more segments. In Fig. 1A the image has furthermore been divided into six regions, one of which has received reference numeral 10. In this embodiment the regions are in the form of tiles. The number of regions might also be more or fewer.

As mentioned above the first and fourth objects are moving, where the first object moves to the right, while the fourth object moves to the left. Fig. 1B shows a next image I₂ where the objects from the same scene have been moved. When this movement has been made, it can be seen that the first and second segments S₁ and S₂ are partially provided over the third segment S₃. Here a first overlapping area 12 associated with the first segment S₁ is indicated and a second overlapping area 16 associated with the second segment S₂ is indicated, as well as a first area 14 that the first segment S₁ has left and a second area 18 that the second segment S₂ has left. The same type of areas 22, 26 and 20, 24 are here provided for the fourth and fifth segments S₄ and S₅. As can be seen from Fig. 1B there are areas that might have double occupancy of different segments because of movement, i.e. overlapping areas 12, 16, 22 and 26, as well as empty spaces 14, 18, 20, 24 because of segment movement. These types of areas should then be handled within the time limit that the image format provides. This might be hard because of the limited computational power at hand.

A device 28 for providing multi-dimensional images according to the present invention is shown in a block schematic in Fig. 2. The device 28 comprises an image obtaining unit 30 connected to a device 32 for segmenting images. The segmenting device 32 is in turn connected to a motion estimation unit 34, which in turn is connected to a motion compensating unit 36. The motion compensating unit 36 is also connected to the segmenting device 32. The segmenting device 32 is further connected to a conversion unit 38, which in turn is connected to a unit 40 for generating multi-view images. The unit 40 is finally connected to a display unit 42.

A block schematic of the segmentation device 32 is schematically shown in Fig. 3. This device 32 includes a control unit 44, a segment library 46 and a segmenting unit 48, both separately connected to the control unit 44. The control unit 44 is to be connected to the image obtaining unit 30, the motion estimation unit 34, the motion compensating unit 36 and the conversion unit 38.

The functioning of the present invention will now be described with reference being made to Figs. 1A, 1B, 2, 3 and 4, of which the latter shows a flow chart of a method according to the present invention.

The image obtaining unit 30 first obtains a video signal including a number of images. The signal can for instance be received via the air, via a computer network or from a video camera. It can also be retrieved from a local storage. This signal can be a two-dimensional MPEG coded video signal or include images that are not coded, i.e. a signal where luminance, color and brightness for a number of pixels are provided. In case the signal is an MPEG signal, it is then converted to a non-coded image signal. The signal then comprises a number of images of video, where the content in the previously shown images I₁ and I₂ are examples of images in such a signal. Under the assumption that image I₁ is the first image of a certain scene, this image, which is here also called a previous image, is then provided from the image obtaining unit 30 to the control unit 44 of the segmenting device 32. Since it is the first image of a scene, the whole image is provided by the control unit 44 to the segmenting unit 48, which goes on and performs a fresh segmentation of the whole image according to a segmentation scheme, for instance based on color, like in the referenced article. The different segments are then stored in the segment library 46 by the control unit 44.

Thereafter the image obtaining unit 30 sends the second image I₂ to the segmenting device 32, which segmenting device 32 thus receives this image I₂ that is also denoted present image, step 50. The control unit 44 then divides this image I₂ into regions, step 52. Each region is preferably rectangular in shape, where the horizontal and vertical dimensions are each preferably an integer multiple of eight pixels. In this way the regions can be used for other image based processing. After that the control unit 44 provides the segments S₁ - S₆, here collectively denoted S, to the motion estimation unit 34, which determines motion vectors V for the segments S based on information in the two images I₁ and I₂. The motion vectors V are then provided from the motion estimation unit 34 to the motion compensating unit 36. The motion compensating unit 36 then motion compensates the segments and provides the motion compensated segments S' to the control unit 44, which determines their relevance for the different regions. How the motion vectors can be generated and used for motion compensation of a segment is described in more detail in WO2004/057460, which is herein incorporated by reference.
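The effect of motion compensating segments, as described above, can be illustrated with a minimal sketch. It is not the method of WO2004/057460; it merely assumes that each segment is a list of pixel coordinates and that each motion vector is a whole-pixel shift. The names `motion_compensate`, `segments` and `vectors` are hypothetical. The sketch counts how many shifted segments land on each pixel, so that empty areas such as 14 and 18 show up as 0 and overlapping areas such as 12 and 16 show up as 2 or more.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]   # (row, column)
Vector = Tuple[int, int]  # (d_row, d_column), whole pixels only (assumption)


def motion_compensate(
    segments: Dict[str, List[Pixel]],
    vectors: Dict[str, Vector],
    height: int,
    width: int,
) -> List[List[int]]:
    """Shift every segment by its vector and count segments per pixel.

    Returns an occupancy map: 0 marks a hole (de-occlusion), 1 a pixel
    covered by exactly one segment, 2 or more an overlap (occlusion).
    """
    occupancy = [[0] * width for _ in range(height)]
    for name, pixels in segments.items():
        d_row, d_col = vectors.get(name, (0, 0))  # stationary if no vector
        for row, col in pixels:
            new_row, new_col = row + d_row, col + d_col
            # Pixels pushed outside the image are simply dropped.
            if 0 <= new_row < height and 0 <= new_col < width:
                occupancy[new_row][new_col] += 1
    return occupancy


# Toy 1x6 image: "S1" covers columns 0-2, "S3" covers columns 3-5.
segments = {"S1": [(0, 0), (0, 1), (0, 2)], "S3": [(0, 3), (0, 4), (0, 5)]}
# "S1" moves one pixel to the right, "S3" is stationary.
occupancy = motion_compensate(segments, {"S1": (0, 1)}, height=1, width=6)
print(occupancy)  # -> [[0, 1, 1, 2, 1, 1]]
```

In this toy case column 0 becomes a hole left by the moving segment and column 3 is occupied twice, which mirrors the two kinds of areas shown in Fig. 1B.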

Thus the segments S of the first image I₁ are motion compensated and their relevance for the regions of the second image is determined, step 54. Thereafter the control unit 44 compares each region with a selection criterion SC, step 56, which will be described further below. A limited number of regions that best match the selection criterion are then selected by the control unit 44 for a fresh segmentation, step 58. The number of regions selected in this way is chosen in dependence of the resource restrictions, i.e. the amount of processing power available in order to provide fresh segmentation within the time limit the image format allows. Normally there is a certain time or latency within which the segmentation process of an image has to be completed, and the number of regions selected is decided based upon how well the device 32 can fulfill this requirement. This number can furthermore be pre-specified.

Thereafter the control unit 44 provides the selected regions to the segmenting unit 48, which applies a fresh segmentation on these regions according to the above-described segmentation scheme, step 60. For each selected region there is therefore provided at least one fresh segment and normally several fresh segments. The control unit 44 thereafter provides the motion compensated segmentation, i.e. the previously provided segments that have been motion compensated, to the rest of the regions, step 62. All the regions processed in this way are here processed in parallel.

The control unit 44 then stitches the region borders so that they are consistent with each other, step 64. Stitching is performed in such a way that segments on each side of a region border are combined if the union of these segments leads to a valid segment. If not, only the segments in neighboring regions that share the longest boundary are combined. This feature has the advantage of making the segments consistent from image to image, especially where fresh segmentations have been performed.

Thereafter the control unit 44 updates the segments of the present image in the segment library 46 according to the segmentation made. The method then continues in the same above-described way according to the method steps in Fig. 4 for all images of the same scene. In case a first image of a new scene is then received by the control unit 44, a fresh segmentation is again performed for the whole image, followed by segmenting according to the invention for the following images of that scene. The different method steps performed are also provided in table I, shown below.
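The description leaves open how the number of regions selected in step 58 is derived from the resource restrictions. One possible reading, given here only as a hedged sketch, assumes a simple sequential cost model in which a fresh segmentation of a region takes `fresh_ms`, reusing motion compensated segments takes `reuse_ms`, and the total must fit within `latency_ms`. The function name and all three parameters are assumptions, not taken from the text.

```python
def max_fresh_regions(
    num_regions: int, latency_ms: int, fresh_ms: int, reuse_ms: int
) -> int:
    """Largest k with k * fresh_ms + (num_regions - k) * reuse_ms <= latency_ms.

    Assumes a sequential cost model; with parallel processing of regions
    the budget would have to be modelled differently.
    """
    if fresh_ms <= reuse_ms:
        # Fresh segmentation is no more expensive than reuse: no restriction.
        return num_regions
    spare = latency_ms - num_regions * reuse_ms
    if spare < 0:
        # Even pure reuse misses the deadline; no fresh segmentation fits.
        return 0
    return min(num_regions, spare // (fresh_ms - reuse_ms))


# Six regions, 40 ms per image, 12 ms per fresh region, 2 ms per reused region.
print(max_fresh_regions(6, 40, 12, 2))  # -> 2
```

With these illustrative numbers two fresh regions cost 2 × 12 + 4 × 2 = 32 ms, while three would cost 42 ms and miss the deadline.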

50 RECEIVE IMAGE I₂

52 DIVIDE IMAGE I₂ INTO REGIONS

54 MOTION COMPENSATE SEGMENTS OF IMAGE I₁ AND DETERMINE RELEVANCE FOR REGIONS

56 COMPARE EACH REGION WITH SELECTION CRITERION SC

58 SELECT REGIONS ACCORDING TO RESOURCE RESTRICTIONS THAT MOST CLOSELY CORRESPOND TO SELECTION CRITERION

60 PERFORM FRESH SEGMENTATION FOR SELECTED REGIONS

62 PROVIDE THE MOTION COMPENSATED SEGMENTS TO THE REST OF THE REGIONS

64 STITCH THE REGIONS AT REGION BORDERS

TABLE I
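Steps 52 to 62 of Table I can be sketched as follows. This is an illustrative outline only: the per-region scores stand in for the comparison with the selection criterion SC, `fresh` and `reuse` are hypothetical callables standing in for the segmenting unit 48 and for the motion compensated segments, and stitching (step 64) is left out. Tiles at the image edge are allowed to be smaller than the tile size, which is an assumption made here.

```python
from typing import Callable, Dict, List, Tuple

Tile = Tuple[int, int, int, int]  # (top, left, height, width)


def divide_into_tiles(height: int, width: int, tile: int = 8) -> List[Tile]:
    """Step 52: cut the image into rectangular tiles, row by row."""
    return [
        (top, left, min(tile, height - top), min(tile, width - left))
        for top in range(0, height, tile)
        for left in range(0, width, tile)
    ]


def segment_present_image(
    tiles: List[Tile],
    scores: Dict[Tile, int],
    budget: int,
    fresh: Callable[[Tile], str],
    reuse: Callable[[Tile], str],
) -> Dict[Tile, str]:
    """Steps 56-62: rank tiles by score, segment the top ones afresh,
    and give motion compensated segments to all remaining tiles."""
    ranked = sorted(tiles, key=lambda t: scores[t], reverse=True)
    selected = set(ranked[:budget])
    return {t: fresh(t) if t in selected else reuse(t) for t in tiles}


# A 16x24 image gives six 8x8 tiles, like the six regions of Fig. 1A.
tiles = divide_into_tiles(16, 24)
# Hypothetical scores, e.g. hole-pixel counts per tile.
scores = dict(zip(tiles, [30, 2, 0, 25, 1, 4]))
result = segment_present_image(
    tiles, scores, budget=2, fresh=lambda t: "fresh", reuse=lambda t: "reused"
)
print([result[t] for t in tiles])
# -> ['fresh', 'reused', 'reused', 'fresh', 'reused', 'reused']
```

With a budget of two, the two leftmost tiles receive a fresh segmentation and the other four reuse the motion compensated segments.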

The control unit 44 then provides all segmented images of the input signal, including images I₁ and I₂, to the conversion unit 38, which goes on and codes these images in an appropriate way and includes them in a signal X. The signal X comprises coded images suitable for use as three-dimensional images, perhaps via suitable coding regarding depth of the segments, combinations of the segments into objects, focus as well as other properties. The coding can here be made according to for instance the MPEG4 coding scheme. The signal X is then provided to the unit 40. There multi-view images corresponding to each coded image in signal X are generated. The multi-view images are here provided as sets of images, where each set depicts the same content from different viewpoints. These multi-view images are then provided to the display 42 for display to a user.

The selection criterion described above can be based on those regions where segments have been moved in relation to each other and in relation to the previous image. One such situation is where there is a de-occlusion or "hole" for a number of pixels of a region because of the movement of a segment from an area previously occupied by this segment, i.e. there are many pixels that have no information because of segments being motion compensated. These regions are regions that likely need a fresh segmentation. The hole would however not appear if another segment of the previous image was moving into this area. If there are many such pixels the region in question is selected.

If Fig. 1B is taken as an example and the resource restrictions specify that only two regions can get a fresh segmentation, it can be seen that the two regions furthermost to the left would be selected. This is because the area 14 and part of the area 18 in the uppermost region to the left and the rest of the area 18 in the lowest region to the left are the biggest "holes". This also means that the motion compensated segments S₄, S₅ and S₆ created in the first image I₁ would be used directly without a fresh segmentation, while the segments S₁, S₂ and S₃ all receive a fresh segmentation.

One way to perform the selection is to count the "hole" pixels for each region and select the regions that have the highest count. One variation of this selection is that the count for a non-selected region is kept for the next image. Once a region has been selected for fresh segmentation, the count for that region is reset to zero. This means that regions having small changes will eventually also receive a fresh segmentation in a scene. It is also possible to select a region on the basis of occlusion of objects caused by motion compensation, either instead of or in addition to de-occlusion, i.e. where there are pixel elements that have double occupancy, as indicated by areas 12, 16, 22 and 26. It is then possible also to count these pixels and select the regions having the highest count.
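The counting variation described above, with counts carried over between images and reset on selection, can be sketched as follows. The sketch assumes an occupancy map in which 0 marks a hole pixel, square tiles of a given size, and a dictionary of running totals kept between images. The names `count_holes`, `select_regions` and `accumulated` are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # (tile_row, tile_column)


def count_holes(occupancy: List[List[int]], tile: int) -> Dict[Region, int]:
    """Count, per tile, the pixels that no motion compensated segment covers."""
    counts: Dict[Region, int] = {}
    for row, line in enumerate(occupancy):
        for col, value in enumerate(line):
            region = (row // tile, col // tile)
            counts[region] = counts.get(region, 0) + (1 if value == 0 else 0)
    return counts


def select_regions(
    accumulated: Dict[Region, int], new_counts: Dict[Region, int], budget: int
) -> List[Region]:
    """Add the new counts to the running totals, pick the `budget` regions
    with the highest totals, and reset the totals of the picked regions."""
    for region, count in new_counts.items():
        accumulated[region] = accumulated.get(region, 0) + count
    ranked = sorted(accumulated, key=lambda r: accumulated[r], reverse=True)
    selected = ranked[:budget]
    for region in selected:
        accumulated[region] = 0  # fresh segmentation done, start counting anew
    return selected


accumulated: Dict[Region, int] = {}
# Image 2: three hole pixels in the left tile, one in the right tile.
first = select_regions(accumulated, count_holes([[0, 1, 1, 1], [0, 0, 1, 0]], 2), 1)
# Image 3: only the single hole in the right tile remains.
second = select_regions(accumulated, count_holes([[1, 1, 1, 1], [1, 1, 1, 0]], 2), 1)
print(first, second)  # -> [(0, 0)] [(0, 1)]
```

The right tile never has the larger count within a single image, yet its carried-over total eventually wins, which is the behaviour the reset is meant to achieve. Counting pixels with occupancy of 2 or more instead would give the occlusion-based variant.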

The present invention has the advantage of allowing limited resources to be used while at the same time allowing a reasonable quality to be obtained. Thus a balance between the accuracy and consistency requirements of segmentation of an image is struck. The invention thus allows the provision of a good segmentation using limited computational power and fulfilling the latency requirements of the segmenting process. The present invention is furthermore scalable, which allows changing the number of regions for which a segmentation scheme is applied that might be needed for fulfilling the latency requirements. The invention furthermore allows the provision of low cost devices for segmenting. By basing the selection of regions on computational resource restrictions it is guaranteed that the processing power is used as efficiently as possible while at the same time meeting the latency requirements. By basing the selection criterion on a motion compensated segment leaving an area of a region, the selection of fresh segmentation is limited to regions, where there is known to have been changes in relation to a previous image. The parallel processing of the regions has the further advantage of speeding up the segmentation processing and thus helps in meeting the latency requirements.

There are a number of further variations that are possible to make in relation to the present invention. The selection criterion can be based on a quality measure of the previous segmentation, such as a variance measure of the average color, which is obtained by taking the root mean square value of for instance the color or brightness, and selecting those regions for which the quality is low, i.e. the variance is high. It is furthermore possible to investigate the variation in motion provided for a region. The quality measure can furthermore also be applied when determining what stitching is to be performed, i.e. to combine segments if the quality measure stays below a threshold after such stitching has been made. This has the advantage of only stitching those segments that can reasonably be expected to form the same segment, which further enhances the consistency of segments from image to image.

The regions in the embodiment described above were provided as tiles. They can however have any two-dimensional structure. When there is a scene change, there is no need for a time consistency requirement, as subsequent images are not correlated. Hence it is possible to then provide a fresh segmentation for a whole image. However, it is also possible to apply the selection according to the present invention in this case. Then the effects of a scene change will be taken care of over a number of frames.

There are other variations that are possible to make to the present invention, where one such variation is that the display and possibly also the unit for multi-viewing are provided in another device, with which the device according to the present invention is communicating.
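The quality measure mentioned above for stitching can be sketched in a minimal form. The sketch assumes that a segment is represented by a flat list of brightness values and that the measure is the root mean square deviation from the mean of those values. The function names and the threshold value are illustrative assumptions, not values given in the description.

```python
import math
from typing import List


def rms_deviation(values: List[float]) -> float:
    """Root mean square deviation of pixel values from their mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def should_stitch(a: List[float], b: List[float], threshold: float) -> bool:
    """Combine two border segments only if the measure of their union
    stays below the threshold."""
    return rms_deviation(a + b) < threshold


left = [100.0, 102.0, 98.0]
similar = [101.0, 99.0, 100.0]
different = [200.0, 198.0, 202.0]
print(should_stitch(left, similar, threshold=5.0))    # -> True
print(should_stitch(left, different, threshold=5.0))  # -> False
```

The same `rms_deviation` value could also serve as a selection criterion, by picking the regions whose segments show the highest deviation for a fresh segmentation.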

The control unit and the segmenting unit are preferably provided in the form of a processor with associated program memory, which comprises program code for performing the method according to the present invention. This program code can then be provided in the form of a computer program product which can be in the form of a CD Rom disc. One such disc 66 is generally shown in Fig. 5. It should be realized that other types of products are also feasible like for instance memory sticks. The program code can furthermore be downloaded into the device from a remote server.

All parts of the device for providing multi-dimensional images except for the display can furthermore be implemented in any suitable form including hardware, software, firmware or combinations of these. The elements and components of an embodiment of the invention may furthermore be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or may be physically and functionally distributed between different units and processors.

Although the present invention has been described in connection with specific embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. In the claims, the term comprising does not exclude the presence of other elements or steps. Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally although individual features may be included in different claims, these may possibly be advantageously combined and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. In addition singular references do not exclude a plurality. Thus references to "a", "an", "first", "second" etc. do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.

Claims

CLAIMS:

1. Method of segmenting an image in a signal comprising a number of images comprising the steps of:

- dividing a present image (I₂) in the signal into a number of regions (10), (step 52),

- selecting a limited number of regions, (step 58),

- applying a segmentation scheme on said selected regions that provides at least one fresh segment, (step 60), and

- providing at least one segment (S) created for a previous image (I₁) in the signal to the other regions of the present image, (step 62).

2. Method according to claim 1, wherein the step of selecting a limited number of regions is based on computational resource restrictions, where the number is set according to the computational resource restrictions.

3. Method according to claim 1, wherein the step of providing at least one segment created for a previous image comprises motion compensating at least one segment created for said previous image.

4. Method according to claim 3, wherein the step of selecting uses a selection criterion that is based on the movement of segments (S₁, S₂, S₄, S₅) of said previous image in relation to each other and to a region of the present image.

5. Method according to claim 4, wherein said selection criterion is based on a motion compensated segment leaving an area of a region of the present image compared with the previous image.

6. Method according to claim 5, wherein said selection criterion is based on counting, in each region, pixels of all areas left by motion compensated segments and performing the step of applying a segmentation scheme for regions with the highest count.

7. Method according to claim 6, wherein the count is reset for regions where the step of applying a segmentation scheme has been performed.

8. Method according to claim 4, wherein said selection criterion is based on at least two segments occupying the same area in a region of the present image.

9. Method according to claim 4, wherein said selection criterion is based on a quality measure of a segment of the previous image being moved in a region.

10. Method according to claim 1, further comprising the step of stitching segments provided at the borders of at least two regions of said present image by combining them, (step 64).

11. Method according to claim 10, wherein the step of stitching comprises combining at least two segments in neighboring regions if a quality measure, after such a combining, is kept below a quality measure threshold.

12. Method according to claim 1, wherein the step of applying a segmentation scheme or providing at least one segment created for a previous image is performed in parallel for all the regions of the image.

13. Device (32) for segmenting an image in a signal comprising a number of images and comprising:

- a segmenting unit (48) arranged to apply a segmentation scheme to images, and

- a control unit (44) arranged to:

  - divide a present image (I₂) in the signal into a number of regions (10),

  - select a limited number of regions,

  - order said segmenting unit to apply said segmentation scheme on said selected regions that provides at least one fresh segment, and

  - provide at least one segment (S) created for a previous image (I₁) in the signal to the other regions of the present image.

14. Device (28) for providing a multi-dimensional image out of a signal comprising a number of two-dimensional images (I₁, I₂) and comprising:

- an image obtaining unit (30) arranged to obtain the signal,

- a device (32) for segmenting an image in the signal and comprising:

  - a segmenting unit (48) arranged to apply a segmentation scheme to images, and

  - a control unit (44) arranged to:

    - divide a present image (I₂) in the signal into a number of regions,

    - select a limited number of regions,

    - order said segmenting unit to apply said segmentation scheme on said selected regions that provides at least one fresh segment, and

    - provide at least one segment (S) created for a previous image (I₁) in the signal to the other regions of the present image,

- a motion estimation unit (34) for generating motion vectors (V) to be applied on created segments,

- a motion compensating unit (36) arranged to motion compensate segments (S') of images provided by the device for segmenting, and

- a conversion unit (38) arranged to code segmented images into a signal (X) having a format allowing the provision of multi-dimensional images.

15. Computer program product (66) for segmenting an image in a signal comprising a number of images and comprising computer program code, to make a computer execute, when said program code is loaded in the computer:

- divide a present image (I₂) in the signal into a number of regions (10),

- select a limited number of regions,

- apply a segmentation scheme on said selected regions that provides at least one fresh segment, and

- provide at least one segment (S) created for a previous image (I₁) in the signal to the other regions of the present image.