
A Region Based Stereo Matching Algorithm Using Cooperative Optimization

Zeng-Fu Wang and Zhi-Gang Zheng
University of Science and Technology of China, Jinzhai Road 96#, Hefei, Anhui, P.R. China
zfwang@ustc.edu.cn, zzg@ustc.edu.cn

Abstract

This paper presents a new stereo matching algorithm based on inter-regional cooperative optimization. The proposed algorithm uses regions as matching primitives and defines a region energy functional for matching that combines the color statistics of regions with smoothness and occlusion constraints between adjacent regions. To obtain a more reasonable disparity map, a cooperative optimization procedure minimizes the matching costs of all regions by introducing cooperative and competitive mechanisms between regions. First, a color based segmentation method segments the reference image into regions of homogeneous color. Second, a local window-based matching method determines the initial disparity estimate of each image pixel. A voting based plane fitting technique is then applied to obtain the disparity plane parameters of each image region. Finally, the disparity plane parameters of all regions are iteratively optimized by an inter-regional cooperative optimization procedure until a reasonable disparity map is obtained. Experimental results on the Middlebury test set and on real stereo images indicate that the performance of our method is competitive with the best stereo matching algorithms and that the recovered disparity maps are close to the ground truth data.

1. Introduction

Stereo matching is a key problem in computer vision, and a large number of algorithms have been proposed to solve it. However, since the problem is ill-posed, no fully satisfying solution has been found so far [1, 2]. With the development of powerful global optimization methods, such as those based on Graph Cuts and Belief Propagation [3-5, 18, 20], great progress has been made in stereo matching in recent years. To further improve disparity estimation, researchers have paid particular attention to two difficult problems, occlusion and sparse texture, and have presented some promising methods. Among them, the region based methods [10-14, 20-22] have achieved great success and give better results than other approaches in the case of sparse texture. These methods first use color or gray-level information to segment the input images, then obtain an initial disparity estimate of the scene with a known matching algorithm, and finally employ a disparity fitting technique to refine the disparity of each region. These methods, however, share a common shortcoming. Because they use labeling based optimization, they generally need to cluster regions in the disparity plane parameter space before optimization in order to reduce the number of labels used in the later process. Such a treatment reduces the generality of the algorithm and may cause false disparity results: if the parameters of a disparity plane are estimated incorrectly before the final optimization, the errors cannot be corrected, because the disparity plane parameters are kept fixed during the final optimization.

In this paper, we present a new stereo matching algorithm based on inter-regional cooperative optimization.
Similar to other region based approaches, our algorithm uses regions as matching primitives and defines a region energy functional for matching that combines the color statistics of regions with smoothness and occlusion constraints between adjacent regions. To obtain a more reasonable disparity map, a cooperative optimization procedure between adjacent regions minimizes the matching costs of all regions in the image. Unlike other similar algorithms, ours exploits the cooperative and competitive relations between adjacent regions to optimize the total image energy functional. The main contribution of this paper is to combine several known techniques into a successful stereo matching algorithm under the framework of cooperative optimization. The optimization procedure is simple yet efficient and robust. In addition, our algorithm is not parameter sensitive: it can compute the disparities of different scenes with a single fixed set of parameters.

Our algorithm consists of the following steps. First, a color based segmentation method segments the reference image (the left image of the input stereo pair) into regions of homogeneous color. Second, a local window-based matching method determines the initial disparity estimate of each image pixel. A voting based plane fitting technique is then applied to obtain the disparity plane parameters of each image region. Finally, under the framework of inter-regional cooperative optimization, the disparity plane parameters of all regions are iteratively optimized by a local optimization procedure until a reasonable disparity map is obtained. The flowchart of the algorithm is shown in Figure 1. Our algorithm is robust to the initial estimation of region disparities: errors are permitted in the initial disparity estimate, and false matches in the initial estimate can be corrected during later processing. Experimental results on the Middlebury test set and on real stereo images indicate that the performance of our method is competitive with the current best stereo matching algorithms and that the recovered disparity maps are quite close to the ground truth data.

Figure 1 The flowchart of the algorithm: stereo images → image segmentation based on the Mean-shift algorithm → initial disparity estimation based on the adaptive correlation window stereo matching algorithm → robust estimation of the disparity plane parameters based on fitting → cooperative optimization of the disparity plane parameters → disparity computation based on the optimal fitting parameters → disparity map
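To make the initial matching step of the pipeline concrete, the following is a minimal sketch of local window-based matching with winner-take-all (WTA) disparity selection on a rectified grayscale pair. The paper itself uses the adaptive correlation window algorithm of [16]; the fixed-window sum-of-absolute-differences cost, the border handling, and all names below are illustrative assumptions rather than the authors' implementation.

```python
# A minimal sketch (not the paper's method) of local window-based matching with
# winner-take-all disparity selection on a rectified grayscale pair. The paper
# uses the adaptive correlation window algorithm of [16]; the fixed-window SAD
# cost, the border handling and all names below are illustrative assumptions.
import numpy as np

def box_sum(img, radius):
    """Aggregate img over a (2*radius+1)^2 window using an integral image."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode='edge')
    c = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = img.shape
    return (c[k:k + h, k:k + w] - c[:h, k:k + w]
            - c[k:k + h, :w] + c[:h, :w])

def wta_disparity(left, right, max_disp=15, radius=3):
    """Return a per-pixel integer disparity map (left image as reference)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    best_cost = np.full((h, w), np.inf)
    disp = np.zeros((h, w), dtype=np.int32)
    for d in range(max_disp + 1):
        shifted = np.empty_like(right)
        shifted[:, d:] = right[:, :w - d]      # right-image pixel at x - d
        if d > 0:
            shifted[:, :d] = right[:, :1]      # crude fill at the left border
        sad = box_sum(np.abs(left - shifted), radius)
        better = sad < best_cost
        best_cost[better] = sad[better]
        disp[better] = d                       # WTA: keep the cheapest candidate
    return disp
```

Such a sketch only produces the rough initial disparities that the later plane fitting and cooperative optimization stages are designed to refine.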
2. Obtaining the initial disparities

2.1. Region segmentation and region based adaptive correlation matching

We first employ the Mean-Shift algorithm [19] to segment the left image of the input stereo pair, and then use a high speed stereo matching algorithm [6-9, 16], e.g. the adaptive correlation window based method presented in [16], to obtain the disparity map of the scene according to the region segmentation produced by the Mean-Shift algorithm. Here, we employ the WTA (Winner-Takes-All) strategy to select disparities from multiple candidates. Figure 2 shows the disparity map of the Tsukuba stereo pair produced by this algorithm. One can see that the disparity map obtained is far from perfect and contains many false matches; these matching errors will be corrected later by our algorithm.

Figure 2 The disparity map of the Tsukuba stereo pair obtained by the adaptive correlation window stereo matching algorithm: (a) the segmentation result of the left image, (b) the disparity map

2.2. A robust disparity plane fitting algorithm based on voting

In order to eliminate the outliers in the disparity map and obtain a robust disparity estimate for each region, we develop a voting based disparity plane fitting algorithm. The disparity plane corresponding to a segmented region is expressed by the following function:

    d(x, y) = a \cdot x + b \cdot y + c    (1)

where x and y are image coordinates and a, b and c are the plane parameters. Using the matching reliability of each pixel as a weight, the normal direction of the disparity plane is given by the eigenvector corresponding to the minimum eigenvalue of the matrix A (for a similar description, see [15]):

    A = \begin{bmatrix}
        \sum_{i=0}^{N} w_i x_i^2   & \sum_{i=0}^{N} w_i x_i y_i & \sum_{i=0}^{N} w_i x_i d_i \\
        \sum_{i=0}^{N} w_i x_i y_i & \sum_{i=0}^{N} w_i y_i^2   & \sum_{i=0}^{N} w_i y_i d_i \\
        \sum_{i=0}^{N} w_i x_i d_i & \sum_{i=0}^{N} w_i y_i d_i & \sum_{i=0}^{N} w_i d_i^2
    \end{bmatrix}    (2)

where w_i is the weight reflecting the matching reliability of the pixel (x_i, y_i), d_i is the disparity of the pixel (x_i, y_i), and N is the number of matched pixels in the region.

Inevitably, some false matches are included in the initial disparity map, and these outliers would corrupt the plane fitting results. We therefore need to remove the outliers from the disparities before fitting. The RANSAC algorithm can be used for this purpose, but its result depends on the selection of initial points; since this selection is random, the result is unsatisfactory in some cases. Moreover, RANSAC needs a threshold on the distance of a point to the estimated plane in order to decide whether the point is an outlier, and a badly chosen threshold leads to incorrect fitting results. To avoid these limitations, we adopt a more robust method based on voting. According to (1), an estimate of the plane parameter a of a region can be obtained by computing δd/δx from a pair of points lying on the same line along the x-axis within the region. Doing this for all such point pairs in the region yields many individual estimates of a, from which a one-dimensional histogram is built by voting, with a on the horizontal axis and the vote count on the vertical axis. After smoothing with a Gaussian filter, the maximum of the histogram is taken as the final estimate of a. Similarly, a one-dimensional histogram of estimates of b is obtained by computing δd/δy for all point pairs on the same lines along the y-axis within the region, and the final estimate of b is taken from the maximum of that histogram. Once the estimates of a and b are obtained, c is determined from them by a similar voting operation.
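The voting procedure can be sketched compactly. The following is a hedged implementation assuming a dense disparity map and a boolean mask for one segmented region; the bin width, the Gaussian smoothing width, and the use of consecutive (rather than all) point pairs per scanline are simplifications of the procedure described in the text, and the function names are illustrative.

```python
# A hedged sketch of the voting-based plane fit described above, assuming a
# dense disparity map `disp` and a boolean `mask` for one segmented region.
# The bin width, the Gaussian smoothing width, and the use of consecutive
# (rather than all) point pairs per scanline are simplifications.
import numpy as np

def _vote(values, bin_width=0.05, sigma_bins=2):
    """Histogram the estimates, smooth with a Gaussian, and return the peak."""
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return 0.0
    edges = np.arange(values.min(), values.max() + 2 * bin_width, bin_width)
    hist, edges = np.histogram(values, bins=edges)
    t = np.arange(-3 * sigma_bins, 3 * sigma_bins + 1)
    g = np.exp(-0.5 * (t / sigma_bins) ** 2)
    smoothed = np.convolve(hist, g / g.sum(), mode='same')
    k = int(smoothed.argmax())
    return 0.5 * (edges[k] + edges[k + 1])           # center of the winning bin

def fit_plane_by_voting(disp, mask):
    """Estimate (a, b, c) of d(x, y) = a*x + b*y + c for one region."""
    ys, xs = np.nonzero(mask)
    d = disp[ys, xs].astype(float)
    a_votes, b_votes = [], []
    for row in np.unique(ys):                        # slopes along x vote for a
        sel = ys == row
        order = np.argsort(xs[sel])
        xr, dr = xs[sel][order], d[sel][order]
        if xr.size > 1:
            a_votes.extend((np.diff(dr) / np.diff(xr)).tolist())
    for col in np.unique(xs):                        # slopes along y vote for b
        sel = xs == col
        order = np.argsort(ys[sel])
        yc, dc = ys[sel][order], d[sel][order]
        if yc.size > 1:
            b_votes.extend((np.diff(dc) / np.diff(yc)).tolist())
    a, b = _vote(a_votes), _vote(b_votes)
    c = _vote(d - a * xs - b * ys)                   # vote on the remaining offset
    return a, b, c
```

For a region mask produced by the segmentation step, fit_plane_by_voting(disp, mask) returns the plane parameters (a, b, c) of Eq. (1) for that region.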
In order to evaluate the above voting method, we computed, as an example, the disparities of the Tsukuba stereo pair with both the RANSAC algorithm and our voting algorithm, using the same system parameters for both, compared the disparities obtained by the two algorithms with the ground truth, and calculated the corresponding errors. The comparison results are shown in Figure 3, where the blue line is the error curve of the RANSAC algorithm and the red line is that of our voting algorithm. Both algorithms achieve small errors, but the voting algorithm behaves more robustly and is therefore better at eliminating outliers. Figure 4 shows the disparity map obtained by the disparity plane fitting operation. The experimental results show that the outliers in the initial disparity map have been removed effectively.

Figure 3 Comparison of the plane fitting results based on the RANSAC algorithm and the voting algorithm: the red line shows the error rate of the voting algorithm and the blue line shows the error rate of the RANSAC algorithm

Figure 4 The disparities obtained by the voting based plane fitting algorithm

3. The Principle of the Cooperative Optimization Algorithm

The principle of cooperative optimization [17] is as follows: a complex target is first decomposed into several comparatively simple sub-targets, and these sub-targets are then optimized individually while keeping their common parameters consistent. In our problem, the target of cooperative optimization is to optimize the disparities of each region so that the disparities of adjacent regions remain consistent. As shown in Figure 5, let R1, R2, ..., Rn be the regions obtained by the Mean-shift segmentation algorithm, and let the total energy functional of all regions be E(x). The cooperative optimization algorithm first decomposes it into the sum of sub-target energy functionals:

    E(x) = E_1(x) + E_2(x) + \cdots + E_n(x)    (3)

where E_i(x), i = 1, 2, ..., n, is the energy functional of the ith region R_i. In this way, the disparity computation problem becomes a region based optimization problem with multiple sub-targets. Obviously, if we minimize each region's energy functional in isolation, the final results may not be consistent between adjacent regions. For example, suppose the scene is piecewise smooth and the target of the optimization is to compute the disparities of the scene; then the disparities obtained by optimizing all sub-targets should also be piecewise smooth, but if we optimize the sub-targets in isolation, the resulting disparities will generally not be continuous.

Figure 5 The sketch map for the optimization of sub-targets (adjacent regions R1 to R10)

Therefore, in order to obtain reasonable disparities, we need to minimize the total energy functional, and cooperative optimization helps us perform this task. The idea is to minimize the energy functionals of a region and its adjacent regions simultaneously, and then to propagate the results through iterative computation; this optimization process is carried out iteratively until the algorithm converges. Concretely, in order to minimize the sub-target energy functional E_i(x), we consider an optimization problem that includes several associated sub-targets, with the corresponding energy functional set as:

    (1 - \lambda_i) E_i(x) + \lambda_i \sum_{j \ne i} w_{ij} E_j(x), \quad i, j = 1, \ldots, n    (4)

where E_j(x) is the energy functional of the jth region R_j, R_j is an adjacent region of R_i, and 0 ≤ λ_i ≤ 1 and 0 ≤ w_ij ≤ 1 are the corresponding weights. For example, consider the optimization of E_6(x), the energy functional of R6 in Figure 5. Since R2, R3, R5, R7 and R9 are the neighbors of R6, we should minimize E_6(x) according to (4) with j = 2, 3, 5, 7, 9.
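The neighbor sets used in (4) and (5) can be derived directly from the segmentation label map. The following is a small helper sketch assuming a label image in which each pixel stores its region id; the choice of 4-connectivity and the function name are assumptions made for illustration, not part of the paper.

```python
# A small helper sketch: derive the neighbor sets N(i) used in (4) and (5) from
# a segmentation label map in which each pixel stores its region id. The use of
# 4-connectivity and the function name are assumptions made for illustration.
import numpy as np
from collections import defaultdict

def region_neighbors(labels):
    """Return a dict mapping each region id to the set of adjacent region ids."""
    neighbors = defaultdict(set)
    # A label change between horizontally or vertically adjacent pixels marks a
    # border between two regions; record both directions of the adjacency.
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        border = a != b
        for u, v in zip(a[border], b[border]):
            neighbors[int(u)].add(int(v))
            neighbors[int(v)].add(int(u))
    return dict(neighbors)
```

For the layout sketched in Figure 5, such a map would, for instance, list regions 2, 3, 5, 7 and 9 as the neighbors of region 6.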
The sub-target optimization problem defined by (4) can be solved by a local optimization method. At each iteration, the disparity plane parameters of every region of the left image are optimized according to (4), and the parameters obtained are used as the initial estimate in the next iteration. The optimization is carried out iteratively until the algorithm converges or the maximum number of iterations is reached. The iterative update is:

    \Psi_i^{(k)}(x) = \min\Big( (1 - \lambda_i^{(k)}) E_i(x) + \lambda_i^{(k)} \sum_{j \ne i} w_{ij} E_j(x) \Big), \quad i = 1, \ldots, n    (5)

where the subscripts i and j denote region indices, the superscript (k) denotes the iteration number, and R_j is a neighbor of R_i.

4. The Cooperative Optimization Stereo Matching Algorithm Based on Regions

In order to evaluate whether the disparity plane parameters of a region are satisfactory, we need to define an energy functional before the cooperative optimization. In this paper, we define the energy functional of the ith region as:

    E_i = E_{data} + E_{occlude} + E_{smooth}    (6)

It consists of three terms. The first term is the data energy, defined as:

    E_{data} = \sum_{p \in V_l,\ q \in V_r} \max\big( |r(p) - r(q)|,\ |g(p) - g(q)|,\ |b(p) - b(q)| \big)    (7)

where V_l and V_r denote the visible pixel sets of the current region in the left and right images, p ∈ V_l and q ∈ V_r are two matching pixels, and r, g, b are the color values of the pixels. The pixel q in the right image is obtained by projecting the pixel p onto the right image according to the disparity plane parameters of the current region. Since q may not fall on a position with integer coordinates, its r, g, b values may not be available directly from the right image; in that case, the color values are interpolated from the four neighboring pixels of q. Note that in our algorithm the visibility constraints [13] are used to determine the visible pixel sets V_l and V_r of the current region.

The second term in (6) is the occlusion energy. Two kinds of occlusion are considered: left occlusion and right occlusion. As shown in Figure 6, consider two adjacent regions A and B in the left image, with A to the right of B. When the disparities of A are larger than those of B along the border of the two regions, a part of B' (area D in Figure 6) is occluded by A', where A' and B' are the projections of A and B onto the right image; we call this a left occlusion. When the disparities of A are smaller than those of B along the border, a blank area (area C in Figure 6) appears between B' and A'; we call this a right occlusion, in which area C (a part of A') is occluded by B in the left image. In our algorithm the occlusion energy is assigned as follows: when a left occlusion occurs, the left occlusion energy is added to the corresponding left occlusion area, and when a right occlusion occurs, the right occlusion energy is added to the corresponding right occlusion area. Concretely, the occlusion energy is computed by projecting the left image onto the right image according to the current disparities and checking the right image: if an area is projected onto multiple times, the left occlusion energy is added to it, and if an area is not projected onto at all, the right occlusion energy is added to it.
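The projection-count rule can be illustrated with a short sketch. It is a hedged simplification: disparities are assumed dense over the whole image, projected positions are rounded to integer columns, and the counts are taken image-wide rather than per region; the names and the default penalty value are illustrative only.

```python
# A hedged sketch of the projection-count rule above: every left-image pixel is
# projected to column x - d(x, y) on the same row of the right image; positions
# hit more than once are counted as left occlusions and positions never hit as
# right occlusions. Counting over the whole image (rather than per region) and
# rounding to integer columns are simplifications for illustration.
import numpy as np

def occlusion_counts(disp):
    """Return (occ_left, occ_right) pixel counts for a dense disparity map."""
    h, w = disp.shape
    hits = np.zeros((h, w), dtype=np.int32)
    xs = np.arange(w)
    for y in range(h):
        xr = np.rint(xs - disp[y]).astype(int)        # projected right-image column
        valid = (xr >= 0) & (xr < w)
        np.add.at(hits[y], xr[valid], 1)              # projections per position
    occ_left = int((hits > 1).sum())                  # covered more than once
    occ_right = int((hits == 0).sum())                # never covered
    return occ_left, occ_right

def occlusion_energy(disp, lambda_occ=5.0):
    """Apply the area counts as in (8); lambda_occ here is only illustrative."""
    occ_l, occ_r = occlusion_counts(disp)
    return (occ_l + occ_r) * lambda_occ
```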
Figure 6 The sketch map of occlusion: (a) the left image with adjacent regions A and B; (b) the right image when the disparities of A are larger than those of B (area D of B' is occluded by A'); (c) the right image when the disparities of A are smaller than those of B (a blank area C appears between B' and A')

According to the description above, the occlusion energy at each pixel q of the right image is:

    E_{occlude}(q) = \begin{cases} \lambda_{occ} & \text{if } q \text{ is a left occlusion pixel} \\ \lambda_{occ} & \text{if } q \text{ is a right occlusion pixel} \\ 0 & \text{otherwise} \end{cases}

where λ_occ is the penalty constant for occlusion. Once the occlusion energy at each pixel has been calculated, the occlusion energy of the current region is obtained as:

    E_{occlude} = (Occ_L + Occ_R) \cdot \lambda_{occ}    (8)

where Occ_L and Occ_R are the numbers of pixels belonging to the left occlusion area and the right occlusion area of the current region, respectively.

The final term of (6) is the smooth energy. It is defined as:

    E_{smooth} = \sum_{p \in B_c} \begin{cases} \lambda_s & \text{if } |d(p) - d(q)| \ge 1 \\ 0 & \text{otherwise} \end{cases}    (9)

where B_c is the set of border pixels of the current region, N is the set of pixels adjacent to the border pixels, p ∈ B_c and q ∈ N are two 4-connected pixels in the left image, d(p) and d(q) are the corresponding disparities, and λ_s is the smoothness penalty constant. Clearly, (9) is the sum of the smoothness penalties added to all border pixels of the current region whose disparities are discontinuous.

From the definition of (6), the smaller the total energy of the current region, the better the corresponding disparity estimate. In each iteration, we use a local optimization method, e.g. Powell's method, to optimize the disparity plane parameters of all regions according to (5), and this process is repeated until the algorithm converges. Our algorithm only requires that the initial disparity estimate be roughly correct. In fact, the computed disparities and occlusions depend on each other; in particular, only when the disparities are correct can the occlusions between adjacent regions be detected correctly. Since the initial disparity estimate is generally imperfect, the occlusions detected from it are imperfect as well. But as long as the disparity plane parameters of adjacent regions are roughly correct, disparity planes with false parameters are corrected in later iterations. As an example, Figure 7 shows the disparity computation results of the Tsukuba stereo pair obtained by our algorithm. The initial disparity estimate is obviously not correct (see Figure 4), but as the cooperative optimization iterates, the total energy decreases rapidly and the corresponding disparity plane parameters are corrected. Figure 8 shows how the total energy changes with the iterations.

Figure 7 The cooperative optimization results: disparity maps of the Tsukuba stereo image pair ((a) first iteration, e = 516622.0; (b) second iteration, e = 487985.0; (c) third iteration, e = 473195.0; (d) fourth iteration, e = 467576.0)

Figure 8 The curve of the total energy over the iterations
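As an illustration of the per-iteration update of (5), the following sketch re-optimizes each region's plane parameters in turn with Powell's method (via SciPy) while the other regions are held fixed, then repeats the sweep until the monitored energy stops decreasing. Both energy terms are simple placeholders standing in for the region energy of (6), all weights w_ij are taken as 1, and the names and interfaces are illustrative only.

```python
# A condensed sketch of the per-iteration update of (5): each region's plane
# parameters (a, b, c) are re-optimized in turn with Powell's method while the
# other regions are held fixed, and the sweep repeats until the monitored
# energy stops decreasing. Both energy terms below are simple placeholders for
# the region energy of (6), and all weights w_ij are taken as 1.
import numpy as np
from scipy.optimize import minimize

def data_term(p, target):
    """Placeholder data energy: distance of (a, b, c) to a fitted target plane."""
    return float(np.sum((np.asarray(p) - target) ** 2))

def coupling_term(p_i, p_j):
    """Placeholder coupling: penalize disagreement between neighboring planes."""
    return float(np.sum((np.asarray(p_i) - np.asarray(p_j)) ** 2))

def cooperative_sweeps(planes, targets, neighbors, lam=0.5, max_iters=4, tol=1e-3):
    """planes, targets: dict region id -> np.array([a, b, c]); neighbors: adjacency dict."""
    def sub_energy(i, p):
        own = (1.0 - lam) * data_term(p, targets[i])
        coupled = sum(coupling_term(p, planes[j]) for j in neighbors[i])
        return own + lam * coupled                    # the form of (4) with w_ij = 1

    def monitored_total():
        return sum(sub_energy(i, planes[i]) for i in planes)

    prev = monitored_total()
    for _ in range(max_iters):
        for i in planes:                              # one local optimization per region
            res = minimize(lambda p: sub_energy(i, p), planes[i], method='Powell')
            planes[i] = res.x
        cur = monitored_total()
        if prev - cur < tol:                          # stop once the energy settles
            break
        prev = cur
    return planes
```

A derivative-free local method such as Powell's is a natural fit here, since the region energy of (6) involves occlusion and visibility counts that change discretely as the plane parameters vary.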
5. Experimental results

In order to verify the effectiveness of our algorithm, we implemented it in VC8.0 and used the stereo image pairs published on the Middlebury website for the experiments and evaluation. Figure 9 shows the corresponding disparity maps obtained by our algorithm. In addition, in order to verify the robustness of the algorithm, we also used some stereo image pairs captured by our own stereo vision system for matching experiments and evaluation; Figure 10 shows some of the results. The running time of the algorithm depends on the number of iterations. On a notebook computer with a PM 1.6 GHz CPU, the total time for processing the Tsukuba stereo pair is about 20 s; here the number of iterations is 4, and the time for image segmentation is about 8 s. The experimental results show that the performance of our algorithm is close to that of the best algorithms; the comparison with other stereo matching algorithms is shown in Table I. The parameters used in the experiments are λ_k = 0.5, λ_s = 5 and λ_occ = 5, and the w_ij are set according to [17].

6. Conclusion

This paper presents a region based cooperative optimization stereo matching algorithm. The algorithm makes it possible to obtain a high quality dense disparity map of a scene from its initial disparity estimate, and its main strength is a strong ability to restrain and correct errors. At present we use planes to fit regions with consistent disparities; this could be further improved by introducing a B-spline fitting technique or a similar one. In addition, the Mean-shift method is a time-consuming image segmentation algorithm, and finding a faster as well as more robust real time image segmentation algorithm remains a challenging task.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 60455001 and the Open Fund of the Key Laboratory of Biomimetic Sensing and Advanced Robot Technology, Anhui, China. The authors would like to thank them for their help.

References

[1] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms", International Journal of Computer Vision, 47(1/2/3), pp. 7-42, 2002.
[2] M. Z. Brown, D. Burschka, and G. D. Hager, "Advances in computational stereo", IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8), August 2003.
[3] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts", IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), pp. 1222-1239, 2001.
[4] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?", IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), pp. 147-159, 2004.
[5] J. Sun, N. N. Zheng, and H. Y. Shum, "Stereo matching using belief propagation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(7), pp. 787-800, 2003.
[6] A. Fusiello, V. Roberto, and E. Trucco, "Efficient stereo with multiple windowing", in Conference on Computer Vision and Pattern Recognition, pp. 858-863, 1997.
[7] T. Kanade and M. Okutomi, "A stereo matching algorithm with an adaptive window: theory and experiment", IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(9), pp. 920-932, 1994.
[8] O. Veksler, "Stereo correspondence with compact windows via minimum ratio cycle", IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), pp. 1654-1660, 2002.
[9] K.-J. Yoon and I.-S. Kweon, "Locally adaptive support-weight approach for visual correspondence search", in Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), pp. 924-931, 2005.
[10] H. Tao, H. Sawhney, and R. Kumar, "A global matching framework for stereo computation", in International Conference on Computer Vision, Vol. 1, 2001.
[11] M. Lin and C. Tomasi, "Surfaces with occlusions from layered stereo", IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8), pp. 1073-1078, 2004.
[12] L. Hong and G. Chen, "Segment-based stereo matching using graph cuts", in Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 74-81, 2004.
[13] M. Bleyer and M. Gelautz, "A layered stereo matching algorithm using image segmentation and global visibility constraints", Photogrammetry and Remote Sensing, 59, pp. 128-150, 2005.
[14] S. T. Birchfield, B. Natarajan, and C. Tomasi, "Correspondence as energy-based segmentation", Image and Vision Computing, 2007.
[15] J. Shi and C. Tomasi, "Good features to track", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
[16] M. Gerrits and P. Bekaert, "Local stereo matching with segmentation-based outlier rejection", in Third Canadian Conference on Computer and Robot Vision, June 2006.
[17] X. Huang, "Cooperative optimization for energy minimization: a case study of stereo matching", http://front.math.ucdavis.edu/author/X.Huang, cs.CV/0701057, Jan 2007.
[18] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient belief propagation for early vision", International Journal of Computer Vision, 70(1), October 2006.
[19] D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, May 2002.
[20] A. Klaus, M. Sormann, and K. Karner, "Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure", in ICPR 2006, Vol. 3, pp. 15-18, 2006.
[21] J. Sun, Y. Li, S.-B. Kang, and H.-Y. Shum, "Symmetric stereo matching for occlusion handling", in CVPR 2005, Vol. 2, pp. 399-406, 2005.
[22] Y. Deng, Q. Yang, X. Lin, and X. Tang, "Stereo correspondence with occlusion handling in a symmetric patch-based graph-cuts model", IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), pp. 1068-1079, 2007.

Table I The ranks on the Middlebury website. In each cell, the first number is the error rate of the method and the number in parentheses is its rank.

Algorithm          | Avg. rank | Tsukuba nonocc / all / disc       | Venus nonocc / all / disc         | Teddy nonocc / all / disc        | Cones nonocc / all / disc
Adapting BP        | 2.5       | 1.11 (5) / 1.37 (3) / 5.79 (5)    | 0.10 (1) / 0.21 (2) / 1.44 (1)    | 4.22 (3) / 7.06 (2) / 11.8 (3)   | 2.48 (1) / 7.92 (3) / 7.32 (1)
Our method         | 3.0       | 0.89 (2) / 1.13 (1) / 4.76 (1)    | 0.12 (2) / 0.18 (1) / 1.69 (2)    | 5.86 (6) / 9.03 (5) / 14.2 (6)   | 2.88 (3) / 7.80 (1) / 8.31 (6)
Double BP          | 3.8       | 0.88 (1) / 1.29 (2) / 4.76 (2)    | 0.14 (4) / 0.60 (9) / 2.00 (5)    | 3.55 (2) / 8.71 (4) / 9.70 (1)   | 2.90 (4) / 9.24 (9) / 7.80 (2)
SubPix Double BP   | 4.7       | 1.24 (8) / 1.76 (11) / 5.98 (6)   | 0.12 (3) / 0.46 (4) / 1.74 (3)    | 3.45 (1) / 8.38 (3) / 10.0 (2)   | 2.93 (5) / 8.73 (7) / 7.91 (3)
SymBP+occ          | 9.0       | 0.97 (4) / 1.75 (10) / 5.09 (4)   | 0.16 (5) / 0.33 (3) / 2.19 (6)    | 6.47 (8) / 10.7 (6) / 17.0 (12)  | 4.79 (19) / 10.7 (16) / 10.9 (15)
Segm+visib         | 9.6       | 1.30 (12) / 1.57 (4) / 6.92 (14)  | 0.79 (16) / 1.06 (14) / 6.76 (17) | 5.00 (4) / 6.54 (1) / 12.3 (4)   | 3.72 (10) / 8.62 (6) / 10.2 (13)
C-SemiGlob         | 9.8       | 2.61 (24) / 3.29 (19) / 9.89 (21) | 0.25 (8) / 0.57 (6) / 3.24 (11)   | 5.14 (5) / 11.8 (7) / 13.0 (5)   | 2.77 (2) / 8.35 (5) / 8.20 (4)

Figure 9 The disparity maps of the standard stereo image pairs from the Middlebury website obtained by our algorithm. Rows, top to bottom: Tsukuba, Venus, Teddy, Cones. Columns, left to right: reference image, ground truth, result of our algorithm, matching errors.

Figure 10 The disparity maps of stereo image pairs taken in office environments: (a) the left image, (b) the right image, (c) the disparity map obtained by our algorithm.