Automatic Pose Estimation of Uncalibrated Multi-View Images Based on a Planar Object with a Predefined Contour Model
"> Figure 2
The lines and corners of the planar objects (from left to right: A4 paper, Book 1 cover and Book 2 cover). The model lines are shown in green, and the corner points are shown in red.
Figure 3
Homography transformation between the model and image lines.
Figure 4
1D search from the model line to the image line. (a) Sketch of the image edge and the projected edge. The black solid lines are the sampled line segments, and the brown points are the sampled points. (b) Real image of the image edge and the projected edge. The yellow line is the image edge projected by a correct homography or the prior homography. The blue line is the edge projected by the current estimated homography.
Figure 5
Degenerate situation for camera parameter calculation. (a) Image sequence of a flat scene captured with a fixed camera while the scene rotates on a single-axis turntable; (b) the estimated orientation of the camera motion for (a); the estimated poses deviate strongly from the true values, which means that the pose estimation failed.
Figure 6
Bundle adjustment model with planar control points.
Figure 7
Bundle adjustment procedure with planar control points. (a) Feature matching between a pair of images. The green crosses mark the matched SIFT key points. The four red line segments are the edges of an ID card; (b) feature tracks across multiple views; (c) a bundle adjustment model is generated and executed. The left is the sparse reconstruction result using the initial camera parameters. The right is the sparse result refined by the bundle adjustment procedure.
Figure 8
Homography recognition for disorderly multi-view images. (a) Homography recognition of the disorderly multi-view images captured by a smartphone; the correctly recognized homographies are drawn as green lines, while erroneous lines are drawn in red; (b) the left is the result of pose estimation for the disorderly multi-view images, with polylines connecting the cameras in capture order. The right is the sequence rearranged according to the camera poses.
Figure 9
Homography tracking for the Book 1 and Book 2 cover contour models in an environment. (a) The projections of the contour model of the Book 1 cover are drawn in blue using the recovered homography. The four images are the 100th, 200th, 300th and 400th frames of the video, respectively; (b) the left is the camera pose result for the video images, and the right is the 3D point cloud of the scene containing the Book 1 cover; (c) the projections of the contour model of the Book 2 cover are drawn in yellow using the recovered homography. The four images are the 100th, 200th, 300th and 400th frames of the video, respectively; (d) the left is the camera pose result for the video images, and the right is the 3D point cloud of the scene containing the Book 2 cover.
Figure 10
Four images of the model plane for camera calibration.
Figure 11
Results versus the number of model lines.
Figure 12
Multi-view images of the scene. The images were captured by a smartphone, and a 7 × 9 × 25 mm chessboard was placed in the scene.
Figure 13
Pose estimation of three methods. (a) The left column shows the features used in the estimation procedure, which, from top to bottom, are the four outer edge lines of A4 paper, the four outer edge lines of A4 paper with bundle adjustment and the corners of the chessboard, respectively; (b) the right column shows the pose results of the three methods.
Figure 14
Comparison of depth images. One depth image is shown for the original image in (a); (b–d) are the depth images of the methods homo_exp, homo&ba_exp and chessbd_exp.
Figure 15
Point clouds of dense reconstruction using the camera parameters of the three methods. (a,b) The total point clouds with color and normal.
Figure 16
The detailed parts of the three sets in zoomed-in view. Four parts are selected to show the differences among the three methods. The detail regions are marked by the green ellipses.
Figure 17
Bias analysis when the two point clouds are aligned.
Figure 18
3D shape of the statue from uncalibrated multi-view images. (a) Several frames sampled from a 1080p video captured by a smartphone; (b) recovered pose parameters of the multi-view images and the sparse 3D points; (c) the left is the color point cloud, the middle is the normal point cloud, and the right is the triangulated model.
Figure 19
3D reconstruction results of One Piece.
Figure 20
3D reconstruction results of The Hulk.
1. Introduction
- (1) A robust contour model-based homography estimation (including recognition and tracking) of the planar object, which can transform disorderly multi-view images into orderly multi-view images in a general environment and provide good initial intrinsic camera and pose parameters.
- (2) A complete framework that automatically provides both the intrinsic camera and pose parameters with a real scale for uncalibrated multi-view images. The framework can support a wide range of vision applications that require metric measurements.
2. Overview
3. Initial Parameters Obtained from Contour-Based Homography
3.1. Problem Statement
3.2. The Disorderly Images and the First Frame: Recognition of the Contour-Based Homography
3.3. Orderly Images: Contour-Based Homography Optimization (Tracking)
3.4. Initial Intrinsic Camera and Pose Parameter Retrieval
4. Parameter Refinement by Bundle Adjustment
4.1. Sparse Reconstruction
4.2. Bundle Adjustment
5. Experimental Results
5.1. Homography Recognition and Tracking
5.2. Accuracy Evaluation
5.3. 3D Reconstruction Application
6. Discussion
- (1) The homography estimation method can be regarded as a revised version of model-based 3D tracking [74,75], which was developed to estimate the six-DOF pose of the camera. Rather than first estimating the affine transformation parameters and then the remaining non-affine parameters [76], this method uses an iterative optimization process to refine the recognized homography directly.
- (2) Compared with a similar work [77], in which the mapping was modeled as an affine transformation and line correspondences were used in the refinement, the proposed method recognizes the full eight-DOF homography and optimizes the initial transformation iteratively by treating the object contour as a series of sample points, so that curved edges can also be integrated. The initial homography is recognized in a hypothesize-and-verify framework over the unmatched set of lines. The optimized homography is then obtained by minimizing the errors between the sample points and their corresponding image points, which are found by a 1D search along the normal direction.
- (3) The robust approximate homography estimation is a vital stage: it provides good initial parameters for the bundle adjustment procedure and transforms disorderly multi-view images into orderly multi-view images. The proposed method focuses on obtaining the intrinsic camera and pose parameters with scale information and on improving their precision through the bundle adjustment procedure.
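The 1D search along the normal direction mentioned in the discussion can be illustrated with a minimal sketch. It assumes a precomputed gradient-magnitude image and integer pixel steps; the function name `search_along_normal` and the `half_range` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def search_along_normal(grad_mag, point, normal, half_range=10):
    """Return the signed offset (in pixels) along `normal` at which the
    gradient magnitude is strongest, or None if no edge response is found.

    grad_mag : 2D array indexed as [row (y), column (x)]
    point    : (x, y) position of a projected model sample point
    normal   : (nx, ny) direction perpendicular to the projected model line
    """
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)
    height, width = grad_mag.shape
    best_offset, best_value = None, 0.0
    for d in range(-half_range, half_range + 1):
        # Step along the normal and round to the nearest pixel.
        x = int(round(point[0] + d * normal[0]))
        y = int(round(point[1] + d * normal[1]))
        inside = 0 <= x < width and 0 <= y < height
        if inside and grad_mag[y, x] > best_value:
            best_offset, best_value = d, grad_mag[y, x]
    return best_offset


# Synthetic example: a single vertical edge response in column 20.
grad = np.zeros((40, 40))
grad[:, 20] = 1.0
offset = search_along_normal(grad, point=(15, 10), normal=(1.0, 0.0))
```

The offsets found for all sample points would then act as the residuals that the iterative homography refinement drives toward zero. A practical implementation would likely add sub-pixel interpolation and an edge-orientation check, both of which are omitted here.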
7. Conclusions
DOF | Degree of Freedom |
SIFT | Scale-Invariant Feature Transform |
FLANN | Fast Library for Approximate Nearest Neighbors |
RANSAC | Random Sample Consensus |
ILBA | Incremental Light Bundle Adjustment |
VAN | Vision-Aided Navigation |
IGN | The French National Institute of Geography |
Step I: Generate Structured Line Contour Model | |
| The planar objects with the predefined contour model, from left to right, are A4 paper and book covers with line features. In the top row are the predefined contour models. |
Step II: Acquire Multi-view Images | |
| The A4 paper was placed on the table. The origin is one corner; the X axis is the shorter edge; the Y axis is the other edge; and the Z axis is determined by the right-hand rule. Each camera view has a rigid transformation to the global coordinate system, which is unknown at this phase. |
Step III: Calculate the Intrinsic Camera and Pose Parameters | |
| (a) Homography recognition and tracking were applied to the multi-view images. (b) Matching of line segments in a single image. In the configuration of an ID card, four line segments were matched. (c) Once the homography of each image has been estimated, the initial intrinsic camera and pose parameters can be recovered by decomposing the multiple homographies. |
Step IV: Parameter Refinement by Bundle Adjustment | |
| Sparse 3D points can be triangulated by the known initial camera parameters in the previous step and corresponding feature points. In this study, the corner points of the contour model were taken as Ground Control Points (GCPs). The bundle adjustment procedure was executed for parameter refinement. |
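The decomposition in Step III can be sketched for the pose part, under the assumption that the intrinsic matrix `K` is already known and that the model plane is Z = 0, so that the homography satisfies H ~ K [r1 r2 t] up to scale. This is the standard planar decomposition, not necessarily the exact procedure used in the paper; the function name `pose_from_homography` is hypothetical.

```python
import numpy as np


def pose_from_homography(K, H):
    """Recover rotation R (3x3) and translation t (3,) from a plane-to-image
    homography H, assuming H ~ K [r1 r2 t] for the model plane Z = 0."""
    A = np.linalg.inv(K) @ H
    # The first two columns should be unit vectors; use their mean norm
    # to remove the arbitrary scale of H.
    scale = 2.0 / (np.linalg.norm(A[:, 0]) + np.linalg.norm(A[:, 1]))
    # Choose the sign that places the plane origin in front of the camera.
    if A[2, 2] < 0:
        scale = -scale
    r1, r2, t = scale * A[:, 0], scale * A[:, 1], scale * A[:, 2]
    R_approx = np.column_stack([r1, r2, np.cross(r1, r2)])
    # Project onto the nearest rotation matrix to absorb noise.
    U, _, Vt = np.linalg.svd(R_approx)
    return U @ Vt, t
```

Because the model coordinates are given in millimetres, the recovered translation carries the same real-world scale, which is what allows the later bundle adjustment to keep a metric reconstruction.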
Real Object | Features | |
Book 1 Cover | Points index and coordinates (mm): 0: (0,0,0) 1: (174.5,0.0,0.0) 2: (174.5,245.1,0.0) … | Line index: 0: 0 1 1: 1 2 2: 2 3 … |
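A contour model of this kind (indexed corner points plus lines given as pairs of point indices) can be represented and sampled as in the following sketch. Since the Book 1 coordinates are truncated above, the example uses a hypothetical A4 model with the standard 210 mm × 297 mm sheet size; the names `A4_POINTS_MM`, `A4_LINES`, `sample_model_lines` and `project` are illustrative only.

```python
import numpy as np

# Hypothetical A4 contour model (210 mm x 297 mm). As in the Book 1 table,
# the first coordinate runs along the shorter edge and Z is zero on the plane.
A4_POINTS_MM = np.array([
    [0.0, 0.0, 0.0],
    [210.0, 0.0, 0.0],
    [210.0, 297.0, 0.0],
    [0.0, 297.0, 0.0],
])
# Each model line is a pair of point indices.
A4_LINES = [(0, 1), (1, 2), (2, 3), (3, 0)]


def sample_model_lines(points, lines, n_per_line=10):
    """Return (len(lines) * n_per_line, 2) sample points on the model plane."""
    samples = []
    for i, j in lines:
        # Exclude the end point so that corners are not sampled twice.
        t = np.linspace(0.0, 1.0, n_per_line, endpoint=False)[:, None]
        samples.append((1.0 - t) * points[i, :2] + t * points[j, :2])
    return np.vstack(samples)


def project(H, pts_xy):
    """Map planar points of shape (N, 2) through a 3x3 homography H."""
    homog = np.hstack([pts_xy, np.ones((len(pts_xy), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


samples = sample_model_lines(A4_POINTS_MM, A4_LINES)
image_points = project(np.eye(3), samples)  # identity homography as a placeholder
```

Sampling the contour as points rather than storing only line equations is what lets the same representation cover curved edges as well.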
Method (values in pixels) | Focal length (x) | Focal length (y) | Principal point (u) | Principal point (v) | Reprojection RMS |
Corner-based method | 1147.23 | 1146.68 | 475.39 | 258.04 | 0.41 |
Line-based method | 1150.76 | 1151.49 | 474.61 | 262.60 | 0.42 |
Four line-based method | 1141.14 | 1139.99 | 480.25 | 253.07 | 0.67 |
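The reprojection RMS column can be read with the following sketch in mind. The exact definition used in the paper is not stated in this section, so the code assumes the common convention of the root mean square of per-point Euclidean distances between observed and reprojected image points; `reprojection_rms` is a hypothetical helper.

```python
import numpy as np


def reprojection_rms(observed_px, projected_px):
    """Root mean square of per-point Euclidean reprojection errors (pixels).

    Both inputs are arrays of shape (N, 2) holding image coordinates.
    """
    observed_px = np.asarray(observed_px, dtype=float)
    projected_px = np.asarray(projected_px, dtype=float)
    residuals = np.linalg.norm(observed_px - projected_px, axis=1)
    return float(np.sqrt(np.mean(residuals ** 2)))
```

Under this convention, a value such as 0.41 means that the reprojected model points lie well under one pixel from their observations in the root-mean-square sense.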
Object | Planar Model | Capture Path | Number of Images | Runtime |
statue | platform edge | a circle around the object | 100 | 30 min |
One Piece | ID card | more than a circle around the object | 140 | 36 min |
The Hulk | A4 paper | a half circle around the object | 100 | 27 min |
