
CN118071797B - Three-dimensional object accurate tracking method based on fusion of region and point cloud - Google Patents


Info

Publication number
CN118071797B
Authority
CN
China
Prior art keywords
model
point
contour
points
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410188374.4A
Other languages
Chinese (zh)
Other versions
CN118071797A (en)
Inventor
刘银华
金一新
张家伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202410188374.4A priority Critical patent/CN118071797B/en
Publication of CN118071797A publication Critical patent/CN118071797A/en
Application granted granted Critical
Publication of CN118071797B publication Critical patent/CN118071797B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/90: Determination of colour characteristics
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761: Proximity, similarity or dissimilarity measures
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10028: Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a three-dimensional object precise tracking method based on the fusion of a region and a point cloud, belonging to the technical field of computer vision. The method comprises the following steps: (1) establishing a sparse viewpoint model; (2) establishing a color model on the object projection contour; (3) sampling candidate contour points on the projection contour image with search lines; (4) proposing a new step function to improve the distinction between the foreground and background regions near the contour; (5) fusing the color and distance information of the object contour to determine matched object contour points; (6) weighting the energy function with a weight function and solving the pose from the energy equation of all matched contour points; and (7) obtaining the transformation matrix of the inter-frame motion, using it to initialize the model point cloud, and then performing ICP fine registration between the model point cloud and the object point cloud to solve the optimal pose, thereby achieving precise tracking of the three-dimensional object. The method addresses tracking drift and tracking failure of low-texture three-dimensional objects under complex backgrounds and occlusion.

Description

Three-dimensional object accurate tracking method based on fusion of region and point cloud
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a three-dimensional object accurate tracking method based on fusion of a region and a point cloud.
Background
Three-dimensional object tracking, which continuously recovers the spatial pose relationship between a three-dimensional object and a camera, is an important task in computer vision. Depending on the type of image feature used, three-dimensional object tracking methods can be broadly divided into three categories: feature-point-based methods, edge-based methods, and region-based methods.
Feature-point-based tracking methods first extract distinctive three-dimensional feature points on the object, then establish correspondences between the three-dimensional object points and matched two-dimensional image points through feature matching, and finally solve for the object pose. Early feature-point-based methods achieved pose tracking with markers, but markers must be placed in the scene and must remain unoccluded during tracking, which limits usability. With the development of feature point detection and description algorithms, such methods have been widely applied to image feature tracking. Because they rely on surface feature points, however, feature-point-based tracking methods are unsuitable for weakly textured or textureless objects.
Edge-based methods use edge features to establish correspondences between the three-dimensional model contour points and the two-dimensional image contour points of the object, and iteratively compute the optimal pose parameters by minimizing the reprojection error of the three-dimensional contour points. Because few sampling points are needed, these methods have a clear speed advantage. However, they depend on the quality of image contour point matching: when the background contains many complex edges or the object is occluded, existing methods easily produce a large number of mismatched contour points, leading to tracking drift and failure.
Region-based methods solve for the optimal pose parameters by optimizing the color difference between the foreground and the background. Because the foreground segmentation step implicitly locates the object contour, these methods have a clear advantage in complex backgrounds. They build a color probability model from foreground and background colors; however, in difficult situations such as scenes with similar foreground and background colors or strong illumination changes, the image colors may change drastically and the color model cannot be updated in time, which causes tracking failure. Although some recent methods use pixel-wise color posterior probabilities in the region near the projected contour to achieve more accurate foreground-background color modeling, their accuracy still depends on the color statistics of the region around the object contour. When the background color resembles the object's appearance or the object is occluded, such methods struggle to estimate the pose accurately; moreover, because the color segmentation information of all pixels near the object contour must be recomputed repeatedly during color modeling and pose estimation, the computational cost is high and strong real-time tracking is difficult to achieve.
Accordingly, there is a need for a three-dimensional object accurate tracking method that overcomes the above-described problems.
Disclosure of Invention
The invention aims to provide a three-dimensional object accurate tracking method based on the fusion of a region and a point cloud. The method adopts a new step function to improve the distinction between the foreground and background regions near the contour; by fusing the color and distance information of the object contour, the influence of similarly colored backgrounds and local occlusion on tracking is effectively reduced; finally, an improved point cloud registration technique achieves high-precision estimation of the object pose. The method comprises the following steps:
s1, arranging virtual cameras at a plurality of vertexes of a geometric grid surrounding a model, and rendering a three-dimensional geometric body from a plurality of viewpoints, so as to create a sparse viewpoint model;
S2, searching for the closest precomputed view according to the pose of the previous frame, then projecting the 3D model into an image, constructing search lines at the projection contour points, and establishing probability models of the foreground and background separated by the projection contour;
S3, the search lines contain projection contour points, foreground points and background points; each sampled point is weighted by combining the color and distance information of the pixels, and its weight of belonging to an object contour point is calculated;
S4, estimating the optimal pose of the current frame by minimizing the pixel-weighted energy function;
S5, using the optimal pose solved in the S4 to initialize a model point cloud, and then carrying out ICP fine registration on the model point cloud and an object point cloud to solve the optimal pose of the object;
S6, comparing the accuracy of the optimal pose solved in S5 with that of the optimal pose solved in S4, and, when the S5 pose is more accurate than the S4 pose, repeatedly executing S2-S5 with the S5 pose, so as to realize tracking of the three-dimensional object.
Further, in S1, creating the sparse view model specifically includes the following steps:
S11, randomly sampling n_c points from the model contour at each rendering;
S12, for each contour point, estimating its normal vector n_i, which is perpendicular to the contour;
S13, for each contour point, calculating its depth value d_z, which represents depth information acquired from the rendering;
S14, converting the 2D contour points and the depth information into 3D coordinates of the model by using the reconstruction formulas, formula one and formula two. A 3D model point X = (X, Y, Z)^T and its homogeneous form X̃ = (X, Y, Z, 1)^T are used for the description; using image coordinates x = (x, y)^T, the color value y = I_c(x) and the depth value d_z = I_d(x) are obtained from the corresponding color and depth images. The 3D model is projected onto the image by formula three, where f_x and f_y are the focal lengths and p_x and p_y are the principal point coordinates.
In formula one, π^{-1} is the inverse projection operation, which reconstructs a three-dimensional model point from the image coordinate x and the corresponding depth value d_z along the optical axis; the direction vector pointing from the virtual camera to the center of the model is also calculated, and the computed 3D model points, normal vectors and direction vectors are respectively stored.
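The formula images themselves are not reproduced in this text. Under the standard pinhole camera model implied by the focal lengths f_x, f_y and the principal point (p_x, p_y), the projection of formula three and the back-projection used for reconstruction in formula one plausibly take the following form; this is a hedged reconstruction consistent with the symbols defined above, not necessarily the patent's verbatim equations:

\mathbf{x} = \pi(\mathbf{X}) = \begin{pmatrix} \frac{X}{Z} f_x + p_x \\ \frac{Y}{Z} f_y + p_y \end{pmatrix}, \qquad
\mathbf{X} = \pi^{-1}(\mathbf{x}, d_z) = d_z \begin{pmatrix} (x - p_x)/f_x \\ (y - p_y)/f_y \\ 1 \end{pmatrix}.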
Further, S2 specifically includes the following steps:
s21, according to the previous frame posture, acquiring a 3D model point and a normal vector from the nearest view of the sparse viewpoint model, and projecting the 3D model point and the normal vector into an image;
S22, at each model projection contour point m_i, sampling a search line along the normal direction n_i of the projected contour, wherein the j-th sampled pixel of search line l_i is denoted by x_ij and s_i denotes an object contour point;
S23, assuming pixel-wise probability independence, the probability of the shape of the projected contour is expressed as:
wherein d(x_ij) = n_i^T(x_ij - m_i) is the Euclidean distance along the normal between the search line sample x_ij and the projection contour point m_i; P_f(x_ij) is the posterior probability that the color of the pixel belongs to the foreground region and P_b(x_ij) the posterior probability that it belongs to the background region, both calculated from temporally consistent local color histograms; η_f and η_b are the total numbers of pixels in the foreground and background regions, respectively; the prior probabilities P(y|M_f) and P(y|M_b) describe the likelihood that the color y belongs to the foreground model M_f and the background model M_b, respectively; and H_f(x) and H_b(x) are step functions that make the model focus on pixels located near the projection contour;
To prevent pixels from being misclassified due to incomplete segmentation and then assigned to the wrong color histogram, the step functions are defined by formula five and formula six, in which the parameter s is the slope, adjusting the smoothness of the transition by controlling the slope of the hyperbolic tangent function, and the parameter α_h ∈ [0, 0.5] is an amplitude parameter for adjusting the amplitude of the step function.
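The referenced probability and step-function formulas are not reproduced in this text. In the family of sparse region-based trackers that the description follows (e.g. Stoiber et al.), the contour probability, the pixel-wise posteriors and tanh-based smoothed step functions typically take the following form; this is an illustrative reconstruction consistent with the symbols defined above, not necessarily the patent's exact definitions:

P(s_i \mid \mathbf{p}) \propto \prod_{j} \Big[ H_f\big(d(x_{ij})\big)\, P_f(x_{ij}) + H_b\big(d(x_{ij})\big)\, P_b(x_{ij}) \Big],

P_f(x_{ij}) = \frac{P(y_{ij} \mid M_f)}{\eta_f P(y_{ij} \mid M_f) + \eta_b P(y_{ij} \mid M_b)}, \qquad
P_b(x_{ij}) = \frac{P(y_{ij} \mid M_b)}{\eta_f P(y_{ij} \mid M_f) + \eta_b P(y_{ij} \mid M_b)},

H_f(x) = \tfrac{1}{2} - \alpha_h \tanh\big(s\, d(x)\big), \qquad
H_b(x) = \tfrac{1}{2} + \alpha_h \tanh\big(s\, d(x)\big),

so that with α_h ∈ [0, 0.5] both step functions stay within [0.5 - α_h, 0.5 + α_h].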
Further, S3 specifically includes the following steps:
S31, for each sampling point x_ij on the search line, a weight is assigned, expressed as:
where ω_c(x_ij) is a weight calculated from the color information of the target contour point and ω_d(x_ij) is a weight calculated from the position information of the target contour point;
S32, weighting the sampling point x_ij by using the color information of the object contour point s_i, wherein the weight function based on the color information of the object contour point is defined as:
where P(s_i|S) is the probability that s_i belongs to the object contour, and c_1 is the minimum color probability threshold;
S33, weighting the sampling point x_ij by using the distance information of the target contour point s_i, wherein the weight function based on the distance information of the object contour point is defined as:
where the normalized distance from the sampling point x_ij to the target contour point s_i is computed relative to the length of search line l_i, and c_2 is the maximum effective distance.
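Since the weight formulas themselves are not shown here, one plausible instantiation consistent with the thresholds c_1 and c_2 described above is given below; the multiplicative combination and the clamped forms are assumptions made purely for illustration:

\omega(x_{ij}) = \omega_c(x_{ij})\, \omega_d(x_{ij}), \qquad
\omega_c(x_{ij}) = \max\big(P(s_i \mid S),\, c_1\big), \qquad
\omega_d(x_{ij}) = \max\big(1 - \bar d(x_{ij}) / c_2,\, 0\big),

where \bar d(x_{ij}) denotes the distance from x_{ij} to s_i normalized by the length of the search line l_i.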
Further, in S4, the energy function is expressed as:
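The energy formula itself is not shown here. A weighted negative log-likelihood over all matched contour points, which is the standard form in this family of region-based trackers, would read as follows (an assumed reconstruction, with p denoting the pose parameters to be optimized):

E(\mathbf{p}) = -\sum_{i} \sum_{j} \omega(x_{ij}) \ln \Big[ H_f\big(d(x_{ij})\big)\, P_f(x_{ij}) + H_b\big(d(x_{ij})\big)\, P_b(x_{ij}) \Big].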
Further, S5 specifically includes the following steps:
s51, using the pose obtained by solving in the S4 to initialize a model point cloud;
s52, searching a corresponding point closest to the Euclidean distance of the object point cloud on the model point cloud based on curvature feature similarity, so that the matching of the wrong corresponding point is reduced;
s53, according to the matched corresponding points, a double-weight function self-adaptive optimization method of curvature characteristic similarity is provided, and the optimal transformation is solved;
s54, performing pose transformation on the model point cloud by using the solved transformation matrix;
s55, algorithm convergence judgment.
Further, in S52, the curvature characteristic similarity between the two corresponding points is calculated by the formula eleven, which is expressed as:
Further, in S53, in a method for adaptively optimizing a dual-weight function of curvature feature similarity, an optimization objective function is expressed as:
wherein the weight is a dual weight function determined by the curvature feature similarity solved from the optimization objective; when the similarity is smaller than the threshold, the weight is set to 0, and when the similarity is greater than the threshold, the weight takes a value that assigns less weight to locally deformed points and outliers.
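The similarity measure of formula eleven and the optimization objective are likewise not shown here. A plausible reconstruction, assuming a cosine similarity between the curvature feature vectors (K, H, K_1, K_2) of corresponding points and a similarity-gated, weighted least-squares objective over the rigid transform (R, t), is:

D_i = \frac{\sum_{k=1}^{4} P_{ik} Q_{jk}}{\sqrt{\sum_{k=1}^{4} P_{ik}^2}\, \sqrt{\sum_{k=1}^{4} Q_{jk}^2}}, \qquad
E(\mathbf{R}, \mathbf{t}) = \sum_{i} w(D_i)\, \big\| \mathbf{R}\, p'_i + \mathbf{t} - q_i \big\|^2, \qquad
w(D_i) = \begin{cases} 0, & D_i < \omega_0, \\ D_i, & D_i \ge \omega_0. \end{cases}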
Further, S55 is specifically to judge whether the algorithm meets the convergence condition, wherein the convergence condition is that the maximum iteration number is reached or the objective function converges to the minimum value, and when the algorithm does not meet the convergence condition, S51-S55 is repeated until the algorithm meets the convergence condition.
Further, in S6, the evaluation is specifically that when the accuracy of the optimal pose obtained by solving in S5 is smaller than that of the optimal pose obtained by solving in S4, the model is initialized to the optimal pose obtained by solving in S4, otherwise, the model is initialized to the optimal pose obtained by solving in S5.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a three-dimensional object tracking method based on the fusion of a region and a point cloud, which introduces a new region-based cost function and a smoothed step function to reduce the misclassification of sampling points. It not only reduces the computational cost but also handles complex scenes, such as local occlusion and similar backgrounds, more effectively than existing region-based methods.
2. By introducing a new weighted energy function based on the color and distance information of the object contour, the invention mitigates the negative influence of local occlusion and similarly colored backgrounds on tracking accuracy and robustness, thereby improving tracking robustness.
3. The method refines the target pose with point cloud registration, which further improves the target tracking accuracy.
Drawings
Fig. 1 is a schematic structural diagram of computing a 2D rendering graph from a 3D mesh model.
FIG. 2 is a schematic diagram of search line sampling of a projection profile.
Fig. 3 is a schematic view of pose estimation results using point cloud registration.
Fig. 4 is a schematic diagram of tracking results in different scenarios.
Detailed Description
The three-dimensional object precise tracking method based on the fusion of a region and a point cloud according to the present invention is described in more detail below with reference to the accompanying drawings, which show preferred embodiments of the invention. It should be understood that those skilled in the art can modify the invention described herein while still achieving its advantageous effects; the following description is therefore to be understood as widely known to those skilled in the art and not as a limitation of the invention.
A three-dimensional object tracking method based on fusion of a region and a point cloud specifically comprises the following steps:
Step 1, arranging virtual cameras at vertices of a geometric grid surrounding the model, and rendering a three-dimensional geometric body from multiple viewpoints to create a sparse viewpoint model.
The method specifically comprises the following steps:
1) At each rendering, randomly sampling n_c points from the model contour;
2) For each contour point, its normal vector n_i is estimated; the normal vector is perpendicular to the contour, as shown in FIG. 1;
3) For each sampling point, a depth value d_z is calculated; the value represents depth information acquired from the rendering;
4) Converting the 2D contour points and depth information into the 3D coordinates of the model using reconstruction formulas (1) and (2):
A 3D model point X = (X, Y, Z)^T and its homogeneous form X̃ = (X, Y, Z, 1)^T are used for the description; using image coordinates x = (x, y)^T, the color value y = I_c(x) and the depth value d_z = I_d(x) are obtained from the corresponding color and depth images. The 3D model is projected onto the image by formula (3), where f_x and f_y are the focal lengths and p_x and p_y are the principal point coordinates.
In formula (1), π^{-1} represents the inverse projection, which reconstructs a three-dimensional model point from the image coordinate x and the corresponding depth value d_z along the optical axis; the direction vector pointing from the camera toward the center of the model is also calculated. The computed 3D model points, normal vectors and direction vectors are respectively stored.
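As an illustration of this reconstruction step, the following Python sketch back-projects sampled 2D contour points with their rendered depths into 3D model points under an assumed pinhole model and also returns the direction vector toward the model centre; the intrinsics and the array layout are hypothetical, not taken from the patent.

import numpy as np

def backproject_contour_points(contour_xy, depths, fx, fy, px, py):
    # Reconstruct 3D model points X = pi^{-1}(x, d_z) under an assumed pinhole model
    x, y = contour_xy[:, 0], contour_xy[:, 1]
    points_3d = np.stack([depths * (x - px) / fx, depths * (y - py) / fy, depths], axis=1)
    # Direction vector from the virtual camera (at the origin) toward the model centre
    center = points_3d.mean(axis=0)
    view_dir = center / np.linalg.norm(center)
    return points_3d, view_dir

# Usage sketch with synthetic contour samples and hypothetical intrinsics
contour_xy = np.array([[320.0, 240.0], [350.0, 260.0], [300.0, 220.0]])
depths = np.array([0.80, 0.82, 0.79])
pts, v = backproject_contour_points(contour_xy, depths, fx=600.0, fy=600.0, px=320.0, py=240.0)
print(pts, v)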
Step 2, searching for the closest precomputed view according to the pose of the previous frame. The 3D model is then projected into the image, search lines are constructed at the projection contour points, and probability models of the foreground and background separated by the projection contour are established.
The method specifically comprises the following steps:
1) According to the previous frame posture, 3D model points and normal vectors from the nearest view of the sparse viewpoint model are obtained and projected into an image;
2) At each model projection contour point m_i, the search line is sampled along the normal direction n_i of the projection contour. The j-th sampled pixel of search line l_i is denoted by x_ij, and s_i denotes an object contour point, as shown in FIG. 2;
3) Assuming pixel-by-pixel probability independence, the probability of the shape of the projected contour is:
Where d(x_ij) = n_i^T(x_ij - m_i) is the Euclidean distance along the normal between the search line sample x_ij and the projection contour point m_i. P_f(x_ij) and P_b(x_ij) represent the posterior probabilities of the foreground and background regions, respectively, for each pixel color, calculated from the temporally consistent local color histograms.
η_f and η_b are the total numbers of pixels in the foreground and background regions. The prior probabilities P(y|M_f) and P(y|M_b) describe the likelihood that the color y belongs to the foreground model M_f or the background model M_b. H_f(x) and H_b(x) are step functions that make the model focus more on pixel points near the projection contour. In order to prevent pixels from being misclassified due to incomplete segmentation and then assigned to the wrong color histogram, the step function is redefined as:
The parameter s is the slope, which adjusts the smoothness of the transition by controlling the slope of the hyperbolic tangent function, and the parameter α_h ∈ [0, 0.5] is an amplitude parameter adjusting the amplitude of the step function.
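For illustration, the sketch below implements a tanh-based smoothed step function and the pixel-wise foreground/background posteriors in the spirit of the description; the exact functional form of the patent's new step function is not reproduced in the text, so the expressions here are assumptions.

import numpy as np

def smoothed_step(d, s=1.2, alpha_h=0.36):
    # Tanh-based smoothed step functions H_f, H_b of the signed contour distance d (assumed form)
    h_f = 0.5 - alpha_h * np.tanh(s * d)
    h_b = 0.5 + alpha_h * np.tanh(s * d)
    return h_f, h_b

def pixelwise_posteriors(p_y_fg, p_y_bg, eta_f, eta_b):
    # Pixel-wise posteriors P_f, P_b from colour likelihoods and region pixel counts
    denom = eta_f * p_y_fg + eta_b * p_y_bg + 1e-12
    return p_y_fg / denom, p_y_bg / denom

# Usage sketch: per-pixel contribution of one search line to the contour probability
d = np.linspace(-5.0, 5.0, 11)            # signed normal distances of the sampled pixels
p_fg = np.full_like(d, 0.002)             # hypothetical colour likelihoods P(y|M_f)
p_bg = np.full_like(d, 0.004)             # hypothetical colour likelihoods P(y|M_b)
P_f, P_b = pixelwise_posteriors(p_fg, p_bg, eta_f=4000, eta_b=6000)
H_f, H_b = smoothed_step(d)
print(H_f * P_f + H_b * P_b)              # mixture term per sampled pixel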
Step 3, the search lines contain projection contour points, foreground points and background points; the sampled points are weighted by combining the colors and distances of the pixel points, and the weights of the object contour points are calculated.
The method specifically comprises the following steps:
1) For each sample point x_ij on the search line, it is assigned a weight:
where ω_c(x_ij) is a weight calculated from the color information of the target contour point and ω_d(x_ij) is a weight calculated from the position information of the target contour point.
2) The sampling point x_ij is weighted with the color information of the object contour point s_i, and the weight function based on the color information of the object contour point is defined as:
where P(s_i|S) is the probability that s_i belongs to the object contour, and c_1 is the minimum color probability threshold.
3) The sample point x_ij is weighted with the distance information of the target contour point s_i, and the weight function based on the distance information of the object contour point is defined as:
where the normalized distance from the sample point x_ij to the target contour point s_i is computed relative to the length of search line l_i, and c_2 is the maximum effective distance.
Step 4, estimating the optimal pose of the current frame by minimizing the pixel-weighted energy function;
The energy function is:
Where p represents the optimal pose transform.
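As a toy illustration of step 4, the sketch below minimizes a weighted energy over a 6-DoF pose vector with a general-purpose optimizer; the axis-angle parameterization, the Nelder-Mead optimizer and the placeholder probability term are all assumptions, since the text does not specify the patent's actual solver.

import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation as R

def pose_to_matrix(p):
    # p = (rx, ry, rz, tx, ty, tz): axis-angle rotation plus translation (assumed parameterization)
    T = np.eye(4)
    T[:3, :3] = R.from_rotvec(p[:3]).as_matrix()
    T[:3, 3] = p[3:]
    return T

def region_energy(p, weights, neg_log_prob_fn):
    # Weighted sum of per-point negative log contour probabilities at pose p (hedged form)
    return float(np.sum(weights * neg_log_prob_fn(pose_to_matrix(p))))

# Usage sketch with a dummy probability model standing in for the search-line statistics
rng = np.random.default_rng(0)
weights = rng.uniform(0.2, 1.0, size=50)
target_t = np.array([0.02, -0.01, 0.03])
neg_log_prob_fn = lambda T: np.full(50, np.sum((T[:3, 3] - target_t) ** 2))  # placeholder
p_star = minimize(region_energy, np.zeros(6), args=(weights, neg_log_prob_fn), method="Nelder-Mead").x
print(pose_to_matrix(p_star))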
Step 5, using the pose solved in the step 4 to initialize a model point cloud, and then carrying out ICP fine registration on the model point cloud and an object point cloud to solve the optimal pose, as shown in fig. 3;
The method specifically comprises the following steps:
1) The pose solved in the step (4) is used for initializing a model point cloud;
2) For each point p'_i in the object point cloud P', searching the model point cloud Q for the point set Q_j with the smallest Euclidean distances;
3) The normal vectors and curvatures of the object point cloud and the model point cloud are calculated. For each point p'_i in the object point cloud P', a multiple feature vector (P_i1, P_i2, P_i3, P_i4) is established from the curvature features of the point, corresponding in turn to K, H, K_1 and K_2 of the point; likewise, a multiple feature vector (Q_j1, Q_j2, Q_j3, Q_j4) is established for the corresponding point q_j in the model point cloud Q. Here K_1 and K_2 are the principal curvatures of the point cloud, K is the Gaussian curvature, and H is the mean curvature;
4) The curvature characteristic similarity between two corresponding points is calculated using the following formula:
If the similarity is smaller than the threshold ω_0, the corresponding point does not meet the requirement and the search is repeated until a point whose similarity is greater than or equal to the threshold is found as the corresponding point, thereby reducing mismatched correspondences;
5) An adaptive optimization method for curvature characteristic similarity is established, and the designed optimization objective function is as follows:
Wherein the dual weight function is determined by the curvature feature similarity D: if the similarity is smaller than the threshold, the weight is 0; if the similarity is greater than the threshold, the weight takes a value that assigns less weight to locally deformed points and outliers;
6) Solving an optimization objective function through Singular Value Decomposition (SVD) to obtain a transformation matrix;
7) Applying the solved transformation matrix to the model point cloud, and performing pose transformation on the model point cloud;
Judging whether the algorithm satisfies the convergence condition, namely that the maximum number of iterations is reached or the objective function converges to a minimum; if the condition is not satisfied, steps 1) to 7) are repeated until it is met.
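The sketch below illustrates one iteration of the curvature-gated, weighted point-to-point registration described above: nearest neighbours are found, correspondences whose curvature-feature similarity falls below the threshold ω_0 are rejected, the remaining pairs are weighted by their similarity, and the rigid transform is solved in closed form by SVD. The cosine similarity and the weight-equals-similarity choice are assumptions; the curvature features (K, H, K_1, K_2) are taken as given inputs.

import numpy as np
from scipy.spatial import cKDTree

def weighted_rigid_transform(src, dst, w):
    # Closed-form weighted point-to-point alignment (Kabsch/SVD)
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(axis=0)
    mu_d = (w[:, None] * dst).sum(axis=0)
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    Rm = Vt.T @ D @ U.T
    return Rm, mu_d - Rm @ mu_s

def icp_iteration(obj_pts, obj_feat, mdl_pts, mdl_feat, omega0=0.9):
    # One curvature-gated, similarity-weighted ICP step (illustrative)
    _, idx = cKDTree(mdl_pts).query(obj_pts)          # nearest model point for each object point
    f1, f2 = obj_feat, mdl_feat[idx]                  # curvature feature vectors (K, H, K1, K2)
    sim = np.sum(f1 * f2, axis=1) / (np.linalg.norm(f1, axis=1) * np.linalg.norm(f2, axis=1) + 1e-12)
    keep = sim >= omega0                              # reject correspondences below the threshold
    return weighted_rigid_transform(obj_pts[keep], mdl_pts[idx][keep], sim[keep])

# Usage sketch with synthetic clouds and curvature features (hypothetical data)
rng = np.random.default_rng(1)
mdl = rng.normal(size=(200, 3))
feat = np.abs(rng.normal(size=(200, 4)))
Rm, t = icp_iteration(mdl + 0.01, feat, mdl, feat)
print(Rm, t)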
Step 6, evaluating the accuracy of the pose solved in step 5 against the pose obtained in step 4: if the pose solved in step 5 is less accurate than the pose of step 4, the pose is initialized to the step-4 pose; otherwise, if the step-5 pose is more accurate, steps 2 to 5 are repeatedly executed with this pose, realizing tracking of the three-dimensional object.
The method specifically comprises the following steps:
1) Evaluating the accuracy of the pose obtained by solving in the step 5 and the pose obtained in the step 4;
2) If the pose solved in the step 5 is not as accurate as the pose solved in the step 4, initializing the pose as the pose of the step 4;
3) If the pose obtained in step 5 is more accurate, steps 2 to 5 are repeatedly executed using this pose, realizing tracking of the three-dimensional object.
The accuracy of the proposed three-dimensional object tracking method was tested on the RBOT dataset. The dataset uses real scenes captured by a camera as backgrounds and superimposes rendered virtual objects on them to obtain a three-dimensional tracking dataset with reference poses, and it is the first three-dimensional tracking dataset in which the camera and the object move simultaneously. The RBOT dataset contains three-dimensional models of 18 objects and 4 motion modes (regular mode, dynamic illumination mode, noise with dynamic illumination mode, and occlusion mode), for a total of 72 video sequences, each containing 1001 frames. The proposed method was compared on the RBOT dataset with region-based tracking methods such as that of Stoiber et al.; the experimental results are shown in Table 1.
TABLE 1
As can be seen from Table 1, the proposed algorithm outperforms the compared tracking methods across the board, with an average improvement of 5% in the regular, dynamic illumination and occlusion modes, and an average improvement of 10% under noise.
To demonstrate the three-dimensional tracking performance, experiments were also carried out with a weakly textured object placed in different scenes, including similar background colors, partial occlusion, rapid motion and reflections. The experimental results are shown in FIG. 4, from which it can be seen that the method achieves good three-dimensional tracking in all of these scenarios.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any equivalent substitution or modification made by a person skilled in the art to the technical solution and technical content disclosed herein, without departing from the scope of the technical solution of the invention, still falls within the protection scope of the invention.

Claims (8)

1. The three-dimensional object accurate tracking method based on the fusion of the region and the point cloud is characterized by comprising the following steps of:
s1, arranging virtual cameras at a plurality of vertexes of a geometric grid surrounding a model, and rendering a three-dimensional geometric body from a plurality of viewpoints, so as to create a sparse viewpoint model;
S2, searching for the closest precomputed view according to the pose of the previous frame, then projecting the 3D model into an image, constructing search lines at the projection contour points, and establishing probability models of the foreground and background separated by the projection contour;
S3, the search lines contain projection contour points, foreground points and background points; each sampled point is weighted by combining the color and distance information of the pixels, and its weight of belonging to an object contour point is calculated;
S4, estimating the optimal pose of the current frame by minimizing the pixel-weighted energy function;
S5, using the optimal pose solved in the S4 to initialize a model point cloud, and then carrying out ICP fine registration on the model point cloud and an object point cloud to solve the optimal pose of the object;
s6, evaluating the accuracy of the optimal pose obtained by solving in the S5 and the accuracy of the optimal pose obtained by solving in the S4, and repeatedly executing the S2-S5 by using the optimal pose in the S5 when the accuracy of the optimal pose obtained by solving in the S5 is greater than the accuracy of the optimal pose obtained by solving in the S4 so as to realize tracking of the three-dimensional object;
In the step S1, creating a sparse viewpoint model specifically includes the following steps:
S11, randomly sampling n_c points from the model contour during each rendering;
S12, for each contour point, estimating its normal vector n_i, the normal vector being perpendicular to the contour;
S13, for each contour point, calculating its depth value d_z, the depth value representing depth information acquired from rendering;
S14, converting the 2D contour points and the depth information into 3D coordinates of the model by using a reconstruction formula, wherein the reconstruction formula is represented by formula one and formula two; a 3D model point X = (X, Y, Z)^T and its homogeneous form X̃ = (X, Y, Z, 1)^T are used for the description, and, using image coordinates x = (x, y)^T, the color value y = I_c(x) and the depth value d_z = I_d(x) are obtained from the corresponding color and depth images; the 3D model is projected onto the image by formula three, wherein f_x and f_y are the focal lengths and p_x and p_y are the principal point coordinates;
in formula one, π^{-1} is the inverse projection operation, which reconstructs a three-dimensional model point according to the image coordinate x and the corresponding depth value d_z along the optical axis, and the direction vector pointing from the virtual camera to the center of the model is calculated; the calculated 3D model points, normal vectors and direction vectors are respectively stored;
the step S2 specifically comprises the following steps:
s21, according to the previous frame posture, acquiring a 3D model point and a normal vector from the nearest view of the sparse viewpoint model, and projecting the 3D model point and the normal vector into an image;
S22, at each model projection contour point m_i, sampling a search line along the normal direction n_i of the projected contour, wherein the j-th sampled pixel of search line l_i is denoted by x_ij and s_i denotes an object contour point;
S23, setting pixel-wise probability independence, the probability of the shape of the projection contour is expressed as follows:
wherein d(x_ij) = n_i^T(x_ij - m_i) is the Euclidean distance along the normal between the search line sampling point x_ij and the projection contour point m_i; P_f(x_ij) is the posterior probability that the color of each pixel belongs to the foreground region, and P_b(x_ij) is the posterior probability that it belongs to the background region, both calculated from temporally consistent local color histograms; η_f and η_b are the total numbers of pixels of the foreground region and the background region, respectively; the prior probabilities P(y|M_f) and P(y|M_b) are used to describe the likelihood that the color y belongs to the foreground model M_f and the background model M_b, respectively; and H_f(x) and H_b(x) are step functions for focusing the model on the pixels located near the projection contour;
to prevent pixels from being misclassified due to incomplete segmentation and then assigned to the wrong color histogram, the step functions are defined by formula five and formula six, wherein the parameter s is the slope, which adjusts the smoothness of the transition by controlling the slope of the hyperbolic tangent function, and the parameter α_h ∈ [0, 0.5] is an amplitude parameter for adjusting the amplitude of the step function.
2. The method for precisely tracking the three-dimensional object based on the fusion of the region and the point cloud according to claim 1, wherein the step S3 specifically comprises the following steps:
S31, for each sampling point x_ij on the search line, a weight is assigned, wherein ω_c(x_ij) is a weight calculated based on the color information of the target contour point and ω_d(x_ij) is a weight calculated based on the position information of the target contour point;
S32, weighting the sampling point x_ij by using the color information of the object contour point s_i, the weight function based on the color information of the object contour point being defined such that P(s_i|S) is the probability that s_i belongs to the object contour and c_1 is the minimum color probability threshold;
S33, weighting the sampling point x_ij by using the distance information of the target contour point s_i, the weight function based on the distance information of the object contour point being defined such that the normalized distance from the sampling point x_ij to the target contour point s_i is computed relative to the length of search line l_i, and c_2 is the maximum effective distance.
3. The method for precisely tracking a three-dimensional object based on fusion of a region and a point cloud according to claim 2, wherein in S4, the energy function is expressed as:
4. The method for precisely tracking the three-dimensional object based on the fusion of the region and the point cloud according to claim 1, wherein the step S5 specifically comprises the following steps:
s51, using the pose obtained by solving in the S4 to initialize a model point cloud;
s52, searching a corresponding point closest to the Euclidean distance of the object point cloud on the model point cloud based on curvature feature similarity, so that the matching of the wrong corresponding point is reduced;
S53, according to the matched corresponding points, solving the optimal transformation by an adaptive dual-weight-function optimization method based on curvature feature similarity;
s54, performing pose transformation on the model point cloud by using the solved transformation matrix;
s55, algorithm convergence judgment.
5. The method for precisely tracking a three-dimensional object based on fusion of a region and a point cloud as set forth in claim 4, wherein in S52, curvature feature similarity between two corresponding points is calculated by an eleventh formula, wherein the eleventh formula is expressed as:
6. The method for accurately tracking a three-dimensional object based on fusion of a region and a point cloud as claimed in claim 4, wherein in S53, a method for adaptively optimizing a dual-weight function of curvature feature similarity, an optimization objective function is expressed as:
wherein the weight is a dual weight function determined by the curvature feature similarity solved from the optimization objective: when the similarity is smaller than the threshold, the weight is set to 0; when the similarity is greater than the threshold, the weight takes a value that assigns less weight to locally deformed points and outliers.
7. The method for accurately tracking a three-dimensional object based on fusion of a region and a point cloud as set forth in claim 4, wherein the step S55 is specifically to determine whether the algorithm satisfies a convergence condition, the convergence condition being that a maximum number of iterations is reached or that the objective function converges to a minimum value, and when the algorithm does not satisfy the convergence condition, repeating the steps S51 to S55 until the algorithm satisfies the convergence condition.
8. The method for accurately tracking the three-dimensional object based on the fusion of the region and the point cloud according to claim 1, wherein in the step S6, the evaluation is specifically that when the accuracy of the optimal pose obtained by solving in the step S5 is smaller than that of the optimal pose obtained by solving in the step S4, the model is initialized to the optimal pose obtained by solving in the step S4, and otherwise, the model is initialized to the optimal pose obtained by solving in the step S5.
CN202410188374.4A 2024-02-20 2024-02-20 Three-dimensional object accurate tracking method based on fusion of region and point cloud Active CN118071797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410188374.4A CN118071797B (en) 2024-02-20 2024-02-20 Three-dimensional object accurate tracking method based on fusion of region and point cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410188374.4A CN118071797B (en) 2024-02-20 2024-02-20 Three-dimensional object accurate tracking method based on fusion of region and point cloud

Publications (2)

Publication Number Publication Date
CN118071797A CN118071797A (en) 2024-05-24
CN118071797B true CN118071797B (en) 2025-01-10

Family

ID=91098444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410188374.4A Active CN118071797B (en) 2024-02-20 2024-02-20 Three-dimensional object accurate tracking method based on fusion of region and point cloud

Country Status (1)

Country Link
CN (1) CN118071797B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119380044A (en) * 2024-11-19 2025-01-28 武汉理工大学 Method and equipment for positioning and tracking moving targets with variable scales by integrating multiple features

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197555A (en) * 2017-12-28 2018-06-22 杭州相芯科技有限公司 A kind of real-time face fusion method based on face tracking
CN117456327A (en) * 2023-12-14 2024-01-26 北京航空航天大学 Industrial part pose tracking method based on self-adaptive fusion of geometric edge and color statistics characteristics

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0818561D0 (en) * 2008-10-09 2008-11-19 Isis Innovation Visual tracking of objects in images, and segmentation of images
CN109903313B (en) * 2019-02-28 2022-04-26 中国人民解放军国防科技大学 A Real-time Pose Tracking Method Based on 3D Model of Target
CN111652901B (en) * 2020-06-02 2021-03-26 山东大学 A Textureless 3D Object Tracking Method Based on Confidence and Feature Fusion
CN114862911B (en) * 2022-05-05 2025-01-07 大连理工大学 A 3D point cloud single target tracking method based on graph convolution

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197555A (en) * 2017-12-28 2018-06-22 杭州相芯科技有限公司 A kind of real-time face fusion method based on face tracking
CN117456327A (en) * 2023-12-14 2024-01-26 北京航空航天大学 Industrial part pose tracking method based on self-adaptive fusion of geometric edge and color statistics characteristics

Also Published As

Publication number Publication date
CN118071797A (en) 2024-05-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant