CN118314370A - Layered motion restoration structure method based on grid common view
- Publication number: CN118314370A
- Application number: CN202410410945.4A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06V10/757—Matching configurations of points or features
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/761—Proximity, similarity or dissimilarity measures
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
Abstract
The invention discloses a hierarchical motion restoration structure (Structure from Motion) method based on grid common view. The matched inlier feature points are further processed with a grid common-view method so that each image is assigned to the correct sub-scene during graph cutting; after cutting, each sub-scene is expanded by iteratively adding strongly associated images, raising the overlap rate between sub-scenes while keeping each sub-scene correctly reconstructable; and three-dimensional point information is fused during the final merge, improving the accuracy of the similarity transformations between partitions.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a layered motion restoration structure method based on grid common view.
Background
Computing camera poses and three-dimensional points from images acquired by unordered RGB cameras mainly proceeds by starting from a robust initial seed reconstruction and then registering cameras one by one with incremental Structure from Motion (SfM) via the Perspective-n-Point (PnP) method. After a camera is successfully registered, an additional bundle adjustment step optimizes the camera poses and 3D points, which makes incremental SfM robust and accurate. However, incremental SfM becomes inefficient on large-scale data sets and, because of the repeated optimization steps, can run into memory bottlenecks. Furthermore, the incremental way in which new views are added makes these methods susceptible to drift, despite additional re-triangulation steps. To overcome the inefficiency of incremental SfM while maintaining reconstruction robustness, a natural idea is to reconstruct in a divide-and-conquer manner. A hierarchical SfM algorithm generally first partitions the image set through graph cutting, reconstructs each partition incrementally or globally, then fuses the sub-reconstructions through similarity transformations, merging them by means of their shared images.
The result of the graph cut is therefore critical to the whole reconstruction pipeline, yet the traditional graph-cut method uses only the number of feature-point matches between images as the edge weights; even after geometric verification, edge weights that rely on raw match counts still carry large errors. Prior work infers the merging paths of the sub-scenes with a minimum-spanning-tree method and obtains the similarity transformations between partitions from the shared images, yielding an effective reconstruction result. But it neither further processes the inliers of the image feature matches nor considers anything beyond the common images during sub-scene merging, which can ultimately cause the sub-scene merge to fail.
Disclosure of Invention
The invention aims to provide a layered motion restoration structure method based on grid common view, which aims to solve the problems in the background technology.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A layered motion restoration structure method based on grid common view specifically comprises the following steps:
S1, input images: input an RGB image set;
S2, image preprocessing: extract feature points from the images and match them;
S3, construct the scene graph: a graph structure formed by all associated images, in which the nodes are images and the edges carry the degree of association between images;
S4, partition the scene graph: divide the large scene graph into several small scene graphs to obtain several sub-scenes;
S5, expand the sub-scenes: expand the boundaries of each sub-scene;
S6, local SfM reconstruction: run an independent SfM reconstruction on the images in each sub-scene;
S7, merge the sub-scenes: merge the results of the independent SfM reconstructions from the previous step to obtain the complete reconstruction result.
As a preferred embodiment of the present invention: before S3 is performed, it must be determined whether two images are associated and their degree of association computed, which specifically comprises the following steps:
S2.1, uniformly divide each image into N² grid cells;
S2.2, take each grid cell of an image as a node of the common-view graph, where the edges between nodes represent the common-view relations between cells; if two cells share at least T feature tracks, they form a pair of common-view cells, and the distance between any two images I_i and I_j is defined on the basis of this grid common view;
S2.3, judge whether the two images are in common view;
S2.4, from low resolution to high resolution, label the layers as the first through the L-th layer and mark each grid cell with one of two states; if a common-view matching pair is found in a grid cell of the l-th layer, update that cell's state to full and increase the association score of the image pair by 2^l;
S2.5, compute the total association score between the two images as the average of the unilateral scores over all grid cells whose state is full.
As a preferred embodiment of the present invention: the distance between any two images in S2.2 is defined by the formula

dist(I_i, I_j) = min over all common-view cell pairs (g_a^i, g_b^j) of d(g_a^i, g_b^j)

where d(g_a^i, g_b^j) denotes the shortest-path distance in the common-view graph between grid cell g_a^i of image I_i and grid cell g_b^j of image I_j; dist(I_i, I_j) is thus the minimum distance over all possible common-view cell pairs between image I_i and image I_j.
As a preferred embodiment of the present invention: whether the two images in S2.3 are in common view is judged by the formula

dist(I_i, I_j) ≤ σ_dist

where σ_dist is a preset distance threshold; if the distance between I_i and I_j does not exceed σ_dist, they are determined to be a common-view pair.
As a preferred embodiment of the present invention: expanding the boundaries of the sub-scenes in S5 follows the conventional graph-cut-and-expand pattern, in which the final merge is enabled by creating overlapping regions between the partitions; the specific method is as follows:
S3.1, collect boundary images: identify, from the already constructed scene graph, those images that lie at the junction of different sub-scenes or are relatively isolated in the whole scene, and collect them into a set S;
S3.2, gradually expand the sub-partitions: perform the following steps for each sub-scene i in the scene graph;
S3.3, among the boundary images in the set S, find the boundary image with the highest association with images outside the partition as the expansion image, and add all of its associated images that do not yet belong to the partition;
S3.4, judge whether the size of the overlapping region meets the requirement: compute the proportion of newly added images in sub-scene i, and if this proportion reaches or exceeds the set threshold τ_overlap, end the expansion process for this sub-scene.
As a preferred embodiment of the present invention: the sub-scene merging operation of S7 specifically comprises the following steps:
S4.1, identify common three-dimensional points: in the two sub-scenes src and tgt to be merged, identify all common three-dimensional points observed by at least three feature points; denote the coordinate set of these points in the src sub-scene as src_xyz and in the tgt sub-scene as tgt_xyz;
S4.2, random sampling: randomly draw m points from src_xyz and tgt_xyz, named src_rand and tgt_rand respectively;
S4.3, estimate the similarity transformation: using src_rand and tgt_rand, estimate the similarity transformation matrix T from the src sub-scene to the tgt sub-scene;
S4.4, compute the reprojection error: with the estimated similarity transformation matrix T, compute the reprojection error E_src of each three-dimensional point of the src sub-scene in the tgt sub-scene, and iterate;
S4.5, iteration judgment: check whether the number of iterations has reached the preset upper limit; if so, end the iteration, otherwise return to S4.3.
As a preferred embodiment of the present invention: the method for estimating the similarity transformation in S4.3 specifically comprises the following steps:
S4.3.1, represent src_rand and tgt_rand as the vectors (x_1, x_2, ..., x_n) and (y_1, y_2, ..., y_n), and compute their means μ_x and μ_y and variances σ_x² and σ_y²;
S4.3.2, from the μ_x, μ_y, σ_x², and σ_y² obtained in S4.3.1, compute the covariance matrix Σ_xy of the two sets of vectors;
S4.3.3, perform an SVD decomposition of the covariance matrix Σ_xy, then obtain the required similarity transformation between the two point clouds from the decomposed matrices and a constructed diagonal matrix, comprising a rotation matrix, a translation vector, and a scale.
As a preferred embodiment of the present invention: the covariance matrix Σ xy in S4.3.2 has a calculation formula:
SVD decomposition, rotation matrix, translation vector and scale calculation formula in S4.3.3 are:
R=USVT
t=μy-cRμx
As a preferred embodiment of the present invention: computing the reprojection error in S4.4 also requires obtaining the reprojection error E_tgt of the tgt sub-scene; if both E_src and E_tgt are smaller than the preset maximum threshold D_max, the three-dimensional point is regarded as an inlier.
As a preferred embodiment of the present invention: after each iteration in S4.4 ends, src_rand and tgt_rand must be updated, and the inlier count N_cur of the current iteration recorded.
Compared with the prior art, the invention has the beneficial effects that:
Based on the grid common-view method, the matched inlier feature points are further processed so that each image is assigned to the correct sub-scene during graph cutting; after cutting, each sub-scene is expanded by iteratively adding strongly associated images, raising the overlap rate between sub-scenes while keeping each sub-scene correctly reconstructable; and three-dimensional point information is fused during the final merge, improving the accuracy of the similarity transformations between partitions.
Drawings
FIG. 1 is a flow chart of a layered motion restoration structure of the present invention;
FIG. 2 is a schematic diagram of pyramid scoring according to the present invention;
fig. 3 is a sub-scene extension schematic of the present invention.
Detailed Description
The following is a clear and complete description of the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Referring to fig. 1, the present invention provides a technical solution:
A layered motion restoration structure method based on grid common view specifically comprises the following steps:
S1, input images: input an RGB image set;
S2, image preprocessing: extract feature points from the images and match them;
S3, construct the scene graph: a graph structure formed by all associated images, in which the nodes are images and the edges carry the degree of association between images;
S4, partition the scene graph: divide the large scene graph into several small scene graphs to obtain several sub-scenes;
S5, expand the sub-scenes: expand the boundaries of each sub-scene;
S6, local SfM reconstruction: run an independent SfM reconstruction on the images in each sub-scene;
S7, merge the sub-scenes: merge the results of the independent SfM reconstructions from the previous step to obtain the complete reconstruction result.
Further, before S3 is performed, it must be determined whether two images are associated and their degree of association computed, which specifically comprises the following steps:
S2.1, uniformly divide each image into N² grid cells;
S2.2, take each grid cell of an image as a node of the common-view graph, where the edges between nodes represent the common-view relations between cells; if two cells share at least T feature tracks, they form a pair of common-view cells, and the distance between any two images I_i and I_j is defined on the basis of this grid common view;
S2.3, judge whether the two images are in common view;
S2.4, from low resolution to high resolution, label the layers as the first through the L-th layer and mark each grid cell with one of two states; if a common-view matching pair is found in a grid cell of the l-th layer, update that cell's state to full and increase the association score of the image pair by 2^l;
S2.5, compute the total association score between the two images as the average of the unilateral scores over all grid cells whose state is full.
Further, the distance between any two images in S2.2 is defined by the formula

dist(I_i, I_j) = min over all common-view cell pairs (g_a^i, g_b^j) of d(g_a^i, g_b^j)

where d(g_a^i, g_b^j) denotes the shortest-path distance in the common-view graph between grid cell g_a^i of image I_i and grid cell g_b^j of image I_j; dist(I_i, I_j) is thus the minimum distance over all possible common-view cell pairs between image I_i and image I_j.
Further, whether the two images in S2.3 are in common view is judged by the formula

dist(I_i, I_j) ≤ σ_dist

where σ_dist is a preset distance threshold; if the distance between I_i and I_j does not exceed σ_dist, they are determined to be a common-view pair.
Further, expanding the boundaries of the sub-scenes in S5 follows the conventional graph-cut-and-expand pattern, in which the final merge is enabled by creating overlapping regions between the partitions; the specific method is as follows:
S3.1, collect boundary images: identify, from the already constructed scene graph, those images that lie at the junction of different sub-scenes or are relatively isolated in the whole scene, and collect them into a set S;
S3.2, gradually expand the sub-partitions: perform the following steps for each sub-scene i in the scene graph;
S3.3, among the boundary images in the set S, find the boundary image with the highest association with images outside the partition as the expansion image, and add all of its associated images that do not yet belong to the partition;
S3.4, judge whether the size of the overlapping region meets the requirement: compute the proportion of newly added images in sub-scene i, and if this proportion reaches or exceeds the set threshold τ_overlap, end the expansion process for this sub-scene.
Further, the sub-scene merging operation of S7 specifically comprises the following steps:
S4.1, identify common three-dimensional points: in the two sub-scenes src and tgt to be merged, identify all common three-dimensional points observed by at least three feature points; denote the coordinate set of these points in the src sub-scene as src_xyz and in the tgt sub-scene as tgt_xyz;
S4.2, random sampling: randomly draw m points from src_xyz and tgt_xyz, named src_rand and tgt_rand respectively;
S4.3, estimate the similarity transformation: using src_rand and tgt_rand, estimate the similarity transformation matrix T from the src sub-scene to the tgt sub-scene;
S4.4, compute the reprojection error: with the estimated similarity transformation matrix T, compute the reprojection error E_src of each three-dimensional point of the src sub-scene in the tgt sub-scene, and iterate;
S4.5, iteration judgment: check whether the number of iterations has reached the preset upper limit; if so, end the iteration, otherwise return to S4.3.
Further, the method for estimating the similarity transformation in S4.3 specifically comprises the following steps:
S4.3.1, represent src_rand and tgt_rand as the vectors (x_1, x_2, ..., x_n) and (y_1, y_2, ..., y_n), and compute their means μ_x and μ_y and variances σ_x² and σ_y²;
S4.3.2, from the μ_x, μ_y, σ_x², and σ_y² obtained in S4.3.1, compute the covariance matrix Σ_xy of the two sets of vectors;
S4.3.3, perform an SVD decomposition of the covariance matrix Σ_xy, then obtain the required similarity transformation between the two point clouds from the decomposed matrices and a constructed diagonal matrix, comprising a rotation matrix, a translation vector, and a scale.
Further, the covariance matrix Σ_xy in S4.3.2 is computed as

Σ_xy = (1/n) Σ_{i=1..n} (y_i - μ_y)(x_i - μ_x)^T

The SVD decomposition, rotation matrix, translation vector, and scale in S4.3.3 are computed as

Σ_xy = U D V^T
R = U S V^T
c = tr(DS) / σ_x²
t = μ_y - c R μ_x

where S is the constructed diagonal matrix (the identity when det(U)det(V) = 1, and diag(1, ..., 1, -1) otherwise).
Further, computing the reprojection error in S4.4 also requires obtaining the reprojection error E_tgt of the tgt sub-scene; if both E_src and E_tgt are smaller than the preset maximum threshold D_max, the three-dimensional point is regarded as an inlier.
Further, after each iteration in S4.4 ends, src_rand and tgt_rand must be updated, and the inlier count N_cur of the current iteration recorded.
In order to better demonstrate the effects of the present invention, the following examples are provided.
Example 1: construction of scene graph (common view grid-based scene graph construction method)
First, each image is uniformly divided into N² grid cells. Each grid cell of an image is taken as a node of the common-view graph, and the edges between nodes represent the common-view relations between cells; if two cells share at least T feature tracks, they form a pair of common-view cells. On the basis of this grid common view, the distance between any two images I_i and I_j can be defined as

dist(I_i, I_j) = min over all common-view cell pairs (g_a^i, g_b^j) of d(g_a^i, g_b^j)

where d(g_a^i, g_b^j) is the shortest-path distance in the common-view graph between the two cells. If two cells are not connected in the common-view graph, this distance is defined as infinity.
Constructing the grid-based common view requires considering the transitivity of the common-view relation, but as the length of a common-view chain grows, the overlap between the cells at its two ends may shrink or even vanish entirely. Image pairs that are far apart should therefore not be considered co-visible. Specifically, whether two images are in common view is determined by the formula

dist(I_i, I_j) ≤ σ_dist

If the distance between I_i and I_j does not exceed the preset threshold σ_dist, they are regarded as a common-view pair. This method effectively evaluates the common-view relation between images by dividing the images into grid cells and computing the shortest distances between cells.
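The distance and threshold test above can be sketched as follows. This is an illustrative implementation under assumed data structures (an edge list of common-view cell pairs, with cells identified by arbitrary hashable keys, and a hypothetical default σ_dist of 2), not the patent's own code:

```python
from collections import deque

def grid_distance(covis_edges, src_cell, dst_cell):
    """Shortest-path distance (in edges) between two grid cells in the
    common-view graph; float('inf') if they are not connected."""
    graph = {}
    for a, b in covis_edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {src_cell}, deque([(src_cell, 0)])
    while queue:
        cell, d = queue.popleft()
        if cell == dst_cell:
            return d
        for nxt in graph.get(cell, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")

def image_distance(covis_edges, cells_i, cells_j):
    """dist(I_i, I_j): minimum over all cell pairs of the two images."""
    return min((grid_distance(covis_edges, a, b)
                for a in cells_i for b in cells_j),
               default=float("inf"))

def is_covisible(covis_edges, cells_i, cells_j, sigma_dist=2):
    """Common-view test: dist(I_i, I_j) <= sigma_dist."""
    return image_distance(covis_edges, cells_i, cells_j) <= sigma_dist
```

Because a disconnected pair returns infinity, distant chains are automatically rejected by the threshold, matching the transitivity caveat above.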
Then a novel method is introduced for quantifying the association weights between images and reducing misjudgments of the common-view relation caused by matches on identical objects: a strategy that re-divides the common-view grid over different resolution layers, executed in order from low to high resolution and labeled the first through the L-th layer. In each resolution layer a grid cell may be in one of two states: empty or full. If a common-view matching pair is found in a grid cell of the l-th layer, the cell state is updated to full and the association score of the image pair is increased by 2^l; the aim of this strategy is to enhance the robustness and accuracy of common-view-based image matching by dispersing the distribution of the common-view grid. To compute the total association score between two images, the method averages the unilateral scores over all valid (i.e., full) grid cells. The process can be illustrated visually: fig. 2 shows the pyramid mechanism used to compute the association weight between two images, where matched feature points are marked with red dots and successfully activated grid cells are shown in yellow. Only activated cells participate in the final score; for example, when L equals 3, if the unilateral score of image i is 66 and that of image j is 80, the association strength between image i and image j is computed as 73.
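A minimal sketch of the pyramid scoring just described. It assumes each layer l re-divides the image into a 2^l by 2^l grid (the patent does not fix the per-layer resolution, so this is an illustrative choice) and that matched feature coordinates are normalized to [0, 1); the two unilateral scores are averaged exactly as in the 66/80 example above:

```python
def unilateral_score(points, L):
    """Pyramid score for one image: at layer l the image is split into a
    (2**l x 2**l) grid; every cell containing at least one matched point
    becomes 'full' and contributes 2**l to the score."""
    score = 0
    for l in range(1, L + 1):
        n = 2 ** l
        full_cells = {(int(x * n), int(y * n)) for x, y in points}
        score += len(full_cells) * 2 ** l
    return score

def association_score(points_i, points_j, L=3):
    """Total association score: average of the two unilateral scores."""
    return (unilateral_score(points_i, L) + unilateral_score(points_j, L)) / 2
```

Spreading matches over many cells raises the score faster than piling matches into one cell, which is the dispersal effect the strategy aims for.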
Example 2: expansion sub-scene (sub-scene expansion method based on strong association)
To address the problem that the overlapping regions produced by existing scene-graph partitioning methods are insufficient to support an effective merge, this research proposes a new method. In the traditional graph-cut-and-expand pattern, the final merge is enabled by creating overlapping regions between partitions, and this approach relies on edge selection to determine the overlapping image set. As shown in fig. 3, specific edges, e.g., (3, 10), (6, 10), (9, 26), etc., are collected to identify the image set that can serve as partition separators, so that images {3, 6, 9, 10, 11, 16, 18, 19, 24, 25, 26} are used to create the overlapping regions. However, this approach faces two main challenges: first, when the scene graph is too sparse, the number of boundary images obtained is insufficient to compute the similarity transformations needed to merge all partial reconstructions; second, graph-cut methods tend to separate partitions along weakly associated edges, so the separators are usually weakly constrained during reconstruction, which degrades the accuracy of their pose estimates.
To overcome these limitations, this chapter proposes a sub-scene expansion algorithm based on strong association, which aims to improve the precision of sub-scene merging when the scene graph is sparse. The core idea is to identify and expand strongly associated edges so as to increase the number and quality of the overlapping regions, thereby improving the accuracy and robustness of the reconstruction. The specific steps are as follows:
Step one: collect boundary images. Initially, those images that lie at the junction of different sub-scenes or are relatively isolated in the whole scene must be identified from the already constructed scene graph; these images are regarded as boundary images and are collected into a set S.
Step two: gradually expand the sub-partitions. For each sub-scene i in the scene graph, perform the following steps:
Step three: among the boundary images in the set S, find the boundary image with the highest association with images outside the partition, i.e., the expansion image, and add all of its associated images that do not yet belong to the partition.
Step four: judge whether the size of the overlapping region meets the requirement. Compute the proportion of newly added images in the sub-scene; if this proportion reaches or exceeds the set threshold τ_overlap, terminate the expansion process for this sub-scene.
By the method, not only is the boundary of the sub-scene gradually filled, but also effective association is carried out according to the boundary image in the scene, and finally, a more continuous and accurate scene graph is obtained.
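Under assumed inputs, the expansion loop above might look like the following sketch. The association table `assoc` (each image mapped to its neighbors' association scores), the greedy tie-breaking, and the default threshold are all illustrative choices, not details fixed by the patent:

```python
def expand_subscene(partition, boundary_imgs, assoc, tau_overlap=0.3):
    """Greedy expansion: repeatedly pick the boundary image most strongly
    associated with an image outside the partition, pull in its associated
    images, and stop once the fraction of newly added images reaches
    tau_overlap."""
    partition = set(partition)
    added = set()
    candidates = list(boundary_imgs)
    while candidates:
        def outside_strength(b):
            # strongest association from boundary image b to an outside image
            return max((w for n, w in assoc.get(b, {}).items()
                        if n not in partition), default=0)
        best = max(candidates, key=outside_strength)
        if outside_strength(best) == 0:
            break  # no boundary image reaches outside the partition
        new_imgs = set(assoc.get(best, {})) - partition
        partition |= new_imgs
        added |= new_imgs
        candidates.remove(best)
        if len(added) / len(partition) >= tau_overlap:
            break  # overlap region is large enough
    return partition, added
```

Because expansion follows the strongest association first, the overlap region is built from well-constrained images rather than the weak edges a graph cut tends to sever.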
Example 3: merging sub-scenes (merging method of sub-scenes merging three-dimensional points)
In existing scene reconstruction methods, merging between sub-scenes typically relies on overlapping images, and similarity transformations between sub-scenes are computed from pose estimates of these common images. However, this approach faces significant challenges. First, if the estimation of the image pose is not accurate enough, the similarity transformation calculated based on these poses will also be unreliable, resulting in a bias in the result of sub-scene merging. Second, the camera center point cloud of the image is typically more sparse than the three-dimensional point cloud, which further reduces the number of data points available for estimating the similarity transformation, reducing the accuracy and stability of the estimation process. In order to solve the problems, an innovative sub-scene merging algorithm is introduced, and the algorithm overcomes the defects of the traditional method by fusing three-dimensional point cloud data.
(1) The specific merging process comprises the following steps:
Step one: identification of common three-dimensional points. First, in the two sub-scenes src and tgt to be merged, all common three-dimensional points observed by at least three feature points must be identified; these are valid matching points supported by multiple observations. The coordinate set of the common three-dimensional points in the src sub-scene is denoted src_xyz, and that in the tgt sub-scene tgt_xyz.
Step two: random sampling. Next, m points are randomly drawn from src_xyz and tgt_xyz, named src_rand and tgt_rand respectively; this step prepares for the estimation of the geometric transformation.
Step three: estimation of the similarity transformation. Using src_rand and tgt_rand, a similarity transformation matrix T from the src sub-scene to the tgt sub-scene is estimated; this transformation reflects the spatial relationship between the two sub-scenes.
Step four: calculation of the reprojection error. With the estimated similarity transformation T, the reprojection error E_src of each three-dimensional point of the src sub-scene in the tgt sub-scene is computed; similarly, the reprojection error E_tgt for the tgt sub-scene is computed. If both errors are smaller than the preset maximum threshold D_max, the three-dimensional point is regarded as an inlier; identifying inliers helps improve the accuracy of the transformation estimate. After each iteration, src_rand and tgt_rand are re-sampled, and the inlier count N_cur of the current iteration is recorded.
Step five: iteration judgment. Finally, check whether the number of iterations has reached the preset upper limit; if so, end the iteration process. If not, recompute the required number of iterations from the current inlier count N_cur and return to step three.
Through the coherent and iterative process, sub-scenes can be accurately combined, so that higher-quality data fusion is realized in the whole three-dimensional scene reconstruction. The method not only enhances the consistency among sub-scenes, but also improves the accuracy and reliability of the whole three-dimensional model.
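The sample-estimate-score loop above can be sketched as follows, under stated simplifications: the closed-form similarity estimate (Umeyama-style, matching the SVD formulas given later) is inlined, and the patent's image-space reprojection test is replaced by a 3D transfer error so the sketch stays self-contained; the sample size, threshold, and iteration cap are illustrative, and the adaptive recomputation of the iteration count is omitted:

```python
import numpy as np

def estimate_sim3(x, y):
    """Closed-form similarity (c, R, t) with y ~ c * R @ x + t.
    x, y: (n, 3) arrays of corresponding 3D points."""
    mu_x, mu_y = x.mean(0), y.mean(0)
    var_x = ((x - mu_x) ** 2).sum(1).mean()
    cov = (y - mu_y).T @ (x - mu_x) / len(x)       # Sigma_xy
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                               # avoid a reflection
    R = U @ S @ Vt
    c = np.trace(np.diag(D) @ S) / var_x
    t = mu_y - c * R @ mu_x
    return c, R, t

def ransac_merge(src_xyz, tgt_xyz, m=4, d_max=0.05, max_iter=100, seed=0):
    """RANSAC-style loop: sample m correspondences, fit a similarity,
    count inliers by 3D transfer error, and keep the best model."""
    rng = np.random.default_rng(seed)
    best = (0, None)
    for _ in range(max_iter):
        idx = rng.choice(len(src_xyz), size=m, replace=False)
        c, R, t = estimate_sim3(src_xyz[idx], tgt_xyz[idx])
        err = np.linalg.norm(c * src_xyz @ R.T + t - tgt_xyz, axis=1)
        n_cur = int((err < d_max).sum())
        if n_cur > best[0]:
            best = (n_cur, (c, R, t))
    return best
```

Fitting on three-dimensional points rather than camera centers follows the motivation given above: the denser point cloud gives the estimator many more constraints.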
(2) Method for obtaining the similarity transformation between sub-scenes
The src_rand and tgt_rand are expressed by vectors (x 1,x2,...,xn) and (y 1,y2,...,yn) so that their average values mu x and mu y, and variance can be obtainedAndThe method comprises the following steps of:
covariance matrix of the two sets of vectors can then be obtained
Σxy=UDVT
Through SVD decomposition of the covariance matrix, finally, the required similarity transformation between two point clouds can be obtained through the decomposed matrix and the constructed diagonal matrix, and the method comprises the steps of rotating the matrix, translating the vector and the scale:
R=USVT
t=μy-cRμx
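The closed-form estimation above (means, variance, covariance, SVD, and the R, c, t formulas) can be sketched in Python with NumPy. The function name is illustrative; the reflection-guarding diagonal matrix S follows the construction described above.

```python
import numpy as np

def similarity_from_points(x, y):
    """Compute (c, R, t) with y_i ~= c * R @ x_i + t from matched 3D points,
    following the mean/variance/covariance/SVD formulas above."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)      # means
    var_x = ((x - mu_x) ** 2).sum() / n              # sigma_x^2
    cov = (y - mu_y).T @ (x - mu_x) / n              # Sigma_xy
    U, D, Vt = np.linalg.svd(cov)                    # Sigma_xy = U D V^T
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:     # S = diag(1, 1, det(UV^T))
        S[2, 2] = -1.0
    R = U @ S @ Vt                                   # R = U S V^T
    c = (D * np.diag(S)).sum() / var_x               # c = tr(DS) / sigma_x^2
    t = mu_y - c * R @ mu_x                          # t = mu_y - c R mu_x
    return c, R, t
```

Given exact correspondences, this recovers the generating scale, rotation, and translation to numerical precision.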
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A layered motion restoration structure method based on grid common view, characterized by comprising the following steps:
S1, image input: inputting an RGB image set;
S2, image preprocessing: extracting feature points and matching the feature points of the images;
S3, scene graph construction: building a graph structure over all related images, in which the nodes are images and the edges represent the degree of association between images;
S4, scene graph division: dividing the large scene graph into a plurality of small scene graphs to obtain a plurality of sub-scenes;
S5, sub-scene expansion: expanding the boundaries of the sub-scenes;
S6, local SfM reconstruction: independently running SfM reconstruction on the images in each sub-scene;
S7, sub-scene merging: merging the results of the independent SfM reconstructions of the previous step to obtain a complete reconstruction result.
2. The layered motion restoration structure method based on grid common view according to claim 1, wherein before step S3 is performed, it is judged whether two images are associated and their degree of association is calculated, specifically comprising the following steps:
S2.1, uniformly dividing each image into N^2 grid cells;
S2.2, taking each grid cell of an image as a node of a co-view graph, wherein edges between nodes represent the co-view relation between grid cells; if two grid cells share at least T feature tracks, they form a pair of co-view grid cells, and the distance between any two images I_i and I_j is defined on the basis of the grid co-view;
S2.3, judging whether the two images are co-view;
S2.4, labeling layers 1 through L in order from low resolution to high resolution, each grid cell being marked with one of two states; if a co-view matching pair is found in a grid cell of layer l, updating the cell state to full and increasing the association score of the image pair by 2^l;
S2.5, calculating the total association score between the two images as the average of the single-side scores over all grid cells whose state is full.
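As a rough illustration of steps S2.1 through S2.5, the layered grid scoring could look like the sketch below. This is not the patent's exact bookkeeping: the per-layer grid resolution schedule (doubling per layer), the cell-state handling, and the averaging are all assumptions made for the example.

```python
def association_score(matched_points, width, height, N=4, L=3):
    """Layered grid scoring sketch.  `matched_points` holds (x, y) keypoint
    positions in one image that have a co-view match in the other image.
    Assumed: layer l uses an (N * 2**(l-1))-cell-per-side grid, a cell is
    'full' once it contains a matched point, a full cell at layer l adds 2**l
    to the score, and the total is averaged over all full cells."""
    total, full_cells = 0, 0
    for l in range(1, L + 1):
        n = N * 2 ** (l - 1)                 # layers go from low to high resolution
        full = {(int(x * n / width), int(y * n / height))
                for (x, y) in matched_points}
        total += len(full) * 2 ** l          # each full cell contributes 2^l
        full_cells += len(full)
    return total / full_cells if full_cells else 0.0
```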
3. The layered motion restoration structure method based on grid common view according to claim 2, wherein the distance between any two images in S2.2 is defined by the formula:
dist(I_i, I_j) = min over all co-view grid-cell pairs (G_i^a, G_j^b) of d(G_i^a, G_j^b)
wherein d(G_i^a, G_j^b) represents the shortest-path distance between grid cell G_i^a and grid cell G_j^b in the co-view graph, and dist(I_i, I_j) represents the minimum such distance over all possible co-view grid-cell pairs between image I_i and image I_j.
4. The layered motion restoration structure method based on grid common view according to claim 2, wherein the judgment of whether the two images in S2.3 are co-view is given by:
dist(I_i, I_j) ≤ σ_dist
wherein σ_dist represents a preset distance threshold; if the distance between I_i and I_j does not exceed σ_dist, the two images are determined to be a co-view pair.
5. The layered motion restoration structure method based on grid common view according to claim 1, wherein expanding the boundaries of the sub-scenes in S5 follows the conventional graph-cut-and-expand scheme, in which the final merging is enabled by creating overlapping regions between partitions, specifically as follows:
S3.1, collecting boundary images: identifying, from the already constructed scene graph, those images that lie at the junction of different sub-scenes or that are relatively isolated within the whole scene, and collecting them into a set S;
S3.2, gradually expanding the sub-partitions: performing the following steps for each sub-scene i in the scene graph;
S3.3, searching, among the boundary images of the set S, for the boundary image with the highest degree of association with external images as the expansion image, and adding to the partition all associated images of the expansion image that do not yet belong to the partition;
S3.4, judging whether the size of the overlapping region meets the requirement: calculating the proportion of the newly added images in the sub-scene i, and if the proportion reaches or exceeds a set threshold τ_overlap, ending the expansion process of the sub-scene.
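The expansion loop of S3.1 through S3.4 can be sketched as follows. The association structure (`assoc[i][j]` as a score between images i and j), the stopping ratio, and the greedy selection details are assumptions for illustration, not the patent's exact procedure.

```python
def expand_subscene(partition, boundary_imgs, assoc, tau_overlap=0.3):
    """Grow `partition` (a set of image ids) by repeatedly picking the boundary
    image most strongly associated with images outside the partition and
    absorbing its associated images, until the overlap ratio reaches
    tau_overlap or no boundary image remains."""
    original = len(partition)
    added = set()
    while True:
        # S3.3: boundary image with highest association to external images
        best, best_s = None, 0.0
        for b in boundary_imgs:
            s = sum(w for j, w in assoc.get(b, {}).items() if j not in partition)
            if s > best_s:
                best, best_s = b, s
        if best is None:
            break
        for j in assoc.get(best, {}):        # absorb its associated images
            if j not in partition:
                partition.add(j)
                added.add(j)
        boundary_imgs = [b for b in boundary_imgs if b != best]
        # S3.4: stop once the newly added proportion reaches the threshold
        if len(added) / max(original, 1) >= tau_overlap:
            break
    return partition, added
```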
6. The layered motion restoration structure method based on grid common view according to claim 1, wherein the sub-scene merging operation of S7 specifically comprises the following steps:
S4.1, identifying common three-dimensional points: identifying all common three-dimensional points observed by at least three feature points in the two sub-scenes src and tgt to be merged, and denoting the coordinate set of these common points in the src sub-scene as src_xyz and in the tgt sub-scene as tgt_xyz;
S4.2, random sampling: randomly extracting m points from src_xyz and tgt_xyz, denoted src_rand and tgt_rand respectively;
S4.3, estimating the similarity transformation: estimating a similarity transformation matrix T from the src sub-scene to the tgt sub-scene using src_rand and tgt_rand;
S4.4, calculating the reprojection error: calculating the reprojection error E_src of each three-dimensional point of the src sub-scene in the tgt sub-scene using the estimated similarity transformation matrix T, and iterating;
S4.5, iteration judgment: judging whether the number of iterations has reached a preset upper limit; if so, ending the iteration, and if not, returning to S4.3.
7. The layered motion restoration structure method based on grid common view according to claim 6, wherein the method for estimating the similarity transformation in S4.3 specifically comprises the following steps:
S4.3.1, representing src_rand and tgt_rand by the vectors (x_1, x_2, ..., x_n) and (y_1, y_2, ..., y_n), and obtaining their means μ_x and μ_y and variances σ_x^2 and σ_y^2;
S4.3.2, calculating the covariance matrix Σ_xy of the two sets of vectors from the μ_x, μ_y, σ_x^2 and σ_y^2 obtained in S4.3.1;
S4.3.3, performing SVD decomposition on the covariance matrix Σ_xy, and then obtaining the required similarity transformation between the two point clouds, comprising a rotation matrix, a translation vector and a scale, from the decomposed matrices and a constructed diagonal matrix.
8. The layered motion restoration structure method based on grid common view according to claim 7, wherein the covariance matrix Σ_xy in S4.3.2 is calculated as:
Σ_xy = (1/n) Σ_{i=1}^{n} (y_i − μ_y)(x_i − μ_x)^T
and the SVD decomposition, rotation matrix, translation vector and scale in S4.3.3 are calculated as:
Σ_xy = U D V^T
R = U S V^T
c = tr(D S) / σ_x^2
t = μ_y − c R μ_x
wherein S is the constructed diagonal matrix, S = diag(1, 1, det(U V^T)).
9. The layered motion restoration structure method based on grid common view according to claim 6, wherein the calculation of the reprojection error in S4.4 further requires the reprojection error E_tgt of the tgt sub-scene; if both E_src and E_tgt are smaller than a preset maximum threshold D_max, the three-dimensional point is regarded as an inlier.
10. The layered motion restoration structure method based on grid common view according to claim 6, wherein after each iteration in S4.4, src_rand and tgt_rand are updated and the inlier count N_cur of the current iteration is recorded.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410410945.4A CN118314370A (en) | 2024-04-07 | 2024-04-07 | Layered motion restoration structure method based on grid common view |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118314370A true CN118314370A (en) | 2024-07-09 |
Family
ID=91721592
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410410945.4A Pending CN118314370A (en) | 2024-04-07 | 2024-04-07 | Layered motion restoration structure method based on grid common view |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118314370A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120014179A (en) * | 2025-04-18 | 2025-05-16 | 武汉轻工大学 | A three-dimensional reconstruction method, device and medium for incremental sub-scene merging |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||