Disclosure of Invention
In order to solve the problems of low robustness, low speed, and high complexity in existing image matching methods, the invention provides an image feature matching method and device based on local consistency.
According to one aspect of the present invention, there is provided an image feature matching method based on local consistency, including the steps of:
S1, acquiring two images to be matched;
S2, detecting feature points in the two images through a SIFT algorithm, and establishing feature descriptors;
S3, constructing a group of initial matching sets according to the similarity of feature descriptors in the two images;
S4, deleting mismatches from the initial matching set through a neighborhood consistency constraint to obtain a second matching set;
S5, calculating the deviation of the motion vector of each feature point through the second matching set;
S6, optimizing a preset image feature matching model and determining the hyper-parameters by imposing a motion vector consistency constraint on the deviations of the motion vectors;
And S7, performing feature matching on the two images to be matched through the optimized image feature matching model to obtain an image feature matching result.
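As a concrete illustration of step S3, the sketch below builds an initial matching set from two descriptor arrays by nearest-neighbor descriptor similarity. The ratio test is an added assumption (a common companion to SIFT matching), not something the text prescribes; the brute-force search stands in for whatever index an implementation would use.

```python
import numpy as np

def build_initial_matches(desc1, desc2, ratio=0.8):
    """Nearest-neighbour matching on descriptor similarity (step S3).

    desc1, desc2: arrays of feature descriptors, one row per feature point.
    Lowe's ratio test (ratio=0.8) is an assumption added for illustration.
    Returns a list of (i, j) pairs: desc1[i] matched to desc2[j].
    """
    matches = []
    for i, d in enumerate(desc1):
        # Euclidean distance from descriptor d to every descriptor in image 2
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # accept only if the best match is clearly better than the runner-up
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches
```

With real images the descriptors would come from a SIFT implementation; here any (N, D) float arrays work.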
Preferably, the S4 includes:
S4.1, obtaining an attribute variable Rst_i for each initial match in the initial matching set through the neighborhood consistency constraint;
S4.2, setting a threshold η, comparing Rst_i with η, and filtering out the feature points that do not satisfy the neighborhood consistency constraint to obtain a second matching set U:
U = {(x_i, y_i) ∈ S | Rst_i > η}
where S denotes the initial matching set, i denotes the sequence number of an initial match, (x_i, y_i) denotes the i-th initial match, Rst_i denotes the mean neighborhood-consistency ratio over all neighborhoods corresponding to the i-th initial match, M denotes the number of neighborhoods taken for each initial match, j denotes the neighborhood sequence number, i.e., the j-th neighborhood, and K_j denotes the size of the j-th neighborhood.
Preferably, the S5 includes:
S5.1, performing vector conversion on the second matching set, spatially gridding it into a plurality of non-overlapping cells, and calculating the estimated motion vector of each cell through a Gaussian kernel convolution operation;
and S5.2, calculating the deviation between the initial motion vector and the estimated motion vector in each unit.
Preferably, the S5.1 includes:
S5.1.1, converting the second matching set U into an initial motion vector set S' = {(x_i, m_i)}_{i=1}^{N},
where x_i and y_i respectively denote the i-th pair of feature points forming an initial match in the two images, i denotes the sequence number of the feature point or initial match, m_i = y_i − x_i denotes the motion vector obtained from the i-th initial match (x_i, y_i), and N denotes the number of initial matches in the initial matching set;
S5.1.2, equally dividing each dimension of the initial motion vector set S' into n_c non-overlapping parts, obtaining G = n_c × n_c cells, so that the initial motion vector set S' is divided into G cells and becomes the set {C_{j,k}}, j, k = 1, …, n_c,
where n_c denotes the number of divisions of each dimension of the initial motion vector set S', G denotes the total number of divisions of S' as a whole, that is, S' is divided into G cells, C_{j,k} denotes the cell in the j-th row and k-th column, and j and k respectively denote the row number and column number of the cell C_{j,k};
S5.1.3, obtaining an estimated motion vector m̂_{j,k} for each cell C_{j,k} through Gaussian kernel convolution processing.
Preferably, the S5.2 includes:
S5.2.1, calculating the numerical deviation between the initial motion vector and the estimated motion vector;
S5.2.2, calculating the length-ratio deviation between the initial motion vector and the estimated motion vector;
S5.2.3, calculating the angular deviation between the initial motion vector and the estimated motion vector;
S5.2.4, calculating the total motion vector deviation from the numerical deviation, the length-ratio deviation, and the angular deviation.
Preferably, after S5.2.4, the method further includes:
performing quantization processing on the total motion vector deviation to obtain a quantization result;
the quantization result is used to reflect the degree of consistency between the initial motion vector and the estimated motion vector within each cell.
According to another aspect of the present invention, there is also provided an image feature matching apparatus based on local consistency, including:
The image acquisition module is used for acquiring two images to be matched;
The feature point detection and descriptor establishment module is used for detecting feature points in the two images through a SIFT algorithm and establishing feature descriptors;
The initial matching set construction module is used for constructing a group of initial matching sets according to the similarity of the feature descriptors in the two images;
The neighborhood consistency constraint module is used for deleting mismatches from the initial matching set through a neighborhood consistency constraint to obtain a second matching set;
The motion vector deviation calculation module is used for calculating the deviation of the motion vector of each feature point through the second matching set;
The motion vector consistency constraint module is used for imposing a motion vector consistency constraint on the deviations of the motion vectors so as to optimize a preset image feature matching model and determine the hyper-parameters;
The feature matching module is used for carrying out feature matching on the two images to be matched through the optimized image feature matching model to obtain an image feature matching result.
The invention uses a preliminary filtering strategy based on feature-point neighborhood consistency to remove a portion of clearly erroneous outliers and purify the neighborhoods, yielding an approximate inlier set and thus a more realistic neighborhood structure. The image space is then gridded into a plurality of non-overlapping cells, and an estimated motion vector representing the potential motion of each cell is computed through a Gaussian kernel convolution operation. Finally, by judging the degree of consistency between the initial motion vector and the estimated motion vector within each cell, subtle mismatches are further eliminated and more accurate matching is obtained.
The technical scheme provided by the invention has the following beneficial effects:
1. The method can cope with ground-relief changes, imaging-viewpoint changes, or more complex non-rigid transformations of remote sensing images.
2. The method can effectively remove the large number of mismatches generated when remote sensing image information is complex or the image is affected by nonlinear radiation differences, speckle noise, illumination changes, and the like.
3. The method has linear complexity and can solve thousands of matches in a few milliseconds, ensuring the speed of the algorithm. This is useful for many real-time applications and can quickly provide a good initialization for matching algorithms on specific complex problems, particularly remote sensing images.
Detailed Description
For a clearer understanding of technical features, objects and effects of the present invention, a detailed description of embodiments of the present invention will be made with reference to the accompanying drawings.
Referring to fig. 1, the embodiment provides an image feature matching method based on local consistency, which mainly includes the following steps:
S1, acquiring two images to be matched;
S2, detecting feature points in the two images through a SIFT algorithm, and establishing feature descriptors;
S3, constructing a group of initial matching sets according to the similarity of feature descriptors in the two images;
S4, deleting mismatches from the initial matching set through a neighborhood consistency constraint to obtain a second matching set;
The step S4 specifically comprises the following:
(1) Neighborhood consistency model
For an image pair describing the same scene or object, when a viewpoint change (such as stereo parallax) or a non-rigid transformation (such as a dynamic scene) occurs, the absolute distance between two corresponding feature points may change significantly, but the spatial neighborhood relationship between feature points, which represents the local neighborhood structure of the image scene, is quite stable and generally well preserved. Taking a non-rigid face image as an example: due to the physical constraints of bones and muscles, changes in expression and viewing angle do not alter the local structure of the face, such as the relative positions of the eyes, nose, mouth, and chin.
For one initial match (x_i, y_i) in the initial matching set S, if it is an inlier (a feature point that can be correctly matched), then the distributions of its neighborhood elements should be similar. In contrast, for an outlier (a mismatched feature point), the corresponding neighborhood distributions will be quite different; this is the neighborhood consistency constraint. Fig. 3 shows an initial match (x_i, y_i) for an inlier and an outlier respectively, together with 5 groups of initial matches within their neighborhoods. It is easy to see that many common initial matches are contained within the neighborhood of the inlier, while there is almost no common initial match within the neighborhood of the outlier (correct matches are drawn as black lines and false matches as gray lines). To capture this property mathematically, the invention defines for each match a group of M neighborhoods of sizes K_1, …, K_M, namely N_{x_i}^{K_j} and N_{y_i}^{K_j}, j = 1, …, M, where N_{x_i}^{K_j} denotes the j-th neighborhood of the point x_i, consisting of the K_j feature points closest to x_i among all feature points of X, M denotes the number of neighborhoods, and K_j denotes the size of the j-th neighborhood. This strategy is known as the multiple K-nearest-neighbor strategy (Multi-K-Nearest Neighbor, Multi-KNN). The neighborhood consistency between N_{x_i}^{K_j} and N_{y_i}^{K_j} can then be characterized by the following equation:
Rst_i = (1/M) Σ_{j=1}^{M} |N_{x_i}^{K_j} ∩ N_{y_i}^{K_j}| / K_j, (1)
where |N_{x_i}^{K_j} ∩ N_{y_i}^{K_j}| is the number of common feature points in the two neighborhoods, and each term of the sum is the ratio of the number of common feature points in the j-th neighborhood of the i-th initial match to the number of all feature points in that neighborhood. Obviously, if the corresponding feature point is an inlier, the value of Rst_i will be large, and vice versa. Note that the true common elements cannot be determined without ground truth. However, an approximation can be obtained by instead counting the initial matches in S whose feature points fall simultaneously in N_{x_i}^{K_j} and N_{y_i}^{K_j}. The reason this approximation works is that if the initial match (x_i, y_i) is an inlier, then an initial match whose feature points occur simultaneously in the local neighborhoods N_{x_i}^{K_j} and N_{y_i}^{K_j} is most likely also an inlier; conversely, if (x_i, y_i) is an outlier, then initial matches are unlikely to occur simultaneously in its two neighborhoods, that is, the matching point of a point in the neighborhood N_{x_i}^{K_j} of x_i will, with high probability, lie outside the neighborhood N_{y_i}^{K_j} of y_i.
(2) Neighborhood consistency constraint preliminary filtering
When the image undergoes complex non-rigid transformations or is subject to external noise, the initial matching set may contain a large number of outliers. In order to remove the clearly erroneous outliers and clean the local regions, so that the estimated motion vectors obtained in the subsequent processing are closer to the potential true motion vectors, the attribute variable Rst_i obtained through the neighborhood consistency constraint can conveniently be used to perform a preliminary filtering.
To achieve this, a simple threshold η may be set; by comparing Rst_i with η, the feature points that do not satisfy the neighborhood consistency constraint are filtered out, yielding the preliminarily filtered second matching set U:
U = {(x_i, y_i) ∈ S | Rst_i > η}, (2)
where U is the second matching set after preliminary filtering, η is the threshold used in the comparison, Rst_i is the mean neighborhood-consistency ratio over all M neighborhoods corresponding to the i-th initial match, and M is the number of neighborhoods. The second matching set U after preliminary filtering is typically clean enough to be used for the construction of the motion vector field. Based on extensive experimental results, the number of neighborhoods is set to M = 3 in the multiple K-nearest-neighbor strategy, with neighborhood sizes K_1 = 9, K_2 = 10, and K_3 = 11 respectively. If necessary, the initial matching set may be filtered iteratively with different thresholds η, but for the method of the invention it is sufficient to perform the filtering only once.
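The Multi-KNN preliminary filtering of step S4 can be sketched as below. This is a minimal NumPy illustration: brute-force nearest-neighbor search stands in for whatever index structure a real implementation would use, and the neighborhood sizes and threshold follow the values stated in the text.

```python
import numpy as np

def multi_knn_filter(X, Y, Ks=(9, 10, 11), eta=0.3):
    """Preliminary filtering by neighbourhood consistency (step S4).

    X, Y: (N, 2) arrays of matched feature-point coordinates, row i of X
    matched to row i of Y.  For each match i and each neighbourhood size
    K_j, take the K_j nearest matched points around x_i and around y_i
    and count how many match indices the two neighbourhoods share.
    Rst_i is the mean shared ratio over the M neighbourhoods; matches
    with Rst_i > eta are kept (Ks and eta follow the text).
    """
    N = len(X)
    Rst = np.zeros(N)
    for i in range(N):
        ratios = []
        for K in Ks:
            K = min(K, N - 1)
            # indices of the K nearest matches to x_i / y_i (excluding i itself)
            nx = np.argsort(np.linalg.norm(X - X[i], axis=1))[1:K + 1]
            ny = np.argsort(np.linalg.norm(Y - Y[i], axis=1))[1:K + 1]
            ratios.append(len(set(nx) & set(ny)) / K)
        Rst[i] = np.mean(ratios)
    keep = Rst > eta
    return X[keep], Y[keep], Rst
```

On a coherent translation with one corrupted match, the corrupted match's neighborhoods share almost nothing and it is filtered out.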
S5, calculating the deviation of the motion vector of each feature point through the second matching set;
The step S5 specifically comprises the following:
(1) Numerical deviation of motion vectors
In order to effectively suppress noise interference in an image, the method follows image denoising theory: when an image is noisy, the pixels in a local region (determined by the size of the convolution kernel) are considered jointly, and a motion-vector-based approach is adopted to recover the true information. First, the second matching set U is converted into an initial motion vector set S' = {(x_i, m_i)}_{i=1}^{N}, where m_i = y_i − x_i denotes the initial motion vector of the i-th initial match (x_i, y_i). For the potentially true matches in the matching set, the motion vectors should be regular and smooth, i.e., they satisfy motion vector consistency: correct matches exhibit similar motion behavior within a local neighborhood, while false matches are typically randomly distributed. As shown in fig. 2, the matching proceeds in three steps: establishing the initial matching set, purifying the neighborhoods with neighborhood consistency, and further filtering outliers with motion vector consistency. Each step is illustrated with two pictures, the left representing the gridded initial motion vector field and the right representing the estimated motion vector field of each cell (generated by Gaussian kernel convolution). It can be seen that as the denoising deepens, fewer outliers remain in the vector field, the estimated vector field becomes more regular and smooth, and some abrupt vectors are removed.
To obtain the estimated motion vectors, each dimension of the initial motion vector set S' is first equally divided into n_c non-overlapping parts, and S' is thereby partitioned into G = n_c × n_c cells (G being the number of cells), forming the set {C_{j,k}}, j, k = 1, …, n_c, where C_{j,k} denotes the cell in the j-th row and k-th column, with j and k its row and column numbers. The gridded initial motion vector set is shown in fig. 2, where n_c = 20. Now define the average motion vector matrix M̄ = (m̄_{j,k}), in which m̄_{j,k} is the average motion vector within the (j, k)-th cell C_{j,k}:
m̄_{j,k} = (1 / |C_{j,k}|) Σ_{m_i ∈ C_{j,k}} m_i, (3)
where |·| denotes the cardinality of a set, i.e., |C_{j,k}| is the size of cell C_{j,k} (the number of motion vectors in C_{j,k}).
In order to fully exploit the interactions between adjacent cells, the invention applies an efficient convolution operation, since it can comprehensively account for the local relations among n_k × n_k cells. The motion vector convolution Ŝ is defined as:
Ŝ = (K * (M̄ ⊙ W) − K* · B(W) ⊙ M̄ ⊙ W) / (K * W − K* · B(W) ⊙ W + ε), (4)
where Ŝ, the matrix produced by the Gaussian kernel convolution, has size n_c × n_c × 2, and Ŝ_{j,k} = m̂_{j,k} is the estimated motion vector of the (j, k)-th cell; * denotes convolution and ⊙ element-wise multiplication; W is the count matrix recording the number of motion vectors in each cell, with W_{j,k} = |C_{j,k}| the size of the cell in row j, column k. The denominator term in equation (4) performs weight compensation so that the scale of the convolution result does not change too much, and ε is an infinitesimal positive number used when the denominator would otherwise be 0. K is the Gaussian kernel distance matrix of size n_k × n_k, also called the Gaussian convolution kernel; it records the Gaussian kernel distance between a cell and its surrounding (n_k × n_k − 1) cells, and through it the connection between a cell and its neighboring cells is established. Each element K_{i,j} of K is defined as:
K_{i,j} = exp(−‖s_{i,j} − s_c‖² / (2σ²)), (5)
where K_{i,j} denotes the Gaussian kernel distance of the cell in row i, column j, σ is the kernel bandwidth, and s_{i,j} = (i, j)^T and s_c = (⌈n_k/2⌉, ⌈n_k/2⌉)^T are respectively the position of each cell within the Gaussian convolution kernel K (i.e., row i, column j) and the center position of the kernel, where ⌈·⌉ rounds an element up to the nearest integer not less than itself. Since the center position inside the convolution kernel must exist, n_k must be a positive odd number.
In addition, to avoid the influence of isolated samples, they can be excluded from the convolution by subtracting the corresponding average-vector contribution from both the numerator and the denominator, thereby adjusting the weight of each cell. In formula (4), B(W) denotes the binary form of W, whose entries are 0 or 1, with B(W)_{i,j} = 1 only when W_{i,j} > 0, and K* denotes the Gaussian kernel distance at the center position, i.e., the central element of K.
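The gridding and Gaussian-kernel smoothing of step S5.1 can be sketched as follows. This is an illustrative reconstruction: the wrap-around border handling (np.roll), the bandwidth sigma, and the cell assignment by bounding box are assumptions not fixed by the text, and the isolated-cell exclusion is omitted for brevity.

```python
import numpy as np

def estimate_motion_field(X, M, n_c=20, n_k=7, sigma=1.0, eps=1e-8):
    """Gridded motion-field estimate via Gaussian-kernel convolution (S5.1).

    X: (N, 2) feature positions; M: (N, 2) motion vectors m_i = y_i - x_i.
    The plane is split into n_c x n_c cells; each cell stores the mean of
    the motion vectors falling inside it, and an n_k x n_k Gaussian kernel
    then smooths the cell means, weighted by the vector count per cell.
    Returns (Mhat, idx): the (n_c, n_c, 2) estimated field and the (N, 2)
    cell coordinates of each point.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    idx = np.clip(((X - lo) / (hi - lo + eps) * n_c).astype(int), 0, n_c - 1)
    Mbar = np.zeros((n_c, n_c, 2))   # per-cell mean motion vector
    W = np.zeros((n_c, n_c))         # per-cell vector count
    for (j, k), m in zip(idx, M):
        Mbar[j, k] += m
        W[j, k] += 1
    Mbar /= np.maximum(W, 1)[..., None]
    # Gaussian kernel distances between a cell and its n_k x n_k neighbours
    c = n_k // 2
    g = np.arange(n_k)
    K = np.exp(-((g[:, None] - c) ** 2 + (g[None, :] - c) ** 2) / (2 * sigma ** 2))
    # count-weighted convolution: Mhat = conv(K, Mbar*W) / (conv(K, W) + eps)
    num = np.zeros_like(Mbar)
    den = np.zeros_like(W)
    for dj in range(-c, c + 1):
        for dk in range(-c, c + 1):
            w = K[dj + c, dk + c]
            den += w * np.roll(np.roll(W, dj, axis=0), dk, axis=1)
            num += w * np.roll(np.roll(Mbar * W[..., None], dj, axis=0), dk, axis=1)
    Mhat = num / (den[..., None] + eps)
    return Mhat, idx
```

On a perfectly uniform translation field the smoothed estimate in any occupied cell equals the common motion vector.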
Through the above convolution operation, the estimated motion vector of each cell is obtained, which can be used in place of the potential true motion vector. Then, the deviation between the initial motion vector m_i and the estimated motion vector m̂_{j,k} of the cell containing it is constrained to [0, 1] by the following formula:
dev_i = 1 − exp(−‖m_i − m̂_{j,k}‖² / β²), (6)
where dev_i is the numerical deviation of the motion vector, and β determines the width of the interaction range between the two motion vectors; β² can be empirically set to 0.08.
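A minimal sketch of the numerical deviation; the exponential form dev_i = 1 − exp(−‖m_i − m̂‖²/β²) is a reconstruction consistent with the stated [0, 1] range, with β² = 0.08 as in the text.

```python
import numpy as np

def numerical_deviation(m, m_hat, beta2=0.08):
    """Numerical deviation of a motion vector from the cell estimate.

    Reconstructed form: 0 for identical vectors, approaching 1 as they
    diverge; beta2 = 0.08 follows the empirical value in the text.
    """
    d2 = np.sum((np.asarray(m, float) - np.asarray(m_hat, float)) ** 2, axis=-1)
    return 1.0 - np.exp(-d2 / beta2)
```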
(2) Length ratio deviation of motion vector
In order to make the difference between mismatches and correct matches more pronounced, the invention not only considers the numerical deviation between each motion vector and the estimated motion vector, but also proceeds from the length and the angle of the motion vectors, making it more convenient to distinguish them with a threshold. The three deviations are normalized and summed to give the total deviation, so that the shortcomings of the individual deviations compensate one another, greatly improving precision and recall.
The length-ratio deviation and the angular deviation are constructed because a mismatched motion vector has large angular and length-ratio deviations, while a correctly matched motion vector is essentially consistent with the true motion vector.
The length-ratio deviation of the motion vector is defined as:
dev_i^l = 1 − exp(−(ρ_i − 1)²), (7)
where ρ_i = max(‖m_i‖, ‖m̂_{j,k}‖) / min(‖m_i‖, ‖m̂_{j,k}‖) denotes the length ratio between m_i and the estimated motion vector m̂_{j,k} of the (j, k)-th cell, i.e., simply the ratio of the longer to the shorter of the two motion vectors. The ratio is normalized by a Gaussian membership function, and dev_i^l is used as the length-ratio deviation of the motion vector.
(3) Angular deviation of motion vector
The angular deviation of the motion vector is defined as:
dev_i^a = (1 − cos θ_i) / 2, with cos θ_i = (m_i · m̂_{j,k}) / (‖m_i‖ ‖m̂_{j,k}‖), (8)
where θ_i denotes the angle between m_i and the estimated motion vector m̂_{j,k}. The invention uses dev_i^a as the angular deviation of the motion vector.
(4) Motion vector total bias and quantization strategy
Combining dev_i^l and dev_i^a with the earlier numerical deviation dev_i, the total deviation between the motion vector and the estimated motion vector is defined as:
d_i = dev_i + dev_i^l + dev_i^a. (9)
By comparing the deviation with a given threshold λ, the inlier set I* can be approximately detected:
I* = {(x_i, y_i) : d_i ≤ λ, i = 1, …, N}. (10)
As can be seen from fig. 4 (a), the total deviation alone does not clearly separate inliers from outliers (in the figure, the · points represent outliers, the o points represent inliers, the ordinate is the motion deviation, and the abscissa is the index of each feature point). This is because the range of variation of the deviation is not uniform across feature points. To solve this problem, a quantization strategy is adopted.
The total deviation d_i is quantized as:
s_{i,j} = d_i + d_{i,j}, (11)
s_i = Σ_{j} s_{i,j}, (12)
where d_{i,j} denotes the total motion vector deviation between the j-th motion vector m_j in the neighborhood of x_i and its corresponding estimated motion vector; s_{i,j} thus integrates the motion characteristics of the motion vector m_i itself and of its neighborhood. Finally, summing all the s_{i,j} over the neighborhood of the feature point x_i gives the quantized deviation s_i. The smaller this value, the better the match conforms to motion vector consistency. The cumulative distribution of the quantized motion vector deviation of each feature point is shown in fig. 4 (b), from which it can be seen that the difference between inliers and outliers becomes apparent.
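The three deviations and the quantization can be sketched together. The exact functional forms of the length-ratio and angle terms, the summation of the three deviations, and the neighborhood sum in the quantization are reconstructions of the omitted formulas, not the patent's exact definitions.

```python
import numpy as np

def total_deviation(m, m_hat, beta2=0.08):
    """Sum of numerical, length-ratio and angle deviations (reconstruction)."""
    m, m_hat = np.asarray(m, float), np.asarray(m_hat, float)
    dev_num = 1.0 - np.exp(-np.sum((m - m_hat) ** 2) / beta2)
    ln, lh = np.linalg.norm(m), np.linalg.norm(m_hat)
    rho = max(ln, lh) / max(min(ln, lh), 1e-8)      # length ratio >= 1
    dev_len = 1.0 - np.exp(-(rho - 1.0) ** 2)       # Gaussian membership
    cos = np.dot(m, m_hat) / max(ln * lh, 1e-8)
    dev_ang = (1.0 - cos) / 2.0                     # 0 if aligned, 1 if opposite
    return dev_num + dev_len + dev_ang

def quantized_deviation(d, neighbors):
    """s_i = sum_j (d_i + d_{i,j}) over the neighbourhood (reconstruction).

    d: (N,) total deviations; neighbors: per-point lists of neighbour indices.
    """
    d = np.asarray(d, float)
    return np.array([(d[i] + d[nb]).sum() for i, nb in enumerate(neighbors)])
```

A match identical to its cell estimate scores 0; an opposite-direction match scores near 2, and the quantized score grows with deviant neighborhoods.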
S6, optimizing a preset image feature matching model and determining the hyper-parameters by imposing a motion vector consistency constraint on the deviations of the motion vectors;
S6 specifically comprises the following steps:
(1) Matching model establishment and optimization
Assume that the initial matching set extracted from the two images is S = {(x_i, y_i)}_{i=1}^{N}, containing N initial matches, where x_i and y_i are two-dimensional column vectors of feature-point spatial locations. The initial matching set S contains a number of false matches.
In order to better preserve local features, the invention adopts a general mathematical model as the image feature matching model. Define I as the unknown inlier set, whose optimal solution is I*:
I* = argmin_I C(I; S, λ), (13)
wherein the cost C is defined as:
C(I; S, λ) = Σ_{i∈I} s_i + λ(N − |I|), (14)
where s_i is the total deviation after quantization and |·| is the cardinality of a set. In this cost function, the first term penalizes matches that do not maintain local consistency, the second term limits the number of outliers, and the parameter λ > 0 controls the weight between the two terms. The aim of the invention is to obtain the largest inlier set by minimizing the cost function C.
To optimize the cost function, the initial matching set S is associated with an N × 1 binary vector P, where p_i ∈ {0, 1} indicates the correctness of the i-th correspondence (x_i, y_i). Specifically, p_i = 1 denotes an inlier and p_i = 0 denotes an outlier.
Rewriting the cost in terms of p_i and collecting the common terms yields:
C(P; S, λ) = Σ_{i=1}^{N} p_i c_i + λ(N − Σ_{i=1}^{N} p_i) = Σ_{i=1}^{N} p_i (c_i − λ) + λN, (15)
where
c_i = s_i. (16)
c_i measures the degree to which the i-th match (x_i, y_i) satisfies local consistency. Obviously, a correct match incurs zero or low cost, while a false match incurs a higher cost.
For a given initial matching set, the coordinates of the feature points are known, i.e., the neighborhood structure and the motion vectors between feature points are fixed, so all the cost values {c_i} can be pre-computed; the only unknown variables are then the p_i. Since λN is a constant, the optimal solution for P that minimizes the cost function is determined by the following simple criterion:
p_i = 1 if c_i < λ, and p_i = 0 otherwise, i = 1, …, N. (17)
Thus, the optimal inlier set I* can be determined by:
I* = {i | p_i = 1, i = 1, …, N}. (18)
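Because the quantized costs c_i decouple across matches, minimizing the cost reduces to a per-match threshold against λ; a minimal sketch, with λ = 0.45 as determined experimentally later in the text:

```python
import numpy as np

def select_inliers(s, lam=0.45):
    """Minimise sum_i p_i*c_i + lam*(N - sum_i p_i) with c_i = s_i.

    Each term decouples, so p_i = 1 exactly when c_i < lam; lam = 0.45 is
    the experimentally determined value from the text.
    Returns the binary vector P and the inlier index set.
    """
    c = np.asarray(s, float)
    p = (c < lam).astype(int)
    inliers = np.flatnonzero(p == 1)
    return p, inliers
```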
From equation (17) it can be seen that, besides adjusting the trade-off, the parameter λ also has a thresholding effect. Different types of image pairs have different distribution parameters, motion vector fields, and costs, so the optimal λ will also change.
(2) Determination of the hyper-parameters
The invention has thus far converted the feature matching problem into an optimization model. However, there are several hyper-parameters, namely {n_c, n_k, λ, η}, whose values can seriously affect the result of the algorithm. Therefore, in order to obtain the best results and parameter settings, the invention selects 12 images each from rigid, rotational, scale, and non-rigid deformations for experiments.
The invention varies the values of n_c and n_k over multiple groups of experiments and records the precision, recall, and F-score, which are common indicators for evaluating matching performance.
As shown in FIG. 5, FIG. 5 (a) shows the precision, recall, and F-score curves obtained by the method when n_c = 20 and n_k is varied, and FIG. 5 (d) shows the curves obtained when n_k = 7 and n_c is varied. The results show that precision, recall, and F-score change little and are insensitive within a certain range (n_c ∈ (10, 40), n_k ∈ (5, 9)), but degrade significantly outside it. Therefore, as long as an adaptive algorithm chooses the values of n_c and n_k within this range, the best results are obtained.
In order to determine the optimal hyper-parameters n_c and n_k for the Gaussian kernel convolution filtering, the invention uses the same data as the above experiment, comprising 48 pairs of images.
First, the invention gathers statistics of the numerical deviations of the motion vectors of inliers and outliers for 12 different (n_c, n_k) combinations, where n_c takes values from 10 to 40 at intervals of 10 (represented by four different gray scales) and n_k takes values from 5 to 9 at intervals of 2 (represented by three line types: solid, dotted, and dashed). In fig. 5 (b), the falling trend lines are the inlier statistics and the rising trend lines are the outlier statistics. The optimal parameter λ for each (n_c, n_k) group is the intersection of the corresponding inlier and outlier probability curves; the closer the intersection is to the lower left corner, the better the separability of inliers from outliers. Thus, according to the experimental results, the invention can adaptively set the hyper-parameters n_c and n_k using the relation between n_c and the size N of the initial matching set.
Here N is the number of initial matches in the initial matching set; the computed values may be rounded, and odd(n_c/3), the nearest odd number not greater than n_c/3, is used as the adaptive value of n_k.
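The odd(n_c/3) rule can be written directly; the floor of 3 below is an added safeguard (so a kernel with a center cell always exists), not something the source states.

```python
def adaptive_nk(n_c):
    """Nearest odd number not greater than n_c / 3 (the odd(.) in the text).

    The lower bound of 3 is an assumption added so the Gaussian kernel
    always has a valid center cell.
    """
    k = int(n_c / 3)
    if k % 2 == 0:
        k -= 1
    return max(k, 3)
```

For example, the n_c = 20 grid used in fig. 2 yields n_k = 5.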
According to the invention, 10 test images are selected from the 48 test images used above for each of translation, scaling, rotation, and non-rigid transformation, and 10 additional remote sensing images containing all of these transformations are selected. The same experiment is performed on each of the 5 groups, 50 images in total, with the results shown in fig. 5 (e). The results show that different types of image deformation lead to large differences in the intersection points, which means that different optimal λ and η should be chosen. Note that when c_i = λ, the setting of p_i may be arbitrary.
To determine the best λ and η, the invention experiments on the same dataset (comprising 50 pairs of images). First, the value of η is fixed and the precision, recall, and F-score are computed for different values of λ; then the value of λ is fixed and the same metrics are computed for different values of η. The final results are shown in fig. 5 (c) and (f): with λ = 0.45 and η = 0.3, the invention attains the highest precision, recall, and F-score.
And S7, performing feature matching on the two images to be matched through the optimized image feature matching model to obtain an image feature matching result.
Fig. 6 shows qualitative results of the method of the invention on 10 representative image pairs.
To evaluate the performance of the method, 5 generic image matching datasets were used in the experiments, each containing 30 pairs of images with ground truth. The 5 datasets cover a wide range of content, comprising mainly remote sensing and non-remote-sensing datasets. The remote sensing datasets are 720Yun, SUIRD, CIAP, and SAR; the non-remote-sensing dataset comprises natural images, medical images, infrared images, and the like. To ensure objectivity, the invention uses the open-source VLFeat toolbox to extract SIFT features from all datasets, and then constructs the initial matching sets with a nearest-neighbor strategy according to descriptor similarity. The ground truth of all datasets is either provided by manually labeling each match as true or false, or checked using a geometric transformation matrix provided by the original authors.
From top to bottom in fig. 6, each row contains two examples selected from the 5 datasets 720Yun, SUIRD, CIAP, SAR, and non-remote-sensing, respectively. These image pairs feature high outlier ratios, small overlapping areas, and scale, rotation, translation, and even non-rigid transformations, which pose significant difficulties and challenges for mismatch removal. The precision, recall, and F-score of the method, from top to bottom and left to right, are respectively: (98.61%, 100.0%, 0.9930), (98.11%, 98.96%, 0.9854), (100.0%, 99.04%, 0.9952), (100.0%, 98.95%, 0.9947), (99.57%, 100.0%, 0.9978), (99.56%, 99.56%, 0.9956), (100.0%, 100.0%, 1.0000), (99.49%, 99.80%, 0.9964), (98.95%, 99.15%, 0.9905), (97.84%, 98.81%, 0.9833). The experimental results show that the method identifies most inliers with very few misjudgments, demonstrating that it is universal and robust when processing remote sensing images with different types of image distortion and large numbers of outliers. In addition, the method also achieves good experimental results on the non-remote-sensing dataset of common scenes.
In fig. 6, each row has two types of images: the left is the feature-correspondence line diagram and the right is the feature motion vector field diagram. The head and tail of each arrow in the motion vector field correspond to the spatial positions of the matched feature points in the two images (the figure contains both the identified correct matches and the identified false matches, but the latter are very few, almost zero). For visibility of the matches in the image pairs, 100 matches are randomly selected in each pair of images and displayed as lines.
FIG. 7 shows the experimental results on the 5 datasets in terms of the quantitative evaluation indicators precision, recall, F-score, and running time. Precision (P) is defined as the proportion of correct matches among all matches retained by the algorithm, recall (R) is defined as the ratio of the correct matches identified by the algorithm to all correct matches, and the F-score is used to evaluate the overall matching performance. Based on the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), the precision is calculated as:
P = TP / (TP + FP).
The recall rate is calculated as:
R = TP / (TP + FN),
where a true positive (TP) is a correct match that the algorithm judges correct, a true negative (TN) is a false match that the algorithm judges false, a false positive (FP) is a false match that the algorithm wrongly judges correct, and a false negative (FN) is a correct match that the algorithm wrongly judges false.
True positives (TP) and true negatives (TN) reflect the accuracy of the algorithm, while false positives (FP) and false negatives (FN) reflect its degree of error. The F-score is a comprehensive statistic of precision and recall, computed as:
F = 2 · P · R / (P + R).
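The three indicators can be computed directly from the confusion counts:

```python
def precision_recall_fscore(tp, fp, fn):
    """Precision, recall and F-score from confusion counts.

    P = TP/(TP+FP), R = TP/(TP+FN), F = 2PR/(P+R), as in the text.
    """
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    return p, r, f
```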
To make the results more convincing, the feature matching performance of the method of the invention is quantitatively evaluated and compared with the LPM (Locality Preserving Matching), LLT (Locally Linear Transforming), LAF (Linear Adaptive Filtering), LMR (Learning a Two-Class Classifier for Mismatch Removal), VFC (Vector Field Consensus), RFM-SCAN, and EES (Efficient Exact Search) algorithms. The code of each comparison algorithm comes from the original authors' publications, and all parameters of the comparison algorithms are set according to the original papers and remain unchanged throughout the experiments.
The results in FIG. 7 show that the method of the present invention achieves good results on all remote sensing datasets, especially the CIAP dataset. The VFC algorithm performs poorly on certain image pairs in the datasets. This is because the inlier rate or inlier count of these images is low, as in the CIAP dataset, which interferes with the construction of the vector field. On the SAR dataset, with its severe noise and affine distortion, the LMR algorithm produces many false positives and has a low recall rate. The method of the invention and the LAF algorithm not only recall more correct matches, but also achieve higher precision than the other algorithms. For the 720Yun data, with its non-rigid transformation and low inlier rate, the mismatch removal task is very difficult. The LLT algorithm does not perform perfectly on this dataset because it only considers a linear transformation between feature points and therefore adapts poorly. For the SUIRD dataset, where viewpoint variation is large, there are severe outliers, so the deviation of the motion vectors increases. This affects both the typical motion field estimation of LAF and the motion consistency clustering of RFM-SCAN, thereby reducing precision; thus the performance of the LAF and RFM-SCAN algorithms is significantly degraded compared with their performance on the other datasets. The EES algorithm is based on an inlier model; when the images contain non-rigid deformation, a low inlier rate, and external noise disturbance, as in the 720Yun, CIAP, and SAR dataset images, the performance of its geometric model fitting is significantly reduced. The LPM algorithm and the deep-learning-based LMR algorithm perform well on each dataset, but slightly worse than the method of the present invention. In addition, the method performs well on the non-remote-sensing dataset, which shows that it has a certain universality and expandability and can be extended to region-based processing of various types of images.
The F-score reflects the overall performance of an algorithm. As can be seen from all remote sensing datasets, by combining the neighborhood consistency of the feature points with the consistency of their motion vectors, the method of the invention remains robust on remote sensing images with different problems and achieves the best precision and recall, whereas in some cases the indexes of the comparison algorithms decrease or even fail. Furthermore, the method of the invention has linear complexity. As can be seen from the last row of FIG. 7, the proposed algorithm runs as fast as the LAF algorithm while maintaining its comprehensive performance. It is slightly slower than the LPM algorithm but much faster than the other methods, especially when processing tens of thousands of matches.
FIG. 7 shows, from left to right, quantitative comparisons of LPM, LLT, LAF, LMR, VFC, RFM-SCAN, EES, and the method of the present invention on the 5 datasets 720Yun, SUIRD, CIAP, SAR, and Non-remote sensing. From top to bottom are the cumulative distributions of the inlier rate, precision, recall, F-score, and run time, respectively. The coordinates (x, y) of a point on each curve represent the y-value (i.e., the precision, recall, F-score, or run-time value in ms) of the 30x-th image pair in each dataset. The algorithm represented by each curve, the average precision (AP), average recall (AR), average F-score (AF), and average run time are marked at the corresponding locations in the figure.
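A cumulative distribution curve of this kind can be built by sorting the per-pair metric values, so that a point (x, y) says that a fraction x of the image pairs have a value of at most y. A minimal sketch (the function name and the toy values are illustrative, not from the experiments):

```python
# Build the points of a cumulative distribution curve from a list of
# per-image-pair metric values (precision, recall, F-score, or run time).
def cumulative_curve(values):
    ys = sorted(values)               # ascending metric values
    n = len(ys)
    # point i: fraction (i+1)/n of pairs have a value <= ys[i]
    return [((i + 1) / n, y) for i, y in enumerate(ys)]

# Toy example with 4 image pairs:
curve = cumulative_curve([0.9, 0.7, 0.8, 1.0])
# curve == [(0.25, 0.7), (0.5, 0.8), (0.75, 0.9), (1.0, 1.0)]
```

Curves that rise later (further to the right) therefore indicate an algorithm whose metric values are higher across the dataset.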
In some embodiments, there is also provided an image feature matching device based on local consistency, including the following modules:
The image acquisition module is used for acquiring two images to be matched;
The feature point detection and descriptor establishment module is used for detecting feature points in the two images through the SIFT algorithm and establishing feature descriptors;
The initial matching set construction module is used for constructing a group of initial matching sets according to the similarity of the feature descriptors in the two images;
The neighborhood consistency constraint module is used for deleting mismatches from the initial matching set through the neighborhood consistency constraint to obtain a second matching set;
The motion vector deviation calculation module is used for calculating the deviation of the motion vector of each feature point through the second matching set;
The motion vector consistency constraint module is used for carrying out the motion vector consistency constraint on the deviation of the motion vectors so as to optimize a preset image feature matching model and determine the hyperparameters;
The feature matching module is used for performing feature matching on the two images to be matched through the optimized image feature matching model to obtain an image feature matching result.
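The chaining of the modules above can be sketched as follows. All class, parameter, and stub names are hypothetical, chosen only to illustrate the data flow between the modules; the stand-in bodies are trivial and are not the patent's implementation:

```python
# Hypothetical sketch of how the device's modules are chained; each
# constructor argument plays the role of one module in the list above.

class LocalConsistencyMatchingDevice:
    def __init__(self, detector, matcher, neighborhood_filter,
                 deviation_calculator, model_optimizer):
        self.detector = detector                        # detection + descriptors
        self.matcher = matcher                          # initial matching set
        self.neighborhood_filter = neighborhood_filter  # neighborhood consistency
        self.deviation_calculator = deviation_calculator  # motion vector deviation
        self.model_optimizer = model_optimizer          # motion vector consistency

    def run(self, image_a, image_b):
        kp_a, desc_a = self.detector(image_a)
        kp_b, desc_b = self.detector(image_b)
        initial = self.matcher(desc_a, desc_b)
        second = self.neighborhood_filter(initial)
        deviations = self.deviation_calculator(second, kp_a, kp_b)
        model = self.model_optimizer(deviations)        # optimized matching model
        return [m for m in initial if model(m)]         # final feature matching

# Toy wiring with trivial stand-in modules, just to show the data flow:
device = LocalConsistencyMatchingDevice(
    detector=lambda img: (img, img),
    matcher=lambda a, b: list(zip(a, b)),
    neighborhood_filter=lambda ms: ms,
    deviation_calculator=lambda ms, ka, kb: ms,
    model_optimizer=lambda dev: (lambda m: m[0] == m[1]),
)
result = device.run([1, 2, 3], [1, 9, 3])
# result == [(1, 1), (3, 3)]: the mismatched pair (2, 9) is filtered out.
```

The point of the sketch is only that each module consumes the previous module's output, matching steps S1 through S7 of the method.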
The invention uses a primary filtering strategy based on feature point neighborhood consistency to remove a part of the obviously erroneous outliers and purify the neighborhoods, obtaining an approximate inlier set and thereby constructing more realistic neighborhoods. The image space is then divided into a grid of non-overlapping cells, and an estimated motion vector is calculated for each cell through a Gaussian kernel convolution operation to represent the potential motion characteristics of that cell. Finally, by judging the degree of consistency between the initial motion vector and the estimated motion vector in each cell, subtle mismatches are further eliminated and a more accurate matching is obtained.
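The per-cell estimation and consistency check can be sketched as follows. This is an illustrative approximation, not the patent's exact implementation: the function names, the kernel bandwidth sigma, and the tolerance tol are all assumptions.

```python
import math

# Each grid cell's estimated motion vector is a Gaussian-kernel-weighted
# average of the motion vectors of the (neighborhood-purified) matches near
# the cell centre; a match is kept only if its own motion vector agrees
# with that estimate.

def estimate_cell_vector(cell_center, matches, sigma=50.0):
    """Gaussian-weighted mean motion vector for one grid cell.

    matches: list of ((x, y), (dx, dy)) pairs - feature position and motion vector.
    """
    cx, cy = cell_center
    wsum = vx = vy = 0.0
    for (x, y), (dx, dy) in matches:
        w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        wsum += w
        vx += w * dx
        vy += w * dy
    return (vx / wsum, vy / wsum) if wsum > 0 else (0.0, 0.0)

def is_consistent(motion, estimate, tol=10.0):
    """Keep a match if its motion vector deviates little from the cell estimate."""
    return math.hypot(motion[0] - estimate[0], motion[1] - estimate[1]) <= tol

# Toy example: three nearby matches move right; an outlier moves upward.
inliers = [((10, 10), (5, 0)), ((12, 11), (5, 1)), ((9, 13), (4, 0))]
est = estimate_cell_vector((10, 12), inliers)  # estimate from purified neighborhood
print(is_consistent((5, 0), est), is_consistent((0, 40), est))  # prints: True False
```

Because the estimate is built only from the approximate inlier set produced by the primary filtering, a subtle mismatch whose motion vector deviates from the local estimate is rejected even when its descriptors are similar.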
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises that element.
The foregoing embodiment numbers of the present invention are merely for description and do not represent the advantages or disadvantages of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The terms first, second, third, etc. do not denote any order; they are to be interpreted merely as labels.
The foregoing description covers only the preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process transformation made using the contents disclosed herein, whether employed directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.