Disclosure of Invention
In order to solve the problems of low robustness, low speed, and high complexity in existing image matching methods, the invention provides an image feature matching method and device based on local consistency.
According to one aspect of the present invention, there is provided an image feature matching method based on local consistency, including the steps of:
S1, acquiring two images to be matched;
S2, detecting feature points in the two images through a SIFT algorithm, and establishing feature descriptors;
S3, constructing a group of initial matching sets according to the similarity of feature descriptors in the two images;
S4, deleting mismatches from the initial matching set through a neighborhood consistency constraint to obtain a second matching set;
S5, calculating the deviation of the motion vector of each feature point through the second matching set;
S6, optimizing a preset image feature matching model and determining the hyper-parameters by imposing a motion vector consistency constraint on the deviations of the motion vectors;
And S7, performing feature matching on the two images to be matched through the optimized image feature matching model to obtain an image feature matching result.
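As a concrete illustration of step S3, the sketch below builds an initial matching set from two descriptor arrays by nearest-neighbor descriptor similarity. The ratio test is an added assumption (a common companion to SIFT matching), not something the text prescribes; the brute-force search stands in for whatever index an implementation would use.

```python
import numpy as np

def build_initial_matches(desc1, desc2, ratio=0.8):
    """Nearest-neighbour matching on descriptor similarity (step S3).

    desc1, desc2: arrays of feature descriptors, one row per feature point.
    Lowe's ratio test (ratio=0.8) is an assumption added for illustration.
    Returns a list of (i, j) pairs: desc1[i] matched to desc2[j].
    """
    matches = []
    for i, d in enumerate(desc1):
        # Euclidean distance from descriptor d to every descriptor in image 2
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # accept only if the best match is clearly better than the runner-up
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches
```

With real images the descriptors would come from a SIFT implementation; here any (N, D) float arrays work.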
Preferably, the S4 includes:
S4.1, obtaining an attribute variable Rst_i for each initial match in the initial matching set through the neighborhood consistency constraint;
S4.2, setting a threshold η, comparing Rst_i with η, and filtering out the feature points that do not satisfy the neighborhood consistency constraint to obtain a second matching set U:
U = {(x_i, y_i) ∈ S | Rst_i > η}
where S denotes the initial matching set, i denotes the sequence number of an initial match, (x_i, y_i) denotes the i-th initial match, Rst_i denotes the mean neighborhood-consistency ratio over all neighborhoods corresponding to the i-th initial match, M denotes the number of neighborhoods taken for each initial match, j denotes the neighborhood sequence number, i.e., the j-th neighborhood, and K_j denotes the size of the j-th neighborhood.
Preferably, the S5 includes:
S5.1, performing vector conversion on the second matching set, spatially gridding it into a plurality of non-overlapping cells, and calculating the estimated motion vector of each cell through a Gaussian kernel convolution operation;
and S5.2, calculating the deviation between the initial motion vector and the estimated motion vector in each unit.
Preferably, the S5.1 includes:
S5.1.1, converting the second matching set U into an initial motion vector set S' = {(x_i, m_i)}_{i=1}^{N},
where x_i and y_i respectively denote the i-th pair of feature points forming an initial match in the two images, i denotes the sequence number of the feature point or initial match, m_i = y_i − x_i denotes the motion vector obtained from the i-th initial match (x_i, y_i), and N denotes the number of initial matches in the initial matching set;
S5.1.2, equally dividing each dimension of the initial motion vector set S' into n_c non-overlapping parts, obtaining G = n_c × n_c cells, so that the initial motion vector set S' is divided into G cells and becomes the set {C_{j,k}}, j, k = 1, …, n_c,
where n_c denotes the number of divisions of each dimension of the initial motion vector set S', G denotes the total number of divisions of S' as a whole, that is, S' is divided into G cells, C_{j,k} denotes the cell in the j-th row and k-th column, and j and k respectively denote the row number and column number of the cell C_{j,k};
S5.1.3, obtaining an estimated motion vector m̂_{j,k} for each cell C_{j,k} through Gaussian kernel convolution processing.
Preferably, the S5.2 includes:
S5.2.1, calculating the numerical deviation between the initial motion vector and the estimated motion vector;
S5.2.2, calculating the length-ratio deviation between the initial motion vector and the estimated motion vector;
S5.2.3, calculating the angular deviation between the initial motion vector and the estimated motion vector;
S5.2.4, calculating the total motion vector deviation from the numerical deviation, the length-ratio deviation, and the angular deviation.
Preferably, after S5.2.4, the method further includes:
performing quantization processing on the total motion vector deviation to obtain a quantization result;
the quantization result is used to reflect the degree of consistency between the initial motion vector and the estimated motion vector within each cell.
According to another aspect of the present invention, there is also provided an image feature matching apparatus based on local consistency, including:
The image acquisition module is used for acquiring two images to be matched;
The feature point detection and descriptor establishment module is used for detecting feature points in the two images through a SIFT algorithm and establishing feature descriptors;
The initial matching set construction module is used for constructing a group of initial matching sets according to the similarity of the feature descriptors in the two images;
The neighborhood consistency constraint module is used for deleting mismatches from the initial matching set through a neighborhood consistency constraint to obtain a second matching set;
The motion vector deviation calculation module is used for calculating the deviation of the motion vector of each feature point through the second matching set;
The motion vector consistency constraint module is used for imposing a motion vector consistency constraint on the deviations of the motion vectors so as to optimize a preset image feature matching model and determine the hyper-parameters;
The feature matching module is used for carrying out feature matching on the two images to be matched through the optimized image feature matching model to obtain an image feature matching result.
The invention uses a preliminary filtering strategy based on feature-point neighborhood consistency to remove a portion of clearly erroneous outliers and purify the neighborhoods, yielding an approximate inlier set and thus a more realistic neighborhood structure. The image space is then gridded into a plurality of non-overlapping cells, and an estimated motion vector representing the potential motion of each cell is computed through a Gaussian kernel convolution operation. Finally, by judging the degree of consistency between the initial motion vector and the estimated motion vector within each cell, subtle mismatches are further eliminated and more accurate matching is obtained.
The technical scheme provided by the invention has the following beneficial effects:
1. The method can cope with ground-relief changes, imaging-viewpoint changes, or more complex non-rigid transformations of remote sensing images.
2. The method can effectively remove the large number of mismatches generated when remote sensing image information is complex or the image is affected by nonlinear radiation differences, speckle noise, illumination changes, and the like.
3. The method has linear complexity and can solve thousands of matches in a few milliseconds, ensuring the speed of the algorithm. This is useful for many real-time applications and can quickly provide a good initialization for matching algorithms on specific complex problems, particularly remote sensing images.
Detailed Description
For a clearer understanding of technical features, objects and effects of the present invention, a detailed description of embodiments of the present invention will be made with reference to the accompanying drawings.
Referring to fig. 1, the embodiment provides an image feature matching method based on local consistency, which mainly includes the following steps:
S1, acquiring two images to be matched;
S2, detecting feature points in the two images through a SIFT algorithm, and establishing feature descriptors;
S3, constructing a group of initial matching sets according to the similarity of feature descriptors in the two images;
S4, deleting mismatches from the initial matching set through a neighborhood consistency constraint to obtain a second matching set;
The step S4 specifically comprises the following:
(1) Neighborhood consistency model
For an image pair describing the same scene or object, when a viewpoint change (such as stereo parallax) or a non-rigid transformation (such as a dynamic scene) occurs, the absolute distance between two corresponding feature points may change significantly, but the spatial neighborhood relationship between feature points, which represents the local neighborhood structure of the image scene, is quite stable and generally well preserved. Taking a non-rigid face image as an example: due to the physical constraints of bones and muscles, changes in expression and viewing angle do not alter the local structure of the face, such as the relative positions of the eyes, nose, mouth, and chin.
For one initial match (x_i, y_i) in the initial matching set S, if it is an inlier (a feature point that can be correctly matched), then the distributions of its neighborhood elements should be similar. In contrast, for an outlier (a mismatched feature point), the corresponding neighborhood distributions will be quite different; this is the neighborhood consistency constraint. Fig. 3 shows an initial match (x_i, y_i) for an inlier and an outlier respectively, together with 5 groups of initial matches within their neighborhoods. It is easy to see that many common initial matches are contained within the neighborhood of the inlier, while there is almost no common initial match within the neighborhood of the outlier (correct matches are drawn as black lines and false matches as gray lines). To capture this property mathematically, the invention defines for each match a group of M neighborhoods of sizes K_1, …, K_M, namely N_{x_i}^{K_j} and N_{y_i}^{K_j}, j = 1, …, M, where N_{x_i}^{K_j} denotes the j-th neighborhood of the point x_i, consisting of the K_j feature points closest to x_i among all feature points of X, M denotes the number of neighborhoods, and K_j denotes the size of the j-th neighborhood. This strategy is known as the multiple K-nearest-neighbor strategy (Multi-K-Nearest Neighbor, Multi-KNN). The neighborhood consistency between N_{x_i}^{K_j} and N_{y_i}^{K_j} can then be characterized by the following equation:
Rst_i = (1/M) Σ_{j=1}^{M} |N_{x_i}^{K_j} ∩ N_{y_i}^{K_j}| / K_j, (1)
where |N_{x_i}^{K_j} ∩ N_{y_i}^{K_j}| is the number of common feature points in the two neighborhoods, and each term of the sum is the ratio of the number of common feature points in the j-th neighborhood of the i-th initial match to the number of all feature points in that neighborhood. Obviously, if the corresponding feature point is an inlier, the value of Rst_i will be large, and vice versa. Note that the true common elements cannot be determined without ground truth. However, an approximation can be obtained by instead counting the initial matches in S whose feature points fall simultaneously in N_{x_i}^{K_j} and N_{y_i}^{K_j}. The reason this approximation works is that if the initial match (x_i, y_i) is an inlier, then an initial match whose feature points occur simultaneously in the local neighborhoods N_{x_i}^{K_j} and N_{y_i}^{K_j} is most likely also an inlier; conversely, if (x_i, y_i) is an outlier, then initial matches are unlikely to occur simultaneously in its two neighborhoods, that is, the matching point of a point in the neighborhood N_{x_i}^{K_j} of x_i will, with high probability, lie outside the neighborhood N_{y_i}^{K_j} of y_i.
(2) Neighborhood consistency constraint preliminary filtering
When the image undergoes complex non-rigid transformations or is subject to external noise, the initial matching set may contain a large number of outliers. In order to remove the clearly erroneous outliers and clean the local regions, so that the estimated motion vectors obtained in the subsequent processing are closer to the potential true motion vectors, the attribute variable Rst_i obtained through the neighborhood consistency constraint can conveniently be used to perform a preliminary filtering.
To achieve this, a simple threshold η may be set; by comparing Rst_i with η, the feature points that do not satisfy the neighborhood consistency constraint are filtered out, yielding the preliminarily filtered second matching set U:
U = {(x_i, y_i) ∈ S | Rst_i > η}, (2)
where U is the second matching set after preliminary filtering, η is the threshold used in the comparison, Rst_i is the mean neighborhood-consistency ratio over all M neighborhoods corresponding to the i-th initial match, and M is the number of neighborhoods. The second matching set U after preliminary filtering is typically clean enough to be used for the construction of the motion vector field. Based on extensive experimental results, the number of neighborhoods is set to M = 3 in the multiple K-nearest-neighbor strategy, with neighborhood sizes K_1 = 9, K_2 = 10, and K_3 = 11 respectively. If necessary, the initial matching set may be filtered iteratively with different thresholds η, but for the method of the invention it is sufficient to perform the filtering only once.
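The Multi-KNN preliminary filtering of step S4 can be sketched as below. This is a minimal NumPy illustration: brute-force nearest-neighbor search stands in for whatever index structure a real implementation would use, and the neighborhood sizes and threshold follow the values stated in the text.

```python
import numpy as np

def multi_knn_filter(X, Y, Ks=(9, 10, 11), eta=0.3):
    """Preliminary filtering by neighbourhood consistency (step S4).

    X, Y: (N, 2) arrays of matched feature-point coordinates, row i of X
    matched to row i of Y.  For each match i and each neighbourhood size
    K_j, take the K_j nearest matched points around x_i and around y_i
    and count how many match indices the two neighbourhoods share.
    Rst_i is the mean shared ratio over the M neighbourhoods; matches
    with Rst_i > eta are kept (Ks and eta follow the text).
    """
    N = len(X)
    Rst = np.zeros(N)
    for i in range(N):
        ratios = []
        for K in Ks:
            K = min(K, N - 1)
            # indices of the K nearest matches to x_i / y_i (excluding i itself)
            nx = np.argsort(np.linalg.norm(X - X[i], axis=1))[1:K + 1]
            ny = np.argsort(np.linalg.norm(Y - Y[i], axis=1))[1:K + 1]
            ratios.append(len(set(nx) & set(ny)) / K)
        Rst[i] = np.mean(ratios)
    keep = Rst > eta
    return X[keep], Y[keep], Rst
```

On a coherent translation with one corrupted match, the corrupted match's neighborhoods share almost nothing and it is filtered out.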
S5, calculating the deviation of the motion vector of each feature point through the second matching set;
The step S5 specifically comprises the following:
(1) Numerical deviation of motion vectors
In order to effectively suppress noise interference in an image, the method follows image denoising theory: when an image is noisy, the pixels in a local region (determined by the size of the convolution kernel) are considered jointly, and a motion-vector-based approach is adopted to recover the true information. First, the second matching set U is converted into an initial motion vector set S' = {(x_i, m_i)}_{i=1}^{N}, where m_i = y_i − x_i denotes the initial motion vector of the i-th initial match (x_i, y_i). For the potentially true matches in the matching set, the motion vectors should be regular and smooth, i.e., they satisfy motion vector consistency: correct matches exhibit similar motion behavior within a local neighborhood, while false matches are typically randomly distributed. As shown in fig. 2, the matching proceeds in three steps: establishing the initial matching set, purifying the neighborhoods with neighborhood consistency, and further filtering outliers with motion vector consistency. Each step is illustrated with two pictures, the left representing the gridded initial motion vector field and the right representing the estimated motion vector field of each cell (generated by Gaussian kernel convolution). It can be seen that as the denoising deepens, fewer outliers remain in the vector field, the estimated vector field becomes more regular and smooth, and some abrupt vectors are removed.
To obtain the estimated motion vectors, each dimension of the initial motion vector set S' is first equally divided into n_c non-overlapping parts, and S' is thereby partitioned into G = n_c × n_c cells (G being the number of cells), forming the set {C_{j,k}}, j, k = 1, …, n_c, where C_{j,k} denotes the cell in the j-th row and k-th column, with j and k its row and column numbers. The gridded initial motion vector set is shown in fig. 2, where n_c = 20. Now define the average motion vector matrix M̄ = (m̄_{j,k}), in which m̄_{j,k} is the average motion vector within the (j, k)-th cell C_{j,k}:
m̄_{j,k} = (1 / |C_{j,k}|) Σ_{m_i ∈ C_{j,k}} m_i, (3)
where |·| denotes the cardinality of a set, i.e., |C_{j,k}| is the size of cell C_{j,k} (the number of motion vectors in C_{j,k}).
In order to fully exploit the interactions between adjacent cells, the invention applies an efficient convolution operation, since it can comprehensively account for the local relations among n_k × n_k cells. The motion vector convolution Ŝ is defined as:
Ŝ = (K * (M̄ ⊙ W) − K* · B(W) ⊙ M̄ ⊙ W) / (K * W − K* · B(W) ⊙ W + ε), (4)
where Ŝ, the matrix produced by the Gaussian kernel convolution, has size n_c × n_c × 2, and Ŝ_{j,k} = m̂_{j,k} is the estimated motion vector of the (j, k)-th cell; * denotes convolution and ⊙ element-wise multiplication; W is the count matrix recording the number of motion vectors in each cell, with W_{j,k} = |C_{j,k}| the size of the cell in row j, column k. The denominator term in equation (4) performs weight compensation so that the scale of the convolution result does not change too much, and ε is an infinitesimal positive number used when the denominator would otherwise be 0. K is the Gaussian kernel distance matrix of size n_k × n_k, also called the Gaussian convolution kernel; it records the Gaussian kernel distance between a cell and its surrounding (n_k × n_k − 1) cells, and through it the connection between a cell and its neighboring cells is established. Each element K_{i,j} of K is defined as:
K_{i,j} = exp(−‖s_{i,j} − s_c‖² / (2σ²)), (5)
where K_{i,j} denotes the Gaussian kernel distance of the cell in row i, column j, σ is the kernel bandwidth, and s_{i,j} = (i, j)^T and s_c = (⌈n_k/2⌉, ⌈n_k/2⌉)^T are respectively the position of each cell within the Gaussian convolution kernel K (i.e., row i, column j) and the center position of the kernel, where ⌈·⌉ rounds an element up to the nearest integer not less than itself. Since the center position inside the convolution kernel must exist, n_k must be a positive odd number.
In addition, to avoid the influence of isolated samples, they can be excluded from the convolution by subtracting the corresponding average-vector contribution from both the numerator and the denominator, thereby adjusting the weight of each cell. In formula (4), B(W) denotes the binary form of W, whose entries are 0 or 1, with B(W)_{i,j} = 1 only when W_{i,j} > 0, and K* denotes the Gaussian kernel distance at the center position, i.e., the central element of K.
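The gridding and Gaussian-kernel smoothing of step S5.1 can be sketched as follows. This is an illustrative reconstruction: the wrap-around border handling (np.roll), the bandwidth sigma, and the cell assignment by bounding box are assumptions not fixed by the text, and the isolated-cell exclusion is omitted for brevity.

```python
import numpy as np

def estimate_motion_field(X, M, n_c=20, n_k=7, sigma=1.0, eps=1e-8):
    """Gridded motion-field estimate via Gaussian-kernel convolution (S5.1).

    X: (N, 2) feature positions; M: (N, 2) motion vectors m_i = y_i - x_i.
    The plane is split into n_c x n_c cells; each cell stores the mean of
    the motion vectors falling inside it, and an n_k x n_k Gaussian kernel
    then smooths the cell means, weighted by the vector count per cell.
    Returns (Mhat, idx): the (n_c, n_c, 2) estimated field and the (N, 2)
    cell coordinates of each point.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    idx = np.clip(((X - lo) / (hi - lo + eps) * n_c).astype(int), 0, n_c - 1)
    Mbar = np.zeros((n_c, n_c, 2))   # per-cell mean motion vector
    W = np.zeros((n_c, n_c))         # per-cell vector count
    for (j, k), m in zip(idx, M):
        Mbar[j, k] += m
        W[j, k] += 1
    Mbar /= np.maximum(W, 1)[..., None]
    # Gaussian kernel distances between a cell and its n_k x n_k neighbours
    c = n_k // 2
    g = np.arange(n_k)
    K = np.exp(-((g[:, None] - c) ** 2 + (g[None, :] - c) ** 2) / (2 * sigma ** 2))
    # count-weighted convolution: Mhat = conv(K, Mbar*W) / (conv(K, W) + eps)
    num = np.zeros_like(Mbar)
    den = np.zeros_like(W)
    for dj in range(-c, c + 1):
        for dk in range(-c, c + 1):
            w = K[dj + c, dk + c]
            den += w * np.roll(np.roll(W, dj, axis=0), dk, axis=1)
            num += w * np.roll(np.roll(Mbar * W[..., None], dj, axis=0), dk, axis=1)
    Mhat = num / (den[..., None] + eps)
    return Mhat, idx
```

On a perfectly uniform translation field the smoothed estimate in any occupied cell equals the common motion vector.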
Through the above convolution operation, the estimated motion vector of each cell is obtained, which can be used in place of the potential true motion vector. Then, the deviation between the initial motion vector m_i and the estimated motion vector m̂_{j,k} of the cell containing it is constrained to [0, 1] by the following formula:
dev_i = 1 − exp(−‖m_i − m̂_{j,k}‖² / β²), (6)
where dev_i is the numerical deviation of the motion vector, and β determines the width of the interaction range between the two motion vectors; β² can be empirically set to 0.08.
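A minimal sketch of the numerical deviation; the exponential form dev_i = 1 − exp(−‖m_i − m̂‖²/β²) is a reconstruction consistent with the stated [0, 1] range, with β² = 0.08 as in the text.

```python
import numpy as np

def numerical_deviation(m, m_hat, beta2=0.08):
    """Numerical deviation of a motion vector from the cell estimate.

    Reconstructed form: 0 for identical vectors, approaching 1 as they
    diverge; beta2 = 0.08 follows the empirical value in the text.
    """
    d2 = np.sum((np.asarray(m, float) - np.asarray(m_hat, float)) ** 2, axis=-1)
    return 1.0 - np.exp(-d2 / beta2)
```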
(2) Length ratio deviation of motion vector
In order to make the difference between mismatches and correct matches more pronounced, the invention not only considers the numerical deviation between each motion vector and the estimated motion vector, but also proceeds from the length and the angle of the motion vectors, making it more convenient to distinguish them with a threshold. The three deviations are normalized and summed to give the total deviation, so that the shortcomings of the individual deviations compensate one another, greatly improving precision and recall.
The length-ratio deviation and the angular deviation are constructed because a mismatched motion vector has large angular and length-ratio deviations, while a correctly matched motion vector is essentially consistent with the true motion vector.
The length-ratio deviation of the motion vector is defined as:
dev_i^l = 1 − exp(−(ρ_i − 1)²), (7)
where ρ_i = max(‖m_i‖, ‖m̂_{j,k}‖) / min(‖m_i‖, ‖m̂_{j,k}‖) denotes the length ratio between m_i and the estimated motion vector m̂_{j,k} of the (j, k)-th cell, i.e., simply the ratio of the longer to the shorter of the two motion vectors. The ratio is normalized by a Gaussian membership function, and dev_i^l is used as the length-ratio deviation of the motion vector.
(3) Angular deviation of motion vector
The angular deviation of the motion vector is defined as:
dev_i^a = (1 − cos θ_i) / 2, with cos θ_i = (m_i · m̂_{j,k}) / (‖m_i‖ ‖m̂_{j,k}‖), (8)
where θ_i denotes the angle between m_i and the estimated motion vector m̂_{j,k}. The invention uses dev_i^a as the angular deviation of the motion vector.
(4) Motion vector total bias and quantization strategy
Combining dev_i^l and dev_i^a with the earlier numerical deviation dev_i, the total deviation between the motion vector and the estimated motion vector is defined as:
d_i = dev_i + dev_i^l + dev_i^a. (9)
By comparing the deviation with a given threshold λ, the inlier set I* can be approximately detected:
I* = {(x_i, y_i) : d_i ≤ λ, i = 1, …, N}. (10)
As can be seen from fig. 4 (a), the total deviation alone does not clearly separate inliers from outliers (in the figure, the · points represent outliers, the o points represent inliers, the ordinate is the motion deviation, and the abscissa is the index of each feature point). This is because the range of variation of the deviation is not uniform across feature points. To solve this problem, a quantization strategy is adopted.
The total deviation d_i is quantized as:
s_{i,j} = d_i + d_{i,j}, (11)
s_i = Σ_{j} s_{i,j}, (12)
where d_{i,j} denotes the total motion vector deviation between the j-th motion vector m_j in the neighborhood of x_i and its corresponding estimated motion vector; s_{i,j} thus integrates the motion characteristics of the motion vector m_i itself and of its neighborhood. Finally, summing all the s_{i,j} over the neighborhood of the feature point x_i gives the quantized deviation s_i. The smaller this value, the better the match conforms to motion vector consistency. The cumulative distribution of the quantized motion vector deviation of each feature point is shown in fig. 4 (b), from which it can be seen that the difference between inliers and outliers becomes apparent.
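The three deviations and the quantization can be sketched together. The exact functional forms of the length-ratio and angle terms, the summation of the three deviations, and the neighborhood sum in the quantization are reconstructions of the omitted formulas, not the patent's exact definitions.

```python
import numpy as np

def total_deviation(m, m_hat, beta2=0.08):
    """Sum of numerical, length-ratio and angle deviations (reconstruction)."""
    m, m_hat = np.asarray(m, float), np.asarray(m_hat, float)
    dev_num = 1.0 - np.exp(-np.sum((m - m_hat) ** 2) / beta2)
    ln, lh = np.linalg.norm(m), np.linalg.norm(m_hat)
    rho = max(ln, lh) / max(min(ln, lh), 1e-8)      # length ratio >= 1
    dev_len = 1.0 - np.exp(-(rho - 1.0) ** 2)       # Gaussian membership
    cos = np.dot(m, m_hat) / max(ln * lh, 1e-8)
    dev_ang = (1.0 - cos) / 2.0                     # 0 if aligned, 1 if opposite
    return dev_num + dev_len + dev_ang

def quantized_deviation(d, neighbors):
    """s_i = sum_j (d_i + d_{i,j}) over the neighbourhood (reconstruction).

    d: (N,) total deviations; neighbors: per-point lists of neighbour indices.
    """
    d = np.asarray(d, float)
    return np.array([(d[i] + d[nb]).sum() for i, nb in enumerate(neighbors)])
```

A match identical to its cell estimate scores 0; an opposite-direction match scores near 2, and the quantized score grows with deviant neighborhoods.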
S6, optimizing a preset image feature matching model and determining the hyper-parameters by imposing a motion vector consistency constraint on the deviations of the motion vectors;
S6 specifically comprises the following steps:
(1) Matching model establishment and optimization
Assume that the initial matching set extracted from the two images is S = {(x_i, y_i)}_{i=1}^{N}, containing N initial matches, where x_i and y_i are two-dimensional column vectors of feature-point spatial locations. The initial matching set S contains a number of false matches.
In order to better preserve local features, the invention adopts a general mathematical model as the image feature matching model. Define I as the unknown inlier set, whose optimal solution is I*:
I* = argmin_I C(I; S, λ), (13)
wherein the cost C is defined as:
C(I; S, λ) = Σ_{i∈I} s_i + λ(N − |I|), (14)
where s_i is the total deviation after quantization and |·| is the cardinality of a set. In this cost function, the first term penalizes matches that do not maintain local consistency, the second term limits the number of outliers, and the parameter λ > 0 controls the weight between the two terms. The aim of the invention is to obtain the largest inlier set by minimizing the cost function C.
To optimize the cost function, the initial matching set S is associated with an N × 1 binary vector P, where p_i ∈ {0, 1} indicates the correctness of the i-th correspondence (x_i, y_i). Specifically, p_i = 1 denotes an inlier and p_i = 0 denotes an outlier.
Rewriting the cost in terms of p_i and collecting the common terms yields:
C(P; S, λ) = Σ_{i=1}^{N} p_i c_i + λ(N − Σ_{i=1}^{N} p_i) = Σ_{i=1}^{N} p_i (c_i − λ) + λN, (15)
where
c_i = s_i. (16)
c_i measures the degree to which the i-th match (x_i, y_i) satisfies local consistency. Obviously, a correct match incurs zero or low cost, while a false match incurs a higher cost.
For a given initial matching set, the coordinates of the feature points are known, i.e., the neighborhood structure and the motion vectors between feature points are fixed, so all the cost values {c_i} can be pre-computed; the only unknown variables are then the p_i. Since λN is a constant, the optimal solution for P that minimizes the cost function is determined by the following simple criterion:
p_i = 1 if c_i < λ, and p_i = 0 otherwise, i = 1, …, N. (17)
Thus, the optimal inlier set I* can be determined by:
I* = {i | p_i = 1, i = 1, …, N}. (18)
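Because the quantized costs c_i decouple across matches, minimizing the cost reduces to a per-match threshold against λ; a minimal sketch, with λ = 0.45 as determined experimentally later in the text:

```python
import numpy as np

def select_inliers(s, lam=0.45):
    """Minimise sum_i p_i*c_i + lam*(N - sum_i p_i) with c_i = s_i.

    Each term decouples, so p_i = 1 exactly when c_i < lam; lam = 0.45 is
    the experimentally determined value from the text.
    Returns the binary vector P and the inlier index set.
    """
    c = np.asarray(s, float)
    p = (c < lam).astype(int)
    inliers = np.flatnonzero(p == 1)
    return p, inliers
```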
From equation (17) it can be seen that, besides adjusting the trade-off, the parameter λ also has a thresholding effect. Different types of image pairs have different distribution parameters, motion vector fields, and costs, so the optimal λ will also change.
(2) Determination of the hyper-parameters
The invention has thus far converted the feature matching problem into an optimization model. However, there are several hyper-parameters, namely {n_c, n_k, λ, η}, whose values can seriously affect the result of the algorithm. Therefore, in order to obtain the best results and parameter settings, the invention selects 12 images each from rigid, rotational, scale, and non-rigid deformations for experiments.
The invention varies the values of n_c and n_k over multiple groups of experiments and records the precision, recall, and F-score, which are common indicators for evaluating matching performance.
As shown in FIG. 5, FIG. 5 (a) shows the precision, recall, and F-score curves obtained by the method when n_c = 20 and n_k is varied, and FIG. 5 (d) shows the curves obtained when n_k = 7 and n_c is varied. The results show that precision, recall, and F-score change little and are insensitive within a certain range (n_c ∈ (10, 40), n_k ∈ (5, 9)), but degrade significantly outside it. Therefore, as long as an adaptive algorithm chooses the values of n_c and n_k within this range, the best results are obtained.
In order to determine the optimal hyper-parameters n_c and n_k for the Gaussian kernel convolution filtering, the invention uses the same data as the above experiment, comprising 48 pairs of images.
First, the invention gathers statistics of the numerical deviations of the motion vectors of inliers and outliers for 12 different (n_c, n_k) combinations, where n_c takes values from 10 to 40 at intervals of 10 (represented by four different gray scales) and n_k takes values from 5 to 9 at intervals of 2 (represented by three line types: solid, dotted, and dashed). In fig. 5 (b), the falling trend lines are the inlier statistics and the rising trend lines are the outlier statistics. The optimal parameter λ for each (n_c, n_k) group is the intersection of the corresponding inlier and outlier probability curves; the closer the intersection is to the lower left corner, the better the separability of inliers from outliers. Thus, according to the experimental results, the invention can adaptively set the hyper-parameters n_c and n_k using the relation between n_c and the size N of the initial matching set.
Here N is the number of initial matches in the initial matching set; the computed values may be rounded, and odd(n_c/3), the nearest odd number not greater than n_c/3, is used as the adaptive value of n_k.
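The odd(n_c/3) rule can be written directly; the floor of 3 below is an added safeguard (so a kernel with a center cell always exists), not something the source states.

```python
def adaptive_nk(n_c):
    """Nearest odd number not greater than n_c / 3 (the odd(.) in the text).

    The lower bound of 3 is an assumption added so the Gaussian kernel
    always has a valid center cell.
    """
    k = int(n_c / 3)
    if k % 2 == 0:
        k -= 1
    return max(k, 3)
```

For example, the n_c = 20 grid used in fig. 2 yields n_k = 5.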
According to the invention, 10 test images are selected from the 48 test images used above for each of translation, scaling, rotation, and non-rigid transformation, and 10 additional remote sensing images containing all of these transformations are selected. The same experiment is performed on each of the 5 groups, 50 images in total, with the results shown in fig. 5 (e). The results show that different types of image deformation lead to large differences in the intersection points, which means that different optimal λ and η should be chosen. Note that when c_i = λ, the setting of p_i may be arbitrary.
To determine the best λ and η, the invention experiments on the same dataset (comprising 50 pairs of images). First, the value of η is fixed and the precision, recall, and F-score are computed for different values of λ; then the value of λ is fixed and the same metrics are computed for different values of η. The final results are shown in fig. 5 (c) and (f): with λ = 0.45 and η = 0.3, the invention attains the highest precision, recall, and F-score.
And S7, performing feature matching on the two images to be matched through the optimized image feature matching model to obtain an image feature matching result.
Fig. 6 shows qualitative results of the method of the invention on 10 representative image pairs.
To evaluate the performance of the method, 5 generic image matching datasets were used in the experiments, each containing 30 pairs of images with ground truth. The 5 datasets cover a wide range of content, comprising mainly remote sensing and non-remote-sensing datasets. The remote sensing datasets are 720Yun, SUIRD, CIAP, and SAR; the non-remote-sensing dataset comprises natural images, medical images, infrared images, and the like. To ensure objectivity, the invention uses the open-source VLFeat toolbox to extract SIFT features from all datasets, and then constructs the initial matching sets with a nearest-neighbor strategy according to descriptor similarity. The ground truth of all datasets is either provided by manually labeling each match as true or false, or checked using a geometric transformation matrix provided by the original authors.
From top to bottom in fig. 6, each row contains two examples selected from the 5 datasets 720Yun, SUIRD, CIAP, SAR, and non-remote-sensing, respectively. These image pairs feature high outlier ratios, small overlapping areas, and scale, rotation, translation, and even non-rigid transformations, which pose significant difficulties and challenges for mismatch removal. The precision, recall, and F-score of the method, from top to bottom and left to right, are respectively: (98.61%, 100.0%, 0.9930), (98.11%, 98.96%, 0.9854), (100.0%, 99.04%, 0.9952), (100.0%, 98.95%, 0.9947), (99.57%, 100.0%, 0.9978), (99.56%, 99.56%, 0.9956), (100.0%, 100.0%, 1.0000), (99.49%, 99.80%, 0.9964), (98.95%, 99.15%, 0.9905), (97.84%, 98.81%, 0.9833). The experimental results show that the method identifies most inliers with very few misjudgments, demonstrating that it is universal and robust when processing remote sensing images with different types of image distortion and large numbers of outliers. In addition, the method also achieves good experimental results on the non-remote-sensing dataset of common scenes.
In fig. 6, each row has two types of images: the left is the feature-correspondence line diagram and the right is the feature motion vector field diagram. The head and tail of each arrow in the motion vector field correspond to the spatial positions of the matched feature points in the two images (the figure contains both the identified correct matches and the identified false matches, but the latter are very few, almost zero). For visibility of the matches in the image pairs, 100 matches are randomly selected in each pair of images and displayed as lines.
FIG. 7 shows the experimental results on the 5 datasets in terms of the quantitative evaluation indicators precision, recall, F-score, and running time. Precision (P) is defined as the proportion of correct matches among all matches retained by the algorithm, recall (R) is defined as the ratio of the correct matches identified by the algorithm to all correct matches, and the F-score is used to evaluate the overall matching performance. Based on the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), the precision is calculated as:
P = TP / (TP + FP).
The recall rate is calculated as:
R = TP / (TP + FN),
where a true positive (TP) is a correct match that the algorithm judges correct, a true negative (TN) is a false match that the algorithm judges false, a false positive (FP) is a false match that the algorithm wrongly judges correct, and a false negative (FN) is a correct match that the algorithm wrongly judges false.
True positives (TP) and true negatives (TN) reflect the accuracy of the algorithm, while false positives (FP) and false negatives (FN) reflect its degree of error. The F-score is a comprehensive statistic of precision and recall, computed as:
F = 2 · P · R / (P + R).
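The three indicators can be computed directly from the confusion counts:

```python
def precision_recall_fscore(tp, fp, fn):
    """Precision, recall and F-score from confusion counts.

    P = TP/(TP+FP), R = TP/(TP+FN), F = 2PR/(P+R), as in the text.
    """
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    return p, r, f
```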
To make the results more convincing, the feature matching performance of the method of the invention is quantitatively evaluated and compared with the LPM (Locality Preserving Matching), LLT (Locally Linear Transforming), LAF (Linear Adaptive Filtering), LMR (Learning a Two-Class Classifier for Mismatch Removal), VFC (Vector Field Consensus), RFM-SCAN, and EES (Efficient Exact Search) algorithms. The code of each comparison algorithm comes from the original authors' publications, and all parameters of the comparison algorithms are set according to the original papers and remain unchanged throughout the experiments.
The results in FIG. 7 show that the method of the present invention achieves good results on all remote sensing datasets, especially the CIAP dataset. The VFC algorithm performs poorly on certain image pairs in the datasets. This is because the inlier rate or inlier count of these images is low, as in the CIAP dataset, which interferes with the construction of the vector field. On the SAR dataset, with its severe noise and affine distortion, the LMR algorithm produces many false positives and has a low recall rate. The method of the invention and the LAF algorithm not only recall more correct matches, but also achieve higher precision than the other algorithms. For the 720Yun data, with its non-rigid transformation and low inlier rate, the mismatch removal task is very difficult. The LLT algorithm does not perform perfectly on this dataset because it only considers a linear transformation between feature points and therefore adapts poorly. For the SUIRD dataset, where viewpoint variation is large, there are severe outliers, so the deviation of the motion vectors increases. This affects both the typical motion field estimation of LAF and the motion consistency clustering of RFM-SCAN, thereby reducing precision; thus the performance of the LAF and RFM-SCAN algorithms is significantly degraded compared with their performance on the other datasets. The EES algorithm is based on an inlier model; when the images contain non-rigid deformation, a low inlier rate, and external noise disturbance, as in the 720Yun, CIAP, and SAR dataset images, the performance of its geometric model fitting is significantly reduced. The LPM algorithm and the deep-learning-based LMR algorithm perform well on each dataset, but slightly worse than the method of the present invention. In addition, the method performs well on the non-remote-sensing dataset, which shows that it has a certain universality and expandability and can be extended to region-based processing of various types of images.
The F-score reflects the overall performance of an algorithm. As can be seen from all remote sensing datasets, by combining the neighborhood consistency of the feature points with the consistency of their motion vectors, the method of the invention remains robust on remote sensing images with different problems and achieves the best precision and recall, whereas in some cases the indexes of the comparison algorithms decrease or even fail. Furthermore, the method of the invention has linear complexity. As can be seen from the last row of FIG. 7, the proposed algorithm runs as fast as the LAF algorithm while maintaining its comprehensive performance. It is slightly slower than the LPM algorithm but much faster than the other methods, especially when processing tens of thousands of matches.
FIG. 7 shows, from left to right, quantitative comparisons of LPM, LLT, LAF, LMR, VFC, RFM-SCAN, EES, and the method of the present invention on the 5 datasets 720Yun, SUIRD, CIAP, SAR, and Non-remote sensing. From top to bottom are the cumulative distributions of the inlier rate, precision, recall, F-score, and run time, respectively. The coordinates (x, y) of a point on each curve represent the y-value (i.e., the precision, recall, F-score, or run-time value in ms) of the 30x-th image pair in each dataset. The algorithm represented by each curve, the average precision (AP), average recall (AR), average F-score (AF), and average run time are marked at the corresponding locations in the figure.
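A cumulative distribution curve of this kind can be built by sorting the per-pair metric values, so that a point (x, y) says that a fraction x of the image pairs have a value of at most y. A minimal sketch (the function name and the toy values are illustrative, not from the experiments):

```python
# Build the points of a cumulative distribution curve from a list of
# per-image-pair metric values (precision, recall, F-score, or run time).
def cumulative_curve(values):
    ys = sorted(values)               # ascending metric values
    n = len(ys)
    # point i: fraction (i+1)/n of pairs have a value <= ys[i]
    return [((i + 1) / n, y) for i, y in enumerate(ys)]

# Toy example with 4 image pairs:
curve = cumulative_curve([0.9, 0.7, 0.8, 1.0])
# curve == [(0.25, 0.7), (0.5, 0.8), (0.75, 0.9), (1.0, 1.0)]
```

Curves that rise later (further to the right) therefore indicate an algorithm whose metric values are higher across the dataset.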
In some embodiments, there is also provided an image feature matching device based on local consistency, including the following modules:
The image acquisition module is used for acquiring two images to be matched;
The feature point detection and descriptor establishment module is used for detecting feature points in the two images through the SIFT algorithm and establishing feature descriptors;
The initial matching set construction module is used for constructing a group of initial matching sets according to the similarity of the feature descriptors in the two images;
The neighborhood consistency constraint module is used for deleting mismatches from the initial matching set through the neighborhood consistency constraint to obtain a second matching set;
The motion vector deviation calculation module is used for calculating the deviation of the motion vector of each feature point through the second matching set;
The motion vector consistency constraint module is used for carrying out the motion vector consistency constraint on the deviation of the motion vectors so as to optimize a preset image feature matching model and determine the hyperparameters;
The feature matching module is used for performing feature matching on the two images to be matched through the optimized image feature matching model to obtain an image feature matching result.
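The chaining of the modules above can be sketched as follows. All class, parameter, and stub names are hypothetical, chosen only to illustrate the data flow between the modules; the stand-in bodies are trivial and are not the patent's implementation:

```python
# Hypothetical sketch of how the device's modules are chained; each
# constructor argument plays the role of one module in the list above.

class LocalConsistencyMatchingDevice:
    def __init__(self, detector, matcher, neighborhood_filter,
                 deviation_calculator, model_optimizer):
        self.detector = detector                        # detection + descriptors
        self.matcher = matcher                          # initial matching set
        self.neighborhood_filter = neighborhood_filter  # neighborhood consistency
        self.deviation_calculator = deviation_calculator  # motion vector deviation
        self.model_optimizer = model_optimizer          # motion vector consistency

    def run(self, image_a, image_b):
        kp_a, desc_a = self.detector(image_a)
        kp_b, desc_b = self.detector(image_b)
        initial = self.matcher(desc_a, desc_b)
        second = self.neighborhood_filter(initial)
        deviations = self.deviation_calculator(second, kp_a, kp_b)
        model = self.model_optimizer(deviations)        # optimized matching model
        return [m for m in initial if model(m)]         # final feature matching

# Toy wiring with trivial stand-in modules, just to show the data flow:
device = LocalConsistencyMatchingDevice(
    detector=lambda img: (img, img),
    matcher=lambda a, b: list(zip(a, b)),
    neighborhood_filter=lambda ms: ms,
    deviation_calculator=lambda ms, ka, kb: ms,
    model_optimizer=lambda dev: (lambda m: m[0] == m[1]),
)
result = device.run([1, 2, 3], [1, 9, 3])
# result == [(1, 1), (3, 3)]: the mismatched pair (2, 9) is filtered out.
```

The point of the sketch is only that each module consumes the previous module's output, matching steps S1 through S7 of the method.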
The invention uses a primary filtering strategy based on feature point neighborhood consistency to remove a part of the obviously erroneous outliers and purify the neighborhoods, obtaining an approximate inlier set and thereby constructing more realistic neighborhoods. The image space is then divided into a grid of non-overlapping cells, and an estimated motion vector is calculated for each cell through a Gaussian kernel convolution operation to represent the potential motion characteristics of that cell. Finally, by judging the degree of consistency between the initial motion vector and the estimated motion vector in each cell, subtle mismatches are further eliminated and a more accurate matching is obtained.
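The per-cell estimation and consistency check can be sketched as follows. This is an illustrative approximation, not the patent's exact implementation: the function names, the kernel bandwidth sigma, and the tolerance tol are all assumptions.

```python
import math

# Each grid cell's estimated motion vector is a Gaussian-kernel-weighted
# average of the motion vectors of the (neighborhood-purified) matches near
# the cell centre; a match is kept only if its own motion vector agrees
# with that estimate.

def estimate_cell_vector(cell_center, matches, sigma=50.0):
    """Gaussian-weighted mean motion vector for one grid cell.

    matches: list of ((x, y), (dx, dy)) pairs - feature position and motion vector.
    """
    cx, cy = cell_center
    wsum = vx = vy = 0.0
    for (x, y), (dx, dy) in matches:
        w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        wsum += w
        vx += w * dx
        vy += w * dy
    return (vx / wsum, vy / wsum) if wsum > 0 else (0.0, 0.0)

def is_consistent(motion, estimate, tol=10.0):
    """Keep a match if its motion vector deviates little from the cell estimate."""
    return math.hypot(motion[0] - estimate[0], motion[1] - estimate[1]) <= tol

# Toy example: three nearby matches move right; an outlier moves upward.
inliers = [((10, 10), (5, 0)), ((12, 11), (5, 1)), ((9, 13), (4, 0))]
est = estimate_cell_vector((10, 12), inliers)  # estimate from purified neighborhood
print(is_consistent((5, 0), est), is_consistent((0, 40), est))  # prints: True False
```

Because the estimate is built only from the approximate inlier set produced by the primary filtering, a subtle mismatch whose motion vector deviates from the local estimate is rejected even when its descriptors are similar.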
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises that element.
The foregoing embodiment numbers of the present invention are merely for description and do not represent the advantages or disadvantages of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The terms first, second, third, etc. do not denote any order; they are to be interpreted merely as labels.
The foregoing description covers only the preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process transformation made using the contents disclosed herein, whether employed directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.