Scale-adaptive long-term correlation target tracking method
Technical Field
The invention belongs to the field of visual target tracking, and particularly relates to a scale-adaptive long-term correlation target tracking method.
Background
Target tracking is part of video analysis, i.e., the processing of a sequence of video images. Given the position and size of the target in the first frame, the task of target tracking is to determine the position and size of the target in each subsequent frame by analyzing the video image sequence, and to frame the target accurately. Target tracking technology integrates knowledge from mathematics, physics, image processing and other disciplines, and has wide application and development prospects in military defense, intelligent transportation and other areas. For example, in the military field it is used for missile defense, guidance systems, air traffic control and the like; in the intelligent transportation field it is used for real-time monitoring of traffic flow, traffic accident detection, pedestrian counting and the like.
Correlation-filtering-based tracking algorithms treat tracking as a process of template matching and ridge regression. The Kernelized Correlation Filter (KCF) is a correlation-filtering target tracking algorithm that introduces a kernel function; it extracts features with the multi-channel Histogram of Oriented Gradients (HOG) and constructs positive and negative samples by cyclic shifts, but KCF cannot cope with changes in target scale.
To address this problem, the Discriminative Scale Space Tracking (DSST) algorithm jointly applies correlation filters over a three-dimensional scale space: a two-dimensional discriminative position filter first determines the position of the target in the video sequence, and a one-dimensional scale filter then evaluates the target output by the position filter and outputs the optimal scale of the current target. The Long-term Correlation Tracking (LCT) algorithm adds a correlation filter for confidence detection on top of the DSST position-filtering and scale-filtering framework; this re-detection mechanism gives the algorithm good tracking performance in videos with occlusion, out-of-view and similar attributes, but the tracking performance of LCT still needs improvement in environments with large target scale changes, low resolution, fast motion, illumination changes and similar attributes.
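To make the cyclic-shift sample construction mentioned above concrete, the following minimal NumPy sketch enumerates the shifted samples explicitly. It illustrates the general KCF idea only and is not part of the claimed method; the function name is illustrative, and in practice the circulant structure is exploited so that the dense sample matrix is never formed.

```python
import numpy as np

def cyclic_shift_samples(patch: np.ndarray) -> np.ndarray:
    """Enumerate every cyclic shift of a single-channel patch.

    KCF never materialises this dense sample matrix; it exploits its
    circulant structure so that training reduces to element-wise
    operations in the Fourier domain. The explicit loop is shown only
    to make the sample construction concrete.
    """
    h, w = patch.shape
    samples = np.empty((h * w, h, w), dtype=patch.dtype)
    for dy in range(h):
        for dx in range(w):
            samples[dy * w + dx] = np.roll(np.roll(patch, dy, axis=0), dx, axis=1)
    return samples
```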
Disclosure of Invention
In view of this, the present invention provides a scale-adaptive long-term correlation (LCSA) target tracking algorithm to improve the accuracy of existing target tracking algorithms, so that the tracker can overcome the interference of multiple environmental factors.
For this purpose, the invention provides the following technical scheme:
a scale-adaptive long-term correlation target tracking algorithm, comprising the steps of:
(1) Initializing a target detection frame: extracting the features of the target according to the detection frame in the first frame, and initializing a temporal context regression model R_c, a target appearance regression model R_t and a detector D_rf, where R_c is responsible for translation estimation, R_t is responsible for scale estimation, and D_rf is responsible for re-detection;
(2) For the t-th frame, cutting out a search window in frame t according to the target position (x_{t-1}, y_{t-1}) of frame t-1, extracting HOG features, and training a correlation filter template;
(3) Performing translation estimation: using R_c and the correlation filter score obtained from the correlation filter template, calculating the response ŷ_t and estimating the current frame position (x_t, y_t);
(4) Constructing a scale pool using a multi-scale search strategy, and adaptively estimating the optimal scale from the correlation map y_s and R_t, obtaining the initial predicted target state x̂_t of the t-th frame;
(5) If max(ŷ_t) < τ_r, using D_rf to perform re-detection and find a candidate state set X; for each state x'_i in X, calculating a confidence score y'_i; if max(y'_i) > τ_t, taking the candidate with the highest confidence as the new target state; where τ_r is a first threshold, τ_t is a second threshold, and ŷ_t denotes the predicted correlation map; the final predicted target state x̂_t of the t-th frame is thus obtained;
(6) Updating R_c;
(7) If max(ŷ_t) > τ_a, updating R_t, where τ_a is a third threshold;
(8) Updating D_rf;
(9) Repeating steps (2)-(8) for the (t+1)-th frame until the video sequence ends.
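The flow of steps (1)-(9) can be summarised by the following Python sketch. It is a minimal skeleton under the assumption that the models R_c, R_t and D_rf are supplied as callables (translate, estimate_scale, redetect, update); these helper names and signatures are illustrative and are not part of the disclosed method, only the thresholds follow the values given later.

```python
from typing import Callable, Iterable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

def track_sequence(
    frames: Sequence,            # video frames; the first one is already annotated
    init_box: Box,
    translate: Callable,         # R_c: (frame, box) -> (response_peak, box)       steps (2)-(3)
    estimate_scale: Callable,    # R_t: (frame, box) -> box at the optimal scale   step (4)
    redetect: Callable,          # D_rf: frame -> list of (score, box) candidates  step (5)
    update: Callable,            # updates R_c / R_t / D_rf for the new state      steps (6)-(8)
    tau_r: float = 0.15,
    tau_t: float = 0.5,
    tau_a: float = 0.38,
) -> Iterable[Box]:
    """Skeleton of the LCSA loop; thresholds follow the disclosed values."""
    box = init_box
    for frame in frames[1:]:
        peak, box = translate(frame, box)        # translation estimation
        box = estimate_scale(frame, box)         # multi-scale search
        if peak < tau_r:                         # confidence too low: re-detect
            candidates = redetect(frame)
            if candidates:
                best_score, best_box = max(candidates)
                if best_score > tau_t:
                    box = best_box
        update(frame, box, peak, tau_a)          # R_t is updated only when peak > tau_a
        yield box                                # repeat for the next frame
    # step (1), model initialization, is assumed to have produced the callables above
```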
Further, the correlation filter template trained in step (2) is obtained by solving:

w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ − y(m,n)|² + λ‖w‖²

where w denotes the correlation filter, x_{m,n} denotes an image block x having M×N pixels, y(m,n) denotes the Gaussian sample label generated for the training sample x_{m,n}, φ denotes the mapping to kernel space, and λ denotes the regularization parameter.
Further, the correlation filter trained in step (2) can also be written as w = Σ_{m,n} a(m,n) φ(x_{m,n}),

where the coefficient a is defined by

A = F(a) = F(y) / (F(φ(x)·φ(x)) + λ)

F denotes the discrete Fourier operator, x_{m,n} denotes the image block x of M×N pixels, and m and n denote pixel coordinates.
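The closed-form coefficients above can be evaluated entirely in the Fourier domain. The sketch below assumes the kernel auto-correlation map k_xx of the training patch has already been computed (for example with the Gaussian kernel of the embodiment described later); the function name and arguments are illustrative.

```python
import numpy as np

def train_coefficients(k_xx: np.ndarray, y: np.ndarray, lam: float = 1e-4) -> np.ndarray:
    """Fourier-domain coefficients A = F(a) = F(y) / (F(k_xx) + lambda).

    k_xx : kernel auto-correlation map of the training patch x (M x N),
           e.g. computed with the Gaussian kernel of the embodiment.
    y    : Gaussian regression label of the same size.
    lam  : regularization parameter (10**-4 in the simulation below).
    """
    return np.fft.fft2(y) / (np.fft.fft2(k_xx) + lam)
```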
Further, in a new frame the tracking task computes the correlation map over an image block z of size M×N; the response of step (3) is determined according to the following formula:

ŷ = F⁻¹( F(κ(z, f)) ⊙ A )

where f denotes the learned target appearance model, κ(z, f) = φ(z)·φ(f) denotes the kernel correlation between z and f, ⊙ denotes element-wise multiplication and F⁻¹ the inverse Fourier transform; the predicted position of the target is found at the maximum of the response ŷ.
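A minimal sketch of this detection step, assuming the kernel correlation map between the search patch z and the learned template has already been computed; names are illustrative.

```python
import numpy as np

def detect_response(k_zf: np.ndarray, A: np.ndarray):
    """Response map y_hat = F^(-1)( F(k_zf) * A ) and its peak.

    k_zf : kernel correlation map between the search patch z and the
           learned appearance model f (same size as the training patch).
    A    : Fourier-domain coefficients obtained at training time.
    Returns the real-valued response, its peak value and peak location.
    """
    response = np.real(np.fft.ifft2(np.fft.fft2(k_zf) * A))
    dy, dx = np.unravel_index(int(response.argmax()), response.shape)
    return response, float(response.max()), (int(dy), int(dx))
```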
Further, the scale search strategy of step (4) is:

The template scale is fixed as S_T = (s_x, s_y) and the scale pool is set to S = {t_1, t_2, …, t_k}; for the current frame, k scaled samples {t_i S_t | t_i ∈ S} are taken to find the appropriate target scale, where S_t denotes the current target size; bilinear interpolation is then employed so that the sample of every scale becomes a sample of the same size as S_T;

The final target scale response value is calculated as follows:

ŷ_i^s = F⁻¹( F(κ(z_i^s, f)) ⊙ A ),  i = 1, …, k

where z_i^s is the i-th scale sample in the scale pool, with size t_i S_t; the scale whose response attains the largest peak value is taken as the optimal scale.
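A possible implementation sketch of this scale search is shown below. It assumes two helper callables, crop and respond, standing in for patch extraction and the response of R_t; the helper names and the single-channel assumption are illustrative, not part of the claim.

```python
import numpy as np

def bilinear_resize(img: np.ndarray, out_size) -> np.ndarray:
    """Plain NumPy bilinear resize of a single-channel patch to (height, width)."""
    h, w = img.shape[:2]
    oh, ow = int(out_size[0]), int(out_size[1])
    ys, xs = np.linspace(0, h - 1, oh), np.linspace(0, w - 1, ow)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def search_scale(frame, center, target_size, template_size, crop, respond,
                 scale_pool=(1.0, 0.99, 1.01)):
    """Multi-scale search over the scale pool.

    crop(frame, center, size) -> patch of the requested size     (assumed helper)
    respond(patch)            -> correlation response map of R_t (assumed helper)
    Each candidate patch of size t_i * S_t is resized to the fixed template
    size S_T before scoring; the scale with the largest response peak wins.
    """
    best_scale, best_peak = 1.0, -np.inf
    for t_i in scale_pool:
        size_i = (target_size[0] * t_i, target_size[1] * t_i)     # t_i * S_t
        patch = bilinear_resize(crop(frame, center, size_i), template_size)
        peak = float(respond(patch).max())
        if peak > best_peak:
            best_scale, best_peak = t_i, peak
    return (target_size[0] * best_scale, target_size[1] * best_scale)
```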
Further, updating R_c and updating R_t comprises:

For the R_c and R_t models, the coefficients f and A are updated frame by frame at the learning rate α as:

f^t = (1 − α) f^{t−1} + α f̃^t,   A^t = (1 − α) A^{t−1} + α Ã^t

where f̃^t and Ã^t denote the appearance template and the coefficients computed from the current frame t.
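A minimal sketch of this linear interpolation update, assuming f denotes the appearance template and A the Fourier-domain coefficients as above:

```python
import numpy as np

def update_model(f_prev: np.ndarray, A_prev: np.ndarray,
                 f_new: np.ndarray, A_new: np.ndarray, alpha: float = 0.01):
    """Frame-by-frame linear interpolation of the model pair (f, A).

    f_* are target appearance templates, A_* are Fourier-domain coefficients;
    alpha is the learning rate (0.01 in the simulation below).
    """
    f = (1.0 - alpha) * f_prev + alpha * f_new
    A = (1.0 - alpha) * A_prev + alpha * A_new
    return f, A
```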
further, D rf A support vector machine detector;
update D rf Comprising the following steps:
in each frame, a training set { (v) i ,c i ) I = 1,2,..n } and N samples, where v i Is the feature vector generated by the ith sample, c i E { +1, -1} is a sample tag, and the objective function for solving the support vector machine detector hyperplane h is:
wherein ,<h,v>represents the inner product of h and v; λ represents a regularization parameter;
updating hyperplane parameters using a passive algorithm:
wherein ,is the gradient of the loss function with respect to h, τ e (0, +_j) is a hyper-parameter that controls the h update rate.
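A minimal sketch of one such passive-aggressive update with the hinge loss defined above; the function signature is illustrative:

```python
import numpy as np

def pa_update(h: np.ndarray, v: np.ndarray, c: int, tau: float = 1.0) -> np.ndarray:
    """One passive-aggressive update of the SVM hyperplane h.

    v is the feature vector of a new sample, c in {+1, -1} its label,
    tau > 0 controls the update rate. Implements
        h <- h - loss / (||grad||^2 + 1/(2*tau)) * grad
    with the hinge loss  loss = max(0, 1 - c * <h, v>).
    """
    loss = max(0.0, 1.0 - c * float(np.dot(h, v)))
    if loss == 0.0:                 # passive step: the sample is already classified with margin
        return h
    grad = -c * v                   # gradient of the hinge loss w.r.t. h
    step = loss / (float(np.dot(grad, grad)) + 1.0 / (2.0 * tau))
    return h - step * grad
```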
Further, τ_r is 0.15, τ_t is 0.5, and τ_a is 0.38.
The invention has the following beneficial effects:
according to the scale self-adaptive long-term correlation target tracking method provided by the invention, a scale self-adaptive strategy and an LCT target tracking frame are effectively fused, and firstly, a scale pool is introduced, so that an algorithm can self-adaptively select the optimal scale for finding the position of a tracking target. The multi-scale search can be combined with the position estimation filter more stably, and the situation that the scale estimation offset is too large is not easy to occur.
According to the scale-adaptive long-term correlation target tracking method provided by the invention, tracking accuracy is clearly improved over the LCT algorithm on the Unmanned Aerial Vehicle benchmark UAV123 in classical target tracking scenarios such as scale variation, aspect ratio change, low resolution, fast motion, full occlusion, partial occlusion, out-of-view, illumination variation, viewpoint change, camera motion and similar objects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of a scale-adaptive long-term correlation target tracking method in an embodiment of the invention;
FIG. 2 is a schematic diagram of a scale-adaptive long-term correlation target tracking method in an embodiment of the invention;
FIG. 3 is a schematic diagram of an adaptive scale model of a scale-adaptive long-term correlation target tracking method according to an embodiment of the present invention;
FIG. 4 is a graph of tracking accuracy of the present invention and other algorithms in a UAV123 dataset over 123 video sequences;
FIG. 5 is a graph of the success rate of the present invention and other algorithms in a UAV123 dataset over 123 video sequences;
FIG. 6 is a graph of tracking accuracy and success rate in the context of Scale Variation in a UAV123 dataset by the present invention and other algorithms;
FIG. 7 is a graph of tracking accuracy and success rate in the context of Aspect Ratio Change in the UAV123 dataset by the present invention and other algorithms;
FIG. 8 is a graph of tracking accuracy and success rate in the context of Low Resolution in the UAV123 dataset by the present invention and other algorithms;
FIG. 9 is a graph of tracking accuracy and success rate in the context of Fast Motion in the UAV123 dataset by the present invention and other algorithms;
FIG. 10 is a graph of tracking accuracy and success rate in the context of Full Occlusion in the UAV123 dataset by the present invention and other algorithms;
FIG. 11 is a graph of tracking accuracy and success rate in the context of Partial Occlusion in the UAV123 dataset by the present invention and other algorithms;
FIG. 12 is a graph of tracking accuracy and success rate in the context of Out-of-View in the UAV123 dataset by the present invention and other algorithms;
FIG. 13 is a graph of tracking accuracy and success rate in the context of Background Clutter in the UAV123 dataset by the present invention and other algorithms;
FIG. 14 is a graph of tracking accuracy and success rate in the context of Illumination Variation in the UAV123 dataset by the present invention and other algorithms;
FIG. 15 is a graph of tracking accuracy and success rate in the context of Viewpoint Change in the UAV123 dataset by the present invention and other algorithms;
FIG. 16 is a graph of tracking accuracy and success rate in the context of Camera Motion in the UAV123 dataset by the present invention and other algorithms;
FIG. 17 is a graph of tracking accuracy and success rate in the context of Similar Object in the UAV123 dataset by the present invention and other algorithms.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1 and fig. 2, which respectively show a flowchart and a schematic diagram of a scale-adaptive long-term correlation target tracking method according to an embodiment of the present invention, the method includes the following steps:
Preprocessing: the target position of the first frame is read to obtain a large background detection frame, a small background detection frame and the Gaussian regression label corresponding to each detection frame.
Processing of the first frame of the video: the Histogram of Oriented Gradients (HOG) features of the target are extracted according to the detection frame and Fourier transformed to obtain x_f; a Gaussian kernel function is applied to obtain the Gaussian response k_f in the frequency domain, where the Gaussian kernel is κ(x, x') = exp(−‖x − x'‖² / σ²); both regression models are mapped to kernel space, defined as k(x, x') = φ(x)·φ(x'). Using x_f and k_f, the classifier parameters are calculated to obtain the temporal context regression model R_c, the target appearance regression model R_t and the detector D_rf, where R_c is responsible for translation estimation, R_t is responsible for scale estimation, and D_rf is responsible for re-detection.
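The Gaussian kernel response in the frequency domain can be computed for all cyclic shifts at once. The sketch below follows the frequency-domain form commonly used with kernelized correlation filters; the normalisation by the number of pixels follows common practice and is an assumption, as are the function and argument names.

```python
import numpy as np

def gaussian_kernel_correlation(x: np.ndarray, z: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Gaussian kernel response of two equally sized single-channel patches.

    Evaluates the kernel for every cyclic shift at once through the FFT,
    which is the frequency-domain form commonly used with kernelized
    correlation filters; sigma = 0.1 matches the simulation parameters.
    """
    xf, zf = np.fft.fft2(x), np.fft.fft2(z)
    cross = np.real(np.fft.ifft2(np.conj(xf) * zf))          # circular cross-correlation
    dist = (x ** 2).sum() + (z ** 2).sum() - 2.0 * cross      # ||x - shifted z||^2
    return np.exp(-np.maximum(dist, 0.0) / (sigma ** 2 * x.size))  # normalisation: common practice
```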
The processing procedure of the current frame (t-th frame):
(1) According to the target position (x_{t−1}, y_{t−1}) of the (t−1)-th frame, a search window is cut out in frame t, HOG features are extracted, and a correlation filter template is trained;
the training related filter template specifically comprises the following components:
wherein w represents the correlation filter, x m,n Representing an image block x having M x N pixels, y (M, N) representing x m,n The gaussian sample tab generated as a training sample,represents the mapping to kernel space, λ represents the regularization parameters.
The above can also be written as w= Σ m,n a(m,n)φ(x m,n )。
Wherein the coefficient a is defined by the following formula:f represents a discrete Fourier operator, x m,n The image block x is represented by m×n pixels, and x and y represent pixel coordinates.
(2) Translation estimation: using the temporal context regression model R_c and the correlation filter score, the response ŷ_t is calculated and the position (x_t, y_t) is estimated.

The tracking task computes the correlation map over an image block z of size M×N cropped from the new frame. The response of step (2) may be determined according to:

ŷ = F⁻¹( F(κ(z, f)) ⊙ A )

where f denotes the learned target appearance model, κ(z, f) denotes the kernel correlation between z and f, ⊙ denotes element-wise multiplication and F⁻¹ the inverse Fourier transform; the predicted position of the target is found at the maximum of the response ŷ.
(3) As shown in FIG. 3, a scale pool is built using a multi-scale search strategy; the optimal scale is adaptively estimated from the correlation map y_s and the target appearance regression model R_t, giving the initial predicted target state x̂_t of the t-th frame.

The scale search strategy is as follows: the template scale is fixed as S_T = (s_x, s_y) and the scale pool is set to S = {t_1, t_2, …, t_k}; for the current frame, k scaled samples {t_i S_t | t_i ∈ S} are taken to find the appropriate target scale, and bilinear interpolation is then employed so that the sample of every scale becomes a sample of the same size as S_T.

The final target scale response value is calculated as ŷ_i^s = F⁻¹( F(κ(z_i^s, f)) ⊙ A ), i = 1, …, k, where z_i^s is the i-th scale sample in the scale pool, with size t_i S_t; through bilinear interpolation, z_i^s is resized to S_T, and the scale whose response attains the largest peak value is taken as the optimal scale.
(4) Setting a first threshold τ_r and a second threshold τ_t, where ŷ_t denotes the predicted correlation map; if max(ŷ_t) < τ_r, D_rf performs re-detection to find a candidate state set X; for each state x'_i in X, a confidence score y'_i is calculated; if max(y'_i) > τ_t, the candidate with the highest confidence is taken as the new target state; the final predicted target state x̂_t of the t-th frame is thus obtained.
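A minimal sketch of this re-detection decision, assuming the confidence score y'_i is the SVM decision value ⟨h, v_i⟩ of the detector D_rf described below; names and the feature layout are illustrative.

```python
import numpy as np

def redetect(h: np.ndarray, candidate_features: np.ndarray, boxes, tau_t: float = 0.5):
    """Score re-detection candidates with the SVM hyperplane h of D_rf.

    candidate_features : (N, d) matrix of feature vectors v_i of the candidate states
    boxes              : the N candidate boxes the features were extracted from
    Returns the best-scoring box if its confidence exceeds tau_t, otherwise None.
    """
    scores = candidate_features @ h          # confidence y'_i assumed to be <h, v_i>
    best = int(np.argmax(scores))
    return boxes[best] if scores[best] > tau_t else None
```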
(5) Updating the model R_c;

(6) Setting a third threshold τ_a; if max(ŷ_t) > τ_a, updating the model R_t;

In steps (5) and (6), the R_c and R_t models update the coefficients f and A frame by frame at the learning rate α as f^t = (1 − α) f^{t−1} + α f̃^t and A^t = (1 − α) A^{t−1} + α Ã^t, where f̃^t and Ã^t are the template and coefficients computed from the current frame.
(7) Updating the detector D_rf;

In the embodiment of the invention, D_rf is a Support Vector Machine (SVM) detector. For the SVM, a training set {(v_i, c_i) | i = 1, 2, …, N} of N samples is given in each frame, where v_i is the feature vector generated by the i-th sample and c_i ∈ {+1, −1} is the sample label; the objective function for solving the SVM detector hyperplane h is:

min_h (λ/2)‖h‖² + (1/N) Σ_{i=1}^{N} max{0, 1 − c_i⟨h, v_i⟩}

where ⟨h, v⟩ denotes the inner product of h and v.

A passive-aggressive algorithm is used to efficiently update the hyperplane parameters:

h^t = h^{t−1} − ℓ(h; (v, c)) / (‖∇_h ℓ‖² + 1/(2τ)) ∇_h ℓ

where ∇_h ℓ is the gradient of the loss function ℓ(h; (v, c)) = max{0, 1 − c⟨h, v⟩} with respect to h, and τ ∈ (0, +∞) is a hyper-parameter that controls the update rate of h.
The final predicted target state x̂_t of the t-th frame is obtained from steps (2)-(4), and the updated R_c, R_t and D_rf of the current frame are obtained from steps (5)-(7);

(8) Steps (1)-(7) are repeated for the (t+1)-th frame until the video sequence ends.
According to the scale-adaptive long-term correlation target tracking method provided by the embodiment of the invention, a scale-adaptive strategy is effectively fused with the LCT target tracking framework. A scale pool is introduced so that the algorithm can adaptively select the optimal scale for locating the tracked target. The multi-scale search combines more stably with the position estimation filter, so that excessively large scale estimation offsets are unlikely to occur.
Based on the above embodiments, this embodiment provides a simulation experiment.
Simulation conditions: the simulation provided in this example was run on a hardware environment with an Intel(R) Core(TM) i3-4170 CPU @ 3.70 GHz and 4.00 GB of memory, and a software environment of MATLAB R2016a. The experimental parameters were set as follows: regularization parameter λ = 10⁻⁴, Gaussian kernel width σ = 0.1, learning rate α = 0.01, thresholds τ_r = 0.15, τ_a = 0.38, τ_t = 0.5, and the scale pool set to [1, 0.99, 1.01]. The proposed algorithm is then compared with LCT and other existing classical target tracking algorithms.
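For reference, the parameter values listed above can be collected in a single configuration object (a sketch; key names are illustrative):

```python
# Parameter settings of the simulation collected in one place
# (values transcribed from the description above; key names are illustrative).
LCSA_PARAMS = {
    "lambda":     1e-4,             # regularization parameter
    "sigma":      0.1,              # Gaussian kernel width
    "alpha":      0.01,             # learning rate
    "tau_r":      0.15,             # re-detection activation threshold
    "tau_t":      0.5,              # re-detection acceptance threshold
    "tau_a":      0.38,             # appearance model update threshold
    "scale_pool": [1.0, 0.99, 1.01],
}
```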
Simulation content: the proposed method is evaluated on the large benchmark dataset UAV123, which contains 123 videos. One-pass evaluation (OPE) is selected as the evaluation mode, i.e., tracking starts from the target position given in the first frame and is not re-initialized after a failure.
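As a reference for how such OPE scores are typically computed, the sketch below evaluates a precision score at a fixed center-error threshold and an average success rate over overlap thresholds; the 20-pixel threshold and the overlap sweep follow common benchmark practice and are assumptions, not values stated in this description.

```python
import numpy as np

def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    return float(np.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                          (a[1] + a[3] / 2) - (b[1] + b[3] / 2)))

def overlap(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[0] + a[2], b[0] + b[2]), min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def ope_scores(pred_boxes, gt_boxes, pixel_thr=20.0):
    """Precision at a center-error threshold and mean (AUC-style) success rate."""
    errs = [center_error(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    ious = [overlap(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    precision = float(np.mean([e <= pixel_thr for e in errs]))
    success = float(np.mean([[i >= t for i in ious] for t in np.linspace(0.0, 1.0, 21)]))
    return precision, success
```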
Figs. 4 to 17 are graphs of the experimental results, in which LCSA denotes the scale-adaptive long-term correlation filter tracking method proposed by the invention, and LCT, KCF_GaussHog, CSK, IVT and DFT denote the other target tracking algorithms used for comparison. The proposed LCSA algorithm scores 0.40 on the success rate curve of fig. 5 and 0.58 on the accuracy curve of fig. 4. Taking the long-term correlation tracking algorithm (LCT) as the reference algorithm, the experimental data on UAV123 show that LCSA improves the AUC success rate by 2.56% and the accuracy by 5.26% compared with LCT. Although the accuracy is slightly lower than LCT in the case of background clutter on UAV123, performance improves over LCT and other classical target tracking algorithms in classical target tracking scenarios such as scale variation, aspect ratio change, low resolution, fast motion, full occlusion, partial occlusion, out-of-view, illumination variation, viewpoint change, camera motion and similar objects.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.