Scale-adaptive long-term correlation target tracking method
Technical Field
The invention belongs to the field of visual target tracking, and particularly relates to a scale-adaptive long-term correlation target tracking method.
Background
Target tracking is part of video analysis, i.e., the processing of a sequence of video images. Given the position and size of the target in the first frame, the task of target tracking is to determine the position and size of the target in each subsequent frame by analyzing the video image sequence, and to frame the target accurately. Target tracking technology integrates knowledge from mathematics, physics, image processing and other disciplines, and has wide application and development prospects in military defense, intelligent transportation and other areas. For example, in the military field it is used for missile defense, guidance systems, air traffic control and the like; in the intelligent transportation field it is used for real-time monitoring of traffic flow, traffic accident detection, pedestrian counting and the like.
Correlation-filtering-based tracking algorithms treat tracking as a process of template matching and ridge regression. The Kernelized Correlation Filter (KCF) is a correlation-filtering target tracking algorithm that introduces a kernel function; it extracts features with the multi-channel Histogram of Oriented Gradients (HOG) and constructs positive and negative samples by cyclic shifts, but KCF cannot cope with changes in target scale.
To address this problem, the Discriminative Scale Space Tracking (DSST) algorithm jointly applies correlation filters over a three-dimensional scale space: a two-dimensional discriminative position filter first determines the position of the target in the video sequence, and a one-dimensional scale filter then evaluates the target output by the position filter and outputs the optimal scale of the current target. The Long-term Correlation Tracking (LCT) algorithm adds a correlation filter for confidence detection on top of the DSST position-filtering and scale-filtering framework; this re-detection mechanism gives the algorithm good tracking performance in videos with occlusion, out-of-view and similar attributes, but the tracking performance of LCT still needs improvement in environments with large target scale changes, low resolution, fast motion, illumination changes and similar attributes.
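To make the cyclic-shift sample construction mentioned above concrete, the following minimal NumPy sketch enumerates the shifted samples explicitly. It illustrates the general KCF idea only and is not part of the claimed method; the function name is illustrative, and in practice the circulant structure is exploited so that the dense sample matrix is never formed.

```python
import numpy as np

def cyclic_shift_samples(patch: np.ndarray) -> np.ndarray:
    """Enumerate every cyclic shift of a single-channel patch.

    KCF never materialises this dense sample matrix; it exploits its
    circulant structure so that training reduces to element-wise
    operations in the Fourier domain. The explicit loop is shown only
    to make the sample construction concrete.
    """
    h, w = patch.shape
    samples = np.empty((h * w, h, w), dtype=patch.dtype)
    for dy in range(h):
        for dx in range(w):
            samples[dy * w + dx] = np.roll(np.roll(patch, dy, axis=0), dx, axis=1)
    return samples
```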
Disclosure of Invention
In view of this, the present invention provides a scale-adaptive long-term correlation (LCSA) target tracking algorithm to improve the accuracy of existing target tracking algorithms, so that the tracker can overcome the interference of multiple environmental factors.
For this purpose, the invention provides the following technical scheme:
a scale-adaptive long-term correlation target tracking algorithm, comprising the steps of:
(1) Initializing a target detection frame: extracting the features of the target according to the detection frame in the first frame, and initializing a temporal context regression model R_c, a target appearance regression model R_t and a detector D_rf, where R_c is responsible for translation estimation, R_t is responsible for scale estimation, and D_rf is responsible for re-detection;
(2) For the t-th frame, cutting out a search window in frame t according to the target position (x_{t-1}, y_{t-1}) of frame t-1, extracting HOG features, and training a correlation filter template;
(3) Performing translation estimation: using R_c and the correlation filter score obtained from the correlation filter template, calculating the response ŷ_t and estimating the current frame position (x_t, y_t);
(4) Constructing a scale pool using a multi-scale search strategy, and adaptively estimating the optimal scale from the correlation map y_s and R_t, obtaining the initial predicted target state x̂_t of the t-th frame;
(5) If max(ŷ_t) < τ_r, using D_rf to perform re-detection and find a candidate state set X; for each state x'_i in X, calculating a confidence score y'_i; if max(y'_i) > τ_t, taking the candidate with the highest confidence as the new target state; where τ_r is a first threshold, τ_t is a second threshold, and ŷ_t denotes the predicted correlation map; the final predicted target state x̂_t of the t-th frame is thus obtained;
(6) Updating R_c;
(7) If max(ŷ_t) > τ_a, updating R_t, where τ_a is a third threshold;
(8) Updating D_rf;
(9) Repeating steps (2)-(8) for the (t+1)-th frame until the video sequence ends.
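The flow of steps (1)-(9) can be summarised by the following Python sketch. It is a minimal skeleton under the assumption that the models R_c, R_t and D_rf are supplied as callables (translate, estimate_scale, redetect, update); these helper names and signatures are illustrative and are not part of the disclosed method, only the thresholds follow the values given later.

```python
from typing import Callable, Iterable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

def track_sequence(
    frames: Sequence,            # video frames; the first one is already annotated
    init_box: Box,
    translate: Callable,         # R_c: (frame, box) -> (response_peak, box)       steps (2)-(3)
    estimate_scale: Callable,    # R_t: (frame, box) -> box at the optimal scale   step (4)
    redetect: Callable,          # D_rf: frame -> list of (score, box) candidates  step (5)
    update: Callable,            # updates R_c / R_t / D_rf for the new state      steps (6)-(8)
    tau_r: float = 0.15,
    tau_t: float = 0.5,
    tau_a: float = 0.38,
) -> Iterable[Box]:
    """Skeleton of the LCSA loop; thresholds follow the disclosed values."""
    box = init_box
    for frame in frames[1:]:
        peak, box = translate(frame, box)        # translation estimation
        box = estimate_scale(frame, box)         # multi-scale search
        if peak < tau_r:                         # confidence too low: re-detect
            candidates = redetect(frame)
            if candidates:
                best_score, best_box = max(candidates)
                if best_score > tau_t:
                    box = best_box
        update(frame, box, peak, tau_a)          # R_t is updated only when peak > tau_a
        yield box                                # repeat for the next frame
    # step (1), model initialization, is assumed to have produced the callables above
```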
Further, the correlation filter template trained in step (2) is obtained by solving:

w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ − y(m,n)|² + λ‖w‖²

where w denotes the correlation filter, x_{m,n} denotes an image block x having M×N pixels, y(m,n) denotes the Gaussian sample label generated for the training sample x_{m,n}, φ denotes the mapping to kernel space, and λ denotes the regularization parameter.
Further, the correlation filter trained in step (2) can also be written as w = Σ_{m,n} a(m,n) φ(x_{m,n}),

where the coefficient a is defined by

A = F(a) = F(y) / (F(φ(x)·φ(x)) + λ)

F denotes the discrete Fourier operator, x_{m,n} denotes the image block x of M×N pixels, and m and n denote pixel coordinates.
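The closed-form coefficients above can be evaluated entirely in the Fourier domain. The sketch below assumes the kernel auto-correlation map k_xx of the training patch has already been computed (for example with the Gaussian kernel of the embodiment described later); the function name and arguments are illustrative.

```python
import numpy as np

def train_coefficients(k_xx: np.ndarray, y: np.ndarray, lam: float = 1e-4) -> np.ndarray:
    """Fourier-domain coefficients A = F(a) = F(y) / (F(k_xx) + lambda).

    k_xx : kernel auto-correlation map of the training patch x (M x N),
           e.g. computed with the Gaussian kernel of the embodiment.
    y    : Gaussian regression label of the same size.
    lam  : regularization parameter (10**-4 in the simulation below).
    """
    return np.fft.fft2(y) / (np.fft.fft2(k_xx) + lam)
```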
Further, in a new frame the tracking task computes the correlation map over an image block z of size M×N; the response of step (3) is determined according to the following formula:

ŷ = F⁻¹( F(κ(z, f)) ⊙ A )

where f denotes the learned target appearance model, κ(z, f) = φ(z)·φ(f) denotes the kernel correlation between z and f, ⊙ denotes element-wise multiplication and F⁻¹ the inverse Fourier transform; the predicted position of the target is found at the maximum of the response ŷ.
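A minimal sketch of this detection step, assuming the kernel correlation map between the search patch z and the learned template has already been computed; names are illustrative.

```python
import numpy as np

def detect_response(k_zf: np.ndarray, A: np.ndarray):
    """Response map y_hat = F^(-1)( F(k_zf) * A ) and its peak.

    k_zf : kernel correlation map between the search patch z and the
           learned appearance model f (same size as the training patch).
    A    : Fourier-domain coefficients obtained at training time.
    Returns the real-valued response, its peak value and peak location.
    """
    response = np.real(np.fft.ifft2(np.fft.fft2(k_zf) * A))
    dy, dx = np.unravel_index(int(response.argmax()), response.shape)
    return response, float(response.max()), (int(dy), int(dx))
```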
Further, the scale search strategy of step (4) is:

The template scale is fixed as S_T = (s_x, s_y) and the scale pool is set to S = {t_1, t_2, …, t_k}; for the current frame, k scaled samples {t_i S_t | t_i ∈ S} are taken to find the appropriate target scale, where S_t denotes the current target size; bilinear interpolation is then employed so that the sample of every scale becomes a sample of the same size as S_T;

The final target scale response value is calculated as follows:

ŷ_i^s = F⁻¹( F(κ(z_i^s, f)) ⊙ A ),  i = 1, …, k

where z_i^s is the i-th scale sample in the scale pool, with size t_i S_t; the scale whose response attains the largest peak value is taken as the optimal scale.
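A possible implementation sketch of this scale search is shown below. It assumes two helper callables, crop and respond, standing in for patch extraction and the response of R_t; the helper names and the single-channel assumption are illustrative, not part of the claim.

```python
import numpy as np

def bilinear_resize(img: np.ndarray, out_size) -> np.ndarray:
    """Plain NumPy bilinear resize of a single-channel patch to (height, width)."""
    h, w = img.shape[:2]
    oh, ow = int(out_size[0]), int(out_size[1])
    ys, xs = np.linspace(0, h - 1, oh), np.linspace(0, w - 1, ow)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def search_scale(frame, center, target_size, template_size, crop, respond,
                 scale_pool=(1.0, 0.99, 1.01)):
    """Multi-scale search over the scale pool.

    crop(frame, center, size) -> patch of the requested size     (assumed helper)
    respond(patch)            -> correlation response map of R_t (assumed helper)
    Each candidate patch of size t_i * S_t is resized to the fixed template
    size S_T before scoring; the scale with the largest response peak wins.
    """
    best_scale, best_peak = 1.0, -np.inf
    for t_i in scale_pool:
        size_i = (target_size[0] * t_i, target_size[1] * t_i)     # t_i * S_t
        patch = bilinear_resize(crop(frame, center, size_i), template_size)
        peak = float(respond(patch).max())
        if peak > best_peak:
            best_scale, best_peak = t_i, peak
    return (target_size[0] * best_scale, target_size[1] * best_scale)
```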
Further, updating R_c and updating R_t comprises:

For the R_c and R_t models, the coefficients f and A are updated frame by frame at the learning rate α as:

f^t = (1 − α) f^{t−1} + α f̃^t,   A^t = (1 − α) A^{t−1} + α Ã^t

where f̃^t and Ã^t denote the appearance template and the coefficients computed from the current frame t.
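A minimal sketch of this linear interpolation update, assuming f denotes the appearance template and A the Fourier-domain coefficients as above:

```python
import numpy as np

def update_model(f_prev: np.ndarray, A_prev: np.ndarray,
                 f_new: np.ndarray, A_new: np.ndarray, alpha: float = 0.01):
    """Frame-by-frame linear interpolation of the model pair (f, A).

    f_* are target appearance templates, A_* are Fourier-domain coefficients;
    alpha is the learning rate (0.01 in the simulation below).
    """
    f = (1.0 - alpha) * f_prev + alpha * f_new
    A = (1.0 - alpha) * A_prev + alpha * A_new
    return f, A
```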
further, D rf A support vector machine detector;
update D rf Comprising the following steps:
in each frame, a training set { (v) i ,c i ) I = 1,2,..n } and N samples, where v i Is the feature vector generated by the ith sample, c i E { +1, -1} is a sample tag, and the objective function for solving the support vector machine detector hyperplane h is:
wherein ,<h,v>represents the inner product of h and v; λ represents a regularization parameter;
updating hyperplane parameters using a passive algorithm:
wherein ,is the gradient of the loss function with respect to h, τ e (0, +_j) is a hyper-parameter that controls the h update rate.
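A minimal sketch of one such passive-aggressive update with the hinge loss defined above; the function signature is illustrative:

```python
import numpy as np

def pa_update(h: np.ndarray, v: np.ndarray, c: int, tau: float = 1.0) -> np.ndarray:
    """One passive-aggressive update of the SVM hyperplane h.

    v is the feature vector of a new sample, c in {+1, -1} its label,
    tau > 0 controls the update rate. Implements
        h <- h - loss / (||grad||^2 + 1/(2*tau)) * grad
    with the hinge loss  loss = max(0, 1 - c * <h, v>).
    """
    loss = max(0.0, 1.0 - c * float(np.dot(h, v)))
    if loss == 0.0:                 # passive step: the sample is already classified with margin
        return h
    grad = -c * v                   # gradient of the hinge loss w.r.t. h
    step = loss / (float(np.dot(grad, grad)) + 1.0 / (2.0 * tau))
    return h - step * grad
```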
Further, τ_r is 0.15, τ_t is 0.5, and τ_a is 0.38.
The invention has the following beneficial effects:
according to the scale self-adaptive long-term correlation target tracking method provided by the invention, a scale self-adaptive strategy and an LCT target tracking frame are effectively fused, and firstly, a scale pool is introduced, so that an algorithm can self-adaptively select the optimal scale for finding the position of a tracking target. The multi-scale search can be combined with the position estimation filter more stably, and the situation that the scale estimation offset is too large is not easy to occur.
According to the scale-adaptive long-term correlation target tracking method provided by the invention, tracking accuracy is clearly improved over the LCT algorithm on the Unmanned Aerial Vehicle benchmark UAV123 in classical target tracking scenarios such as scale variation, aspect ratio change, low resolution, fast motion, full occlusion, partial occlusion, out-of-view, illumination variation, viewpoint change, camera motion and similar objects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of a scale-adaptive long-term correlation target tracking method in an embodiment of the invention;
FIG. 2 is a schematic diagram of a scale-adaptive long-term correlation target tracking method in an embodiment of the invention;
FIG. 3 is a schematic diagram of an adaptive scale model of a scale-adaptive long-term correlation target tracking method according to an embodiment of the present invention;
FIG. 4 is a graph of tracking accuracy of the present invention and other algorithms in a UAV123 dataset over 123 video sequences;
FIG. 5 is a graph of the success rate of the present invention and other algorithms in a UAV123 dataset over 123 video sequences;
FIG. 6 is a graph of tracking accuracy and success rate in the context of Scale Variation in a UAV123 dataset by the present invention and other algorithms;
FIG. 7 is a graph of tracking accuracy and success rate in the context of Aspect Ratio Change in the UAV123 dataset by the present invention and other algorithms;
FIG. 8 is a graph of tracking accuracy and success rate in the context of Low Resolution in the UAV123 dataset by the present invention and other algorithms;
FIG. 9 is a graph of tracking accuracy and success rate in the context of Fast Motion in the UAV123 dataset by the present invention and other algorithms;
FIG. 10 is a graph of tracking accuracy and success rate in the context of Full Occlusion in the UAV123 dataset by the present invention and other algorithms;
FIG. 11 is a graph of tracking accuracy and success rate in the context of Partial Occlusion in the UAV123 dataset by the present invention and other algorithms;
FIG. 12 is a graph of tracking accuracy and success rate in the context of Out-of-View in the UAV123 dataset by the present invention and other algorithms;
FIG. 13 is a graph of tracking accuracy and success rate in the context of Background Clutter in the UAV123 dataset by the present invention and other algorithms;
FIG. 14 is a graph of tracking accuracy and success rate in the context of Illumination Variation in the UAV123 dataset by the present invention and other algorithms;
FIG. 15 is a graph of tracking accuracy and success rate in the context of Viewpoint Change in the UAV123 dataset by the present invention and other algorithms;
FIG. 16 is a graph of tracking accuracy and success rate in the context of Camera Motion in the UAV123 dataset by the present invention and other algorithms;
FIG. 17 is a graph of tracking accuracy and success rate in the context of Similar Object in the UAV123 dataset by the present invention and other algorithms.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1 and fig. 2, which respectively show a flowchart and a schematic diagram of a scale-adaptive long-term correlation target tracking method according to an embodiment of the present invention, the method includes the following steps:
Preprocessing: the target position of the first frame is read to obtain a large background detection frame, a small background detection frame and the Gaussian regression label corresponding to each detection frame.
Processing of the first frame of the video: the Histogram of Oriented Gradients (HOG) features of the target are extracted according to the detection frame and Fourier transformed to obtain x_f; a Gaussian kernel function is applied to obtain the Gaussian response k_f in the frequency domain, where the Gaussian kernel is κ(x, x') = exp(−‖x − x'‖² / σ²); both regression models are mapped to kernel space, defined as k(x, x') = φ(x)·φ(x'). Using x_f and k_f, the classifier parameters are calculated to obtain the temporal context regression model R_c, the target appearance regression model R_t and the detector D_rf, where R_c is responsible for translation estimation, R_t is responsible for scale estimation, and D_rf is responsible for re-detection.
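The Gaussian kernel response in the frequency domain can be computed for all cyclic shifts at once. The sketch below follows the frequency-domain form commonly used with kernelized correlation filters; the normalisation by the number of pixels follows common practice and is an assumption, as are the function and argument names.

```python
import numpy as np

def gaussian_kernel_correlation(x: np.ndarray, z: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Gaussian kernel response of two equally sized single-channel patches.

    Evaluates the kernel for every cyclic shift at once through the FFT,
    which is the frequency-domain form commonly used with kernelized
    correlation filters; sigma = 0.1 matches the simulation parameters.
    """
    xf, zf = np.fft.fft2(x), np.fft.fft2(z)
    cross = np.real(np.fft.ifft2(np.conj(xf) * zf))          # circular cross-correlation
    dist = (x ** 2).sum() + (z ** 2).sum() - 2.0 * cross      # ||x - shifted z||^2
    return np.exp(-np.maximum(dist, 0.0) / (sigma ** 2 * x.size))  # normalisation: common practice
```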
The processing procedure of the current frame (t-th frame):
(1) According to the target position (x_{t−1}, y_{t−1}) of the (t−1)-th frame, a search window is cut out in frame t, HOG features are extracted, and a correlation filter template is trained;
the training related filter template specifically comprises the following components:
wherein w represents the correlation filter, x m,n Representing an image block x having M x N pixels, y (M, N) representing x m,n The gaussian sample tab generated as a training sample,represents the mapping to kernel space, λ represents the regularization parameters.
The above can also be written as w= Σ m,n a(m,n)φ(x m,n )。
Wherein the coefficient a is defined by the following formula:f represents a discrete Fourier operator, x m,n The image block x is represented by m×n pixels, and x and y represent pixel coordinates.
(2) Translation estimation: using the temporal context regression model R_c and the correlation filter score, the response ŷ_t is calculated and the position (x_t, y_t) is estimated.

The tracking task computes the correlation map over an image block z of size M×N cropped from the new frame. The response of step (2) may be determined according to:

ŷ = F⁻¹( F(κ(z, f)) ⊙ A )

where f denotes the learned target appearance model, κ(z, f) denotes the kernel correlation between z and f, ⊙ denotes element-wise multiplication and F⁻¹ the inverse Fourier transform; the predicted position of the target is found at the maximum of the response ŷ.
(3) As shown in FIG. 3, a scale pool is built using a multi-scale search strategy; the optimal scale is adaptively estimated from the correlation map y_s and the target appearance regression model R_t, giving the initial predicted target state x̂_t of the t-th frame.

The scale search strategy is as follows: the template scale is fixed as S_T = (s_x, s_y) and the scale pool is set to S = {t_1, t_2, …, t_k}; for the current frame, k scaled samples {t_i S_t | t_i ∈ S} are taken to find the appropriate target scale, and bilinear interpolation is then employed so that the sample of every scale becomes a sample of the same size as S_T.

The final target scale response value is calculated as ŷ_i^s = F⁻¹( F(κ(z_i^s, f)) ⊙ A ), i = 1, …, k, where z_i^s is the i-th scale sample in the scale pool, with size t_i S_t; through bilinear interpolation, z_i^s is resized to S_T, and the scale whose response attains the largest peak value is taken as the optimal scale.
(4) Setting a first threshold τ_r and a second threshold τ_t, where ŷ_t denotes the predicted correlation map; if max(ŷ_t) < τ_r, D_rf performs re-detection to find a candidate state set X; for each state x'_i in X, a confidence score y'_i is calculated; if max(y'_i) > τ_t, the candidate with the highest confidence is taken as the new target state; the final predicted target state x̂_t of the t-th frame is thus obtained.
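A minimal sketch of this re-detection decision, assuming the confidence score y'_i is the SVM decision value ⟨h, v_i⟩ of the detector D_rf described below; names and the feature layout are illustrative.

```python
import numpy as np

def redetect(h: np.ndarray, candidate_features: np.ndarray, boxes, tau_t: float = 0.5):
    """Score re-detection candidates with the SVM hyperplane h of D_rf.

    candidate_features : (N, d) matrix of feature vectors v_i of the candidate states
    boxes              : the N candidate boxes the features were extracted from
    Returns the best-scoring box if its confidence exceeds tau_t, otherwise None.
    """
    scores = candidate_features @ h          # confidence y'_i assumed to be <h, v_i>
    best = int(np.argmax(scores))
    return boxes[best] if scores[best] > tau_t else None
```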
(5) Updating the model R_c;

(6) Setting a third threshold τ_a; if max(ŷ_t) > τ_a, updating the model R_t;

In steps (5) and (6), the R_c and R_t models update the coefficients f and A frame by frame at the learning rate α as f^t = (1 − α) f^{t−1} + α f̃^t and A^t = (1 − α) A^{t−1} + α Ã^t, where f̃^t and Ã^t are the template and coefficients computed from the current frame.
(7) Updating the detector D_rf;

In the embodiment of the invention, D_rf is a Support Vector Machine (SVM) detector. For the SVM, a training set {(v_i, c_i) | i = 1, 2, …, N} of N samples is given in each frame, where v_i is the feature vector generated by the i-th sample and c_i ∈ {+1, −1} is the sample label; the objective function for solving the SVM detector hyperplane h is:

min_h (λ/2)‖h‖² + (1/N) Σ_{i=1}^{N} max{0, 1 − c_i⟨h, v_i⟩}

where ⟨h, v⟩ denotes the inner product of h and v.

A passive-aggressive algorithm is used to efficiently update the hyperplane parameters:

h^t = h^{t−1} − ℓ(h; (v, c)) / (‖∇_h ℓ‖² + 1/(2τ)) ∇_h ℓ

where ∇_h ℓ is the gradient of the loss function ℓ(h; (v, c)) = max{0, 1 − c⟨h, v⟩} with respect to h, and τ ∈ (0, +∞) is a hyper-parameter that controls the update rate of h.
The final predicted target state x̂_t of the t-th frame is obtained from steps (2)-(4), and the updated R_c, R_t and D_rf of the current frame are obtained from steps (5)-(7);

(8) Steps (1)-(7) are repeated for the (t+1)-th frame until the video sequence ends.
According to the scale-adaptive long-term correlation target tracking method provided by the embodiment of the invention, a scale-adaptive strategy is effectively fused with the LCT target tracking framework. A scale pool is introduced so that the algorithm can adaptively select the optimal scale for locating the tracked target. The multi-scale search combines more stably with the position estimation filter, so that excessively large scale estimation offsets are unlikely to occur.
Based on the above embodiments, this embodiment provides a simulation experiment.
Simulation conditions: the simulation provided in this example was run on a hardware environment with an Intel(R) Core(TM) i3-4170 CPU @ 3.70 GHz and 4.00 GB of memory, and a software environment of MATLAB R2016a. The experimental parameters were set as follows: regularization parameter λ = 10⁻⁴, Gaussian kernel width σ = 0.1, learning rate α = 0.01, thresholds τ_r = 0.15, τ_a = 0.38, τ_t = 0.5, and the scale pool set to [1, 0.99, 1.01]. The proposed algorithm is then compared with LCT and other existing classical target tracking algorithms.
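For reference, the parameter values listed above can be collected in a single configuration object (a sketch; key names are illustrative):

```python
# Parameter settings of the simulation collected in one place
# (values transcribed from the description above; key names are illustrative).
LCSA_PARAMS = {
    "lambda":     1e-4,             # regularization parameter
    "sigma":      0.1,              # Gaussian kernel width
    "alpha":      0.01,             # learning rate
    "tau_r":      0.15,             # re-detection activation threshold
    "tau_t":      0.5,              # re-detection acceptance threshold
    "tau_a":      0.38,             # appearance model update threshold
    "scale_pool": [1.0, 0.99, 1.01],
}
```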
Simulation content: the proposed method is evaluated on the large benchmark dataset UAV123, which contains 123 videos. One-pass evaluation (OPE) is selected as the evaluation mode, i.e., tracking starts from the target position given in the first frame and is not re-initialized after a failure.
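As a reference for how such OPE scores are typically computed, the sketch below evaluates a precision score at a fixed center-error threshold and an average success rate over overlap thresholds; the 20-pixel threshold and the overlap sweep follow common benchmark practice and are assumptions, not values stated in this description.

```python
import numpy as np

def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    return float(np.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                          (a[1] + a[3] / 2) - (b[1] + b[3] / 2)))

def overlap(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[0] + a[2], b[0] + b[2]), min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def ope_scores(pred_boxes, gt_boxes, pixel_thr=20.0):
    """Precision at a center-error threshold and mean (AUC-style) success rate."""
    errs = [center_error(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    ious = [overlap(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    precision = float(np.mean([e <= pixel_thr for e in errs]))
    success = float(np.mean([[i >= t for i in ious] for t in np.linspace(0.0, 1.0, 21)]))
    return precision, success
```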
Figs. 4 to 17 are graphs of the experimental results, in which LCSA denotes the scale-adaptive long-term correlation filter tracking method proposed by the invention, and LCT, KCF_GaussHog, CSK, IVT and DFT denote the other target tracking algorithms used for comparison. The proposed LCSA algorithm scores 0.40 on the success rate curve of fig. 5 and 0.58 on the accuracy curve of fig. 4. Taking the long-term correlation tracking algorithm (LCT) as the reference algorithm, the experimental data on UAV123 show that LCSA improves the AUC success rate by 2.56% and the accuracy by 5.26% compared with LCT. Although the accuracy is slightly lower than LCT in the case of background clutter on UAV123, performance improves over LCT and other classical target tracking algorithms in classical target tracking scenarios such as scale variation, aspect ratio change, low resolution, fast motion, full occlusion, partial occlusion, out-of-view, illumination variation, viewpoint change, camera motion and similar objects.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.