
CN104036528A - Real-time distribution field target tracking method based on global search - Google Patents

Real-time distribution field target tracking method based on global search

Info

Publication number
CN104036528A
CN104036528A (application number CN201410298728.7A)
Authority
CN
China
Prior art keywords
target
distribution field
target image
model
image
Prior art date
Legal status
Pending
Application number
CN201410298728.7A
Other languages
Chinese (zh)
Inventor
宁纪锋
叱干鹏飞
石武祯
Current Assignee
Northwest A&F University
Original Assignee
Northwest A&F University
Priority date
Filing date
Publication date
Application filed by Northwest A&F University filed Critical Northwest A&F University
Priority to CN201410298728.7A priority Critical patent/CN104036528A/en
Publication of CN104036528A publication Critical patent/CN104036528A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a real-time distribution field target tracking method based on global search, comprising the following steps: 1) manually mark the position of the target in the target image I selected in the first frame; 2) after the distribution field model of the target image I has been Gaussian-smoothed, the target model is established; next, determine the region to be searched in the following frame and Gaussian-smooth it to obtain the candidate-region distribution field, then find the image patch having the largest correlation coefficient with the target model; 3) fuse the two target models obtained from the consecutive frames at a learning rate ρ to update the target model of the current frame; 4) repeat 2) and 3) until the end of the video sequence. By adopting a global template-matching search strategy based on the correlation coefficient, the invention both overcomes the tendency of the original distribution field's gradient-descent search to become trapped in local optima and avoids the noise sensitivity of measuring similarity with the L1 distance, thereby improving tracking performance.

Description

A real-time distribution field target tracking method based on global search

【Technical Field】

The invention belongs to the field of computer vision and image analysis, and in particular relates to a real-time distribution field target tracking method based on global search.

【Background Art】

Tracking is an important problem in computer vision and has been widely applied in video surveillance, human-machine interfaces, robot perception, behavior understanding, action recognition, and other fields. Owing to complex factors during tracking such as target rotation, deformation, occlusion, and illumination change, visual tracking has remained a problem worthy of in-depth research [1,2].

In general, a tracking algorithm comprises three components: target representation, search strategy, and model update. Target representation is the first problem a tracking algorithm must solve. Early methods often represented the target with a template containing its intensity, gradient, or other features [3], but such templates are sensitive to changes in the target's spatial structure. Another template-based representation is the histogram [4,5], which is simple and fast to compute, insensitive to target deformation and pose change, and able to avoid drift to some extent. However, it is a statistical representation and loses spatial information; moreover, its discriminative power drops when the target closely resembles the background. In 2001, Viola et al. [6] first introduced the Adaboost algorithm based on Haar-like features into face detection; by applying the integral-image idea to the computation of Haar-like features, feature extraction was greatly accelerated. Inspired by this, Babenko et al. [7] trained a classifier by online multiple-instance learning, using Haar-like features to learn a discriminative model of target and background and achieving robust tracking. Haar-like features are simple to compute but sensitive to edges and line segments, and they can only describe features of particular orientations, making them rather coarse. Yao Zhijun [8] proposed a new spatial-histogram similarity measure that treats the spatial distribution of each histogram bin as a Gaussian distribution, measures the spatial-distribution and color-histogram similarities with the Jensen-Shannon divergence (JSD) and histogram intersection respectively, and applies the measure in a particle-filter tracker to improve tracking results.

Recently, Laura Sevilla-Lara et al. [9] proposed a novel distribution field (DF) target representation and introduced it into target tracking. The method first layers the image naturally, preserving the basic information of the original image, and then applies Gaussian smoothing within and across the layers, introducing "blur" into the target representation and thereby overcoming, to some extent, the effects of deformation and illumination change. In its search strategy, however, the original distribution field tracker uses gradient descent and stops when a local minimum appears, which partially reduces the computational cost. Since the objective function is non-convex, gradient descent merely starts, in the current frame, from the target's maximum-response position detected in the previous frame and searches a limited region by minimizing the L1 norm; when the target moves relatively fast, such a limited-region search easily falls into a local optimum, which limits tracking performance.

References:

[1] Yilmaz A, Javed O, Shah M. Object tracking: a survey [J]. ACM Computing Surveys (CSUR), 2006, 38(4): 13.

[2] Yang Han-xuan, Zheng Feng, Wang Liang, et al. Recent advances and trends in visual tracking: a review [J]. Neurocomputing, 2011, 74(18): 3823-3831.

[3] Baker S, Matthews I. Lucas-Kanade 20 years on: a unifying framework [J]. International Journal of Computer Vision, 2004, 56(3): 221-255.

[4] Collins R T. Mean-shift blob tracking through scale space [C]//Proc of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Madison: IEEE Press, 2003, 2: II-234-240.

[5] Ning Ji-feng, Zhang Lei, Zhang David, et al. Robust mean shift tracking with corrected background-weighted histogram [J]. IET Computer Vision, 2012, 6(1): 62-69.

[6] Viola P, Jones M. Rapid object detection using a boosted cascade of simple features [C]//Proc of IEEE Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE Press, 2001, 1: 511-518.

[7] Babenko B, Yang M, Belongie S. Robust object tracking with online multiple instance learning [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8): 1619-1632.

[8] Yao Zhijun. A new spatial histogram similarity measure and its application in target tracking [J]. Journal of Electronics & Information Technology, 2013, 35(7): 1644-1649.

[9] Sevilla-Lara L, Learned-Miller E. Distribution fields for tracking [C]//Proc of IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE Press, 2012: 1910-1917.

【Summary of the Invention】

The purpose of the present invention is to provide a real-time distribution field target tracking method based on global search that adapts to changes in the target appearance model, addressing the difficulties and challenges of video target tracking such as partial occlusion, rotation, scaling, illumination change, motion blur, and complex backgrounds.

To achieve the above purpose, the present invention is realized through the following technical solution:

A real-time distribution field target tracking method based on global search, comprising the following steps:

1) Manually mark the target position in the target image I selected in the first frame, i.e., delineate the target region with a rectangular box and record the coordinates of its top-left corner together with its width and height; then represent the delineated target region with a distribution field model, constructed as follows:

Using the Kronecker delta function, the target image I is represented by a distribution field model df(i,j,k), given by:

$$df(i,j,k)=\begin{cases}1, & \text{if } I(i,j)/(255/K)=k\\ 0, & \text{otherwise}\end{cases}\qquad(1)$$

where i and j index the rows and columns of the target image I;

K is the number of layers into which the target image I is divided;

k is the layer index, k = 1, 2, 3, …, K;

df(i,j,k) is the value at row i, column j of layer k after the target image I is decomposed; it takes the value 0 or 1;

a set of pixels spanning an intensity depth of 255/K is called a "layer";

I(i,j) is the pixel value at row i, column j of the target image I;
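As a concrete illustration of Eq. (1), the following is a minimal NumPy sketch, not part of the patent: the layer index runs from 0 to K-1 rather than 1 to K, and clipping intensity 255 into the last layer is an assumption that Eq. (1) leaves implicit.

```python
import numpy as np

def distribution_field(image, K=8):
    """Explode an 8-bit grayscale image into a K-layer distribution field,
    following Eq. (1): each pixel puts a single 1 in the layer whose
    intensity bin contains it, so the layer values at every pixel sum to 1."""
    image = np.asarray(image, dtype=np.float64)
    rows, cols = image.shape
    # Bin index per pixel; clipping keeps intensity 255 inside the last
    # layer (a boundary case Eq. (1) does not spell out).
    k = np.minimum((image / (255.0 / K)).astype(int), K - 1)
    df = np.zeros((rows, cols, K))
    i, j = np.indices((rows, cols))
    df[i, j, k] = 1.0
    return df
```

For any input, distribution_field(img).sum(axis=2) is 1 everywhere, which is the per-pixel normalization the method relies on.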

Next, Gaussian smoothing is applied to the distribution field model of the target image I. It consists of spatial-domain smoothing and feature-domain smoothing. Spatial-domain smoothing is applied first, in the x and y directions of the target image I, according to:

$$df_s(k)=df(k)*h_{\sigma_s}\qquad(2)$$

where df_s(k) is the spatially smoothed distribution field model of the target image I;

df(k) is the k-th layer of the distribution field model of the target image I;

h_{σ_s} is a 2D Gaussian kernel with standard deviation σ_s;

"*" denotes convolution;

Feature-domain smoothing is then applied to the spatially smoothed distribution field model of the target image I, according to:

$$df_{ss}(i,j)=df_s(i,j)*h_{\sigma_f}\qquad(3)$$

where df_ss(i,j) is the distribution field model of the target image I after feature-domain smoothing;

df_s(i,j) is the smoothed value at row i, column j of the spatially smoothed distribution field model of the target image I;

h_{σ_f} is a 1D Gaussian kernel with standard deviation σ_f;

After Gaussian smoothing, the column of layer values at each pixel of the distribution field model of the target image I integrates to 1;
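The two smoothing passes of Eqs. (2) and (3) can be sketched with SciPy's separable Gaussian filters as below. The default standard deviations are placeholders (the embodiment later uses a spatial-domain variance of 0.3 and a feature-domain variance of 1), and the final renormalization is our addition to enforce the column-sum property just stated.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_filter1d

def smooth_df(df, sigma_s=1.0, sigma_f=1.0):
    """Smooth a (rows, cols, K) distribution field spatially (Eq. 2)
    and then across the feature/layer axis (Eq. 3)."""
    # Eq. (2): 2D Gaussian over rows and columns, each layer independently
    # (sigma 0 on the last axis means no smoothing across layers here).
    df_s = gaussian_filter(df, sigma=(sigma_s, sigma_s, 0.0))
    # Eq. (3): 1D Gaussian along the layer axis at every pixel.
    df_ss = gaussian_filter1d(df_s, sigma=sigma_f, axis=2)
    # Renormalize so each pixel's layer column sums to 1, as stated above.
    df_ss /= df_ss.sum(axis=2, keepdims=True)
    return df_ss
```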

During tracking, to adapt to appearance changes of the target and its environment, the distribution field model of the target image must be updated dynamically, i.e., the old target-image distribution field model is blended in a fixed proportion with the distribution field model corresponding to the newly tracked target image. A simple update rule for the target-image distribution field model is:

$$df_{t+1}(i,j,k)=\rho\,df_t(i,j,k)+(1-\rho)\,df_{t-1}(i,j,k)\qquad(4)$$

where ρ is the learning rate, taking a value between 0 and 1 (typically 0.9) and controlling how fast the target model is updated;

t denotes the current frame of the tracked target image, t+1 the next frame, and t-1 the previous frame;

df_{t+1}(i,j,k) is the value at layer k, row i, column j of the target distribution field model for the next frame;

df_t(i,j,k) is the value at layer k, row i, column j of the target distribution field model for the current frame;

df_{t-1}(i,j,k) is the value at layer k, row i, column j of the target distribution field model for the previous frame;
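Read with df_t as the newly tracked distribution field and df_{t-1} as the previous model, Eq. (4) is a one-line blend; a minimal sketch under that reading:

```python
def update_model(df_new, df_old, rho=0.9):
    """Eq. (4): blend the distribution field of the newly tracked target
    (weight rho) with the previous model (weight 1 - rho)."""
    return rho * df_new + (1.0 - rho) * df_old
```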

2) Once the distribution field model of the target image I has been Gaussian-smoothed, the target model is established. Next, determine the region to be searched in the following frame and Gaussian-smooth it to obtain the candidate-region distribution field, then find the image patch having the largest correlation coefficient with the target model. The correlation coefficient between the target model and the candidate-region distribution field can be expressed as:

$$C_{i,j}=\sum_{k=1}^{K}\sum_{m=1}^{M}\sum_{n=1}^{N}df(m,n,k)\,df_{i,j}(m+1,n+1,k)\qquad(5)$$

where df(m,n,k) is the target model;

df_{i,j}(m+1,n+1,k) is the distribution field of the candidate region;

i and j are the row and column coordinates of the candidate region;

C_{i,j} is the correlation coefficient matrix between the candidate-region distribution field and the target model;

m and n are the row and column coordinates within the candidate region;

Let the candidate image be M×M and the target image N×N. In the frequency domain, the candidate and target images must be zero-padded to (M+N-1)². Letting S = M+N-1, the complexity of the algorithm is reduced to O(S² log₂ S). Since the time-domain convolution of Eq. (5) can be realized as a product in the frequency domain, the correlation coefficient between the target model and the candidate-region distribution field is computed in the frequency domain as:

$$C_{i,j}=\sum_{k=1}^{K}\mathrm{ifft}\Big(\mathrm{fft}\big(df(k)\big)\cdot\big(\mathrm{fft}(df_{i,j}(k))\big)^{*}\Big)\qquad(6)$$

where:

ifft(fft(df(k))·(fft(df_{i,j}(k)))^*) denotes taking the fast Fourier transforms fft(df(k)) and fft(df_{i,j}(k)) of the k-th layers of the target-image distribution field model df(k) and of the candidate-image distribution field df_{i,j}(k), forming the conjugate (fft(df_{i,j}(k)))^* of the transformed candidate field, multiplying it with fft(df(k)) to obtain the frequency-domain correlation matrix, and finally applying the inverse Fourier transform to obtain the time-domain correlation coefficient matrix C_{i,j};

fft(df(k)) denotes the fast Fourier transform of the k-th layer of the target-image distribution field model df(k);

(fft(df_{i,j}(k)))^* denotes the conjugate of the fast Fourier transform fft(df_{i,j}(k)) of the k-th layer of the candidate-image distribution field df_{i,j}(k);

df(k) and df_{i,j}(k) are the k-th layers of the target model and of the candidate-region distribution field, respectively;

C_{i,j} is the correlation coefficient matrix between the candidate-region distribution field and the target-image distribution field model;
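A sketch of the layer-wise frequency-domain correlation of Eq. (6) follows, again not the patent's code. One deliberate deviation is noted in the comments: the model spectrum is conjugated rather than the candidate spectrum, so that the peak index reads directly as the patch offset; Eq. (6) conjugates the candidate spectrum instead, which mirrors the index but identifies the same best match.

```python
import numpy as np

def correlation_map(model_df, cand_df):
    """Layer-wise frequency-domain correlation in the spirit of Eq. (6).
    model_df: (Hm, Wm, K) target model; cand_df: (Hc, Wc, K) candidate field.
    In the alias-free region, C[dy, dx] is the correlation of the model with
    the candidate patch whose top-left corner is (dy, dx)."""
    Hc, Wc, K = cand_df.shape
    Hm, Wm, _ = model_df.shape
    S0, S1 = Hc + Hm - 1, Wc + Wm - 1  # zero-padded sizes, as in the text
    C = np.zeros((S0, S1))
    for k in range(K):
        Fm = np.fft.fft2(model_df[:, :, k], s=(S0, S1))
        Fc = np.fft.fft2(cand_df[:, :, k], s=(S0, S1))
        # Conjugating one spectrum turns the product into correlation; we
        # conjugate the model spectrum (Eq. (6) conjugates the candidate's,
        # which mirrors the peak index without changing the best match).
        C += np.real(np.fft.ifft2(np.conj(Fm) * Fc))
    return C

# Only offsets where the model fits fully inside the candidate window are
# alias-free: valid = C[:Hc - Hm + 1, :Wc - Wm + 1], then argmax of valid.
```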

3) Using Eq. (4), fuse the two target models obtained from the consecutive frames at a learning rate ρ to update the target model of the current frame;

4) Repeat 2) and 3) until the end of the video sequence (a skeleton of the whole loop is sketched below).
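Putting steps 1) to 4) together, the following skeleton assumes the sketches above (distribution_field, smooth_df, correlation_map, update_model); the cropping helper and boundary handling are illustrative simplifications, not part of the patent.

```python
import numpy as np

def track(frames, init_box, K=8, rho=0.9, radius=20):
    """Skeleton of steps 1)-4). frames: list of 2D grayscale arrays;
    init_box: (x, y, w, h) marked by hand in the first frame."""
    x, y, w, h = init_box                                     # step 1)
    crop = lambda img, x0, y0, ww, hh: img[y0:y0 + hh, x0:x0 + ww]
    model = smooth_df(distribution_field(crop(frames[0], x, y, w, h), K))
    boxes = [init_box]
    for frame in frames[1:]:
        # Step 2): smooth the search window and correlate it with the model.
        sx, sy = max(x - radius, 0), max(y - radius, 0)
        win = crop(frame, sx, sy, w + 2 * radius, h + 2 * radius)
        cand = smooth_df(distribution_field(win, K))
        C = correlation_map(model, cand)
        valid = C[:win.shape[0] - h + 1, :win.shape[1] - w + 1]
        dy, dx = np.unravel_index(np.argmax(valid), valid.shape)
        x, y = sx + dx, sy + dy
        # Step 3): fuse the new appearance into the model via Eq. (4).
        model = update_model(
            smooth_df(distribution_field(crop(frame, x, y, w, h), K)),
            model, rho)
        boxes.append((x, y, w, h))
    return boxes                                              # step 4)
```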

Compared with the prior art, the present invention has the following beneficial effects:

(1) By adopting a global template-matching search strategy based on the correlation coefficient, the method both overcomes the tendency of the original distribution field's gradient-descent search to become trapped in local optima and avoids the noise sensitivity of measuring similarity with the L1 distance, improving tracking performance.

(2) By adopting a dense-sampling strategy and, drawing on circulant-matrix theory, using the Fourier transform to move the correlation computation from the computationally expensive time domain to the inexpensive frequency domain, the complexity of the algorithm is greatly reduced and real-time requirements are met.

【Description of the Drawings】

Figure 1 illustrates the conversion of the image Twinings into a distribution field, where Fig. 1(a) is the original image, Fig. 1(b) the distribution field of the original image, and Fig. 1(c) the smoothed distribution field of the original image.

Figure 2 is a schematic diagram of the time-domain global search algorithm.

Figure 3 is a schematic diagram of the FFT-based correlation coefficient matching algorithm.

Figure 4 shows the center-error curves for the video sequences.

Figure 5 compares some tracking results on the various video sequences.

【Detailed Description of the Embodiments】

The present invention is further described below in conjunction with the drawings.

The real-time distribution field target tracking method based on global search of the present invention comprises steps 1) through 4) exactly as set out in the Summary of the Invention above: the distribution field model is constructed and smoothed according to Eqs. (1) to (3), the model is updated according to Eq. (4), and matching is performed in the frequency domain according to Eqs. (5) and (6).


Referring to Figures 1 and 2: Figure 1 illustrates the conversion of the image Twinings into a distribution field, where Fig. 1(a) is the original image, Fig. 1(b) the distribution field of the original image, and Fig. 1(c) the smoothed distribution field; Figure 2 is a schematic diagram of the time-domain global search algorithm.

An image-matching algorithm based on a global search strategy performs matching by searching point by point over the candidate image, as shown in Figure 2: each dot marks a position to be searched, and the black dot marks the best matching position. This exhaustive global search strategy guarantees the global optimum and avoids local extrema, but it has a drawback: when the target or the candidate region is large, the computational load is enormous and real-time requirements cannot be met.

Figure 3 is a schematic diagram of the FFT-based correlation coefficient matching algorithm.

Fast image matching via the FFT effectively reduces the time of the correlation computation, allowing large-scale correlation matching to be processed in real time. Correlation values in the region where the target image extends beyond the candidate image boundary contribute nothing to the final result. Let the search image be X(k) of size L×L and the template image H(k) of size M×M; both images can be padded up to N ≥ L with N = 2^r (r an integer). The upper-left part of the resulting correlation surface is the part required for image matching, while the lower-right part is the aliased portion of the circular correlation. The aliased portion is ignored during matching and does not affect the matching result.

The specific steps are as follows:

(1) Zero-pad the search image X(k) and the template image H(k) to N×N, then take their FFTs to obtain X(k) and H(k), and compute the conjugate X*(k).

(2) Multiply the complex matrices X*(k) and H(k) point by point and take the conjugate, obtaining:

$$Y(k)=\big(H(k)X^{*}(k)\big)^{*}=H^{*}(k)X(k)$$

(3) Take the N×N-point IFFT of Y(k) to obtain the correlation matrix Y(n); then discard the aliased portion in the lower-right corner to obtain the required correlation matrix y(n);

(4) The maximum of the matrix y(n) marks the best matching point; convert its position back to (x, y) coordinates and return the result.
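These four steps can be sketched as follows; the function and variable names are illustrative, and the padded size follows the N = 2^r ≥ L rule stated above.

```python
import numpy as np

def fft_match(search, template):
    """Steps (1)-(4): pad to an N x N grid with N = 2**r, correlate in the
    frequency domain via Y(k) = H*(k)X(k), discard the aliased corner, and
    return the (x, y) of the best match. Real grayscale inputs assumed."""
    N = 1 << int(np.ceil(np.log2(max(search.shape))))  # N = 2**r >= L
    X = np.fft.fft2(search, s=(N, N))                  # step (1): padded FFTs
    H = np.fft.fft2(template, s=(N, N))
    Y = np.conj(H) * X                                 # step (2): H*(k)X(k)
    y = np.real(np.fft.ifft2(Y))                       # step (3): IFFT ...
    # ... keep only offsets where the template lies fully inside the search
    # image; the rest is circular-correlation aliasing and is discarded.
    valid = y[:search.shape[0] - template.shape[0] + 1,
              :search.shape[1] - template.shape[1] + 1]
    dy, dx = np.unravel_index(np.argmax(valid), valid.shape)  # step (4)
    return dx, dy
```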

Embodiment:

To verify the performance of the proposed algorithm, the video library compiled by Babenko (2009) is used as the test set. It covers many of the difficulties of visual tracking, such as long-term occlusion (Occluded Face, Occluded Face2), target rotation (Sylv, Twinings), special shapes (Surfer, Coke11), target deformation (Cliffbar, Twinings), illumination change (David, Sylv, Coke11), fast motion and occlusion (Tiger1, Tiger2), and similar-object distraction (Dollar, Cliffbar). For comparison, the multiple-instance learning tracker (MIL), representing sparse sampling, and the original distribution field tracker (DF) are selected, and the methods are compared in terms of success rate, average error, and speed. The algorithm was run in Matlab R2010a under Windows 7 on a computer with an Intel(R) Core(TM) i3-2130 3.40 GHz CPU and 4 GB RAM.

Parameter settings

The algorithm parameters are set as follows. The number of layers K of the distribution field is set to 4 or 8, depending on the characteristics of the video sequence; fewer layers are faster but less accurate, and for some videos more layers do not necessarily track better. The kernel width and variance of the spatial-domain Gaussian smoothing follow the recommendations of the original distribution field tracker: the larger the target, the larger the parameters, and vice versa. In the present invention, the variance of the spatial-domain Gaussian smoothing is 0.3 and that of the feature-domain smoothing is 1. The search radius of the candidate region is set to 5 or 20, since target sizes and motion magnitudes differ between videos. Finally, the learning rate for the model update is set to 0.95, except for Coke11 and Tiger1 (0.8) and Sylv and Surfer (0.75), to suit the different videos. For the multiple-instance learning tracker and the original distribution field tracker, the best tracking result of each algorithm is reported.

Quantitative analysis

Three different measures are used to analyze the tracking results: the center offset between the estimated and true positions (Table 1), the tracking success rate (Table 2), and the tracking speed (Table 3). The center error reflects the size of the deviation between the estimated and actual positions: the smaller the value, the closer the tracker is to the actual position and the better the result; the larger the value, the farther the estimate is from the true position and the poorer the tracking. Because each video sequence targets a different scene, the values fluctuate considerably. For a video frame, tracking is considered successful if (A∩B)/(A∪B) > 0.5, where A is the tracking-result rectangle and B is the ground-truth rectangle of the target position. This is another measure of tracking performance: tracking is deemed successful when the overlap between the tracked rectangle and the ground-truth rectangle exceeds 50%, and failed otherwise; larger values indicate better tracking.
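The per-frame success criterion (A∩B)/(A∪B) > 0.5 can be computed directly from the two rectangles; a small sketch, with boxes given as (x, y, w, h):

```python
def is_success(box_a, box_b, thresh=0.5):
    """Per-frame success test: area(A ∩ B) / area(A ∪ B) > 0.5,
    where A is the tracked box and B the ground-truth box."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # intersection width
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))  # intersection height
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union > thresh
```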

As Tables 1 and 2 show, for most video sequences the proposed algorithm achieves better tracking, in terms of both center error and success rate, than the multiple-instance and distribution-field algorithms. Table 3 compares the three algorithms in tracking speed: the proposed algorithm is clearly faster and remains real-time even in the Matlab environment. Figure 4 shows the relative position error (in pixels) between the tracking result and the ground-truth target position.

Table 1 Center distance between the tracking result and the true position

Table 2 Tracking success rate

Note: in Tables 1 and 2, bold indicates the best result and italics the second best.

The reason lies mainly in the distribution-field representation of the target: by expanding the image into a stack of layers and smoothing in both the feature and spatial domains, it retains all of the target's information while enlarging the target's basin of attraction and properly handling "uncertainty", so as to cope with deformation, occlusion, and illumination change during motion. The original distribution field tracker, however, searches by gradient descent: the new frame is likewise expanded and smoothed into a distribution field, the gradient of the L1-norm difference between the target distribution field and the candidate sample is computed, and the search proceeds along the descent direction until a local optimum is reached, which is then used to update the target's appearance model. When the target moves violently or is occluded for a long time, searching only a limited region easily becomes trapped in a local optimum and erroneous background information enters the target model; the target is then no longer represented effectively, matching precision drops when the target is searched for again, and tracking suffers. The present invention therefore proposes a global search strategy that replaces the L1 norm with the correlation coefficient, overcoming the original distribution field's limitations of local search and poor real-time performance.

Table 3 Tracking speed (frames per second)

Note: bold indicates the best result and italics the second best.

Qualitative analysis

Figure 5 shows the tracking results of the proposed method and the other two trackers on selected frames of the 12 video sequences. The original distribution field (DF) tracker and the algorithm of the present invention both achieve good tracking; since both inherit the distribution field's ability to handle blur while remaining sensitive to spatial structure, the algorithm performs well under long-term large-area occlusion (see #732 Occluded face, #387 Occluded face2), complex background change (see #886 Sylvester, #281 David), and targets similar to the background (#236 Dollar).

The distribution-field descriptor proposed by the original distribution field tracker has a wide basin of attraction, preserves the target's spatial information, and represents the target's uncertainty, allowing a simple gradient-descent search to achieve good tracking. Gradient descent, however, suffers from local optima, causing inaccurate tracking whose accumulation eventually leads to drift. The present invention instead performs global search matching with a dense-sampling strategy, avoiding this defect, and uses the fast Fourier transform, giving low algorithmic complexity, fast running speed, and better tracking results (see Twinings #429, Girl #311).

The multiple-instance learning method adaptively updates the appearance model to correctly reflect changes in the target's appearance, and its notion of bags alleviates to some extent the tracking drift caused by inaccurate target matching. It handles occlusion well, but drifts once the occlusion ends; this may be because the Noisy-OR model used by MIL does not fully exploit the correct information from the previous frame, so positive samples are collected inefficiently in the next frame, the classifier's training error grows, and drift eventually results (see Occluded face #732, Occluded face2 #387).

In summary, the proposed algorithm first performs global search matching with a dense-sampling strategy, preventing the algorithm from becoming trapped in local optima, and then uses the Fourier transform to move the expensive time-domain computation into the frequency domain for fast evaluation. It thus increases tracking speed while overcoming the slowness of multiple-instance learning tracking and distribution field tracking and their tendency to fall into local minima. The algorithm of the present invention is therefore better at handling occlusion, appearance change, illumination change, and rotation, while running in real time.

It should be added that the video library compiled by Babenko (2009), chosen here as the test set, covers many of the difficulties of visual tracking, such as long-term occlusion (Occluded Face, Occluded Face2), target rotation (Sylv, Twinings), special shapes (Surfer, Coke11), target deformation (Cliffbar, Twinings), illumination change (David, Sylv, Coke11), fast motion and occlusion (Tiger1, Tiger2), and similar-object distraction (Dollar, Cliffbar); it has been adopted by the great majority of researchers and has become the de facto standard benchmark in target tracking.

Reference: Babenko B, Yang M H, Belongie S. Visual tracking with online multiple instance learning [C]//Proc of IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009: 983-990.

Claims (2)

1.一种基于全局搜索的实时分布场目标跟踪方法,其特征在于,包括以下步骤:1. a real-time distribution field target tracking method based on global search, is characterized in that, comprises the following steps: 1)在第一帧选定的目标图像I手动标记目标位置,即以一个矩形框划定目标区域,标记出矩形框的左上角坐标及矩形框的宽、高,然后对划定目标区域用分布场模型表示,分布场模型的构建如下:1) Manually mark the target position on the selected target image I in the first frame, that is, delineate the target area with a rectangular frame, mark the coordinates of the upper left corner of the rectangular frame and the width and height of the rectangular frame, and then delineate the target area with The distribution field model representation, the construction of the distribution field model is as follows: 利用Kronecker delta函数把目标图像I用一个分布场模型表示,得到目标图像I的分布场模型df(i,j,k),其公式如下所示:Using the Kronecker delta function to represent the target image I with a distribution field model, the distribution field model df(i,j,k) of the target image I is obtained, and its formula is as follows: dfdf (( ii ,, jj ,, kk )) == 11 if Iif i (( ii ,, jj )) // (( 255255 // KK )) == == kk 00 otherwiseotherwise -- -- -- (( 11 )) 式中:i和j分别表示目标图像I的行和列;In the formula: i and j represent the row and column of the target image I respectively; K表示对目标图像I要分的层数;K represents the number of layers to be divided into the target image I; k表示各层的序号,k=1,2,3…,K;k represents the serial number of each layer, k=1, 2, 3..., K; df(i,j,k)表示目标图像I经过分解后在第k层上第i行第j列的值,它的取值范围为0或1;df(i, j, k) represents the value of the i-th row and the j-th column on the k layer after the target image I is decomposed, and its value range is 0 or 1; 深度为255/K的集合称为“一层”;A collection with a depth of 255/K is called a "layer"; I(i,j)表示在目标图像I上第i行和第j列的像素值;I(i,j) represents the pixel value of the i-th row and j-th column on the target image I; 接着,对目标图像I的分布场模型进行高斯平滑,高斯平滑分空间域平滑和特征域平滑,首先对目标图像I的分布场模型进行空间域平滑,空间域平滑是在目标图像I的x和y两个方向上进行平滑,其计算公式如下:Next, Gaussian smoothing is performed on the distribution field model of the target image I. Gaussian smoothing is divided into spatial domain smoothing and feature domain smoothing. First, the distribution field model of the target image I is smoothed in the spatial domain. 
Smoothing in two directions of y, the calculation formula is as follows: dfdf sthe s (( kk )) == dfdf (( kk )) ** hh σσ sthe s -- -- -- (( 22 )) 式中:dfs(k)表示空间域平滑后的目标图像I的分布场模型;In the formula: df s (k) represents the distribution field model of the target image I smoothed in the space domain; df(k)表示目标图像I的分布场模型的第k层;df(k) represents the kth layer of the distribution field model of the target image I; 是一个标准差为σs的2D高斯核; is a 2D Gaussian kernel with standard deviation σ s ; “*”为卷积符号;"*" is the convolution symbol; 然后对空间域平滑后的目标图像I的分布场模型进行特征域平滑,其计算公式如下:Then perform feature domain smoothing on the distribution field model of the target image I smoothed in the spatial domain, and the calculation formula is as follows: dfdf ssss (( ii ,, jj )) == dfdf sthe s (( ii ,, jj )) ** hh σσ ff -- -- -- (( 33 )) 式中:dfss(i,j)表示特征域平滑后的目标图像I的分布场模型;In the formula: df ss (i, j) represents the distribution field model of the target image I after feature domain smoothing; dfs(i,j)表示经过空间域平滑后目标图像I的分布场模型第i行和第j列的平滑值;df s (i, j) represents the smoothed value of the i-th row and the j-th column of the distribution field model of the target image I after spatial domain smoothing; 是一个标准σf差为的1D高斯核; is a 1D Gaussian kernel with a standard σ f difference of ; 高斯平滑后目标图像I的分布场模型各像素每一列积分为1;The distribution field model of the target image I after Gaussian smoothing is 1 for each column of each pixel; 在跟踪过程中,为了适应目标环境的外观变化,需要动态地更新目标图像的分布场模型,即按一定比例混合旧的目标图像分布场模型和新跟踪得到的目标图像相对应的分布场模型,进而一个简单的目标图像分布场模型更新公式如下:In the tracking process, in order to adapt to the appearance changes of the target environment, it is necessary to dynamically update the distribution field model of the target image, that is, mix the distribution field model of the old target image and the distribution field model corresponding to the target image obtained by the new tracking in a certain proportion, Furthermore, a simple target image distribution field model update formula is as follows: dft+1(i,j,k)=ρdft(i,j,k)+(1-ρ)dft-1(i,j,k) (4)df t+1 (i,j,k)=ρdf t (i,j,k)+(1-ρ)df t-1 (i,j,k) (4) 式中:ρ表示学习率,其值在0~1之间,其用来控制目标模型的更新速度;In the formula: ρ represents the learning rate, and its value is between 0 and 1, which is used to control the update speed of the target model; t表示跟踪目标图像的当前帧,t+1表示跟踪目标图像的当前帧的下一帧,t-1表示跟踪目标图像的当前帧的上一帧;t represents the current frame of the tracking target image, t+1 represents the next frame of the current frame of the tracking target image, and t-1 represents the previous frame of the current frame of the tracking target image; dft+1(i,j,k)表示跟踪目标图像中当前帧的下一帧目标分布场模型的第k层第i行第j列的值;df t+1 (i, j, k) represents the value of the kth layer, row i, and column j of the next frame target distribution field model of the current frame in the tracking target image; dft(i,j,k)表示跟踪目标图像中当前帧目标分布场模型的第k层第i行第j列的值;df t (i, j, k) represents the value of the kth layer, row i, and column j of the current frame target distribution field model in the tracking target image; dft-1(i,j,k)表示跟踪目标图像中当前帧的上一帧目标分布场模型的第k层第i行第j列的值;df t-1 (i, j, k) represents the value of the kth layer, row i, and column j of the previous frame target distribution field model of the current frame in the tracking target image; 2)当目标图像I的分布场模型经过高斯平滑处理后,目标模型建立,接下来是在下一帧图像中确定待搜索区域并对其进行高斯平滑得到候选区域分布场,然后寻找与目标模型具有最大相关系数的图像块,其中,目标模型与候选区域分布场的相关系数可表示为:2) After the distribution field model of the target image I has been processed by Gaussian smoothing, the target model is established, and the next step is to determine the area to be searched in the next frame of image and perform Gaussian smoothing on it to obtain the 
distribution field of the candidate area, and then find The image block with the maximum correlation coefficient, where the correlation coefficient between the target model and the distribution field of the candidate area can be expressed as: CC ii ,, jj == ΣΣ kk == 11 KK ΣΣ mm == 11 Mm ΣΣ nno == 11 NN dfdf (( mm ,, nno ,, kk )) dfdf ii ,, jj (( mm ++ 11 ,, nno ++ 11 ,, kk )) -- -- -- (( 55 )) 式中:df(m,n,k)为目标模型;In the formula: df(m,n,k) is the target model; dfi,j(m+1,n+1,k)为候选区域的分布场;df i,j (m+1,n+1,k) is the distribution field of the candidate area; i,j为候选区域第i行和第j列的坐标;i, j are the coordinates of row i and column j of the candidate area; Ci,j为候选区域分布场与目标模型的相关系数矩阵;C i, j is the correlation coefficient matrix between the candidate area distribution field and the target model; m及n表示候选区域第m行和第n列的坐标;m and n represent the coordinates of the mth row and nth column of the candidate area; 设候选图像为M×M,目标图像为N×N,在频域内,需要对候选图像和目标图像进行补零延拓为(M+N-1)2,令S=M+N-1,算法的复杂度将降低为O(S2log2S),由于时域的卷积如公式(5)可用频域的乘积来实现,因此,在频域计算目标模型和候选区域分布场的相关系数,公式如下:Suppose the candidate image is M×M, and the target image is N×N. In the frequency domain, the candidate image and the target image need to be zero-filled and extended to (M+N-1) 2 , let S=M+N-1, The complexity of the algorithm will be reduced to O(S 2 log 2 S). Since the convolution in the time domain such as formula (5) can be realized by the product in the frequency domain, the correlation between the target model and the distribution field of the candidate area is calculated in the frequency domain Coefficient, the formula is as follows: CC ii ,, jj == ΣΣ kk == 11 KK ifftifft (( fftfft (( dfdf (( kk )) )) ·· (( fftfft (( dfdf ii ,, jj (( kk )) )) )) ** )) -- -- -- (( 66 )) 式中:In the formula: ifft(fft(df(k))·(fft(dfi,j(k)))*)表示对目标图像分布场模型df(k)和候选图像分布场dfi,j(k)的第k层分别进行快速傅里叶变换fft(df(k))、fft(dfi,j(k)),并求出变换后候选图像分布场的共轭(fft(dfi,j(k)))*,然后将(fft(dfi,j(k)))*与fft(df(k))相乘得到频域相关系数矩阵,最后通过反傅里叶变换得到时域相关系数矩阵Ci,jifft(fft(df(k))·(fft(df i,j (k))) * ) represents the k-th kth of the target image distribution field model df(k) and the candidate image distribution field df i,j (k) Each layer performs fast Fourier transform fft(df(k)), fft(df i,j (k)), and finds the conjugate of the transformed candidate image distribution field (fft(df i,j (k)) ) * , then multiply (fft(df i,j (k))) * with fft(df(k)) to obtain the frequency domain correlation coefficient matrix, and finally obtain the time domain correlation coefficient matrix C i by inverse Fourier transform ,j ; fft(df(k))表示对目标图像分布场模型df(k)第k层进行快速傅里叶变换;fft(df(k)) means performing fast Fourier transform on the kth layer of the target image distribution field model df(k); (fft(dfi,j(k)))*表示对候选图像分布场dfi,j(k)第k层进行快速傅里叶变换得到fft(dfi,j(k))并求出它的共轭(fft(dfi,j(k)))*(fft(df i,j (k))) * means to perform fast Fourier transform on the kth layer of the candidate image distribution field df i,j (k) to obtain fft(df i,j (k)) and find it conjugate(fft(df i,j (k))) * ; df(k)和dfi,j(k)分别为目标模型和候选区域分布场的第k层;df(k) and df i,j (k) are the target model and the kth layer of the candidate region distribution field respectively; Ci,j为候选区域分布场与目标图像分布场模型的相关系数矩阵;C i, j is the correlation coefficient matrix of the distribution field of the candidate area and the distribution field model of the target image; 3)利用公式(4)将前后两帧得到的两个目标模型按照一定的学习率ρ融合来更新当前帧的目标模型;3) Use the formula (4) to fuse the two target models obtained by the two frames before and after according to a certain learning rate ρ to update the target model of the current frame; 4)循环2)和3),直到整个视频序列结束。4) Cycle 2) and 3) until the entire 
2. The global-search-based real-time distribution field target tracking method according to claim 1, characterized in that the learning rate ρ is set to 0.9.
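Putting the claimed steps 1)–4) together, a minimal end-to-end sketch over grayscale frames, assuming the hypothetical helpers `smooth_df`, `update_model` and `correlate_df` above plus a stand-in `build_df` for the distribution field construction; border handling and the restriction of the argmax to valid offsets are deliberately glossed over:

```python
import numpy as np

def build_df(gray_patch, K=8):
    """Stand-in for the distribution field construction: quantize a grayscale
    patch into K one-hot layers (one layer per intensity bin)."""
    bins = np.clip((gray_patch.astype(np.float64) / 256.0 * K).astype(int), 0, K - 1)
    df = np.zeros((K,) + gray_patch.shape)
    rows, cols = np.indices(gray_patch.shape)
    df[bins, rows, cols] = 1.0
    return df

def track(frames, init_box, sigma_s=1.0, sigma_f=1.0, rho=0.9, pad=20):
    x, y, w, h = init_box  # step 1: manually marked target in the first frame
    model = smooth_df(build_df(frames[0][y:y+h, x:x+w]), sigma_s, sigma_f)
    boxes = [init_box]
    for frame in frames[1:]:
        # step 2: candidate region around the previous position
        y0, x0 = max(0, y - pad), max(0, x - pad)
        region = frame[y0:y0 + h + 2 * pad, x0:x0 + w + 2 * pad]
        C = correlate_df(model, smooth_df(build_df(region), sigma_s, sigma_f))
        dy, dx = np.unravel_index(np.argmax(C), C.shape)
        y = y0 + (-dy) % C.shape[0]  # lag-to-offset mapping of the padded FFT
        x = x0 + (-dx) % C.shape[1]
        # step 3: fuse old and new models with learning rate rho (formula (4))
        new_df = smooth_df(build_df(frame[y:y+h, x:x+w]), sigma_s, sigma_f)
        model = update_model(new_df, model, rho)
        boxes.append((x, y, w, h))
    return boxes  # step 4: the loop runs until the video sequence ends
```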
CN201410298728.7A 2014-06-26 2014-06-26 Real-time distribution field target tracking method based on global search Pending CN104036528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410298728.7A CN104036528A (en) 2014-06-26 2014-06-26 Real-time distribution field target tracking method based on global search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410298728.7A CN104036528A (en) 2014-06-26 2014-06-26 Real-time distribution field target tracking method based on global search

Publications (1)

Publication Number Publication Date
CN104036528A true CN104036528A (en) 2014-09-10

Family

ID=51467287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410298728.7A Pending CN104036528A (en) 2014-06-26 2014-06-26 Real-time distribution field target tracking method based on global search

Country Status (1)

Country Link
CN (1) CN104036528A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761747A (en) * 2013-12-31 2014-04-30 西北农林科技大学 Target tracking method based on weighted distribution field

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chigan Pengfei et al.: "http://www.cnki.net/kcms/detail/51.1196.TP.20140418.0917.063.html", 18 April 2014 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104574445A (en) * 2015-01-23 2015-04-29 北京航空航天大学 Target tracking method and device
CN105100727A (en) * 2015-08-14 2015-11-25 河海大学 A method for real-time tracking of specified items in fixed-position surveillance images
CN105100727B (en) * 2015-08-14 2018-03-13 河海大学 A kind of fixed bit monitoring image middle finger earnest product method for real time tracking
CN106408593A (en) * 2016-09-18 2017-02-15 东软集团股份有限公司 Video-based vehicle tracking method and device
CN106408593B (en) * 2016-09-18 2019-05-17 东软集团股份有限公司 A kind of wireless vehicle tracking and device based on video
CN109255304A (en) * 2018-08-17 2019-01-22 西安电子科技大学 Method for tracking target based on distribution field feature
CN109766943A (en) * 2019-01-10 2019-05-17 哈尔滨工业大学(深圳) A Template Matching Method and System Based on Global Perceptual Diversity Metrics
CN109766943B (en) * 2019-01-10 2020-08-21 哈尔滨工业大学(深圳) A Template Matching Method and System Based on Global Perceptual Diversity Metrics
CN113610888A (en) * 2021-06-29 2021-11-05 南京信息工程大学 Twin network target tracking method based on Gaussian smoothness
CN113610888B (en) * 2021-06-29 2023-11-24 南京信息工程大学 A method for target tracking using twin networks based on Gaussian smoothing

Similar Documents

Publication Publication Date Title
CN104574445B A target tracking method
CN103971386B A foreground detection method for dynamic background scenes
CN103310194B Pedestrian head-and-shoulder detection method based on head-top pixel gradient directions in video
CN107680119A A tracking algorithm based on spatio-temporal context fusing multiple features and a scale filter
CN103186775B Human motion recognition method based on mixed descriptors
CN108509839A An efficient gesture detection and recognition method based on region convolutional neural networks
CN104392241B A head pose estimation method based on mixture regression
CN106845374A Pedestrian detection method and detection device based on deep learning
CN104036528A Real-time distribution field target tracking method based on global search
CN105160310A 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN103020614B Human motion recognition method based on spatio-temporal interest point detection
Wang et al. Feature representation for facial expression recognition based on FACS and LBP
CN109146911A A method and device for target tracking
CN103020986A Method for tracking moving object
CN104700412B A computational method of visual saliency map
CN103198299B Face recognition method combining multi-directional scale with Gabor phase projection features
CN103778436B A pedestrian posture detection method based on image processing
CN101794372A Method for representing and recognizing gait characteristics based on frequency domain analysis
CN108171133A A dynamic gesture recognition method based on feature covariance matrices
CN105005798B A target recognition method based on local similar structure statistical matching
CN106407958A Double-layer-cascade-based facial feature detection method
Yang et al. Visual tracking with long-short term based correlation filter
Song et al. Feature extraction and target recognition of moving image sequences
CN107067410A A manifold-regularized correlation filter target tracking method based on augmented samples
CN110555864B Self-adaptive target tracking method based on PSPCE

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20140910)