CN108021869A - A kind of convolutional neural networks tracking of combination gaussian kernel function - Google Patents
- Publication number
- CN108021869A CN108021869A CN201711127478.0A CN201711127478A CN108021869A CN 108021869 A CN108021869 A CN 108021869A CN 201711127478 A CN201711127478 A CN 201711127478A CN 108021869 A CN108021869 A CN 108021869A
- Authority
- CN
- China
- Prior art keywords
- target
- kernel function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/513—Sparse representations
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field

The present invention relates to the field of target tracking in computer vision, and in particular to a convolutional neural network tracking method combined with a Gaussian kernel function.

Background Art

Visual tracking is a research hotspot in computer vision, with important research and application value in scenarios such as virtual reality, human-computer interaction, intelligent surveillance, augmented reality, and machine perception. It analyzes a sequence of video frames, matches the detected candidate target regions, and locates the position of the tracked target in the video sequence. Tracking algorithms have produced many research results, but they still face great challenges in complex real-world scenes; achieving more robust and accurate tracking under factors such as occlusion, deformation, and low video resolution remains the core of current research.

Most traditional tracking algorithms model the target directly with pixel-value features of the video frames; such shallow pixel-level features cannot cope well with the larger challenges, such as complex scenes, that arise during tracking.

Therefore, the inventors explored this problem further and propose a convolutional neural network tracking method combined with a Gaussian kernel function.
Summary of the Invention

To address the problems of deep learning in the tracking field, the present invention proposes a convolutional neural network tracking method combined with a Gaussian kernel function.

The method comprises the following steps:
Step 1, initialization: set the parameters for per-frame image normalization, particle filtering, network size, and sample capacity; the parameters include the filter patch size w*w, the number of filters P, the normalized image size n*n, the standard deviations σx, σy, σs of the particle filter's target state, and the number of particles N.

Step 2, initial filter extraction: for the target in the first frame, extract an initial filter bank by sliding-window sampling and K-means clustering for use as the filters of the subsequent network; this filter bank remains unchanged during tracking.

Step 3, according to the convolutional neural network structure, first extract the deep abstract features of each candidate sample, then accelerate the convolution computation with a Gaussian kernel function. This specifically includes:

Step 31, simple-layer feature extraction: normalize the input image frame to n*n by preprocessing, and sample the target region with a sliding window of size w*w to obtain an image patch set X of length L;

Step 32, cluster the L = (n−w+1)×(n−w+1) image patches with k-means to obtain d patch filters used as convolution kernels, denoted F^o = {F_1^o, ..., F_d^o};
Step 33, the response to an input image I is given by formula (1):

S_i = F_i^o ⊗ I, i = 1, ..., d (1)

where S_i is the first-layer convolution result, F_i^o is the i-th convolution kernel, and ⊗ denotes convolution;
Step 34, randomly sample the region around the target to obtain l samples, and likewise apply k-means clustering to obtain the background templates of the image, F^b = {F_1^b, ..., F_d^b};

Step 35, process the background templates of all images with mean pooling to obtain the average background:

F̄^b = (1/d) Σ_{i=1..d} F_i^b

where F^b is the background convolution kernel, the superscript b marks the background, d is the total number of background templates obtained, and m is the parameter of the average pooling operation;

Step 36, the simple-layer feature is given by formula (2):

S_i = F_i^o ⊗ I − F̄^b ⊗ I (2)
Step 37, complex-layer feature extraction: stack the d simple-layer feature maps into a three-dimensional tensor representing the complex-layer feature of the target, denoted C ∈ R^((n−w+1)×(n−w+1)×d);

Step 38, represent the feature sparsely to obtain the sparse representation c of the feature tensor C;

Step 39, obtain the target feature representation with the soft-shrinking method, as in formula (3):

c = sgn(vec(C)) ⊙ max(0, |vec(C)| − λ) (3)
Step 310, perform the convolution computation with a Gaussian kernel function, as in formula (4):

k(x, x′) = exp(−(1/σ²)(‖x‖² + ‖x′‖² − 2F⁻¹(x̂* ⊙ x̂′))) (4)

where * denotes the complex conjugate, ⊙ the element-wise product, F⁻¹ the inverse Fourier transform, and k(x, x′) the Gaussian kernel function;

Step 311, let φ(x) be a mapping into a high-dimensional kernel Hilbert space; the weight of the kernel regression can then be expressed as v = Σ_i α_i φ(x_i), where α is the coefficient vector with elements α_i;

Step 312, the parameter to be solved thus changes from v to α, and the closed-form solution of the kernelized regularized least squares (KRLS) classifier can be expressed as:

α = (K + λI)⁻¹ y (5)

where K is the kernel matrix with elements K_ij = k(x_i, x_j), I is the identity matrix, and y is the vector with elements y_i.

Since K is a circulant matrix, formula (5) can be transformed into the DFT domain,

α̂ = ŷ / (k̂ + λ) (6)

where k̂ is the vector formed by the first row of the kernel matrix K, and the symbol ∧ denotes the Fourier transform;
Step 4, feature matching and localization: perform feature matching and localization within the particle filter tracking framework to track the target.

Step 4 specifically includes:

Step 41, let the total observation sequence at frame t be O_t = {o_1, ..., o_t}; according to Bayesian theory, find the maximum of the posterior probability p,

p(S_t | O_t) ∝ p(O_t | S_t) ∫ p(S_t | S_{t−1}) p(S_{t−1} | O_{t−1}) dS_{t−1} (7)

where S_t = [x_t, y_t, s_t]^T, with x_t, y_t the position of the target and s_t the scale parameter; p(S_t | S_{t−1}) is the motion model and p(O_t | S_t) is the observation model.
Step 42, for the motion model p(S_t | S_{t−1}), assume the target state parameters are mutually independent and each described by a Gaussian distribution; the motion model is then Brownian motion,

p(S_t | S_{t−1}) = N(S_t; S_{t−1}, Σ_S) (8)

where Σ_S = diag(σx, σy, σs) is the diagonal covariance matrix;

Step 43, for the observation model p(O_t | S_t), compute it by measuring the similarity between each candidate sample and the target template;

Step 44, finally track the target according to formula (9).
After step 4, if the frame just processed is the last frame, output the result; otherwise, proceed to steps 5 and 6 in sequence;

Step 5, network update: adopt a threshold-gated scheme, i.e., when the highest confidence value among all particles falls below a threshold, update the network: combine the initial filter bank with the current foreground filter bank obtained during tracking, adding them with different weights to obtain the new convolutional network filters;

Step 6, template update: update the template with a template matching scheme.

Step 6 specifically includes:

Step 61, take the center point of the target in the first frame as the center and perform equal-size sampling within an offset range of ±1 to form the positive sample set.

Step 62, sample at two classes of distances, near and far, from the current-frame target to form the negative sample set; the positive template remains the same throughout the sequence; preset an update threshold, and update the template once when the threshold is reached.
Compared with the prior art, the present invention has the following advantages:

The method introduces a Gaussian kernel function to accelerate computation and adopts a simplified convolutional neural network, extracting deep abstract features of the target without the demanding runtime environment of deep learning algorithms. The first layer uses K-means on the first frame to extract normalized image patches as a filter bank for the simple-layer features of the target; the second layer stacks the simple cell feature maps into a complex feature map that encodes the local structural position information of the target, achieving robust tracking.

The present invention is further described below in conjunction with the accompanying drawings.

Brief Description of the Drawings

Fig. 1 is the processing flowchart of the convolutional neural network tracking method combined with a Gaussian kernel function according to the present invention.

Detailed Description of the Embodiments

As shown in Fig. 1, the convolutional neural network tracking method combined with a Gaussian kernel function disclosed in this embodiment specifically includes the following steps:
Step 1, initialization: set the parameters for per-frame image normalization, particle filtering, network size, and sample capacity; the parameters include the filter patch size w*w (6×6), the number of filters P = 100, the normalized image size n*n (32×32), the standard deviations of the particle filter's target state σx = 4, σy = 4, σs = 0.01, and N = 300 particles;

Step 2, initial filter extraction: for the target in the first frame (i.e., the initial frame), extract an initial filter bank by sliding-window sampling and K-means clustering for use as the filters of the subsequent network; this filter bank remains unchanged during tracking.
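For illustration, the following is a minimal sketch of this filter-bank extraction (steps 1 and 2), assuming a grayscale target patch, OpenCV for resizing, and scikit-learn for clustering; the function name extract_filter_bank is illustrative, and the defaults take the number of clusters d equal to the filter count P = 100.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_filter_bank(target, n=32, w=6, d=100, seed=0):
    """Cluster all w*w sliding-window patches of the normalized first-frame
    target into d centroids, which serve as the fixed convolution kernels."""
    img = cv2.resize(target, (n, n)).astype(np.float64)
    # L = (n - w + 1)^2 overlapping patches, each flattened to length w*w
    patches = np.stack([img[i:i + w, j:j + w].ravel()
                        for i in range(n - w + 1)
                        for j in range(n - w + 1)])
    # remove each patch's mean so clustering reflects structure, not brightness
    patches -= patches.mean(axis=1, keepdims=True)
    km = KMeans(n_clusters=d, n_init=5, random_state=seed).fit(patches)
    return km.cluster_centers_.reshape(d, w, w)
```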
Step 3, according to the convolutional neural network structure, first extract the deep abstract features of each candidate sample, then accelerate the convolution computation with a Gaussian kernel function. This specifically includes:

Step 31, simple-layer feature extraction: normalize the input image frame to n*n by preprocessing, and sample the target region with a sliding window of size w*w (here w = 6) to obtain an image patch set X of length L;

Step 32, cluster the L = (n−w+1)×(n−w+1) image patches (this L is the same as in step 31) with k-means to obtain d patch filters used as convolution kernels, denoted F^o = {F_1^o, ..., F_d^o};
Step 33, the response to an input image I is given by formula (1):

S_i = F_i^o ⊗ I, i = 1, ..., d (1)

where S_i is the first-layer convolution result, F_i^o is the i-th convolution kernel, and ⊗ denotes convolution;
Step 34, randomly sample the region around the target to obtain l samples, and likewise apply k-means clustering to obtain the background templates of the image, F^b = {F_1^b, ..., F_d^b};

Step 35, process the background templates of all images with mean pooling to obtain the average background:

F̄^b = (1/d) Σ_{i=1..d} F_i^b

where F^b is the background convolution kernel, the superscript b marks the background, d is the total number of background templates obtained, and m is the parameter of the average pooling operation;

Step 36, the simple-layer feature is given by formula (2):

S_i = F_i^o ⊗ I − F̄^b ⊗ I (2)
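A sketch of the simple-layer response of formulas (1) and (2) under the reconstruction above (foreground response minus average-background response), assuming SciPy; valid-mode correlation yields the (n−w+1)×(n−w+1) maps, and the function name simple_layer is illustrative.

```python
import numpy as np
from scipy.signal import correlate2d

def simple_layer(img, fg_filters, bg_mean):
    """img: n*n normalized frame; fg_filters: (d, w, w) foreground kernels
    F_i^o; bg_mean: (w, w) average background template.  Returns the d
    simple-layer maps S_i, each of size (n-w+1, n-w+1)."""
    bg_resp = correlate2d(img, bg_mean, mode='valid')
    return np.stack([correlate2d(img, f, mode='valid') - bg_resp
                     for f in fg_filters])
```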
Step 37, complex-layer feature extraction: stack the d simple-layer feature maps into a three-dimensional tensor representing the complex-layer feature of the target, denoted C ∈ R^((n−w+1)×(n−w+1)×d);

Step 38, represent the feature sparsely to obtain the sparse representation c of the feature tensor C;

Step 39, obtain the target feature representation with the soft-shrinking method, as in formula (3):

c = sgn(vec(C)) ⊙ max(0, |vec(C)| − λ) (3)
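A sketch of the soft-shrinking of formula (3); taking the threshold λ as the median of |vec(C)| is an assumption, since the text does not specify how λ is chosen.

```python
import numpy as np

def soft_shrink(C):
    """Sparsify the complex-layer tensor C by soft-thresholding, formula (3)."""
    x = C.ravel()                          # vec(C)
    lam = np.median(np.abs(x))             # assumed data-driven threshold
    c = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
    return c.reshape(C.shape)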
Step 310, perform the convolution computation with a Gaussian kernel function, as in formula (4):

k(x, x′) = exp(−(1/σ²)(‖x‖² + ‖x′‖² − 2F⁻¹(x̂* ⊙ x̂′))) (4)

where * denotes the complex conjugate, ⊙ the element-wise product, F⁻¹ the inverse Fourier transform, and k(x, x′) the Gaussian kernel function;

Step 311, let φ(x) be a mapping into a high-dimensional kernel Hilbert space; the weight of the kernel regression can then be expressed as v = Σ_i α_i φ(x_i), where α is the coefficient vector with elements α_i;

Step 312, the parameter to be solved thus changes from v to α, and the closed-form solution of the kernelized regularized least squares (KRLS) classifier can be expressed as:

α = (K + λI)⁻¹ y (5)

where K is the kernel matrix with elements K_ij = k(x_i, x_j), I is the identity matrix, and y is the vector with elements y_i.

Since K is a circulant matrix, formula (5) can be transformed into the DFT domain,

α̂ = ŷ / (k̂ + λ) (6)

where k̂ is the vector formed by the first row of the kernel matrix K, and the symbol ∧ denotes the Fourier transform; the closed-form solution of the KRLS classifier can thus be obtained quickly with the FFT.
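A sketch of formulas (4) through (6): the Gaussian kernel correlation evaluated with FFTs, and the KRLS coefficients solved element-wise in the DFT domain, following the standard kernelized correlation filter derivation. The regression target y is not specified in the text, so a caller-supplied label map is assumed; sigma and lam are illustrative defaults.

```python
import numpy as np

def gaussian_kernel_correlation(x, xp, sigma=0.5):
    """k(x, x') over all cyclic shifts via the convolution theorem, formula (4)."""
    cross = np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(xp)).real
    d2 = (x ** 2).sum() + (xp ** 2).sum() - 2.0 * cross   # squared distances
    return np.exp(-np.maximum(d2, 0.0) / (sigma ** 2 * x.size))

def train_krls(x, y, lam=1e-4):
    """Closed-form coefficients alpha_hat = y_hat / (k_hat + lambda), formula (6)."""
    k = gaussian_kernel_correlation(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)
```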
Step 4, feature matching and localization: perform feature matching and localization within the particle filter tracking framework to track the target, which specifically includes:

Step 41, let the total observation sequence at frame t be O_t = {o_1, ..., o_t}; according to Bayesian theory, find the maximum of the posterior probability p,

p(S_t | O_t) ∝ p(O_t | S_t) ∫ p(S_t | S_{t−1}) p(S_{t−1} | O_{t−1}) dS_{t−1} (7)

where S_t = [x_t, y_t, s_t]^T, with x_t, y_t the position of the target and s_t the scale parameter; p(S_t | S_{t−1}) is the motion model and p(O_t | S_t) is the observation model.
Step 42, for the motion model p(S_t | S_{t−1}), assume the target state parameters are mutually independent and each described by a Gaussian distribution; the motion model is then Brownian motion,

p(S_t | S_{t−1}) = N(S_t; S_{t−1}, Σ_S) (8)

where Σ_S = diag(σx, σy, σs) is the diagonal covariance matrix;

Step 43, for the observation model p(O_t | S_t), compute it by measuring the similarity between each candidate sample and the target template;

Step 44, finally track the target according to formula (9).
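A sketch of one particle-filter iteration of step 4: Brownian propagation of the [x, y, s] states per formula (8), weighting by a confidence function, and selection of the highest-confidence particle as the MAP estimate sought in step 41. The confidence function, which should score the similarity between a candidate's convolutional feature and the target template, is left abstract here.

```python
import numpy as np

def particle_filter_step(particles, confidence, sx=4.0, sy=4.0, ss=0.01,
                         rng=np.random.default_rng(0)):
    """particles: (N, 3) array of states [x, y, s]; confidence: state -> float."""
    # formula (8): propagate with independent Gaussians, Sigma = diag(sx, sy, ss)
    particles = particles + rng.normal(0.0, [sx, sy, ss], particles.shape)
    w = np.array([confidence(p) for p in particles])
    w = w / w.sum()                              # normalized particle weights
    best = particles[np.argmax(w)]               # MAP estimate of the target state
    idx = rng.choice(len(particles), size=len(particles), p=w)   # resample
    return particles[idx], best
```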
After step 4, if the frame just processed is the last frame, output the result; otherwise, proceed to steps 5 and 6 in sequence;

Step 5, network update: adopt a threshold-gated scheme, i.e., when the highest confidence value among all particles falls below a threshold, update the network: combine the initial filter bank with the current foreground filter bank obtained during tracking, adding them with different weights to obtain the new convolutional network filters;
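A sketch of this threshold-gated network update. The text states only that the initial and the current foreground filter banks are added with different weights when the best particle confidence drops below a threshold; the blending weight eta and threshold tau here are illustrative assumptions.

```python
import numpy as np

def maybe_update_filters(filters, init_filters, cur_fg_filters,
                         best_conf, tau=0.5, eta=0.15):
    """Blend the fixed initial bank with the current foreground bank when
    tracking confidence is low; otherwise keep the bank unchanged."""
    if best_conf < tau:
        return (1.0 - eta) * init_filters + eta * cur_fg_filters
    return filters
```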
Step 6, template update: update the template with a template matching scheme, which specifically includes:

Step 61, take the center point of the target in the first frame as the center and perform equal-size sampling within an offset range of ±1 to form the positive sample set.

Step 62, sample at two classes of distances, near and far, from the current-frame target to form the negative sample set; the positive template remains the same throughout the sequence; preset an update threshold, and update the template once when the threshold is reached.
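A sketch of the positive/negative sampling of steps 61 and 62. The ±1 positive offsets follow the text; the near/far radius bands and the sample counts for negatives are illustrative assumptions.

```python
import numpy as np

def sample_template_centers(center, rng=np.random.default_rng(0)):
    """Return positive centers (±1 offsets) and negative centers drawn from
    two distance bands (near, far) around the target center."""
    cx, cy = center
    pos = [(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    neg = []
    for r_lo, r_hi in ((8.0, 16.0), (24.0, 48.0)):   # assumed near/far bands
        for _ in range(10):
            r = rng.uniform(r_lo, r_hi)
            a = rng.uniform(0.0, 2.0 * np.pi)
            neg.append((cx + r * np.cos(a), cy + r * np.sin(a)))
    return pos, neg
```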
The above shows and describes the preferred embodiments of the present invention. It should be understood that the present invention is not limited to the forms disclosed herein, which should not be regarded as excluding other embodiments; rather, the invention can be used in various other combinations, modifications, and environments, and can be altered within the scope of the inventive concept herein through the above teachings or through skill or knowledge of the related art. Alterations and changes made by those skilled in the art that do not depart from the spirit and scope of the present invention shall fall within the protection scope of the appended claims.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711127478.0A CN108021869A (en) | 2017-11-15 | 2017-11-15 | A kind of convolutional neural networks tracking of combination gaussian kernel function |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711127478.0A CN108021869A (en) | 2017-11-15 | 2017-11-15 | A kind of convolutional neural networks tracking of combination gaussian kernel function |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108021869A true CN108021869A (en) | 2018-05-11 |
Family
ID=62080519
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711127478.0A Pending CN108021869A (en) | 2017-11-15 | 2017-11-15 | A kind of convolutional neural networks tracking of combination gaussian kernel function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108021869A (en) |
- 2017-11-15: application CN201711127478.0A filed in CN; published as CN108021869A (status: pending)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105447458A (en) * | 2015-11-17 | 2016-03-30 | 深圳市商汤科技有限公司 | Large scale crowd video analysis system and method thereof |
WO2017165801A1 (en) * | 2016-03-24 | 2017-09-28 | The Regents Of The University Of California | Deep-learning-based cancer classification using a hierarchical classification framework |
Non-Patent Citations (2)
Title |
---|
KAIHUA ZHANG et al.: "Robust Visual Tracking via Convolutional Networks Without Training", IEEE Transactions on Image Processing * |
WANG Hongxiang et al.: "Convolutional neural network tracking algorithm with a Gaussian kernel function", CAAI Transactions on Intelligent Systems * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629327A (en) * | 2018-05-15 | 2018-10-09 | 北京环境特性研究所 | A kind of demographic method and device based on image procossing |
CN110363796A (en) * | 2018-05-25 | 2019-10-22 | 哈尔滨工程大学 | Forward-looking sonar moving target tracking method based on lightweight convolutional neural network structure |
CN110880034A (en) * | 2018-09-06 | 2020-03-13 | 三星电子株式会社 | Computing device using convolutional neural network and method of operating the same |
CN109508489A (en) * | 2018-11-07 | 2019-03-22 | 山东大学 | A kind of modeling method and system of anisotropy porous structure |
CN109584275A (en) * | 2018-11-30 | 2019-04-05 | 哈尔滨理工大学 | A kind of method for tracking target, device, equipment and storage medium |
CN109917818A (en) * | 2019-01-31 | 2019-06-21 | 天津大学 | Collaborative search and containment method based on ground robot |
CN109917818B (en) * | 2019-01-31 | 2021-08-13 | 天津大学 | Collaborative search and containment method based on ground robot |
CN110135500A (en) * | 2019-05-17 | 2019-08-16 | 南京大学 | A multi-scene target tracking method based on adaptive deep feature filter |
CN113189634A (en) * | 2021-03-02 | 2021-07-30 | 四川新先达测控技术有限公司 | Gaussian-like forming method |
CN113838088A (en) * | 2021-08-30 | 2021-12-24 | 哈尔滨工业大学 | A depth tensor-based hyperspectral video target tracking method |
CN113838088B (en) * | 2021-08-30 | 2025-06-13 | 哈尔滨工业大学 | A hyperspectral video target tracking method based on deep tensor |
CN114063073A (en) * | 2021-10-09 | 2022-02-18 | 中国人民解放军63921部队 | Space high maneuvering target tracking method based on depth correlation |
CN117574136A (en) * | 2024-01-16 | 2024-02-20 | 浙江大学海南研究院 | Convolutional neural network calculation method based on multi-element Gaussian function space transformation |
CN117574136B (en) * | 2024-01-16 | 2024-05-10 | 浙江大学海南研究院 | Convolutional neural network calculation method based on multi-element Gaussian function space transformation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108021869A (en) | A kind of convolutional neural networks tracking of combination gaussian kernel function | |
CN105512683B (en) | Target localization method and device based on convolutional neural network | |
CN104537647B (en) | A kind of object detection method and device | |
CN108334848A (en) | A kind of small face identification method based on generation confrontation network | |
CN108090406B (en) | Face recognition method and system | |
CN106897675A (en) | The human face in-vivo detection method that binocular vision depth characteristic is combined with appearance features | |
CN104751108A (en) | Face image recognition device and face image recognition method | |
CN104933414A (en) | Living body face detection method based on WLD-TOP (Weber Local Descriptor-Three Orthogonal Planes) | |
CN104143076B (en) | The matching process of face shape and system | |
CN104036255A (en) | Facial expression recognition method | |
Li et al. | Face spoofing detection with image quality regression | |
CN111027377B (en) | A dual-stream neural network time-series action localization method | |
CN113205002B (en) | Low-definition face recognition method, device, equipment and medium for unlimited video monitoring | |
CN103902978A (en) | Face detection and identification method | |
CN111353385A (en) | A method and device for pedestrian re-identification based on mask alignment and attention mechanism | |
WO2013075295A1 (en) | Clothing identification method and system for low-resolution video | |
CN105138951B (en) | Face portrait-photo recognition method based on graph model representation | |
CN109255339B (en) | Classification method based on self-adaptive deep forest human gait energy map | |
CN107220598A (en) | Iris Texture Classification based on deep learning feature and Fisher Vector encoding models | |
CN115359350A (en) | Group target formation change identification method based on graph model | |
CN110969101A (en) | A Face Detection and Tracking Method Based on HOG and Feature Descriptors | |
Wang et al. | Deep point cloud normal estimation via triplet learning | |
CN104715476B (en) | A kind of well-marked target detection method based on histogram power function fitting | |
CN115018886B (en) | Motion trajectory identification method, device, equipment and medium | |
CN110458064B (en) | Combining data-driven and knowledge-driven low-altitude target detection and recognition methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180511 |