A moving-target tracking method based on multi-feature fusion
Technical field
The invention discloses a moving-target tracking method based on multi-feature fusion, belonging to the field of computer vision.
Background art
Target tracking is a hot topic in the field of computer vision and is widely applied in video surveillance, robot learning, industrial intelligence, and other areas. Its essence is to find the position and state of a target in a continuous sequence of video images. Although remarkable progress has been made in target tracking, it remains a challenging problem owing to the influence of many factors such as occlusion, illumination variation, and scale variation.
In recent years, owing to the remarkable results of correlation filtering, many scholars have introduced correlation filters into target tracking frameworks. In correlation-filter tracking algorithms, the choice of features has a great influence on tracking performance. The Minimum Output Sum of Squared Error (MOSSE) algorithm proposed by Bolme et al. tracks using only a grayscale feature. Henriques et al. extended the earlier single-channel grayscale feature to multiple channels and proposed the Kernelized Correlation Filter (KCF) algorithm, which tracks the target with Histogram of Oriented Gradients (HOG) features and improves tracking accuracy. Danelljan et al. added a color feature to the algorithm and applied Principal Component Analysis (PCA) to reduce the dimensionality of the Color Names (CN) feature, achieving good results on color image sequences. On the basis of MOSSE, Danelljan M et al. proposed the DSST algorithm, which builds a scale pyramid with HOG features to estimate the target scale. All of the above algorithms describe the target with only a single feature, which cannot express the target comprehensively, so their tracking performance differs greatly across scenes. In addition, they all update the filter model at a fixed rate frame by frame; but tracking conditions differ from frame to frame, so erroneous information is easily added to the target model, causing subsequent tracking to fail.
Summary of the invention
The technical problem to be solved by the present invention is to provide a moving-target tracking method based on multi-feature fusion, in order to overcome the defect that describing the target with a single feature cannot express it comprehensively, so that tracking performance differs greatly across scenes, and to solve the problem that updating the filter model at a fixed rate frame by frame easily adds erroneous information to the target model and causes tracking failure.
The technical solution adopted by the present invention is as follows: in a moving-target tracking method based on multi-feature fusion, the conventional single-feature target description and model-update scheme in target tracking is improved into a multi-feature fusion and selective model-update method. First, in the first frame, the target region is initialized and two position filters are trained, one with the Histogram of Oriented Gradients (HOG) feature and one with the Color Names (CN) feature. Second, in the target region of each new frame, the two features are extracted to obtain two detection samples, and the correlation score between each detection sample and the corresponding position filter trained in the previous step is computed, yielding a response map for each feature. Then, according to the peak-to-sidelobe ratios of the two response maps, the two feature responses are fused by weighting, and the point of maximum response is taken as the current target center. Next, a scale pyramid is built with HOG features to train a scale filter, and the point of maximum scale response gives the current target scale. Finally, from the peak-to-sidelobe ratio of the final response map of each frame, it is judged whether occlusion occurs; if the target is occluded, the position filter is not updated.
The specific steps of the method are as follows:
Step1: initialize the target and select the target region;
Step2: extract the histogram-of-oriented-gradients feature of the target region as one training sample and the color feature of the target region as another training sample, and train a position filter model with each of the two training samples;
Step3: extract the two features in the target region of a new frame to obtain two detection samples, and compute the correlation score between each detection sample and the corresponding position filter trained in the previous step, obtaining a response map for each feature;
Step4: compute the peak-to-sidelobe ratio of each feature's response map, fuse the two feature responses by weighting accordingly, and take the location of the maximum fused response as the current target position;
Step5: in the current target region, build a scale pyramid with HOG features to train a scale filter, and take the scale of maximum response as the current target scale;
Step6: update the scale filter model;
Step7: judge from the peak-to-sidelobe ratio of the final position response map of each frame whether the target is occluded; if it is, repeat Steps 3 to 6; if no occlusion occurs, go to Step 8;
Step8: update the position filter model;
Step9: repeat Steps 3 to 8 until tracking ends.
The specific steps of Step1 are as follows:
Step1.1: in the first frame of the input image, centered on the target position, collect an image block P whose size is twice the target size.
The specific steps of Step2 are as follows:
Step2.1: the position filters obtained by training on the different target features are applied in the same way and on the same principle, so the following description takes the HOG feature as an example. The HOG feature f of P is extracted as the training sample, where the feature has d dimensions and f^l is its l-th dimension, l ∈ {1, ..., d}. The goal of training is to find the optimal position filter h that minimizes the squared error between the output on the input sample and the desired output. h consists of one filter h^l per feature dimension and is obtained by minimizing the following squared error:

ε = ‖ Σ_{l=1}^{d} h^l ∗ f^l − g ‖² + τ Σ_{l=1}^{d} ‖ h^l ‖²   (1)

where g denotes the desired output of the filter h, τ is the regularization parameter, and ∗ denotes circular correlation. The minimum of (1) has the following closed-form solution in the frequency domain:

H^l = (Ḡ F^l) / (Σ_{k=1}^{d} F̄^k F^k + τ) = A^l / (B + τ)   (2)

where H^l, G, and F are the frequency-domain descriptions of h^l, g, and f, respectively; Ḡ and F̄ denote the complex conjugates of G and F; F^k is the k-th dimension of F and F̄^k its conjugate; and A^l = Ḡ F^l and B = Σ_{k=1}^{d} F̄^k F^k are the numerator and denominator of the filter h.
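The closed-form training of formula (2) can be sketched in NumPy as follows. This is an illustrative sketch rather than the exact implementation of the invention: the function name, the array layout (one 2-D map per feature channel), and the Gaussian-shaped desired output are assumptions.

```python
import numpy as np

def train_filter(f, g):
    """Closed-form training of a d-channel correlation filter (formula (2)).

    f : (d, H, W) training sample, one 2-D map per feature dimension
    g : (H, W) desired output (typically a Gaussian peaked on the target)
    Returns the numerator A^l, shape (d, H, W), and the denominator B, shape (H, W).
    """
    F = np.fft.fft2(f, axes=(-2, -1))        # per-channel frequency-domain description
    G = np.fft.fft2(g)
    A = np.conj(G)[None, :, :] * F           # A^l = conj(G) * F^l
    B = np.sum(F * np.conj(F), axis=0).real  # B = sum_k conj(F^k) * F^k  (real, >= 0)
    return A, B
```

The regularization τ is added to B at detection time, as in formula (3).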
The specific steps of Step3 are as follows:
Step3.1: the above computation yields the position filter model and completes the training of the position filter. Detection then proceeds as follows: the HOG feature is extracted from the target region of the new frame as the detection sample z, and the correlation score y between z and the filter h trained earlier with the HOG feature is computed, giving the response map of this feature:

y = F⁻¹{ Σ_{l=1}^{d} Ā^l Z^l / (B + τ) }   (3)

where Ā^l denotes the conjugate of A^l, Z is the frequency-domain description of z, Z^l is the l-th dimension of Z, l ∈ {1, ..., d}, and F⁻¹ denotes the inverse Fourier transform. The filter responses obtained by tracking the target with the CN and HOG features are denoted y_{t,cn} and y_{t,hog}, respectively.
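Correspondingly, the detection step of formula (3) can be sketched as below, under the same illustrative assumptions about array layout; A and B are the numerator and denominator obtained from training.

```python
import numpy as np

def detect(A, B, z, tau=1e-2):
    """Response map of formula (3) for a detection sample z of shape (d, H, W)."""
    Z = np.fft.fft2(z, axes=(-2, -1))
    num = np.sum(np.conj(A) * Z, axis=0)        # sum_l conj(A^l) * Z^l
    y = np.real(np.fft.ifft2(num / (B + tau)))  # inverse FFT yields the spatial response
    return y
```

Evaluating this on the training sample itself returns approximately the desired output g, which is a convenient sanity check.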
The specific steps of Step4 are as follows:
Step4.1: in frame t, compute the peak-to-sidelobe ratios of the CN and HOG feature response maps, denoted PSR_{t,cn} and PSR_{t,hog};
Step4.2: from the two peak-to-sidelobe ratios, compute the normalized weights of the CN and HOG features in frame t, w_{t,cn} and w_{t,hog};
Step4.3: fuse the features at the response level. In frame t, the responses of the two position filters trained with the CN and HOG features are denoted y_{t,cn} and y_{t,hog}, and the fused response y_t is obtained by the following weighting:

y_t = w_{t,cn} × y_{t,cn} + w_{t,hog} × y_{t,hog}   (6)

Step4.4: compute the maximum of y_t to obtain the final target position in frame t.
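The PSR-weighted fusion of Step4 can be sketched as follows. The size of the window excluded around the peak and the normalization of each weight by the sum of the two PSRs are illustrative assumptions, since the text does not specify them.

```python
import numpy as np

def psr(y, exclude=2):
    """Peak-to-sidelobe ratio: (peak - sidelobe mean) / sidelobe std.
    The sidelobe is the response map excluding a small window around the peak."""
    peak = y.max()
    r, c = np.unravel_index(np.argmax(y), y.shape)
    mask = np.ones(y.shape, dtype=bool)
    mask[max(0, r - exclude):r + exclude + 1,
         max(0, c - exclude):c + exclude + 1] = False
    side = y[mask]
    return (peak - side.mean()) / (side.std() + 1e-12)

def fuse(y_cn, y_hog):
    """Weight each response map by its normalized PSR (assumed PSR/sum-of-PSRs)
    and return the fused map together with the position of its maximum."""
    p_cn, p_hog = psr(y_cn), psr(y_hog)
    w_cn = p_cn / (p_cn + p_hog)
    w_hog = 1.0 - w_cn
    y = w_cn * y_cn + w_hog * y_hog            # formula (6)
    pos = np.unravel_index(np.argmax(y), y.shape)
    return y, pos
```

A sharply peaked response map (high PSR) thus dominates the fusion, while a flat, low-confidence map contributes little.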
The specific steps of Step5 are as follows:
Step5.1: after the target position is determined, 33 image layers of a scale pyramid are cropped centered on the new target position, and the HOG features of these image layers are extracted to train a scale filter H_s for estimating the target scale. Since the scale filter is applied in the same way and on the same principle as the position filter h, H_s can be computed by the method of formula (2);
Step5.2: in a new frame, to obtain the target scale, the scale response y_s is computed using formula (3) and its maximum is found, determining the current target scale.
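Building the 33-layer scale pyramid of Step5.1 can be sketched as below; the scale step of 1.02 per layer and the edge-replication padding at image borders are illustrative assumptions not specified in the text.

```python
import numpy as np

def scale_samples(img, cx, cy, w, h, n_scales=33, step=1.02):
    """Crop n_scales patches centered at (cx, cy) whose sizes are step**n times
    the current target size (w, h), for n = -(S-1)/2 .. (S-1)/2."""
    exps = np.arange(n_scales) - (n_scales - 1) / 2
    patches = []
    for a in step ** exps:
        pw, ph = max(2, int(round(w * a))), max(2, int(round(h * a)))
        x0, y0 = int(round(cx - pw / 2)), int(round(cy - ph / 2))
        # pad by edge replication wherever the crop would leave the image
        padded = np.pad(img,
                        ((max(0, -y0), max(0, y0 + ph - img.shape[0])),
                         (max(0, -x0), max(0, x0 + pw - img.shape[1]))),
                        mode='edge')
        patches.append(padded[max(0, y0):max(0, y0) + ph,
                              max(0, x0):max(0, x0) + pw])
    return patches, step ** exps
```

In a full tracker, each patch would then be resized to a common template size before HOG extraction, so that the scale filter sees fixed-length samples.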
The specific steps of Step6 are as follows:
Step6.1: the scale filter model is updated with a fixed learning rate η:

A_{t,s}^l = (1 − η) A_{t−1,s}^l + η Ḡ_{t,s} F_{t,s}^l
B_{t,s} = (1 − η) B_{t−1,s} + η Σ_{k=1}^{d} F̄_{t,s}^k F_{t,s}^k

where the scale filter is updated in every frame; A_{t,s}^l and B_{t,s} denote the l-th dimension of the numerator and the denominator of the scale filter in frame t, and A_{t−1,s}^l and B_{t−1,s} those of the scale filter model of the previous frame; Ḡ_{t,s} denotes the conjugate of the frequency-domain description of the desired output of the scale filter in frame t; F_{t,s}^l denotes the frequency-domain description of the l-th dimension of the training sample of the scale filter in frame t; and F̄_{t,s}^k denotes the conjugate of the frequency-domain description of the k-th dimension of that training sample, k ∈ {1, ..., d}.
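The fixed-learning-rate update of Step6.1 amounts to a linear interpolation of the filter's numerator and denominator; a minimal sketch follows, in which the value of η is an illustrative assumption.

```python
import numpy as np

def update_filter(A_prev, B_prev, A_new, B_new, eta=0.025):
    """Running-average update of the filter's numerator and denominator.
    Applied every frame for the scale filter; for the position filter it is
    applied only on frames judged not occluded (Step7/Step8)."""
    A = (1.0 - eta) * A_prev + eta * A_new
    B = (1.0 - eta) * B_prev + eta * B_new
    return A, B
```

A small η makes the model change slowly, which stabilizes tracking but adapts more slowly to appearance change.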
The specific steps of Step7 are as follows:
Step7.1: the PSR value is the criterion for judging target occlusion and is used to decide whether the position filter model needs to be updated. If occlusion occurs, the position filter model is not updated; otherwise, the position filter model is updated. This reduces the influence of occlusion on target tracking.
The specific steps of Step8 are as follows:
Step8.1: when the target is judged not to be occluded, the position filter model is updated with a fixed learning rate η:

A_t^l = (1 − η) A_{t−1}^l + η Ḡ_t F_t^l
B_t = (1 − η) B_{t−1} + η Σ_{k=1}^{d} F̄_t^k F_t^k

where A_t^l and B_t denote the l-th dimension of the numerator and the denominator of the position filter h in frame t, and A_{t−1}^l and B_{t−1} those of the position filter model of the previous frame; Ḡ_t denotes the conjugate of the frequency-domain description of the desired output of the position filter h in frame t; F_t^l denotes the frequency-domain description of the l-th dimension of the training sample of h in frame t; and F̄_t^k denotes the conjugate of the frequency-domain description of the k-th dimension of that training sample, k ∈ {1, ..., d}.
The specific steps of Step9 are as follows:
Step9.1: at this point the algorithm has finished running on the second frame, and the target position, the target scale, and all filter models have been updated. In each subsequent frame, Steps 3 to 8 are rerun until the video ends.
The beneficial effects of the present invention are:
1. A moving-target tracking method using multi-feature fusion
If the target is described with only a single feature (the HOG feature or a color feature), limitations arise. The HOG feature is a local feature of the image and adapts well to slight deformation and illumination variation of the target, but when the target undergoes large deformation or occlusion, mistracking or target loss may occur. The color feature, an important perceptual cue by which humans recognize images, is a pixel-based global feature that is insensitive to rotation, translation, and scale variation of the target, but it cannot describe the target's local structure well and cannot adapt to illumination variation. For this reason, the present invention fuses the two features to describe the target model, obtaining the target's local features along with its global features and improving the accuracy of target detection.
2. The target tracking method is realized with a selective model-update strategy
The present invention is based on a correlation-filter target tracking algorithm. A typical correlation-filter tracker updates the target model at a fixed rate frame by frame; if the target is occluded, continuing to update the model adds incorrect information to it, which can lead to tracking failure. To improve tracking performance, a strategy of updating only when a certain condition is met is proposed: whether to update the model is decided by judging whether the target is occluded. This reduces the influence of occlusion on target tracking and thereby improves the stability of the algorithm.
3. The target scale is estimated by building a scale pyramid and training a scale filter
If the tracking box is fixed during motion, it can capture only part of the target when the target grows larger, and it easily introduces interfering background information when the target grows smaller, which affects the tracking precision of the algorithm. To solve this problem, the present invention estimates the target scale by building a scale pyramid and training a scale filter, which addresses the problem of moving-target scale variation and greatly reduces the erroneous information caused by a fixed tracking box during target tracking.
In short, the moving-target tracking method based on multi-feature fusion combines the attribute information of multiple features, describing the target with multiple features and updating the model selectively. First, multiple features describe the target more completely, obtaining the target's local features along with its global features and improving the accuracy of target detection. Second, the target scale is updated adaptively by building a scale pyramid. Finally, the peak-to-sidelobe ratio of the response map is used to update the target model adaptively, improving the validity of the model.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention.
Specific embodiment
The present invention is further described below with reference to the drawings and specific embodiments.
Embodiment 1: as shown in Fig. 1, the specific steps of the moving-target tracking method based on multi-feature fusion are as follows:
Step1: initialize the target and select the target region;
Step2: extract the Histogram of Oriented Gradients (HOG) feature of the target region as one training sample, and extract the Color Names (CN) feature of the target region as another training sample; train a position filter model with each of the two training samples;
Step3: extract the two features in the target region of a new frame to obtain two detection samples, and compute the correlation score between each detection sample and the corresponding position filter trained in the previous step, obtaining a response map for each feature;
Step4: compute the peak-to-sidelobe ratio of each feature's response map, fuse the two feature responses by weighting accordingly, and take the location of the maximum fused response as the current target position;
Step5: in the current target region, build a scale pyramid with HOG features to train a scale filter, and take the scale of maximum response as the current target scale;
Step6: update the scale filter model;
Step7: judge from the peak-to-sidelobe ratio of the final position response map of each frame whether the target is occluded; if it is, repeat Steps 3 to 6; if no occlusion occurs, go to Step 8;
Step8: update the position filter model;
Step9: repeat Steps 3 to 8 until tracking ends.
The specific steps of Step1 are as follows:
Step1.1: in the first frame of the input image, centered on the target position, collect an image block P whose size is twice the target size.
The specific steps of Step2 are as follows:
Step2.1: the position filters obtained by training on the different target features are applied in the same way and on the same principle. The HOG feature comprises 27 gradient dimensions plus one grayscale dimension, 28 dimensions in total, and the CN feature is reduced from 11 dimensions to 2. The following description takes the HOG feature as an example. The HOG feature f of P is extracted as the training sample, where the feature has d dimensions and f^l is its l-th dimension, l ∈ {1, ..., d}. The goal of training is to find the optimal position filter h that minimizes the squared error between the output on the input sample and the desired output. h consists of one filter h^l per feature dimension and is obtained by minimizing the following squared error:

ε = ‖ Σ_{l=1}^{d} h^l ∗ f^l − g ‖² + τ Σ_{l=1}^{d} ‖ h^l ‖²   (1)

where g denotes the desired output of the filter h, τ is the regularization parameter, and ∗ denotes circular correlation. The minimum of (1) has the following closed-form solution in the frequency domain:

H^l = (Ḡ F^l) / (Σ_{k=1}^{d} F̄^k F^k + τ) = A^l / (B + τ)   (2)

where H^l, G, and F are the frequency-domain descriptions of h^l, g, and f, respectively; Ḡ and F̄ denote the complex conjugates of G and F; F^k is the k-th dimension of F and F̄^k its conjugate; and A^l = Ḡ F^l and B = Σ_{k=1}^{d} F̄^k F^k are the numerator and denominator of the filter h.
The specific steps of Step3 are as follows:
Step3.1: the above computation yields the position filter model and completes the training of the position filter. Detection then proceeds as follows: the HOG feature is extracted from the target region of the new frame as the detection sample z, and the correlation score y between z and the filter h trained earlier with the HOG feature is computed, giving the response map of this feature:

y = F⁻¹{ Σ_{l=1}^{d} Ā^l Z^l / (B + τ) }   (3)

where Ā^l denotes the conjugate of A^l, Z is the frequency-domain description of z, Z^l is the l-th dimension of Z, l ∈ {1, ..., d}, and F⁻¹ denotes the inverse Fourier transform. The filter responses obtained by tracking the target with the CN and HOG features are denoted y_{t,cn} and y_{t,hog}, respectively.
The specific steps of Step4 are as follows:
Step4.1: in frame t, compute the peak-to-sidelobe ratios of the CN and HOG feature response maps, denoted PSR_{t,cn} and PSR_{t,hog};
Step4.2: from the two peak-to-sidelobe ratios, compute the normalized weights of the CN and HOG features in frame t, w_{t,cn} and w_{t,hog};
Step4.3: fuse the features at the response level. In frame t, the responses of the two position filters trained with the CN and HOG features are denoted y_{t,cn} and y_{t,hog}, and the fused response y_t is obtained by the following weighting:

y_t = w_{t,cn} × y_{t,cn} + w_{t,hog} × y_{t,hog}   (6)

Step4.4: compute the maximum of y_t to obtain the final target position in frame t.
The specific steps of Step5 are as follows:
Step5.1: after the target position is determined, 33 image layers of a scale pyramid are cropped centered on the new target position, and the HOG features of these image layers are extracted to train a scale filter H_s for estimating the target scale. Since the scale filter is applied in the same way and on the same principle as the position filter h, H_s can be computed by the method of formula (2);
Step5.2: in a new frame, to obtain the target scale, the scale response y_s is computed using formula (3) and its maximum is found, determining the current target scale.
The specific steps of Step6 are as follows:
Step6.1: the scale filter model is updated with a fixed learning rate η:

A_{t,s}^l = (1 − η) A_{t−1,s}^l + η Ḡ_{t,s} F_{t,s}^l
B_{t,s} = (1 − η) B_{t−1,s} + η Σ_{k=1}^{d} F̄_{t,s}^k F_{t,s}^k

where the scale filter is updated in every frame; A_{t,s}^l and B_{t,s} denote the l-th dimension of the numerator and the denominator of the scale filter in frame t, and A_{t−1,s}^l and B_{t−1,s} those of the scale filter model of the previous frame; Ḡ_{t,s} denotes the conjugate of the frequency-domain description of the desired output of the scale filter in frame t; F_{t,s}^l denotes the frequency-domain description of the l-th dimension of the training sample of the scale filter in frame t; and F̄_{t,s}^k denotes the conjugate of the frequency-domain description of the k-th dimension of that training sample, k ∈ {1, ..., d}.
The specific steps of Step7 are as follows:
Step7.1: occlusion detection judges from the PSR value whether the target is occluded, in order to decide whether the position filter model is updated, reducing the influence of occlusion on target tracking. In frame t, the PSR is computed as:

PSR_t = (y_{t,max} − μ_t) / σ_t

where PSR_t denotes the peak-to-sidelobe ratio in frame t, y_{t,max} is the peak of the response map in frame t, and μ_t and σ_t are the mean and standard deviation of the region surrounding the peak-response position in frame t. The larger PSR_t is, the stronger the peak in the response distribution, and the higher the target confidence.
The specific steps of Step8 are as follows:
Step8.1: when the target is judged not to be occluded, the position filter model is updated with a fixed learning rate η:

A_t^l = (1 − η) A_{t−1}^l + η Ḡ_t F_t^l
B_t = (1 − η) B_{t−1} + η Σ_{k=1}^{d} F̄_t^k F_t^k

where A_t^l and B_t denote the l-th dimension of the numerator and the denominator of the position filter h in frame t, and A_{t−1}^l and B_{t−1} those of the position filter model of the previous frame; Ḡ_t denotes the conjugate of the frequency-domain description of the desired output of the position filter h in frame t; F_t^l denotes the frequency-domain description of the l-th dimension of the training sample of h in frame t; and F̄_t^k denotes the conjugate of the frequency-domain description of the k-th dimension of that training sample, k ∈ {1, ..., d}.
The specific steps of Step9 are as follows:
Step9.1: at this point the algorithm has finished running on the second frame, and the target position, the target scale, and all filter models have been updated. In each subsequent frame, Steps 3 to 8 are rerun until the video ends.
The present invention exploits the different properties of the HOG feature and the CN color feature, fusing the two to describe the target model; while obtaining the target's global features, it also obtains the target's local features, improving the accuracy of target detection. At the same time, according to the peak-to-sidelobe ratio of the final target response map of each frame, it judges whether the target is occluded in order to decide whether to update the model, reducing the influence of occlusion on target tracking and thereby improving the stability of the algorithm.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments; within the scope of knowledge possessed by those of ordinary skill in the art, various changes may also be made without departing from the concept of the present invention.