A moving-target tracking method based on multi-feature fusion
Technical field
The invention discloses a moving-target tracking method based on multi-feature fusion, belonging to the field of computer vision.
Background art
Target tracking is a hot topic in the field of computer vision and is widely applied in video surveillance, robot learning, industrial intelligence, and other areas. Its essence is to find the position and state of a target in a continuous sequence of video images. Although remarkable progress has been made in target tracking, it remains a challenging problem owing to the influence of many factors such as occlusion, illumination variation, and scale variation.
In recent years, owing to the remarkable results of correlation filtering, many scholars have introduced correlation filters into target tracking frameworks. In correlation-filter tracking algorithms, the choice of features has a great influence on tracking performance. The Minimum Output Sum of Squared Error (MOSSE) algorithm proposed by Bolme et al. tracks using only a grayscale feature. Henriques et al. extended the earlier single-channel grayscale feature to multiple channels and proposed the Kernelized Correlation Filter (KCF) algorithm, which tracks the target with Histogram of Oriented Gradients (HOG) features and improves tracking accuracy. Danelljan et al. added a color feature to the algorithm and applied Principal Component Analysis (PCA) to reduce the dimensionality of the Color Names (CN) feature, achieving good results on color image sequences. On the basis of MOSSE, Danelljan M et al. proposed the DSST algorithm, which builds a scale pyramid with HOG features to estimate the target scale. All of the above algorithms describe the target with only a single feature, which cannot express the target comprehensively, so their tracking performance differs greatly across scenes. In addition, they all update the filter model at a fixed rate frame by frame; but tracking conditions differ from frame to frame, so erroneous information is easily added to the target model, causing subsequent tracking to fail.
Summary of the invention
The technical problem to be solved by the present invention is to provide a moving-target tracking method based on multi-feature fusion, in order to overcome the defect that describing the target with a single feature cannot express it comprehensively, so that tracking performance differs greatly across scenes, and to solve the problem that updating the filter model at a fixed rate frame by frame easily adds erroneous information to the target model and causes tracking failure.
The technical solution adopted by the present invention is as follows: in a moving-target tracking method based on multi-feature fusion, the conventional single-feature target description and model-update scheme in target tracking is improved into a multi-feature fusion and selective model-update method. First, in the first frame, the target region is initialized and two position filters are trained, one with the Histogram of Oriented Gradients (HOG) feature and one with the Color Names (CN) feature. Second, in the target region of each new frame, the two features are extracted to obtain two detection samples, and the correlation score between each detection sample and the corresponding position filter trained in the previous step is computed, yielding a response map for each feature. Then, according to the peak-to-sidelobe ratios of the two response maps, the two feature responses are fused by weighting, and the point of maximum response is taken as the current target center. Next, a scale pyramid is built with HOG features to train a scale filter, and the point of maximum scale response gives the current target scale. Finally, from the peak-to-sidelobe ratio of the final response map of each frame, it is judged whether occlusion occurs; if the target is occluded, the position filter is not updated.
The specific steps of the method are as follows:
Step1: initialize the target and select the target region;
Step2: extract the histogram-of-oriented-gradients feature of the target region as one training sample and the color feature of the target region as another training sample, and train a position filter model with each of the two training samples;
Step3: extract the two features in the target region of a new frame to obtain two detection samples, and compute the correlation score between each detection sample and the corresponding position filter trained in the previous step, obtaining a response map for each feature;
Step4: compute the peak-to-sidelobe ratio of each feature's response map, fuse the two feature responses by weighting accordingly, and take the location of the maximum fused response as the current target position;
Step5: in the current target region, build a scale pyramid with HOG features to train a scale filter, and take the scale of maximum response as the current target scale;
Step6: update the scale filter model;
Step7: judge from the peak-to-sidelobe ratio of the final position response map of each frame whether the target is occluded; if it is, repeat Steps 3 to 6; if no occlusion occurs, go to Step 8;
Step8: update the position filter model;
Step9: repeat Steps 3 to 8 until tracking ends.
The specific steps of Step1 are as follows:
Step1.1: in the first frame of the input image, centered on the target position, collect an image block P whose size is twice the target size.
The specific steps of Step2 are as follows:
Step2.1: the position filters obtained by training on the different target features are applied in the same way and on the same principle, so the following description takes the HOG feature as an example. The HOG feature f of P is extracted as the training sample, where the feature has d dimensions and f^l is its l-th dimension, l ∈ {1, ..., d}. The goal of training is to find the optimal position filter h that minimizes the squared error between the output on the input sample and the desired output. h consists of one filter h^l per feature dimension and is obtained by minimizing the following squared error:

ε = ‖ Σ_{l=1}^{d} h^l ∗ f^l − g ‖² + τ Σ_{l=1}^{d} ‖ h^l ‖²   (1)

where g denotes the desired output of the filter h, τ is the regularization parameter, and ∗ denotes circular correlation. The minimum of (1) has the following closed-form solution in the frequency domain:

H^l = (Ḡ F^l) / (Σ_{k=1}^{d} F̄^k F^k + τ) = A^l / (B + τ)   (2)

where H^l, G, and F are the frequency-domain descriptions of h^l, g, and f, respectively; Ḡ and F̄ denote the complex conjugates of G and F; F^k is the k-th dimension of F and F̄^k its conjugate; and A^l = Ḡ F^l and B = Σ_{k=1}^{d} F̄^k F^k are the numerator and denominator of the filter h.
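The closed-form training of formula (2) can be sketched in NumPy as follows. This is an illustrative sketch rather than the exact implementation of the invention: the function name, the array layout (one 2-D map per feature channel), and the Gaussian-shaped desired output are assumptions.

```python
import numpy as np

def train_filter(f, g):
    """Closed-form training of a d-channel correlation filter (formula (2)).

    f : (d, H, W) training sample, one 2-D map per feature dimension
    g : (H, W) desired output (typically a Gaussian peaked on the target)
    Returns the numerator A^l, shape (d, H, W), and the denominator B, shape (H, W).
    """
    F = np.fft.fft2(f, axes=(-2, -1))        # per-channel frequency-domain description
    G = np.fft.fft2(g)
    A = np.conj(G)[None, :, :] * F           # A^l = conj(G) * F^l
    B = np.sum(F * np.conj(F), axis=0).real  # B = sum_k conj(F^k) * F^k  (real, >= 0)
    return A, B
```

The regularization τ is added to B at detection time, as in formula (3).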
The specific steps of Step3 are as follows:
Step3.1: the above computation yields the position filter model and completes the training of the position filter. Detection then proceeds as follows: the HOG feature is extracted from the target region of the new frame as the detection sample z, and the correlation score y between z and the filter h trained earlier with the HOG feature is computed, giving the response map of this feature:

y = F⁻¹{ Σ_{l=1}^{d} Ā^l Z^l / (B + τ) }   (3)

where Ā^l denotes the conjugate of A^l, Z is the frequency-domain description of z, Z^l is the l-th dimension of Z, l ∈ {1, ..., d}, and F⁻¹ denotes the inverse Fourier transform. The filter responses obtained by tracking the target with the CN and HOG features are denoted y_{t,cn} and y_{t,hog}, respectively.
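Correspondingly, the detection step of formula (3) can be sketched as below, under the same illustrative assumptions about array layout; A and B are the numerator and denominator obtained from training.

```python
import numpy as np

def detect(A, B, z, tau=1e-2):
    """Response map of formula (3) for a detection sample z of shape (d, H, W)."""
    Z = np.fft.fft2(z, axes=(-2, -1))
    num = np.sum(np.conj(A) * Z, axis=0)        # sum_l conj(A^l) * Z^l
    y = np.real(np.fft.ifft2(num / (B + tau)))  # inverse FFT yields the spatial response
    return y
```

Evaluating this on the training sample itself returns approximately the desired output g, which is a convenient sanity check.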
The specific steps of Step4 are as follows:
Step4.1: in frame t, compute the peak-to-sidelobe ratios of the CN and HOG feature response maps, denoted PSR_{t,cn} and PSR_{t,hog};
Step4.2: from the two peak-to-sidelobe ratios, compute the normalized weights of the CN and HOG features in frame t, w_{t,cn} and w_{t,hog};
Step4.3: fuse the features at the response level. In frame t, the responses of the two position filters trained with the CN and HOG features are denoted y_{t,cn} and y_{t,hog}, and the fused response y_t is obtained by the following weighting:

y_t = w_{t,cn} × y_{t,cn} + w_{t,hog} × y_{t,hog}   (6)

Step4.4: compute the maximum of y_t to obtain the final target position in frame t.
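The PSR-weighted fusion of Step4 can be sketched as follows. The size of the window excluded around the peak and the normalization of each weight by the sum of the two PSRs are illustrative assumptions, since the text does not specify them.

```python
import numpy as np

def psr(y, exclude=2):
    """Peak-to-sidelobe ratio: (peak - sidelobe mean) / sidelobe std.
    The sidelobe is the response map excluding a small window around the peak."""
    peak = y.max()
    r, c = np.unravel_index(np.argmax(y), y.shape)
    mask = np.ones(y.shape, dtype=bool)
    mask[max(0, r - exclude):r + exclude + 1,
         max(0, c - exclude):c + exclude + 1] = False
    side = y[mask]
    return (peak - side.mean()) / (side.std() + 1e-12)

def fuse(y_cn, y_hog):
    """Weight each response map by its normalized PSR (assumed PSR/sum-of-PSRs)
    and return the fused map together with the position of its maximum."""
    p_cn, p_hog = psr(y_cn), psr(y_hog)
    w_cn = p_cn / (p_cn + p_hog)
    w_hog = 1.0 - w_cn
    y = w_cn * y_cn + w_hog * y_hog            # formula (6)
    pos = np.unravel_index(np.argmax(y), y.shape)
    return y, pos
```

A sharply peaked response map (high PSR) thus dominates the fusion, while a flat, low-confidence map contributes little.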
The specific steps of Step5 are as follows:
Step5.1: after the target position is determined, 33 image layers of a scale pyramid are cropped centered on the new target position, and the HOG features of these image layers are extracted to train a scale filter H_s for estimating the target scale. Since the scale filter is applied in the same way and on the same principle as the position filter h, H_s can be computed by the method of formula (2);
Step5.2: in a new frame, to obtain the target scale, the scale response y_s is computed using formula (3) and its maximum is found, determining the current target scale.
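Building the 33-layer scale pyramid of Step5.1 can be sketched as below; the scale step of 1.02 per layer and the edge-replication padding at image borders are illustrative assumptions not specified in the text.

```python
import numpy as np

def scale_samples(img, cx, cy, w, h, n_scales=33, step=1.02):
    """Crop n_scales patches centered at (cx, cy) whose sizes are step**n times
    the current target size (w, h), for n = -(S-1)/2 .. (S-1)/2."""
    exps = np.arange(n_scales) - (n_scales - 1) / 2
    patches = []
    for a in step ** exps:
        pw, ph = max(2, int(round(w * a))), max(2, int(round(h * a)))
        x0, y0 = int(round(cx - pw / 2)), int(round(cy - ph / 2))
        # pad by edge replication wherever the crop would leave the image
        padded = np.pad(img,
                        ((max(0, -y0), max(0, y0 + ph - img.shape[0])),
                         (max(0, -x0), max(0, x0 + pw - img.shape[1]))),
                        mode='edge')
        patches.append(padded[max(0, y0):max(0, y0) + ph,
                              max(0, x0):max(0, x0) + pw])
    return patches, step ** exps
```

In a full tracker, each patch would then be resized to a common template size before HOG extraction, so that the scale filter sees fixed-length samples.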
The specific steps of Step6 are as follows:
Step6.1: the scale filter model is updated with a fixed learning rate η:

A_{t,s}^l = (1 − η) A_{t−1,s}^l + η Ḡ_{t,s} F_{t,s}^l
B_{t,s} = (1 − η) B_{t−1,s} + η Σ_{k=1}^{d} F̄_{t,s}^k F_{t,s}^k

where the scale filter is updated in every frame; A_{t,s}^l and B_{t,s} denote the l-th dimension of the numerator and the denominator of the scale filter in frame t, and A_{t−1,s}^l and B_{t−1,s} those of the scale filter model of the previous frame; Ḡ_{t,s} denotes the conjugate of the frequency-domain description of the desired output of the scale filter in frame t; F_{t,s}^l denotes the frequency-domain description of the l-th dimension of the training sample of the scale filter in frame t; and F̄_{t,s}^k denotes the conjugate of the frequency-domain description of the k-th dimension of that training sample, k ∈ {1, ..., d}.
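The fixed-learning-rate update of Step6.1 amounts to a linear interpolation of the filter's numerator and denominator; a minimal sketch follows, in which the value of η is an illustrative assumption.

```python
import numpy as np

def update_filter(A_prev, B_prev, A_new, B_new, eta=0.025):
    """Running-average update of the filter's numerator and denominator.
    Applied every frame for the scale filter; for the position filter it is
    applied only on frames judged not occluded (Step7/Step8)."""
    A = (1.0 - eta) * A_prev + eta * A_new
    B = (1.0 - eta) * B_prev + eta * B_new
    return A, B
```

A small η makes the model change slowly, which stabilizes tracking but adapts more slowly to appearance change.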
The specific steps of Step7 are as follows:
Step7.1: the PSR value is the criterion for judging target occlusion and is used to decide whether the position filter model needs to be updated. If occlusion occurs, the position filter model is not updated; otherwise, the position filter model is updated. This reduces the influence of occlusion on target tracking.
The specific steps of Step8 are as follows:
Step8.1: when the target is judged not to be occluded, the position filter model is updated with a fixed learning rate η:

A_t^l = (1 − η) A_{t−1}^l + η Ḡ_t F_t^l
B_t = (1 − η) B_{t−1} + η Σ_{k=1}^{d} F̄_t^k F_t^k

where A_t^l and B_t denote the l-th dimension of the numerator and the denominator of the position filter h in frame t, and A_{t−1}^l and B_{t−1} those of the position filter model of the previous frame; Ḡ_t denotes the conjugate of the frequency-domain description of the desired output of the position filter h in frame t; F_t^l denotes the frequency-domain description of the l-th dimension of the training sample of h in frame t; and F̄_t^k denotes the conjugate of the frequency-domain description of the k-th dimension of that training sample, k ∈ {1, ..., d}.
The specific steps of Step9 are as follows:
Step9.1: at this point the algorithm has finished running on the second frame, and the target position, the target scale, and all filter models have been updated. In each subsequent frame, Steps 3 to 8 are rerun until the video ends.
The beneficial effects of the present invention are:
1. A moving-target tracking method using multi-feature fusion
If the target is described with only a single feature (the HOG feature or a color feature), limitations arise. The HOG feature is a local feature of the image and adapts well to slight deformation and illumination variation of the target, but when the target undergoes large deformation or occlusion, mistracking or target loss may occur. The color feature, an important perceptual cue by which humans recognize images, is a pixel-based global feature that is insensitive to rotation, translation, and scale variation of the target, but it cannot describe the target's local structure well and cannot adapt to illumination variation. For this reason, the present invention fuses the two features to describe the target model, obtaining the target's local features along with its global features and improving the accuracy of target detection.
2. The target tracking method is realized with a selective model-update strategy
The present invention is based on a correlation-filter target tracking algorithm. A typical correlation-filter tracker updates the target model at a fixed rate frame by frame; if the target is occluded, continuing to update the model adds incorrect information to it, which can lead to tracking failure. To improve tracking performance, a strategy of updating only when a certain condition is met is proposed: whether to update the model is decided by judging whether the target is occluded. This reduces the influence of occlusion on target tracking and thereby improves the stability of the algorithm.
3. The target scale is estimated by building a scale pyramid and training a scale filter
If the tracking box is fixed during motion, it can capture only part of the target when the target grows larger, and it easily introduces interfering background information when the target grows smaller, which affects the tracking precision of the algorithm. To solve this problem, the present invention estimates the target scale by building a scale pyramid and training a scale filter, which addresses the problem of moving-target scale variation and greatly reduces the erroneous information caused by a fixed tracking box during target tracking.
In short, the moving-target tracking method based on multi-feature fusion combines the attribute information of multiple features, describing the target with multiple features and updating the model selectively. First, multiple features describe the target more completely, obtaining the target's local features along with its global features and improving the accuracy of target detection. Second, the target scale is updated adaptively by building a scale pyramid. Finally, the peak-to-sidelobe ratio of the response map is used to update the target model adaptively, improving the validity of the model.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention.
Specific embodiment
The present invention is further described below with reference to the drawings and specific embodiments.
Embodiment 1: as shown in Fig. 1, the specific steps of the moving-target tracking method based on multi-feature fusion are as follows:
Step1: initialize the target and select the target region;
Step2: extract the Histogram of Oriented Gradients (HOG) feature of the target region as one training sample, and extract the Color Names (CN) feature of the target region as another training sample; train a position filter model with each of the two training samples;
Step3: extract the two features in the target region of a new frame to obtain two detection samples, and compute the correlation score between each detection sample and the corresponding position filter trained in the previous step, obtaining a response map for each feature;
Step4: compute the peak-to-sidelobe ratio of each feature's response map, fuse the two feature responses by weighting accordingly, and take the location of the maximum fused response as the current target position;
Step5: in the current target region, build a scale pyramid with HOG features to train a scale filter, and take the scale of maximum response as the current target scale;
Step6: update the scale filter model;
Step7: judge from the peak-to-sidelobe ratio of the final position response map of each frame whether the target is occluded; if it is, repeat Steps 3 to 6; if no occlusion occurs, go to Step 8;
Step8: update the position filter model;
Step9: repeat Steps 3 to 8 until tracking ends.
The specific steps of Step1 are as follows:
Step1.1: in the first frame of the input image, centered on the target position, collect an image block P whose size is twice the target size.
The specific steps of Step2 are as follows:
Step2.1: the position filters obtained by training on the different target features are applied in the same way and on the same principle. The HOG feature comprises 27 gradient dimensions plus one grayscale dimension, 28 dimensions in total, and the CN feature is reduced from 11 dimensions to 2. The following description takes the HOG feature as an example. The HOG feature f of P is extracted as the training sample, where the feature has d dimensions and f^l is its l-th dimension, l ∈ {1, ..., d}. The goal of training is to find the optimal position filter h that minimizes the squared error between the output on the input sample and the desired output. h consists of one filter h^l per feature dimension and is obtained by minimizing the following squared error:

ε = ‖ Σ_{l=1}^{d} h^l ∗ f^l − g ‖² + τ Σ_{l=1}^{d} ‖ h^l ‖²   (1)

where g denotes the desired output of the filter h, τ is the regularization parameter, and ∗ denotes circular correlation. The minimum of (1) has the following closed-form solution in the frequency domain:

H^l = (Ḡ F^l) / (Σ_{k=1}^{d} F̄^k F^k + τ) = A^l / (B + τ)   (2)

where H^l, G, and F are the frequency-domain descriptions of h^l, g, and f, respectively; Ḡ and F̄ denote the complex conjugates of G and F; F^k is the k-th dimension of F and F̄^k its conjugate; and A^l = Ḡ F^l and B = Σ_{k=1}^{d} F̄^k F^k are the numerator and denominator of the filter h.
The specific steps of Step3 are as follows:
Step3.1: the above computation yields the position filter model and completes the training of the position filter. Detection then proceeds as follows: the HOG feature is extracted from the target region of the new frame as the detection sample z, and the correlation score y between z and the filter h trained earlier with the HOG feature is computed, giving the response map of this feature:

y = F⁻¹{ Σ_{l=1}^{d} Ā^l Z^l / (B + τ) }   (3)

where Ā^l denotes the conjugate of A^l, Z is the frequency-domain description of z, Z^l is the l-th dimension of Z, l ∈ {1, ..., d}, and F⁻¹ denotes the inverse Fourier transform. The filter responses obtained by tracking the target with the CN and HOG features are denoted y_{t,cn} and y_{t,hog}, respectively.
The specific steps of Step4 are as follows:
Step4.1: in frame t, compute the peak-to-sidelobe ratios of the CN and HOG feature response maps, denoted PSR_{t,cn} and PSR_{t,hog};
Step4.2: from the two peak-to-sidelobe ratios, compute the normalized weights of the CN and HOG features in frame t, w_{t,cn} and w_{t,hog};
Step4.3: fuse the features at the response level. In frame t, the responses of the two position filters trained with the CN and HOG features are denoted y_{t,cn} and y_{t,hog}, and the fused response y_t is obtained by the following weighting:

y_t = w_{t,cn} × y_{t,cn} + w_{t,hog} × y_{t,hog}   (6)

Step4.4: compute the maximum of y_t to obtain the final target position in frame t.
The specific steps of Step5 are as follows:
Step5.1: after the target position is determined, 33 image layers of a scale pyramid are cropped centered on the new target position, and the HOG features of these image layers are extracted to train a scale filter H_s for estimating the target scale. Since the scale filter is applied in the same way and on the same principle as the position filter h, H_s can be computed by the method of formula (2);
Step5.2: in a new frame, to obtain the target scale, the scale response y_s is computed using formula (3) and its maximum is found, determining the current target scale.
The specific steps of Step6 are as follows:
Step6.1: the scale filter model is updated with a fixed learning rate η:

A_{t,s}^l = (1 − η) A_{t−1,s}^l + η Ḡ_{t,s} F_{t,s}^l
B_{t,s} = (1 − η) B_{t−1,s} + η Σ_{k=1}^{d} F̄_{t,s}^k F_{t,s}^k

where the scale filter is updated in every frame; A_{t,s}^l and B_{t,s} denote the l-th dimension of the numerator and the denominator of the scale filter in frame t, and A_{t−1,s}^l and B_{t−1,s} those of the scale filter model of the previous frame; Ḡ_{t,s} denotes the conjugate of the frequency-domain description of the desired output of the scale filter in frame t; F_{t,s}^l denotes the frequency-domain description of the l-th dimension of the training sample of the scale filter in frame t; and F̄_{t,s}^k denotes the conjugate of the frequency-domain description of the k-th dimension of that training sample, k ∈ {1, ..., d}.
The specific steps of Step7 are as follows:
Step7.1: occlusion detection judges from the PSR value whether the target is occluded, in order to decide whether the position filter model is updated, reducing the influence of occlusion on target tracking. In frame t, the PSR is computed as:

PSR_t = (y_{t,max} − μ_t) / σ_t

where PSR_t denotes the peak-to-sidelobe ratio in frame t, y_{t,max} is the peak of the response map in frame t, and μ_t and σ_t are the mean and standard deviation of the region surrounding the peak-response position in frame t. The larger PSR_t is, the stronger the peak in the response distribution, and the higher the target confidence.
The specific steps of Step8 are as follows:
Step8.1: when the target is judged not to be occluded, the position filter model is updated with a fixed learning rate η:

A_t^l = (1 − η) A_{t−1}^l + η Ḡ_t F_t^l
B_t = (1 − η) B_{t−1} + η Σ_{k=1}^{d} F̄_t^k F_t^k

where A_t^l and B_t denote the l-th dimension of the numerator and the denominator of the position filter h in frame t, and A_{t−1}^l and B_{t−1} those of the position filter model of the previous frame; Ḡ_t denotes the conjugate of the frequency-domain description of the desired output of the position filter h in frame t; F_t^l denotes the frequency-domain description of the l-th dimension of the training sample of h in frame t; and F̄_t^k denotes the conjugate of the frequency-domain description of the k-th dimension of that training sample, k ∈ {1, ..., d}.
The specific steps of Step9 are as follows:
Step9.1: at this point the algorithm has finished running on the second frame, and the target position, the target scale, and all filter models have been updated. In each subsequent frame, Steps 3 to 8 are rerun until the video ends.
The present invention exploits the different properties of the HOG feature and the CN color feature, fusing the two to describe the target model; while obtaining the target's global features, it also obtains the target's local features, improving the accuracy of target detection. At the same time, according to the peak-to-sidelobe ratio of the final target response map of each frame, it judges whether the target is occluded in order to decide whether to update the model, reducing the influence of occlusion on target tracking and thereby improving the stability of the algorithm.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments; within the scope of knowledge possessed by those of ordinary skill in the art, various changes may also be made without departing from the concept of the present invention.