A combined target tracking method based on a deep feature fusion convolutional neural network
Technical field
The present invention relates to the field of computer vision, and specifically to a combined target tracking method based on a deep feature fusion convolutional neural network. The method of the invention can effectively improve the success rate and accuracy of video target tracking in complex scenes.
Background
In modern society, informatization is advancing ever faster, and a large number of video capture devices are present in people's work and daily life; these devices record and store vast amounts of video data. On the one hand, analyzing and processing these data manually is becoming extremely difficult, arguably infeasible. On the other hand, practical applications place many different kinds of demands on these video data, chiefly video security surveillance, intelligent traffic management, intelligent human-computer interaction systems, target motion analysis, and autonomous driving of motor vehicles. Video target tracking plays a particularly important role in specific applications of video analysis, video understanding, and video interaction, and is a basic technique on which such higher-level video tasks rely.
Video target tracking is a very active research topic in computer vision, but it is also very challenging because of a series of interfering factors that may be present in the scene, such as illumination variation, scale variation, pose variation, and target occlusion.
Video target tracking means that, after video data have been acquired with a video capture device, one or more objects are selected from the video as tracking targets and the initial center position and scale of each target region are given; an effective target tracking method is then designed to predict the center position and scale of the target in subsequent video frames, so that the target is tracked continuously.
Many applications in people's work and daily life need to be built on video-based target tracking. Completing target tracking automatically with computer vision techniques can free people from a large number of tedious, inefficient tasks and provide an important basis for their analysis and decision-making. In complex real scenes, however, many different interfering factors often appear, which makes video-based target tracking very difficult.
Therefore, a novel method or system for video-based target tracking needs to be developed to achieve target tracking with strong robustness and high accuracy.
Summary of the invention
In view of the above defects of, or needs for improvement in, the prior art, the present invention provides a combined target tracking method based on a deep feature fusion convolutional neural network. Its aim is to extract deep features of the target, construct several different processing modules, and fully combine the advantages of generative models, discriminative models, long-term tracking, and short-term tracking, so as to achieve target tracking with strong robustness and high accuracy. This in turn provides a good basis for video analysis, video understanding, and video interaction, and good technical support for vision applications typified by video security surveillance, intelligent traffic control, target motion analysis, human-computer interaction systems, and autonomous driving.
To achieve the above object, the present invention provides a combined target tracking method based on a deep feature fusion convolutional neural network, also referred to as a single visual target tracking method for complex scenes. The method includes the following steps:
(1) Modify the VGG-M network model and add a channel feature fusion convolutional layer; use the convolutional part of the network as the shared deep feature extraction sub-network and the remaining part of the network as the sequence-specific deep feature classification sub-network, and connect the two to construct a channel feature fusion convolutional neural network model;
In the method of the present application, the modified network reduces the number of convolutional layers and fully connected layers of the VGG-M network, and the final deep feature classification sub-network retains only one fully connected layer. In existing network models the last convolutional layer outputs a large number of feature channels, yet the data in each feature channel are in fact sparse. The method of the present application therefore adds a channel feature fusion convolutional layer before the fully connected layer, so that feature information of essentially the same amount is contained in a lower data dimension, which helps speed up the similarity computation module in the generative model. Without this channel feature fusion convolutional layer, the convolutional layers output 512 feature channels; with it, 32 fused feature channels are obtained.
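As an illustrative sketch only, the channel fusion described above can be viewed as a 1*1 convolution in which every fused channel is a weighted sum of the 512 input channels at the same spatial position. The function name `fuse_channels`, the 3*3 spatial size, the random sparse input, and the random weights are assumptions made for the example; in the method itself the weights are learned during training.

```python
import random


def fuse_channels(feature_map, weights):
    """Apply a 1*1 convolution followed by ReLU.

    feature_map: H rows x W positions x C_in values (nested lists).
    weights:     C_out rows x C_in weights (one row per fused channel).
    """
    fused = []
    for row in feature_map:
        fused_row = []
        for pixel in row:
            # Each fused channel is a weighted sum over all input channels.
            out = [sum(w * x for w, x in zip(w_row, pixel)) for w_row in weights]
            # ReLU keeps only non-negative responses.
            fused_row.append([max(0.0, v) for v in out])
        fused.append(fused_row)
    return fused


rng = random.Random(0)
C_IN, C_OUT = 512, 32  # channel counts stated in the text

# Hypothetical sparse 3*3*512 feature map: about 10% of entries are non-zero.
feature_map = [
    [[rng.random() if rng.random() < 0.1 else 0.0 for _ in range(C_IN)]
     for _ in range(3)]
    for _ in range(3)
]
# Hypothetical (untrained) fusion weights.
weights = [[rng.uniform(-0.05, 0.05) for _ in range(C_IN)] for _ in range(C_OUT)]

fused = fuse_channels(feature_map, weights)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

The fused map keeps the spatial layout and reduces each position from 512 values to 32, which is the reduction in data dimension that shortens the later similarity computation.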
(2) Collect video sequences annotated with target position and scale information and, for each video sequence, collect foreground-class and background-class samples according to the target information given by the annotations to form the training set of the network model;
Some scholars and research institutions provide public video target tracking datasets. Several datasets containing different challenge factors are selected from them, including VOT-2013, VOT-2014, and VOT-2015, and duplicate videos are removed. For each selected video sequence, a subset of video frames is chosen at random. Then, for each chosen frame of each sequence, a large number of sample image sub-blocks are generated, according to the annotated target position and scale, by sampling from a Gaussian function of the target center coordinates and of the height and width scale parameters. The images in these sub-block regions are cropped and normalized. Foreground and background classes are defined according to the overlap ratio between each sub-block region and the true target block region; the sub-blocks are assigned to the corresponding class, and samples of the two classes are retained in a certain proportion, which constitutes the training sample set of the network model.
(3) Organize the training samples into batches by sequence, and train the network model iteratively by cycling through the sequences one by one until the set number of cycles has been completed or a preset precision threshold has been reached;
Because of the processing speed of deep neural networks, the samples are organized into batches during network model training. Training the network by cycling through the sequences means that, in each cycle, the shared feature extraction sub-network and each sequence-specific feature classification sub-network are trained in turn with the batch of samples from the sequence to which that classification sub-network corresponds. A certain number of cycles can first be set to observe how the classification performance of the network converges. If the convergence requirement is not met, the limit on the number of cycles is raised; conversely, to avoid overfitting of the deep network, the number of iterations should be reduced appropriately.
(4) For a new video sequence, construct anew a corresponding sequence-specific feature classification sub-network module and a sequence-specific regression prediction sub-network module, and connect them to the shared deep feature extraction sub-network to form the target tracking network model for the new sequence;
Specifically, the video sequences in the training sample set contain various interfering factors such as illumination variation, pose variation, target rotation, scale variation, motion blur, and target occlusion. Therefore, once the network model has been trained sufficiently with these samples, the shared feature extraction sub-network can extract deep fused features with strong robustness.
The tracked target differs from one video sequence to another, and the target tracked in one sequence may be background in another video, or even an interfering object similar to the target. Therefore, to track a target in a new video sequence, a completely new sequence-specific deep feature classification sub-network needs to be constructed and connected to the trained shared feature extraction sub-network to form the classification prediction network model used during tracking. In addition, the method of the present invention also uses a regression prediction network module; for a new video sequence, a sequence-specific deep feature regression prediction sub-network module is likewise constructed.
(5) Collect initial foreground-class and background-class samples using the position and scale of the target in the first frame of the new sequence, and train the newly constructed feature classification sub-network with these samples; train the regression prediction sub-network module with the positive samples among them; extract deep features of the initial target with the shared deep feature extraction sub-network, and use the extracted features as the initial feature template of the target;
The previous step constructs the feature classification prediction sub-network module and the feature regression prediction sub-network module needed for tracking in the new video sequence. Foreground-class and background-class samples are collected in the initial frame according to the information on the initial target of the new sequence. The classification sub-network is trained with all the samples to obtain the long-term classification prediction sub-network module, and the regression prediction sub-network module is trained with the foreground-class samples. The output of the last convolutional layer for the initial target region is processed as a feature and saved as the initial target feature template.
(6) Several different target feature templates are used during target tracking; the set of historical target feature templates is initialized as empty, and the target feature template of the previous frame is initialized to the initial target feature template;
The method of the present invention is a combined target tracking method that uses a generative module based on a multi-template matching strategy. The initial frame and the frame preceding the current frame contain, respectively, the initial information on the target and the information obtained from the most recent tracking step. In addition, appearance features that changed markedly during earlier tracking may recur in subsequent tracking. Based on this information, an initial target feature template, a previous-frame target feature template, and a set of historical target feature templates are constructed. Before target tracking begins, the set of historical feature templates is set to empty and the previous-frame target feature template is set to the initial target feature template.
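The three kinds of templates and their initialization can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the class name `TemplateBank`, the use of cosine similarity, and taking the maximum similarity over all templates as the matching score are hypothetical choices for the example; the text does not specify the similarity measure or how the per-template scores are combined.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (an assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class TemplateBank:
    initial: Vector                      # template from the first frame
    previous: Optional[Vector] = None    # template from the previous frame
    history: List[Vector] = field(default_factory=list)  # starts empty

    def __post_init__(self) -> None:
        # Before tracking, the previous-frame template equals the initial one.
        if self.previous is None:
            self.previous = list(self.initial)

    def match_score(self, feature: Vector) -> float:
        # Assumed aggregation: best match over all available templates.
        templates = [self.initial, self.previous, *self.history]
        return max(cosine(t, feature) for t in templates)


bank = TemplateBank(initial=[1.0, 0.0])
print(bank.history, bank.previous, bank.match_score([1.0, 0.0]))
```

Keeping the historical set separate from the initial and previous-frame templates lets an appearance that reappears later in the sequence be matched without overwriting the first-frame information.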
(7) Generate candidate regions of the target using the latest target position and scale, extract the deep features of these regions with the shared feature extraction sub-network, and compute the probabilities that they belong to the foreground class and to the background class;
The motion of a target generally has a certain regularity, and the change in the target's position and scale in a new frame relative to the previous frame is likely to follow a Gaussian distribution. A Gaussian function can therefore be used to generate candidate target regions. The shared feature extraction sub-network of the network model then extracts features from these candidate target regions, and the classification prediction sub-network computes the probabilities that they belong to the foreground class and to the background class.
(8) Judge the degree of change in the target's appearance from the probabilities that the deep features of all candidate blocks belong to the foreground class: compare these probability values with a set threshold and use the comparison result as a condition, namely whether the probabilities that all candidate blocks belong to the foreground class are all greater than the set threshold;
(9) When the probabilities that all candidate blocks belong to the foreground class are all greater than the set threshold, the judgment condition of the previous step holds, which indicates that the appearance of the target has changed little and that the probability of its being correctly identified by the long-term classification prediction sub-network module is high; in this case the long-term classification prediction sub-network module and the regression prediction sub-network module are combined to compute a comprehensive prediction value;
Conversely, when the judgment condition does not hold, the appearance of the target may have changed considerably; in this case a new short-term classification prediction sub-network module is constructed, and the long-term and short-term classification prediction sub-network modules are combined with the multi-template matching module to compute a comprehensive prediction value;
Because the long-term classification prediction sub-network module is updated only with samples of relatively high confidence collected during tracking, when the appearance of the target changes to a large extent this module tends to classify all candidate blocks as background, so that the probabilities that the candidate blocks belong to the foreground class all fall below the set threshold. Using only the long-term classification prediction sub-network module at this point easily leads to serious tracking drift, so a new short-term classification prediction sub-network module is constructed and combined with the long-term classification prediction sub-network module and the multi-template matching generative module to compute the comprehensive prediction value. Conversely, if some candidate blocks have a foreground-class probability greater than the set threshold, the appearance of the target has not changed greatly, and the long-term classification prediction sub-network module only needs to be combined with the regression prediction sub-network module to compute the comprehensive prediction value.
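The branch selection of steps (8) and (9) can be sketched as follows. This is only an illustration: the function name `select_modules` and the module labels are hypothetical, and the default threshold of 0.55 is the value given in the embodiment. Because step (8) words the condition in terms of all candidate blocks while the explanation above speaks of some candidate blocks, the quantifier is exposed as the parameter `require_all` rather than fixed.

```python
def select_modules(fg_probs, threshold=0.55, require_all=True):
    """Choose which modules are combined for the comprehensive prediction.

    fg_probs:    foreground-class probabilities of the candidate blocks,
                 as produced by the long-term classification module.
    require_all: True  -> condition holds if every probability > threshold
                 False -> condition holds if at least one probability > threshold
    """
    quantifier = all if require_all else any
    condition_holds = quantifier(p > threshold for p in fg_probs)
    if condition_holds:
        # Appearance changed little: long-term classifier plus regression.
        return ("long_term", "regression")
    # Appearance may have changed considerably: add a short-term classifier
    # and the multi-template matching module.
    return ("long_term", "short_term", "multi_template")


print(select_modules([0.9, 0.8]))
print(select_modules([0.9, 0.2]))
print(select_modules([0.9, 0.2], require_all=False))
```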
(10) Take the candidate block with the highest prediction value as the target tracking result for the current frame, and update the previous-frame target feature template to the features of the new target block; collect samples according to the new target position and scale and add them to the sample set used to update the short-term classification prediction sub-network module; analyze the probabilities that all candidate blocks belong to the foreground class to decide whether to add the samples to the sample set of the long-term classification prediction sub-network module, and whether to generate a new historical target feature template and update the network;
After the comprehensive prediction values of all candidate blocks have been computed by the combination of the different modules, the block with the largest prediction value is selected as the target of the current frame. The previous-frame target feature used in the multi-template strategy is then replaced with the deep features of the new target block, samples are collected according to the new target position and scale, and they are added to the sample set used to update the short-term classification prediction network module.
Classifying the features of all candidate blocks with the long-term classification prediction sub-network module gives a result that reflects fairly objectively how much the target's appearance has changed. If none of the candidate blocks has a high probability of belonging to the foreground class, the confidence of the current tracking result can be considered low as well; this indicates that the appearance features of the target have changed noticeably. In this case the long-term classification prediction sub-network module is updated with the samples in the high-confidence sample set collected during tracking, and the deep features of the new target block are added to the set of historical target feature templates;
Conversely, the collected samples are added to the sample set of the long-term classification prediction sub-network module.
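The bookkeeping of step (10) can be sketched as a single update function. The dictionary layout, the key names, and the function names `new_state` and `update_after_frame` are hypothetical; the flag `appearance_changed` stands for the outcome of the judgment described above, and the actual retraining of the networks is outside the sketch.

```python
def new_state(initial_template):
    """Tracker state before the first tracked frame (illustrative layout)."""
    return {
        "previous_template": list(initial_template),
        "history_templates": [],
        "short_term_samples": [],
        "long_term_samples": [],
        "long_term_needs_update": False,
    }


def update_after_frame(state, target_feature, new_samples, appearance_changed):
    """Apply the per-frame updates of step (10)."""
    # The previous-frame template always becomes the new target feature.
    state["previous_template"] = list(target_feature)
    # Newly collected samples always go to the short-term sample set.
    state["short_term_samples"].extend(new_samples)
    if appearance_changed:
        # Low confidence: remember this appearance and flag the long-term
        # classifier for an update with its stored high-confidence samples.
        state["history_templates"].append(list(target_feature))
        state["long_term_needs_update"] = True
    else:
        # High confidence: the new samples join the long-term sample set.
        state["long_term_samples"].extend(new_samples)
        state["long_term_needs_update"] = False
    return state


state = new_state([1.0, 0.0])
update_after_frame(state, [0.8, 0.6], ["s1", "s2"], appearance_changed=False)
update_after_frame(state, [0.0, 1.0], ["s3"], appearance_changed=True)
print(state["long_term_samples"], len(state["history_templates"]))
```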
(11) Judge whether tracking has ended; if it has not, return to step (7) and repeat steps (7) to (11) in turn.
In general, compared with the prior art, the above technical solution conceived by the present invention can achieve the following beneficial effects:
The method of the present invention includes the following steps. A two-class deep feature fusion convolutional neural network model is constructed; the network model comprises a shared feature extraction sub-network and sequence-specific feature classification sub-networks in one-to-one correspondence with the tracking sequences. Video sequences are selected from annotated public video tracking datasets to build the training set, foreground-class and background-class samples are collected from them, and the network model is trained iteratively in a sequence round-robin manner with the collected sequence samples. When a target in a new video sequence is tracked, the parameters of the feature extraction sub-network are kept fixed, and a sequence-specific feature classification sub-network and a sequence-specific regression prediction sub-network module are constructed anew for the new sequence. Initial sequence-specific foreground-class and background-class samples are collected according to the target position and scale in the first frame of the new sequence and are used to train the newly constructed sequence-specific feature classification sub-network and regression prediction sub-network module. During target tracking, candidate blocks are generated according to the latest target position and scale, and the latest network is used to extract their features and classify them. When the probabilities that all candidate blocks belong to the foreground class are all greater than a set classification threshold, the long-term classification prediction sub-network module and the regression prediction sub-network module are combined for prediction, and samples collected according to the new target state are saved in the sample set of the long-term classification prediction sub-network module. Otherwise, a short-term sequence-specific classification sub-network module is constructed and trained, the long-term and short-term classification prediction sub-network modules are combined with the multi-template matching module for prediction, the long-term classification prediction sub-network module is updated with the samples collected during tracking, and the deep features of the new target region are added to the set of historical target feature templates. Samples collected according to the new target state are saved in the sample set of the short-term classification network module. The candidate block with the highest prediction value is taken as the new target tracking result. The present invention extracts features with a deep feature fusion convolutional neural network and proposes a combined target tracking method based on the fused features. The design is simple, and the combined target tracking model, which unites the long-term and short-term classification prediction sub-network modules with the multi-template matching module, can effectively improve the prediction accuracy of target tracking.
Brief description of the drawings
Fig. 1 is a schematic block diagram of the principle of a combined target tracking method based on a channel feature fusion convolutional neural network in an embodiment of the present invention.
Fig. 2 is a schematic diagram of the network structure of the channel feature fusion convolutional neural network in an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the present invention described below may be combined with one another as long as they do not conflict.
The main object of the present invention is to provide, for the problem of tracking a single visual target in complex scenes, a combined tracking method based on a channel feature fusion convolutional neural network. The method extracts deep features that are robust to target rotation, illumination variation, pose variation, target occlusion, and the like, constructs several different processing modules, and fully combines the advantages of generative models, discriminative models, long-term tracking, and short-term tracking to achieve target tracking with strong robustness and high accuracy. This in turn provides a good basis for video analysis, video understanding, and video interaction, and good technical support for vision applications typified by video security surveillance, intelligent traffic control, target motion analysis, human-computer interaction systems, and autonomous driving.
The main idea of the present invention is to propose a combined target tracking method based on a channel feature fusion convolutional neural network. On the one hand, a new channel feature weighting convolutional layer is added to the network structure, and a convolutional neural network suited to target tracking is constructed to extract deep features as the appearance representation, so that features that were originally sparse but had many channels carry essentially the same amount of information in a lower feature dimension, which helps speed up the similarity computation. On the other hand, the long-term classification prediction sub-network module and the regression prediction sub-network module are constructed before tracking and are trained with samples collected from the initial target information. During tracking, the long-term classification prediction sub-network module classifies all candidate blocks, and the long-term and short-term classification prediction sub-network modules, the regression prediction sub-network module, and the multi-template matching module are combined adaptively for tracking according to the probabilities that the candidate blocks belong to the foreground class. Likewise, the reconstruction of the short-term classification prediction sub-network module, the update of the short-term classification prediction sub-network module, the update of the regression prediction sub-network module, the collection of samples during tracking, and the generation of historical target feature templates are all carried out adaptively according to the probabilities that all candidate blocks belong to the foreground class.
Fig. 1 is a schematic block diagram of a combined target tracking method based on a channel feature fusion convolutional neural network in an embodiment of the present invention. The method mainly includes the following steps:
(1) Modify the number of layers of the classical classification network VGG-M and the convolution kernel size of each convolutional layer, and add a new deep feature fusion convolutional layer; use the convolutional part before the fully connected layer as the feature extraction sub-network shared by all sequences; for each tracked sequence, construct a sequence-specific feature classification sub-network consisting of one fully connected layer and a function layer; the two sub-networks are connected to form the deep feature fusion convolutional neural network model;
What is analyzed during tracking is certain blocks in each frame, which are small. The VGG-M network is therefore modified so that the normalized input image received by the network has size 107*107*3, and the number of convolutional layers is reduced to 3. The convolution kernels of these layers have sizes 7*7*3*96, 5*5*96*256, and 3*3*256*512, respectively; the first two parameters of a kernel size give the spatial size of the kernel, and the last two give the number of feature channels before and after the convolution. The first two convolutional layers use a convolution stride of 2*2, and the third uses a stride of 1*1. Between these convolutional layers there are also ReLU layers, normalization layers, and pooling layers; the pooling layers use a pooling window of 3*3 and a stride of 2*2. The third convolutional layer is followed by a ReLU layer, after which a channel feature fusion convolutional layer and a ReLU layer are added; the kernel size of the channel feature fusion convolutional layer is 1*1*512*32 and its convolution stride is 1*1. Together, the above layers constitute the feature extraction sub-network shared across sequences. The shared feature sub-network is followed by a fully connected layer with kernel size 3*3*512*2 and a function layer, which together constitute a sequence-specific feature classification sub-network. The method of the present application also uses a sequence-specific regression prediction sub-network module whose structure is similar to that of the feature classification sub-network, except that its kernel size is 3*3*512*1 and its function layer uses the logistic function rather than the softmax function.
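The spatial sizes implied by these kernel sizes and strides can be traced with a short calculation. The sketch below rests on assumptions that the text does not state: convolutions and pooling use no padding, one pooling layer follows each of the first two convolutional layers, and the second convolutional layer outputs 256 channels (the input channel count required by the third kernel). ReLU and normalization layers are omitted because they do not change the shape.

```python
def out_size(size, kernel, stride):
    """Output width of a convolution or pooling window with no padding (assumed)."""
    return (size - kernel) // stride + 1


# (name, kernel, stride, output channels or None if channels are unchanged)
LAYERS = [
    ("conv1", 7, 2, 96),
    ("pool1", 3, 2, None),
    ("conv2", 5, 2, 256),
    ("pool2", 3, 2, None),
    ("conv3", 3, 1, 512),
    ("fusion", 1, 1, 32),  # channel feature fusion convolutional layer
]


def trace_shapes(input_size=107, input_channels=3):
    """Return (name, height, width, channels) after each layer."""
    size, channels = input_size, input_channels
    shapes = []
    for name, kernel, stride, out_channels in LAYERS:
        size = out_size(size, kernel, stride)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes


for shape in trace_shapes():
    print(shape)
```

Under these assumptions the 107*107*3 input shrinks to a 3*3 map after the third convolutional layer, which agrees with the 3*3 spatial extent of the fully connected kernels given above.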
(2) To train a network model for the tracking problem, collect tracking videos annotated with target position and scale information and, for each video sequence, collect foreground-class and background-class samples using the target state information to form the training set of the network model;
Video sequences that cover scenes with different challenge factors and include annotations are selected for sampling the training samples of the network model. In each video sequence, 8 frames are chosen at random; on these frames, foreground-class and background-class samples are defined according to the annotated target position and scale, and 50 foreground-class and 200 background-class samples are collected, respectively. Foreground-class and background-class samples are defined by the ratio of the area of overlap between the sample region and the annotated true target region. Two thresholds are set, 0.7 and 0.5. If the overlap ratio is greater than or equal to 0.7, the sample is defined as a foreground-class sample; if the overlap ratio is less than 0.5, the sample is defined as a background-class sample.
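The labeling rule with the two thresholds can be sketched as follows. The sketch assumes that the overlap ratio is the intersection-over-union of two axis-aligned boxes given as (x, y, width, height); the text only speaks of an overlap ratio, so this choice, like the function names `overlap_ratio` and `label_sample`, is an assumption. Samples falling between the two thresholds are treated here as unused.

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two boxes (x, y, w, h); an assumed definition."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def label_sample(box, truth, fg_threshold=0.7, bg_threshold=0.5):
    """Return 'foreground', 'background', or None for an unused sample."""
    ratio = overlap_ratio(box, truth)
    if ratio >= fg_threshold:
        return "foreground"
    if ratio < bg_threshold:
        return "background"
    return None  # between the thresholds: neither class


truth = (0, 0, 10, 10)
print(label_sample((1, 0, 10, 10), truth))  # overlap 90/110
print(label_sample((2, 0, 10, 10), truth))  # overlap 80/120
print(label_sample((5, 0, 10, 10), truth))  # overlap 50/150
```

Leaving a gap between 0.5 and 0.7 keeps ambiguous blocks out of both classes, so the classifier is trained only on clearly positive and clearly negative examples.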
(3) Organize the collected training samples into batches by sequence and use them to train the network model iteratively by cycling through the sequences, until the set number of cycles is reached or the error rate of the network falls below a preset threshold;
The initial network model is trained for 150 cycles over the video sequences in the training set; this process mainly learns the convolution kernel parameters of the shared feature extraction sub-network. In each training cycle, 32 foreground-class samples and 96 background-class samples are chosen at random from all samples of each sequence in the training set to form the sample batch used for that sequence in one iteration.
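The batch schedule can be sketched as a generator that visits every sequence once per cycle and draws 32 foreground-class and 96 background-class samples each time. The function name `sequence_round_robin`, the dictionary layout, and the pool sizes in the demonstration are hypothetical; the network update that would consume each batch is not shown.

```python
import random


def sequence_round_robin(samples_by_sequence, cycles=150, n_fg=32, n_bg=96, seed=0):
    """Yield (cycle, sequence id, batch), visiting every sequence in each cycle.

    samples_by_sequence maps a sequence id to a dict with the keys
    'foreground' and 'background', each holding that sequence's sample pool.
    """
    rng = random.Random(seed)
    for cycle in range(cycles):
        for seq_id, pools in samples_by_sequence.items():
            fg = rng.sample(pools["foreground"], n_fg)  # without replacement
            bg = rng.sample(pools["background"], n_bg)
            # In training, this batch would update the shared sub-network and
            # the classification sub-network that belongs to seq_id.
            yield cycle, seq_id, fg + bg


# Hypothetical sample pools for two sequences.
pools = {
    "seq_a": {"foreground": [("fg", i) for i in range(400)],
              "background": [("bg", i) for i in range(1600)]},
    "seq_b": {"foreground": [("fg", i) for i in range(400)],
              "background": [("bg", i) for i in range(1600)]},
}

batches = list(sequence_round_robin(pools, cycles=2))
print([(cycle, seq_id, len(batch)) for cycle, seq_id, batch in batches])
```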
(4) For a new video sequence, construct anew a corresponding sequence-specific feature classification sub-network and a sequence-specific regression prediction sub-network module, and connect them to the shared deep feature extraction sub-network to form the network model used for target tracking in the new sequence;
(5) Collect initial foreground-class and background-class samples using the position and scale of the target in the first frame of the new sequence, and train the newly constructed feature classification sub-network with these samples; train the regression prediction sub-network module with the positive samples among them; extract deep features of the initial target with the shared deep feature extraction sub-network, and use the extracted features as the initial feature template of the target;
Using the position and scale of the target in the initial frame of the sequence, 500 foreground-class samples and 5000 background-class samples are collected. As before, 32 foreground-class and 96 background-class samples are drawn at random from these samples each time to form one sample batch for the network, and 20 training iterations are carried out to learn the parameters of the newly constructed sequence-specific fully connected layer. Then the deep features of the initial target block extracted by the shared feature extraction sub-network are vectorized and normalized, and the result is used as the initial target feature template.
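Vectorizing and normalizing a feature map into a template can be sketched as below. The text does not say which normalization is used; the sketch assumes L2 normalization to unit length, and the function name `to_template` is hypothetical.

```python
import math


def to_template(feature_map):
    """Flatten an H x W x C feature map and scale it to unit L2 norm (assumed)."""
    flat = [v for row in feature_map for pixel in row for v in pixel]
    norm = math.sqrt(sum(v * v for v in flat))
    if norm == 0.0:
        return flat  # an all-zero map is returned unchanged
    return [v / norm for v in flat]


# A tiny 1 x 1 x 2 feature map used only to show the arithmetic.
template = to_template([[[3.0, 4.0]]])
print(template)
```

With unit-length templates, the inner product of two templates is their cosine similarity, which is one convenient way to compare a candidate block with the stored templates.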
(6) Several different target feature templates are used during target tracking; the set of historical target feature templates is initialized as empty, and the target feature template of the previous frame is initialized to the initial target feature template.
(7) Generate candidate regions of the target using the latest target position and scale, extract the deep features of these regions with the network model, and compute the probabilities that they belong to the foreground class and to the background class;
According to the latest position and scale information of the target, 256 candidate sample blocks are generated using
Gaussian functions of the center-point coordinates and of the width and height scales; their deep features are
extracted, and the latest long-term classification prediction sub-network module is used to classify the features of
these candidate blocks.
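A minimal sketch of the Gaussian candidate generation is given below. The function name generate_candidates and the standard deviations sigma_xy and sigma_s are hypothetical; the text specifies only that 256 blocks are drawn with Gaussian functions of the center coordinates and of the width and height.

```python
import random

def generate_candidates(cx, cy, w, h, n=256, sigma_xy=0.1, sigma_s=0.05, seed=0):
    """Draw n candidate blocks (center x, center y, width, height) around the latest target."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        # Center offsets are scaled by the current target size (assumed choice).
        x = rng.gauss(cx, sigma_xy * w)
        y = rng.gauss(cy, sigma_xy * h)
        # Width and height are perturbed multiplicatively and kept positive.
        new_w = max(1.0, w * (1.0 + rng.gauss(0.0, sigma_s)))
        new_h = max(1.0, h * (1.0 + rng.gauss(0.0, sigma_s)))
        candidates.append((x, y, new_w, new_h))
    return candidates

candidates = generate_candidates(cx=100.0, cy=80.0, w=40.0, h=60.0)
```

Scaling the positional spread by the target size keeps the search region proportional to the object, which is a common but here assumed design choice.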
(8) The degree of change in the target's appearance is judged from the foreground-class probabilities obtained for the
deep features of all candidate blocks: these probability values are compared with a set threshold, and the comparison
result is used as the decision condition, namely whether the foreground-class probability values of all candidate
blocks are greater than the set threshold;
The threshold used in the decision condition is set to 0.55. A probability above this threshold indicates that a block
is more likely to belong to the foreground class than to the background class; a probability below it indicates the
opposite.
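The decision condition of step (8) can be sketched as a single predicate. The function name appearance_is_stable is hypothetical, and treating an empty candidate list as "not stable" is an assumption not stated in the text.

```python
def appearance_is_stable(fg_probs, threshold=0.55):
    """Return True when every candidate's foreground-class probability exceeds the threshold."""
    # Assumption: with no candidates at all, the condition is treated as not satisfied.
    return bool(fg_probs) and all(p > threshold for p in fg_probs)

stable = appearance_is_stable([0.61, 0.90, 0.72])
changed = appearance_is_stable([0.61, 0.40, 0.72])
```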
(9) When the decision condition of the previous step holds, that is, when the foreground-class likelihood is higher
than the background-class likelihood, the appearance of the target has changed little and the target is likely to be
recognized correctly by the long-term classification prediction sub-network module; in this case the long-term
classification prediction sub-network module and the regression prediction sub-network module are combined to compute
a comprehensive predicted value;
When the decision condition does not hold, that is, when the foreground-class likelihood is lower than the
background-class likelihood, the appearance of the target may have changed considerably; in this case a short-term
classification prediction sub-network module is newly constructed, and the long-term and short-term classification
prediction sub-network modules are combined with the multi-template matching module to compute the comprehensive
predicted value;
When the decision condition of the previous step holds, the blocks whose foreground-class probability is greater than
0.5 are selected from the candidate blocks, and the long-term classification prediction sub-network module and the
regression prediction sub-network module are used in combination. For the former, the foreground-class probability
value is taken and its weight is fixed at 1; the output value of the regression prediction sub-network module directly
represents the probability that a block is the target, and its weight is set to the average of the five highest
foreground-class probability values among the selected blocks (if fewer than five blocks satisfy the condition, the
average of all their values is taken). Conversely, when the condition does not hold, the top 50 blocks are selected
from the candidate blocks in descending order of foreground-class probability, a new short-term classification
prediction sub-network module is constructed and trained using the samples of the three most recent frames, and this
module is then used to compute the foreground-class probability of the selected blocks. The probability values
computed by the long-term and short-term classification prediction sub-network modules are combined by weighting: the
weight of the long-term module is set to 1, and the weight of the short-term classification prediction sub-network
module is determined by the proportion of foreground-class samples, taken from the higher-confidence frames of the
tracking process, that the short-term module classifies correctly. Next, the EMD distance is used to compute the
similarities between the deep features of the selected blocks and the three kinds of target feature templates used in
the method, and these similarities are weighted to obtain a comprehensive matching value; the weights of the three
similarities are set according to the following formulas, respectively:
ω_f = C1    (1)
where ω_f, ω_l and ω_h denote the weights used when summing the three similarities associated with the initial target
feature template, the previous-frame target feature template and the set of historical target feature templates,
respectively; p*(t-1) denotes the foreground-class probability computed by the long-term classification prediction
sub-network module for the tracking result of the previous frame; and the parameters C1, C2, α and β are four
constants, set to 2, 0.2, 0.5 and 0.01, respectively. Finally, the probability prediction value and the matching value
are combined by weighting to compute the comprehensive predicted value, with weights of 0.7 and 0.3, respectively.
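The weighting rules of step (9) can be sketched as below. All function names are hypothetical, and the sketch assumes that "weighted combination" means a plain weighted sum without renormalization, which the text does not state explicitly. The similarity weights ω_l and ω_h are not computed here because their formulas are not reproduced in the text; the matching value is therefore taken as an input.

```python
def regression_weight(fg_probs, k=5):
    """Average of the k highest foreground-class probabilities (all values if fewer than k)."""
    top = sorted(fg_probs, reverse=True)[:k]
    return sum(top) / len(top)

def stable_branch_scores(fg_probs, reg_probs, select_threshold=0.5):
    """Condition holds: combine long-term classification and regression outputs."""
    selected = [i for i, p in enumerate(fg_probs) if p > select_threshold]
    if not selected:
        return {}
    w_reg = regression_weight([fg_probs[i] for i in selected])
    # Classification weight is fixed at 1; assumed plain weighted sum.
    return {i: 1.0 * fg_probs[i] + w_reg * reg_probs[i] for i in selected}

def changed_branch_score(prob_pred, match_value, w_prob=0.7, w_match=0.3):
    """Condition fails: combine the probability prediction with the template matching value."""
    return w_prob * prob_pred + w_match * match_value

scores = stable_branch_scores([0.9, 0.4], [0.5, 0.5])
best_index = max(scores, key=scores.get)
```

The candidate with the highest score in either branch would then be passed on as the tracking result of the current frame.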
(10) The candidate block with the highest predicted value is taken as the target tracking result of the current frame,
and the previous-frame target feature template is updated to the feature of the new target block; samples are
collected according to the new target position and scale information and added to the sample set used for updating
the short-term classification prediction sub-network module; the foreground-class probabilities of all candidate
blocks are analyzed to determine whether these samples are also added to the sample set of the long-term
classification prediction sub-network module, and whether the historical target feature templates and the network
model are updated;
The block with the highest comprehensive predicted value is taken as the target tracking result of the current frame,
and the previous-frame target feature template is replaced with the deep feature of this block. Foreground-class and
background-class samples are then collected using the position and scale information of the latest target block and
saved to the sample set that is used for training whenever a short-term classification prediction sub-network module
is subsequently constructed. For the current frame, if the classification results of the long-term classification
prediction sub-network module contain candidate blocks whose foreground-class probability is greater than or equal to
0.6, the collected samples are also added to the sample set used for subsequent updates of the long-term
classification prediction sub-network module. Otherwise, the appearance of the new target block is considered to have
changed noticeably. In this case the long-term classification prediction sub-network module is updated using, from the
sample set collected for its update, the foreground-class samples of the most recent 20 frames (or of all recent
frames if there are fewer than 20) and the background-class samples of the most recent 50 frames (or of all recent
frames if there are fewer than 50); at the same time, the processed deep feature of the latest target block is saved
into the set of historical target feature templates as a new historical target feature template.
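The sample bookkeeping for the long-term module can be sketched with a bounded per-frame buffer. The class name LongTermSampleSet and the function process_frame are hypothetical, samples are placeholder values, and retraining of the network itself is left out.

```python
from collections import deque

class LongTermSampleSet:
    """Keeps per-frame samples; foreground from the last 20 frames, background from the last 50."""

    def __init__(self, fg_frames=20, bg_frames=50):
        self.fg_frames = fg_frames
        # Frames older than bg_frames are discarded automatically.
        self.frames = deque(maxlen=bg_frames)

    def add_frame(self, fg_samples, bg_samples):
        self.frames.append((list(fg_samples), list(bg_samples)))

    def training_samples(self):
        recent = list(self.frames)
        # Slicing also covers the case of fewer than fg_frames stored frames.
        fg = [s for f, _ in recent[-self.fg_frames:] for s in f]
        bg = [s for _, b in recent for s in b]
        return fg, bg

def process_frame(sample_set, fg_probs, fg_samples, bg_samples, threshold=0.6):
    """Return True when the long-term module should be updated (appearance has changed)."""
    if any(p >= threshold for p in fg_probs):
        sample_set.add_frame(fg_samples, bg_samples)
        return False
    return True

sample_set = LongTermSampleSet()
needs_update = process_frame(sample_set, [0.7, 0.3], ["fg0"], ["bg0", "bg1"])
```

When process_frame returns True, the caller would retrain the long-term module on sample_set.training_samples() and store the new target feature as a historical template.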
(11) Judge whether tracking has ended; if it has not, repeat steps (7) to (11).
Fig. 2 is a schematic diagram of the network structure of the channel feature fusion convolutional neural network in
the embodiment of the present invention. In the figure, Conv denotes a convolutional layer, and the number that follows
indicates which convolutional layer it is; in the brackets below, K:n*n denotes the size of the kernel and s:l denotes
the stride of the operation. Likewise, pooling denotes a pooling layer, ReLu denotes a rectified linear unit layer, and
normalize denotes a normalization layer. Feature fusion and full connection denote the feature fusion layer and the
fully connected layer, respectively; they are special forms of the convolutional layer.
Below, taking tests on multiple video sequences as an example, two evaluation indices for target tracking are
introduced, and the target tracking results obtained with the tracking method proposed in the present invention are
presented.
Two main indices are used to evaluate the accuracy of target tracking. The first uses the Euclidean distance to measure
the error between the center position of the tracking result and the center position of the ground-truth target, and
is called the center location error (Center Location Error, CLE); clearly, the smaller the Euclidean distance between
the centers, the smaller the error and the more accurate the tracking. The second measures the overlap ratio between
the region of the tracking result and the region of the ground-truth target, and is called the overlap ratio (Overlap
Ratio, OR); the higher the overlap ratio of the region areas, the better the prediction coincides with the target and
the more accurate the tracking result. The accuracy of the tracking result over an entire video sequence is evaluated
by comparing the average of the single-frame results. Suppose that, for a given frame, the abscissa and ordinate of the
center position and the region area of the predicted result are denoted (x_p, y_p) and R_p, and the abscissa and
ordinate of the center position and the region area of the corresponding real target are denoted (x_g, y_g) and R_g;
the two evaluation indices are then calculated as follows:
The first row of the table lists the names of the different video sequences. For the CLE index, a smaller value means a
more accurate center position; for the OR index, a larger value means a higher degree of overlap. As can be seen from
the table above, the method of the present invention achieves, on the above video sequences, a tracking effect with
small center deviation and high overlap.
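The two evaluation indices can be sketched as follows. The function names are hypothetical, boxes are assumed to be axis-aligned rectangles given as (left, top, width, height), and the overlap ratio is assumed to be the commonly used intersection area divided by union area; the text itself only describes it as the overlap ratio of the two region areas.

```python
import math

def center_location_error(pred_center, gt_center):
    """Euclidean distance between the predicted and ground-truth center points."""
    return math.hypot(pred_center[0] - gt_center[0], pred_center[1] - gt_center[1])

def overlap_ratio(pred_box, gt_box):
    """Intersection area over union area for boxes (left, top, width, height) (assumed form)."""
    px, py, pw, ph = pred_box
    gx, gy, gw, gh = gt_box
    inter_w = max(0.0, min(px + pw, gx + gw) - max(px, gx))
    inter_h = max(0.0, min(py + ph, gy + gh) - max(py, gy))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    return inter / union if union > 0 else 0.0

def sequence_average(values):
    """Per-sequence score: mean of the single-frame results."""
    return sum(values) / len(values)

cle = center_location_error((0.0, 0.0), (3.0, 4.0))
ratio = overlap_ratio((0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0))
```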
The present invention makes full use of advanced image processing and pattern recognition techniques proposed in the
field of computer vision, and effectively accomplishes video target tracking in complex scenes.
Those skilled in the art will readily understand that the foregoing is merely a preferred embodiment of the present
invention and is not intended to limit the present invention; any modification, equivalent substitution or improvement
made within the spirit and principles of the present invention shall fall within the protection scope of the present
invention.