
CN117274308A - Multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory correlation - Google Patents


Info

Publication number: CN117274308A
Application number: CN202311226983.6A
Authority: CN (China)
Prior art keywords: model, training, tracking, target tracking, target
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 马素刚, 段帅鹏, 杨小宝, 侯志强, 蒲磊, 余旺盛
Current Assignee: Xian University of Posts and Telecommunications
Original Assignee: Xian University of Posts and Telecommunications
Application CN202311226983.6A filed by Xian University of Posts and Telecommunications


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

This disclosure provides a multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory association, including: S100: obtaining an input image containing multiple targets to be tracked; S200: constructing and training a multi-target tracking model to obtain a trained multi-target tracking model; S300: feeding the input image into the trained multi-target tracking model to track the multiple targets in the input image simultaneously. The disclosure also provides a multi-target tracking device, a storage medium and an electronic device based on dual-branch feature enhancement and multi-level trajectory association. The disclosure can solve the problem of tracking failure in complex scenes such as target occlusion and blur, thereby improving the tracking performance for multiple targets to be tracked in such scenes.

Description

Multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory association
Technical Field
The disclosure belongs to the field of target tracking, and in particular relates to a multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory association.
Background
Multiple Object Tracking (MOT) is an important research direction in computer vision. It aims to predict the positions of multiple targets in consecutive video frames, identify which detections belong to the same target, and generate each target's motion trajectory. MOT has very wide application in practical scenarios such as video analysis, autonomous driving, robotics and action recognition.
In recent years, with improvements in the accuracy and speed of object detectors, MOT algorithms have developed rapidly. Current MOT models fall into two categories: two-stage models and single-stage models. Two-stage models, such as SORT, DeepSORT and POI, follow the tracking-by-detection paradigm and divide MOT into two independent tasks: a detector first processes each video frame to obtain target bounding boxes, and the targets are then associated across frames. Although the tracking accuracy of two-stage models is at a leading level, the models are complex and computationally expensive, so accuracy and tracking speed cannot both be achieved. Single-stage models follow a joint detection-and-tracking paradigm that detects and associates within the same network by redesigning the detector head as a tracking branch and producing the detection-branch and tracking-branch results simultaneously to associate targets. For example, the typical single-stage MOT algorithm CenterTrack uses the anchor-free detector CenterNet and obtains offset vectors between targets by adding a regression-based tracking branch, thereby jointly training detection and association. Compared with two-stage models, single-stage models are simpler and faster, and can balance algorithm accuracy and tracking speed.
However, in complex scenes the single-stage model still suffers from insufficient target feature extraction, identity switching and trajectory loss, which degrade MOT performance. For most MOT detectors, insufficient target feature extraction prevents high-quality detection results from being output, which limits the performance of the tracker.
Disclosure of Invention
Aiming at the defects in the prior art, the aim of the present disclosure is to provide a multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory association, which can solve the problem of target tracking failure in complex scenes such as target occlusion and blurring, so as to improve the tracking performance for a plurality of targets to be tracked in complex scenes.
In order to achieve the above object, the present disclosure provides the following technical solutions:
A multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory association comprises the following steps:
S100: acquiring an input image containing a plurality of targets to be tracked;
S200: constructing a multi-target tracking model and training it to obtain a trained multi-target tracking model;
the multi-target tracking model uses a dual-branch feature learning network to alleviate the excessive competition between detection and tracking, and obtains more accurate offset vectors by introducing association matrix (AM) prediction, so as to reduce the number of identity switches of the targets to be tracked in the input image;
S300: inputting the input image into the trained multi-target tracking model to realize simultaneous tracking of the plurality of targets to be tracked in the input image.
Preferably, in step S200, the multi-target tracking model is trained by the following method:
S201: acquiring a data set and dividing it into a training set and a test set;
S202: setting training parameters and training the model with the training set; training is finished when the set number of epochs is reached;
S203: testing the trained model with the test set, using the multi-object tracking accuracy (MOTA) and the IDF1 score as evaluation metrics; when the tracking accuracy reaches 66.1% and the IDF1 score reaches 64.2%, the model passes the test.
Preferably, in step S203, the multi-object tracking accuracy is expressed as:
MOTA = 1 − (FN + FP + IDS) / GT
where FN denotes false negatives, FP denotes false positives, IDS denotes the number of identity switches, and GT is the ground truth, i.e. the number of annotated targets in the scene.
Preferably, in step S203, the IDF1 score is expressed as:
IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN)
where IDTP (true positive IDs) denotes the number of correctly assigned detected targets in the whole video; IDFN (false negative IDs) denotes the number of missed assignments of detected targets in the whole video; and IDFP (false positive IDs) denotes the number of incorrect assignments of detected targets in the whole video.
The present disclosure also provides a multi-target tracking device based on dual-branch feature enhancement and multi-level trajectory association, comprising:
the acquisition module is used for acquiring an input image containing a plurality of targets to be tracked;
the model construction and training module is used for constructing a multi-target tracking model and training the multi-target tracking model to obtain a trained multi-target tracking model;
the multi-target tracking model uses a dual-branch feature learning network to alleviate the excessive competition between detection and tracking, and obtains more accurate offset vectors by introducing association matrix (AM) prediction, so as to reduce the number of identity switches of the targets to be tracked in the input image;
and the tracking module is used for inputting the input image into the trained multi-target tracking model so as to realize simultaneous tracking of a plurality of targets to be tracked in the input image.
Preferably, the model construction and training module includes:
a division sub-module for dividing the model-training data set into a training set and a test set;
a training sub-module for training the model with the training set;
and a test sub-module for testing the trained model with the test set.
The present disclosure also provides a computer storage medium storing computer-executable instructions for performing a method as described in any one of the preceding claims.
The present disclosure also provides an electronic device, including:
a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein,
the processor, when executing the program, implements a method as described in any of the preceding.
Compared with the prior art, the present disclosure brings the following beneficial effects:
1. by adopting the dual-branch feature learning network to learn the specificity and correlation of the detection and tracking tasks, the excessive competition between the two tasks is alleviated and sufficient target feature information can be extracted;
2. by introducing the association matrix, the offset vector is predicted with more temporal information, which reduces the number of identity switches;
3. by adopting the multi-level trajectory association strategy, high-score and low-score detection boxes are associated with trajectories through different matching modes, which reduces the number of lost trajectories;
4. based on the improvements in the above three aspects, the present disclosure can improve the tracking performance for multiple targets in complex scenes.
Drawings
FIG. 1 is a flow chart of a multi-objective tracking method based on dual-branch feature enhancement and multi-level trajectory correlation provided by one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a multi-objective tracking model provided in another embodiment of the present disclosure;
fig. 3 is a schematic diagram of the structure of the DFL network in the model shown in fig. 2;
FIG. 4 is a schematic diagram of the architecture of the MTA strategy in the model of FIG. 2;
FIG. 5 (a) is a schematic diagram of the tracking results of the CenterTrack algorithm on dataset MOT17-04;
FIG. 5 (b) is a schematic diagram of the tracking results of a multi-target tracking model on dataset MOT17-04 provided by another embodiment of the present disclosure;
FIG. 6 (a) is a schematic diagram of the tracking results of the CenterTrack algorithm on dataset MOT17-09;
FIG. 6 (b) is a schematic diagram of the tracking results of a multi-target tracking model on dataset MOT17-09 provided in another embodiment of the disclosure;
FIG. 7 (a) is a schematic diagram of the tracking results of the CenterTrack algorithm on dataset MOT17-11;
FIG. 7 (b) is a schematic diagram of the tracking results of a multi-target tracking model on dataset MOT17-11 provided in another embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure will be described in detail below with reference to fig. 1 to 7 (b). While specific embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It should be noted that certain terms are used throughout the description and claims to refer to particular components. Those skilled in the art will understand that the same component may be referred to by different names; the specification and claims distinguish components by function rather than by name. As used throughout the specification and claims, the terms "include" and "comprise" are open-ended and should be interpreted as "including, but not limited to". The following description sets forth preferred embodiments for carrying out the present disclosure, but is not intended to limit its general scope; the scope of the present disclosure is defined by the appended claims.
For the purposes of promoting an understanding of the embodiments of the disclosure, reference will now be made to the embodiments illustrated in the drawings and to specific examples, which are not intended to limit the embodiments of the disclosure.
In one embodiment, as shown in fig. 1, the present disclosure provides a multi-target tracking method based on dual-branch feature enhancement and multi-level track association, comprising the steps of:
S100: acquiring an input image containing a plurality of targets to be tracked;
S200: constructing a multi-target tracking model and training it;
S300: inputting the input image into the multi-target tracking model to realize simultaneous tracking of a plurality of targets to be tracked in the input image.
In another embodiment, as shown in fig. 2, the multi-target tracking model includes an input layer, a feature extraction layer, a feature enhancement layer, parallel detection and tracking layers, an association layer, and an output layer. This embodiment describes each layer in detail.
1. Feature extraction layer:
the feature extraction layer adopts DLASeg (DLASeg is a segmented network added with deformation convolution (Deformable Convolution) on the basis of DLA (Deep Layer Aggregation)) as a main network, and the input image generates basic features through the main network(H F Representing the height, W, of the input image after 4 times downsampling F Representing the width of the input image after 4 times downsampling), where H F =H/4,W F =W/4。
2. Feature enhancement layer:
the feature enhancement layer learns features for the detection layer to perform detection tasks and for the tracking layer to perform tracking tasks by adopting a Dual-branch feature learning network (DFL, dual-branch Feature Learning) so as to relieve the excessive competition problem of the two tasks and extract sufficient target feature information. The DFL network mainly realizes feature enhancement by learning the specificity and relativity of two tasks, and the structure is shown in fig. 3:
the DFL network comprises two branches, which first employ different pooling functions (the first branch employing average pooling and the second branch employing maximum pooling) for resolution reduction; then, the two branches respectively generate a characteristic diagram A of the respective branches through a convolution combination (3×3 convolution+instance Norm (a normalization method) +leak ReLU (activation function)) without sharing parameters 1 And A 2 And carrying out interactive calculation between the feature graphs, and obtaining an output result with the initial feature graphs through operation (matrix multiplication and addition). Specifically, the DFL network first obtains the shared feature from the backbone networkIn order to reduce the calculation amount brought by the matrix operation of the feature map, the shared feature is first required to be +.>Pooling operation is carried out, and different pooling modes are needed for different tasks, wherein the characteristics obtained by average pooling (Avgpool) are more sensitive to background information, and can be used for learning detection characteristics of a detection layer; the features obtained by maximum pooling (Maxpool) are more sensitive to texture information and can be used for learning tracking features of a tracking layer. Shared features->After pooling, two features containing local information are obtained, namely the detection feature +.>And tracking feature->Next, f 1 、f 2 Encoding by 3 x 3 convolutional layers, respectively, to generate feature map a for detection and tracking 1 And A 2 And remodelled (using Reshape function) to a size of c×h' F W′ F Of (2) two-dimensional tensor M 1 And M 2 . Then, for M 1 And M 2 And its corresponding transpose tensor->And->Matrix multiplication (Matrix Multiplication) and normalization by softmax function are performed to calculate a task-specific response map s for each task k ∈R C×C The calculation method is as follows:
wherein, the dot product operation representing two vectors,and->Respectively represent M k Lines i, j and l, +.>Is S k The value of the upper (i, j) position represents the correlation between the i-th channel and the j-th channel in the feature map, C represents the number of feature channels, and c=64.
Next, M will be 1 AndM 2 and->Respectively performing matrix multiplication, wherein T represents transposition operation to learn correlation among different tasks, and normalizing to obtain a correlation response graph R among the tasks k ∈R C×C The calculation method is as follows:
wherein,representing the relation of the ith channel of task 1 to the jth channel of task 2,Representing the correlation of the ith channel of task 2 to the jth channel of task 1, (k, h) represents different tensor combinations, e.g., (1, 2) represents tensors M1 and M2.The larger the value, the greater the degree of attention that the characteristic information representing the channel is commonly focused by the two tasks.
Finally, a feature enhancement response diagram W is obtained by fusing the special response diagram and the correlation response diagram through a trainable parameter lambda k ∈R C×C The calculation method is as follows:
W k =λ k ×S k +(1-λ k )×R k ,k∈{1,2}
wherein lambda is k Representing trainable parameters lambda 1 Representing a response diagram W 1 Trainable parameter lambda 2 Representing a response diagram W 2 Is provided for the training parameters.
Enhancing response graphs W corresponding to different tasks 1 And W is 2 Matrix-multiplying the remodeled input features to obtain enhanced features of each task, wherein the enhanced features are remodeled into the input featuresAnd (3) the three-dimensional tensors with the same shape are fused with f to prevent information loss. Two features for detecting branch and tracking branch inputs are finally obtained.
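The response-map computation of the DFL network described above can be sketched in a few lines of NumPy. The pooled spatial size and the fixed fusion weights are illustrative assumptions (in the model the fusion weight λ_k is trainable):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Channel count C = 64 matches the text; the pooled spatial size is assumed.
C, Hp, Wp = 64, 8, 8
rng = np.random.default_rng(0)
M1 = rng.standard_normal((C, Hp * Wp))  # reshaped detection feature map A_1
M2 = rng.standard_normal((C, Hp * Wp))  # reshaped tracking feature map A_2

# Task-specific response maps S_k: row-wise softmax of M_k @ M_k.T (C x C).
S1, S2 = softmax(M1 @ M1.T), softmax(M2 @ M2.T)
# Inter-task correlation response maps R_k: softmax of M_k @ M_h.T, k != h.
R1, R2 = softmax(M1 @ M2.T), softmax(M2 @ M1.T)

# Fusion W_k = lam_k * S_k + (1 - lam_k) * R_k, with illustrative fixed weights.
lam1, lam2 = 0.5, 0.5
W1 = lam1 * S1 + (1 - lam1) * R1
W2 = lam2 * S2 + (1 - lam2) * R2
```

Each W_k stays row-stochastic because it is a convex combination of two softmax-normalized maps.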
3. Detection layer and tracking layer:
The detection layer includes three output heads, each consisting of two convolutional layers with a ReLU activation between them. The first convolutional layer is a 3×3 convolution that increases the number of feature channels from 64 to 256; the second is a 1×1 convolution that reduces the number of channels from 256 to 1 or 2 (center point: H_F×W_F×1; center-point offset: H_F×W_F×2; width and height: H_F×W_F×2).
The re-identification (ReID) network in the tracking layer consists of four convolutional layers. The first two layers form a depthwise separable convolution, namely a channel-by-channel (depthwise) convolution followed by a point-by-point 1×1 convolution, which raises the number of feature channels from 64 to 128 and is followed by BatchNorm2d normalization and a ReLU activation. The third layer is a 3×3 convolution with the number of channels unchanged, again followed by BatchNorm2d normalization and a ReLU activation; the fourth layer is a 1×1 convolution with the number of channels unchanged.
To utilize more temporal information, this embodiment also introduces an association matrix (AM) in the tracking layer. The AM uses the extracted features to construct a similarity relation between two frames, so that more accurate offset vectors are obtained by prediction, which reduces the number of identity switches of the targets to be tracked in the input image. The AM is expressed as A ∈ R^{H_F×W_F×H_F×W_F} and is obtained by matrix multiplication of e_t (the features extracted by the ReID network from frame t) with the transpose of the corresponding features e_{t−1}; it represents the similarity between the images I_t and I_{t−1} and is calculated as:
A = e_t · e_{t−1}^T
where T denotes the transpose and A_{i,j,m,n} represents the feature similarity between the target point (i, j) of frame t and the point (m, n) of frame t−1. For the center point (i, j) of target x in frame t, the corresponding two-dimensional association matrix A_{i,j} ∈ R^{H_F×W_F} can be taken from A, representing the feature similarity between target x and all points of frame t−1.
Next, the offset vector is determined through A_{i,j}: maximum pooling in the horizontal and vertical directions is applied to A_{i,j}, with pooling kernels of H_F×1 and 1×W_F respectively, to obtain matrices in the two directions, which are normalized by the softmax function into two vectors V_{i,j}^x ∈ R^{W_F} and V_{i,j}^y ∈ R^{H_F}. Here V_{i,j}^x and V_{i,j}^y respectively represent the probabilities of the target appearing at each horizontal and each vertical position of frame t−1. Offset templates for the two directions, X_{i,j} and Y_{i,j}, are defined according to the resolution of the output image; they give the offset value if the target actually appears at another position, and are calculated as follows:
X_{i,j,n} = (n − j) × s, 1 ≤ n ≤ W_F
Y_{i,j,m} = (m − i) × s, 1 ≤ m ≤ H_F
where s is the downsampling factor, set to 4, and X_{i,j,n} and Y_{i,j,m} denote the offsets if the target appears at horizontal position n and vertical position m of frame t−1, respectively. The final tracking offset is obtained as the dot product of the offset values and the probabilities of the target actually appearing at the corresponding positions:
O_{i,j}^x = V_{i,j}^x · X_{i,j}, O_{i,j}^y = V_{i,j}^y · Y_{i,j}
The offsets of the horizontal and vertical positions are learned through two separate channels, and the resulting tracking offset is used for the subsequent trajectory association.
Most existing trackers filter out detection boxes whose score falls below a threshold. However, a box may receive a low score merely because of occlusion or blur, so simply filtering these boxes easily causes trajectory loss. To address this problem, this embodiment introduces a Multi-level Trajectory Association (MTA) strategy in the association layer to reduce trajectory loss and further improve tracking performance in complex scenes. As shown in fig. 4, the MTA strategy divides the detection boxes into high-score and low-score boxes according to a set threshold: high-score boxes provide accurate target feature information and enable long-term association of targets, while low-score boxes can be used to recover lost trajectories; the two kinds of boxes are associated with trajectories through different matching modes.
During association, the high-score detection boxes are first matched with the trajectories of the previous frame by a simple greedy algorithm based on the offset vectors, producing unmatched detection boxes, successfully matched trajectories and unmatched trajectories. Then, secondary matching is performed on the cosine similarity between the features of the unmatched detection boxes and the trajectories of the previous frame. If the similarity is below a threshold, a new trajectory is created, which enables re-association after target occlusion and reduces the number of identity switches (IDs). If the similarity is above the threshold, the match succeeds, the detection is added to the trajectory, and the feature information f_i^t of the i-th trajectory at frame t is updated as:
f_i^t = ε × f̂_i^t + (1 − ε) × f_i^{t−1}
where f̂_i^t is the current-frame image feature extracted by the ReID network and ε is its weight. An unmatched trajectory may result from an occluded target whose low-score detection box was filtered out, so the low-score detection boxes are secondarily matched with the unmatched trajectories to recover trajectories lost in scenes such as occlusion.
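The box partitioning and feature update of the MTA strategy can be sketched as follows. The score thresholds follow the η ranges given in the ablation experiments; the similarity threshold and the EMA weight ε are illustrative assumptions:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# High boxes: eta >= 0.4; low boxes: 0.2 <= eta < 0.4 (from the ablation
# section). SIM_TH and EPS are assumed values, not taken from the disclosure.
HIGH, LOW, SIM_TH, EPS = 0.4, 0.2, 0.5, 0.9

detections = [
    {"score": 0.9, "feat": np.array([1.0, 0.0])},  # high-score box
    {"score": 0.3, "feat": np.array([0.0, 1.0])},  # low-score box (kept)
    {"score": 0.1, "feat": np.array([0.5, 0.5])},  # below both thresholds
]
high = [d for d in detections if d["score"] >= HIGH]
low = [d for d in detections if LOW <= d["score"] < HIGH]

# Second-stage cosine matching plus EMA feature update of a matched track:
# f_i^t = eps * f_hat + (1 - eps) * f_i^{t-1}, eps weighting the current
# ReID feature as in the description.
track_feat = np.array([0.9, 0.1])
det_feat = high[0]["feat"]
if cosine(det_feat, track_feat) >= SIM_TH:
    track_feat = EPS * det_feat + (1 - EPS) * track_feat
```

Keeping the low-score boxes in a separate pool lets them be matched against unmatched trajectories afterwards instead of being discarded outright.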
In another embodiment, the present disclosure trains the model on the MOT17 dataset, which consists of 7 video sequences for training and 7 sequences for testing. MOT17 provides bounding boxes generated by three different object detectors: DPM, Faster R-CNN and SDP.
In this embodiment, the first half of each video in the MOT17 dataset is used as the training set and the second half as the test set. The model is trained as follows:
the model is trained with the training set, with the training parameters set as follows: batch size 32, initial learning rate 1.25×10^-4, and 70 training epochs; training is complete when the model has been trained for 70 epochs;
the trained model is then tested with the test set, using the multi-object tracking accuracy (MOTA) and the IDF1 score as evaluation metrics. MOTA represents the overall performance of the tracker and is measured by evaluating three error sources, namely false negatives (FN), false positives (FP) and identity switches (IDs). It is calculated as:
MOTA = 1 − (FN + FP + IDs) / GT
where GT is the ground truth, i.e. the number of annotated targets in the scene.
The IDF1 score represents the association performance of the tracker, i.e. the ratio of correctly identified detections to the average of the number of ground-truth and computed detections. It is calculated as:
IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN)
where IDTP (true positive IDs) denotes the number of correctly assigned detected targets in the whole video; IDFN (false negative IDs) denotes the number of missed assignments of detected targets in the whole video; and IDFP (false positive IDs) denotes the number of incorrect assignments of detected targets in the whole video.
Above, when the tracking accuracy reaches 66.1% and the IDF1 score reaches 64.2%, the model test passes.
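For reference, both metrics can be computed directly from their definitions above; the counts below are illustrative, not the disclosure's experimental numbers:

```python
# Standard MOT metrics matching the formulas in the text:
#   MOTA = 1 - (FN + FP + IDs) / GT
#   IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN)
def mota(fn, fp, ids, gt):
    return 1.0 - (fn + fp + ids) / gt

def idf1(idtp, idfp, idfn):
    return 2 * idtp / (2 * idtp + idfp + idfn)

# Illustrative counts only.
print(round(mota(300, 200, 30, 5000), 3))  # 0.894
print(round(idf1(4000, 500, 600), 3))      # 0.879
```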
Next, the effectiveness of the model described in the present disclosure will be described in detail in connection with fig. 5 (a) to 7 (b) and tables 1 and 2.
First, the effectiveness of each module in the model is verified through ablation experiments, as shown in Table 1:
TABLE 1
In Table 1, the first row of data is the ablation result of the baseline algorithm CenterTrack.
The second row is the ablation result after introducing the AM matrix. Compared with the first row, using the AM matrix for offset prediction improves MOTA by 1.0% and the IDF1 score by 4.4%, and reduces IDs from 528 to 369. The baseline algorithm CenterTrack uses regression learning to predict the offset vector from the centre point of the current frame to that of the previous frame; such an offset vector does not make full use of temporal information. The AM matrix is formed from the similarity relation between adjacent frames and contains more temporal information, so the predicted offset vector is more accurate. Therefore, performing trajectory association with the AM-predicted offset vector greatly reduces the number of identity switches and improves the association capability (IDF1).
The third row is the ablation result after adopting the DFL network. Compared with the second row, MOTA improves by 0.3% and IDF1 by 0.5%, improving the overall performance of the model. This is mainly because the two branches of the DFL network separately enhance the shared input features to obtain detection features and tracking features, thereby alleviating the competition between detection and tracking.
The fourth row is the ablation result after adopting the MTA strategy on top of the AM matrix. Compared with the second row, MOTA improves by 0.8% while the IDF1 score drops by 0.8%: the overall performance of the tracker improves but its association capability decreases. In the experiment, detection boxes are divided by confidence η, with high-score boxes satisfying η ≥ 0.4 and low-score boxes satisfying 0.2 ≤ η < 0.4. Since the MTA strategy retains a portion of the low-score boxes, false positives (FP) increase, which lowers the IDF1 score. Compared with the first row, MT increases by 3.2% after applying the MTA strategy, indicating a substantial decrease in lost trajectories. MT denotes the proportion of targets for which at least 80% of the trajectory is correctly tracked.
The last row is the ablation result after adding the AM, DFL and MTA modules simultaneously. Compared with the first row, MOTA and IDF1 improve by 2.1% and 4.3% respectively, and IDs drops from 528 to 333, demonstrating that the model of the present disclosure can effectively improve multi-target tracking performance in complex scenes.
Furthermore, video scenes from the MOT17 test set were selected for a qualitative analysis of the model described in this disclosure, comparing its tracking results with the benchmark algorithm CenterTrack. Fig. 5(a) and Fig. 5(b) show partial visualization results of CenterTrack and of the disclosed model on the MOT17-04 video sequence, respectively; from left to right the frames show the scene before, during, and after occlusion, with the frame number in the lower right corner. In this distant-view scene the targets are easily affected by partial occlusion. As seen in Fig. 5(a), targets No. 103 and No. 128 at the arrows undergo an identity switch after being occluded and are assigned new track identifiers. As shown in Fig. 5(b), after the DFL network, the AM matrix, and the MTA strategy are introduced, the disclosed model maintains the original target IDs in the occlusion scene, improving the tracking performance of the multi-target tracker.
Fig. 6(a) and Fig. 6(b) show the visualization results of the benchmark algorithm CenterTrack and of the model described in this disclosure on MOT17-09, respectively. In close-up scenes the target is susceptible to severe occlusion. As shown in Fig. 6(a), target No. 21 at the arrow is assigned a new ID of 29 after undergoing complete occlusion; as shown in Fig. 6(b), the disclosed model re-associates the target after occlusion: when the target reappears after complete occlusion, the appearance features extracted by the ReID network are matched to the stored track by cosine distance, so the tracker can cope with strong occlusion scenes and its tracking performance is improved.
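A hedged sketch of this re-association step: a cosine-distance cost matrix between stored track embeddings and the ReID features of re-appearing detections, followed by a simple greedy assignment. The disclosure does not specify the matcher (Hungarian matching would be a common alternative), and all names and the 0.3 distance gate are assumptions:

```python
import numpy as np

def cosine_cost(track_embs, det_embs):
    """Cosine-distance cost matrix between track appearance embeddings
    and ReID features of newly re-appearing detections."""
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    return 1.0 - t @ d.T                       # (num_tracks, num_dets)

def greedy_reassociate(cost, max_dist=0.3):
    """Greedily match lowest-cost (track, detection) pairs whose cosine
    distance is below max_dist; returns a list of (track_idx, det_idx)."""
    matches, used_t, used_d = [], set(), set()
    pairs = ((i, j) for i in range(cost.shape[0]) for j in range(cost.shape[1]))
    for ti, di in sorted(pairs, key=lambda p: cost[p]):
        if ti in used_t or di in used_d or cost[ti, di] > max_dist:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return matches
```

Matched detections inherit the pre-occlusion track IDs, which is what prevents the identity switch shown in Fig. 6(a).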
Fig. 7(a) and Fig. 7(b) show the visualization results of the benchmark algorithm CenterTrack and of the model described in this disclosure on MOT17-11, respectively. In some scenes the detection-box score of a target decreases as the occlusion level increases, and low-score detection boxes are typically filtered out by the detector, causing trajectory loss. As shown in Fig. 7(a), the target under CenterTrack is not detected due to occlusion and cannot be tracked; as shown in Fig. 7(b), the MTA strategy of the disclosed model retains a portion of the low-score detection boxes, so the target is tracked continuously, the trajectory-loss phenomenon is reduced, and the tracking performance of the tracker is improved.
To further verify the effectiveness of the model described in this disclosure, six advanced MOT algorithms (CTracker, JDE, CenterTrack, QuasiDense, TransTrack and MOTR) were selected for comparison on the MOT17 and MOT20 data sets; the results are shown in Table 2.
TABLE 2
As can be seen from Table 2, the model described in this disclosure reaches MOTA and IDF1 of 68.2% and 68.5% on the MOT17 data set, and 52.7% and 48.2% on the MOT20 data set, obtaining the best tracking results among the compared algorithms.
On MOT20, compared with the benchmark algorithm CenterTrack, the disclosed model improves the MOTA index by 1.4% and the IDF1 index by 7.9%; IDS decreases from 7731 to 3043, FP increases from 10080 to 13403, and FN decreases from 281757 to 274419. These results show that the model can effectively address insufficient target-feature extraction, identity switching, and trajectory loss in dense scenes.
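For reference, the two evaluation indexes discussed throughout this section follow the standard CLEAR-MOT and identity-metric definitions, which can be computed from the reported counts as in the sketch below (a generic illustration, not code from the disclosure):

```python
def mota(fn, fp, ids, gt):
    """Multi-Object Tracking Accuracy: MOTA = 1 - (FN + FP + IDS) / GT,
    where GT is the total number of annotated (ground-truth) targets."""
    return 1.0 - (fn + fp + ids) / gt

def idf1(idtp, idfp, idfn):
    """Identity F1 score: IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN),
    the harmonic mean of identity precision and identity recall."""
    return 2 * idtp / (2 * idtp + idfp + idfn)
```

A perfect tracker (no false negatives, false positives, or identity switches) scores 1.0 on both indexes; every error term reduces the score proportionally.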
In another embodiment, the present disclosure further provides a multi-target tracking apparatus based on dual branch feature enhancement and multi-level trajectory correlation, the apparatus comprising:
the acquisition module is used for acquiring an input image containing a plurality of targets to be tracked;
the model construction and training module is used for constructing a multi-target tracking model and training the multi-target tracking model to obtain a trained multi-target tracking model;
the multi-target tracking model is used for alleviating excessive competition between detection and tracking by means of a dual-branch feature learning network, and for obtaining a more accurate offset vector by introducing association-matrix (AM) prediction, so as to reduce the number of identity switches of the targets to be tracked in the input image;
the tracking module is used for inputting the input image into the trained multi-target tracking model so as to realize simultaneous tracking of a plurality of targets to be tracked in the input image.
In another embodiment, the model building and training module comprises:
a division sub-module, for dividing the data set used for model training into a training set and a test set;
the training sub-module is used for training the model by utilizing the training set;
and the test sub-module is used for testing the trained model by using the test set.
In another embodiment, the present disclosure also provides a computer storage medium storing computer-executable instructions for performing a method as set forth in any one of the preceding claims.
In another embodiment, the present disclosure further provides an electronic device, including:
a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein,
the processor, when executing the program, implements a method as described in any of the preceding.
The applicant has described the embodiments of the present disclosure in detail with reference to the accompanying drawings of the specification, but it should be understood by those skilled in the art that the above embodiments are merely preferred examples and the disclosure is not limited to the specific embodiments described above. The detailed description is intended to help the reader better understand the spirit of the disclosure, not to limit its protection scope; any modification or variation based on the spirit of the disclosure is intended to fall within the protection scope of the disclosure.

Claims (8)

1. A multi-target tracking method based on dual-branch feature enhancement and multi-level track association, the method comprising the steps of:
s100: acquiring an input image containing a plurality of targets to be tracked;
s200: constructing a multi-target tracking model and training to obtain a trained multi-target tracking model;
the multi-target tracking model is used for alleviating excessive competition between detection and tracking by means of a dual-branch feature learning network, and for obtaining a more accurate offset vector by introducing association-matrix (AM) prediction, so as to reduce the number of identity switches of the targets to be tracked in the input image;
s300: and inputting the input image into a trained multi-target tracking model to realize simultaneous tracking of a plurality of targets to be tracked in the input image.
2. The method according to claim 1, characterized in that in step S200 the multi-target tracking model is trained by the following steps:
s201: acquiring a data set, and dividing the data set into a training set and a testing set;
s202: setting training parameters, training the model by using a training set, and finishing the model training when the training reaches the set number of rounds;
s203: testing the trained model with the test set, wherein during testing the model is evaluated using the multi-target tracking accuracy and the IDF1 score as evaluation indexes, and the model passes the test when the tracking accuracy reaches 66.1% and the IDF1 score reaches 64.2%.
3. The method according to claim 2, wherein in step S203 the multi-target tracking accuracy is expressed as:
MOTA = 1 - (FN + FP + IDS) / GT
wherein FN denotes the number of false negatives, FP the number of false positives, IDS the number of identity switches, and GT (ground truth) the number of annotated targets in the scene.
4. The method according to claim 2, wherein in step S203 the IDF1 score is expressed as:
IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)
wherein IDTP (true positive IDs) denotes the number of correctly assigned detected targets in the whole video; IDFN (false negative IDs) denotes the number of detected targets missing an assignment in the whole video; and IDFP (false positive IDs) denotes the number of incorrectly assigned detected targets in the whole video.
5. A multi-target tracking device based on dual-branch feature enhancement and multi-level trajectory correlation, the device comprising:
the acquisition module is used for acquiring an input image containing a plurality of targets to be tracked;
the model construction and training module is used for constructing a multi-target tracking model and training the multi-target tracking model to obtain a trained multi-target tracking model;
the multi-target tracking model is used for alleviating excessive competition between detection and tracking by means of a dual-branch feature learning network, and for obtaining a more accurate offset vector by introducing association-matrix (AM) prediction, so as to reduce the number of identity switches of the targets to be tracked in the input image;
the tracking module is used for inputting the input image into the trained multi-target tracking model so as to realize simultaneous tracking of a plurality of targets to be tracked in the input image.
6. The apparatus of claim 5, wherein the model building and training module comprises:
a division sub-module, for dividing the data set used for model training into a training set and a test set;
the training sub-module is used for training the model by utilizing the training set;
and the test sub-module is used for testing the trained model by using the test set.
7. A computer storage medium having stored thereon computer executable instructions for performing the method of any of claims 1 to 4.
8. An electronic device, comprising:
a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein,
the processor, when executing the program, implements the method of any one of claims 1 to 4.
CN202311226983.6A 2023-09-21 2023-09-21 Multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory correlation Pending CN117274308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311226983.6A CN117274308A (en) 2023-09-21 2023-09-21 Multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory correlation


Publications (1)

Publication Number Publication Date
CN117274308A true CN117274308A (en) 2023-12-22

Family

ID=89209976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311226983.6A Pending CN117274308A (en) 2023-09-21 2023-09-21 Multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory correlation

Country Status (1)

Country Link
CN (1) CN117274308A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120411865A (en) * 2025-07-03 2025-08-01 山东科技大学 A multi-target tracking method and device based on secondary graph matching

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018037931A1 (en) * 2016-08-22 2018-03-01 Canon Kabushiki Kaisha Continuum robot, modification method of kinematic model of continuum robot, and control method of continuum robot
GB201908574D0 (en) * 2019-06-14 2019-07-31 Vision Semantics Ltd Optimised machine learning
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method
CN112668432A (en) * 2020-12-22 2021-04-16 上海幻维数码创意科技股份有限公司 Human body detection tracking method in ground interactive projection system based on YoloV5 and Deepsort
CN112927264A (en) * 2021-02-25 2021-06-08 华南理工大学 Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof
WO2021184621A1 (en) * 2020-03-19 2021-09-23 南京因果人工智能研究院有限公司 Multi-object vehicle tracking method based on mdp
CN114155279A (en) * 2021-11-29 2022-03-08 西安邮电大学 Visual target tracking method based on multi-feature game
CN114419343A (en) * 2021-12-09 2022-04-29 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) A multi-target identification and tracking method and identification and tracking system
CN114529581A (en) * 2022-01-28 2022-05-24 西安电子科技大学 Multi-target tracking method based on deep learning and multi-task joint training
CN116168322A (en) * 2023-01-10 2023-05-26 中国人民解放军军事科学院国防科技创新研究院 Unmanned aerial vehicle long-time tracking method and system based on multi-mode fusion


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAO SHI: "Double-Branch Network with Pyramidal Convolution and Iterative Attention for Hyperspectral Image Classification", Remote Sensing, 6 April 2021 (2021-04-06) *
HOU ZHIQIANG, GUO HAO: "Anchor-free Object Detection Algorithm Based on Dual-branch Feature Fusion", Journal of Electronics & Information Technology, 30 June 2022 (2022-06-30) *


Similar Documents

Publication Publication Date Title
CN114972418B (en) Maneuvering multi-target tracking method based on combination of kernel adaptive filtering and YOLOX detection
CN114092517B (en) Multi-target tracking method based on traditional and deep learning algorithms
CN108447080B (en) Target tracking method, system and storage medium based on hierarchical data association and convolutional neural network
CN108320306B (en) Video target tracking method fusing TLD and KCF
CN107590442A (en) A kind of video semanteme Scene Segmentation based on convolutional neural networks
CN111429485B (en) Cross-modal filter tracking method based on adaptive regularization and high confidence update
CN117011342A (en) An attention-enhanced spatiotemporal Transformer visual single target tracking method
CN117333753A (en) Fire detection method based on PD-YOLO
CN116416503A (en) A small sample target detection method, system and medium based on multimodal fusion
CN111160212A (en) An improved tracking learning detection system and method based on YOLOv3-Tiny
CN118968012A (en) Infrared small target detection model establishment method and detection method based on multi-scale attention feature superposition
CN109784155B (en) Visual target tracking method based on verification and error correction mechanism and intelligent robot
CN111241987A (en) A Cost-Sensitive Three-Way Decision-Based Visual Tracking Method for Multi-target Models
CN117274308A (en) Multi-target tracking method based on dual-branch feature enhancement and multi-level trajectory correlation
CN116486203B (en) Single-target tracking method based on twin network and online template updating
CN111144220B (en) Personnel detection method, device, equipment and medium suitable for big data
CN119888465B (en) Marine organism detection method based on reinforced fine grain characteristic expression branch and light WCSPOmni-DETR
CN114219826B (en) Ground target tracking method applied to aerial video
CN113724291B (en) Multi-panda tracking method, system, terminal device and readable storage medium
Sun et al. Multi-AUV target recognition method based on GAN-meta learning
CN119963800A (en) A method and system for detecting common salient objects
CN119992044A (en) An efficient approach for unsupervised domain adaptive object detection
CN116343263A (en) Method and system for generating content-adaptive pedestrian re-identification dataset
Xu Stereo matching and depth map collection algorithm based on deep learning
Kamiya et al. Tracking correction method for rapid and random protein molecules movement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination