CN119810505B - Method for smoothing target detection label based on historical information - Google Patents
Method for smoothing target detection label based on historical information
- Publication number
- CN119810505B (application CN202411798622.3A)
- Authority
- CN
- China
- Prior art keywords
- frame
- target
- detection
- label
- track
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Radar Systems Or Details Thereof (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for smoothing target detection labels based on historical information. A video frame sequence is input into a trained detector to obtain detection results, which are then fed to a tracker; Kalman-filter prediction is performed on each track so that every target receives a unique ID and a tracking result. Using the uniqueness of the track ID together with the label information of the detection results, different weights are assigned to the historical-frame labels and the current-frame label to obtain the label information of the current frame. The detection labels are then corrected according to this label information, and a final, stable detection label is output.
Description
Technical Field
The invention belongs to the technical field of target detection and tracking, and particularly relates to a method for smoothing a target detection label based on historical information.
Background
With the rapid development of artificial intelligence, computer-vision-based fields such as intelligent surveillance and autonomous driving have made major breakthroughs over prior technology, further reducing the waste of human resources and improving safety in security and transportation. Target detection is a key foundational technology in these fields, and its performance directly influences the reliability and effectiveness of such systems.
In practical target-detection applications, particularly on dynamic video streams, the predicted class of a target may change abruptly due to motion distortion, mutual occlusion of moving objects, or fluctuations of the detector itself. For example, an object detected as one class in consecutive frames may be misclassified as another class in a single frame because of motion distortion. Such label jumps can seriously affect the overall stability of the detection system and, in turn, negatively affect subsequent decisions. Research into smoothing target detection labels is therefore necessary.
The main disadvantage of conventional label-smoothing techniques is that they cannot effectively cope with the label changes of dynamic objects, especially when objects are distorted, occluded, or changing rapidly. Conventional approaches generally assume that object class labels are relatively stable over a time series, so for fast-moving or occluded objects the labels may appear abrupt or inconsistent, leading to tracking failure or misrecognition. In addition, traditional label-smoothing methods lack dynamic analysis of the target's motion trajectory and cannot adapt to the target's historical state, so labels are easily mis-corrected or lost in complex scenes, degrading overall tracking performance and accuracy.
Disclosure of Invention
To address these problems, the invention provides a method for smoothing target detection labels based on historical information. A tracking algorithm associates detection results with track information and generates a unique ID for each detected target; this unique ID makes it possible to record the historical label information of the current target. By weighted fusion of the target's historical label information with its current detection label, the method outputs stable label information and reduces the frequent label jumps caused by factors such as mutual occlusion and motion blur.
In order to achieve the above purpose, the present invention provides a method for smoothing a target detection tag based on history information, the method comprising the following steps:
S1, training a model that takes an image or video frame as input and outputs the class and confidence of each target in the image together with the coordinates of its bounding box, to obtain a detector;
S2, inputting the video frame sequence into the detector to obtain target detection results, where each result contains the coordinates, class, and confidence score of every detected target box in the image;
S3, inputting the detection results into a target tracker and judging whether the current input is the first frame of the video sequence. If so, a track list is created, an initial track is generated for each target from the target-box coordinates, class, confidence, and related information, and all tracks are stored in the track list. If not, Kalman filtering predicts the position of every track in the track list, and the track list is updated after association and matching with the detection results;
S4, performing data-association matching between the detection results and the predicted positions of all tracks in the track list, and obtaining the tracking state of each target and updated track information by exploiting the uniqueness of the track ID;
S5, judging from the track ID whether each tracked target appears for the first time. If the ID appears for the first time, a historical class-label probability matrix is created for the currently detected class label and the class-label probability is recorded. If not, the class label detected in the current frame is fused by weighting with the track's historical class-label probability matrix to obtain the target's smoothed label;
S6, for a first-appearing target, directly outputting the currently detected class label; for a target that is not appearing for the first time, obtaining the smoothed label, assigning it to the corresponding target box in the current frame's detection results, and outputting a stable detection label.
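The Kalman prediction in S3 can be sketched as follows. This is a minimal constant-velocity illustration assuming a 4-D state [cx, cy, vx, vy] and isotropic noise; the patent does not fix the filter's state layout, and practical trackers typically track the full box with a mean and covariance.

```python
import numpy as np

def kalman_predict(mean, cov, dt=1.0, q=1e-2):
    """Constant-velocity predict step: advance position by velocity * dt."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt            # cx += vx*dt, cy += vy*dt
    Q = q * np.eye(4)                 # process noise (assumed isotropic)
    return F @ mean, F @ cov @ F.T + Q

# A track initialised at centre (10, 20) moving (+2, -1) per frame:
mean = np.array([10.0, 20.0, 2.0, -1.0])
cov = np.eye(4)
mean, cov = kalman_predict(mean, cov)
print(mean[:2])  # predicted centre → [12. 19.]
```

The predicted position is what the association step (S4) compares against the new detections.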
In particular, the detector used in S1 may be a model trained on a generic or application-specific data set with a currently mainstream target-detection algorithm such as YOLOv or YOLC. As long as the model's detection outputs can contain the content required by step S2 (e.g., the class and confidence scores of the targets), any missing outputs can be obtained by modifying the relevant output portions of the detector source code.
Further, the step S3 includes:
If the input is the first frame, an initial track is generated for each target from the target-box coordinates, class, confidence, and related information: the coordinates of all tracks are initialized to the detection-result coordinates, the mean and variance of a Kalman filter are initialized from those coordinates, each detection box is marked as being in the tracking state, and a unique ID is assigned for subsequent matching.
Further, the step S4 includes:
Step one: set a confidence threshold track_thresh (with track_thresh > 0.1). After the detector outputs each target's detection box and confidence, classify every detection: if its confidence is greater than track_thresh it is a high-score box; if its confidence is below track_thresh but above 0.1 it is a low-score box; and if its confidence is below 0.1 it is treated as a detector false positive;
Step two: for the high-score boxes, compute the IOU between each high-score box and each predicted box and match them with the Hungarian algorithm. If a high-score box matches a predicted box, update the box in the tracking track to the high-score box and assign its ID; for high-score boxes that fail to match any predicted box, execute steps three and four;
Step three: for predicted boxes that failed to match a high-score box, if a predicted box is not in the tracking state, mark it as lost and delete its information once the maximum cache of 25 frames is reached. If it is in the tracking state, compute the IOU between it and the low-score boxes obtained in step one and match them with the Hungarian algorithm; on a successful match, update the box in the tracking track to the low-score box and assign its ID; on failure, discard the unmatched predicted box directly;
Step four: match the remaining unmatched high-score boxes against predicted boxes in the inactive state. On a successful match, activate the predicted box, mark it as being in the tracking state, and update its box to the current high-score box; on failure, mark the unmatched predicted box for deletion. For any still-unmatched high-score box, create a new tracking track if its confidence is greater than track_thresh + 0.1, and discard it otherwise.
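Steps one to four amount to a two-stage, score-aware association in the ByteTrack style. Below is a minimal sketch of one matching round, using scipy's `linear_sum_assignment` in place of the Hungarian algorithm on a 1 − IOU cost; the box coordinates and the IOU threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(preds, dets):
    """IoU between every predicted box and detection box (x1, y1, x2, y2)."""
    m = np.zeros((len(preds), len(dets)))
    for i, p in enumerate(preds):
        for j, d in enumerate(dets):
            ix1, iy1 = max(p[0], d[0]), max(p[1], d[1])
            ix2, iy2 = min(p[2], d[2]), min(p[3], d[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            union = ((p[2] - p[0]) * (p[3] - p[1])
                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
            m[i, j] = inter / union
    return m

def associate(preds, dets, iou_thresh=0.3):
    """One Hungarian round; pairs below iou_thresh are rejected."""
    ious = iou_matrix(preds, dets)
    rows, cols = linear_sum_assignment(1.0 - ious)  # minimise cost = 1 - IoU
    matches = [(r, c) for r, c in zip(rows, cols) if ious[r, c] >= iou_thresh]
    matched_p = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    un_preds = [i for i in range(len(preds)) if i not in matched_p]
    un_dets = [j for j in range(len(dets)) if j not in matched_d]
    return matches, un_preds, un_dets

# First stage: Kalman-predicted boxes vs. high-score detections.
preds = [[0, 0, 10, 10], [20, 20, 30, 30]]
high_dets = [[1, 1, 11, 11]]
matches, un_preds, un_dets = associate(preds, high_dets)
print(matches, un_preds, un_dets)  # → [(0, 0)] [1] []
```

The unmatched predicted boxes would then go through the second, low-score round (step three), and remaining unmatched high-score boxes through step four.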
Further, the step S5 includes:
Initializing a historical class-label probability matrix His according to the number of detection classes k:

His = [x_0, x_1, ..., x_{k-1}]

where x_i represents the probability of detection class i. In each frame, the ID of every target successfully tracked by the tracker is recorded; for a target with the same ID, the current detection class j, j ∈ [0, k-1], generates a one-hot current class-label probability matrix Cur:

Cur = [c_0, c_1, ..., c_{k-1}], with c_j = 1 and c_i = 0 for i ≠ j.
different weights are allocated to the current category label probability matrix and the historical category label probability matrix, and the formula is as follows:
s=αCur+(1-α)His
where s is a smoothed class label probability matrix, α represents a weight, and the smaller the value, the more important the historical label information is.
The smoothed label L is the class label with the largest probability in the smoothed class-label probability matrix:

L = argmax_i s_i, i ∈ [0, k-1].
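The S5 update can be sketched per track ID as below. Carrying the smoothed vector s forward as the new His is an assumption about how the history accumulates over frames; the patent specifies only the fusion formula and the argmax readout.

```python
import numpy as np

class LabelSmoother:
    """Per-ID label smoothing: s = alpha * Cur + (1 - alpha) * His."""
    def __init__(self, num_classes, alpha=0.25):
        self.k = num_classes
        self.alpha = alpha
        self.his = {}  # track ID -> historical class-probability vector His

    def update(self, track_id, cls):
        cur = np.zeros(self.k)
        cur[cls] = 1.0                      # one-hot current matrix Cur
        if track_id not in self.his:        # first appearance of this ID
            self.his[track_id] = cur
            return cls                      # output the raw detected label
        s = self.alpha * cur + (1 - self.alpha) * self.his[track_id]
        self.his[track_id] = s              # carry s forward as the new His
        return int(np.argmax(s))            # smoothed label L = argmax_i s_i

smoother = LabelSmoother(num_classes=2, alpha=0.25)
print(smoother.update(1, 0))  # first frame, class 0 → 0
print(smoother.update(1, 1))  # detector flips to 1; smoothed label → 0
```

A single-frame flip to class 1 is suppressed because the history still dominates with weight 1 − α.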
In particular, in step S6 the current detection-label probability matrix differs from the historical tracking-label probability matrix in that the current matrix is necessarily sparse: after weighting, only the currently detected class carries probability α and all other entries are 0. The historical matrix, by contrast, is affected by past label jumps and is generally non-zero for several class labels; it is sparse only in the ideal case.
Furthermore, the invention proposes a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the above method when executing the computer program.
Furthermore, the invention proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
(1) By using a tracking algorithm, the method overcomes the limitation that traditional label smoothing cannot adapt to a target's historical state. It effectively integrates detection information with track information, so that a target still outputs a relatively stable label in challenging scenarios such as mutual occlusion and motion blur, giving good robustness.
(2) The method combines the historical label information and the current detection information of the same target with different weights, yielding smoother label information, effectively suppressing label jumps confined to one or a few frames, and making the probability of an abrupt change decrease over time.
(3) The detector used by the invention can be any mainstream algorithm trained on a generic or specific data set, and the tracking algorithm can likewise be any mainstream tracker, as long as the two are compatible; the invention therefore has wide applicability.
(4) The method effectively improves the stability and accuracy of target detection labels. When targets are close together or occlusion occurs, it solves the problem of labels changing frequently due to fluctuations in the detector output. A simple and efficient approach makes target tracking and label output more stable and robust. The invention can be combined with any mainstream target tracker that generates unique track IDs, and thus has strong adaptability and broad practical value.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the following description briefly describes the drawings used in the embodiments.
FIG. 1 is a flow chart of a tag smoothing method based on history information according to the present invention;
FIG. 2 is a flow chart of a tracking algorithm employed by the present invention.
Detailed Description
In order to more clearly illustrate the technical scheme of the invention, the invention is further described below with reference to the accompanying drawings and the embodiments. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
In practical target-detection applications, the predicted class of a target may change abruptly due to mutual occlusion of moving targets or fluctuations of the detector itself. Frequent label jumps can seriously affect the overall stability of the detection system and, in turn, negatively affect subsequent decisions.
Based on this idea, the invention provides a method for smoothing target detection labels using historical information: a tracking algorithm fuses a target's current detection label with its historical label information to generate a smoother label, effectively reducing the frequent label jumps caused by challenges such as occlusion and motion blur and outputting a more stable label.
As shown in fig. 1, the method for smoothing the target detection label based on the history information provided by the invention specifically comprises the following steps:
Step 1, train a model that takes an image or video frame as input and outputs the class and confidence of each target in the image together with the coordinates of its bounding box; this model serves as the detector. In a preferred embodiment, the detector is YoloV, trained from the official pre-training weights on an unmanned-aerial-vehicle and bird data set.
Step 2, input the video sequence to be detected into the detector frame by frame to obtain the detector output, which contains the target position, confidence, and class-label information;
In particular, if a detector's output does not include the target's confidence score and class-label information, they can be obtained by modifying the source code.
In a preferred embodiment, the detector outputs a 6-dimensional vector (x1, y1, x2, y2, conf, cls) for each detected target: the first four values are the top-left and bottom-right corner coordinates of the detection box, conf is a confidence score ranging between 0 and 1, and cls is the classification result, an integer variable.
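Under that output convention, the score split of step one of S4 is straightforward. A sketch with illustrative detections and an assumed track_thresh of 0.5 (the patent only requires track_thresh > 0.1):

```python
TRACK_THRESH = 0.5  # illustrative; any value above 0.1 per the method

def split_detections(dets):
    """Split raw (x1, y1, x2, y2, conf, cls) detections by confidence."""
    high = [d for d in dets if d[4] > TRACK_THRESH]
    low = [d for d in dets if 0.1 < d[4] <= TRACK_THRESH]
    # anything with conf <= 0.1 is treated as a detector false positive
    return high, low

dets = [
    (100, 50, 180, 120, 0.92, 1),  # confident detection → high-score box
    (300, 40, 340, 80, 0.35, 0),   # uncertain detection → low-score box
    (10, 10, 20, 20, 0.05, 0),     # discarded as detector noise
]
high, low = split_detections(dets)
print(len(high), len(low))  # → 1 1
```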
Step 3, if the input image is the first frame, generate an initial track for each target from the target-box coordinates, class, confidence, and related information; that is, initialize the coordinates of all tracks to the detection-result coordinates, initialize the mean and variance of a Kalman filter from those coordinates, mark each detection box as being in the tracking state, and assign a unique ID for subsequent matching.
It should be noted that the detector is not limited to YoloV: it only needs to produce output that satisfies the tracking algorithm and includes label-class information, and the tracking algorithm only needs to be compatible with the detector and generate unique IDs.
Step 4, the tracker performs data association on the detection results according to the flow shown in fig. 2 and, after obtaining each target's unique ID, initializes the historical class-label probability matrix according to the number of classes.
In a preferred embodiment, the target classes are bird and unmanned aerial vehicle, with 0 representing bird and 1 representing unmanned aerial vehicle. For a target ID appearing for the first time, the historical class-label probability matrix is initialized to [0, 0], and the probability of the class given by the current detection box is set to 1.
In a preferred embodiment, tracking target ID 1 appears for the first time with current detection class 0 (bird), so the historical class-label probability matrix is initialized to [1, 0].
Step 5, perform weighted fusion of the current frame's detection class-label probability matrix with the historical class-label probability matrix, using the formula:
s=αCur+(1-α)His
Wherein s is a smoothed class label probability matrix, cur is a current class label probability matrix generated by a current detection class, his is a history class label probability matrix, alpha represents a weight, and the smaller the value is, the more important the history label information is indicated.
Finally, the smoothed label L is the class label with the largest probability in the smoothed class-label probability matrix:

L = argmax_i s_i.
In a preferred embodiment with 0 representing bird and 1 representing unmanned aerial vehicle: target ID 1 appears for the first time with detected class 0, so the generated Cur is [1, 0]; since this is the first appearance, His is also [1, 0]. The smoothed probability matrix s is therefore [1, 0], and the final smoothed label L is the class index with the highest probability, namely index 0 where the probability is 1, so the target is identified as a bird.
In a preferred embodiment, when tracking ID 1 occurs for the second time and the detected label is 1, the generated Cur is [0, 1] and His is [1, 0]. With α = 0.25, the smoothed label-probability matrix s is [0.75, 0.25], and the final smoothed label L is the class index with the highest probability, namely index 0 with probability 0.75, still representing a bird.
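This two-frame example can be verified numerically:

```python
import numpy as np

alpha = 0.25
his = np.array([1.0, 0.0])  # frame 1: ID 1 first seen as class 0 (bird)
cur = np.array([0.0, 1.0])  # frame 2: detector flips to class 1 (drone)

s = alpha * cur + (1 - alpha) * his   # s = alpha*Cur + (1-alpha)*His
label = int(np.argmax(s))

print(s)      # → [0.75 0.25]
print(label)  # → 0, i.e. still "bird"
```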
Step 6, from the final smoothed label L, read the corresponding class information and output the final detection label. In a preferred embodiment, 0 represents bird and 1 represents unmanned aerial vehicle; for the target with ID 1, the first output label is bird, and although the raw detection label in the second frame is unmanned aerial vehicle, after label smoothing the target is still finally output as a bird.
It should be noted that the problem addressed here is frequent jumps of the label class caused by factors such as occlusion and blurring of moving targets, not detector false alarms.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411798622.3A CN119810505B (en) | 2024-12-09 | 2024-12-09 | Method for smoothing target detection label based on historical information |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119810505A CN119810505A (en) | 2025-04-11 |
| CN119810505B true CN119810505B (en) | 2025-09-23 |
Family
ID=95267121
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411798622.3A Active CN119810505B (en) | 2024-12-09 | 2024-12-09 | Method for smoothing target detection label based on historical information |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119810505B (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115147594A (en) * | 2022-07-06 | 2022-10-04 | 上海海事大学 | A ship image trajectory tracking and prediction method based on ship heading recognition |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119810505A (en) | 2025-04-11 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||