CN119810505B - Method for smoothing target detection label based on historical information - Google Patents
Method for smoothing target detection label based on historical information
- Publication number
- CN119810505B (application CN202411798622.3A)
- Authority
- CN
- China
- Prior art keywords
- frame
- target
- detection
- label
- track
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Radar Systems Or Details Thereof (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for smoothing target detection labels based on historical information. A video frame sequence is input into a trained detector to obtain detection results, which are then fed to a tracker; Kalman-filter prediction is performed on each track so that every target receives a unique ID and a tracking result. Using the uniqueness of the track ID together with the label information of the detection results, different weights are assigned to the historical-frame labels and the current-frame label to obtain the label information of the current frame. The detection labels are then corrected according to this label information, and a final, stable detection label is output.
Description
Technical Field
The invention belongs to the technical field of target detection and tracking, and particularly relates to a method for smoothing a target detection label based on historical information.
Background
With the rapid development of artificial intelligence, computer-vision-based fields such as intelligent surveillance and autonomous driving have made major breakthroughs over prior technology, further reducing the waste of human resources and improving safety in security and transportation. Target detection is a key foundational technology in these fields, and its performance directly influences the reliability and effectiveness of such systems.
In practical target-detection applications, particularly on dynamic video streams, the predicted class of a target may change abruptly due to motion distortion, mutual occlusion of moving objects, or fluctuations of the detector itself. For example, an object detected as one class in consecutive frames may be misclassified as another class in a single frame because of motion distortion. Such label jumps can seriously affect the overall stability of the detection system and, in turn, negatively affect subsequent decisions. Research into smoothing target detection labels is therefore necessary.
The main disadvantage of conventional label-smoothing techniques is that they cannot effectively cope with the label changes of dynamic objects, especially when objects are distorted, occluded, or changing rapidly. Conventional approaches generally assume that object class labels are relatively stable over a time series, so for fast-moving or occluded objects the labels may appear abrupt or inconsistent, leading to tracking failure or misrecognition. In addition, traditional label-smoothing methods lack dynamic analysis of the target's motion trajectory and cannot adapt to the target's historical state, so labels are easily mis-corrected or lost in complex scenes, degrading overall tracking performance and accuracy.
Disclosure of Invention
To address these problems, the invention provides a method for smoothing target detection labels based on historical information. A tracking algorithm associates detection results with track information and generates a unique ID for each detected target; this unique ID makes it possible to record the historical label information of the current target. By weighted fusion of the target's historical label information with its current detection label, the method outputs stable label information and reduces the frequent label jumps caused by factors such as mutual occlusion and motion blur.
In order to achieve the above purpose, the present invention provides a method for smoothing a target detection tag based on history information, the method comprising the following steps:
S1, training a model that takes an image or video frame as input and outputs the class and confidence of each target in the image together with the coordinates of its bounding box, to obtain a detector;
S2, inputting the video frame sequence into the detector to obtain target detection results, where each result contains the coordinates, class, and confidence score of every detected target box in the image;
S3, inputting the detection results into a target tracker and judging whether the current input is the first frame of the video sequence. If so, a track list is created, an initial track is generated for each target from the target-box coordinates, class, confidence, and related information, and all tracks are stored in the track list. If not, Kalman filtering predicts the position of every track in the track list, and the track list is updated after association and matching with the detection results;
S4, performing data-association matching between the detection results and the predicted positions of all tracks in the track list, and obtaining the tracking state of each target and updated track information by exploiting the uniqueness of the track ID;
S5, judging from the track ID whether each tracked target appears for the first time. If the ID appears for the first time, a historical class-label probability matrix is created for the currently detected class label and the class-label probability is recorded. If not, the class label detected in the current frame is fused by weighting with the track's historical class-label probability matrix to obtain the target's smoothed label;
S6, for a first-appearing target, directly outputting the currently detected class label; for a target that is not appearing for the first time, obtaining the smoothed label, assigning it to the corresponding target box in the current frame's detection results, and outputting a stable detection label.
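The Kalman prediction in S3 can be sketched as follows. This is a minimal constant-velocity illustration assuming a 4-D state [cx, cy, vx, vy] and isotropic noise; the patent does not fix the filter's state layout, and practical trackers typically track the full box with a mean and covariance.

```python
import numpy as np

def kalman_predict(mean, cov, dt=1.0, q=1e-2):
    """Constant-velocity predict step: advance position by velocity * dt."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt            # cx += vx*dt, cy += vy*dt
    Q = q * np.eye(4)                 # process noise (assumed isotropic)
    return F @ mean, F @ cov @ F.T + Q

# A track initialised at centre (10, 20) moving (+2, -1) per frame:
mean = np.array([10.0, 20.0, 2.0, -1.0])
cov = np.eye(4)
mean, cov = kalman_predict(mean, cov)
print(mean[:2])  # predicted centre → [12. 19.]
```

The predicted position is what the association step (S4) compares against the new detections.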
In particular, the detector used in S1 may be a model trained on a generic or application-specific data set with a currently mainstream target-detection algorithm such as YOLOv or YOLC. As long as the model's detection outputs can contain the content required by step S2 (e.g., the class and confidence scores of the targets), any missing outputs can be obtained by modifying the relevant output portions of the detector source code.
Further, the step S3 includes:
If the input is the first frame, an initial track is generated for each target from the target-box coordinates, class, confidence, and related information: the coordinates of all tracks are initialized to the detection-result coordinates, the mean and variance of a Kalman filter are initialized from those coordinates, each detection box is marked as being in the tracking state, and a unique ID is assigned for subsequent matching.
Further, the step S4 includes:
Step one: set a confidence threshold track_thresh (with track_thresh > 0.1). After the detector outputs each target's detection box and confidence, classify every detection: if its confidence is greater than track_thresh it is a high-score box; if its confidence is below track_thresh but above 0.1 it is a low-score box; and if its confidence is below 0.1 it is treated as a detector false positive;
Step two: for the high-score boxes, compute the IOU between each high-score box and each predicted box and match them with the Hungarian algorithm. If a high-score box matches a predicted box, update the box in the tracking track to the high-score box and assign its ID; for high-score boxes that fail to match any predicted box, execute steps three and four;
Step three: for predicted boxes that failed to match a high-score box, if a predicted box is not in the tracking state, mark it as lost and delete its information once the maximum cache of 25 frames is reached. If it is in the tracking state, compute the IOU between it and the low-score boxes obtained in step one and match them with the Hungarian algorithm; on a successful match, update the box in the tracking track to the low-score box and assign its ID; on failure, discard the unmatched predicted box directly;
Step four: match the remaining unmatched high-score boxes against predicted boxes in the inactive state. On a successful match, activate the predicted box, mark it as being in the tracking state, and update its box to the current high-score box; on failure, mark the unmatched predicted box for deletion. For any still-unmatched high-score box, create a new tracking track if its confidence is greater than track_thresh + 0.1, and discard it otherwise.
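Steps one to four amount to a two-stage, score-aware association in the ByteTrack style. Below is a minimal sketch of one matching round, using scipy's `linear_sum_assignment` in place of the Hungarian algorithm on a 1 − IOU cost; the box coordinates and the IOU threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(preds, dets):
    """IoU between every predicted box and detection box (x1, y1, x2, y2)."""
    m = np.zeros((len(preds), len(dets)))
    for i, p in enumerate(preds):
        for j, d in enumerate(dets):
            ix1, iy1 = max(p[0], d[0]), max(p[1], d[1])
            ix2, iy2 = min(p[2], d[2]), min(p[3], d[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            union = ((p[2] - p[0]) * (p[3] - p[1])
                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
            m[i, j] = inter / union
    return m

def associate(preds, dets, iou_thresh=0.3):
    """One Hungarian round; pairs below iou_thresh are rejected."""
    ious = iou_matrix(preds, dets)
    rows, cols = linear_sum_assignment(1.0 - ious)  # minimise cost = 1 - IoU
    matches = [(r, c) for r, c in zip(rows, cols) if ious[r, c] >= iou_thresh]
    matched_p = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    un_preds = [i for i in range(len(preds)) if i not in matched_p]
    un_dets = [j for j in range(len(dets)) if j not in matched_d]
    return matches, un_preds, un_dets

# First stage: Kalman-predicted boxes vs. high-score detections.
preds = [[0, 0, 10, 10], [20, 20, 30, 30]]
high_dets = [[1, 1, 11, 11]]
matches, un_preds, un_dets = associate(preds, high_dets)
print(matches, un_preds, un_dets)  # → [(0, 0)] [1] []
```

The unmatched predicted boxes would then go through the second, low-score round (step three), and remaining unmatched high-score boxes through step four.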
Further, the step S5 includes:
Initializing a historical class-label probability matrix His according to the number of detection classes k:

His = [x_0, x_1, ..., x_{k-1}]

where x_i represents the probability of detection class i. In each frame, the ID of every target successfully tracked by the tracker is recorded; for a target with the same ID, the current detection class j, j ∈ [0, k-1], generates a one-hot current class-label probability matrix Cur:

Cur = [c_0, c_1, ..., c_{k-1}], with c_j = 1 and c_i = 0 for i ≠ j.
different weights are allocated to the current category label probability matrix and the historical category label probability matrix, and the formula is as follows:
s=αCur+(1-α)His
where s is a smoothed class label probability matrix, α represents a weight, and the smaller the value, the more important the historical label information is.
The smoothed label L is the class label with the largest probability in the smoothed class-label probability matrix:

L = argmax_i s_i, i ∈ [0, k-1].
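The S5 update can be sketched per track ID as below. Carrying the smoothed vector s forward as the new His is an assumption about how the history accumulates over frames; the patent specifies only the fusion formula and the argmax readout.

```python
import numpy as np

class LabelSmoother:
    """Per-ID label smoothing: s = alpha * Cur + (1 - alpha) * His."""
    def __init__(self, num_classes, alpha=0.25):
        self.k = num_classes
        self.alpha = alpha
        self.his = {}  # track ID -> historical class-probability vector His

    def update(self, track_id, cls):
        cur = np.zeros(self.k)
        cur[cls] = 1.0                      # one-hot current matrix Cur
        if track_id not in self.his:        # first appearance of this ID
            self.his[track_id] = cur
            return cls                      # output the raw detected label
        s = self.alpha * cur + (1 - self.alpha) * self.his[track_id]
        self.his[track_id] = s              # carry s forward as the new His
        return int(np.argmax(s))            # smoothed label L = argmax_i s_i

smoother = LabelSmoother(num_classes=2, alpha=0.25)
print(smoother.update(1, 0))  # first frame, class 0 → 0
print(smoother.update(1, 1))  # detector flips to 1; smoothed label → 0
```

A single-frame flip to class 1 is suppressed because the history still dominates with weight 1 − α.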
In particular, in step S6 the current detection-label probability matrix differs from the historical tracking-label probability matrix in that the current matrix is necessarily sparse: after weighting, only the currently detected class carries probability α and all other entries are 0. The historical matrix, by contrast, is affected by past label jumps and is generally non-zero for several class labels; it is sparse only in the ideal case.
Furthermore, the invention proposes a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the above method when executing the computer program.
Furthermore, the invention proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
(1) By using a tracking algorithm, the method overcomes the limitation that traditional label smoothing cannot adapt to a target's historical state. It effectively integrates detection information with track information, so that a target still outputs a relatively stable label in challenging scenarios such as mutual occlusion and motion blur, giving good robustness.
(2) The method combines the historical label information and the current detection information of the same target with different weights, yielding smoother label information, effectively suppressing label jumps confined to one or a few frames, and making the probability of an abrupt change decrease over time.
(3) The detector used by the invention can be any mainstream algorithm trained on a generic or specific data set, and the tracking algorithm can likewise be any mainstream tracker, as long as the two are compatible; the invention therefore has wide applicability.
(4) The method effectively improves the stability and accuracy of target detection labels. When targets are close together or occlusion occurs, it solves the problem of labels changing frequently due to fluctuations in the detector output. A simple and efficient approach makes target tracking and label output more stable and robust. The invention can be combined with any mainstream target tracker that generates unique track IDs, and thus has strong adaptability and broad practical value.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the following description briefly describes the drawings used in the embodiments.
FIG. 1 is a flow chart of a tag smoothing method based on history information according to the present invention;
FIG. 2 is a flow chart of a tracking algorithm employed by the present invention.
Detailed Description
In order to more clearly illustrate the technical scheme of the invention, the invention is further described below with reference to the accompanying drawings and the embodiments. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
In practical target-detection applications, the predicted class of a target may change abruptly due to mutual occlusion of moving targets or fluctuations of the detector itself. Frequent label jumps can seriously affect the overall stability of the detection system and, in turn, negatively affect subsequent decisions.
Based on this idea, the invention provides a method for smoothing target detection labels using historical information: a tracking algorithm fuses a target's current detection label with its historical label information to generate a smoother label, effectively reducing the frequent label jumps caused by challenges such as occlusion and motion blur and outputting a more stable label.
As shown in fig. 1, the method for smoothing the target detection label based on the history information provided by the invention specifically comprises the following steps:
Step 1, train a model that takes an image or video frame as input and outputs the class and confidence of each target in the image together with the coordinates of its bounding box; this model serves as the detector. In a preferred embodiment, the detector is YoloV, trained from the official pre-training weights on an unmanned-aerial-vehicle and bird data set.
Step 2, input the video sequence to be detected into the detector frame by frame to obtain the detector output, which contains the target position, confidence, and class-label information;
In particular, if a detector's output does not include the target's confidence score and class-label information, they can be obtained by modifying the source code.
In a preferred embodiment, the detector outputs a 6-dimensional vector (x1, y1, x2, y2, conf, cls) for each detected target: the first four values are the top-left and bottom-right corner coordinates of the detection box, conf is a confidence score ranging between 0 and 1, and cls is the classification result, an integer variable.
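Under that output convention, the score split of step one of S4 is straightforward. A sketch with illustrative detections and an assumed track_thresh of 0.5 (the patent only requires track_thresh > 0.1):

```python
TRACK_THRESH = 0.5  # illustrative; any value above 0.1 per the method

def split_detections(dets):
    """Split raw (x1, y1, x2, y2, conf, cls) detections by confidence."""
    high = [d for d in dets if d[4] > TRACK_THRESH]
    low = [d for d in dets if 0.1 < d[4] <= TRACK_THRESH]
    # anything with conf <= 0.1 is treated as a detector false positive
    return high, low

dets = [
    (100, 50, 180, 120, 0.92, 1),  # confident detection → high-score box
    (300, 40, 340, 80, 0.35, 0),   # uncertain detection → low-score box
    (10, 10, 20, 20, 0.05, 0),     # discarded as detector noise
]
high, low = split_detections(dets)
print(len(high), len(low))  # → 1 1
```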
Step 3, if the input image is the first frame, generate an initial track for each target from the target-box coordinates, class, confidence, and related information; that is, initialize the coordinates of all tracks to the detection-result coordinates, initialize the mean and variance of a Kalman filter from those coordinates, mark each detection box as being in the tracking state, and assign a unique ID for subsequent matching.
It should be noted that the detector is not limited to YoloV: it only needs to produce output that satisfies the tracking algorithm and includes label-class information, and the tracking algorithm only needs to be compatible with the detector and generate unique IDs.
Step 4, the tracker performs data association on the detection results according to the flow shown in fig. 2 and, after obtaining each target's unique ID, initializes the historical class-label probability matrix according to the number of classes.
In a preferred embodiment, the target classes are bird and unmanned aerial vehicle, with 0 representing bird and 1 representing unmanned aerial vehicle. For a target ID appearing for the first time, the historical class-label probability matrix is initialized to [0, 0], and the probability of the class given by the current detection box is set to 1.
In a preferred embodiment, tracking target ID 1 appears for the first time with current detection class 0 (bird), so the historical class-label probability matrix is initialized to [1, 0].
Step 5, perform weighted fusion of the current frame's detection class-label probability matrix with the historical class-label probability matrix, using the formula:
s=αCur+(1-α)His
Wherein s is a smoothed class label probability matrix, cur is a current class label probability matrix generated by a current detection class, his is a history class label probability matrix, alpha represents a weight, and the smaller the value is, the more important the history label information is indicated.
Finally, the smoothed label L is the class label with the largest probability in the smoothed class-label probability matrix:

L = argmax_i s_i.
In a preferred embodiment with 0 representing bird and 1 representing unmanned aerial vehicle: target ID 1 appears for the first time with detected class 0, so the generated Cur is [1, 0]; since this is the first appearance, His is also [1, 0]. The smoothed probability matrix s is therefore [1, 0], and the final smoothed label L is the class index with the highest probability, namely index 0 where the probability is 1, so the target is identified as a bird.
In a preferred embodiment, when tracking ID 1 occurs for the second time and the detected label is 1, the generated Cur is [0, 1] and His is [1, 0]. With α = 0.25, the smoothed label-probability matrix s is [0.75, 0.25], and the final smoothed label L is the class index with the highest probability, namely index 0 with probability 0.75, still representing a bird.
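This two-frame example can be verified numerically:

```python
import numpy as np

alpha = 0.25
his = np.array([1.0, 0.0])  # frame 1: ID 1 first seen as class 0 (bird)
cur = np.array([0.0, 1.0])  # frame 2: detector flips to class 1 (drone)

s = alpha * cur + (1 - alpha) * his   # s = alpha*Cur + (1-alpha)*His
label = int(np.argmax(s))

print(s)      # → [0.75 0.25]
print(label)  # → 0, i.e. still "bird"
```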
Step 6, from the final smoothed label L, read the corresponding class information and output the final detection label. In a preferred embodiment, 0 represents bird and 1 represents unmanned aerial vehicle; for the target with ID 1, the first output label is bird, and although the raw detection label in the second frame is unmanned aerial vehicle, after label smoothing the target is still finally output as a bird.
It should be noted that the problem addressed here is frequent jumps of the label class caused by factors such as occlusion and blurring of moving targets, not detector false alarms.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411798622.3A CN119810505B (en) | 2024-12-09 | 2024-12-09 | Method for smoothing target detection label based on historical information |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119810505A CN119810505A (en) | 2025-04-11 |
| CN119810505B true CN119810505B (en) | 2025-09-23 |
Family
ID=95267121
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411798622.3A Active CN119810505B (en) | 2024-12-09 | 2024-12-09 | Method for smoothing target detection label based on historical information |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119810505B (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115147594A (en) * | 2022-07-06 | 2022-10-04 | 上海海事大学 | A ship image trajectory tracking and prediction method based on ship heading recognition |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119810505A (en) | 2025-04-11 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||