CN110047095B - Tracking method and device based on target detection and terminal equipment - Google Patents
- Publication number: CN110047095B (application CN201910166616.9A)
- Authority: CN (China)
- Prior art keywords: frame, score, candidate, candidate frame, detected
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/223—Analysis of motion using block-matching
Abstract
The invention is applicable to the technical field of data processing and provides a tracking method, an apparatus, a terminal device, and a computer-readable storage medium based on target detection, comprising the following steps: performing detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame; performing an association operation between each non-overlapping candidate frame in each frame of image to be detected and each non-overlapping candidate frame in the next frame of image to be detected, and determining at least two associated candidate frames associated with the same target; performing key point detection on each associated candidate frame to obtain a key point confidence, and calculating a single frame score based on the detection frame confidence and the key point confidence; and calculating a comprehensive score based on the single frame score, determining the associated candidate frame whose comprehensive score is higher than a preset score peak as a target frame, outputting the images in the target frame as tracking results, and hiding the associated candidate frames other than the target frame in all the images to be detected. The invention improves the efficiency of target tracking while ensuring detection accuracy.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a tracking method and apparatus based on target detection, a terminal device, and a computer readable storage medium.
Background
With the rapid development of computer technology and algorithms, target detection has become a popular research direction and is widely used in scenarios such as access control, monitoring, and accident identification. Since the object of target detection is typically video, a current problem is how to track moving targets in dynamic video.
In the prior art, when tracking targets in a video, the video is usually split into multiple frames of images, feature extraction is performed on each detection frame analyzed from a single frame of image, the features extracted from preceding and following frames are compared, and whether the detection frames of the two frames contain the same target is judged from the resulting similarity. Because every detection frame must undergo feature extraction and comparison, the detection speed is low, and when there are many images, or many targets in the images, the amount of data to be calculated is large, so real-time tracking cannot be achieved. In summary, target tracking in the prior art is inefficient and is not applicable to scenes with many images to be detected or many targets in those images.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a tracking method, apparatus, terminal device and computer readable storage medium based on target detection, so as to solve the problem that in the prior art, the efficiency of target tracking is low and the method is not applicable to a scene with more images to be detected or more targets in the images to be detected.
A first aspect of an embodiment of the present invention provides a tracking method based on target detection, including:
carrying out detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, wherein the non-overlapping candidate frame is used for indicating a target to be detected, and the confidence of the detection frame of the non-overlapping candidate frame is higher than a preset confidence threshold;
performing association operation on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected, and determining at least two non-overlapping candidate frames associated with the same target as associated candidate frames;
performing key point detection on each associated candidate frame to obtain key point confidence coefficient, and calculating a single frame score of the associated candidate frame based on the detection frame confidence coefficient and the key point confidence coefficient;
and calculating the comprehensive score of the associated candidate frame based on the single frame score, determining the associated candidate frame corresponding to the comprehensive score higher than a preset score peak value as a target frame, outputting an image in the target frame as a tracking result, and hiding other associated candidate frames except the target frame in the image to be detected.
A second aspect of an embodiment of the present invention provides a tracking device based on target detection, including:
the detection frame analysis unit is used for carrying out detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, wherein the non-overlapping candidate frame is used for indicating a target to be detected, and the confidence of the detection frame of the non-overlapping candidate frame is higher than a preset confidence threshold;
the association operation unit is used for carrying out association operation on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected, and determining at least two non-overlapping candidate frames associated with the same target as associated candidate frames;
the computing unit is used for carrying out key point detection on each associated candidate frame to obtain key point confidence coefficient, and computing single frame scores of the associated candidate frames based on the detection frame confidence coefficient and the key point confidence coefficient;
and the hiding unit is used for calculating the comprehensive score of the associated candidate frame based on the single frame score, determining the associated candidate frame corresponding to the comprehensive score higher than a preset high score peak value as a target frame, outputting an image in the target frame as a tracking result, and hiding other associated candidate frames except the target frame in the image to be detected.
A third aspect of an embodiment of the present invention provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
carrying out detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, wherein the non-overlapping candidate frame is used for indicating a target to be detected, and the confidence of the detection frame of the non-overlapping candidate frame is higher than a preset confidence threshold;
performing association operation on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected, and determining at least two non-overlapping candidate frames associated with the same target as associated candidate frames;
performing key point detection on each associated candidate frame to obtain key point confidence coefficient, and calculating a single frame score of the associated candidate frame based on the detection frame confidence coefficient and the key point confidence coefficient;
and calculating the comprehensive score of the associated candidate frame based on the single frame score, determining the associated candidate frame corresponding to the comprehensive score higher than a preset score peak value as a target frame, outputting an image in the target frame as a tracking result, and hiding other associated candidate frames except the target frame in the image to be detected.
A fourth aspect of the embodiments of the present invention provides a computer readable storage medium storing a computer program which when executed by a processor performs the steps of:
carrying out detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, wherein the non-overlapping candidate frame is used for indicating a target to be detected, and the confidence of the detection frame of the non-overlapping candidate frame is higher than a preset confidence threshold;
performing association operation on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected, and determining at least two non-overlapping candidate frames associated with the same target as associated candidate frames;
performing key point detection on each associated candidate frame to obtain key point confidence coefficient, and calculating a single frame score of the associated candidate frame based on the detection frame confidence coefficient and the key point confidence coefficient;
and calculating the comprehensive score of the associated candidate frame based on the single frame score, determining the associated candidate frame corresponding to the comprehensive score higher than a preset score peak value as a target frame, outputting an image in the target frame as a tracking result, and hiding other associated candidate frames except the target frame in the image to be detected.
Compared with the prior art, the embodiment of the invention has the beneficial effects that:
In the embodiment of the invention, targets serve as the detection condition: the associated candidate frames related to the same target in at least two frames of images to be detected are determined, the comprehensive scores of the associated candidate frames are calculated from the detection frame confidence and the key point confidence, the associated candidate frames whose comprehensive score is higher than the preset score peak are determined as target frames, the images in the target frames are output as tracking results, and the associated candidate frames other than the target frames in all frames of images to be detected are hidden. By setting a front-to-back association mechanism and a scoring mechanism, the embodiment of the invention omits the feature comparison step, so the target frames related to a target can be determined quickly and the other detection frames not attached to the target are hidden. This greatly improves the efficiency of target tracking while ensuring a certain detection accuracy, and is applicable to scenes with many images to be detected or many targets in the images to be detected.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an implementation of a tracking method based on object detection according to an embodiment of the present invention;
FIG. 2 is a flowchart of an implementation of a tracking method based on object detection according to a second embodiment of the present invention;
FIG. 3 is a flowchart of an implementation of a tracking method based on object detection according to a third embodiment of the present invention;
FIG. 4 is a flowchart of an implementation of a tracking method based on object detection according to a fourth embodiment of the present invention;
FIG. 5 is a flowchart of an implementation of a tracking method based on object detection according to a fifth embodiment of the present invention;
FIG. 6 is a block diagram of a tracking device based on object detection according to a sixth embodiment of the present invention;
FIG. 7 is a schematic diagram of a terminal device according to a seventh embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
Fig. 1 shows an implementation flow of a tracking method based on target detection according to an embodiment of the present invention, which is described in detail below:
In S101, performing detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, where the non-overlapping candidate frame is used for indicating a target to be detected, and the detection frame confidence of the non-overlapping candidate frame is higher than a preset confidence threshold.
In the embodiment of the invention, detection frame analysis is first performed on the at least two acquired frames of images to be detected (which may be obtained by splitting a video into at least two frames). The detection frame analysis may be implemented by a preset detection model, for example a convolutional neural network (Convolutional Neural Networks, CNN) model or a hyperspace model. When analyzing each frame of image to be detected with the detection model, at least one candidate frame where a target may be located is first identified, and the detection frame confidence of each candidate frame is calculated, where the detection frame confidence is the probability that the candidate frame truly contains the target; its calculation depends on the specific detection model. Because at least two targets may exist in one frame of image to be detected, for the at least one candidate frame corresponding to each detected target, the candidate frame with the highest detection frame confidence is screened out. If its detection frame confidence is higher than a preset confidence threshold, it is determined as a non-overlapping candidate frame for subsequent analysis; otherwise it is hidden and not analyzed further. The confidence threshold can be set according to the actual application scenario: the lower the threshold, the faster the detection but the lower the accuracy. For example, the confidence threshold may be set to 50%.
It should be noted that, since the initial image to be detected is generally large (e.g. 1920×1080), each frame may first be uniformly compressed (uniform-size compression ensures that all compressed images have the same dimensions), and detection frame analysis is then performed on the compressed image to increase detection speed. Correspondingly, compression also lowers the detection frame confidence of the candidate frames, so to preserve the detection effect, the confidence threshold is reduced according to the degree of compression, for example from 50% to 40%.
After the non-overlapping candidate frames are determined, each non-overlapping candidate frame in each frame of image to be detected indicates one target to be detected, and different non-overlapping candidate frames in the same frame indicate different targets. In addition, the candidate frames other than the determined non-overlapping candidate frames in each frame may be hidden directly; hiding them in advance reduces the subsequent amount of calculation, so hiding a candidate frame is equivalent to discarding it. It should be noted that the embodiment of the present invention does not limit the specific type of the target to be detected; for example, the target may be a face, a vehicle, or a building.
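The screening in S101 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation; the function name `select_non_overlapping` and the `(box, confidence)` tuple layout are assumptions:

```python
def select_non_overlapping(candidates_per_target, conf_threshold=0.5):
    """For each target, keep only the highest-confidence candidate box,
    and discard (hide) it unless its detection frame confidence exceeds
    the preset confidence threshold."""
    kept = []
    for candidates in candidates_per_target:
        # candidates: list of (box, detection_confidence) for one target
        best_box, best_conf = max(candidates, key=lambda c: c[1])
        if best_conf > conf_threshold:
            kept.append((best_box, best_conf))  # a non-overlapping candidate
        # otherwise the candidate is hidden and not analyzed further
    return kept
```

With the 50% threshold mentioned above, a 0.3-confidence candidate would be hidden while a 0.9-confidence one is kept.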
In S102, performing an association operation between each non-overlapping candidate frame in each frame of the image to be measured and each non-overlapping candidate frame in the next frame of the image to be measured, and determining at least two non-overlapping candidate frames associated with the same object as associated candidate frames.
Since at least two targets may exist in the image to be detected (for example, when the image is a live view of a sidewalk and the target to be detected is a face, one frame may contain at least two faces), in order to track the motion of each target, an association operation is performed between each non-overlapping candidate frame in each frame of image to be detected and each non-overlapping candidate frame in the next frame, and at least two non-overlapping candidate frames associated with the same target are determined as associated candidate frames, where the at least two associated candidate frames for one target are never located in the same frame. When performing the association operation, for images in which the positions where targets appear are fixed, at least two appearance regions may be divided in advance, and the non-overlapping candidate frames located in the same appearance region in one frame and in the next frame are both determined as associated candidate frames.
It should be noted that the association operation is performed in order of shooting time from front to back. For example, suppose the images to be detected Picture_A, Picture_B, and Picture_C are obtained in shooting-time order, Picture_A contains non-overlapping candidate boxes Box_A1 and Box_A2, Picture_B contains Box_B1 and Box_B2, and Picture_C contains Box_C1 and Box_C2. First, each non-overlapping candidate box in Picture_A is associated with each non-overlapping candidate box in Picture_B, and the result is that Box_A1 and Box_B1 are associated with the same target, and Box_A2 and Box_B2 are associated with the same target. Then each non-overlapping candidate box in Picture_B is associated with each non-overlapping candidate box in Picture_C, and the result is that Box_B1 and Box_C1 are associated with the same target, and Box_B2 and Box_C2 are associated with the same target. Finally, it can be determined that Box_A1, Box_B1, and Box_C1 are associated candidate boxes associated with one target, and Box_A2, Box_B2, and Box_C2 are associated candidate boxes associated with another target.
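The patent's own association for fixed-position targets uses pre-divided appearance regions. As a hedged illustration of the same front-to-back association idea, the sketch below instead uses greedy intersection-over-union (IoU) matching between consecutive frames; the function names and the 0.3 threshold are assumptions, not from the patent:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(prev_boxes, next_boxes, iou_threshold=0.3):
    """Greedily pair each non-overlapping candidate box in the current
    frame with the best-overlapping box in the next frame; pairs above
    the threshold are treated as associated candidate boxes for the
    same target."""
    pairs, used = [], set()
    for p in prev_boxes:
        best = max(((iou(p, n), i) for i, n in enumerate(next_boxes)
                    if i not in used), default=(0.0, None))
        if best[1] is not None and best[0] >= iou_threshold:
            pairs.append((p, next_boxes[best[1]]))
            used.add(best[1])
    return pairs
```

Chaining `associate` over Picture_A→Picture_B and Picture_B→Picture_C reproduces the Box_A1/Box_B1/Box_C1 chains described above.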
In S103, performing a keypoint detection on each associated candidate frame to obtain a keypoint confidence, and calculating a single frame score of the associated candidate frame based on the detection frame confidence and the keypoint confidence.
After the associated candidate frames are obtained, key point detection is performed on each of them. The purpose of the key point detection is to detect the key points of the target to be detected within the associated candidate frame; it can be implemented with an open-source key point detection model, such as an active shape model (Active Shape Model, ASM), an active appearance model (Active Appearance Model, AAM), or a deep learning model. Key point detection yields a key point confidence, i.e. the probability that the key points detected in the associated candidate frame belong to the target to be detected; its calculation depends on the specific key point detection model adopted and is not detailed here. After the key point confidence is obtained, a single frame score of the associated candidate frame is calculated from its detection frame confidence and key point confidence. The single frame score may simply be the sum of the two confidences; other calculation modes also exist and are described later.
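The simplest variant described above (the single frame score as the sum of the two confidences) can be written directly; the patent mentions other calculation modes later, so this is only the baseline:

```python
def single_frame_score(det_conf, kp_conf):
    """Baseline single frame score of an associated candidate frame:
    the sum of its detection frame confidence and key point confidence."""
    return det_conf + kp_conf
```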
In S104, calculating a comprehensive score of the associated candidate frame based on the single frame score, determining the associated candidate frame corresponding to the comprehensive score higher than a preset high score peak as a target frame, outputting an image in the target frame as a tracking result, and hiding other associated candidate frames except the target frame in the image to be detected.
After the single frame score of an associated candidate frame in a frame of image to be detected is obtained, a comprehensive score of the associated candidate frame is calculated from the single frame score; the comprehensive score indicates how reliable the associated candidate frame is for tracking analysis. Specifically, if the associated candidate frame in the current frame is the first one detected for a new target, the single frame score is directly determined as its comprehensive score. Otherwise, the associated candidate frame in the previous frame of image to be detected is determined as the base candidate frame, and the comprehensive score of the associated candidate frame in the current frame is calculated from the comprehensive score of the base candidate frame and the single frame score of the current associated candidate frame; it may specifically be set as the sum of the two, and more calculation modes can be applied in different practical application scenarios. The base candidate frame and the associated candidate frame in the current frame correspond to the same target.
After the comprehensive score of each associated candidate frame in each frame of image to be detected is determined, the associated candidate frames whose comprehensive score is higher than the preset score peak are determined as target frames, the images in the target frames are output as tracking results, and the associated candidate frames other than the target frames in all frames of image to be detected are hidden, which improves the efficiency of subsequent calculation and prevents the large amount of calculation caused by too many frames. The value of the score peak can be set freely: the higher it is, the higher the tracking accuracy, but the fewer target frames are determined. When the images in the target frames are output as tracking results, the partial images in the target frames corresponding to the same target in each frame can be output sequentially, target by target, in order of shooting time from front to back, so as to directly reflect the tracking of that target.
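The accumulation and thresholding in S104 can be sketched as follows, assuming the sum-based comprehensive score described above; the function names are illustrative, not from the patent:

```python
def composite_scores(single_scores):
    """Running comprehensive score along one target's chain of associated
    candidate boxes: the first box's comprehensive score is its single
    frame score; each later box adds its single frame score to the
    previous comprehensive score."""
    totals, running = [], 0.0
    for s in single_scores:
        running += s
        totals.append(running)
    return totals

def select_target_frames(single_scores, score_peak):
    """Indices of associated boxes whose comprehensive score exceeds the
    preset score peak; these become target frames, the rest are hidden."""
    return [i for i, total in enumerate(composite_scores(single_scores))
            if total > score_peak]
```

Note how a target tracked over several frames accumulates score, so even modest per-frame confidences eventually cross the peak.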
As can be seen from the embodiment shown in FIG. 1, the embodiment of the present invention determines the associated candidate frames in each frame of image to be detected, calculates their comprehensive scores, determines the associated candidate frames whose comprehensive score is higher than the preset score peak as target frames, outputs the images in the target frames as tracking results, and hides the other associated candidate frames in all frames of image to be detected. By hiding the associated candidate frames of lower reliability, the embodiment greatly improves the efficiency of target tracking while ensuring a certain detection accuracy, and is suitable for scenes with many frames of images to be detected or many targets in the images to be detected.
FIG. 2 shows a method that expands on the first embodiment of the present invention. As shown in FIG. 2, the tracking method based on target detection may include the following steps:
In S201, the associated candidate frame in the current frame of image to be detected is determined as a current candidate frame, and the associated candidate frame in the next frame of image to be detected is determined as a candidate frame to be evaluated, where the current candidate frame and the candidate frame to be evaluated correspond to the same target.
To further improve the efficiency of target tracking, the way the single frame score of the associated candidate frame in the next frame is calculated can be updated based on the comprehensive score of the associated candidate frame in the current frame. For ease of distinction, the associated candidate frame in the current frame of image to be detected is named the current candidate frame, and the associated candidate frame in the next frame is determined as the candidate frame to be evaluated; the current candidate frame and the candidate frame to be evaluated correspond to the same target.
In S202, if the composite score of the current candidate frame is higher than the score peak, the single frame score of the candidate frame to be evaluated is calculated based on the detection frame confidence of the candidate frame to be evaluated.
If the comprehensive score of the current candidate frame is higher than the score peak, the current candidate frame is highly reliable and its target is easy to detect. To reduce resource consumption, key point detection is therefore not performed on the candidate frame to be evaluated in the next frame, and its single frame score is calculated directly from its detection frame confidence. How the single frame score is calculated in this step follows from the original calculation mode; for example, if the single frame score was originally the sum of the detection frame confidence and the key point confidence, the detection frame confidence of the candidate frame to be evaluated can here be directly determined as its single frame score.
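This adaptive scoring path can be sketched as below. Passing key point detection in as a callable means the expensive step only runs when actually needed; the function name and the default `score_peak` value are illustrative, not from the patent:

```python
def score_next_box(prev_composite, det_conf, kp_conf_fn, score_peak=2.0):
    """If the current candidate frame's comprehensive score already
    exceeds the score peak, skip key point detection for the candidate
    frame to be evaluated and use its detection frame confidence alone
    as the single frame score; otherwise run key point detection and
    use the original sum-based score."""
    if prev_composite > score_peak:
        return det_conf              # cheap path: no key point detection
    return det_conf + kp_conf_fn()   # original path
```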
In S203, if the composite score of the current candidate frame is lower than a preset low score valley, determining the current candidate frame as a failure detection frame, where the low score valley is smaller than the high score peak.
In the embodiment of the invention, in addition to the high score peak, a low score valley is set in advance for identifying associated candidate frames with low credibility; the value of the low score valley is smaller than that of the high score peak. If the composite score of the current candidate frame is lower than the low score valley, the current candidate frame is determined to be a failure detection frame, i.e. the target is considered not to have been detected in the current candidate frame. In this case, the single frame score of the candidate frame to be evaluated in the next frame of the image to be detected is still obtained according to the original single frame score calculation.
Optionally, if the number of consecutive failure detection frames exceeds a preset upper limit, the associated candidate frames of the target corresponding to those failure detection frames are hidden in subsequent frames of the image to be detected. To avoid the resource cost of repeatedly evaluating associated candidate frames with low credibility, once more than the preset upper limit of consecutive failure detection frames corresponding to the same target is observed (i.e. the images to be detected in which the failure detection frames are located are consecutive frames), the target is considered lost, and the associated candidate frames corresponding to that target are hidden in subsequent frames of the image to be detected. The preset upper limit may be set according to the capture frequency of the images to be detected and the expected motion of the target; for example, it may be set to 5. In addition, a tracking failure prompt may be output, indicating that tracking of the target corresponding to the failure detection frame has failed.
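The per-target failure counting described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the class name, target identifiers, and the limit of 5 are assumptions, and the "consecutive" requirement is realized by resetting the count on any successful frame.

```python
class TrackFailureCounter:
    """Counts consecutive failure detection frames per target; once the
    preset upper limit is exceeded, the target's associated candidate
    frames are hidden in subsequent frames (hypothetical sketch)."""

    def __init__(self, limit=5):  # assumed preset upper limit
        self.limit = limit
        self.failures = {}   # target_id -> consecutive failure count
        self.hidden = set()  # targets considered lost

    def observe(self, target_id, is_failure):
        """Record one frame's result; returns True if the target's
        candidate frames should now be hidden."""
        if is_failure:
            self.failures[target_id] = self.failures.get(target_id, 0) + 1
            if self.failures[target_id] > self.limit:
                self.hidden.add(target_id)
        else:
            # The run must be consecutive, so any success resets it.
            self.failures[target_id] = 0
        return target_id in self.hidden
```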
In S204, if the composite score of the current candidate frame is located between the low score valley and the high score peak, performing keypoint detection on the candidate frame to be evaluated to obtain the keypoint confidence, and calculating the single frame score of the candidate frame to be evaluated based on the detection frame confidence and the keypoint confidence of the candidate frame to be evaluated.
In the third case, the composite score of the current candidate frame lies between the low score valley and the high score peak. In this case, the single frame score of the candidate frame to be evaluated is calculated according to the original calculation: keypoint detection is performed on the candidate frame to be evaluated to obtain the keypoint confidence, and the single frame score of the candidate frame to be evaluated is calculated from its detection frame confidence and the keypoint confidence.
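The three-branch score update of S202 to S204 can be sketched as below. This is a simplified illustration under assumed threshold values (0.8 and 0.3 are not from the source); `detect_keypoints` stands in for the expensive keypoint detector, which is only invoked when the composite score does not exceed the high score peak.

```python
HIGH_SCORE_PEAK = 0.8    # assumed value of the preset high score peak
LOW_SCORE_VALLEY = 0.3   # assumed value of the preset low score valley

def original_single_frame_score(det_conf, kp_conf):
    # Original calculation: sum of detection frame and keypoint confidence.
    return det_conf + kp_conf

def next_frame_score(current_composite, det_conf, detect_keypoints):
    """Return (single_frame_score, is_failure_frame) for the candidate
    frame to be evaluated, given the current frame's composite score.
    detect_keypoints is a callable running keypoint detection."""
    if current_composite > HIGH_SCORE_PEAK:
        # S202: skip keypoint detection, use detection confidence only.
        return det_conf, False
    if current_composite < LOW_SCORE_VALLEY:
        # S203: mark the current box as a failure detection frame, but
        # still score the next frame the original way.
        return original_single_frame_score(det_conf, detect_keypoints()), True
    # S204: composite score between the valley and the peak.
    return original_single_frame_score(det_conf, detect_keypoints()), False
```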
As can be seen from the embodiment shown in fig. 2, in the embodiment of the present invention, the composite score of the current candidate frame is compared against the preset high score peak and low score valley, and the single frame score of the candidate frame to be evaluated is calculated in different ways according to the comparison result. For example, when the composite score is higher than the high score peak, the single frame score of the candidate frame to be evaluated is calculated only from its detection frame confidence; by simplifying the single frame score calculation in this specific case, the calculation efficiency of the single frame score is improved.
Fig. 3 shows a method that, on the basis of the first embodiment of the present invention, refines the process of performing an association operation on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame, and determining at least two non-overlapping candidate frames associated with the same target as associated candidate frames. The embodiment of the invention provides an implementation flow chart of a tracking method based on target detection; as shown in fig. 3, the tracking method may comprise the following steps:
In S301, an intersection-over-union (IoU) operation is performed on each non-overlapping candidate frame in the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected, to obtain at least one IoU result.
In determining the associated candidate frames, an intersection-over-union (IoU) algorithm may be applied. Specifically, an IoU operation is performed on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame, yielding at least one IoU result, where each IoU result relates one non-overlapping candidate frame in the current frame of the image to be detected to one non-overlapping candidate frame in the next frame.
In S302, the two non-overlapping candidate frames corresponding to an IoU result higher than a preset IoU threshold are determined as associated candidate frames, where the two associated candidate frames so determined relate to the same target.
For each obtained IoU result, the two non-overlapping candidate frames corresponding to an IoU result higher than a preset IoU threshold (for example, 0.5) are considered to relate to the same target, so both are determined to be associated candidate frames. For an IoU result not higher than the preset IoU threshold, the two corresponding non-overlapping candidate frames are hidden. Steps S301 to S302 are repeated until the determination of the associated candidate frames in the last frame of the image to be detected is complete.
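The IoU association of S301 and S302 can be sketched as follows. The box format (x1, y1, x2, y2) and the pairing by index are illustrative assumptions; the 0.5 threshold is the example value from the text.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

IOU_THRESHOLD = 0.5  # the preset IoU threshold from the text

def associate(curr_boxes, next_boxes, threshold=IOU_THRESHOLD):
    """Return index pairs (i, j) whose IoU exceeds the threshold;
    each pair is a set of associated candidate frames."""
    return [(i, j) for i, a in enumerate(curr_boxes)
                   for j, b in enumerate(next_boxes)
                   if iou(a, b) > threshold]
```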
Optionally, an association value between each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame is calculated based on the DeepSORT algorithm, and the two non-overlapping candidate frames corresponding to an association value higher than a preset association threshold are determined as associated candidate frames. In the embodiment of the invention, the DeepSORT algorithm may be applied to calculate this association value: on the basis of the IoU algorithm, DeepSORT further incorporates the deep-feature similarity of the target to be detected, i.e. the association value is generated by fusing the IoU result with the deep-feature similarity of the target. For the calculated association values, the two non-overlapping candidate frames corresponding to an association value higher than a preset association threshold (for example, 0.5) are determined as associated candidate frames; by incorporating the features of the target to be detected, the reliability of the determined associated candidate frames is further improved.
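The fusion of overlap and appearance can be sketched as below. This is a hypothetical simplification: the linear fusion, the weight `alpha`, and cosine similarity as the deep-feature similarity are assumptions, not the exact DeepSORT combined metric (which uses a Mahalanobis motion gate plus a cosine appearance distance).

```python
import math

def cosine_similarity(a, b):
    # Deep-feature similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def association_value(iou_result, feat_a, feat_b, alpha=0.5):
    """Hypothetical association value fusing the IoU result with the
    deep-feature similarity; alpha weights overlap against appearance."""
    return alpha * iou_result + (1 - alpha) * cosine_similarity(feat_a, feat_b)
```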
As can be seen from the embodiment shown in fig. 3, in the embodiment of the present invention, the IoU operation is performed on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame to obtain at least one IoU result, and the two non-overlapping candidate frames corresponding to an IoU result higher than the preset IoU threshold are determined as associated candidate frames. The determination of associated candidate frames is thus realized based on the IoU algorithm, improving the reliability of the determined associated candidate frames.
Fig. 4 shows a method that, on the basis of the first embodiment of the present invention, refines the process of calculating the single frame score of an associated candidate frame based on the detection frame confidence and the keypoint confidence. The embodiment of the invention provides an implementation flow chart of a tracking method based on target detection; as shown in fig. 4, the tracking method may comprise the following steps:
In S401, a preset detection frame coefficient is set as the weight of the detection frame confidence, and a preset keypoint coefficient is set as the weight of the keypoint confidence, where the ratio between the detection frame coefficient and the keypoint coefficient is related to the density of targets in the image to be detected.
To improve the accuracy of target tracking and make the single frame score reflect the actual condition of the target in the image to be detected, the embodiment of the invention further provides a single frame score calculation beyond directly taking the sum of the detection frame confidence and the keypoint confidence. Specifically, a preset detection frame coefficient is set as the weight of the detection frame confidence, and a preset keypoint coefficient is set as the weight of the keypoint confidence, where the ratio between the detection frame coefficient and the keypoint coefficient is related to the density of targets in the image to be detected: the denser the targets, the harder they are to detect, so the reliability of the detection frame confidence decreases and the ratio between the detection frame coefficient and the keypoint coefficient should be smaller. For example, the ratio between the detection frame coefficient and the keypoint coefficient may be set to 2:1.
In S402, the confidence of the detection frame and the confidence of the key point are weighted and summed to obtain the single frame score.
After the weights of the detection frame confidence and the keypoint confidence are determined, the two are weighted and summed, and the result is used as the single frame score, i.e.: single frame score = detection frame confidence × detection frame coefficient + keypoint confidence × keypoint coefficient.
When the original single frame score is calculated as described in this step, if the operation of step S202 is to be performed, i.e. the calculation for the candidate frame to be evaluated is simplified, the keypoint confidence and the keypoint coefficient may be omitted and the single frame score of the candidate frame to be evaluated calculated directly from the detection frame confidence and the detection frame coefficient, i.e.: single frame score = detection frame confidence of the candidate frame to be evaluated × detection frame coefficient.
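The weighted single frame score of S401 and S402, including the simplified form used in S202, can be sketched as follows. The 2:1 coefficient ratio is the example value from the text; passing no keypoint confidence models the case where keypoint detection is skipped.

```python
DET_COEF = 2.0  # detection frame coefficient (example 2:1 ratio)
KP_COEF = 1.0   # keypoint coefficient

def single_frame_score(det_conf, kp_conf=None,
                       det_coef=DET_COEF, kp_coef=KP_COEF):
    """Weighted single frame score. When keypoint detection is skipped
    (the S202 case), only the detection frame term remains."""
    score = det_conf * det_coef
    if kp_conf is not None:
        score += kp_conf * kp_coef
    return score
```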
As can be seen from the embodiment shown in fig. 4, in the embodiment of the present invention, a preset detection frame coefficient is set as a weight of the confidence coefficient of the detection frame, and a preset key point coefficient is set as a weight of the confidence coefficient of the key point, and the confidence coefficient of the detection frame and the confidence coefficient of the key point are weighted and summed to obtain a single frame score.
Fig. 5 shows a method of refining a process of calculating a composite score of an associated candidate frame based on a single frame score according to the first embodiment of the present invention. The embodiment of the invention provides a realization flow chart of a tracking method based on target detection, as shown in fig. 5, the tracking method can comprise the following steps:
in S501, the associated candidate frame in the image to be measured of the previous frame is determined as a basic candidate frame, the comprehensive score of the basic candidate frame is determined as a basic score, and the associated candidate frame in the image to be measured of the current frame is determined as a current candidate frame, where the basic candidate frame and the current candidate frame correspond to the same target.
In embodiments of the present invention, the composite score of an associated candidate frame may be determined in combination with the displacement speed of the target. Specifically, when calculating the composite score of the associated candidate frame in the current frame of the image to be detected, the associated candidate frame in the previous frame is determined as the base candidate frame, the composite score of the base candidate frame is determined as the base score, and the associated candidate frame in the current frame is determined as the current candidate frame, where the base candidate frame and the current candidate frame correspond to the same target. If the current frame is the first frame of the image to be detected, or the previous frame contains no base candidate frame corresponding to the same target as the current candidate frame, the single frame score of the current candidate frame is directly taken as its composite score.
In S502, a preset first sliding coefficient and a preset second sliding coefficient are obtained, where the first sliding coefficient and the second sliding coefficient are both related to the displacement speed of the target in the image to be measured, and the sum of the first sliding coefficient and the second sliding coefficient is one.
To make the composite score fit the displacement speed of the target, a first sliding coefficient and a second sliding coefficient are preset in the embodiment of the present invention for use as weights, with their sum equal to one. The first sliding coefficient indicates the degree to which the composite score of the base candidate frame influences the composite score of the current candidate frame; it depends on the displacement speed of the target and may be set according to the actual application scenario. The faster the target moves, i.e. the greater the distance between the same target in two adjacent frames of the image to be detected, the smaller the first sliding coefficient should be set. Preferably, the first sliding coefficient is greater than the second sliding coefficient. For example, when the capture frequency of the image to be detected is 25 frames per second and the displacement speed of the target is small, the first sliding coefficient may be set to 0.9 and the second sliding coefficient to 0.1.
In S503, the first sliding coefficient is set as a weight of the base score, the second sliding coefficient is set as a weight of the single frame score of the current candidate frame, and the base score and the single frame score of the current candidate frame are weighted and summed to obtain the composite score of the current candidate frame.
The first sliding coefficient is set as the weight of the base score and the second sliding coefficient as the weight of the single frame score of the current candidate frame; the base score and the single frame score of the current candidate frame are then weighted and summed to obtain the composite score of the current candidate frame, i.e. the calculation formula is:
composite score of current candidate frame = base score × first sliding coefficient + single frame score of current candidate frame × second sliding coefficient.
As can be seen from the embodiment shown in fig. 5, in the embodiment of the present invention, a preset first sliding coefficient is used as a weight of a basic score, a preset second sliding coefficient is used as a weight of a single frame score of a current candidate frame, and the basic score and the single frame score of the current candidate frame are weighted and summed to obtain a comprehensive score of the current candidate frame.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their function and internal logic, and should not limit the implementation of the embodiments of the present invention in any way.
Corresponding to the tracking method based on target detection described in the above embodiments, fig. 6 shows a block diagram of a tracking device based on target detection according to an embodiment of the present invention, and referring to fig. 6, the tracking device includes:
the detection frame analysis unit 61 is configured to perform detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, where the non-overlapping candidate frame is used for indicating a target to be detected, and the confidence of the detection frame of the non-overlapping candidate frame is higher than a preset confidence threshold;
an association operation unit 62, configured to perform association operation on each non-overlapping candidate frame in each frame of the image to be measured and each non-overlapping candidate frame in the next frame of the image to be measured, and determine at least two non-overlapping candidate frames associated with the same object as associated candidate frames;
a calculating unit 63, configured to perform key point detection on each associated candidate frame to obtain a key point confidence coefficient, and calculate a single frame score of the associated candidate frame based on the detection frame confidence coefficient and the key point confidence coefficient;
And a hiding unit 64, configured to calculate a comprehensive score of the associated candidate frame based on the single frame score, determine the associated candidate frame corresponding to the comprehensive score higher than a preset high score peak as a target frame, output an image in the target frame as a tracking result, and hide other associated candidate frames except the target frame in the image to be detected.
Optionally, the tracking device further comprises:
the determining unit is used for determining the associated candidate frame in the image to be evaluated in the current frame as a current candidate frame and determining the associated candidate frame in the image to be evaluated in the next frame as a candidate frame to be evaluated, wherein the current candidate frame and the candidate frame to be evaluated correspond to the same target;
a first branching unit, configured to calculate, if the composite score of the current candidate frame is higher than the score peak value, the single frame score of the candidate frame to be evaluated based on the detection frame confidence of the candidate frame to be evaluated;
a second branching unit, configured to determine the current candidate frame as a failure detection frame if the composite score of the current candidate frame is lower than a preset score low-valley value, where the score low-valley value is smaller than the score high-peak value;
And the third branch unit is used for detecting key points of the candidate frames to be evaluated to obtain the key point confidence coefficient if the comprehensive score of the current candidate frame is positioned between the score low valley value and the score high peak value, and calculating the single frame score of the candidate frames to be evaluated based on the detection frame confidence coefficient of the candidate frames to be evaluated and the key point confidence coefficient.
Optionally, the tracking device further comprises:
and the failure hiding unit is used for hiding the associated candidate frames of the same target corresponding to the failure detection frames in the image to be detected of the subsequent frames if the continuous failure detection frames with the number exceeding the preset upper limit number appear.
Alternatively, the association operation unit 62 includes:
the IoU operation unit is configured to perform an intersection-over-union (IoU) operation on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected, to obtain at least one IoU result;
and the association frame determining unit is configured to determine the two non-overlapping candidate frames corresponding to an IoU result higher than a preset IoU threshold as the associated candidate frames, where the two associated candidate frames so determined relate to the same target.
Optionally, the calculation unit 63 includes:
the weight setting unit is used for setting a preset detection frame coefficient as the weight of the confidence coefficient of the detection frame and setting a preset key point coefficient as the weight of the confidence coefficient of the key point, wherein the ratio between the detection frame coefficient and the key point coefficient is related to the density degree of the target in the image to be detected;
and the first weighting unit is used for carrying out weighted summation on the confidence coefficient of the detection frame and the confidence coefficient of the key point to obtain the single frame score.
Optionally, the hiding unit 64 includes:
a basic determining unit, configured to determine the associated candidate frame in the image to be measured of the previous frame as a basic candidate frame, determine the comprehensive score of the basic candidate frame as a basic score, and determine the associated candidate frame in the image to be measured of the current frame as a current candidate frame, where the basic candidate frame and the current candidate frame correspond to the same target;
the device comprises an acquisition unit, a detection unit and a display unit, wherein the acquisition unit is used for acquiring a preset first sliding coefficient and a preset second sliding coefficient, the first sliding coefficient and the second sliding coefficient are both related to the displacement speed of the target in the image to be detected, and the sum of the first sliding coefficient and the second sliding coefficient is one;
And the second weighting unit is used for setting the first sliding coefficient as the weight of the basic score, setting the second sliding coefficient as the weight of the single frame score of the current candidate frame, and carrying out weighted summation on the basic score and the single frame score of the current candidate frame to obtain the comprehensive score of the current candidate frame.
Therefore, by providing the frame-to-frame association mechanism and the scoring mechanism, the tracking device based on target detection provided by the embodiment of the invention omits the feature-comparison step, so that the target frame related to the target can be rapidly determined and the other detection frames that do not fit the target are hidden. The efficiency of target tracking is thereby greatly improved while a certain detection precision is ensured, making the device applicable to scenarios with many images to be detected or many targets within the images to be detected.
Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 7, the terminal device 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72 stored in the memory 71 and executable on the processor 70, for example a tracking program based on object detection. The processor 70, when executing the computer program 72, implements the steps of the various embodiments of the tracking method based on object detection described above, such as steps S101 to S104 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, performs the functions of the units of the tracking device embodiments described above based on object detection, such as the functions of the units 61 to 64 shown in fig. 6.
By way of example, the computer program 72 may be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program 72 in the terminal device 7. For example, the computer program 72 may be divided into a detection frame analysis unit, an association operation unit, a calculation unit, and a concealment unit, each unit functioning specifically as follows:
the detection frame analysis unit is used for carrying out detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, wherein the non-overlapping candidate frame is used for indicating a target to be detected, and the confidence of the detection frame of the non-overlapping candidate frame is higher than a preset confidence threshold;
the association operation unit is used for carrying out association operation on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected, and determining at least two non-overlapping candidate frames associated with the same target as associated candidate frames;
The computing unit is used for carrying out key point detection on each associated candidate frame to obtain key point confidence coefficient, and computing single frame scores of the associated candidate frames based on the detection frame confidence coefficient and the key point confidence coefficient;
and the hiding unit is used for calculating the comprehensive score of the associated candidate frame based on the single frame score, determining the associated candidate frame corresponding to the comprehensive score higher than a preset high score peak value as a target frame, outputting an image in the target frame as a tracking result, and hiding other associated candidate frames except the target frame in the image to be detected.
The terminal device 7 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The terminal device may include, but is not limited to, a processor 70, a memory 71. It will be appreciated by those skilled in the art that fig. 7 is merely an example of the terminal device 7 and does not constitute a limitation of the terminal device 7, and may include more or less components than illustrated, or may combine certain components, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.
The processor 70 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), an off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like provided on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing the computer program as well as other programs and data required by the terminal device. The memory 71 may also be used for temporarily storing data that has been output or is to be output.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units is illustrated, and in practical application, the above-mentioned functional allocation may be performed by different functional units, that is, the internal structure of the terminal device is divided into different functional units, so as to perform all or part of the above-mentioned functions. The functional units in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present application. The specific working process of the units in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed terminal device and method may be implemented in other manners. For example, the above-described terminal device embodiments are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer readable medium may be adjusted as required by legislation and patent practice in each jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be included within the scope of the present invention.
Claims (7)
1. A tracking method based on target detection, comprising:
carrying out detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, wherein the non-overlapping candidate frame is used for indicating a target to be detected, and the confidence of the detection frame of the non-overlapping candidate frame is higher than a preset confidence threshold;
performing an intersection-over-union (IoU) operation between each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected to obtain at least one intersection-over-union result;
determining two non-overlapping candidate frames corresponding to an intersection-over-union result higher than a preset intersection-over-union threshold as associated candidate frames, wherein the two determined associated candidate frames relate to the same target;
setting a preset detection frame coefficient as a weight of the confidence coefficient of the detection frame, and setting a preset key point coefficient as a weight of the confidence coefficient of the key point, wherein the ratio between the detection frame coefficient and the key point coefficient is related to how densely the targets are distributed in the image to be detected;
carrying out weighted summation on the confidence coefficient of the detection frame and the confidence coefficient of the key point to obtain a single frame score of the associated candidate frame;
determining the associated candidate frame in the image to be detected of the previous frame as a basic candidate frame, determining the comprehensive score of the basic candidate frame as a basic score, and determining the associated candidate frame in the image to be detected of the current frame as a current candidate frame, wherein the basic candidate frame and the current candidate frame correspond to the same target;
acquiring a preset first sliding coefficient and a preset second sliding coefficient, wherein the first sliding coefficient and the second sliding coefficient are related to the displacement speed of the target in the image to be detected, and the sum of the first sliding coefficient and the second sliding coefficient is one;
setting the first sliding coefficient as a weight of the basic score, setting the second sliding coefficient as a weight of the single frame score of the current candidate frame, and carrying out weighted summation on the basic score and the single frame score of the current candidate frame to obtain the comprehensive score of the current candidate frame;
and determining the associated candidate frames corresponding to comprehensive scores higher than a preset score peak value as target frames, outputting the images in the target frames as tracking results, and hiding the other associated candidate frames except the target frames in the image to be detected.
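The scoring pipeline of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the box format (x1, y1, x2, y2), and the coefficient values (0.6/0.4 detection-frame/key-point weights, 0.7 first sliding coefficient) are assumptions chosen for demonstration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def single_frame_score(box_conf, keypoint_conf, box_coef=0.6, kp_coef=0.4):
    """Weighted sum of detection-frame and key-point confidences; the ratio
    box_coef : kp_coef would be tuned to how densely targets are packed."""
    return box_coef * box_conf + kp_coef * keypoint_conf

def composite_score(base_score, frame_score, first_coef=0.7):
    """Sliding-coefficient update: first_coef weights the previous frame's
    composite score, (1 - first_coef) weights the current single-frame score,
    and the two coefficients sum to one as required by the claim."""
    return first_coef * base_score + (1.0 - first_coef) * frame_score
```

Two candidate frames in consecutive frames would be associated when `iou(...)` exceeds the preset threshold, and a frame whose composite score exceeds the score peak value would be output as the target frame.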
2. The tracking method of claim 1, further comprising:
determining the associated candidate frame in the current frame of the image to be detected as a current candidate frame, and determining the associated candidate frame in the next frame of the image to be detected as a candidate frame to be evaluated, wherein the current candidate frame and the candidate frame to be evaluated correspond to the same target;
if the comprehensive score of the current candidate frame is higher than the score peak value, calculating the single frame score of the candidate frame to be evaluated based on the detection frame confidence of the candidate frame to be evaluated;
if the comprehensive score of the current candidate frame is lower than a preset score valley value, determining the current candidate frame as a failure detection frame, wherein the score valley value is smaller than the score peak value;
and if the comprehensive score of the current candidate frame lies between the score valley value and the score peak value, performing key point detection on the candidate frame to be evaluated to obtain the key point confidence coefficient, and calculating the single frame score of the candidate frame to be evaluated based on the detection frame confidence coefficient and the key point confidence coefficient of the candidate frame to be evaluated.
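The peak/valley mechanism of claim 2 amounts to a three-way branch on the composite score. The sketch below is an assumption-laden illustration: the threshold values (0.8 peak, 0.3 valley) and the 0.6/0.4 weights are hypothetical, and `detect_keypoints` stands in for whatever key-point detector the implementation would use.

```python
SCORE_PEAK = 0.8    # illustrative preset score peak value
SCORE_VALLEY = 0.3  # illustrative preset score valley value (< peak)

def evaluate_next_frame(composite, box_conf, detect_keypoints):
    """Return (status, next_single_frame_score) for the candidate frame
    to be evaluated, given the current frame's composite score."""
    if composite > SCORE_PEAK:
        # Confident track: skip key-point detection, score from the
        # detection-frame confidence alone.
        return "tracked", box_conf
    if composite < SCORE_VALLEY:
        # Weak track: mark the current candidate frame as a failure.
        return "failed", None
    # Uncertain track: run key-point detection and use the weighted sum.
    kp_conf = detect_keypoints()
    return "tracked", 0.6 * box_conf + 0.4 * kp_conf
```

The point of the branch is cost control: key-point detection is only spent on tracks whose composite score sits in the ambiguous middle band.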
3. The tracking method of claim 2, further comprising:
if the number of consecutive failure detection frames exceeds a preset upper limit, hiding the associated candidate frames of the same target corresponding to the failure detection frames in subsequent frames of the image to be detected.
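The consecutive-failure rule of claim 3 can be sketched as a small per-target counter. The class name, the limit value, and the reset-on-success behavior are assumptions for illustration, not taken from the patent.

```python
MAX_FAILURES = 3  # hypothetical preset upper limit

class TrackState:
    """Per-target state: counts consecutive failure detection frames and
    hides the target's candidate frames once the limit is exceeded."""

    def __init__(self):
        self.consecutive_failures = 0
        self.hidden = False

    def update(self, failed):
        if failed:
            self.consecutive_failures += 1
            if self.consecutive_failures > MAX_FAILURES:
                self.hidden = True  # stop drawing this target's boxes
        else:
            self.consecutive_failures = 0  # a success resets the streak
```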
4. A tracking device based on target detection for implementing the tracking method based on target detection according to any one of claims 1 to 3, the tracking device comprising:
the detection frame analysis unit is used for carrying out detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, wherein the non-overlapping candidate frame is used for indicating a target to be detected, and the confidence of the detection frame of the non-overlapping candidate frame is higher than a preset confidence threshold;
the association operation unit is used for carrying out association operation on each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected, and determining at least two non-overlapping candidate frames associated with the same target as associated candidate frames;
the computing unit is used for carrying out key point detection on each associated candidate frame to obtain key point confidence coefficient, and computing single frame scores of the associated candidate frames based on the detection frame confidence coefficient and the key point confidence coefficient;
and the hiding unit is used for calculating the comprehensive score of the associated candidate frame based on the single frame score, determining the associated candidate frame corresponding to a comprehensive score higher than a preset score peak value as a target frame, outputting the image in the target frame as a tracking result, and hiding the other associated candidate frames except the target frame in the image to be detected.
5. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
carrying out detection frame analysis on at least two frames of images to be detected to obtain at least one non-overlapping candidate frame in each frame of the images to be detected, wherein the non-overlapping candidate frame is used for indicating a target to be detected, and the confidence of the detection frame of the non-overlapping candidate frame is higher than a preset confidence threshold;
performing an intersection-over-union (IoU) operation between each non-overlapping candidate frame in each frame of the image to be detected and each non-overlapping candidate frame in the next frame of the image to be detected to obtain at least one intersection-over-union result;
determining two non-overlapping candidate frames corresponding to an intersection-over-union result higher than a preset intersection-over-union threshold as associated candidate frames, wherein the two determined associated candidate frames relate to the same target;
setting a preset detection frame coefficient as a weight of the confidence coefficient of the detection frame, and setting a preset key point coefficient as a weight of the confidence coefficient of the key point, wherein the ratio between the detection frame coefficient and the key point coefficient is related to how densely the targets are distributed in the image to be detected;
carrying out weighted summation on the confidence coefficient of the detection frame and the confidence coefficient of the key point to obtain a single frame score of the associated candidate frame;
determining the associated candidate frame in the image to be detected of the previous frame as a basic candidate frame, determining the comprehensive score of the basic candidate frame as a basic score, and determining the associated candidate frame in the image to be detected of the current frame as a current candidate frame, wherein the basic candidate frame and the current candidate frame correspond to the same target;
acquiring a preset first sliding coefficient and a preset second sliding coefficient, wherein the first sliding coefficient and the second sliding coefficient are related to the displacement speed of the target in the image to be detected, and the sum of the first sliding coefficient and the second sliding coefficient is one;
setting the first sliding coefficient as a weight of the basic score, setting the second sliding coefficient as a weight of the single frame score of the current candidate frame, and carrying out weighted summation on the basic score and the single frame score of the current candidate frame to obtain the comprehensive score of the current candidate frame;
and determining the associated candidate frames corresponding to the comprehensive scores higher than the preset score peak value as target frames, outputting images in the target frames as tracking results, and hiding other associated candidate frames except the target frames in the images to be detected.
6. The terminal device of claim 5, wherein the processor further implements the following steps when executing the computer program:
determining the associated candidate frame in the current frame of the image to be detected as a current candidate frame, and determining the associated candidate frame in the next frame of the image to be detected as a candidate frame to be evaluated, wherein the current candidate frame and the candidate frame to be evaluated correspond to the same target;
if the comprehensive score of the current candidate frame is higher than the score peak value, calculating the single frame score of the candidate frame to be evaluated based on the detection frame confidence of the candidate frame to be evaluated;
if the comprehensive score of the current candidate frame is lower than a preset score valley value, determining the current candidate frame as a failure detection frame, wherein the score valley value is smaller than the score peak value;
and if the comprehensive score of the current candidate frame lies between the score valley value and the score peak value, performing key point detection on the candidate frame to be evaluated to obtain the key point confidence coefficient, and calculating the single frame score of the candidate frame to be evaluated based on the detection frame confidence coefficient and the key point confidence coefficient of the candidate frame to be evaluated.
7. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the tracking method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910166616.9A CN110047095B (en) | 2019-03-06 | 2019-03-06 | Tracking method and device based on target detection and terminal equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910166616.9A CN110047095B (en) | 2019-03-06 | 2019-03-06 | Tracking method and device based on target detection and terminal equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110047095A CN110047095A (en) | 2019-07-23 |
CN110047095B true CN110047095B (en) | 2023-07-21 |
Family
ID=67274450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910166616.9A Active CN110047095B (en) | 2019-03-06 | 2019-03-06 | Tracking method and device based on target detection and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110047095B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516556B (en) * | 2019-07-31 | 2023-10-31 | 平安科技(深圳)有限公司 | Multi-target tracking detection method and device based on Darkflow-deep Sort and storage medium |
CN110532984B (en) * | 2019-09-02 | 2022-10-11 | 北京旷视科技有限公司 | Key point detection method, gesture recognition method, device and system |
CN110738125B (en) * | 2019-09-19 | 2023-08-01 | 平安科技(深圳)有限公司 | Method, device and storage medium for selecting detection frame by Mask R-CNN |
CN111062239A (en) * | 2019-10-15 | 2020-04-24 | 平安科技(深圳)有限公司 | Human target detection method, device, computer equipment and storage medium |
CN112749590B (en) * | 2019-10-30 | 2023-02-07 | 上海高德威智能交通系统有限公司 | Target detection method, device, computer equipment and computer-readable storage medium |
CN110852321B (en) * | 2019-11-11 | 2022-11-22 | 北京百度网讯科技有限公司 | Candidate frame filtering method and device and electronic equipment |
CN113066101B (en) * | 2019-12-30 | 2024-11-29 | 阿里巴巴集团控股有限公司 | Data processing method and device, and image processing method and device |
CN112115904A (en) * | 2020-09-25 | 2020-12-22 | 浙江大华技术股份有限公司 | License plate detection and recognition method, device and computer-readable storage medium |
CN112560745B (en) * | 2020-12-23 | 2022-04-05 | 南方电网电力科技股份有限公司 | Method for discriminating personnel on electric power operation site and related device |
CN112613570B (en) * | 2020-12-29 | 2024-06-11 | 深圳云天励飞技术股份有限公司 | Image detection method, image detection device, equipment and storage medium |
CN113112866B (en) * | 2021-04-14 | 2022-06-03 | 深圳市旗扬特种装备技术工程有限公司 | Intelligent traffic early warning method and intelligent traffic early warning system |
CN113221750A (en) * | 2021-05-13 | 2021-08-06 | 杭州飞步科技有限公司 | Vehicle tracking method, device, equipment and storage medium |
CN114219831B (en) * | 2021-11-12 | 2024-12-20 | 深圳市优必选科技股份有限公司 | Target tracking method, device, terminal equipment and computer readable storage medium |
CN115457447B (en) * | 2022-11-07 | 2023-03-28 | 浙江莲荷科技有限公司 | Moving object identification method, device and system, electronic equipment and storage medium |
CN117714884B (en) * | 2023-08-31 | 2024-09-27 | 荣耀终端有限公司 | Target tracking method and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521841A (en) * | 2011-11-22 | 2012-06-27 | 四川九洲电器集团有限责任公司 | Multi-target object tracking method |
WO2017010259A1 (en) * | 2015-07-10 | 2017-01-19 | Sony Semiconductor Solutions Corporation | Image processing device, image processing method, and program |
CN108470332A (en) * | 2018-01-24 | 2018-08-31 | 博云视觉(北京)科技有限公司 | Multi-object tracking method and device |
CN109271927A (en) * | 2018-09-14 | 2019-01-25 | 北京航空航天大学 | Collaborative monitoring method for air-based multi-platform systems |
CN109360226A (en) * | 2018-10-17 | 2019-02-19 | 武汉大学 | A multi-target tracking method based on time series multi-feature fusion |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5919665B2 (en) * | 2011-07-19 | 2016-05-18 | 日本電気株式会社 | Information processing apparatus, object tracking method, and information processing program |
US10019633B2 (en) * | 2016-08-15 | 2018-07-10 | Qualcomm Incorporated | Multi-to-multi tracking in video analytics |
- 2019-03-06: application CN201910166616.9A filed; granted as CN110047095B (status: Active)
Non-Patent Citations (1)
Title |
---|
Kernel correlation target tracking algorithm based on candidate region detection; Hao Shaohua et al.; Video Engineering (No. 07); pp. 13-24 *
Also Published As
Publication number | Publication date |
---|---|
CN110047095A (en) | 2019-07-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110047095B (en) | Tracking method and device based on target detection and terminal equipment | |
CN108875676B (en) | Living body detection method, device and system | |
CN109886997B (en) | Identification frame determining method and device based on target detection and terminal equipment | |
CN109858371B (en) | Face recognition method and device | |
CN109035304B (en) | Target tracking method, medium, computing device and apparatus | |
CN110660102B (en) | Speaker recognition method, device and system based on artificial intelligence | |
BR112016028586B1 (en) | COMPUTER STORAGE MEDIA AND COMPUTER-IMPLEMENTED METHOD FOR RULES-BASED VIDEO IMPORTANCE ANALYSIS | |
CN107679578B (en) | Target recognition algorithm testing method, device and system | |
CN111985427B (en) | Living body detection method, living body detection device and readable storage medium | |
CN110263680B (en) | Image processing method, device and system and storage medium | |
CN111104925B (en) | Image processing method, image processing apparatus, storage medium, and electronic device | |
CN105243376A (en) | Living body detection method and device | |
CN110942456B (en) | Tamper image detection method, device, equipment and storage medium | |
CN111210399A (en) | Imaging quality evaluation method, device and equipment | |
CN112699842A (en) | Pet identification method, device, equipment and computer readable storage medium | |
CN113158773B (en) | Training method and training device for living body detection model | |
CN111062954A (en) | Infrared image segmentation method, device and equipment based on difference information statistics | |
CN117132768A (en) | License plate and face detection and desensitization method and device, electronic equipment and storage medium | |
CN114743238A (en) | Image detection method and device | |
CN114511702A (en) | Remote sensing image segmentation method and system based on multi-scale weighted attention | |
CN106778822B (en) | Image straight line detection method based on funnel transformation | |
CN113554685A (en) | Remote sensing satellite moving target detection method, device, electronic device and storage medium | |
WO2024022301A1 (en) | Visual angle path acquisition method and apparatus, and electronic device and medium | |
US20220122341A1 (en) | Target detection method and apparatus, electronic device, and computer storage medium | |
CN111062337B (en) | People stream direction detection method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||