
CN112381132A - Target object tracking method and system based on fusion of multiple cameras

Info

Publication number
CN112381132A
CN112381132A
Authority
CN
China
Prior art keywords
target
target object
detection frame
image
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011253000.4A
Other languages
Chinese (zh)
Inventor
赖哲渊
姚明江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SAIC Volkswagen Automotive Co Ltd
Original Assignee
SAIC Volkswagen Automotive Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SAIC Volkswagen Automotive Co Ltd filed Critical SAIC Volkswagen Automotive Co Ltd
Priority to CN202011253000.4A priority Critical patent/CN112381132A/en
Publication of CN112381132A publication Critical patent/CN112381132A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target object tracking method based on the fusion of multiple cameras, which comprises the following steps: 100: extracting, in real time, the target object information to be tracked from images captured by several cameras; 200: identifying the image within each target object detection frame with a trained depth residual encoder; 300: storing the position information of the target object detection frame in the image, the target object type, the target object ID, the target object appearance feature code and the corresponding timestamp as historical data; 400: predicting the current position of the target object detection frame in the image from its historical position information; 500: screening out the candidate target objects whose Euclidean distance is below a set first threshold; 600: screening out the candidate matching target objects whose cosine distance is below a set second threshold; 700: using the Hungarian algorithm to assign the currently detected target object a match from among the candidate matching target objects, so as to realize tracking.

Description

Target object tracking method and system based on fusion of multiple cameras
Technical Field
The present invention relates to a target tracking method and system, and more particularly to a target tracking method and system based on multiple cameras.
Background
In recent years, with the rapid development of autonomous driving technology, self-driving cars have become ever more likely to enter daily use. Detecting and tracking objects with vehicle-mounted cameras is an important link in the perception of an autonomous vehicle.
At present, existing multi-object tracking methods are generally built on top of a detector and are almost always based on a single vehicle-mounted front-view camera. Mainstream approaches include predicting object positions from optical flow tracking or a linear-velocity assumption and matching them by intersection over union (IoU), among others.
However, these methods share several problems: when a detected object is occluded for a long time, the position estimate drifts badly, an object that has already appeared is given a new label (ID), and the IDs of different objects are easily swapped. Moreover, the field of view of a single camera is limited; in multi-camera scenes such as surround view, tracked objects are easily lost and such algorithms are hardly applicable, which in turn degrades downstream prediction and planning.
In view of this, and considering that autonomous vehicles typically carry many cameras, the present invention targets the autonomous-driving scenario and aims to obtain a target object tracking method based on the fusion of multiple cameras.
Disclosure of Invention
One objective of the present invention is to provide a target tracking method based on multi-camera fusion which trains re-identification models for target objects encountered during automatic driving, such as vehicles and pedestrians, so as to extract the appearance features of the target objects and use their similarity as a matching reference, thereby improving tracking accuracy.
In order to achieve the above object, the present invention provides a target tracking method based on fusion of multiple cameras, which includes the steps of:
100: extracting, in real time, the target object information to be tracked from images captured by several cameras, the target object information at least including: the position information of the target object detection frame in the image, the image within the target object detection frame, the target object type and the target object ID;
200: using trained depth residual encoders to identify the image within the input target object detection frame and output the corresponding target object appearance feature code, the number of depth residual encoders corresponding to the number of target object types;
300: storing the position information of the target object detection frame in the image, the target object type, the target object ID, the target object appearance feature code and the corresponding timestamp, and keeping them as the corresponding historical data;
400: predicting the current position of the target object detection frame in the image from its historical position information, to obtain the predicted position of the target object detection frame;
500: based on the position of the currently detected target object detection frame, calculating the Euclidean distance between it and the corresponding predicted detection-frame position, and screening out the candidate target objects below a set first threshold;
600: based on the appearance feature code of the currently detected target object, calculating the cosine distance between it and the appearance feature codes of the candidate target objects, and screening out the candidate matching target objects below a set second threshold;
700: using the Hungarian algorithm to assign the currently detected target object a match from among the candidate matching target objects, so as to realize tracking.
The target object tracking method based on the fusion of multiple cameras differs from prior tracking technology as follows: traditional tracking mostly predicts and judges from the position of a target object alone, has difficulty deciding whether targets seen by several different cameras are the same object, and frequently loses target IDs. The method of the invention instead trains separate re-identification models for vehicles and pedestrians, extracts the appearance features of the target object, and uses the similarity as a matching reference, effectively improving tracking accuracy.
Further, in the target tracking method based on the fusion of multiple cameras of the present invention, the target at least includes a pedestrian and a vehicle.
Further, in the target tracking method based on the fusion of multiple cameras according to the present invention, in step 400, a kalman filter is used to predict the current position of the target detection frame in the image.
Further, in the target tracking method based on the fusion of multiple cameras according to the present invention, a preprocessing step is further included between step 100 and step 200: the image within the object detection box is scaled to the input size of the depth residual encoder.
Further, in the target tracking method based on the fusion of multiple cameras of the present invention, the position information of the target detection frame in the image includes the pixel position of the center point of the detection frame and the length and width of the detection frame.
Further, in the target tracking method based on the fusion of multiple cameras according to the present invention, the method further includes step 800: when the currently detected target object is not matched with the corresponding target object from the candidate matching target objects, a new ID is given to the currently detected target object, and the currently detected target object is stored as historical data.
Further, in the target tracking method based on the fusion of the plurality of cameras, the depth residual encoder is trained by adopting an MOT pedestrian re-identification data set and a vehicle-mounted camera acquisition data set.
Accordingly, another object of the present invention is to provide a target tracking system based on fusion of multiple cameras, which can be used to implement the above-mentioned target tracking method of the present invention.
In order to achieve the above object, the present invention provides a target tracking system based on fusion of multiple cameras, which includes:
the target object detection module extracts target object information to be tracked in real time from images shot by a plurality of cameras, and the target object information at least comprises: the position information of the target object detection frame in the image, the image in the target object detection frame, the target object type and the target object ID;
a target re-identification encoding module including depth residual encoders corresponding to the number of types of the target, each depth residual encoder outputting a corresponding target appearance feature code based on the input image in the target detection frame;
the database receives the position information, the target object type, the target object ID, the target object appearance feature code and the corresponding timestamp of the target object detection frame in the image and stores the position information, the target object type, the target object ID, the target object appearance feature code and the corresponding timestamp as corresponding historical data;
the online matching and tracking module comprises a position prediction submodule, a distance calculation submodule and a Hungarian matching submodule, wherein:
the position prediction sub-module predicts the current position of the target object detection frame in the image based on the historical position information of the target object detection frame in the image stored in the database so as to obtain the predicted position of the target object detection frame;
the distance calculation sub-module calculates the Euclidean distance between the position of the current target object detection frame detected currently and the predicted position of the corresponding target object detection frame, and screens out candidate target objects smaller than a first threshold value from a database on the basis of the set first threshold value; then calculating the cosine distance between the appearance feature code of the current detected target object and the appearance feature code of the candidate target object, and screening out candidate matching target objects smaller than a second threshold value from the database based on the set second threshold value;
and the Hungarian matching submodule performs matching assignment on the current detected target object from the candidate matching target objects by adopting a Hungarian algorithm so as to realize the tracking work.
Further, in the target tracking system based on the fusion of the plurality of cameras, the database includes a matching database and a cloud database, the cloud database stores all historical data of the target, and the matching database stores historical data of a set number of frames of the target.
Further, in the target tracking system based on the fusion of multiple cameras of the present invention, the target re-identification encoding module further includes a preprocessing sub-module, and the preprocessing sub-module scales the image in the target detection frame to the input size of the depth residual encoder.
Compared with the prior art, the target object tracking method and system based on the fusion of the plurality of cameras have the following advantages and beneficial effects:
(1) the invention provides a cross-camera multi-object tracking method suitable for an automatic driving scene of a vehicle, which can realize tracking of a wider field of view by utilizing the spatial arrangement of a plurality of cameras;
(2) the depth residual encoders in the target object re-identification encoding module can be trained separately on pedestrian and vehicle re-identification data sets, so that, given the image within a target object detection frame as input, they output the corresponding target object appearance feature code, which improves the tracking accuracy of the target object.
drawings
Fig. 1 schematically shows a tracking algorithm overall module schematic diagram of a target tracking system based on multiple camera fusion in an embodiment of the invention.
Fig. 2 schematically shows a neural network training diagram of a target re-identification coding module of the target tracking system based on fusion of multiple cameras in an embodiment of the present invention.
Fig. 3 schematically shows a database module diagram of a target object tracking system based on multiple camera fusion according to an embodiment of the present invention.
Fig. 4 schematically shows a flowchart of steps of a target tracking method based on multi-camera fusion according to an embodiment of the present invention.
Detailed Description
The target tracking method and system based on multi-camera fusion according to the present invention will be further described and illustrated below with reference to the drawings and specific embodiments of the specification; however, this description and illustration do not unduly limit the technical solution of the invention.
Fig. 1 schematically shows a tracking algorithm overall module schematic diagram of a target tracking system based on multiple camera fusion in an embodiment of the invention.
As shown in fig. 1, in the present embodiment, the target tracking system according to the present invention may include: the device comprises a target object detection module, a target object re-identification coding module, a database and an online matching tracking module.
In the target tracking system of the present invention, the target object detection module extracts, in real time, the target object information to be tracked from images captured by several cameras; this information at least includes the position of the target object detection frame in the image, the image within the detection frame, the target object type and the target object ID. The detection module passes the extracted information to the target object re-identification encoding module, which contains one depth residual encoder per target object type; each encoder outputs the corresponding target object appearance feature code from the input image within the detection frame.
Correspondingly, the database in the target object tracking system stores the position information of the target object detection frame in the image, the target object type, the target object ID, the target object appearance feature code and the corresponding timestamp, keeping them as the corresponding historical data.
The online matching and tracking module can receive the output of the current target object re-identification coding module and perform feature matching with the database to complete re-identification and tracking of the target object.
It should be noted that, in the present invention, the position prediction submodule in the online matching and tracking module predicts the current position of the target object detection frame in the image from the historical position information stored in the database, yielding the predicted position of the detection frame. The distance calculation submodule then computes the Euclidean distance between the position of the currently detected target object detection frame and the corresponding predicted position, and screens out of the database the candidate target objects below a set first threshold; next it computes the cosine distance between the appearance feature code of the currently detected target object and those of the candidate target objects, and screens out of the database the candidate matching target objects below a set second threshold. Once the candidate matching target objects have been screened out, the Hungarian matching submodule uses the Hungarian algorithm to assign the currently detected target object a match from among them, realizing tracking.
In addition, in this embodiment, the target re-identification encoding module in the target tracking system further includes a pre-processing sub-module, and the pre-processing sub-module may scale the image in the target detection frame to the input size of the depth residual encoder.
In addition, it should be noted that, in the present embodiment, the target objects handled by the multi-camera-fusion tracking system of the present invention may at least include pedestrians and vehicles.
Fig. 2 schematically shows a neural network training diagram of a target re-identification coding module of the target tracking system based on fusion of multiple cameras in an embodiment of the present invention.
As shown in fig. 2, in the present embodiment, the target object re-identification encoding module of the tracking system introduces a depth residual encoder (also called a depth residual network) to extract a coding of the target object's appearance features. The depth residual encoder consists of 2 convolutional layers, 1 pooling layer, 6 residual modules and 1 fully-connected layer. Its training data come partly from the public MOT pedestrian re-identification data set and partly from a data set collected with vehicle-mounted cameras.
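As a concrete reading of this layout, the following is a minimal PyTorch sketch of such an encoder; the channel widths, strides, use of batch normalization and the global average pooling before the fully-connected layer are assumptions, since the text fixes only the layer counts and the 128-dimensional output described below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """One residual module: two 3x3 convolutions plus a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)

class AppearanceEncoder(nn.Module):
    """2 conv layers, 1 pooling layer, 6 residual modules and 1
    fully-connected layer, producing a 128-d appearance code."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),   # conv 1
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),  # conv 2
            nn.MaxPool2d(2),                                         # pooling
        )
        self.blocks = nn.Sequential(*[ResidualBlock(32) for _ in range(6)])
        self.fc = nn.Linear(32, embed_dim)                           # FC layer

    def forward(self, x):
        x = self.blocks(self.stem(x))
        x = x.mean(dim=(2, 3))                 # collapse spatial dims to a vector
        return F.normalize(self.fc(x), dim=1)  # unit-norm 128-d appearance code
```

For example, `AppearanceEncoder()(torch.randn(8, 3, 128, 64))` yields an (8, 128) batch of appearance codes.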
In the present embodiment, during the training phase of the depth residual encoder, the data set is extended with data augmentation in the preprocessing module, because the pose and completeness of the same target object differ greatly between cameras. Specifically, 1/3 of the pedestrians and vehicles in the data set are picked at random, pixels are randomly cropped from their top and bottom, and the crops are then scaled back to the original size. The augmented data set better simulates the misalignment present in multi-camera scenes. The target object re-identification encoding module trains 2 separate sets of weights, one for pedestrians and one for vehicles, which better distinguishes targets of the same type. The output of the depth residual network is a 128-dimensional feature vector, used as the appearance feature code of the target object.
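A minimal sketch of the described augmentation follows, assuming OpenCV for resizing and a per-sample probability of 1/3 as an approximation of "picking 1/3 of the data set"; the crop bound `max_crop_frac` is an illustrative parameter that the text does not specify.

```python
import random
import numpy as np
import cv2  # used only for resizing; any image library would do

def simulate_truncation(img: np.ndarray, max_crop_frac: float = 0.2) -> np.ndarray:
    """With probability 1/3, crop random strips of pixels off the top and
    bottom of a detection crop, then rescale to the original size, mimicking
    the partial views the same target produces in different cameras."""
    if random.random() > 1.0 / 3.0:
        return img                       # leave roughly 2/3 of samples untouched
    h, w = img.shape[:2]
    top = random.randint(0, int(h * max_crop_frac))
    bottom = random.randint(0, int(h * max_crop_frac))
    cropped = img[top:h - bottom, :]
    return cv2.resize(cropped, (w, h))   # scale back to the original size
```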
Fig. 3 schematically shows a database module of the target tracking system based on multi-camera fusion according to an embodiment of the present invention.
As shown in fig. 3, in this embodiment, the database in the target tracking system based on multiple camera fusion according to the present invention may include: the system comprises a matching database and a cloud database.
In the invention, the position information of the target object detection frame in the image, the target object type, the target object ID, the target object appearance feature code and the corresponding timestamp are uploaded to both the cloud database and the matching database. The cloud database records the feature codes of all target objects; on the one hand, it can be used to tune the number of frames retained in the matching database, increasing the robustness of the algorithm, and on the other hand, in scenes with surveillance requirements, the features of a given target object and the time sequence of its appearances can be looked up quickly.
Accordingly, the matching database stores only a set number of frames of historical data per target object (also called a tracker) and has an update-and-delete mechanism. In the present embodiment, the matching database keeps only the records of the past 100 frames of each target object. When a target object (tracker) is matched to no target object from any camera in a new frame, its lost time accumulates; once the lost time exceeds a threshold, the tracker is deleted from the matching database.
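A minimal sketch of such a matching database follows; the record layout, the `lost_ttl` value and the use of wall-clock timestamps are illustrative assumptions, since the text fixes only the 100-frame retention and the lost-time deletion rule.

```python
import time
from collections import deque

class MatchingDatabase:
    """Keeps only the most recent max_frames records per tracker (100 in
    this embodiment) and drops trackers unmatched for longer than lost_ttl."""
    def __init__(self, max_frames: int = 100, lost_ttl: float = 5.0):
        self.max_frames = max_frames
        self.lost_ttl = lost_ttl           # assumed threshold, in seconds
        self.records = {}                  # tracker_id -> deque of records
        self.last_seen = {}                # tracker_id -> time of last match

    def update(self, tracker_id, record, timestamp=None):
        """Append a new per-frame record; old frames fall off automatically."""
        q = self.records.setdefault(tracker_id, deque(maxlen=self.max_frames))
        q.append(record)
        self.last_seen[tracker_id] = time.time() if timestamp is None else timestamp

    def prune(self, now=None):
        """Delete trackers whose accumulated lost time exceeds the threshold."""
        now = time.time() if now is None else now
        for tid in [t for t, ts in self.last_seen.items() if now - ts > self.lost_ttl]:
            self.records.pop(tid, None)
            self.last_seen.pop(tid, None)
```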
Fig. 4 schematically shows a flowchart of steps of a target tracking method based on multi-camera fusion according to an embodiment of the present invention.
It should be noted that the present invention also discloses a target object tracking method based on the fusion of multiple cameras. As shown in fig. 4, and with reference to figs. 1 to 3, the method may include the following steps:
100: extracting, in real time, the target object information to be tracked from images captured by several cameras, the target object information at least including: the position information of the target object detection frame in the image, the image within the target object detection frame, the target object type and the target object ID;
200: using trained depth residual encoders to identify the image within the input target object detection frame and output the corresponding target object appearance feature code, the number of depth residual encoders corresponding to the number of target object types;
300: storing the position information of the target object detection frame in the image, the target object type, the target object ID, the target object appearance feature code and the corresponding timestamp, and keeping them as the corresponding historical data;
400: predicting the current position of the target object detection frame in the image from its historical position information, to obtain the predicted position of the target object detection frame;
500: based on the position of the currently detected target object detection frame, calculating the Euclidean distance between it and the corresponding predicted detection-frame position, and screening out the candidate target objects below a set first threshold;
600: based on the appearance feature code of the currently detected target object, calculating the cosine distance between it and the appearance feature codes of the candidate target objects, and screening out the candidate matching target objects below a set second threshold;
700: using the Hungarian algorithm to assign the currently detected target object a match from among the candidate matching target objects, so as to realize tracking.
In the target tracking method according to the present invention, in step 400, a Kalman filter may be used to predict the current position of the target object detection frame in the image.
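As one plausible instantiation, the following is a minimal constant-velocity Kalman filter over the detection-frame centre; the state layout and noise magnitudes are assumptions, since the text states only that a Kalman filter is used.

```python
import numpy as np

class BoxKalman:
    """Constant-velocity Kalman filter over the detection-frame centre
    (cx, cy) with state [cx, cy, vx, vy]."""
    def __init__(self, cx: float, cy: float, dt: float = 1.0):
        self.x = np.array([cx, cy, 0.0, 0.0])        # initial state
        self.P = np.eye(4) * 10.0                    # state covariance
        self.F = np.array([[1, 0, dt, 0],            # constant-velocity model
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],             # we observe the centre only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01                    # assumed process noise
        self.R = np.eye(2) * 1.0                     # assumed measurement noise

    def predict(self) -> np.ndarray:
        """Propagate the state one frame; returns the predicted box centre."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, cx: float, cy: float) -> None:
        """Fold in the matched detection's centre as a measurement."""
        z = np.array([cx, cy])
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```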
In addition, in this embodiment, a preprocessing step may be further included between the step 100 and the step 200: the image within the object detection box is scaled to the input size of the depth residual encoder.
In addition, in some other embodiments, in the target tracking system based on multiple camera fusion according to the present invention, the position information of the target detection frame in the image may include the pixel position of the center point of the detection frame and the length and width of the detection frame.
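For illustration, a stored observation combining this position information with the other fields listed in step 300 might be laid out as follows; all field names are hypothetical, as the text specifies only what must be stored.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TargetRecord:
    """One observation as stored in the database (illustrative layout)."""
    cx: float             # detection-frame centre point, pixel x
    cy: float             # detection-frame centre point, pixel y
    width: float          # detection-frame width in pixels
    height: float         # detection-frame height in pixels
    category: str         # e.g. "pedestrian" or "vehicle"
    target_id: int        # target object ID
    feature: List[float]  # 128-dimensional appearance feature code
    timestamp: float      # capture time of the frame
```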
In the target tracking method based on multi-camera fusion according to the present invention, in step 800, when a currently detected target is not matched with a corresponding target from among candidate matching targets, a new ID is assigned to the currently detected target, and the currently detected target is stored as history data.
Referring to fig. 4 in conjunction with steps 100-700 of the target tracking method, it can be seen that the target tracking method of the present invention is implemented on the basis of the target tracking system of the present invention.
In the embodiment shown in fig. 4, the target to be tracked by the target tracking method according to the present invention is referred to as a tracker. In the process shown in fig. 4, the target detection module detects an input image sequence, obtains a tracker detection frame from an image of a current frame, and inputs the tracker detection frame into a trained depth residual encoder, thereby outputting a corresponding tracker appearance feature code.
While the online matching and tracking module of the system processes the previous frame, a Kalman filter predictor estimates, from the history of each tracker's detection-frame centre position, where the tracker may appear in the current frame, yielding the predicted position of each tracker detection frame.
Accordingly, in the present embodiment, the distance calculating section of the online matching tracking module includes two steps of screening:
In step one, the Euclidean distance between each detection frame of the current frame and the predicted position of each tracker detection frame is calculated. In step two, for the trackers whose Euclidean distance in step one is below the threshold, the minimum cosine distance between their stored appearance feature codes and that of the detected target object is calculated and treated as the similarity between the tracker and the detected target object; the trackers whose distance is below the threshold are kept as potential candidate matching trackers. The point of the first screening step is to constrain the trackers by position, which reduces the number of candidate trackers and hence the computational burden of the second step.
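A sketch of this two-stage gating follows, assuming each tracker record carries a Kalman-predicted position and its stored appearance codes; the thresholds and the record layout are illustrative.

```python
import numpy as np

def screen_candidates(det_pos, det_feat, trackers, pos_thresh, feat_thresh):
    """Two-stage gating: a Euclidean gate on the predicted box position
    (cheap, shrinks the candidate set), then a minimum-cosine-distance gate
    over each surviving tracker's stored appearance codes."""
    # Stage 1: position gate against the Kalman-predicted box position.
    near = [t for t in trackers
            if np.linalg.norm(det_pos - t["predicted_pos"]) < pos_thresh]
    # Stage 2: appearance gate; keep the minimum distance over the history.
    candidates = []
    for t in near:
        feats = np.asarray(t["features"])              # (n_frames, 128)
        sims = feats @ det_feat / (
            np.linalg.norm(feats, axis=1) * np.linalg.norm(det_feat))
        min_cos_dist = float((1.0 - sims).min())       # minimum cosine distance
        if min_cos_dist < feat_thresh:
            candidates.append((t, min_cos_dist))
    return candidates
```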
In this embodiment, the Hungarian algorithm assigns the detected target objects to the potential matching trackers. If a detected target object is matched to a corresponding tracker, the tracker's ID is given to the target object and the matching database is updated, completing the tracking of the current target object; if the currently detected target object matches no tracker, it is a new target object: it is given a new ID and uploaded to the matching database as a new tracker.
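A sketch of the assignment step using SciPy's Hungarian solver (`linear_sum_assignment`); the rejection bound `max_cost` and the cost-matrix convention are assumptions not fixed by the text.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_detections(cost, det_ids, tracker_ids, max_cost=0.5):
    """Hungarian assignment between current detections and candidate
    trackers; cost[i, j] is the cosine distance from detection i to
    tracker j. Returns matched (det_id, tracker_id) pairs and the indices
    of unmatched detections, which become new trackers upstream."""
    cost = np.asarray(cost)
    rows, cols = linear_sum_assignment(cost)
    matches, unmatched = [], set(range(len(det_ids)))
    for r, c in zip(rows, cols):
        if cost[r, c] <= max_cost:       # reject implausible pairings
            matches.append((det_ids[r], tracker_ids[c]))
            unmatched.discard(r)
    return matches, sorted(unmatched)
```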
In the above technical solution, for the remaining trackers matched to no target object, i.e. trackers that have disappeared in the current frame, the lost time is updated; if the lost time exceeds the threshold, the tracker is deleted from the matching database, indicating that it is no longer observed by any camera.
The scope of the present invention is not limited to the examples given herein, and all prior art that does not contradict the inventive concept, including but not limited to prior patent documents, prior publications, and the like, are intended to be encompassed by the present invention.
In addition, the combinations of features in the present application are not limited to those recited in the claims or described in the embodiments; all features described in the present application may be freely combined in any manner unless they contradict each other.
It should also be noted that the above-mentioned embodiments are only specific embodiments of the present invention. It is apparent that the present invention is not limited to the above embodiments and similar changes or modifications can be easily made by those skilled in the art from the disclosure of the present invention and shall fall within the scope of the present invention.

Claims (10)

1. A target object tracking method based on the fusion of multiple cameras, characterized by comprising the steps of:
100: extracting, in real time, the target object information to be tracked from images captured by several cameras, the target object information at least including: the position information of the target object detection frame in the image, the image within the target object detection frame, the target object category and the target object ID;
200: using trained depth residual encoders to identify the image within the input target object detection frame and output the corresponding target object appearance feature code, the number of depth residual encoders corresponding to the number of target object categories;
300: storing the position information of the target object detection frame in the image, the target object category, the target object ID, the target object appearance feature code and the corresponding timestamp, and keeping them as the corresponding historical data;
400: based on the historical position information of the target object detection frame in the image, predicting the current position of the target object detection frame in the image, to obtain the predicted position of the target object detection frame;
500: based on the position of the currently detected target object detection frame, calculating the Euclidean distance between it and the predicted position of the corresponding target object detection frame, and screening out the candidate target objects below a set first threshold;
600: based on the appearance feature code of the currently detected target object, calculating the cosine distance between it and the appearance feature codes of the candidate target objects, and screening out the candidate matching target objects below a set second threshold;
700: using the Hungarian algorithm to assign the currently detected target object a match from among the candidate matching target objects, so as to realize tracking.
2. The target object tracking method based on the fusion of multiple cameras according to claim 1, characterized in that the target objects at least include pedestrians and vehicles.
3. The target object tracking method based on the fusion of multiple cameras according to claim 1, characterized in that, in step 400, a Kalman filter is used to predict the current position of the target object detection frame in the image.
4. The target object tracking method based on the fusion of multiple cameras according to claim 1, characterized in that a preprocessing step is further included between step 100 and step 200: scaling the image within the target object detection frame to the input size of the depth residual encoder.
5. The target object tracking method based on the fusion of multiple cameras according to claim 1, characterized in that the position information of the target object detection frame in the image includes the pixel position of the centre point of the detection frame and the length and width of the detection frame.
6. The target object tracking method based on the fusion of multiple cameras according to claim 1, characterized by further comprising step 800: when the currently detected target object is matched to no corresponding target object among the candidate matching target objects, assigning a new ID to the currently detected target object and storing it as historical data.
7. The target object tracking method based on the fusion of multiple cameras according to claim 1, characterized in that the depth residual encoders are trained on the MOT pedestrian re-identification data set and a data set collected by vehicle-mounted cameras.
8. A target object tracking system based on the fusion of multiple cameras, characterized by comprising:
a target object detection module, which extracts, in real time, the target object information to be tracked from images captured by several cameras, the target object information at least including: the position information of the target object detection frame in the image, the image within the target object detection frame, the target object category and the target object ID;
a target object re-identification encoding module, which includes depth residual encoders corresponding in number to the target object categories, each depth residual encoder outputting the corresponding target object appearance feature code based on the input image within the target object detection frame;
a database, which receives the position information of the target object detection frame in the image, the target object category, the target object ID, the target object appearance feature code and the corresponding timestamp, and stores them as the corresponding historical data;
an online matching and tracking module, which includes a position prediction submodule, a distance calculation submodule and a Hungarian matching submodule, wherein:
the position prediction submodule predicts the current position of the target object detection frame in the image based on the historical position information of the target object detection frame stored in the database, to obtain the predicted position of the target object detection frame;
the distance calculation submodule first calculates the Euclidean distance between the position of the currently detected target object detection frame and the predicted position of the corresponding target object detection frame and, based on a set first threshold, screens out from the database the candidate target objects below the first threshold; it then calculates the cosine distance between the appearance feature code of the currently detected target object and the appearance feature codes of the candidate target objects and, based on a set second threshold, screens out from the database the candidate matching target objects below the second threshold;
the Hungarian matching submodule uses the Hungarian algorithm to assign the currently detected target object a match from among the candidate matching target objects, so as to realize tracking.
9. The target object tracking system based on the fusion of multiple cameras according to claim 8, characterized in that the database includes a matching database and a cloud database, the cloud database storing all historical data of the target objects and the matching database storing a set number of frames of historical data of the target objects.
10. The target object tracking system based on the fusion of multiple cameras according to claim 8, characterized in that the target object re-identification encoding module further includes a preprocessing submodule, which scales the image within the target object detection frame to the input size of the depth residual encoder.
CN202011253000.4A 2020-11-11 2020-11-11 Target object tracking method and system based on fusion of multiple cameras Pending CN112381132A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011253000.4A CN112381132A (en) 2020-11-11 2020-11-11 Target object tracking method and system based on fusion of multiple cameras

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011253000.4A CN112381132A (en) 2020-11-11 2020-11-11 Target object tracking method and system based on fusion of multiple cameras

Publications (1)

Publication Number Publication Date
CN112381132A 2021-02-19

Family

ID=74582097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011253000.4A Pending CN112381132A (en) 2020-11-11 2020-11-11 Target object tracking method and system based on fusion of multiple cameras

Country Status (1)

Country Link
CN (1) CN112381132A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598715A (en) * 2021-03-04 2021-04-02 奥特酷智能科技(南京)有限公司 Multi-sensor-based multi-target tracking method, system and computer readable medium
CN113012223A (en) * 2021-02-26 2021-06-22 清华大学 Target flow monitoring method and device, computer equipment and storage medium
CN113052876A (en) * 2021-04-25 2021-06-29 合肥中科类脑智能技术有限公司 Video relay tracking method and system based on deep learning
CN113625718A (en) * 2021-08-12 2021-11-09 上汽大众汽车有限公司 Vehicle path planning method
CN115291606A (en) * 2022-07-22 2022-11-04 天津海关工业产品安全技术中心 Robot automatic following method and system
CN115861907A (en) * 2023-03-02 2023-03-28 山东华夏高科信息股份有限公司 Helmet detection method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074665A1 (en) * 2018-09-03 2020-03-05 Baidu Online Network Technology (Beijing) Co., Ltd. Object detection method, device, apparatus and computer-readable storage medium
CN111145213A (en) * 2019-12-10 2020-05-12 中国银联股份有限公司 Target tracking method, device and system and computer readable storage medium
CN111192297A (en) * 2019-12-31 2020-05-22 山东广域科技有限责任公司 A Multi-Camera Target Association Tracking Method Based on Metric Learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074665A1 (en) * 2018-09-03 2020-03-05 Baidu Online Network Technology (Beijing) Co., Ltd. Object detection method, device, apparatus and computer-readable storage medium
CN111145213A (en) * 2019-12-10 2020-05-12 中国银联股份有限公司 Target tracking method, device and system and computer readable storage medium
CN111192297A (en) * 2019-12-31 2020-05-22 山东广域科技有限责任公司 A Multi-Camera Target Association Tracking Method Based on Metric Learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Nicolai Wojke et al., "Simple Online and Realtime Tracking with a Deep Association Metric", arXiv:1703.07402v1 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012223A (en) * 2021-02-26 2021-06-22 清华大学 Target flow monitoring method and device, computer equipment and storage medium
CN112598715A (en) * 2021-03-04 2021-04-02 奥特酷智能科技(南京)有限公司 Multi-sensor-based multi-target tracking method, system and computer readable medium
CN113052876A (en) * 2021-04-25 2021-06-29 合肥中科类脑智能技术有限公司 Video relay tracking method and system based on deep learning
CN113625718A (en) * 2021-08-12 2021-11-09 上汽大众汽车有限公司 Vehicle path planning method
CN113625718B (en) * 2021-08-12 2023-07-21 上汽大众汽车有限公司 Vehicle route planning method
CN115291606A (en) * 2022-07-22 2022-11-04 天津海关工业产品安全技术中心 Robot automatic following method and system
CN115861907A (en) * 2023-03-02 2023-03-28 山东华夏高科信息股份有限公司 Helmet detection method and system

Similar Documents

Publication Publication Date Title
CN112381132A (en) Target object tracking method and system based on fusion of multiple cameras
TWI750498B (en) Method and device for processing video stream
JP4429298B2 (en) Object number detection device and object number detection method
US9405974B2 (en) System and method for using apparent size and orientation of an object to improve video-based tracking in regularized environments
EP2549759B1 (en) Method and system for facilitating color balance synchronization between a plurality of video cameras as well as method and system for obtaining object tracking between two or more video cameras
CN111199556A (en) Indoor pedestrian detection and tracking method based on camera
CN116580333A (en) Grain depot vehicle tracking method based on YOLOv5 and improved StrongSORT
CN114398950A (en) Garbage identification and classification method, computer readable storage medium and robot
CN113408550B (en) Intelligent weighing management system based on image processing
CN115019241A (en) Pedestrian identification and tracking method and device, readable storage medium and equipment
CN114937248A (en) Method, apparatus, electronic device, storage medium for vehicle tracking across cameras
Shen et al. An interactively motion-assisted network for multiple object tracking in complex traffic scenes
CN114613006A (en) A kind of long-distance gesture recognition method and device
CN110570318A (en) Computer-executed vehicle damage assessment method and device based on video stream
KR101842488B1 (en) Smart monitoring system applied with patten recognition technic based on detection and tracking of long distance-moving object
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
CN113112479A (en) Progressive target detection method and device based on key block extraction
JP4918615B2 (en) Object number detection device and object number detection method
CN118154854A (en) Target detection method for multi-view feature aggregation
Zhang et al. Vehicle detection and tracking in remote sensing satellite vidio based on dynamic association
CN115880661B (en) A method and device for vehicle matching, electronic equipment, and storage medium
CN115147450B (en) Moving target detection method and detection device based on motion frame difference image
CN117132922A (en) Image recognition method, device, equipment and storage medium
CN111160115A (en) Video pedestrian re-identification method based on twin double-flow 3D convolutional neural network
CN116912763A (en) Multi-pedestrian re-recognition method integrating gait face modes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210219