
CN110517285B - Large-scene minimum target tracking based on motion estimation ME-CNN network - Google Patents


Info

Publication number
CN110517285B
CN110517285B (application CN201910718847.6A)
Authority
CN
China
Prior art keywords
target
network
cnn
training
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910718847.6A
Other languages
Chinese (zh)
Other versions
CN110517285A (en)
Inventor
焦李成
杨晓岩
李阳阳
唐旭
程曦娜
刘旭
杨淑媛
冯志玺
侯彪
张丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910718847.6A priority Critical patent/CN110517285B/en
Publication of CN110517285A publication Critical patent/CN110517285A/en
Application granted granted Critical
Publication of CN110517285B publication Critical patent/CN110517285B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/207Analysis of motion for motion estimation over a hierarchy of resolutions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention proposes a large-scene minimal target tracking method based on a motion estimation ME-CNN network, which solves the problem of tracking extremely small targets from motion parameters without image registration. The implementation steps are: obtain the initial training set D of the target motion estimation network ME-CNN; construct the network ME-CNN for estimating target motion; compute the ME-CNN loss function from the target motion parameters; judge whether the current set is the initial training set; update the training labels of the loss function; obtain the initial model for predicting the target motion position; correct the position predicted by the model; update the training data set with the corrected target position, completing one frame of tracking; and obtain the remote sensing video target tracking result. The invention uses the deep learning network ME-CNN to predict the target motion position, avoids image registration over large scenes and the difficulty of extracting features of super-blurred targets during tracking, reduces dependence on target features, and improves the accuracy of target tracking in super-blurred video.

Description

Large-scene minimum target tracking based on motion estimation ME-CNN network
Technical Field
The invention belongs to the technical field of remote sensing video processing, relates to remote sensing video target tracking of a large-scene minimum target, and particularly relates to a large-scene minimum target remote sensing video tracking method based on a motion estimation ME-CNN network. The method is used for safety monitoring, smart city construction, traffic facility monitoring and the like.
Background
Remote sensing target tracking is an important research direction in the field of computer vision. Target tracking in a large-scene, low-resolution remote sensing video with extremely small targets, shot by a moving satellite, is a particularly challenging problem. Such a video records the daily activity of an area over a period of time. Because the satellite shoots from a very high altitude and covers most of a city, the resolution of the video is low; the vehicles, ships and aircraft in the video are extremely small, a vehicle occupying only about 3×3 pixels, and their contrast with the surrounding environment is extremely low, so that the human eye perceives only a small bright spot. Tracking such ultra-low-pixel, extremely small targets therefore belongs to the problem of large-scene minimal target tracking and is especially difficult. Moreover, because the satellite keeps moving, the whole video drifts noticeably in one direction, and some regions are scaled because of terrain height, so the usual approach of first registering the images and then obtaining the target motion with a frame-difference method is difficult to apply. This poses a great challenge to remote sensing video tracking of extremely small targets in large scenes.
Video target tracking predicts the position and size of a target in subsequent video frames, given its position and size in the initial frame. Current algorithms in the video tracking field are mostly based on neural networks or correlation filters. Neural-network-based algorithms, such as the CNN-SVM method, first feed the target into a multilayer neural network to learn target features and then track with a traditional SVM; features learned from a large amount of training data are more discriminative than hand-crafted features. Correlation-filter-based algorithms, such as the KCF method, learn a filter template and convolve it with candidate search regions of the next frame; the search region with the largest response is taken as the predicted target position.
Natural optical video tracking algorithms are difficult to apply to remote sensing videos of extremely small targets in large scenes, because the targets are tiny and blurred and a neural network cannot learn effective target features from them. Traditional remote sensing tracking methods are likewise unsuitable for videos with continuous background drift and partial regional scaling: image registration and the frame-difference method cannot be carried out, the contrast between the target and its surroundings is extremely low, and the target is easily lost.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a large-scene small-target remote sensing video tracking method based on motion estimation, which has low computational complexity and higher precision.
The invention relates to a large-scene minimum target remote sensing video tracking method based on a motion estimation ME-CNN network, which is characterized by comprising the following steps of:
(1) obtaining an initial training set D of a minimum target motion estimation network ME-CNN:
taking the first F frames of the original remote sensing data video A, continuously marking a bounding box for the same target in each frame, and arranging the top-left vertex coordinates of the bounding boxes in frame order to form the training set D;
(2) constructing a network ME-CNN for estimating the movement of the minimum target: the network comprises three parallel convolution modules that extract different features from the training data, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(3) calculating the loss function of the network ME-CNN by using the minimum target motion parameter: calculating to obtain the motion trend of the target according to the motion rule of the target, taking the motion trend as a training label corresponding to the target, and calculating the Euclidean spatial distance between the training label and the prediction result of the ME-CNN network as a loss function of the ME-CNN network optimization training;
(4) judging whether the training set is an initial training set: judging whether the current training set is an initial training set, if not, executing the step (5) and updating the training labels in the loss function; otherwise, if the training set is the initial training set, executing the step (6) and entering the circular training of the network;
(5) updating the training labels in the loss function: when the current training set is not the initial training set, recalculating the training labels of the loss function from the data of the current training set, using the same minimal-target-motion-parameter calculation as in step (3); the recalculated training labels take part in training the motion estimation network ME-CNN; proceeding to step (6);
(6) obtaining an initial model M1 for predicting the movement position of the target: inputting the training set D into a target motion estimation network ME-CNN, training the network according to the current loss function, and obtaining an initial model M1 for predicting the motion position of the target;
(7) correcting the position result of the prediction model: calculating the auxiliary position offset of the target, and correcting the position result predicted by the motion estimation network ME-CNN with the offset;
(7a) obtaining a target grayscale image block: obtaining the target position (Px, Py) of the next frame from the initial model M1 for predicting the target motion position, taking out a grayscale image block of the target from the next frame at the obtained position (Px, Py), and normalizing it to obtain the normalized target grayscale image block;
(7b) obtaining a target position offset: carrying out brightness grading on the normalized target gray image block, determining the position of a target in the image block by using a vertical projection method, and calculating the distance between the center position of the target and the center position of the image block to obtain the offset of the target position;
(7c) obtaining a corrected target position: correcting the position of the predicted target by the motion estimation network ME-CNN by using the obtained target position offset to obtain all positions of the corrected target;
(8) updating the training data set with the corrected target position to complete target tracking of one frame: appending the obtained top-left position of the target to the last row of the training set D and removing the first row of D in a single operation, obtaining a corrected and updated training set D, completing the training of one frame and obtaining the target position result of one frame;
(9) judging whether the current video frame number is less than the total number of video frames: if it is, repeating steps (4) to (9) in a loop and continuing the tracking optimization training of the target until all video frames have been traversed; otherwise, if the frame number equals the total number of video frames, ending the training and executing step (10);
(10) obtaining a remote sensing video target tracking result: and the accumulated output is the remote sensing video target tracking result.
The invention solves the problems of high calculation complexity and low tracking precision of the existing video tracking algorithm.
Compared with the prior art, the invention has the following advantages:
(1) The ME-CNN adopted by the invention does not need the traditional pipeline of image registration followed by a frame-difference method, nor complex background modeling, to obtain the motion trajectory of the target. It analyses, through a neural network, a training set consisting of the target positions in the first F frames, and the network's own predictions drive self-cycling training without manually labeling target positions in subsequent video frames, which greatly reduces the complexity of the tracking algorithm and improves its practicality.
(2) The algorithm combines the ME-CNN network with an auxiliary position-offset method to automatically correct the target position in the remote sensing video, and modifies the loss function of the motion estimation network according to the motion law of the target, reducing the computational load of the network and improving the robustness of target tracking.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
fig. 2 is a schematic structural diagram of an ME-CNN network proposed by the present invention;
FIG. 3 is a graph comparing the predicted trajectory results of the present invention for very small targets in a large scene with the standard target trajectory, where the predicted results of the present invention are green curves and red is the accurate target trajectory.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments.
Example 1
Remote sensing video tracking of extremely small targets in large scenes plays an important role in safety monitoring, smart city construction, traffic facility monitoring and the like. The remote sensing video studied by the invention is a large-scene, low-resolution video with extremely small targets shot by a moving satellite. The tracked target is extremely blurred and extremely small, with low contrast against its surroundings; when the target is stationary the human eye can hardly tell that it is a vehicle, and the motion of the satellite and changes in the altitude of the shooting area cause overall image translation and partial zooming, which makes target tracking far more difficult than in clear video and is a core challenge of remote sensing video tracking. Existing methods fall mainly into two classes. One uses a neural network to learn target features, extracts several search boxes in the next frame, and selects the box with the highest target-feature score as the target position. The other first registers the images and applies a frame-difference method to obtain the target motion trajectory, then learns a filter template and convolves it with the next frame; the region with the largest response is the predicted target. In view of these limitations, the invention proposes a large-scene minimal target remote sensing video tracking method based on a motion estimation ME-CNN network. Referring to fig. 1, the method comprises the following steps:
(1) obtaining an initial training set D of a minimum target motion estimation network ME-CNN:
the method comprises the steps of taking front F frame images of an original remote sensing data video A, selecting only one target in each image, and continuously marking a boundary frame for the same target of each image.
(2) Constructing a network ME-CNN for estimating the movement of the minimum target: the ME-CNN network comprises three parallel convolution modules that extract different features from the training data to obtain different motion characteristics of the target; a concatenation layer then fuses the extracted motion features, followed in sequence by a fully connected layer and an output layer that produces the result, forming the ME-CNN network. Three convolution modules are used to obtain different motion characteristics of the target because a single convolution module can hardly capture the characteristics of the whole training set, and a deep network would suffer from vanishing gradients; the invention therefore widens the network and extracts training-set features under different conditions at multiple scales, which reduces the complexity of the network and speeds it up. Because the video of the invention continuously drifts and partial regions are zoomed owing to differences in terrain height, image registration with a frame-difference method, background modeling and similar methods cannot be used for this video; the target motion trajectory can instead be obtained with the ME-CNN network.
(3) Calculating the loss function of the network ME-CNN by using the minimum target motion parameter: the method comprises the steps of calculating the motion trend of a target according to the motion rule of the target, using the motion trend as a training label corresponding to the target, and calculating the Euclidean space distance between the training label and the prediction result of the ME-CNN network to be used as a loss function for optimizing the ME-CNN network.
(4) Judging whether the training set is an initial training set: judge whether the current training set is the initial training set; if not, execute step (5), updating the training labels in the loss function so that they take part in the network training. Otherwise, if the current training set is the initial training set, execute step (6) and enter the cyclic training of the network.
(5) Updating training labels in the loss function: because the training set D is continuously updated in the subsequent step (8), the training labels in the loss function need to be continuously adjusted according to the updated training set D in the training process, when the current training set is not the initial training set, the training labels of the loss function should be recalculated by using the data of the current training set, and the calculation method is the same as the method of the step (3) in that the training labels are calculated by using the minimum target motion parameters; and (5) the recalculated training label participates in the ME-CNN training of the motion estimation network, and the step (6) is entered.
(6) Obtaining an initial model M1 for predicting the movement position of the target: and inputting the training set D into the object motion estimation network ME-CNN, training the network according to the current loss function, and obtaining an initial model M1 for predicting the motion position of the object.
(7) Position result of the corrected prediction model: and calculating the auxiliary position offset of the target, and correcting the position result predicted by the motion estimation network ME-CNN by using the offset.
(7a) Obtaining a target grayscale image block: obtain the target position (Px, Py) of the next frame from the initial model M1 for predicting the target motion position, take out the grayscale image block of the target from the next frame at the obtained position (Px, Py), and normalize it to obtain the normalized target grayscale image block. Because the target is extremely small and its contrast with the surroundings is extremely low, judging the offset on the image block with a neural network works poorly; it works better to first take a smaller target box and then judge the offset within that box.
(7b) Obtaining a target position offset: perform brightness grading on the normalized target grayscale image block so that the target and the road are displayed at different brightness levels; because the contrast between the road surroundings and the target is extremely low, determine the position of the target in the image block with a vertical projection method, and compute the distance between the target center and the image-block center to obtain the target position offset.
(7c) Obtaining a corrected target position: and correcting the position of the predicted target by the motion estimation network ME-CNN by using the obtained target position offset to obtain all the corrected positions of the target, including the position of the upper left corner of the target.
(8) And updating the training data set by using the corrected target position to complete target tracking of one frame: and adding the obtained position of the upper left corner of the target into the last line of the training set D, removing the first line of the training set D, performing one-time operation to obtain a corrected and updated training set D, completing the training of one frame, and obtaining the target position result of one frame.
(9) Judge whether the current video frame number is less than the total number of video frames. If so, repeat steps (4) to (9) in a loop, updating the model parameters again to improve the model's adaptability, and continue the tracking optimization training of the target until all video frames have been traversed; otherwise, if the current frame number equals the total number of video frames, end the training and execute step (10).
(10) Obtaining a remote sensing video target tracking result: and after the training is finished, the accumulated target position output is the remote sensing video target tracking result.
The ME-CNN adopted by the invention does not need image registration followed by a frame-difference method, nor the complex background modeling of traditional methods, to obtain the target motion trajectory; the new algorithm effectively extracts the target's motion characteristics by analysing, with a neural network, the training set formed by the target positions in the first F frames. Because a network that is too deep suffers from vanishing gradients and related problems, the multi-scale ME-CNN network is used to predict the motion trend of the target; no manual labeling of target positions in subsequent frames is required, so the network trains in a self-cycling manner, which greatly reduces the complexity of the tracking algorithm, improves its practicality, and allows the target position to be found quickly and accurately by the motion estimation network without image registration. The ME-CNN network is combined with an auxiliary position-offset method to automatically determine the target position in the remote sensing video; the motion speed of the target is obtained from its motion, the likely motion trend is analysed, and the loss function of the motion estimation network is modified accordingly, improving the robustness of target tracking.
The method performs motion analysis of the super-blurred target with a deep-learning-based approach, predicts its next direction of motion, and corrects the motion estimation network with the position offset; it can track the target without labels for subsequent frames, thereby avoiding large-scene image registration during tracking and the difficulty of extracting features of super-blurred targets. It noticeably improves the accuracy of target tracking in super-blurred video and is also applicable to tracking in various other remote sensing videos.
Example 2
The method for tracking a large-scene minimum target remote sensing video based on a motion estimation ME-CNN network is the same as that in embodiment 1, and the method for constructing the network ME-CNN for estimating the minimum target motion described in step (2) comprises the following steps as shown in FIG. 2:
(2a) Overall structure of the motion estimation network: the motion estimation network ME-CNN comprises three convolution modules connected in parallel; a concatenation layer fuses the different motion features they extract, a fully connected layer refines and analyses the fused features, and the output layer produces the result.
(2b) Structure of the three parallel convolution modules: the parallel convolution modules are convolution module I, convolution module II and convolution module III, wherein
convolution module I comprises a locally connected LocallyConnected1D convolution layer with a stride of 2, which extracts the coordinate position information of the target;
convolution module II comprises a dilated (atrous) convolution with a stride of 1;
convolution module III comprises a one-dimensional convolution with a stride of 2;
convolution modules I, II and III obtain position features of the target at different scales, yielding three outputs; the outputs of the three modules are then concatenated to give the fused convolution result, which is fed to the fully connected layer and the output layer to obtain the final prediction. Three convolution modules are used to obtain different motion characteristics of the target because a single convolution module can hardly capture the characteristics of the whole training set, and a deep network would suffer from vanishing gradients; the network is therefore widened and training-set features under different conditions are extracted at multiple scales, reducing the complexity of the network and speeding it up. Because the video continuously drifts and partial regions are zoomed owing to differences in terrain height, image registration with a frame-difference method, background modeling and similar methods cannot be used; the target motion trajectory is instead obtained with the ME-CNN network.
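For concreteness, a minimal Keras sketch of this structure is given below. Only the module types and strides (a LocallyConnected1D layer with stride 2, a dilated convolution with stride 1, a one-dimensional convolution with stride 2, followed by concatenation, a fully connected layer and an output layer) are taken from the description; the window length F, the filter counts, the kernel sizes and the flattening before concatenation are illustrative assumptions.

```python
# Minimal sketch of the ME-CNN motion-estimation network (Keras 2.x / TF 2.x,
# where LocallyConnected1D is available).  Filter counts, kernel sizes and F
# are illustrative assumptions; the patent only fixes the module types, strides
# and the concatenation / fully-connected / output stacking.
from tensorflow import keras
from tensorflow.keras import layers

F = 10  # number of past frames in the training window (assumed)

inp = layers.Input(shape=(F, 2))        # F rows of (x, y) top-left coordinates

# Module I: locally connected 1-D convolution, stride 2
m1 = layers.LocallyConnected1D(8, 3, strides=2)(inp)
# Module II: dilated (atrous) convolution, stride 1
m2 = layers.Conv1D(8, 3, strides=1, dilation_rate=2, padding='same')(inp)
# Module III: ordinary 1-D convolution, stride 2
m3 = layers.Conv1D(8, 3, strides=2, padding='same')(inp)

# Fuse the three multi-scale position features, then refine and output (Px, Py)
fused = layers.Concatenate()([layers.Flatten()(m1),
                              layers.Flatten()(m2),
                              layers.Flatten()(m3)])
fc = layers.Dense(32, activation='relu')(fused)
out = layers.Dense(2)(fc)               # predicted position (Px, Py)

me_cnn = keras.Model(inp, out)
```

Flattening each branch before concatenation is one way to fuse feature maps of different lengths into a single vector for the fully connected layer; the description itself only states that the three outputs are connected in series.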
Example 3
The method for tracking the large-scene minimum target remote sensing video based on the motion estimation ME-CNN network is the same as the embodiment 1-2, the loss function of the network ME-CNN is calculated by using the minimum target motion parameters in the step 3, the motion condition of the target is roughly analyzed by processing the data of the training set D, and a certain guiding function is provided for the optimization direction of the motion estimation network ME-CNN, and the method comprises the following steps:
(3a) Acquiring the target displacement of training set D: take the data of rows F, F-2 and F-4 of the training set D and subtract the data of the first row of D from each, giving the target displacements between frame F, frame F-2 and frame F-4 and the first frame as S1, S2, S3 in sequence. S1 is the target displacement between frame F and the first frame, S2 between frame F-2 and the first frame, and S3 between frame F-4 and the first frame. If the training set is not the initial one but a training set D that has been updated i times, the frame number corresponding to each row changes accordingly to frame 1+i, frame 2+i, ..., frame F+i; taking the data of rows F, F-2 and F-4 of D and subtracting the first row of D from each then gives the displacements between frames F+i, F+i-2, F+i-4 and the first frame, again denoted S1, S2, S3 in sequence.
(3b) Obtaining the motion trend of the target:
according to the motion law of the target, the motion trend (Gx, Gy) of the target along the x and y directions of the image coordinate system is computed from the obtained target displacements with the following formulas:
V1=(S1-S2)/2
V2=(S2-S3)/2
a=(V1-V2)/2
G=V1+a/2
The invention uses an image coordinate system with its origin at the upper-left corner of the image, the x direction pointing horizontally to the right and the y direction vertically downward. In the above formulas, V1 is the target velocity between displacements S1 and S2, V2 is the target velocity between displacements S2 and S3, a is the motion acceleration, and G is the motion trend of the target.
(3c) Constructing a loss function of a motion estimation network ME-CNN:
the motion trend of the target is computed according to its motion law and used as the training label corresponding to the target; the Euclidean distance between the computed motion trend (Gx, Gy) and the predicted position (Px, Py) output by the motion estimation network ME-CNN is constructed as the loss function of the network:
loss = sqrt((Gx - Px)^2 + (Gy - Py)^2)
where Gx is the target motion trend along the x direction in the image coordinate system, Gy is the target motion trend along the y direction, Px is the prediction of the motion estimation network along the x direction, and Py is its prediction along the y direction.
A comprehensive example is given below to further illustrate the invention.
Example 4
The method for tracking the remote sensing video of the large-scene tiny target based on the motion estimation ME-CNN network is the same as the embodiment 1-3,
referring to fig. 1, a large-scene minimal target remote sensing video tracking method based on a motion estimation ME-CNN network includes the following steps:
(1) obtaining an initial training set D of a minimum target motion estimation network ME-CNN:
taking the first F frames of the original remote sensing data video A, continuously marking a bounding box for the target in each frame, and stacking the top-left vertex coordinates of the bounding boxes to form the training set D, which is a matrix of F rows and 2 columns in which each row corresponds to the target coordinates of one frame of the video; the position of the target may be represented by the top-left vertex coordinates or by the center coordinates without affecting the analysis of the target's motion; in the invention the minimal target is referred to simply as the target.
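For illustration, the training set D described above could be assembled as follows; the coordinate values and F = 10 are hypothetical, and only the F×2 layout with one row of top-left coordinates per frame comes from the description.

```python
import numpy as np

# Hypothetical top-left (x, y) coordinates of the manually marked bounding
# boxes of the same target in the first F frames of video A.
F = 10
top_left = [(412, 305), (414, 306), (416, 308), (418, 309), (420, 311),
            (422, 312), (424, 314), (426, 315), (428, 317), (430, 318)]

# Training set D: an F x 2 matrix, one row of target coordinates per frame.
D = np.asarray(top_left, dtype=np.float32)
```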
(2) Constructing a network ME-CNN for estimating the movement of the minimum target: the network comprises three parallel convolution modules that extract different features from the training data so as to obtain different motion characteristics of the target; a single convolution layer can hardly capture the characteristics of the whole training set, and a deep network would suffer from vanishing gradients, so the network is widened and training-set features under different conditions are extracted at multiple scales, which reduces the complexity of the network and speeds it up; a concatenation layer is then stacked to fuse the extracted motion features, followed by a fully connected layer for analysis and an output layer that produces the result.
(2a) Overall structure of the motion estimation network: the motion estimation network ME-CNN comprises three convolution modules connected in parallel, and a connection layer, a full connection layer and an output layer are sequentially stacked;
(2b) Structure of the three parallel convolution modules: the parallel convolution modules are convolution module I, convolution module II and convolution module III, wherein
convolution module I comprises a locally connected LocallyConnected1D convolution layer with a stride of 2, which extracts the coordinate position information of the target;
convolution module II comprises a dilated (atrous) convolution with a stride of 1;
convolution module III comprises a one-dimensional convolution with a stride of 2;
convolution modules I, II and III obtain position features of the target at different scales, yielding three outputs; the outputs of the three convolution modules are then concatenated to give the fused convolution result, which is fed to the fully connected layer and the output layer to obtain the final prediction.
(3) Constructing a loss function of the ME-CNN of the minimum target motion estimation network: calculating to obtain the motion trend of the target according to the motion rule of the target, taking the motion trend as a training label corresponding to the target, and calculating the Euclidean spatial distance between the motion trend and the prediction result of the ME-CNN network as a loss function of the ME-CNN network;
(3a) Acquiring the target displacement of training set D: if the training set is the initial training set, take the data of rows F, F-2 and F-4 of training set D and subtract the data of the first row of D from each, giving the target displacements between frame F, frame F-2 and frame F-4 and the first frame as S1, S2, S3 in sequence; S1 is the target displacement between frame F and the first frame, S2 between frame F-2 and the first frame, and S3 between frame F-4 and the first frame. If the training set is not the initial one but a training set D that has been updated i times, the frame number corresponding to each row changes accordingly to frame 1+i, frame 2+i, ..., frame F+i; taking the data of rows F, F-2 and F-4 of D and subtracting the first row of D from each then gives the displacements between frames F+i, F+i-2, F+i-4 and the first frame, again denoted S1, S2, S3 in sequence.
(3b) Obtaining the motion trend of the target:
according to the motion law of the target, the motion trend (Gx, Gy) of the target along the x and y directions of the image coordinate system is computed from the obtained training-data target displacements with the following formulas.
V1=(S1-S2)/2
V2=(S2-S3)/2
a=(V1-V2)/2
G=V1+a/2
(3c) Constructing a loss function of a motion estimation network ME-CNN:
the Euclidean distance between the computed target motion trend (Gx, Gy) and the predicted position (Px, Py) output by the estimation network is constructed as the loss function of the motion estimation network ME-CNN:
loss = sqrt((Gx - Px)^2 + (Gy - Py)^2)
(4) Updating training labels in the loss function: because the training set D is continuously updated in the subsequent step (7), the training labels in the loss function need to be continuously adjusted according to the updated training set D in the training process, and participate in the ME-CNN training of the motion estimation network.
(5) Obtaining an initial model M1 for predicting the movement position of the target: and inputting the training set D into the object motion estimation network ME-CNN, training the network according to the loss function, and obtaining an initial model M1 for predicting the motion position of the object.
(6) Position result of the corrected prediction model: and calculating the auxiliary position offset of the target, and correcting the position result predicted by the motion estimation network ME-CNN by using the offset.
(6a) Obtaining a target grayscale image block: obtain the target position (Px, Py) of the next frame from the initial model M1 for predicting the target motion position, take out the grayscale image block of the target from the next frame at the obtained position (Px, Py), and normalize it to obtain the normalized target grayscale image block. Because the target is extremely small and its contrast with the surroundings is extremely low, judging the offset on the image block with a neural network works poorly; it works better to first take a smaller target box and then judge the offset within that box.
(6b) Obtaining a target position offset: and carrying out brightness grading on the normalized target gray image block, determining the position of a target in the image block by using a vertical projection method, and calculating the distance between the center position of the target and the center position of the image block to obtain the target position offset.
(6c) Obtaining a corrected target position: and correcting the position of the predicted target by the motion estimation network ME-CNN by using the obtained target position offset to obtain all the corrected positions of the target, including the position of the upper left corner of the target.
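A hedged NumPy sketch of the offset estimation and correction of steps (6a)-(6c) follows; the number of brightness levels and the choice of the brightest level as the target are illustrative assumptions, and `block` stands for the normalized target grayscale image block already cropped around (Px, Py).

```python
# Sketch of the auxiliary position-offset correction (steps 6a-6c).
# The brightness-grading level count and the "brightest level = target"
# assumption are illustrative, not taken from the patent text.
import numpy as np

def position_offset(block, n_levels=4):
    """Estimate the target offset inside a normalized grayscale block."""
    graded = np.floor(block * n_levels) / n_levels     # brightness grading
    mask = graded >= graded.max()                      # keep the brightest level (assumed target)
    col_profile = mask.sum(axis=0)                     # vertical projection (per column)
    row_profile = mask.sum(axis=1)                     # horizontal projection (per row)
    cx = int(np.argmax(col_profile))                   # target center inside the block
    cy = int(np.argmax(row_profile))
    h, w = block.shape
    return cx - w // 2, cy - h // 2                    # offset from the block center

def correct_position(P, block):
    """Correct the position (Px, Py) predicted by ME-CNN with the offset."""
    dx, dy = position_offset(block)
    return P[0] + dx, P[1] + dy
```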
(7) And updating the training data set by using the corrected target position to complete target tracking of one frame: and adding the obtained position of the upper left corner of the target into the last line of the training set D, removing the first line of the training set D, performing one-time operation to obtain a corrected and updated training set, completing the training of one frame, and obtaining the target position result of one frame.
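As described in step (7), the update of the training set is a sliding-window operation: append the corrected top-left position as the new last row and drop the first row. A one-line sketch, consistent with the F×2 layout assumed earlier:

```python
import numpy as np

def update_training_set(D, corrected_pos):
    """Append the corrected top-left position and drop the first row of D."""
    return np.vstack([D[1:], np.asarray(corrected_pos, dtype=D.dtype)[None, :]])
```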
(8) Obtaining the remote sensing video target tracking result: repeat steps (4) to (7) in a loop, continually using the updated training set to recompute the training label by the method of step (3), update the network model and iterate, performing tracking optimization training of the target until all video frames have been traversed; the accumulated output is the remote sensing video target tracking result.
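Tying the steps together, the per-frame loop of steps (4)-(8) might look like the sketch below; it reuses the helpers sketched earlier (me_cnn, motion_trend, correct_position, update_training_set), while `frames`, `extract_block` and the optimizer settings are hypothetical stand-ins not specified in the description.

```python
# High-level sketch of the self-cycling tracking loop, under the assumptions
# stated above; `extract_block(frame, P)` is a hypothetical helper that crops
# and normalizes the grayscale patch around the predicted position.
import numpy as np
import tensorflow as tf

def track(frames, D, me_cnn, extract_block, steps_per_frame=50):
    opt = tf.keras.optimizers.Adam(1e-3)
    positions = []
    for frame in frames:
        G = tf.constant(motion_trend(D), dtype=tf.float32)   # training label from current D
        x = D[None, ...].astype(np.float32)                   # batch of one window
        for _ in range(steps_per_frame):                      # optimize on the current window
            with tf.GradientTape() as tape:
                P = me_cnn(x, training=True)[0]
                loss = tf.sqrt(tf.reduce_sum(tf.square(G - P)))
            grads = tape.gradient(loss, me_cnn.trainable_variables)
            opt.apply_gradients(zip(grads, me_cnn.trainable_variables))
        P = me_cnn(x, training=False)[0].numpy()              # predicted (Px, Py)
        block = extract_block(frame, P)                       # normalized grayscale patch at P
        P = correct_position(P, block)                        # auxiliary offset correction
        D = update_training_set(D, P)                         # slide the training window
        positions.append(P)
    return positions
```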
In this embodiment, the motion estimation model of the target may also extract road information from the target's motion in the first few frames, locate on a map the city containing the same longitude and latitude, and predict the target's motion by matching the corresponding road conditions, making full use of the three-dimensional information of the road so that the target can be tracked accurately even when the road height changes sharply and the video contains partial zooming. The auxiliary position offset of the target could also be obtained by training a neural network, but the target and its surroundings would first need to be processed into image blocks of higher contrast before such a network could be trained.
The technical effects of the invention are further explained by combining simulation tests as follows:
example 5
The method for tracking the remote sensing video of the large-scene tiny target based on the motion estimation ME-CNN network is the same as in embodiments 1-4.
simulation conditions and contents:
the simulation platform of the invention is as follows: intel Xeon CPU E5-2630v3CPU with a main frequency of 2.40GHz, 64GB running memory, Ubuntu16.04 operating system and Keras and Python software platforms. A display card: GeForce GTX TITAN X/PCIe/SSE2 × 2.
The invention uses a remote sensing video of the Derna area of Libya shot by the Jilin-1 video satellite; a vehicle in the first 10 frames is taken as the target, a box is marked on the target in each frame, and the top-left vertex positions of the boxes form the training set DateSet. Tracking of the target video is simulated with the method of the invention and with the existing KCF-based target tracking method, respectively.
Simulation content and results:
Experiments are carried out under the above simulation conditions with the method of the invention and with the comparison method, the existing KCF-based target tracking method: both are used to track the vehicle target in the remote sensing video of the Derna area of Libya. The comparison between the target trajectory predicted by the ME-CNN network (green curve) and the accurate target trajectory (red curve) is shown in figure 3, and the results in Table 1 are obtained.
TABLE 1 Remote sensing video target tracking results for the Derna area of Libya
Method      Precision    IOU
KCF         63.21%       58.72%
ME-CNN      85.63%       76.51%
And (3) simulation result analysis:
in table 1, Precision represents the area overlapping rate of the target position and the tag position predicted by the ME-CNN network, IOU represents the percentage of the average euclidean distance between the center position of the bounding box and the center position of the tag being smaller than a given threshold, in this example, the given threshold is selected to be 5, KCF represents the comparison method, and ME-CNN represents the method of the present invention.
Comparison of the data in table 1 shows that the invention greatly improves the tracking accuracy: Precision is raised from 63.21% to 85.63%, and the IOU, the percentage of frames in which the average Euclidean distance between the bounding-box center and the label center is below the given threshold, is raised from 58.72% for the comparison KCF-based target tracking method to 76.51%.
Referring to fig. 3, the red curve is the standard target trajectory and the green curve is the tracking prediction of the method for the same target; the extremely small target in the large scene is shown in the green box. Comparing the two curves shows that they are highly consistent and essentially coincide, demonstrating the high tracking accuracy of the method.
In short, the large-scene minimal target remote sensing video tracking method based on a motion estimation ME-CNN network proposed by the invention improves tracking accuracy under conditions in which the shooting satellite keeps moving, the video exhibits overall translation and partial scaling, the video resolution is extremely low and the target is extremely small; it solves the problem of tracking extremely small targets from motion parameters without registration. The implementation steps are: obtain the initial training set D of the minimal target motion estimation network ME-CNN; construct the network ME-CNN for estimating the motion of the minimal target; compute the loss function of the network ME-CNN from the minimal target motion parameters; judge whether the current set is the initial training set; update the training labels in the loss function; obtain the initial model M1 for predicting the target motion position; correct the position result of the prediction model; update the training data set with the corrected target position, completing one frame of tracking; judge whether the current frame number is less than the total number of video frames; and obtain the remote sensing video target tracking result. The invention uses the deep learning network ME-CNN to predict the target motion position, avoids the large-scene image registration and the difficulty of extracting super-blurred target features found in existing tracking methods, reduces dependence on target features, noticeably improves the accuracy of target tracking in super-blurred video, and is also applicable to tracking in various other remote sensing videos.

Claims (3)

1. A large-scene minimal target tracking method based on a motion estimation ME-CNN network, characterized by comprising the following steps:
(1) obtaining the initial training set D of the minimal target motion estimation network ME-CNN: taking the first F frames of the original remote sensing data video A, continuously marking a bounding box for the same target in each frame, and arranging the top-left vertex coordinates of the bounding boxes in frame order to form the training set D;
(2) constructing the network ME-CNN for estimating the motion of the minimal target: comprising three parallel convolution modules that extract different features from the training data, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(3) calculating the loss function of the network ME-CNN with the minimal target motion parameters: calculating the motion trend of the target according to its motion law, using it as the training label corresponding to the target, and taking the Euclidean distance between the training label and the prediction of the ME-CNN network as the loss function for the optimization training of the ME-CNN network;
(4) judging whether the current training set is the initial training set: if it is not, executing step (5) to update the training labels in the loss function; otherwise, if it is the initial training set, executing step (6) and entering the cyclic training of the network;
(5) updating the training labels in the loss function: when the current training set is not the initial training set, recalculating the training labels of the loss function from the data of the current training set, using the same minimal-target-motion-parameter calculation as in step (3); the recalculated training labels take part in training the motion estimation network ME-CNN; proceeding to step (6);
(6) obtaining the initial model M1 for predicting the target motion position: inputting the training set D into the target motion estimation network ME-CNN and training the network with the current loss function to obtain the initial model M1 for predicting the target motion position;
(7) correcting the position result of the prediction model: calculating the auxiliary position offset of the target and correcting the position result predicted by the motion estimation network ME-CNN with the offset;
(7a) obtaining the target grayscale image block: obtaining the target position (Px, Py) of the next frame from the initial model M1 for predicting the target motion position, taking out the grayscale image block of the target from the next frame at the obtained position (Px, Py), and normalizing it to obtain the normalized target grayscale image block;
(7b) obtaining the target position offset: performing brightness grading on the normalized target grayscale image block, determining the position of the target in the image block with a vertical projection method, and computing the distance between the target center and the image-block center to obtain the target position offset;
(7c) obtaining the corrected target position: correcting the position of the target predicted by the motion estimation network ME-CNN with the obtained target position offset to obtain all corrected positions of the target;
(8) updating the training data set with the corrected target position to complete one frame of target tracking: appending the obtained top-left position of the target to the last row of the training set D and removing the first row of D in a single operation, obtaining a corrected and updated training set D, completing the training of one frame and obtaining the target position result of one frame;
(9) judging whether the current video frame number is less than the total number of video frames: if so, repeating steps (4) to (9) in a loop and carrying out tracking optimization training of the target until all video frames have been traversed; if the frame number equals the total number of video frames, ending the training and executing step (10);
(10) obtaining the remote sensing video target tracking result: the accumulated output is the remote sensing video target tracking result.
2. The large-scene minimal target tracking method based on a motion estimation ME-CNN network according to claim 1, characterized in that constructing the network ME-CNN for estimating the motion of the minimal target in step (2) comprises the following steps:
(2a) overall structure of the motion estimation network: the motion estimation network ME-CNN comprises three parallel convolution modules, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(2b) structure of the three parallel convolution modules: the parallel convolution modules are convolution module I, convolution module II and convolution module III, wherein
convolution module I comprises a locally connected LocallyConnected1D convolution layer with a stride of 2, which extracts the coordinate position information of the target;
convolution module II comprises a dilated (atrous) convolution with a stride of 1;
convolution module III comprises a one-dimensional convolution with a stride of 2;
convolution modules I, II and III obtain position features of the target at different scales, yielding three outputs; the outputs of the three convolution modules are then concatenated to give the fused convolution result, which is fed to the fully connected layer and the output layer to obtain the final prediction.
3. The large-scene minimal target tracking method based on a motion estimation ME-CNN network according to claim 1, characterized in that calculating the loss function of the network ME-CNN with the minimal target motion parameters in step 3 comprises the following steps:
(3a) acquiring the target displacement of training set D: taking the data of rows F, F-2 and F-4 of the training set D and subtracting the data of the first row of D from each, giving the target displacements between frame F, frame F-2 and frame F-4 and the first frame as S1, S2, S3 in sequence;
(3b) obtaining the motion trend of the target:
according to the motion law of the target, the motion trend (Gx, Gy) of the target along the x and y directions of the image coordinate system is computed from the obtained target displacements with the following formulas:
V1=(S1-S2)/2
V2=(S2-S3)/2
a=(V1-V2)/2
G=V1+a/2
where V1 is the target velocity between displacements S1 and S2, V2 is the target velocity between displacements S2 and S3, a is the motion acceleration, and G is the motion trend of the target;
(3c) constructing the loss function of the motion estimation network ME-CNN:
the motion trend of the target is computed according to its motion law and used as the training label corresponding to the target; the Euclidean distance between the computed target motion trend (Gx, Gy) and the predicted position (Px, Py) output by the estimation network ME-CNN is constructed as the loss function of the motion estimation network ME-CNN:
loss = sqrt((Gx - Px)^2 + (Gy - Py)^2)
where Gx is the target motion trend along the x direction in the image coordinate system, Gy is the target motion trend along the y direction in the image coordinate system, Px is the prediction of the motion estimation network along the x direction in the image coordinate system, and Py is the prediction of the motion estimation network along the y direction in the image coordinate system.
CN201910718847.6A 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network Active CN110517285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910718847.6A CN110517285B (en) 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910718847.6A CN110517285B (en) 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network

Publications (2)

Publication Number Publication Date
CN110517285A CN110517285A (en) 2019-11-29
CN110517285B true CN110517285B (en) 2021-09-10

Family

ID=68624473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910718847.6A Active CN110517285B (en) 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network

Country Status (1)

Country Link
CN (1) CN110517285B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986233B (en) * 2020-08-20 2023-02-10 西安电子科技大学 Remote sensing video tracking method for extremely small targets in large scenes based on feature self-learning
CN114066937B (en) * 2021-11-06 2022-09-02 中国电子科技集团公司第五十四研究所 Multi-target tracking method for large-scale remote sensing image
CN115086718A (en) * 2022-07-19 2022-09-20 广州万协通信息技术有限公司 Video stream encryption method and device
CN118823685B (en) * 2024-09-18 2024-12-17 厦门众联世纪股份有限公司 Crowd positioning method, system and storage medium based on hybrid expert network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10176388B1 (en) * 2016-11-14 2019-01-08 Zoox, Inc. Spatial and temporal information for semantic segmentation
CN108154522A (en) * 2016-12-05 2018-06-12 北京深鉴科技有限公司 Target tracking system
CN107886120A (en) * 2017-11-03 2018-04-06 北京清瑞维航技术发展有限公司 Method and apparatus for target detection tracking
CN109242884A (en) * 2018-08-14 2019-01-18 西安电子科技大学 Remote sensing video target tracking method based on JCFNet network
CN109376736A (en) * 2018-09-03 2019-02-22 浙江工商大学 A video small object detection method based on deep convolutional neural network
CN109636829A (en) * 2018-11-24 2019-04-16 华中科技大学 A kind of multi-object tracking method based on semantic information and scene information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A target tracking method based on CNN-AE feature extraction; 殷鹤楠, 佟国香; 《软件导刊》 (Software Guide); 2018-06-30; Vol. 17, No. 6; pp. 22-26 *

Also Published As

Publication number Publication date
CN110517285A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
CN110517285B (en) Large-scene minimum target tracking based on motion estimation ME-CNN network
Lu et al. A CNN-transformer hybrid model based on CSWin transformer for UAV image object detection
CN113516664B (en) Visual SLAM method based on semantic segmentation dynamic points
CN112215128B (en) FCOS-fused R-CNN urban road environment recognition method and device
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN108256431B (en) Hand position identification method and device
JP2022526513A (en) Video frame information labeling methods, appliances, equipment and computer programs
JP6650657B2 (en) Method and system for tracking moving objects in video using fingerprints
CN104463903B (en) A kind of pedestrian image real-time detection method based on goal behavior analysis
CN112464912B (en) Robot end face detection method based on YOLO-RGGNet
CN111881790A (en) Automatic extraction method and device for road crosswalk in high-precision map making
EP3070676A1 (en) A system and a method for estimation of motion
CN108198201A (en) A kind of multi-object tracking method, terminal device and storage medium
CN117949942A (en) Target tracking method and system based on fusion of radar data and video data
CN111199556A (en) Indoor pedestrian detection and tracking method based on camera
CN111161313A (en) Method and device for multi-target tracking in video stream
CN106780564A (en) A kind of anti-interference contour tracing method based on Model Prior
CN111414938B (en) A target detection method for air bubbles in plate heat exchangers
Yu et al. Shallow detail and semantic segmentation combined bilateral network model for lane detection
US20240046495A1 (en) Method for training depth recognition model, electronic device, and storage medium
CN113920254B (en) Monocular RGB (Red Green blue) -based indoor three-dimensional reconstruction method and system thereof
CN111986233B (en) Remote sensing video tracking method for extremely small targets in large scenes based on feature self-learning
CN113095164A (en) Lane line detection and positioning method based on reinforcement learning and mark point characterization
Nejadasl et al. Optical flow based vehicle tracking strengthened by statistical decisions
CN103559722B (en) Based on the sequence image amount of jitter computing method of gray scale linear modelling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant