CN110490073A

CN110490073A - Object detection method, device, equipment and storage medium

Info

Publication number: CN110490073A
Application number: CN201910637703.8A
Authority: CN
Inventors: 樊龙; 黄晓峰; 殷海兵; 贾惠柱
Original assignee: Advanced Institute of Information Technology AIIT of Peking University; Hangzhou Weiming Information Technology Co Ltd
Current assignee: Advanced Institute of Information Technology AIIT of Peking University; Hangzhou Weiming Information Technology Co Ltd
Priority date: 2019-07-15
Filing date: 2019-07-15
Publication date: 2019-11-22

Abstract

The present application discloses a target detection method, device, equipment, and storage medium. Video data is obtained, and a first image sequence of the video data is preprocessed to obtain a second image sequence in which background images are removed. The image sequence is input into the trained detection model for target detection, and the target detection result is obtained. On the one hand, only the foreground object is reserved for the image with the background removed, and there is no interference from other background images. The detection model pays more attention to the foreground object during learning and reasoning, which can improve the accuracy of object detection; on the other hand, due to the removal of the input image Background pixels, the detection model only sees foreground pixels, and will not be affected by video or picture sequence scenes at all, thus improving the scene migration performance of target detection.

Description

Target detection method, device, equipment and storage medium

技术领域technical field

本申请涉及计算机视觉技术领域，特别涉及一种目标检测方法、装置、设备及存储介质。The present application relates to the technical field of computer vision, and in particular to an object detection method, device, equipment and storage medium.

背景技术Background technique

众所周知，视觉是获取信息最直接和有效的手段，然而大多数监控系统处于“只记录不判断”的工作模式，摄像机获取的视频信号传送到控制中心，由控制中心的操作员分析并且做出相应的判断。然而这样在人力资源上存在极大的浪费。随着计算机视觉智能视频处理系统的出现，利用图像处理技术和机器学习方法实现目标检测和跟踪等视频分析。As we all know, vision is the most direct and effective means of obtaining information. However, most monitoring systems are in the working mode of "only recording without judgment". judgment. However, there is a great waste of human resources in this way. With the emergence of computer vision intelligent video processing systems, image processing technology and machine learning methods are used to achieve video analysis such as target detection and tracking.

目标检测的任务是找出图像中所有感兴趣的目标，确定它们的位置和大小。由于各类物体有不同的外观、形状、姿态，加上成像时光照、遮挡等因素的干扰，目标检测一直是机器视觉领域最具有挑战性的问题。The task of object detection is to find all the objects of interest in the image, determine their position and size. Object detection has always been the most challenging problem in the field of machine vision due to the different appearance, shape, and posture of various objects, coupled with the interference of factors such as illumination and occlusion during imaging.

现有的目标检测在静态图片中对于背景复杂的场景容易产生误检，因此目标检测准确率有待提高。除此之外，现有的目标检测对于实现监控复杂场景的泛化性能也有一定的限制，而为了提高目标检测算法的场景迁移性能，需要训练大量的数据集，对数据的依赖性较强。Existing target detection is prone to false detection for scenes with complex backgrounds in static pictures, so the accuracy of target detection needs to be improved. In addition, the existing target detection also has certain limitations in realizing the generalization performance of monitoring complex scenes. In order to improve the scene transfer performance of the target detection algorithm, a large number of data sets need to be trained, which is highly dependent on data.

发明内容Contents of the invention

本申请的目的在于提供一种目标检测方法、装置、设备及存储介质，以提高目标检测的准确率和场景迁移性能。The purpose of the present application is to provide an object detection method, device, equipment and storage medium, so as to improve the accuracy of object detection and scene transition performance.

第一方面，本申请实施例提供了一种目标检测方法，包括：In the first aspect, the embodiment of the present application provides a target detection method, including:

获取视频数据；Get video data;

对所述视频数据的第一图像序列进行预处理，获得去除背景图像的第二图像序列；Preprocessing the first image sequence of the video data to obtain a second image sequence from which the background image is removed;

将所述第二图像序列输入训练好的检测模型中进行目标检测，获得目标检测结果。The second image sequence is input into the trained detection model to perform target detection, and a target detection result is obtained.

在一种可能的实现方式中，在本申请实施例提供的上述方法中，包括：In a possible implementation manner, the above-mentioned method provided in the embodiment of the present application includes:

利用背景减除法对所述视频数据的第一图像序列进行运动目标检测；performing moving target detection on the first image sequence of the video data by using a background subtraction method;

保留运动目标所在区域像素，利用形态学方法对所述运动目标所在区域像素进行分割处理，分割成独立的运动目标单元，以获得去除背景图像的第二图像序列。The pixels of the area where the moving object is located are retained, and the pixels of the area where the moving object is located are segmented by using a morphological method, and divided into independent units of the moving object, so as to obtain a second image sequence in which the background image is removed.

在一种可能的实现方式中，在本申请实施例提供的上述方法中，所述检测模型采用SSD框架，该SSD框架包括：特征提取网络和目标检测网络。In a possible implementation manner, in the above method provided by the embodiment of the present application, the detection model adopts an SSD framework, and the SSD framework includes: a feature extraction network and a target detection network.

在一种可能的实现方式中，在本申请实施例提供的上述方法中，所述方法还包括训练SSD框架，其包括：In a possible implementation, in the above method provided by the embodiment of the present application, the method further includes training the SSD framework, which includes:

对样本视频数据的图像序列进行预处理，获得去除背景图像的样本图像序列；Preprocessing the image sequence of the sample video data to obtain a sample image sequence from which the background image is removed;

对所述样本图像序列进行人工目标标注，获得训练数据集；Carrying out manual target labeling on the sample image sequence to obtain a training data set;

基于所述训练数据集对SSD框架进行训练：首先初始化网络中待训练的参数及超参数，向初始化后的网络中输入训练数据进行网络前向传播，得到实际的输出结果，通过损失函数结合反向传播BP算法调整网络参数，进行迭代训练，至损失函数的损失值小于设定阈值或达到最大迭代次数时训练结束，得到训练好的SSD框架。The SSD framework is trained based on the training data set: first, initialize the parameters and hyperparameters to be trained in the network, input the training data into the initialized network for forward propagation of the network, and obtain the actual output results, and combine the loss function with the inverse Adjust the network parameters to the propagation BP algorithm, perform iterative training, and end the training until the loss value of the loss function is less than the set threshold or reaches the maximum number of iterations, and a trained SSD framework is obtained.

在一种可能的实现方式中，在本申请实施例提供的上述方法中，所述损失函数为位置误差与置信度误差的加权和。In a possible implementation manner, in the above method provided in the embodiment of the present application, the loss function is a weighted sum of a position error and a confidence error.

在一种可能的实现方式中，在本申请实施例提供的上述方法中，所述置信度误差的计算公式如下：In a possible implementation, in the above method provided in the embodiment of the present application, the calculation formula of the confidence error is as follows:

其中，表示预测框i与真实框j关于类别匹配。in, Indicates that the predicted box i matches the ground truth box j with respect to the category.

第二方面，本申请实施例提供了一种目标检测装置，包括：In the second aspect, the embodiment of the present application provides a target detection device, including:

获取模块，用于获取视频数据；An acquisition module, used to acquire video data;

预处理模块，用于对所述视频数据的第一图像序列进行预处理，获得去除背景图像的第二图像序列；A preprocessing module, configured to preprocess the first image sequence of the video data to obtain a second image sequence from which the background image is removed;

目标检测模块，用于将所述第二图像序列输入训练好的检测模型中进行目标检测，获得目标检测结果。A target detection module, configured to input the second image sequence into a trained detection model for target detection and obtain a target detection result.

在一种可能的实现方式中，在本申请实施例提供的上述装置中，所述预处理模块，具体用于：In a possible implementation manner, in the above device provided in the embodiment of the present application, the preprocessing module is specifically used for:

保留运动目标所在区域像素，利用形态学方法对所述运动目标所在区域像素进行分割处理，分割成独立的运动目标单元，以获得去除背景图像的第二图像序列；Retaining the pixels in the region where the moving target is located, and segmenting the pixels in the region where the moving target is located by using a morphological method, and dividing them into independent moving target units, so as to obtain a second image sequence in which the background image is removed;

获取去除背景图像的第二图像序列。Obtain a second sequence of images with the background image removed.

在一种可能的实现方式中，在本申请实施例提供的上述装置中，所述检测模型采用SSD框架，该SSD框架包括：特征提取网络和目标检测网络。In a possible implementation manner, in the above device provided by the embodiment of the present application, the detection model adopts an SSD framework, and the SSD framework includes: a feature extraction network and a target detection network.

在一种可能的实现方式中，在本申请实施例提供的上述装置中，还包训练模块，用于：In a possible implementation, in the above device provided in the embodiment of the present application, a training module is also included for:

在一种可能的实现方式中，在本申请实施例提供的上述装置中，所述损失函数为位置误差与置信度误差的加权和。In a possible implementation manner, in the above-mentioned apparatus provided by the embodiment of the present application, the loss function is a weighted sum of a position error and a confidence error.

在一种可能的实现方式中，在本申请实施例提供的上述装置中，所述置信度误差的计算公式如下：In a possible implementation, in the above device provided in the embodiment of the present application, the calculation formula of the confidence error is as follows:

第三方面，本申请实施例提供了一种电子设备，包括：存储器和处理器；In a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor;

所述存储器，用于存储计算机程序；The memory is used to store computer programs;

其中，所述处理器执行所述存储器中的计算机程序，以实现上述第一方面以及第一方面的各个实施方式中所述的方法。Wherein, the processor executes the computer program in the memory, so as to implement the above first aspect and the method described in each implementation manner of the first aspect.

第四方面，本申请实施例提供了一种计算机可读存储介质，所述计算机可读存储介质中存储有计算机程序，所述计算机程序被处理器执行时用于实现上述第一方面以及第一方面的各个实施方式中所述的方法。In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the above-mentioned first aspect and the first aspect are implemented. The methods described in the various embodiments of the aspect.

与现有技术相比，本申请提供的目标检测方法、装置、设备及存储介质，获取视频数据，对所述视频数据的第一图像序列进行预处理，获得去除背景图像的第二图像序列，将所述第二图像序列输入训练好的检测模型中进行目标检测，获得目标检测结果。一方面，对于去除背景的图像只保留前景目标，没有其它背景图像的干扰，检测模型在学习和推理时更关注前景目标，从而可以提高目标检测准确率；另一方面，由于去除了输入图像的背景像素，检测模型所看到的只有前景像素，完全不会受到视频或者图片序列场景的影响，从而提高了目标检测的场景迁移性能。Compared with the prior art, the object detection method, device, equipment and storage medium provided by the present application acquire video data, perform preprocessing on the first image sequence of the video data, and obtain the second image sequence with the background image removed, The second image sequence is input into the trained detection model to perform target detection, and a target detection result is obtained. On the one hand, only the foreground object is reserved for the image with the background removed, without the interference of other background images, the detection model pays more attention to the foreground object during learning and reasoning, which can improve the accuracy of object detection; on the other hand, due to the removal of the input image Background pixels, the detection model only sees foreground pixels, and will not be affected by video or picture sequence scenes at all, thus improving the scene migration performance of target detection.

附图说明Description of drawings

图1为本申请实施例一提供的目标检测方法的流程示意图；FIG. 1 is a schematic flow chart of a target detection method provided in Embodiment 1 of the present application;

图2为本申请实施例提供的背景去除方法的流程图；FIG. 2 is a flow chart of the background removal method provided by the embodiment of the present application;

图3为本申请实施例提供的基于SSD框架的目标检测系统整体结构；Fig. 3 is the overall structure of the target detection system based on the SSD framework provided by the embodiment of the present application;

图4为本申请实施例二提供的目标检测装置的结构示意图；FIG. 4 is a schematic structural diagram of a target detection device provided in Embodiment 2 of the present application;

图5为本申请实施例三提供的电子设备的结构示意图。FIG. 5 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present application.

具体实施方式Detailed ways

下面结合附图，对本申请的具体实施方式进行详细描述，但应当理解本申请的保护范围并不受具体实施方式的限制。The specific embodiments of the present application will be described in detail below in conjunction with the accompanying drawings, but it should be understood that the protection scope of the present application is not limited by the specific embodiments.

除非另有其它明确表示，否则在整个说明书和权利要求书中，术语“包括”或其变换如“包含”或“包括有”等等将被理解为包括所陈述的元件或组成部分，而并未排除其它元件或其它组成部分。Unless expressly stated otherwise, throughout the specification and claims, the term "comprise" or variations thereof such as "includes" or "includes" and the like will be understood to include the stated elements or constituents, and not Other elements or other components are not excluded.

目标检测要解决的问题是：通过目标框找出图像或视频中某些类别的物体，并且给出该物体属于某一类别的概率，即一种位置坐标回归与类别预测结合的任务。The problem to be solved in target detection is to find certain categories of objects in images or videos through the target frame, and give the probability that the object belongs to a certain category, that is, a task combining position coordinate regression and category prediction.

SSD：中文全称：单次多框检测器，英文全称：Single Shot MultiBoxDetector，SSD框架包括特征提取网络和目标检测网络，其中，特征提取网络用于对图像进行特征提取，目标检测网络用于根据提取的特征进行位置回归和目标类别的预测，从而识别该图像中的物体类别。SSD: Full name in Chinese: Single Shot MultiBox Detector, full name in English: Single Shot MultiBoxDetector, the SSD framework includes a feature extraction network and a target detection network, where the feature extraction network is used to extract features from images, and the target detection network is used to extract The features of the image are used to perform position regression and target category prediction, thereby identifying the object category in the image.

图1为本申请实施例一提供的目标检测方法的流程示意图，实际应用中，本实施例的执行主体可以为目标检测装置，该目标检测装置可以通过虚拟装置实现，例如软件代码，也可以通过写入有相关执行代码的实体装置实现，例如U盘，再或者，也可以通过集成有相关执行代码的实体装置实现，例如，芯片、电脑、机器人等。Figure 1 is a schematic flow chart of the target detection method provided by Embodiment 1 of the present application. In practical applications, the subject of execution in this embodiment may be a target detection device, and the target detection device may be implemented by a virtual device, such as software code, or by It can be implemented by a physical device with relevant execution codes written therein, such as a USB flash drive, or it can also be realized by a physical device integrated with relevant execution codes, such as a chip, a computer, a robot, and the like.

如图1所示，该方法包括以下步骤S101～S103：As shown in Figure 1, the method includes the following steps S101-S103:

S101、获取视频数据。S101. Acquire video data.

S102、对所述视频数据的第一图像序列进行预处理，获得去除背景图像的第二图像序列。S102. Perform preprocessing on the first image sequence of the video data to obtain a second image sequence with background images removed.

S103、将所述第二图像序列输入训练好的检测模型中进行目标检测，获得目标检测结果。S103. Input the second image sequence into the trained detection model to perform target detection, and obtain a target detection result.

本实施例中，该视频数据可以是通过摄像头实时采集的，也可以是事先存储的，可以理解视频数据由多帧图像构成，视频数据中包括需要识别的目标，例如人物、车辆等。获得待目标检测的视频图像之后，对视频图像中的图像序列去除背景，只保留前景目标，也就是只保留目标所在区域的像素，背景像素区域置零，获得去除背景图像的图像序列。In this embodiment, the video data can be collected in real time by the camera or stored in advance. It can be understood that the video data is composed of multiple frames of images, and the video data includes objects to be identified, such as people and vehicles. After the video image to be detected is obtained, the background is removed from the image sequence in the video image, and only the foreground object is retained, that is, only the pixels in the area where the object is located are retained, and the background pixel area is set to zero to obtain an image sequence with the background image removed.

具体的，步骤S102具体可实现为：利用背景减除法对所述视频数据的第一图像序列进行运动目标检测；保留运动目标所在区域像素，利用形态学方法对所述运动目标所在区域像素进行分割处理，分割成独立的运动目标单元，以获得去除背景图像的第二图像序列。图2所示为背景去除方法的流程图。该算法主要是通过计算像素稳定性，算法在运行过程会记录从开始运行到当前时刻为止，稳定时间最长的像素点的灰度值，利用相邻几帧和历史像素的稳定性作为判断依据,在新的一帧来到时，通过一系列的阈值比较操作来判断像素点的稳定性，从而判断其是否为背景点，从而去除背景像素，保留前景像素。Specifically, step S102 can be specifically implemented as: using the background subtraction method to detect the moving object in the first image sequence of the video data; retaining the pixels in the area where the moving object is located, and segmenting the pixels in the area where the moving object is located using a morphological method The processing is divided into independent moving target units to obtain the second image sequence with the background image removed. Figure 2 is a flowchart of the background removal method. The algorithm mainly calculates the pixel stability. During the operation of the algorithm, it will record the gray value of the pixel with the longest stable time from the beginning of operation to the current moment, and use the stability of several adjacent frames and historical pixels as the basis for judgment. , When a new frame comes, the stability of the pixel is judged through a series of threshold comparison operations, so as to judge whether it is a background point, thereby removing the background pixel and retaining the foreground pixel.

然后将去除背景图像的图像序列输入训练好的检测模型中进行目标检测，获得目标检测结果。该检测模型在训练时也采用去除背景的样本进行训练。Then input the image sequence with the background image removed into the trained detection model for target detection, and obtain the target detection result. The detection model is also trained with background-removed samples during training.

下面以一个具体的实施方式对本申请进行介绍。The present application will be introduced in a specific implementation manner below.

本实施方式中，所述检测模型采用SSD框架，该SSD框架包括：特征提取网络和目标检测网络。对该SSD框架进行训练的方式如下：In this embodiment, the detection model adopts an SSD framework, and the SSD framework includes: a feature extraction network and a target detection network. The way to train this SSD framework is as follows:

S201、对样本视频数据的图像序列进行预处理，获得去除背景图像的样本图像序列。S201. Perform preprocessing on an image sequence of sample video data to obtain a sample image sequence with background images removed.

S202、对所述样本图像序列进行人工目标标注，获得训练数据集。S202. Perform manual target labeling on the sample image sequence to obtain a training data set.

S203、基于所述训练数据集对SSD框架进行训练：首先初始化网络中待训练的参数及超参数，向初始化后的网络中输入训练数据进行网络前向传播，得到实际的输出结果，通过损失函数结合反向传播BP算法调整网络参数，进行迭代训练，至损失函数的损失值小于设定阈值或达到最大迭代次数时训练结束，得到训练好的SSD框架。S203. Train the SSD framework based on the training data set: first initialize the parameters and hyperparameters to be trained in the network, input the training data into the initialized network for forward propagation of the network, obtain the actual output result, and pass the loss function Combined with the backpropagation BP algorithm to adjust the network parameters, iterative training is carried out, and the training ends when the loss value of the loss function is less than the set threshold or reaches the maximum number of iterations, and the trained SSD framework is obtained.

具体的，首先制作训练数据集：采用传统的图像处理算法，通过背景减除法对运动目标进行检测，保留运动目标所在区域的像素掩码，利用形态学方法尽可能保留运动目标所在区域像素，从而获取去除背景的图像序列，利用标注工具人工标注数据集，获得训练数据集。Specifically, first make a training data set: use traditional image processing algorithms to detect moving objects through the background subtraction method, retain the pixel mask of the area where the moving object is located, and use the morphological method to retain the pixels of the area where the moving object is located as much as possible, so that Obtain the image sequence with the background removed, use the labeling tool to manually label the data set, and obtain the training data set.

检测模型设计：检测模型的设计基于现有的SSD目标检测网络结构，主要针对损失函数进行修改，去除类别损失中的背景损失函数项。本实施方式中，损失函数为位置误差与置信度误差的加权和。针对置信度误差，采用Softmax Loss，针对位置误差，采用SmoothL1loss。Detection model design: The design of the detection model is based on the existing SSD target detection network structure, and the loss function is mainly modified to remove the background loss function item in the category loss. In this embodiment, the loss function is the weighted sum of the position error and the confidence error. For the confidence error, Softmax Loss is used, and for the position error, SmoothL1loss is used.

损失函数如下： The loss function is as follows:

其中，第一项L_conf为置信度误差，第二项L_loc为位置误差，N为匹配的默认边框的数目，α为平衡因子(权重系数)，交叉验证的时候取值为1。c为类别置信度预测值。l为先验框的所对应边界框的位置预测值，而g是真实目标的位置参数。Among them, the first item L _conf is the confidence error, the second item L _loc is the position error, N is the number of matching default borders, α is the balance factor (weight coefficient), and the value is 1 during cross-validation. c is the category confidence prediction value. l is the position prediction value of the corresponding bounding box of the prior box, and g is the position parameter of the real target.

其中，in,

其中，由于背景样本的单一性，可以不考虑背景的特征学习，使网络更关注前景样本学习，表示预测框i与真实框j关于类别匹配，则p的概率预测越高损失越小，通过Softmax获得，如果预测框没有目标，为背景的概率越高，损失越小。Among them, due to the singleness of the background sample, the feature learning of the background can be ignored, so that the network pays more attention to the foreground sample learning, Indicates that the predicted frame i matches the real frame j with respect to the category, then the higher the probability prediction of p, the smaller the loss, Obtained by Softmax, if the prediction box has no target, the higher the probability of being the background, the smaller the loss.

其中，in,

其中，使用位置回归函数，表示第i个预测框与第j个真实框关于类别k是否匹配,和分别表示预测框与真实框，表示第i个真实框的中点，表示第i个默认框的中点，表示第i个默认框的宽度。where, using the positional regression function, Indicates whether the i-th predicted frame matches the j-th real frame with respect to category k, and represent the predicted frame and the real frame, respectively, Indicates the midpoint of the i-th ground-truth box, represents the midpoint of the i-th default box, Indicates the width of the i-th default box.

检测模型训练：利用标注的去背景图像数据集作为训练数据集，采用SSD网络框架训练检测模型。图3所示为基于SSD框架的目标检测系统整体结构，具体的，首先调整输入图像尺寸为网络所要求的输入(例如采用300x300)，将预处理获取的去背景图片作为训练模型的输入数据如图3中B所示，通过主体网络前向传播提取多层图像特征，将不同层图像特征进行融合，与真值数据对比IoU(Intersection over Union，交并比)获取误差值，修改模型目标函数，计算损失值，使网络学习更多的前景信息而忽略背景；利用误差反向传播调整网络参数，反向传播误差过程中采用随机梯度下降法，并且设置网络学习率lr＝0.001，梯度动量momentum＝0.9，完成一次迭代。至损失函数的损失值小于设定阈值或达到最大迭代次数时训练结束，得到训练好的SSD框架。Detection model training: Use the labeled background image dataset as the training dataset, and use the SSD network framework to train the detection model. Figure 3 shows the overall structure of the target detection system based on the SSD framework. Specifically, first adjust the size of the input image to the input required by the network (for example, 300x300), and use the background image obtained by preprocessing as the input data of the training model. As shown in B in Figure 3, the multi-layer image features are extracted through the forward propagation of the main network, the image features of different layers are fused, and the IoU (Intersection over Union) is compared with the real value data to obtain the error value, and the model objective function is modified , calculate the loss value, so that the network learns more foreground information and ignores the background; use error backpropagation to adjust network parameters, use stochastic gradient descent method in the process of backpropagation error, and set network learning rate lr=0.001, gradient momentum momentum =0.9, complete one iteration. The training ends when the loss value of the loss function is less than the set threshold or reaches the maximum number of iterations, and the trained SSD framework is obtained.

如图3所示，基于SSD网络结构添加数据预处理层，数据层获取视频数据，利用背景建模算法去除背景，保留前景图像送入SSD框架进行检测。如图3中C所示实现特征提取，之后通过一些卷积和池化操作提取不同尺度的特征图，针对不同尺度的特征图提出候选目标框，例如特征图为8*8大小，则候选目标框的数量为8*8*9，每个特征点位置生成九种类型候选目标，即3种比例尺度和3种面积。SSD框架对图像进行推断时，产生一系列固定大小的候选框，以及每一个候选框中包含物体实例的可能性。一次前向处理生成大量的目标框，需要采用非极大抑制(Non-maximum suppression，NMS)滤除大部分的目标框，采用的方法是当目标框置信度阈值小于阈值ct(如0.01)与IoU小于lt(如0.45)时，则丢弃，只保留前N个预测结果。对融合获取的特征与真值特征匹配，来约束损失函数，使损失函数更关注于前景目标特征，从而实现目标的检测。As shown in Figure 3, a data preprocessing layer is added based on the SSD network structure. The data layer acquires video data, uses a background modeling algorithm to remove the background, and retains the foreground image and sends it to the SSD framework for detection. Feature extraction is realized as shown in Figure 3 C, and then feature maps of different scales are extracted through some convolution and pooling operations, and candidate target frames are proposed for feature maps of different scales. For example, if the feature map is 8*8 in size, the candidate target The number of boxes is 8*8*9, and each feature point position generates nine types of candidate targets, namely, 3 scales and 3 areas. When the SSD framework infers the image, it generates a series of fixed-size candidate boxes, and the possibility that each candidate box contains an object instance. A forward processing generates a large number of target frames, and it is necessary to use Non-maximum suppression (NMS) to filter out most of the target frames. The method used is when the confidence threshold of the target frame is less than the threshold ct (such as 0.01) and When the IoU is less than lt (such as 0.45), it is discarded and only the first N prediction results are kept. Match the features obtained by fusion with the true features to constrain the loss function, so that the loss function pays more attention to the foreground target features, so as to realize the target detection.

本实施例提供的目标检测方法，对所述视频数据的第一图像序列进行预处理，获得去除背景图像的第二图像序列，将所述第二图像序列输入训练好的检测模型中进行目标检测，获得目标检测结果。一方面，对于去除背景的图像只保留前景目标，没有其它背景图像的干扰，检测模型在学习和推理时更关注前景目标，从而可以提高目标检测准确率；另一方面，由于去除了背景像素，检测模型所看到的只有前景像素，完全不会受到视频或者图片序列场景的影响，从而提高了目标检测的场景迁移性能。The object detection method provided in this embodiment preprocesses the first image sequence of the video data to obtain a second image sequence with the background image removed, and inputs the second image sequence into a trained detection model for object detection , to obtain the target detection result. On the one hand, only the foreground object is reserved for the image with the background removed, and there is no interference from other background images. The detection model pays more attention to the foreground object during learning and reasoning, which can improve the accuracy of object detection; on the other hand, due to the removal of background pixels, The detection model sees only foreground pixels, and is completely unaffected by video or picture sequence scenes, thereby improving the scene migration performance of target detection.

下述为本申请装置实施例，可以用于执行本申请方法实施例。对于本申请装置实施例中未披露的细节，请参照本申请方法实施例。The following are device embodiments of the present application, which can be used to implement the method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

图4为本申请实施例二提供的目标检测装置的结构示意图，如图4所示，该装置可以包括：FIG. 4 is a schematic structural diagram of a target detection device provided in Embodiment 2 of the present application. As shown in FIG. 4, the device may include:

获取模块410，用于获取视频数据；Obtaining module 410, for obtaining video data;

预处理模块420，用于对所述视频数据的第一图像序列进行预处理，获得去除背景图像的第二图像序列；A preprocessing module 420, configured to preprocess the first image sequence of the video data to obtain a second image sequence from which the background image is removed;

目标检测模块430，用于将所述第二图像序列输入训练好的检测模型中进行目标检测，获得目标检测结果。The object detection module 430 is configured to input the second image sequence into the trained detection model for object detection, and obtain an object detection result.

本实施例提供的目标检测装置，对所述视频数据的第一图像序列进行预处理，获得去除背景图像的第二图像序列，将所述第二图像序列输入训练好的检测模型中进行目标检测，获得目标检测结果。一方面，对于去除背景的图像只保留前景目标，没有其它背景图像的干扰，检测模型在学习和推理时更关注前景目标，从而可以提高目标检测准确率；另一方面，由于去除了背景像素，检测模型所看到的只有前景像素，完全不会受到视频或者图片序列场景的影响，从而提高了目标检测的场景迁移性能。The target detection device provided in this embodiment preprocesses the first image sequence of the video data to obtain a second image sequence with the background image removed, and inputs the second image sequence into a trained detection model for target detection , to obtain the target detection result. On the one hand, only the foreground object is reserved for the image with the background removed, and there is no interference from other background images. The detection model pays more attention to the foreground object during learning and reasoning, which can improve the accuracy of object detection; on the other hand, due to the removal of background pixels, The detection model sees only foreground pixels, and is completely unaffected by video or picture sequence scenes, thereby improving the scene migration performance of target detection.

在一种可能的实现方式中，在本申请实施例提供的上述装置中，所述预处理模块420，具体用于：In a possible implementation, in the above device provided in the embodiment of the present application, the preprocessing module 420 is specifically used to:

图5为本申请实施例三提供的电子设备的结构示意图，如图5所示，该设备包括：存储器501和处理器502；FIG. 5 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present application. As shown in FIG. 5 , the device includes: a memory 501 and a processor 502;

存储器501，用于存储计算机程序；Memory 501, for storing computer programs;

其中，处理器502执行存储器501中的计算机程序，以实现如上所述各方法实施例所提供的方法。Wherein, the processor 502 executes the computer program in the memory 501, so as to implement the methods provided by the above method embodiments.

在实施例中，以一电子设备对本申请提供的目标检测装置进行示例。处理器可以是中央处理单元(CPU)或者具有数据处理能力和/或指令执行能力的其他形式的处理单元，并且可以控制电子设备中的其他组件以执行期望的功能。In the embodiment, an electronic device is used as an example of the target detection device provided in this application. The processor may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.

存储器可以包括一个或多个计算机程序产品，计算机程序产品可以包括各种形式的计算机可读存储介质，例如易失性存储器和/或非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、闪存等。在计算机可读存储介质上可以存储一个或多个计算机程序指令，处理器可以运行程序指令，以实现上文的本申请的各个实施例中的方法以及/或者其他期望的功能。在计算机可读存储介质中还可以存储诸如输入信号、信号分量、噪声分量等各种内容。The memory may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include random access memory (RAM) and/or cache memory (cache), etc., for example. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor may execute the program instructions to implement the above methods and/or other desired functions in the various embodiments of the present application. Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.

本申请实施例四提供了一种计算机可读存储介质，该计算机可读存储介质中存储有计算机程序，该计算机程序被处理器执行时用于实现如上所述各方法实施例所提供的方法。Embodiment 4 of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium. When the computer program is executed by a processor, the computer program is used to implement the methods provided in the foregoing method embodiments.

实际应用中，本实施例中的计算机程序可以以一种或多种程序设计语言的任意组合来编写用于执行本申请实施例操作的程序代码，程序设计语言包括面向对象的程序设计语言，诸如Java、C++等，还包括常规的过程式程序设计语言，诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。In practical applications, the computer programs in this embodiment can be written in any combination of one or more programming languages to execute the program codes for performing the operations of the embodiments of the present application. The programming languages include object-oriented programming languages, such as Java, C++, etc., also includes conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server execute on.

实际应用中，计算机可读存储介质可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以包括但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件，或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括：具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。In practical applications, any combination of one or more readable media may be used as the computer-readable storage medium. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, but not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or devices, or any combination thereof. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

前述对本申请的具体示例性实施方案的描述是为了说明和例证的目的。这些描述并非想将本申请限定为所公开的精确形式，并且很显然，根据上述教导，可以进行很多改变和变化。对示例性实施例进行选择和描述的目的在于解释本申请的特定原理及其实际应用，从而使得本领域的技术人员能够实现并利用本申请的各种不同的示例性实施方案以及各种不同的选择和改变。本申请的范围意在由权利要求书及其等同形式所限定。The foregoing descriptions of specific exemplary embodiments of the present application have been presented for purposes of illustration and description. These descriptions are not intended to limit the application to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the application and their practical application, thereby enabling those skilled in the art to implement and utilize various exemplary embodiments of the application, as well as various Choose and change. It is intended that the scope of the application be defined by the claims and their equivalents.

Claims

1. A target detection method, characterized in that, comprising:

Get video data;

Preprocessing the first image sequence of the video data to obtain a second image sequence from which the background image is removed;

The second image sequence is input into the trained detection model to perform target detection, and a target detection result is obtained.

2. The method according to claim 1, wherein the preprocessing the first image sequence of the video data to obtain the second image sequence for removing the background image comprises:

performing moving target detection on the first image sequence of the video data by using a background subtraction method;

The pixels of the area where the moving object is located are retained, and the pixels of the area where the moving object is located are segmented by using a morphological method, and divided into independent units of the moving object, so as to obtain a second image sequence in which the background image is removed.

3. The method according to claim 1, wherein the detection model adopts an SSD framework, which comprises a feature extraction network and a target detection network.

4. method according to claim 3, is characterized in that, described method also comprises training SSD framework, and it comprises:

Preprocessing the image sequence of the sample video data to obtain a sample image sequence from which the background image is removed;

Carrying out manual target labeling on the sample image sequence to obtain a training data set;

The SSD framework is trained based on the training data set: first, initialize the parameters and hyperparameters to be trained in the network, input the training data into the initialized network for forward propagation of the network, and obtain the actual output results, and combine the loss function with the inverse Adjust the network parameters to the propagation BP algorithm, perform iterative training, and end the training until the loss value of the loss function is less than the set threshold or reaches the maximum number of iterations, and a trained SSD framework is obtained.

5. The method according to claim 4, wherein the loss function is a weighted sum of position error and confidence error.

6. The method according to claim 5, wherein the calculation formula of the confidence error is as follows:

in, Indicates that the predicted box i matches the ground truth box j with respect to the category.

7. A target detection device, characterized in that, comprising:

An acquisition module, used to acquire video data;

A preprocessing module, configured to preprocess the first image sequence of the video data to obtain a second image sequence from which the background image is removed;

A target detection module, configured to input the second image sequence into a trained detection model for target detection and obtain a target detection result.

8. The device according to claim 7, wherein the preprocessing module is specifically used for:

Keep the pixel mask of the area where the moving target is located, and use the morphological method to extract the pixels of the area where the moving target is located;

Obtain a second sequence of images with the background image removed.

9. An electronic device, comprising: a memory and a processor;

The memory is used to store computer programs;

Wherein, the processor executes the computer program in the memory, so as to realize the method according to any one of claims 1-6.

10. A computer-readable storage medium, characterized in that, a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, it is used to implement any one of claims 1-6. described method.