
CN114972170B - A fisheye camera-based object detection method for dense scenes with anti-occlusion - Google Patents

A fisheye camera-based object detection method for dense scenes with anti-occlusion

Info

Publication number
CN114972170B
CN114972170B CN202210335755.1A
Authority
CN
China
Prior art keywords
mask
occlusion
image
objects
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210335755.1A
Other languages
Chinese (zh)
Other versions
CN114972170A (en)
Inventor
康文雄
许鸿斌
王略权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210335755.1A
Publication of CN114972170A
Application granted
Publication of CN114972170B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30204 Marker
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30242 Counting objects in image
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an anti-occlusion object detection method for dense scenes based on a fisheye camera, which comprises the following steps: collecting original images in the original scene and annotating them to obtain single-object masks; based on the object masks, synthesizing images according to an occlusion relation derived from the spatial position relation between objects and a preset occlusion priority, to obtain a synthetic data set; dividing the synthetic data set into a plurality of sub data sets of different recognition difficulty according to the recognition difficulty of the objects; performing data augmentation on the synthetic data set; training an object detection network with the augmented synthetic data set; and inputting the image to be detected into the trained object detection network to obtain the object detection result. The invention greatly reduces the cost of data collection, allows different object types to be added at will to simulate all kinds of possible occlusions, and improves object detection accuracy in complex scenes.

Description

An anti-occlusion object detection method for dense scenes based on a fisheye camera

Technical Field

The present invention belongs to the field of computer vision and relates to an anti-occlusion object detection method for dense scenes based on a fisheye camera.

Background Art

Benefiting from the rapid development of deep learning and the strong learning capacity of convolutional neural networks, a network model can achieve good performance through iterative learning as long as enough training data and the corresponding sample labels are provided, and can be used in a wide range of practical applications such as object detection, semantic segmentation, and face recognition. Conversely, network models are also limited by finite data sets. In some special scenarios, collecting or annotating data may be very difficult, so only a small amount of data can be gathered. A small data set cannot provide sufficient supervision to train the model, leading to poor performance in real scenes.

Because of the wide-angle field of view of a fisheye camera, the images it captures usually exhibit varying degrees of distortion radiating outward from the center, which makes image recognition harder. Moreover, when complex and severe occlusion relations exist in a fisheye image and are coupled with this distortion, image recognition and object detection become even more difficult. In real scenes, complex occlusion relations and severe image distortion contain an enormous number of variations, making it infeasible to cover all possible cases simply by collecting a large data set, as previous methods did.

Summary of the Invention

In view of the problems in the prior art, the present invention provides an anti-occlusion object detection method for dense scenes based on a fisheye camera.

To achieve the purpose of the present invention, the provided anti-occlusion object detection method for dense scenes based on a fisheye camera comprises the following steps:

collecting original images in the original scene and annotating them to obtain single-object masks;

based on the object masks, synthesizing images according to an occlusion relation derived from the spatial position relation between objects and a preset occlusion priority, to obtain a synthetic data set;

dividing the synthetic data set into a plurality of sub data sets of different recognition difficulty according to the recognition difficulty of the objects;

performing data augmentation on the synthetic data set;

training an object detection network with the augmented synthetic data set;

inputting the image to be detected into the trained object detection network to obtain the object detection result.

Further, collecting original images in the original scene comprises:

dividing the original scene into M rows and N columns, and capturing original images at the designated positions of the divided scene;

randomly rotating the divided scene T times and capturing images each time, so that with K kinds of objects a total of M·N·T·K single-object images are collected.

Further, the occlusion relation based on the spatial position relation between objects is determined from the relative position of each mask with respect to the imaging center of the fisheye camera: an object farther from the center is occluded by an object closer to the center.

Further, in the preset occlusion priority, a numerical value expresses the priority in ascending order; the higher the occlusion priority, the more likely the object is to occlude surrounding objects.

Further, when dividing the synthetic data set into sub data sets of different recognition difficulty, the difficulty is measured by the number of objects: the object count reflects the complexity of the occlusion relations and thus the difficulty of recognition, so samples can be divided into hard and easy ones. Easy samples contain few objects and simple occlusion relations and are easy to fit during training; hard samples contain many objects and intricate occlusion relations and are harder to fit.

Further, the data set is divided into 4 difficulty levels, with sub data set proportions of sub data set A : sub data set B : sub data set C : sub data set D = 0.15 : 0.35 : 0.35 : 0.15.

Further, the object detection network is cascadeCNN.

Further, the data augmentation is based on mask-level random brightness adjustment and random displacement perturbation.

Further, the random brightness adjustment inside and outside the mask is performed as follows:

inside the mask, the illumination perturbation is applied by random gamma correction. For the set of mask pixels M = {(x_m, y_m)}, the random illumination perturbation of formula (1) is used:

Î(x_m, y_m) = α · I(x_m, y_m)^γ + β  (1)

where Î denotes the image after data augmentation; α is a linear coefficient that adjusts the gamma-correction scale; β follows a Gaussian distribution and represents random noise; γ is the gamma-correction illumination intensity adjustment coefficient; x_m and y_m are the horizontal and vertical coordinates of a pixel inside the mask; and I(x_m, y_m) ranges over all pixels of the mask;

outside the mask, a Gaussian kernel is used to compute a perturbation attenuation coefficient that scales the illumination perturbation:

ω = exp(−((x_o − x_i)² + (y_o − y_i)²) / (2σ²))  (2)

Î(x_o, y_o) = ω · (α · I(x_o, y_o)^γ + β) + (1 − ω) · I(x_o, y_o)  (3)

where ω is the attenuation coefficient; x_o and y_o are the horizontal and vertical coordinates of a pixel outside the mask; Î(x_o, y_o) and I(x_o, y_o) are the augmented and pre-augmentation images, respectively; and x_i, y_i and σ² are the coordinates of a point inside the mask and the variance of the kernel, respectively.

Further, the random displacement perturbation is performed by first correcting the distortion of the original fisheye image, then applying a random displacement within a predetermined range to the mask during image synthesis, and finally mapping the result back to the original image through the fisheye distortion model.

Compared with the prior art, the beneficial effects that the present invention can achieve are at least as follows:

1. The present invention is especially suitable for scenes with a narrow spatial range where target objects are packed closely together and severely occlude one another. Because real scenes exhibit an enormous number of variations when different objects are placed at different positions, collecting a data set directly is very difficult. The solution adopted by the present invention infers possible occlusion relations from the spatial position relation between objects, simulates occlusion artificially, and randomly synthesizes a large amount of data, constructing occlusion samples with rich variations for training the model. This method greatly reduces the cost of data collection, and different object types can be added at will to simulate all kinds of possible occlusions, improving object detection accuracy in complex scenes.

2. A large number of images containing objects of the categories of interest, together with the corresponding label data, are fed to the convolutional network model; through iterative learning, the model is trained to judge whether an input image contains an object of interest and to output a corresponding confidence. The robustness of the network model depends to a large extent on the richness of the training data. Benefiting from the data generation scheme proposed by the present invention, the quantity and quality of the training data are ensured even with limited manpower and material resources.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of a dense-scene image captured by a fisheye camera.

FIG. 2 is a schematic diagram of raw data collection and annotation.

FIG. 3 is a schematic diagram of occlusion priority and the data synthesis rules.

FIG. 4 is a schematic diagram of recognition in a real scene and of the anti-occlusion and anti-distortion effect.

FIG. 5 is a flowchart of anti-occlusion object detection.

Detailed Description of the Embodiments

To make the purpose, technical solution, and advantages of the embodiments of the present invention clearer, the technical solution in the embodiments will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on these embodiments, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the scope of protection of the present invention.

In the prior art, because of the wide-angle field of view of a fisheye camera, the images it captures often exhibit severe distortion. Moreover, in dense scenes the occlusion relations between different objects are intricate and coupled with the distortion, which makes object detection very difficult. In real scenes, the different arrangements of objects and even their rotations make data collection itself extremely hard. As shown in FIG. 1 and FIG. 2, the target region of the fisheye image is divided into M rows and N columns (e.g., M = 5, N = 7). As shown in FIG. 2(b), each cell represents a region where an object may appear. Considering the single viewpoint of a monocular camera, to guarantee realistic scenes and rich data, pictures of different object arrangements must account not only for translation in the horizontal plane but also for the possible rotations of each object (3 degrees of freedom). Assuming there are K kinds of objects and each object has T rotation states, there are (K·T+1)^(M·N) possible configurations of the grid, since each cell is either empty or holds one of the K·T object poses. The number of possible arrangements therefore grows exponentially, which is prohibitive given the expensive cost of data collection and manual annotation. To solve these problems in the prior art, the present invention proposes an anti-occlusion object detection method for dense scenes based on a fisheye camera.
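To make the scale gap concrete, a short sketch follows (M = 5 and N = 7 follow the example above; the values of K and T are illustrative assumptions) comparing the exhaustive number of scene arrangements with the number of single-object images the proposed scheme captures:

```python
# Exhaustive capture vs. the proposed single-object capture.
M, N = 5, 7   # grid of possible object positions (example values above)
K, T = 10, 4  # kinds of objects and rotation states: illustrative assumptions

# Every cell is either empty or holds one of K*T object poses.
exhaustive = (K * T + 1) ** (M * N)
# The proposed scheme: one image per position, rotation, and object kind.
proposed = M * N * T * K

print(f"exhaustive arrangements: {exhaustive:.3e}")  # on the order of 10**56
print(f"single-object images:    {proposed}")        # 1400, easily annotated
```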

Specifically, the anti-occlusion object detection method for dense scenes based on a fisheye camera provided by the present invention comprises the following steps:

Step 1. Collect and annotate original samples.

Referring to FIG. 2, step 1 specifically comprises the following steps:

Step 1.1: divide the original scene, e.g., into M rows and N columns, and then capture original images at the designated positions of the divided scene.

In some embodiments of the present invention, considering that there are T possible pose changes, the scene is randomly rotated T times and images are captured each time.

Step 1.2: manually annotate the images captured in step 1.1 with annotation software to obtain the object masks.

In some embodiments of the present invention, the labelme software is used for annotation to obtain the object masks.
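As a sketch of this sub-step, a single-object labelme annotation can be rasterized into a binary mask as follows (the file name is a hypothetical placeholder; the JSON layout assumed here is labelme's standard polygon format):

```python
import json

import numpy as np
from PIL import Image, ImageDraw

def labelme_to_mask(json_path, height, width):
    """Rasterize the polygons of one labelme annotation into a binary mask."""
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:  # one entry per labeled polygon
        draw.polygon([tuple(p) for p in shape["points"]], outline=1, fill=1)
    return np.array(mask, dtype=bool)

# mask = labelme_to_mask("object_row3_col5.json", 1080, 1080)  # hypothetical file
```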

In some embodiments of the present invention, only single objects need to be annotated; complex multi-object scenes do not have to be processed.

In some embodiments of the present invention, a total of M·N·T·K single-object images are obtained, far fewer than the (K·T+1)^(M·N) possible configurations required by the exhaustive scheme.

In one embodiment of the present invention, M = 5 and N = 7; it is understood that other values can be used in other embodiments as required.

Step 2. Synthesize data artificially according to the occlusion relations.

After the image data preparation of step 1, single-object masks are obtained; each mask carries information such as color, size, and position. From the positional relation of the masks and the wide-angle imaging characteristic of the fisheye camera, the present invention determines the occlusion relation between different objects from the relative position of each mask with respect to the imaging center (i.e., the optical center) of the fisheye camera. As shown in FIG. 3(c), assuming the target objects are all placed on the same plane, an object farther from the center is occluded by an object closer to the center. Accordingly, as shown in FIG. 3(d), in some embodiments of the present invention a relative occlusion priority can be assigned to determine the occlusion relation between objects: numbers in ascending order express the occlusion priority, and the higher the priority, the more likely the object occludes its surroundings. Following this occlusion relation, when synthesizing an image, the object masks are cut out and pasted to the corresponding positions of the synthetic image in ascending order of occlusion priority, so that an image with natural occlusion relations (occlusion that obeys the physical rules) is finally synthesized, as shown in FIG. 3(e), yielding the synthetic data set.
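The pasting rule can be sketched in a few lines (a minimal sketch; the array shapes and the distance-based priority are assumptions consistent with the description, with higher-priority masks pasted later so that they overwrite, i.e. occlude, lower-priority ones):

```python
import numpy as np

def synthesize(background, cutouts, center):
    """Paste single-object cutouts so that objects closer to the fisheye
    imaging center occlude objects farther away from it.

    background: HxWx3 uint8 image of the empty scene
    cutouts:    list of (rgb HxWx3, mask HxW bool) pairs, already positioned
    center:     (cx, cy) imaging center (optical center) of the fisheye camera
    """
    cx, cy = center

    def priority(item):
        _, mask = item
        ys, xs = np.nonzero(mask)
        # Farther from the center => lower priority => pasted earlier.
        return -np.hypot(xs.mean() - cx, ys.mean() - cy)

    out = background.copy()
    for rgb, mask in sorted(cutouts, key=priority):  # ascending priority
        out[mask] = rgb[mask]                        # later pastes occlude earlier
    return out
```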

Step 3. Allocate data proportions: according to the recognition difficulty of the objects, divide the synthetic data set into several sub data sets of different difficulty.

In real scenes, the more objects appear, the more complex the potential occlusion relations and the harder the recognition. In some embodiments of the present invention, the number of objects is used to measure the complexity of the occlusion relations and the difficulty of recognizing objects, so samples can be divided into hard and easy ones. Easy samples contain few objects and simple occlusion relations and are easy to fit during training; hard samples contain many objects and intricate occlusion relations and are harder to fit. In addition, the distortion of the fisheye camera brings further variation and uncertainty to the occlusion relations.

To obtain a better training effect and simulate real scenes, the present invention divides the collected data set into 4 difficulty levels according to the proportion of hard and easy samples: sub data set A, 1 to 10 objects per image; sub data set B, 11 to 20 objects per image; sub data set C, 21 to 28 objects per image; and sub data set D, 29 to 35 objects per image. To simulate real scenes as closely as possible, the proportions of the sub data sets are set to sub data set A : sub data set B : sub data set C : sub data set D = 0.15 : 0.35 : 0.35 : 0.15.
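One way to realize these proportions when generating the synthetic set is sketched below (the ranges and ratios are the ones given above; the two-stage sampling itself is an implementation assumption):

```python
import random

# (min_objects, max_objects, proportion) per sub data set, from the description.
SUBSETS = {
    "A": (1, 10, 0.15),
    "B": (11, 20, 0.35),
    "C": (21, 28, 0.35),
    "D": (29, 35, 0.15),
}

def sample_object_count(rng=random):
    """Pick a difficulty level by its proportion, then an object count in range."""
    names = list(SUBSETS)
    level = rng.choices(names, weights=[SUBSETS[n][2] for n in names], k=1)[0]
    lo, hi, _ = SUBSETS[level]
    return level, rng.randint(lo, hi)  # randint is inclusive on both ends
```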

Step 4. Apply random data augmentation to the synthetic data.

To enrich the distribution of the synthetic data set as much as possible and improve the robustness of the subsequent recognition model, in some embodiments of the present invention a series of augmentation schemes is applied to the synthetic data set, including mask-based random brightness adjustment and random displacement perturbation.

For the mask-based random brightness adjustment:

In dense scenes, light reflected between objects may change the light field and affect image brightness during imaging. To simulate this irregular brightness fluctuation, embodiments of the present invention apply random brightness adjustment to the space within a preset range around the mask (determined by the image size). Inside the mask region, the illumination is adjusted by a random gamma-correction perturbation; outside the mask region, to model the mutual reflection between different objects, the present invention uses a Gaussian kernel to compute an attenuation coefficient that scales the illumination perturbation. That is:

for the set of mask pixels M = {(x_m, y_m)}, the random illumination perturbation can be simulated with formula (1):

Î(x_m, y_m) = α · I(x_m, y_m)^γ + β  (1)

where Î denotes the image after data augmentation; α is a linear coefficient that adjusts the gamma-correction scale; β ∈ N(0, 1) follows a Gaussian distribution and represents random noise; γ is the gamma-correction illumination intensity adjustment coefficient; x_m and y_m are the horizontal and vertical coordinates of a pixel inside the mask; and I(x_m, y_m) ranges over all pixels of the mask.
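A numpy sketch of formula (1) as reconstructed above (the sampling ranges of α, β, and γ are illustrative assumptions; pixel values are normalized to [0, 1] before the power law is applied):

```python
import numpy as np

def gamma_perturb(img, mask, rng=np.random.default_rng()):
    """Random illumination perturbation inside the mask, formula (1):
    I_hat = alpha * I ** gamma + beta."""
    alpha = rng.uniform(0.8, 1.2)  # linear gamma-correction scale
    gamma = rng.uniform(0.5, 2.0)  # illumination intensity adjustment coefficient
    beta = rng.normal(0.0, 0.02)   # Gaussian random-noise term

    out = img.astype(np.float32) / 255.0
    inside = mask.astype(bool)
    out[inside] = alpha * out[inside] ** gamma + beta
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```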

For a pixel (x_o, y_o) outside the mask, an attenuation coefficient must be computed in addition to the random illumination perturbation; the farther the pixel is from the mask, the smaller the influence on the illumination:

ω = exp(−((x_o − x_i)² + (y_o − y_i)²) / (2σ²))  (2)

Î(x_o, y_o) = ω · (α · I(x_o, y_o)^γ + β) + (1 − ω) · I(x_o, y_o)  (3)

where ω is the attenuation coefficient; x_o and y_o are the horizontal and vertical coordinates of the pixel outside the mask; Î(x_o, y_o) and I(x_o, y_o) are the augmented and pre-augmentation images, respectively; and x_i, y_i and σ² are the coordinates of a point inside the mask and the variance of the Gaussian kernel, respectively.
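A sketch of the attenuated blend in formulas (2) and (3), taking each outside pixel's nearest mask point as the kernel center (that choice, and the value of σ, are assumptions):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def falloff_weights(mask, sigma=25.0):
    """Attenuation coefficient omega of formula (2), with the squared distance
    taken to each pixel's nearest mask point; omega is 1 inside the mask."""
    dist = distance_transform_edt(~mask.astype(bool))  # 0 inside the mask
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def blend_outside(orig, perturbed, mask, sigma=25.0):
    """Formula (3): omega-weighted blend of perturbed and original pixels."""
    w = falloff_weights(mask, sigma)[..., None]        # HxWx1 for broadcasting
    out = w * perturbed.astype(np.float32) + (1.0 - w) * orig.astype(np.float32)
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```

Here `perturbed` would be the gamma perturbation above applied to the whole image, so that pixels far from the mask fall back to their original values.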

For the random displacement perturbation:

When collecting the raw data, objects are placed within pre-defined position ranges. Although this strategy greatly reduces the workload, it cannot fully cover the arbitrary placements possible in real scenes. To increase the randomness of the placements and the realism of the data set, and taking the distortion of the fisheye camera into account, in some embodiments of the present invention the original fisheye image is first distortion-corrected, the mask is then given a random displacement within a predetermined range (determined by the image size) during image synthesis, and the result is finally mapped back to the original image through the fisheye distortion model.
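This undistort-shift-redistort step can be sketched on a mask's polygon points, assuming OpenCV's fisheye model with known intrinsics K and distortion coefficients D (the shift range is illustrative; cv2.fisheye.distortPoints assumes identity intrinsics, hence the normalization before re-projection):

```python
import cv2
import numpy as np

def jitter_mask_polygon(points, K, D, max_shift=15, rng=np.random.default_rng()):
    """Undistort a mask's polygon points, shift them in the rectified plane,
    then map them back through the fisheye distortion model."""
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 1, 2)
    # P=K keeps the undistorted points in pixel units of the rectified image.
    und = cv2.fisheye.undistortPoints(pts, K, D, P=K)
    # One random shift for the whole mask, within +/- max_shift pixels.
    und += rng.uniform(-max_shift, max_shift, size=(1, 1, 2)).astype(np.float32)
    # Normalize (remove K) before re-applying the fisheye distortion.
    f = np.array([K[0, 0], K[1, 1]], dtype=np.float32)
    c = np.array([K[0, 2], K[1, 2]], dtype=np.float32)
    return cv2.fisheye.distortPoints((und - c) / f, K, D)
```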

Step 5. Train the object detection network with the augmented synthetic data set and deploy the model.

Any existing detection network is applicable; in some embodiments of the present invention the network used is cascadeCNN. Model deployment means putting the model into a real application scenario, for example on a mobile device.

Step 6. Input the image to be detected into the trained object detection network to obtain the object detection result.

Owing to the strong feature-learning capacity of convolutional neural networks, current deep-learning object detection algorithms mostly adopt an end-to-end mechanism. A large number of images containing objects of the categories of interest, together with the corresponding label data, are fed to the convolutional network model; through iterative learning, the model is trained to judge whether an input image contains an object of interest and to output a corresponding confidence. The robustness of the network model depends to a large extent on the richness of the training data. Benefiting from the data generation scheme proposed by the present invention, the quantity and quality of the training data are ensured even with limited manpower and material resources. As shown in FIG. 4, with the detection method provided by the present invention, the objects in a real scene can be recognized well.

In summary, the method provided by the embodiments of the present invention mainly addresses the degradation of object detection performance caused by the image distortion of fisheye panoramic images and by complex occlusion in dense scenes; it is suitable for scenes with a narrow spatial range where target objects are packed closely together and severely occluded. Because real scenes exhibit an enormous number of variations when different objects are placed at different positions, collecting a data set directly is very difficult. The scheme adopted by the embodiments infers possible occlusion relations from the spatial position relation between objects, simulates occlusion artificially, and randomly synthesizes a large amount of data, constructing occlusion samples with rich variations for training the model. This method greatly reduces the cost of data collection and can synthesize new data without limit to simulate all kinds of possible occlusions, improving object detection accuracy in complex scenes.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. An anti-occlusion object detection method for dense scenes based on a fisheye camera, characterized by comprising the following steps:
collecting original images in an original scene and annotating them to obtain a mask of a single object;
based on the object masks, synthesizing an image according to an occlusion relation based on the spatial position relation between objects and a preset occlusion priority, to obtain a synthetic data set;
dividing the synthetic data set into a plurality of sub data sets of different recognition difficulty according to the recognition difficulty of the objects;
performing data augmentation on the synthetic data set;
training an object detection network with the augmented synthetic data set;
inputting the image to be detected into the trained object detection network to obtain an object detection result;
wherein the data augmentation is based on mask-level random brightness adjustment and random displacement perturbation;
the random brightness adjustment inside and outside the mask is performed as follows:
inside the mask, the illumination perturbation is adjusted by random gamma correction: for the set of mask pixels M = {(x_m, y_m)}, the random illumination perturbation of formula (1) is applied:
Î(x_m, y_m) = α · I(x_m, y_m)^γ + β  (1)
wherein Î denotes the image after data augmentation; α denotes a linear coefficient for adjusting the gamma-correction scale; β follows a Gaussian distribution and denotes random noise; γ denotes the gamma-correction illumination intensity adjustment coefficient; x_m and y_m denote the horizontal and vertical coordinates of a pixel inside the mask; and I(x_m, y_m) ranges over the set of all pixels within the entire mask;
outside the mask, the perturbation attenuation coefficient is computed with a Gaussian kernel to adjust the illumination perturbation:
ω = exp(−((x_o − x_i)² + (y_o − y_i)²) / (2σ²))  (2)
Î(x_o, y_o) = ω · (α · I(x_o, y_o)^γ + β) + (1 − ω) · I(x_o, y_o)  (3)
wherein ω denotes the attenuation coefficient; x_o and y_o denote the horizontal and vertical coordinates of a pixel outside the mask; Î(x_o, y_o) and I(x_o, y_o) denote the augmented image and the pre-augmentation image, respectively; and x_i, y_i and σ² denote the coordinates of a point inside the mask and the variance of the kernel, respectively;
the random displacement perturbation is performed by first correcting the distortion of the original fisheye image, then applying a random displacement within a preset range to the mask when the image is synthesized, and then restoring the original image with the fisheye distortion model.
2. The anti-occlusion object detection method for dense scenes based on a fisheye camera according to claim 1, wherein collecting original images in an original scene comprises:
dividing the original scene into M rows and N columns, and collecting original images at designated positions of the divided scene;
randomly rotating the divided original scene T times and collecting images each time, so that a total of M·N·T·K single-object images are collected, wherein K denotes the number of kinds of objects.
3. The anti-occlusion object detection method for dense scenes based on a fisheye camera according to claim 1, wherein, for the occlusion relation based on the spatial position relation between objects, the occlusion relation between different objects is determined from the relative position of each mask with respect to the imaging center of the fisheye camera, and an object farther from the center is occluded by an object closer to the center.
4. The anti-occlusion object detection method for dense scenes based on a fisheye camera according to claim 1, wherein, in the preset occlusion priority, a numerical value represents the occlusion priority in ascending order, and the higher the occlusion priority, the more likely the object is to occlude surrounding objects.
5. The anti-occlusion object detection method for dense scenes based on a fisheye camera according to claim 1, wherein the synthetic data set is divided into a plurality of sub data sets of different recognition difficulty according to the recognition difficulty of the objects, and the recognition difficulty is measured by the number of objects.
6. The anti-occlusion object detection method for dense scenes based on a fisheye camera according to claim 5, wherein the data set is divided into 4 difficulty levels, with sub data set proportions of sub data set A : sub data set B : sub data set C : sub data set D = 0.15 : 0.35 : 0.35 : 0.15.
7. The anti-occlusion object detection method for dense scenes based on a fisheye camera according to claim 1, wherein the object detection network is cascadeCNN.
CN202210335755.1A 2022-03-31 2022-03-31 A fisheye camera-based object detection method for dense scenes with anti-occlusion Active CN114972170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210335755.1A CN114972170B (en) 2022-03-31 2022-03-31 A fisheye camera-based object detection method for dense scenes with anti-occlusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210335755.1A CN114972170B (en) 2022-03-31 2022-03-31 A fisheye camera-based object detection method for dense scenes with anti-occlusion

Publications (2)

Publication Number Publication Date
CN114972170A (en) 2022-08-30
CN114972170B (en) 2024-05-14

Family

ID=82976393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210335755.1A Active CN114972170B (en) 2022-03-31 2022-03-31 A fisheye camera-based object detection method for dense scenes with anti-occlusion

Country Status (1)

Country Link
CN (1) CN114972170B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115797375B (en) * 2023-02-06 2023-05-09 厦门农芯数字科技有限公司 Method, device and equipment for generating multiple groups of tag images based on fish eye images

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344864A (en) * 2018-08-24 2019-02-15 北京陌上花科技有限公司 Image processing method and device for dense objects
CN110991299A (en) * 2019-11-27 2020-04-10 中新国际联合研究院 An Adversarial Sample Generation Method for Face Recognition System in Physical Domain
CN113076904A (en) * 2021-04-15 2021-07-06 华南理工大学 Outdoor parking lot vacant parking space detection method based on deep learning
CN113177947A (en) * 2021-04-06 2021-07-27 广东省科学院智能制造研究所 Complex environment target segmentation method and device based on multi-module convolutional neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7728904B2 (en) * 2005-11-08 2010-06-01 Qualcomm Incorporated Skin color prioritized automatic focus control via sensor-dependent skin color detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344864A (en) * 2018-08-24 2019-02-15 北京陌上花科技有限公司 Image processing method and device for dense objects
CN110991299A (en) * 2019-11-27 2020-04-10 中新国际联合研究院 An Adversarial Sample Generation Method for Face Recognition System in Physical Domain
CN113177947A (en) * 2021-04-06 2021-07-27 广东省科学院智能制造研究所 Complex environment target segmentation method and device based on multi-module convolutional neural network
CN113076904A (en) * 2021-04-15 2021-07-06 华南理工大学 Outdoor parking lot vacant parking space detection method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Occluded pedestrian re-identification method based on multi-scale generative adversarial networks; Yang Wanxiang et al.; Journal of Software; 2020-07-31 (No. 07); pp. 17-32 *

Also Published As

Publication number Publication date
CN114972170A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN114627502B (en) A target recognition and detection method based on improved YOLOv5
CN110033475B (en) A method for detecting and eliminating moving objects in aerial images generated by high-resolution textures
CN110276831B (en) Method and device for constructing three-dimensional model, equipment and computer-readable storage medium
CN109472193A (en) Method for detecting human face and device
CN107370910B (en) Minimum surround based on optimal exposure exposes set acquisition methods
CN108564120A (en) Feature Point Extraction Method Based on Deep Neural Network
CN108717704A (en) Method for tracking target, computer installation based on fish eye images and computer readable storage medium
CN111126205A (en) A method for aircraft target detection in optical remote sensing images based on rotational positioning network
CN116052222A (en) Cattle face recognition method for naturally collecting cattle face image
CN115620094B (en) Key point marking method and device, electronic equipment and storage medium
CN115908774B (en) Quality detection method and device for deformed materials based on machine vision
CN114972170B (en) A fisheye camera-based object detection method for dense scenes with anti-occlusion
Shu et al. Towards real-world HDR video reconstruction: A large-scale benchmark dataset and a two-stage alignment network
CN109360176A (en) Image processing method, apparatus, electronic device, and computer-readable storage medium
CN115331014A (en) Machine vision-based pointer instrument reading method and system and storage medium
CN113159035B (en) Image processing method, device, equipment and storage medium
CN104104911B (en) Timestamp in panoramic picture generating process is eliminated and remapping method and system
CN114399803A (en) Face key point detection method and device
CN114299371A (en) Method, system, device and medium for certificate recognition model training and certificate recognition
CN111127587B (en) Reference-free image quality map generation method based on countermeasure generation network
CN116433822B (en) Neural radiation field training method, device, equipment and medium
CN112884642B (en) Real-time facial aging simulation method based on face recognition technology
CN115376119B (en) License plate recognition method and device, license plate recognition equipment and storage medium
CN117152000B (en) Method, device and application for preparing rainy day image-clear background paired data set
CN116168066B (en) Building three-dimensional point cloud registration preprocessing method based on data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20220830

Assignee: Wei'anxin (Guangzhou) Technology Co.,Ltd.

Assignor: SOUTH CHINA University OF TECHNOLOGY

Contract record no.: X2025980003842

Denomination of invention: A method for object detection in dense scenes with anti occlusion based on fisheye camera

Granted publication date: 20240514

License type: Common License

Record date: 20250218

Application publication date: 20220830

Assignee: CHARTU TECHNOLOGIES Co.,Ltd.

Assignor: SOUTH CHINA University OF TECHNOLOGY

Contract record no.: X2025980003802

Denomination of invention: A method for object detection in dense scenes with anti occlusion based on fisheye camera

Granted publication date: 20240514

License type: Common License

Record date: 20250218

EE01 Entry into force of recordation of patent licensing contract
OL01 Intention to license declared
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20220830

Assignee: Hongzang Zhiming (Guangdong) International Holdings Co.,Ltd.

Assignor: SOUTH CHINA University OF TECHNOLOGY

Contract record no.: X2025980011820

Denomination of invention: A method for object detection in dense scenes with anti occlusion based on fisheye camera

Granted publication date: 20240514

License type: Common License

Record date: 20250626

Application publication date: 20220830

Assignee: Qingyuan Renren Science and Technology Electronics Co.,Ltd.

Assignor: SOUTH CHINA University OF TECHNOLOGY

Contract record no.: X2025980011786

Denomination of invention: A method for object detection in dense scenes with anti occlusion based on fisheye camera

Granted publication date: 20240514

License type: Common License

Record date: 20250626

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20220830

Assignee: GUANGZHOU RUICHENG INFORMATION TECHNOLOGY CO.,LTD.

Assignor: SOUTH CHINA University OF TECHNOLOGY

Contract record no.: X2025980012499

Denomination of invention: A method for object detection in dense scenes with anti occlusion based on fisheye camera

Granted publication date: 20240514

License type: Common License

Record date: 20250703