CN111783753B - Person Re-identification Method Based on Semantically Consistent Horizontal Bars and Foreground Modification - Google Patents
- Publication number
- CN111783753B (application CN202010918791.1A)
- Authority
- CN
- China
- Prior art keywords
- feature
- pedestrian
- image
- foreground
- input image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The invention belongs to the fields of computer vision and pattern recognition, and specifically relates to a person re-identification method based on semantically consistent horizontal bars and foreground correction, aiming to solve the poor robustness of existing person re-identification methods. The method comprises: acquiring an image to be recognized as the input image; extracting the features of the input image as the first feature; based on the first feature, obtaining through the row classifier of the person re-identification model the foreground features of the pedestrian in the input image as the second feature, and the features of the horizontal-bar region of each predefined body part of the pedestrian as the third feature; multiplying the second feature and the third feature point-to-point and concatenating the result with the first feature to obtain the fourth feature; computing and sorting the Euclidean distances between the fourth feature and the corresponding features of the images in the gallery, and outputting the ranking as the re-identification result. The invention improves the robustness of person re-identification.
Description
Technical Field
The invention belongs to the fields of computer vision and pattern recognition, and in particular relates to a person re-identification method, system, and device based on semantically consistent horizontal bars and foreground correction.
Background Art
Person re-identification is a sub-problem of image retrieval. Given an image of a pedestrian, the re-identification task aims to find images of the same pedestrian in other scenes. However, due to viewpoint changes, pose differences, and occlusion by objects, body parts may appear anywhere in the image. It is therefore important to learn a method that can reliably locate each body part and extract sufficiently discriminative part features.
Existing part-alignment-based person re-identification methods fall roughly into four categories: methods based on horizontal bars, on bounding boxes, on attention, and on additional semantic information. Among these, horizontal-bar methods are especially popular for their simplicity and relatively high performance; representative examples include PCB, MGN, and Pyramid. PCB (Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, Shengjin Wang. Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline). ECCV, 2018) first proposed dividing the pedestrian image into horizontal bars of equal height, average-pooling each bar into a feature, and computing a loss for each bar separately. MGN (Guanshuo Wang, Yufeng Yuan, Xiong Chen, Jiwei Li. Learning Discriminative Features with Multiple Granularities for Person Re-identification. ACM MM, 2018) and Pyramid (Zheng F, Deng C, Sun X, et al. Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training. CVPR, 2019) build on PCB by designing multi-granularity, overlapping horizontal bars, greatly improving robustness. However, none of these methods solves the following two problems. (1) The height and position of the horizontal bars are fixed: because of pose differences, viewpoint changes, and occlusion, the semantics within each bar cannot be guaranteed to be consistent, yet the above methods use fixed bars and make no attempt to solve this. (2) Interference from background noise: the interior of each horizontal bar inevitably contains background information, and no existing method removes the background noise inside a bar. On this basis, the present invention proposes a person re-identification method based on semantically consistent horizontal bars and foreground correction.
Summary of the Invention
To solve the above problems in the prior art, namely the poor re-identification robustness of existing methods caused by horizontal bars of fixed height and position and by unremoved background noise, the present invention proposes a person re-identification method based on semantically consistent horizontal bars and foreground correction, comprising:
Step S10: acquiring an image to be recognized as the input image;
Step S20: extracting the features of the input image through the feature extraction layer of the person re-identification model, as the first feature;
Step S30: based on the first feature, obtaining, through the pre-trained row classifier of the person re-identification model, the foreground features of the pedestrian in the input image as the second feature, and the features of the horizontal-bar region of each predefined body part of the pedestrian as the third feature;
Step S40: multiplying the second feature and the third feature point-to-point and concatenating the result with the first feature to obtain the fourth feature;
Step S50: computing and sorting the Euclidean distances between the fourth feature and the corresponding features of the images in the gallery, and outputting the ranking as the re-identification result;
wherein the person re-identification model is built on a deep convolutional neural network, and the row classifier is built from a fully connected layer and a softmax layer.
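As an illustration, steps S20 through S50 can be sketched in NumPy. All shapes, function names, and pooling choices below are hypothetical stand-ins, not the patent's actual implementation:

```python
import numpy as np

def fuse_features(global_feat, fg_conf, part_confs, eps=1e-8):
    """Step S40 sketch: weight the feature map by (part confidence x
    foreground confidence) point-to-point, pool each weighted map into a
    part vector, and concatenate the parts with the global vector.
    global_feat: (H, W, C); fg_conf: (H, W); part_confs: (P, H, W)."""
    gvec = global_feat.mean(axis=(0, 1))          # global descriptor (first feature)
    parts = []
    for pc in part_confs:
        w = pc * fg_conf                          # foreground-corrected part mask
        parts.append((global_feat * w[..., None]).sum(axis=(0, 1)) / (w.sum() + eps))
    return np.concatenate([gvec] + parts)

def rank_gallery(query_feat, gallery_feats):
    """Step S50 sketch: sort gallery images by Euclidean distance to the
    query's fused feature (ascending; most similar first)."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(dists), dists

# toy example with hypothetical sizes: 4x3 feature map, 8 channels, 2 parts
rng = np.random.default_rng(0)
fmap = rng.random((4, 3, 8))
query = fuse_features(fmap, rng.random((4, 3)), rng.random((2, 4, 3)))
order, dists = rank_gallery(query, rng.random((5, query.size)))
```

Here `fuse_features` plays the role of the point-to-point multiplication and concatenation of step S40, and `rank_gallery` the distance sorting of step S50.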
In some preferred embodiments, the row classifier is trained as follows:
Step A10: acquiring a set of training sample images;
Step A20: for any image in the training sample set, extracting its row features and pooling them to obtain the corresponding average features;
Step A30: judging whether the current iteration number M is a multiple of N; if so, executing step A40, otherwise jumping to step A50, where N and M are natural numbers;
Step A40: extracting the row features of each image in the training sample set, obtaining the pseudo-label of each predefined body part through self-similar clustering, and executing step A50;
Step A50: computing the loss between the local features obtained in step A20 and the pseudo-labels, and updating the parameters of the row classifier.
In some preferred embodiments, the self-similar clustering is the k-means clustering method.
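A toy sketch of how steps A20 and A40 could generate row pseudo-labels, assuming the stated k-means clustering. The hand-rolled k-means with farthest-point initialization and the top-to-bottom relabeling are illustrative stand-ins, not the patent's actual implementation:

```python
import numpy as np

def kmeans(x, k, iters=20):
    """Minimal k-means with deterministic farthest-point initialization;
    it stands in for any library implementation of the clustering step."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def row_pseudo_labels(row_feats, num_parts):
    """Cluster the per-row average features, then rename the clusters so
    part IDs run top-to-bottom, matching the text's assignment of
    semantics by position from top to bottom."""
    raw = kmeans(row_feats, num_parts)
    mean_row = [np.nonzero(raw == j)[0].mean() for j in range(num_parts)]
    remap = {j: rank for rank, j in enumerate(np.argsort(mean_row))}
    return np.array([remap[j] for j in raw])
```

With two well-separated groups of row features, the upper rows receive part ID 0 and the lower rows part ID 1, regardless of the arbitrary cluster IDs k-means happens to produce.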
In some preferred embodiments, "obtaining, through the pre-trained row classifier of the person re-identification model, the foreground features of the pedestrian in the input image as the second feature" is performed as follows:
obtaining, through the row classifier, the confidence of each pixel of the input image with respect to human-foreground semantics;
taking pixels whose confidence is greater than a first preset threshold as foreground pixels, and pixels whose confidence is less than a second preset threshold as background pixels;
taking the features constructed from the extracted foreground pixels as the foreground features of the pedestrian in the input image.
In some preferred embodiments, "obtaining, through the pre-trained row classifier of the person re-identification model, the features of the horizontal-bar region of each predefined body part of the pedestrian in the input image as the third feature" is performed as follows:
performing semantic segmentation of the input image with the row classifier to obtain a confidence map of the horizontal-bar region of each predefined body part of the pedestrian;
performing a point-to-point product of each confidence map with the first feature to obtain the features of the horizontal-bar region of each predefined body part of the pedestrian in the input image.
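The two operations above — thresholding the foreground confidence into foreground/background pixels, and weighting the first feature with each part's confidence map — might be sketched as follows (threshold values and array sizes are illustrative):

```python
import numpy as np

def foreground_mask(fg_conf, t_fg=0.5, t_bg=0.5):
    """Second-feature sketch: pixels above t_fg count as foreground,
    pixels below t_bg as background (threshold values are illustrative)."""
    return fg_conf > t_fg, fg_conf < t_bg

def part_feature_maps(first_feature, part_confs):
    """Third-feature sketch: point-to-point product of each part
    confidence map (H, W) with the feature map (H, W, C)."""
    return part_confs[:, :, :, None] * first_feature[None]

rng = np.random.default_rng(1)
fmap = rng.random((4, 3, 8))      # toy first feature
confs = rng.random((5, 4, 3))     # five body-part confidence maps
weighted = part_feature_maps(fmap, confs)
fg, bg = foreground_mask(rng.random((4, 3)))
```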
In some preferred embodiments, the loss function of the person re-identification model during training is:

$$\mathcal{L}=\frac{1}{N}\sum_{a\in\mathcal{B}}\left[\alpha+\max_{p\in A}d\left(f_{a},f_{p}\right)-\min_{n\in B}d\left(f_{a},f_{n}\right)\right]_{+}$$

where $\mathcal{L}$ is the loss value of the person re-identification model, $N$ is the number of training sample images in a batch during training, $\mathcal{B}$ is the batch, $a$ is any image in the batch of training sample images, $p$ is the training sample image in set $A$ whose features have the largest Euclidean distance to the features of $a$, $n$ is the training sample image in set $B$ whose features have the smallest Euclidean distance to the features of $a$, $\alpha$ is the preset distance margin, $A$ is the image set containing all images with the same ID as $a$, $B$ is the image set of all images in the current batch except those contained in $A$, and $d(\cdot,\cdot)$ is the Euclidean distance.
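The loss described by the symbol definitions above has the shape of a batch-hard triplet loss; a minimal NumPy sketch under that reading (the margin value is illustrative):

```python
import numpy as np

def batch_hard_triplet_loss(feats, ids, margin=0.3):
    """Per anchor: hinge of (margin + hardest-positive distance -
    hardest-negative distance), averaged over the batch."""
    d = np.linalg.norm(feats[:, None] - feats[None], axis=-1)  # pairwise distances
    same = ids[:, None] == ids[None]
    loss = 0.0
    for a in range(len(feats)):
        hardest_pos = d[a][same[a]].max()    # set A: same ID (d[a,a]=0 is harmless)
        hardest_neg = d[a][~same[a]].min()   # set B: rest of the batch
        loss += max(0.0, margin + hardest_pos - hardest_neg)
    return loss / len(feats)
```

When identities are well separated the hinge is inactive and the loss is zero; when a negative sits closer than the margin allows, the loss grows linearly with the violation.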
In a second aspect, the present invention proposes a person re-identification system based on semantically consistent horizontal bars and foreground correction, the system comprising an image acquisition module, a global feature extraction module, a local feature extraction module, a feature concatenation module, and a recognition output module;
the image acquisition module is configured to acquire an image to be recognized as the input image;
the global feature extraction module is configured to extract the features of the input image through the feature extraction layer of the person re-identification model, as the first feature;
the local feature extraction module is configured to, based on the first feature, obtain through the pre-trained row classifier of the person re-identification model the foreground features of the pedestrian in the input image as the second feature, and the features of the horizontal-bar region of each predefined body part of the pedestrian as the third feature;
the feature concatenation module is configured to multiply the second feature and the third feature point-to-point and concatenate the result with the first feature to obtain the fourth feature;
the recognition output module is configured to compute and sort the Euclidean distances between the fourth feature and the corresponding features of the images in the gallery, and to output the ranking as the re-identification result;
wherein the person re-identification model is built on a deep convolutional neural network, and the row classifier is built from a fully connected layer and a softmax layer.
In a third aspect, the present invention proposes a storage device storing a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above person re-identification method based on semantically consistent horizontal bars and foreground correction.
In a fourth aspect, the present invention proposes a processing device comprising a processor adapted to execute programs and a storage device adapted to store a plurality of programs, the programs being adapted to be loaded and executed by the processor to implement the above person re-identification method based on semantically consistent horizontal bars and foreground correction.
Beneficial effects of the present invention:
The present invention improves the robustness of person re-identification. The pre-trained row classifier assigns each row to a specific semantic part, forming semantically consistent horizontal bars; the height and position of the bars are adjusted adaptively to ensure that the semantics contained within each bar are consistent, solving the semantic-consistency problem of horizontal bars.
At the same time, each pixel is also assigned foreground or background semantics. By taking the intersection of the horizontal-bar semantics and the foreground region, the position of each body part is obtained approximately, removing the interference of background information and improving both the localization accuracy of each part and the discriminative power of the local features.
Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments made with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a person re-identification method based on semantically consistent horizontal bars and foreground correction according to an embodiment of the present invention;
FIG. 2 is a schematic framework diagram of a person re-identification system based on semantically consistent horizontal bars and foreground correction according to an embodiment of the present invention;
FIG. 3 is a simplified structural diagram of a person re-identification method based on semantically consistent horizontal bars and foreground correction according to an embodiment of the present invention;
FIG. 4 is a schematic comparison between the row classifier of the present invention and an existing row classifier with horizontal bars of fixed height and position.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, where there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
A person re-identification method based on semantically consistent horizontal bars and foreground correction according to a first embodiment of the present invention comprises, as shown in FIG. 1, the following steps:
Step S10: acquiring an image to be recognized as the input image;
Step S20: extracting the features of the input image through the feature extraction layer of the person re-identification model, as the first feature;
Step S30: based on the first feature, obtaining, through the pre-trained row classifier of the person re-identification model, the foreground features of the pedestrian in the input image as the second feature, and the features of the horizontal-bar region of each predefined body part of the pedestrian as the third feature;
Step S40: multiplying the second feature and the third feature point-to-point and concatenating the result with the first feature to obtain the fourth feature;
Step S50: computing and sorting the Euclidean distances between the fourth feature and the corresponding features of the images in the gallery, and outputting the ranking as the re-identification result;
wherein the person re-identification model is built on a deep convolutional neural network, and the row classifier is built from a fully connected layer and a softmax layer.
To describe the person re-identification method based on semantically consistent horizontal bars and foreground correction of the present invention more clearly, each step of an embodiment of the method is detailed below.
In the following embodiments, the training process of the person re-identification model is described first, and then the process of obtaining the re-identification result with the method based on semantically consistent horizontal bars and foreground correction is described.
1. Training process of the person re-identification model
Step B10: pre-training the person re-identification model.
In the present invention, the person re-identification model is built on a deep convolutional neural network, preferably the HRNet proposed in "Sun K, Xiao B, Liu D, et al. Deep High-Resolution Representation Learning for Human Pose Estimation. 2019." HRNet contains multi-scale semantic information and is well suited as a shared network for human semantic parsing and person re-identification. The person re-identification model is shown in FIG. 3, where the neural network model refers to the convolutional neural network used to extract features, and the aligned features of the pedestrian's parts denote the feature obtained by concatenating the features of the body parts in order, as explained below.
In this embodiment, the ImageNet dataset is used to pre-train the person re-identification model and initialize its network parameters. During pre-training, the selected sample images are compressed to a fixed size, 64 images are input per iteration, and training runs for 6000 iterations. In other embodiments, the number of pre-training iterations and the number of sample images input per iteration may be chosen according to actual requirements.
Step B20: acquiring a set of training sample images.
In this embodiment, training sample images containing pedestrians are acquired to construct the training sample image set.
Step B30: extracting the features of each training sample image in the set, as global features.
In this embodiment, the features of the training sample images are extracted as global features through the feature extraction layer of the person re-identification model, a feature extraction layer built on a convolutional neural network. The feature map finally output by HRNet has size 64×32×2048; in the present invention, this output is reduced to 64×32×512 by a 1×1 convolution, after which the row-division and foreground-correction operations are performed.
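Since a 1×1 convolution is simply a per-pixel linear map over the channel axis, the 2048-to-512 channel reduction can be illustrated with small stand-in sizes:

```python
import numpy as np

def conv1x1(fmap, weight, bias=None):
    """A 1x1 convolution as a per-pixel linear map over channels:
    (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)."""
    out = fmap @ weight
    return out if bias is None else out + bias

# toy sizes standing in for the real 64x32x2048 -> 64x32x512 reduction
rng = np.random.default_rng(2)
x = rng.random((4, 3, 6))
w = rng.random((6, 2))
y = conv1x1(x, w)
```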
Step B40: based on the global features, obtaining, through the pre-trained row classifier of the person re-identification model, the foreground features of the pedestrian and the features of the horizontal-bar region of each predefined body part of the pedestrian in each training sample image.
The present invention proposes a classifier with semantically consistent horizontal bars and foreground correction: the pedestrian image is first divided into semantically consistent horizontal bars, and the background inside each bar is then removed. The network contains a semantically-consistent-horizontal-bar module and a foreground-correction module. The former generates pseudo-labels for the horizontal bars by iterative clustering and uses them to guide the learning of the bar division; the latter obtains a foreground response map with the learned row classifier and uses that map to guide the division into foreground and background. Finally, an effective pedestrian feature description is obtained by combining global and local features. The details are as follows:
In this embodiment, the division into semantically consistent horizontal bars, i.e., row division, mainly relies on a row classifier built from a fully connected layer and a softmax layer, which can assign each row to a different semantic class. First, a pooling operation is applied to each row of the training sample image (row unit pooling in FIG. 3) to obtain the average feature of each row, i.e., the row feature; the row classifier then classifies each row's average feature, and the class a row is assigned to represents the semantic part of that row. In other words, the row classifier performs semantic segmentation of the predefined body-part regions of the pedestrian in the training sample image, producing a confidence map for each predefined body-part region; each confidence map is multiplied point-to-point with the global features to form weighted feature maps corresponding to different body parts (the features of the horizontal-bar (row) region of each predefined part of the pedestrian).
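Row unit pooling followed by the fully connected + softmax row classifier might be sketched as follows (toy sizes stand in for the real 64-row, 512-channel feature map and its part classes):

```python
import numpy as np

def row_unit_pooling(fmap):
    """Average the feature map over the width axis, (H, W, C) -> (H, C):
    one mean feature per image row."""
    return fmap.mean(axis=1)

def classify_rows(row_feats, W, b):
    """Fully connected layer + softmax: per-row scores over semantic parts."""
    z = row_feats @ W + b
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
rows = row_unit_pooling(rng.random((4, 3, 6)))                # 4 rows, 6 channels
probs = classify_rows(rows, rng.random((6, 5)), np.zeros(5))  # 5 part classes
```

Each row of `probs` is that image row's confidence over the semantic parts; stacking a part's column over all rows yields the confidence map used for the point-to-point weighting.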
In the present invention, horizontal-bar features are preferably obtained for five body parts of the pedestrian, corresponding to the head, chest, abdomen, legs, and feet, and denoted M1, M2, M3, M4, and M5. As shown in Figure 4, panels (a), (b), and (c) are feature maps obtained by existing schemes whose horizontal bars have fixed heights and positions. It can be seen that the feature maps obtained with existing human parsing models (i.e., the spliced feature maps) fail to exploit useful cues such as backpacks, which degrades performance.
During training, the row classifier uses an iterative clustering method to assign a pseudo-label to each row. That is, every n training stages, the feature means of each row of the image (the horizontal-bar region of the set pedestrian part) are clustered, and semantics are then assigned by position from top to bottom. In the subsequent training process, the assigned semantic pseudo-labels are used to supervise the learning of the row classifier. In this way, the present invention can adaptively divide the different semantic parts of an image into horizontal bars with consistent semantics. The training procedure of the row classifier is as follows:
Step B41: for any image in the training sample image set, extract its row features and pool them to obtain the corresponding average features;
Step B42: judge whether the current iteration number M is a multiple of N; if so, execute step B43, otherwise jump to step B44. Here N and M are natural numbers, and the current iteration number M is also the current iteration number of the person re-identification training model;
Step B43: extract the average features of all training sample images in the training sample set, obtain the pseudo-label corresponding to each row through self-similarity clustering and update it, then execute step B44;
Step B44: compute the loss between the average features obtained in step B41 and the updated pseudo-labels, and update the parameters of the row classifier;
Step B45: execute steps B41-B44 in a loop until a trained row classifier is obtained.
In the present invention, k-means is used as the self-similarity clustering algorithm.
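The pseudo-label refresh of steps B42-B43 can be sketched as follows; the tiny `kmeans` helper, the toy data, and all names and shapes are illustrative assumptions, not the patent's code:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Tiny k-means; returns one cluster label per sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def assign_row_pseudo_labels(row_feats, num_parts):
    """Sketch of steps B42-B43: cluster the row features of the whole batch,
    then relabel the clusters by their mean vertical position so that label 0
    is the topmost semantic part and label num_parts-1 the bottommost."""
    B, H, C = row_feats.shape
    labels = kmeans(row_feats.reshape(-1, C), num_parts).reshape(B, H)
    mean_row = [np.nonzero(labels == j)[1].mean() if (labels == j).any() else np.inf
                for j in range(num_parts)]
    remap = np.empty(num_parts, dtype=int)
    remap[np.argsort(mean_row)] = np.arange(num_parts)
    return remap[labels]  # pseudo-label per row, shape (B, H)

# toy batch: 4 images, 25 rows each, rows drawn from 5 well-separated prototypes
rng = np.random.default_rng(2)
protos = 5.0 * rng.standard_normal((5, 256))
band = np.repeat(np.arange(5), 5)                    # 5 rows per body part
row_feats = protos[band][None] + 0.1 * rng.standard_normal((4, 25, 256))
pseudo = assign_row_pseudo_labels(row_feats, num_parts=5)
print(pseudo.shape)  # (4, 25)
```

In subsequent iterations (step B44) these pseudo-labels would supervise a cross-entropy loss on the row classifier; that part is omitted here.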
After the row classifier obtains the local features of each set part of the pedestrian, in order to further remove background pixels inside the horizontal bar of each set part and reduce noise interference, the present invention designs a foreground-guided part refinement method: a foreground-background classifier is added to predict whether each pixel of the training sample image belongs to the foreground or the background. Since the row classifier has already been learned, the present invention preferably uses it to estimate each pixel's confidence for each set part of the human body; pixels with confidence greater than 0.8 are preferably taken as foreground pixels, pixels with confidence less than 0.2 as background pixels, and the remainder as neutral pixels (the "neutral" in Figure 3). The features constructed from the extracted foreground pixels serve as the foreground features of the pedestrian in the training sample image (i.e., foreground/background features are extracted per pixel in Figure 3).
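A minimal sketch of the foreground/background/neutral split under the stated 0.8 and 0.2 thresholds; the function names and toy data are hypothetical:

```python
import numpy as np

def split_foreground(confidence, hi=0.8, lo=0.2):
    """Per-pixel split using the thresholds from the text: foreground if the
    part confidence exceeds hi, background if below lo, neutral otherwise."""
    fg = confidence > hi
    bg = confidence < lo
    return fg, bg, ~fg & ~bg

def foreground_feature(feat_map, fg_mask):
    """Average the backbone features over foreground pixels only."""
    if not fg_mask.any():
        return np.zeros(feat_map.shape[-1])
    return feat_map[fg_mask].mean(axis=0)

conf = np.random.default_rng(3).random((24, 8))                # toy confidences
feat = np.random.default_rng(4).standard_normal((24, 8, 256))  # toy features
fg, bg, neutral = split_foreground(conf)
print(foreground_feature(feat, fg).shape)  # (256,)
```

The three masks partition the pixel grid, so every pixel is assigned exactly one of the three roles.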
Step B50: feature splicing
In this embodiment, M1-M5 are first compressed by global feature pooling into five 256-dimensional vectors, denoted S1-S5. M1-M5 are then added together to obtain the foreground feature map, which is compressed by average pooling into a 256-dimensional vector, denoted S6. The global feature is likewise compressed by average pooling into a 256-dimensional vector, denoted S7; this feature vector conveys the overall abstract features well. Finally, the above feature vectors are spliced to obtain a 7×256-dimensional feature that represents the fused pedestrian feature.
Alternatively, S6 can be obtained directly from the row classifier as the foreground feature of each training sample image; in that case, S1-S5 are multiplied point-to-point with S6 and then spliced with S7 to represent the fused pedestrian feature.
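The splicing of step B50 can be sketched as follows, assuming toy shapes; `splice_features` and all names are illustrative, not from the patent:

```python
import numpy as np

def splice_features(part_maps, global_map):
    """Sketch: combine part, foreground and global features into one 7x256
    descriptor. part_maps: (5, H, W, 256) weighted maps M1..M5;
    global_map: (H, W, 256) backbone output."""
    S_parts = part_maps.mean(axis=(1, 2))             # S1..S5, each 256-d
    fg_map = part_maps.sum(axis=0)                    # summed foreground map
    S6 = fg_map.mean(axis=(0, 1))                     # 256-d foreground feature
    S7 = global_map.mean(axis=(0, 1))                 # 256-d global feature
    return np.concatenate([S_parts.ravel(), S6, S7])  # 7 * 256 = 1792-d

rng = np.random.default_rng(5)
desc = splice_features(rng.standard_normal((5, 24, 8, 256)),
                       rng.standard_normal((24, 8, 256)))
print(desc.shape)  # (1792,)
```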
Step B60: compute the Euclidean distances between the spliced feature and the corresponding features of the images in the image library, sort them, and output the sorted result as the re-identification result.
In this embodiment, the Euclidean distances between the spliced pedestrian feature and the corresponding features of the images in the image library are computed and sorted in ascending order; higher matching rates at Rank-1 (first place) and near the top of the ranking indicate better performance on the target re-identification task. The image library is a database storing multiple pedestrian images.
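A minimal sketch of the Euclidean-distance ranking of step B60 (names and toy data are hypothetical):

```python
import numpy as np

def rank_gallery(query, gallery):
    """Sort gallery descriptors by ascending Euclidean distance to the query;
    index 0 of the returned order is the Rank-1 match."""
    dists = np.linalg.norm(gallery - query, axis=1)
    order = np.argsort(dists)
    return order, dists[order]

rng = np.random.default_rng(6)
gallery = rng.standard_normal((10, 1792))              # toy image-library features
query = gallery[3] + 0.01 * rng.standard_normal(1792)  # near-duplicate of entry 3
order, dists = rank_gallery(query, gallery)
print(order[0])  # 3
```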
Based on the re-identification results, the present invention uses the triplet loss to supervise the training of the entire network. The core idea of this loss is to separate mismatched pedestrian pairs from matched pedestrian pairs by a distance margin, so as to increase inter-class differences and reduce intra-class differences, as shown in formula (1):
$$L_{tri} = \frac{1}{P}\sum_{a \in \text{batch}} \left[ \max_{p \in A} D(f_a, f_p) - \min_{n \in B} D(f_a, f_n) + m \right]_+ \tag{1}$$

where $L_{tri}$ denotes the loss value of the person re-identification model; $P$ denotes the number of training sample images in one batch during training; $\text{batch}$ denotes the batch; $a$ denotes any image in a batch of training sample images; $p$ denotes the training sample image in image set $A$ whose feature has the largest Euclidean distance to the feature of $a$, i.e., the least similar positive sample; $n$ denotes the training sample image in image set $B$ whose feature has the smallest Euclidean distance to the feature of $a$, i.e., the most similar negative sample; $(a, p, n)$ constitutes a triplet; $m$ denotes the preset distance margin; $A$ denotes the image set containing all images with the same ID as $a$; $B$ denotes the image set built from all images in the current batch except those contained in $A$; $D(\cdot,\cdot)$ denotes the Euclidean distance; and $[\cdot]_+ = \max(\cdot, 0)$.
The network parameters of the person re-identification model are updated based on the above loss, and the process jumps back to step B20 until a trained person re-identification model is obtained.
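Formula (1) corresponds to a batch-hard triplet loss; a minimal sketch under that reading (all names are this editor's, and the toy batch is illustrative):

```python
import numpy as np

def batch_hard_triplet_loss(feats, ids, margin=0.3):
    """Sketch of formula (1): for each anchor a, take the least similar positive
    (largest distance within set A, same ID) and the most similar negative
    (smallest distance within set B, different ID), hinge with margin m, and
    average over the P images of the batch."""
    d = np.linalg.norm(feats[:, None] - feats[None], axis=-1)  # pairwise distances
    same = ids[:, None] == ids[None]
    total = 0.0
    for a in range(len(feats)):
        hardest_pos = d[a][same[a]].max()   # includes d[a,a] = 0, harmless for max
        hardest_neg = d[a][~same[a]].min()
        total += max(hardest_pos - hardest_neg + margin, 0.0)
    return total / len(feats)

feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])  # toy features
ids = np.array([0, 0, 1, 1])
print(batch_hard_triplet_loss(feats, ids))  # 0.0 (classes already separated)
```

When the two identities collapse onto the same point, each anchor pays exactly the margin, which matches the hinge in formula (1).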
2. Person re-identification method based on semantically consistent horizontal bars and foreground correction
Step S10: acquire the image to be recognized as the input image;
In this embodiment, the pedestrian image to be recognized is first acquired as the input image.
Step S20: extract the features of the input image through the feature extraction layer of the person re-identification model as the first feature;
In this embodiment, the global features of the pedestrian in the input image are obtained through the trained person re-identification model described above; that is, the features of the input image extracted by the feature extraction layer of the person re-identification model serve as the first feature.
Step S30: based on the first feature, obtain, through the pre-trained row classifier in the person re-identification model, the foreground features corresponding to the pedestrian in the input image as the second feature, and the features of the horizontal-bar region of each set part of the pedestrian in the input image as the third feature;
In this embodiment, the row classifier is used to obtain each pixel's confidence for the human foreground semantics in the input image; pixels with confidence greater than a first set threshold are taken as foreground pixels, and pixels with confidence less than a second set threshold as background pixels. The features constructed from the extracted foreground pixels serve as the foreground features of the pedestrian in the input image, i.e., the second feature.
The input image is semantically segmented by the row classifier to obtain a confidence map for the horizontal-bar region of each set part of the pedestrian in the input image; each confidence map is multiplied point-to-point with the first feature to obtain the features of the horizontal-bar region of each set part of the pedestrian in the input image, which serve as the third feature.
Step S40: multiply the second feature point-to-point with the third feature and splice the result with the first feature to obtain the fourth feature;
In this embodiment, the acquired pedestrian features are spliced together.
Step S50: compute the Euclidean distances between the fourth feature and the corresponding features of the images in the image library, sort them, and output the sorted result as the re-identification result.
In this embodiment, the Euclidean distances between the spliced fourth feature and the features corresponding to the pedestrian images in the image library are computed and sorted, and the sorted result is output as the re-identification result. The present invention preferably sorts in ascending order; a higher position in the ranking indicates a higher matching rate.
A person re-identification system based on semantically consistent horizontal bars and foreground correction according to the second embodiment of the present invention, as shown in Figure 2, includes: an image acquisition module 100, a global feature extraction module 200, a local feature extraction module 300, a feature splicing module 400, and a recognition output module 500.
The image acquisition module 100 is configured to acquire the image to be recognized as the input image.
The global feature extraction module 200 is configured to extract the features of the input image through the feature extraction layer of the person re-identification model as the first feature.
The local feature extraction module 300 is configured to, based on the first feature, obtain through the pre-trained row classifier in the person re-identification model the foreground features corresponding to the pedestrian in the input image as the second feature, and the features of the horizontal-bar region of each set part of the pedestrian in the input image as the third feature.
The feature splicing module 400 is configured to multiply the second feature point-to-point with the third feature and splice the result with the first feature to obtain the fourth feature.
The recognition output module 500 is configured to compute the Euclidean distances between the fourth feature and the corresponding features of the images in the image library, sort them, and output the sorted result as the re-identification result.
The person re-identification model is built on a deep convolutional neural network; the row classifier is built from a fully connected layer and a softmax layer.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process and related description of the system described above, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here.
It should be noted that the person re-identification system based on semantically consistent horizontal bars and foreground correction provided by the above embodiments is only illustrated by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiments of the present invention may be further decomposed or combined. For example, the modules of the above embodiments may be merged into one module, or further split into multiple sub-modules, to accomplish all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and should not be regarded as improper limitations of the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, which are adapted to be loaded by a processor to implement the above person re-identification method based on semantically consistent horizontal bars and foreground correction.
A processing device according to a fourth embodiment of the present invention includes a processor adapted to execute programs and a storage device adapted to store a plurality of programs, the programs being adapted to be loaded and executed by the processor to implement the above person re-identification method based on semantically consistent horizontal bars and foreground correction.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes and related descriptions of the storage device and processing device described above, reference may be made to the corresponding processes in the foregoing method examples, which will not be repeated here.
Those skilled in the art should be aware that the modules and method steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two, and that the programs corresponding to the software modules and method steps can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field. In order to clearly illustrate the interchangeability of electronic hardware and software, the components and steps of each example have been described generally in terms of functionality in the foregoing description. Whether these functions are implemented in electronic hardware or in software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functionality for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The terms "first", "second", "third", etc. are used to distinguish similar objects, not to describe or indicate a particular order or sequence.
So far, the technical solutions of the present invention have been described with reference to the preferred embodiments shown in the accompanying drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from these changes or substitutions will fall within the protection scope of the present invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010918791.1A CN111783753B (en) | 2020-09-04 | 2020-09-04 | Person Re-identification Method Based on Semantically Consistent Horizontal Bars and Foreground Modification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111783753A CN111783753A (en) | 2020-10-16 |
CN111783753B true CN111783753B (en) | 2020-12-15 |
Family
ID=72762343
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
| TR01 | Transfer of patent right | Effective date of registration: 2024-06-19. Patentee before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES, 100190 No. 95 East Zhongguancun Road, Haidian District, Beijing, China. Patentee after: Zhongke Zidong Taichu (Beijing) Technology Co.,Ltd., 200-19, 2nd Floor, Building B, Wanghai Building, No.10 West Third Ring Middle Road, Haidian District, Beijing, 100036, China. |