
CN112801020B - Pedestrian re-identification method and system based on background grayscale - Google Patents


Info

Publication number
CN112801020B
CN112801020B
Authority
CN
China
Prior art keywords
bgg
stream
background
image
pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110174952.5A
Other languages
Chinese (zh)
Other versions
CN112801020A (en)
Inventor
黄立勤
杨庆庆
潘林
杨明静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202110174952.5A priority Critical patent/CN112801020B/en
Publication of CN112801020A publication Critical patent/CN112801020A/en
Application granted granted Critical
Publication of CN112801020B publication Critical patent/CN112801020B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a pedestrian re-identification method and system based on background graying. The method comprises the following steps: S1, apply background graying to the original image to obtain the processed BGg image; S2, build a two-branch network based on ResNet50, in which the background-gray stream (BGg-Stream) extracts features from the BGg image and the global stream (G-Stream) extracts features from the original image; S3, let the two branches interact with each other through cascading; S4, pass the feature maps produced by the two branches at each stage through the SCAB module and use the result as the input feature map of the next convolutional layer of the BGg-Stream, so that the two branches are linked and their features are combined; S5, update the network parameters with a triplet loss function, compute similarities, and finally output the ranking list. The method and system help to weaken background interference and improve the accuracy of pedestrian re-identification.

Description

Pedestrian re-identification method and system based on background grayscale

Technical Field

The invention belongs to the field of computer vision, and in particular relates to a pedestrian re-identification method and system based on background graying.

Background Art

Person re-identification (ReID) is a well-studied problem in computer vision whose goal is to retrieve images of a specific pedestrian from large collections of images captured by different cameras. It is one of the most fundamental visual recognition problems in video surveillance and has broad application prospects. Traditional ReID methods mostly rely on low-level features such as color and texture histograms, and use metric learning to find a distance function that minimizes the distance between images of the same class while maximizing the distance between different classes. ReID has been studied in academia for many years, but only in recent years, with the development of deep learning, have major breakthroughs been achieved. A problem that still cannot be ignored, however, is that deep convolutional features, being high-dimensional, are easily affected by factors such as human pose, object occlusion, variations in illumination, and background clutter.

Many methods have been proposed in the past few years to obtain more robust features, but they usually take the entire image directly as input. Such global information contains not only pedestrian features but also background clutter. Current methods that can effectively alleviate the influence of background clutter fall roughly into two categories: 1) methods based on body-region detection, which, for example, use pose and key-point estimation together with part-region detection to extract the human-body information in an image; and 2) methods based on human-body segmentation. In recent years, existing image segmentation methods such as FCN, Mask R-CNN and JPPNet have achieved excellent results in removing the background.

In practical, unconstrained settings, however, person re-identification remains an extremely challenging task. The core problem is how to extract features that are discriminative and robust to background clutter, because the non-pedestrian parts of the background strongly interfere with the features of the foreground.

Most existing methods for handling background interference are based on body-region detection or use segmentation to filter out the background, yet they still suffer from at least one of the following limitations. First, they require pre-training additional detection and segmentation models, as well as extra data-collection work. Second, the potential dataset bias between pose estimation and ReID can cause partition errors that destroy the original, complete body-shape characteristics of the pedestrian. Third, although the background is partially removed, it still surrounds the pedestrian and participates in model training with the same weight as the pedestrian region; by this analysis, these methods do not really offer a fundamental solution to the background-clutter problem. Fourth, hard segmentation not only destroys the original structure and smoothness of the image but also discards all background information. Some background information can serve as useful context, and ignoring all of it throws away clues that are relevant to the re-identification task.

Summary of the Invention

The purpose of the present invention is to provide a pedestrian re-identification method and system based on background graying that weaken background interference and improve the accuracy of pedestrian re-identification.

To achieve the above purpose, the technical solution adopted by the present invention is a pedestrian re-identification method based on background graying, comprising the following steps: S1, apply background graying to the original image while keeping the foreground information unchanged, obtaining the processed BGg image, i.e. the background-grayed image;

S2, build a two-branch network based on ResNet50: one branch, the background-gray stream BGg-Stream, extracts features from the background-grayed image obtained in step S1; the other branch, the global stream G-Stream, extracts features from the original image;

S3, let the two branches obtained in step S2 interact with each other through cascading;

S4, pass the feature maps produced by the two branches at each stage through the spatial-channel attention module, i.e. the SCAB module, and use the result as the input feature map of the next convolutional layer of the BGg-Stream, so that the two branches are linked and their features are combined;

S5, after feature extraction, update the network parameters with a triplet loss function, then compute similarities and finally output the ranking list.

Further, the input of the BGg-Stream is the BGg image, and the background-graying formula of the image is:

BGg(i,j) = 0.299×R(i,j) + 0.587×G(i,j) + 0.114×B(i,j)

where BGg(i,j) is the pixel value of the BGg image at row i, column j, and R, G and B are the three channels of the RGB image.

Further, the SCAB module comprises a channel attention module and a spatial attention module.

Further, the channel attention module is implemented as follows: given an input feature map Fi ∈ R^(C×H×W), average pooling is first used to aggregate the spatial information of the feature map, producing a spatial context descriptor Fi ∈ R^(C×1×1), which compresses the spatial information into the channels; the hidden activation size is then set to Fi ∈ R^(C/r×1×1) to reduce the parameter overhead, where r is the reduction ratio. The channel attention module is therefore expressed as:

Fii = σ(θ(R(ζ(δ(Fi)))))

F'i = Fii ⊗ Fi

where Fi is the BGg-Stream feature of each stage, σ is the sigmoid activation function, R is the ReLU activation function, δ denotes average pooling, θ and ζ denote two different fully connected layers, and ⊗ denotes element-wise multiplication.

Further, the spatial attention module is implemented as follows: for an input feature map Fi ∈ R^(C×H×W), where C is the total number of channels and H×W is the size of the feature map, the spatial attention module is expressed as:

Fiii = σ(C(Fi))

where σ is the sigmoid activation function, and the output of the spatial attention module is Fiii ∈ R^(1×H×W).

Further, a local-feature response maximization (LRM) strategy is applied in the test phase: during testing, the feature map is divided horizontally into an appropriate number n of regions, and for each part the feature with the maximum response is extracted as the feature of that part.

The present invention also provides a pedestrian re-identification system based on background graying, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor runs the computer program, the above method steps are implemented.

Compared with the prior art, the present invention has the following beneficial effects:

1. The present invention requires no additional training or datasets.

2. The present invention keeps the pedestrian's original body-shape information complete and effective.

3. The present invention can accurately locate the human-body region and processes the entire background, including the area immediately surrounding the pedestrian.

4. The present invention is not disturbed by strong colors in the background, yet still retains useful background information.

5. The present invention lets the model focus on learning the foreground information during training, further weakening background interference.

Brief Description of the Drawings

Fig. 1 is a flow chart of the method according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of the structure of the channel attention module in an embodiment of the present invention.

Fig. 3 is a schematic diagram of the structure of the spatial attention module in an embodiment of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the art to which this application belongs.

It should also be noted that the terminology used here serves only to describe specific embodiments and is not intended to limit the exemplary embodiments of this application. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well; furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.

As shown in Fig. 1, this embodiment provides a pedestrian re-identification method based on background graying, comprising the following steps:

S1. Use a mask to apply background graying to the original image while keeping the foreground information unchanged, obtaining the processed BGg image, i.e. the background-grayed image (an image whose foreground is the RGB image and whose background is a grayscale image).

S2. Build a two-branch network based on ResNet50: one branch, the background-gray stream BGg-Stream, extracts features from the background-grayed image obtained in step S1; the other branch, the global stream G-Stream, extracts features from the original image.

S3. Let the two branches obtained in step S2 interact with each other through cascading.

S4. Pass the feature maps produced by the two branches at each stage through the spatial-channel attention module, i.e. the SCAB module, and use the result as the input feature map of the next convolutional layer of the BGg-Stream, so that the two branches are linked and their features are combined.

S5. After feature extraction, update the network parameters with a triplet loss function, then compute similarities and finally output the ranking list.
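As an illustration of step S5, the triplet loss and the final ranking by feature similarity can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the margin value, the use of Euclidean distance, and the function names are illustrative choices, not the patent's exact implementation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss: pull same-identity features together and push
    different-identity features apart by at least `margin`.
    (margin=0.3 is an assumed value for illustration.)"""
    d_ap = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)   # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)

def rank_gallery(query, gallery):
    """Return gallery indices sorted by Euclidean distance to the query,
    i.e. the final ranking list of the method."""
    dists = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(dists)

# tiny demo with 2-D features
q = np.array([1.0, 0.0])
g = np.array([[0.9, 0.1],    # same identity, close
              [0.0, 1.0],    # different identity, far
              [1.0, 0.1]])   # same identity, closest
order = rank_gallery(q, g)   # ranking list: closest gallery entries first
```

During training, the anchor, positive and negative features would come from the two-stream network described above; at test time only the ranking step is needed.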

In this embodiment, the input of the BGg-Stream is the BGg image, and the background-graying formula of the image is:

BGg(i,j) = 0.299×R(i,j) + 0.587×G(i,j) + 0.114×B(i,j)

where BGg(i,j) is the pixel value of the BGg image at row i, column j, and R, G and B are the three channels of the RGB image.
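A minimal NumPy sketch of this graying operation, assuming a binary foreground mask is available (for example from an off-the-shelf segmentation model); the function name `background_gray` and the mask source are assumptions for illustration:

```python
import numpy as np

def background_gray(img, mask):
    """Gray out the background of an RGB image while keeping the masked
    foreground (the pedestrian) in full colour.

    img  : uint8 array of shape (H, W, 3), RGB order
    mask : bool array of shape (H, W), True where the pedestrian is
    """
    # Luminance per the formula above: 0.299 R + 0.587 G + 0.114 B
    gray = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    gray3 = np.repeat(gray[..., None], 3, axis=2)   # replicate into 3 channels
    out = np.where(mask[..., None], img, gray3)     # foreground kept, background grayed
    return out.astype(np.uint8)

# tiny demo: 2x2 image, only the top-left pixel is "foreground"
demo_img = np.array([[[200, 100, 50], [10, 20, 30]],
                     [[0, 0, 255],   [255, 255, 255]]], dtype=np.uint8)
demo_mask = np.array([[True, False], [False, False]])
demo_out = background_gray(demo_img, demo_mask)
```

The foreground pixel keeps its RGB values, while every background pixel becomes identical in its three channels, which is exactly the "foreground RGB, background grayscale" BGg image described above.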

The SCAB module comprises a channel attention module and a spatial attention module.

In this embodiment, the complete structure of the channel attention module is shown in Fig. 2. The channel attention module is implemented as follows: given an input feature map Fi ∈ R^(C×H×W), average pooling is first used to aggregate the spatial information of the feature map, producing a spatial context descriptor Fi ∈ R^(C×1×1), which compresses the spatial information into the channels; the hidden activation size is then set to Fi ∈ R^(C/r×1×1) to reduce the parameter overhead, where r is the reduction ratio. The channel attention module is therefore expressed as:

Fii = σ(θ(R(ζ(δ(Fi)))))

F'i = Fii ⊗ Fi

where Fi is the BGg-Stream feature of each stage, σ is the sigmoid activation function, R is the ReLU activation function, δ denotes average pooling, θ and ζ denote two different fully connected layers, and ⊗ denotes element-wise multiplication.
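The channel attention described above can be sketched in NumPy as follows. This is a hedged sketch: the matrices `W_down` and `W_up` stand in for the fully connected layers ζ and θ, and the random weights are only there to make the shapes concrete, not to reproduce trained behaviour.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W_down, W_up):
    """SE-style channel attention: Fii = sigmoid(theta(ReLU(zeta(avgpool(F))))),
    output = Fii (broadcast) * F.

    F      : feature map of shape (C, H, W)
    W_down : (C//r, C) weights of the squeeze layer (zeta)
    W_up   : (C, C//r) weights of the excite layer (theta)
    """
    s = F.mean(axis=(1, 2))                           # delta: global average pooling -> (C,)
    a = sigmoid(W_up @ np.maximum(W_down @ s, 0.0))   # sigma(theta(ReLU(zeta(s)))) -> (C,)
    return a[:, None, None] * F                       # channel-wise rescaling of F

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
F = rng.standard_normal((C, H, W))
out = channel_attention(F,
                        rng.standard_normal((C // r, C)),
                        rng.standard_normal((C, C // r)))
```

Because each gate lies in (0, 1), the module can only attenuate channels, never amplify them, which matches the role of Fii as an attention weight.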

In this embodiment, the complete structure of the spatial attention module is shown in Fig. 3. The spatial attention module is implemented as follows: for an input feature map Fi ∈ R^(C×H×W), where C is the total number of channels and H×W is the size of the feature map, the spatial attention module is expressed as:

Fiii = σ(C(Fi))

where σ is the sigmoid activation function, and the output of the spatial attention module is Fiii ∈ R^(1×H×W).
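A corresponding NumPy sketch of the spatial attention map. The patent writes the channel-collapsing operation only as C(·); here it is assumed, purely for illustration, to be a 1×1 convolution, i.e. a weighted sum over channels:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(F, conv_w):
    """Spatial attention: collapse the C channels into one map with a
    1x1 convolution (an assumed form of C(.)), then apply a sigmoid,
    yielding one gate per spatial position.

    F      : feature map of shape (C, H, W)
    conv_w : weights of shape (C,) for the assumed 1x1 convolution
    Returns the attention map Fiii of shape (1, H, W).
    """
    m = np.tensordot(conv_w, F, axes=([0], [0]))  # weighted channel sum -> (H, W)
    return sigmoid(m)[None, ...]                  # add the singleton channel axis

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 4, 4))
att = spatial_attention(F, rng.standard_normal(8))
```

The resulting map has shape 1×H×W with values in (0, 1), matching the stated output Fiii ∈ R^(1×H×W).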

In this embodiment, a local-feature response maximization (LRM) strategy is applied in the test phase: during testing, the feature map is divided horizontally into an appropriate number n of regions (n = 8 in this embodiment), and for each part the feature with the maximum response is extracted as the feature of that part.
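The LRM test-time strategy can be sketched as follows: a minimal NumPy sketch assuming the feature-map height is divisible by n, with per-channel maxima as the per-strip descriptor (the exact descriptor layout is an illustrative choice).

```python
import numpy as np

def lrm(feature_map, n=8):
    """Local-feature Response Maximization at test time: split the
    feature map into n horizontal strips and keep, per channel, the
    maximum response inside each strip.

    feature_map : array of shape (C, H, W) with H divisible by n
    Returns an array of shape (n, C): one descriptor per strip.
    """
    C, H, W = feature_map.shape
    strips = feature_map.reshape(C, n, H // n, W)  # carve H into n strips
    return strips.max(axis=(2, 3)).T               # max response per strip, per channel

# small demo: C=2 channels, H=8 rows, W=3 columns, split into n=4 strips
F = np.arange(2 * 8 * 3, dtype=float).reshape(2, 8, 3)
parts = lrm(F, n=4)
```

With monotonically increasing demo values, each strip's descriptor is simply the last (largest) entry of that strip in every channel, which makes the strip-wise maximum easy to verify by hand.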

This embodiment also provides a pedestrian re-identification system based on background graying, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor runs the computer program, the above method steps are implemented.

Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data-processing device to produce a machine, such that the instructions executed by that processor create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data-processing device to work in a particular manner, such that the instructions stored in that memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or another programmable data-processing device, causing a series of operational steps to be performed on it to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The above are only preferred embodiments of the present invention and do not limit the present invention to other forms. Any person skilled in the art may use the technical content disclosed above to make changes or modifications into equivalent embodiments. However, any simple modification, equivalent change or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims (4)

1. A pedestrian re-identification method based on background graying, characterized by comprising the following steps:

S1. Apply background graying to the original image while keeping the foreground information unchanged, obtaining the processed BGg image, i.e. the background-grayed image;

S2. Build a two-branch network based on ResNet50: one branch, the background-gray stream BGg-Stream, extracts features from the background-grayed image obtained in step S1; the other branch, the global stream G-Stream, extracts features from the original image;

S3. Let the two branches obtained in step S2 interact with each other through cascading;

S4. Pass the feature maps produced by the two branches at each stage through the spatial-channel attention module, i.e. the SCAB module, and use the result as the input feature map of the next convolutional layer of the BGg-Stream, so that the two branches are linked and their features are combined;

S5. After feature extraction, update the network parameters with a triplet loss function, then compute similarities and finally output the ranking list;

wherein the SCAB module comprises a channel attention module and a spatial attention module;

the channel attention module is implemented as follows: given an input feature map Fi ∈ R^(C×H×W), average pooling is first used to aggregate the spatial information of the feature map, producing a spatial context descriptor Fi ∈ R^(C×1×1), which compresses the spatial information into the channels; the hidden activation size is then set to Fi ∈ R^(C/r×1×1) to reduce the parameter overhead, where r is the reduction ratio; the channel attention module is therefore expressed as:

Fii = σ(θ(R(ζ(δ(Fi)))))

F'i = Fii ⊗ Fi

where Fi is the BGg-Stream feature of each stage, σ is the sigmoid activation function, R is the ReLU activation function, δ denotes average pooling, θ and ζ denote two different fully connected layers, and ⊗ denotes element-wise multiplication;

the spatial attention module is implemented as follows: for an input feature map Fi ∈ R^(C×H×W), where C is the total number of channels and H×W is the size of the feature map, the spatial attention module is expressed as:

Fiii = σ(C(Fi))

where σ is the sigmoid activation function, and the output of the spatial attention module is Fiii ∈ R^(1×H×W).

2. The pedestrian re-identification method based on background graying according to claim 1, characterized in that the input of the BGg-Stream is the BGg image, and the background-graying formula of the image is:

BGg(i,j) = 0.299×R(i,j) + 0.587×G(i,j) + 0.114×B(i,j)

where BGg(i,j) is the pixel value of the BGg image at row i, column j, and R, G and B are the three channels of the RGB image.

3. The pedestrian re-identification method based on background graying according to claim 1, characterized in that a local-feature response maximization (LRM) strategy is applied in the test phase: during testing, the feature map is divided horizontally into an appropriate number n of regions, and for each part the feature with the maximum response is extracted as the feature of that part.

4. A pedestrian re-identification system based on background graying, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor runs the computer program, the method steps of any one of claims 1-3 are implemented.
CN202110174952.5A 2021-02-09 2021-02-09 Pedestrian re-identification method and system based on background grayscale Active CN112801020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110174952.5A CN112801020B (en) 2021-02-09 2021-02-09 Pedestrian re-identification method and system based on background grayscale

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110174952.5A CN112801020B (en) 2021-02-09 2021-02-09 Pedestrian re-identification method and system based on background grayscale

Publications (2)

Publication Number Publication Date
CN112801020A CN112801020A (en) 2021-05-14
CN112801020B true CN112801020B (en) 2022-10-14

Family

ID=75814893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110174952.5A Active CN112801020B (en) 2021-02-09 2021-02-09 Pedestrian re-identification method and system based on background grayscale

Country Status (1)

Country Link
CN (1) CN112801020B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627241B (en) * 2021-06-29 2023-03-24 厦门市美亚柏科信息股份有限公司 Background suppression method and system for re-identification of pedestrians
CN114820687B (en) * 2022-05-31 2025-04-04 厦门市美亚柏科信息股份有限公司 A closed-loop end-to-end pedestrian re-identification background suppression method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232330A (en) * 2019-05-23 2019-09-13 复钧智能科技(苏州)有限公司 A kind of recognition methods again of the pedestrian based on video detection
CN111860309A (en) * 2020-07-20 2020-10-30 汪秀英 Face recognition method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10027952B2 (en) * 2011-08-04 2018-07-17 Trx Systems, Inc. Mapping and tracking system with features in three-dimensional space
US10523864B2 (en) * 2018-04-10 2019-12-31 Facebook, Inc. Automated cinematic decisions based on descriptive models
EP3570207B1 (en) * 2018-05-15 2023-08-16 IDEMIA Identity & Security Germany AG Video cookies
CN108960114A (en) * 2018-06-27 2018-12-07 腾讯科技(深圳)有限公司 Human body recognition method and device, computer readable storage medium and electronic equipment
CN111291633B (en) * 2020-01-17 2022-10-14 复旦大学 A real-time pedestrian re-identification method and device
CN111462183A (en) * 2020-03-31 2020-07-28 山东大学 Behavior identification method and system based on attention mechanism double-current network


Also Published As

Publication number Publication date
CN112801020A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN111640101B (en) Real-time traffic flow detection system and method based on Ghost convolution feature fusion neural network
US10719940B2 (en) Target tracking method and device oriented to airborne-based monitoring scenarios
CN110298404B (en) A Target Tracking Method Based on Triple Siamese Hash Network Learning
CN110348376B (en) Pedestrian real-time detection method based on neural network
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
CN102495998B (en) Static object detection method based on visual selective attention computation module
Xu et al. SALMNet: A structure-aware lane marking detection network
CN111985367A (en) Pedestrian re-recognition feature extraction method based on multi-scale feature fusion
CN111476310B (en) Image classification method, device and equipment
Hua et al. Light-weight UAV object tracking network based on strategy gradient and attention mechanism
CN114399838A (en) Multi-person behavior recognition method and system based on attitude estimation and double classification
CN113792606A (en) A low-cost self-supervised pedestrian re-identification model building method based on multi-target tracking
CN112801020B (en) Pedestrian re-identification method and system based on background grayscale
CN111914601A (en) Efficient batch face recognition and matting system based on deep learning
CN112785626A (en) Twin network small target tracking method based on multi-scale feature fusion
CN114463340A (en) Edge information guided agile remote sensing image semantic segmentation method
CN116486203B (en) Single-target tracking method based on twin network and online template updating
CN117636296A (en) An urban road traffic sign detection and recognition method based on improved LEP-YOLO v7
Zhang et al. Visual object tracking via cascaded RPN fusion and coordinate attention
CN114973202B (en) A traffic scene obstacle detection method based on semantic segmentation
CN115731461A (en) Multilayer feature decoupling optical remote sensing image building extraction method
Sun et al. Evolving traffic sign detection via multi-scale feature enhancement, reconstruction and fusion
Wu et al. Research on license plate detection algorithm based on ssd
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
CN116977825A (en) Visual SLAM dynamic feature point removing method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant