
CN110717402B - Pedestrian re-identification method based on hierarchical optimization metric learning - Google Patents

Info

Publication number
CN110717402B
CN110717402B
Authority
CN
China
Prior art keywords
sample
pedestrian
neural network
samples
metric learning
Prior art date
Legal status
Expired - Fee Related
Application number
CN201910869949.8A
Other languages
Chinese (zh)
Other versions
CN110717402A (en)
Inventor
肖江文
黄正义
王燕舞
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201910869949.8A
Publication of CN110717402A
Application granted
Publication of CN110717402B
Legal status: Expired - Fee Related

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on hierarchical optimization metric learning, belonging to the field of pedestrian re-identification. The invention proposes a hierarchical optimization strategy: the parameters of the deep neural network are optimized with the batch gradient descent method, and convex optimization of the mapping matrix is realized through a mathematical transformation. This hierarchical optimization mode effectively combines the complementary advantages of deep metric learning and traditional metric learning, thereby improving the performance of the algorithm and of pedestrian re-identification. The invention further proposes constructing strong sample constraints, i.e. only the same-label sample with the farthest distance and the different-label sample with the closest distance are used to construct each sample constraint, which effectively reduces the number of sample constraints, guarantees the validity of each constraint, and enables rapid optimization of the model. A new regularization term is also proposed, which measures the difference between parameter vectors by their cosine value; the given regularization form is the sum of the inner product of the parameter vectors and their norm-normalization loss, which avoids parameter convergence during training and increases diversity.

Description

A Pedestrian Re-identification Method Based on Hierarchical Optimization Metric Learning

Technical Field

The invention belongs to the field of pedestrian re-identification, and more particularly relates to a pedestrian re-identification method based on hierarchical optimization metric learning.

Background Art

Many everyday scenes are equipped with large numbers of cameras, and these video capture devices generate massive amounts of data. Pedestrian re-identification technology uses this massive visual information to analyze the characteristics of pedestrians. It has a wide range of applications, such as pedestrian tracking, pedestrian age recognition and pedestrian behavior analysis. In recent years, pedestrian re-identification has developed rapidly, and metric-learning-based pedestrian re-identification has become the mainstream approach.

The essence of metric learning can be described simply as the supervised learning of a mapping mechanism that maps samples into a new embedding space in which the sample constraints are satisfied: the distance between samples with the same label is smaller than the distance between samples with different labels. A complete metric learning pipeline has three parts: mapping the samples, constructing the sample constraints, and computing the loss and updating the parameters. The development of metric learning can be divided into two stages: traditional metric learning and deep metric learning. The modeling process of traditional metric learning can be described as defining a Mahalanobis distance with a positive semi-definite Mahalanobis matrix and obtaining the optimal matrix by converting the problem into a convex optimization problem. The modeling process of deep metric learning is simpler: it mainly uses the strong expressive power and strong nonlinear mapping capability of deep neural networks to achieve the goal of metric learning.

Traditional metric learning and deep metric learning are the same only in the loss function; in other respects they differ greatly. Structurally, traditional metric learning is a single-layer matrix model, so in terms of optimization it has advantages that deep metric learning does not: convergence of the algorithm is guaranteed, few parameters need to be optimized, the mathematical background is easy to interpret, and the constraints are easy to transform. However, during the training of traditional network models the parameter vectors (or parameter matrices) tend to converge toward one another, which traps the network in a local optimum and ultimately degrades the performance of the model. Compared with traditional metric learning, deep metric learning with a multi-layer structure is optimized mainly by gradient descent and its improved variants, and its performance is generally better. However, for both traditional and deep metric learning, constructing the ternary sample constraint set of the model incurs a large time overhead. In addition, existing metric-learning-based pedestrian re-identification techniques can only use back-propagation alone to optimize a deep model, or use mathematical transformations to guarantee convex optimization of a shallow model.

Summary of the Invention

In view of the defects of the prior art and the need for improvement, the present invention provides a pedestrian re-identification method based on hierarchical optimization metric learning. Its purpose is to organically combine a deep neural network with traditional metric learning, improve the strategy for constructing ternary sample constraints, and propose a new regularization term and a hierarchical optimization strategy, which can effectively improve the distribution of the samples and achieve dimensionality reduction of the data.

In order to achieve the above object, according to one aspect of the present invention, a pedestrian re-identification method based on hierarchical optimization metric learning is provided, the method comprising the following steps:

S1. Select a training sample set, and initialize the dimension Dim_e of the embedding space, the learning rate η of the back-propagation algorithm and the number of iterations N_iter;

S2. Pre-train a deep neural network with the training sample set, initialize the weight parameters W of the deep neural network with the loss layer removed according to the trained deep neural network parameters, and initialize the mapping matrix L with random numbers;

S3. Select a batch of training samples and, under the current W and L, map the batch of samples into the embedding space of dimension Dim_e;

S4. Construct ternary sample constraints in the embedding space to obtain the ternary sample constraint set;

S5. Traverse the whole ternary sample constraint set and update the mapping matrix L with the large margin nearest neighbor method;

S6. Fix the updated mapping matrix L and update the network parameters W with the back-propagation algorithm;

S7. Judge whether the number of iterations is less than N_iter; if so, add 1 to the number of iterations and go to step S3; otherwise go to step S8;

S8. Input the sample to be tested into the deep neural network with the loss layer removed under the current W and L to obtain the re-identification result.

Specifically, in step S2, the mapping matrix L ∈ ℝ^{Dim_e × Dim_Net} is initialized with random numbers in the range [-1, 1], where Dim_Net denotes the dimension of the network output vector.

Specifically, the batch of samples is mapped into the embedding space of dimension Dim_e as:

x′_i = L × f(W, x_i)

where f(W, x_i) denotes the mapping result of sample x_i through the neural network model with parameters W, and x′_i denotes the feature vector of the sample in the embedding space.

Specifically, step S4 comprises the following steps:

S41. In the embedding space, compute the distance between sample x′_i and the other samples with the same label, and select the sample with the farthest distance as x′_j;

S42. In the embedding space, compute the distance between sample x′_i and the samples with different labels, and select the sample with the closest distance as x′_k;

S43. Return the ternary sample constraint (x′_i, x′_j, x′_k).

Specifically, the loss function of metric learning is as follows:

Φ(W, L) = Σ_{(x′_i, x′_j, x′_k) ∈ S} max(0, ρ + D_M(x′_i, x′_j) - D_M(x′_i, x′_k)) + R(W)

where Φ(W, L) denotes the loss function of the deep neural network under the weight parameters W and the mapping matrix L; x′_i, x′_j, x′_k denote the feature vectors of samples x_i, x_j, x_k in the embedding space; samples x_i and x_j belong to the same pedestrian, while samples x_i and x_k do not; S denotes the ternary sample constraint set; D_M(·,·) denotes the Mahalanobis distance between two vectors; ρ denotes the margin (interval) parameter; and R(W) denotes the regularization term corresponding to the weight parameters W.

Specifically, the regularization term for the weights of the h-th layer of the neural network is computed as follows:

[equation image: R(W_h), combining a cosine term over the N_h weight components of the h-th layer and a norm constraint term, weighted by the hyperparameters λ and γ]

where R(W_h) denotes the regularization term of the h-th layer, W_h denotes the weights of the h-th layer of the deep neural network, N_h denotes the number of weight components of the h-th layer, W_h^i denotes the i-th component of W_h, ⟨·,·⟩ denotes the inner product of vectors, and λ and γ are hyperparameters.

In order to achieve the above object, according to another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the pedestrian re-identification method based on hierarchical optimization metric learning described in the first aspect is implemented.

In general, the above technical solutions conceived by the present invention can achieve the following beneficial effects:

(1) Existing metric-learning-based pedestrian re-identification techniques can only use back-propagation alone to optimize a deep model or use mathematical transformations to guarantee that the shallow model is a convex optimization problem. The present invention proposes a hierarchical optimization strategy: the network parameters are initialized from a pre-trained model, the parameters of the deep neural network are optimized with the batch gradient descent method, and convex optimization of the mapping matrix is realized through a mathematical transformation. This hierarchical optimization mode effectively combines the complementary advantages of deep metric learning and traditional metric learning, thereby improving the performance of the algorithm and of pedestrian re-identification.

(2) When constructing the sample constraints of a model, traditional techniques require a large time overhead, and many of the generated sample constraints are invalid (they produce zero loss). The present invention proposes a method for constructing strong sample constraints: only the same-label sample with the farthest distance and the different-label sample with the closest distance are used to construct each sample constraint. This method effectively reduces the number of sample constraints while guaranteeing the validity of each constraint, thereby enabling rapid optimization of the model.

(3) During the training of traditional network models, the parameter vectors (or parameter matrices) tend to converge toward one another, which traps the network in a local optimum. The present invention proposes a new regularization method: in each network layer, the difference between parameter vectors (or matrices) is measured by their cosine value, and the final regularization form is the sum of the inner product of the parameter vectors (or matrices) and their norm-normalization loss. This regularization method avoids parameter convergence during training and increases diversity.

Brief Description of the Drawings

FIG. 1 is a flowchart of a pedestrian re-identification method based on hierarchical optimization metric learning provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of the process of mapping samples into the embedding space provided by an embodiment of the present invention.

Detailed Description of the Embodiments

In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.

The inventive concept of the present invention is as follows: a deep neural network is used to produce a new representation of each sample; the new representation is then passed to a mapping matrix, which finally maps the sample into a new embedding space. In this embedding space, an improved strategy is used to construct the ternary sample constraint set. Finally, based on the ternary sample constraint set, a hierarchical optimization strategy is used to update the model parameters.

As shown in FIG. 1, the present invention provides a pedestrian re-identification method based on hierarchical optimization metric learning, the method comprising the following steps:

Step S1. Select a training sample set, and initialize the dimension Dim_e of the embedding space, the learning rate η of the back-propagation algorithm and the number of iterations N_iter.

The training sample set is {x_1, x_2, …, x_i, …, x_n}, where sample x_i is a pedestrian picture taken from one camera view. Each person corresponds to one label, i.e. the samples of the same person under different views share the same label; therefore the number of labels equals the number of pedestrians.

In this embodiment, the dimension Dim_e of the embedding space is used in step S3 and initialized to 64; the learning rate η is used in step S6 and initialized to 0.001; the number of iterations N_iter is used in step S7 and initialized to 3000.

Step S2. Pre-train a deep neural network with the training sample set, initialize the weight parameters W of the deep neural network with the loss layer removed according to the trained deep neural network parameters, and initialize the mapping matrix L with random numbers.

The deep neural network used in this example is a conventional convolutional neural network with the structure: convolutional layer, max-pooling layer, convolutional layer, max-pooling layer, fully connected layer, fully connected layer, where the convolution kernels are 3×3 with a stride of 1. Because the loss function of metric learning is used, the deep neural network of the present invention does not have the final loss layer of a conventional network. When initializing the network weights, the network is first completed and trained as a conventional network; the last layer is then removed, and the remaining part is used to initialize the network portion of the model of the present invention.
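
The backbone just described can be written down compactly in code. The following is only an illustrative PyTorch sketch: the patent does not specify channel widths, the input resolution or the output dimension Dim_Net, so the 3-channel 128×64 input, the 32/64 channel widths and dim_net=128 used here are assumptions.

import torch.nn as nn

class BackboneNet(nn.Module):
    # Conv -> MaxPool -> Conv -> MaxPool -> FC -> FC, 3x3 kernels, stride 1,
    # with the conventional final loss layer removed (replaced by the metric loss).
    def __init__(self, dim_net=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 32 * 16, 256), nn.ReLU(),  # assumes 128x64 input crops
            nn.Linear(256, dim_net),                  # f(W, x): the Dim_Net-dimensional feature
        )

    def forward(self, x):
        return self.fc(self.features(x))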

The mapping matrix L ∈ ℝ^{Dim_e × Dim_Net} is initialized with random numbers in the range [-1, 1], where Dim_Net denotes the dimension of the network output vector.

Step S3. Select a batch of training samples and, under the current W and L, map the batch of samples into the embedding space of dimension Dim_e.

As shown in FIG. 2, the original space refers to the original visual images, and the mapping process refers to mapping the images into the embedding space. In traditional metric learning, a mapping matrix is needed to take samples from the original space to the embedding space. The representation of a sample in the embedding space is obtained by mapping it with the mapping matrix into a new space; in general, the dimension of the embedding space is lower than that of the original space.

x′_i = L × f(W, x_i)

where f(W, x_i) denotes the mapping result of x_i through the neural network model with parameters W.
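
A minimal sketch of this mapping step is shown below, assuming the BackboneNet sketch above as f(W, ·) and Dim_e = 64 as in this embodiment; the batch-matrix form feats @ L.T is simply x′_i = L × f(W, x_i) applied row by row.

import torch

dim_net, dim_e = 128, 64                              # Dim_Net of the backbone, Dim_e of the embedding space
net = BackboneNet(dim_net)                            # pre-trained backbone with the loss layer removed (step S2)
L = torch.empty(dim_e, dim_net).uniform_(-1.0, 1.0)   # random initialization of L in [-1, 1]

def embed(images):
    # Map a batch into the embedding space: x'_i = L x f(W, x_i)
    feats = net(images)                               # f(W, x), shape (batch, Dim_Net)
    return feats @ L.T                                # shape (batch, Dim_e)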

Step S4. Construct ternary sample constraints in the embedding space to obtain the ternary sample constraint set.

A ternary sample constraint means that the distance relationship among three samples should satisfy the constraint specified by the present invention: the distance between samples with the same label is smaller than the distance between samples with different labels. The samples in the batch are traversed, and each sample is paired with its farthest same-class sample and its nearest different-class sample to construct a sample constraint. The beneficial effect is to reduce the size of the ternary sample constraint set while obtaining more binding sample constraints.

Step S4 comprises the following steps:

S41. In the embedding space, compute the distance between sample x′_i and the other samples with the same label, and select the sample with the farthest distance as x′_j.

The distance in the embedding space is chosen to be the Euclidean distance.

S42. In the embedding space, compute the distance between sample x′_i and the samples with different labels, and select the sample with the closest distance as x′_k.

S43. Return the ternary sample constraint (x′_i, x′_j, x′_k).

Ternary sample constraints are constructed in the embedding space, yielding the ternary sample constraint set.
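
The strong-constraint construction of steps S41 to S43 can be sketched as follows; the loop-based index handling is an illustrative assumption, and only the farthest same-label and closest different-label samples are kept, as described above.

import torch

def build_strong_constraints(embeddings, labels):
    # One ternary constraint per anchor: farthest positive, nearest negative,
    # measured by Euclidean distance in the embedding space.
    dist = torch.cdist(embeddings, embeddings)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    triplets = []
    for i in range(embeddings.size(0)):
        pos = same[i].clone()
        pos[i] = False                                       # same label, excluding the anchor itself
        neg = ~same[i]
        if pos.any() and neg.any():
            j = torch.where(pos)[0][dist[i, pos].argmax()]   # farthest same-label sample x'_j
            k = torch.where(neg)[0][dist[i, neg].argmin()]   # nearest different-label sample x'_k
            triplets.append((i, j.item(), k.item()))
    return triplets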

Step S5. Traverse the whole ternary sample constraint set and update the mapping matrix L with the large margin nearest neighbor method.

The goal of the large margin nearest neighbor (LMNN) method is to learn a Mahalanobis distance metric

D_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)

where the positive semi-definite symmetric matrix M can be expressed as M = L^T L.

The update process is as follows:

(1) Compute the gradient G_{t+1}:

[equation image: G_{t+1} computed from G_t and the ternary sample constraints in S_t, weighted by μ]

where G_t denotes the gradient at the t-th iteration, μ denotes a weight coefficient, generally taken as 0.5, S_t denotes the ternary sample constraint set at the t-th iteration, and D_M denotes the Mahalanobis distance.

(2) Update the Mahalanobis matrix:

M_{t+1} = SDP(M_t - αG_{t+1})

where M_t denotes the Mahalanobis matrix at the t-th iteration, α denotes a weight coefficient (step size), and SDP(·) denotes the semi-definite programming operation that guarantees the Mahalanobis matrix remains positive semi-definite.

(3) Update the mapping matrix L:

[equation image: L_{t+1} obtained by decomposing the updated Mahalanobis matrix as M_{t+1} = L_{t+1}^T L_{t+1}]

The model is trained with the constructed ternary sample constraints, and the mapping matrix is optimized with the traditional metric learning algorithm.
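
One LMNN-style update of the mapping matrix might look like the sketch below. Since the patent gives the gradient formula only as an image, the hinge-activated triplet gradient with outer-product terms used here is an assumption; the PSD projection by eigenvalue clipping and the factorization M = L^T L follow the description above.

import torch

def lmnn_update(M, G, feats, triplets, mu=0.5, alpha=0.01, rho=1.0):
    # feats are the network features f(W, x); M is the current Mahalanobis matrix.
    def outer(a, b):
        d = (a - b).unsqueeze(1)
        return d @ d.T                                   # (a - b)(a - b)^T

    for i, j, k in triplets:
        d_pos = (feats[i] - feats[j]) @ M @ (feats[i] - feats[j])
        d_neg = (feats[i] - feats[k]) @ M @ (feats[i] - feats[k])
        if rho + d_pos - d_neg > 0:                      # only constraints that produce loss
            G = G + mu * (outer(feats[i], feats[j]) - outer(feats[i], feats[k]))

    M = M - alpha * G                                    # gradient step M_t - alpha * G_{t+1}
    evals, evecs = torch.linalg.eigh(M)                  # project back onto the PSD cone
    evals = evals.clamp(min=0.0)
    M = evecs @ torch.diag(evals) @ evecs.T
    L = torch.diag(evals.sqrt()) @ evecs.T               # recover L from M = L^T L
    return M, G, L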

Step S6. Fix the updated mapping matrix L and update the network parameters W with the back-propagation algorithm.

The loss function of metric learning is:

Φ(W, L) = Σ_{(x′_i, x′_j, x′_k) ∈ S} max(0, ρ + D_M(x′_i, x′_j) - D_M(x′_i, x′_k)) + R(W)

where Φ(W, L) denotes the loss function of the deep neural network under the weight parameters W and the mapping matrix L; x′_i, x′_j, x′_k denote the feature vectors of samples x_i, x_j, x_k in the embedding space; samples x_i and x_j belong to the same pedestrian, while samples x_i and x_k do not; S denotes the ternary sample constraint set; ρ denotes the margin parameter, generally 1; and R(W) denotes the regularization term corresponding to the weight parameters W.

Based on the loss function Φ(W, L), the network parameters W are updated with the back-propagation algorithm. The learning rate η of the back-propagation algorithm is 0.001.
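
A sketch of this loss for the back-propagation step, with L already applied and held fixed, is given below; squared Euclidean distances between the embedded features are used as the Mahalanobis distances under M = L^T L, and whether the patent squares the distance is an assumption.

import torch

def metric_loss(embeddings, triplets, rho=1.0, reg=None):
    # sum over constraints of max(0, rho + d(x'_i, x'_j) - d(x'_i, x'_k)), plus R(W)
    loss = embeddings.new_zeros(())
    for i, j, k in triplets:
        d_pos = (embeddings[i] - embeddings[j]).pow(2).sum()
        d_neg = (embeddings[i] - embeddings[k]).pow(2).sum()
        loss = loss + torch.clamp(rho + d_pos - d_neg, min=0.0)
    if reg is not None:
        loss = loss + reg            # regularization term R(W), e.g. the cosine term sketched below
    return loss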

Preferably, in order to make the components of the weights within the same layer as different as possible and to increase the diversity of the neurons in each layer by constraining the cosine similarity between their weight vectors, the present invention designs a special regularization term that contains a cosine term and a norm constraint term, merged with two weight coefficients λ and γ. The beneficial effect of this regularization term is to avoid producing nearly identical neurons during optimization, which helps increase the richness of the new features.

The regularization term for the weights of the h-th layer of the neural network is expressed as follows:

[equation image: R(W_h), combining a cosine term over the N_h weight components of the h-th layer and a norm constraint term, weighted by the hyperparameters λ and γ]

where R(W_h) denotes the regularization term of the h-th layer, W_h denotes the weights of the h-th layer of the deep neural network, N_h denotes the number of weight components of the h-th layer, W_h^i denotes the i-th component of W_h, and ⟨·,·⟩ denotes the inner product of vectors. The hyperparameters λ and γ need to be set in advance; in this embodiment they are set to 0.5 and 0.5.
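
Since the exact formula is only given as an image, the sketch below is an assumed instance of such a term: it penalizes the pairwise norm-normalized inner products (cosine similarities) between the N_h weight components of a layer, and the particular norm term used here is likewise an assumption.

import torch

def cosine_diversity_reg(weight, lam=0.5, gamma=0.5):
    # One row per weight component of the layer (neuron); conv kernels are flattened.
    w = weight.view(weight.size(0), -1)
    norms = w.norm(dim=1, keepdim=True).clamp(min=1e-8)
    cos = (w @ w.T) / (norms @ norms.T)                 # pairwise cosine similarities
    off_diag = cos - torch.diag(torch.diag(cos))        # drop each component paired with itself
    cosine_term = off_diag.abs().sum()                  # penalize similar (convergent) components
    norm_term = (norms - 1.0).pow(2).sum()              # keep component norms near 1 (assumption)
    return lam * cosine_term + gamma * norm_term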

Step S7. Judge whether the number of iterations is less than N_iter; if so, add 1 to the number of iterations and go to step S3; otherwise go to step S8.

Step S8. Input the sample to be tested into the deep neural network with the loss layer removed under the current W and L to obtain the re-identification result.

The optimized parameters are output, yielding the model of this algorithm. At this point, the present invention has obtained a mapping mechanism through which the pictures obtained from each camera view can be mapped into the embedding space, where pedestrians are matched.

The strong expressive power of the deep neural network is used to extract strong features of the samples, and traditional metric learning is then used, on top of these strong features, to optimize a mapping matrix that is guaranteed to be optimal under the current strong features. This optimization strategy combines the deep neural network with traditional metric learning.
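
Tying the pieces together, a minimal sketch of the hierarchical optimization loop (steps S3 to S7) is shown below, reusing the helper sketches above; the batch handling, step sizes and the SGD optimizer are assumptions not fixed by the patent.

import itertools
import torch

def train(net, loader, dim_e=64, n_iter=3000, lr=0.001, mu=0.5, alpha=0.01, rho=1.0):
    dim_net = net.fc[-1].out_features                      # Dim_Net from the backbone sketch
    L = torch.empty(dim_e, dim_net).uniform_(-1.0, 1.0)    # step S2: random initialization of L
    M, G = L.T @ L, torch.zeros(dim_net, dim_net)          # Mahalanobis matrix M = L^T L
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    batches = itertools.cycle(loader)
    for _ in range(n_iter):
        images, labels = next(batches)                     # step S3: a batch of training samples
        feats = net(images)                                # f(W, x)
        emb = (feats @ L.T).detach()                       # x' = L f(W, x)
        triplets = build_strong_constraints(emb, labels)   # step S4: strong ternary constraints
        M, G, L = lmnn_update(M, G, feats.detach(), triplets, mu, alpha, rho)  # step S5: update L
        reg = sum(cosine_diversity_reg(p) for p in net.parameters() if p.dim() > 1)
        loss = metric_loss(feats @ L.T, triplets, rho, reg)  # step S6: back-propagate with L fixed
        opt.zero_grad()
        loss.backward()
        opt.step()
    # step S8: at test time, embed query and gallery images with net and L and match by distance
    return net, L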

Those skilled in the art will readily understand that the above are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention shall all be included within the protection scope of the present invention.

Claims (4)

1. A pedestrian re-identification method based on hierarchical optimization metric learning is characterized by comprising the following steps:
S1. selecting a training sample set, and initializing the dimension Dim_e of an embedding space, the learning rate η of a back-propagation algorithm and the number of iterations N_iter; the training sample set is {x_1, x_2, …, x_i, …, x_n}, where sample x_i represents a pedestrian picture under one visual angle and each person corresponds to one label, i.e. the labels of the samples of the same person under different visual angles are the same, so that the number of labels equals the number of pedestrians;
S2. pre-training a deep neural network by using the training sample set, initializing a weight parameter W of the deep neural network with the loss layer removed according to the trained deep neural network parameters, and initializing a mapping matrix L by using random numbers;
S3. selecting part of the training samples and, under the current W and L, mapping the batch of training samples into the embedding space of dimension Dim_e;
S4. constructing ternary sample constraints in the embedding space to obtain a ternary sample constraint set;
S5. traversing all the ternary sample constraints in the set, and updating the mapping matrix L by using the large margin nearest neighbor method;
S6. fixing the updated mapping matrix L, and updating the weight parameter W by using the back-propagation algorithm;
S7. judging whether the number of iterations is less than N_iter; if so, adding 1 to the number of iterations and going to step S3; otherwise going to step S8;
S8. inputting a sample to be detected into the deep neural network with the loss layer removed under the current W and L to obtain a re-identification result;
wherein step S4 comprises the following steps:
S41. in the embedding space, calculating the distance between sample x′_i and the other samples with the same label, and selecting the sample with the farthest distance as x′_j;
S42. in the embedding space, calculating the distance between sample x′_i and the samples with different labels, and selecting the sample with the closest distance as x′_k;
S43. returning the ternary sample constraint (x′_i, x′_j, x′_k);
the loss function for metric learning is as follows:
Φ(W, L) = Σ_{(x′_i, x′_j, x′_k) ∈ S} max(0, ρ + D_M(x′_i, x′_j) - D_M(x′_i, x′_k)) + R(W)
where Φ(W, L) represents the loss function of the deep neural network under the weight parameter W and the mapping matrix L, x′_i, x′_j, x′_k represent the feature vectors of samples x_i, x_j, x_k in the embedding space, samples x_i and x_j belong to the same pedestrian, samples x_i and x_k do not belong to the same pedestrian, S represents the ternary sample constraint set, D_M(·,·) represents the Mahalanobis distance between the two, ρ represents the interval parameter, and R(W) represents the regular term corresponding to the weight parameter W;
the regular term of the h-th layer weight of the neural network is calculated as follows:
[equation image: R(W_h), combining a cosine term over the N_h weight components of the h-th layer and a norm constraint term, weighted by λ and γ]
where R(W_h) represents the regular term of the h-th layer, W_h represents the h-th layer weight of the deep neural network, N_h represents the number of weight components of the h-th layer, W_h^i represents the i-th component of W_h, ⟨·,·⟩ represents the inner product operation of vectors, and λ and γ are hyper-parameters.
2. The method of claim 1, wherein in step S2 the mapping matrix L ∈ ℝ^{Dim_e × Dim_Net} is initialized with random numbers in the range [-1, 1], where Dim_Net represents the network output vector dimension.
3. The method of claim 1, wherein mapping the batch of samples into the embedding space of dimension Dim_e is:
x′_i = L × f(W, x_i)
where f(W, x_i) represents the mapping result of sample x_i through the neural network model with weight parameter W, and x′_i represents the feature vector of the sample in the embedding space.
4. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, implements the pedestrian re-identification method based on hierarchical optimization metric learning according to any one of claims 1 to 3.
CN201910869949.8A 2019-09-16 2019-09-16 Pedestrian re-identification method based on hierarchical optimization metric learning Expired - Fee Related CN110717402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910869949.8A CN110717402B (en) 2019-09-16 2019-09-16 Pedestrian re-identification method based on hierarchical optimization metric learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910869949.8A CN110717402B (en) 2019-09-16 2019-09-16 Pedestrian re-identification method based on hierarchical optimization metric learning

Publications (2)

Publication Number Publication Date
CN110717402A (en) 2020-01-21
CN110717402B (en) 2022-08-05

Family

ID=69210453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910869949.8A Expired - Fee Related CN110717402B (en) 2019-09-16 2019-09-16 Pedestrian re-identification method based on hierarchical optimization metric learning

Country Status (1)

Country Link
CN (1) CN110717402B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310896B (en) * 2020-02-13 2023-10-20 北京百度网讯科技有限公司 Method and device for training neural network
CN112329833B (en) * 2020-10-28 2022-08-12 浙江大学 An Image Metric Learning Method Based on Spherical Embedding
CN114612953A (en) * 2020-12-09 2022-06-10 佳能株式会社 Training method and device of object recognition model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709528A (en) * 2017-01-10 2017-05-24 深圳大学 Method and device of vehicle reidentification based on multiple objective function deep learning
CN106919909A (en) * 2017-02-10 2017-07-04 华中科技大学 The metric learning method and system that a kind of pedestrian recognizes again

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9471847B2 (en) * 2013-10-29 2016-10-18 Nec Corporation Efficient distance metric learning for fine-grained visual categorization
US11281990B2 (en) * 2016-09-09 2022-03-22 Nec Corporation Mining non-linear dependencies via a neighborhood mixture model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709528A (en) * 2017-01-10 2017-05-24 深圳大学 Method and device of vehicle reidentification based on multiple objective function deep learning
CN106919909A (en) * 2017-02-10 2017-07-04 华中科技大学 The metric learning method and system that a kind of pedestrian recognizes again

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Regression-based Metric Learning";Zheng-Yi Huang 等;《Proceedings of the 37th Chinese Control Conference》;20180727;9107-9112 *
"行人检测与重识别方法研究及系统实现";赵蕊蕊;《中国优秀硕士学位论文全文数据库-信息科技辑》;20190715;第2019年卷(第7期);I138-1317 *

Also Published As

Publication number Publication date
CN110717402A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
Wang et al. A review on extreme learning machine
CN114283345B (en) Small sample city remote sensing image information extraction method based on meta-learning and attention
WO2020108474A1 (en) Picture classification method, classification identification model generation method and apparatus, device, and medium
CN110263227B (en) Group partner discovery method and system based on graph neural network
CN113657561B (en) A semi-supervised nighttime image classification method based on multi-task decoupling learning
CN113111814B (en) Semi-supervised pedestrian re-identification method and device based on regularization constraints
CN108108854A (en) City road network link prediction method, system and storage medium
CN109255381B (en) Image classification method based on second-order VLAD sparse adaptive depth network
CN110717402B (en) Pedestrian re-identification method based on hierarchical optimization metric learning
CN112966114A (en) Document classification method and device based on symmetric graph convolutional neural network
CN113378959B (en) Zero sample learning method for generating countermeasure network based on semantic error correction
CN113361627A (en) Label perception collaborative training method for graph neural network
WO2022105108A1 (en) Network data classification method, apparatus, and device, and readable storage medium
CN117611932B (en) Image classification method and system based on double pseudo tag refinement and sample re-weighting
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN114255121A (en) Training Method of Credit Risk Prediction Model and Credit Risk Prediction Method
CN114936639A (en) Progressive confrontation training method and device
CN114936890A (en) Counter-fact fairness recommendation method based on inverse tendency weighting method
CN117671666A (en) A target recognition method based on adaptive graph convolutional neural network
CN116977730A (en) Unsupervised low-quality image classification method based on transfer learning
CN118133931A (en) Secure and efficient federated learning system and method based on generative adversarial network
CN115830401B (en) Small sample image classification method
WO2021046681A1 (en) Complex scenario-oriented multi-source target tracking method
CN118015335A (en) Continuous feature expression method and system based on field randomization and meta-learning
CN115984617A (en) Method for improving long-tail recognition group fairness based on generative countermeasure network

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220805