Disclosure of Invention
Aiming at the problems in the prior art, the invention seeks to solve the technical problem that existing methods discriminate poorly on an unlabeled target domain because domain-shared knowledge from the source domain cannot be fully utilized.
In order to solve the technical problems, the invention adopts the following technical scheme:
A cross-domain pedestrian re-identification method based on deep learning comprises the following steps:
S100, selecting a public dataset A and a public dataset B, and taking dataset A as the source domain D_s, wherein the formula is as follows:
D_s = {(x_i^s, y_i^s) | i = 1, ..., n_s}
wherein x_i^s represents the i-th source domain sample, y_i^s represents the real label corresponding to the i-th source domain sample, and n_s represents the total number of source domain samples;
Selecting part of the data in dataset B as the target domain training set D_T, wherein the expression for D_T is as follows:
D_T = {x_j^t | j = 1, ..., n_t}
wherein x_j^t represents the j-th target domain sample and n_t represents the total number of target domain samples;
S200, selecting a ResNet-50 model M, wherein the model M comprises two modules: the first module is an online feature encoder f(·|θ_t), where θ_t denotes the parameters of the first module, and the second module is a momentum feature encoder f_m(·|θ_m), where θ_m denotes the parameters of the second module;
Initializing the parameters of model M using the ImageNet dataset to obtain an initialized model M';
S300, calculating the loss of the initialized model M' by using a loss function;
S400, taking the source domain and the target domain as inputs to the initialized model M', training the model M', updating the parameters in M' according to the loss calculated in S300, and stopping training when the maximum number of training iterations is reached to obtain a trained model M'';
S500, inputting the pedestrian image to be predicted into the trained model M'' to obtain a pedestrian retrieval result.
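To make the flow of S100 to S500 concrete, the following minimal sketch shows how the steps could be arranged in a PyTorch-style training loop; all helper names (cluster_target_features, domain_weight, pksd_loss, momentum_update) are illustrative assumptions rather than the invention's reference implementation.

```python
# Illustrative arrangement of steps S100-S500; helper names are assumed.
def train_pksd(source_loader, target_loader, model, optimizer, max_epochs):
    for epoch in range(max_epochs):                                # S400 loop
        pseudo_labels = cluster_target_features(model, target_loader)  # S310
        w_d = domain_weight(epoch, max_epochs)                     # S320 (TDR)
        for (x_s, y_s), x_t in zip(source_loader, target_loader):
            loss = pksd_loss(model, x_s, y_s, x_t, pseudo_labels, w_d)  # S300
            optimizer.zero_grad()
            loss.backward()         # gradient back-propagation for f(.|theta_t)
            optimizer.step()
            momentum_update(model)  # moving-average update of f_m(.|theta_m)
    return model                    # trained model M''
```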
Preferably, the loss function of the initialized model M' in S300 is calculated as follows:
S310, extracting features of the data in D_T using the momentum feature encoder, storing the extracted features in a memory feature bank N, and then clustering all the features in N using the DBSCAN clustering algorithm to generate and store a pseudo label corresponding to each feature;
S320, calculating the training weight w_d(i) of the source domain for each iteration using a temporal domain relation strategy, wherein the maximum training weight of the source domain is t_1 and the minimum training weight is t_2, with t_1 > t_2; the calculation expression is as follows:
w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2
wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the source domain training weight applied in the i-th iteration, and s(i) represents the length of each part after equally dividing the interval between t_1 and t_2;
S330, calculating the training weight of each source domain sample using a ranking-guided selection strategy, which comprises the following specific steps:
S331, randomly selecting a source domain sample x_i^s from the source domain D_s, using the online feature encoder f(·|θ_t) to extract the features of x_i^s, then classifying the features with the class classifier of the target domain and the class classifier of the source domain respectively, and calculating the probability distribution p_i^t of x_i^s classified over the target domain and the probability distribution p_i^s of x_i^s classified over the source domain; the computational expressions are as follows:
p_i^t = softmax(C_t(f(x_i^s|θ_t)))
p_i^s = softmax(C_s(f(x_i^s|θ_t)))
wherein p_i^t represents the probability distribution of x_i^s classified over the target domain, C_t represents the class classifier on the target domain, and c_p represents the number of pseudo label classes on the target domain; p_i^s represents the probability distribution of x_i^s classified over the source domain, c_s is the number of real label classes on the source domain, and C_s represents the class classifier on the source domain;
S332, calculating the similarity score S_i between x_i^s and the target domain from p_i^t, wherein c_p represents the number of pseudo label classes on the target domain;
S333, calculating the similarity scores between all source domain samples and the target domain to form a similarity score set; then arranging all the similarity scores in descending order and taking the source domain samples corresponding to the top k% of similarity scores as the reliable sample set Δ_s, with the expression as follows:
Δ_s = {x_i^s | S_i ≥ τ_s}
wherein τ_s represents the similarity score of the k-th percentile of the source domain samples;
S334, defining the maximum class probability and the second largest class probability of x_i^s on the source domain as p_i^max1 and p_i^max2 respectively, and calculating the uncertainty U_i of x_i^s on the source domain, expressed as follows:
U_i = p_i^max1 - p_i^max2
wherein a smaller margin U_i indicates a more uncertain sample;
S335, calculating the uncertainty values of all source domain samples to form an uncertainty value set; then arranging all the uncertainty values in ascending order and taking the source domain samples corresponding to the top k% of uncertainty values as the uncertainty sample set Δ_u, with the expression as follows:
Δ_u = {x_i^s | U_i ≤ τ_u}
wherein τ_u represents the uncertainty value of the k-th percentile of the source domain samples;
S336, combining formula (6) and formula (8) to obtain the training weight w_i^s of each source domain sample;
S340, calculating the cross entropy loss L_ce^s of the source domain according to the source domain sample training weights obtained in S336, with the specific expression as follows:
L_ce^s = -(1/n_s) × Σ_i w_i^s × log p(y_i^s|x_i^s)
wherein p(y_i^s|x_i^s) represents the probability that source domain sample x_i^s belongs to category y_i^s;
S350, calculating the triplet loss L_tri^s of the source domain according to the source domain sample training weights obtained in S336, which comprises the following specific steps:
S351, calculating the weight w_i^tri of the triplet loss that takes the i-th sample x_i^s as the anchor point, wherein the anchor is associated with the source domain positive sample furthest from x_i^s and the source domain negative sample closest to x_i^s;
S352, after the triplet losses of all source domain samples are calculated, the triplet loss L_tri^s of the source domain is obtained, with the specific expression as follows:
L_tri^s = (1/n_s) × Σ_i w_i^tri × max(0, m + d_i^p - d_i^n)
wherein d_i^p and d_i^n respectively represent the distances from source domain sample x_i^s to the furthest source domain positive sample and to the closest source domain negative sample, and m represents the margin of the triplet loss;
S360, calculating the cross entropy loss L_ce^t and the triplet loss L_tri^t of the target domain, with the specific expressions as follows:
L_ce^t = -(1/n_t) × Σ_j log p(ŷ_j^t|x_j^t)
L_tri^t = (1/n_t) × Σ_j max(0, m + d_j^p - d_j^n)
wherein p(ŷ_j^t|x_j^t) represents the probability that target domain sample x_j^t belongs to its pseudo label category ŷ_j^t; d_j^p and d_j^n respectively represent the distances from target domain sample x_j^t to the furthest target domain positive sample and to the closest target domain negative sample, and m represents the margin of the triplet loss;
S370, obtaining the final loss function L_total of the initialized model M' according to formula (10), formula (12), formula (13) and formula (14), wherein λ_ce represents the soft cross entropy loss weight and λ_tri represents the soft triplet loss weight.
Calculating with the combination of cross entropy loss and triplet loss achieves a weight-balancing effect and effectively reduces the influence of noisy pseudo labels generated on the target domain on model training.
Preferably, the loss of M' is calculated using the final loss function L_total in S370, the parameters in f(·|θ_t) are updated using gradient back-propagation, and the parameters θ_m of f_m(·|θ_m) are updated using formula (16):
θ_m^(t) = α × θ_m^(t-1) + (1 - α) × θ_t    (16)
where α is a momentum factor and t represents the number of training rounds.
Compared with the prior art, the invention has at least the following advantages:
1. Aiming at the problem that prior art methods may fail to fully utilize source knowledge during training, the invention provides a novel PKSD method which effectively utilizes source domain knowledge throughout the training process and improves discrimination accuracy on unlabeled target domains.
2. The invention provides a linearly varying temporal domain relation method (TDR), which reduces the influence of domain-specific samples in the source domain by reducing the training weight of the source domain.
3. The invention provides a ranking-guided sample selection method (RIS), which selects information-rich and reliable source domain samples by calculating the uncertainty and similarity indexes of the source domain samples.
4. In order to alleviate the influence of catastrophic forgetting on the source domain, the pedestrian re-identification model is trained in a collaborative training mode. Specifically, the model is trained jointly on source domain samples with true labels and target domain samples assigned pseudo labels. Unlike most previous methods, the method no longer adopts a two-stage pre-training and fine-tuning strategy, but converts to a single-stage collaborative training mode. As the number of training rounds grows, a model can easily overfit to some domain-specific knowledge on the source domain, which impairs performance on the target domain when the domain distance between the source domain and the target domain is large; the TDR and RIS strategies above counteract this effect.
Detailed Description
The present invention will be described in further detail below.
The pedestrian re-identification model is trained in a collaborative training mode. Specifically, the model is trained jointly on source domain samples with true labels and target domain samples assigned pseudo labels. Unlike most previous methods, this method no longer adopts a two-stage pre-training and fine-tuning strategy, but converts to a single-stage collaborative training mode. However, as the number of training rounds grows, the model easily overfits to some domain-specific knowledge on the source domain, which impairs the performance of the model on the target domain when the domain distance between the source domain and the target domain is large.
To solve the above problems, the invention proposes a novel cross-domain pedestrian re-identification method for source domain knowledge preservation (Preserving the Knowledge from the Source Domain, PKSD) to effectively utilize knowledge from the source domain throughout the training process. Unlike previous two-stage training paradigms, PKSD employs a co-training strategy, i.e., learning source domain samples and target domain samples simultaneously. Specifically, in each iteration PKSD uses not only the pseudo-labeled target domain data as model input but also the true-labeled source domain data, so as to train the model jointly. Although the source domain samples are fully utilized, domain-specific knowledge present in the source domain has a detrimental effect on the domain adaptation task. Therefore, the invention proposes a linearly varying temporal domain relation (Temporal Domain Relation, TDR) method to gradually attenuate the impact of source domain samples; specifically, as the number of training iterations increases, the training weight of the source domain is gradually reduced. Meanwhile, informative and reliable domain-shared knowledge helps promote the performance of the model on the target domain. The invention therefore further proposes a ranking-guided sample selection (RIS) method to evaluate the uncertainty and similarity of each sample from the source domain, select samples carrying information-rich and reliable domain-shared knowledge by ranking the uncertainty and similarity scores, and reassign their sample training weights. In general, by controlling the source domain weight and the sample weights, the proposed PKSD can effectively suppress the influence of domain-specific knowledge from the source domain and improve the test performance of the model on the target domain. Experimental results show that the method greatly exceeds current state-of-the-art methods.
Referring to Fig. 1, a cross-domain pedestrian re-identification method based on deep learning comprises the following steps:
S100, selecting a public dataset A and a public dataset B, and taking dataset A as the source domain D_s, wherein the formula is as follows:
D_s = {(x_i^s, y_i^s) | i = 1, ..., n_s}
wherein x_i^s represents the i-th source domain sample, y_i^s represents the real label corresponding to the i-th source domain sample, and n_s represents the total number of source domain samples;
Selecting part of the data in dataset B as the target domain training set D_T, wherein the expression for D_T is as follows:
D_T = {x_j^t | j = 1, ..., n_t}
wherein x_j^t represents the j-th target domain sample and n_t represents the total number of target domain samples;
S200, selecting a ResNet-50 model M, wherein the model M comprises two modules: the first module is an online feature encoder f(·|θ_t), where θ_t denotes the parameters of the first module, and the second module is a momentum feature encoder f_m(·|θ_m), where θ_m denotes the parameters of the second module;
The ResNet-50 model is prior art and the ImageNet dataset is an existing public dataset; compared with the initialization parameters that other public datasets could provide, ImageNet pre-training yields better accuracy without excessive random error;
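As a minimal sketch of S200, assuming the torchvision implementation of ResNet-50 with ImageNet weights (the invention does not prescribe a specific library), the momentum encoder can be created as a gradient-free copy of the online encoder:

```python
import copy
import torch
import torchvision

# Online feature encoder f(.|theta_t), initialized from ImageNet (model M').
online = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
online.fc = torch.nn.Identity()  # expose the 2048-d pooled feature

# Momentum feature encoder f_m(.|theta_m): identical architecture whose
# parameters are updated only by the moving average of formula (16).
momentum_enc = copy.deepcopy(online)
for p in momentum_enc.parameters():
    p.requires_grad = False
```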
S300, calculating the loss of the initialized model M' by using a loss function;
The loss function of the initialized model M' in S300 is calculated as follows:
S310, extracting features of the data in D_T using the momentum feature encoder, storing the extracted features in a memory feature bank N, and then clustering all the features in N using the DBSCAN clustering algorithm (which is prior art) to generate and store a pseudo label corresponding to each feature;
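A sketch of S310 using scikit-learn's DBSCAN on L2-normalized momentum features; the eps and min_samples values are illustrative assumptions, and DBSCAN marks un-clustered (noise) samples with the label -1, which would typically be excluded from pseudo-labeled training:

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

@torch.no_grad()
def assign_pseudo_labels(momentum_enc, target_loader, eps=0.6, min_samples=4):
    momentum_enc.eval()
    feats = []
    for x_t in target_loader:                    # fill memory feature bank N
        feats.append(F.normalize(momentum_enc(x_t), dim=1))
    bank = torch.cat(feats).cpu().numpy()
    # Cluster all features in N; each cluster id becomes a pseudo label.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(bank)
    return bank, labels                          # label -1 marks noise samples
```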
S320, calculating the training weight w_d(i) of the source domain for each iteration using a temporal domain relation strategy (the temporal domain relation method is prior art), wherein the maximum training weight of the source domain is t_1 and the minimum training weight is t_2, with t_1 > t_2; the calculation expression is as follows:
w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2
wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the source domain training weight applied in the i-th iteration, and s(i) represents the length of each part after equally dividing the interval between t_1 and t_2;
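Since the exact form of s(i) is only described qualitatively above, the sketch below assumes the simplest linear progress term s(i) = i / e; under that assumption w_d decays linearly from t_1 to t_2 over training. The t_1 and t_2 defaults are illustrative.

```python
def domain_weight(i, e, t1=1.0, t2=0.1):
    """TDR sketch: w_d(i) = (1 - s(i)) * t1 + s(i) * t2, assuming s(i) = i / e."""
    assert t1 > t2 and e > 0
    s = i / e                        # training progress in [0, 1]
    return (1 - s) * t1 + s * t2     # decays from t1 (start) to t2 (end)
```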
S330, calculating the training weight of each source domain sample using a ranking-guided selection strategy, which comprises the following specific steps:
S331, randomly selecting a source domain sample x_i^s from the source domain D_s, using the online feature encoder f(·|θ_t) to extract the features of x_i^s, then classifying the features with the class classifier of the target domain and the class classifier of the source domain respectively, and calculating the probability distribution p_i^t of x_i^s classified over the target domain and the probability distribution p_i^s of x_i^s classified over the source domain. Each source domain sample is classified by the class classifier C_t on the target domain to measure the similarity between the source domain sample and the target domain, and by the class classifier C_s on the source domain to measure the uncertainty of the source domain sample; both classifiers are existing classifiers. The computational expressions are as follows:
p_i^t = softmax(C_t(f(x_i^s|θ_t)))
p_i^s = softmax(C_s(f(x_i^s|θ_t)))
wherein p_i^t represents the probability distribution of x_i^s classified over the target domain, C_t represents the class classifier on the target domain, and c_p represents the number of pseudo label classes on the target domain; p_i^s represents the probability distribution of x_i^s classified over the source domain, c_s is the number of real label classes on the source domain, and C_s represents the class classifier on the source domain;
S332, calculating the similarity score S_i between x_i^s and the target domain from p_i^t, wherein c_p represents the number of pseudo label classes on the target domain;
S333, calculating the similarity scores between all source domain samples and the target domain to form a similarity score set; then arranging all the similarity scores in descending order and taking the source domain samples corresponding to the top k% of similarity scores as the reliable sample set Δ_s, with the expression as follows:
Δ_s = {x_i^s | S_i ≥ τ_s}
wherein τ_s represents the similarity score of the k-th percentile of the source domain samples;
S334, defining the maximum class probability and the second largest class probability of x_i^s on the source domain as p_i^max1 and p_i^max2 respectively, and calculating the uncertainty U_i of x_i^s on the source domain, expressed as follows:
U_i = p_i^max1 - p_i^max2
wherein a smaller margin U_i indicates a more uncertain sample;
S335, calculating the uncertainty values of all source domain samples to form an uncertainty value set; then arranging all the uncertainty values in ascending order and taking the source domain samples corresponding to the top k% of uncertainty values as the uncertainty sample set Δ_u, with the expression as follows:
Δ_u = {x_i^s | U_i ≤ τ_u}
wherein τ_u represents the uncertainty value of the k-th percentile of the source domain samples;
S336, combining formula (6) and formula (8) to obtain the training weight w_i^s of each source domain sample.
The smaller the similarity between a sample selected from the source domain and the target domain, the greater the difference in appearance information between that sample and the target domain samples; conversely, the greater the similarity between a source domain sample and the target domain, the more likely the sample carries domain-shared knowledge with the target domain samples. For a sample from the source domain with low similarity to the target domain, its share in model training gradually decreases as the number of training rounds increases (through TDR), whereas if the sample has high similarity to the target domain, its contribution is not affected by TDR.
If a source domain sample has larger uncertainty, the sample also holds a large amount of rich information for the model to learn. The method obtained by combining formula (6) and formula (8) can select reliable and information-rich samples on the source domain; by increasing the training weights of these samples, domain-shared knowledge from the source domain can be utilized effectively, further improving the performance of the model on the target domain.
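The following sketch ties S331 to S336 together. Because the exact expressions for S_i and for the final weight combination are not reproduced above, three assumptions are made: the similarity score is taken as the peak target-class probability, the uncertainty is the margin between the two largest source-class probabilities (smaller margin = more uncertain), and samples falling in Δ_s or Δ_u keep full weight while the rest receive the TDR weight w_d(i).

```python
import torch

def ris_weights(logits_t, logits_s, w_d, k=0.3):
    """RIS sketch; score and combination rules are assumptions, not formula (9)."""
    p_t = logits_t.softmax(dim=1)            # p_i^t over c_p pseudo classes
    p_s = logits_s.softmax(dim=1)            # p_i^s over c_s source classes
    sim = p_t.max(dim=1).values              # assumed similarity score S_i
    top2 = p_s.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]         # assumed uncertainty margin U_i
    k_n = max(1, int(k * sim.numel()))
    tau_s = sim.sort(descending=True).values[k_n - 1]  # top-k% similarity -> Delta_s
    tau_u = margin.sort().values[k_n - 1]              # top-k% uncertainty -> Delta_u
    selected = (sim >= tau_s) | (margin <= tau_u)
    return torch.where(selected, torch.ones_like(sim),
                       torch.full_like(sim, w_d))
```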
S340, calculating the cross entropy loss L_ce^s of the source domain according to the source domain sample training weights obtained in S336, with the specific expression as follows:
L_ce^s = -(1/n_s) × Σ_i w_i^s × log p(y_i^s|x_i^s)
wherein p(y_i^s|x_i^s) represents the probability that source domain sample x_i^s belongs to category y_i^s;
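A short sketch of S340 under the reconstructed expression above: the per-sample weights from S336 scale a standard cross-entropy term before averaging.

```python
import torch.nn.functional as F

def weighted_source_ce(logits_s, labels_s, weights):
    """L_ce^s sketch: sample-weighted cross entropy on the source domain."""
    per_sample = F.cross_entropy(logits_s, labels_s, reduction="none")  # -log p(y|x)
    return (weights * per_sample).mean()
```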
S350, calculating the triplet loss L_tri^s of the source domain according to the source domain sample training weights obtained in S336, which comprises the following specific steps:
S351, calculating the weight w_i^tri of the triplet loss that takes the i-th sample x_i^s as the anchor point, wherein the anchor is associated with the source domain positive sample furthest from x_i^s and the source domain negative sample closest to x_i^s;
S352, after the triplet losses of all source domain samples are calculated, the triplet loss L_tri^s of the source domain is obtained, with the specific expression as follows:
L_tri^s = (1/n_s) × Σ_i w_i^tri × max(0, m + d_i^p - d_i^n)
wherein d_i^p and d_i^n respectively represent the distances from source domain sample x_i^s to the furthest source domain positive sample and to the closest source domain negative sample, and m represents the margin of the triplet loss. More precisely, m represents the minimum required difference between the distance of a negative sample feature pair and the distance of a positive sample feature pair; based on an empirical value, m is set to 0.5. It is a hyperparameter used in designing the loss function, whose main function is to serve as a threshold that pulls feature pairs of the same identity closer and pushes feature pairs of different identities apart.
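A sketch of S351 and S352 using batch-hard mining: for each anchor, the furthest positive and closest negative in the mini-batch give the hinge term max(0, m + d^p - d^n). Since the expression for the per-anchor weight in formula (11) is not reproduced above, the weight is passed in directly (e.g., reusing the S336 weights) as an assumption.

```python
import torch

def weighted_triplet_loss(feats, labels, weights, m=0.5):
    """L_tri^s sketch: batch-hard triplet loss with per-anchor weights, margin m."""
    dist = torch.cdist(feats, feats)                    # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-identity mask
    d_pos = dist.masked_fill(~same, float("-inf")).max(dim=1).values  # furthest positive
    d_neg = dist.masked_fill(same, float("inf")).min(dim=1).values    # closest negative
    hinge = torch.clamp(m + d_pos - d_neg, min=0)       # max(0, m + d^p - d^n)
    return (weights * hinge).mean()
```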
S360, calculating the cross entropy loss L_ce^t and the triplet loss L_tri^t of the target domain, with the specific expressions as follows:
L_ce^t = -(1/n_t) × Σ_j log p(ŷ_j^t|x_j^t)
L_tri^t = (1/n_t) × Σ_j max(0, m + d_j^p - d_j^n)
wherein p(ŷ_j^t|x_j^t) represents the probability that target domain sample x_j^t belongs to its pseudo label category ŷ_j^t; d_j^p and d_j^n respectively represent the distances from target domain sample x_j^t to the furthest target domain positive sample and to the closest target domain negative sample, and m represents the margin of the triplet loss;
S370, obtaining the final loss function L_total of the initialized model M' according to formula (10), formula (12), formula (13) and formula (14), wherein λ_ce represents the soft cross entropy loss weight and λ_tri represents the soft triplet loss weight.
The loss of M' is calculated using the final loss function L_total in S370, the parameters in f(·|θ_t) are updated using gradient back-propagation, and the parameters θ_m of f_m(·|θ_m) are updated using formula (16):
θ_m^(t) = α × θ_m^(t-1) + (1 - α) × θ_t    (16)
where α is a momentum factor and t represents the number of training rounds.
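A sketch of the formula (16) update: after each optimizer step on the online encoder, the momentum encoder parameters move toward the online parameters by an exponential moving average, with α = 0.999 as in the experimental setup below.

```python
import torch

@torch.no_grad()
def momentum_update(online, momentum_enc, alpha=0.999):
    """theta_m <- alpha * theta_m + (1 - alpha) * theta_t (formula 16)."""
    for p_t, p_m in zip(online.parameters(), momentum_enc.parameters()):
        p_m.mul_(alpha).add_(p_t, alpha=1 - alpha)
```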
S400, taking the source domain and the target domain as inputs to the initialized model M', training the model M', updating the parameters in M' according to the loss calculated in S300, and stopping training when the maximum number of training iterations is reached to obtain a trained model M'';
S500, inputting the pedestrian image to be predicted into the trained model M'' to obtain a pedestrian retrieval result.
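A sketch of S500: at test time only the momentum encoder is used (per the experimental setup below), and a query pedestrian is retrieved by ranking gallery features by cosine similarity; the batch shapes and top-k value are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(momentum_enc, query_imgs, gallery_imgs, topk=10):
    """S500 sketch: rank gallery images by similarity to each query feature."""
    momentum_enc.eval()
    q = F.normalize(momentum_enc(query_imgs), dim=1)
    g = F.normalize(momentum_enc(gallery_imgs), dim=1)
    scores = q @ g.t()                        # cosine similarity matrix
    return scores.topk(topk, dim=1).indices   # retrieval result per query
```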
Experimental design and results analysis
1. Introduction of the datasets used
The present invention verifies the validity of the proposed method on three widely used public datasets: Market1501, DukeMTMC-ReID and MSMT17. Market1501 contains 32668 images of pedestrians of different identities taken by 6 cameras; of these, 12936 pedestrian images of 751 identities are used for training, and the remaining images are used for testing. DukeMTMC-ReID contains 16522 training images, 2228 query images, and 17661 gallery images from 702 different identities taken by 8 cameras. MSMT17 is a larger-scale dataset comprising 126441 images of 4101 different identities captured by 15 cameras; specifically, 32621 pedestrian images of 1041 different identities are used as the training set, and the remaining images are used as the test set. Evaluation uses two commonly used metrics: mean Average Precision (mAP) and the Cumulative Matching Characteristics (CMC) curve. For convenience of description, Market1501, DukeMTMC-ReID and MSMT17 are hereinafter referred to as Market, Duke and MSMT, respectively.
2. Experimental setup
In the experiments, ResNet-50 is used as the feature encoder of the proposed method, and pre-trained parameters on ImageNet are loaded. For the co-training setup, each mini-batch on the source and target domains is constructed from 64 pedestrian images of 16 different identities. The network is optimized by the Adam algorithm with a weight decay rate of 0.0005. The entire training process runs for 40 epochs; a warm-up strategy is used in the first 10 epochs, and the initial learning rate is set to 0.00035. At each training step, the momentum feature encoder f_m(·|θ_m) updates its parameters in a temporal moving-average fashion with momentum factor α = 0.999. Pseudo labels are reassigned after every 400 iterations. All pedestrian images are resized to 256 × 128 as input to the network. During testing, the output of the final batch normalization (BN) layer is taken as the final feature of the pedestrian image. All experiments are performed on the PyTorch platform using three NVIDIA TITAN V GPUs. Note that only the momentum feature encoder f_m(·|θ_m) is used in the test phase.
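The stated setup maps onto standard PyTorch components; the snippet below mirrors the hyperparameters above (Adam, weight decay 0.0005, initial learning rate 0.00035, warm-up over the first 10 of 40 epochs), with the warm-up implemented as an assumed linear ramp since its exact schedule is not specified.

```python
import torch

# Reuses the `online` encoder from the S200 sketch.
optimizer = torch.optim.Adam(online.parameters(), lr=3.5e-4, weight_decay=5e-4)

# Assumed linear warm-up for the first 10 epochs, then a constant rate.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: min(1.0, (epoch + 1) / 10))
```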
Table 1 compares the proposed method with some of the latest methods on the Market and Duke datasets, respectively.
Table 2 compares the proposed method with some of the latest methods on the MSMT dataset.
3. Ablation study
To verify the validity of the proposed method's modules, this section combines the different modules and tests with Duke and Market as the source and target domains, respectively. As shown in Fig. 2, the left side of Fig. 2 shows the test results of the different methods on the source domain, and the right side shows the test results on the target domain. As can be seen from the test results on the source domain, model performance under the two-stage training strategy drops sharply as the number of iterations increases, finally obtaining only 20% mAP. This suggests that the two-stage training strategy causes the model to forget source domain knowledge during the fine-tuning stage. In contrast, the collaborative training method provided by the invention overcomes the catastrophic forgetting of source domain knowledge and preserves the final performance of the model on the source domain. From the test results on the target domain, it can be seen that the test performance of the two-stage model on the target domain is limited because the two-stage training strategy does not fully utilize knowledge from the source domain. The method provided by the invention, however, fully and effectively utilizes source domain knowledge and markedly improves the test results of the model on the target domain. Specifically, the model under the two-stage training strategy obtains 74.7% mAP, while the baseline method under the collaborative training provided by the invention reaches 79.1% mAP. By introducing TDR to prevent overfitting of the model on the source domain, the test result of the model on the target domain gains a further 0.9% mAP improvement. Further combined with the RIS module, the model finally reaches 80.7% mAP. These experimental results show that the proposed method effectively utilizes knowledge from the source domain and further promotes the test results of the model on the target domain.
4. Comparison of results
This section compares the PKSD method with current mainstream cross-domain pedestrian re-identification methods. Note that the global average pooling (GAP) layer producing the final feature of the model is replaced here with a generalized mean pooling (GeM) layer. The experimental results are shown in Table 1; it can be seen that under the collaborative training strategy, the performance of PKSD greatly exceeds all state-of-the-art cross-domain pedestrian re-identification methods. Specifically, under the 'Duke to Market' setting, three generation-based methods, namely SPGAN, PTGAN and ATNet, are compared first. Compared with the best generation-based method, ATNet, PKSD improves mAP and Rank-1 by 58.5% and 37.8%, respectively. Further, mainstream methods represented by NRMT, MEB-Net, UNRN, GLT, IDM and PDA are compared. The proposed method achieves the best performance on both 'Market to Duke' and 'Duke to Market'. In particular, on 'Market to Duke', PKSD brings a 0.9% mAP improvement over PDA; likewise, PKSD improves mAP by 1.9% over PDA on 'Duke to Market'.
Experiments are also performed on the larger and more challenging MSMT dataset. Some of the latest approaches, such as NRMT, UNRN, GLT, IDM and PDA, have demonstrated good performance on the MSMT dataset. As shown in Table 2, PKSD achieves 63.8% Rank-1 and 36.5% mAP on 'Market to MSMT'. Likewise, PKSD achieves 63.8% Rank-1 and 36.7% mAP under the 'Duke to MSMT' setting. Compared with the other methods, the proposed PKSD obtains the best test results. Overall, these experiments demonstrate that fully and effectively using knowledge from the source domain can further enhance the performance of the model on the target domain.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered by the scope of the claims of the present invention.