Disclosure of Invention
Aiming at the problems in the prior art, the invention seeks to solve the technical problem that existing methods discriminate poorly on an unlabeled target domain because domain-shared knowledge from the source domain cannot be fully utilized.
In order to solve the technical problems, the invention adopts the following technical scheme:
A cross-domain pedestrian re-identification method based on deep learning comprises the following steps:
S100, selecting a public dataset A and a public dataset B, and taking dataset A as the source domain D_s, wherein the formula is as follows:
D_s = {(x_i^s, y_i^s) | i = 1, ..., n_s}
wherein x_i^s represents the i-th source domain sample, y_i^s represents the real label corresponding to the i-th source domain sample, and n_s represents the total number of source domain samples;
Selecting part of the data in dataset B as the target domain training set D_T, wherein the expression for D_T is as follows:
D_T = {x_j^t | j = 1, ..., n_t}
wherein x_j^t represents the j-th target domain sample and n_t represents the total number of target domain samples;
S200, selecting a ResNet-50 model M, wherein the model M comprises two modules: the first module is an online feature encoder f(·|θ_t), where θ_t denotes the parameters of the first module, and the second module is a momentum feature encoder f_m(·|θ_m), where θ_m denotes the parameters of the second module;
Initializing the parameters of model M using the ImageNet dataset to obtain an initialized model M';
S300, calculating the loss of the initialized model M' by using a loss function;
S400, taking the source domain and the target domain as inputs to the initialized model M', training the model M', updating the parameters in M' according to the loss calculated in S300, and stopping training when the maximum number of training iterations is reached to obtain a trained model M'';
S500, inputting the pedestrian image to be predicted into the trained model M'' to obtain a pedestrian retrieval result.
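To make the flow of S100 to S500 concrete, the following minimal sketch shows how the steps could be arranged in a PyTorch-style training loop; all helper names (cluster_target_features, domain_weight, pksd_loss, momentum_update) are illustrative assumptions rather than the invention's reference implementation.

```python
# Illustrative arrangement of steps S100-S500; helper names are assumed.
def train_pksd(source_loader, target_loader, model, optimizer, max_epochs):
    for epoch in range(max_epochs):                                # S400 loop
        pseudo_labels = cluster_target_features(model, target_loader)  # S310
        w_d = domain_weight(epoch, max_epochs)                     # S320 (TDR)
        for (x_s, y_s), x_t in zip(source_loader, target_loader):
            loss = pksd_loss(model, x_s, y_s, x_t, pseudo_labels, w_d)  # S300
            optimizer.zero_grad()
            loss.backward()         # gradient back-propagation for f(.|theta_t)
            optimizer.step()
            momentum_update(model)  # moving-average update of f_m(.|theta_m)
    return model                    # trained model M''
```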
Preferably, the loss function of the initialized model M' in S300 is calculated as follows:
S310, extracting features of the data in D_T using the momentum feature encoder, storing the extracted features in a memory feature bank N, and then clustering all the features in N using the DBSCAN clustering algorithm to generate and store a pseudo label corresponding to each feature;
S320, calculating the training weight w_d(i) of the source domain for each iteration using a temporal domain relation strategy, wherein the maximum training weight of the source domain is t_1 and the minimum training weight is t_2, with t_1 > t_2; the calculation expression is as follows:
w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2
wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the source domain training weight applied in the i-th iteration, and s(i) represents the length of each part after equally dividing the interval between t_1 and t_2;
S330, calculating the training weight of each source domain sample using a ranking-guided selection strategy, which comprises the following specific steps:
S331, randomly selecting a source domain sample x_i^s from the source domain D_s, using the online feature encoder f(·|θ_t) to extract the features of x_i^s, then classifying the features with the class classifier of the target domain and the class classifier of the source domain respectively, and calculating the probability distribution p_i^t of x_i^s classified over the target domain and the probability distribution p_i^s of x_i^s classified over the source domain; the computational expressions are as follows:
p_i^t = softmax(C_t(f(x_i^s|θ_t)))
p_i^s = softmax(C_s(f(x_i^s|θ_t)))
wherein p_i^t represents the probability distribution of x_i^s classified over the target domain, C_t represents the class classifier on the target domain, and c_p represents the number of pseudo label classes on the target domain; p_i^s represents the probability distribution of x_i^s classified over the source domain, c_s is the number of real label classes on the source domain, and C_s represents the class classifier on the source domain;
S332, calculating the similarity score S_i between x_i^s and the target domain from p_i^t, wherein c_p represents the number of pseudo label classes on the target domain;
S333, calculating the similarity scores between all source domain samples and the target domain to form a similarity score set; then arranging all the similarity scores in descending order and taking the source domain samples corresponding to the top k% of similarity scores as the reliable sample set Δ_s, with the expression as follows:
Δ_s = {x_i^s | S_i ≥ τ_s}
wherein τ_s represents the similarity score of the k-th percentile of the source domain samples;
S334, defining the maximum class probability and the second largest class probability of x_i^s on the source domain as p_i^max1 and p_i^max2 respectively, and calculating the uncertainty U_i of x_i^s on the source domain, expressed as follows:
U_i = p_i^max1 - p_i^max2
wherein a smaller margin U_i indicates a more uncertain sample;
S335, calculating the uncertainty values of all source domain samples to form an uncertainty value set; then arranging all the uncertainty values in ascending order and taking the source domain samples corresponding to the top k% of uncertainty values as the uncertainty sample set Δ_u, with the expression as follows:
Δ_u = {x_i^s | U_i ≤ τ_u}
wherein τ_u represents the uncertainty value of the k-th percentile of the source domain samples;
S336, combining formula (6) and formula (8) to obtain the training weight w_i^s of each source domain sample;
S340, calculating the cross entropy loss L_ce^s of the source domain according to the source domain sample training weights obtained in S336, with the specific expression as follows:
L_ce^s = -(1/n_s) × Σ_i w_i^s × log p(y_i^s|x_i^s)
wherein p(y_i^s|x_i^s) represents the probability that source domain sample x_i^s belongs to category y_i^s;
S350, calculating the triplet loss L_tri^s of the source domain according to the source domain sample training weights obtained in S336, which comprises the following specific steps:
S351, calculating the weight w_i^tri of the triplet loss that takes the i-th sample x_i^s as the anchor point, wherein the anchor is associated with the source domain positive sample furthest from x_i^s and the source domain negative sample closest to x_i^s;
S352, after the triplet losses of all source domain samples are calculated, the triplet loss L_tri^s of the source domain is obtained, with the specific expression as follows:
L_tri^s = (1/n_s) × Σ_i w_i^tri × max(0, m + d_i^p - d_i^n)
wherein d_i^p and d_i^n respectively represent the distances from source domain sample x_i^s to the furthest source domain positive sample and to the closest source domain negative sample, and m represents the margin of the triplet loss;
S360, calculating the cross entropy loss L_ce^t and the triplet loss L_tri^t of the target domain, with the specific expressions as follows:
L_ce^t = -(1/n_t) × Σ_j log p(ŷ_j^t|x_j^t)
L_tri^t = (1/n_t) × Σ_j max(0, m + d_j^p - d_j^n)
wherein p(ŷ_j^t|x_j^t) represents the probability that target domain sample x_j^t belongs to its pseudo label category ŷ_j^t; d_j^p and d_j^n respectively represent the distances from target domain sample x_j^t to the furthest target domain positive sample and to the closest target domain negative sample, and m represents the margin of the triplet loss;
S370, obtaining the final loss function L_total of the initialized model M' according to formula (10), formula (12), formula (13) and formula (14), wherein λ_ce represents the soft cross entropy loss weight and λ_tri represents the soft triplet loss weight.
Calculating with the combination of cross entropy loss and triplet loss achieves a weight-balancing effect and effectively reduces the influence of noisy pseudo labels generated on the target domain on model training.
Preferably, the loss of M' is calculated using the final loss function L_total in S370, the parameters in f(·|θ_t) are updated using gradient back-propagation, and the parameters θ_m of f_m(·|θ_m) are updated using formula (16):
θ_m^(t) = α × θ_m^(t-1) + (1 - α) × θ_t    (16)
where α is a momentum factor and t represents the number of training rounds.
Compared with the prior art, the invention has at least the following advantages:
1. Aiming at the problem that prior art methods may fail to fully utilize source knowledge during training, the invention provides a novel PKSD method which effectively utilizes source domain knowledge throughout the training process and improves discrimination accuracy on unlabeled target domains.
2. The invention provides a linearly varying temporal domain relation method (TDR), which reduces the influence of domain-specific samples in the source domain by reducing the training weight of the source domain.
3. The invention provides a ranking-guided sample selection method (RIS), which selects information-rich and reliable source domain samples by calculating the uncertainty and similarity indexes of the source domain samples.
4. In order to alleviate the influence of catastrophic forgetting on the source domain, the pedestrian re-identification model is trained in a collaborative training mode. Specifically, the model is trained jointly on source domain samples with true labels and target domain samples assigned pseudo labels. Unlike most previous methods, the method no longer adopts a two-stage pre-training and fine-tuning strategy, but converts to a single-stage collaborative training mode. As the number of training rounds grows, a model can easily overfit to some domain-specific knowledge on the source domain, which impairs performance on the target domain when the domain distance between the source domain and the target domain is large; the TDR and RIS strategies above counteract this effect.
Detailed Description
The present invention will be described in further detail below.
The pedestrian re-identification model is trained in a collaborative training mode. Specifically, the model is trained jointly on source domain samples with true labels and target domain samples assigned pseudo labels. Unlike most previous methods, this method no longer adopts a two-stage pre-training and fine-tuning strategy, but converts to a single-stage collaborative training mode. However, as the number of training rounds grows, the model easily overfits to some domain-specific knowledge on the source domain, which impairs the performance of the model on the target domain when the domain distance between the source domain and the target domain is large.
To solve the above problems, the invention proposes a novel cross-domain pedestrian re-identification method for source domain knowledge preservation (Preserving the Knowledge from the Source Domain, PKSD) to effectively utilize knowledge from the source domain throughout the training process. Unlike previous two-stage training paradigms, PKSD employs a co-training strategy, i.e., learning source domain samples and target domain samples simultaneously. Specifically, in each iteration PKSD uses not only the pseudo-labeled target domain data as model input but also the true-labeled source domain data, so as to train the model jointly. Although the source domain samples are fully utilized, domain-specific knowledge present in the source domain has a detrimental effect on the domain adaptation task. Therefore, the invention proposes a linearly varying temporal domain relation (Temporal Domain Relation, TDR) method to gradually attenuate the impact of source domain samples; specifically, as the number of training iterations increases, the training weight of the source domain is gradually reduced. Meanwhile, informative and reliable domain-shared knowledge helps promote the performance of the model on the target domain. The invention therefore further proposes a ranking-guided sample selection (RIS) method to evaluate the uncertainty and similarity of each sample from the source domain, select samples carrying information-rich and reliable domain-shared knowledge by ranking the uncertainty and similarity scores, and reassign their sample training weights. In general, by controlling the source domain weight and the sample weights, the proposed PKSD can effectively suppress the influence of domain-specific knowledge from the source domain and improve the test performance of the model on the target domain. Experimental results show that the method greatly exceeds current state-of-the-art methods.
Referring to Fig. 1, a cross-domain pedestrian re-identification method based on deep learning comprises the following steps:
S100, selecting a public dataset A and a public dataset B, and taking dataset A as the source domain D_s, wherein the formula is as follows:
D_s = {(x_i^s, y_i^s) | i = 1, ..., n_s}
wherein x_i^s represents the i-th source domain sample, y_i^s represents the real label corresponding to the i-th source domain sample, and n_s represents the total number of source domain samples;
Selecting part of the data in dataset B as the target domain training set D_T, wherein the expression for D_T is as follows:
D_T = {x_j^t | j = 1, ..., n_t}
wherein x_j^t represents the j-th target domain sample and n_t represents the total number of target domain samples;
S200, selecting a ResNet-50 model M, wherein the model M comprises two modules: the first module is an online feature encoder f(·|θ_t), where θ_t denotes the parameters of the first module, and the second module is a momentum feature encoder f_m(·|θ_m), where θ_m denotes the parameters of the second module;
The ResNet-50 model is prior art and the ImageNet dataset is an existing public dataset; compared with the initialization parameters that other public datasets could provide, ImageNet pre-training yields better accuracy without excessive random error;
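As a minimal sketch of S200, assuming the torchvision implementation of ResNet-50 with ImageNet weights (the invention does not prescribe a specific library), the momentum encoder can be created as a gradient-free copy of the online encoder:

```python
import copy
import torch
import torchvision

# Online feature encoder f(.|theta_t), initialized from ImageNet (model M').
online = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
online.fc = torch.nn.Identity()  # expose the 2048-d pooled feature

# Momentum feature encoder f_m(.|theta_m): identical architecture whose
# parameters are updated only by the moving average of formula (16).
momentum_enc = copy.deepcopy(online)
for p in momentum_enc.parameters():
    p.requires_grad = False
```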
S300, calculating the loss of the initialized model M' by using a loss function;
The loss function of the initialized model M' in S300 is calculated as follows:
S310, extracting features of the data in D_T using the momentum feature encoder, storing the extracted features in a memory feature bank N, and then clustering all the features in N using the DBSCAN clustering algorithm (which is prior art) to generate and store a pseudo label corresponding to each feature;
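A sketch of S310 using scikit-learn's DBSCAN on L2-normalized momentum features; the eps and min_samples values are illustrative assumptions, and DBSCAN marks un-clustered (noise) samples with the label -1, which would typically be excluded from pseudo-labeled training:

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

@torch.no_grad()
def assign_pseudo_labels(momentum_enc, target_loader, eps=0.6, min_samples=4):
    momentum_enc.eval()
    feats = []
    for x_t in target_loader:                    # fill memory feature bank N
        feats.append(F.normalize(momentum_enc(x_t), dim=1))
    bank = torch.cat(feats).cpu().numpy()
    # Cluster all features in N; each cluster id becomes a pseudo label.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(bank)
    return bank, labels                          # label -1 marks noise samples
```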
S320, calculating the training weight w_d(i) of the source domain for each iteration using a temporal domain relation strategy (the temporal domain relation method is prior art), wherein the maximum training weight of the source domain is t_1 and the minimum training weight is t_2, with t_1 > t_2; the calculation expression is as follows:
w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2
wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the source domain training weight applied in the i-th iteration, and s(i) represents the length of each part after equally dividing the interval between t_1 and t_2;
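Since the exact form of s(i) is only described qualitatively above, the sketch below assumes the simplest linear progress term s(i) = i / e; under that assumption w_d decays linearly from t_1 to t_2 over training. The t_1 and t_2 defaults are illustrative.

```python
def domain_weight(i, e, t1=1.0, t2=0.1):
    """TDR sketch: w_d(i) = (1 - s(i)) * t1 + s(i) * t2, assuming s(i) = i / e."""
    assert t1 > t2 and e > 0
    s = i / e                        # training progress in [0, 1]
    return (1 - s) * t1 + s * t2     # decays from t1 (start) to t2 (end)
```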
S330, calculating the training weight of each source domain sample using a ranking-guided selection strategy, which comprises the following specific steps:
S331, randomly selecting a source domain sample x_i^s from the source domain D_s, using the online feature encoder f(·|θ_t) to extract the features of x_i^s, then classifying the features with the class classifier of the target domain and the class classifier of the source domain respectively, and calculating the probability distribution p_i^t of x_i^s classified over the target domain and the probability distribution p_i^s of x_i^s classified over the source domain. Each source domain sample is classified by the class classifier C_t on the target domain to measure the similarity between the source domain sample and the target domain, and by the class classifier C_s on the source domain to measure the uncertainty of the source domain sample; both classifiers are existing classifiers. The computational expressions are as follows:
p_i^t = softmax(C_t(f(x_i^s|θ_t)))
p_i^s = softmax(C_s(f(x_i^s|θ_t)))
wherein p_i^t represents the probability distribution of x_i^s classified over the target domain, C_t represents the class classifier on the target domain, and c_p represents the number of pseudo label classes on the target domain; p_i^s represents the probability distribution of x_i^s classified over the source domain, c_s is the number of real label classes on the source domain, and C_s represents the class classifier on the source domain;
S332, calculating the similarity score S_i between x_i^s and the target domain from p_i^t, wherein c_p represents the number of pseudo label classes on the target domain;
S333, calculating the similarity scores between all source domain samples and the target domain to form a similarity score set; then arranging all the similarity scores in descending order and taking the source domain samples corresponding to the top k% of similarity scores as the reliable sample set Δ_s, with the expression as follows:
Δ_s = {x_i^s | S_i ≥ τ_s}
wherein τ_s represents the similarity score of the k-th percentile of the source domain samples;
S334, defining the maximum class probability and the second largest class probability of x_i^s on the source domain as p_i^max1 and p_i^max2 respectively, and calculating the uncertainty U_i of x_i^s on the source domain, expressed as follows:
U_i = p_i^max1 - p_i^max2
wherein a smaller margin U_i indicates a more uncertain sample;
S335, calculating the uncertainty values of all source domain samples to form an uncertainty value set; then arranging all the uncertainty values in ascending order and taking the source domain samples corresponding to the top k% of uncertainty values as the uncertainty sample set Δ_u, with the expression as follows:
Δ_u = {x_i^s | U_i ≤ τ_u}
wherein τ_u represents the uncertainty value of the k-th percentile of the source domain samples;
S336, combining formula (6) and formula (8) to obtain the training weight w_i^s of each source domain sample.
The smaller the similarity between a sample selected from the source domain and the target domain, the greater the difference in appearance information between that sample and the target domain samples; conversely, the greater the similarity between a source domain sample and the target domain, the more likely the sample carries domain-shared knowledge with the target domain samples. For a sample from the source domain with low similarity to the target domain, its share in model training gradually decreases as the number of training rounds increases (through TDR), whereas if the sample has high similarity to the target domain, its contribution is not affected by TDR.
If a source domain sample has larger uncertainty, the sample also holds a large amount of rich information for the model to learn. The method obtained by combining formula (6) and formula (8) can select reliable and information-rich samples on the source domain; by increasing the training weights of these samples, domain-shared knowledge from the source domain can be utilized effectively, further improving the performance of the model on the target domain.
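The following sketch ties S331 to S336 together. Because the exact expressions for S_i and for the final weight combination are not reproduced above, three assumptions are made: the similarity score is taken as the peak target-class probability, the uncertainty is the margin between the two largest source-class probabilities (smaller margin = more uncertain), and samples falling in Δ_s or Δ_u keep full weight while the rest receive the TDR weight w_d(i).

```python
import torch

def ris_weights(logits_t, logits_s, w_d, k=0.3):
    """RIS sketch; score and combination rules are assumptions, not formula (9)."""
    p_t = logits_t.softmax(dim=1)            # p_i^t over c_p pseudo classes
    p_s = logits_s.softmax(dim=1)            # p_i^s over c_s source classes
    sim = p_t.max(dim=1).values              # assumed similarity score S_i
    top2 = p_s.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]         # assumed uncertainty margin U_i
    k_n = max(1, int(k * sim.numel()))
    tau_s = sim.sort(descending=True).values[k_n - 1]  # top-k% similarity -> Delta_s
    tau_u = margin.sort().values[k_n - 1]              # top-k% uncertainty -> Delta_u
    selected = (sim >= tau_s) | (margin <= tau_u)
    return torch.where(selected, torch.ones_like(sim),
                       torch.full_like(sim, w_d))
```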
S340, calculating the cross entropy loss L_ce^s of the source domain according to the source domain sample training weights obtained in S336, with the specific expression as follows:
L_ce^s = -(1/n_s) × Σ_i w_i^s × log p(y_i^s|x_i^s)
wherein p(y_i^s|x_i^s) represents the probability that source domain sample x_i^s belongs to category y_i^s;
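A short sketch of S340 under the reconstructed expression above: the per-sample weights from S336 scale a standard cross-entropy term before averaging.

```python
import torch.nn.functional as F

def weighted_source_ce(logits_s, labels_s, weights):
    """L_ce^s sketch: sample-weighted cross entropy on the source domain."""
    per_sample = F.cross_entropy(logits_s, labels_s, reduction="none")  # -log p(y|x)
    return (weights * per_sample).mean()
```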
S350, calculating the triplet loss L_tri^s of the source domain according to the source domain sample training weights obtained in S336, which comprises the following specific steps:
S351, calculating the weight w_i^tri of the triplet loss that takes the i-th sample x_i^s as the anchor point, wherein the anchor is associated with the source domain positive sample furthest from x_i^s and the source domain negative sample closest to x_i^s;
S352, after the triplet losses of all source domain samples are calculated, the triplet loss L_tri^s of the source domain is obtained, with the specific expression as follows:
L_tri^s = (1/n_s) × Σ_i w_i^tri × max(0, m + d_i^p - d_i^n)
wherein d_i^p and d_i^n respectively represent the distances from source domain sample x_i^s to the furthest source domain positive sample and to the closest source domain negative sample, and m represents the margin of the triplet loss. More precisely, m represents the minimum required difference between the distance of a negative sample feature pair and the distance of a positive sample feature pair; based on an empirical value, m is set to 0.5. It is a hyperparameter used in designing the loss function, whose main function is to serve as a threshold that pulls feature pairs of the same identity closer and pushes feature pairs of different identities apart.
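A sketch of S351 and S352 using batch-hard mining: for each anchor, the furthest positive and closest negative in the mini-batch give the hinge term max(0, m + d^p - d^n). Since the expression for the per-anchor weight in formula (11) is not reproduced above, the weight is passed in directly (e.g., reusing the S336 weights) as an assumption.

```python
import torch

def weighted_triplet_loss(feats, labels, weights, m=0.5):
    """L_tri^s sketch: batch-hard triplet loss with per-anchor weights, margin m."""
    dist = torch.cdist(feats, feats)                    # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-identity mask
    d_pos = dist.masked_fill(~same, float("-inf")).max(dim=1).values  # furthest positive
    d_neg = dist.masked_fill(same, float("inf")).min(dim=1).values    # closest negative
    hinge = torch.clamp(m + d_pos - d_neg, min=0)       # max(0, m + d^p - d^n)
    return (weights * hinge).mean()
```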
S360, calculating the cross entropy loss L_ce^t and the triplet loss L_tri^t of the target domain, with the specific expressions as follows:
L_ce^t = -(1/n_t) × Σ_j log p(ŷ_j^t|x_j^t)
L_tri^t = (1/n_t) × Σ_j max(0, m + d_j^p - d_j^n)
wherein p(ŷ_j^t|x_j^t) represents the probability that target domain sample x_j^t belongs to its pseudo label category ŷ_j^t; d_j^p and d_j^n respectively represent the distances from target domain sample x_j^t to the furthest target domain positive sample and to the closest target domain negative sample, and m represents the margin of the triplet loss;
S370, obtaining the final loss function L_total of the initialized model M' according to formula (10), formula (12), formula (13) and formula (14), wherein λ_ce represents the soft cross entropy loss weight and λ_tri represents the soft triplet loss weight.
The loss of M' is calculated using the final loss function L_total in S370, the parameters in f(·|θ_t) are updated using gradient back-propagation, and the parameters θ_m of f_m(·|θ_m) are updated using formula (16):
θ_m^(t) = α × θ_m^(t-1) + (1 - α) × θ_t    (16)
where α is a momentum factor and t represents the number of training rounds.
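A sketch of the formula (16) update: after each optimizer step on the online encoder, the momentum encoder parameters move toward the online parameters by an exponential moving average, with α = 0.999 as in the experimental setup below.

```python
import torch

@torch.no_grad()
def momentum_update(online, momentum_enc, alpha=0.999):
    """theta_m <- alpha * theta_m + (1 - alpha) * theta_t (formula 16)."""
    for p_t, p_m in zip(online.parameters(), momentum_enc.parameters()):
        p_m.mul_(alpha).add_(p_t, alpha=1 - alpha)
```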
S400, taking the source domain and the target domain as inputs to the initialized model M', training the model M', updating the parameters in M' according to the loss calculated in S300, and stopping training when the maximum number of training iterations is reached to obtain a trained model M'';
S500, inputting the pedestrian image to be predicted into the trained model M'' to obtain a pedestrian retrieval result.
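A sketch of S500: at test time only the momentum encoder is used (per the experimental setup below), and a query pedestrian is retrieved by ranking gallery features by cosine similarity; the batch shapes and top-k value are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(momentum_enc, query_imgs, gallery_imgs, topk=10):
    """S500 sketch: rank gallery images by similarity to each query feature."""
    momentum_enc.eval()
    q = F.normalize(momentum_enc(query_imgs), dim=1)
    g = F.normalize(momentum_enc(gallery_imgs), dim=1)
    scores = q @ g.t()                        # cosine similarity matrix
    return scores.topk(topk, dim=1).indices   # retrieval result per query
```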
Experimental design and results analysis
1. Introduction of the datasets used
The present invention verifies the validity of the proposed method on three widely used public datasets: Market1501, DukeMTMC-ReID and MSMT17. Market1501 contains 32668 images of pedestrians of different identities taken by 6 cameras; of these, 12936 pedestrian images of 751 identities are used for training, and the remaining images are used for testing. DukeMTMC-ReID contains 16522 training images, 2228 query images, and 17661 gallery images from 702 different identities taken by 8 cameras. MSMT17 is a larger-scale dataset comprising 126441 images of 4101 different identities captured by 15 cameras; specifically, 32621 pedestrian images of 1041 different identities are used as the training set, and the remaining images are used as the test set. Evaluation uses two commonly used metrics: mean Average Precision (mAP) and the Cumulative Matching Characteristics (CMC) curve. For convenience of description, Market1501, DukeMTMC-ReID and MSMT17 are hereinafter referred to as Market, Duke and MSMT, respectively.
2. Experimental setup
In the experiments, ResNet-50 is used as the feature encoder of the proposed method, and pre-trained parameters on ImageNet are loaded. For the co-training setup, each mini-batch on the source and target domains is constructed from 64 pedestrian images of 16 different identities. The network is optimized by the Adam algorithm with a weight decay rate of 0.0005. The entire training process runs for 40 epochs; a warm-up strategy is used in the first 10 epochs, and the initial learning rate is set to 0.00035. At each training step, the momentum feature encoder f_m(·|θ_m) updates its parameters in a temporal moving-average fashion with momentum factor α = 0.999. Pseudo labels are reassigned after every 400 iterations. All pedestrian images are resized to 256 × 128 as input to the network. During testing, the output of the final batch normalization (BN) layer is taken as the final feature of the pedestrian image. All experiments are performed on the PyTorch platform using three NVIDIA TITAN V GPUs. Note that only the momentum feature encoder f_m(·|θ_m) is used in the test phase.
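The stated setup maps onto standard PyTorch components; the snippet below mirrors the hyperparameters above (Adam, weight decay 0.0005, initial learning rate 0.00035, warm-up over the first 10 of 40 epochs), with the warm-up implemented as an assumed linear ramp since its exact schedule is not specified.

```python
import torch

# Reuses the `online` encoder from the S200 sketch.
optimizer = torch.optim.Adam(online.parameters(), lr=3.5e-4, weight_decay=5e-4)

# Assumed linear warm-up for the first 10 epochs, then a constant rate.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: min(1.0, (epoch + 1) / 10))
```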
Table 1 compares the proposed method with some of the latest methods on the Market and Duke datasets, respectively.
Table 2 compares the proposed method with some of the latest methods on the MSMT dataset.
3. Ablation study
To verify the validity of the proposed method's modules, this section combines the different modules and tests with Duke and Market as the source and target domains, respectively. As shown in Fig. 2, the left side of Fig. 2 shows the test results of the different methods on the source domain, and the right side shows the test results on the target domain. As can be seen from the test results on the source domain, model performance under the two-stage training strategy drops sharply as the number of iterations increases, finally obtaining only 20% mAP. This suggests that the two-stage training strategy causes the model to forget source domain knowledge during the fine-tuning stage. In contrast, the collaborative training method provided by the invention overcomes the catastrophic forgetting of source domain knowledge and preserves the final performance of the model on the source domain. From the test results on the target domain, it can be seen that the test performance of the two-stage model on the target domain is limited because the two-stage training strategy does not fully utilize knowledge from the source domain. The method provided by the invention, however, fully and effectively utilizes source domain knowledge and markedly improves the test results of the model on the target domain. Specifically, the model under the two-stage training strategy obtains 74.7% mAP, while the baseline method under the collaborative training provided by the invention reaches 79.1% mAP. By introducing TDR to prevent overfitting of the model on the source domain, the test result of the model on the target domain gains a further 0.9% mAP improvement. Further combined with the RIS module, the model finally reaches 80.7% mAP. These experimental results show that the proposed method effectively utilizes knowledge from the source domain and further promotes the test results of the model on the target domain.
4. Comparison of results
This section compares the PKSD method with current mainstream cross-domain pedestrian re-identification methods. Note that the global average pooling (GAP) layer producing the final feature of the model is replaced here with a generalized mean pooling (GeM) layer. The experimental results are shown in Table 1; it can be seen that under the collaborative training strategy, the performance of PKSD greatly exceeds all state-of-the-art cross-domain pedestrian re-identification methods. Specifically, under the 'Duke to Market' setting, three generation-based methods, namely SPGAN, PTGAN and ATNet, are compared first. Compared with the best generation-based method, ATNet, PKSD improves mAP and Rank-1 by 58.5% and 37.8%, respectively. Further, mainstream methods represented by NRMT, MEB-Net, UNRN, GLT, IDM and PDA are compared. The proposed method achieves the best performance on both 'Market to Duke' and 'Duke to Market'. In particular, on 'Market to Duke', PKSD brings a 0.9% mAP improvement over PDA; likewise, PKSD improves mAP by 1.9% over PDA on 'Duke to Market'.
Experiments are also performed on the larger and more challenging MSMT dataset. Some of the latest approaches, such as NRMT, UNRN, GLT, IDM and PDA, have demonstrated good performance on the MSMT dataset. As shown in Table 2, PKSD achieves 63.8% Rank-1 and 36.5% mAP on 'Market to MSMT'. Likewise, PKSD achieves 63.8% Rank-1 and 36.7% mAP under the 'Duke to MSMT' setting. Compared with the other methods, the proposed PKSD obtains the best test results. Overall, these experiments demonstrate that fully and effectively using knowledge from the source domain can further enhance the performance of the model on the target domain.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered by the scope of the claims of the present invention.