
CN114882531B - A cross-domain person re-identification method based on deep learning - Google Patents

A cross-domain person re-identification method based on deep learning

Info

Publication number
CN114882531B
CN114882531B (application CN202210554612.XA)
Authority
CN
China
Prior art keywords
domain
source domain
sample
training
follows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210554612.XA
Other languages
Chinese (zh)
Other versions
CN114882531A (en)
Inventor
葛永新
张俊银
华博誉
徐玲
黄晟
洪明坚
王洪星
张小洪
杨丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202210554612.XA priority Critical patent/CN114882531B/en
Publication of CN114882531A publication Critical patent/CN114882531A/en
Application granted granted Critical
Publication of CN114882531B publication Critical patent/CN114882531B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract


The present invention relates to a cross-domain pedestrian re-identification method based on deep learning, comprising the following steps: selecting a public data set as a source domain and a target domain; selecting a ResNet‑50 model M and initializing its parameters to obtain M′; using the source domain and the target domain as inputs of the initialized model M′ and calculating the corresponding loss to train the model M′, stopping the training after reaching the maximum number of training times, and obtaining a trained model M″; inputting the image of the pedestrian to be predicted into the trained model M″ to obtain the retrieval result of the pedestrian. The method of the present invention can more accurately detect and identify specific pedestrians.

Description

Cross-domain pedestrian re-identification method based on deep learning
Technical Field
The invention relates to the field of pedestrian re-identification, in particular to a cross-domain pedestrian re-identification method based on deep learning.
Background
The pedestrian re-identification task aims to retrieve a particular pedestrian across cameras. Because of its important applications in intelligent surveillance, pedestrian re-identification has become one of the research hotspots in computer vision. In recent years, supervised pedestrian re-identification methods have achieved satisfactory performance. However, when the training pedestrian samples and the test pedestrian samples come from different datasets, the performance of most supervised methods degrades significantly. In the real world, labeling pedestrian data is expensive and time-consuming; therefore, unsupervised cross-domain pedestrian re-identification has attracted considerable research attention.
The purpose of unsupervised cross-domain pedestrian re-identification is to transfer discriminative knowledge from the source domain to the unlabeled target domain, with the expectation that the model's test results on the target domain approach those of supervised methods. This task remains highly challenging because of the large inter-domain gap between the source and target domains. So far, clustering-based cross-domain pedestrian re-identification has developed rapidly, and most existing state-of-the-art methods are clustering-based. These methods can generally be divided into two stages: 1) supervised pre-training of the model on labeled source-domain data, and 2) assigning pseudo labels on the target domain with a clustering algorithm and iteratively fine-tuning the pre-trained model.
However, the pedestrian re-identification model, which is iteratively trained during the fine-tuning stage, gradually forgets the discriminative knowledge from the source domain, i.e., catastrophic forgetting. This phenomenon can be observed in two ways: 1) as the number of fine-tuning iterations increases, the model's test results on the source domain gradually decrease, and 2) simply removing the pre-training part brings only a small performance drop for most clustering methods. It can therefore be inferred that most existing clustering-based approaches do not fully exploit the discriminative knowledge in the source domain, even though this knowledge is important for improving the performance of the model on the target domain.
Disclosure of Invention
Aiming at the problems in the prior art, the invention addresses the technical problem that prior-art methods show poor discriminability on the unlabeled target domain because the domain-shared knowledge in the source domain is not fully utilized.
In order to solve the technical problems, the invention adopts the following technical scheme:
A cross-domain pedestrian re-identification method based on deep learning comprises the following steps:
S100, selecting a public dataset A and a public dataset B, and taking dataset A as the source domain D_s, formulated as follows:
D_s = {(x_i^s, y_i^s)}, i = 1, ..., n_s
where x_i^s represents the i-th source-domain sample, y_i^s represents the ground-truth label corresponding to the i-th source-domain sample, and n_s represents the total number of source-domain samples;
part of the data in dataset B is selected as the target-domain training set D_T, whose expression is as follows:
D_T = {x_j^t}, j = 1, ..., n_t
where x_j^t represents the j-th target-domain sample and n_t represents the total number of target-domain samples;
S200, selecting a ResNet-50 model M, wherein the model M comprises two modules: the first module is an online feature encoder f(·|θ_t), with θ_t being the parameters of the first module, and the second module is a momentum feature encoder f_m(·|θ_m), with θ_m being the parameters of the second module;
initializing the parameters of model M with the ImageNet dataset to obtain an initialized model M′;
S300, calculating the loss of the initialized model M′ with a loss function;
S400, training the model M′ with the source domain and the target domain as inputs of the initialized model M′, updating the parameters in M′ according to the loss calculated in S300, and stopping training when the maximum number of training iterations is reached to obtain the trained model M″;
S500, inputting the pedestrian image to be predicted into the trained model M″ to obtain the pedestrian retrieval result.
Preferably, the loss function of the initialization model M' in S300 is as follows:
S310, extracting features of the data in D_T with the momentum feature encoder, storing the extracted features in a memory feature bank N, and then clustering all features in N with the DBSCAN clustering algorithm to generate pseudo labels in one-to-one correspondence with the features;
S320, calculating the training weight w_d(i) of the source domain for each iteration with a temporal domain relation strategy, where the maximum per-iteration training weight of the source domain is preset to t_1 and the minimum to t_2, with t_1 > t_2; the computation is as follows:
w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2
where the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the source-domain training weight applied at the i-th iteration, and s(i) represents the length of each portion after dividing the interval between t_1 and t_2 at equal intervals;
S330, calculating the training weight of each source-domain sample with a rank-guided selection strategy; the specific steps are as follows:
S331, randomly selecting a source-domain sample x_i^s (1 ≤ i ≤ n_s) from the source domain D_s, extracting its features with the online feature encoder f(·|θ_t), then classifying it with the class classifier of the target domain and the class classifier of the source domain respectively, and computing the probability distribution of x_i^s classified over the target domain and the probability distribution of x_i^s classified over the source domain; the computational expressions are as follows:
where the former represents the probability distribution of x_i^s classified over the target domain, C_t represents the class classifier on the target domain, and c_p represents the number of pseudo-label classes on the target domain; the latter represents the probability distribution of the sample x_i^s classified over the source domain, c_s is the number of ground-truth label classes on the source domain, and C_s represents the class classifier on the source domain;
S332, computing the similarity score S_i between x_i^s and the target domain, expressed as follows:
where c_p represents the number of pseudo-label classes on the target domain;
S333, computing the similarity scores between all source-domain samples and the target domain to form a similarity-score set; all similarity scores are then sorted in descending order, and the source-domain samples corresponding to the top k% of similarity scores are taken as the reliable sample set δ_s, i.e. δ_s = { x_i^s | S_i ≥ τ_s },
where τ_s represents the similarity score of the source-domain sample at the k-th percentile;
S334, defining the largest and the second-largest class probabilities of x_i^s on the source domain, and computing the uncertainty U_i of x_i^s on the source domain, expressed as follows:
S335, computing the uncertainty values of all source-domain samples to form an uncertainty-value set; all uncertainty values are then sorted in ascending order, and the source-domain samples corresponding to the first k% of uncertainty values are taken as the uncertain sample set δ_u, expressed as follows:
S336, combining equation (6) and equation (8) to obtain the training weight of each source-domain sample, expressed as follows:
S340, calculating the cross-entropy loss of the source domain from the source-domain sample training weights obtained in S336; the specific expression is as follows:
where the probability term represents the probability that the source-domain sample x_i^s belongs to the class y_i^s;
S350, calculating the triplet loss of the source domain from the source-domain sample training weights obtained in S336; the specific steps are as follows:
S351, calculating the weight of the triplet loss with the i-th sample x_i^s as anchor; the computational expression is as follows:
where the two terms represent the source-domain positive sample farthest from x_i^s and the source-domain negative sample closest to x_i^s, respectively;
S352, after the triplet losses of all source-domain samples are calculated, the triplet loss of the source domain is obtained; the specific expression is as follows:
where the two distance terms represent the distances between the source-domain sample x_i^s and the farthest source-domain positive sample and the nearest source-domain negative sample, respectively, and m represents the margin of the triplet loss;
S360, calculating the cross-entropy loss and the triplet loss of the target domain; the specific expressions are as follows:
where the probability term represents the probability that the target-domain sample x_j^t belongs to its pseudo-label class; the two distance terms represent the distances between the target-domain sample x_j^t and the farthest target-domain positive sample and the nearest target-domain negative sample, respectively, and m represents the margin of the triplet loss;
S370, obtaining the final loss function L_total of the initialized model M′ from equation (10), equation (12), equation (13) and equation (14); the expression is as follows:
where the corresponding coefficients represent the soft cross-entropy loss weight and the soft triplet loss weight, respectively.
Calculating the loss as a combination of cross-entropy loss and triplet loss balances the weights of the two terms and effectively reduces the influence of the noisy pseudo labels generated on the target domain on model training.
Preferably, the loss of M′ is calculated with the final loss function L_total in S370, the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters θ_m of the momentum feature encoder f_m(·|θ_m) are updated with equation (16):
θ_m^(t) = α·θ_m^(t-1) + (1 - α)·θ_t
where α is the momentum factor and t represents the number of training rounds.
Compared with the prior art, the invention has at least the following advantages:
1. Aiming at the problem that prior-art methods may not fully utilize source knowledge during training, the invention proposes a novel PKSD method that effectively utilizes source-domain knowledge throughout the training process and improves discriminative accuracy on the unlabeled target domain.
2. The invention provides a linearly varying temporal domain relation (TDR) method, which reduces the influence of domain-specific samples in the source domain by decreasing the training weight of the source domain.
3. The invention provides a rank-guided sample selection (RIS) method, which selects informative and reliable source-domain samples by computing uncertainty and similarity indices for the source-domain samples.
4. In order to alleviate the influence of catastrophic forgetting of the source domain, the pedestrian re-identification model is trained in a collaborative manner. Specifically, the model is trained jointly on source-domain samples with ground-truth labels and target-domain samples assigned pseudo labels. Unlike most previous methods, the method no longer adopts a two-stage pre-training-and-fine-tuning strategy but switches to a single-stage collaborative training mode. However, as the number of training rounds grows, the model easily overfits to some domain-specific knowledge in the source domain, which impairs the performance of the model on the target domain when the domain gap between the source and target domains is large.
Drawings
FIG. 1 shows the main structure of the method PKSD of the present invention.
FIG. 2 shows the results of verification of the effectiveness of the method of the present invention and various other methods.
Detailed Description
The present invention will be described in further detail below.
The pedestrian re-identification model is trained in a collaborative training mode. Specifically, the model is trained jointly on source-domain samples with ground-truth labels and target-domain samples assigned pseudo labels. Unlike most previous methods, the method no longer adopts a two-stage pre-training-and-fine-tuning strategy but switches to a single-stage collaborative training mode. However, as the number of training rounds grows, the model easily overfits to some domain-specific knowledge in the source domain, which impairs the performance of the model on the target domain when the domain gap between the source and target domains is large.
To solve the above problems, the present invention proposes a novel cross-domain pedestrian re-identification method, Preserving the Knowledge from the Source Domain (PKSD), to effectively utilize knowledge from the source domain throughout the training process. Unlike previous two-stage training paradigms, PKSD adopts a co-training strategy, i.e., learning from source-domain samples and target-domain samples simultaneously. Specifically, in each iteration PKSD uses not only the pseudo-labeled target-domain data as model input but also the source-domain data with ground-truth labels, so that the model is trained jointly. Although the source-domain samples are fully utilized, domain-specific knowledge present in the source domain has a detrimental effect on the domain adaptation task. Therefore, the invention proposes a linearly varying Temporal Domain Relation (TDR) method to gradually attenuate the influence of source-domain samples. Specifically, as the number of training iterations increases, the training weight of the source domain is gradually reduced. Informative and reliable domain-shared knowledge helps improve the performance of the model on the target domain. Further, the invention proposes a rank-guided sample selection (RIS) method to evaluate the uncertainty and similarity of each sample from the source domain, select samples carrying informative and reliable domain-shared knowledge by ranking the uncertainty and similarity scores, and reassign their sample training weights. In general, by controlling the source-domain weight and the sample weights, the proposed PKSD can effectively suppress the influence of domain-specific knowledge from the source domain and improve the test performance of the model on the target domain. Experimental results show that the proposed method substantially outperforms the current state-of-the-art methods.
Referring to fig. 1, a cross-domain pedestrian re-identification method based on deep learning includes the following steps:
S100, selecting a public dataset A and a public dataset B, and taking dataset A as the source domain D_s, formulated as follows:
D_s = {(x_i^s, y_i^s)}, i = 1, ..., n_s
where x_i^s represents the i-th source-domain sample, y_i^s represents the ground-truth label corresponding to the i-th source-domain sample, and n_s represents the total number of source-domain samples;
part of the data in dataset B is selected as the target-domain training set D_T, whose expression is as follows:
D_T = {x_j^t}, j = 1, ..., n_t
where x_j^t represents the j-th target-domain sample and n_t represents the total number of target-domain samples;
S200, selecting a ResNet-50 model M, wherein the model M comprises two modules: the first module is an online feature encoder f(·|θ_t), with θ_t being the parameters of the first module, and the second module is a momentum feature encoder f_m(·|θ_m), with θ_m being the parameters of the second module;
The ResNet-50 model is prior art and the ImageNet dataset is an existing public dataset; compared with initialization parameters obtained from other public datasets, ImageNet pre-training provides better accuracy without introducing excessive random error;
S300, calculating the loss of the initialized model M′ with a loss function;
The loss function of the initialization model M' in S300 is as follows:
S310, extracting features of the data in D_T with the momentum feature encoder, storing the extracted features in a memory feature bank N, and then clustering all features in N with the DBSCAN clustering algorithm (which is prior art) to generate pseudo labels in one-to-one correspondence with the features;
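As a minimal illustration of this clustering step (not the authors' implementation), the following sketch assumes scikit-learn's DBSCAN applied to features already stored in the memory feature bank; the eps and min_samples values are illustrative only:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def assign_pseudo_labels(features, eps=0.6, min_samples=4):
    """Cluster L2-normalized target-domain features and return pseudo labels.

    features: (n_t, d) array taken from the memory feature bank N.
    Samples that DBSCAN marks as noise keep the label -1 and are typically
    excluded from the next training round.
    """
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    # eps, min_samples and the Euclidean metric are assumptions, not values
    # stated in the patent.
    return DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean").fit_predict(features)

# usage sketch: pseudo = assign_pseudo_labels(bank_N)  # refreshed periodically
```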
S320, calculating the training weight w_d(i) of the source domain for each iteration with a temporal domain relation strategy (the temporal domain relation method is prior art), where the maximum per-iteration training weight of the source domain is preset to t_1 and the minimum to t_2, with t_1 > t_2; the computation is as follows:
w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2
where the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the source-domain training weight applied at the i-th iteration, and s(i) represents the length of each portion after dividing the interval between t_1 and t_2 at equal intervals;
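A small sketch of the TDR schedule follows. The exact form of s(i) is not reproduced in the text above, so the sketch assumes the simple choice s(i) = (i % e) / e, under which the source-domain weight decays linearly from t_1 to t_2 over the e training iterations; the default values of t_1 and t_2 are also assumptions:

```python
def source_domain_weight(i, e, t1=1.0, t2=0.5):
    """Temporal Domain Relation (TDR) weight for the i-th training iteration.

    t1 / t2 are the preset maximum / minimum source-domain weights (t1 > t2).
    s(i) is assumed to be a linear ramp over e iterations; the patent only
    states that it comes from dividing the interval between t1 and t2 equally.
    """
    s = (i % e) / e                      # assumed definition of s(i)
    return (1 - s) * t1 + s * t2         # w_d(i) = (1 - s(i))*t1 + s(i)*t2
```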
S330, calculating the training weight of each source-domain sample with a rank-guided selection strategy; the specific steps are as follows:
S331, randomly selecting a source-domain sample x_i^s (1 ≤ i ≤ n_s) from the source domain D_s, extracting its features with the online feature encoder f(·|θ_t), then classifying it with the class classifier of the target domain and the class classifier of the source domain respectively, and computing the probability distribution of x_i^s classified over the target domain and the probability distribution of x_i^s classified over the source domain. Each source-domain sample is classified with the class classifier C_t on the target domain to measure its similarity to the target domain, and with the class classifier C_s on the source domain to measure its uncertainty; both classifiers are existing classifiers. The computational expressions are as follows:
where the former represents the probability distribution of x_i^s classified over the target domain, C_t represents the class classifier on the target domain, and c_p represents the number of pseudo-label classes on the target domain; the latter represents the probability distribution of the sample x_i^s classified over the source domain, c_s is the number of ground-truth label classes on the source domain, and C_s represents the class classifier on the source domain;
S332, computing the similarity score S_i between x_i^s and the target domain, expressed as follows:
where c_p represents the number of pseudo-label classes on the target domain;
S333, computing the similarity scores between all source-domain samples and the target domain to form a similarity-score set; all similarity scores are then sorted in descending order, and the source-domain samples corresponding to the top k% of similarity scores are taken as the reliable sample set δ_s, i.e. δ_s = { x_i^s | S_i ≥ τ_s },
where τ_s represents the similarity score of the source-domain sample at the k-th percentile;
S334, defining the largest and the second-largest class probabilities of x_i^s on the source domain, and computing the uncertainty U_i of x_i^s on the source domain, expressed as follows:
S335, computing the uncertainty values of all source-domain samples to form an uncertainty-value set; all uncertainty values are then sorted in ascending order, and the source-domain samples corresponding to the first k% of uncertainty values are taken as the uncertain sample set δ_u, expressed as follows:
S336, combining equation (6) and equation (8) to obtain the training weight of each source-domain sample, expressed as follows:
The smaller the similarity between a sample selected from the source domain and the target domain, the greater the difference in appearance information between that sample and the target-domain samples; conversely, the greater the similarity between a source-domain sample and the target domain, the more likely that sample carries knowledge shared with the target-domain samples. For a sample from the source domain with low similarity to the target domain, its contribution to the model gradually decreases (via TDR) as the number of training rounds grows, whereas if the sample has high similarity to the target domain, its contribution is not affected by TDR.
If a source-domain sample has a larger uncertainty, it also contains a large amount of rich information for the model to learn. The method obtained by combining equation (6) and equation (8) can select reliable and information-rich samples from the source domain; by increasing the training weights of these samples, the domain-shared knowledge from the source domain can be effectively utilized and the performance of the model on the target domain further improved.
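The rank-guided selection described in S331 to S336 can be sketched as follows. Since equations (6) and (8) are not reproduced above, the sketch assumes a peak-probability similarity score and a top-1 minus top-2 margin as the uncertainty measure, and the per-group weight values are illustrative; it is a sketch of the idea, not the authors' code:

```python
import torch

def ris_sample_weights(p_t, p_s, k=0.3):
    """Rank-guided instance selection over a set of source-domain samples.

    p_t: (n_s, c_p) probabilities from the target-domain classifier C_t.
    p_s: (n_s, c_s) probabilities from the source-domain classifier C_s.
    Returns one training weight per source sample.
    """
    # Similarity to the target domain (assumed: peak target-class probability).
    sim = p_t.max(dim=1).values
    # Uncertainty on the source domain (assumed: top-1 minus top-2 margin;
    # a small margin means the classifier is unsure, i.e. high uncertainty).
    top2 = p_s.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]

    n = p_t.size(0)
    k_count = max(1, int(k * n))
    reliable = torch.zeros(n, dtype=torch.bool)
    reliable[sim.argsort(descending=True)[:k_count]] = True       # top-k% similarity
    uncertain = torch.zeros(n, dtype=torch.bool)
    uncertain[margin.argsort(descending=False)[:k_count]] = True  # top-k% uncertainty

    # Samples that are both reliable and informative keep full weight; the
    # remaining samples get a reduced weight (the 1.0 / 0.5 values and the
    # intersection rule are assumptions about how the two sets are combined).
    return torch.where(reliable & uncertain, torch.tensor(1.0), torch.tensor(0.5))
```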
S340, calculating the cross-entropy loss of the source domain from the source-domain sample training weights obtained in S336; the specific expression is as follows:
where the probability term represents the probability that the source-domain sample x_i^s belongs to the class y_i^s;
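A minimal sketch of the weighted source-domain cross-entropy term, assuming per-sample weights from RIS and the epoch-level TDR weight w_d(i); the function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def weighted_source_ce(logits_s, labels_s, sample_w, domain_w):
    """Cross-entropy over source samples, scaled per sample and per iteration.

    logits_s: (B, c_s) source-classifier outputs, labels_s: (B,) ground truth,
    sample_w: (B,) RIS weights, domain_w: scalar TDR weight w_d(i).
    """
    ce = F.cross_entropy(logits_s, labels_s, reduction="none")  # per-sample loss
    return domain_w * (sample_w * ce).mean()
```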
S350, calculating the triplet loss of the source domain from the source-domain sample training weights obtained in S336; the specific steps are as follows:
S351, calculating the weight of the triplet loss with the i-th sample x_i^s as anchor; the computational expression is as follows:
where the two terms represent the source-domain positive sample farthest from x_i^s and the source-domain negative sample closest to x_i^s, respectively;
S352, after the triplet losses of all source-domain samples are calculated, the triplet loss of the source domain is obtained; the specific expression is as follows:
where the two distance terms represent the distances between the source-domain sample x_i^s and the farthest source-domain positive sample and the nearest source-domain negative sample, respectively, and m represents the margin of the triplet loss. More precisely, m is the minimum required difference between the feature distance of a positive pair and the feature distance of a negative pair; following an empirical value, m is set to 0.5. It is a hyperparameter of the loss function whose main role is to act as a threshold that pulls the features of samples of the same identity closer together while pushing the features of samples of different identities apart.
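A sketch of the weighted batch-hard triplet term, assuming Euclidean feature distances and the margin m = 0.5 stated above; the batch-hard mining shown here is an assumption about how the farthest positive and nearest negative are obtained:

```python
import torch
import torch.nn.functional as F

def weighted_hard_triplet(feats, labels, sample_w, margin=0.5):
    """Batch-hard triplet loss: for each anchor, take the farthest positive
    and the nearest negative, then scale the hinge by the anchor's weight."""
    dist = torch.cdist(feats, feats)                      # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)     # positive-pair mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)

    d_pos = dist.masked_fill(~same, float("-inf")).masked_fill(eye, float("-inf")).max(dim=1).values
    d_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    loss = F.relu(d_pos - d_neg + margin)                 # hinge with margin m
    return (sample_w * loss).mean()
```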
S360, calculating the cross-entropy loss and the triplet loss of the target domain; the specific expressions are as follows:
where the probability term represents the probability that the target-domain sample x_j^t belongs to its pseudo-label class; the two distance terms represent the distances between the target-domain sample x_j^t and the farthest target-domain positive sample and the nearest target-domain negative sample, respectively, and m represents the margin of the triplet loss;
S370, obtaining the final loss function L_total of the initialized model M′ from equation (10), equation (12), equation (13) and equation (14); the expression is as follows:
where the corresponding coefficients represent the soft cross-entropy loss weight and the soft triplet loss weight, respectively.
The loss of M′ is calculated with the final loss function L_total in S370, the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters θ_m of the momentum feature encoder f_m(·|θ_m) are updated with equation (16):
θ_m^(t) = α·θ_m^(t-1) + (1 - α)·θ_t
where α is the momentum factor and t represents the number of training rounds.
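Equation (16) is the usual moving-average update of a momentum encoder; a sketch of how it is typically implemented (the parameter iteration is an assumption about code structure, not quoted from the patent) is:

```python
import torch

@torch.no_grad()
def momentum_update(online_encoder, momentum_encoder, alpha=0.999):
    """theta_m <- alpha * theta_m + (1 - alpha) * theta_t after each step."""
    for p_t, p_m in zip(online_encoder.parameters(), momentum_encoder.parameters()):
        p_m.data.mul_(alpha).add_(p_t.data, alpha=1 - alpha)
```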
S400, training the model M′ with the source domain and the target domain as inputs of the initialized model M′, updating the parameters in M′ according to the loss calculated in S300, and stopping training when the maximum number of training iterations is reached to obtain the trained model M″;
S500, inputting the pedestrian image to be predicted into the trained model M″ to obtain the pedestrian retrieval result.
Experimental design and results analysis
1. Introduction of the datasets used
The present invention verifies the effectiveness of the proposed method on three widely used public datasets: Market1501, DukeMTMC-ReID and MSMT17. Market1501 contains 32668 pedestrian images of different identities taken by 6 cameras; 12936 images of 751 identities are used for training and the remaining images for testing. DukeMTMC-ReID contains 16522 training images, 2228 query images and 17661 gallery images of 702 different identities taken by 8 cameras. MSMT17 is a larger-scale dataset that includes 126441 images of 4101 different identities captured by 15 cameras; specifically, 32621 pedestrian images of 1041 different identities are used as the training set and the remaining images as the test set. Evaluation is performed with two commonly used metrics: mean Average Precision (mAP) and the Cumulative Matching Characteristics (CMC) curve. For convenience, Market1501, DukeMTMC-ReID and MSMT17 are hereinafter referred to as Market, Duke and MSMT, respectively.
2. Experimental setup
In the experiments, ResNet-50 is used as the feature encoder of the proposed method and the pre-trained parameters on ImageNet are loaded. For the co-training setup, each mini-batch on the source and target domains is constructed from 64 pedestrian images of 16 different identities. The network is optimized with the Adam algorithm with a weight decay of 0.0005. The whole training process runs for 40 epochs; a warm-up strategy is used in the first 10 epochs and the initial learning rate is set to 0.00035. At each training step, the momentum feature encoder f_m(·|θ_m) updates its parameters in a temporal moving-average fashion with momentum factor α = 0.999. Pseudo labels are reassigned every 400 iterations. All pedestrian images are resized to 256 × 128 as network input. During testing, the output of the final batch normalization (BN) layer is taken as the final feature of the pedestrian image. All experiments are performed on the PyTorch platform with three NVIDIA TITAN V GPUs. Note that only the momentum feature encoder f_m(·|θ_m) is used in the test phase.
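For reference, the optimizer and warm-up settings described above could be wired up roughly as follows; the linear warm-up form and the loop skeleton are assumptions consistent with the text rather than a reproduction of the authors' code:

```python
import torch

def build_optimizer(model, base_lr=3.5e-4, weight_decay=5e-4):
    # Adam with weight decay 0.0005 and initial learning rate 0.00035
    return torch.optim.Adam(model.parameters(), lr=base_lr, weight_decay=weight_decay)

def warmup_lr(epoch, base_lr=3.5e-4, warmup_epochs=10):
    """Linear warm-up over the first 10 epochs, then constant (assumed form)."""
    return base_lr * min(1.0, (epoch + 1) / warmup_epochs)

# training-loop skeleton: 40 epochs, pseudo labels reassigned every 400 iterations
# for epoch in range(40):
#     for g in optimizer.param_groups:
#         g["lr"] = warmup_lr(epoch)
#     ...
```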
Table 1 compares the proposed method with some of the latest methods on the Market and Duke datasets, respectively.
Table 2 compares the proposed method with some of the latest methods on the MSMT dataset.
3. Ablation study
To verify the effectiveness of the proposed modules, this section combines the different modules and tests them with Duke and Market as the source and target domains, respectively. As shown in fig. 2, the left side of fig. 2 shows the test results of different methods on the source domain and the right side shows the test results on the target domain, where Market and Duke serve as the target domain and the source domain, respectively. From the test results on the source domain, the performance of the model under the two-stage training strategy drops sharply as the number of iterations increases, finally reaching only 20% mAP. This suggests that the two-stage training strategy causes the fine-tuning-stage model to forget source-domain knowledge. In contrast, the collaborative training method provided by the invention overcomes the catastrophic forgetting of source-domain knowledge and preserves the final performance of the model on the source domain. From the test results on the target domain, the test performance of the model is limited under the two-stage training strategy because the knowledge from the source domain is not fully utilized. The method provided by the invention, however, can fully and effectively utilize source-domain knowledge and markedly improves the test results of the model on the target domain. Specifically, the model under the two-stage training strategy obtains 74.7% mAP, whereas the baseline method under the proposed collaborative training reaches 79.1% mAP. By introducing TDR to prevent overfitting of the model on the source domain, the test result on the target domain gains a further 0.9% mAP. Combined with the RIS module, the model finally reaches 80.7% mAP. The experimental results show that the proposed method can effectively utilize knowledge from the source domain and further improve the test results of the model on the target domain.
4. Comparison of results
This section compares the PKSD method with current mainstream cross-domain pedestrian re-identification methods. Note that the global average pooling (GAP) layer used to obtain the final feature is replaced here with a generalized mean pooling (GeM) layer. The experimental results are shown in Table 1; under the collaborative training strategy, the performance of PKSD greatly exceeds that of all state-of-the-art cross-domain pedestrian re-identification methods. Specifically, under the 'Duke to Market' setting, three generation-based methods, SPGAN, PTGAN and ATNet, are first compared. Compared with the best generation-based method ATNet, PKSD improves mAP and Rank-1 by 58.5% and 37.8%, respectively. Further, mainstream methods represented by NRMT, MEB-Net, UNRN, GLT, IDM and PDA are compared. The proposed method achieves the best performance on both 'Market to Duke' and 'Duke to Market'. In particular, on 'Market to Duke', PKSD brings a 0.9% mAP improvement over PDA; likewise, PKSD improves mAP by 1.9% over PDA on 'Duke to Market'.
The method of the present invention was also evaluated on the larger and more challenging MSMT dataset. Some of the latest approaches, such as NRMT, UNRN, GLT, IDM and PDA, have demonstrated good performance on the MSMT dataset. As shown in Table 2, PKSD achieves 63.8% Rank-1 and 36.5% mAP on 'Market to MSMT'. Likewise, PKSD achieves 63.8% Rank-1 and 36.7% mAP under the 'Duke to MSMT' setting. Compared with other methods, the proposed PKSD obtains the best test results. Overall, these experiments demonstrate that fully and effectively using knowledge from the source domain can further enhance the performance of the model on the target domain.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered by the scope of the claims of the present invention.

Claims (2)

1. A cross-domain pedestrian re-identification method based on deep learning, characterized by comprising the following steps:
S100: selecting public datasets A and B, and taking dataset A as the source domain D_s = {(x_i^s, y_i^s)}, i = 1, ..., n_s, where x_i^s represents the i-th source-domain sample, y_i^s represents the ground-truth label corresponding to the i-th source-domain sample, and n_s represents the total number of source-domain samples; selecting part of the data in dataset B as the target-domain training set D_T = {x_j^t}, j = 1, ..., n_t, where x_j^t represents the j-th target-domain sample and n_t represents the total number of target-domain samples;
S200: selecting a ResNet-50 model M, the model M comprising two modules, the first being an online feature encoder f(·|θ_t) with parameters θ_t and the second being a momentum feature encoder f_m(·|θ_m) with parameters θ_m; initializing the parameters of model M with the ImageNet dataset to obtain an initialized model M′;
S300: calculating the loss of the initialized model M′ with a loss function, the loss function of the initialized model M′ being obtained as follows:
S310: extracting features of the data in D_T with the momentum feature encoder, storing the extracted features in a memory feature bank N, then clustering all features in N with the DBSCAN clustering algorithm and generating pseudo labels in one-to-one correspondence with the features;
S320: calculating the training weight w_d(i) of the source domain for each iteration with a temporal domain relation strategy, the per-iteration source-domain training weight being preset with maximum t_1 and minimum t_2, where t_1 > t_2, computed as
w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2
where the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the source-domain training weight applied at the i-th iteration, and s(i) represents the length of each portion after dividing the interval between t_1 and t_2 at equal intervals;
S330: calculating the training weight of each source-domain sample with a rank-guided selection strategy, the specific steps being as follows:
S331: randomly selecting a source-domain sample x_i^s (1 ≤ i ≤ n_s) from the source domain D_s, extracting its features with the online feature encoder f(·|θ_t), classifying it with the class classifier of the target domain and the class classifier of the source domain respectively, and computing the probability distribution of x_i^s classified over the target domain and the probability distribution of x_i^s classified over the source domain, where C_t represents the class classifier on the target domain, c_p represents the number of pseudo-label classes on the target domain, c_s is the number of ground-truth label classes on the source domain, and C_s represents the class classifier on the source domain;
S332: computing the similarity score S_i between x_i^s and the target domain, where c_p represents the number of pseudo-label classes on the target domain;
S333: computing the similarity scores between all source-domain samples and the target domain to form a similarity-score set, sorting all similarity scores in descending order, and taking the source-domain samples corresponding to the top k% of similarity scores as the reliable sample set δ_s, where τ_s represents the similarity score of the source-domain sample at the k-th percentile;
S334: defining the largest and the second-largest class probabilities of x_i^s on the source domain, and computing the uncertainty U_i of x_i^s on the source domain;
S335: computing the uncertainty values of all source-domain samples to form an uncertainty-value set, sorting all uncertainty values in ascending order, and taking the source-domain samples corresponding to the first k% of uncertainty values as the uncertain sample set δ_u;
S336: combining equation (6) and equation (8) to obtain the training weight of each source-domain sample;
S340: calculating the cross-entropy loss of the source domain from the source-domain sample training weights obtained in S336, where the probability term represents the probability that the source-domain sample x_i^s belongs to the class y_i^s;
S350: calculating the triplet loss of the source domain from the source-domain sample training weights obtained in S336, the specific steps being as follows:
S351: calculating the weight of the triplet loss with the i-th sample x_i^s as anchor, where the relevant terms represent the source-domain positive sample farthest from x_i^s and the source-domain negative sample closest to x_i^s;
S352: after calculating the triplet losses of all source-domain samples, obtaining the triplet loss of the source domain, where the two distance terms represent the distances between the source-domain sample x_i^s and the farthest source-domain positive sample and the nearest source-domain negative sample, respectively, and m represents the margin of the triplet loss;
S360: calculating the cross-entropy loss and the triplet loss of the target domain, where the probability term represents the probability that the target-domain sample x_j^t belongs to its pseudo-label class, the two distance terms represent the distances between the target-domain sample x_j^t and the farthest target-domain positive sample and the nearest target-domain negative sample, respectively, and m represents the margin of the triplet loss;
S370: obtaining the final loss function L_total of the initialized model M′ from equation (10), equation (12), equation (13) and equation (14), where the corresponding coefficients represent the soft cross-entropy loss weight and the soft triplet loss weight, respectively;
S400: training the model M′ with the source domain and the target domain as inputs of the initialized model M′, updating the parameters in M′ according to the loss calculated in S300, stopping training when the maximum number of training iterations is reached, and obtaining the trained model M″;
S500: inputting the pedestrian image to be predicted into the trained model M″ to obtain the pedestrian retrieval result.
2. The cross-domain pedestrian re-identification method based on deep learning according to claim 1, characterized in that: the loss of M′ is calculated with the final loss function L_total in S370, the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters of the momentum feature encoder f_m(·|θ_m) are updated with equation (16), where α is the momentum factor and t represents the number of training rounds.
CN202210554612.XA 2022-05-19 2022-05-19 A cross-domain person re-identification method based on deep learning Active CN114882531B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210554612.XA CN114882531B (en) 2022-05-19 2022-05-19 A cross-domain person re-identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210554612.XA CN114882531B (en) 2022-05-19 2022-05-19 A cross-domain person re-identification method based on deep learning

Publications (2)

Publication Number Publication Date
CN114882531A CN114882531A (en) 2022-08-09
CN114882531B true CN114882531B (en) 2025-04-01

Family

ID=82677958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210554612.XA Active CN114882531B (en) 2022-05-19 2022-05-19 A cross-domain person re-identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN114882531B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205570B (en) * 2022-09-14 2022-12-20 中国海洋大学 Unsupervised cross-domain target re-identification method based on comparative learning
CN115482927B (en) * 2022-09-21 2023-05-23 浙江大学 Children's pneumonia diagnostic system based on little sample
CN117892183B (en) * 2024-03-14 2024-06-04 南京邮电大学 A method and system for electroencephalogram signal recognition based on reliable transfer learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326731A (en) * 2021-04-22 2021-08-31 南京大学 Cross-domain pedestrian re-identification algorithm based on momentum network guidance

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597887B (en) * 2020-04-08 2023-02-03 北京大学 A pedestrian re-identification method and system
CN111881714B (en) * 2020-05-22 2023-11-21 北京交通大学 An unsupervised cross-domain person re-identification method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326731A (en) * 2021-04-22 2021-08-31 南京大学 Cross-domain pedestrian re-identification algorithm based on momentum network guidance

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on cross-domain pedestrian re-identification algorithms based on deep learning; Zhang Junyin; China Master's Theses Full-text Database; 2024-09-15 (No. 09); 61 *

Also Published As

Publication number Publication date
CN114882531A (en) 2022-08-09

Similar Documents

Publication Publication Date Title
CN114882531B (en) A cross-domain person re-identification method based on deep learning
CN107480261B (en) Fine-grained face image fast retrieval method based on deep learning
CN110942091B (en) Semi-supervised few-sample image classification method for searching reliable abnormal data center
KR102305568B1 (en) Finding k extreme values in constant processing time
CN110941734B (en) Deep unsupervised image retrieval method based on sparse graph structure
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN108846259A (en) A kind of gene sorting method and system based on cluster and random forests algorithm
CN108446334B (en) A Content-Based Image Retrieval Method with Unsupervised Adversarial Training
CN109241377A (en) A kind of text document representation method and device based on the enhancing of deep learning topic information
CN111125411A (en) A Large-scale Image Retrieval Method Based on Deep Strong Correlation Hash Learning
CN113095229B (en) Self-adaptive pedestrian re-identification system and method for unsupervised domain
CN113505225A (en) Small sample medical relation classification method based on multilayer attention mechanism
CN113326392A (en) Remote sensing image audio retrieval method based on quadruple hash
TW202109312A (en) Image feature extraction method, network training method, electronic device and computer readable storage medium
CN111079840A (en) Image Semantic Complete Labeling Method Based on Convolutional Neural Network and Concept Lattice
CN109871379A (en) An online hash nearest neighbor query method based on data block learning
CN112818859A (en) Deep hash-based multi-level retrieval pedestrian re-identification method
CN113920536B (en) Unsupervised pedestrian re-identification method based on online hierarchical clustering
CN115730312A (en) Deep hash-based family malware detection method
CN110275972A (en) A Content-Based Instance Retrieval Method Introducing Adversarial Training
CN110659375A (en) Hash model training method, similar object retrieval method and device
CN110796260A (en) A Neural Network Model Optimization Method Based on Class Expansion Learning
CN115984630B (en) Small sample open set image recognition method based on low-dimensional contrast adaptation
CN114020948B (en) Sketch image retrieval method and system based on sequencing cluster sequence discrimination selection
CN116630694A (en) A target classification method, system and electronic equipment for more marked images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant