Disclosure of Invention
The invention aims to overcome the defect that the prior art cannot effectively detect and defend against adversarial attacks in the pedestrian re-identification task, and provides a multi-expert adversarial attack detection method based on contextual feature inconsistency.
The invention provides a multi-expert adversarial attack detection method based on contextual feature inconsistency, comprising the following steps:
Training phase:
S1: establishing a pedestrian re-identification data set comprising a query image set and a gallery, wherein the query image set comprises a benign query image set and an adversarial query image set;
S2: selecting a plurality of pedestrian re-identification expert models; inputting the benign query image set, the adversarial query image set and the gallery from S1 into the expert models and extracting the image features of all three; using each query image to search the gallery, and taking the set of retrieval results as the support set of that query image, wherein the support set of a benign query image is a benign support set and the support set of an adversarial query image is an adversarial support set;
S3: labeling the features of the benign query image set and the benign support set, and forming a benign training set from the labeled features; labeling the features of the adversarial query image set and the adversarial support set, and forming an adversarial training set from the labeled features;
S4: obtaining context features from the benign training set and the adversarial training set; inputting the context features into a multi-layer perceptron for training, and using the trained multi-layer perceptron as the adversarial attack detector;
Application phase:
S5: establishing a pedestrian re-identification test set, acquiring an image to be queried from the test set, inputting it into the plurality of pedestrian re-identification expert models, and extracting the context features of the image to be queried;
S6: inputting the context features of the image to be queried into the adversarial attack detector, which outputs the probability that the image has been attacked;
S7: evaluating the performance of the adversarial attack detector according to its output on the pedestrian re-identification test set.
Preferably, in S1, the benign query image set comprises benign query samples, which are query samples from the training set of a pedestrian re-identification benchmark data set; the adversarial query image set is generated by perturbing the benign query image set with an adversarial attack method, producing adversarial query samples; the gallery comprises gallery samples: images of pedestrians are randomly selected from the training set of the Market1501 data set as query image samples, and the unselected images serve as gallery samples.
Preferably, S2 comprises the following steps:
S2.1: inputting the benign query image set, the adversarial query image set and the gallery from S1 into the plurality of pedestrian re-identification expert models; $F_n(\cdot)$, $n = 1, \dots, N$, denotes the function of the $n$-th expert model, and $F_n(I)$ denotes the image features that the $n$-th expert model extracts from an image $I$ of the benign query image set, the adversarial query image set or the gallery;
S2.2: using the image features from S2.1, calculating the distance between the features of each query image and the features of the gallery images, returning the $K$ gallery images closest to the query image, and taking the set of these $K$ images as the support set of the query image, denoted $S_n = \{S_{n,j} \mid j = 1, \dots, K\}$, where $n$ indexes the $n$-th expert model and $j$ indexes the $j$-th image in the support set;
Preferably, in S3, the features of the benign query image set and the benign support set are labeled $y_0 = 0$, and the benign training set $\{(x_i, y_0) \mid i = 1, \dots, M\}$ is formed from the labeled features; the features of the adversarial query image set and the adversarial support set are labeled $y_1 = 1$, and the adversarial training set $\{(x_i, y_1) \mid i = 1, \dots, M\}$ is formed from the labeled features, where $M$ is the size of each of the two training sets.
Preferably, in S4, the context features comprise query-support neighbor features, support-support neighbor features and cross-expert neighbor features.
Preferably, obtaining the context features comprises the following steps:
S4.1: based on the benign training set and the adversarial training set, calculating, in each pedestrian re-identification expert model, the cosine similarity $A'_{q-s}$ between the features of a query image $I$ and the features of each image $S_j$ of its support set, and stacking the $A'_{q-s}$ of the plurality of expert models to obtain the query-support neighbor feature $A_{q-s}$;
$A'_{q-s}$ is calculated as:
$$A'_{q-s}[j] = \mathrm{CosSimilarity}\big(F(I), F(S_j)\big)$$
where $F(I)$ denotes the image features of the query image in a pedestrian re-identification expert model and $F(S_j)$ denotes the image features of the $j$-th image of the corresponding support set in the same expert model; stacking $A'_{q-s}$ over the expert models yields the query-support neighbor feature $A_{q-s}$, an $N \times K$ two-dimensional matrix, where $N$ is the number of expert models and $K$ is the number of images in the support set;
S4.2: based on the benign training set and the adversarial training set, calculating, in each pedestrian re-identification expert model, the cosine similarity $A^1_{s-s}[i,j]$ between the features of the $i$-th and the $j$-th images of the support set, which gives one support-support similarity matrix $A^1_{s-s}$ per expert model;
$A^1_{s-s}[i,j]$ is calculated as:
$$A^1_{s-s}[i,j] = \mathrm{CosSimilarity}\big(F(S_i), F(S_j)\big)$$
where $F(S_i)$ and $F(S_j)$ denote the image features of the $i$-th and $j$-th images of the support set; $A^1_{s-s}$ is a $K \times K$ matrix, of which only the $K(K-1)/2$ elements of the upper (or lower) triangle, excluding the diagonal, are kept;
S4.3: flattening the upper-triangular (or lower-triangular) elements of $A^1_{s-s}$ from S4.2 into a new vector $A'_{s-s}$, and stacking the $A'_{s-s}$ of the plurality of expert models into the support-support neighbor feature $A_{s-s}$; $A'_{s-s}$ has dimension $K' = K(K-1)/2$, and $A_{s-s}$ is an $N \times K'$ two-dimensional matrix;
S4.4: taking the $n$-th pedestrian re-identification expert model as the base model, counting how often the $j$-th image of the base model's support set appears in the support sets of the other expert models, denoted $A_{c-e}[n,j]$, which yields the cross-expert neighbor feature $A_{c-e}$;
$A_{c-e}[n,j]$ is calculated as:
$$A_{c-e}[n,j] = \sum_{l \neq n} \mathbb{1}\big(S_{n,j} \in S_l\big)$$
where $n$ denotes the $n$-th expert model (the base model); $\mathbb{1}(\cdot)$ is the indicator function, which outputs 1 when its argument is true and 0 otherwise; $S_l$ ranges over the support sets of the remaining expert models, excluding the base model; $A_{c-e}$ is an $N \times K$ two-dimensional matrix;
S4.5: flattening the two-dimensional matrices $A_{q-s}$, $A_{s-s}$ and $A_{c-e}$ into one-dimensional vectors and concatenating them to obtain the context feature $x$ of a single query image sample, whose dimension is $d = N \times K + N \times K' + N \times K$;
S4.6: inputting the context features into the multi-layer perceptron for training, and using the trained multi-layer perceptron as the adversarial attack detector.
Preferably, in S6, when the attacked probability output by the adversarial attack detector is greater than a set probability threshold, the image to be queried is judged to be an adversarial query sample; otherwise, it is judged to be a benign query sample.
Preferably, in S7, the performance of the adversarial attack detector is evaluated from its output on the pedestrian re-identification test set using the classification accuracy, the area under the receiver operating characteristic curve, and the F1 score (the harmonic mean of precision and recall).
Preferably, the adversarial attack methods include the Deep Mis-Ranking attack method and the adversarial pattern (AdvPattern) attack method.
Advantageous effects:
1. The scheme uses ReID networks of different architectures as expert models, extracts context-inconsistency features, and trains a multi-layer perceptron to detect adversarial attacks against the ReID system, which effectively improves the robustness of the ReID system against adversarial attack methods.
2. The proposed context features, based on context inconsistency, make fuller use of the rich information contained in the top-K retrieval results output by the ReID system, and more thoroughly mine the inconsistency that adversarial query samples cause in the ReID system's output compared with benign query samples. This inconsistency includes: the feature distances between an adversarial query image and its top-K retrieval results differ from those of a benign query image; the distances among the images in the support set of an adversarial query image differ from those of a benign query image; and the top-K retrieval results returned by multiple expert ReID models agree far less for an adversarial query image than for a benign one. Exploiting these inconsistencies improves the success rate of adversarial attack detection.
3. The invention obtains context features from the benign training set and the adversarial training set to train the adversarial attack detector, which not only detects the adversarial attack method used to generate the adversarial training samples but also effectively defends against other adversarial attack methods, and is therefore adaptable to different adversarial attack methods.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Regarding adversarial attacks in the person re-identification (ReID) problem, the attacker aims to make the ReID system retrieve person images with wrong identities; in this embodiment, the attacker is assumed to attack the ReID system by perturbing the query image. When an adversarial sample deceives the ReID system into retrieving wrong images from its gallery, the retrieval results become confused. In this embodiment, the top-K retrieval results returned by the ReID system are defined as the support set, each retrieval result in the support set is defined as a support sample, the support set of a normal (benign) query example is called a benign support set, and the support set of a perturbed query example is called an adversarial support set;
As shown in Fig. 1, Fig. 2a, Fig. 2b, Fig. 2c and Fig. 2d, the multi-expert adversarial attack detection method based on contextual feature inconsistency according to this embodiment comprises the following steps:
Training phase:
S1: establishing a pedestrian re-identification data set comprising a query image set and a gallery, wherein the query image set comprises a benign query image set and an adversarial query image set;
the benign query image set comprises benign query samples, which are query samples from the training set of a pedestrian re-identification benchmark data set (the Market1501 data set in this embodiment); the training set contains 12936 cropped images of 751 pedestrian identities, with an image resolution of 64×128; the adversarial query image set is generated by perturbing the benign query image set with an adversarial attack method to produce adversarial query samples, and this embodiment adopts two adversarial attack methods, namely the Deep Mis-Ranking attack method and the adversarial pattern attack method (AdvPattern); the gallery comprises gallery samples: images of pedestrians are randomly selected from the training set of the Market1501 data set as query image samples, and the unselected images serve as gallery samples, so that within the training set the query samples and the gallery samples do not overlap;
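The random query/gallery split described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the source: exactly one image per pedestrian identity is drawn at random as the query, and the data set is given as a list of `(image_path, identity)` pairs; the function name `split_query_gallery` and its `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def split_query_gallery(samples, seed=0):
    """Split (image_path, identity) pairs into disjoint query and gallery lists.

    Assumption (hypothetical): one randomly chosen image per identity becomes a
    query sample; every unselected image becomes a gallery sample.
    """
    by_identity = defaultdict(list)
    for path, pid in samples:
        by_identity[pid].append(path)

    rng = random.Random(seed)
    query, gallery = [], []
    for pid in sorted(by_identity):          # fixed order for reproducibility
        paths = by_identity[pid]
        chosen = rng.randrange(len(paths))   # index of this identity's query image
        for k, path in enumerate(paths):
            (query if k == chosen else gallery).append((path, pid))
    return query, gallery
```

Because every image goes to exactly one of the two lists, the query and gallery samples cannot overlap.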
S2: selecting a plurality of pedestrian re-identification expert models; inputting the benign query image set, the adversarial query image set and the gallery from S1 into the expert models and extracting the image features of all three; using each query image to search the gallery, and taking the set of retrieval results as the support set of that query image, wherein the support set of a benign query image is a benign support set and the support set of an adversarial query image is an adversarial support set;
S2.1: inputting the benign query image set, the adversarial query image set and the gallery from S1 into the plurality of pedestrian re-identification expert models; $F_n(\cdot)$, $n = 1, \dots, N$, denotes the function of the $n$-th expert model, and $F_n(I)$ denotes the image features that the $n$-th expert model extracts from an image $I$ of the benign query image set, the adversarial query image set or the gallery;
S2.2: using the image features from S2.1, calculating the distance between the features of each query image and the features of the gallery images, returning the $K$ gallery images closest to the query image, and taking the set of these $K$ images as the support set of the query image, denoted $S_n = \{S_{n,j} \mid j = 1, \dots, K\}$, where $n$ indexes the $n$-th expert model and $j$ indexes the $j$-th image in the support set;
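Step S2.2 can be sketched per expert model as follows. The source does not name the distance measure, so this minimal sketch assumes Euclidean distance between feature vectors given as plain Python lists; `support_set` and `support_sets_all_experts` are hypothetical helper names. Calling it once per expert model $n$ gives the support set $S_n$ as a list of gallery indices.

```python
import math


def support_set(query_feat, gallery_feats, k):
    """Return the indices of the k gallery images closest to the query.

    query_feat    -- feature vector F_n(I) of the query image (list of floats)
    gallery_feats -- gallery feature vectors from the same expert model n
    k             -- support set size K
    Assumption: Euclidean distance; ties are broken by gallery index.
    """
    distances = [(math.dist(query_feat, g), idx) for idx, g in enumerate(gallery_feats)]
    distances.sort()
    return [idx for _, idx in distances[:k]]


def support_sets_all_experts(query_feats, gallery_feats_per_expert, k):
    """Build S_n for every expert n, where query_feats[n] = F_n(I)."""
    return [
        support_set(q, g, k)
        for q, g in zip(query_feats, gallery_feats_per_expert)
    ]
```

Each expert searches with its own features, which is why the support sets of different experts can differ and why the cross-expert feature of S4.4 is informative.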
S3: labeling the features of the benign query image set and the benign support set $y_0 = 0$, and forming the benign training set $\{(x_i, y_0) \mid i = 1, \dots, M\}$ from the labeled features; labeling the features of the adversarial query image set and the adversarial support set $y_1 = 1$, and forming the adversarial training set $\{(x_i, y_1) \mid i = 1, \dots, M\}$ from the labeled features, where $M$ is the size of each of the two training sets;
S4: obtaining context features from the benign training set and the adversarial training set; inputting the context features into a multi-layer perceptron for training, and using the trained multi-layer perceptron as the adversarial attack detector; the context features comprise query-support neighbor features, support-support neighbor features and cross-expert neighbor features;
S4.1: based on the benign training set and the adversarial training set, calculating, in each pedestrian re-identification expert model, the cosine similarity $A'_{q-s}$ between the features of a query image $I$ and the features of each image $S_j$ of its support set, and stacking the $A'_{q-s}$ of the plurality of expert models to obtain the query-support neighbor feature $A_{q-s}$;
$A'_{q-s}$ is calculated as:
$$A'_{q-s}[j] = \mathrm{CosSimilarity}\big(F(I), F(S_j)\big)$$
where $F(I)$ denotes the image features of the query image in a pedestrian re-identification expert model and $F(S_j)$ denotes the image features of the $j$-th image of the corresponding support set in the same expert model; stacking $A'_{q-s}$ over the expert models yields the query-support neighbor feature $A_{q-s}$, an $N \times K$ two-dimensional matrix, where $N$ is the number of expert models and $K$ is the number of images in the support set;
four candidate models that perform well on the Market1501 data set are adopted as the pedestrian re-identification expert models, namely PCB (Part-based Convolutional Baseline), AlignedReID (AR), HACNN (Harmonious Attention CNN) and LSRO (Label Smoothing Regularization for Outliers);
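the cosine similarity is calculated as:
$$\mathrm{CosSimilarity}(A, B) = \frac{\sum_{t} A_t B_t}{\sqrt{\sum_{t} A_t^2}\,\sqrt{\sum_{t} B_t^2}}$$
where $A_t$ and $B_t$ are the values of the $t$-th dimension of vectors $A$ and $B$, respectively;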
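The cosine similarity and the query-support neighbor feature $A_{q-s}$ of S4.1 can be sketched in plain Python as follows. This illustrative sketch assumes the features have already been extracted: `query_feats[n]` stands for $F_n(I)$ and `support_feats[n][j]` for $F_n(S_{n,j})$, each a list of floats; the function names are hypothetical.

```python
import math


def cos_similarity(a, b):
    """CosSimilarity(A, B) = sum_t A_t * B_t / (||A|| * ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query_support_feature(query_feats, support_feats):
    """Return A_{q-s} as an N x K nested list.

    Row n is A'_{q-s} for expert n: the cosine similarity between the query
    feature F_n(I) and each of that expert's K support features F_n(S_{n,j}).
    Features are assumed to be precomputed by the expert models.
    """
    return [
        [cos_similarity(q, s) for s in supports]
        for q, supports in zip(query_feats, support_feats)
    ]
```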
S4.2: based on the benign training set and the adversarial training set, calculating, in each pedestrian re-identification expert model, the cosine similarity $A^1_{s-s}[i,j]$ between the features of the $i$-th and the $j$-th images of the support set, which gives one support-support similarity matrix $A^1_{s-s}$ per expert model;
$A^1_{s-s}[i,j]$ is calculated as:
$$A^1_{s-s}[i,j] = \mathrm{CosSimilarity}\big(F(S_i), F(S_j)\big)$$
where $F(S_i)$ and $F(S_j)$ denote the image features of the $i$-th and $j$-th images of the support set; $A^1_{s-s}$ is a $K \times K$ matrix that is symmetric ($A^1_{s-s}[i,j] = A^1_{s-s}[j,i]$) with diagonal elements always equal to 1, so only the $K(K-1)/2$ elements of its upper (or lower) triangle are kept;
S4.3: flattening the upper-triangular (or lower-triangular) elements of $A^1_{s-s}$ from S4.2 into a new vector $A'_{s-s}$, and stacking the $A'_{s-s}$ of the plurality of expert models into the support-support neighbor feature $A_{s-s}$; $A'_{s-s}$ has dimension $K' = K(K-1)/2$, and $A_{s-s}$ is an $N \times K'$ two-dimensional matrix;
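Steps S4.2 and S4.3 can be sketched together: for each expert model, compute the pairwise cosine similarities among its $K$ support features and keep only the $K' = K(K-1)/2$ entries with $i < j$, since the matrix is symmetric with a unit diagonal. A minimal sketch, assuming `support_feats[n][j]` holds $F_n(S_{n,j})$ as a list of floats; the function names are hypothetical.

```python
import math
from itertools import combinations


def cos_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def support_support_feature(support_feats):
    """Return A_{s-s} as an N x K' nested list, with K' = K(K-1)/2.

    The full K x K matrix A^1_{s-s} is never built: combinations(range(K), 2)
    yields the pairs (i, j) with i < j in row-major upper-triangle order,
    which is exactly the flattened vector A'_{s-s} of one expert.
    """
    rows = []
    for supports in support_feats:
        rows.append([
            cos_similarity(supports[i], supports[j])
            for i, j in combinations(range(len(supports)), 2)
        ])
    return rows
```

Skipping the diagonal and the lower triangle drops only redundant values, so no information is lost relative to $A^1_{s-s}$.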
S4.4: taking the $n$-th pedestrian re-identification expert model as the base model, counting how often the $j$-th image of the base model's support set appears in the support sets of the other expert models, denoted $A_{c-e}[n,j]$, which yields the cross-expert neighbor feature $A_{c-e}$;
$A_{c-e}[n,j]$ is calculated as:
$$A_{c-e}[n,j] = \sum_{l \neq n} \mathbb{1}\big(S_{n,j} \in S_l\big)$$
where $n$ denotes the $n$-th expert model (the base model); $\mathbb{1}(\cdot)$ is the indicator function, which outputs 1 when its argument is true and 0 otherwise; $S_l$ ranges over the support sets of the remaining expert models, excluding the base model; $A_{c-e}$ is an $N \times K$ two-dimensional matrix;
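The cross-expert neighbor feature of S4.4 depends only on which gallery images each expert returns, so it can be computed from gallery indices alone. A minimal sketch, assuming `support_sets[n]` lists the gallery indices of $S_n$ and that $A_{c-e}[n,j]$ is the raw number of other experts whose support sets contain $S_{n,j}$ (no normalization); the function name is hypothetical.

```python
def cross_expert_feature(support_sets):
    """Return A_{c-e} as an N x K nested list of counts.

    A_{c-e}[n][j] = number of experts l != n whose support set contains
    S_{n,j}, the j-th support image returned by base expert n.
    Assumption: raw count, not divided by N - 1.
    """
    as_sets = [set(s) for s in support_sets]  # O(1) membership tests
    return [
        [
            sum(1 for l, other in enumerate(as_sets) if l != n and img in other)
            for img in support
        ]
        for n, support in enumerate(support_sets)
    ]
```

For a benign query the experts tend to return overlapping support sets, so the counts are high; for an adversarial query crafted against one expert, that expert's support images rarely appear in the others' sets.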
S4.5: flattening the two-dimensional matrices $A_{q-s}$, $A_{s-s}$ and $A_{c-e}$ into one-dimensional vectors and concatenating them to obtain the context feature $x$ of a single query image sample, whose dimension is $d = N \times K + N \times K' + N \times K$;
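The concatenation of S4.5 and the resulting dimension $d$ can be checked with a short sketch. With the four expert models of this embodiment and $K = 15$, $K' = 15 \cdot 14 / 2 = 105$ and $d = 4 \cdot 15 + 4 \cdot 105 + 4 \cdot 15 = 540$. The function names are hypothetical, and the row-major flattening order is an assumption; any fixed order works as long as training and testing use the same one.

```python
def context_dim(n_experts, k):
    """d = N*K + N*K' + N*K, with K' = K(K-1)/2."""
    k_prime = k * (k - 1) // 2
    return n_experts * k + n_experts * k_prime + n_experts * k


def context_feature(a_qs, a_ss, a_ce):
    """Flatten the N x K, N x K' and N x K matrices row by row and concatenate.

    Assumption: row-major order, A_{q-s} first, then A_{s-s}, then A_{c-e}.
    """
    x = []
    for matrix in (a_qs, a_ss, a_ce):
        for row in matrix:
            x.extend(row)
    return x
```

With $K = 1$ the support-support part vanishes ($K' = 0$), which matches the remark on Table 3 that only the query-support and cross-expert features remain in that case.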
S4.6: inputting the context features into a multi-layer perceptron (MLP) for training, and using the trained MLP as the adversarial attack detector; the MLP has two hidden layers with 512 and 256 nodes respectively, uses the ReLU activation function, and is trained as a binary classification problem;
the MLP is trained with an SGD optimizer with a momentum of 0.9 and a learning rate of 1e-4; training finishes after 5000 iterations with a batch size of 1024, and is implemented with the PyTorch framework on an NVIDIA RTX 2080 Ti GPU;
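The detector of S4.6 can be sketched in PyTorch with the stated configuration (hidden layers of 512 and 256 nodes with ReLU, SGD with momentum 0.9 and learning rate 1e-4, 5000 iterations, batch size 1024). This sketch rests on assumptions the source does not state: the binary output is a single logit trained with `BCEWithLogitsLoss`, mini-batches are drawn uniformly at random with replacement, and the class and function names are hypothetical.

```python
import torch
from torch import nn


class AttackDetector(nn.Module):
    """MLP d -> 512 -> 256 -> 1; sigmoid(output) is the attacked probability."""

    def __init__(self, d: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # one logit for the benign/adversarial decision
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # shape: (batch,)


def train_detector(x_all: torch.Tensor, y_all: torch.Tensor,
                   iterations: int = 5000, batch_size: int = 1024) -> AttackDetector:
    """Train on context features x_all (samples x d); y_all is 0 (benign) or 1 (adversarial)."""
    model = AttackDetector(x_all.shape[1])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
    loss_fn = nn.BCEWithLogitsLoss()  # assumption: single-logit binary loss
    model.train()
    for _ in range(iterations):
        # Assumption: each mini-batch is sampled uniformly at random with replacement.
        idx = torch.randint(0, x_all.shape[0], (batch_size,))
        optimizer.zero_grad()
        loss = loss_fn(model(x_all[idx]), y_all[idx].float())
        loss.backward()
        optimizer.step()
    return model


def attacked_probability(model: AttackDetector, x: torch.Tensor) -> torch.Tensor:
    """Probability that each query in x was attacked (compared with the threshold in S6)."""
    model.eval()
    with torch.no_grad():
        return torch.sigmoid(model(x))
```

A two-class softmax head with cross-entropy would be an equally valid reading of "binary classification"; the single-logit form is used here only because it maps directly onto one attacked probability.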
Application phase:
S5: establishing a pedestrian re-identification test set, acquiring an image to be queried from the test set, inputting it into the plurality of pedestrian re-identification expert models, and extracting the context features of the image to be queried;
S6: inputting the context features of the image to be queried into the adversarial attack detector, which outputs the probability that the image has been attacked; when this probability is greater than a set probability threshold, the image to be queried is judged to be an adversarial query sample, otherwise a benign query sample; the probability threshold in this embodiment is 0.5;
S7: evaluating the performance of the adversarial attack detector from its output on the pedestrian re-identification test set, using the classification accuracy (Acc), the area under the receiver operating characteristic curve (AUC) and the harmonic mean of precision and recall (F1).
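The decision rule of S6 and the three metrics of S7 can be sketched in plain Python. A minimal illustration, assuming lists of attacked probabilities and ground-truth labels with 1 = adversarial as the positive class; AUC is computed here with the rank-based (Mann-Whitney) formulation, counting ties as one half, which is one standard way to obtain the area under the ROC curve. The function names are hypothetical.

```python
def predict(probs, threshold=0.5):
    """S6: adversarial (1) if the attacked probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]


def accuracy(labels, preds):
    """Fraction of queries classified correctly."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def f1_score(labels, preds):
    """Harmonic mean of precision and recall for the adversarial class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels, probs):
    """Probability that a random adversarial sample scores above a random benign one."""
    pos = [s for y, s in zip(labels, probs) if y == 1]
    neg = [s for y, s in zip(labels, probs) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike Acc and F1, AUC does not depend on the 0.5 threshold, which is why it stays near 99.8% in the tables even where accuracy varies.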
The detection performance of the method in the application phase is shown in the following three tables:
Table 1: adversarial attack detection performance on the Market1501 data set with different numbers of pedestrian re-identification expert models;
| Expert models | Acc (%) | AUC (%) | F1 (%) |
| --- | --- | --- | --- |
| AR* | 95.2 | 99.1 | 95.5 |
| AR*+PCB | 97.8 | 99.7 | 97.9 |
| AR*+PCB+LSRO | 98.4 | 99.8 | 98.4 |
| AR*+PCB+LSRO+HACNN | 98.5 | 99.8 | 98.6 |
As can be seen from Table 1, using more pedestrian re-identification expert models gives better detection performance: more expert models provide richer context features, which better distinguish benign samples from adversarial samples;
Table 2: adversarial attack detection performance on the Market1501 data set with and without the attack target model among the pedestrian re-identification expert models;
| Expert models | Acc (%) | AUC (%) | F1 (%) |
| --- | --- | --- | --- |
| AR* | 95.2 | 99.1 | 95.5 |
| AR*+PCB+LSRO+HACNN | 98.5 | 99.8 | 98.6 |
| PCB | 88.2 | 95.1 | 88.7 |
| PCB+LSRO | 93.7 | 98.5 | 93.9 |
| PCB+LSRO+HACNN | 94.2 | 98.5 | 94.2 |
In Table 2, the asterisk (*) marks the target model of the adversarial attack method, which is assumed to be known; the table shows that including the attack target model as one of the pedestrian re-identification expert models is beneficial;
Table 3: adversarial attack detection performance on the Market1501 data set with different support set sizes K;
| Support set size | Acc (%) | AUC (%) | F1 (%) |
| --- | --- | --- | --- |
| K=1 | 92.3 | 99.2 | 92.9 |
| K=5 | 94.4 | 99.7 | 94.7 |
| K=10 | 97.5 | 99.8 | 97.6 |
| K=15 | 98.5 | 99.8 | 98.6 |
| K=20 | 98.5 | 99.8 | 98.5 |
| K=30 | 98.5 | 99.8 | 98.6 |
As can be seen from Table 3, attack detection performance was evaluated for K = 1, 5, 10, 15, 20 and 30, where K = 1 means that there is no support-support neighbor feature and only the query-support and cross-expert neighbor features are used; a larger support set gives a better attack detection rate, and K = 15 achieves a detection accuracy of 98.5%, 6.2 percentage points higher than K = 1; Tables 1 and 2 were therefore evaluated with K = 15.
Fig. 3 shows the top-10 retrieval results (support sets) returned by five pedestrian re-identification expert models for a benign query sample before and after it is attacked; in this embodiment, the Deep Mis-Ranking attack method is adopted, with the AlignedReID expert model as the attack target.
As shown in Fig. 4, benign query samples (benign samples in the figure) are marked with diamonds and their retrieval results (benign support samples in the figure) with snowflakes; adversarial query samples (adversarial samples in the figure) are marked with squares and their retrieval results (adversarial support samples in the figure) with circles. In the embedding space, the retrieval results of benign query samples are tightly clustered around the query, whereas the retrieval results of adversarial query samples are more dispersed.
As shown in Fig. 5a, Fig. 5b and Fig. 5c, in each panel the left peak corresponds to perturbed (adversarial) samples and the right peak to benign samples. In Fig. 5a, the query-support relationship is defined as the mean cosine similarity between the features of a query sample (benign or adversarial) and the image features of its support set; an adversarial query sample generally has lower similarity to its support set in feature space than a benign query sample. In Fig. 5b, the support-support relationship is defined as the mean cosine similarity between the image features within the same support set; the images of an adversarial support set have lower mutual similarity in feature space than the images of a benign support set. Fig. 5c shows the number of images common to the support sets returned by all expert models; the benign support sets returned by different pedestrian re-identification expert models overlap heavily, i.e. for a benign query sample, different expert models tend to return the same retrieval results;
The multi-expert adversarial attack detection method based on contextual feature inconsistency provided by this embodiment has the following beneficial effects:
1. By using multiple ReID networks of different architectures as expert models, extracting context-inconsistency features, and training a multi-layer perceptron to detect adversarial attacks against the ReID system, the scheme effectively improves the robustness of the ReID system against adversarial attack methods.
2. The proposed context features, based on context inconsistency, make fuller use of the rich information contained in the top-K retrieval results output by the ReID system, and more thoroughly mine the inconsistency that adversarial samples cause in the ReID system's output compared with benign query samples. This inconsistency includes: the feature distances between an adversarial query image and its top-K retrieval results differ from those of a benign query image; the distances among the images in the support set of an adversarial query image differ from those of a benign query image; and the top-K retrieval results returned by multiple expert ReID models agree far less for an adversarial query image than for a benign one. Exploiting these inconsistencies improves the success rate of adversarial attack detection.
3. The adversarial attack detector trained on benign and adversarial samples not only detects the adversarial attack method used to generate the adversarial training samples but also effectively defends against other adversarial attack methods, and is therefore adaptable to different adversarial attack methods.
The present invention is not limited to the above preferred embodiments; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.