Three-channel feature fusion face recognition method
Technical Field
The invention belongs to the technical field of face recognition, and discloses a three-channel feature fusion face recognition method.
Background
With the rapid development of society, people demand automatic identity recognition in almost every aspect of daily life. Current identification technologies mainly comprise password authentication, fingerprint identification, face identification, iris identification, gait identification and the like. Because face recognition has the advantages of being contactless, secure, convenient and fast, it has gradually been accepted by the public.
Especially in recent years, computing speed has improved greatly as computer hardware has been continuously updated. Convolutional neural networks, after several waves of rise and decline, have attracted renewed attention, and more and more people have devoted themselves to researching and improving face recognition algorithms. Traditional face recognition algorithms test recognition accuracy under rigid conditions and can obtain a good recognition effect; under non-rigid conditions, recognition accuracy drops greatly owing to illumination, face pose changes, occlusion and algorithm defects. The multi-task convolutional neural network (MTCNN) that has appeared in recent years can accurately detect the position of a face and mark its key points, fully preparing for the extraction of face features; the LBP operator can effectively reduce the influence of illumination changes on the input face image. How to effectively extract face features, however, is crucial to improving face recognition accuracy. Methods that only preprocess the input image without changing the network structure cannot effectively improve recognition accuracy; the two-channel model changes the structure of the neural network model, and although the improved model can effectively classify input facial expressions, its classification accuracy is greatly affected by non-rigid factors of the input images such as illumination changes and occlusion; combining the traditional LBP operator with a Deep Belief Network (DBN) as the feature extraction module, where the LBP-processed image is input into the DBN, achieves high recognition accuracy on different data sets, but the LBP-processed image loses most of the global feature information of the original image and cannot effectively embody the overall features of the input image.
Disclosure of Invention
The invention aims to provide a three-channel feature fusion face recognition method, which solves the problem that the traditional face recognition method in the prior art cannot extract all-round features of a face, and improves the face recognition accuracy.
The technical scheme adopted by the invention is as follows:
a three-channel feature fusion face recognition method comprises the following specific steps:
step 1, collecting different face images to form a data set; preprocessing each face image in the data set to obtain a preprocessed image, wherein the preprocessed image is the face image which is subjected to face correction after irrelevant background information is removed; forming all the preprocessed images into a preprocessed image set;
step 2, establishing a BP neural network model based on three-channel feature fusion, wherein the BP neural network model based on the three-channel feature fusion comprises three parallel feature extraction channels which are respectively a coarse sampling channel, an LBP channel and a fine sampling channel;
step 3, training a BP neural network model based on three-channel feature fusion by utilizing a preprocessed image set;
step 4, inputting an image to be recognized, performing feature similarity comparison on the image to be recognized and images in a training set by using a trained BP neural network model based on three-channel feature fusion, and outputting the image with the highest similarity and the similarity of the image;
and 5, setting a threshold, comparing the similarity output in the step 4 with the threshold, further judging whether the image output in the step 4 and the image to be identified are the same person, and outputting a result.
The present invention is also characterized in that,
the preprocessing operation in step 1 comprises the following specific steps:
step 1.1, inputting a face image;
step 1.2, cropping the face in the face image and removing redundant information such as the background to obtain a background-free face image;
and step 1.3, carrying out eye key point marking on the face image without the background, connecting the two eye key points, setting an included angle between a connecting line of the two eye key points and the horizontal direction as a, and rotating the face image anticlockwise by the angle a to obtain a preprocessed image.
In step 1.3, the key point coordinates are located by the Euclidean distance shown in formula (1):

Li = ||ŷi − yi||₂²  (1)

wherein Li represents the Euclidean distance used to locate the key points, ŷi represents the predicted face key point position, and yi represents the real face key point position;

the face key points are determined as in formula (2):

Y = min Σ(i=1 to N) bi·Li  (2)

a smaller Y indicates a smaller error between the predicted key point position ŷi and the true key point position yi, and the position with the smallest Y value is taken as the marked key point; wherein Y represents the position information of the final key point, N represents the number of training samples, and bi represents the sample label.
The angle a can be expressed as formula (3):

a = arctan((y2 − y1) / (x2 − x1))  (3)

wherein (x1, y1) is key point No. 1, the coordinate of the left-eye center, and (x2, y2) is key point No. 2, the coordinate of the right-eye center.
The BP neural network model based on three-channel feature fusion comprises three parallel feature extraction channels, the three feature extraction channels are respectively a coarse sampling channel, an LBP channel and a fine sampling channel, the output ends of the coarse sampling channel, the LBP channel and the fine sampling channel are all connected with a hidden layer, and the output end of the hidden layer is sequentially connected with a dimensionality reduction layer, a first full-connection layer, a second full-connection layer and a loss function layer.
The coarse sampling channel consists of three convolution layers and three pooling layers, and sequentially comprises the following steps: a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer and a third pooling layer;
the first convolution layer, the second convolution layer and the third convolution layer all adopt convolution kernels with a size of 5 × 5;
the first pooling layer, the second pooling layer and the third pooling layer all employ max pooling (Max_pooling); the pooling size is 2 × 2, the pooling step size is 2, and the padding mode is set to SAME.
The LBP channel feature extraction method comprises the following steps:
step 2.1, dividing the preprocessed image into a plurality of sub-images with the same size;
step 2.2, for each sub-image, converting the information of each pixel point into a pixel brightness value; let the brightness value of the pixel at the middle position be gc and the brightness values of the eight surrounding neighboring pixels be gi (i = 0, 1, …, 7); the brightness information is binarized using formula (8) and formula (9):

Bi = s(gi − gc), i = 0, 1, …, 7  (8)

s(x) = 1 if x ≥ 0; s(x) = 0 if x < 0  (9)

wherein x represents the difference gi − gc, Bi represents the binarized value of the i-th neighboring pixel, and s(x) is the resulting pixel value;
step 2.3, the binarized value on the right side of the pixel at the middle position is taken as an initial position, the obtained binarized value is written into an eight-bit binary number in a counterclockwise rotation mode, the binary number is converted into a decimal number, and the decimal number is the LBP value corresponding to the brightness of the pixel at the central point; and performing the operation on each pixel point in the input face image to finally obtain the local characteristic value of each pixel point of the input image.
The fine sampling channel consists of three convolution layers and three pooling layers, and sequentially comprises the following steps: a fourth convolution layer, a fourth pooling layer, a fifth convolution layer, a fifth pooling layer, a sixth convolution layer and a sixth pooling layer;
the fourth convolution layer, the fifth convolution layer and the sixth convolution layer are each formed by stacking 1 × 3 and 3 × 1 convolution kernels;
the fourth pooling layer, the fifth pooling layer and the sixth pooling layer all employ max pooling (Max_pooling); the pooling size is 2 × 2, the pooling step size is 2, and the padding mode is set to SAME.
The dimensionality reduction layer adopts a PCA dimensionality reduction operation to convert the fused features into one-dimensional feature vector information, with the following specific steps:
step 3.1, performing feature fusion on the face feature information of the three channels and converting it into matrix form; let the fused feature image be the matrix X of size n × m;
step 3.2, carrying out zero-mean processing on the matrix X and solving the covariance matrix H = (1/m)XXᵀ;
step 3.3, solving the eigenvalue of the matrix H and calculating the eigenvector corresponding to the eigenvalue;
step 3.4, taking the eigenvectors corresponding to the k largest eigenvalues and arranging them by rows into a matrix Q;
and step 3.5, obtaining the one-dimensional feature vector after dimension reduction by U = QX.
The loss function layer adopts the triplet loss function as the loss function, with the formula:

L = Σ(i=1 to N) [ ||f(ai) − f(pi)||₂² − ||f(ai) − f(ni)||₂² + α ]₊

In the formula, a represents a sample arbitrarily selected from the training set; p represents a randomly selected sample of the same class as a, called the positive sample; n represents a randomly selected sample of a different class, called the negative sample; α is the margin enforced between the positive-pair and negative-pair distances. For each sample in the triplet, a parameter-sharing network is trained to obtain the feature expressions of the three samples a, p and n, recorded as f(a), f(p) and f(n) respectively.
Through learning on the training set, the triplet loss minimizes the intra-class distance between the (a, p) features and maximizes the inter-class distance between the (a, n) features. The effect of the '+' sign in the formula is: if the value inside [·] is greater than 0, that value is taken as the loss; if it is less than or equal to 0, the loss is 0.
The step 5 specifically comprises the following steps: and setting a threshold, if the similarity of the image output in the step 4 is greater than the threshold, determining that the image to be recognized and the human image in the image output in the step 4 are the same person, otherwise, determining that the training set does not have the same human face information as the image to be recognized, wherein the setting range of the threshold is 0.73-0.78.
The invention has the advantages that
(1) Firstly, MTCNN is used to perform face detection and face key point marking on the input image and to remove irrelevant background information; on this basis, a rotation correction method is used to correct the face pose, which can improve the recognition accuracy.
(2) The three channels are used for face feature extraction, so that local features, edge features and contour features of a face and fine features of face organs can be obtained, and the global feature information of the face image can be accurately reflected.
(3) Since acquiring features of the input image with three feature extraction channels leads to a feature-dimension explosion that is unfavorable to subsequent operations, Principal Component Analysis (PCA) is used to reduce the dimension of the extracted features.
(4) The invention selects the triplet loss as the loss function and finally trains features with cohesion. The triplet loss function can distinguish same-class from different-class features, reducing the intra-class distance as much as possible while enlarging the inter-class distance. Cohesion plays an important role in face recognition, and a very good model can be trained with a small amount of data.
Drawings
FIG. 1 is a structural diagram of the BP neural network model based on three-channel feature fusion in the three-channel feature fusion face recognition method of the present invention;
FIG. 2 is a flow chart of a preprocessing of a three-channel feature fusion face recognition method of the present invention;
FIG. 3 is an LBP mapping chart in the three-channel feature fusion face recognition method of the present invention;
FIG. 4 is a graph of features extracted using different methods; wherein (a) is an original graph, (b) is a characteristic graph extracted by a Gabor method, (c) is a characteristic graph extracted by a Haar method, (d) is a characteristic graph extracted by an LBP method, and (e) is a characteristic graph extracted by the method;
FIG. 5 is another feature map extracted using a different method; wherein (a) is an original graph, (b) is a characteristic graph extracted by a Gabor method, (c) is a characteristic graph extracted by a Haar method, (d) is a characteristic graph extracted by an LBP method, and (e) is a characteristic graph extracted by the method;
fig. 6 shows the numbers of features extracted from the Lena picture by different methods, wherein (a) is the feature map extracted by the Gabor method, (b) is the feature map extracted by the Haar method, (c) is the feature map extracted by the LBP method, and (d) is the feature map extracted by the method of the present invention.
In the figure, 1, a first convolutional layer, 2, a first pooling layer, 3, a second convolutional layer, 4, a second pooling layer, 5, a third convolutional layer, 6, a third pooling layer, 7, a fourth convolutional layer, 8, a fourth pooling layer, 9, a fifth convolutional layer, 10, a fifth pooling layer, 11, a sixth convolutional layer, 12, a sixth pooling layer, 13, feature fusion, 14, a dimensionality reduction layer, 15, a first full-connection layer, 16, a second full-connection layer, 17, a loss function layer, 18, a fine sampling channel, 19, an LBP channel, and 20, a coarse sampling channel.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses a three-channel feature fusion face recognition method, which comprises the following specific steps:
step 1, collecting different face images to form a data set; preprocessing each face image in the data set to obtain a preprocessed image, wherein the preprocessed image is the face image which is subjected to face correction after irrelevant background information is removed; forming all the preprocessed images into a preprocessed image set;
step 2, establishing a BP neural network model based on three-channel feature fusion, as shown in FIG. 1, wherein the BP neural network model based on three-channel feature fusion comprises three parallel feature extraction channels, and the three feature extraction channels are respectively a rough sampling channel 20, an LBP channel 19 and a fine sampling channel 18;
step 3, training a BP neural network model based on three-channel feature fusion by utilizing a preprocessed image set;
step 4, inputting an image to be recognized, performing feature similarity comparison on the image to be recognized and images in a training set by using a trained BP neural network model based on three-channel feature fusion, and outputting the image with the highest similarity and the similarity of the image;
and 5, setting a threshold, comparing the similarity output in the step 4 with the threshold, further judging whether the image output in the step 4 and the image to be identified are the same person, and outputting a result.
In step 1, as shown in fig. 2, the preprocessing operation specifically comprises the following steps:
step 1.1, inputting a face image;
step 1.2, cropping the face in the face image and removing redundant information such as the background to obtain a background-free face image;
and step 1.3, carrying out eye key point marking on the face image without the background, connecting the two eye key points, setting an included angle between a connecting line of the two eye key points and the horizontal direction as a, and rotating the face image anticlockwise by the angle a to obtain a preprocessed image.
In step 1.3, the key point coordinates are located by the Euclidean distance shown in formula (1):

Li = ||ŷi − yi||₂²  (1)

wherein Li represents the Euclidean distance used to locate the key points, ŷi represents the predicted face key point position, and yi represents the real face key point position;

the face key points are determined as in formula (2):

Y = min Σ(i=1 to N) bi·Li  (2)

a smaller Y indicates a smaller error between the predicted key point position ŷi and the true key point position yi, and the position with the smallest Y value is taken as the marked key point; wherein Y represents the position information of the final key point, N represents the number of training samples, and bi represents the sample label.
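As a quick numerical illustration of formulas (1) and (2), the following NumPy sketch (illustrative only; the batch size, number of key points and label values are hypothetical) computes the per-sample Euclidean error Li and the objective Y:

```python
import numpy as np

# Hypothetical predicted and real positions of 5 face key points for
# N = 3 training samples; shape (N, 5, 2) holds (x, y) coordinates.
y_pred = np.random.rand(3, 5, 2)
y_true = np.random.rand(3, 5, 2)
b = np.array([1.0, 1.0, 0.0])  # sample labels b_i

# Formula (1): L_i = ||y_hat_i - y_i||_2^2, the squared Euclidean distance
# between the predicted and real key point positions of sample i.
L = np.sum((y_pred - y_true) ** 2, axis=(1, 2))

# Formula (2): Y = sum_i b_i * L_i; a smaller Y means a smaller key point error.
Y = np.sum(b * L)
print(L, Y)
```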
The angle a can be expressed as formula (3):

a = arctan((y2 − y1) / (x2 − x1))  (3)

wherein (x1, y1) is key point No. 1, the coordinate of the left-eye center, and (x2, y2) is key point No. 2, the coordinate of the right-eye center.
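A minimal OpenCV sketch of the rotation correction of step 1.3 and formula (3), assuming the two eye centers have already been located (for example by MTCNN); the file path and eye coordinates are hypothetical:

```python
import math
import cv2

img = cv2.imread("face.jpg")   # background-free face image (hypothetical path)
x1, y1 = 62.0, 80.0            # key point No. 1: left-eye center (assumed)
x2, y2 = 118.0, 92.0           # key point No. 2: right-eye center (assumed)

# Formula (3): angle a between the eye connecting line and the horizontal.
a = math.degrees(math.atan2(y2 - y1, x2 - x1))

# Rotate counterclockwise by a around the midpoint between the eyes so that
# the eye connecting line becomes horizontal (OpenCV treats positive angles
# as counterclockwise).
center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
M = cv2.getRotationMatrix2D(center, a, 1.0)
aligned = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
cv2.imwrite("face_aligned.jpg", aligned)
```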
The BP neural network model based on three-channel feature fusion comprises three parallel feature extraction channels, the three feature extraction channels are respectively a coarse sampling channel 20, an LBP channel 19 and a fine sampling channel 18, the output ends of the coarse sampling channel 20, the LBP channel 19 and the fine sampling channel 18 are all connected with a hidden layer 13, and the output end of the hidden layer 13 is sequentially connected with a dimensionality reduction layer 14, a first full connection layer 15, a second full connection layer 16 and a loss function layer 17.
Wherein the rough sampling channel 20 is used for collecting rough edge and contour features in the preprocessed image;
the LBP channel 19 is used for acquiring a local characteristic map of a local input image of the preprocessed image;
wherein the fine sampling channel 18 is used for acquiring the fine features of the face organs in the preprocessed image;
the hidden layer 13 is used for fusing and converting the features extracted by the three sampling channels into a matrix form.
The rough sampling channel 20 is composed of three convolutional layers and three pooling layers, and sequentially comprises: a first convolutional layer 1, a first pooling layer 2, a second convolutional layer 3, a second pooling layer 4, a third convolutional layer 5, and a third pooling layer 6;
the first convolution layer 1, the second convolution layer 3 and the third convolution layer 5 all adopt convolution kernels with a size of 5 × 5;
the first pooling layer 2, the second pooling layer 4 and the third pooling layer 6 all employ max pooling (Max_pooling); the pooling size is 2 × 2, the pooling step size is 2, and the padding mode is set to SAME.
The formula for extracting features of an image by using the coarse sampling channel 20 is as follows:
let the input image size be H1 × H2, the convolution kernel size be F1 × F2, and the output image size be W1 × W2. The convolutional layer formula is shown in formula (4):

W1 × W2 = (H1 − F1 + 1) × (H2 − F2 + 1)  (4)

The picture output by the convolution layer, of size W1 × W2, is then input into a pooling layer with a pooling size of F3 × F4 and a pooling step size of S. When padding is in Valid mode, the image size after the pooling layer is calculated as in formula (5):

W1′ × W2′ = ((W1 − F3)/S + 1) × ((W2 − F4)/S + 1)  (5)

When padding is in Same mode, the image size after the pooling layer is calculated as in formula (6), where Fi (i = 3, 4) is the pooling size and P is the size of the padding around the image:

W1′ × W2′ = ((W1 − F3 + 2P)/S + 1) × ((W2 − F4 + 2P)/S + 1)  (6)
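The following helper functions, a sketch of formulas (4)–(6) rather than part of the patent, trace concrete sizes through one convolution/pooling block; the 64 × 64 input size is taken from the experiment described later:

```python
def conv_valid(h, k):
    # Formula (4): output size of a stride-1 convolution without padding.
    return h - k + 1

def pool_valid(w, f, s):
    # Formula (5): output size of a pooling layer in Valid mode.
    return (w - f) // s + 1

def pool_padded(w, f, s, p):
    # Formula (6): output size of a pooling layer with padding P around the image.
    return (w - f + 2 * p) // s + 1

w = conv_valid(64, 5)           # 64x64 input, 5x5 kernel -> 60
print(pool_valid(w, 2, 2))      # 2x2 pooling, stride 2, Valid -> 30
print(pool_padded(w, 2, 2, 1))  # 2x2 pooling, stride 2, P = 1  -> 31
```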
as shown in fig. 3, the LBP channel performs the feature extraction method as follows:
step 2.1, dividing the preprocessed image into a plurality of sub-images with the same size;
step 2.2, for each sub-image, converting the information of each pixel point into a pixel brightness value; let the brightness value of the pixel at the middle position be gc and the brightness values of the eight surrounding neighboring pixels be gi (i = 0, 1, …, 7); the brightness information is binarized using formula (8) and formula (9):

Bi = s(gi − gc), i = 0, 1, …, 7  (8)

s(x) = 1 if x ≥ 0; s(x) = 0 if x < 0  (9)

wherein x represents the difference gi − gc, Bi represents the binarized value of the i-th neighboring pixel, and s(x) is the resulting pixel value;
step 2.3, the binarized value on the right side of the pixel at the middle position is taken as an initial position, the obtained binarized value is written into an eight-bit binary number in a counterclockwise rotation mode, the binary number is converted into a decimal number, and the decimal number is the LBP value corresponding to the brightness of the pixel at the central point; and performing the operation on each pixel point in the input face image to finally obtain the local characteristic value of each pixel point of the input image.
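A compact NumPy sketch of the basic 3 × 3 LBP operator of steps 2.1–2.3; the counterclockwise neighbor ordering starting to the right of the center follows the description above, and the random test image is hypothetical:

```python
import numpy as np

def lbp_image(gray):
    # Threshold the 8 neighbors of every pixel against the center pixel
    # (formulas (8) and (9)) and pack the bits B_i into a decimal LBP value.
    g = gray.astype(np.int32)
    h, w = g.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Neighbor offsets (dy, dx), counterclockwise from the right of the center.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    center = g[1:-1, 1:-1]
    for i, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out += (((neighbor - center) >= 0).astype(np.int32)) << i  # B_i = s(g_i - g_c)
    return out.astype(np.uint8)

# Example: LBP map of a random 8-bit image patch.
print(lbp_image(np.random.randint(0, 256, (6, 6), dtype=np.uint8)))
```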
The fine sampling channel 18 consists of three convolution layers and three pooling layers, which are in sequence: a fourth convolution layer 7, a fourth pooling layer 8, a fifth convolution layer 9, a fifth pooling layer 10, a sixth convolution layer 11 and a sixth pooling layer 12;
the fourth convolution layer 7, the fifth convolution layer 9 and the sixth convolution layer 11 are each formed by stacking 1 × 3 and 3 × 1 convolution kernels;
the fourth pooling layer 8, the fifth pooling layer 10 and the sixth pooling layer 12 all employ max pooling (Max_pooling); the pooling size is 2 × 2, the pooling step size is 2, and the padding mode is set to SAME.
The feature extraction using the fine sampling channel 18 is the same as the feature extraction using the coarse sampling channel 20.
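For concreteness, the following Keras sketch assembles the two convolutional channels as described above (5 × 5 kernels in the coarse sampling channel 20; stacked 1 × 3 and 3 × 1 kernels in the fine sampling channel 18) and fuses them with a precomputed LBP-channel map. The filter counts, input size and embedding width are assumptions, not values specified by the invention, and the patent's PCA dimensionality reduction is replaced here by a dense projection purely to keep the sketch runnable:

```python
from tensorflow.keras import Model, layers

def coarse_channel(x):
    # Three 5x5 convolutions, each followed by 2x2 max pooling, stride 2, SAME.
    for filters in (32, 64, 128):  # filter counts are assumptions
        x = layers.Conv2D(filters, (5, 5), activation="relu", padding="same")(x)
        x = layers.MaxPooling2D((2, 2), strides=2, padding="same")(x)
    return x

def fine_channel(x):
    # Three blocks of stacked 1x3 and 3x1 convolutions, each followed by pooling.
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, (1, 3), activation="relu", padding="same")(x)
        x = layers.Conv2D(filters, (3, 1), activation="relu", padding="same")(x)
        x = layers.MaxPooling2D((2, 2), strides=2, padding="same")(x)
    return x

face = layers.Input(shape=(64, 64, 1), name="face")  # preprocessed image
lbp = layers.Input(shape=(64, 64, 1), name="lbp")    # LBP map of the same image

# Feature fusion of the three channels, then two fully connected layers.
fused = layers.Concatenate()([
    layers.Flatten()(coarse_channel(face)),
    layers.Flatten()(fine_channel(face)),
    layers.Flatten()(lbp),
])
x = layers.Dense(256, activation="relu")(fused)     # stand-in for PCA + FC1
embedding = layers.Dense(128, name="embedding")(x)  # FC2 feature output

model = Model(inputs=[face, lbp], outputs=embedding)
model.summary()
```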
The dimensionality reduction layer 14 adopts a PCA dimensionality reduction operation to convert the fused features into one-dimensional feature vector information, with the following specific steps:
step 3.1, performing feature fusion on the face feature information of the three channels and converting it into matrix form; let the fused feature image be the matrix X of size n × m;
step 3.2, carrying out zero-mean processing on the matrix X and solving the covariance matrix H = (1/m)XXᵀ;
step 3.3, solving the eigenvalue of the matrix H and calculating the eigenvector corresponding to the eigenvalue;
step 3.4, taking the eigenvectors corresponding to the k largest eigenvalues and arranging them by rows into a matrix Q;
and step 3.5, obtaining the one-dimensional feature vector after dimension reduction by U = QX.
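A NumPy sketch of steps 3.1–3.5, assuming the fused feature image is an n × m matrix X whose rows are zero-meaned before forming H = (1/m)XXᵀ; the matrix sizes below are hypothetical:

```python
import numpy as np

def pca_reduce(X, k):
    # Steps 3.2-3.5: zero-mean X, covariance H, top-k eigenvectors Q, U = QX.
    n, m = X.shape
    X = X - X.mean(axis=1, keepdims=True)   # step 3.2: zero-mean processing
    H = (X @ X.T) / m                       # covariance matrix H (n x n)
    eigvals, eigvecs = np.linalg.eigh(H)    # step 3.3: eigenvalues/eigenvectors
    idx = np.argsort(eigvals)[::-1][:k]     # step 3.4: k largest eigenvalues
    Q = eigvecs[:, idx].T                   # eigenvectors arranged by rows
    return Q @ X                            # step 3.5: U = QX (k x m)

X = np.random.rand(16, 8)   # hypothetical fused feature matrix, n = 16, m = 8
U = pca_reduce(X, k=4)
print(U.shape)              # (4, 8)
```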
The loss function layer 17 adopts the triplet loss function as the loss function, with the formula:

L = Σ(i=1 to N) [ ||f(ai) − f(pi)||₂² − ||f(ai) − f(ni)||₂² + α ]₊

In the formula, a represents a sample arbitrarily selected from the training set; p represents a randomly selected sample of the same class as a, called the positive sample; n represents a randomly selected sample of a different class, called the negative sample; α is the margin enforced between the positive-pair and negative-pair distances. For each sample in the triplet, a parameter-sharing network is trained to obtain the feature expressions of the three samples a, p and n, recorded as f(a), f(p) and f(n) respectively.
Through learning on the training set, the triplet loss minimizes the intra-class distance between the (a, p) features and maximizes the inter-class distance between the (a, n) features. The effect of the '+' sign in the formula is: if the value inside [·] is greater than 0, that value is taken as the loss; if it is less than or equal to 0, the loss is 0.
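A NumPy sketch of the triplet loss over a batch of triplets; the embedding dimension, batch size and margin value α are assumptions:

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    # [ d(a,p) - d(a,n) + margin ]_+ summed over the batch: positive values
    # are kept as the loss, non-positive values contribute 0.
    d_pos = np.sum((f_a - f_p) ** 2, axis=1)  # intra-class distance (a, p)
    d_neg = np.sum((f_a - f_n) ** 2, axis=1)  # inter-class distance (a, n)
    return np.sum(np.maximum(d_pos - d_neg + margin, 0.0))

# Hypothetical 128-dimensional feature expressions f(a), f(p), f(n)
# produced by the parameter-sharing network for 4 triplets.
f_a, f_p, f_n = (np.random.rand(4, 128) for _ in range(3))
print(triplet_loss(f_a, f_p, f_n))
```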
Specifically, step 5 is to set a threshold, if the similarity of the image output in step 4 is greater than the threshold, the image to be recognized and the portrait in the image output in step 4 are determined to be the same person, otherwise, the training set is determined to have no facial information the same as that of the image to be recognized, and the set range of the threshold is 0.73-0.78.
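An illustrative sketch of steps 4 and 5, assuming the trained model yields feature vectors and that cosine similarity is used as the similarity measure (the patent does not name the measure, so this is an assumption); the threshold is taken from the 0.73–0.78 range given above:

```python
import numpy as np

def recognize(query, gallery, names, threshold=0.75):
    # Step 4: compare the query features with every training-set image and
    # find the most similar one; step 5: accept the match only if its
    # similarity exceeds the threshold, otherwise report no match.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to each gallery image
    best = int(np.argmax(sims))
    if sims[best] > threshold:
        return names[best], float(sims[best])   # same person
    return None, float(sims[best])    # no matching face in the training set

gallery = np.random.rand(5, 128)      # hypothetical training-set features
query = gallery[2] + 0.01 * np.random.rand(128)
print(recognize(query, gallery, ["p0", "p1", "p2", "p3", "p4"]))
```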
In order to verify the invention, face recognition simulation experiments were carried out on the ORL face database with five algorithms (CNN, LBP + DBN, rough learning + fine learning, HOG + SVM and the algorithm of the invention). The feature maps extracted by Gabor, Haar, LBP and the three-channel fusion method are compared, as shown in figs. 4 and 5. The Gabor and LBP algorithms can only extract local features of the face image; the Haar algorithm can only extract contour feature information of the face; the method of the invention fuses the extracted local and global features of the face into an output containing both, which can effectively improve the recognition accuracy. Fig. 6 shows the numbers of features extracted from the Lena picture by the four methods, in which the abscissa and ordinate represent the numbers of features extracted along the length and width of the input image respectively, and the height represents the feature intensity. Comparing the four feature extraction graphs in fig. 6 shows that the image features after three-channel feature fusion contain more feature information and higher feature intensity than those obtained by the other methods, thereby effectively improving the recognition accuracy.
The ORL face database contains 40 subjects of different ages, genders and races, with 10 pictures per person for a total of 400 pictures. The original picture size is 92 × 112, and the pictures were collected under different expressions, poses and illumination. In the experiment the original pictures were resized to 64 × 64; 200 pictures were selected as training samples and the remaining 200 as test samples.
TABLE 1 comparison of recognition time consumption and accuracy in ORL database
As can be seen from table 1, which compares the performance of the algorithm proposed by the invention with the comparison algorithms on the ORL database, the proposed algorithm is almost indistinguishable from the comparison algorithms in terms of face recognition time consumption: its average time consumption is only 0.0135 seconds higher than that of the four comparison algorithms (CNN, LBP + DBN, rough learning + fine learning and HOG + SVM). In recognition accuracy, however, there is a significant improvement: the average accuracy is 5.38% higher.