
CN116129484A - Method, device, electronic equipment and storage medium for model training and living body detection - Google Patents

Method, device, electronic equipment and storage medium for model training and living body detection

Info

Publication number
CN116129484A
CN116129484A
Authority
CN
China
Prior art keywords
prosthesis
false
face
network
living body
Prior art date
Legal status
Pending
Application number
CN202210879965.7A
Other languages
Chinese (zh)
Inventor
高亮
周迅溢
曾定衡
Current Assignee
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202210879965.7A priority Critical patent/CN116129484A/en
Publication of CN116129484A publication Critical patent/CN116129484A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/40Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401Transaction verification
    • G06Q20/4014Identity check for transactions
    • G06Q20/40145Biometric identity checks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/28Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive


Abstract

The application provides a method, an apparatus, an electronic device and a storage medium for model training and living body detection. The method includes: acquiring a labeled image training sample set, wherein the training labels of the image training samples include a face region label for characterizing the face region features of a prosthesis attack image sample, a prosthesis border region label for characterizing the prosthesis border region features of the prosthesis attack image sample, and an authenticity label for characterizing the authenticity classification of the prosthesis attack image sample; and training an initial living body detection model with the image training sample set. The living body detection model includes: a coding network for coding the face region and the prosthesis border region of a prosthesis attack image sample; a perception network for fusing the face region features and the prosthesis border region features corresponding to the prosthesis attack image sample; and a classification network for performing authenticity identification on the fusion features of the prosthesis attack image sample to obtain an authenticity identification result corresponding to the prosthesis attack image sample.

Description

Method, device, electronic equipment and storage medium for model training and living body detection
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a method, a device, electronic equipment and a storage medium for model training and living body detection.
Background
With the continuous development of biometric recognition and artificial intelligence technologies, face recognition has been widely applied and has greatly simplified identity authentication processes such as payment, access control and security check. In practical applications, however, the face is an open biometric feature that can easily be exploited by a malicious party: a prosthesis medium is used to present the face image of a legal user, thereby impersonating the legal user to initiate face recognition. This act of impersonating another user's identity with a prosthesis medium for recognition is called a prosthesis attack.
For this reason, how to automatically and effectively identify prosthesis attacks on face images by machine has become an urgent problem in the industry.
Disclosure of Invention
The application aims to provide a method, an apparatus, an electronic device and a storage medium for model training and living body detection, which can perform machine-based living body detection on face images and can be used by a face recognition system to resist prosthesis attacks.
In order to achieve the above object, embodiments of the present application are implemented as follows:
In a first aspect, a model training method is provided, including:
acquiring an image training sample set, wherein the image training sample set comprises a plurality of prosthesis attack image samples and corresponding training labels, the training labels comprise face region labels, prosthesis border region labels and authenticity labels, the face region labels are used for representing face region features of the corresponding prosthesis attack image samples, the prosthesis border region labels are used for representing prosthesis border region features of the corresponding prosthesis attack image samples, and the authenticity labels are used for representing the authenticity classification of the corresponding prosthesis attack image samples;
training an initial living body detection model by using the image training sample set to obtain a living body detection model;
wherein the living body detection model comprises a coding network, a perception network and a classification network; the coding network is used for coding a face region and a prosthesis border region of each prosthesis attack image sample in the plurality of prosthesis attack image samples to obtain the face region features and prosthesis border region features corresponding to each prosthesis attack image sample; the perception network is used for fusing the face region features and the prosthesis border region features corresponding to each prosthesis attack image sample to obtain the fusion features corresponding to each prosthesis attack image sample; and the classification network is used for performing authenticity identification on the fusion features of each prosthesis attack image sample to obtain an authenticity identification result corresponding to each prosthesis attack image sample.
In a second aspect, there is provided a living body detection method including:
responding to a living body detection request initiated by a target user, and acquiring a face shooting image of the target user;
inputting the face shooting image of the target user into a living body detection model to obtain an authenticity identification result corresponding to the face shooting image of the target user; wherein the living body detection model is trained based on the method of the first aspect; the living body detection model is used for encoding the face shooting image into corresponding face region features and prosthesis border region features, fusing the face region features and the prosthesis border region features into fusion features, and then performing authenticity identification on the face shooting image based on the fusion features.
In a third aspect, there is provided a living body detection apparatus comprising:
the shooting image acquisition module is used for responding to a living body detection request initiated by a target user to acquire a face shooting image of the target user;
the authenticity identification module is used for inputting the face shooting image of the target user into a living body detection model to obtain an authenticity identification result corresponding to the face shooting image of the target user; wherein the living body detection model is trained based on the method of the first aspect; the living body detection model is used for encoding the face shooting image into corresponding face region features and prosthesis border region features, fusing the face region features and the prosthesis border region features into fusion features, and then performing authenticity identification on the face shooting image based on the fusion features.
In a fourth aspect, there is provided an electronic device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program being executable by the processor to perform the method of the first or second aspect.
In a fifth aspect, there is provided a computer readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the method of the first or second aspect described above.
In the scheme of the application, three kinds of labels are marked on the prosthesis attack image samples used for training the living body detection model: an authenticity label representing the authenticity classification of the prosthesis attack image sample, a prosthesis border region label representing the features of the prosthesis border region in the prosthesis attack image sample, and a face region label representing the features of the face region in the prosthesis attack image sample. During training, the coding network of the living body detection model can purposefully learn the respective feature knowledge of the face and the prosthesis under the supervision of the prosthesis border region label and the face region label, and thus gains the ability to accurately extract the respective features of the face and the prosthesis from a prosthesis attack image. In addition, the perception network of the living body detection model fuses the face and prosthesis features in the prosthesis attack image and passes them to the classification network of the living body detection model, and the classification network learns, under the supervision of the authenticity label, how to combine the face and prosthesis features to perform the authenticity analysis of living body detection. With this scheme, the living body detection model can learn the essential features of the face and the prosthesis more purposefully and base its living body detection analysis on these essential features, so that the accuracy and generalization of the living body detection model are greatly improved, the model performs well in both the training stage and the use stage, and it can more effectively assist the face recognition system in preventing prosthesis attacks by illegal users.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to a person having ordinary skill in the art.
Fig. 1 is a schematic flow chart of a payment system processing a transaction request for face recognition.
Fig. 2 is a schematic flow chart of a model training method according to an embodiment of the present application.
Fig. 3 is a schematic diagram showing that the prosthesis border region in a prosthesis attack face image has features different from those of the face region.
Fig. 4 is a first schematic diagram of coding a prosthetic border region and a human face region in a prosthetic attack human face image sample according to the model training method of the embodiment of the present application.
Fig. 5 is a second schematic diagram of coding a prosthetic border region and a face region in a prosthetic attack face image sample according to the model training method of the embodiment of the present application.
Fig. 6 is a schematic structural diagram of a living body detection model according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a first flowchart of a living body detection method according to an embodiment of the present application.
Fig. 8 is a schematic diagram of a second flow chart of a living body detection method according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application.
Fig. 10 is a schematic structural view of a living body detection apparatus according to an embodiment of the present application.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present specification, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
With the wide application of face recognition technology and the trend toward automated and unsupervised deployment, how to automatically and effectively identify prosthesis attacks on face images by machine has become an urgent problem in the industry.
Fig. 1 illustrates the flow in which a payment system successfully completes a payment transaction based on face recognition, which specifically includes the following steps:
1) The user initiates a transaction request to the payment system.
2) The payment system responds to the transaction request of the user and invokes the camera of the user terminal to attempt to acquire a face image of the user.
3) The payment system performs face recognition on the acquired face image.
4) After the face recognition passes, the payment system executes the transaction operation corresponding to the transaction request.
5) The payment system feeds back the transaction result to the user.
As can be seen from the above flow, in the stage of acquiring the face shooting image, if an illegal user uses a prosthesis such as a head model, a mask, a photo or an electronic screen to present the face image of a legal user, the payment system may misjudge and pass the face recognition, and finally accept the transaction request of the illegal user, causing losses to both the legal user and the payment system.
For this reason, before face recognition is performed, living body detection needs to be performed on the face image in order to resist attacks in which a prosthesis is used to impersonate the face image of a legal user for recognition.
Living body detection is a method for determining the real physiological characteristics of an object in an identity verification scenario. The current mainstream way of performing living body detection based on artificial intelligence is to train a deep learning model with real face samples and prosthesis face samples, so that the deep learning model gains the ability to identify whether a face image is a real face image or a prosthesis face image. In this approach, the face samples are only labeled as real or prosthesis. Since a prosthesis face sample, whose very purpose is impersonation, is quite close to a real face sample in machine vision, without more precise labeling the model cannot tell which features in the two kinds of samples belong to the face and which belong to the prosthesis. The training therefore has poor interpretability: the model may achieve good recognition accuracy on the training set without actually learning the essential feature knowledge of the face and the prosthesis, and thus cannot effectively perform living body detection analysis based on such knowledge. A deep learning model trained in this way often suffers a sharp drop in accuracy after being put into use and cannot support business requirements, which is the well-known overfitting problem in the industry.
In view of the above, the present application proposes a model training scheme that performs strongly supervised learning on the respective features of the face and the prosthesis, so that the living body detection model can purposefully learn the feature knowledge of the face and the prosthesis, thereby improving the accuracy of living body detection. In this application, the living body detection model includes a coding network, a perception network and a classification network, and is trained with prosthesis attack image samples labeled with face region labels, prosthesis border region labels and authenticity labels. The flow includes: coding the face region and the prosthesis border region of each prosthesis attack image sample into the corresponding face region features and prosthesis border region features based on the coding network of the living body detection model; fusing the face region features and the prosthesis border region features of each prosthesis attack image sample into fusion features based on the perception network of the living body detection model; determining the authenticity identification result of each prosthesis attack image sample based on the classification network of the living body detection model; and finally training the coding network, the perception network and the classification network based on the differences between the face region features and the face region labels, between the prosthesis border region features and the prosthesis border region labels, and between the authenticity identification results and the authenticity labels of the prosthesis attack image samples.
It can be seen that, in the scheme of the application, the prosthesis attack image samples used for training the living body detection model are marked with three kinds of labels: an authenticity label representing the authenticity classification of the prosthesis attack image sample, a prosthesis border region label representing the features of the prosthesis border region in the prosthesis attack image sample, and a face region label representing the features of the face region in the prosthesis attack image sample. During training, the coding network of the living body detection model can purposefully learn the respective feature knowledge of the face and the prosthesis under the supervision of the prosthesis border region label and the face region label, and thus gains the ability to accurately extract the respective features of the face and the prosthesis from a prosthesis attack image. In addition, the perception network of the living body detection model fuses the face and prosthesis features in the prosthesis attack image and passes them to the classification network of the living body detection model, and the classification network learns, under the supervision of the authenticity label, how to combine the face and prosthesis features to perform the authenticity analysis of living body detection. With this scheme, the living body detection model can learn the essential features of the face and the prosthesis more purposefully and base its living body detection analysis on these essential features, so that the accuracy and generalization of the living body detection model are greatly improved, the model performs well in both the training stage and the use stage, and it can more effectively assist the face recognition system in preventing prosthesis attacks by illegal users.
The model training scheme of the application can be executed by the electronic device, and particularly can be executed by a processor of the electronic device. So-called electronic devices may include terminal devices such as smartphones, tablet computers, notebook computers, desktop computers, intelligent voice interaction devices, smart appliances, smart watches, car terminals, aircraft, etc.; alternatively, the electronic device may further include a server, such as an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides a cloud computing service.
Based on the foregoing, an embodiment of the present application provides a model training method. Referring to fig. 2, fig. 2 is a flowchart illustration of a model training method provided in an embodiment of the present application, and specifically includes the following steps:
s202, acquiring an image training sample set, wherein the image training sample set comprises a plurality of prosthesis attack image samples and corresponding training labels, each training label comprises a face area label, a prosthesis border area label and an authenticity label, the face area labels are used for representing face area characteristics of the corresponding prosthesis attack image samples, the prosthesis border area labels are used for representing prosthesis border area characteristics of the corresponding prosthesis attack image samples, and the authenticity labels are used for representing authenticity classification of the corresponding prosthesis attack image samples.
A prosthesis attack image refers to a face image presented using a prosthesis medium such as a head model, a mask, a photo or an electronic screen. In the preparation stage of the application, any type of prosthesis can be used to present a face image, which is then photographed to obtain a prosthesis attack image. The captured prosthesis attack image can be used as a sample for training the model, i.e., the prosthesis attack image sample described herein.
Fig. 3 schematically shows face images presented by three kinds of prosthesis media: an electronic screen, a photo and a mask. It can be seen that the electronic screen, the photo and the mask all present characteristic frame-like borders around the outline of the face image, which is a distinctive and unavoidable feature of current common prosthesis attacks. The application aims to enable the living body detection model to purposefully learn the feature knowledge of the prosthesis border region and the face region in prosthesis attack image samples, so that authenticity detection analysis is carried out by combining these two kinds of feature knowledge.
Here, in order to enable the living body detection model to determine, in the training stage, which features in a prosthesis attack image sample are unique to the face and which are unique to the prosthesis border, the application uses a face region label and a prosthesis border region label to label the respective features of the face region and the prosthesis border region in the prosthesis attack image sample.
For labeling the face region, the prosthesis attack image sample is converted into a black-and-white binary image according to a first black-and-white binary rule. Under the first black-and-white binary rule, the pixels of the face region take a first gray value and the non-face region takes a second gray value, where the first gray value differs from the second gray value. That is, in the black-and-white binary image generated by the first black-and-white binary rule, the pixels of the face region are shown in high brightness while the pixels of the remaining regions are shown in low brightness, or vice versa. Through this brightness contrast, the black-and-white binary image highlights the face region. The black-and-white binary image highlighting the face region is then converted into a gray matrix representing the image brightness, which amounts to expressing it in machine language, thereby completing the labeling of the face region features, i.e., the face region label.
For ease of understanding, a simple example is presented here: assuming that the first gray value takes the maximum gray value (255) and the second gray value takes the minimum gray value (0), then according to the first black-and-white binary rule, each type of prosthesis attack image sample can be converted into a black-and-white binary image representing the face region as shown in fig. 4. As can be seen from fig. 4, the black-and-white binary image representing the face region highlights the outline of the face region with the maximum gray value. For a prosthesis attack image sample, the face region label is the gray matrix of the black-and-white binary image generated from the prosthesis attack image according to the first black-and-white binary rule.
Similarly, for labeling the prosthesis border region, the application converts the prosthesis attack image sample into a black-and-white binary image according to a second black-and-white binary rule. Under the second black-and-white binary rule, the pixels of the prosthesis border region take a third gray value and the non-border region takes a fourth gray value, where the third gray value differs from the fourth gray value so as to distinguish them. As a preferred scheme, the third gray value takes the maximum gray value (255) and the fourth gray value takes the minimum gray value (0); based on the second black-and-white binary rule, each type of prosthesis attack image sample can be converted into a black-and-white binary image representing the prosthesis border region as shown in fig. 4. As can be seen from fig. 4, this black-and-white binary image highlights the outline of the prosthesis border with the maximum gray value. For a prosthesis attack image sample, the prosthesis border region label is the gray matrix of the black-and-white binary image generated from the prosthesis attack image according to the second black-and-white binary rule.
It should be understood that, based on the black-and-white binary labeling manner, the method can label the characteristics of the false body border region and the characteristics of the human face region in the false body attack image sample in machine language.
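As a minimal illustrative sketch (not code from the patent, and using hypothetical coordinates and helper names), the two gray-matrix labels could be built as follows with NumPy; for a negative sample (a real face image with no prosthesis medium) both `face_box` and `frame_box` are passed as None so that both labels stay all-zero, matching the all-black labels of fig. 5:

```python
# Illustrative sketch only: black-and-white binary gray-matrix labels for one sample.
import numpy as np

def make_region_labels(h, w, face_box=None, frame_box=None, thickness=6,
                       high=255, low=0):
    """Return (face_region_label, prosthesis_border_region_label) as uint8 gray matrices.

    face_box / frame_box: hypothetical (x1, y1, x2, y2) pixel coordinates from annotation;
    both are None for a negative sample (real face, no prosthesis medium).
    """
    # First black-and-white binary rule: face pixels take the first gray value (high),
    # all other pixels take the second gray value (low).
    face_label = np.full((h, w), low, dtype=np.uint8)
    if face_box is not None:
        x1, y1, x2, y2 = face_box
        face_label[y1:y2, x1:x2] = high

    # Second black-and-white binary rule: prosthesis-border pixels take the third gray
    # value (high), everything else the fourth gray value (low). Here the border is the
    # frame-shaped ring between frame_box and the region `thickness` pixels inside it.
    border_label = np.full((h, w), low, dtype=np.uint8)
    if frame_box is not None:
        x1, y1, x2, y2 = frame_box
        border_label[y1:y2, x1:x2] = high
        border_label[y1 + thickness:y2 - thickness, x1 + thickness:x2 - thickness] = low

    # For a negative sample both labels remain all-zero (completely black images).
    return face_label, border_label
```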
In practical applications, the prosthesis attack image samples are further divided into positive samples and negative samples. A positive sample is a face image presented using a prosthesis medium; it trains the model from the forward perspective to learn the prosthesis border region features and the face region features in prosthesis attack images, giving it the ability to identify a prosthesis attack image based on these features. A negative sample is a face image of a real living body, i.e., not a prosthesis attack; it trains the model from the reverse perspective to learn which image features do not belong to prosthesis attack images, so that the model avoids relying on features that are not specific to prosthesis attack images when identifying them.
It should be appreciated that training the model with the above positive and negative samples enables it to distinguish more accurately between prosthesis attack images and non-prosthesis-attack images (real face images). Correspondingly, the authenticity label of a positive sample is marked as false and that of a negative sample is marked as true, so that under the supervision of the authenticity labels the model learns to judge the prosthesis attack image of a positive sample as false and the non-prosthesis-attack image of a negative sample (i.e., the face image of a real living body) as true, thereby acquiring the authenticity judgment capability required for living body detection.
It should be noted that the application uses face images of real living bodies as negative samples, and a negative sample image contains no prosthesis medium and therefore has no prosthesis border region features. For this purpose, a gray matrix in which all pixels take the same value can be used as the prosthesis border region label of a negative sample. Taking the case where all pixels take the value 0 as an example, referring to fig. 5, the prosthesis border region label corresponding to the negative sample is a gray matrix whose pixels are all 0, visually presented as a completely black image. Such a completely black image carries no information, so this prosthesis border region label expresses in machine language the fact that no prosthesis border region exists in the negative sample, and the model does not need to deliberately search for prosthesis border region features in negative samples during training.
Furthermore, since the starting point of the application is to identify prosthesis attack images by accurately combining the prosthesis border region features and the face region features, for a negative sample in which no prosthesis border region features exist, the position of the face region does not need to be deliberately marked either. Correspondingly, the face region label of a negative sample can also be a gray matrix in which all pixels take the same value. For example, as shown in fig. 5, the face region label corresponding to the negative sample is a gray matrix whose pixels are all 0, visually presented as a completely black image.
S204, training the initial living body detection model by using the image training sample set to obtain a living body detection model. The living body detection model comprises a coding network, a perception network and a classification network; the coding network is used for coding a face region and a prosthesis border region of each prosthesis attack image sample in the plurality of prosthesis attack image samples to obtain the face region features and prosthesis border region features corresponding to each prosthesis attack image sample; the perception network is used for fusing the face region features and the prosthesis border region features corresponding to each prosthesis attack image sample to obtain the fusion features corresponding to each prosthesis attack image sample; and the classification network is used for performing authenticity identification on the fusion features of each prosthesis attack image sample to obtain the authenticity identification result corresponding to each prosthesis attack image sample.
In the application, the coding network encoding the face region and the prosthesis border region of a prosthesis attack image sample means that: the coding network performs gray-matrix-based coding on the pixels of the prosthesis attack image sample according to the first black-and-white binary rule to obtain the corresponding face region features, and performs gray-matrix-based coding on the pixels of the prosthesis attack image sample according to the second black-and-white binary rule to obtain the corresponding prosthesis border region features.
As an exemplary introduction, the coding network of the application specifically includes an embedding layer and an encoder. The embedding layer is configured to extract, in vector form, the gray information of the face region and of the prosthesis border region in a prosthesis attack image sample; the encoder is configured to encode these gray information vectors into gray matrices according to the corresponding black-and-white binary rules, namely the face region features and the prosthesis border region features.
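A minimal PyTorch sketch of one such coding sub-network is given below; it is an assumption for illustration only (the patent does not specify layer types or sizes), with a small convolutional "embedding" stage followed by an encoder head that emits a single-channel gray matrix supervised against the region label:

```python
# Hypothetical sketch of one coding sub-network (embedding layer + encoder).
import torch
import torch.nn as nn

class CodingSubNetwork(nn.Module):
    def __init__(self, in_channels=3, hidden=32):
        super().__init__()
        # "Embedding layer": extracts gray information of the target region as feature maps.
        self.embed = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # "Encoder": maps the embedded features to a 1-channel gray matrix in [0, 255].
        self.encode = nn.Conv2d(hidden, 1, kernel_size=1)

    def forward(self, x):                          # x: (B, 3, H, W) image sample
        gray = torch.sigmoid(self.encode(self.embed(x))) * 255.0
        return gray                                # (B, 1, H, W), compared to the region label
```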
In the application, the perception network fusing the face region features and the prosthesis border region features corresponding to a prosthesis attack image sample means performing fully connected computation on these two kinds of features in a non-linear combination manner, thereby obtaining a brand-new representation that covers both feature dimensions, namely the fusion features.
As an exemplary introduction, the perception network of the application may be a convolutional neural network, which specifically includes a convolutional layer, a pooling layer and a fully connected layer. The convolutional layer converts the face region features and the prosthesis border region features into vector representations in the same space; the pooling layer performs appropriate dimension reduction on these vector representations to speed up subsequent computation; and the fully connected layer performs non-linear fusion on the dimension-reduced face region features and prosthesis border region features to obtain the corresponding fusion features.
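Under the same illustrative assumptions (PyTorch, hypothetical layer sizes), a perception network of this convolution–pooling–fully-connected form could look like the following sketch:

```python
# Hypothetical sketch of the perception network: conv + pooling + fully connected fusion.
import torch
import torch.nn as nn

class PerceptionNetwork(nn.Module):
    def __init__(self, fused_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, stride=2, padding=1),  # 2 channels: face + border gray maps
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d((8, 8))                   # information dimension reduction
        self.fuse = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 8 * 8, fused_dim),
            nn.ReLU(inplace=True),                                  # non-linear combination
        )

    def forward(self, face_feat, border_feat):     # each (B, 1, H, W)
        x = torch.cat([face_feat, border_feat], dim=1)
        return self.fuse(self.pool(self.conv(x)))  # (B, fused_dim) fusion feature
```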
In the application, the classification network performing authenticity identification on the fusion features of a prosthesis attack image sample means: based on the fusion features, calculating the probability that the sample is a real living body and the probability that it is a prosthesis impersonation, and taking the class with the larger probability as the authenticity identification result. For example, if the classification network computes a 10% probability of being a real living body and a 90% probability of being a prosthesis attack for a certain sample, the final authenticity identification result is that the sample is a fake face image presented by a prosthesis.
By way of example, the classification network of the application may be, but is not limited to, a classifier supporting classification computation that is currently popular, such as a logistic regression model, a softmax function model or a support vector machine (Support Vector Machine, SVM), which is not specifically limited herein.
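For illustration, a softmax-style classification head in the same hypothetical PyTorch sketch could be written as follows; the class-id convention (0 = real living body, 1 = prosthesis attack) is an assumption, not stated in the patent:

```python
# Hypothetical sketch of the classification network: softmax over two classes.
import torch
import torch.nn as nn

class ClassificationNetwork(nn.Module):
    def __init__(self, fused_dim=128, num_classes=2):
        super().__init__()
        self.head = nn.Linear(fused_dim, num_classes)   # class 0: real, class 1: prosthesis (assumed)

    def forward(self, fused):                            # fused: (B, fused_dim)
        probs = torch.softmax(self.head(fused), dim=-1)
        return probs, probs.argmax(dim=-1)               # class probabilities and the decided class
```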
Based on the structure of the living body detection model, the training principle of the living body detection model is to determine the training gradient of the living body detection model based on the difference between the face area characteristics and the face area labels corresponding to each prosthesis attack image sample, the difference between the prosthesis border area characteristics and the prosthesis border area labels and the difference between the true and false identification results and the true and false labels, and train the coding network, the perception network and the classification network of the living body detection model according to the training gradient.
In particular, this step may set different loss functions for the coding network and the classification network. The loss function of the coding network is used for reflecting coding errors of the coding network, and the coding errors comprise differences between face region features and face region labels corresponding to each prosthesis attack image sample and differences between prosthesis border region features and prosthesis border region labels corresponding to each prosthesis attack image sample. The loss function of the classification network is used for reflecting classification calculation errors of the classification network, and the classification calculation errors comprise: the false identification result corresponding to each false attack image sample is different from the false label.
The method and the device can determine the total loss function of the living body detection model based on the loss functions of the coding network and the classification network, the total loss function comprehensively reflects the total error of the living body detection model, the total error comprises the coding error of the coding network and the error of the classifier network, and the network parameters of the coding network, the perception network and the classification network of the living body detection model are adjusted by taking the total error as the training gradient direction, so that the coding capacity of the coding network, the nonlinear characteristic fusion capacity of the perception network and the classification computing capacity of the classification network are improved.
The training of the living body detection model of the present application is described in detail below.
Referring to fig. 6, fig. 6 is a schematic diagram of the living body detection model of the application, in which the coding network specifically includes a first coding sub-network and a second coding sub-network. After a prosthesis attack image sample is input into the living body detection model, its pixels are coded into a gray matrix by the first coding sub-network to obtain the corresponding face region features, and at the same time coded into a gray matrix by the second coding sub-network to obtain the corresponding prosthesis border region features. The face region features output by the first coding sub-network and the prosthesis border region features output by the second coding sub-network are then input into the perception network and fused into the fusion features. Finally, the fusion features output by the perception network are further input into the classification network, which completes the authenticity classification computation to obtain the authenticity identification result corresponding to the prosthesis attack image sample.
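Reusing the hypothetical sub-network sketches above, the structure of fig. 6 could be assembled as follows; this is still an assumption about the exact layout, not the patent's definitive implementation:

```python
# Hypothetical assembly of the structure in fig. 6 from the sketches above.
import torch
import torch.nn as nn

class LivenessDetectionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.face_encoder = CodingSubNetwork()      # first coding sub-network (face region)
        self.border_encoder = CodingSubNetwork()    # second coding sub-network (prosthesis border)
        self.perception = PerceptionNetwork()
        self.classifier = ClassificationNetwork()

    def forward(self, image):                       # image: (B, 3, H, W)
        face_feat = self.face_encoder(image)        # gray matrix for the face region
        border_feat = self.border_encoder(image)    # gray matrix for the prosthesis border region
        fused = self.perception(face_feat, border_feat)
        probs, decision = self.classifier(fused)
        return face_feat, border_feat, probs, decision
```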
In the present application, the first coding sub-network and the second coding sub-network each employ a mean squared error (Mean Squared Error, MSE) loss function. The MSE loss function is:

MSE = (1/n) · Σ_{i=1}^{n} (y_i − ŷ_i)²

where n denotes the number of prosthesis attack image samples; y_i denotes the actual output for the i-th prosthesis attack image sample, which corresponds to the face region features in the first coding sub-network and to the prosthesis border region features in the second coding sub-network; and ŷ_i denotes the expected value for the i-th prosthesis attack image sample, which corresponds to the face region label in the first coding sub-network and to the prosthesis border region label in the second coding sub-network. It should be appreciated that the MSE loss function of the first coding sub-network amounts to computing the average difference between the face region features and the face region labels of the prosthesis attack image samples, and the MSE loss function of the second coding sub-network amounts to computing the average difference between the prosthesis border region features and the prosthesis border region labels.
Correspondingly, the classification network employs a cross entropy (Cross Entropy Loss Function, CELF) loss function:

CELF = −(1/n) · Σ_{i=1}^{n} [ŷ_i · log(y_i) + (1 − ŷ_i) · log(1 − y_i)]

where n denotes the number of prosthesis attack image samples; y_i denotes the probability actually output by the classification network for the i-th prosthesis attack image sample; and ŷ_i denotes the expected output for the i-th prosthesis attack image sample, i.e., its authenticity label. The CELF loss function of the classification network amounts to computing the distance, namely the cross entropy, between the actually output probability and the expected output of the prosthesis attack image samples; the smaller the cross entropy, the closer the distribution of the actual output probability is to the expected output.
For the living body detection model as a whole, the total loss function is:

Loss = MSE₁ + MSE₂ + CELF

where Loss denotes the total loss function, MSE₁ denotes the MSE loss function of the first coding sub-network, MSE₂ denotes the MSE loss function of the second coding sub-network, and CELF denotes the CELF loss function of the classification network.
Here, the network parameters of the first coding sub-network, the second coding sub-network, the perception network and the classification network of the living body detection model can be adjusted along the training gradient direction that reduces the total loss function Loss.
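A hedged sketch of one supervised training step under the assumptions of the model sketch above is given below: MSE losses supervise the two coding sub-networks against the face region and prosthesis border region labels, cross entropy supervises the classification network against the authenticity label, and their sum plays the role of Loss = MSE₁ + MSE₂ + CELF; the optimizer and tensor shapes are illustrative choices:

```python
# Hypothetical single training step for the sketched model.
import torch
import torch.nn.functional as F

def training_step(model, optimizer, image, face_label, border_label, authenticity_label):
    """face_label / border_label: (B, 1, H, W) float gray matrices; authenticity_label: (B,) class ids."""
    face_feat, border_feat, probs, _ = model(image)
    mse_1 = F.mse_loss(face_feat, face_label)              # first coding sub-network vs. face region label
    mse_2 = F.mse_loss(border_feat, border_label)          # second coding sub-network vs. border region label
    celf = F.nll_loss(torch.log(probs.clamp_min(1e-8)),    # cross entropy on the output class probabilities
                      authenticity_label)
    loss = mse_1 + mse_2 + celf                             # total loss: Loss = MSE1 + MSE2 + CELF
    optimizer.zero_grad()
    loss.backward()                                         # gradient over all three networks jointly
    optimizer.step()
    return loss.item()
```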
Through multiple iterations of such gradient-based adjustment, the face region features coded by the first coding sub-network for a prosthesis attack image sample gradually converge to the corresponding face region label, the prosthesis border region features coded by the second coding sub-network converge to the corresponding prosthesis border region label, the fusion features produced by the perception network from the face region features and the prosthesis border region features gradually converge toward a form suitable for the classification computation of the classification network, and the authenticity identification results computed by the classification network based on the fusion features provided by the perception network converge to the corresponding authenticity labels.
Based on the convergence principle, after the supervised training is completed, the first coding sub-network finally has the capability of extracting the facial region characteristics from the prosthesis attack image; the second coding sub-network finally has the capability of extracting the characteristics of the border region of the prosthesis from the prosthesis attack image; the perception network finally has the capability of fusing the facial region features and the prosthetic border region features into fused features suitable for classification calculation; the classification network finally has the classification computing capability of carrying out true and false identification on the prosthesis attack image based on the fusion characteristics (combining the facial area characteristics and the prosthesis border area characteristics).
It should be appreciated that the application can use the trained living body detection model to efficiently perform living body detection on face images. That is, when the authenticity identification result output by the living body detection model indicates that the face image is a fake face image presented by a prosthesis, the living body detection fails; conversely, when the authenticity identification result indicates that the face image is the face image of a real living body, the living body detection passes.
Obviously, living body detection of face images can be completed automatically by using the living body detection model trained with the method shown in fig. 2.
Based on the above, the application embodiment also provides a living body detection method executed based on the living body detection model. Fig. 7 is a schematic flow chart of the living body detection method, specifically including the following steps:
s702, acquiring a face shooting image of a target user in response to a living body detection request initiated by the target user.
In the application, when the target user initiates face verification through a terminal, a living body detection request is initiated at the same time. In the face verification stage, the terminal starts the shooting device to shoot the target user, and the face shooting image of the target user can be obtained from the captured picture.
And S704, inputting the face shooting image of the target user into a living body detection model to obtain an authenticity identification result corresponding to the face shooting image of the target user, wherein the living body detection model is used for encoding the face shooting image into corresponding face area characteristics and false body border area characteristics, fusing the face area characteristics and the false body border area characteristics into fusion characteristics, and then carrying out authenticity identification on the face shooting image based on the fusion characteristics.
It should be understood that the living body detection model in this step is trained based on the model training method shown in fig. 2, and the principle of the living body detection model will not be described in detail here.
In the present application, if the authenticity identification result indicates that the face shot image is a real face image, it is determined that the living body detection is passed, otherwise, it is determined that the living body detection is not passed.
Specifically, if the living body detection is initiated by the user when the user requests the face verification, after determining that the face shot image fails the living body detection, the step can further determine that the face verification fails and end the face verification flow; if the face shot image passes the living body detection, the step can execute a face verification process on the face shot image based on the existing face verification technology.
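As a usage sketch of this decision step (reusing the hypothetical model above; the class-id convention 0 = real living body is the same assumption as before), the check could look like this:

```python
# Hypothetical inference-time liveness check for one captured face image.
import torch

@torch.no_grad()
def liveness_check(model, face_image):            # face_image: (1, 3, H, W) tensor
    model.eval()
    _, _, probs, decision = model(face_image)
    is_real = bool(decision.item() == 0)          # 0: real living body, 1: prosthesis attack (assumed)
    return is_real, probs.squeeze(0).tolist()     # pass/fail plus the two class probabilities
```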
It can be seen that when the living body detection method of the embodiment of the application performs living body detection on a target user, it purposefully attempts to extract the respective features of the face and the prosthesis from the face shooting image of the target user and combines them to perform authenticity analysis on the image, i.e., to determine whether the face shooting image is a real living-body face image or a fake prosthesis face image, so that whether the living body detection passes is determined based on the authenticity identification result, thereby achieving protection against prosthesis attacks.
The application of the living body detection method of the embodiment of the present application is described below, taking an electronic payment system in which the living body detection model is deployed as an example.
The method and the device deploy the execution script for performing the living body detection based on the living body detection model in the payment system of the electronic payment, so that the payment system calls the execution script for performing the living body detection when receiving the payment transaction which is initiated by the user and needs the face verification, and performs the living body detection on the user; and then, deciding whether the face verification needs to be further executed according to the living body detection result.
It is assumed in this example that an illegal user uses a prosthesis to impersonate the face image of a legal user for face recognition. Correspondingly, the flow of the living body detection method is shown in fig. 8 and specifically includes the following steps:
an illegal user initiates a transaction request for payment to a payment system using a payment APP of a mobile terminal.
The payment system responds to the transaction request of the illegal user and, through the payment APP on the illegal user's side, invokes the camera of the mobile terminal to attempt to acquire a face shooting image of the illegal user.
In the acquisition stage of face shooting images, an illegal user uses a prosthesis to present face images of legal users so as to impersonate the legal users to carry out face recognition.
After acquiring a face shooting image uploaded by an illegal user terminal through a payment APP, the payment system calls an execution script of living body detection, and runs codes of a living body detection model in the execution script to execute the following living body detection flow for the face shooting image:
1) And encoding the face region and the false body border region of the face shooting image to obtain the face region characteristic and the false body border region characteristic corresponding to the face shooting image.
2) And fusing the facial region features and the prosthetic frame region features to obtain fusion features corresponding to the facial shot images.
3) Based on the fusion characteristics corresponding to the face shooting images, the classification calculation of the authenticity of the face images is carried out, and the authenticity identification result corresponding to the face shooting images is determined.
4) After recognizing that the face shot image is a fake face image impersonated by the prosthesis, the living body detection is judged not to pass.
The payment system then directly rejects the transaction request of the illegal user once the living body detection result indicates failure, thereby effectively preventing the prosthesis attack of the illegal user.
Corresponding to the method shown in fig. 2, the embodiment of the invention also provides a model training device. Fig. 9 is a schematic structural diagram of the model training apparatus 900, including:
The sample obtaining module 910 is configured to obtain an image training sample set, where the image training sample set includes a plurality of prosthesis attack image samples and corresponding training labels, the training labels include a face region label, a prosthesis border region label, and an authenticity label, the face region label is used to characterize the face region features of the corresponding prosthesis attack image sample, the prosthesis border region label is used to characterize the prosthesis border region features of the corresponding prosthesis attack image sample, and the authenticity label is used to characterize the authenticity classification of the corresponding prosthesis attack image sample.
The training module 920 is configured to train an initial living body detection model by using the image training sample set, so as to obtain the living body detection model. The living body detection model includes a coding network, a perception network, and a classification network; the coding network is used for coding the face region and the prosthesis border region of each prosthesis attack image sample in the plurality of prosthesis attack image samples to obtain the face region feature and the prosthesis border region feature corresponding to each prosthesis attack image sample; the perception network is used for fusing the face region feature and the prosthesis border region feature corresponding to each prosthesis attack image sample to obtain the fusion feature corresponding to each prosthesis attack image sample; the classification network is used for performing authenticity identification on the fusion feature of each prosthesis attack image sample to obtain the authenticity identification result corresponding to each prosthesis attack image sample.
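As a rough illustration of the coding/perception/classification structure described above, the following PyTorch sketch is one possible shape of such a model; the layer sizes, the sigmoid gray-map outputs and the concatenation-based fusion are assumptions for the sketch, not the design claimed by the present application.

# Illustrative three-part architecture (coding / perception / classification);
# layer sizes and the concatenation-based fusion are assumptions.
import torch
import torch.nn as nn

class LivenessModel(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # coding network: two sub-networks predicting single-channel gray maps
        # for the face region and the prosthesis border region
        self.face_encoder = self._make_encoder()
        self.border_encoder = self._make_encoder()
        # perception network: fuses the two feature maps
        self.fusion = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        # classification network: real/fake decision
        self.classifier = nn.Linear(16 * 8 * 8, num_classes)

    @staticmethod
    def _make_encoder() -> nn.Module:
        return nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),  # gray map in [0, 1]
        )

    def forward(self, x: torch.Tensor):
        face_map = self.face_encoder(x)        # face region feature
        border_map = self.border_encoder(x)    # prosthesis border region feature
        fused = self.fusion(torch.cat([face_map, border_map], dim=1))
        logits = self.classifier(fused.flatten(1))
        return face_map, border_map, logits

The two coding sub-networks here each output a single-channel map so that they can be supervised directly by the gray-matrix labels described below.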
Optionally, the coding network of the living body detection model includes a first coding sub-network and a second coding sub-network, where the first coding sub-network is used for coding the face region of the prosthesis attack image sample to obtain the corresponding face region feature, and the second coding sub-network is used for coding the prosthesis border region of the prosthesis attack image sample to obtain the corresponding prosthesis border region feature.
The face region label is a gray matrix corresponding to a first black-and-white binary rule of the prosthesis attack image sample, wherein in the first black-and-white binary rule, pixels of the face region adopt a first gray value, and a non-face region adopts a second gray value; the first coding sub-network is specifically configured to perform gray matrix coding on the prosthesis attack image sample according to the first black-white binary rule, so as to obtain a corresponding face region feature.
The prosthesis border region label is a gray matrix corresponding to a second black-and-white binary rule of the prosthesis attack image sample, wherein in the second black-and-white binary rule, pixels in the prosthesis border region adopt a third gray value, and pixels in the non-prosthesis-border region adopt a fourth gray value; the second coding sub-network is specifically configured to perform gray matrix coding on the prosthesis attack image sample according to the second black-and-white binary rule, so as to obtain the corresponding prosthesis border region feature.
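The two gray-matrix labels can be pictured with a small NumPy sketch; the 255/0 gray values, the image size and the axis-aligned rectangles used here are illustrative assumptions rather than values given by the present application.

# Hedged example of building the two gray-matrix labels from annotated boxes;
# the gray values, image size and box coordinates are assumptions.
import numpy as np

def make_region_label(h: int, w: int, box: tuple, fg: int = 255, bg: int = 0) -> np.ndarray:
    """Return an (h, w) gray matrix: `fg` inside the region box, `bg` elsewhere."""
    x1, y1, x2, y2 = box
    label = np.full((h, w), bg, dtype=np.uint8)
    label[y1:y2, x1:x2] = fg
    return label

# face region label (first black-and-white binary rule)
face_label = make_region_label(224, 224, box=(60, 50, 170, 190))
# prosthesis border region label (second rule), e.g. the frame of a printed photo
border_label = make_region_label(224, 224, box=(10, 5, 215, 220))
border_label[30:200, 30:195] = 0   # hollow out the interior so only the border remains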
Optionally, the training module 920 is specifically configured to: determine a loss function of the first coding sub-network based on the difference between the face region features corresponding to each prosthesis attack image sample and the face region labels; determine a loss function of the second coding sub-network based on the difference between the prosthesis border region features corresponding to each prosthesis attack image sample and the prosthesis border region labels and the difference between the authenticity identification results and the authenticity labels; determine a loss function of the classification network based on the difference between the authenticity identification result corresponding to each prosthesis attack image sample and the authenticity label; determine a total loss function of the living body detection model based on the loss functions of the first coding sub-network, the second coding sub-network, and the classification network; and determine a training gradient of the living body detection model based on the total loss function of the living body detection model.
Optionally, the loss functions of the first coding sub-network and the second coding sub-network are mean square error loss functions, and the loss function of the classification network is a cross entropy loss function.
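A minimal sketch of the composite loss described above is given below, assuming the gray-matrix labels have been scaled to [0, 1] float tensors and that the three terms are summed with equal weights (the weighting is an assumption, not something specified here). Because the classification logits depend on the border-region feature map through the fusion step, back-propagating the summed loss also feeds the classification difference into the second coding sub-network, as the description above requires.

# Sketch of the total loss: MSE supervision on the two coding sub-networks and
# cross entropy on the classification network; equal weights are an assumption.
import torch
import torch.nn.functional as F

def total_loss(face_map, border_map, logits, face_label, border_label, real_fake_label):
    # face_label / border_label: gray-matrix labels as float tensors in [0, 1]
    # real_fake_label: class indices (0 = prosthesis attack, 1 = real living body)
    loss_face = F.mse_loss(face_map, face_label)         # first coding sub-network
    loss_border = F.mse_loss(border_map, border_label)   # second coding sub-network
    loss_cls = F.cross_entropy(logits, real_fake_label)  # classification network
    return loss_face + loss_border + loss_cls            # total loss of the model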
Optionally, the plurality of prosthesis attack image samples include prosthesis attack image samples based on at least one of a head model, a picture, and a mask.
According to another embodiment of the present application, the units in the model training apparatus shown in fig. 9 may be separately or jointly combined into one or several other units, or some unit(s) thereof may be further split into a plurality of units with smaller functions, which can achieve the same operation without affecting the technical effects of the embodiments of the present application. The above units are divided based on logic functions; in practical applications, the function of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present application, the model training apparatus may also include other units, and in practical applications these functions may be implemented with the assistance of other units and through the cooperation of multiple units.
According to another embodiment of the present application, the model training apparatus shown in fig. 9 may be constructed, and the model training method of the embodiment of the present application implemented, by running a computer program (including program code) capable of executing the steps involved in the method shown in fig. 2 on a general-purpose computing device, such as a computer, that includes processing elements such as a Central Processing Unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, and loaded into and executed in the data processing apparatus via the computer-readable storage medium.
From the model training device of the present application, it can be seen that the prosthesis attack image samples used for training the living body detection model are marked with three labels: an authenticity label representing the authenticity classification of the prosthesis attack image sample, a prosthesis border region label representing the feature of the prosthesis border region in the prosthesis attack image sample, and a face region label representing the feature of the face region in the prosthesis attack image sample. In the training process, the coding network of the living body detection model can purposefully learn the respective feature knowledge of the human face and the prosthesis under the supervision of the prosthesis border region label and the face region label, and thus gains the ability to accurately extract the respective features of the human face and the prosthesis in a prosthesis attack image. In addition, the perception network of the living body detection model fuses the features of the human face and the prosthesis in the prosthesis attack image and passes them to the classification network of the living body detection model, and the classification network learns, under the supervision of the authenticity label, how to combine the features of the human face and the prosthesis to perform the authenticity analysis of living body detection. With this scheme, the living body detection model can learn the essential features of the face and the prosthesis more purposefully and use these essential features as decision factors for living body detection analysis, so that the accuracy and recall rate of the living body detection model are greatly improved; the model therefore performs well in both the training stage and the usage stage and can assist a face recognition system in more effectively preventing prosthesis attacks by illegal users.
Corresponding to the method shown in fig. 7, the embodiment of the invention also provides a living body detection device. Fig. 10 is a schematic structural diagram of the living body detection apparatus 1000, including:
The captured image acquiring module 1010 is configured to acquire a face shooting image of the target user in response to a living body detection request initiated by the target user.
The authenticity identifying module 1020 is configured to input the face shooting image of the target user into a living body detection model trained based on the method shown in fig. 2, so as to obtain the authenticity identification result corresponding to the face shooting image of the target user, where the living body detection model is used for encoding the face shooting image into the corresponding face region feature and prosthesis border region feature, fusing the face region feature and the prosthesis border region feature into a fusion feature, and then performing authenticity identification on the face shooting image based on the fusion feature.
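Purely as an illustration of this module-level decomposition, the following sketch wires an acquisition module and an identification module together; the class names, the request layout and the assumption that the model returns (face map, border map, logits) follow the earlier sketches rather than the present application.

# Toy sketch of the apparatus-style decomposition (cf. modules 1010 and 1020);
# class names and the request layout are assumptions.
import torch

class CapturedImageAcquiringModule:              # cf. module 1010
    def acquire(self, request: dict) -> torch.Tensor:
        # a real system would decode the frame uploaded with the request;
        # here the (assumed) request dict already carries a preprocessed tensor
        return request["face_image"]

class AuthenticityIdentifyingModule:             # cf. module 1020
    def __init__(self, model: torch.nn.Module):
        self.model = model.eval()

    @torch.no_grad()
    def identify(self, image: torch.Tensor) -> bool:
        _, _, logits = self.model(image.unsqueeze(0))
        return bool(logits.argmax(dim=-1).item() == 1)   # 1 = real living body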
According to another embodiment of the present application, the units in the living body detection apparatus shown in fig. 10 may be separately or jointly combined into one or several other units, or some unit(s) thereof may be further split into a plurality of units with smaller functions, which can achieve the same operation without affecting the technical effects of the embodiments of the present application. The above units are divided based on logic functions; in practical applications, the function of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present application, the living body detection apparatus may also include other units, and in practical applications these functions may be implemented with the assistance of other units and through the cooperation of multiple units.
According to another embodiment of the present application, the living body detection apparatus shown in fig. 10 may be constructed, and the living body detection method of the embodiment of the present application implemented, by running a computer program (including program code) capable of executing the steps involved in the method shown in fig. 7 on a general-purpose computing device, such as a computer, that includes processing elements such as a Central Processing Unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, and loaded into and executed in the data processing apparatus via the computer-readable storage medium.
When the living body detection device performs living body detection on the target user, it can purposefully attempt to extract the features of the human face and the prosthesis from the face shooting image of the target user, and perform the authenticity analysis of the face shooting image by combining the features of the human face and the prosthesis, that is, determine whether the face shooting image is the face image of a real living body or a face image impersonated by a prosthesis; whether living body detection passes is then judged based on the authenticity identification result, thereby achieving a preventive effect against prosthesis attacks.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to fig. 11, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be interconnected by the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, among others. The bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bi-directional arrow is shown in fig. 11, but this does not mean that there is only one bus or one type of bus.
The memory is used for storing the computer program. Specifically, the computer program may include program code, and the program code includes computer operating instructions. The memory may include an internal memory and a non-volatile memory, and provides instructions and data to the processor.
In one case, the processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the model training apparatus shown in fig. 9 above at the logic level. Correspondingly, the processor executes the program stored in the memory and is specifically configured to perform the following operations:
An image training sample set is obtained, where the image training sample set comprises a plurality of prosthesis attack image samples and corresponding training labels; the training labels comprise face area labels, prosthesis border area labels and true-false labels, the face area labels are used for representing face area characteristics of the corresponding prosthesis attack image samples, the prosthesis border area labels are used for representing prosthesis border area characteristics of the corresponding prosthesis attack image samples, and the true-false labels are used for representing true-false classification of the corresponding prosthesis attack image samples.
The initial living body detection model is trained by using the image training sample set to obtain the living body detection model.
Wherein the living body detection model comprises a coding network, a perception network and a classification network; the coding network is used for coding a face area and a prosthesis border area of each prosthesis attack image sample in the plurality of prosthesis attack image samples to obtain a face area characteristic and a prosthesis border area characteristic corresponding to each prosthesis attack image sample; the perception network is used for fusing the face area characteristic and the prosthesis border area characteristic corresponding to each prosthesis attack image sample to obtain a fusion characteristic corresponding to each prosthesis attack image sample; the classification network is used for carrying out true and false identification on the fusion characteristic of each prosthesis attack image sample to obtain a true and false identification result corresponding to each prosthesis attack image sample.
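A minimal training-loop sketch for the operations listed above follows; the dataset layout (image, face label, border label and authenticity class index per sample), the Adam optimizer, the batch size and the epoch count are assumptions made for illustration.

# Minimal training-loop sketch; dataset layout, optimizer and hyper-parameters
# are assumptions, not values prescribed by the present application.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train(model, dataset, epochs: int = 10, lr: float = 1e-4):
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, face_labels, border_labels, real_fake in loader:
            # real_fake: class indices (0 = prosthesis attack, 1 = real living body)
            face_map, border_map, logits = model(images)
            loss = (F.mse_loss(face_map, face_labels)        # first coding sub-network
                    + F.mse_loss(border_map, border_labels)  # second coding sub-network
                    + F.cross_entropy(logits, real_fake))    # classification network
            optimizer.zero_grad()
            loss.backward()        # training gradient from the total loss
            optimizer.step()
    return model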
Alternatively, the processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the living body detection apparatus shown in fig. 10 at the logic level. Correspondingly, the processor executes the program stored in the memory and is specifically configured to perform the following operations:
A face shooting image of the target user is acquired in response to a living body detection request initiated by the target user.
The face shooting image of the target user is input into a living body detection model to obtain an authenticity identification result corresponding to the face shooting image of the target user; the living body detection model is trained based on the method shown in fig. 2; the living body detection model is used for encoding the face shooting image into corresponding face area characteristics and prosthesis border area characteristics, fusing the face area characteristics and the prosthesis border area characteristics into fusion characteristics, and then carrying out true and false identification on the face shooting image based on the fusion characteristics.
The model training method or the living body detection method disclosed in the embodiments of this specification can be applied to and implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above methods may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and performs the steps of the above methods in combination with its hardware.
Of course, in addition to the software implementation, the electronic device in this specification does not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to individual logic units, but may also be hardware or a logic device.
Furthermore, an embodiment of the present invention also proposes a computer-readable storage medium storing one or more programs, the one or more programs including instructions.
Optionally, the instructions, when executed by a portable electronic device comprising a plurality of applications, enable the portable electronic device to perform the steps of the method shown in fig. 2, comprising:
An image training sample set is obtained, where the image training sample set comprises a plurality of prosthesis attack image samples and corresponding training labels; the training labels comprise face area labels, prosthesis border area labels and true-false labels, the face area labels are used for representing face area characteristics of the corresponding prosthesis attack image samples, the prosthesis border area labels are used for representing prosthesis border area characteristics of the corresponding prosthesis attack image samples, and the true-false labels are used for representing true-false classification of the corresponding prosthesis attack image samples.
The initial living body detection model is trained by using the image training sample set to obtain the living body detection model.
Wherein the living body detection model comprises a coding network, a perception network and a classification network; the coding network is used for coding a face area and a prosthesis border area of each prosthesis attack image sample in the plurality of prosthesis attack image samples to obtain a face area characteristic and a prosthesis border area characteristic corresponding to each prosthesis attack image sample; the perception network is used for fusing the face area characteristic and the prosthesis border area characteristic corresponding to each prosthesis attack image sample to obtain a fusion characteristic corresponding to each prosthesis attack image sample; the classification network is used for carrying out true and false identification on the fusion characteristic of each prosthesis attack image sample to obtain a true and false identification result corresponding to each prosthesis attack image sample.
Alternatively, the instructions, when executed by a portable electronic device comprising a plurality of applications, enable the portable electronic device to perform the steps of the method shown in fig. 7, comprising:
A face shooting image of the target user is acquired in response to a living body detection request initiated by the target user.
The face shooting image of the target user is input into a living body detection model to obtain an authenticity identification result corresponding to the face shooting image of the target user; the living body detection model is trained based on the method shown in fig. 2; the living body detection model is used for encoding the face shooting image into corresponding face area characteristics and prosthesis border area characteristics, fusing the face area characteristics and the prosthesis border area characteristics into fusion characteristics, and then carrying out true and false identification on the face shooting image based on the fusion characteristics.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing is merely an example of the present specification and is not intended to limit the present specification. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description. Moreover, all other embodiments obtained by those skilled in the art without making any inventive effort shall fall within the scope of protection of this document.

Claims (10)

1. A method of model training, comprising:
acquiring an image training sample set, wherein the image training sample set comprises a plurality of prosthesis attack image samples and corresponding training labels, the training labels comprise face area labels, prosthesis border area labels and true-false labels, the face area labels are used for representing face area characteristics of the corresponding prosthesis attack image samples, the prosthesis border area labels are used for representing prosthesis border area characteristics of the corresponding prosthesis attack image samples, and the true-false labels are used for representing true-false classification of the corresponding prosthesis attack image samples;
training an initial living body detection model by using the image training sample set to obtain a living body detection model;
Wherein the living body detection model comprises a coding network, a perception network and a classification network; the coding network is used for coding a face area and a prosthesis border area of each prosthesis attack image sample in the plurality of prosthesis attack image samples to obtain a face area characteristic and a prosthesis border area characteristic corresponding to each prosthesis attack image sample; the perception network is used for fusing the face area characteristic and the prosthesis border area characteristic corresponding to each prosthesis attack image sample to obtain a fusion characteristic corresponding to each prosthesis attack image sample; the classification network is used for carrying out true and false identification on the fusion characteristic of each prosthesis attack image sample to obtain a true and false identification result corresponding to each prosthesis attack image sample.
2. The method of claim 1, wherein,
the coding network comprises a first coding sub-network and a second coding sub-network, wherein the first coding sub-network is used for coding the face region of the prosthesis attack image sample to obtain corresponding face region characteristics, and the second coding sub-network is used for coding the prosthesis border region of the prosthesis attack image sample to obtain corresponding prosthesis border region characteristics.
3. The method of claim 2, wherein,
the face region label is a gray matrix corresponding to a first black-and-white binary rule of a prosthesis attack image sample, wherein in the first black-and-white binary rule, pixels of a face region adopt a first gray value, and a non-face region adopts a second gray value; the first coding sub-network is specifically configured to perform gray matrix coding on the prosthesis attack image sample according to the first black-white binary rule, so as to obtain a corresponding face region feature.
4. The method of claim 2, wherein,
the prosthesis border region label is a gray matrix corresponding to a second black-and-white binary rule of a prosthesis attack image sample, wherein in the second black-and-white binary rule, pixels in the prosthesis border region adopt a third gray value, and the non-prosthesis-border region adopts a fourth gray value; the second coding sub-network is specifically configured to perform gray matrix coding on the prosthesis attack image sample according to the second black-white binary rule, so as to obtain a corresponding prosthesis border region feature.
5. The method of claim 2, wherein,
the training of the initial living body detection model by using the image training sample set comprises the following steps:
determining a loss function of the first coding sub-network based on the difference between the face region characteristics corresponding to each prosthesis attack image sample and the face region labels;
determining a loss function of the second coding sub-network based on the difference between the prosthesis border region characteristics corresponding to each prosthesis attack image sample and the prosthesis border region labels and the difference between the true and false identification results and the true and false labels;
determining a loss function of the classification network based on the difference between the true and false identification result corresponding to each prosthesis attack image sample and the true and false label;
determining a total loss function of the living body detection model based on the loss functions of the first coding sub-network, the second coding sub-network, and the classification network; and
determining a training gradient of the living body detection model based on the total loss function of the living body detection model.
6. The method of claim 5, wherein,
the loss functions of the first coding sub-network and the second coding sub-network are mean square error loss functions, and the loss function of the classification network is a cross entropy loss function.
7. A living body detecting method, characterized by comprising:
Responding to a living body detection request initiated by a target user, and acquiring a face shooting image of the target user;
inputting the face shooting image of the target user into a living body detection model to obtain an authenticity identification result corresponding to the face shooting image of the target user; wherein the living body detection model is trained based on the method of any one of claims 1-6; the living body detection model is used for encoding the face shooting image into corresponding face area characteristics and prosthesis border area characteristics, fusing the face area characteristics and the prosthesis border area characteristics into fusion characteristics, and then carrying out true and false identification on the face shooting image based on the fusion characteristics.
8. A living body detecting device, characterized by comprising:
the shooting image acquisition module is used for responding to a living body detection request initiated by a target user to acquire a face shooting image of the target user;
the authenticity identification module is used for inputting the face shooting image of the target user into a living body detection model to obtain an authenticity identification result corresponding to the face shooting image of the target user; wherein the living body detection model is trained based on the method of any one of claims 1-6; the living body detection model is used for encoding the face shooting image into corresponding face area characteristics and prosthesis border area characteristics, fusing the face area characteristics and the prosthesis border area characteristics into fusion characteristics, and then carrying out true and false identification on the face shooting image based on the fusion characteristics.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, performs the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the method of any of claims 1 to 7.
CN202210879965.7A 2022-07-25 2022-07-25 Method, device, electronic equipment and storage medium for model training and living body detection Pending CN116129484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210879965.7A CN116129484A (en) 2022-07-25 2022-07-25 Method, device, electronic equipment and storage medium for model training and living body detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210879965.7A CN116129484A (en) 2022-07-25 2022-07-25 Method, device, electronic equipment and storage medium for model training and living body detection

Publications (1)

Publication Number Publication Date
CN116129484A true CN116129484A (en) 2023-05-16

Family

ID=86305078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210879965.7A Pending CN116129484A (en) 2022-07-25 2022-07-25 Method, device, electronic equipment and storage medium for model training and living body detection

Country Status (1)

Country Link
CN (1) CN116129484A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116597527A (en) * 2023-07-18 2023-08-15 第六镜科技(成都)有限公司 Living body detection method, living body detection device, electronic equipment and computer readable storage medium
CN116597527B (en) * 2023-07-18 2023-09-19 第六镜科技(成都)有限公司 Living body detection method, living body detection device, electronic equipment and computer readable storage medium
CN118692155A (en) * 2024-08-26 2024-09-24 浙江大华技术股份有限公司 Image detection method and device

Similar Documents

Publication Publication Date Title
CN111222513B (en) License plate number recognition method and device, electronic equipment and storage medium
US20150007295A1 (en) Biometric-based authentication method, apparatus and system
CN111191568B (en) Method, device, equipment and medium for identifying flip image
Chen et al. An adaptive CNNs technology for robust iris segmentation
CN110532746B (en) Face checking method, device, server and readable storage medium
JP2017062778A (en) Method and device for classifying object of image, and corresponding computer program product and computer-readable medium
CN116129484A (en) Method, device, electronic equipment and storage medium for model training and living body detection
CN114898342B (en) Method for detecting call receiving and making of non-motor vehicle driver in driving
CN112418189B (en) Face recognition method, device and equipment for wearing mask and storage medium
CN113642639B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN110942067A (en) Text recognition method and device, computer equipment and storage medium
CN112613341A (en) Training method and device, fingerprint identification method and device, and electronic device
CN114387548A (en) Video and living body detection methods, systems, devices, storage media, and program products
CN112200772A (en) Pox check out test set
CN115953744A (en) A vehicle recognition and tracking method based on deep learning
CN110956133A (en) Training method of single character text normalization model, text recognition method and device
CN116778534B (en) Image processing method, device, equipment and medium
CN110458024B (en) Living body detection method and device and electronic equipment
Li et al. Detection of partially occluded pedestrians by an enhanced cascade detector
CN116824647A (en) Image forgery identification method, network training method, device, equipment and medium
CN111860093A (en) Image processing method, device, equipment and computer readable storage medium
KR102213445B1 (en) Identity authentication method using neural network and system for the method
CN112926515B (en) Living body model training method and device
CN116246303A (en) Sample construction method, device, equipment and medium for model cross-domain training
CN113850210B (en) Face image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination