
WO2022111688A1 - Face liveness detection method and apparatus, and storage medium - Google Patents


Info

Publication number
WO2022111688A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
information
value
living body
difference
Prior art date
Application number
PCT/CN2021/134018
Other languages
French (fr)
Chinese (zh)
Inventor
谢妍辉
薛传颂
廖晓锋
刁继尧
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022111688A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 - Spoof detection, e.g. liveness detection

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a face liveness detection method, apparatus, and storage medium.
  • Face liveness detection determines whether the current face is a live face, so as to resist attacks that use fake faces. It is an important step before face recognition. With the application of face recognition in many important fields such as face unlocking and face payment, the problem of using counterfeit faces to attack face recognition systems has become increasingly prominent, and face liveness detection is the main technical path to resisting such attacks.
  • In the related art, an image of the object to be detected is collected, and liveness detection is performed on the object based on the collected image to obtain a liveness detection result.
  • However, this kind of liveness detection method pays attention only to image information; the information considered is limited and easy to attack, so the accuracy of the liveness detection result is low and the detection effect is poor.
  • In view of this, a face liveness detection method, apparatus, and storage medium are proposed, which perform liveness detection using multimodal information (including a face image, ambient light information, and face posture information) to obtain a liveness detection result, improving the accuracy of that result.
  • an embodiment of the present application provides a face liveness detection method, which includes:
  • the multi-modal information corresponding to the object to be detected is obtained, and the multi-modal information includes a face image, ambient light information and face posture information;
  • the living body detection result is obtained by performing the living body detection according to the multimodal information, and the living body detection result is used to indicate whether the object to be detected is a living body.
  • In this way, the multimodal information corresponding to the object to be detected is obtained, and the liveness detection result is obtained by performing liveness detection according to the multimodal information. Because the multimodal information includes a face image, ambient light information, and face pose information, the input parameters of liveness detection carry richer feature information in various dimensions, which improves the accuracy of the liveness detection result.
  • the face image includes an initial first face image and a second face image after image signal processing;
  • the ambient light information includes first illumination intensity information;
  • the face posture information includes a first face posture angle value.
  • In this implementation, the multimodal information includes the initial first face image, the second face image after image signal processing, the first illumination intensity information, and the first face attitude angle value; these kinds of information together constitute the multimodal information used as the input to liveness detection.
  • On the one hand, this reduces the attack surface, increases the difficulty of an attack, and reduces the probability of being attacked: an attacker would need, for example, a photo taken in a specific posture (such as frontal) or under specific lighting conditions for the attack to succeed, and using a single frame also avoids the delay caused by processing multiple frames of images.
  • On the other hand, the input parameters of liveness detection carry richer feature information in each dimension, improving the accuracy of the liveness detection results.
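  • As a rough sketch, the four kinds of information above could be bundled into a single structure that serves as the input to liveness detection. The class and field names below are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class MultimodalInfo:
    first_face_image: Any    # initial (RAW) first face image
    second_face_image: Any   # second face image after ISP processing
    light_intensity: float   # first illumination intensity value (e.g. lux)
    pose_angle: float        # first face attitude angle value (degrees)

# Example instance with toy 2x2 "images" standing in for real frames.
info = MultimodalInfo(
    first_face_image=[[0, 0], [0, 0]],
    second_face_image=[[0, 0], [0, 0]],
    light_intensity=320.0,
    pose_angle=5.0,
)
```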
  • In a possible implementation, performing liveness detection according to the multimodal information to obtain the liveness detection result includes:
  • the living body detection result is obtained by performing living body detection according to the second light intensity information, the second face attitude angle value and the multimodal information.
  • the third face image is obtained according to the region where the face is located in the second face image;
  • the second illumination intensity information is obtained by predicting the illumination intensity of the third face image, and the second face attitude angle value is obtained by predicting the face attitude angle of the third face image; liveness detection is then performed according to the second illumination intensity information, the second face attitude angle value, and the multimodal information to obtain the liveness detection result. Performing liveness detection based on the obtained multimodal information, the predicted second illumination intensity information, and the predicted second face posture angle value further ensures the detection effect of the liveness detection result.
  • In a possible implementation, the first illumination intensity information includes a first illumination intensity value and the second illumination intensity information includes a second illumination intensity value. In this case, performing liveness detection according to the second illumination intensity information, the second face attitude angle value, and the multimodal information to obtain the liveness detection result includes:
  • the absolute value of the difference between the first light intensity value and the second light intensity value is determined as the first difference, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is determined as the second difference;
  • the preset condition includes that the first difference value is smaller than the first preset threshold value and the second difference value is smaller than the second preset threshold value.
  • In this way, the difference between the first illumination intensity value and the second illumination intensity value, and the difference between the first face pose angle value and the second face pose angle value, are judged first. If both differences are small, the subsequent face liveness detection is performed according to the first face image and the second face image, which improves liveness detection efficiency and further ensures the accuracy of the liveness detection result.
  • In a possible implementation, the first illumination intensity information includes a first illumination intensity value and the second illumination intensity information includes a second illumination intensity value. In this case, performing liveness detection according to the second illumination intensity information, the second face attitude angle value, and the multimodal information to obtain the liveness detection result includes:
  • the absolute value of the difference between the first light intensity value and the second light intensity value is determined as the first difference, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is determined as the second difference;
  • the preset condition includes that the first difference value is smaller than the first preset threshold value and the second difference value is smaller than the second preset threshold value.
  • In this way, the difference between the first illumination intensity value and the second illumination intensity value, and the difference between the first face pose angle value and the second face pose angle value, are judged. If either difference is large, it is determined that the object to be detected is not a living body, and the liveness detection result is output directly, which improves liveness detection efficiency and further ensures the accuracy of the liveness detection result.
  • In a possible implementation, the first illumination intensity information includes a first illumination intensity level and the second illumination intensity information includes a second illumination intensity level. In this case, performing liveness detection according to the second illumination intensity information, the second face attitude angle value, and the multimodal information to obtain the liveness detection result includes:
  • the first face image and the second face image are input into the trained single-frame liveness detection model, which outputs the liveness detection result.
  • In this way, the difference between the first illumination intensity level and the second illumination intensity level, and the difference between the first face pose angle value and the second face pose angle value, are judged first. If both differences are small, the subsequent face liveness detection is performed according to the first face image and the second face image, which improves liveness detection efficiency and further ensures the accuracy of the liveness detection result.
  • In a possible implementation, the first illumination intensity information includes a first illumination intensity level and the second illumination intensity information includes a second illumination intensity level. In this case, performing liveness detection according to the second illumination intensity information, the second face attitude angle value, and the multimodal information to obtain the liveness detection result includes:
  • the first detection result is output, where the first detection result is used to indicate that the object to be detected is not a living body.
  • In this way, the difference between the first illumination intensity level and the second illumination intensity level, and the difference between the first face posture angle value and the second face posture angle value, are judged. If either difference is large, it is determined that the object to be detected is not a living body, and the liveness detection result is output directly, which improves liveness detection efficiency and further ensures the accuracy of the liveness detection result.
  • In a possible implementation, before performing liveness detection on the multimodal information to obtain the liveness detection result, the method further includes:
  • the scene information is used to indicate the current detection scene
  • pre-interception processing is performed on the multi-modal information.
  • In this way, a preprocessing process is added before liveness detection is performed on the multimodal information: preset attack scenarios are pre-intercepted, which narrows the attack scope, enhances interception efficiency, and compensates for the deficiencies of the single-frame liveness detection model.
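  • A minimal sketch of this pre-interception step follows, with the preprocessing model stubbed out and hypothetical scene labels (the patent does not name specific attack scenes):

```python
# Hypothetical preset attack scenes; illustrative only.
PRESET_ATTACK_SCENES = {"screen_replay", "printed_photo"}

def classify_scene(multimodal_info: dict) -> str:
    """Stand-in for the trained preprocessing model described above."""
    # A real model would infer the scene from the images and sensor data;
    # here a hint field is read purely for illustration.
    return multimodal_info.get("scene_hint", "normal")

def pre_intercept(multimodal_info: dict) -> dict:
    scene = classify_scene(multimodal_info)
    if scene in PRESET_ATTACK_SCENES:
        # Pre-interception: skip liveness detection, report non-living.
        return {"intercepted": True, "result": "non_living", "scene": scene}
    # Not an attack scene: continue with the normal liveness pipeline.
    return {"intercepted": False, "scene": scene}
```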
  • In another aspect, an embodiment of the present application provides a face liveness detection apparatus. The apparatus includes: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to implement the method of the first aspect or any one of the possible implementations of the first aspect when executing the instructions.
  • an embodiment of the present application provides a face liveness detection device, the device includes at least one module, and the at least one module is used to implement the first aspect or any one of the possible implementations of the first aspect.
  • Embodiments of the present application further provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code. When the computer-readable code runs in an electronic device, a processor in the electronic device executes the method provided by the first aspect or any one of the possible implementations of the first aspect.
  • Embodiments of the present application further provide a non-volatile computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by a processor, the method provided by the first aspect or any one of the possible implementations of the first aspect is implemented.
  • FIG. 1 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • FIG. 2 shows a flowchart of a method for detecting a face living body provided by an exemplary embodiment of the present application.
  • FIG. 3 shows a flowchart of a method for detecting a face living body provided by another exemplary embodiment of the present application.
  • FIG. 4 shows a flowchart of a method for detecting a human face liveness provided by another exemplary embodiment of the present application.
  • FIG. 5 shows a flowchart of a method for detecting a face living body provided by another exemplary embodiment of the present application.
  • FIG. 6 shows a flowchart of a method for detecting a face living body provided by another exemplary embodiment of the present application.
  • FIG. 7 shows a flowchart of a method for detecting a face living body provided by another exemplary embodiment of the present application.
  • FIG. 8 shows a block diagram of a face liveness detection apparatus provided by an exemplary embodiment of the present application.
  • An embodiment of the present application provides a method for detecting a face living body, and the execution body is a computer device. Please refer to FIG. 1 , which shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • the computer equipment may be a terminal or a server.
  • the terminal includes a mobile terminal or a stationary terminal, for example, the terminal may be a mobile phone, a tablet computer, a laptop computer, a desktop computer, and the like.
  • the server can be one server, or a server cluster composed of several servers, or a cloud computing service center.
  • the computer device includes a processor 10 , a memory 20 and a communication interface 30 .
  • The structure shown in FIG. 1 does not constitute a limitation on the computer device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components, wherein:
  • The processor 10 is the control center of the computer device; it connects the various parts of the entire device through various interfaces and lines, and performs the various functions of the device and processes data by running or executing the software programs and/or modules stored in the memory 20 and calling the data stored in the memory 20, thereby controlling the device as a whole.
  • the processor 10 may be implemented by a CPU, or may be implemented by a graphics processor (Graphics Processing Unit, GPU).
  • the memory 20 may be used to store software programs and modules.
  • the processor 10 executes various functional applications and data processing by executing software programs and modules stored in the memory 20 .
  • the memory 20 may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system 21, an acquisition module 22, a detection module 23, and an application program 24 (such as neural network training, etc.) required for at least one function;
  • the storage data area may store data or the like created according to the use of the computer device.
  • The memory 20 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM) or electrically erasable programmable read-only memory (EEPROM).
  • memory 20 may also include a memory controller to provide processor 10 access to memory 20 .
  • The processor 10 performs the following functions by running the acquisition module 22: after the computer device starts the face recognition function, it acquires multimodal information corresponding to the object to be detected, the multimodal information including a face image, ambient light information, and face posture information. The processor 10 performs the following functions by running the detection module 23: performing liveness detection according to the multimodal information to obtain a liveness detection result used to indicate whether the object to be detected is a living body.
  • the method provided by the embodiment of the present application can be applied to any detection scenario of face liveness detection.
  • For example, in the financial field there is a demand for face liveness detection.
  • Users can perform operations that require authentication, such as transfers, payments, or modification of account information through their smartphones.
  • The smartphone can use the face liveness detection method provided in this application to verify the identity of user A, so as to determine whether the operation was initiated by user A himself.
  • self-service customs clearance equipment can be used for customs clearance inspection.
  • user B conducts customs clearance inspection through self-service customs clearance equipment.
  • the self-service customs clearance equipment can use the face liveness detection method provided in this application to conduct liveness detection on the collected avatar of user B to identify whether the identity is fraudulently used.
  • It can also be applied to face attendance or face access control systems. For example, when user C clocks in or unlocks an access control, liveness detection is performed on his face to prevent others from clocking in on his behalf or unrelated persons from fraudulently using his identity.
  • In addition, the methods provided in the embodiments of the present application can also be applied to other face unlocking or face payment scenarios; the applicable scenarios are not exhaustively enumerated here.
  • FIG. 2 shows a flowchart of a method for detecting a face liveness provided by an exemplary embodiment of the present application. This embodiment is illustrated by using the method in the computer device shown in FIG. 1 . The method includes the following steps.
  • Step 201 after the face recognition function is activated, obtain multimodal information corresponding to the object to be detected, where the multimodal information includes a face image, ambient light information and face posture information.
  • When the computer device receives a preset trigger signal, the face recognition function is activated to obtain the multimodal information corresponding to the object to be detected.
  • the preset trigger signal is a user operation signal that triggers the activation of the face recognition function.
  • the preset trigger signal includes any one or a combination of a click operation signal, a slide operation signal, a double-click operation signal, and a long-press operation signal.
  • the preset trigger signal may also be implemented in the form of voice.
  • a computer device receives a voice signal input by a user, and analyzes the voice signal to obtain voice content.
  • If a preset keyword exists in the voice content, the terminal has received a preset trigger signal and activates the face recognition function.
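  • The voice-trigger check above can be sketched as a simple keyword scan over the recognized voice content; the keyword set below is a hypothetical example, as the patent does not specify the keywords:

```python
# Hypothetical preset keywords; illustrative only.
PRESET_KEYWORDS = {"face unlock", "unlock"}

def is_preset_trigger(voice_content: str) -> bool:
    """Return True if the recognized voice content contains a preset keyword,
    i.e. the voice signal counts as a preset trigger signal."""
    text = voice_content.lower()
    return any(keyword in text for keyword in PRESET_KEYWORDS)
```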
  • acquiring the multimodal information corresponding to the object to be detected by the computer device includes: collecting a face image through a camera, and collecting ambient light information and face posture information through a sensor.
  • the computer device collects the ambient light information and the face posture information in real time through sensors or at preset time intervals.
  • the preset time interval is a default setting or a self-defined setting, which is not limited in this embodiment.
  • For ease of description, the following takes real-time collection of ambient light information and face posture information by sensors as an example, that is, both the ambient light information and the face posture information are real-time information.
  • the computer device collects real-time ambient light information through a light sensor, and collects real-time face posture information through a direction sensor.
  • the embodiment of the present application does not limit the type of the sensor.
  • the face image includes an initial first face image and a second face image after image signal processing (Image Signal Processing, ISP).
  • the first face image is also called a face RAW image
  • the first face image is an original image including the face of the object to be detected.
  • the second face image is an image after passing through the ISP based on the first face image.
  • the ambient light information is used to indicate the lighting situation corresponding to the face of the object to be detected
  • the face posture information is used to indicate the orientation of the face of the object to be detected.
  • the ambient light information includes first illumination intensity information
  • the first illumination intensity information includes a first illumination intensity value or a first illumination intensity level.
  • the face pose information includes a first face pose angle value
  • the first face pose angle value is an angle value of the face orientation of the object to be detected.
  • Step 202 performing living body detection according to the multimodal information to obtain a living body detection result, and the living body detection result is used to indicate whether the object to be detected is a living body.
  • the computer device performs a living body detection according to the multimodal information to obtain a living body detection result, and the living body detection result is used to indicate whether the object to be detected is a living body.
  • the computer device invokes the trained target in vivo detection model to perform in vivo detection on the multimodal information, and outputs the in vivo detection result.
  • the target living detection model is a model obtained by training a neural network based on the multimodal information of the sample and the correct detection result. That is, the target living detection model is determined according to the multimodal information of the sample and the correct detection result.
  • the correct detection result is a pre-marked correct living body detection result corresponding to the multimodal information of the sample.
  • the target liveness detection model is used to indicate the correlation between the multimodal information and the liveness detection results.
  • the target living body detection model is a preset mathematical model, and the target living body detection model includes a model coefficient between the multimodal information and the living body detection result.
  • the model coefficient may be a fixed value, a value that is dynamically modified with time, or a value that is dynamically modified with the detection scene.
  • In a possible implementation, the target liveness detection model includes at least one of a deep neural network (DNN) model, a recurrent neural network (RNN) model, an embedding model, a gradient boosting decision tree (GBDT) model, and a logistic regression (LR) model.
  • the living body detection result includes one of a first detection result and a second detection result
  • the first detection result is used to indicate that the object to be detected is a non-living body
  • the second detection result is used to indicate that the object to be detected is a living body.
  • the first detection result is the first identification
  • the second detection result is the second identification
  • the second identification is different from the first identification. This embodiment of the present application does not limit this.
  • For example, the mobile terminal is in an off-screen state; when the mobile terminal detects a double-click operation signal acting on the screen or a click operation signal acting on the power button, the face unlocking process is started.
  • The mobile terminal collects the initial first face image 31 and the ISP-processed second face image 32 through the camera, and collects the ambient light information 33 and the face posture information 34 in real time through sensors; the first face image 31, the second face image 32, the ambient light information 33, and the face posture information 34 are input into the target liveness detection model 35, which outputs the liveness detection result.
  • the living body detection result includes one of a living body identification and a non-living body identification.
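  • The single-call flow described above can be sketched as follows. The model is a stub whose decision rule is purely illustrative; it stands in for, and is not, the trained target liveness detection model described in the patent:

```python
# The two possible identifications in the liveness detection result.
LIVING, NON_LIVING = "living", "non_living"

def target_liveness_model(first_face, second_face, ambient_light, pose_angle):
    """Stub for the target liveness detection model (35 in the figure)."""
    # A trained network would fuse all four modalities; this toy rule only
    # rejects physically implausible sensor/pose readings.
    if ambient_light < 0 or abs(pose_angle) > 90:
        return NON_LIVING
    return LIVING

# All four collected inputs are passed in a single call; the output is one
# of the two identifications.
result = target_liveness_model([[0]], [[0]], ambient_light=320.0, pose_angle=5.0)
```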
  • In summary, in the face liveness detection method provided by this embodiment, after the face recognition function is activated, the multimodal information corresponding to the object to be detected is obtained, and liveness detection is performed according to the multimodal information to obtain the liveness detection result.
  • The multimodal information includes a face image, ambient light information, and face pose information, which enriches the feature information of the input parameters of liveness detection in various dimensions and improves the accuracy of the liveness detection results.
  • FIG. 4 shows a flowchart of a method for detecting a face liveness provided by another exemplary embodiment of the present application. This embodiment is exemplified by using the method in the computer device shown in FIG. 1 . The method includes the following steps.
  • Step 401 after the face recognition function is activated, obtain the multimodal information corresponding to the object to be detected, where the multimodal information includes the initial first face image, the second face image after ISP, the first illumination intensity information, and the first face pose angle value.
  • the first light intensity information includes a first light intensity value or a first light intensity level.
  • the process of acquiring the multimodal information corresponding to the object to be detected may refer to the relevant details in the above-mentioned embodiments, which will not be repeated here.
  • Optionally, a preprocessing process can be added to pre-intercept preset attack scenarios.
  • The computer device inputs the multimodal information into the preprocessing model, which outputs the scene information, where the scene information is used to indicate the current detection scene; when the scene information indicates that the current detection scene is a preset attack scene, pre-interception processing is performed on the multimodal information.
  • the preprocessing model is a model obtained by training a neural network based on sample multimodal information and correct scene information. That is, the preprocessing model is determined according to the sample multimodal information and the correct scene information. The correct scene information is pre-marked correct scene information corresponding to the sample multimodal information.
  • the preprocessing model is used to indicate the correlation between multimodal information and scene information.
  • the preprocessing model reference can be made to the relevant description of the above-mentioned target living body detection model, which will not be repeated here.
  • the scene information is used to indicate the current detection scene.
  • The computer device determines whether the current detection scene is a preset attack scene. If it is, pre-interception processing is performed on the multimodal information: the subsequent liveness detection steps are not performed, and either prompt information is output to indicate that the current detection scene is a preset attack scene, or a first detection result is output to indicate that the object to be detected is a non-living body.
  • Otherwise, the computer device continues to perform the subsequent liveness detection steps.
  • Step 402 Obtain a third face image according to the region where the face is located in the second face image.
  • the computer device performs face detection and alignment on the second face image to obtain a face region in the second face image, and performs clipping processing on the face region to obtain a third face image.
  • the third face image is a valid face image obtained by cutting out the second face image after face detection and alignment.
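  • Step 402 amounts to cropping the detected face region out of the second face image to obtain the third face image. A toy sketch follows, with face detection stubbed out and an assumed (top, left, height, width) box format:

```python
def crop_face_region(image, box):
    """Crop the face region (the third face image) out of the second face
    image, given a bounding box from (stubbed) face detection and alignment."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]

# 4x4 toy "image" whose 2x2 centre stands in for the detected face region.
second_face_image = [[0, 0, 0, 0],
                     [0, 1, 2, 0],
                     [0, 3, 4, 0],
                     [0, 0, 0, 0]]
third_face_image = crop_face_region(second_face_image, (1, 1, 2, 2))
```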
  • Step 403 predicting the illumination intensity of the third face image to obtain second illumination intensity information, and predicting the face attitude angle of the third face image to obtain the second face attitude angle value.
  • the computer device predicts the illumination intensity of the third face image by the prediction tool to obtain the predicted second illumination intensity information, and predicts the face posture angle of the third face image to obtain the predicted second face posture angle value.
  • the computer device predicts the illumination intensity of the third face image based on a neural network classifier to obtain the second illumination intensity information.
  • the computer device performs face key point calculation on the third face image, and determines the pitch value in the Euler angle as the second face pose angle value.
  • the embodiment of the present application does not limit the prediction manner.
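  • Step 403 can be sketched under simplifying assumptions: mean pixel brightness stands in for the neural-network illumination classifier, and the pitch component of precomputed Euler angles stands in for the face key-point calculation:

```python
def predict_light_intensity(face_image) -> float:
    """Toy illumination predictor: mean pixel value of the third face image.
    The patent uses a trained classifier; this proxy is an assumption."""
    pixels = [p for row in face_image for p in row]
    return sum(pixels) / len(pixels)

def predict_pose_angle(euler_angles) -> float:
    """The pitch value of the Euler angles is taken as the second face pose
    angle value; the (pitch, yaw, roll) ordering is an assumption."""
    pitch, _yaw, _roll = euler_angles
    return pitch
```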
  • Step 404 Perform living body detection according to the second illumination intensity information, the second face attitude angle value and the multimodal information to obtain a living body detection result.
  • the computer device performs living body detection according to the predicted second light intensity information, the predicted second face attitude angle value and the collected multimodal information to obtain a living body detection result.
  • the first illumination intensity information includes a first illumination intensity value
  • the second illumination intensity information includes a second illumination intensity value.
  • Step 501 Determine the absolute value of the difference between the first light intensity value and the second light intensity value as the first difference, and determine the absolute value of the difference between the first face pose angle value and the second face pose angle value as the second difference.
  • the computer device determines the absolute value of the difference between the first light intensity value and the second light intensity value as the first difference, and the absolute value of the difference between the first face pose angle value and the second face pose angle value as the second difference.
  • Step 502 Determine whether the first difference and the second difference satisfy a preset condition, where the preset condition is that the first difference is smaller than a first preset threshold and the second difference is smaller than a second preset threshold.
  • the computer device checks whether the first difference is smaller than the first preset threshold and whether the second difference is smaller than the second preset threshold. If both differences are smaller than their respective thresholds, the preset condition is satisfied and step 503 is executed; if the first difference is greater than or equal to the first preset threshold, or the second difference is greater than or equal to the second preset threshold, the preset condition is not satisfied and step 504 is executed.
  • the first preset threshold value is the threshold value of the absolute value of the difference between the preset first illumination intensity value and the second illumination intensity value.
  • the second preset threshold is the threshold of the absolute value of the difference between the preset first face attitude angle value and the second face attitude angle value.
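Steps 501 and 502 amount to two absolute-difference computations followed by a joint threshold test, which can be sketched as follows. The threshold values and the function name are illustrative assumptions, since the application does not specify the preset thresholds.

```python
def check_preset_condition(lux_measured, lux_predicted,
                           pitch_measured, pitch_predicted,
                           lux_threshold=50.0, pitch_threshold=10.0):
    """Step 501: compute the first difference (light intensity gap) and the
    second difference (pose angle gap). Step 502: the preset condition holds
    only when both differences fall below their preset thresholds."""
    first_difference = abs(lux_measured - lux_predicted)
    second_difference = abs(pitch_measured - pitch_predicted)
    satisfied = (first_difference < lux_threshold
                 and second_difference < pitch_threshold)
    return first_difference, second_difference, satisfied
```

When `satisfied` is true the flow proceeds to the single-frame model (step 503); otherwise a non-living result is returned directly (step 504).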
  • Step 503 When the first difference and the second difference satisfy the preset condition, input the first face image and the second face image into the trained single-frame living body detection model, and output the living body detection result.
  • the computer device needs to continue to perform liveness detection based on the first face image and the second face image.
  • the computer device obtains the trained single-frame living body detection model, inputs the first face image and the second face image into the single-frame living body detection model, and outputs the living body detection result.
  • the single-frame living body detection model is obtained by training a neural network on sample first face images, sample second face images and the corresponding correct detection results; that is, the model is determined from these three kinds of data.
  • the sample first face image is the original sample face image
  • the sample second face image is the image obtained from the sample first face image after image signal processing (ISP).
  • the correct detection result is the pre-marked correct living body detection result corresponding to the sample first face image and the sample second face image.
  • the single-frame living body detection model is used to indicate the correlation between the first face image, the second face image and the living body detection result.
  • the relevant details of the single-frame living body detection model can be analogously referred to the relevant description of the above-mentioned target living body detection model, which will not be repeated here.
  • the single-frame living detection model is obtained by training according to at least one set of sample data sets, and each set of sample data sets includes: a first sample face image, a second sample face image, and a pre-labeled correct detection result.
  • the terminal needs to train the single-frame living body detection model.
  • the training process of the single-frame living body detection model includes: the server obtains a training sample set containing at least one set of sample data; and the error back-propagation algorithm is used to train on the at least one set of sample data to obtain the single-frame living body detection model.
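The error back-propagation training described above can be illustrated with a deliberately tiny stand-in: a single logistic unit fitted by gradient descent on one scalar feature per sample, in place of the real neural network over image pairs. The feature encoding, learning rate and epoch count are all assumptions for illustration only.

```python
import math

def train_toy_detector(samples, epochs=200, lr=0.5):
    """Toy stand-in for back-propagation training: fit a single logistic
    unit on (feature, label) pairs, label 1 = living body, 0 = non-living."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # forward pass
            grad = p - y                              # cross-entropy gradient w.r.t. the logit
            w -= lr * grad * x                        # backward pass: weight update
            b -= lr * grad                            # backward pass: bias update
    return w, b

def classify(w, b, x):
    """Threshold the logistic output at 0.5."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0
```

The real model consumes image pairs and many more parameters, but the forward/backward structure of the update loop is the same idea in miniature.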
  • Step 504 When the first difference and the second difference do not satisfy the preset condition, output a first detection result, where the first detection result is used to indicate that the object to be detected is a non-living body.
  • if the first difference and the second difference do not satisfy the preset condition, the gap between the first and second light intensity values is large, or the gap between the first and second face pose angle values is large; in that case the first detection result is output directly, indicating that the object to be detected is not a living body.
  • the first detection result includes a non-living body identifier.
  • the first illumination intensity information includes a first illumination intensity level
  • the second illumination intensity information includes a second illumination intensity level.
  • Step 601 Determine whether the first light intensity level and the second light intensity level are the same level, and whether the absolute value of the difference between the first face pose angle value and the second face pose angle value is smaller than the second preset threshold.
  • the computer device checks both conditions. If the first light intensity level and the second light intensity level are the same level, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is smaller than the second preset threshold, step 602 is executed; if the two light intensity levels are not the same level, or the absolute value of the pose angle difference is greater than or equal to the second preset threshold, step 603 is executed.
  • Step 602 When the first light intensity level and the second light intensity level are the same level, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is smaller than the second preset threshold, input the first face image and the second face image into the trained single-frame living body detection model, and output the living body detection result.
  • in this case the computer device needs to continue to perform liveness detection based on the first face image and the second face image.
  • the computer device obtains the trained single-frame living body detection model, inputs the first face image and the second face image into the single-frame living body detection model, and outputs the living body detection result.
  • the process in which the computer device inputs the first face image and the second face image into the single-frame living body detection model and outputs the living body detection result can be understood by analogy with the relevant description in the above embodiment, and is not repeated here.
  • Step 603 When the first light intensity level and the second light intensity level are not the same level, or the absolute value of the difference between the first face pose angle value and the second face pose angle value is greater than or equal to the second preset threshold, a first detection result is output, which indicates that the object to be detected is a non-living body.
  • the first detection result includes a non-living body identifier.
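The level-based gate of steps 601 to 603 can be sketched in the same style as the value-based variant; the level encoding, the threshold value and the return labels are illustrative assumptions.

```python
def level_based_gate(level_measured, level_predicted,
                     pitch_measured, pitch_predicted,
                     pitch_threshold=10.0):
    """Step 601: compare the two illumination levels and the pose angle gap.
    Step 602: a matching level plus a small gap hands off to the single-frame
    model. Step 603: any mismatch yields a non-living first detection result."""
    same_level = (level_measured == level_predicted)
    small_pose_gap = abs(pitch_measured - pitch_predicted) < pitch_threshold
    if same_level and small_pose_gap:
        return "run_single_frame_model"  # step 602
    return "non_living_body"             # step 603
```

Using coarse levels instead of raw lux values trades some precision for robustness to sensor noise, which may be why both variants are described.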
  • the mobile terminal collects the RAW face image, namely the first face image 71, and the normal face image after ISP, namely the second face image 72, and simultaneously obtains real-time ambient light information 73 and face posture information 74 through sensors, where the ambient light information 73 includes the first light intensity value, lux_realtime, and the face posture information 74 includes the first face pose angle value, pitch_realtime.
  • the mobile terminal performs face detection and alignment on the second face image 72; after the redundant parts are removed, the remaining valid face image is the third face image 75. The illumination intensity of the third face image 75 is predicted to obtain the second light intensity value, lux_prediction, and the face pose angle of the third face image is predicted to obtain the second face pose angle value, pitch_prediction.
  • if the differences between the measured values and the predicted values exceed the thresholds, the living body detection result output directly is a non-living body identifier, which indicates that the object to be detected is a non-living body.
  • otherwise, the algorithm flow of single-frame face liveness detection is triggered.
  • the first face image 71 and the second face image 72 are input to the single-frame living body detection model 76 and output to obtain a living body detection result, and the living body detection result includes one of a living body identification and a non-living body identification.
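The mobile-terminal flow just described (collect the RAW and ISP images plus sensor readings, predict from the cropped image, compare, then either reject outright or run the single-frame model) can be summarized in one sketch. `predict_lux`, `predict_pitch` and `single_frame_model` stand in for the prediction tools and the trained network, and the thresholds are assumed values.

```python
def liveness_pipeline(first_face_image, second_face_image,
                      lux_realtime, pitch_realtime,
                      predict_lux, predict_pitch, single_frame_model,
                      lux_threshold=50.0, pitch_threshold=10.0):
    """Compare sensor readings (lux_realtime, pitch_realtime) with values
    predicted from the ISP image; inconsistent inputs are rejected outright,
    consistent ones are passed to the single-frame liveness model."""
    lux_prediction = predict_lux(second_face_image)      # predicted light intensity
    pitch_prediction = predict_pitch(second_face_image)  # predicted pose angle
    if (abs(lux_realtime - lux_prediction) >= lux_threshold
            or abs(pitch_realtime - pitch_prediction) >= pitch_threshold):
        return "non_living_body"  # inconsistency: reject without running the model
    return single_frame_model(first_face_image, second_face_image)
```

The early rejection path is what gives the preprocessing its interception value: a replayed photo whose lighting or pose disagrees with the live sensor readings never reaches the model.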
  • in the face liveness detection method, after the face recognition function is activated, the multimodal information corresponding to the object to be detected is obtained; the multimodal information includes the initial first face image, the second face image after ISP, the first light intensity information, and the first face pose angle value, and together these form the multimodal input parameters of living body detection.
  • on the one hand, this narrows the attack range and increases the difficulty of attack: an attacker needs to obtain a photo in a specific posture (such as frontal) or under specific lighting conditions to attack successfully, and the delay caused by processing multiple frames of images is also avoided.
  • on the other hand, the feature information of the living body detection input parameters in each dimension is richer, which improves the accuracy of the living body detection result.
  • the face liveness detection method further obtains the third face image by performing face detection and alignment on the second face image and cropping the face region; predicts the light intensity of the third face image to obtain the second light intensity information, and predicts the face pose angle of the third face image to obtain the second face pose angle value; and performs living body detection according to the second light intensity information, the second face pose angle value and the multimodal information to obtain the living body detection result. Performing living body detection based on the acquired multimodal information together with the predicted second light intensity information and second face pose angle value further ensures the detection effect.
  • the face liveness detection method further judges the difference between the first light intensity information and the second light intensity information, and the difference between the first face pose angle value and the second face pose angle value. If either difference is large, the object to be detected is determined to be a non-living body and the living body detection result is output directly; if both differences are small, subsequent face liveness detection is performed according to the first face image and the second face image. This improves the efficiency of liveness detection and further ensures the accuracy of the liveness detection result.
  • the face liveness detection method provided by the present application also pre-intercepts preset attack scenarios by adding a preprocessing step before performing liveness detection on the multimodal information after the face recognition function is activated, which narrows the attack range, strengthens interception efficiency, and makes up for the shortcomings of the single-frame living body detection model.
  • FIG. 8 shows a block diagram of a face liveness detection apparatus provided by an exemplary embodiment of the present application.
  • the face liveness detection apparatus can be implemented by software, hardware or a combination of the two to become all or a part of the computer equipment shown in FIG. 1 .
  • the face liveness detection apparatus may include: an acquisition module 810 and a detection module 820 .
  • the obtaining module 810 is configured to obtain multi-modal information corresponding to the object to be detected after the face recognition function is activated, and the multi-modal information includes a face image, ambient light information and face posture information;
  • the detection module 820 is configured to perform living body detection according to the multimodal information to obtain a living body detection result, and the living body detection result is used to indicate whether the object to be detected is a living body.
  • the face image includes an initial first face image and a second face image after image signal processing
  • the ambient light information includes first illumination intensity information
  • the face posture information includes the first face pose angle value
  • the detection module 820 is further configured to:
  • the living body detection result is obtained by performing living body detection according to the second light intensity information, the second face attitude angle value and the multimodal information.
  • the first illumination intensity information includes a first illumination intensity value
  • the second illumination intensity information includes a second illumination intensity value
  • the detection module 820 is further configured to:
  • the absolute value of the difference between the first light intensity value and the second light intensity value is determined as the first difference, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is determined as the second difference;
  • the preset condition includes that the first difference value is smaller than the first preset threshold value and the second difference value is smaller than the second preset threshold value.
  • the first illumination intensity information includes a first illumination intensity level
  • the second illumination intensity information includes a second illumination intensity level.
  • the detection module 820 is further configured to:
  • the first face image and the second face image are input into the trained single-frame living body detection model, and the living body detection result is output.
  • the first illumination intensity information includes a first illumination intensity level
  • the second illumination intensity information includes a second illumination intensity level.
  • the detection module 820 is further configured to:
  • a first detection result is output, and the first detection result is used to indicate that the object to be detected is not a living body.
  • the apparatus further includes: a preprocessing module; the preprocessing module is used for:
  • the scene information is used to indicate the current detection scene
  • pre-interception processing is performed on the multi-modal information.
  • An embodiment of the present application provides a computer device, the computer device includes: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to implement the above method when executing the instructions.
  • the computer device is a terminal or a server. This embodiment does not limit this.
  • Embodiments of the present application provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • a computer-readable storage medium may be a tangible device that retains and stores instructions for use by the instruction execution device.
  • examples of computer-readable storage media include, but are not limited to, electrical storage devices, magnetic storage devices, optical storage devices, electromagnetic storage devices, semiconductor storage devices, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital video discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves on which instructions are stored, and any suitable combination of the foregoing.
  • the computer readable program instructions or code described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • the computer program instructions used to perform the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), are personalized by utilizing state information of the computer-readable program instructions.
  • the electronic circuit can execute computer-readable program instructions to implement various aspects of the present application.
  • these computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • these computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, programmable data processing apparatus and/or other equipment to operate in a specific manner, so that the computer-readable medium on which the instructions are stored comprises an article of manufacture including instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • computer-readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other equipment to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, thereby causing the instructions executing on the computer, other programmable data processing apparatus, or other device to implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by hardware that performs the corresponding functions or actions (e.g., circuits or application-specific integrated circuits (ASICs)), or by a combination of hardware and software, such as firmware.


Abstract

The present application relates to the field of artificial intelligence, and in particular, to a face liveness detection method and apparatus, and a storage medium. The method comprises: after enabling a face recognition function, obtaining multimodal information corresponding to an object to be detected, the multimodal information comprising a face image, ambient light information and face gesture information; and performing liveness detection according to the multimodal information to obtain the liveness detection result, the liveness detection result being used for indicating whether said object is a living body. According to embodiments of the present application, liveness detection is performed according to the multimodal information corresponding to said object, and the multimodal information comprises the face image, ambient light information and face gesture information, such that feature information of input parameters of liveness detection in each dimension is more abundant, improving the accuracy of the liveness detection result.

Description

Face liveness detection method, device and storage medium

This application claims priority to the Chinese patent application with application number 202011380265.0, entitled "Facial Liveness Detection Method, Device and Storage Medium", filed with the China Patent Office on November 30, 2020, the entire contents of which are incorporated by reference in this application.

Technical field

The present application relates to the field of artificial intelligence, and in particular, to a face liveness detection method, device and storage medium.
Background

The main purpose of face liveness detection is to determine whether the current face is a live face, so as to resist attacks using fake faces. Face liveness detection is an important step before face recognition. With the application of face recognition in many important fields such as face unlocking and face payment, the problem of using counterfeit faces to attack face recognition has become increasingly prominent, and face liveness detection is the main technical path for resisting counterfeit face attacks.

In the related art, an image corresponding to an object to be detected is collected, and living body detection of the object to be detected is performed based on the collected image to obtain a living body detection result. However, this kind of living body detection method only pays attention to image information; the information considered is limited and easy to attack, the accuracy of the living body detection result is low, and the detection effect is poor.
SUMMARY OF THE INVENTION

In view of this, a face liveness detection method, device and storage medium are proposed, which can use multimodal information (including a face image, ambient light information and face posture information) to perform liveness detection and obtain a liveness detection result, improving the accuracy of the liveness detection result.

In a first aspect, an embodiment of the present application provides a face liveness detection method, which includes:

after the face recognition function is activated, obtaining multimodal information corresponding to the object to be detected, the multimodal information including a face image, ambient light information and face posture information;

performing living body detection according to the multimodal information to obtain a living body detection result, the living body detection result being used to indicate whether the object to be detected is a living body.
In this implementation, after the face recognition function is activated, the multimodal information corresponding to the object to be detected is obtained, and living body detection is performed according to the multimodal information to obtain the living body detection result. Since the multimodal information includes a face image, ambient light information and face posture information, the input parameters of living body detection carry richer feature information in each dimension, which improves the accuracy of the living body detection result.

With reference to the first aspect, in a first possible implementation of the first aspect, the face image includes an initial first face image and a second face image after image signal processing, the ambient light information includes first illumination intensity information, and the face posture information includes a first face pose angle value.

In this implementation, the multimodal information includes the initial first face image, the second face image after image signal processing, the first illumination intensity information, and the first face pose angle value; together these form the multimodal input parameters of living body detection. On the one hand, this narrows the attack range, increases the difficulty of attack, and reduces the probability of being attacked: an attacker needs to obtain a photo in a specific posture (such as frontal) or under specific lighting conditions to attack successfully, and the delay caused by the need to process multiple frames of images is also avoided. On the other hand, the input parameters of living body detection carry richer feature information in each dimension, which improves the accuracy of the living body detection result.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, performing living body detection according to the multimodal information to obtain a living body detection result includes:

obtaining a third face image according to the region where the face is located in the second face image;

predicting the illumination intensity of the third face image to obtain second illumination intensity information, and predicting the face pose angle of the third face image to obtain a second face pose angle value;

performing living body detection according to the second illumination intensity information, the second face pose angle value and the multimodal information to obtain the living body detection result.
在该实现方式中,根据第二人脸图像中人脸所在区域,得到第三人脸图像;对第三人脸图像进行光照强度的预测得到第二光照强度信息,并对第三人脸图像进行人脸姿态角度的预测得到第二人脸姿态角度值;根据第二光照强度信息、第二人脸姿态角度值和多模态信息进行活体检测得到活体检测结果,基于获取到的多模态信息、预测的第二光照强度信息和预测的第二人脸姿态角度值进行活体检测,进一步保证了活体检测结果的检测效果。In this implementation, the third face image is obtained according to the region where the face is located in the second face image; the second light intensity information is obtained by predicting the light intensity of the third face image, and the third face image is Predict the face attitude angle to obtain the second face attitude angle value; perform living body detection according to the second light intensity information, the second face attitude angle value and the multimodal information to obtain the living body detection result, and based on the obtained multimodality Information, the predicted second light intensity information, and the predicted second face posture angle value are used for living body detection, which further ensures the detection effect of the living body detection result.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the first illumination intensity information includes a first illumination intensity value and the second illumination intensity information includes a second illumination intensity value, and performing liveness detection according to the second illumination intensity information, the second face pose angle value, and the multimodal information to obtain the liveness detection result includes:
determining the absolute value of the difference between the first illumination intensity value and the second illumination intensity value as a first difference, and determining the absolute value of the difference between the first face pose angle value and the second face pose angle value as a second difference; and
when the first difference and the second difference satisfy a preset condition, inputting the first face image and the second face image into a trained single-frame liveness detection model, and outputting the liveness detection result;
where the preset condition includes the first difference being less than a first preset threshold and the second difference being less than a second preset threshold.
In this implementation, the difference between the first illumination intensity value and the second illumination intensity value, and the difference between the first face pose angle value and the second face pose angle value, are evaluated first; if both differences are small, subsequent face liveness detection is performed according to the first face image and the second face image. This improves the efficiency of liveness detection and further ensures the accuracy of the liveness detection result.
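The value-based consistency check described above can be sketched as follows. The threshold values and the function name `consistency_gate` are illustrative assumptions and are not taken from the patent text:

```python
# Hypothetical sketch of the value-based preset condition: both absolute
# differences must fall below their preset thresholds before the single-frame
# liveness model is run. Threshold values below are assumed for illustration.

FIRST_PRESET_THRESHOLD = 50.0   # illumination difference threshold (lux), assumed
SECOND_PRESET_THRESHOLD = 15.0  # pose angle difference threshold (degrees), assumed

def consistency_gate(sensor_lux, predicted_lux, sensor_angle, predicted_angle):
    """Return True when the first and second differences satisfy the preset condition."""
    first_difference = abs(sensor_lux - predicted_lux)    # |I1 - I2|
    second_difference = abs(sensor_angle - predicted_angle)  # |theta1 - theta2|
    return (first_difference < FIRST_PRESET_THRESHOLD
            and second_difference < SECOND_PRESET_THRESHOLD)
```

When the gate returns True, the first and second face images would be passed on to the trained single-frame liveness detection model; otherwise the sample is handled as described in the following implementations.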
With reference to the second possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the first illumination intensity information includes a first illumination intensity value and the second illumination intensity information includes a second illumination intensity value, and performing liveness detection according to the second illumination intensity information, the second face pose angle value, and the multimodal information to obtain the liveness detection result includes:
determining the absolute value of the difference between the first illumination intensity value and the second illumination intensity value as a first difference, and determining the absolute value of the difference between the first face pose angle value and the second face pose angle value as a second difference; and
when the first difference and the second difference do not satisfy a preset condition, outputting a first detection result, where the first detection result indicates that the object to be detected is not a living body;
where the preset condition includes the first difference being less than a first preset threshold and the second difference being less than a second preset threshold.
In this implementation, the difference between the first illumination intensity value and the second illumination intensity value, and the difference between the first face pose angle value and the second face pose angle value, are evaluated; if either difference is large, the object to be detected is determined to be a non-living body and the liveness detection result is output directly. This improves the efficiency of liveness detection and further ensures the accuracy of the liveness detection result.
With reference to the second possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the first illumination intensity information includes a first illumination intensity level and the second illumination intensity information includes a second illumination intensity level, and performing liveness detection according to the second illumination intensity information, the second face pose angle value, and the multimodal information to obtain the liveness detection result includes:
when the first illumination intensity level and the second illumination intensity level are the same level, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is less than the second preset threshold, inputting the first face image and the second face image into the trained single-frame liveness detection model, and outputting the liveness detection result.
In this implementation, the difference between the first illumination intensity level and the second illumination intensity level, and the difference between the first face pose angle value and the second face pose angle value, are evaluated first; if both differences are small, subsequent face liveness detection is performed according to the first face image and the second face image. This improves the efficiency of liveness detection and further ensures the accuracy of the liveness detection result.
With reference to the second possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the first illumination intensity information includes a first illumination intensity level and the second illumination intensity information includes a second illumination intensity level, and performing liveness detection according to the second illumination intensity information, the second face pose angle value, and the multimodal information to obtain the liveness detection result includes:
when the first illumination intensity level and the second illumination intensity level are not the same level, or the absolute value of the difference between the first face pose angle value and the second face pose angle value is greater than or equal to the second preset threshold, outputting a first detection result, where the first detection result indicates that the object to be detected is not a living body.
In this implementation, the difference between the first illumination intensity level and the second illumination intensity level, and the difference between the first face pose angle value and the second face pose angle value, are evaluated; if either difference is large, the object to be detected is determined to be a non-living body and the liveness detection result is output directly. This improves the efficiency of liveness detection and further ensures the accuracy of the liveness detection result.
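The level-based variant quantizes illumination into discrete levels and compares them for equality instead of comparing raw values. The level boundaries and threshold below are illustrative assumptions, not values taken from the patent:

```python
# Hypothetical sketch of the level-based check: the two illumination intensity
# levels must be the same level, and the pose angle difference must fall below
# the second preset threshold. Level boundaries here are assumed.

LEVEL_BOUNDARIES = [10.0, 100.0, 1000.0]  # lux boundaries between levels, assumed
SECOND_PRESET_THRESHOLD = 15.0            # pose angle threshold (degrees), assumed

def illumination_level(lux):
    """Map a lux value to a discrete illumination intensity level (0, 1, 2, ...)."""
    level = 0
    for boundary in LEVEL_BOUNDARIES:
        if lux >= boundary:
            level += 1
    return level

def level_gate(sensor_lux, predicted_lux, sensor_angle, predicted_angle):
    """True when both lux values fall in the same level and the pose difference is small."""
    same_level = illumination_level(sensor_lux) == illumination_level(predicted_lux)
    pose_ok = abs(sensor_angle - predicted_angle) < SECOND_PRESET_THRESHOLD
    return same_level and pose_ok
```

A sample failing either part of the gate would directly receive the first detection result (non-living), matching the sixth implementation above.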
With reference to the first aspect, in a seventh possible implementation of the first aspect, before performing liveness detection on the multimodal information to obtain the liveness detection result, the method further includes:
inputting the multimodal information into a preprocessing model and outputting scene information, where the scene information indicates the current detection scene; and
when the scene information indicates that the current detection scene is a preset attack scene, performing pre-interception processing on the multimodal information.
In this implementation, after the face recognition function is started, a preprocessing procedure is added before liveness detection is performed on the multimodal information, so that preset attack scenes are pre-intercepted. This narrows the attack surface, improves interception efficiency, and compensates for the limitations of the single-frame liveness detection model.
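The pre-interception flow can be sketched as follows. The scene labels, the `classify_scene` stand-in for the preprocessing model, and the return convention are assumptions made for illustration:

```python
# Minimal sketch of pre-interception: a preprocessing model labels the current
# detection scene, and samples from preset attack scenes are rejected before
# the liveness model is ever invoked. Labels below are assumed, not from the patent.

PRESET_ATTACK_SCENES = {"printed_photo", "screen_replay"}  # assumed scene labels

def pre_intercept(multimodal_info, classify_scene):
    """Return 'non-living' for preset attack scenes; None means continue to liveness detection."""
    scene_info = classify_scene(multimodal_info)  # preprocessing model inference
    if scene_info in PRESET_ATTACK_SCENES:
        return "non-living"   # first detection result: intercept immediately
    return None               # proceed to the subsequent liveness detection steps
```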
In a second aspect, an embodiment of the present application provides a face liveness detection apparatus, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured, when executing the instructions, to implement the method provided by the first aspect or any one of the possible implementations of the first aspect.
In a third aspect, an embodiment of the present application provides a face liveness detection apparatus, including at least one module, the at least one module being configured to implement the method provided by the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code. When the computer-readable code runs in an electronic device, a processor in the electronic device executes the method provided by the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, an embodiment of the present application provides a non-volatile computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by a processor, the method provided by the first aspect or any one of the possible implementations of the first aspect is implemented.
Description of drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present application and, together with the description, serve to explain the principles of the present application.
FIG. 1 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
FIG. 2 shows a flowchart of a face liveness detection method provided by an exemplary embodiment of the present application.
FIG. 3 shows a flowchart of a face liveness detection method provided by another exemplary embodiment of the present application.
FIG. 4 shows a flowchart of a face liveness detection method provided by another exemplary embodiment of the present application.
FIG. 5 shows a flowchart of a face liveness detection method provided by another exemplary embodiment of the present application.
FIG. 6 shows a flowchart of a face liveness detection method provided by another exemplary embodiment of the present application.
FIG. 7 shows a flowchart of a face liveness detection method provided by another exemplary embodiment of the present application.
FIG. 8 shows a block diagram of a face liveness detection apparatus provided by an exemplary embodiment of the present application.
Detailed description
Various exemplary embodiments, features, and aspects of the present application are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" is used here to mean "serving as an example, embodiment, or illustration". Any embodiment described here as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, to better illustrate the present application, numerous specific details are given in the following detailed description. Those skilled in the art should understand that the present application may be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the subject matter of the present application.
First, the application scenarios involved in the present application are introduced.
An embodiment of the present application provides a face liveness detection method, executed by a computer device. Please refer to FIG. 1, which shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
The computer device may be a terminal or a server. The terminal includes a mobile terminal or a fixed terminal; for example, the terminal may be a mobile phone, a tablet computer, a laptop computer, or a desktop computer. The server may be a single server, a server cluster composed of several servers, or a cloud computing service center.
As shown in FIG. 1, the computer device includes a processor 10, a memory 20, and a communication interface 30. Those skilled in the art can understand that the structure shown in FIG. 1 does not constitute a limitation on the computer device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components. Specifically:
The processor 10 is the control center of the computer device and connects the various parts of the entire computer device through various interfaces and lines. By running or executing the software programs and/or modules stored in the memory 20 and invoking the data stored in the memory 20, it performs the various functions of the computer device and processes data, thereby controlling the computer device as a whole. The processor 10 may be implemented by a CPU or by a graphics processing unit (GPU).
The memory 20 may be used to store software programs and modules. The processor 10 executes various functional applications and data processing by running the software programs and modules stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system 21, an acquisition module 22, a detection module 23, and an application program 24 required by at least one function (such as neural network training); the data storage area may store data created according to the use of the computer device. The memory 20 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. Accordingly, the memory 20 may further include a memory controller to provide the processor 10 with access to the memory 20.
The processor 10 performs the following function by running the acquisition module 22: after the face recognition function is started, the computer device acquires multimodal information corresponding to an object to be detected, where the multimodal information includes a face image, ambient light information, and face pose information. The processor 10 performs the following function by running the detection module 23: performing liveness detection according to the multimodal information to obtain a liveness detection result indicating whether the object to be detected is a living body.
The method provided in the embodiments of the present application can be applied to any face liveness detection scenario. Illustratively, in the financial field there is a demand for face liveness detection: users perform operations requiring identity verification, such as transfers, payments, or modification of account information, through a smartphone. For example, when the smartphone captures face images of a user, it can use the face liveness detection method provided in the present application to verify the identity of user A and thereby determine whether the operation was initiated by user A in person. Illustratively, in the security field, self-service clearance devices can be used for clearance inspection. For example, when user B passes a clearance check at a self-service clearance device, the device can use the face liveness detection method provided in the present application to perform liveness detection on the captured image of user B, so as to identify identity fraud. Illustratively, in the attendance field, the method can be applied to a face clock-in machine or a face access control system. For example, when user C clocks in or unlocks an access control, face liveness detection is performed on user C to prevent clocking in on behalf of another person or identity fraud by unrelated persons. The method provided in the embodiments of the present application can also be applied to other face unlocking or face payment scenarios, which are not exhaustively listed here.
The face liveness detection method provided by the embodiments of the present application is introduced below through several exemplary embodiments.
Please refer to FIG. 2, which shows a flowchart of a face liveness detection method provided by an exemplary embodiment of the present application. This embodiment is illustrated with the method applied to the computer device shown in FIG. 1. The method includes the following steps.
Step 201: after the face recognition function is started, acquire multimodal information corresponding to an object to be detected, where the multimodal information includes a face image, ambient light information, and face pose information.
Optionally, the computer device starts the face recognition function when receiving a preset trigger signal, and acquires the multimodal information corresponding to the object to be detected.
Illustratively, the preset trigger signal is a user operation signal that triggers the face recognition function. For example, the preset trigger signal includes any one or a combination of a tap operation signal, a slide operation signal, a double-tap operation signal, and a long-press operation signal.
In other possible implementations, the preset trigger signal may also be implemented in voice form. For example, the computer device receives a voice signal input by the user and parses the voice signal to obtain the voice content; when a preset keyword exists in the voice content, this indicates that the terminal has received the preset trigger signal, and the face recognition function is started.
Optionally, acquiring the multimodal information corresponding to the object to be detected by the computer device includes: capturing the face image through a camera, and collecting the ambient light information and the face pose information through sensors.
Optionally, the computer device collects the ambient light information and the face pose information through the sensors in real time or at preset time intervals, where the preset time interval is either a default or a user-defined setting, which is not limited in this embodiment. For ease of description, the following takes real-time sensor collection as an example, that is, both the ambient light information and the face pose information are real-time information.
Illustratively, the computer device collects real-time ambient light information through a light sensor, and collects real-time face pose information through an orientation sensor. The embodiments of the present application do not limit the types of the sensors.
Optionally, the face image includes an initial first face image and a second face image obtained after image signal processing (ISP). The first face image, also called the face RAW image, is the original image containing the face of the object to be detected; the second face image is the image obtained by applying ISP to the first face image.
The ambient light information indicates the lighting conditions of the face of the object to be detected, and the face pose information indicates the orientation of that face.
Optionally, the ambient light information includes first illumination intensity information, and the first illumination intensity information includes a first illumination intensity value or a first illumination intensity level.
Optionally, the face pose information includes a first face pose angle value, which is the angle value of the face orientation of the object to be detected.
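For illustration, the multimodal information gathered in step 201 can be represented as a simple record; the class and field names below are assumptions, not identifiers from the patent:

```python
# Illustrative container for the four inputs described in step 201. Field names
# and types are assumed for the sketch; real implementations would hold decoded
# image buffers rather than raw bytes.
from dataclasses import dataclass

@dataclass
class MultimodalInfo:
    first_face_image: bytes   # initial (RAW) face image
    second_face_image: bytes  # face image after ISP
    illumination_lux: float   # first illumination intensity value
    face_pose_angle: float    # first face pose angle value, in degrees

# example record built from hypothetical sensor readings
info = MultimodalInfo(b"raw-frame", b"isp-frame", 120.5, 8.0)
```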
Step 202: perform liveness detection according to the multimodal information to obtain a liveness detection result, where the liveness detection result indicates whether the object to be detected is a living body.
The computer device performs liveness detection according to the multimodal information to obtain a liveness detection result indicating whether the object to be detected is a living body.
Optionally, the computer device invokes a trained target liveness detection model to perform liveness detection on the multimodal information, and outputs the liveness detection result.
The target liveness detection model is a model obtained by training a neural network based on sample multimodal information and correct detection results; that is, the target liveness detection model is determined from the sample multimodal information and the correct detection results, where a correct detection result is a pre-annotated correct liveness detection result corresponding to the sample multimodal information.
The target liveness detection model indicates the correlation between the multimodal information and the liveness detection result.
The target liveness detection model is a preset mathematical model that includes model coefficients between the multimodal information and the liveness detection result. A model coefficient may be a fixed value, a value dynamically modified over time, or a value dynamically modified with the detection scene.
Optionally, the target liveness detection model includes at least one of a deep neural network (DNN) model, a recurrent neural network (RNN) model, an embedding model, a gradient boosting decision tree (GBDT) model, and a logistic regression (LR) model.
Optionally, the liveness detection result includes one of a first detection result and a second detection result, where the first detection result indicates that the object to be detected is not a living body and the second detection result indicates that the object to be detected is a living body. For example, the first detection result is a first identifier and the second detection result is a second identifier different from the first identifier, which is not limited in the embodiments of the present application.
In an illustrative example, as shown in FIG. 3, the mobile terminal is in a screen-off state. When the mobile terminal detects a double-tap operation signal acting on the screen or a tap operation signal acting on the power button, the face unlocking process is started: the mobile terminal captures the initial first face image 31 and the post-ISP second face image 32 through the camera, collects the ambient light information 33 and the face pose information 34 in real time through the sensors, and inputs the first face image 31, the second face image 32, the ambient light information 33, and the face pose information 34 into the target liveness detection model 35, which outputs a liveness detection result including one of a living-body identifier and a non-living-body identifier.
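The flow in this example can be sketched end to end as follows. The acquisition function, the model stand-in, and the 0.5 score threshold are hypothetical illustrations only; the patent does not specify a score-based interface:

```python
# Minimal sketch of the unlock flow: on a trigger signal, acquire the four
# inputs and feed them to a liveness model stand-in that returns a score.
# Both callables are hypothetical placeholders.

def detect_liveness(acquire_inputs, liveness_model):
    """Run one liveness check; return 'living' or 'non-living'."""
    raw_image, isp_image, ambient_light, face_pose = acquire_inputs()
    score = liveness_model(raw_image, isp_image, ambient_light, face_pose)
    return "living" if score >= 0.5 else "non-living"  # assumed threshold

# usage with stand-in acquisition and model functions
result = detect_liveness(
    lambda: (b"raw", b"isp", 300.0, 2.0),  # camera + sensor readings
    lambda *inputs: 0.93,                  # model confidence stand-in
)
```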
In summary, the face liveness detection method provided by the present application acquires the multimodal information corresponding to the object to be detected after the face recognition function is started, and performs liveness detection according to the multimodal information to obtain a liveness detection result. Because the multimodal information includes a face image, ambient light information, and face pose information, the input to liveness detection carries richer feature information in each dimension, improving the accuracy of the liveness detection result.
Please refer to FIG. 4, which shows a flowchart of a face liveness detection method provided by another exemplary embodiment of the present application. This embodiment is illustrated with the method applied to the computer device shown in FIG. 1. The method includes the following steps.
Step 401: after the face recognition function is started, acquire multimodal information corresponding to an object to be detected, where the multimodal information includes an initial first face image, a second face image after ISP, first illumination intensity information, and a first face pose angle value.
Optionally, the first illumination intensity information includes a first illumination intensity value or a first illumination intensity level.
It should be noted that, for the process by which the computer device acquires the multimodal information corresponding to the object to be detected after starting the face recognition function, reference may be made to the relevant details in the foregoing embodiments, which are not repeated here.
计算机设备对多模态信息进行活体检测得到活体检测结果之前,还可以加入任意一种预处理流程,对预设攻击场景进进行预拦截。可选地,计算机设备将多模态信息输入至预处理模型中输出得到场景信息,场景信息用于指示当前检测场景;当场景信息用于指示当前检测场景为预设攻击场景时,对多模态信息进行预拦截处理。Before the computer equipment performs liveness detection on the multimodal information to obtain the liveness detection result, any preprocessing process can be added to pre-intercept the preset attack scenarios. Optionally, the computer device inputs the multimodal information into the preprocessing model and outputs the scene information, where the scene information is used to indicate the current detection scene; when the scene information is used to indicate that the current detection scene is a preset attack scene, state information for pre-interception processing.
其中,预处理模型为基于样本多模态信息和正确场景信息对神经网络进行训练得到的模型。即预处理模型是根据样本多模态信息和正确场景信息所确定的。其中,正确场景信息为预先标注的与样本多模态信息对应的正确的场景信息。Among them, the preprocessing model is a model obtained by training a neural network based on sample multimodal information and correct scene information. That is, the preprocessing model is determined according to the sample multimodal information and the correct scene information. The correct scene information is pre-marked correct scene information corresponding to the sample multimodal information.
预处理模型用于指示多模态信息与场景信息之间的相关关系。预处理模型的相关细节可类比参考上述目标活体检测模型的相关描述,在此不再赘述。The preprocessing model is used to indicate the correlation between multimodal information and scene information. For the relevant details of the preprocessing model, reference can be made to the relevant description of the above-mentioned target living body detection model, which will not be repeated here.
其中，场景信息用于指示当前检测场景。计算机设备判断当前检测场景是否为预设攻击场景，若当前检测场景为预设攻击场景，则对多模态信息进行预拦截处理，不进行后续的活体检测步骤，输出提示信息，提示信息用于指示当前检测场景为预设攻击场景；或者，输出第一检测结果，第一检测结果用于指示待检测对象为非活体。The scene information is used to indicate the current detection scene. The computer device determines whether the current detection scene is a preset attack scene. If it is, pre-interception processing is performed on the multimodal information, the subsequent liveness detection steps are skipped, and prompt information is output to indicate that the current detection scene is a preset attack scene; alternatively, a first detection result is output to indicate that the object to be detected is a non-living body.
若当前检测场景不是预设攻击场景,则计算机设备继续执行后续的活体检测步骤。If the current detection scene is not the preset attack scene, the computer device continues to perform the subsequent living body detection steps.
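The pre-interception gate described above can be sketched as follows. This is a minimal illustration only: the `classify_scene` function is a stand-in for the patent's trained preprocessing neural network, and the scene labels are hypothetical, not taken from the application.

```python
# Hypothetical pre-interception gate: a preprocessing step classifies the
# current detection scene from the multimodal information; preset attack
# scenes are intercepted before the liveness-detection step runs.

ATTACK_SCENES = {"screen_replay", "printed_photo"}  # assumed labels

def classify_scene(multimodal_info):
    # Stand-in for the trained preprocessing model; in practice a neural
    # network would infer the scene from the images, light and pose inputs.
    return multimodal_info.get("scene_hint", "normal")

def pre_intercept(multimodal_info):
    """Return (intercepted, result). Interception skips liveness detection."""
    scene = classify_scene(multimodal_info)
    if scene in ATTACK_SCENES:
        # First detection result: the object to be detected is non-living.
        return True, {"liveness": False, "reason": "preset attack scene: " + scene}
    return False, None
```

When the gate does not fire, the caller simply proceeds with the subsequent liveness detection steps.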
步骤402,根据第二人脸图像中人脸所在区域,得到第三人脸图像。Step 402: Obtain a third face image according to the region where the face is located in the second face image.
可选的，计算机设备对第二人脸图像进行人脸检测对齐得到第二人脸图像中的人脸区域，对人脸区域进行剪裁处理得到第三人脸图像。其中，第三人脸图像是第二人脸图像经过人脸检测对齐后抠出得到的有效人脸图像。Optionally, the computer device performs face detection and alignment on the second face image to obtain the face region in the second face image, and crops the face region to obtain the third face image. The third face image is the valid face image cut out from the second face image after face detection and alignment.
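Once a face detector has produced a bounding box, the cropping in step 402 reduces to a plain array slice. The sketch below assumes the image is a nested list of pixel rows and that the box comes from an external detector, which is outside the scope of this illustration.

```python
def crop_face(image, box):
    """Crop the aligned face region (sketch of step 402).

    `image` is a nested list of pixel rows; `box` is (top, bottom, left,
    right) as produced by a face detector, which is assumed here.
    """
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]
```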
步骤403,对第三人脸图像进行光照强度的预测得到第二光照强度信息,并对第三人脸图像进行人脸姿态角度的预测得到第二人脸姿态角度值。 Step 403 , predicting the illumination intensity of the third face image to obtain second illumination intensity information, and predicting the face attitude angle of the third face image to obtain the second face attitude angle value.
计算机设备通过预测工具对第三人脸图像进行光照强度的预测得到预测的第二光照强度信息,并对第三人脸图像进行人脸姿态角度的预测得到预测的第二人脸姿态角度值。The computer device predicts the illumination intensity of the third face image by the prediction tool to obtain the predicted second illumination intensity information, and predicts the face posture angle of the third face image to obtain the predicted second face posture angle value.
可选地,计算机设备基于神经网络的分类器,对第三人脸图像进行光照强度的预测得到第二光照强度信息。可选地,计算机设备对第三人脸图像进行人脸关键点计算,将欧拉角中的pitch值确定为第二人脸姿态角度值。本申请实施例对预测方式不加以限定。Optionally, the computer device predicts the illumination intensity of the third face image based on a neural network classifier to obtain the second illumination intensity information. Optionally, the computer device performs face key point calculation on the third face image, and determines the pitch value in the Euler angle as the second face pose angle value. The embodiment of the present application does not limit the prediction manner.
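As a rough illustration of deriving a pitch value from face key points, the sketch below maps the vertical position of the nose tip relative to the eye line and mouth to an angle. This crude geometric mapping is an assumption for illustration only; the patent's key-point computation of Euler angles is not specified, and a production system would solve a full 3-D head pose instead.

```python
import math

def estimate_pitch_from_landmarks(eye_l, eye_r, nose, mouth):
    """Crude geometric pitch estimate (degrees) from four 2-D key points.

    For a frontal face the nose tip sits roughly midway between the eye
    line and the mouth; deviations from that midpoint are read as pitch.
    atan keeps the mapping bounded. Illustrative only.
    """
    eye_y = (eye_l[1] + eye_r[1]) / 2.0
    face_h = mouth[1] - eye_y       # distance from eye line to mouth
    nose_off = nose[1] - eye_y      # nose tip offset below the eye line
    ratio = nose_off / face_h - 0.5
    return math.degrees(math.atan(2.0 * ratio))
```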
步骤404,根据第二光照强度信息、第二人脸姿态角度值和多模态信息进行活体检测得到活体检测结果。Step 404: Perform living body detection according to the second illumination intensity information, the second face attitude angle value and the multimodal information to obtain a living body detection result.
计算机设备根据预测的第二光照强度信息、预测的第二人脸姿态角度值和采集的多模态信息进行活体检测得到活体检测结果。The computer device performs living body detection according to the predicted second light intensity information, the predicted second face attitude angle value and the collected multimodal information to obtain a living body detection result.
在一种可能的实现方式中,第一光照强度信息包括第一光照强度值,第二光照强度信息包括第二光照强度值,上述步骤404可以被替换实现成为如下步骤,如图5所示:In a possible implementation manner, the first illumination intensity information includes a first illumination intensity value, and the second illumination intensity information includes a second illumination intensity value. The above step 404 can be replaced by the following steps, as shown in FIG. 5 :
步骤501，将第一光照强度值与第二光照强度值的差值绝对值确定为第一差值，并将第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值确定为第二差值。Step 501: Determine the absolute value of the difference between the first light intensity value and the second light intensity value as the first difference, and determine the absolute value of the difference between the first face pose angle value and the second face pose angle value as the second difference.
计算机设备将第一光照强度值与第二光照强度值的差值绝对值确定为第一差值，并将第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值确定为第二差值。The computer device determines the absolute value of the difference between the first light intensity value and the second light intensity value as the first difference, and determines the absolute value of the difference between the first face pose angle value and the second face pose angle value as the second difference.
步骤502,判断第一差值和第二差值是否满足预设条件,预设条件包括第一差值小于第一预设阈值且第二差值小于第二预设阈值。 Step 502 , determine whether the first difference value and the second difference value satisfy a preset condition, and the preset condition includes that the first difference value is smaller than the first preset threshold value and the second difference value is smaller than the second preset threshold value.
计算机设备判断第一差值是否小于第一预设阈值且第二差值是否小于第二预设阈值，若第一差值小于第一预设阈值且第二差值小于第二预设阈值，即第一差值和第二差值满足预设条件，执行步骤503；若第一差值大于或等于第一预设阈值，或者，第二差值大于或等于第二预设阈值，即第一差值和第二差值不满足预设条件，执行步骤504。The computer device determines whether the first difference is smaller than the first preset threshold and whether the second difference is smaller than the second preset threshold. If the first difference is smaller than the first preset threshold and the second difference is smaller than the second preset threshold, that is, the first difference and the second difference satisfy the preset condition, step 503 is executed; if the first difference is greater than or equal to the first preset threshold, or the second difference is greater than or equal to the second preset threshold, that is, the first difference and the second difference do not satisfy the preset condition, step 504 is executed.
其中，第一预设阈值为预设的第一光照强度值与第二光照强度值的差值绝对值的阈值，第二预设阈值为预设的第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值的阈值。The first preset threshold is a preset threshold for the absolute value of the difference between the first light intensity value and the second light intensity value, and the second preset threshold is a preset threshold for the absolute value of the difference between the first face pose angle value and the second face pose angle value.
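Steps 501 and 502 can be sketched in a few lines. The threshold values below are illustrative placeholders; the application does not specify concrete values for the first and second preset thresholds.

```python
def check_consistency(lux_real, lux_pred, pitch_real, pitch_pred,
                      lux_threshold=50.0, pitch_threshold=10.0):
    """Sketch of steps 501-502: compute the two absolute differences and
    test them against the preset thresholds (threshold values assumed)."""
    lux_diff = abs(lux_real - lux_pred)        # first difference
    pitch_diff = abs(pitch_real - pitch_pred)  # second difference
    return lux_diff < lux_threshold and pitch_diff < pitch_threshold
```

A `True` return corresponds to proceeding to step 503 (run the single-frame model); `False` corresponds to step 504 (directly output the non-living result).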
步骤503,当第一差值和第二差值满足预设条件时,将第一人脸图像和第二人脸图像输入至训练好的单帧活体检测模型中,输出得到活体检测结果。 Step 503, when the first difference value and the second difference value satisfy the preset condition, input the first face image and the second face image into the trained single-frame living body detection model, and output the living body detection result.
当第一差值和第二差值满足预设条件时，表示第一光照强度值与第二光照强度值的差异较小且第一人脸姿态角度值与第二人脸姿态角度值的差异较小，计算机设备需要基于第一人脸图像和第二人脸图像继续进行活体检测。When the first difference and the second difference satisfy the preset condition, it indicates that the difference between the first light intensity value and the second light intensity value is small and the difference between the first face pose angle value and the second face pose angle value is small, so the computer device needs to continue the liveness detection based on the first face image and the second face image.
可选地,计算机设备获取训练好的单帧活体检测模型,将第一人脸图像和第二人脸图像输入至单帧活体检测模型中,输出得到活体检测结果。Optionally, the computer device obtains the trained single-frame living body detection model, inputs the first face image and the second face image into the single-frame living body detection model, and outputs the living body detection result.
单帧活体检测模型是基于样本第一人脸图像、样本第二人脸图像和正确检测结果对神经网络进行训练得到的模型。即单帧活体检测模型是根据样本第一人脸图像、样本第二人脸图像和正确检测结果所确定的。其中,样本第一人脸图像为样本人脸图像的原始图像、样本第二人脸图像为基于样本第一人脸图像经过ISP后的图像。正确检测结果为预先标注的与样本第一人脸图像、样本第二人脸图像对应的正确的活体检测结果。The single-frame living detection model is a model obtained by training the neural network based on the first face image of the sample, the second face image of the sample and the correct detection result. That is, the single-frame living detection model is determined according to the first face image of the sample, the second face image of the sample and the correct detection result. The sample first face image is the original image of the sample face image, and the sample second face image is an image based on the sample first face image after ISP. The correct detection result is a pre-marked correct living body detection result corresponding to the first face image of the sample and the second face image of the sample.
单帧活体检测模型用于指示第一人脸图像、第二人脸图像与活体检测结果之间的相关关系。单帧活体检测模型的相关细节可类比参考上述目标活体检测模型的相关描述,在此不再赘述。The single-frame living body detection model is used to indicate the correlation between the first face image, the second face image and the living body detection result. The relevant details of the single-frame living body detection model can be analogously referred to the relevant description of the above-mentioned target living body detection model, which will not be repeated here.
可选地,单帧活体检测模型是根据至少一组样本数据组训练得到的,每组样本数据组包括:样本第一人脸图像、样本第二人脸图像和预先标注的正确检测结果。Optionally, the single-frame living detection model is obtained by training according to at least one set of sample data sets, and each set of sample data sets includes: a first sample face image, a second sample face image, and a pre-labeled correct detection result.
可选的,在计算机设备获取单帧活体检测模型之前,终端需要对单帧活体检测模型进行训练。单帧活体检测模型的训练过程包括:服务器获取训练样本集,训练样本集包括至少一组样本数据组;对至少一组样本数据组采用误差反向传播算法进行训练,得到单帧活体检测模型。Optionally, before the computer device acquires the single-frame living body detection model, the terminal needs to train the single-frame living body detection model. The training process of the single-frame living body detection model includes: the server obtains a training sample set, the training sample set includes at least one set of sample data sets; and the error back propagation algorithm is used to train the at least one set of sample data sets to obtain a single-frame living body detection model.
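The training step described above can be illustrated with a deliberately tiny stand-in. The patent trains a neural network with error back-propagation on (sample first face image, sample second face image, label) triples; the sketch below substitutes logistic regression on pre-extracted feature vectors, which is gradient-based training in its simplest form. The features, learning rate and epoch count are all assumptions for illustration.

```python
import math

def train_single_frame_detector(samples, epochs=200, lr=0.5):
    """Toy stand-in for training the single-frame liveness model.

    Each sample is (features, label) with label 1 = living body; the
    features would be derived from the first and second face images.
    Trains logistic regression by gradient descent on the log loss.
    """
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted liveness probability
            err = p - y                      # gradient of the log loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_live(model, x):
    """Score a feature vector; True indicates a living body."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z)) > 0.5
```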
步骤504,当第一差值和第二差值不满足预设条件时,输出得到第一检测结果,第一检测结果用于指示待检测对象为非活体。 Step 504, when the first difference value and the second difference value do not meet the preset conditions, output a first detection result, and the first detection result is used to indicate that the object to be detected is a non-living body.
当第一差值和第二差值不满足预设条件时，表示第一光照强度值与第二光照强度值的差异较大或者第一人脸姿态角度值与第二人脸姿态角度值的差异较大，直接输出得到第一检测结果，第一检测结果用于指示待检测对象为非活体。比如，第一检测结果包括非活体标识。When the first difference and the second difference do not satisfy the preset condition, it indicates that the difference between the first light intensity value and the second light intensity value is large, or that the difference between the first face pose angle value and the second face pose angle value is large; the first detection result is then output directly to indicate that the object to be detected is a non-living body. For example, the first detection result includes a non-living-body identifier.
在另一种可能的实现方式中，第一光照强度信息包括第一光照强度等级，第二光照强度信息包括第二光照强度等级，上述步骤404可以被替换实现成为如下步骤，如图6所示：In another possible implementation, the first light intensity information includes a first light intensity level, and the second light intensity information includes a second light intensity level. The above step 404 can be replaced by the following steps, as shown in FIG. 6:
步骤601,判断第一光照强度等级与第二光照强度等级是否为同一等级,且第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值是否小于第二预设阈值。Step 601: Determine whether the first light intensity level and the second light intensity level are the same level, and whether the absolute value of the difference between the first face pose angle value and the second face pose angle value is less than a second preset threshold.
计算机设备判断第一光照强度等级与第二光照强度等级是否为同一等级，且第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值是否小于第二预设阈值，若第一光照强度等级与第二光照强度等级为同一等级，且第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值小于第二预设阈值，则执行步骤602，若第一光照强度等级与第二光照强度等级不是同一等级，或者第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值大于或者等于第二预设阈值，则执行步骤603。The computer device determines whether the first light intensity level and the second light intensity level are the same level, and whether the absolute value of the difference between the first face pose angle value and the second face pose angle value is smaller than the second preset threshold. If the two light intensity levels are the same level and the absolute value of the pose-angle difference is smaller than the second preset threshold, step 602 is executed; if the two light intensity levels are not the same level, or the absolute value of the pose-angle difference is greater than or equal to the second preset threshold, step 603 is executed.
步骤602，当第一光照强度等级与第二光照强度等级为同一等级，且第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值小于第二预设阈值时，将第一人脸图像和第二人脸图像输入至训练好的单帧活体检测模型中，输出得到活体检测结果。Step 602: When the first light intensity level and the second light intensity level are the same level, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is smaller than the second preset threshold, input the first face image and the second face image into the trained single-frame liveness detection model, and output the liveness detection result.
当第一光照强度等级与第二光照强度等级为同一等级，且第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值小于第二预设阈值时，表示第一光照强度值与第二光照强度值的差异较小且第一人脸姿态角度值与第二人脸姿态角度值的差异较小，计算机设备需要基于第一人脸图像和第二人脸图像继续进行活体检测。When the first light intensity level and the second light intensity level are the same level, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is smaller than the second preset threshold, it indicates that the difference between the first light intensity value and the second light intensity value is small and the difference between the two face pose angle values is small, so the computer device needs to continue the liveness detection based on the first face image and the second face image.
可选地,计算机设备获取训练好的单帧活体检测模型,将第一人脸图像和第二人脸图像输入至单帧活体检测模型,输出得到活体检测结果。Optionally, the computer device obtains the trained single-frame living body detection model, inputs the first face image and the second face image into the single-frame living body detection model, and outputs the living body detection result.
需要说明的是，计算机设备将第一人脸图像和第二人脸图像输入至单帧活体检测模型，输出得到活体检测结果的过程可类比参考上述实施例中的相关描述，在此不再赘述。It should be noted that the process in which the computer device inputs the first face image and the second face image into the single-frame liveness detection model and outputs the liveness detection result is analogous to the relevant description in the above embodiment, and will not be repeated here.
步骤603，当第一光照强度等级与第二光照强度等级不是同一等级，或者第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值大于或者等于第二预设阈值时，输出得到第一检测结果，第一检测结果用于指示待检测对象为非活体。Step 603: When the first light intensity level and the second light intensity level are not the same level, or the absolute value of the difference between the first face pose angle value and the second face pose angle value is greater than or equal to the second preset threshold, output a first detection result, where the first detection result is used to indicate that the object to be detected is a non-living body.
当第一光照强度等级与第二光照强度等级不是同一等级，或者第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值大于或者等于第二预设阈值时，表示第一光照强度值与第二光照强度值的差异较大或者第一人脸姿态角度值与第二人脸姿态角度值的差异较大，直接输出得到第一检测结果，第一检测结果用于指示待检测对象为非活体。比如，第一检测结果包括非活体标识。When the first light intensity level and the second light intensity level are not the same level, or the absolute value of the difference between the first face pose angle value and the second face pose angle value is greater than or equal to the second preset threshold, it indicates that the difference between the first light intensity value and the second light intensity value is large, or that the difference between the two face pose angle values is large; the first detection result is then output directly to indicate that the object to be detected is a non-living body. For example, the first detection result includes a non-living-body identifier.
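The level-based variant of the consistency check can be sketched as follows. The bucket boundaries used to grade light intensity into levels are assumptions for illustration; the application does not define the grading scheme or the pitch threshold value.

```python
def lux_level(lux):
    """Illustrative light-intensity grading (bucket boundaries assumed)."""
    if lux < 50:
        return "dark"
    if lux < 1000:
        return "indoor"
    return "outdoor"

def check_consistency_by_level(lux_real, lux_pred, pitch_real, pitch_pred,
                               pitch_threshold=10.0):
    """Sketch of step 601: pass only when both light readings fall in the
    same level and the pose-angle difference stays under the second
    preset threshold (threshold value assumed)."""
    same_level = lux_level(lux_real) == lux_level(lux_pred)
    return same_level and abs(pitch_real - pitch_pred) < pitch_threshold
```

A `True` return corresponds to step 602 (run the single-frame model); `False` corresponds to step 603 (directly output the non-living result).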
在一个示意性的例子中，如图7所示，移动终端在启动人脸识别功能后，采集人脸RAW图即第一人脸图像71和经过ISP后的正常人脸图像即第二人脸图像72，并同时通过传感器获取实时的环境光信息73和人脸姿态信息74，其中环境光信息73包括第一光照强度值即lux_实时，人脸姿态信息74包括第一人脸姿态角度值即pitch_实时。移动终端对第二人脸图像72进行人脸检测对齐后，被抠除多余部位后剩余有效人脸图像即第三人脸图像75，对第三人脸图像75进行光照强度的预测得到第二光照强度值即lux_预测，并对第三人脸图像进行人脸姿态角度的预测得到第二人脸姿态角度值即pitch_预测。将lux_实时与lux_预测的差值绝对值确定为第一差值即lux_差值，并将pitch_实时与pitch_预测的差值绝对值确定为pitch_差值，判断lux_差值是否小于第一预设阈值且pitch_差值是否小于第二预设阈值。当lux_差值大于或等于第一预设阈值，或者，pitch_差值大于或等于第二预设阈值时，直接输出的活体检测结果为非活体标识，用于指示待检测对象为非活体。当lux_差值小于第一预设阈值且pitch_差值小于第二预设阈值时，触发单帧人脸活体检测算法流程。将第一人脸图像71和第二人脸图像72输入至单帧活体检测模型76输出得到活体检测结果，活体检测结果包括活体标识和非活体标识中的一种。In a schematic example, as shown in FIG. 7, after starting the face recognition function, the mobile terminal collects the RAW face image, namely the first face image 71, and the normal face image after ISP, namely the second face image 72, and at the same time obtains real-time ambient light information 73 and face pose information 74 through sensors, where the ambient light information 73 includes the first light intensity value, namely lux_realtime, and the face pose information 74 includes the first face pose angle value, namely pitch_realtime. After the mobile terminal performs face detection and alignment on the second face image 72, the valid face image remaining after the redundant parts are cut out is the third face image 75; the light intensity of the third face image 75 is predicted to obtain the second light intensity value, namely lux_prediction, and the face pose angle of the third face image is predicted to obtain the second face pose angle value, namely pitch_prediction. The absolute value of the difference between lux_realtime and lux_prediction is determined as the first difference, namely lux_difference, and the absolute value of the difference between pitch_realtime and pitch_prediction is determined as pitch_difference; it is then judged whether lux_difference is smaller than the first preset threshold and whether pitch_difference is smaller than the second preset threshold. When lux_difference is greater than or equal to the first preset threshold, or pitch_difference is greater than or equal to the second preset threshold, the directly output liveness detection result is a non-living-body identifier, indicating that the object to be detected is a non-living body. When lux_difference is smaller than the first preset threshold and pitch_difference is smaller than the second preset threshold, the single-frame face liveness detection algorithm flow is triggered: the first face image 71 and the second face image 72 are input into the single-frame liveness detection model 76, which outputs the liveness detection result, and the liveness detection result includes one of a living-body identifier and a non-living-body identifier.
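The end-to-end flow of this example can be sketched as a single function. All three callables (the sensor reading, the prediction step and the single-frame model) and both thresholds are placeholders standing in for the components described above, not implementations from the application.

```python
def liveness_pipeline(sensor, predict, single_frame_model,
                      lux_threshold=50.0, pitch_threshold=10.0):
    """Sketch of the FIG. 7 flow. `sensor` supplies the captured multimodal
    readings, `predict` returns (lux_pred, pitch_pred) from the cropped
    face image, and `single_frame_model` scores the RAW/ISP image pair.
    All callables and threshold values are assumed placeholders."""
    lux_rt, pitch_rt = sensor["lux"], sensor["pitch"]
    lux_pred, pitch_pred = predict(sensor["isp_image"])
    if (abs(lux_rt - lux_pred) >= lux_threshold
            or abs(pitch_rt - pitch_pred) >= pitch_threshold):
        return "non-live"  # inconsistent readings: direct rejection
    # Consistent readings: fall through to the single-frame model.
    live = single_frame_model(sensor["raw_image"], sensor["isp_image"])
    return "live" if live else "non-live"
```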
综上所述，本申请提供的人脸活体检测方法，还通过在启动人脸识别功能后，获取待检测对象对应的多模态信息，多模态信息包括初始的第一人脸图像、经过ISP后的第二人脸图像、第一光照强度信息和第一人脸姿态角度值，这几种信息组成多模态信息作为活体检测的输入参数，在一方面，可以缩小攻击范围，增加攻击难度，降低被攻击的概率，攻击者需要获取特定姿态（比如正面）或者是特定光照条件下的照片才能攻击成功，同时也避免了因需要处理多帧图像而导致时延的情况；在另一方面，使得活体检测的输入参数在各个维度的特征信息更加丰富，提高了活体检测结果的准确度。To sum up, the face liveness detection method provided by the present application also obtains, after the face recognition function is activated, the multimodal information corresponding to the object to be detected, where the multimodal information includes the initial first face image, the second face image after ISP, the first light intensity information and the first face pose angle value. Using this combination of information as the input of liveness detection, on the one hand, narrows the attack range, increases the attack difficulty and reduces the probability of being attacked, since an attacker would need a photo taken in a specific pose (for example, frontal) or under specific lighting conditions to succeed; it also avoids the latency caused by processing multiple frames of images. On the other hand, it enriches the feature information of the liveness-detection input parameters in each dimension and improves the accuracy of the liveness detection result.
本申请提供的人脸活体检测方法，还通过对第二人脸图像进行人脸检测对齐后抠出得到第三人脸图像；对第三人脸图像进行光照强度的预测得到第二光照强度信息，并对第三人脸图像进行人脸姿态角度的预测得到第二人脸姿态角度值；根据第二光照强度信息、第二人脸姿态角度值和多模态信息进行活体检测得到活体检测结果，基于获取到的多模态信息、预测的第二光照强度信息和预测的第二人脸姿态角度值进行活体检测，进一步保证了活体检测结果的检测效果。The face liveness detection method provided by the present application also obtains the third face image by performing face detection and alignment on the second face image and then cutting out the face region; predicts the light intensity of the third face image to obtain the second light intensity information, and predicts the face pose angle of the third face image to obtain the second face pose angle value; and performs liveness detection according to the second light intensity information, the second face pose angle value and the multimodal information to obtain the liveness detection result. Performing liveness detection based on the acquired multimodal information, the predicted second light intensity information and the predicted second face pose angle value further guarantees the effectiveness of the liveness detection result.
本申请提供的人脸活体检测方法，还通过先对第一光照强度信息与第二光照强度信息之间的差异，以及第一人脸姿态角度值与第二人脸姿态角度值之间的差异进行判断，若其中一个差异较大则确定待检测对象为非活体，直接输出活体检测结果；若两个差异都较小，则再根据第一人脸图像和第二人脸图像进行后续的人脸活体检测，提高了活体检测效率，进一步保证了活体检测结果的准确度。The face liveness detection method provided by the present application also first judges the difference between the first light intensity information and the second light intensity information, and the difference between the first face pose angle value and the second face pose angle value. If either difference is large, the object to be detected is determined to be a non-living body and the liveness detection result is output directly; if both differences are small, the subsequent face liveness detection is performed according to the first face image and the second face image. This improves the efficiency of liveness detection and further guarantees the accuracy of the liveness detection result.
本申请提供的人脸活体检测方法，还通过在启动人脸识别功能后，在对多模态信息进行活体检测之前加入预处理流程，对预设攻击场景进行预拦截，缩小攻击范围，加强拦截效率和弥补单帧活体检测模型的不足。The face liveness detection method provided by the present application also adds a preprocessing procedure before performing liveness detection on the multimodal information after the face recognition function is activated, so as to pre-intercept preset attack scenes, narrow the attack range, improve the interception efficiency and make up for the shortcomings of the single-frame liveness detection model.
请参考图8,其示出了本申请一个示例性实施例提供的人脸活体检测装置的框图。该人脸活体检测装置可以通过软件、硬件或者两者的结合实现成为图1所示的计算机设备的全部或者一部分。该人脸活体检测装置可以包括:获取模块810和检测模块820。Please refer to FIG. 8 , which shows a block diagram of a face liveness detection apparatus provided by an exemplary embodiment of the present application. The face liveness detection apparatus can be implemented by software, hardware or a combination of the two to become all or a part of the computer equipment shown in FIG. 1 . The face liveness detection apparatus may include: an acquisition module 810 and a detection module 820 .
获取模块810,用于在启动人脸识别功能后,获取待检测对象对应的多模态信息,多模态信息包括人脸图像、环境光信息和人脸姿态信息;The obtaining module 810 is configured to obtain multi-modal information corresponding to the object to be detected after the face recognition function is activated, and the multi-modal information includes a face image, ambient light information and face posture information;
检测模块820,用于根据多模态信息进行活体检测得到活体检测结果,活体检测结果用于指示待检测对象是否为活体。The detection module 820 is configured to perform living body detection according to the multimodal information to obtain a living body detection result, and the living body detection result is used to indicate whether the object to be detected is a living body.
在一种可能的实现方式中，人脸图像包括初始的第一人脸图像和经过图像信号处理后的第二人脸图像，环境光信息包括第一光照强度信息，人脸姿态信息包括第一人脸姿态角度值。In a possible implementation, the face image includes the initial first face image and the second face image after image signal processing, the ambient light information includes the first light intensity information, and the face pose information includes the first face pose angle value.
在另一种可能的实现方式中,检测模块820,还用于:In another possible implementation manner, the detection module 820 is further configured to:
根据第二人脸图像中人脸所在区域,得到第三人脸图像;obtaining a third face image according to the region where the face is located in the second face image;
对第三人脸图像进行光照强度的预测得到第二光照强度信息,并对第三人脸图像进行人脸姿态角度的预测得到第二人脸姿态角度值;Predicting the light intensity of the third face image to obtain the second light intensity information, and predicting the face attitude angle of the third face image to obtain the second face attitude angle value;
根据第二光照强度信息、第二人脸姿态角度值和多模态信息进行活体检测得到活体检测结果。The living body detection result is obtained by performing living body detection according to the second light intensity information, the second face attitude angle value and the multimodal information.
在另一种可能的实现方式中,第一光照强度信息包括第一光照强度值,第二光照 强度信息包括第二光照强度值,检测模块820,还用于:In another possible implementation manner, the first illumination intensity information includes a first illumination intensity value, the second illumination intensity information includes a second illumination intensity value, and the detection module 820 is further configured to:
将第一光照强度值与第二光照强度值的差值绝对值确定为第一差值,并将第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值确定为第二差值;The absolute value of the difference between the first light intensity value and the second light intensity value is determined as the first difference, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is determined as the second difference;
当第一差值和第二差值满足预设条件时,将第一人脸图像和第二人脸图像输入至训练好的单帧活体检测模型中,输出得到活体检测结果;When the first difference and the second difference meet the preset conditions, input the first face image and the second face image into the trained single-frame living detection model, and output the living body detection result;
其中,预设条件包括第一差值小于第一预设阈值且第二差值小于第二预设阈值。Wherein, the preset condition includes that the first difference value is smaller than the first preset threshold value and the second difference value is smaller than the second preset threshold value.
在另一种可能的实现方式中,第一光照强度信息包括第一光照强度值,第二光照强度信息包括第二光照强度值,检测模块820,还用于:In another possible implementation manner, the first illumination intensity information includes a first illumination intensity value, and the second illumination intensity information includes a second illumination intensity value, and the detection module 820 is further configured to:
将第一光照强度值与第二光照强度值的差值绝对值确定为第一差值,并将第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值确定为第二差值;The absolute value of the difference between the first light intensity value and the second light intensity value is determined as the first difference, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is determined as the second difference;
当第一差值和第二差值不满足预设条件时,输出得到第一检测结果,第一检测结果用于指示待检测对象为非活体;When the first difference and the second difference do not meet the preset condition, outputting a first detection result, where the first detection result is used to indicate that the object to be detected is a non-living body;
其中,预设条件包括第一差值小于第一预设阈值且第二差值小于第二预设阈值。Wherein, the preset condition includes that the first difference value is smaller than the first preset threshold value and the second difference value is smaller than the second preset threshold value.
在另一种可能的实现方式中,第一光照强度信息包括第一光照强度等级,第二光照强度信息包括第二光照强度等级,检测模块820,还用于:In another possible implementation manner, the first illumination intensity information includes a first illumination intensity level, and the second illumination intensity information includes a second illumination intensity level. The detection module 820 is further configured to:
当第一光照强度等级与第二光照强度等级为同一等级，且第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值小于第二预设阈值时，将第一人脸图像和第二人脸图像输入至训练好的单帧活体检测模型，输出得到活体检测结果。When the first light intensity level and the second light intensity level are the same level, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is smaller than the second preset threshold, input the first face image and the second face image into the trained single-frame liveness detection model, and output the liveness detection result.
在另一种可能的实现方式中,第一光照强度信息包括第一光照强度等级,第二光照强度信息包括第二光照强度等级,检测模块820,还用于:In another possible implementation manner, the first illumination intensity information includes a first illumination intensity level, and the second illumination intensity information includes a second illumination intensity level. The detection module 820 is further configured to:
当第一光照强度等级与第二光照强度等级不是同一等级，或者第一人脸姿态角度值与第二人脸姿态角度值的差值绝对值大于或者等于第二预设阈值时，输出得到第一检测结果，第一检测结果用于指示待检测对象为非活体。When the first light intensity level and the second light intensity level are not the same level, or the absolute value of the difference between the first face pose angle value and the second face pose angle value is greater than or equal to the second preset threshold, output a first detection result, where the first detection result is used to indicate that the object to be detected is a non-living body.
在另一种可能的实现方式中,该装置还包括:预处理模块;该预处理模块,用于:In another possible implementation manner, the apparatus further includes: a preprocessing module; the preprocessing module is used for:
将多模态信息输入至预处理模型中输出得到场景信息,场景信息用于指示当前检测场景;Inputting the multimodal information into the preprocessing model and outputting the scene information, the scene information is used to indicate the current detection scene;
当场景信息用于指示当前检测场景为预设攻击场景时,对多模态信息进行预拦截处理。When the scene information is used to indicate that the current detection scene is a preset attack scene, pre-interception processing is performed on the multi-modal information.
需要说明的是,上述实施例提供的装置在实现其功能时,仅以上述各个功能模块的划分进行举例说明,实际应用中,可以根据实际需要而将上述功能分配由不同的功能模块完成,即将设备的内容结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。It should be noted that when the device provided in the above embodiment realizes its functions, only the division of the above functional modules is used as an example for illustration. In practical applications, the above functions can be allocated to different functional modules according to actual needs. The content structure of the device is divided into different functional modules to complete all or part of the functions described above.
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。Regarding the apparatus in the above-mentioned embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment of the method, and will not be described in detail here.
本申请实施例提供了一种计算机设备,该计算机设备包括:处理器;用于存储处理器可执行指令的存储器;其中,处理器被配置为执行指令时实现上述的方法。可选地,计算机设备为终端或者服务器。本实施例对此不加以限定。An embodiment of the present application provides a computer device, the computer device includes: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to implement the above method when executing the instructions. Optionally, the computer device is a terminal or a server. This embodiment does not limit this.
An embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the foregoing method.
An embodiment of the present application provides a non-volatile computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the foregoing method is implemented.
A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. For example, computer-readable storage media include, but are not limited to, electrical storage devices, magnetic storage devices, optical storage devices, electromagnetic storage devices, semiconductor storage devices, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), an erasable programmable read-only memory (Electrically Programmable Read-Only Memory, EPROM, or flash memory), a static random access memory (Static Random-Access Memory, SRAM), a portable compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), a digital versatile disc (Digital Video Disc, DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing.
The computer-readable program instructions or code described here may be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device over a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or a network interface in each computing/processing device receives the computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions used to perform the operations of the present application may be assembly instructions, instruction set architecture (Instruction Set Architecture, ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on a user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In a case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (Local Area Network, LAN) or a wide area network (Wide Area Network, WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider). In some embodiments, an electronic circuit, for example, a programmable logic circuit, a field-programmable gate array (Field-Programmable Gate Array, FPGA), or a programmable logic array (Programmable Logic Array, PLA), is customized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present application.
Aspects of the present application are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or the other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps are performed on the computer, the other programmable data processing apparatus, or the other device to produce a computer-implemented process, such that the instructions executed on the computer, the other programmable data processing apparatus, or the other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of apparatuses, systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functionality involved.
It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by hardware that performs the corresponding functions or actions (for example, a circuit or an ASIC (Application Specific Integrated Circuit)), or can be implemented by a combination of hardware and software, such as firmware.
Although the present application is described here in combination with various embodiments, in the process of implementing the claimed application, those skilled in the art can understand and implement other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or another unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Various embodiments of the present application have been described above. The foregoing descriptions are exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used here were chosen to best explain the principles of the embodiments, their practical applications, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.

Claims (10)

  1. A face liveness detection method, wherein the method comprises:
    after a face recognition function is started, acquiring multimodal information corresponding to an object to be detected, the multimodal information comprising a face image, ambient light information, and face pose information; and
    performing liveness detection according to the multimodal information to obtain a liveness detection result, the liveness detection result being used to indicate whether the object to be detected is a living body.
  2. The method according to claim 1, wherein the face image comprises an initial first face image and a second face image obtained after image signal processing, the ambient light information comprises first illumination intensity information, and the face pose information comprises a first face pose angle value.
  3. The method according to claim 2, wherein performing liveness detection according to the multimodal information to obtain a liveness detection result comprises:
    obtaining a third face image according to a region in which a face is located in the second face image;
    predicting an illumination intensity for the third face image to obtain second illumination intensity information, and predicting a face pose angle for the third face image to obtain a second face pose angle value; and
    performing liveness detection according to the second illumination intensity information, the second face pose angle value, and the multimodal information to obtain the liveness detection result.
  4. The method according to claim 3, wherein the first illumination intensity information comprises a first illumination intensity value, the second illumination intensity information comprises a second illumination intensity value, and performing liveness detection according to the second illumination intensity information, the second face pose angle value, and the multimodal information to obtain the liveness detection result comprises:
    determining an absolute value of a difference between the first illumination intensity value and the second illumination intensity value as a first difference, and determining an absolute value of a difference between the first face pose angle value and the second face pose angle value as a second difference; and
    when the first difference and the second difference satisfy a preset condition, inputting the first face image and the second face image into a trained single-frame liveness detection model, and outputting the liveness detection result;
    wherein the preset condition comprises the first difference being smaller than a first preset threshold and the second difference being smaller than a second preset threshold.
  5. The method according to claim 3, wherein the first illumination intensity information comprises a first illumination intensity value, the second illumination intensity information comprises a second illumination intensity value, and performing liveness detection according to the second illumination intensity information, the second face pose angle value, and the multimodal information to obtain the liveness detection result comprises:
    determining an absolute value of a difference between the first illumination intensity value and the second illumination intensity value as a first difference, and determining an absolute value of a difference between the first face pose angle value and the second face pose angle value as a second difference; and
    when the first difference and the second difference do not satisfy a preset condition, outputting a first detection result, the first detection result being used to indicate that the object to be detected is not a living body;
    wherein the preset condition comprises the first difference being smaller than a first preset threshold and the second difference being smaller than a second preset threshold.
  6. The method according to claim 3, wherein the first illumination intensity information comprises a first illumination intensity level, the second illumination intensity information comprises a second illumination intensity level, and performing liveness detection according to the second illumination intensity information, the second face pose angle value, and the multimodal information to obtain the liveness detection result comprises:
    when the first illumination intensity level and the second illumination intensity level are the same level, and the absolute value of the difference between the first face pose angle value and the second face pose angle value is smaller than a second preset threshold, inputting the first face image and the second face image into a trained single-frame liveness detection model, and outputting the liveness detection result.
  7. The method according to claim 3, wherein the first illumination intensity information comprises a first illumination intensity level, the second illumination intensity information comprises a second illumination intensity level, and performing liveness detection according to the second illumination intensity information, the second face pose angle value, and the multimodal information to obtain the liveness detection result comprises:
    when the first illumination intensity level and the second illumination intensity level are not the same level, or the absolute value of the difference between the first face pose angle value and the second face pose angle value is greater than or equal to a second preset threshold, outputting a first detection result, the first detection result being used to indicate that the object to be detected is not a living body.
  8. The method according to claim 1, wherein, before performing liveness detection on the multimodal information to obtain a liveness detection result, the method further comprises:
    inputting the multimodal information into a preprocessing model and outputting scene information, the scene information being used to indicate a current detection scene; and
    when the scene information indicates that the current detection scene is a preset attack scene, performing pre-interception processing on the multimodal information.
  9. A face liveness detection apparatus, wherein the apparatus comprises:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to implement the method according to any one of claims 1 to 8 when executing the instructions.
  10. A non-volatile computer-readable storage medium on which computer program instructions are stored, wherein, when the computer program instructions are executed by a processor, the method according to any one of claims 1 to 8 is implemented.
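Taken together, claims 3 to 5 describe a consistency-gated pipeline: predict illumination and pose from the face image, compare the predictions with the sensor readings, and only run the single-frame liveness model when the two sources agree. The sketch below illustrates that flow; the threshold values and the `predict_illumination`, `predict_pose`, and `single_frame_model` callables are placeholders, not elements fixed by the disclosure (face-region cropping is omitted for brevity).

```python
def liveness_pipeline(face1, face2, sensor_lux, sensor_pose,
                      predict_illumination, predict_pose, single_frame_model,
                      lux_threshold=50.0, pose_threshold=15.0):
    """Consistency-gated liveness detection (sketch of claims 3-5).

    face1: initial (raw) face image; face2: ISP-processed face image.
    sensor_lux / sensor_pose: first illumination intensity value and
    first face pose angle value reported by device sensors.
    The three callables stand in for models the disclosure leaves open;
    the thresholds are illustrative values only.
    """
    # Claim 3: predict illumination and pose from the processed face image.
    predicted_lux = predict_illumination(face2)
    predicted_pose = predict_pose(face2)

    # Claims 4-5: absolute differences between sensors and predictions.
    first_diff = abs(sensor_lux - predicted_lux)
    second_diff = abs(sensor_pose - predicted_pose)

    if first_diff < lux_threshold and second_diff < pose_threshold:
        # Consistent: defer to the trained single-frame liveness model.
        return single_frame_model(face1, face2)
    # Inconsistent: reject as a non-living body without running the model.
    return False
```

The design choice the claims encode is that the expensive learned model is consulted only on samples whose physical context is self-consistent; replayed or injected imagery that contradicts the device's own sensors is rejected cheaply up front.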
PCT/CN2021/134018 2020-11-30 2021-11-29 Face liveness detection method and apparatus, and storage medium WO2022111688A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011380265.0A CN114596638A (en) 2020-11-30 2020-11-30 Face liveness detection method, device and storage medium
CN202011380265.0 2020-11-30

Publications (1)

Publication Number Publication Date
WO2022111688A1 true WO2022111688A1 (en) 2022-06-02

Family

ID=81754025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134018 WO2022111688A1 (en) 2020-11-30 2021-11-29 Face liveness detection method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN114596638A (en)
WO (1) WO2022111688A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119360425A (en) * 2024-09-14 2025-01-24 大湾区大学(筹) Multimodal face anti-counterfeiting recognition method, device and medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512447A (en) * 2022-10-10 2022-12-23 京东科技控股股份有限公司 Living body detection method and device
CN116052285A (en) * 2022-12-27 2023-05-02 展讯通信(天津)有限公司 Liveness detection method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107736874A (en) * 2017-08-25 2018-02-27 百度在线网络技术(北京)有限公司 A kind of method, apparatus of In vivo detection, equipment and computer-readable storage medium
US20180075295A1 (en) * 2016-09-14 2018-03-15 Kabushiki Kaisha Toshiba Detection apparatus, detection method, and computer program product
CN110516644A (en) * 2019-08-30 2019-11-29 深圳前海微众银行股份有限公司 A living body detection method and device
CN111639582A (en) * 2020-05-26 2020-09-08 清华大学 Living body detection method and apparatus
CN111767829A (en) * 2020-06-28 2020-10-13 京东数字科技控股有限公司 Living body detection method, device, system and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693698A (en) * 2012-06-25 2012-09-26 济南大学 Automatic adjusting method and automatic adjusting system for brightness of outdoor LED (Light-Emitting Diode) display screen based on environmental light change
CN104933414B (en) * 2015-06-23 2018-06-05 中山大学 A kind of living body faces detection method based on WLD-TOP
CN108875473A (en) * 2017-06-29 2018-11-23 北京旷视科技有限公司 Living body verification method, device and system and storage medium
CN107992794B (en) * 2016-12-30 2019-05-28 腾讯科技(深圳)有限公司 A kind of biopsy method, device and storage medium
CN107832712A (en) * 2017-11-13 2018-03-23 深圳前海微众银行股份有限公司 Biopsy method, device and computer-readable recording medium
US11804070B2 (en) * 2019-05-02 2023-10-31 Samsung Electronics Co., Ltd. Method and apparatus with liveness detection
CN111985400A (en) * 2020-08-20 2020-11-24 中国建设银行股份有限公司 Face living body identification method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN114596638A (en) 2022-06-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21897188

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21897188

Country of ref document: EP

Kind code of ref document: A1