
CN115512428A - A method, system, device, and storage medium for live face discrimination - Google Patents

A method, system, device, and storage medium for live face discrimination

Info

Publication number
CN115512428A
CN115512428A
Authority
CN
China
Prior art keywords
face
model
image
living body
human face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211420960.4A
Other languages
Chinese (zh)
Other versions
CN115512428B (en)
Inventor
谭明奎
李代远
陈果
杜卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202211420960.4A priority Critical patent/CN115512428B/en
Publication of CN115512428A publication Critical patent/CN115512428A/en
Application granted granted Critical
Publication of CN115512428B publication Critical patent/CN115512428B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face liveness discrimination method, system, device, and storage medium. The method includes: acquiring a target image to be discriminated and obtaining a face detection box for the target image; acquiring a first face image and a second face image from the target image according to the face detection box; inputting the first face image into a first face liveness discrimination model to obtain a first feature and a first output result; inputting the second face image into a second face liveness discrimination model to obtain a second feature and a second output result; concatenating the first feature and the second feature and feeding the result into a fusion weight prediction sub-model to obtain fusion weights for the two liveness discrimination models; and obtaining the final liveness discrimination result from the fusion weights, the first output result, and the second output result. The invention effectively improves liveness discrimination performance under constrained computing resources and storage space, and can be widely applied in the field of image processing.

Description

A method, system, device, and storage medium for face liveness discrimination

Technical Field

The present invention relates to the field of image processing, and in particular to a face liveness discrimination method, system, device, and storage medium.

Background

Face recognition technology is currently widely used in smart security, smart home, financial payment, and other fields because of its convenience. However, since face information is easy to obtain, attackers can reproduce an authorized face image by physical means such as paper printing, replaying on an electronic screen, or even 3D printing, and thereby easily bypass a face recognition system. Such relatively low-cost spoofing poses serious security risks to face recognition systems. To verify the authenticity of face images and ensure system security, face liveness discrimination technology has become a focus of the industry.

Most face recognition applications target edge embedded hardware platforms such as community access control and smart door locks, where computing chip performance, power consumption, and storage space are limited and strict constraints are imposed on model parameter count and computational complexity; complex liveness discrimination models are therefore difficult to deploy on such devices. Furthermore, on the image acquisition side, although introducing a depth camera based on structured light or laser speckle can effectively help resist planar attacks such as printed photos and electronic displays, depth cameras are costly, so acquisition schemes based on a monocular infrared camera are still widely used. How to fully mine, from a single frame under limited computing resources, the local information of the face, the information of the background region, and the contextual relationship between the face and the background is therefore of great significance for the monocular liveness discrimination task.

Summary of the Invention

In order to solve, at least to some extent, one of the technical problems in the prior art, the object of the present invention is to provide a face liveness discrimination method, system, device, and storage medium.

The technical solution adopted by the present invention is as follows:

A face liveness discrimination method, comprising the following steps:

acquiring a target image to be discriminated, and obtaining a face detection box of the target image;

acquiring a first face image and a second face image from the target image according to the face detection box, wherein the first face image is a face image without background and the second face image is a face image including background;

inputting the first face image into a first face liveness discrimination model to obtain a first feature and a first output result;

inputting the second face image into a second face liveness discrimination model to obtain a second feature and a second output result;

concatenating the first feature and the second feature and feeding the result into a fusion weight prediction sub-model to obtain fusion weights for the two face liveness discrimination models;

obtaining the final face liveness discrimination result according to the fusion weights, the first output result, and the second output result.

Further, acquiring the first face image and the second face image from the target image according to the face detection box comprises:

cropping the first face image from the target image according to the face detection box;

enlarging the face detection box to obtain a second detection box;

cropping the second face image from the target image according to the second detection box.

Further, the first face liveness discrimination model and the second face liveness discrimination model share the same basic framework, and both are lightweight network models;

each face liveness discrimination model is built by stacking multiple MobileNetV2 inverted residual blocks.

Further, the first face liveness discrimination model and the second face liveness discrimination model are trained as follows:

inserting a pointwise convolution layer, a batch normalization layer, and a linear rectification (ReLU) layer at the output of an intermediate layer of each of the first and second face liveness discrimination models, to build pixel-level classifiers;

obtaining face images and background-inclusive face images from the training set, and generating pixel-level classification labels for the face images and the background-inclusive face images respectively, matched to the pixel-level classifiers;

inserting a pixel-level classification loss into each of the first and second face liveness discrimination models, which together with the Focal Loss forms the objective loss function;

optimizing the objective loss function according to the binary classification labels and the pixel-level classification labels, and training the first and second face liveness discrimination models according to the objective loss function.

Further, the generation of the pixel-level classification labels comprises:

for the pixel-level classification label of the face image: constructing a label map of the same size as the pixel-level classifier's output; when the face image belongs to the spoof face category, all values of the classification label map are set to 0; when the face image is a genuine face, all values of the classification label map are set to 1;

for the pixel-level classification label of the background-inclusive face image: constructing a label map of the same size as the pixel-level classifier's output; when the input image is a spoofed face, the values inside the spoof face region of the label map are set to 2 and the background portion to 0; when the input image is a genuine face, the values inside the face region are set to 1 and the background portion to 0.

Further, training the first and second face liveness discrimination models according to the objective loss function comprises:

training the first face liveness discrimination model and the second face liveness discrimination model separately;

during training, computing point by point the cross-entropy classification loss between the pixel-level classifier output and the label at the corresponding position in the label map, this loss being denoted $L_{pixel}$, while at the same time computing the Focal Loss between the model's final output and the binary classification label, denoted $L_{cls}$; the objective loss function is defined as:

$L = L_{cls} + \lambda L_{pixel}$

where $\lambda$ is the weight assigned to the pixel-level classification loss function.

Further, the fusion weight prediction sub-model forms its backbone by stacking convolution layers, batch normalization layers, and linear rectification layers, and introduces a channel attention mechanism, these together constituting the basic structure of the fusion weight prediction sub-model;

the fusion weight prediction sub-model is trained as follows:

obtaining the first feature map output by the first face liveness discrimination model and the second feature map output by the second face liveness discrimination model;

concatenating the first feature map and the second feature map, feeding them into the fusion weight prediction sub-model, and outputting the weights for fusing the end outputs of the first and second face liveness discrimination models;

computing the weighted sum of the end outputs of the first and second face liveness discrimination models using these fusion weights, to obtain the final output;

computing the Focal Loss between the final output and the binary classification label, and optimizing this loss with stochastic gradient descent to train the fusion weight prediction sub-model;

wherein the weight parameters of the first and second face liveness discrimination models are frozen during this training.

Another technical solution adopted by the present invention is:

A face liveness discrimination system, comprising:

a target image acquisition and face detection module, configured to acquire a target image to be discriminated and to obtain a face detection box of the target image;

a face image cropping module, configured to acquire a first face image and a second face image from the target image according to the face detection box, wherein the first face image is a face image without background and the second face image is a face image including background;

a liveness discrimination prediction module, configured to input the first face image into the first face liveness discrimination model to obtain a first feature and a first output, and to input the second face image into the second face liveness discrimination model to obtain a second feature and a second output;

a prediction fusion module, configured to concatenate the first feature and the second feature and feed them into the fusion weight prediction sub-model to obtain the fusion weights of the two face liveness discrimination models;

a judgment module, configured to obtain the final face liveness discrimination output according to the fusion weights, the first output, and the second output.

Another technical solution adopted by the present invention is:

A face liveness discrimination device, comprising:

at least one processor;

at least one memory for storing at least one program;

when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method described above.

Another technical solution adopted by the present invention is:

A computer-readable storage medium storing a processor-executable program which, when executed by a processor, performs the method described above.

The beneficial effects of the present invention are as follows: the invention constructs a lightweight liveness discrimination model that can be deployed on edge devices; on the basis of this model, a first liveness discrimination model and a second liveness discrimination model are built, each with its own input images and pixel-level supervision, so that the two models attend to differentiated liveness cues and fuse more effectively; through a fusion weight sub-module, the fusion weights of the predictions of the two models are determined dynamically, making full use of the contextual information of the face region and the image background region and thereby effectively improving liveness discrimination performance under constrained computing resources and storage space.

Brief Description of the Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings of the embodiments of the present invention or of the related prior-art solutions are introduced below. It should be understood that the drawings described below only illustrate some embodiments of the technical solutions of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of a face liveness discrimination method in an embodiment of the present invention;

Fig. 2 is a schematic diagram of the basic framework of the face liveness discrimination model in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the multi-model prediction fusion face liveness discrimination method in an embodiment of the present invention;

Fig. 4 is a schematic diagram of the fusion weight prediction sub-model in an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a face liveness discrimination system in an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a face liveness discrimination device in an embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, where identical or similar reference numerals denote identical or similar elements or elements with identical or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are only intended to explain the present invention, and should not be construed as limiting it. The step numbers in the following embodiments are provided only for convenience of description and do not limit the order between the steps; the execution order of the steps in the embodiments may be adapted according to the understanding of those skilled in the art.

In the description of the present invention, it should be understood that orientation descriptions such as up, down, front, back, left, and right are based on the orientations or positional relationships shown in the drawings, are only intended to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention.

In the description of the present invention, "several" means one or more and "multiple" means two or more; "greater than", "less than", "exceeding", etc. are understood as excluding the stated number, while "above", "below", "within", etc. are understood as including it. If "first" and "second" are used, they are only for distinguishing technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their order.

In the description of the present invention, unless otherwise expressly defined, words such as "set", "install", and "connect" should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of these words in the present invention in light of the specific content of the technical solution.

Explanation of terms:

MobileNetV2: the second-generation lightweight convolutional neural network architecture designed by the Google team.

Inverted residual block: a building component of convolutional neural networks, usually consisting of sequentially stacked pointwise convolution layers, a depthwise separable convolution, a pointwise convolution layer, and a shortcut connection; the channel dimension of the input features is first expanded by a pointwise convolution and then compressed.
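For illustration, the following is a minimal PyTorch sketch of such an inverted residual block; it is a reconstruction of the standard MobileNetV2 component rather than code from the patent, and the expansion ratio and channel counts are illustrative defaults:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style block: pointwise expand -> depthwise conv -> pointwise project."""
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        # The shortcut is only valid when spatial size and channel count are preserved.
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),   # pointwise expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),      # depthwise 3x3 convolution
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),  # pointwise projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out
```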

Focal Loss: a binary classification loss function. In its standard form, with $p$ denoting the model's output probability for the positive class, it is written as:

$FL = -\alpha (1 - p)^{\gamma} \log(p)$ for positive samples, and $FL = -(1 - \alpha)\, p^{\gamma} \log(1 - p)$ for negative samples,

where $\alpha$ balances the two classes and $\gamma$ is the focusing parameter. This loss function dynamically adjusts the influence of easy samples and hard samples on the overall loss.

Binary classification labels: the binary labels are used to train the first and second liveness discrimination models; the label is 1 when the input image is a genuine face and 0 when it is a spoofed face.

As shown in Fig. 1, this embodiment provides a face liveness discrimination and prediction method based on multi-model prediction fusion, to achieve monocular liveness discrimination performance under constrained computing power and storage space. The method specifically includes the following steps:

S1. Acquire a target image to be discriminated, and obtain a face detection box of the target image.

As an optional implementation, the target image can be captured in real time by an image acquisition device, or image file data can be read directly from a storage device. The face detection box of the target image can be obtained by feeding the target image into any existing face detection algorithm model, for example one based on a deep network. The face detection box may be rectangular, circular, or of another shape.

S2. Acquire a first face image and a second face image from the target image according to the face detection box, where the first face image is a face image without background and the second face image is a face image including background.

As an optional implementation, the face image is the image cropped from the target image based on the original face detection box, and the background-inclusive face image is the image cropped from the target image after the face detection box has been appropriately enlarged.

Specifically, given face coordinates $(x_1, y_1, x_2, y_2)$, where $(x_1, y_1)$ denotes the top-left corner of the region containing the face and $(x_2, y_2)$ denotes its bottom-right corner, the two corners determine a face box, and the image content of the target image inside this rectangular region is cropped as the face image. Taking the center of this box as the center, an enlarged rectangular region is determined, and the image content of the target image inside the enlarged region is cropped as the face image containing the background.
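The following sketch illustrates the two crops; it is our own illustration, and the enlargement factor `scale` is an assumed hyperparameter, since the text does not fix a concrete value:

```python
import numpy as np

def crop_face_pair(image: np.ndarray, box, scale=1.5):
    """Return (tight face crop, background-inclusive crop) for a box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    face = image[y1:y2, x1:x2]                    # tight face crop
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2         # box center
    hw, hh = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    h, w = image.shape[:2]
    # Clamp the enlarged box to the image bounds.
    ex1, ey1 = max(int(cx - hw), 0), max(int(cy - hh), 0)
    ex2, ey2 = min(int(cx + hw), w), min(int(cy + hh), h)
    face_with_bg = image[ey1:ey2, ex1:ex2]
    return face, face_with_bg
```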

S3. Input the first face image into the first face liveness discrimination model to obtain a first feature and a first output result; input the second face image into the second face liveness discrimination model to obtain a second feature and a second output result.

S4. Concatenate the first feature and the second feature and feed them into the fusion weight prediction sub-model to obtain the fusion weights of the two face liveness discrimination models.

S5. Obtain the final face liveness discrimination result according to the fusion weights, the first output result, and the second output result.
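In symbols (notation ours): with $y_1$ and $y_2$ denoting the output results of the two liveness discrimination models and $(w_1, w_2)$ the fusion weights predicted in S4, the final output of S5 is the weighted sum

$$y_{\mathrm{final}} = w_1 y_1 + w_2 y_2 .$$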

The multi-scale model prediction fusion liveness discrimination method proposed in this embodiment makes full use of the contextual information of the face region and the image background region, and can effectively improve liveness discrimination performance under constrained computing resources and storage space.

Referring to Fig. 2, as an optional implementation, the first and second face liveness discrimination models share the same basic framework and are both lightweight network models; specifically, each face liveness discrimination model is built by stacking multiple MobileNetV2 inverted residual blocks.

In this embodiment, each face liveness discrimination model contains 4 inverted residual blocks, and the two models together have fewer than 760 KB of parameters, so they can be effectively deployed and applied on edge devices with low computing power, low power consumption, and low memory.

The construction and training of the face liveness discrimination model are explained in detail below with reference to Fig. 2.

Step 1: Build the basic frameworks of the first and second face liveness discrimination models by stacking several MobileNetV2 inverted residual blocks.

Step 2: Insert a pointwise convolution layer, a batch normalization layer, and a linear rectification layer at the intermediate-layer outputs of the first and second face liveness discrimination models to build pixel-level classifiers.

For ease of description, steps 1 and 2 are explained together. As shown in Fig. 2, the basic framework of the face liveness discrimination model is built by stacking four MobileNetV2 inverted residual blocks (the first and second models share the same basic framework and are described together below). Further, a pixel-level classification branch is additionally inserted at the output of the third inverted residual block of the liveness discrimination framework, where Conv1 is a 3x3 convolution layer, Bn is a batch normalization layer, Relu is a linear rectification layer, and Conv2 is a 1x1 convolution layer.
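A condensed PyTorch sketch of this layout follows, reusing the `InvertedResidual` block sketched earlier. It is a reconstruction under assumptions: the channel widths, strides, and head sizes are placeholders rather than values from the patent, and `num_pixel_classes` would be 2 for the face-only model and 3 for the background-inclusive model:

```python
import torch.nn as nn

class LivenessModel(nn.Module):
    def __init__(self, num_pixel_classes=3):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1, bias=False),
                                  nn.BatchNorm2d(16), nn.ReLU6(inplace=True))
        self.block1 = InvertedResidual(16, 24, stride=2)
        self.block2 = InvertedResidual(24, 32, stride=2)
        self.block3 = InvertedResidual(32, 64, stride=2)
        self.block4 = InvertedResidual(64, 96, stride=1)
        # Pixel-level classification branch on the 3rd block's output:
        # Conv1 (3x3) -> Bn -> Relu -> Conv2 (1x1).
        self.pixel_head = nn.Sequential(
            nn.Conv2d(64, 64, 3, 1, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_pixel_classes, 1),
        )
        self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(96, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.stem(x)
        x = self.block2(self.block1(x))
        mid = self.block3(x)
        feat = self.block4(mid)              # feature map later used for fusion
        pixel_logits = self.pixel_head(mid)  # pixel-level predictions
        score = self.cls_head(feat)          # end output: liveness probability
        return feat, pixel_logits, score
```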

Step 3: Process the original training images into face images and background-inclusive face images.

Specifically, the face images and background-inclusive face images can be obtained in the same way as in step S2 above, which is not repeated here.

Step 4: Generate pixel-level classification labels (also called classification label maps) for the face images and the background-inclusive face images respectively.

As an optional implementation, the pixel-level label map is constructed as follows:

First, a label image of the same size as the input image is generated, and its values are determined by the category of the input image. For a face image: when the image is an attack face, all label map values are set to 0; when the input image is a genuine face, all values in the label map are set to 1. For a background-inclusive face image: the values of the background region in the label map are set to 0, the values of a genuine face region to 1, and the values corresponding to a spoofed face region to 2. Finally, the label map is scaled down to the output size of the pixel-level classifier.
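A sketch of this label-map construction (our own illustration; the `face_mask` marking the face region inside the background-inclusive crop is an assumed input, derived in practice from the detection box):

```python
import torch
import torch.nn.functional as F

def make_label_map(h, w, is_real, face_mask=None, out_size=(14, 14)):
    """Build a pixel-level label map and scale it to the classifier's output size."""
    if face_mask is None:                            # tight face crop: uniform label
        label = torch.full((h, w), 1.0 if is_real else 0.0)
    else:                                            # background-inclusive crop
        label = torch.zeros(h, w)                    # background -> 0
        label[face_mask] = 1.0 if is_real else 2.0   # genuine face -> 1, spoof -> 2
    # Nearest-neighbor resizing keeps the labels discrete.
    label = F.interpolate(label[None, None], size=out_size, mode="nearest")
    return label[0, 0].long()
```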

Step 5: Insert a pixel-level classification loss into each of the first and second face liveness discrimination models, forming the objective loss function together with the Focal Loss.

Step 6: Train the first face liveness discrimination model with the face images and binary labels, and the second face liveness discrimination model with the background-inclusive face images and binary labels.

Specifically, the first face liveness discrimination model and the second face liveness discrimination model are trained separately. During training, the cross-entropy classification loss between the pixel-level classifier output and the label at the corresponding position in the label map is computed point by point and denoted $L_{pixel}$; at the same time, the Focal Loss between the model's end output and the binary classification label is computed and denoted $L_{cls}$. To regulate the influence of the pixel-level classification loss on the model, this loss is assigned a weight, so the final objective loss function can be defined as $L = L_{cls} + \lambda L_{pixel}$, where $\lambda$ is the weight of the pixel-level classification loss function. The liveness discrimination models are trained by optimizing the final objective loss function with gradient descent.
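Putting the pieces together, one training step on the objective $L = L_{cls} + \lambda L_{pixel}$ might look like the following sketch; `lam` stands for $\lambda$ and its value is illustrative, while `focal_loss` and the model interface are the helpers sketched earlier:

```python
import torch.nn.functional as F

def training_step(model, optimizer, images, bin_labels, label_maps, lam=0.5):
    """One gradient-descent step on L = L_cls + lam * L_pixel."""
    _, pixel_logits, scores = model(images)
    # Point-by-point cross-entropy against the pixel-level label map.
    l_pixel = F.cross_entropy(pixel_logits, label_maps)
    # Focal Loss between the end output and the binary label.
    l_cls = focal_loss(scores.squeeze(1), bin_labels)
    loss = l_cls + lam * l_pixel
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```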

To help those skilled in the art understand the technical solution provided by the embodiment of the present invention, the technical solution is described in detail below with reference to Fig. 3, taking the construction and training of the fusion weight prediction sub-model as an example.

Step 1: As shown in Fig. 3, build the basic structure of the fusion weight prediction sub-model. Specifically, the backbone is formed by successively stacking convolution layers, batch normalization layers, and linear rectification layers; a channel attention mechanism is introduced to assign importance weights to the feature channels; finally, a global pooling layer and a fully connected structure are appended. The end of the fully connected layer outputs a 2-dimensional vector representing the weights for fusing the end outputs of the first and second face liveness discrimination models.
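A minimal sketch of such a sub-model follows. It is a reconstruction under assumptions: the channel counts are placeholders, a squeeze-and-excitation block stands in for the channel attention described, and the softmax normalization of the two output weights is our choice, not stated in the patent:

```python
import torch
import torch.nn as nn

class FusionWeightPredictor(nn.Module):
    """Concatenated features -> conv trunk -> channel attention -> 2 fusion weights."""
    def __init__(self, in_ch=192, mid_ch=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, 1, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
        )
        # Squeeze-and-excitation style channel attention.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(mid_ch, mid_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch // 4, mid_ch, 1), nn.Sigmoid(),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(mid_ch, 2))

    def forward(self, feat1, feat2):
        x = torch.cat([feat1, feat2], dim=1)    # concatenate along channel dimension
        x = self.trunk(x)
        x = x * self.attn(x)                    # reweight channels by importance
        w = torch.softmax(self.head(x), dim=1)  # two fusion weights summing to 1
        return w
```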

Step 2: First, input the face image data and the background-inclusive face images into the first and second face liveness discrimination models, and extract the feature maps output by the fourth inverted residual block of each model; then, concatenate these feature maps along the channel dimension and feed them into the fusion weight prediction sub-model to obtain the weights for fusing the end outputs of the first and second face liveness discrimination models.

Step 3: Based on the fusion weights, compute the weighted sum of the end outputs of the first and second face liveness discrimination models to obtain the final liveness discrimination output.

Step 4: Optimize the Focal Loss between the final liveness discrimination output and the binary labels with gradient descent to train the weight parameters of the fusion weight prediction sub-model. Note that this step must be performed after the training of the first and second face liveness discrimination models has been completed; furthermore, the weights of the two models must be frozen during sub-model training, and only the parameters of the fusion weight prediction sub-model are updated.
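A sketch of this training stage, building on the helpers above (illustrative only; the freezing is done here by disabling gradients on the two backbone models):

```python
def train_fusion_step(model1, model2, fuser, optimizer, faces, faces_bg, bin_labels):
    """Train only the fusion sub-model; the two liveness models stay frozen."""
    for m in (model1, model2):
        for p in m.parameters():
            p.requires_grad_(False)             # freeze backbone weights
    feat1, _, score1 = model1(faces)
    feat2, _, score2 = model2(faces_bg)
    w = fuser(feat1, feat2)                     # (N, 2) fusion weights
    final = w[:, 0] * score1.squeeze(1) + w[:, 1] * score2.squeeze(1)
    loss = focal_loss(final, bin_labels)        # Focal Loss vs. binary labels
    optimizer.zero_grad()
    loss.backward()                             # gradients flow only into fuser
    optimizer.step()
    return loss.item()
```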

The above algorithm was deployed on a near-infrared intelligent face recognition module; performance comparisons are shown in Tables 1 and 2:

Table 1: Performance comparison, on near-infrared data, between the individual models and the fused model

| Model | tpr@fpr=0.1% (↑) | tpr@fpr=0.01% (↑) | tpr@fpr=0.001% (↑) | Model size (M) |
|---|---|---|---|---|
| First liveness discrimination model | 84.82 | 65.57 | 46.71 | 0.34 |
| Second liveness discrimination model | 88.68 | 69.37 | 47.76 | 0.34 |
| Multi-model fusion | 97.88 | 94.89 | 84.34 | 0.7 |

Table 2: Overall performance comparison, on near-infrared data, with models of comparable parameter count

| Model | tpr@fpr=0.1% | tpr@fpr=0.01% | tpr@fpr=0.001% | Params (M) |
|---|---|---|---|---|
| ResNet18 | 88.10 | 73.54 | \ | 11.6 |
| CDCNpp | 83.43 | 36.55 | 12.82 | 2.26 |
| CDCN | 95.74 | 90.42 | 80.75 | 2.24 |
| First liveness discrimination model | 95.96 | 86.75 | 53.70 | 0.34 |
| Second liveness discrimination model | 94.71 | 86.77 | 73.41 | 0.34 |
| Multi-model fusion (weighted average) | 97.73 | 94.43 | 80.27 | 0.68 |
| Multi-model fusion (fusion prediction module) | 97.88 | 94.89 | 84.34 | 0.7 |

As shown by Tables 1 and 2, the measured performance confirms that the technical solution provided by the embodiment of the present invention is practical and feasible.

In summary, compared with the prior art, this embodiment has the following advantages and beneficial effects:

(1) The embodiment of the present invention is implemented with two ultra-lightweight near-infrared liveness discrimination models whose overall parameter size is under 760 KB, so it can be deployed and applied on edge devices with low computing power, low power consumption, and low memory.

(2) In the model training stage, the embodiment uses face inputs of different scales and different supervision labels for the first and second face liveness discrimination models, so that the first model attends to facial detail while the second attends to the contextual relationship between the face and the background; this increases the diversity between the two models and effectively improves the performance of their fusion.

(3) In the prediction fusion stage, the embodiment designs a fusion weight sub-module to dynamically determine the fusion weights of the predictions of the first and second liveness discrimination models; furthermore, a channel attention mechanism is introduced into the fusion weight sub-module, enabling the model to explicitly account for the contribution of face image information at different scales to the liveness discrimination task and further improving the performance of multi-model prediction fusion.

As shown in Fig. 5, this embodiment also provides a face liveness discrimination system, comprising:

a target image acquisition and face detection module 101, configured to acquire a target image to be discriminated and to obtain a face detection box of the target image;

a face image cropping module 102, configured to acquire a first face image and a second face image from the target image according to the face detection box, wherein the first face image is a face image without background and the second face image is a face image including background;

a liveness discrimination prediction module 103, configured to input the first face image into the first face liveness discrimination model to obtain a first feature and a first output, and to input the second face image into the second face liveness discrimination model to obtain a second feature and a second output;

a prediction fusion module 104, configured to concatenate the first feature and the second feature and feed them into the fusion weight prediction sub-model to obtain the fusion weights of the two face liveness discrimination models;

a judgment module 105, configured to obtain the final face liveness discrimination output according to the fusion weights, the first output, and the second output.

The face liveness discrimination system of this embodiment can execute the face liveness discrimination method provided by the method embodiment of the present invention, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects of the method.

Referring to Fig. 6, this embodiment also provides a face liveness discrimination device, comprising:

a near-infrared image collector 201 for capturing near-infrared images;

at least one memory 202 for storing a computer program and near-infrared image data;

at least one processor 203, configured to implement, when executing the computer program, the steps of the multi-model prediction fusion liveness discrimination method shown in Fig. 1.

The face liveness discrimination device of this embodiment can execute the face liveness discrimination method provided by the method embodiment of the present invention, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects of the method.

The embodiment of the present application also discloses a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the method shown in Fig. 1.

This embodiment also provides a storage medium storing instructions or a program capable of executing the face liveness discrimination method provided by the method embodiment of the present invention; when the instructions or program are run, any combination of the implementation steps of the method embodiment can be executed, with the corresponding functions and beneficial effects of the method.

In some alternative implementations, the functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the invention is described in the context of functional modules, it should be understood that, unless stated otherwise, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to understand the present invention. Rather, given the attributes, functions, and internal relationships of the various functional modules in the devices disclosed herein, the actual implementation of the modules will be within the routine skill of an engineer. Those skilled in the art can therefore implement the invention set forth in the claims without undue experimentation using ordinary techniques. It is also to be understood that the particular concepts disclosed are illustrative only and are not intended to limit the scope of the invention, which is determined by the appended claims and the full scope of their equivalents.

If the functions described above are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention, the part that contributes to the prior art, or a part of the technical solution can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The logic and/or steps represented in the flowcharts or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, device, or apparatus (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, device, or apparatus). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, device, or apparatus.

More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection with one or more wires (an electronic device), a portable computer diskette (a magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable way if necessary, and then stored in a computer memory.

It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the embodiments described above, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one of, or a combination of, the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.

In the above description of this specification, descriptions referring to the terms "one embodiment/example", "another embodiment/example", or "certain embodiments/examples" mean that a particular feature, structure, material, or characteristic described in conjunction with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the present invention; the scope of the invention is defined by the claims and their equivalents.

The preferred implementations of the present invention are described in detail above, but the present invention is not limited to the above embodiments; those familiar with the art can also make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are all included within the scope defined by the claims of the present application.

Claims (10)

1. A live face discrimination method, characterized by comprising the following steps:
acquiring a target image to be discriminated, and obtaining a face detection box for the target image;
obtaining a first face image and a second face image from the target image according to the face detection box, wherein the first face image is a face image without background and the second face image is a face image with background;
inputting the first face image into a first live face discrimination model to obtain a first feature and a first output result;
inputting the second face image into a second live face discrimination model to obtain a second feature and a second output result;
concatenating the first feature and the second feature and inputting the concatenated features into a fusion weight predictor model to obtain fusion weights for the two live face discrimination models; and
obtaining a final live face discrimination result according to the fusion weights, the first output result, and the second output result.
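For concreteness, a minimal PyTorch-style sketch of the two-branch inference pipeline of claim 1 follows. All names here (`detector`, `model_face`, `model_context`, `fusion_predictor`, `crop_faces`) are hypothetical stand-ins rather than identifiers from the patent; `crop_faces` is sketched after claim 2, and preprocessing details (resizing, normalization, tensor conversion) are omitted.

```python
import torch

def discriminate_live_face(target_image, detector, model_face, model_context,
                           fusion_predictor):
    """Sketch of the two-branch fusion inference of claim 1 (names are assumed)."""
    box = detector(target_image)                           # face detection box
    face_img, context_img = crop_faces(target_image, box)  # claim 2; sketched below

    feat1, out1 = model_face(face_img)        # branch 1: face without background
    feat2, out2 = model_context(context_img)  # branch 2: face with background

    # Concatenate the two features along the channel dimension and
    # predict one fusion weight per branch.
    weights = fusion_predictor(torch.cat([feat1, feat2], dim=1))  # shape (B, 2)

    # Final result: weighted sum of the two branch outputs.
    return weights[:, 0:1] * out1 + weights[:, 1:2] * out2
```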
2. The live face discrimination method according to claim 1, wherein obtaining a first face image and a second face image from the target image according to the face detection box comprises:
cropping the first face image from the target image according to the face detection box;
enlarging the face detection box to obtain a second detection box; and
cropping the second face image from the target image according to the second detection box.
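A minimal sketch of the two crops in claim 2, assuming the detection box is given in pixel coordinates and the image is a NumPy array in HWC layout. The enlargement factor (`scale=1.5` here) is an assumed hyperparameter; the claim does not fix its value.

```python
import numpy as np

def crop_faces(image, box, scale=1.5):
    """Crop a background-free face image and a background-included face image.

    `box` is (x1, y1, x2, y2) in pixels; `scale` is an assumed enlargement
    factor, not a value taken from the patent.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box

    # First face image: crop exactly the detection box (no background).
    face_img = image[y1:y2, x1:x2]

    # Second detection box: enlarge the original box around its center,
    # clamped to the image boundaries.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    bw, bh = (x2 - x1) * scale, (y2 - y1) * scale
    nx1 = int(max(cx - bw / 2, 0))
    ny1 = int(max(cy - bh / 2, 0))
    nx2 = int(min(cx + bw / 2, w))
    ny2 = int(min(cy + bh / 2, h))

    # Second face image: crop the enlarged box (face plus surrounding background).
    context_img = image[ny1:ny2, nx1:nx2]
    return face_img, context_img
```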
3. The live face discrimination method according to claim 1, wherein the first live face discrimination model and the second live face discrimination model share the same basic architecture and are both lightweight network models;
each live face discrimination model is formed by stacking a plurality of MobileNetV2 inverted residual blocks.
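The sketch below reproduces the standard MobileNetV2 inverted residual block (1x1 pointwise expansion, 3x3 depthwise convolution, linear 1x1 projection, with a residual connection when shapes match). The stacking depth and channel widths used in the patent's lightweight models are not specified, so the parameters here are illustrative only.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Standard MobileNetV2 inverted residual block (illustrative parameters)."""

    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection (no activation)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_residual else self.block(x)
```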
4. The live face discrimination method according to claim 3, wherein the first live face discrimination model and the second live face discrimination model are trained as follows:
inserting a pointwise convolution layer, a batch normalization layer, and a linear rectification layer at the output of an intermediate layer of each of the first and second live face discrimination models to construct a pixel-level classifier;
obtaining face images and face images containing background from a training set, and generating pixel-level classification labels for the face images and for the face images containing background, respectively, matching the output of the pixel-level classifier;
adding a pixel-level classification loss to the first and second live face discrimination models, which together with the Focal Loss forms the objective loss function; and
optimizing the objective loss function according to the binary classification labels and the pixel-level classification labels, and training the first and second live face discrimination models according to the objective loss function.
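A sketch of the auxiliary pixel-level classifier of claim 4: a pointwise (1x1) convolution, batch normalization, and a rectified linear unit attached to an intermediate feature map, followed here by a 1x1 classification convolution so that every spatial position receives a class score. The channel widths and the three-class output (background, real face, spoofed face, matching the label scheme of claim 5) are assumptions for illustration.

```python
import torch.nn as nn

class PixelLevelClassifier(nn.Module):
    """Auxiliary head inserted at an intermediate layer (claim 4).

    Channel sizes and the 3-class output (0: background, 1: real face,
    2: spoofed face) follow the label scheme of claim 5 but are
    otherwise illustrative assumptions.
    """

    def __init__(self, in_ch, num_classes=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=1, bias=False),  # pointwise conv
            nn.BatchNorm2d(in_ch),                               # batch normalization
            nn.ReLU(inplace=True),                               # linear rectification
            nn.Conv2d(in_ch, num_classes, kernel_size=1),        # per-pixel class scores
        )

    def forward(self, feat):
        # Output shape: (B, num_classes, H, W), one score map per class.
        return self.head(feat)
```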
5. The live face discrimination method according to claim 4, wherein the pixel-level classification labels are generated as follows:
for the pixel-level classification label of a face image: construct a label map of the same size as the output of the pixel-level classifier; when the face image belongs to the spoofed-face category, set all values of the classification label map to 0; when the face image is a real face, set all values of the classification label map to 1;
for the pixel-level classification label of a face image containing background: construct a label map of the same size as the output of the pixel-level classifier; when the input image is a spoofed face, the values in the spoofed-face region of the label map are 2 and the values of the background portion are 0; when the input image is a real face, the values in the face region of the label map are 1 and the values of the background portion are 0.
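The label maps of claim 5 can be generated as below. Here `map_size` is the spatial size of the pixel-level classifier output, and `face_mask` is a hypothetical binary mask marking the face region inside the background-included crop; the claim does not say how that region is obtained, and projecting the original detection box into the enlarged crop is one plausible choice.

```python
import torch

def face_label_map(map_size, is_real):
    """Pixel-level label for the background-free face image (claim 5):
    all 1s for a real face, all 0s for a spoofed face."""
    return torch.full(map_size, 1 if is_real else 0, dtype=torch.long)

def context_label_map(face_mask, is_real):
    """Pixel-level label for the background-included face image (claim 5):
    background pixels are 0; face-region pixels are 1 if real, 2 if spoofed.
    `face_mask` is a {0, 1} tensor of the classifier's output size marking
    the face region (assumed to be available)."""
    face_value = 1 if is_real else 2
    return (face_mask * face_value).long()
```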
6. The live face discrimination method according to claim 5, wherein training the first live face discrimination model and the second live face discrimination model according to the objective loss function comprises:
training the first live face discrimination model and the second live face discrimination model separately;
during training, computing point by point the cross-entropy classification loss between the output of the pixel-level classifier and the labels at the corresponding positions in the label map, denoted $L_{pixel}$; simultaneously computing the Focal Loss between the model's final output and the binary classification label, denoted $L_{Focal}$; the objective loss function is then defined as:

$L = L_{Focal} + \lambda \cdot L_{pixel}$

where $\lambda$ is the given weight of the pixel-level classification loss.
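A sketch of the objective of claim 6, combining per-pixel cross-entropy with a Focal Loss on the final binary output. The Focal Loss here is the standard Lin et al. formulation with illustrative `alpha`/`gamma` values; the patent only names the loss, so these hyperparameters and the value of `lam` are assumptions.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Standard binary Focal Loss (Lin et al.); hyperparameters are assumed."""
    ce = F.binary_cross_entropy_with_logits(logits, targets.float(), reduction="none")
    p_t = torch.exp(-ce)                                  # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

def objective_loss(final_logits, binary_labels, pixel_logits, pixel_labels, lam=0.5):
    """L = L_Focal + lambda * L_pixel, as reconstructed from claim 6.

    `lam` is the given pixel-level loss weight; its value here is illustrative.
    """
    # Point-by-point cross entropy between the pixel classifier output
    # (B, C, H, W) and the label map (B, H, W).
    l_pixel = F.cross_entropy(pixel_logits, pixel_labels)
    l_focal = focal_loss(final_logits, binary_labels)
    return l_focal + lam * l_pixel
```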
7. The live face discrimination method according to claim 1, wherein the basic structure of the fusion weight predictor model is constructed by stacking convolution layers, batch normalization layers, and linear rectification layers to form a backbone and introducing a channel attention mechanism;
the fusion weight predictor model is trained as follows:
obtaining a first feature map output by the first live face discrimination model and a second feature map output by the second live face discrimination model;
concatenating the first feature map and the second feature map, inputting the concatenated feature maps into the fusion weight predictor model, and outputting weighted fusion weights for the final outputs of the first and second live face discrimination models;
computing a weighted sum of the final outputs of the first and second live face discrimination models using the weighted fusion weights to obtain the final output;
computing the Focal Loss between the final output and the binary classification labels, and optimizing this loss with stochastic gradient descent to train the fusion weight predictor model;
wherein during training, the weight parameters of the first and second live face discrimination models are frozen.
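A sketch of claim 7's fusion weight predictor: stacked conv-BN-ReLU layers plus a squeeze-and-excitation style channel attention block, pooled to a two-way softmax weight. The patent only names a channel attention mechanism, so the SE-style block, the layer counts, and the channel widths are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed variant)."""

    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # squeeze: global average pool
            nn.Conv2d(ch, ch // reduction, 1),       # excitation: bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # reweight channels

class FusionWeightPredictor(nn.Module):
    """Predicts the two fusion weights from concatenated branch features (claim 7)."""

    def __init__(self, in_ch, mid_ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            ChannelAttention(mid_ch),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(mid_ch, 2)

    def forward(self, fused_feat):
        x = self.body(fused_feat).flatten(1)
        # Softmax keeps the two fusion weights positive and summing to 1.
        return torch.softmax(self.fc(x), dim=1)
```

During its training, the two discrimination models would be frozen (for instance `p.requires_grad_(False)` on their parameters) and only the predictor optimized with stochastic gradient descent, per the claim.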
8. A live face discrimination system, characterized by comprising:
a target image acquisition and face detection module for acquiring a target image to be discriminated and obtaining a face detection box for the target image;
a face image cropping module for obtaining a first face image and a second face image from the target image according to the face detection box, wherein the first face image is a face image without background and the second face image is a face image with background;
a liveness discrimination and prediction module for inputting the first face image into a first live face discrimination model to obtain a first feature and a first output, and inputting the second face image into a second live face discrimination model to obtain a second feature and a second output;
a prediction fusion module for concatenating the first feature and the second feature and inputting the concatenated features into a fusion weight predictor model to obtain fusion weights for the two live face discrimination models; and
a discrimination module for obtaining the final live face discrimination output according to the fusion weights, the first output, and the second output.
9. A live face discrimination apparatus, characterized by comprising:
at least one processor; and
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a processor-executable program, wherein the processor-executable program, when executed by a processor, is configured to perform the method according to any one of claims 1 to 7.
CN202211420960.4A 2022-11-15 2022-11-15 A method, system, device, and storage medium for live face discrimination Active CN115512428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211420960.4A CN115512428B (en) 2022-11-15 2022-11-15 A method, system, device, and storage medium for live face discrimination

Publications (2)

Publication Number Publication Date
CN115512428A 2022-12-23
CN115512428B CN115512428B (en) 2023-05-23

Family

ID=84513643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211420960.4A Active CN115512428B (en) 2022-11-15 2022-11-15 A method, system, device, and storage medium for live face discrimination

Country Status (1)

Country Link
CN (1) CN115512428B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818313A (en) * 2017-11-20 2018-03-20 腾讯科技(深圳)有限公司 Vivo identification method, device, storage medium and computer equipment
US20210082136A1 (en) * 2018-12-04 2021-03-18 Yoti Holding Limited Extracting information from images
CN111126366A (en) * 2020-04-01 2020-05-08 湖南极点智能科技有限公司 A method, device, device and storage medium for discriminating a living human face
CN111666901A (en) * 2020-06-09 2020-09-15 创新奇智(北京)科技有限公司 Living body face detection method and device, electronic equipment and storage medium
CN114694196A (en) * 2020-12-11 2022-07-01 香港城市大学深圳研究院 Living body classifier establishment method, face living body detection method and device
CN115082994A (en) * 2022-06-27 2022-09-20 平安银行股份有限公司 Face living body detection method, and training method and device of living body detection network model

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118155284A (en) * 2024-03-20 2024-06-07 飞虎互动科技(北京)有限公司 Signature action detection method, signature action detection device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN115512428B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Kadam et al. Detection and localization of multiple image splicing using MobileNet V1
CN112597941A (en) Face recognition method and device and electronic equipment
CN110188829B (en) Neural network training method, target recognition method and related products
CN110909591B (en) Adaptive Non-Maximum Suppression Processing Method for Pedestrian Image Detection Using Coded Vectors
CN113128481A (en) Face living body detection method, device, equipment and storage medium
CN114842411B (en) A group behavior recognition method based on complementary spatiotemporal information modeling
CN111932577B (en) Text detection method, electronic device and computer readable medium
CN114463805B (en) Deep forgery detection method, device, storage medium and computer equipment
CN111311611A (en) Real-time three-dimensional large-scene multi-object instance segmentation method
CN115937596A (en) Target detection method, training method and device of model thereof, and storage medium
CN117649621A (en) Fake video detection method, device and equipment
CN114155524B (en) Single-stage 3D point cloud target detection method and device, computer equipment, and medium
CN115512428A (en) A method, system, device, and storage medium for live face discrimination
CN113569684B (en) Short video scene classification method, system, electronic equipment and storage medium
CN118658086A (en) A method for detecting small marine vessels using UAVs based on a fusion framework
CN118366205A (en) A lightweight face tracking method and system based on attention mechanism
CN116091524B (en) Detection and segmentation method for target in complex background
CN115188082B (en) Training method, device, equipment and storage medium of face fake identification model
CN116824330A (en) Small sample cross-domain target detection method based on deep learning
CN116630758A (en) Space-time action detection method based on double-branch multi-stage feature fusion
CN111079704A (en) A face recognition method and device based on quantum computing
CN119919739B (en) Small target detection method, device and storage medium
CN118314442B (en) Transmission tower foreign matter hidden danger identification method based on deep learning
CN113822375B (en) Improved traffic image target detection method
CN114127804B (en) Method, training method, device and equipment for recognizing object sequences in images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant