
CN113158773B - Training method and training device for living body detection model - Google Patents


Info

Publication number
CN113158773B
CN113158773B (application CN202110245292.5A)
Authority
CN
China
Prior art keywords
image
convolution network
living body
network
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110245292.5A
Other languages
Chinese (zh)
Other versions
CN113158773A (en)
Inventor
张兴
谢思敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TP Link Technologies Co Ltd
Original Assignee
TP Link Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TP Link Technologies Co Ltd filed Critical TP Link Technologies Co Ltd
Priority to CN202110245292.5A
Publication of CN113158773A
Application granted
Publication of CN113158773B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The application pertains to the technical field of image processing and provides a training method and a training device for a living body detection model. The training method comprises the following steps: acquiring an initial model and a sample data set, the initial model comprising a first convolution network, a second convolution network and a third convolution network, where the first convolution network and the second convolution network form a first branch network and the first convolution network and the third convolution network form a second branch network; inputting the sample data set into the first branch network and the second branch network, calculating a joint loss function, and training the second convolution network according to the joint loss function to obtain a trained second convolution network; and taking the first convolution network and the trained second convolution network as the target living body detection model. In this process, the third convolution network in the second branch network is used only to assist in training the second convolution network, which preserves the detection accuracy of the target living body detection model while reducing unnecessary computation and improving living body detection efficiency.

Description

Training method and training device for living body detection model
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a training method and a training device for a living body detection model.
Background
Living body detection is a method of determining the real physiological characteristics of a subject in certain identity authentication scenarios. It can effectively resist common attack means such as printed photos, face swapping, masks, occlusion and screen replays, thereby helping users screen out fraudulent behaviour and safeguarding their interests.
Existing living body detection technology often combines multiple models to perform living body detection. However, running multiple models requires more computing resources, resulting in lower living body detection efficiency. This is a technical problem that needs to be solved.
Disclosure of Invention
In view of this, the embodiments of the present application provide a training method for a living body detection model, a living body detection method, a training device for a living body detection model, a living body detection device, a terminal device, living body detection equipment, and a computer-readable storage medium, which can solve the technical problem that running multiple models requires more computing resources and thus lowers living body detection efficiency.
A first aspect of an embodiment of the present application provides a training method for a living body detection model, the training method including:
acquiring an initial model and a sample data set; the initial model comprises a first convolution network, a second convolution network and a third convolution network; the first convolution network and the second convolution network form a first branch network; the first convolution network and the third convolution network form a second branch network;
Inputting the sample data set into the first branch network and the second branch network, calculating a joint loss function, and training the second convolution network according to the joint loss function to obtain a trained second convolution network;
and taking the first convolution network and the trained second convolution network as target living body detection models.
A second aspect of embodiments of the present application provides a living body detection method, including:
acquiring an image to be detected;
inputting the image to be detected into a target living body detection model to obtain a detection result output by the target living body detection model; the detection result is used for representing the probability that the image to be detected is a living body face image; the target living body detection model is a first branch network in an initial model, and the first branch network comprises a first convolution network and a second convolution network; the initial model also comprises a second branch network; the first branch network and the second branch network are used for jointly training the second convolution network;
and determining whether the image to be detected is a living human face image or not according to the detection result.
A third aspect of the embodiments of the present application provides a training device for a living body detection model, the training device including:
A first acquisition unit for acquiring an initial model and a sample data set; the initial model comprises a first convolution network, a second convolution network and a third convolution network; the first convolution network and the second convolution network form a first branch network; the first convolution network and the third convolution network form a second branch network;
the training unit is used for inputting the sample data set into the first branch network and the second branch network, calculating a joint loss function, training the second convolution network according to the joint loss function, and obtaining a trained second convolution network;
and the clipping unit is used for taking the first convolution network and the trained second convolution network as a target living body detection model.
A fourth aspect of embodiments of the present application provides an apparatus for in vivo detection, the apparatus comprising:
the second acquisition unit is used for acquiring the image to be detected;
the detection unit is used for inputting the image to be detected into a target living body detection model to obtain a detection result output by the target living body detection model; the detection result is used for representing the probability that the image to be detected is a living body face image; the target living body detection model is a first branch network in an initial model, and the first branch network comprises a first convolution network and a second convolution network; the initial model also comprises a second branch network; the first branch network and the second branch network are used for jointly training the second convolution network;
And the judging unit is used for determining whether the image to be detected is a living body face image or not according to the detection result.
A fifth aspect of the embodiments of the present application provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method of the first aspect described above when executing the computer program.
A sixth aspect of the embodiments of the present application provides a living body detection apparatus, including an image capturing module, a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method described in the second aspect above when the computer program is executed.
A seventh aspect of the embodiments of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first or second aspect described above.
Compared with the prior art, the embodiment of the application has the beneficial effects that: the application jointly trains a second convolutional network in the first branch network through the first branch network and the second branch network in the initial model. And using the first convolution network and the trained second convolution network as a target living body detection model. In the process, the third convolution network in the second branch network is only used for assisting in training the second convolution network, so that the excellent detection precision of the target living body detection model is ensured, unnecessary calculated amount is reduced, and living body detection efficiency is improved.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings can be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 shows a schematic flow chart of a training method of a living body detection model provided by the application;
FIG. 2 shows a specific schematic flowchart of step 101 in a training method of a living body detection model provided in the present application;
FIG. 3 shows a specific schematic flowchart of step A2 in a training method of a living body detection model provided by the application;
FIG. 4 shows a specific schematic flowchart of step A3 in a training method of a living body detection model provided by the application;
FIG. 5 shows a specific schematic flowchart of step 102 in a method for training a living body detection model provided in the present application;
FIG. 6 shows a schematic diagram of an initial model provided herein;
FIG. 7 shows a schematic flowchart of a living body detection method provided by the application;
FIG. 8 shows a specific schematic flowchart of step 701 in a living body detection method provided by the application;
FIG. 9 shows a specific schematic flowchart of step 703 in a living body detection method provided by the application;
FIG. 10 shows a specific schematic flowchart of another living body detection method provided by the application;
FIG. 11 shows a schematic diagram of a training device for a living body detection model provided by the application;
FIG. 12 shows a schematic diagram of a living body detection device provided by the application;
fig. 13 is a schematic diagram of a terminal device according to an embodiment of the present invention;
fig. 14 is a schematic view of a living body detecting apparatus according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
For a better understanding of the technical problem solved by the present application, the above background art is further described herein:
the existing living body detection technology is often matched with various models to realize living body detection. For example: and inputting the image to be identified into a depth estimation model, and carrying out depth estimation through the depth estimation model to obtain a depth image. And identifying the depth image and the image to be identified based on the living body detection model to obtain a living body detection result.
Most living body detection devices have limited hardware resources, so running multiple models on them is time-consuming and inefficient, and real-time detection cannot be achieved.
In view of the above, embodiments of the present application provide a method for training a living body detection model, a method for living body detection, a training apparatus for a living body detection model, an apparatus for living body detection, a terminal device, a living body detection device, and a computer-readable storage medium, which can solve the above-mentioned technical problems.
Wherein the terminal device and the living body detection device may be the same device or different devices.
Referring to fig. 1, fig. 1 is a schematic flowchart of a training method of a living body detection model provided in the present application. As shown in fig. 1, the training method is applied to a terminal device, and includes the following steps:
Step 101, acquiring an initial model and a sample data set; the initial model comprises a first convolution network, a second convolution network and a third convolution network; the first convolution network and the second convolution network form a first branch network; the first convolutional network and the third convolutional network form a second branch network.
It is noted that the initial model and the sample data set differ according to the application scenario; this embodiment does not limit the application scenarios of steps 101 to 103. To better explain the technical solution, this embodiment takes a living body detection scenario as the example application scenario. Those skilled in the art can extend the technical solution for the living body detection scenario, by analogy, to other application scenarios, which are not described here.
Since different network structures in the initial model carry different processing tasks, the hierarchy of the initial model is divided into a first convolution network, a second convolution network and a third convolution network. In a living body detection scenario, the first convolution network is used for feature extraction, the second convolution network is used for detection classification, and the third convolution network is used for reconstructing a Fourier spectrogram. This embodiment uses this three-network division as an example to explain the technical solution.
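As an illustration only, this three-network division might be sketched in PyTorch as follows. The class names, layer counts and channel sizes are assumptions chosen for brevity, not the hierarchies the application actually uses:

    import torch.nn as nn

    class FirstConvNet(nn.Module):      # shared feature extractor (dashed box 1 in fig. 6)
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            )
        def forward(self, x):
            return self.body(x)         # feature data shared by both branches

    class SecondConvNet(nn.Module):     # classification head (dashed box 2)
        def __init__(self, num_classes=2):
            super().__init__()
            self.head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
            )
        def forward(self, feat):
            return self.head(feat)      # living / non-living logits

    class ThirdConvNet(nn.Module):      # Fourier spectrogram reconstruction (dashed box 3)
        def __init__(self):
            super().__init__()
            self.head = nn.Conv2d(64, 1, 3, padding=1)
        def forward(self, feat):
            return self.head(feat)      # predicted Fourier spectrogram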
Taking the living body detection scenario as an example, the sample data set comprises a sample image, an original Fourier spectrogram corresponding to the sample image, and labeling information corresponding to the sample image. The labeling information may be added manually or produced by model recognition, and is used for labeling the image type of the sample image; the image types include, but are not limited to, living body face images and non-living body face images. It will be appreciated that as the number of branch networks in the initial model increases, the types of data in the sample data set increase accordingly.
For sample images, the terminal device can obtain them from an existing database, or collect them through the living body detection device. The latter is preferred: because cameras and application scenes differ, images in an existing database differ to some extent from the images to be identified in the actual application scene, which reduces the detection performance of the trained model in actual applications.
The sample data set may be raw data (including, but not limited to, a first original image, an initial Fourier spectrum, etc.) or preprocessed raw data. To reduce the computation in the training process, the application preprocesses the raw data to obtain the sample data set; the specific process is described in the following optional embodiment:
As an alternative embodiment of the present application, the following steps A1 to A3 are included in step 101. Referring to fig. 2, fig. 2 is a specific schematic flowchart of step 101 in a training method of a living body detection model provided in the present application.
Step A1, a first original image is acquired.
The first original images include a plurality of living body face images and a plurality of non-living body face images (a first original image refers to an image that has not been preprocessed). Illustratively, the first original images may be as follows: (1) a living body face image; (2) a black-and-white paper face image; (3) a static face displayed on a mobile phone, tablet or computer screen and photographed by the living body detection device; (4) a face in a video played on a mobile phone, tablet or computer screen and photographed by the living body detection device; where (2) to (4) are non-living body face images.
And step A2, detecting a face area in the first original image, and acquiring the sample image corresponding to the face area.
The face detection method includes, but is not limited to, one model or a combination of models such as SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once).
Since the face area carries a large amount of data, the application provides the following optional embodiment to further reduce the computation in the training process:
As an alternative embodiment of the present application, the following steps a21 to a24 are included in step A2. Referring to fig. 3, fig. 3 shows a specific schematic flowchart of step A2 in a training method of a living body detection model provided in the present application.
Step A21, obtaining a first face frame of the first original image through a face detection model; the first face frame is used for marking the face area.
And step A22, amplifying the size of the first face frame to a second preset size to obtain an amplified first face frame.
On one hand, to ensure that a complete face image is obtained and to prevent part of the face from overflowing beyond the first face frame, and on the other hand, because the effective information for living body detection is contained not only within the first face frame but also in the nearby background around it, the terminal device enlarges the first face frame to a second preset size, obtaining an enlarged first face frame, so as to improve the accuracy of living body detection.
The second preset size may be a fixed size (i.e. a size constant value is preset), or may be a size obtained according to a preset ratio, that is, the size of the first face frame is multiplied by the preset ratio to obtain the second preset size.
And step A23, intercepting the region corresponding to the amplified first face frame from the first original image to obtain a second original image.
And step A24, reducing the size of the second original image to a third preset size to obtain the sample image.
To reduce the computation of the model and thereby improve model training efficiency, the terminal device shrinks the second original image to a third preset size.
It will be understood that "reducing the size of the second original image to the third preset size" means scaling the entire second original image down to the third preset size, not cutting a region of the third preset size out of the second original image.
Notably, as the sample image gets smaller, the detection accuracy of the trained second convolution network decreases; a larger sample image yields higher detection accuracy but a larger computation. A suitable third preset size can therefore be chosen by weighing the detection accuracy, computational efficiency and other requirements of the actual application scenario.
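A minimal sketch of steps A21 to A24 is given below, assuming a detector that returns a face frame as (x, y, w, h) pixel coordinates, a preset ratio of 1.5 and a third preset size of 80x80; all three are illustrative values, not values fixed by the application:

    import cv2

    def crop_sample_image(original, box, scale=1.5, out_size=(80, 80)):
        x, y, w, h = box                        # A21: first face frame from the detector
        cx, cy = x + w / 2, y + h / 2           # A22: enlarge about the frame centre
        w2, h2 = w * scale, h * scale           # second preset size via a preset ratio
        x0 = max(int(cx - w2 / 2), 0)
        y0 = max(int(cy - h2 / 2), 0)
        x1 = min(int(cx + w2 / 2), original.shape[1])
        y1 = min(int(cy + h2 / 2), original.shape[0])
        region = original[y0:y1, x0:x1]         # A23: second original image (face + background)
        return cv2.resize(region, out_size)     # A24: shrink the whole crop, not a sub-crop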
And step A3, calculating an original Fourier spectrogram of the sample image according to the sample image.
In step A3, the fourier spectrum of the sample image may be directly used as the original fourier spectrum, or the original fourier spectrum may be calculated according to the following alternative embodiments:
as an alternative embodiment of the present application, the following steps a31 to a33 are included in step A3. Referring to fig. 4, fig. 4 shows a specific schematic flowchart of step A3 in a training method of a living body detection model provided in the present application.
Step a31, calculating an initial fourier spectrum of the sample image.
The Fourier spectrum can reflect differences between living and non-living face images in the frequency domain (for example, a face replayed on an electronic screen may exhibit moiré patterns), and thus facilitates classifying living and non-living faces.
Since calculating a Fourier spectrum is a conventional technique, it is not described again here.
And step A32, carrying out normalization processing on the initial Fourier spectrum.
Because images of the same object can differ from one another under different shooting environments, or due to software and hardware limitations and other factors, this embodiment mitigates such differences through normalization.
And step A33, reducing the size of the normalized initial Fourier spectrum to a first preset size to obtain the original Fourier spectrum.
To reduce the computation of the model and thereby improve model training efficiency, the terminal device shrinks the initial Fourier spectrogram to a first preset size.
It will be understood that "reducing the size of the initial Fourier spectrum to the first preset size" means scaling the entire initial Fourier spectrum down to the first preset size, not cutting a region of the first preset size out of the initial Fourier spectrum.
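A minimal sketch of steps A31 to A33 follows. Using the log-magnitude spectrum, min-max normalization and a 40x40 first preset size are assumptions; the application does not fix the normalization method or the size:

    import numpy as np
    import cv2

    def original_fourier_spectrogram(sample_image, out_size=(40, 40)):
        gray = cv2.cvtColor(sample_image, cv2.COLOR_BGR2GRAY).astype(np.float32)
        spectrum = np.fft.fftshift(np.fft.fft2(gray))    # A31: initial Fourier spectrum
        magnitude = np.log1p(np.abs(spectrum))           # compress the dynamic range
        magnitude = (magnitude - magnitude.min()) / (magnitude.max() - magnitude.min() + 1e-8)  # A32
        return cv2.resize(magnitude.astype(np.float32), out_size)  # A33: shrink whole map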
Step 102, inputting the sample data set into the first branch network and the second branch network to obtain a joint loss function, training the second convolution network according to the joint loss function, and obtaining a trained second convolution network.
The mode of training the model to be trained comprises the following two modes:
Mode (1): a student model with a simple network structure is taken as the second convolution network, and a teacher model with a complex network structure is taken as the third convolution network (here, both the second and third convolution networks are used for living body detection). The same sample images are input into the teacher model and the student model respectively, and the student model is trained by knowledge distillation to obtain the trained second convolution network.
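For mode (1), a hedged sketch of a standard knowledge-distillation loss is given below; the temperature T and weight alpha are illustrative assumptions, as the application does not specify them:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        hard = F.cross_entropy(student_logits, labels)   # supervised term on the labels
        soft = F.kl_div(                                 # match the softened teacher outputs
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * hard + (1 - alpha) * soft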
Mode (2): as an alternative embodiment of the present application, the following steps B1 to B7 are included in step 102. Referring to fig. 5, fig. 5 is a specific schematic flowchart of step 102 in a training method of a living body detection model provided in the present application.
Steps B1 to B7 below are executed in a loop over the sample data set to obtain the trained second convolution network.
And B1, inputting the sample image into the first convolution network to obtain the characteristic data output by the first convolution network.
And B2, inputting the characteristic data into a second convolution network to obtain a prediction result output by the second convolution network.
And step B3, calculating a first loss function between the prediction result and the labeling information.
And B4, inputting the characteristic data into a third convolution network to obtain a predicted Fourier spectrum chart output by the third convolution network.
And B5, calculating a second loss function between the predicted Fourier spectrogram and the original Fourier spectrogram.
And step B6, taking the first loss function and the second loss function as the joint loss function.
And B7, updating parameters of the second convolution network according to the joint loss function.
To better explain steps B1 to B7, this embodiment explains them with reference to the drawings. Referring to fig. 6, fig. 6 shows a schematic diagram of an initial model provided in the present application. It should be noted that the initial model in fig. 6 is merely an example and does not limit parameters such as the network structure and feature map sizes of the initial model. As shown in fig. 6, dashed box 1 represents the first convolution network, dashed box 2 represents the second convolution network, and dashed box 3 represents the third convolution network. The initial model comprises two branch networks: the first branch network is formed by dashed box 1 and dashed box 2, and the second branch network is formed by dashed box 1 and dashed box 3. Preferably, the first convolution network and the second convolution network may adopt a MiniFASNet network hierarchy, and the third convolution network may adopt an FTGenerator (Fourier spectrogram reconstruction) network hierarchy.
As shown in fig. 6, the sample image is input into the initial model. The sample image passes through the first convolution network (dashed box 1) to obtain the feature data output by the first convolution network. The feature data is processed by the second convolution network (dashed box 2) to obtain the prediction result of the second convolution network, and a first loss function between the prediction result and the labeling information corresponding to the sample image, namely "Softmax Loss" (a cross-entropy error), is calculated. The feature data is also processed by the third convolution network (dashed box 3) to obtain a predicted Fourier spectrogram, and a second loss function between the original Fourier spectrogram and the predicted Fourier spectrogram, namely "FT Loss" (a mean square error, MSE), is calculated. The first loss function and the second loss function together serve as the joint loss function, and the parameters of the second convolution network (dashed box 2) are updated according to the joint loss function. Steps B1 to B7 are executed in a loop over the sample data set to obtain the trained second convolution network.
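One training iteration of steps B1 to B7 might look like the following sketch. Summing the two losses with equal weight and updating through a single optimizer are assumptions; the application only states that both losses form the joint loss:

    import torch.nn.functional as F

    def train_step(first_net, second_net, third_net, optimizer,
                   sample_image, label, original_spectrogram):
        feat = first_net(sample_image)                  # B1: shared feature data
        pred = second_net(feat)                         # B2: living / non-living prediction
        loss_cls = F.cross_entropy(pred, label)         # B3: "Softmax Loss" vs labeling info
        pred_spec = third_net(feat)                     # B4: predicted Fourier spectrogram
        loss_ft = F.mse_loss(pred_spec, original_spectrogram)  # B5: "FT Loss"
        joint_loss = loss_cls + loss_ft                 # B6: joint loss function
        optimizer.zero_grad()
        joint_loss.backward()                           # B7: update the second convolution
        optimizer.step()                                #     network's parameters
        return joint_loss.item()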
The trigger mechanism for training completion includes the following three types: (1) the sample data set has been fully trained; (2) the value of the joint loss function no longer drops (i.e., the joint loss function converges); (3) the value of the first loss function of the first branch network no longer drops (i.e., the first loss function converges).
And step 103, using the first convolution network and the trained second convolution network as a target living body detection model.
To reduce the computation of the model, the third convolution network is cut from the initial model, and the remaining first convolution network and trained second convolution network (i.e., the first branch network) are used as the target living body detection model.
It will be appreciated that the third convolution network is used only in the training phase, to help raise the detection level of the second convolution network. The target living body detection model can therefore maintain its detection accuracy while requiring a much smaller amount of computation.
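Continuing the earlier sketch (class names assumed as above), the deployed target model is simply the first branch:

    import torch.nn as nn

    class TargetLivenessModel(nn.Module):
        # Only dashed boxes 1 and 2 survive; the Fourier branch is cut away.
        def __init__(self, first_net, second_net):
            super().__init__()
            self.first_net = first_net
            self.second_net = second_net

        def forward(self, x):
            return self.second_net(self.first_net(x))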
In this embodiment, a second convolutional network in the first branch network is jointly trained by the first branch network and the second branch network in the initial model. And using the first convolution network and the trained second convolution network as a target living body detection model. In the process, the third convolution network in the second branch network is only used for assisting in training the second convolution network, so that the excellent detection precision of the target living body detection model is ensured, unnecessary calculated amount is reduced, and living body detection efficiency is improved.
Referring to fig. 7, fig. 7 shows a schematic flow chart of a method for in vivo detection provided herein. As shown in fig. 7, the method is applied to a living body detecting device, and includes the steps of:
step 701, acquiring an image to be detected.
The image to be detected may be an original image (to better distinguish the original images corresponding to different images, the original image of the image to be detected is referred to as a third original image, i.e., an image acquired by, or input to, the living body detection device without preprocessing). However, the third original image often contains a large amount of redundant data (i.e., non-face areas), which increases the computation of the target living body detection model and lowers its processing efficiency. To improve the processing efficiency of the model, the third original image may be preprocessed, as detailed in the following optional embodiment:
as an alternative embodiment of the present application, step 701 includes the following steps 7011 to 7015. Referring to fig. 8, fig. 8 is a specific schematic flowchart illustrating step 701 in a method for in-vivo detection provided in the present application.
Step 7011, a third original image is acquired.
Step 7012, obtaining a second face frame of the third original image through a face detection model; the second face frame is used for marking the face area.
The face detection method includes, but is not limited to, one model or a combination of models such as SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once).
Step 7013, enlarging the size of the second face frame to a fourth preset size, so as to obtain an enlarged second face frame.
On one hand, to ensure that a complete face image is obtained and to prevent part of the face from overflowing beyond the second face frame, and on the other hand, because the effective information for living body detection is contained not only within the second face frame but also in the nearby background around it, the living body detection device enlarges the second face frame to a fourth preset size, obtaining an enlarged second face frame, so as to improve the accuracy of living body detection.
The fourth preset size may be a fixed size (i.e. a size constant value is preset), or the fourth preset size may be a size obtained according to a preset ratio, that is, the size of the second face frame is multiplied by the preset ratio to obtain the fourth preset size.
The fourth preset size may be the same as or different from the second preset size in the embodiment of fig. 3.
Step 7014, intercepting the region corresponding to the amplified second face frame from the third original image, so as to obtain a fourth original image.
Step 7015, reducing the size of the fourth original image to a fifth preset size, so as to obtain the image to be detected.
It will be understood that "reducing the size of the fourth original image to the fifth preset size" means scaling the entire fourth original image down to the fifth preset size, not cutting a region of the fifth preset size out of the fourth original image.
The fifth preset size may be the same as or different from the third preset size in the embodiment of fig. 3.
Step 702, inputting the image to be detected into a target living body detection model to obtain a detection result output by the target living body detection model; the detection result is used for representing the probability that the image to be detected is a living body face image; the target living body detection model is a first branch network in an initial model, and the first branch network comprises a first convolution network and a second convolution network; the initial model also comprises a second branch network; the first branch network and the second branch network are used to jointly train the second convolutional network.
Step 703, determining whether the image to be detected is a living body face image according to the detection result.
The detection result is a confidence for the image to be detected, representing the probability that the image to be detected is a living body face image.
If the image to be detected is single, the living body detection equipment determines whether the image to be detected is a living body face image according to the threshold value and the detection result of the single image to be detected. If the number of the images to be detected is plural, the living body detection device determines whether the images to be detected are living body face images or not according to the threshold value and the detection results of the plurality of the images to be detected. Here, "single image to be detected" means that only one image to be detected needs to be used in one living body detection, and "multiple images to be detected" means that the number of images to be detected used in one living body detection is multiple. The two ways described above are shown in the following two alternative embodiments:
as an alternative embodiment of the present application, step 703 includes the following steps C1 to C2. Referring to fig. 9, fig. 9 shows a specific schematic flowchart of step 703 in a method for in-vivo detection provided in the present application.
And step C1, if the detection result is larger than a threshold value, confirming that the image to be detected is a living body face image.
And C2, if the detection result is not greater than a threshold value, confirming that the image to be detected is a non-living human face image.
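A sketch of the threshold decision in steps C1 and C2, assuming the model outputs two-class logits with index 1 as the living class and a threshold of 0.5 (both assumptions):

    import torch

    def is_live(model, image_tensor, threshold=0.5):
        model.eval()
        with torch.no_grad():
            logits = model(image_tensor.unsqueeze(0))              # add batch dimension
            live_prob = torch.softmax(logits, dim=1)[0, 1].item()  # detection result
        return live_prob > threshold   # C1: living face if above; C2: non-living otherwise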
As an alternative embodiment of the present application, steps 701 to 703 specifically include the following steps D1 to D4. Referring to fig. 10, fig. 10 shows a specific schematic flowchart of another method for in-vivo detection provided herein.
And D1, acquiring a plurality of images to be detected.
The plurality of images to be detected may be images at any time in the shooting process.
Preferably, the plurality of images to be detected are k frames of images adjacent in time sequence.
The processing procedure of each image to be detected is referred to step 701, and will not be described herein.
And D2, sequentially inputting the images to be detected into the target living body detection model to obtain detection results corresponding to the images to be detected, which are output by the target living body detection model.
And D3, if the detection results are all greater than a threshold value, confirming that the image to be detected is a living body face image.
And D4, if a plurality of detection results are not all greater than a threshold value, confirming that the image to be detected is a non-living human face image.
The number of the plurality of images to be detected can be set according to the required detection accuracy: the greater the number, the higher the detection accuracy of the living body detection, but the larger the computation; the smaller the number, the lower the detection accuracy, but the smaller the computation. The number of images to be detected can therefore be set by weighing detection accuracy against computation.
To compare the detection accuracy of steps C1 to C2 with that of steps D3 to D4 more intuitively: if the probability that a non-living attack succeeds against steps C1 to C2 is p (p < 1), then the probability that a non-living attack succeeds against steps D3 to D4 is p^k, where k is the number of images to be detected. That is, steps D3 to D4 can exponentially reduce the probability of a successful non-living attack.
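Reusing the is_live sketch above, the k-frame rule of steps D1 to D4 reduces to requiring every frame to pass, which is what drives a per-frame attack success probability p down to p^k:

    def is_live_sequence(model, frames, threshold=0.5):
        # D3/D4: accept as a living face only if all k detection results exceed
        # the threshold; a non-living attack must fool every single frame.
        return all(is_live(model, frame, threshold) for frame in frames)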
In the present embodiment, the living body detection is performed using only the first branch network in the initial model. Since the second convolutional network in the first branch network is jointly trained over the first branch network and said second branch network. Therefore, the excellent detection precision is ensured, unnecessary calculation amount is reduced, and the living body detection efficiency is improved.
Fig. 11 is a schematic diagram of a training device 11 for a living body detection model, referring to fig. 11, fig. 11 shows a training device for a living body detection model provided by the present application, and the training device for a living body detection model shown in fig. 11 includes:
a first acquisition unit 111 for acquiring an initial model and a sample data set; the initial model comprises a first convolution network, a second convolution network and a third convolution network; the first convolution network and the second convolution network form a first branch network; the first convolution network and the third convolution network form a second branch network;
a training unit 112, configured to input the sample data set into the first branch network and the second branch network, calculate a joint loss function, train the second convolutional network according to the joint loss function, and obtain a trained second convolutional network;
and a clipping unit 113, configured to use the first convolutional network and the trained second convolutional network as a target living body detection model.
According to the training device for the living body detection model, the first branch network and the second branch network in the initial model are used for jointly training the second convolution network in the first branch network. And using the first convolution network and the trained second convolution network as a target living body detection model. In the process, the third convolution network in the second branch network is only used for assisting in training the second convolution network, so that the excellent detection precision of the target living body detection model is ensured, unnecessary calculated amount is reduced, and living body detection efficiency is improved.
Referring to fig. 12, fig. 12 is a schematic diagram of an apparatus for in-vivo detection, as shown in fig. 12, and includes:
a second acquisition unit 121 for acquiring an image to be detected;
a detection unit 122, configured to input the image to be detected into a target living body detection model, and obtain a detection result output by the target living body detection model; the detection result is used for representing the probability that the image to be detected is a living body face image; the target living body detection model is a first branch network in an initial model, and the first branch network comprises a first convolution network and a second convolution network; the initial model also comprises a second branch network; the first branch network and the second branch network are used for jointly training the second convolution network;
and a judging unit 123, configured to determine whether the image to be detected is a living body face image according to the detection result.
The device for detecting the living body only adopts the first branch network in the initial model to detect the living body. Since the second convolutional network in the first branch network is jointly trained over the first branch network and said second branch network. Therefore, the excellent detection precision is ensured, unnecessary calculation amount is reduced, and the living body detection efficiency is improved.
Fig. 13 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 13, the terminal device 13 of this embodiment includes: a processor 131, a memory 132, and a computer program 133 stored in the memory 132 and executable on the processor 131, such as a training program for a living body detection model. The processor 131, when executing the computer program 133, implements the steps of the above embodiments of the training method for a living body detection model, for example steps 101 to 103 shown in fig. 1. Alternatively, the processor 131, when executing the computer program 133, performs the functions of the units in the above device embodiments, such as the functions of units 111 to 113 shown in fig. 11.
Illustratively, the computer program 133 may be partitioned into one or more units that are stored in the memory 132 and executed by the processor 131 to complete the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, describing the execution of the computer program 133 in the terminal device 13. For example, the computer program 133 may be divided into units with the following specific functions:
A first acquisition unit for acquiring an initial model and a sample data set; the initial model comprises a first convolution network, a second convolution network and a third convolution network; the first convolution network and the second convolution network form a first branch network; the first convolution network and the third convolution network form a second branch network;
the training unit is used for inputting the sample data set into the first branch network and the second branch network, calculating a joint loss function, training the second convolution network according to the joint loss function, and obtaining a trained second convolution network;
and the clipping unit is used for taking the first convolution network and the trained second convolution network as a target living body detection model.
The terminal device 13 includes, but is not limited to, a processor 131 and a memory 132. It will be appreciated by those skilled in the art that fig. 13 is merely an example of the terminal device 13 and does not limit the terminal device 13, which may include more or fewer components than shown, or combine certain components, or different components; for example, the terminal device may also include input and output devices, network access devices, buses, etc.
The processor 131 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 132 may be an internal storage unit of the terminal device 13, such as a hard disk or memory of the terminal device 13. The memory 132 may also be an external storage device of the terminal device 13, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the terminal device 13. Further, the memory 132 may include both an internal storage unit and an external storage device of the terminal device 13. The memory 132 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
Fig. 14 is a schematic diagram of a living body detection device according to an embodiment of the present invention. As shown in fig. 14, the living body detection device 14 of this embodiment includes: an image capturing module 141, a processor 142, a memory 143, and a computer program 144 stored in the memory 143 and executable on the processor 142, such as a living body detection program. The processor 142, when executing the computer program 144, implements the steps of the above living body detection method embodiments, such as steps 701 to 703 shown in fig. 7. Alternatively, the processor 142, when executing the computer program 144, performs the functions of the units in the above device embodiments, for example the functions of units 121 to 123 shown in fig. 12.
Illustratively, the computer program 144 may be partitioned into one or more units that are stored in the memory 143 and executed by the processor 142 to complete the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, describing the execution of the computer program 144 in the living body detection device 14. For example, the computer program 144 may be divided into units with the following specific functions:
The second acquisition unit is used for acquiring the image to be detected;
the detection unit is used for inputting the image to be detected into a target living body detection model to obtain a detection result output by the target living body detection model; the detection result is used for representing the probability that the image to be detected is a living body face image; the target living body detection model is a first branch network in an initial model, and the first branch network comprises a first convolution network and a second convolution network; the initial model also comprises a second branch network; the first branch network and the second branch network are used for jointly training the second convolution network;
and the judging unit is used for determining whether the image to be detected is a living body face image or not according to the detection result.
The living body detection device may be a device having a face detection function, such as an attendance machine, a mobile terminal, a notebook computer, or a tablet computer. The living body detection device includes, but is not limited to, the image capturing module 141, the processor 142 and the memory 143. It will be appreciated by those skilled in the art that fig. 14 is merely an example of the living body detection device 14 and does not limit the living body detection device 14, which may include more or fewer components than shown, or combine certain components, or different components; for example, the living body detection device may also include input and output devices, network access devices, buses, etc.
The image capturing module 141 is configured to collect image information.
The processor 142 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 143 may be an internal storage unit of the living body detection device 14, such as a hard disk or memory of the living body detection device 14. The memory 143 may also be an external storage device of the living body detection device 14, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the living body detection device 14. Further, the memory 143 may include both an internal storage unit and an external storage device of the living body detection device 14. The memory 143 is used to store the computer program and other programs and data required by the living body detection device, and may also be used to temporarily store data that has been output or is to be output.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps that may implement the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform steps that may be performed in the various method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present application implements all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/living body detection equipment, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In some jurisdictions, according to legislation and patent practice, computer-readable media may not be electric carrier signals and telecommunication signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative; the division of the modules or units is merely a logical functional division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included within the protection scope of the present application.

Claims (12)

1. A method of training a living body detection model, the method comprising:
acquiring an initial model and a sample data set; the initial model comprises a first convolution network, a second convolution network and a third convolution network; the first convolution network and the second convolution network form a first branch network; the first convolution network and the third convolution network form a second branch network;
inputting the sample data set into the first branch network and the second branch network, calculating a joint loss function, and training the second convolution network according to the joint loss function to obtain a trained second convolution network;
taking the first convolution network and the trained second convolution network as a target living body detection model;
the sample data set includes a sample image; the acquiring the sample data set comprises: acquiring a first original image; detecting a face region in the first original image, and acquiring the sample image corresponding to the face region;
the detecting the face region in the first original image and acquiring the sample image corresponding to the face region comprises: obtaining a first face frame of the first original image through a face detection model, the first face frame being used for marking the face region; enlarging the size of the first face frame to a second preset size to obtain an enlarged first face frame; cropping the region corresponding to the enlarged first face frame from the first original image to obtain a second original image; and reducing the size of the second original image to a third preset size to obtain the sample image.
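As a minimal sketch of the preprocessing recited in claim 1, assuming OpenCV and NumPy; the 1.5x enlargement factor and the 112x112 sample size are illustrative assumptions, since the claim leaves the second and third preset sizes open.

```python
# A minimal sketch of claim 1's sample-image preprocessing, under the
# assumptions stated above (the preset sizes are not fixed by the claim).
import cv2
import numpy as np

def crop_face_sample(image, face_box, scale=1.5, out_size=112):
    x, y, w, h = face_box                       # first face frame: (x, y, width, height)
    cx, cy = x + w / 2.0, y + h / 2.0           # enlarge around the frame centre
    half_w, half_h = w * scale / 2.0, h * scale / 2.0
    x0 = max(int(cx - half_w), 0)               # clamp the enlarged frame to the image
    y0 = max(int(cy - half_h), 0)
    x1 = min(int(cx + half_w), image.shape[1])
    y1 = min(int(cy + half_h), image.shape[0])
    second_image = image[y0:y1, x0:x1]          # region cut from the first original image
    return cv2.resize(second_image, (out_size, out_size))  # reduced to the sample size
```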
2. The method of claim 1, wherein the sample data set comprises the sample image, an original Fourier spectrogram corresponding to the sample image, and labeling information corresponding to the sample image; the labeling information is used for labeling the image type of the sample image; the image types comprise living body face images and non-living face images;
the inputting the sample data set into the first branch network and the second branch network, calculating a joint loss function, and training the second convolution network according to the joint loss function to obtain a trained second convolution network comprises:
cyclically performing the following steps on each piece of sample data in the sample data set to obtain the trained second convolution network:
inputting the sample image into the first convolution network to obtain feature data output by the first convolution network;
inputting the feature data into the second convolution network to obtain a prediction result output by the second convolution network;
calculating a first loss function between the prediction result and the labeling information;
inputting the feature data into the third convolution network to obtain a predicted Fourier spectrogram output by the third convolution network;
calculating a second loss function between the predicted Fourier spectrogram and the original Fourier spectrogram;
taking the first loss function and the second loss function as the joint loss function;
and updating parameters of the second convolution network according to the joint loss function.
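As an illustrative sketch of one pass through the steps of claim 2, assuming PyTorch, a binary cross-entropy as the first loss function, an L2 distance between spectrograms as the second, and equal weighting, none of which the claim fixes:

```python
# A hypothetical sketch of one training iteration per claim 2. first_net,
# second_net and third_net are assumed nn.Module instances; only second_net's
# parameters are held by the optimizer, matching the claim's final step.
import torch
import torch.nn.functional as F

def train_step(first_net, second_net, third_net, optimizer,
               sample_image, label, original_spectrogram):
    features = first_net(sample_image)                  # feature data
    prediction = second_net(features).squeeze(1)        # first branch prediction
    loss1 = F.binary_cross_entropy_with_logits(prediction, label)    # first loss
    predicted_spectrogram = third_net(features)         # second branch output
    loss2 = F.mse_loss(predicted_spectrogram, original_spectrogram)  # second loss
    joint_loss = loss1 + loss2                          # joint loss function
    optimizer.zero_grad()
    joint_loss.backward()
    optimizer.step()                                    # updates second_net only
    return joint_loss.item()
```

For example, the optimizer could be built as torch.optim.Adam(second_net.parameters(), lr=1e-4), so that, as in the claim, only the second convolution network's parameters are updated in this sketch.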
3. The method of claim 1, wherein the sample data set further comprises an original Fourier spectrogram corresponding to the sample image and labeling information corresponding to the sample image;
the acquiring an initial model and a sample data set comprises:
calculating the original Fourier spectrogram of the sample image according to the sample image.
4. The method of claim 3, wherein the calculating the original Fourier spectrogram of the sample image according to the sample image comprises:
calculating an initial Fourier spectrum of the sample image;
normalizing the initial Fourier spectrum;
and reducing the size of the normalized initial Fourier spectrum to a first preset size to obtain the original Fourier spectrogram.
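As an illustration of claim 4, a minimal NumPy sketch, assuming a grayscale sample image, a log-magnitude spectrum, min-max normalization, and a 56x56 first preset size; the claim fixes only the compute-normalize-reduce order, so these choices are assumptions.

```python
# A hypothetical sketch of claim 4's pipeline: initial spectrum -> normalize
# -> reduce to the first preset size.
import cv2
import numpy as np

def original_fourier_spectrogram(gray, out_size=56):
    spectrum = np.fft.fftshift(np.fft.fft2(gray))   # initial Fourier spectrum
    magnitude = np.log1p(np.abs(spectrum))          # log magnitude for dynamic range
    span = magnitude.max() - magnitude.min()
    normalized = (magnitude - magnitude.min()) / (span + 1e-8)  # scale to [0, 1]
    return cv2.resize(normalized.astype(np.float32), (out_size, out_size))
```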
5. A living body detection method, the method comprising:
acquiring an image to be detected;
inputting the image to be detected into a target living body detection model to obtain a detection result output by the target living body detection model; the detection result is used for representing the probability that the image to be detected is a living body face image; the target living body detection model is a first branch network in an initial model, and the first branch network comprises a first convolution network and a second convolution network; the initial model also comprises a second branch network; the first branch network and the second branch network are used for jointly training the second convolution network;
determining, according to the detection result, whether the image to be detected is a living body face image;
the acquiring an image to be detected comprises:
acquiring a third original image; obtaining a second face frame of the third original image through a face detection model, the second face frame being used for marking a face region; enlarging the size of the second face frame to a fourth preset size to obtain an enlarged second face frame; cropping the region corresponding to the enlarged second face frame from the third original image to obtain a fourth original image; and reducing the size of the fourth original image to a fifth preset size to obtain the image to be detected.
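By way of illustration of claim 5, inference with the target living body detection model, i.e. the first convolution network followed by the trained second convolution network, can be sketched in PyTorch as follows; the sigmoid readout matches the single-logit assumption of the training sketch above, and the input is assumed to be a preprocessed face crop converted to a tensor.

```python
# A hypothetical inference sketch for claim 5.
import torch

@torch.no_grad()
def detect(first_net, second_net, image_tensor):
    first_net.eval()
    second_net.eval()
    features = first_net(image_tensor)        # shared feature data
    logit = second_net(features)              # trained classifier branch
    return torch.sigmoid(logit).item()        # probability of a living body face
```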
6. The method according to claim 5, wherein the determining, according to the detection result, whether the image to be detected is a living body face image comprises:
if the detection result is greater than a threshold value, confirming that the image to be detected is a living body face image;
and if the detection result is not greater than the threshold value, confirming that the image to be detected is a non-living face image.
7. The method of claim 5, wherein the acquiring an image to be detected comprises:
acquiring a plurality of images to be detected;
the inputting the image to be detected into a target living body detection model to obtain a detection result output by the target living body detection model comprises:
sequentially inputting the plurality of images to be detected into the target living body detection model to obtain detection results, output by the target living body detection model, corresponding to the respective images to be detected;
the determining, according to the detection result, whether the image to be detected is a living body face image comprises:
if the detection results are all greater than the threshold value, confirming that the images to be detected are living body face images;
and if the detection results are not all greater than the threshold value, confirming that the images to be detected are non-living face images.
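Claims 6 and 7 reduce to a simple decision rule: a single detection result, or every detection result over a sequence of frames, must exceed the threshold for the subject to be accepted as live. A minimal sketch, assuming a 0.5 threshold that the claims leave open:

```python
# A hypothetical decision rule per claims 6 and 7: all scores must exceed
# the threshold for a living body face to be confirmed.
def is_living(scores, threshold=0.5):
    return all(score > threshold for score in scores)

# e.g. is_living([detect(first_net, second_net, frame) for frame in frames])
```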
8. A training device for a living body detection model, the training device comprising:
a first acquisition unit for acquiring an initial model and a sample data set; the initial model comprises a first convolution network, a second convolution network and a third convolution network; the first convolution network and the second convolution network form a first branch network; the first convolution network and the third convolution network form a second branch network; the sample data set includes a sample image;
the training unit is used for inputting the sample data set into the first branch network and the second branch network, calculating a joint loss function, training the second convolution network according to the joint loss function, and obtaining a trained second convolution network;
the clipping unit is used for taking the first convolution network and the trained second convolution network as a target living body detection model;
the first acquisition unit is also used for acquiring a first original image, detecting a face region in the first original image, and acquiring the sample image corresponding to the face region;
the first acquisition unit is also used for obtaining a first face frame of the first original image through a face detection model, the first face frame being used for marking the face region; enlarging the size of the first face frame to a second preset size to obtain an enlarged first face frame; cropping the region corresponding to the enlarged first face frame from the first original image to obtain a second original image; and reducing the size of the second original image to a third preset size to obtain the sample image.
9. A living body detection apparatus, the apparatus comprising:
the second acquisition unit is used for acquiring the image to be detected;
the detection unit is used for inputting the image to be detected into a target living body detection model to obtain a detection result output by the target living body detection model; the detection result is used for representing the probability that the image to be detected is a living body face image; the target living body detection model is a first branch network in an initial model, and the first branch network comprises a first convolution network and a second convolution network; the initial model also comprises a second branch network; the first branch network and the second branch network are used for jointly training the second convolution network;
the judging unit is used for determining, according to the detection result, whether the image to be detected is a living body face image;
the second acquisition unit is also used for acquiring a third original image; obtaining a second face frame of the third original image through a face detection model, the second face frame being used for marking a face region; enlarging the size of the second face frame to a fourth preset size to obtain an enlarged second face frame; cropping the region corresponding to the enlarged second face frame from the third original image to obtain a fourth original image; and reducing the size of the fourth original image to a fifth preset size to obtain the image to be detected.
10. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 4 when executing the computer program.
11. A living body detection device comprising a camera module, a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 5 to 7 when executing the computer program.
12. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 4 or claims 5 to 7.
CN202110245292.5A 2021-03-05 2021-03-05 Training method and training device for living body detection model Active CN113158773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245292.5A CN113158773B (en) 2021-03-05 2021-03-05 Training method and training device for living body detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110245292.5A CN113158773B (en) 2021-03-05 2021-03-05 Training method and training device for living body detection model

Publications (2)

Publication Number Publication Date
CN113158773A CN113158773A (en) 2021-07-23
CN113158773B true CN113158773B (en) 2024-03-22

Family

ID=76884329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245292.5A Active CN113158773B (en) 2021-03-05 2021-03-05 Training method and training device for living body detection model

Country Status (1)

Country Link
CN (1) CN113158773B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989871A (en) * 2021-07-28 2022-01-28 Orbbec Technology Group Co., Ltd. Living body detection model training method and living body detection method
CN114627534B (en) * 2022-03-15 2024-09-13 Ping An Technology (Shenzhen) Co., Ltd. Living body discrimination method, electronic device, and storage medium
CN115147705B (en) * 2022-09-06 2023-02-03 Ping An Bank Co., Ltd. Face copying detection method and device, electronic equipment and storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201520316D0 (en) * 2014-11-24 2015-12-30 Boeing Co Detecting and removing spoofing signals
CN107818313A (en) * 2017-11-20 2018-03-20 Tencent Technology (Shenzhen) Co., Ltd. Vivo identification method, device, storage medium and computer equipment
WO2019152983A2 (en) * 2018-02-05 2019-08-08 Board Of Trustees Of Michigan State University System and apparatus for face anti-spoofing via auxiliary supervision
WO2020063475A1 (en) * 2018-09-25 2020-04-02 Tsinghua University 6d attitude estimation network training method and apparatus based on deep learning iterative matching
CN109684993A (en) * 2018-12-21 2019-04-26 TP-Link Technologies Co., Ltd. Face recognition method, system and device based on nostril information
CN111382646A (en) * 2018-12-29 2020-07-07 TCL Corporation Living body identification method, storage medium and terminal equipment
CN111860078A (en) * 2019-04-30 2020-10-30 Beijing Eyecool Intelligent Technology Co., Ltd. Face silent living detection method, device, readable storage medium and device
CN110610237A (en) * 2019-09-17 2019-12-24 TP-Link Technologies Co., Ltd. Quantitative training method and device of model and storage medium
CN111814574A (en) * 2020-06-12 2020-10-23 Zhejiang University Face liveness detection system, terminal and storage medium using two-branch three-dimensional convolution model
CN111914758A (en) * 2020-08-04 2020-11-10 Chengdu Aokuai Technology Co., Ltd. Face in-vivo detection method and device based on convolutional neural network
CN112329696A (en) * 2020-11-18 2021-02-05 Ctrip Computer Technology (Shanghai) Co., Ltd. Face living body detection method, system, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Face Spoofing Detection Based on Local Ternary Label Supervision in Fully Convolutional Networks; Wenyun Sun et al.; IEEE Transactions on Information Forensics and Security; 2020-04-03; vol. 15; pp. 3181-3196 *
Research on Face Liveness Detection Algorithms Based on Illumination Consistency and Context Awareness; Chen Haonan; China Doctoral Dissertations Full-text Database (Information Science and Technology); 2021-01-15; No. 01, 2021; pp. I138-255 *
Research on Face Liveness Detection Methods Based on Convolutional Neural Networks; Yang Cheng; China Masters' Theses Full-text Database (Information Science and Technology); 2021-02-15; No. 02, 2021; pp. I138-1806 *

Also Published As

Publication number Publication date
CN113158773A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN113158773B (en) Training method and training device for living body detection model
CN110705405A (en) Target labeling method and device
US11244157B2 (en) Image detection method, apparatus, device and storage medium
EP2660753B1 (en) Image processing method and apparatus
CN110532746B (en) Face checking method, device, server and readable storage medium
CN111444555B (en) Temperature measurement information display method and device and terminal equipment
CN111160202A (en) AR equipment-based identity verification method, AR equipment-based identity verification device, AR equipment-based identity verification equipment and storage medium
CN109948420B (en) Face comparison method and device and terminal equipment
CN109116129B (en) Terminal detection method, detection device, system and storage medium
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN110443224A (en) Page turning detection method and device, electronic equipment and storage medium
CN110335216A (en) Image processing method, image processing apparatus, terminal device and readable storage medium
CN114140427B (en) Object detection method and device
CN114429476A (en) Image processing method, apparatus, computer equipment, and storage medium
CN113255516A (en) Living body detection method and device and electronic equipment
CN107770487B (en) Feature extraction and optimization method, system and terminal equipment
CN112287905A (en) Vehicle damage identification method, device, equipment and storage medium
CN111369557A (en) Image processing method, image processing device, computing equipment and storage medium
CN110210314B (en) Face detection method, device, computer equipment and storage medium
CN113822927B (en) Face detection method, device, medium and equipment suitable for weak quality image
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN111669575B (en) Method, system, electronic device, medium and terminal for testing image processing effect
CN113902932A (en) Feature extraction method, visual positioning method and device, medium and electronic equipment
CN112907206A (en) Service auditing method, device and equipment based on video object identification
CN110610178A (en) Image recognition method, device, terminal and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant