Disclosure of Invention
In view of this, the present invention provides a face detection method, an apparatus, an electronic device and a storage medium, so as to improve face detection accuracy under low-illumination conditions.
A first aspect of the invention discloses a face detection method, comprising the following steps:
acquiring a low-illumination image to be detected, and performing image enhancement processing on the low-illumination image to be detected;
performing multi-scale transformation processing on the to-be-detected low-illumination image subjected to image enhancement processing to obtain a plurality of target to-be-detected low-illumination images, wherein the sizes of the target to-be-detected low-illumination images are different;
for each pre-trained face detection model, inputting the plurality of target low-illumination images to be detected into the pre-trained face detection model, so that the pre-trained face detection model performs face detection on each target low-illumination image to be detected to obtain a face detection result of each target low-illumination image to be detected, and processing a result obtained by combining the face detection results of the target low-illumination images to be detected to obtain a target face detection result of the pre-trained face detection model;
and carrying out non-maximum suppression processing on the result obtained by combining the target face detection results of the pre-trained face detection models to obtain the face detection result of the low-illumination image to be detected.
Optionally, the acquiring a low-illumination image to be detected and performing image enhancement processing on the low-illumination image to be detected includes:
acquiring a low-illumination image to be detected;
and performing image enhancement processing on the low-illumination image to be detected by using a multi-scale Retinex with color restoration (MSRCR) algorithm.
Optionally, the pre-trained face detection models have different backbone networks, and each pre-trained face detection model is obtained by training on historical low-illumination images to be detected that have undergone image enhancement processing.
Optionally, the process of training the face detection model to be trained by using the historical low-illumination image to be detected subjected to image enhancement processing to obtain the pre-trained face detection model includes:
acquiring a plurality of training data, wherein each training data is obtained by performing image enhancement processing on a historical low-illumination image to be detected by using the multi-scale Retinex with color restoration (MSRCR) algorithm and a small-face data enhancement algorithm;
and inputting the training data into a face detection model to be trained aiming at each training data, so that the face detection model to be trained carries out face detection on the training data to obtain a face detection result of the historical low-illumination image to be detected, and calculating a corresponding loss function according to the face detection result of the historical low-illumination image to be detected so as to adjust the parameters of the face detection model to be trained according to the loss function until the face detection model to be trained converges to obtain the face detection model.
Optionally, if the backbone network of the face detection model to be trained is a ResNet50 neural network, the ResNet50 neural network includes an input layer, 3 stages, a convolutional layer connected to each stage, and a context module connected to each convolutional layer, and the process of training the face detection model to be trained with the image-enhanced historical low-illumination images to obtain the pre-trained face detection model includes:
inputting the training data into a ResNet50 neural network to be trained through the input layer;
processing the training data by using the 3 stages respectively to obtain a characteristic diagram corresponding to each stage;
respectively utilizing the convolution layers corresponding to each stage to carry out reduction processing on the feature graph corresponding to each stage to obtain a target feature graph corresponding to each convolution layer, and inputting a fusion result obtained by carrying out multi-scale fusion processing on each target feature graph into the context module corresponding to each convolution layer;
and calculating a multitask loss function of each context module based on a feature map obtained by processing an input fusion result by each context module, and adjusting parameters of the ResNet50 neural network to be trained by using the multitask loss function of each context module until the ResNet50 neural network to be trained converges to obtain a face detection model.
Optionally, the face detection result at least includes at least one face frame and a confidence corresponding to each face frame, and the method further includes:
judging whether the confidence corresponding to each face frame is larger than a preset confidence threshold;
and if the confidence corresponding to each face frame is greater than the preset confidence threshold, determining the face detection result as the final face detection result of the low-illumination image to be detected.
A second aspect of the present invention discloses a face detection apparatus, comprising:
a first acquisition unit, configured to acquire a low-illumination image to be detected and perform image enhancement processing on the low-illumination image to be detected;
the multi-scale transformation processing unit is used for carrying out multi-scale transformation processing on the to-be-detected low-illumination image subjected to image enhancement processing to obtain a plurality of target to-be-detected low-illumination images, wherein the sizes of the target to-be-detected low-illumination images are different;
a face detection unit, configured to input, for each pre-trained face detection model, the plurality of target low-illumination images to be detected into the pre-trained face detection model, so that the pre-trained face detection model performs face detection on each target low-illumination image to be detected to obtain a face detection result of each target low-illumination image to be detected, and to process a result obtained by combining these face detection results to obtain a target face detection result of the pre-trained face detection model;
and the non-maximum suppression processing unit is used for performing non-maximum suppression processing on a result obtained by combining the target face detection results of the pre-trained face detection models to obtain a face detection result of the low-illumination image to be detected.
Optionally, the first obtaining unit includes:
the second acquisition unit is used for acquiring a low-illumination image to be detected;
and the image enhancement processing unit is configured to perform image enhancement processing on the low-illumination image to be detected by using the multi-scale Retinex with color restoration (MSRCR) algorithm.
A third aspect of the present invention discloses an electronic device, comprising a processor and a memory connected through a communication bus; the processor is configured to call and execute a program stored in the memory; and the memory is configured to store a program for implementing the face detection method disclosed in the first aspect of the invention.
In a fourth aspect of the present invention, a computer-readable storage medium is disclosed, in which computer-executable instructions are stored, and the computer-executable instructions are configured to execute the face detection method disclosed in the first aspect of the present invention.
The invention provides a face detection method, a face detection apparatus, an electronic device and a storage medium. A low-illumination image to be detected is acquired and subjected to image enhancement processing; multi-scale transformation processing is performed on the image-enhanced low-illumination image to obtain a plurality of target low-illumination images to be detected of different sizes; for each pre-trained face detection model, the plurality of target low-illumination images to be detected are input into the pre-trained face detection model, so that the model performs face detection on each target low-illumination image to obtain its face detection result, and the result obtained by combining these face detection results is processed to obtain a target face detection result of the pre-trained face detection model; and non-maximum suppression processing is performed on the target face detection results of the pre-trained face detection models to obtain the face detection result of the low-illumination image to be detected.
According to this technical solution, the face detection model is trained on historical low-illumination images that have undergone image enhancement, so the model can better learn the features of faces in low-illumination outdoor scenes, and its performance in such scenes is greatly improved. The low-illumination image to be detected is then subjected to image enhancement and multi-scale transformation to obtain a plurality of target low-illumination images of different sizes, which are input into each pre-trained face detection model, so that face detection accuracy can be improved under low-illumination conditions.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules, or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules, or units.
It is noted that the modifiers "a", "an", and "the" in the disclosure are exemplary rather than limiting, and those skilled in the art should understand them as "one or more" unless the context clearly dictates otherwise.
Referring to fig. 1, a schematic flow diagram of a face detection method provided by an embodiment of the present invention is shown, where the face detection method specifically includes the following steps:
s101: and acquiring a low-illumination image to be detected, and performing image enhancement processing on the low-illumination image to be detected.
In the specific process of executing step S101, a low-illumination image to be detected is acquired, and image enhancement processing is performed on it by using a Multi-Scale Retinex with Color Restoration (MSRCR) algorithm.
MSRCR is a widely used image enhancement method based on Retinex theory; Retinex is grounded in scientific experiments and analysis and can adaptively enhance many different types of images. In practical applications, parameters suited to low-illumination conditions can be set to achieve a better enhancement effect.
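As a rough, hedged sketch of how such an enhancement step might look (the Gaussian surround is approximated with a NumPy-only separable blur rather than an exact MSRCR implementation, and the `sigmas`, `alpha`, and `beta` values are conventional choices from the Retinex literature, not parameters taken from this disclosure):

```python
import numpy as np

def _gaussian_blur(img, sigma):
    """Separable Gaussian blur with NumPy only (a real system would use
    cv2.GaussianBlur); img is an (H, W, C) float array."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    pad = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    pad = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, pad)
    pad = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, pad)
    return pad

def msrcr(img, sigmas=(15, 80, 250), alpha=125.0, beta=46.0):
    """Sketch of MSRCR on an (H, W, 3) image with values in [0, 255]."""
    img = img.astype(np.float64) + 1.0          # avoid log(0)
    msr = np.zeros_like(img)
    for sigma in sigmas:                        # multi-scale retinex
        msr += np.log(img) - np.log(_gaussian_blur(img, sigma))
    msr /= len(sigmas)
    # Color restoration factor compensates the desaturation of plain MSR.
    crf = beta * (np.log(alpha * img) - np.log(img.sum(axis=2, keepdims=True)))
    out = msr * crf
    lo = out.min(axis=(0, 1), keepdims=True)
    hi = out.max(axis=(0, 1), keepdims=True)
    out = (out - lo) / np.maximum(hi - lo, 1e-8) * 255.0  # per-channel stretch
    return out.astype(np.uint8)
```

The final per-channel stretch back to [0, 255] is one of several common normalization choices; the disclosure does not specify which it uses.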
S102: and performing multi-scale transformation processing on the to-be-detected low-illumination image subjected to image enhancement processing to obtain a plurality of target to-be-detected low-illumination images, wherein the sizes of the target to-be-detected low-illumination images are different.
In the specific execution process of step S102, after the low-illumination image to be detected is subjected to image enhancement processing, a multi-level multi-scale fusion strategy is used to perform multi-scale transformation processing on the low-illumination image to be detected subjected to image enhancement processing, so as to obtain a plurality of target low-illumination images to be detected of different sizes.
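A minimal sketch of this multi-scale step (the scale factors and the nearest-neighbour resize are illustrative assumptions; a production system would typically use bilinear interpolation, e.g. `cv2.resize`):

```python
import numpy as np

def multi_scale_transform(img, scales=(1.0, 2.0, 3.0)):
    """Build differently sized copies of the enhanced image via
    nearest-neighbour index mapping."""
    outputs = []
    h, w = img.shape[:2]
    for s in scales:
        nh, nw = int(round(h * s)), int(round(w * s))
        rows = (np.arange(nh) / s).astype(int).clip(0, h - 1)
        cols = (np.arange(nw) / s).astype(int).clip(0, w - 1)
        outputs.append(img[rows][:, cols])  # resample rows, then columns
    return outputs
```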
S103: and aiming at each pre-trained face detection model, inputting a plurality of low-illumination images to be detected of targets into the pre-trained face detection model, so that the pre-trained face detection model performs face detection on each low-illumination image to be detected of the targets to obtain a face detection result of each low-illumination image to be detected of the targets, and processing a result obtained by combining the face detection results of each low-illumination image to be detected of the targets to obtain a target face detection result of the pre-trained face detection model.
In the process of step S103, a plurality of face detection models are trained in advance; each face detection model is obtained by training on historical low-illumination images to be detected that have undergone image enhancement processing, and the backbone networks of the pre-trained face detection models are different from one another.
Specifically, the process of training a face detection model with image-enhanced historical low-illumination images includes: acquiring a plurality of historical low-illumination images to be detected, and performing image enhancement processing on each of them by using the multi-scale Retinex with color restoration (MSRCR) algorithm together with a small-face data enhancement algorithm, to obtain training data corresponding to each historical low-illumination image to be detected.
And aiming at each training data, inputting the training data into the face detection model to be trained so that the face detection model to be trained performs face detection on the training data to obtain a face detection result of the historical low-illumination image to be detected, and calculating a corresponding loss function according to the face detection result of the historical low-illumination image to be detected so as to adjust parameters of the face detection model to be trained according to the loss function until the face detection model to be trained converges to obtain the face detection model.
In the embodiment of the present application, a specific process of processing the historical low-illumination image to be detected with the small-face data enhancement algorithm may be as follows: randomly select a face in the historical low-illumination image and record its size as s; from {16, 32, 64, 128, 256, 512}, select the value with the smallest absolute difference from s, recording the value as a1 and its index as ind1; then randomly select an index ind2 from [1, ind1] and obtain the corresponding value a2; compute scale = a2 / s and scale the image by this factor; then randomly crop and pad the scaled image to obtain an image of size 640 × 640. After this first data enhancement, a cropping operation is performed again: randomly select a value from {0.3, 0.45, 0.6, 0.8, 1.0} — for example 0.6 — compute 0.6 × 640 = 384, crop a 384 × 384 block containing the face from the 640 × 640 image, and then enlarge this image block back to 640 × 640.
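The anchor-matching step of this augmentation can be sketched as follows (a hedged illustration: the crop-and-pad stages are omitted, `small_face_scale` is a name introduced here, and the index range is the 0-indexed equivalent of [1, ind1] in the text):

```python
import random

ANCHOR_SIZES = [16, 32, 64, 128, 256, 512]

def small_face_scale(face_size: float, rng: random.Random) -> float:
    """Choose the rescale factor: find the anchor size closest to the
    randomly selected face, then randomly pick an anchor at or below it,
    biasing training toward smaller face scales."""
    ind1 = min(range(len(ANCHOR_SIZES)),
               key=lambda i: abs(ANCHOR_SIZES[i] - face_size))
    ind2 = rng.randrange(ind1 + 1)          # uniform over indices 0..ind1
    return ANCHOR_SIZES[ind2] / face_size   # scale = a2 / s
```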
In the embodiment of the present application, the backbone network of the face detection model may be a ResNet50 neural network, and the ResNet50 neural network includes an input layer, 3 stages, convolutional layers connected to each stage, and a context module connected to each convolutional layer, as shown in fig. 2.
If the backbone network of the face detection model to be trained is a ResNet50 neural network, the process of training it with the image-enhanced historical low-illumination images to obtain the pre-trained face detection model may be as follows: acquire training data with a size of 640 × 640 × 3, input the training data into the ResNet50 neural network to be trained through its input layer, and process the input with the 3 stages respectively to obtain the first-stage feature map P1 with a size of 80 × 80 × 256, the second-stage feature map P2 with a size of 40 × 40 × 1024, and the third-stage feature map P3 with a size of 20 × 20 × 2048.
Reduce the number of channels of feature map P1 to 256 with the convolutional layer corresponding to the first stage to obtain a target feature map P1' of size 80 × 80 × 256; reduce the number of channels of feature map P2 to 256 with the convolutional layer corresponding to the second stage to obtain a target feature map P2' of size 40 × 40 × 256; and reduce the number of channels of feature map P3 to 256 with the convolutional layer corresponding to the third stage to obtain a target feature map P3' of size 20 × 20 × 256. Fuse the target feature map P3' with an empty target feature map to obtain fusion result C3 (that is, take P3' directly as C3), and input C3 into the context module corresponding to the third-stage convolutional layer, so that the context module processes C3 and outputs a feature map of the same size. Fuse the target feature map P2' with fusion result C3 to obtain fusion result C2, and input C2 into the context module corresponding to the second-stage convolutional layer, so that the context module processes C2 and outputs a feature map of the same size. Fuse the target feature map P1' with fusion result C2 to obtain fusion result C1, and input C1 into the context module corresponding to the first-stage convolutional layer, so that the context module processes C1 and outputs a feature map of the same size.
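The channel reduction and top-down fusion described above can be sketched as follows. This is a hedged illustration: the 1 × 1 convolution reduces to per-pixel channel mixing, nearest-neighbour upsampling aligns adjacent stages, and element-wise addition stands in for the fusion operator, which the text does not specify:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: per-pixel channel mixing. x: (H, W, Cin), w: (Cin, Cout)."""
    return x @ w

def upsample2x(x):
    """Nearest-neighbour 2x upsampling so adjacent stages align spatially."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def top_down_fusion(p1, p2, p3, w1, w2, w3):
    """Reduce each stage's channels to a common width, then fuse top-down."""
    p1t, p2t, p3t = conv1x1(p1, w1), conv1x1(p2, w2), conv1x1(p3, w3)
    c3 = p3t                      # deepest level: fused with an "empty" map
    c2 = p2t + upsample2x(c3)     # fuse P2' with C3
    c1 = p1t + upsample2x(c2)     # fuse P1' with C2
    return c1, c2, c3
```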
The context module is composed of a plurality of convolutional layers with larger kernel sizes, such as 5 × 5 and 7 × 7; by enlarging the receptive field while keeping the size of the output feature map unchanged, it introduces more context information.
For the original-size feature map output by each context module, the regression-box position loss and classification loss of that context module are calculated from this feature map, and the multitask loss function of the context module is then computed from these two losses. Taking C3 as an example, its feature size after the corresponding context module is 20 × 20 × 256. To calculate the regression-box position loss, a detection head (a 1 × 1 convolution) changes the feature size to 20 × 20 × 4, where the channel dimension of 4 represents the position information of a regression box; similarly, to calculate the regression-box classification loss, a classification head (a 1 × 1 convolution) changes the feature size to 20 × 20 × 2, where the channel dimension of 2 represents the classification information of a regression box (face or not). Finally, the regression-box position loss function and the regression-box classification loss function are added to obtain the multitask loss function of the context module, as shown in formula (1).
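Formula (1) itself is not reproduced in this text. Given the symbol definitions that accompany it, it is presumably the standard two-term multi-task loss (a reconstruction, not verbatim from the original; the normalization terms N_cls and N_reg are conventional assumptions):

```latex
L(\{P_i\},\{t_i\}) \;=\; \frac{1}{N_{cls}} \sum_i L_{cls}(P_i, P_i^{*})
\;+\; \lambda \,\frac{1}{N_{reg}} \sum_i P_i^{*}\, L_{reg}(t_i, t_i^{*}) \tag{1}
```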
After the multitask loss function of each context module is calculated, these multitask loss functions are used to adjust the parameters of the ResNet50 neural network to be trained until it converges, and the face detection model is obtained.
Here, i represents the index of a regression box; P_i represents the probability that the regression box is a face, and P_i^* is the label of the regression box, where 1 represents a face and 0 represents a non-face; t_i is the position information of the regression box, and t_i^* is the position information of the ground-truth face box. L_cls is the two-class (face and background) softmax loss function, and L_reg is the smooth L1 loss function; multiplying by P_i^* ensures that the position loss function is calculated only for face regression boxes, and λ balances the regression-box position loss function and the regression-box classification loss function.
Further, in this embodiment of the application, after the face detection model is trained, the trained face detection model can be further tested, specifically: acquiring a plurality of historical low-illumination images to be detected, respectively carrying out image enhancement processing on each historical low-illumination image to be detected by utilizing a multi-scale retina enhancement algorithm MSRCR with color recovery, and carrying out multi-scale transformation processing on the low-illumination images to be detected subjected to the image enhancement processing to obtain test data of each historical low-illumination image to be detected, wherein the test data of each historical low-illumination image to be detected comprises a plurality of historical low-illumination images to be detected subjected to the image enhancement processing and in different sizes.
For each test data, the test data is input into the pre-trained face detection model, so that the model processes each image-enhanced historical low-illumination image of each size and obtains a face detection result for each of them; non-maximum suppression is then performed on the combined face detection results to obtain the face detection result of the historical low-illumination image corresponding to the test data. From this face detection result and the actual faces of the historical low-illumination image corresponding to the test data, the Average Precision (AP) of each category is computed and averaged to obtain the mean Average Precision (mAP); when the mAP of the face detection model reaches a preset mAP, it is determined that the performance of the face detection model meets the standard.
Here, AP is the most commonly used evaluation index in object detection, and the mean Average Precision (mAP) is the average of the per-class APs; since there is only one class in the face detection task, namely face, the mAP equals the face AP. AP is the area under the PR curve (precision-recall curve) and ranges from 0 to 1; the larger the area, the higher the AP and the better the model performance. The preset mAP may be, for example, 0.6944, and may be set according to practical applications; the embodiment of the present application is not limited thereto.
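The AP definition above can be sketched as a step-wise integration of the precision-recall curve over detections ranked by confidence (this is the raw area-under-curve version, not the 11-point interpolated VOC variant; the function and argument names are introduced here):

```python
def average_precision(scores, labels, total_positives):
    """AP as the area under the precision-recall curve.

    scores: detection confidences; labels: 1 for a true face, 0 for a
    false detection; total_positives: number of ground-truth faces."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap = prev_recall = 0.0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_positives
        ap += precision * (recall - prev_recall)  # rectangle under the PR curve
        prev_recall = recall
    return ap
```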
In the specific execution process of step S103, a plurality of target to-be-detected low-illumination images are input into each pre-trained face detection model, and for each pre-trained face detection model, the pre-trained face detection model performs face detection on each target to-be-detected low-illumination image to obtain a face detection result of each target to-be-detected low-illumination image, merges the face detection results of each target to-be-detected low-illumination image, and performs non-maximum suppression processing on the merged result to obtain a target face detection result of the pre-trained face detection model.
S104: and carrying out non-maximum suppression processing on the result obtained by combining the target face detection results of the pre-trained face detection models to obtain the face detection result of the low-illumination image to be detected.
In the specific process of executing step S104, after the target face detection result of each pre-trained face detection model is obtained, the target face detection results of the pre-trained face detection models are merged, and the merged result is subjected to non-maximum suppression processing to obtain the face detection result of the low-illumination image to be detected.
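The greedy non-maximum suppression over the merged boxes can be sketched as follows (the IoU threshold of 0.5 is an illustrative assumption; the disclosure does not state one):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```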
Further, in the embodiment of the application, the face detection result of the low-illumination image to be detected includes at least one face frame, the position information of each face frame, and the confidence of each face frame, so it can be determined whether the confidence corresponding to each face frame is greater than a preset confidence threshold. If the confidence corresponding to every face frame is greater than the preset confidence threshold, the face detection result is determined as the final face detection result of the low-illumination image to be detected; if the confidence corresponding to some face frame is not greater than the preset confidence threshold, face detection is performed on the low-illumination image to be detected again.
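The confidence check described above might be sketched as follows (names are hypothetical; returning None stands for triggering re-detection):

```python
def finalize_detections(faces, conf_threshold):
    """faces: list of (box, confidence) pairs. Return them as the final
    result only if every confidence clears the preset threshold; otherwise
    return None to signal that detection should be run again."""
    if all(conf > conf_threshold for _, conf in faces):
        return list(faces)
    return None
```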
The invention provides a face detection method: a low-illumination image to be detected is acquired and subjected to image enhancement processing; multi-scale transformation processing is performed on the image-enhanced low-illumination image to obtain a plurality of target low-illumination images to be detected of different sizes; for each pre-trained face detection model, the plurality of target low-illumination images to be detected are input into the pre-trained face detection model, so that the model performs face detection on each target low-illumination image to obtain its face detection result, and the result obtained by combining these face detection results is processed to obtain a target face detection result of the pre-trained face detection model; and non-maximum suppression processing is performed on the target face detection results of the pre-trained face detection models to obtain the face detection result of the low-illumination image to be detected.
According to this technical solution, the face detection model is trained on historical low-illumination images that have undergone image enhancement, so the model can better learn the features of faces in low-illumination outdoor scenes, and its performance in such scenes is greatly improved. The low-illumination image to be detected is then subjected to image enhancement and multi-scale transformation to obtain a plurality of target low-illumination images of different sizes, which are input into each pre-trained face detection model, so that face detection accuracy can be improved under low-illumination conditions.
In order to better understand the above, the following description is given by way of example.
For example, as shown in fig. 3, three face detection models are trained in advance. A low-illumination image to be detected (the input image in fig. 3) is acquired and subjected to image enhancement processing, and the resolution of the enhanced image is enlarged by factors of 2 and 3 respectively to obtain two larger-scale target low-illumination images to be detected, namely the target low-illumination image of size 1 and the target low-illumination image of size 2.
The target low-illumination image of size 1 and the target low-illumination image of size 2 are input into pre-trained face detection model I, so that the model performs face detection on each target low-illumination image to obtain its face detection result, and non-maximum suppression processing is performed on the result obtained by combining these face detection results to obtain the target face detection result of pre-trained face detection model I.
The same two target low-illumination images are input into pre-trained face detection model II, which likewise performs face detection on each of them; non-maximum suppression processing on the combined face detection results yields the target face detection result of pre-trained face detection model II.
They are also input into pre-trained face detection model III, which performs face detection on each of them; non-maximum suppression processing on the combined face detection results yields the target face detection result of pre-trained face detection model III.
After the target face detection result of each pre-trained face detection model is obtained, the target face detection results of the pre-trained face detection models are combined, and non-maximum suppression processing is performed on the combined result to obtain the face detection result of the low-illumination image to be detected.
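The two-stage merging-and-suppression procedure above can be sketched as follows. The detector callables, the scale set, and the IoU threshold are illustrative assumptions for the sketch, not values taken from this disclosure:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top-scoring box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # Suppress boxes that overlap the kept box too strongly
        order = order[1:][iou <= iou_thresh]
    return keep

def detect_low_light(image, models, scales=(1.0, 0.5), iou_thresh=0.4):
    """Per model: detect on every scale, merge, and apply NMS; then apply
    NMS once more across all models. Each `model(image, scale)` is a
    hypothetical callable returning (boxes, scores) in original-image
    coordinates."""
    per_model_boxes, per_model_scores = [], []
    for model in models:
        results = [model(image, s) for s in scales]
        b = np.vstack([r[0] for r in results])
        sc = np.concatenate([r[1] for r in results])
        keep = nms(b, sc, iou_thresh)  # first-stage NMS, per model
        per_model_boxes.append(b[keep])
        per_model_scores.append(sc[keep])
    boxes = np.vstack(per_model_boxes)
    scores = np.concatenate(per_model_scores)
    keep = nms(boxes, scores, iou_thresh)  # second-stage NMS, across models
    return boxes[keep], scores[keep]
```

With three models this reduces to exactly the flow above: three per-model NMS passes followed by one combined pass.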
Based on the face detection method disclosed by the embodiment of the invention, the embodiment of the invention also correspondingly discloses a face detection device, and as shown in fig. 4, the face detection device comprises:
the first obtaining unit 41 is configured to obtain a low-illumination image to be detected, and perform image enhancement processing on the low-illumination image to be detected;
the multi-scale transformation processing unit 42 is configured to perform multi-scale transformation processing on the to-be-detected low-illumination image subjected to the image enhancement processing to obtain a plurality of target to-be-detected low-illumination images, where the size of each target to-be-detected low-illumination image is different;
a face detection unit 43, configured to input multiple target low-illumination images to be detected into the pre-trained face detection model for each pre-trained face detection model, so that the pre-trained face detection model performs face detection on each target low-illumination image to be detected, so as to obtain a face detection result of each target low-illumination image to be detected, and process a result obtained by combining the face detection results of each target low-illumination image to be detected, so as to obtain a target face detection result of the pre-trained face detection model;
and the non-maximum suppression processing unit 44 is configured to perform non-maximum suppression processing on a result obtained by combining the target face detection results of the pre-trained face detection models, so as to obtain a face detection result of the low-illumination image to be detected.
The specific principle and the execution process of each unit in the face detection device disclosed in the embodiment of the present invention are the same as those of the face detection method disclosed in the embodiment of the present invention, and reference may be made to corresponding parts in the face detection method disclosed in the embodiment of the present invention, which are not described herein again.
The invention provides a face detection apparatus, which acquires a low-illumination image to be detected and performs image enhancement processing on the low-illumination image to be detected; performs multi-scale transformation processing on the low-illumination image to be detected subjected to the image enhancement processing to obtain a plurality of target low-illumination images to be detected, wherein the sizes of the target low-illumination images to be detected are different; for each pre-trained face detection model, inputs the plurality of target low-illumination images to be detected into the pre-trained face detection model, so that the pre-trained face detection model performs face detection on each target low-illumination image to be detected to obtain a face detection result of each target low-illumination image to be detected, and processes a result obtained by combining the face detection results of the target low-illumination images to be detected to obtain a target face detection result of the pre-trained face detection model; and performs non-maximum suppression processing on the result obtained by combining the target face detection results of the pre-trained face detection models to obtain the face detection result of the low-illumination image to be detected.
According to the above technical scheme, each face detection model is obtained by training on historical low-illumination images to be detected that have undergone image enhancement, so that the model can better learn the characteristics of human faces in low-illumination outdoor scenes and its performance in such scenes is greatly improved. After the low-illumination image to be detected is subjected to image enhancement and multi-scale transformation to obtain a plurality of target low-illumination images to be detected of different sizes, these images of different sizes are input into each pre-trained face detection model, so that the face detection accuracy under the low-illumination condition can be improved.
Optionally, the first obtaining unit includes:
the second acquisition unit is used for acquiring a low-illumination image to be detected;
and the image enhancement processing unit is used for performing image enhancement processing on the low-illumination image to be detected by using the multi-scale retinex with color restoration (MSRCR) algorithm.
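A minimal MSRCR sketch in NumPy is given below. The sigma, alpha, and beta defaults are the values commonly used in the retinex literature, not values stated in this disclosure, and the Gaussian blur is a simple pure-NumPy implementation:

```python
import numpy as np

def _gaussian_blur(channel, sigma):
    """Separable Gaussian blur of a 2-D array (edge-padded, pure NumPy)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(channel, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def msrcr(img, sigmas=(15, 80, 250), alpha=125.0, beta=46.0, eps=1.0):
    """Multi-scale retinex with color restoration (MSRCR), sketched.
    img: H x W x 3 array of non-negative intensities."""
    img = img.astype(np.float64) + eps  # avoid log(0)
    # Multi-scale retinex: average of log(I) - log(blurred I) over the scales
    msr = np.zeros_like(img)
    for sigma in sigmas:
        blurred = np.stack(
            [_gaussian_blur(img[..., c], sigma) for c in range(3)], axis=-1)
        msr += np.log(img) - np.log(blurred)
    msr /= len(sigmas)
    # Color restoration weighs each channel against the channel sum
    crf = beta * (np.log(alpha * img) - np.log(img.sum(axis=-1, keepdims=True)))
    out = msr * crf
    # Linear stretch to the displayable range [0, 255]
    out = (out - out.min()) / (out.max() - out.min() + 1e-12) * 255.0
    return out.astype(np.uint8)
```

In practice a library blur (e.g. an OpenCV or SciPy Gaussian filter) would replace `_gaussian_blur`; the retinex and color-restoration terms are the part specific to MSRCR.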
Optionally, the pre-trained face detection models have different backbone networks, and each pre-trained face detection model is obtained by training with a training unit.
Optionally, the training unit includes:
the third acquisition unit is used for acquiring a plurality of training data, wherein each training data is obtained by performing image enhancement processing on a historical low-illumination image to be detected by using the multi-scale retinex with color restoration (MSRCR) algorithm and a small-face data enhancement algorithm;
and the training subunit is used for inputting the training data into the to-be-trained face detection model aiming at each training data, so that the to-be-trained face detection model performs face detection on the training data to obtain a face detection result of the historical to-be-detected low-illumination image, and calculating a corresponding loss function according to the face detection result of the historical to-be-detected low-illumination image, so as to adjust the parameters of the to-be-trained face detection model according to the loss function until the to-be-trained face detection model converges, and thus obtaining the face detection model.
Optionally, if the backbone network of the face detection model to be trained is a ResNet50 neural network, the ResNet50 neural network includes an input layer, 3 stages, a convolutional layer connected to each stage, and a context module connected to each convolutional layer, and the training subunit includes:
the input unit is used for inputting training data into a ResNet50 neural network to be trained through an input layer;
the first processing unit is used for processing the training data by using the 3 stages respectively to obtain a feature map corresponding to each stage;
the second processing unit is used for performing reduction processing on the feature map corresponding to each stage by using the convolutional layer corresponding to that stage to obtain a target feature map corresponding to each convolutional layer, and inputting a fusion result obtained by performing multi-scale fusion processing on the target feature maps into the context module corresponding to each convolutional layer;
and the parameter adjusting unit is used for calculating the multitask loss function of each context module based on a feature map obtained by processing the input fusion result by each context module, and adjusting the parameters of the ResNet50 neural network to be trained by using the multitask loss function of each context module until the ResNet50 neural network to be trained converges to obtain the face detection model.
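The multi-scale fusion step above can be illustrated with a minimal top-down, FPN-style sketch. The channel count, the number of levels, and the nearest-neighbour upsampling are illustrative assumptions; the disclosure only specifies that the reduced feature maps are fused across scales before entering the context modules:

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a C x H x W feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(features):
    """Multi-scale fusion sketch: `features` holds the C x H x W target
    feature maps produced by the reduction convolutions, ordered shallow to
    deep, each deeper map half the spatial size of the previous one. Each
    deeper map is upsampled and added into the next shallower map; each
    fused map would then feed the context module of its level."""
    fused = [features[-1]]  # the deepest map passes through unchanged
    for fmap in reversed(features[:-1]):
        fused.append(fmap + upsample2x(fused[-1]))
    return fused[::-1]  # back to shallow-to-deep order
```

In a real network the addition would typically be followed by a smoothing convolution; the sketch keeps only the fusion arithmetic.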
Further, the face detection result includes at least one face frame and a confidence corresponding to each face frame, and the face detection apparatus provided by the present invention further includes:
the judging unit is used for judging whether the confidence coefficient corresponding to each face frame is larger than a preset confidence coefficient threshold value or not;
and the determining unit is used for determining the face detection result as the final face detection result of the low-illumination image to be detected if the confidence coefficient corresponding to each face frame is greater than a preset confidence coefficient threshold value.
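Taken together, the judging and determining units amount to a simple confidence filter. The 0.8 threshold below is an illustrative default; the disclosure only requires some preset confidence threshold:

```python
def filter_faces(face_frames, confidences, threshold=0.8):
    """Keep only the face frames whose confidence exceeds the preset
    threshold, returning the surviving frames and their confidences."""
    kept = [(frame, conf) for frame, conf in zip(face_frames, confidences)
            if conf > threshold]
    frames = [frame for frame, _ in kept]
    confs = [conf for _, conf in kept]
    return frames, confs
```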
An embodiment of the present application provides an electronic device. As shown in fig. 5, the electronic device includes a processor 501 and a memory 502, wherein the memory 502 is used for storing program codes and data for face detection, and the processor 501 is used for calling the program instructions in the memory to execute the steps of the face detection method shown in the foregoing embodiments.
An embodiment of the present application provides a storage medium, the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the face detection method shown in the foregoing embodiments.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are merely illustrative, wherein units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.