
CN112818833A - Face multitask detection method, system, device and medium based on deep learning - Google Patents

Face multitask detection method, system, device and medium based on deep learning

Info

Publication number
CN112818833A
CN112818833A
Authority
CN
China
Prior art keywords
face
image
processing
deep learning
multitask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110124545.3A
Other languages
Chinese (zh)
Other versions
CN112818833B
Inventor
梁延研
朱震威
林旭新
于春涛
杨琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boyan Technology Zhuhai Co ltd
China Energy International Development Investment Group Co ltd
Original Assignee
China Energy International Construction Investment Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Energy International Construction Investment Group Co ltd
Priority to CN202110124545.3A
Publication of CN112818833A
Application granted
Publication of CN112818833B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/162 Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a deep-learning-based face multitask detection method, system, device and storage medium. The method includes: acquiring an original face image; normalizing the original face image to obtain a first image; inputting the first image into a super-resolution neural network model for processing to obtain a second image; and inputting the second image into a deep-learning-based face multitask detection model for processing to obtain face frame coordinates and face key point coordinates. Through the super-resolution neural network model, the invention can enhance feature information while keeping the size of the feature map and at the same time increase the detection performance for small target faces; through the deep-learning-based face multitask detection model, the detection effect for faces of different sizes can be improved, making the detection results more accurate. The invention can be widely applied in the technical field of image processing.

Description

Face multitask detection method, system, device and medium based on deep learning
Technical Field
The invention relates to the technical field of image processing, in particular to a face multitask detection method, a system, a device and a storage medium based on deep learning.
Background
Face detection technology originally stemmed from face recognition; it is a core, long-standing research branch in the field of computer vision and a crucial first step in face-related applications. In recent decades, face detection has attracted a great deal of attention and is considered one of the successful applications of image analysis. In most of the prior art, in order to improve a model's detection accuracy for small faces, the preprocessing approach of enlarging the image is adopted; however, this brings two adverse effects:
1) the size of the input image increases, and the sizes of the feature maps generated during model inference scale up accordingly, so the amount of computation and the memory occupation increase rapidly;
2) enlarging the image magnifies not only smaller targets but also larger ones, which negatively impacts the deep neural network model's ability to detect large samples.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides a face multitask detection method, a system, a device and a storage medium based on deep learning.
The technical scheme adopted by the invention is as follows:
in one aspect, an embodiment of the present invention includes a face multitask detection method based on deep learning, including:
acquiring an original face image;
carrying out normalization processing on the original face image to obtain a first image;
inputting the first image into a super-resolution neural network model for processing to obtain a second image;
and inputting the second image into a face multi-task detection model based on deep learning for processing to obtain a face frame coordinate and a face key point coordinate.
Further, the normalizing the original face image to obtain a first image specifically comprises:
and performing normalization processing on the original face image by adopting a method of uniformly mapping the gray values of 0-255 to 0-1 to obtain a first image.
Further, the super-resolution neural network model comprises:
a feature extraction module for extracting image features from the first image;
and the nonlinear mapping module is used for carrying out nonlinear mapping on the extracted image features to obtain the second image, and the second image comprises high-resolution image information.
Further, the step of inputting the second image into a face multitask detection model based on deep learning for processing specifically includes:
inputting the second image into a backbone network for processing, and acquiring a plurality of levels of feature maps;
after the feature maps of all levels are processed by a 1x1 convolution kernel, combining the features of high-level semantics and low-level semantics layer by layer to obtain a plurality of feature maps containing high-level semantic information;
inputting all the characteristic diagrams containing high-level semantic information into an area suggestion network for processing, and obtaining a face target coarse candidate frame;
mapping the face target coarse candidate frames to a first position, wherein the first position is the feature map position output by processing the first image through the backbone network, and unifying all the face target coarse candidate frames into two dimensions by the RoI Align method, dividing them into a first branch line and a second branch line for processing;
the first branch line screens and corrects the face target coarse candidate frame to obtain a face frame coordinate;
and the second branch line processes the face target coarse candidate frame to obtain face key point coordinates.
Further, unifying all the face target coarse candidate frames into two dimensions by the RoI Align method and dividing them into a first branch line and a second branch line for processing specifically comprises:
unifying all the face target coarse candidate frames into images with the dimension of 7x7x256 by the RoI Align method and assigning them to the first branch line for processing;
unifying all the face target coarse candidate frames into images with the dimension of 14x14x256 by the RoI Align method and assigning them to the second branch line for processing.
Further, the step of screening and correcting the face target coarse candidate frames by the first branch line to obtain the face frame coordinates specifically comprises:
abstracting the features of the image with the dimensionality of 7x7x256 into one dimension through two fully connected layers;
fitting the classification and the position offset of the face target coarse candidate frame with two further fully connected layers, respectively;
and correcting the face target coarse candidate frame according to its classification and position offset to obtain the face frame coordinates.
Further, the step of processing the face target coarse candidate frames by the second branch line to obtain the face key point coordinates specifically comprises:
processing the image with the dimension of 14x14x256 through 4 convolutional layers;
and fitting through a fully connected layer to obtain the coordinates of the key points of the human face.
On the other hand, the embodiment of the invention also comprises a face multitask detection system based on deep learning, which comprises the following steps:
the acquisition module is used for acquiring an original face image;
the normalization processing module is used for performing normalization processing on the original face image to obtain a first image;
the super-resolution module is used for inputting the first image into a super-resolution neural network model for processing to obtain a second image;
and the face multitask detection module is used for inputting the second image into a face multitask detection model based on deep learning to be processed, and obtaining face frame coordinates and face key point coordinates.
On the other hand, the embodiment of the invention also comprises a face multitask detection device based on deep learning, which comprises:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the detection method.
In another aspect, the embodiment of the present invention further includes a computer readable storage medium, on which a program executable by a processor is stored, and the program executable by the processor is used for implementing the detection method when being executed by the processor.
The invention has the beneficial effects that:
(1) according to the method, the characteristic information can be enhanced on the premise of keeping the size of the characteristic diagram through the super-resolution neural network model, and meanwhile, the detection performance of the small target face is improved, so that the small face is more easily detected, and compared with an amplified input image, the increment of computing resources is very small;
(2) according to the invention, through the face multitask detection model based on deep learning, the detection effects of faces with different sizes can be improved, the detection result is more accurate, and the size range of the face image can not be changed in the feature enhancement process in the detection process.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart illustrating steps of a face multitask detection method based on deep learning according to an embodiment of the present invention;
FIG. 2 is a network architecture diagram of a deep learning-based face multitask detection model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a training process of a face multitask detection model based on deep learning according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a specific setting of network parameters according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a face multitask detection device based on deep learning according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality of" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including the stated number. If "first" and "second" are described, they are used only to distinguish technical features and are not to be understood as indicating or implying relative importance, implicitly indicating the number of the technical features indicated, or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
The embodiments of the present application will be further explained with reference to the drawings.
Referring to fig. 1, an embodiment of the present invention provides a face multitask detection method based on deep learning, including but not limited to the following steps:
s1, acquiring an original face image;
s2, carrying out normalization processing on the original face image to obtain a first image;
s3, inputting the first image into a super-resolution neural network model for processing to obtain a second image;
and S4, inputting the second image into a face multi-task detection model based on deep learning for processing to obtain face frame coordinates and face key point coordinates.
Regarding step S2, that is, performing normalization processing on the original face image to obtain a first image, specifically:
s201, performing normalization processing on the original face image by adopting a method of uniformly mapping gray values of 0-255 to 0-1 to obtain a first image.
In this embodiment, the operation of step S2 adopts the method of uniformly mapping gray values of 0-255 to between 0 and 1, which differs from the mean-standard-deviation normalization used in general detection tasks; its purpose is to suit the learning of the super-resolution neural network model.
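The uniform mapping described above can be sketched as follows (an illustrative example assuming an 8-bit input image, not the patent's code):

```python
import numpy as np

def normalize_to_unit(img_u8: np.ndarray) -> np.ndarray:
    """Uniformly map 8-bit gray values 0-255 onto the range [0, 1]."""
    return img_u8.astype(np.float32) / 255.0

# Example: a dummy 2x2 single-channel image.
img = np.array([[0, 64], [128, 255]], dtype=np.uint8)
first_image = normalize_to_unit(img)
```

Unlike mean-standard-deviation normalization, this mapping keeps all values in a fixed, non-negative range, which matches the pixel range a super-resolution network reconstructs.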
The super-resolution neural network model in step S3 includes:
a feature extraction module for extracting image features from the first image;
and the nonlinear mapping module is used for carrying out nonlinear mapping on the extracted image features to obtain the second image, and the second image comprises high-resolution image information.
In this embodiment, the super-resolution neural network model adopts a lightweight network comprising four convolutional layers and an upsampling layer, where the upsampling layer is used only to reconstruct the high-resolution image and does not participate in the inference process of the backbone network. Through the processing of the four convolutional layers, a second image containing high-resolution information is obtained; this second image is a feature image with the same resolution as the input original image. The second image is then input into the deep-learning-based face multitask detection model for further processing. The super-resolution neural network model therefore aims to enhance the feature information while keeping the feature map size, which is of considerable help for detecting targets that occupy only a small range of pixels.
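The size-preserving property that the convolutional layers above rely on can be illustrated with a zero-padded ("same") convolution, whose output spatial size equals the input size. The sketch below is a single-channel NumPy toy, not the patent's four-layer network:

```python
import numpy as np

def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Zero-padded 'same' convolution: output spatial size equals input size."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))          # zero padding around the borders
    out = np.zeros_like(x, dtype=np.float32)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

x = np.random.rand(32, 32).astype(np.float32)
kernel = np.ones((3, 3), dtype=np.float32) / 9.0   # a simple smoothing kernel
y = conv2d_same(x, kernel)
```

Stacking several such layers enhances features without shrinking the map, which is why the model avoids the cost explosion of simply enlarging the input image.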
In this embodiment, the training process of the super-resolution neural network model includes the following processing processes:
(1) processing the first image by using bicubic interpolation downsampling to obtain a low-resolution image, and extracting image features from the low-resolution image through a feature extraction module;
(2) mapping the extracted image features to a same resolution feature image for representing higher resolution image information by a non-linear mapping module, wherein the same resolution feature image is the second image;
(3) performing learnable upsampling processing on the second image through an image reconstruction module to obtain a high-resolution image restored by the model;
in the embodiment, in the training process of the super-resolution neural network model, the low-resolution image is processed by the super-resolution module, a high-resolution image which is predicted and restored by the model can be output, and the difference between the high resolution and the first image is compared and calculated so as to supervise the network training process; and in the application process, the image reconstruction module does not participate in the work.
Optionally, step S4, that is, the step of inputting the second image into the depth learning-based face multitask detection model for processing specifically includes:
s401, inputting the second image into a backbone network for processing, and obtaining a plurality of levels of feature maps;
s402, after the feature maps of all levels are processed by a 1 x1 convolution kernel, combining the features of high-level semantics and low-level semantics layer by layer to obtain a plurality of feature maps containing high-level semantic information;
s403, inputting all the characteristic graphs containing the high-level semantic information into an area for proposing network processing to obtain a face target coarse candidate frame;
s404, enabling the face target coarse candidate frame to correspond to a first position, wherein the first position is a feature map position output by the first image through backbone network processing, and unifying all the face target coarse candidate frames into two dimensions through a RoI Align method and dividing the two dimensions into a first branch line and a second branch line for processing;
s405, screening and correcting the face target coarse candidate frame by the first branch line to obtain a face frame coordinate;
and S406, the second branch line processes the face target coarse candidate frame to obtain face key point coordinates.
In step S404, the unifying all the face target coarse candidate frames into two dimensions by the RoI Align method, and dividing the two dimensions into a first branch line and a second branch line for processing specifically:
s404-1, unifying all the face target coarse candidate frames into an image with the dimension of 7x7x256 by a RoI Align method, and dividing the image into first branches for processing;
s404-2, unifying all the face target coarse candidate frames into an image with the dimension of 14x14x256 by a RoI Align method, and dividing the image into second branches for processing.
In step S405, the step of screening and correcting the face target coarse candidate frames by the first branch line to obtain the face frame coordinates is specifically:
S405-1, abstracting the features of the image with the dimensionality of 7x7x256 into one dimension through two fully connected layers;
S405-2, fitting the classification and the position offset of the face target coarse candidate frame with two further fully connected layers, respectively;
S405-3, correcting the face target coarse candidate frame according to its classification and position offset to obtain the face frame coordinates.
In step S406, the step of processing the face target coarse candidate frames by the second branch line to obtain the face key point coordinates is specifically:
S406-1, processing the image with the dimension of 14x14x256 through 4 convolutional layers;
S406-2, fitting through a fully connected layer to obtain the coordinates of the key points of the human face.
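The tensor shapes flowing through the two branch lines can be sketched with random weights; the hidden width of 1024 and the replacement of branch 2's four convolutional layers by a single fully connected fit are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Branch 1: 7x7x256 RoI -> two FC layers -> class scores (2) and box offsets (4).
roi_7 = rng.standard_normal((7 * 7 * 256,)).astype(np.float32)
fc1 = rng.standard_normal((7 * 7 * 256, 1024)).astype(np.float32) * 0.01
fc2 = rng.standard_normal((1024, 1024)).astype(np.float32) * 0.01
feat = np.maximum(roi_7 @ fc1 @ fc2, 0.0)                       # ReLU
w_cls = rng.standard_normal((1024, 2)).astype(np.float32) * 0.01
w_reg = rng.standard_normal((1024, 4)).astype(np.float32) * 0.01
cls_scores, box_offsets = feat @ w_cls, feat @ w_reg

# Branch 2: 14x14x256 RoI -> (4 conv layers elided here) -> FC -> 10 values,
# i.e. normalized (x, y) coordinates for 5 facial key points.
roi_14 = rng.standard_normal((14 * 14 * 256,)).astype(np.float32)
w_lm = rng.standard_normal((14 * 14 * 256, 10)).astype(np.float32) * 0.01
landmarks = roi_14 @ w_lm
```

The two heads run in parallel on the same RoI-aligned candidates, so one forward pass yields both the refined face frame and the key point coordinates.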
In this embodiment, the deep-learning-based face multitask detection model obtains feature maps with highly abstracted information through convolutional layers, batch normalization layers, activation functions and pooling layers alternated according to a certain rule, and constructs a multi-scale feature pyramid to help faces of various scales be detected accurately; meanwhile, the adopted two-stage network architecture comprises two parts, selecting candidate frames and generating accurate detection results, and the latter part is divided into two parallel branches through which the face frame coordinates and the face key point coordinates are obtained, respectively.
Specifically, referring to fig. 2, the information-enhanced feature map obtained through super-resolution neural network model processing is first input to the backbone network to obtain feature maps at various levels; as the layers grow deeper and the feature maps are down-sampled step by step, the degree of semantic abstraction gradually increases while the feature map size gradually decreases. Through the processing of the M5-M2 layers, the features of high-level and low-level semantics can be combined layer by layer to obtain feature maps of various sizes containing high-level semantic information (such as P2-P6 in fig. 2), forming a feature pyramid and improving the detection effect for faces of different sizes;
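The layer-by-layer merge can be sketched as follows: a 1x1 convolution acts as per-pixel channel mixing, and the deeper map is up-sampled and added to the shallower one. The channel counts and nearest-neighbour up-sampling here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def lateral_1x1(fmap: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution = per-pixel channel mixing: (H, W, Cin) @ (Cin, Cout)."""
    return fmap @ w

def upsample2(fmap: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x up-sampling of an (H, W, C) feature map."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)

c4 = rng.standard_normal((8, 8, 512)).astype(np.float32)    # deeper backbone level
c3 = rng.standard_normal((16, 16, 256)).astype(np.float32)  # shallower backbone level
w4 = rng.standard_normal((512, 256)).astype(np.float32) * 0.01
w3 = rng.standard_normal((256, 256)).astype(np.float32) * 0.01

p4 = lateral_1x1(c4, w4)                  # unify channels with a 1x1 conv
p3 = lateral_1x1(c3, w3) + upsample2(p4)  # merge high-level semantics downward
```

Repeating this top-down merge at each level yields the P2-P6 pyramid in which every map carries high-level semantic information.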
Secondly, all the obtained feature maps (P2-P6) are input into the RPN (Region Proposal Network). With each pixel point as a center, anchor frames with side lengths of 4, 5.04 and 6.35 pixels and a side-length ratio of 1:1 are laid, and fitting is performed through two branch lines: one classifies whether an anchor frame belongs to a human face or the background (two softmax scores), and the other regresses the position offset between the anchor frame and the ground-truth target frame (the horizontal and vertical coordinates of the upper-left corner point and the offsets of length and width, four values in total), so that coarse face target candidate frames are obtained. In this process, a first set of loss functions is generated:

$$L^{(1)} = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $p_i$ denotes the predicted classification probability of a face target and $t_i$ its predicted position offset; $p_i^*$ and $t_i^*$ denote the corresponding ground-truth values; $N_{cls}$ and $N_{reg}$ are the numbers of classification and regression targets in one batch, respectively; $L_{cls}$ is the classification loss function, using binary cross-entropy; and $L_{reg}$ is the position-regression loss function, using smooth L1.
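A minimal sketch of a two-term loss of this form (binary cross-entropy for classification plus smooth L1 regression counted only for positive anchors) might look as follows; the exact normalization and weighting are illustrative assumptions:

```python
import numpy as np

def smooth_l1(x: np.ndarray) -> np.ndarray:
    """Smooth L1: quadratic near zero, linear beyond |x| = 1."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def rpn_loss(p, p_star, t, t_star):
    """Binary cross-entropy over anchor classifications plus smooth L1
    regression, where regression counts only positive (face) anchors."""
    eps = 1e-7
    l_cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps)).mean()
    n_reg = max(p_star.sum(), 1.0)
    l_reg = (p_star[:, None] * smooth_l1(t - t_star)).sum() / n_reg
    return l_cls + l_reg

p = np.array([0.9, 0.2])         # predicted face probabilities for 2 anchors
p_star = np.array([1.0, 0.0])    # ground truth: anchor 0 is a face
t = np.array([[0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
t_star = np.zeros((2, 4))
loss = rpn_loss(p, p_star, t, t_star)
```

The `p_star` factor in the regression term ensures that background anchors contribute no box-regression gradient, exactly as the indicator in the formula prescribes.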
Then, the output coarse face target candidate frames are mapped to the corresponding positions on the information-enhanced feature map obtained by the super-resolution neural network model; all coarse candidate frames are unified by the RoI Align method into two forms, with dimensions 7x7x256 and 14x14x256, and divided into two branch lines for processing. As shown in fig. 2, one branch line abstracts the features to one dimension through two fully connected layers and then fits the classification and position offset of the target frame with two further fully connected layers, respectively, so as to finely correct the target frame; the other branch line is processed through 4 convolutional layers, and the normalized positions of 5 landmarks in the corresponding target frame are fitted with a fully connected layer (10 values in total). In this process, a second set of loss functions is generated:

$$L^{(2)} = \lambda_1\frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda_2\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*) + \lambda_3\frac{1}{N_{lm}}\sum_i p_i^* L_{lm}(l_i, l_i^*)$$

The second set of loss functions is substantially the same as the first, with the addition of landmark supervision: $L_{lm}$ adopts the smooth L1 loss, $l_i$ and $l_i^*$ are the predicted and ground-truth landmark values, respectively, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are all weights.
Therefore, the total loss function of the deep-learning-based face multitask detection model is:

$$Loss_D = L^{(1)} + \alpha\, L^{(2)}$$

where $\alpha$ is a weight, $L^{(1)}$ is the first set of loss functions, and $L^{(2)}$ is the second set of loss functions.
Referring to fig. 3, since a super-resolution neural network model is introduced, the embodiment of the present invention further proposes a matched training strategy to improve the detection performance of the deep-learning-based face multitask detection model for targets occupying a small number of pixels. As shown in fig. 3, the Detector in the figure corresponds to the structure shown in fig. 2, and the Detection Loss corresponds to $Loss_D$. The super-resolution neural network model serves as a module for enhancing original image information and needs to be supervised by the high-resolution image corresponding to the input image, so the original image is down-sampled to generate training pairs. The loss function of the super-resolution neural network model is:

$$Loss_{SR} = \frac{1}{W\, H\, C}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{c=1}^{C}\left(y_{x,y,c} - y^{*}_{x,y,c}\right)^2$$

where $y$ denotes the SR image (the super-resolution restored image), $y^{*}$ denotes the HR image (the high-resolution image), and $W$, $H$ and $C$ are the width, height and number of channels of the image, respectively.
In order to keep the performance of the deep-learning-based face multitask detection model on the original image scale from degrading while increasing its detection performance for small targets, each batch in the training process is divided into two groups: one group consists of original images (ori_img), which are processed only by the main line; the other group consists of images down-sampled by a factor of 4 (de_img), which are samples that can provide high-resolution images and correspondingly perform super-resolution training, and are processed by the main line and the branch line together. Finally, the loss values of the two groups of data are summed with certain weights to supervise the whole network. The loss value of the original images (ori_img), processed by the main line part, is $L_{ori\_img} = Loss_D$; the loss value of the 4-fold down-sampled images (de_img), processed by both the main line and the branch line, is $L_{de\_img} = Loss_D + \beta\, Loss_{SR}$; and the sum of the two is $Loss_{total} = L_{ori\_img} + \gamma\, L_{de\_img}$, where $\beta$ and $\gamma$ are the relevant weights, $L_{ori\_img}$ and $L_{de\_img}$ are the respective loss values of the two groups, and $Loss_{total}$ is the total loss value that finally adjusts the parameters of the deep-learning-based face multitask detection model.
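The weighted combination of the two batch groups can be sketched as a small helper; the numeric values below are placeholders for illustration, not values from the patent:

```python
def total_training_loss(loss_d_ori: float, loss_d_de: float, loss_sr: float,
                        beta: float = 1.0, gamma: float = 1.0) -> float:
    """Combine the two batch groups: original images contribute only the
    detection loss; 4x down-sampled images add the super-resolution loss."""
    l_ori = loss_d_ori                    # L_ori_img = Loss_D
    l_de = loss_d_de + beta * loss_sr     # L_de_img = Loss_D + beta * Loss_SR
    return l_ori + gamma * l_de           # Loss_total

total = total_training_loss(0.5, 0.6, 0.2, beta=0.5, gamma=1.0)
```

Tuning beta and gamma trades off super-resolution supervision against detection supervision while keeping the original-scale detection objective intact.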
Referring to fig. 4, fig. 4 is a schematic diagram illustrating specific settings of various parameters in the network architecture shown in fig. 2.
The face multi-task detection method based on deep learning provided by the embodiment of the invention has the following technical effects:
(1) In the embodiment of the invention, the super-resolution neural network model enhances the feature information while keeping the size of the feature map unchanged, and at the same time increases the detection performance for small-target faces, so that small faces are more easily detected; compared with enlarging the input image, the increase in computing resources is very small;
(2) In the embodiment of the invention, the deep learning-based face multitask detection model improves the detection of faces of different sizes, making the detection results more accurate, and ensures that the size range of the face image does not change during the feature enhancement in the detection process.
On the other hand, the embodiment of the invention also provides a face multitask detection system based on deep learning, which comprises the following steps:
the acquisition module is used for acquiring an original face image;
the normalization processing module is used for performing normalization processing on the original face image to obtain a first image;
the super-resolution module is used for inputting the first image into a super-resolution neural network model for processing to obtain a second image;
and the face multitask detection module is used for inputting the second image into the deep learning-based face multitask detection model for processing, to obtain face frame coordinates and face key point coordinates.
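The four modules above form a simple sequential pipeline. The sketch below shows that flow under stated assumptions: the two models are hypothetical stand-ins (any callables with the described input/output shapes), and `normalize` implements the 0-255 to 0-1 mapping named in claim 2.

```python
import numpy as np

def normalize(image):
    # Normalization module: map gray values 0-255 uniformly onto 0-1.
    return image.astype(np.float32) / 255.0

class FacePipeline:
    """Sketch of the system's module chain; the models are placeholders."""

    def __init__(self, sr_model, detector):
        self.sr_model = sr_model  # super-resolution neural network model
        self.detector = detector  # deep learning-based face multitask model

    def run(self, original_image):
        first = normalize(original_image)         # acquisition + normalization
        second = self.sr_model(first)             # super-resolution module
        boxes, landmarks = self.detector(second)  # face multitask detection
        return boxes, landmarks
```

Here `boxes` would hold the face frame coordinates and `landmarks` the face key point coordinates; the real models would be the trained networks described above.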
Referring to fig. 5, an embodiment of the present invention further provides a face multitask detection device 200 based on deep learning, which specifically includes:
at least one processor 210;
at least one memory 220 for storing at least one program;
the at least one program, when executed by the at least one processor 210, causes the at least one processor 210 to implement the method shown in fig. 1.
The memory 220, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs and non-transitory computer-executable programs. The memory 220 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 220 may optionally include remote memory located remotely from processor 210, and such remote memory may be connected to processor 210 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be understood that the device structure shown in fig. 5 is not intended to be limiting of device 200, and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
In the apparatus 200 shown in fig. 5, the processor 210 may retrieve the program stored in the memory 220 and execute it to perform, without being limited to, the steps of the embodiment shown in fig. 1.
The above-described embodiments of the apparatus 200 are merely illustrative, and the units illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purposes of the embodiments.
Embodiments of the present invention also provide a computer-readable storage medium storing a processor-executable program; when executed by a processor, the program implements the method shown in fig. 1.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
It will be understood that all or some of the steps and systems of the methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media as known to those skilled in the art.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (10)

1. A face multitask detection method based on deep learning is characterized by comprising the following steps:
acquiring an original face image;
carrying out normalization processing on the original face image to obtain a first image;
inputting the first image into a super-resolution neural network model for processing to obtain a second image;
and inputting the second image into a face multi-task detection model based on deep learning for processing to obtain a face frame coordinate and a face key point coordinate.
2. The method for detecting the face multitask based on deep learning as claimed in claim 1, wherein the normalization processing is performed on the original face image to obtain a first image, specifically:
and performing normalization processing on the original face image by adopting a method of uniformly mapping the gray values of 0-255 to 0-1 to obtain a first image.
3. The method for detecting the face multitask based on the deep learning as claimed in claim 1, wherein the super-resolution neural network model comprises:
a feature extraction module for extracting image features from the first image;
and the nonlinear mapping module is used for carrying out nonlinear mapping on the extracted image features to obtain the second image, and the second image comprises high-resolution image information.
4. The method for detecting the face multitask based on the deep learning as claimed in claim 1, wherein the step of inputting the second image into the face multitask detection model based on the deep learning for processing specifically comprises:
inputting the second image into a backbone network for processing, and acquiring a plurality of levels of feature maps;
after the feature maps of all levels are processed by a 1x1 convolution kernel, combining the features of high-level semantics and low-level semantics layer by layer to obtain a plurality of feature maps containing high-level semantic information;
inputting all the characteristic diagrams containing high-level semantic information into an area suggestion network for processing, and obtaining a face target coarse candidate frame;
the face target coarse candidate frame corresponds to a first position, the first position being the feature map position output after the first image is processed by the backbone network, and all the face target coarse candidate frames are unified into two dimensions through a RoIAlign method and are divided into a first branch line and a second branch line for processing;
the first branch line screens and corrects the face target coarse candidate frame to obtain a face frame coordinate;
and the second branch line processes the face target coarse candidate frame to obtain face key point coordinates.
5. The method according to claim 4, wherein all the face target coarse candidate frames are unified into two dimensions by a RoIAlign method and are divided into a first branch line and a second branch line for processing, specifically:
unifying all the face target coarse candidate frames into an image with the dimension of 7x7x256 by a RoIAlign method, and passing the image to the first branch line for processing;
unifying all the face target coarse candidate frames into an image with the dimension of 14x14x256 by a RoIAlign method, and passing the image to the second branch line for processing.
6. The method for detecting the face multitask based on deep learning as claimed in claim 5, wherein the step of correcting the coarse candidate frame of the face target by the first branch line to obtain the coordinates of the face frame specifically comprises:
abstracting the features of the image with the dimension of 7x7x256 into one dimension through two fully connected layers;
fitting the classification and position offset of the face target coarse candidate frame by using two fully connected layers, respectively;
and correcting the face target coarse candidate frame according to the classification and the position offset of the face target coarse candidate frame to obtain the face frame coordinates.
7. The method for detecting the face multitask based on deep learning as claimed in claim 5, wherein the step of processing the coarse candidate frame of the face target by the second branch line to obtain the coordinates of the key points of the face specifically comprises:
processing the image with the dimension of 14x14x256 by 4 convolution layers;
and fitting with a fully connected layer to obtain the coordinates of the key points of the face.
8. A face multitask detection system based on deep learning is characterized by comprising:
the acquisition module is used for acquiring an original face image;
the normalization processing module is used for performing normalization processing on the original face image to obtain a first image;
the super-resolution module is used for inputting the first image into a super-resolution neural network model for processing to obtain a second image;
and the face multitask detection module is used for inputting the second image into a face multitask detection model based on deep learning to be processed, and obtaining face frame coordinates and face key point coordinates.
9. A face multitask detection device based on deep learning is characterized by comprising the following components:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the detection method of any one of claims 1-7.
10. Computer-readable storage medium, on which a program executable by a processor is stored, the program executable by the processor being adapted to implement the detection method according to any one of claims 1 to 7 when executed by the processor.
CN202110124545.3A 2021-01-29 2021-01-29 Multi-task face detection method, system, device and medium based on deep learning Active CN112818833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110124545.3A CN112818833B (en) 2021-01-29 2021-01-29 Multi-task face detection method, system, device and medium based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110124545.3A CN112818833B (en) 2021-01-29 2021-01-29 Multi-task face detection method, system, device and medium based on deep learning

Publications (2)

Publication Number Publication Date
CN112818833A true CN112818833A (en) 2021-05-18
CN112818833B CN112818833B (en) 2024-04-12

Family

ID=75860153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110124545.3A Active CN112818833B (en) 2021-01-29 2021-01-29 Multi-task face detection method, system, device and medium based on deep learning

Country Status (1)

Country Link
CN (1) CN112818833B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708481A (en) * 2022-03-28 2022-07-05 阿里云计算有限公司 Method and apparatus for processing image
CN118864258A (en) * 2024-09-26 2024-10-29 拓尔思信息技术股份有限公司 A method for small target detection in super-resolution images based on deep learning

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951867A (en) * 2017-03-22 2017-07-14 成都擎天树科技有限公司 Face recognition method, device, system and equipment based on convolutional neural network
CN107909026A (en) * 2016-11-30 2018-04-13 深圳奥瞳科技有限责任公司 Age and gender assessment based on the small-scale convolutional neural networks of embedded system
CN107958444A (en) * 2017-12-28 2018-04-24 江西高创保安服务技术有限公司 A kind of face super-resolution reconstruction method based on deep learning
CN109101915A (en) * 2018-08-01 2018-12-28 中国计量大学 Face and pedestrian and Attribute Recognition network structure design method based on deep learning
CN110263756A (en) * 2019-06-28 2019-09-20 东北大学 A kind of human face super-resolution reconstructing system based on joint multi-task learning
CN110532871A (en) * 2019-07-24 2019-12-03 华为技术有限公司 The method and apparatus of image procossing
CN111160202A (en) * 2019-12-20 2020-05-15 万翼科技有限公司 AR equipment-based identity verification method, AR equipment-based identity verification device, AR equipment-based identity verification equipment and storage medium
CN111259742A (en) * 2020-01-09 2020-06-09 南京理工大学 Abnormal crowd detection method based on deep learning
CN111274977A (en) * 2020-01-22 2020-06-12 中能国际建筑投资集团有限公司 Multi-task convolutional neural network model and method of use, device and storage medium
US20210012198A1 (en) * 2018-05-31 2021-01-14 Huawei Technologies Co., Ltd. Method for training deep neural network and apparatus

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909026A (en) * 2016-11-30 2018-04-13 深圳奥瞳科技有限责任公司 Age and gender assessment based on the small-scale convolutional neural networks of embedded system
CN106951867A (en) * 2017-03-22 2017-07-14 成都擎天树科技有限公司 Face recognition method, device, system and equipment based on convolutional neural network
CN107958444A (en) * 2017-12-28 2018-04-24 江西高创保安服务技术有限公司 A kind of face super-resolution reconstruction method based on deep learning
US20210012198A1 (en) * 2018-05-31 2021-01-14 Huawei Technologies Co., Ltd. Method for training deep neural network and apparatus
CN109101915A (en) * 2018-08-01 2018-12-28 中国计量大学 Face and pedestrian and Attribute Recognition network structure design method based on deep learning
CN110263756A (en) * 2019-06-28 2019-09-20 东北大学 A kind of human face super-resolution reconstructing system based on joint multi-task learning
CN110532871A (en) * 2019-07-24 2019-12-03 华为技术有限公司 The method and apparatus of image procossing
CN111160202A (en) * 2019-12-20 2020-05-15 万翼科技有限公司 AR equipment-based identity verification method, AR equipment-based identity verification device, AR equipment-based identity verification equipment and storage medium
CN111259742A (en) * 2020-01-09 2020-06-09 南京理工大学 Abnormal crowd detection method based on deep learning
CN111274977A (en) * 2020-01-22 2020-06-12 中能国际建筑投资集团有限公司 Multi-task convolutional neural network model and method of use, device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Task-Oriented Feature-Fused Network With Multivariate Dataset for Joint Face Analysis", IEEE Transactions on Cybernetics, pages 1292 - 1305 *
Liu Yiwen: "Research on Low-Resolution Face Detection Algorithms Based on Deep Learning", China Master's Theses Full-text Database (Information Science and Technology), no. 01, pages 138 - 1842 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708481A (en) * 2022-03-28 2022-07-05 阿里云计算有限公司 Method and apparatus for processing image
CN118864258A (en) * 2024-09-26 2024-10-29 拓尔思信息技术股份有限公司 A method for small target detection in super-resolution images based on deep learning
CN118864258B (en) * 2024-09-26 2024-12-27 拓尔思信息技术股份有限公司 A method for small target detection in super-resolution images based on deep learning

Also Published As

Publication number Publication date
CN112818833B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN112446383B (en) License plate recognition method and device, storage medium and terminal
CN110189255B (en) Face detection method based on two-level detection
CN111179217A (en) A multi-scale target detection method in remote sensing images based on attention mechanism
CN109101897A (en) Object detection method, system and the relevant device of underwater robot
WO2023070447A1 (en) Model training method, image processing method, computing processing device, and non-transitory computer readable medium
CN112800964A (en) Remote sensing image target detection method and system based on multi-module fusion
CN107704857A (en) A kind of lightweight licence plate recognition method and device end to end
CN111079739B (en) Multi-scale attention feature detection method
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
CN112949520B (en) An aerial vehicle detection method and detection system based on multi-scale small samples
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN115115601B (en) A remote sensing ship target detection method based on deformation attention pyramid
CN112434618A (en) Video target detection method based on sparse foreground prior, storage medium and equipment
CN112200191B (en) Image processing method, image processing device, computing equipment and medium
CN117409190B (en) Real-time infrared image target detection method, device, equipment and storage medium
CN116645592B (en) A crack detection method and storage medium based on image processing
CN111275660A (en) Defect detection method and device for flat panel display
CN116994236B (en) Low-quality image license plate detection method based on deep neural network
CN112818833A (en) Face multitask detection method, system, device and medium based on deep learning
CN113610178B (en) A method and device for detecting inland river vessel targets based on video surveillance images
CN113963272A (en) A UAV image target detection method based on improved yolov3
CN114219402A (en) Logistics tray stacking identification method, device, equipment and storage medium
CN111626379B (en) X-ray image detection method for pneumonia
CN112232102B (en) Building target recognition method and system based on deep neural network and multi-task learning
CN113284153A (en) Satellite cloud layer image processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: Building I, 15th Floor, Jinlong Center, 105 Xixinghai Road, New Port, Macau, China

Patentee after: China Energy International Development Investment Group Co.,Ltd.

Country or region after: Macao, China

Address before: Building C, 7th Floor, Jinlong Center, 105 Xianxinghai Road, New Port, Macau, China

Patentee before: China Energy International Construction Investment Group Co.,Ltd.

Country or region before: Macao, China

CP03 Change of name, title or address
TR01 Transfer of patent right

Effective date of registration: 20240430

Address after: Room 4202, Building 2, No. 522 Duhui Road, Hengqin New District, Zhuhai City, Guangdong Province

Patentee after: Boyan Technology (Zhuhai) Co.,Ltd.

Country or region after: China

Address before: Building I, 15th Floor, Jinlong Center, 105 Xixinghai Road, New Port, Macau, China

Patentee before: China Energy International Development Investment Group Co.,Ltd.

Country or region before: Macao, China

TR01 Transfer of patent right