CN114170069A - Automatic eye closing processing method based on continuous multiple pictures - Google Patents
Automatic eye closing processing method based on continuous multiple pictures
- Publication number
- CN114170069A (application number CN202111412708.4A)
- Authority
- CN
- China
- Prior art keywords
- eye
- photo
- human
- face
- human eye
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T3/02—Geometric image transformations in the plane of the image; affine transformations
- G06F18/241—Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification based on the proximity to a decision surface, e.g. support vector machines
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
- G06N3/045—Neural networks; combinations of networks
- G06N3/08—Neural networks; learning methods
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
- G06T5/70—Image enhancement or restoration; denoising; smoothing
- G06T2207/30201—Indexing scheme for image analysis; subject of image: human face
Abstract
The invention discloses an automatic closed-eye processing method based on multiple consecutive photos. The method mainly comprises the following steps: locating the human face; determining the eye region through face keypoint localization; recognizing whether each eye is closed or open; and replacing a closed-eye region with an open-eye region taken from another photo. The replacement proceeds as follows: an affine transformation matrix is computed from the eye keypoints in the two images; the open-eye region is mapped onto the closed-eye region by this affine transformation; and the boundary of the replaced eye region is smoothed. The method copes well with the common situation in group photography where not everyone keeps their eyes open at the same moment, and it is faster and more effective than retouching the picture with image editing software.
Description
Technical Field
The invention belongs to the field of computer image processing and pattern recognition, and relates to an automatic closed-eye processing method based on multiple consecutive photos.
Background
Group photos are taken on all kinds of occasions, such as conferences, events and student graduations, yet the process runs into a recurring problem: there is no guarantee that nobody's eyes are closed in a given photo. In most cases the photographer takes several photos in succession, but even then there is no guarantee that every person has their eyes open in the same photo.
To reduce closed eyes during group photography, the photographer can call out a count so that everyone tries to keep their eyes open. In addition, with image editing software now widely available, a captured photo can be retouched by hand, replacing a person's closed-eye region with the open-eye region from another photo, but this process is time-consuming and labor-intensive. With the rapid development of deep learning, and in particular the wide application of generative adversarial networks (GANs), many companies and developers have proposed GAN-based in-painting that takes an existing open-eye face as a reference image and generates an open-eye face on top of the closed-eye face [Eye In-Painting with Exemplar Generative Adversarial Networks, Brian Dolhansky, Cristian Canton Ferrer, CVPR 2018, arXiv: https://arxiv.org/abs/1712.03999]. Such methods can achieve good results, but the generation process is a black-box operation, and there is no guarantee that the generated eye region reflects the person's real eye state.
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides an automatic closed-eye processing method based on multiple consecutive photos.
The technical scheme adopted by the invention is as follows:
An automatic closed-eye processing method based on multiple consecutive photos, used for eliminating the closed-eye state in photos, comprises the following steps:
S1: acquiring multiple consecutive frontal photos of a target person, and locating the face region in each photo with a face localization model;
S2: for each photo processed in S1, locating the face keypoints within the face region with a face keypoint detection model to obtain the eye region in each photo;
S3: for each photo processed in S2, extracting the histogram of oriented gradients of the eye region as a feature vector, feeding it into a trained machine learning classifier to classify the eye state, and judging whether the target person's eyes in each photo are closed or open;
S4: for a photo to be processed in which the target person's eyes are closed, selecting from the other consecutively taken photos a replacement photo in which the eyes are open, registering the open-eye region of the replacement photo onto the closed-eye region of the photo to be processed by affine transformation, performing the replacement, and finally smoothing the boundary of the replaced eye region in the photo to be processed to complete the elimination of the closed-eye state.
Preferably, the face localization model adopts the deep learning model YOLO.
Preferably, the face keypoint detection model locates 68 face keypoints in the face region, of which 12 belong to the two eyes.
Preferably, the machine learning classifier is a support vector machine.
Furthermore, the support vector machine is trained in advance with labeled positive and negative samples, so that its classification accuracy on the eye state meets a preset condition.
Preferably, when the affine transformation between the photo to be processed and the replacement photo is performed, the affine transformation matrix required for registration is calculated from the eye keypoints of the eye regions in the two photos.
Preferably, the boundary of the replaced eye region in the photo to be processed is smoothed by Gaussian filtering.
Preferably, the multiple consecutively taken frontal photos are group photos or single-person photos.
Compared with the prior art, the invention has the following beneficial effects:
the invention can automatically realize that the human eyes in the eye closing state are converted into the eye opening state through a plurality of images (including human face images in the eye opening state), thereby realizing the eye closing operation in the personal photo or the group photo. The method of the present invention requires multiple images to be taken in succession and requires that a person have an eye-open condition in a particular image. The method does not generate an eye-opening image, but cuts the eye region in an eye-opening state in another image into the eye region in a eye-closing state through operations such as cutting change and the like, so that the eyes are guaranteed to be obtained by original shooting.
Drawings
FIG. 1 is a flow chart of the steps of the automatic closed-eye processing method based on multiple consecutive photos;
FIG. 2 is the main flow chart of the method in the embodiment;
FIG. 3 is a flow chart of the detection function in the embodiment;
FIG. 4 is a flow chart of the replacement function in the embodiment;
FIG. 5 shows the face localization result in the embodiment;
FIG. 6 shows the face keypoint detection result in the embodiment;
FIG. 7 shows the eye localization result in the embodiment;
FIG. 8 shows samples for eye region classification in the embodiment;
FIG. 9 shows a sample result of eye state recognition in the embodiment;
FIG. 10 shows the result of replacing a closed-eye region with an open-eye region in the embodiment.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description.
The main flow of the automatic closed-eye processing method is as follows: first, a target detection method locates the face region; next, a machine learning method classifies whether the located eye region is open or closed; then, affine transformation parameters between the two eye regions (closed-eye and open-eye) are obtained from the correspondence of their keypoints; finally, the open-eye region is transformed into the closed-eye region's space by the affine transformation and substituted in, and the replaced edges are smoothed. By automatically replacing closed-eye regions with open-eye regions in this way, the eyes in the resulting photo are guaranteed to be open.
The following is a description of specific implementations of the present invention.
As shown in fig. 1, a preferred implementation of the present invention provides an automatic closed-eye processing method based on multiple consecutive photos, used for eliminating the closed-eye state in photos, which includes the following steps:
S1: acquire multiple consecutive frontal photos of the target person, and locate the face region in each photo with a face localization model. The face localization model may be any model capable of face region localization, for example the deep learning model YOLO.
S2: for each photo processed in step S1, locate the face keypoints within the face region with a face keypoint detection model to obtain the eye region in each photo. Any model capable of face keypoint localization may be used; typically such a model locates 68 keypoints in the face region, of which 12 belong to the two eyes (6 per eye).
S3: for each photo processed in step S2, extract the histogram of oriented gradients of the eye region as a feature vector, feed it into a trained machine learning classifier to classify the eye state, and judge whether the target person's eyes in each photo are closed or open. Any binary classification network can serve as the classifier; a support vector machine, which is very effective on small sample sets, is preferred. Note that the support vector machine must be trained in advance with labeled positive and negative samples so that its classification accuracy on the eye state meets the minimum accuracy requirement.
S4: for a photo to be processed in which the target person's eyes are closed, select from the other consecutively taken photos a replacement photo in which the eyes are open, register the open-eye region of the replacement photo onto the closed-eye region of the photo to be processed by affine transformation, perform the replacement, and finally smooth the boundary of the replaced eye region in the photo to be processed to complete the elimination of the closed-eye state in the photo.
Note that when applying the affine transformation between the photo to be processed and the replacement photo, an affine transformation matrix must be computed; the eye keypoints of the eye regions in the two photos can serve as the registration points for this calculation. The boundary of the replaced eye region can then be smoothed by Gaussian filtering, feathering, or similar methods to make the replaced region look natural. In practice, for a given photo to be processed, the open-eye photo whose timestamp is closest to it may be chosen as the replacement photo, so that the person's facial expression is as consistent as possible between the two photos.
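For clarity, the registration step can be written out as a standard least-squares affine fit; this formulation is implied by the description above rather than stated in it. With $p_i$ the eye keypoints in the replacement photo and $q_i$ the corresponding keypoints in the photo to be processed, the sought transformation $(A, t)$ minimizes

$$\min_{A,\,t}\ \sum_{i=1}^{12}\left\lVert A\,p_i + t - q_i\right\rVert^2,\qquad A\in\mathbb{R}^{2\times 2},\ t\in\mathbb{R}^{2},$$

which, with 12 correspondences against 6 unknowns, is overdetermined and has a closed-form least-squares solution; the resulting $[A\,|\,t]$ is the affine transformation matrix applied to the open-eye region.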
The method applies both to group photos and to single-person photos taken as multiple consecutive frontal shots. For a single-person photo, there is only one target person to process; for a group photo, a single photo contains several different people, and the automatic closed-eye processing can be run with each person as the target in turn; of course, steps such as face localization and keypoint detection within one photo can be executed in parallel.
The following describes, taking a single-person photo as an example, the process of replacing a closed-eye region with an open-eye region using the automatic closed-eye processing method of S1-S4. If a group photo contains several faces in the closed-eye state, each face can be cropped out and processed one by one in the same way.
Examples
This embodiment addresses the problem that some of a set of consecutively taken photos contain closed eyes. Using the method of the invention, the closed-eye region is automatically replaced with an open-eye region by fusing data from multiple consecutive images; the method handles closed eyes in group photos effectively and beautifies the photos.
In this embodiment, the automatic closed-eye processing method of S1-S4 is implemented by the main flow function shown in fig. 2, which invokes the detection function shown in fig. 3 to detect the face region and keypoints in each photo, and the replacement function shown in fig. 4 to replace the eye region in a closed-eye photo. The specific implementation of each step is described in detail below.
1. Locating faces in images
Face localization is a mature technology belonging to the field of target detection. With the development of deep learning, building a detection model on top of an existing target detection model is straightforward. In this embodiment, multiple frontal photos of the target person are taken in succession, and a face detection model based on the deep learning model YOLOv5 performs face localization on each photo to obtain its face region. The face localization result for one of the example photos is shown in fig. 5.
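As an illustration of this step, a minimal C++ sketch using OpenCV's DNN module is given below. The model file name, the 640x640 input size, and the [1, N, 6] output layout (cx, cy, w, h, objectness, class score) are assumptions about a single-class YOLOv5 face model exported to ONNX, not details fixed by the patent.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: face localization with an assumed YOLOv5-style face model in ONNX.
std::vector<cv::Rect> detectFaces(const cv::Mat& photoBgr) {
    static cv::dnn::Net net = cv::dnn::readNetFromONNX("yolov5s-face.onnx");
    const int inSize = 640;
    cv::Mat blob = cv::dnn::blobFromImage(photoBgr, 1.0 / 255.0,
                                          cv::Size(inSize, inSize),
                                          cv::Scalar(), /*swapRB=*/true);
    net.setInput(blob);
    cv::Mat out = net.forward();  // assumed shape [1, N, 6]
    cv::Mat rows(out.size[1], out.size[2], CV_32F, out.ptr<float>());

    // blobFromImage stretches to 640x640, so boxes scale back linearly.
    float sx = photoBgr.cols / float(inSize), sy = photoBgr.rows / float(inSize);
    std::vector<cv::Rect> boxes; std::vector<float> scores;
    for (int i = 0; i < rows.rows; ++i) {
        float conf = rows.at<float>(i, 4) * rows.at<float>(i, 5);
        if (conf < 0.4f) continue;
        float cx = rows.at<float>(i, 0) * sx, cy = rows.at<float>(i, 1) * sy;
        float w  = rows.at<float>(i, 2) * sx, h  = rows.at<float>(i, 3) * sy;
        boxes.emplace_back(int(cx - w / 2), int(cy - h / 2), int(w), int(h));
        scores.push_back(conf);
    }
    std::vector<int> keep;
    cv::dnn::NMSBoxes(boxes, scores, 0.4f, 0.45f, keep);  // drop overlaps
    std::vector<cv::Rect> faces;
    for (int i : keep) faces.push_back(boxes[i]);
    return faces;
}
```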
2. Locating face keypoints and confirming the eye region
For each photo that has undergone face localization, the face keypoints within the face region are located with a face keypoint detection model to obtain the eye region in each photo.
In the usual face recognition pipeline the face has 68 keypoints, of which 12 belong to the eyes: for each eye, the two eye corners (canthi) and four points where the upper and lower eyelids meet the eye outline. The eye region can be confirmed by identifying these keypoints and their coordinates. On the basis of the face localization result, this embodiment uses the shape_predictor_68_face_landmarks model from the Dlib library, trained with the "One Millisecond Face Alignment with an Ensemble of Regression Trees" algorithm. This algorithm performs regression-tree-based face alignment: it regresses the face shape step by step from the current estimate toward the true shape by building a cascade of gradient-boosted residual regression trees (GBDT). Each leaf node of every tree stores a residual regression amount; when an input falls on a node, the residual is added to the input to achieve regression, and the accumulated residuals finally accomplish the face alignment.
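Stated compactly (following the formulation of the One Millisecond Face Alignment paper rather than wording from the patent), each cascade stage $t$ applies a regressor $r_t$ that reads the image $I$ and the current shape estimate and adds a residual update:

$$\hat{S}^{(t+1)} = \hat{S}^{(t)} + r_t\!\left(I, \hat{S}^{(t)}\right)$$

where $\hat{S}^{(t)}$ is the vector of 68 landmark coordinates after stage $t$, and each $r_t$ is itself an ensemble of gradient-boosted regression trees.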
With this model, the keypoints in the face region can be detected and recognized, from which the coordinates of the eye region are extracted and the eye region is confirmed. Fig. 6 and fig. 7 show the face keypoint localization result and the eye region localization result, respectively.
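A minimal sketch of this landmark step with the Dlib C++ API follows; the function name and the use of cv::Rect for the face box are illustrative choices, while the eye indices (36-41 and 42-47, 0-based) follow the standard 68-point markup this predictor uses.

```cpp
#include <dlib/image_processing.h>
#include <dlib/opencv.h>
#include <opencv2/opencv.hpp>
#include <utility>
#include <vector>

// Locate the 68 landmarks inside an already-detected face box and return the
// two eye bounding rectangles (left eye = points 36-41, right eye = 42-47).
std::pair<cv::Rect, cv::Rect> locateEyes(const cv::Mat& photoBgr,
                                         const cv::Rect& faceBox,
                                         dlib::shape_predictor& sp) {
    dlib::cv_image<dlib::bgr_pixel> img(photoBgr);
    dlib::rectangle face(faceBox.x, faceBox.y,
                         faceBox.x + faceBox.width, faceBox.y + faceBox.height);
    dlib::full_object_detection shape = sp(img, face);

    auto eyeRect = [&](int first, int last) {
        std::vector<cv::Point> pts;
        for (int i = first; i <= last; ++i)
            pts.emplace_back(shape.part(i).x(), shape.part(i).y());
        return cv::boundingRect(pts);  // tight box around the 6 eye points
    };
    return { eyeRect(36, 41), eyeRect(42, 47) };
}

// The predictor is loaded once from the model file shipped with Dlib:
//   dlib::shape_predictor sp;
//   dlib::deserialize("shape_predictor_68_face_landmarks.dat") >> sp;
```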
3. Recognizing whether human eyes are closed or open
For each photo that has undergone face keypoint localization, the histogram of oriented gradients (HOG) of the eye region is extracted as a feature vector and fed into a trained machine learning classifier to classify the eye state, judging whether the target person's eyes in each photo are closed or open.
Recognition of the closed or open eye state uses a statistical machine learning method, whose basic procedure is to extract effective features of the target and select a suitable classifier. In this embodiment, HOG is chosen as the feature vector and a support vector machine as the classifier. The rationale for HOG is that the eyelid is roughly horizontal when the eye is closed but arc-shaped when it is open, with the eyeball visible, so the gradient directions of the two states differ significantly. The support vector machine is chosen because it has very good learning ability and classification performance on small samples.
In this embodiment, the basic flow of training the support vector machine to classify the open-eye and closed-eye states is as follows:
First, open-eye and closed-eye samples were collected: 59 closed-eye images and 76 open-eye images in total, with samples covering left and right eyes as well as cases such as wearing glasses, as shown in fig. 8. Because eyes vary in size and the eye cropping is performed automatically by the detection program, the sample images differ in size; this embodiment scales all images to a common size to reduce the influence of size variation when training the classifier.
Then the HOG features of the eye region image are extracted. HOG reflects the appearance and shape of local objects, which are well described by the density distribution of gradients or edges; in essence it is a statistic of gradients, and gradients are mainly found at edges. The computation proceeds as follows: (1) compute the horizontal and vertical gradients of the image and, from them, the gradient orientation at each pixel position; (2) build an orientation histogram within each cell, with orientations ranging over 0-360°; (3) aggregate the histograms over a larger region; in this embodiment the statistics are taken over the whole eye region. In the implementation, the descriptor is constructed with the HOGDescriptor class of the OpenCV image library, with the following parameters:
HOGDescriptor* hog = new HOGDescriptor(cvSize(ImgWidht, ImgHeight), cvSize(32, 32), cvSize(16, 16), cvSize(16, 16), 9);
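For reference, a sketch of the corresponding feature extraction with the modern OpenCV C++ API is shown below; the 64x64 normalized crop size and the grayscale conversion are assumptions consistent with the scaling step described above, while the block/cell/bin parameters mirror the constructor just shown.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Compute a HOG feature vector for one eye crop (a sketch; the 64x64
// normalized size is an assumed value, not one given in the patent).
std::vector<float> eyeHogFeatures(const cv::Mat& eyeBgr) {
    cv::Mat gray, resized;
    cv::cvtColor(eyeBgr, gray, cv::COLOR_BGR2GRAY);
    cv::resize(gray, resized, cv::Size(64, 64));  // unify sample size first

    // Parameter order: winSize, blockSize, blockStride, cellSize, nbins
    // (same order as the HOGDescriptor constructor above).
    cv::HOGDescriptor hog(cv::Size(64, 64), cv::Size(32, 32),
                          cv::Size(16, 16), cv::Size(16, 16), 9);
    std::vector<float> descriptor;
    hog.compute(resized, descriptor);
    return descriptor;
}
```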
After the HOG features are extracted, they are fed, together with the positive and negative sample labels, into the support vector machine for learning, which yields the support vectors. This embodiment uses the CvSVM class of the OpenCV library. The training parameters and termination criteria are as follows:
criteria = cvTermCriteria(CV_TERMCRIT_EPS, 1000, FLT_EPSILON);
param = CvSVMParams(CvSVM::C_SVC, CvSVM::RBF, 10.0, 0.09, 1.0, 10.0, 0.5, 1.0, NULL, criteria);
After sample training finishes, the model is saved to a file in XML form.
For actual eye-state classification, the learned support vector machine model is loaded, the input eye image is scaled to the same size as the training samples, its HOG feature vector is extracted, and that vector is fed into the loaded model for classification.
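A sketch of the training and inference calls with the current cv::ml API (CvSVM is the deprecated predecessor used above) might look as follows; the C and gamma values echo the CvSVMParams line earlier, while the file name and the eyeHogFeatures helper from the previous sketch are assumptions.

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>
#include <cfloat>
#include <vector>

// Train a binary open/closed-eye SVM. hogRows holds one HOG vector per row
// (CV_32F); labels is a CV_32S column with +1 = open, -1 = closed.
void trainEyeSvm(const cv::Mat& hogRows, const cv::Mat& labels) {
    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);
    svm->setC(10.0);                  // echoes the CvSVMParams values above
    svm->setGamma(0.09);
    svm->setTermCriteria(cv::TermCriteria(cv::TermCriteria::EPS, 1000, FLT_EPSILON));
    svm->train(hogRows, cv::ml::ROW_SAMPLE, labels);
    svm->save("eye_state_svm.xml");   // persisted as XML, as in the text
}

// Classify one eye crop with the saved model.
bool isEyeOpen(const cv::Mat& eyeBgr) {
    static cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::load("eye_state_svm.xml");
    std::vector<float> desc = eyeHogFeatures(eyeBgr);  // sketch shown earlier
    cv::Mat sample(1, (int)desc.size(), CV_32F, desc.data());
    return svm->predict(sample) > 0;  // +1 => open eye
}
```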
It should be noted that, since some people's open-eye state is not obvious, this embodiment classifies the two eyes jointly: as long as either eye is recognized as open, both eyes are considered open. In one example of the invention, the detection results for the OPEN and CLOSED states are shown in fig. 9. Experiments show a recognition accuracy above 98%.
4. Human eye region replacement
For a photo to be processed in which the target person's eyes are closed, a replacement photo in which the eyes are open is selected from the other consecutively taken photos; the open-eye region of the replacement photo is registered onto the closed-eye region of the photo to be processed by affine transformation and substituted in, and finally the boundary of the replaced eye region is smoothed to complete the elimination of the closed-eye state in the photo.
In this embodiment, after the eye regions of the photo to be processed (the closed-eye photo, hereinafter Pic1) and the replacement photo (the open-eye photo, hereinafter Pic2) have been identified and their open/closed states determined, the eye region of Pic2 is affine-transformed so as to replace and cover the eye region of Pic1, and the result is made to look natural by Gaussian filtering, feathering, or similar methods. The specific procedure is as follows:
First, Pic1 and Pic2, already classified by eye state, are read in. A convex hull is drawn from the eye keypoints obtained in the earlier recognition step to frame a preliminary mask of the eye region, and this initial hull is scaled proportionally relative to the photo size so that the true eye region is not clipped by the hull boundary.
Then the eye region of Pic2 is affine-transformed to fit the eye region of Pic1. Gaussian filtering is applied to the edge point sets of the transformed image and of the image being replaced, and a convolution kernel is computed and convolved to obtain an image edge with a natural transition.
Finally, the processed image is overlaid at the corresponding position of Pic1 to complete the eye region replacement. Fig. 10 shows the result of replacing the closed-eye state in the left image of fig. 9 with the open-eye region of the right image of fig. 9.
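The replacement and blending step might be sketched as follows in C++ with OpenCV; function and variable names, the dilation amount, and the 21x21 blur kernel are illustrative assumptions, while estimateAffinePartial2D performs the least-squares keypoint registration described above.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Warp the open-eye region of Pic2 onto the closed-eye region of Pic1 and
// blend it in with a Gaussian-feathered convex-hull mask.
void replaceEyeRegion(cv::Mat& pic1, const cv::Mat& pic2,
                      const std::vector<cv::Point2f>& eyePtsPic1,
                      const std::vector<cv::Point2f>& eyePtsPic2) {
    // Least-squares rotation/scale/translation fit from matching keypoints.
    cv::Mat A = cv::estimateAffinePartial2D(eyePtsPic2, eyePtsPic1);
    cv::Mat warped;
    cv::warpAffine(pic2, warped, A, pic1.size());

    // Mask = convex hull of the target keypoints, dilated a little so the
    // hull edge cannot clip the true eye, then Gaussian-blurred to feather.
    std::vector<cv::Point> pts, hull;
    for (const auto& p : eyePtsPic1) pts.emplace_back(cvRound(p.x), cvRound(p.y));
    cv::convexHull(pts, hull);
    cv::Mat mask = cv::Mat::zeros(pic1.size(), CV_32F);
    cv::fillConvexPoly(mask, hull, cv::Scalar(1.0));
    cv::dilate(mask, mask, cv::Mat(), cv::Point(-1, -1), 3);
    cv::GaussianBlur(mask, mask, cv::Size(21, 21), 0);

    // Per-pixel alpha blend: pic1 = mask*warped + (1-mask)*pic1.
    cv::Mat m3, inv3, p1f, wf;
    cv::merge(std::vector<cv::Mat>{mask, mask, mask}, m3);
    cv::subtract(cv::Scalar::all(1.0), m3, inv3);
    pic1.convertTo(p1f, CV_32FC3);
    warped.convertTo(wf, CV_32FC3);
    cv::Mat blended = wf.mul(m3) + p1f.mul(inv3);
    blended.convertTo(pic1, CV_8UC3);
}
```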
The method of the invention thus assumes that multiple photos are taken in succession with essentially identical capture settings such as exposure, and that external shooting conditions such as illumination stay essentially unchanged; on this basis, closed-eye regions are fixed by fusing and replacing content across the photos, which guarantees that the open-eye region was actually photographed rather than generated by an artificial intelligence method. The process resembles manual retouching with image editing software, but every step is automated by image processing algorithms. The method copes well with the situation in group photography where not everyone keeps their eyes open at the same time, and it is faster and more effective than repairing the picture with image editing software.
The above embodiments are merely preferred embodiments of the invention and should not be construed as limiting it. Those of ordinary skill in the art may make various changes and modifications without departing from the spirit and scope of the invention; technical schemes obtained by equivalent replacement or equivalent transformation therefore fall within the protection scope of the invention.
Claims (8)
1. An automatic closed-eye processing method based on multiple consecutive photos, used for eliminating the closed-eye state in photos, characterized by comprising the following steps:
S1: acquiring multiple consecutive frontal photos of a target person, and locating the face region in each photo with a face localization model;
S2: for each photo processed in S1, locating the face keypoints within the face region with a face keypoint detection model to obtain the eye region in each photo;
S3: for each photo processed in S2, extracting the histogram of oriented gradients of the eye region as a feature vector, feeding it into a trained machine learning classifier to classify the eye state, and judging whether the target person's eyes in each photo are closed or open;
S4: for a photo to be processed in which the target person's eyes are closed, selecting from the other consecutively taken photos a replacement photo in which the eyes are open, registering the open-eye region of the replacement photo onto the closed-eye region of the photo to be processed by affine transformation, performing the replacement, and finally smoothing the boundary of the replaced eye region in the photo to be processed to complete the elimination of the closed-eye state.
2. The automatic closed-eye processing method based on multiple consecutive photos according to claim 1, wherein the face localization model adopts the deep learning model YOLO.
3. The automatic closed-eye processing method based on multiple consecutive photos according to claim 1, wherein the face keypoint detection model locates 68 face keypoints in the face region, of which 12 belong to the two eyes.
4. The automatic closed-eye processing method based on multiple consecutive photos according to claim 1, wherein the machine learning classifier is a support vector machine.
5. The automatic closed-eye processing method based on multiple consecutive photos according to claim 4, wherein the support vector machine is trained in advance with labeled positive and negative samples, so that its classification accuracy on the eye state meets a preset condition.
6. The automatic closed-eye processing method based on multiple consecutive photos according to claim 1, wherein, when the affine transformation between the photo to be processed and the replacement photo is performed, the affine transformation matrix required for registration is calculated from the eye keypoints of the eye regions in the two photos.
7. The automatic closed-eye processing method based on multiple consecutive photos according to claim 1, wherein the boundary of the replaced eye region in the photo to be processed is smoothed by Gaussian filtering.
8. The automatic closed-eye processing method based on multiple consecutive photos according to claim 1, wherein the multiple consecutively taken frontal photos are group photos or single-person photos.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111412708.4A CN114170069A (en) | 2021-11-25 | 2021-11-25 | Automatic eye closing processing method based on continuous multiple pictures |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111412708.4A CN114170069A (en) | 2021-11-25 | 2021-11-25 | Automatic eye closing processing method based on continuous multiple pictures |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114170069A true CN114170069A (en) | 2022-03-11 |
Family
ID=80480842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111412708.4A Pending CN114170069A (en) | 2021-11-25 | 2021-11-25 | Automatic eye closing processing method based on continuous multiple pictures |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114170069A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115393830A (en) * | 2022-08-26 | 2022-11-25 | 南通大学 | A fatigue driving detection method based on deep learning and facial features |
CN118799205A (en) * | 2024-09-12 | 2024-10-18 | 杭州倚澜科技有限公司 | A method and system for automatically processing closed eyes based on continuous photos |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105072327A (en) * | 2015-07-15 | 2015-11-18 | 广东欧珀移动通信有限公司 | A method and device for anti-closed-eye portrait shooting and processing |
CN106485191A (en) * | 2015-09-02 | 2017-03-08 | 腾讯科技(深圳)有限公司 | A kind of method for detecting fatigue state of driver and system |
CN107833197A (en) * | 2017-10-31 | 2018-03-23 | 广东欧珀移动通信有限公司 | Method, apparatus, computer-readable recording medium and the electronic equipment of image procossing |
CN108614999A (en) * | 2018-04-16 | 2018-10-02 | 贵州大学 | Eyes based on deep learning open closed state detection method |
CN109376624A (en) * | 2018-10-09 | 2019-02-22 | 三星电子(中国)研发中心 | A kind of modification method and device of eye closing photo |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |