Disclosure of Invention
In order to overcome the defects of the prior art, a first object of the present invention is to provide a human eye attention localization method based on a deep neural network, which addresses the low accuracy and limited applicability of conventional human eye attention localization methods.
A second object of the present invention is to provide a human eye attention localization system based on a deep neural network, which addresses the low accuracy and limited applicability of conventional human eye attention localization methods.
The first object of the present invention is achieved by the following technical solution:
A human eye attention localization method based on a deep neural network, applied to face images acquired by a camera, comprising the following steps:
key point localization: performing key point localization on the face image to be detected through a preset key point neural network to obtain 68 key points;
face pose angle detection: normalizing the key points to obtain corresponding key point coordinates, inputting the key point coordinates into a preset face pose angle detection neural network, and outputting a face pose angle value from the preset face pose angle detection neural network;
center point coordinate calculation: establishing a mapping relation between the human eye pupil distance and the face distance, calculating the face distance to be detected from the pupil distance to be detected and the mapping relation, and calculating the coordinate of the center point between the two pupils in the face image to be detected according to the face distance to be detected, the key point coordinates and the known image parameters of the face image to be detected;
spatial offset distance calculation: inputting a left eye region image, a right eye region image, a face region image, a preset face proportion image and the face pose angle value of the face image to be detected into a preset deep neural network, and outputting the spatial offset distance from the preset deep neural network;
attention vector calculation: calculating the attention vector from the spatial offset distance and the center point coordinate;
marking the attention plane and the attention valid region: marking the attention plane of the attention device relative to the axis of the camera, and marking the attention valid region on the attention plane according to the size of the attention device;
and attention point localization: calculating the intersection point of the attention vector and the attention plane, and judging whether the intersection point lies in the attention valid region; if yes, the human eye attention is on the attention device, and if not, the human eye attention is not on the attention device.
Further, the key point localization comprises:
image acquisition: acquiring an image to be detected containing the face to be detected;
face detection: detecting, in the image to be detected, a face image to be detected containing the face feature region;
and key point localization: performing key point localization on the face image to be detected through the preset key point neural network to obtain 68 key points.
Further, the center point coordinate calculation comprises:
mapping relation establishment: acquiring, through the camera, frontal face images of an original face placed at a preset first axial distance and at a preset second axial distance, obtaining a first average pixel value and a second average pixel value corresponding to the human eye pupil distance in the frontal face images, and calculating the original mapping relation between the human eye pupil distance and the face distance from the preset first axial distance, the preset second axial distance, the first average pixel value and the second average pixel value, wherein the face distance is the distance from the face to the camera;
generation of the pupil distance to be detected: performing image processing on the face image to be detected to obtain the human eye pupil distance to be detected;
face distance calculation: calculating the face distance to be detected from the original mapping relation and the pupil distance to be detected;
and coordinate calculation: calculating the coordinate of the center point between the two pupils in the face image to be detected according to the face distance to be detected, the key point coordinates and the known image parameters of the face image to be detected.
Further, in the frontal face image, the horizontal rotation angle and the pitch angle of the original face relative to the camera axis both lie within 0-5 degrees.
Further, the preset first axial distance is different from the preset second axial distance.
Further, the preset face pose angle detection neural network comprises an input layer, a first fully connected layer, a second fully connected layer and an output layer.
Further, the face pose angle detection specifically comprises: normalizing the key points to obtain corresponding key point coordinates; the key point coordinates enter through the input layer, are processed sequentially by the first fully connected layer and the second fully connected layer, and the output layer finally outputs the face pose angle value.
The second object of the present invention is achieved by the following technical solution:
A human eye attention localization system based on a deep neural network, comprising:
a key point localization module, used for performing key point localization on the face image to be detected through a preset key point neural network to obtain 68 key points;
a face pose angle detection module, used for normalizing the key points to obtain corresponding key point coordinates, inputting the key point coordinates into a preset face pose angle detection neural network, and outputting a face pose angle value from the preset face pose angle detection neural network;
a center point coordinate calculation module, used for establishing a mapping relation between the human eye pupil distance and the face distance, calculating the face distance to be detected from the pupil distance to be detected and the mapping relation, and calculating the coordinate of the center point between the two pupils in the face image to be detected according to the face distance to be detected, the key point coordinates and the known image parameters of the face image to be detected;
a spatial offset distance calculation module, used for inputting a left eye region image, a right eye region image, a face region image, a preset face proportion image and the face pose angle value of the face image to be detected into a preset deep neural network, which outputs the spatial offset distance;
an attention vector calculation module, used for calculating the attention vector from the spatial offset distance and the center point coordinate;
a marking module, used for marking the attention plane of the attention device relative to the axis of the camera and marking the attention valid region on the attention plane according to the size of the attention device;
and an attention point localization module, used for calculating the intersection point of the attention vector and the attention plane and judging whether the intersection point lies in the attention valid region.
Furthermore, the key point localization module comprises a camera, a face detection unit and a key point localization unit. The camera is used for acquiring an image to be detected containing the face to be detected; the face detection unit is used for detecting, in the image to be detected, a face image to be detected containing the face feature region; and the key point localization unit is used for performing key point localization on the face image to be detected through the preset key point neural network to obtain 68 key points.
Furthermore, the center point coordinate calculation module comprises a mapping relation establishment unit, a pupil distance generation unit, a face distance calculation unit and a coordinate calculation unit;
the mapping relation establishment unit is used for acquiring, through the camera, frontal face images of an original face placed at a preset first axial distance and at a preset second axial distance, obtaining a first average pixel value and a second average pixel value corresponding to the human eye pupil distance in the frontal face images, and calculating the original mapping relation between the human eye pupil distance and the face distance from the preset first axial distance, the preset second axial distance, the first average pixel value and the second average pixel value, wherein the face distance is the distance from the face to the camera;
the pupil distance generation unit is used for performing image processing on the face image to be detected to obtain the human eye pupil distance to be detected;
the face distance calculation unit is used for calculating the face distance to be detected from the original mapping relation and the pupil distance to be detected;
and the coordinate calculation unit is used for calculating the coordinate of the center point between the two pupils in the face image to be detected according to the face distance to be detected, the key point coordinates and the known image parameters of the face image to be detected.
Compared with the prior art, the invention has the following beneficial effects. In the human eye attention localization method based on a deep neural network, key points are obtained by key point localization and normalized to obtain corresponding key point coordinates; a face pose angle value is obtained from the key point coordinates; the coordinate of the center point between the two pupils of the face to be detected is obtained by calculation; a spatial offset distance is obtained through a preset deep neural network; an attention vector is obtained from the center point coordinate and the spatial offset distance; and finally it is judged whether the intersection point of the attention vector and the attention plane lies in the attention valid region. If yes, the human eye attention is on the attention device; if not, it is not. The localization result of the whole process is highly accurate, and the method is applicable to different devices and can be used universally in different scenes.
The foregoing is only an overview of the technical solution of the present invention. In order to make the technical solution more clearly understood and implemented in accordance with the contents of the specification, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and the detailed description, and it should be noted that any combination of the embodiments or technical features described below can be used to form a new embodiment without conflict.
As shown in FIG. 1, the human eye attention localization method based on the deep neural network of the present invention comprises the following steps:
Key point localization: key point localization is performed on the face image to be detected through a preset key point neural network to obtain 68 key points. This specifically comprises the following steps. Image acquisition: a camera acquires an image to be detected containing the face to be detected; besides the face to be detected, the image to be detected also contains background content.
Face detection: the face feature region in the image to be detected is detected, and a face image to be detected containing only the face to be detected is obtained.
Key point localization: key point localization is performed on the face image to be detected through the preset key point neural network to obtain 68 key points. The key point neural network is trained in advance on a training set to obtain a usable preset key point neural network, and the preset key point neural network processes the face image to be detected to obtain 68 key points in total.
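The patent does not name a particular key point model. As a non-authoritative illustration, a minimal sketch of the same 68-point layout using the open-source dlib toolkit (an assumed stand-in for the preset key point neural network; the model file name is dlib's published 68-point predictor) might look like this:

```python
# Sketch only: dlib's 68-point predictor as a stand-in for the preset
# key point neural network described in this embodiment.
import dlib
import cv2

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = cv2.imread("face_to_detect.jpg")          # image to be detected
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

faces = detector(gray)                            # face detection step
for face in faces:
    shape = predictor(gray, face)                 # key point localization step
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```

Any detector producing the same 68-point layout could be substituted for this sketch.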
Face pose angle detection: the key points are normalized to obtain corresponding key point coordinates, the key point coordinates are input into a preset face pose angle detection neural network, and the preset face pose angle detection neural network outputs a face pose angle value. In this embodiment, the face pose angle detection neural network is trained on an original training set, and a usable preset face pose angle detection neural network is obtained through repeated training. Specifically: the key points are normalized to obtain corresponding key point coordinates, i.e. the key points in the face image to be detected are transformed to finally obtain uniform two-dimensional coordinates. The key point coordinates enter through the input layer, are processed sequentially by the first fully connected layer and the second fully connected layer, and the output layer finally outputs the face pose angle value. The face pose angle value comprises a horizontal rotation angle value, a tilt angle value and a pitch angle value: the horizontal rotation angle value is the degree by which the face turns left or right, the tilt angle value is the degree by which the face tilts, and the pitch angle value is the degree by which the face looks up or down. In this embodiment, the dimension of the input layer is 1 x 136, the dimension of the first fully connected layer is 136 x 68, the dimension of the second fully connected layer is 68 x 3, and the dimension of the output layer is 1 x 3. The size of the preset face pose angle detection neural network is only about 38 KB, and in this embodiment detecting a face pose angle takes only about 1 ms.
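Under the stated dimensions, a minimal PyTorch sketch of this pose-angle network might be as follows; the ReLU activation is an assumption, since the embodiment specifies only the layer sizes:

```python
# Sketch of the pose-angle network: 136 inputs (68 normalized keypoints
# x 2 coordinates) -> 68 -> 3 outputs (horizontal rotation, tilt, pitch).
import torch
import torch.nn as nn

class PoseAngleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(136, 68)   # first fully connected layer (136 x 68)
        self.fc2 = nn.Linear(68, 3)     # second fully connected layer (68 x 3)

    def forward(self, keypoints):       # keypoints: (batch, 136), normalized
        x = torch.relu(self.fc1(keypoints))   # activation is an assumption
        return self.fc2(x)              # (batch, 3): yaw, tilt, pitch values

net = PoseAngleNet()
angles = net(torch.rand(1, 136))        # e.g. tensor of three angle values
```

At 32-bit precision the roughly 9.5k parameters of this sketch occupy about 38 KB, which is consistent with the stated model size.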
Center point coordinate calculation: a mapping relation between the human eye pupil distance and the face distance is established, the face distance to be detected is calculated from the pupil distance to be detected and the mapping relation, and the coordinate of the center point between the two pupils in the face image to be detected is calculated from the face distance to be detected, the key point coordinates and the known image parameters of the face image to be detected. This specifically comprises the following steps.
Mapping relation establishment: frontal face images of an original face placed at a preset first axial distance and at a preset second axial distance are acquired through the camera, a first average pixel value and a second average pixel value corresponding to the human eye pupil distance in the frontal face images are obtained, and the original mapping relation between the human eye pupil distance and the face distance is calculated from the preset first axial distance, the preset second axial distance, the first average pixel value and the second average pixel value, wherein the face distance is the distance from the face to the camera. In this embodiment, the details are as follows:
A spatial coordinate system comprising an X axis, a Y axis and a Z axis is established with the axis of the camera as the origin. The preset first axial distance is a distance (in the Z-axis direction) from the original face to the axis of the camera and is denoted d1; the preset second axial distance is likewise a distance (in the Z-axis direction) from the original face to the axis of the camera and is denoted d2, with d1 not equal to d2. Frontal face images of the original face are then collected at the preset first axial distance and at the preset second axial distance, the width and height of the frontal face image are obtained, and the first average pixel value and the second average pixel value corresponding to the human eye pupil distance in the frontal face images are obtained. The first average pixel value corresponds to the preset first axial distance and is denoted L1; the second average pixel value corresponds to the preset second axial distance and is denoted L2. The original mapping relation between the human eye pupil distance and the face distance is calculated from the preset first axial distance, the preset second axial distance, the first average pixel value and the second average pixel value; the specific mapping relation is given by formula (1),
d = k*(L - L1) + d1 (1)
wherein d is the face distance, k = (d2 - d1)/(L2 - L1), d1 is the preset first axial distance, d2 is the preset second axial distance, and L is the human eye pupil distance with 0 < L <= min(W, H), where W is the width and H is the height of the frontal face image of the original face acquired by the camera, L1 is the first average pixel value and L2 is the second average pixel value. In formula (1) only d and L are variables, so the relation between d and L can be obtained; this relation is the original mapping between the human eye pupil distance and the face distance. The frontal face image of this embodiment requires the horizontal rotation angle and the pitch angle of the original face relative to the camera axis to be within 0-5 degrees; because of practical measurement error, a certain tolerance is accepted during actual detection.
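As a minimal sketch of this two-point calibration, the slope k follows from fitting formula (1) through both calibration points; the distances and pixel readings below are illustrative values, not figures from the patent:

```python
def build_mapping(d1, L1, d2, L2):
    """Original mapping of formula (1): pupil distance in pixels -> face distance."""
    k = (d2 - d1) / (L2 - L1)                # slope through both calibration points
    return lambda L: k * (L - L1) + d1       # d = k*(L - L1) + d1

# Illustrative calibration: pupils 120 px apart at 50 cm, 60 px apart at 100 cm.
face_distance = build_mapping(d1=50.0, L1=120.0, d2=100.0, L2=60.0)
print(face_distance(80.0))  # estimated face distance for an 80 px pupil distance
```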
Generation of the pupil distance to be detected: the face image to be detected is collected through the camera and image-processed to obtain the human eye pupil distance to be detected. Specifically, the face image to be detected containing the face to be detected is acquired through the camera and subjected to face detection, key point localization and face pose angle calculation, yielding an unprocessed human eye pupil distance and a horizontal rotation angle to be detected, where the horizontal rotation angle to be detected is the horizontal rotation angle between the face to be detected and the axis of the camera; the pupil distance to be detected is then calculated from the unprocessed pupil distance and the horizontal rotation angle. For example:
the face image to be detected is acquired through the camera and subjected to face detection, key point localization and face pose angle calculation to obtain the unprocessed human eye pupil distance, denoted L_temp, and the horizontal rotation angle to be detected, denoted Y. The face image to be detected obtained at this moment is rotated by the angle Y in the horizontal direction relative to the axis of the camera, so the unprocessed pupil distance must be converted into the pupil distance in the frontal state: substituting the unprocessed pupil distance and the horizontal rotation angle into formula (2) gives the pupil distance to be detected, where formula (2) is as follows:
L1 = L_temp / cos(Y) (2)

wherein L1 here denotes the human eye pupil distance to be detected (a distinct quantity from the first average pixel value L1 of formula (1)), L_temp is the unprocessed human eye pupil distance, and Y is the horizontal rotation angle to be detected; in formula (2), Y must be greater than -90 degrees and less than 90 degrees.
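A one-line sketch of this correction; note that the cosine form of formula (2) is reconstructed here from the projection geometry the embodiment describes, so treat it as an assumption:

```python
import math

def pupil_distance_frontal(L_temp, Y):
    """Formula (2): convert the pupil distance measured under a horizontal
    rotation Y (degrees, -90 < Y < 90) back to its frontal-view value."""
    return L_temp / math.cos(math.radians(Y))

print(pupil_distance_frontal(70.0, 30.0))  # ~80.8 px for a 30-degree turn
```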
Face distance calculation: the face distance to be detected is calculated from the original mapping relation and the pupil distance to be detected, i.e. the distance from the face to be detected to the camera along the Z axis is obtained by substituting the pupil distance given by formula (2) into the mapping relation of formula (1).
Coordinate calculation: the coordinate of the center point between the two pupils in the face image to be detected is calculated from the face distance to be detected, the key point coordinates and the known image parameters of the face image to be detected. Specifically, in this embodiment: let the center point between the left and right pupils in the face image to be detected be P1, and let the coordinate of P1 relative to the axis of the camera be (x1, y1, z1); the face distance d to be detected obtained by the above calculation is the distance of P1 along the Z axis, that is, d = z1. The coordinate (w1, h1) of P1 on the face image to be detected is calculated from the obtained key point coordinates. Let the intersection of the Z axis with the plane, spanned by the X and Y directions, in which the face image to be detected lies be P0; then the coordinate of P0 is (0, 0, z1), and the coordinate of P0 on the face image to be detected is (W1/2, H1/2), where W1 and H1 are the width and height of the face image to be detected. x1 and y1 are calculated according to the following formulas (3) and (4):
x1 = k*(w1 - W1/2 - L1) + d1 (3)
y1 = k*(h1 - H1/2 - L1) + d1 (4)

wherein x1 is the X-axis coordinate of the center point in the spatial coordinate system with the axis of the camera as origin, y1 is the Y-axis coordinate of the center point in that coordinate system, k = (d2 - d1)/(L2 - L1), d1 is the preset first axial distance, d2 is the preset second axial distance, L is the human eye pupil distance with 0 < L <= min(W1, H1), W1 is the width of the face image to be detected, H1 is the height of the face image to be detected, L1 is the first average pixel value and L2 is the second average pixel value. From the x1 and y1 given by the above formulas, the specific value of the center point coordinate, i.e. (x1, y1, z1), is obtained.
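A direct transcription of formulas (3) and (4) as printed; the calibration constants k, L1 and d1 are reused here exactly as the embodiment states them:

```python
def center_point(w1, h1, W1, H1, z1, k, L1, d1):
    """Center point P1 between the pupils in camera-axis coordinates.

    (w1, h1): pixel coordinates of the pupil midpoint in the face image;
    (W1, H1): width and height of the face image; z1: face distance d.
    """
    x1 = k * (w1 - W1 / 2 - L1) + d1   # formula (3)
    y1 = k * (h1 - H1 / 2 - L1) + d1   # formula (4)
    return (x1, y1, z1)
```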
Spatial offset distance calculation: the left eye region image, the right eye region image, the face region image, a preset face proportion image and the face pose angle value of the face image to be detected are input into a preset deep neural network, and the preset deep neural network outputs the spatial offset distance. The spatial offset distance is expressed as a vector (Δx, Δy, Δz).
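The patent names the five inputs of this network but not its internal architecture. The PyTorch sketch below is purely an assumed illustration: the convolutional branch design, feature widths and input resolution are all assumptions, not the patent's network:

```python
# Assumed multi-input sketch: four image branches plus the pose angles,
# fused and regressed to the spatial offset vector (dx, dy, dz).
import torch
import torch.nn as nn

def branch():
    # Small convolutional feature extractor, one per image input (assumed).
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())           # -> 32 features

class OffsetNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.left, self.right = branch(), branch()       # eye region images
        self.face, self.grid = branch(), branch()        # face and proportion images
        self.head = nn.Sequential(
            nn.Linear(32 * 4 + 3, 64), nn.ReLU(),        # + 3 pose angle values
            nn.Linear(64, 3))                            # -> (dx, dy, dz)

    def forward(self, left, right, face, grid, pose):
        feats = torch.cat([self.left(left), self.right(right),
                           self.face(face), self.grid(grid), pose], dim=1)
        return self.head(feats)                          # spatial offset vector

net = OffsetNet()
x = lambda: torch.rand(1, 3, 64, 64)
offset = net(x(), x(), x(), x(), torch.rand(1, 3))       # shape (1, 3)
```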
Attention vector calculation: the attention vector is calculated from the spatial offset distance and the center point coordinate. Let the attention vector be V1; from the center point coordinate (x1, y1, z1) obtained above, V1 = (Δx - x1, Δy - y1, Δz - z1), where Δx - x1 is the component of the attention vector on the X axis, Δy - y1 the component on the Y axis and Δz - z1 the component on the Z axis.
Marking the attention plane and the attention valid region: the attention plane of the attention device relative to the axis of the camera is marked, and the attention valid region is marked on the attention plane according to the size of the attention device. Specifically: the attention device in this embodiment is a screen, or a device or object in the scene (such as a painting or an exhibit). The spatial plane of the attention device relative to the axis of the camera (the origin of the spatial three-dimensional coordinate system), i.e. the attention plane, is marked as follows. If the attention device is a regular plane, such as a screen or a planar device, three non-collinear points p1, p2 and p3 are taken on the plane and the spatial coordinate of each point relative to the axis of the camera is calculated. If the attention device is an irregular plane, three approximately non-collinear points p1, p2 and p3 are taken on the plane and the spatial coordinate of each point relative to the axis of the camera is calculated. The plane formed by the three points is the attention plane, and the attention valid region is marked in the attention plane according to the length and width (size) of the attention device.
Attention point localization: the intersection point of the attention vector and the attention plane is calculated, and it is judged whether the intersection point lies in the attention valid region; if yes, the human eye attention is on the attention device, and if not, it is not. Concretely, the intersection point of the attention vector V1 with the attention plane is calculated; if the intersection point exists and lies in the attention valid region, the human eye attention point is on the attention device, and if no intersection point exists or the intersection point lies outside the attention valid region, the human eye attention is not on the attention device.
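A NumPy sketch of this final step, building the plane from the three marked points and intersecting the attention ray with it. The rectangular in-plane test assumes a screen-like device with p1 at one corner and p2 along its width; that layout is an assumption, since the embodiment only requires a valid region of the device's size:

```python
import numpy as np

def locate_attention(P1, V1, p1, p2, p3, width, height):
    """Intersect the attention ray P1 + t*V1 with the plane through p1, p2, p3
    and test whether the hit lies inside an assumed rectangular valid region."""
    n = np.cross(p2 - p1, p3 - p1)             # plane normal from three points
    denom = np.dot(n, V1)
    if abs(denom) < 1e-9:
        return None                            # ray parallel to the plane
    t = np.dot(n, p1 - P1) / denom
    if t < 0:
        return None                            # plane lies behind the viewer
    hit = P1 + t * V1                          # intersection point
    u = (p2 - p1) / np.linalg.norm(p2 - p1)    # in-plane axes of the assumed
    v = np.cross(n / np.linalg.norm(n), u)     # rectangular valid region
    du, dv = np.dot(hit - p1, u), np.dot(hit - p1, v)
    return hit if 0 <= du <= width and 0 <= dv <= height else None

# Illustrative values (metres, assumed): pupil midpoint, gaze toward the device.
P1 = np.array([0.1, 0.0, 0.6])
V1 = np.array([-0.1, 0.0, -0.6])
p1, p2, p3 = (np.array([-0.3, -0.2, 0.0]), np.array([0.3, -0.2, 0.0]),
              np.array([-0.3, 0.2, 0.0]))
hit = locate_attention(P1, V1, p1, p2, p3, width=0.6, height=0.4)
print("attention on device" if hit is not None else "attention not on device")
```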
As shown in FIG. 2, the present invention provides a human eye attention localization system based on a deep neural network, comprising: a key point localization module, used for performing key point localization on the face image to be detected through a preset key point neural network to obtain 68 key points;
a face pose angle detection module, used for normalizing the key points to obtain corresponding key point coordinates, inputting the key point coordinates into a preset face pose angle detection neural network, and outputting a face pose angle value from the preset face pose angle detection neural network;
a center point coordinate calculation module, used for establishing a mapping relation between the human eye pupil distance and the face distance, calculating the face distance to be detected from the pupil distance to be detected and the mapping relation, and calculating the coordinate of the center point between the two pupils in the face image to be detected according to the face distance to be detected, the key point coordinates and the known image parameters of the face image to be detected;
a spatial offset distance calculation module, used for inputting a left eye region image, a right eye region image, a face region image, a preset face proportion image and the face pose angle value of the face image to be detected into a preset deep neural network, which outputs the spatial offset distance;
an attention vector calculation module, used for calculating the attention vector from the spatial offset distance and the center point coordinate;
a marking module, used for marking the attention plane of the attention device relative to the axis of the camera and marking the attention valid region on the attention plane according to the size of the attention device;
and an attention point localization module, used for calculating the intersection point of the attention vector and the attention plane and judging whether the intersection point lies in the attention valid region.
In this embodiment, the key point localization module comprises a camera, a face detection unit and a key point localization unit. The camera is used for acquiring an image to be detected containing the face to be detected; the face detection unit is used for detecting, in the image to be detected, a face image to be detected containing the face feature region; and the key point localization unit is used for performing key point localization on the face image to be detected through the preset key point neural network to obtain 68 key points. The center point coordinate calculation module comprises a mapping relation establishment unit, a pupil distance generation unit, a face distance calculation unit and a coordinate calculation unit. The mapping relation establishment unit is used for acquiring, through the camera, frontal face images of an original face placed at a preset first axial distance and at a preset second axial distance, obtaining a first average pixel value and a second average pixel value corresponding to the human eye pupil distance in the frontal face images, and calculating the original mapping relation between the human eye pupil distance and the face distance from the preset first axial distance, the preset second axial distance, the first average pixel value and the second average pixel value, wherein the face distance is the distance from the face to the camera. The pupil distance generation unit is used for performing image processing on the face image to be detected to obtain the human eye pupil distance to be detected. The face distance calculation unit is used for calculating the face distance to be detected from the original mapping relation and the pupil distance to be detected. The coordinate calculation unit is used for calculating the coordinate of the center point between the two pupils in the face image to be detected according to the face distance to be detected, the key point coordinates and the known image parameters of the face image to be detected.
In the human eye attention localization method based on a deep neural network of the present invention, key points are obtained by key point localization and normalized to obtain corresponding key point coordinates; a face pose angle value is obtained from the key point coordinates; the coordinate of the center point between the two pupils of the face to be detected is obtained by calculation; a spatial offset distance is obtained through a preset deep neural network; an attention vector is obtained from the center point coordinate and the spatial offset distance; and finally it is judged whether the intersection point of the attention vector and the attention plane lies in the attention valid region. If yes, the human eye attention is on the attention device; if not, it is not. The localization result of the whole process is highly accurate, and the method is applicable to different devices and can be used universally in different scenes.
The foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention in any manner. Although the invention has been shown and described with reference to the drawings and the detailed description herein, those skilled in the art may make various modifications and design other structures for the same purposes without departing from the scope of the invention as defined by the appended claims; any changes, modifications and equivalent evolutions of the above embodiments made according to the essential technique of the present invention still fall within the protection scope of the technical solution of the present invention.