CN107545302A

CN107545302A - A kind of united direction of visual lines computational methods of human eye right and left eyes image

Info

Publication number: CN107545302A
Application number: CN201710650058.4A
Authority: CN
Inventors: 陆峰; 陈小武; 赵沁平
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2017-08-02
Filing date: 2017-08-02
Publication date: 2018-01-05
Anticipated expiration: 2037-08-02
Also published as: CN107545302B

Abstract

The invention provides a line-of-sight calculation method for the combination of left and right eye images of human eyes, including: an extraction model of binocular information, inputting human eye images, and automatically extracting the left eye and right eye contained in the image respectively through the dual-channel model The information features of the human eyes; the extraction model of the joint information features of the human eye, input the user's binocular image, and the model combines the information of the eyes to extract the joint information features of the human eyes; a joint algorithm is invented, and the three-dimensional line of sight direction is calculated by inputting the feature information. One of the applications of the present invention is virtual reality and human-computer interaction. Its principle is to calculate the direction of the user's line of sight by taking pictures of the user's eyes, so as to interact with the intelligent system interface or virtual reality objects. The invention can also be widely used in the fields of training, game entertainment, video monitoring, medical monitoring and the like.

Description

A gaze direction calculation method based on the combination of human left and right eye images

技术领域technical field

本发明涉及计算机视觉和图像处理领域，具体地说是一种人眼左右眼图像联合的视线方向计算方法。The invention relates to the fields of computer vision and image processing, in particular to a method for calculating the line of sight direction of human left and right eye images.

背景技术Background technique

视线追踪/眼动追踪对于用户行为理解和高效人机交互具有重要意义。人类可感知信息中超过80％的部分由人眼接收，而其中超过90％的部分由视觉系统处理。因此，视线是反映人与外界交互过程的重要线索。近年来，由于虚拟现实技术和人机交互技术的迅速发展，视线追踪技术的应用价值逐渐凸显；另一方面，视线方向计算在计算机视觉领域仍然是一个极富挑战性的问题。Gaze tracking/eye tracking is of great significance for user behavior understanding and efficient human-computer interaction. More than 80% of human perceptible information is received by the human eye, and more than 90% of it is processed by the visual system. Therefore, line of sight is an important clue to reflect the interaction process between people and the outside world. In recent years, due to the rapid development of virtual reality technology and human-computer interaction technology, the application value of gaze tracking technology has gradually become prominent; on the other hand, gaze direction calculation is still a very challenging problem in the field of computer vision.

目前的视线追踪技术从根本上分为了基于外观的视线追踪技术和基于模型的视线追踪技术两种。在当前的环境中，由于基于模型的视线追踪技术往往有着很高的正确率，从而使得大多数的人们都专注于基于模型的视线追踪方法的研究。基于模型的视线追踪技术需要实验者提供很多的几何特征，比如说瞳孔的方向，并以此来建立一个眼睛模型，通过模型来预测人们视线的方向。正因如此，基于模型的视线追踪技术有着以下几点缺陷:1)需要昂贵的设备仪器。基于模型的视线追踪技术是通过建立一个眼睛模型或者其他的几何模型来预测参与者的视线方向，所以它需要通过使用一些独特的设备来提取参与者的一些关于眼睛的几何特征，并以此建立模型。2)基于模型的视线追踪技术需要在严格的室内环境中进行。由于基于模型的视线追踪技术中需要实验者提供的几何特征一般是由红外线所测量得到的，而一些其他的干扰源，比如太阳光中蕴含了太多的红外光，这会对仪器的测量结果造成非常严重的干扰，所以测量的设备也需要放在严格的室内环境中，以此来避免太阳光等其他红外光的干扰。3)基于模型的视线追踪技术需要高分辨率的图像来进行训练，所以有着被限制的工作距离，一般来说不超过60cm。因此，基于模型的实验方法不能普遍的使用于大部分的普通环境中。The current eye-tracking technology is fundamentally divided into two types: appearance-based eye-tracking technology and model-based eye-tracking technology. In the current environment, since model-based eye-tracking technology often has a high accuracy rate, most people focus on the research of model-based eye-tracking methods. The model-based gaze tracking technology requires the experimenter to provide many geometric features, such as the direction of the pupil, and use this to build an eye model, and use the model to predict the direction of people's gaze. Because of this, model-based eye-tracking technology has the following disadvantages: 1) expensive equipment and instruments are needed. Model-based gaze tracking technology predicts the participant's gaze direction by building an eye model or other geometric models, so it needs to use some unique equipment to extract some geometric features of the participant's eyes, and establish Model. 2) Model-based eye-tracking techniques need to be performed in strict indoor environments. Because the geometric features that need to be provided by the experimenter in the model-based eye-tracking technology are generally measured by infrared rays, and some other sources of interference, such as too much infrared light contained in sunlight, will affect the measurement results of the instrument. It causes very serious interference, so the measurement equipment also needs to be placed in a strict indoor environment, so as to avoid the interference of other infrared light such as sunlight. 3) Model-based gaze tracking technology requires high-resolution images for training, so it has a limited working distance, generally no more than 60cm. Therefore, model-based experimental methods cannot be generally used in most common environments.

与此相反的，基于外观的视线追踪技术直接通过从人眼图片中学习到各种信息，建立图片和人眼视线之间的映射关系，以此来得到视线的方向，它并没有上述基于模型的视线追踪技术所拥有的种种限制，它只需要通过一个普通的相机去拍摄到人眼的外观图片。这一条件使得基于外观的视线追踪技术有着普遍的适用性，也使得基于外观的视线追踪技术有着无可媲美的应用前景。但是由于基于外观的视线追踪技术对于采样工具、采样环境的要求并不严格，从而使得模型的输入数据，往往有着多种多样的环境因素影响，比如说：光照、参与者、头部位置等。光照的强度会使得图片变得明亮或暗，在极暗环境中，人们更是难以从图片上分辨出一个人的眼睛图像；同样的，头部位置的不同更是对于人眼的采样有着很大的影响，对于同一个人来说，拍摄它的正面照和侧向拍照所得到的人眼图像就有所不同。以上的种种因素，使得基于外观的视线追踪技术的输入数据中就有着很大的噪声信息，这也正是基于外观的视线追踪技术所面临的一个挑战，正因为这个缺陷，基于外观的视线追踪技术，在精确度上，远远不如基于模型的视线追踪方法。同时，目前的基于外观的视线追踪方法，往往是以用户的单眼图像作为输入信息的，然而在实际的应用中，每一次对于用户信息的采集得到的往往都是用户某一时刻的双眼图像，将同一时刻的双眼图像分开输入，也正忽略了双眼图像中关于某一时刻的相关性。On the contrary, the appearance-based gaze tracking technology directly learns various information from human eye pictures, and establishes the mapping relationship between the picture and human eye sight, so as to obtain the direction of sight. It does not have the above-mentioned model-based Due to the various limitations of the eye-tracking technology, it only needs to take a picture of the appearance of the human eye through an ordinary camera. This condition makes the appearance-based eye-tracking technology universally applicable, and also makes the appearance-based eye-tracking technology have an unparalleled application prospect. However, since the appearance-based gaze tracking technology does not have strict requirements on sampling tools and sampling environments, the input data of the model is often affected by various environmental factors, such as lighting, participants, and head positions. The intensity of light will make the picture bright or dark. In an extremely dark environment, it is even more difficult for people to distinguish a person's eye image from the picture; similarly, the difference in head position has a great impact on the sampling of human eyes. For the same person, the human eye images obtained by taking its frontal photo and sideways photo are different. The above factors make the input data of the appearance-based eye-tracking technology have a lot of noise information, which is exactly a challenge faced by the appearance-based eye-tracking technology. Because of this defect, the appearance-based eye-tracking The technique, in terms of accuracy, is far inferior to model-based eye-tracking methods. At the same time, the current appearance-based gaze tracking methods often use the user's monocular image as input information. However, in practical applications, each time the user information is collected, it is often the user's binocular image at a certain moment. Inputting binocular images at the same moment separately is also ignoring the correlation of binocular images at a certain moment.

在近些年来，各种各样的新式模型也是应运而生，在这各种各样的模型中，神经网络的表现尤为突出。深度学习中的卷积神经网络作为神经网络的一个类别，更是十分火热。由于深度学习中的卷积神经网络具有了局部感知的特性，使得它能够很好的提取出图片的局部特征，保留图片的局部相关信息，同时由于权值共享的原因，使得深度学习中的卷积神经网络也不会需要消耗大量的时间去进行训练，也正因此，深度学习中的卷积神经网络在各种图像处理任务中表现尤为突出，例如：图像分类，目标检测,以及语义分隔等。同时，近些年来，硬件的飞速发展，更是让深度学习中的卷积神经网络在图像处理方法表现的更加出色。但用于人眼视线方向确定方法还尚未有相似文献报导。In recent years, a variety of new models have also emerged as the times require. Among these various models, the performance of neural networks is particularly prominent. As a category of neural networks, convolutional neural networks in deep learning are very popular. Since the convolutional neural network in deep learning has the characteristics of local perception, it can extract the local features of the picture well and retain the local relevant information of the picture. At the same time, due to the weight sharing, the convolutional neural network in deep learning The convolutional neural network does not need to consume a lot of time for training. Therefore, the convolutional neural network in deep learning is particularly outstanding in various image processing tasks, such as: image classification, target detection, and semantic segmentation. . At the same time, in recent years, the rapid development of hardware has made the convolutional neural network in deep learning perform better in image processing methods. However, there is no similar literature report on the method for determining the direction of sight of the human eye.

发明内容Contents of the invention

本发明技术解决问题：克服现有技术的不足，提供一种人眼左右眼图像联合的视线方向计算方法，通过使用神经网络去提取图像中蕴含着的信息因素，并通过自适应的方法调整神经网络模型，最终预测双眼的视线方向，主要通过结合双眼的图像信息，从而用于解决基于外观的视线追踪方法中输入的单眼图像噪声较大的问题，从而实现了高精度的三维视线方向预测。The technology of the present invention solves the problem: overcomes the deficiencies of the prior art, and provides a method for calculating the line of sight direction of the left and right eye images of the human eye, by using the neural network to extract the information factors contained in the image, and adjusting the neural The network model finally predicts the direction of sight of both eyes, mainly by combining the image information of both eyes, so as to solve the problem of large noise in the monocular image input in the appearance-based gaze tracking method, thus realizing high-precision 3D gaze direction prediction.

本发明技术解决方案：一种人眼左右眼图像联合的视线方向计算方法,包含以下步骤：The technical solution of the present invention: a method for calculating the line of sight direction of the combination of left and right eye images of human eyes, comprising the following steps:

(1)拍摄用户面部图像，定位左眼或右眼区域，预处理人眼图像，实现对头部位置的修正，并得到固定像素大小的人眼图像；(1) Capture the user's face image, locate the left eye or right eye area, preprocess the human eye image, realize the correction of the head position, and obtain the human eye image with a fixed pixel size;

(2)建立双通道模型，分别输入人眼图像中左眼和右眼的图像信息，使用深度神经网络模型分别提取并输出左眼和右眼的信息特征；(2) Establish a dual-channel model, input the image information of the left eye and the right eye in the human eye image respectively, use the deep neural network model to extract and output the information features of the left eye and the right eye respectively;

(3)建立单通道模型，输入左眼和右眼的图像信息，使用深度神经网络模型提取并输出左右眼图像联合信息特征；(3) Establish a single-channel model, input the image information of the left eye and the right eye, use the deep neural network model to extract and output the joint information features of the left and right eye images;

(4)使用回归分析的方法，结合左眼和右眼的信息特征及左右眼图像联合信息特征，并经过联合优化，预测双眼分别对应的三维视线方向；或者单独使用左眼和右眼的信息特征或左右眼图像联合信息特征，使用回归分析的方法，经过优化后，预测双眼分别对应的三维视线方向(4) Using the regression analysis method, combining the information features of the left and right eyes and the joint information features of the left and right eye images, and through joint optimization, predict the three-dimensional line of sight directions corresponding to the two eyes respectively; or use the information of the left and right eyes alone Feature or left and right eye image joint information features, using the method of regression analysis, after optimization, predict the three-dimensional line of sight direction corresponding to both eyes

所述步骤(2)建立双通道模型，分别输入人眼图像中左眼和右眼的图像信息，经过双通道模型分别提取并输出左眼和右眼的信息特征的具体过程如下：Described step (2) establishes two-channel model, input the image information of left eye and right eye in the human eye image respectively, extract respectively through two-channel model and output the concrete process of the information characteristic of left eye and right eye as follows:

(21)将修正后的固定大小的左眼和右眼图像I_l和I_r输入双通道模型中,I_l和I_r分别经过一个通道处理；(21) Input the fixed-sized left-eye and right-eye images _Il and _Ir after correction into the dual-channel model, and _Il and _Ir are processed by one channel respectively;

(22)每个通道均为一个深度神经网络模型，所述模型对输入的人眼图像进行卷积、池化、全连接操作，输出固定长度的特征向量；(22) Each channel is a deep neural network model, and the model performs convolution, pooling, and full connection operations on the input human eye image, and outputs a fixed-length feature vector;

(23)每个通道产生的固定长度的特征向量即是对应输入图像经过深度神经网络提取后的信息特征，将两个通道所产生的信息特征连接起来，得到最终的左眼和右眼的信息特征。(23) The fixed-length feature vector generated by each channel is the information feature of the corresponding input image extracted by the deep neural network, and the information features generated by the two channels are connected to obtain the final information of the left and right eyes feature.

所述步骤(3)建立单通道模型，输入左眼和右眼的图片信息，使用单通道模型提取并输出左右眼图像联合信息特征的具体过程如下：The step (3) establishes a single-channel model, inputs the picture information of the left eye and the right eye, and uses the single-channel model to extract and output the joint information feature of the left and right eye images. The specific process is as follows:

(31)将修正后的固定大小的人眼图像输入单通道模型中；(31) Input the corrected fixed-size human eye image into the single-channel model;

(32)分别使用深度神经网络模型，分别对左右眼的图像进行卷积、池化、全连接操作，输出精简后的左右眼信息特征；(32) Use the deep neural network model to perform convolution, pooling, and full connection operations on the images of the left and right eyes respectively, and output the simplified information features of the left and right eyes;

(33)连接左右眼信息特征，在深度神经网络模型后添加多个全连接层，使用全连接层合并左右眼的信息特征，最终得到左右眼图像联合信息特征。(33) Connect the information features of the left and right eyes, add multiple fully connected layers after the deep neural network model, use the fully connected layers to merge the information features of the left and right eyes, and finally obtain the joint information features of the left and right eye images.

所述步骤(4)利用使用回归分析的方法，结合左眼和右眼的信息特征，及左右眼图像联合信息特征，并经过联合优化，预测双眼分别对应的三维视线方向的具体过程如下：The step (4) utilizes the method of regression analysis, combines the information characteristics of the left eye and the right eye, and the joint information characteristics of the left and right eye images, and through joint optimization, the specific process of predicting the three-dimensional line of sight directions corresponding to the two eyes is as follows:

(41)输入修正后的左眼和右眼图像I_l和I_r，以及图像所对应的真实左眼视线方向g_l和真实右眼视线方向g_r；(41) Input the corrected left eye and right eye images I _l and I _r, and the real left eye line of sight direction g _l and the real right eye line of sight direction g _r corresponding to the image;

(42)使用步骤(2)以及步骤(3)中提出的深度神经网络模型，提取图像所对应的双眼信息特征以及联合信息特征；(42) Using the deep neural network model proposed in step (2) and step (3), extract the corresponding binocular information feature and joint information feature of the image;

(43)连接所有提取的信息特征或单独使用一种特征作为整体特征，使用回归分析的方法，得到预测的左眼视线方向f(I)_l和预测的右眼视线方向f(I)_r；(43) Connect all the extracted information features or use a single feature as the overall feature, and use the method of regression analysis to obtain the predicted left eye sight direction f(I) _l and the predicted right eye sight direction f(I) _r ;

(44)以角度差作为误差值，使用梯度下降的方法，对模型进行迭代优化，使得预测的视线方向越来越接近于真实的视线方向；(44) Use the angle difference as the error value, and use the method of gradient descent to iteratively optimize the model, so that the predicted line of sight direction is getting closer and closer to the real line of sight direction;

(45)选择预测视线方向最接近于真实视线方向的模型作为最终的模型，模型通过输入人眼图像，得到预测的视线方向，并将该视线方向作为最终预测结果。(45) Select the model whose predicted line-of-sight direction is closest to the real line-of-sight direction as the final model. The model obtains the predicted line-of-sight direction by inputting the human eye image, and takes this line-of-sight direction as the final prediction result.

与其它的视线追踪的方法相比，本发明有益的特点在于：Compared with other eye-tracking methods, the beneficial features of the present invention are:

(1)发明了双眼的信息特征提取模型，能够提取双眼的信息特征，并且也能够单独的使用双眼的特征信息来预测视线方向，结果仍然优于一般的单眼视线追踪方法；(1) Invented the information feature extraction model of both eyes, which can extract the information features of both eyes, and can also use the feature information of both eyes alone to predict the line of sight direction, and the result is still better than the general monocular line of sight tracking method;

(2)考虑到同一时刻的双眼图像中存在着某种相关性的联系，发明了人眼联合信息特征的提取模型，能够提取双眼的相关性特征信息，有效的结合了双眼的图像信息,同时仅仅使用双眼的相关性特征信息来预测视线方向，结果仍优于一般的单眼视线追踪方法；(2) Considering that there is a certain correlation between the binocular images at the same moment, an extraction model of human eye joint information features is invented, which can extract the correlation feature information of both eyes, effectively combining the image information of both eyes, and at the same time Only using the correlation feature information of both eyes to predict the gaze direction, the result is still better than the general monocular gaze tracking method;

(3)建立了一个多路的神经网络，网络通过输入特征信息，通过回归的方法能够更精确的预测得到双眼的三维视线方向，同时能够有效的解决某些单眼图像存在较大噪声而导致预测结果不准确的问题。(3) A multi-channel neural network is established. The network can more accurately predict the three-dimensional line of sight direction of the two eyes through the method of regression by inputting feature information. The problem with inaccurate results.

附图说明Description of drawings

图1是本发明的网络结构示意简图，其中a为发明内容中步骤(2)中的双通道模型，b为发明内容中步骤(3)中的单通道模型，c为发明内容中步骤(4)的视线预测模型；Fig. 1 is a schematic diagram of the network structure of the present invention, wherein a is a dual-channel model in step (2) in the summary of the invention, b is a single-channel model in step (3) in the summary of the invention, and c is a step in the summary of the invention ( 4) line-of-sight prediction model;

图2是本发明的基础神经网络结构示意图；Fig. 2 is a schematic diagram of the basic neural network structure of the present invention;

图3是本发明的基于用户双眼分析匹配视线方向的计算方法的总体结构图；Fig. 3 is the overall structural diagram of the calculation method based on the user's binocular analysis and matching line-of-sight direction of the present invention;

图4是本发明的模型训练流程图。Fig. 4 is a flow chart of model training in the present invention.

具体实施方式detailed description

下面结合附图对本发明的具体实施作详细说明。The specific implementation of the present invention will be described in detail below in conjunction with the accompanying drawings.

本发明提供了一种人眼左右眼图像联合的视线方向计算方法，输入人眼信息特征，预测人们的双眼视线方向，同时，分别提出了单通道和双通道深度神经网络模型，用来提取方法中使用到的信息特征。本方法对系统没有额外需求，仅使用单相机拍摄的人眼图像作为输入。同时，本发明通过结合双眼的图像信息，能够消除某些单眼噪声较大的误差情况，从而实现了相比其它类似方法更好的鲁棒性。The present invention provides a line-of-sight calculation method for the combination of left and right eye images of human eyes, which inputs the information characteristics of human eyes to predict the line-of-sight direction of people's eyes. Information features used in . This method has no additional requirements on the system, and only uses the human eye image captured by a single camera as input. At the same time, by combining the image information of both eyes, the present invention can eliminate some error situations with relatively large monocular noise, thereby achieving better robustness than other similar methods.

首先，针对人眼图像获取，本发明包含以下流程。使用单相机，拍摄含有用户面部区域的图像。利用已有人脸分析方法定位左眼或右眼区域。对提取出的人眼图像进行预处理，得到对头部位置进行了修正的固定像素大小的人眼图像。First, for the acquisition of human eye images, the present invention includes the following procedures. Using a single camera, capture an image that contains areas of the user's face. Use existing face analysis methods to locate the left or right eye area. The extracted human eye image is preprocessed to obtain a human eye image with a fixed pixel size corrected for the head position.

其次，发明了同时提取左眼和右眼信息特征的双通道深度神经网络模型，在输入双眼图像的前提下，左眼和右眼图像分别进入一个通道，并经过各通道的单独处理后，分别得到左眼和右眼的特征信息。Secondly, a dual-channel deep neural network model that extracts the information features of the left and right eyes at the same time was invented. On the premise of inputting the binocular images, the left-eye and right-eye images enter a channel respectively, and after being processed separately by each channel, respectively The characteristic information of the left eye and the right eye is obtained.

进一步，发明了针对于双眼图像输入的人眼联合信息特征提取的单通道深度神经网络模型，在输入双眼图像的前提下，建立深度神经网络模型，模型同时处理左眼和右眼的图像信息，并结合处理后的图像信息，模型再对结合的信息进一步处理，从而得到关于左眼和右眼的联合信息特征。Furthermore, a single-channel deep neural network model for the extraction of human eye joint information features for binocular image input was invented. On the premise of inputting binocular images, a deep neural network model was established. The model simultaneously processes the image information of the left and right eyes. Combined with the processed image information, the model further processes the combined information to obtain joint information features about the left and right eyes.

最后，通过结合上面的两个网络模型，发明了人眼左右眼图像联合的视线方向确定方法。方法建立了多路深度神经网络视线方向预测模型，通过融合上面的两个信息特征提取模型，连接得到的双眼的信息特征和双眼的联合信息特征，回归分析得到双眼的预测视线方向，并且通过统计预测的视线方向与真实视线方向的角度偏差，来衡量当前模型的好坏。模型使用梯度下降的方法进行自主优化，通过使用公式：Finally, by combining the above two network models, a method for determining the gaze direction of the left and right eye images of the human eye is invented. Methods A multi-channel deep neural network line-of-sight prediction model was established. By fusing the above two information feature extraction models, the obtained information features of both eyes and the joint information features of both eyes were obtained, and the predicted line-of-sight direction of both eyes was obtained through regression analysis. The angular deviation between the predicted line of sight direction and the real line of sight direction is used to measure the quality of the current model. The model uses the gradient descent method for autonomous optimization, by using the formula:

计算角度偏差，其中n代表输入的图像对的数量，并且以降低角度偏差为目标，对模型进行不断的优化，在联合优化的过程中，每当输入一对人眼图像以及真实视线方向，就对模型进行一次迭代优化，当输入完所有的已知图像信息后，优化过程结束，并得到了最终的模型。实际应用中，模型通过接收一对全新的人眼图像，直接的预测人眼图像的视线方向。Calculate the angle deviation, where n represents the number of input image pairs, and aim to reduce the angle deviation, and continuously optimize the model. In the process of joint optimization, whenever a pair of human eye images and the real line of sight direction are input, the An iterative optimization of the model is performed. When all known image information is input, the optimization process ends and the final model is obtained. In practical applications, the model directly predicts the gaze direction of the human eye image by receiving a pair of brand new human eye images.

同时，视线方向估计方法具有可缩减性，可以单独将人眼左右眼的信息特征或联合信息特征用于回归分析，得到双眼的预测视线方向，并采用输入图像的真实视线方向与输出预测视线方向之间的平均角度偏差作为误差项，对预测模型进行自适应的调整。同样，视线方向估计方法具有可添加性，在得到了双眼的信息特征和双眼的联合信息特征后，可以直接添加一些其他的相关信息特征，并以所有的特征为整体，进行回归分析判断。At the same time, the gaze direction estimation method is scalable, and the information features or joint information features of the left and right eyes of the human eye can be used for regression analysis to obtain the predicted gaze direction of both eyes, and the real gaze direction of the input image and the output prediction gaze direction can be used The average angular deviation between them is used as an error term to adjust the prediction model adaptively. Similarly, the gaze direction estimation method is additive. After obtaining the information features of both eyes and the joint information features of both eyes, some other related information features can be directly added, and regression analysis and judgment can be performed based on all the features.

下面再详细进行说明，参阅图1本发明的网络结构示意简图，具体如下：Describe in detail below again, referring to Fig. 1 schematic diagram of the network structure of the present invention, specifically as follows:

图1中的(a)是用于同时提取左眼和右眼信息特征的双通道深度神经网络模型的结构示意简图TE-I。模型通过输入双眼的图像，图像经过基于卷积神经网络(CNN)的网络的处理分别输出左眼以及右眼的特征信息，特征信息称呼于双眼特征，由双眼特征也可以预测得到双眼的视线方向；(a) in Fig. 1 is a schematic structural diagram TE-I of a dual-channel deep neural network model for simultaneously extracting left-eye and right-eye information features. The model inputs the image of both eyes, and the image is processed by a convolutional neural network (CNN)-based network to output the feature information of the left eye and the right eye respectively. The feature information is called the feature of the eyes, and the line of sight direction of the eyes can also be predicted by the features of the eyes. ;

图1中的(b)是用于同时提取左眼和右眼信息特征的单通道深度神经网络模型结构示意图TE-II。输入双眼图像后，图像首先分别经过一个基于CNN的网络处理，得到各自独立的特征信息，随后经过一个全连接层，将各自独立的特征信息融合而最终得到了双眼相关性信息，同样的，仅仅由双眼的相关性信息，也可以预测得到双眼的视线方向；(b) in Figure 1 is a schematic diagram of the single-channel deep neural network model structure TE-II for simultaneously extracting information features of the left eye and the right eye. After inputting binocular images, the images are firstly processed by a CNN-based network to obtain independent feature information, and then pass through a fully connected layer to fuse the independent feature information to finally obtain binocular correlation information. Similarly, only From the correlation information of both eyes, the line of sight direction of both eyes can also be predicted;

图1中的(c)是人眼左右眼图像联合的视线方向计算方法的网络模型的结构示意图TE-A，通过结合图1中的(a)和图1中的(b)中的两种模型结构，通过一次性获得双眼特征和双眼相关性特征，并以这些信息为整体特征，回归分析得到双眼的视线方向。(c) in Figure 1 is a schematic structural diagram of the network model TE-A of the line-of-sight direction calculation method for the combination of left and right eye images of the human eye, by combining the two types of (a) in Figure 1 and (b) in Figure The model structure obtains binocular features and binocular correlation features at one time, and uses this information as an overall feature to obtain the gaze direction of both eyes through regression analysis.

在上述三种模型中，由于考虑到当使用相机从正面拍摄人眼外观时，用户头部位置的不同，会使得拍摄到的人眼图像有着不同的形变，尽管人眼图像在最初进行了相应的变换以消除头部位置的影响，但影响无法全部消除，所以在进行预测分析视线方向的时候，将头部位置向量也加入到最终的特征集中。In the above three models, due to the consideration of the difference in the position of the user’s head when the camera is used to capture the appearance of the human eye from the front, the captured human eye images will have different deformations, although the human eye images were initially adjusted accordingly. Transformation to eliminate the influence of the head position, but the influence cannot be completely eliminated, so when predicting and analyzing the direction of sight, the head position vector is also added to the final feature set.

参阅图2本发明的基础神经网络结构示意图。为了能从图像中提取出优异的特征信息，考虑到目前卷积神经网络在图像处理中的优异表现，方法采用了CNN网络作为特征提取的基础网络。网络的输入为一张36×60的灰度图片，输出为x维特征，x的具体数值可以自行设定。图片经过输入后，首先经过一层卷积，卷积核的大小设定为5×5，输出通道个数设定为20个，经过第一层卷积后，输出的便是20张32×56大小的图片，然后，图片经过最大池化层，通过2×2的池化后，输出20张16×28的图片。随后，将这20张图片再次进行卷积，卷积核仍为5×5，输出通道为50，一共输出50张12×24的图片，再将这50张图片经过2×2的最大池化层，得到50张6×12的图片。最后，将这50张6×12的图片摊开，得到50×6×12个数，通过全连接层，得到最终想要的x维特征。Refer to Fig. 2 for a schematic diagram of the basic neural network structure of the present invention. In order to extract excellent feature information from images, considering the excellent performance of convolutional neural network in image processing, the method uses CNN network as the basic network for feature extraction. The input of the network is a 36×60 grayscale image, and the output is an x-dimensional feature, and the specific value of x can be set by yourself. After the image is input, it first goes through a layer of convolution, the size of the convolution kernel is set to 5×5, and the number of output channels is set to 20. After the first layer of convolution, the output is 20 images of 32× 56-sized pictures, and then, the pictures go through the maximum pooling layer, and after passing through 2×2 pooling, 20 16×28 pictures are output. Then, the 20 pictures are convolved again, the convolution kernel is still 5×5, the output channel is 50, and a total of 50 pictures of 12×24 are output, and then these 50 pictures are subjected to 2×2 maximum pooling Layer, get 50 6×12 pictures. Finally, the 50 6×12 pictures are spread out to obtain 50×6×12 numbers, and the final desired x-dimensional features are obtained through the fully connected layer.

参阅图3本发明的一种人眼左右眼图像联合的视线方向确定方法的总体结构图。本发明通过自主组建神经网络，并以此来预测分析用户的双眼三维视线方向。方法通过输入固定大小的人眼灰度图片，以及头部角度向量，得到1506维特征向量，然后再通过回归分析，得到6维的双眼视线方向。本发明的总体结构同样包含了发明内容中的步骤(2)、(3)的网络。步骤(2)的网络总体结构如图3中以上面两张人眼图片作为输入的结构，网络分别输入左眼和右眼的图像，然后使用图2中的CNN网络对图像进行卷积，并且最后的特征数量x设定为1000，得到了长度为1000的特征向量；特征向量再分别经过一个全连接层(FC),分别得到了500维的特征向量；最后，简单的将这两个500维的特征向量连接起来，作为第一部分的输出特征。步骤(3)的网络总体结构如图3中以下两张图像作为输入的结构，第二部分同样使用左眼和右眼图像作为输入，通过使用CNN网络对图像进行卷积，并且最终的特征数量x设定为500，分别得到了500维的特征向量，然后，简单的将这500维的特征向量连接成为1000维的向量，并通过一个全连接层对特征向量进行融合，得到500维的特征向量，并以这500维的特征向量作为第二部分的输出；同样的，由于考虑到了头部位置的不同，将会对图像造成一些不可消除的影响，本发明将头部位置向量作为第三部分输入，并不经过处理，直接添加到最终的特征向量中。三个部分分别输出了1000维，500维，6维的特征向量，将这些向量进行连接，便得到了1506维的最终特征。Refer to FIG. 3 for an overall structural diagram of a method for determining the line-of-sight direction combined with images of the left and right eyes of the present invention. The present invention predicts and analyzes the user's binocular three-dimensional line of sight direction by independently constructing a neural network. Methods By inputting a fixed-size human eye grayscale image and a head angle vector, a 1506-dimensional feature vector is obtained, and then a 6-dimensional binocular gaze direction is obtained through regression analysis. The overall structure of the present invention also includes the network of steps (2) and (3) in the summary of the invention. The overall structure of the network in step (2) is shown in Figure 3 with the above two human eye pictures as the input structure, the network inputs the images of the left eye and the right eye respectively, and then uses the CNN network in Figure 2 to convolve the images, and The final number of features x is set to 1000, and a feature vector with a length of 1000 is obtained; the feature vectors pass through a fully connected layer (FC) respectively, and a 500-dimensional feature vector is obtained respectively; finally, simply combine these two 500 Dimensional feature vectors are concatenated as the output features of the first part. The overall structure of the network in step (3) is shown in Figure 3 with the following two images as input structures. The second part also uses the left-eye and right-eye images as input, and the image is convoluted by using the CNN network, and the final number of features Set x to 500 to obtain 500-dimensional feature vectors, and then simply connect the 500-dimensional feature vectors into a 1000-dimensional vector, and fuse the feature vectors through a fully connected layer to obtain 500-dimensional features vector, and use the 500-dimensional feature vector as the output of the second part; similarly, due to the consideration of the difference in head position, some irreversible effects will be caused on the image, the present invention uses the head position vector as the third Part of the input, without processing, is directly added to the final feature vector. The three parts output 1000-dimensional, 500-dimensional, and 6-dimensional feature vectors respectively, and these vectors are connected to obtain the final feature of 1506 dimensions.

参阅图4本发明的基于用户双眼图像的视线方向预测流程图，结合前文描述的相关具体技术，以下介绍基于用户双眼图像的视线方向预测的具体实施过程。Referring to FIG. 4 , the flow chart of the gaze direction prediction based on the user's binocular image of the present invention, combined with the related specific technologies described above, the specific implementation process of the gaze direction prediction based on the user's binocular image is introduced below.

首先，通过对图3中提出的模型使用简单的随机赋予初值的方法，初始化该预测模型。随后，输入经过处理后的人眼图像，每当输入一对人眼图像I_l和I_r，通过网络即可获得一对预测的三维视线方向f(I)_l和f(I)_r，分别代表了左眼和右眼的视线方向。再通过与原始的三维视线方向g_l和g_r进行比较，求得预测的角度偏差，然后通过使用梯度下降的方法，以降低角度偏差为目标，不断的优化网络，每当输入一对图像，便对网络参数进行一次迭代调整。当输入完全部图像后，得到的就是最终的预测模型。在最终的预测模型中，通过输入图像，便可以预测出图像所对应的视线方向。First, the predictive model is initialized by using a simple method of randomly assigning initial values to the model presented in Fig. 3. Then, the processed human eye image is input. Whenever a pair of human eye images I _l and I _r are input, a pair of predicted three-dimensional line-of-sight directions f(I) _l and f(I) _r can be obtained through the network, respectively Represents the gaze direction of the left and right eyes. Then, by comparing with the original three-dimensional line-of-sight directions g _l and g _r , the predicted angle deviation is obtained, and then the network is continuously optimized by using the gradient descent method to reduce the angle deviation. Whenever a pair of images is input, An iterative adjustment of the network parameters is performed. When all the images are input, the final prediction model is obtained. In the final prediction model, by inputting an image, the line-of-sight direction corresponding to the image can be predicted.

以上所述仅为本发明的一个代表性实施例，依据本发明的技术方案所做的任何等效变换，均应属于本发明的保护范围。The above is only a representative embodiment of the present invention, and any equivalent transformation made according to the technical solution of the present invention shall belong to the protection scope of the present invention.

Claims

1. A line-of-sight calculation method for the combination of left and right eye images of human eyes, characterized in that, comprising the following steps:

(1) Capture the user's face image, locate the left eye or right eye area, preprocess the human eye image, realize the correction of the head position, and obtain the human eye image with a fixed pixel size;

(2) Establish a dual-channel model, input the image information of the left eye and the right eye in the human eye image respectively, use the deep neural network model to extract and output the information features of the left eye and the right eye respectively;

(3) Establish a single-channel model, input the image information of the left eye and the right eye, use the deep neural network model to extract and output the joint information features of the left and right eye images;

(4) Using the regression analysis method, combining the information features of the left and right eyes and the joint information features of the left and right eye images, and through joint optimization, predict the three-dimensional line of sight directions corresponding to the two eyes respectively; or use the information of the left and right eyes alone Features or the joint information features of the left and right eye images, using the method of regression analysis, after optimization, predict the three-dimensional line of sight directions corresponding to the two eyes respectively.

2. the line-of-sight calculation method of the combination of left and right eye images of human eyes according to claim 1, characterized in that: said step (2) sets up a two-channel model, and inputs the image information of left eye and right eye in the human eye image respectively , the specific process of extracting and outputting the information features of the left eye and the right eye through the dual-channel model is as follows:

(21) Input the fixed-sized left-eye and right-eye images _Il and _Ir after correction into the dual-channel model, and _Il and _Ir are processed by one channel respectively;

(22) Each channel is a deep neural network model, and the model performs convolution, pooling, and full connection operations on the input human eye image, and outputs a fixed-length feature vector;

(23) The fixed-length feature vector generated by each channel is the information feature of the corresponding input image extracted by the deep neural network, and the information features generated by the two channels are connected to obtain the final information of the left and right eyes feature.

3. the line-of-sight calculation method of the combination of left and right eye images of human eyes according to claim 1, characterized in that: the step (3) sets up a single-channel model, inputs the picture information of the left eye and the right eye, and uses the single-channel model The specific process of extracting and outputting the joint information features of left and right eye images is as follows:

(31) Input the corrected fixed-size human eye image into the single-channel model;

(32) Use the deep neural network model to perform convolution, pooling, and full connection operations on the images of the left and right eyes respectively, and output the simplified information features of the left and right eyes;

(33) Connect the information features of the left and right eyes, add multiple fully connected layers after the deep neural network model, use the fully connected layers to merge the information features of the left and right eyes, and finally obtain the joint information features of the left and right eye images.

4. The line-of-sight calculation method of the combination of left and right eye images of human eyes according to claim 1, characterized in that: said step (4) utilizes the method of regression analysis, combines the information characteristics of left eye and right eye, and left and right eyes The specific process of predicting the three-dimensional line-of-sight direction corresponding to both eyes is as follows:

(41) Input the corrected left eye and right eye images I _l and I _r , and the real left eye sight direction g _l and the real right eye sight direction g _r corresponding to the images;

(42) Using the deep neural network model proposed in step (2) and step (3), extract the corresponding binocular information feature and joint information feature of the image;

(43) Connect all the extracted information features or use a single feature as the overall feature, and use the method of regression analysis to obtain the predicted left eye sight direction f(I) _l and the predicted right eye sight direction f(I) _r ;

(44) Use the angle difference as the error value, and use the method of gradient descent to iteratively optimize the model, so that the predicted line of sight direction is getting closer and closer to the real line of sight direction;

(45) Select the model whose predicted line-of-sight direction is closest to the real line-of-sight direction as the final model. The model obtains the predicted line-of-sight direction by inputting the human eye image, and takes this line-of-sight direction as the final prediction result.