CN116704589B

CN116704589B - A gaze point estimation method, electronic device and computer readable storage medium

Info

Publication number: CN116704589B
Application number: CN202211531249.6A
Authority: CN
Inventors: 孙贻宝; 周俊伟; 舒畅; 彭金平
Original assignee: Honor Device Co Ltd
Current assignee: Honor Device Co Ltd
Priority date: 2022-12-01
Filing date: 2022-12-01
Publication date: 2024-06-11
Anticipated expiration: 2042-12-01
Also published as: CN116704589A

Abstract

The application provides a gaze point estimation method, electronic equipment and a computer readable storage medium, and relates to the technical field of computers. In the scheme, even if the electronic equipment shoots in an excessively dark or excessively bright environment of a scene, two-dimensional and three-dimensional information of an eye image and eye key points of a user can be acquired, and the gazing information of the user on a display screen can be accurately determined. The method comprises the following steps: acquiring an infrared image and a depth image of a user through a preset camera; identifying an infrared image to obtain a first eye region image of a user and first position information of eye key points of the user in the first eye region image; obtaining a first gaze feature for indicating eye and gaze correlation of the user based on the first eye region image; obtaining second gazing features for indicating three-dimensional features of eye key points of the user from the depth image based on the first position information; and determining the fixation information of the user on the display screen according to the first fixation characteristic and the second fixation characteristic.

Description

A gaze point estimation method, electronic device and computer readable storage medium

技术领域Technical Field

本申请涉及计算机技术领域，尤其涉及一种注视点估计方法、电子设备和计算机可读存储介质。The present application relates to the field of computer technology, and in particular to a gaze point estimation method, an electronic device, and a computer-readable storage medium.

背景技术Background technique

注视点，是指视知觉过程中，用户视线对准的目标对象的某一点。目前，电子设备可以协助用户的工作。在电子设备协助用户工作的过程中，电子设备可以根据用户的活动，确定用户的注视点，从而根据用户的注视点，协助用户工作。例如，在用户阅读电子书的场景中，手机可以确定用户的注视点，根据用户的注视点判断用户是否有翻页的意向，若有，则主动为用户提供翻页操作。又如，在浏览网页的场景中，手机可以确定用户的注视点，根据用户的注视点判断用户感兴趣的内容，主动为用户推荐用户感兴趣的内容。The fixation point refers to a certain point of the target object that the user's line of sight is aimed at during the visual perception process. At present, electronic devices can assist users in their work. In the process of electronic devices assisting users in their work, the electronic devices can determine the user's fixation point based on the user's activities, and thus assist the user in their work based on the user's fixation point. For example, in a scenario where a user is reading an e-book, the mobile phone can determine the user's fixation point, and determine whether the user has the intention to turn the page based on the user's fixation point. If so, the mobile phone actively provides the user with a page-turning operation. For another example, in a scenario where a user is browsing a web page, the mobile phone can determine the user's fixation point, and determine the content that the user is interested in based on the user's fixation point, and actively recommend the content that the user is interested in to the user.

电子设备协助用户工作的过程中，需要用户面对电子设备，使得电子设备可以获取用户的眼部几何特征(或者称眼部的关键点信息，例如，眼部轮廓、眼角开合度、瞳孔方向等)，并根据获取到的眼部几何特征预测用户的注视点。然而，在实际应用中，在一些特殊拍摄环境中，例如场景过暗或过亮的环境，电子设备无法获取到用户的眼部几何特征，从而导致电子设备无法准确确定用户的注视点，进而导致电子设备协助用户工作失败，降低了电子设备协助用户工作的可靠性。When an electronic device assists a user in their work, the user needs to face the electronic device so that the electronic device can obtain the user's eye geometry features (or key point information of the eye, such as eye contour, eye canthus opening, pupil direction, etc.), and predict the user's gaze point based on the obtained eye geometry features. However, in actual applications, in some special shooting environments, such as environments where the scene is too dark or too bright, the electronic device cannot obtain the user's eye geometry features, which results in the electronic device being unable to accurately determine the user's gaze point, which in turn causes the electronic device to fail to assist the user in their work, reducing the reliability of the electronic device in assisting the user in their work.

发明内容Summary of the invention

本申请实施例提供一种注视点估计方法、电子设备和计算机可读存储介质，用于解决现有技术中电子设备无法准确确定用户的注视点的问题。通过本方案，可以较准确确定用户的注视点。The embodiments of the present application provide a gaze point estimation method, an electronic device, and a computer-readable storage medium, which are used to solve the problem that the electronic device in the prior art cannot accurately determine the user's gaze point. Through this solution, the user's gaze point can be determined more accurately.

为达到上述目的，本申请的实施例采用如下技术方案：To achieve the above objectives, the embodiments of the present application adopt the following technical solutions:

第一方面，提供了一种注视点估计方法，该方法应用于电子设备，电子设备包括预设摄像头和显示屏，该方法包括：通过预设摄像头采集用户的红外线图像和深度图像；识别红外线图像，得到用户的第一眼部区域图像和第一眼部区域图像中用户的眼部关键点的第一位置信息；基于第一眼部区域图像得到第一注视特征，其中，第一注视特征用于指示用户的眼部的二维特征；基于第一位置信息从深度图像中，得到第二注视特征，其中，第二注视特征用于指示用户的眼部关键点的三维特征；根据第一注视特征和第二注视特征，确定用户在显示屏上的注视信息。In a first aspect, a gaze point estimation method is provided, which is applied to an electronic device, wherein the electronic device includes a preset camera and a display screen, and the method includes: collecting an infrared image and a depth image of a user through the preset camera; identifying the infrared image to obtain a first eye area image of the user and first position information of the user's eye key points in the first eye area image; obtaining a first gaze feature based on the first eye area image, wherein the first gaze feature is used to indicate a two-dimensional feature of the user's eye; obtaining a second gaze feature from the depth image based on the first position information, wherein the second gaze feature is used to indicate a three-dimensional feature of the user's eye key points; and determining the user's gaze information on the display screen according to the first gaze feature and the second gaze feature.

本方案中，即使在黑暗或者过亮的场景中，电子设备拍摄到的红外线(InfraredRadiation，IR)图像里面也可以有较清晰的用户面部图像，并从该IR图像中提取到用户的眼部区域图像(即下文中的第一眼部区域图像)，以及该用户眼部区域图像中用户的眼部关键点的位置信息。由于用户的IR图像和深度图像的内容基本相同，且相同内容在各自图像中的位置基本相同，因此，可以通过前述得到的该用户眼部区域图像中用户的眼部关键点的位置信息(即下文中的第一位置信息)从深度图像中得到该图对应的用于指示用户的眼部关键点的三维特征(即下文中的第二注视特征)。而且，电子设备也可以从IR图像对应的眼部区域图像中提取到用于指示用户的眼部的二维特征信息(即下文中的第一注视特征)第二注视特征。这样，电子设备便可以基于前述两种注视特征信息确定用户在显示屏上的注视信息。这样，电子设备便可以基于前述两种注视特征信息准确确定用户在显示屏上的注视信息。在电子设备根据用户的注视点，协助用户工作的场景中，可以使电子设备成功协助用户工作，提高了电子设备协助用户工作的可靠性。In this solution, even in a dark or overly bright scene, the infrared radiation (IR) image captured by the electronic device can also contain a clearer user facial image, and the user's eye area image (i.e., the first eye area image hereinafter) and the position information of the user's eye key points in the user's eye area image can be extracted from the IR image. Since the contents of the user's IR image and the depth image are basically the same, and the positions of the same contents in their respective images are basically the same, the three-dimensional features (i.e., the second gaze feature hereinafter) corresponding to the image for indicating the user's eye key points can be obtained from the depth image through the aforementioned position information of the user's eye key points in the user's eye area image (i.e., the first position information hereinafter). Moreover, the electronic device can also extract the two-dimensional feature information (i.e., the first gaze feature hereinafter) and the second gaze feature for indicating the user's eyes from the eye area image corresponding to the IR image. In this way, the electronic device can determine the user's gaze information on the display screen based on the aforementioned two kinds of gaze feature information. In this way, the electronic device can accurately determine the user's gaze information on the display screen based on the aforementioned two kinds of gaze feature information. In the scenario where the electronic device assists the user in working according to the user's gaze point, the electronic device can successfully assist the user in working, thereby improving the reliability of the electronic device in assisting the user in working.

在第一方面的一种可能的实现方式中，上述根据第一注视特征和第二注视特征，确定用户在显示屏上的注视信息，包括：根据第一注视特征，获取第一人眼位姿参数；根据第二注视特征，获取第二人眼位姿参数；其中，第一人眼位姿参数和第二人眼位姿参数均包括用户眼睛的视线朝向信息，用户眼睛的视线朝向信息包括用户眼睛在预设三维坐标系下的旋转角、俯仰角和航向角；基于第一人眼位姿参数和第二人眼位姿参数确定用户在显示屏上的注视信息。In a possible implementation manner of the first aspect, the above-mentioned determining the gaze information of the user on the display screen according to the first gaze feature and the second gaze feature includes: obtaining a first eye posture parameter according to the first gaze feature; obtaining a second eye posture parameter according to the second gaze feature; wherein the first eye posture parameter and the second eye posture parameter both include line of sight direction information of the user's eyes, and the line of sight direction information of the user's eyes includes a rotation angle, a pitch angle and a heading angle of the user's eyes in a preset three-dimensional coordinate system; and determining the user's gaze information on the display screen based on the first eye posture parameter and the second eye posture parameter.

由于第一注视特征为用于指示用户的眼部关键点的二维特征；基于第二眼部区域图像得到第二注视特征；第二注视特征为用于指示用户的眼部关键点的三维特征。用户的眼部关键点的二维特征和用户的眼部关键点的三维特征可以预测用户眼睛的朝向，因此，电子设备可以根据第一注视特征和第二注视特征，先确定用户眼睛的朝向，再基于用户眼睛的朝向，确定用户在显示屏上的注视信息。Since the first gaze feature is a two-dimensional feature for indicating the key points of the user's eyes, the second gaze feature is obtained based on the second eye region image, and the second gaze feature is a three-dimensional feature for indicating the key points of the user's eyes. The two-dimensional features of the key points of the user's eyes and the three-dimensional features of the key points of the user's eyes can predict the direction of the user's eyes. Therefore, the electronic device can first determine the direction of the user's eyes according to the first gaze feature and the second gaze feature, and then determine the user's gaze information on the display screen based on the direction of the user's eyes.

在第一方面的一种可能的实现方式中，上述预设三维坐标系为以预设摄像头的光心为原点的三维坐标系。In a possible implementation manner of the first aspect, the preset three-dimensional coordinate system is a three-dimensional coordinate system with an optical center of a preset camera as its origin.

在第一方面的一种可能的实现方式中，第一人眼位姿参数和第二人眼位姿参数与用户在显示屏上的注视信息之间存在预设对应关系，基于第一人眼位姿参数和第二人眼位姿参数确定用户在显示屏上的注视信息，包括：利用第一人眼位姿参数和第二人眼位姿参数从预设对应关系确定用户在显示屏上的注视信息。In a possible implementation of the first aspect, there is a preset correspondence between the first human eye posture parameter, the second human eye posture parameter and the user's gaze information on the display screen, and determining the user's gaze information on the display screen based on the first human eye posture parameter and the second human eye posture parameter includes: using the first human eye posture parameter and the second human eye posture parameter to determine the user's gaze information on the display screen from the preset correspondence.

在第一方面的一种可能的实现方式中，在识别红外线图像，得到第一注视特征之前，该方法还包括：确定红外线图像中包括人眼图像。In a possible implementation manner of the first aspect, before recognizing the infrared image and obtaining the first gaze feature, the method further includes: determining that the infrared image includes a human eye image.

在一些情况下，用户的头部在一定时间内可能不是一直静止的，还可以转向、抬头、低头等。如此，电子设备拍摄的用户的IR图像和深度图像中可能就不存在用户的眼部图像了。若电子设备拍摄的用户的IR图像和深度图像中不存在用户的眼部图像，电子设备执行该步骤要消耗一定的能量和时间。于是，为了节省电子设备执行该步骤所消耗的能量和时间，电子设备也可以先判断用户的IR图像中是不是存在用户的眼部图像，若存在，再识别红外线图像，得到第一注视特征。In some cases, the user's head may not be still for a certain period of time, and may turn, look up, or lower the head. In this way, the user's eye image may not exist in the IR image and depth image of the user taken by the electronic device. If the user's eye image does not exist in the IR image and depth image of the user taken by the electronic device, the electronic device will consume a certain amount of energy and time to perform this step. Therefore, in order to save the energy and time consumed by the electronic device to perform this step, the electronic device may also first determine whether the user's eye image exists in the user's IR image. If so, it can then identify the infrared image to obtain the first gaze feature.

在第一方面的一种可能的实现方式中，通过预设摄像头采集用户的红外线图像和深度图像，包括：接收第一操作，第一操作用于触发电子设备启动预设应用；响应于第一操作，启动预设应用后，周期性通过预设摄像头采集用户的红外线图像和深度图像。即电子设备中可以设置一些应用，该应用可以支持本申请的注视点估计方法，则电子设备可以在启动该应用后，周期性通过预设摄像头采集用户的红外线图像和深度图像。In a possible implementation of the first aspect, collecting infrared images and depth images of a user through a preset camera includes: receiving a first operation, the first operation is used to trigger the electronic device to start a preset application; in response to the first operation, after starting the preset application, periodically collecting infrared images and depth images of the user through the preset camera. That is, some applications can be set in the electronic device, and the application can support the gaze point estimation method of the present application, then the electronic device can periodically collect infrared images and depth images of the user through the preset camera after starting the application.

在第一方面的一种可能的实现方式中，通过预设摄像头采集用户的红外线图像和深度图像，包括：若电子设备处于预设协同工作模式，周期性通过预设摄像头采集用户的红外线图像和深度图像。本方案中，协同工作的模式可以是指电子设备可以确定用户的注视点，根据用户的注视点确定用户的意向，基于用户的意向执行相应的操作的状态。例如，在用户阅读电子书的场景中，手机可以确定用户的注视点，根据用户的注视点判断用户是否有翻页的意向，若有，则主动为用户提供翻页操作。若电子设备处于协同工作模式，则电子设备也可以周期性通过预设摄像头采集用户的红外线图像和深度图像，以便于电子设备更准确的检测出用户在显示屏上的注视信息。In a possible implementation of the first aspect, the infrared image and depth image of the user are collected through a preset camera, including: if the electronic device is in a preset collaborative working mode, the infrared image and depth image of the user are periodically collected through the preset camera. In this scheme, the collaborative working mode may refer to a state in which the electronic device can determine the user's gaze point, determine the user's intention based on the user's gaze point, and perform corresponding operations based on the user's intention. For example, in a scenario where a user is reading an e-book, the mobile phone can determine the user's gaze point, and determine whether the user has the intention to turn the page based on the user's gaze point. If so, the mobile phone actively provides the user with a page turning operation. If the electronic device is in a collaborative working mode, the electronic device may also periodically collect the user's infrared image and depth image through a preset camera, so that the electronic device can more accurately detect the user's gaze information on the display screen.

在第一方面的一种可能的实现方式中，在通过预设摄像头采集用户的红外线图像和深度图像之前，该方法还包括：响应于用户对预设协同开关的开启操作，进入预设协同工作模式；其中，预设协同开关配置在电子设备的设置界面中，或者，预设协同开关配置在电子设备的控制中心中，或者，预设协同开关配置在电子设备的预设应用中。In a possible implementation of the first aspect, before collecting the infrared image and depth image of the user through a preset camera, the method also includes: in response to the user's turning on operation of a preset collaborative switch, entering a preset collaborative working mode; wherein the preset collaborative switch is configured in a setting interface of the electronic device, or the preset collaborative switch is configured in a control center of the electronic device, or the preset collaborative switch is configured in a preset application of the electronic device.

电子设备中设置一些应用，该些应用需要电子设备开启协同工作的模式以及电子设备在启动该应用的条件并存的情况下，周期性通过预设摄像头采集用户的红外线图像和深度图像。其中，预设协同开关可以配置在电子设备的设置界面中，或者，预设协同开关配置在电子设备的控制中心中，或者，预设协同开关配置在电子设备的预设应用中，电子设备可以响应于用户对该些任意一种预设协同开关的开启操作，进入预设协同工作模式。需要说明的是，电子设备可以在启动该应用之前或者之后，开启协同工作的模式。Some applications are set in the electronic device, and these applications require the electronic device to start the collaborative working mode and the electronic device periodically collects the user's infrared image and depth image through the preset camera when the conditions for starting the application coexist. Among them, the preset collaborative switch can be configured in the setting interface of the electronic device, or the preset collaborative switch is configured in the control center of the electronic device, or the preset collaborative switch is configured in the preset application of the electronic device. The electronic device can enter the preset collaborative working mode in response to the user's turning on operation of any of the preset collaborative switches. It should be noted that the electronic device can turn on the collaborative working mode before or after starting the application.

在第一方面的一种可能的实现方式中，上述预设摄像头是TOF摄像头或者3D结构光摄像头。In a possible implementation manner of the first aspect, the preset camera is a TOF camera or a 3D structured light camera.

在第一方面的一种可能的实现方式中，上述通过预设摄像头采集用户的红外线图像和深度图像，包括：在检测到用户注视显示屏时，通过预设摄像头采集红外线图像和深度图像。即电子设备只要检测用户注视显示屏，就可以执行本方案的注视点估计方法，以此来提高电子设备执行本方案的注视点估计方法的时长。这样，在电子设备根据用户的注视点，协助用户工作的场景中，可以提高电子设备准确协助用户工作的时长，从而提高用户体验。In a possible implementation of the first aspect, the above-mentioned collecting of infrared images and depth images of the user by a preset camera includes: collecting infrared images and depth images by a preset camera when detecting that the user is looking at the display screen. That is, as long as the electronic device detects that the user is looking at the display screen, it can execute the gaze point estimation method of the present solution, thereby increasing the duration of the electronic device executing the gaze point estimation method of the present solution. In this way, in a scenario where the electronic device assists the user in working according to the user's gaze point, the duration of the electronic device accurately assisting the user in working can be increased, thereby improving the user experience.

第二方面，提供了一种注视点估计方法，该方法应用于电子设备，电子设备包括预设摄像头和显示屏，该方法包括：通过预设摄像头采集用户的红外线图像和深度图像；将红外线图像和深度图像作为输入，运行人眼注视信息估计模型，得到用户在显示屏上的注视信息；其中，人眼注视信息估计模型用于：根据识别红外线图像得到用户的眼部区域的第一注视特征，以及识别深度图像得到用户的眼部区域的第二注视特征，输出用户在显示屏上的注视信息；其中，第一注视特征用于指示用户的眼部关键点的二维特征，第二注视特征用于指示用户的眼部关键点的三维特征。In a second aspect, a gaze point estimation method is provided, which is applied to an electronic device, wherein the electronic device includes a preset camera and a display screen, and the method includes: collecting an infrared image and a depth image of a user through a preset camera; using the infrared image and the depth image as input, running a human eye gaze information estimation model to obtain the user's gaze information on the display screen; wherein the human eye gaze information estimation model is used to: obtain a first gaze feature of the user's eye area based on identifying the infrared image, and obtain a second gaze feature of the user's eye area based on identifying the depth image, and output the user's gaze information on the display screen; wherein the first gaze feature is used to indicate a two-dimensional feature of the user's eye key point, and the second gaze feature is used to indicate a three-dimensional feature of the user's eye key point.

可以理解，模型可以具有强大快捷的预测功能，利用人眼注视信息估计模型，得到用户在显示屏上的注视信息，可以提高注视信息的估计速度，提高用户体验。且在电子设备根据用户的注视点，协助用户工作的场景中，可以提高了电子设备协助用户工作的效率，提升用户体验。It can be understood that the model can have a powerful and fast prediction function. By using the human eye gaze information estimation model to obtain the user's gaze information on the display screen, the estimation speed of the gaze information can be improved, and the user experience can be improved. In the scenario where the electronic device assists the user in working according to the user's gaze point, the efficiency of the electronic device in assisting the user in working can be improved, and the user experience can be improved.

在第二方面的一种可能的实现方式中，上述运行人眼注视信息估计模型还用于：根据识别红外线图像得到用户的眼部关键点的第一注视特征，以及识别深度图像得到用户的眼部关键点的第二注视特征，输出用户在显示屏上的注视信息；其中，第一注视特征用于指示用户的眼部与注视相关的二维特征，第二注视特征用于指示用户的眼部关键点的三维特征。In a possible implementation of the second aspect, the above-mentioned human eye gaze information estimation model is also used to: output the user's gaze information on the display screen based on a first gaze feature of the user's eye key points obtained by identifying an infrared image, and a second gaze feature of the user's eye key points obtained by identifying a depth image; wherein the first gaze feature is used to indicate two-dimensional features of the user's eyes related to gaze, and the second gaze feature is used to indicate three-dimensional features of the user's eye key points.

在第二方面的一种可能的实现方式中，上述第一人眼位姿参数和第二人眼位姿参数与用户在显示屏上的注视信息之间存在预设对应关系。上述运行人眼注视信息估计模型还用于：利用第一人眼位姿参数和第二人眼位姿参数，根据预设对应关系确定用户在显示屏上的注视信息。In a possible implementation of the second aspect, there is a preset correspondence between the first eye posture parameter and the second eye posture parameter and the gaze information of the user on the display screen. The running eye gaze information estimation model is also used to: use the first eye posture parameter and the second eye posture parameter to determine the gaze information of the user on the display screen according to the preset correspondence.

在第二方面的一种可能的实现方式中，上述预设三维坐标系为以预设摄像头的光心为原点的三维坐标系。In a possible implementation manner of the second aspect, the preset three-dimensional coordinate system is a three-dimensional coordinate system with an optical center of a preset camera as its origin.

第三方面，提供了一种电子设备，该电子设备包括存储器和一个或多个处理器；该存储器用于存储代码指令；该处理器用于运行该代码指令，使得该电子设备执行如第一方面和第二方面中任一种可能的设计方式中的注视点估计方法。In a third aspect, an electronic device is provided, which includes a memory and one or more processors; the memory is used to store code instructions; the processor is used to run the code instructions so that the electronic device executes a gaze point estimation method in any possible design mode of the first aspect and the second aspect.

第四方面，提供了一种计算机可读存储介质，该计算机可读存储介质包括计算机指令，当该计算机指令在电子设备上运行时，使得该电子设备执行如第一方面和第二方面中任一种可能的设计方式中的注视点估计方法。In a fourth aspect, a computer-readable storage medium is provided, which includes computer instructions. When the computer instructions are executed on an electronic device, the electronic device executes a gaze point estimation method in any possible design mode of the first aspect and the second aspect.

第五方面，提供了一种计算机程序产品，包括计算机程序/指令，该计算机程序/指令被处理器执行时实现第一方面和第二方面中任一种可能的设计方式中的注视点估计方法。In a fifth aspect, a computer program product is provided, comprising a computer program/instruction, which, when executed by a processor, implements the gaze point estimation method in any possible design manner in the first aspect and the second aspect.

其中，第二方面、第三方面、第四方面和第五方面中任一种设计方式所带来的技术效果可参见第一方面中不同设计方式所带来的技术效果，此处不再赘述。Among them, the technical effects brought about by any design method in the second, third, fourth and fifth aspects can refer to the technical effects brought about by different design methods in the first aspect, and will not be repeated here.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1示出了一种IR图像和深度图像的示意图；FIG1 shows a schematic diagram of an IR image and a depth image;

图2示出了一种手机100的结构示意图；FIG2 shows a schematic structural diagram of a mobile phone 100;

图3示出了一种获取用户的人脸图像的应用场景示意图；FIG3 is a schematic diagram showing an application scenario of obtaining a user's face image;

图4示出了一种注视点估计方法的流程示意图；FIG4 shows a schematic flow chart of a method for estimating a gaze point;

图5示出了一种注视点估计方法的流程示意图；FIG5 is a schematic diagram showing a flow chart of a method for estimating a gaze point;

图6示出了一种人眼注视信息估计模型的计算示意图；FIG6 shows a calculation schematic diagram of a human eye gaze information estimation model;

图7示出了一种手机100的界面示意图；FIG7 shows a schematic diagram of an interface of a mobile phone 100;

图8示出了一种电子书应用的界面示意图。FIG. 8 shows a schematic diagram of an interface of an e-book application.

具体实施方式Detailed ways

本申请的说明性实施例包括但不限于一种注视点估计方法、电子设备和计算机可读存储介质。The illustrative embodiments of the present application include, but are not limited to, a gaze point estimation method, an electronic device, and a computer-readable storage medium.

下面结合附图，对本申请的实施例进行描述，显然，所描述的实施例仅仅是本申请一部分的实施例，而不是全部的实施例。本领域普通技术人员可知，随着技术的发展和新场景的出现，本申请实施例提供的技术方案对于类似的技术问题，同样适用。The following describes the embodiments of the present application in conjunction with the accompanying drawings. Obviously, the described embodiments are only embodiments of a part of the present application, rather than all embodiments. It is known to those skilled in the art that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.

为了解决背景技术中的技术问题，本申请实施例示出了一种注视点估计方法，该方法应用在具有预设摄像头和显示屏的电子设备。该方法包括：电子设备通过预设摄像头采集用户的红外线(Infrared Radiation，IR)图像和深度图像，例如，图1示出了一种IR图像和深度图像的示意图，如图1中所示的IR图像和深度图像。由于IR图像是通过红外光发射器发射调制后的红外光脉冲，不停地打在物体表面，经反射后被接收器接收，通过相位的变化来计算时间差，进而结合光速计算出物体深度信息，一般拍照效果不受环境光干扰。因此，即使在黑暗或者过亮的场景中，电子设备拍摄到的IR图像里面也可以有较清晰的用户面部图像，并从该IR图像中提取到用户的眼部区域图像(即下文中的第一眼部区域图像)，以及该用户眼部区域图像中用户的眼部关键点的位置信息。由于用户的IR图像和深度图像的内容基本相同，且相同内容在各自图像中的位置基本相同，因此，可以通过前述得到的该用户眼部区域图像中用户的眼部关键点的位置信息(即下文中的第一位置信息)从深度图像中得到该图对应的用于指示用户的眼部关键点的三维特征(即下文中的第二注视特征)。而且，电子设备也可以从IR图像对应的眼部区域图像中提取到用于指示用户的眼部的二维特征信息(即下文中的第一注视特征)第二注视特征。这样，电子设备便可以基于前述两种注视特征信息确定用户在显示屏上的注视信息。基于该两种注视特征信息确定用户在显示屏上的注视信息的具体方案将在下文中介绍。In order to solve the technical problems in the background technology, the embodiment of the present application shows a method for estimating the gaze point, which is applied to an electronic device with a preset camera and a display screen. The method includes: the electronic device collects an infrared (IR) image and a depth image of the user through a preset camera. For example, FIG1 shows a schematic diagram of an IR image and a depth image, such as the IR image and the depth image shown in FIG1. Since the IR image is a modulated infrared light pulse emitted by an infrared light transmitter, which continuously hits the surface of the object and is received by the receiver after reflection, the time difference is calculated by the change of the phase, and then the depth information of the object is calculated in combination with the speed of light. Generally, the photographing effect is not affected by the ambient light. Therefore, even in a dark or overly bright scene, the IR image captured by the electronic device can also have a clearer user facial image, and the user's eye area image (i.e., the first eye area image hereinafter) and the position information of the user's eye key points in the user's eye area image are extracted from the IR image. Since the contents of the IR image and the depth image of the user are basically the same, and the positions of the same contents in the respective images are basically the same, the three-dimensional features (i.e., the second gaze features hereinafter) corresponding to the image and used to indicate the key points of the user's eyes can be obtained from the depth image through the position information of the key points of the user's eyes in the eye area image of the user obtained above (i.e., the first position information hereinafter). Moreover, the electronic device can also extract the two-dimensional feature information (i.e., the first gaze feature hereinafter) and the second gaze feature indicating the user's eyes from the eye area image corresponding to the IR image. In this way, the electronic device can determine the user's gaze information on the display screen based on the aforementioned two types of gaze feature information. The specific scheme for determining the user's gaze information on the display screen based on the two types of gaze feature information will be introduced below.

本方案中，即使电子设备在场景过暗或过亮的环境拍摄用户，电子设备也可以获取到用户的眼部关键点的二维特征和三维特征的关键点信息，准确确定用户在显示屏上的注视信息。在电子设备根据用户的注视点，协助用户工作的场景中，可以使电子设备成功协助用户工作，提高了电子设备协助用户工作的可靠性。In this solution, even if the electronic device shoots the user in an environment where the scene is too dark or too bright, the electronic device can obtain the key point information of the two-dimensional features and three-dimensional features of the key points of the user's eyes, and accurately determine the user's gaze information on the display screen. In the scenario where the electronic device assists the user in working according to the user's gaze point, the electronic device can successfully assist the user in working, thereby improving the reliability of the electronic device in assisting the user in working.

示例性的，本申请实施例中的电子设备可以为手机、平板、个人计算机、智慧屏、可穿戴头戴设备(例如，虚拟现实头戴装置和增强现实头戴装置)等设备，其中，可穿戴头戴设备可以是增强现实(Augmented Reality，AR)眼镜、虚拟现实技术(Virtual Reality，VR)眼镜。Exemplarily, the electronic devices in the embodiments of the present application may be mobile phones, tablets, personal computers, smart screens, wearable head-mounted devices (for example, virtual reality head-mounted devices and augmented reality head-mounted devices), and other devices, wherein the wearable head-mounted devices may be augmented reality (AR) glasses and virtual reality (VR) glasses.

本申请实施例以电子设备为手机为例进行说明。图2示出了一种手机100的结构示意图。The present application embodiment is described by taking a mobile phone as an example of an electronic device. FIG2 shows a schematic diagram of the structure of a mobile phone 100 .

如图2所示，手机100可以包括处理器110，外部存储器接口120，内部存储器121，通用串行总线(universal serial bus，USB)接口130，充电管理模块140，电源管理模块141，电池142，天线1，天线2，移动通信模块150，无线通信模块160，音频模块170，扬声器170A，受话器170B，麦克风170C，耳机接口170D，传感器模块180，按键190，马达191，指示器192，摄像头193，显示屏(触摸屏)194，以及用户标识模块(subscriber identification module，SIM)卡接口195等。As shown in Figure 2, the mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen (touch screen) 194, and a subscriber identification module (SIM) card interface 195, etc.

可以理解的是，本实施例示意的结构并不构成对手机100的具体限定。在另一些实施例中，手机100可以包括比图示更多或更少的部件，或者组合某些部件，或者拆分某些部件，或者不同的部件布置。图示的部件可以以硬件，软件或软件和硬件的组合实现。It is to be understood that the structure shown in this embodiment does not constitute a specific limitation on the mobile phone 100. In other embodiments, the mobile phone 100 may include more or fewer components than shown in the figure, or combine some components, or separate some components, or arrange the components differently. The components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.

处理器110可以包括一个或多个处理单元，例如：处理器110可以包括应用处理器(application processor，AP)，调制解调处理器，图形处理器(graphics processingunit，GPU)，图像信号处理器(image signal processor，ISP)，控制器，存储器，视频编解码器，数字信号处理器(digital signal processor，DSP)，基带处理器，和/或神经网络处理器(neural-network processing unit，NPU)等。其中，不同的处理单元可以是独立的器件，也可以集成在一个或多个处理器中。本申请实施例中，处理器110以获取用户的IR图像和深度图像，并基于该两种图像确定用户在手机100的显示屏(触摸屏)194的注视信息。具体的，NPU可以获取用户的IR图像和深度图像，并基于该两种图像确定用户在手机100的显示屏(触摸屏)194的注视信息。The processor 110 may include one or more processing units, for example: the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Among them, different processing units can be independent devices or integrated in one or more processors. In an embodiment of the present application, the processor 110 obtains the IR image and depth image of the user, and determines the user's gaze information on the display screen (touch screen) 194 of the mobile phone 100 based on the two images. Specifically, the NPU can obtain the user's IR image and depth image, and determine the user's gaze information on the display screen (touch screen) 194 of the mobile phone 100 based on the two images.

控制器可以是手机100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号，产生操作控制信号，完成取指令和执行指令的控制。The controller may be the nerve center and command center of the mobile phone 100. The controller may generate an operation control signal according to the instruction operation code and the timing signal to complete the control of fetching and executing instructions.

处理器110中还可以设置存储器，用于存储指令和数据。在一些实施例中，处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据，可从所述存储器中直接调用。避免了重复存取，减少了处理器110的等待时间，因而提高了系统的效率。The processor 110 may also be provided with a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may store instructions or data that the processor 110 has just used or cyclically used. If the processor 110 needs to use the instruction or data again, it may be directly called from the memory. This avoids repeated access, reduces the waiting time of the processor 110, and thus improves the efficiency of the system.

在一些实施例中，处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit，I2C)接口，集成电路内置音频(inter-integrated circuitsound，I2S)接口，脉冲编码调制(pulse code modulation，PCM)接口，通用异步收发传播器(universal asynchronous receiver/transmitter，UART)接口，移动产业处理器接口(mobile industry processor interface，MIPI)，通用输入输出(general-purposeinput/output，GPIO)接口，用户标识模块(subscriber identity module，SIM)接口，和/或通用串行总线(universal serial bus，USB)接口等。In some embodiments, the processor 110 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.

可以理解的是，本实施例示意的各模块间的接口连接关系，只是示意性说明，并不构成对手机100的结构限定。在另一些实施例中，手机100也可以采用上述实施例中不同的接口连接方式，或多种接口连接方式的组合。It is understandable that the interface connection relationship between the modules shown in this embodiment is only a schematic illustration and does not constitute a structural limitation on the mobile phone 100. In other embodiments, the mobile phone 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.

充电管理模块140用于从充电器接收充电输入。其中，充电器可以是无线充电器，也可以是有线充电器。充电管理模块140为电池142充电的同时，还可以通过电源管理模块141为电子设备供电。The charging management module 140 is used to receive charging input from a charger. The charger can be a wireless charger or a wired charger. While the charging management module 140 is charging the battery 142, it can also power the electronic device through the power management module 141.

电源管理模块141用于连接电池142，充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入，为处理器110，内部存储器121，外部存储器，显示屏194，摄像头193，和无线通信模块160等供电。在一些实施例中，电源管理模块141和充电管理模块140也可以设置于同一个器件中。The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160. In some embodiments, the power management module 141 and the charging management module 140 can also be set in the same device.

手机100的无线通信功能可以通过天线1，天线2，移动通信模块150，无线通信模块160，调制解调处理器以及基带处理器等实现。在一些实施例中，手机100的天线1和移动通信模块150耦合，天线2和无线通信模块160耦合，使得手机100可以通过无线通信技术与网络以及其他设备通信。The wireless communication function of the mobile phone 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor. In some embodiments, the antenna 1 of the mobile phone 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the mobile phone 100 can communicate with the network and other devices through wireless communication technology.

天线1和天线2用于发射和接收电磁波信号。手机100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用，以提高天线的利用率。例如，可以将天线1复用为无线局域网的分集天线。在另外一些实施例中，天线可以和调谐开关结合使用。Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in mobile phone 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve the utilization of antennas. For example, antenna 1 can be reused as a diversity antenna for a wireless local area network. In some other embodiments, the antenna can be used in combination with a tuning switch.

移动通信模块150可以提供应用在手机100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器，开关，功率放大器，低噪声放大器(lownoise amplifier，LNA)等。移动通信模块150可以由天线1接收电磁波，并对接收的电磁波进行滤波，放大等处理，传送至调制解调处理器进行解调。The mobile communication module 150 can provide solutions for wireless communications including 2G/3G/4G/5G, etc., applied to the mobile phone 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc. The mobile communication module 150 can receive electromagnetic waves from the antenna 1, and filter, amplify, etc. the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.

移动通信模块150还可以对经调制解调处理器调制后的信号放大，经天线1转为电磁波辐射出去。在一些实施例中，移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中，移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 can be set in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 can be set in the same device as at least some modules of the processor 110.

无线通信模块160可以提供应用在手机100上的包括WLAN(如无线保真(wirelessfidelity，Wi-Fi)网络)，蓝牙(bluetooth，BT)，全球导航卫星系统(global navigationsatellite system，GNSS)，调频(frequency modulation，FM)，近距离无线通信技术(nearfield communication，NFC)，红外技术(infrared，IR)等无线通信的解决方案。The wireless communication module 160 can provide wireless communication solutions for application on the mobile phone 100, including WLAN (such as wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), nearfield communication technology (NFC), infrared technology (IR), etc.

无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波，将电磁波信号调频以及滤波处理，将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号，对其进行调频，放大，经天线2转为电磁波辐射出去。The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signal and performs filtering, and sends the processed signal to the processor 110. The wireless communication module 160 may also receive a signal to be sent from the processor 110, modulate the signal, amplify the signal, and convert it into an electromagnetic wave for radiation via the antenna 2.

手机100可以通过ISP，摄像头193，视频编解码器，GPU，显示屏194以及应用处理器等实现拍摄功能。ISP用于处理摄像头193反馈的数据。摄像头193用于捕获静态图像或视频。在一些实施例中，手机100可以包括1个或N个摄像头193，N为大于等于1的正整数。本申请实施例中，手机100启动摄像头193后，可以获取实时图像数据。The mobile phone 100 can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194 and the application processor. The ISP is used to process the data fed back by the camera 193. The camera 193 is used to capture static images or videos. In some embodiments, the mobile phone 100 may include 1 or N cameras 193, where N is a positive integer greater than or equal to 1. In the embodiment of the present application, after the mobile phone 100 starts the camera 193, real-time image data can be obtained.

摄像头193可以是飞行时间摄像头(Time of Flight，TOF)，TOF是一种深度信息测量方案，主要由红外光投射器和接收模组构成。投射器向外投射红外光，红外光遇到被测物体后反射，并被接收模组接收，通过记录红外光从发射到被接收的时间，计算出被照物体深度信息，并完成3D建模。TOF摄像头可以获取IR图像和深度图像。TOF摄像头可用于将IR光投射到人脸上，获取用户的人脸IR图像。图3示出了一种获取用户的人脸图像的应用场景示意图。例如，如图3所示，手机100可以启动红外相机，拍摄人脸IR图像。除此之外，IR图像和深度图像也可以通过3D结构光摄像头获取，但不限于此。Camera 193 can be a time of flight camera (TOF). TOF is a depth information measurement solution, which is mainly composed of an infrared light projector and a receiving module. The projector projects infrared light outward, and the infrared light is reflected after encountering the object to be measured and is received by the receiving module. By recording the time from the emission to the reception of the infrared light, the depth information of the illuminated object is calculated, and 3D modeling is completed. The TOF camera can obtain IR images and depth images. The TOF camera can be used to project IR light onto a person's face to obtain an IR image of the user's face. Figure 3 shows a schematic diagram of an application scenario for obtaining a user's face image. For example, as shown in Figure 3, the mobile phone 100 can start an infrared camera to take an IR image of the face. In addition, IR images and depth images can also be obtained through a 3D structured light camera, but are not limited to this.

手机100通过GPU，显示屏194，以及应用处理器等实现显示功能。GPU为图像处理的微处理器，连接显示屏194和应用处理器。GPU用于执行数学和几何计算，用于图形渲染。处理器110可包括一个或多个GPU，其执行程序指令以生成或改变显示信息。The mobile phone 100 implements the display function through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

显示屏194用于显示图像，视频等。该显示屏194包括显示面板。本申请实施例中，注视信息可以包括用户在手机100的显示屏194上的注视点信息或者注视区域信息。The display screen 194 is used to display images, videos, etc. The display screen 194 includes a display panel. In the embodiment of the present application, the gaze information may include the gaze point information or gaze area information of the user on the display screen 194 of the mobile phone 100 .

外部存储器接口120可以用于连接外部存储卡，例如Micro SD卡，实现扩展手机100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信，实现数据存储功能。例如将音乐，视频等文件保存在外部存储卡中。The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, such as storing music, video and other files in the external memory card.

内部存储器121可以用于存储计算机可执行程序代码，所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令，从而执行手机100的各种功能应用以及数据处理。例如，在本申请实施例中，处理器110可以通过执行存储在内部存储器121中的指令，内部存储器121可以包括存储程序区和存储数据区。The internal memory 121 may be used to store computer executable program codes, which include instructions. The processor 110 executes various functional applications and data processing of the mobile phone 100 by running the instructions stored in the internal memory 121. For example, in an embodiment of the present application, the processor 110 may execute the instructions stored in the internal memory 121, and the internal memory 121 may include a program storage area and a data storage area.

其中，存储程序区可存储操作系统，至少一个功能所需的应用程序(比如声音播放功能，业务抢占功能等)等。存储数据区可存储手机100使用过程中所创建的数据(比如音频数据，电话本等)等。此外，内部存储器121可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件，闪存器件，通用闪存存储器(universal flashstorage，UFS)等。The program storage area may store an operating system, an application required for at least one function (such as a sound playback function, a service preemption function, etc.), etc. The data storage area may store data created during the use of the mobile phone 100 (such as audio data, a phone book, etc.), etc. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), etc.

手机100可以通过音频模块170，扬声器170A，受话器170B，麦克风170C，耳机接口170D，以及应用处理器等实现音频功能。例如音乐播放，录音等。The mobile phone 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.

按键190包括开机键，音量键等。按键190可以是机械按键。也可以是触摸式按键。马达191可以产生振动提示。马达191可以用于来电振动提示，也可以用于触摸振动反馈。指示器192可以是指示灯，可以用于指示充电状态，电量变化，也可以用于指示消息，未接来电，通知等。SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195，或从SIM卡接口195拔出，实现和手机100的接触和分离。手机100可以支持1个或N个SIM卡接口，N为大于等于1的正整数。SIM卡接口195可以支持Nano SIM卡，Micro SIM卡，SIM卡等。The button 190 includes a power button, a volume button, etc. The button 190 can be a mechanical button. It can also be a touch button. The motor 191 can generate a vibration prompt. The motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback. The indicator 192 can be an indicator light, which can be used to indicate the charging status, power changes, messages, missed calls, notifications, etc. The SIM card interface 195 is used to connect the SIM card. The SIM card can be inserted into the SIM card interface 195, or pulled out from the SIM card interface 195 to achieve contact and separation with the mobile phone 100. The mobile phone 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than or equal to 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.

本申请实施例提供一种注视点估计方法，该方法可以应用于具有上述硬件结构的电子设备(如手机100)中。图4示出了一种注视点估计方法的流程示意图。如图4所示，本申请实施例提供的注视点估计方法可以包括如下步骤：The present application embodiment provides a method for estimating a gaze point, which can be applied to an electronic device (such as a mobile phone 100) having the above hardware structure. FIG4 shows a flow chart of a method for estimating a gaze point. As shown in FIG4, the method for estimating a gaze point provided by the present application embodiment may include the following steps:

401：手机100通过摄像头193采集用户的IR图像和深度图像。401 : The mobile phone 100 collects the IR image and depth image of the user through the camera 193 .

由于IR图像的拍照效果不受环境光干扰。因此，即使在黑暗或者过亮的场景中，手机100拍摄到的IR图像里面也可以有较清晰的用户面部图像，并从该IR图像中提取到用户的眼部区域图像(即下文中的第一眼部区域图像)，以及该用户眼部区域图像中用户的眼部关键点的位置信息。由于用户的IR图像和深度图像的内容基本相同，且相同内容在各自图像中的位置基本相同，因此，可以通过前述得到的该用户眼部区域图像中用户的眼部关键点的位置信息从深度图像中得到该图对应的用于指示用户的眼部关键点的三维特征(即下文中的第二注视特征)。而且，手机100也可以从IR图像对应的眼部区域图像中提取到用于指示用户的眼部与注视相关的二维特征(即下文中的第一注视特征)。这样，手机100便可以基于前述两种注视特征信息确定用户在显示屏194上的注视信息。Since the photographic effect of the IR image is not affected by ambient light. Therefore, even in a dark or overly bright scene, the IR image captured by the mobile phone 100 can also contain a clearer user facial image, and the user's eye area image (i.e., the first eye area image hereinafter) and the position information of the user's eye key points in the user's eye area image are extracted from the IR image. Since the contents of the user's IR image and the depth image are basically the same, and the positions of the same content in each image are basically the same, the three-dimensional features corresponding to the image for indicating the user's eye key points can be obtained from the depth image through the aforementioned position information of the user's eye key points in the user's eye area image. In addition, the mobile phone 100 can also extract two-dimensional features (i.e., the first gaze feature hereinafter) for indicating the user's eyes and gaze-related from the eye area image corresponding to the IR image. In this way, the mobile phone 100 can determine the user's gaze information on the display screen 194 based on the aforementioned two types of gaze feature information.

在一些实施例中，手机100通过摄像头193采集用户的IR图像和深度图像需要触发条件，下面示例性介绍几种在触发条件下，IR图像和深度图像的获取：In some embodiments, the mobile phone 100 needs a trigger condition to collect the IR image and depth image of the user through the camera 193. The following examples introduce several ways to obtain the IR image and depth image under the trigger conditions:

方式1：Method 1:

手机100中可以设置一些应用，该应用可以支持本申请的注视点估计方法，则手机100可以在启动该应用后，周期性通过预设摄像头193采集用户的红外线图像和深度图像。具体地，手机100接收第一操作，第一操作用于触发手机100启动预设应用；响应于第一操作，启动预设应用后，手机100周期性通过预设摄像头193采集用户的红外线图像和深度图像。预设应用可以是电子书应用，浏览器应用，新闻应用等，但不限于此。Some applications can be set in the mobile phone 100, and the application can support the gaze point estimation method of the present application. Then, after starting the application, the mobile phone 100 can periodically collect infrared images and depth images of the user through the preset camera 193. Specifically, the mobile phone 100 receives a first operation, and the first operation is used to trigger the mobile phone 100 to start the preset application; in response to the first operation, after starting the preset application, the mobile phone 100 periodically collects infrared images and depth images of the user through the preset camera 193. The preset application can be an e-book application, a browser application, a news application, etc., but is not limited thereto.

方式2：Method 2:

协同工作的模式可以是指手机100可以确定用户的注视点，根据用户的注视点确定用户的意向，基于用户的意向执行相应的操作的状态。例如，在用户阅读电子书的场景中，手机可以确定用户的注视点，根据用户的注视点判断用户是否有翻页的意向，若有，则主动为用户提供翻页操作。又如，在浏览网页的场景中，手机100可以确定用户的注视点，根据用户的注视点判断用户感兴趣的内容，主动为用户推荐用户感兴趣的内容。若手机100处于协同工作模式，则手机100也可以周期性通过预设摄像头193采集用户的红外线图像和深度图像，以便于手机100更准确的检测出用户在显示屏194上的注视信息。具体的，若手机100处于预设协同工作模式，手机100周期性通过摄像头193采集用户的红外线图像和深度图像。The collaborative working mode may refer to a state in which the mobile phone 100 can determine the user's gaze point, determine the user's intention based on the user's gaze point, and perform corresponding operations based on the user's intention. For example, in a scenario where a user is reading an e-book, the mobile phone can determine the user's gaze point, determine whether the user has the intention to turn the page based on the user's gaze point, and if so, actively provide the user with a page turning operation. For another example, in a scenario where a web page is browsed, the mobile phone 100 can determine the user's gaze point, determine the content that the user is interested in based on the user's gaze point, and actively recommend the user the content that the user is interested in. If the mobile phone 100 is in a collaborative working mode, the mobile phone 100 can also periodically collect the user's infrared image and depth image through the preset camera 193, so that the mobile phone 100 can more accurately detect the user's gaze information on the display screen 194. Specifically, if the mobile phone 100 is in a preset collaborative working mode, the mobile phone 100 periodically collects the user's infrared image and depth image through the camera 193.

其中，在一些实施例中。手机100可以响应于用户对预设协同开关的开启操作，进入预设协同工作模式，预设协同开关可以配置在手机100的设置界面中，或者，预设协同开关配置在手机100的控制中心中，或者，预设协同开关配置在手机100的预设应用中。In some embodiments, the mobile phone 100 may enter a preset collaborative working mode in response to a user turning on a preset collaborative switch, and the preset collaborative switch may be configured in a setting interface of the mobile phone 100, or in a control center of the mobile phone 100, or in a preset application of the mobile phone 100.

方式3：Method 3:

手机100中设置一些应用，该些应用需要手机100开启协同工作的模式以及手机100在启动该应用的条件并存的情况下，周期性通过预设摄像头193采集用户的红外线图像和深度图像。Some applications are set in the mobile phone 100. These applications require the mobile phone 100 to start a collaborative working mode and the mobile phone 100 periodically collects infrared images and depth images of the user through the preset camera 193 when the conditions for starting the application exist.

其中，预设协同开关可以配置在手机100的设置界面中，或者，预设协同开关配置在手机100的控制中心中，或者，预设协同开关配置在手机100的预设应用中，手机100可以响应于用户对该些任意一种预设协同开关的开启操作，进入预设协同工作模式。Among them, the preset collaborative switch can be configured in the setting interface of the mobile phone 100, or the preset collaborative switch can be configured in the control center of the mobile phone 100, or the preset collaborative switch can be configured in the preset application of the mobile phone 100. The mobile phone 100 can enter the preset collaborative working mode in response to the user's turning on operation of any of the preset collaborative switches.

需要说明的是，手机100可以在启动该应用之前或者之后，开启协同工作的模式。It should be noted that the mobile phone 100 can start the collaborative working mode before or after starting the application.

方式4：Method 4:

在一些实施例中，手机100可以在开机状态下，支持本申请的注视点估计方法。那么，手机100只要检测用户注视显示屏194，就可以执行本方案的注视点估计方法，以此来提高手机100执行本方案的注视点估计方法的时长。这样，在手机100根据用户的注视点，协助用户工作的场景中，可以提高手机100准确协助用户工作的时长，从而提高用户体验。具体的，手机100在检测到用户注视显示屏194时，通过预设摄像头193采集红外线图像和深度图像。In some embodiments, the mobile phone 100 can support the gaze point estimation method of the present application when it is turned on. Then, as long as the mobile phone 100 detects that the user is looking at the display screen 194, the gaze point estimation method of the present solution can be executed, thereby increasing the duration for which the mobile phone 100 executes the gaze point estimation method of the present solution. In this way, in a scenario where the mobile phone 100 assists the user in working according to the user's gaze point, the duration for which the mobile phone 100 accurately assists the user in working can be increased, thereby improving the user experience. Specifically, when the mobile phone 100 detects that the user is looking at the display screen 194, the infrared image and the depth image are collected by the preset camera 193.

402：手机100基于用户的IR图像和深度图像，确定用户在显示屏194上的注视信息。402: The mobile phone 100 determines the user's gaze information on the display screen 194 based on the user's IR image and depth image.

注视信息可以包括用户在手机100的显示屏194上的注视点信息、注视区域信息、或者注视点信息和注视区域信息，其中，注视点信息可以为注视点的坐标，注视区域信息可以为注视区域的标识。例如，如图5所示，手机100的显示屏194可以分为多个注视区域：注视区域1至注视区域8，注视区域信息可以为该些注视区域的标识：数字1至8。注视点信息可以为用户在手机100的显示屏194上的注视点坐标(x1，y2)。The gaze information may include the gaze point information, the gaze area information, or the gaze point information and the gaze area information of the user on the display screen 194 of the mobile phone 100, wherein the gaze point information may be the coordinates of the gaze point, and the gaze area information may be the identification of the gaze area. For example, as shown in FIG5 , the display screen 194 of the mobile phone 100 may be divided into a plurality of gaze areas: gaze area 1 to gaze area 8, and the gaze area information may be the identification of the gaze areas: numbers 1 to 8. The gaze point information may be the coordinates (x1, y2) of the gaze point of the user on the display screen 194 of the mobile phone 100.

手机100可以基于用户的I R图像和深度图像，确定用户在显示屏194上的注视信息，具体的，下面提供一种具体的实现方式。例如，图5示出了一种注视点估计方法的流程示意图。如图5所示，包括如下步骤501至504：The mobile phone 100 can determine the user's gaze information on the display screen 194 based on the user's IR image and depth image. Specifically, a specific implementation method is provided below. For example, FIG5 shows a flowchart of a gaze point estimation method. As shown in FIG5, the following steps 501 to 504 are included:

501：手机100识别I R图像，得到用户的第一眼部区域图像和第一眼部区域图像中用户的眼部关键点的第一位置信息；其中，第一位置信息可以为用户的眼部关键点的每个像素在I R图像中的坐标，该坐标可以为预设二维坐标系下的x轴坐标和y轴坐标维坐。501: The mobile phone 100 identifies an IR image, obtains a first eye area image of a user and first position information of an eye key point of the user in the first eye area image; wherein the first position information may be coordinates of each pixel of the eye key point of the user in the IR image, and the coordinates may be x-axis coordinates and y-axis coordinates in a preset two-dimensional coordinate system.

可以理解，眼部关键点可以包括人眼及与人眼周围的组成部件。例如，眉毛、眼皮、眼球、眼睫毛、眼角、瞳孔等。It can be understood that the key points of the eyes may include the human eye and components around the human eye, such as eyebrows, eyelids, eyeballs, eyelashes, corners of the eyes, pupils, etc.

可以理解，手机100可以通过一些目标检测算法，识别I R图像，得到用户的第一眼部区域图像，以及第一眼部区域图像中用户的眼部关键点。例如，如图1所示的手机100从IR图像得到的用户的第一眼部区域图像E。It is understood that the mobile phone 100 can identify the IR image through some target detection algorithms to obtain the first eye area image of the user and the eye key points of the user in the first eye area image. For example, the first eye area image E of the user obtained by the mobile phone 100 from the IR image as shown in FIG1 .

预设二维坐标系可以为手机100预先设置的二维坐标系。如此，手机100在得到第一眼部区域图像中用户的眼部关键点之后，便可以基于该二维坐标系得到第一眼部区域图像中用户的眼部关键点的的每个像素点的坐标，这些像素点坐标构成了第一位置信息。The preset two-dimensional coordinate system may be a two-dimensional coordinate system pre-set by the mobile phone 100. Thus, after obtaining the eye key points of the user in the first eye area image, the mobile phone 100 may obtain the coordinates of each pixel point of the eye key points of the user in the first eye area image based on the two-dimensional coordinate system, and these pixel point coordinates constitute the first position information.

502：手机100基于第一眼部区域图像得到第一注视特征，其中，第一注视特征用于指示用户的眼部与注视相关的二维特征。502: The mobile phone 100 obtains a first gaze feature based on the first eye region image, wherein the first gaze feature is used to indicate a two-dimensional feature of the user's eyes related to gaze.

眼部区域图像中具有用户眼部与注视相关的很多特征信息，于是，手机100可以基于眼部区域图像得到用于指示用户的眼部与注视相关的特征。用户的眼部与注视相关的特征可以包括人眼及与人眼周围的组成部分的信息，例如，眉毛、眼皮、眼球、眼睫毛、眼角、瞳孔等的信息。具体的，第一注视特征可以包括人眼的形态和状态信息。人眼的形态信息可以是人眼的形状和大小，例如瞳孔形状和大小、眼皮类型、眼部轮廓大小。人眼的状态信息可以是人眼的动态信息，例如，人眼的状态信息可以是眼球转动状态、眼球位姿、眼角开合度、瞳孔方向等信息。The eye area image contains a lot of feature information related to the user's eyes and gaze, so the mobile phone 100 can obtain features related to the user's eyes and gaze based on the eye area image. The user's eyes and gaze-related features may include information about the human eye and components around the human eye, such as eyebrows, eyelids, eyeballs, eyelashes, eye corners, pupils, etc. Specifically, the first gaze feature may include morphology and state information of the human eye. The morphological information of the human eye may be the shape and size of the human eye, such as the shape and size of the pupil, the type of eyelids, and the size of the eye contour. The state information of the human eye may be dynamic information of the human eye, for example, the state information of the human eye may be information such as the rotation state of the eyeball, the position of the eyeball, the degree of opening of the eye corners, and the direction of the pupil.

503：手机100基于第一位置信息从深度图像中得到第二注视特征，其中，第二注视特征用于指示用户的眼部关键点的三维特征。503: The mobile phone 100 obtains a second gaze feature from the depth image based on the first position information, wherein the second gaze feature is used to indicate a three-dimensional feature of a key point of an eye of the user.

由于用户的I R图像和深度图像的内容基本相同，且相同内容在各自图像中的位置基本相同，因此，手机100便可以通过前述得到的第一位置信息，从同样位于预设二维坐标系下的深度图像中，得到用于指示用户的眼部关键点的三维特征的第二注视特征。Since the contents of the user's IR image and depth image are basically the same, and the positions of the same content in their respective images are basically the same, the mobile phone 100 can use the first position information obtained above to obtain a second gaze feature indicating the three-dimensional features of the user's eye key points from the depth image that is also located in the preset two-dimensional coordinate system.

第二注视特征包括用户眼部的3D信息，用户眼部的3D信息包括三维坐标信息(x，y，z)，其中，x表示用户的人眼在预设三维坐标系下的x轴位置，y表示用户的人眼在预设三维坐标系下的y轴位置，z表示用户与手机100的摄像头193的光心之间的距离。预设三维坐标系可以为以摄像头193的光心为原点的三维坐标系。The second gaze feature includes 3D information of the user's eyes, and the 3D information of the user's eyes includes three-dimensional coordinate information (x, y, z), wherein x represents the x-axis position of the user's eyes in a preset three-dimensional coordinate system, y represents the y-axis position of the user's eyes in the preset three-dimensional coordinate system, and z represents the distance between the user and the optical center of the camera 193 of the mobile phone 100. The preset three-dimensional coordinate system may be a three-dimensional coordinate system with the optical center of the camera 193 as the origin.

在一些实施例中，手机100可以基于第一位置信息从同样位于预设二维坐标系下的深度图像中得到第二注视特征。In some embodiments, the mobile phone 100 may obtain the second gaze feature from a depth image also located in a preset two-dimensional coordinate system based on the first position information.

在一些实施例中，预设三维坐标系与前述预设二维坐标系有对应映射关系。手机100便可以基于第一位置信息从该映射关系中得到深度图像中的第二注视特征。504：手机100根据第一注视特征和第二注视特征，确定用户在显示屏194上的注视信息。In some embodiments, the preset three-dimensional coordinate system has a corresponding mapping relationship with the aforementioned preset two-dimensional coordinate system. The mobile phone 100 can obtain the second gaze feature in the depth image from the mapping relationship based on the first position information. 504: The mobile phone 100 determines the user's gaze information on the display screen 194 based on the first gaze feature and the second gaze feature.

由于第一注视特征为用于指示用户的眼部与注视相关的二维特征，第二注视特征为用于指示用户的眼部关键点的三维特征。用户的眼部与注视相关的二维特征和用户的眼部关键点的三维特征可以预测用户眼睛的朝向，因此，手机100可以根据第一注视特征和第二注视特征，先确定用户眼睛的朝向，再基于用户眼睛的朝向，确定用户在显示屏194上的注视信息。Since the first gaze feature is a two-dimensional feature for indicating the user's eyes and gaze-related features, and the second gaze feature is a three-dimensional feature for indicating the user's eye key points, the two-dimensional features of the user's eyes and gaze-related features and the three-dimensional features of the user's eye key points can predict the direction of the user's eyes, so the mobile phone 100 can first determine the direction of the user's eyes according to the first gaze feature and the second gaze feature, and then determine the user's gaze information on the display screen 194 based on the direction of the user's eyes.

具体的，包括如下步骤5041-5042：Specifically, the process includes the following steps 5041-5042:

5041：手机100根据第一注视特征，获取第一人眼位姿参数；根据第二注视特征，获取第二人眼位姿参数；其中，人眼位姿参数包括用户眼睛的视线朝向信息，用户眼睛的视线朝向信息包括用户眼睛在预设三维坐标系下的旋转角、俯仰角和航向角；预设三维坐标系可以为以摄像头193的光心为原点的三维坐标系。5041: The mobile phone 100 obtains a first eye posture parameter according to the first gaze feature; and obtains a second eye posture parameter according to the second gaze feature; wherein the eye posture parameter includes the sight direction information of the user's eyes, and the sight direction information of the user's eyes includes the rotation angle, pitch angle and heading angle of the user's eyes in a preset three-dimensional coordinate system; the preset three-dimensional coordinate system can be a three-dimensional coordinate system with the optical center of the camera 193 as the origin.

5042：手机100基于第一人眼位姿参数和第二人眼位姿参数确定用户在显示屏上的注视信息。5042: The mobile phone 100 determines the user's gaze information on the display screen based on the first eye posture parameter and the second eye posture parameter.

在一些实施例中，人眼位姿参数与用户在显示屏194上的注视信息之间存在预设对应关系，手机100可以利用第一人眼位姿参数和第二人眼位姿参数从预设对应关系确定用户在显示屏194上的注视信息。In some embodiments, there is a preset correspondence between the eye posture parameters and the user's gaze information on the display screen 194. The mobile phone 100 can use the first eye posture parameters and the second eye posture parameters to determine the user's gaze information on the display screen 194 from the preset correspondence.

可以理解，模型可以具有强大快捷的预测功能，利用人眼注视信息估计模型，得到用户在显示屏上的注视信息，可以提高注视信息的估计速度，提高用户体验，且在电子设备根据用户的注视点，协助用户工作的场景中，可以提高了电子设备协助用户工作的效率，提升用户体验。于是，在一些实施例中，步骤402的功能可以集成在人眼注视信息估计模型。这样，手机100便可以将用户的IR图像和深度图像输入已经训练好的人眼注视信息估计模型中，利用人眼注视信息估计模型确定用户在手机100的显示屏194的注视信息。It can be understood that the model can have a powerful and fast prediction function. By using the human eye gaze information estimation model to obtain the user's gaze information on the display screen, the estimation speed of the gaze information can be improved, the user experience can be improved, and in the scenario where the electronic device assists the user in working according to the user's gaze point, the efficiency of the electronic device in assisting the user in working can be improved, and the user experience can be improved. Therefore, in some embodiments, the function of step 402 can be integrated into the human eye gaze information estimation model. In this way, the mobile phone 100 can input the user's IR image and depth image into the trained human eye gaze information estimation model, and use the human eye gaze information estimation model to determine the user's gaze information on the display screen 194 of the mobile phone 100.

例如，图6示出了一种人眼注视信息估计模型的计算示意图。如图6所示，人眼注视信息估计模型包括眼部区域图像确定模块1、第一注视特征确定模块2、第二注视特征确定模块3和注视信息输出模块4。For example, Fig. 6 shows a calculation schematic diagram of a human eye gaze information estimation model. As shown in Fig. 6, the human eye gaze information estimation model includes an eye region image determination module 1, a first gaze feature determination module 2, a second gaze feature determination module 3 and a gaze information output module 4.

眼部区域图像确定模块1具有执行上述步骤501的内容的功能。The eye area image determination module 1 has the function of executing the content of the above step 501 .

第一注视特征确定模块2具有执行上述步骤502的内容的功能。The first gaze feature determination module 2 has the function of executing the content of the above step 502 .

第二注视特征确定模块3具有执行上述步骤503的内容的功能。The second gaze feature determination module 3 has the function of executing the content of the above step 503 .

注视信息输出模块4具有执行上述步骤504的内容的功能。注视信息输出模块4可以是具有分类功能的模型，可以包括全连接层。The gaze information output module 4 has the function of executing the content of the above step 504. The gaze information output module 4 may be a model with a classification function, and may include a fully connected layer.

可以理解，人眼注视信息估计模型在初步建立阶段，可以执行以下步骤1和步骤2，以得到用于训练上述人眼注视信息估计模型的多个训练样本。步骤1：采集用户的IR图像和深度图。步骤2，用户的IR图像和深度图对应的用户在显示屏194上的真实注视信息。It can be understood that in the initial establishment stage of the human eye gaze information estimation model, the following steps 1 and 2 can be performed to obtain multiple training samples for training the above human eye gaze information estimation model. Step 1: Collect the user's IR image and depth map. Step 2: The user's real gaze information on the display screen 194 corresponding to the user's IR image and depth map.

可以理解，初步建立人眼注视信息估计模型的阶段中是采用上述训练样本，训练人眼注视信息估计模型的。因此，经过多次样本训练后的人眼注视信息估计模型，则可以具备采用用户的IR图像和深度图，得到用户的IR图像和深度图对应的用户在显示屏194上的注视信息。并且，样本训练的次数越多，人眼注视信息估计模型得到的用户在显示屏194上的注视信息的准确度越高。因此，本申请实施例中，手机100中预先配置的人眼注视信息估计模型可以是经过大量样本训练的AI模型。It can be understood that the above-mentioned training samples are used to train the human eye gaze information estimation model in the stage of initially establishing the human eye gaze information estimation model. Therefore, the human eye gaze information estimation model after multiple sample trainings can be equipped with the user's IR image and depth map to obtain the user's gaze information on the display screen 194 corresponding to the user's IR image and depth map. Moreover, the more times the sample is trained, the higher the accuracy of the user's gaze information on the display screen 194 obtained by the human eye gaze information estimation model. Therefore, in the embodiment of the present application, the human eye gaze information estimation model pre-configured in the mobile phone 100 can be an AI model that has been trained with a large number of samples.

示例性的，上述AI模型可以为卷积神经网络，例如上述AI模型可以是基于卷积层、池化层、全连接层的模型结构，本申请实施例对此不作限制。Exemplarily, the above-mentioned AI model can be a convolutional neural network. For example, the above-mentioned AI model can be a model structure based on a convolutional layer, a pooling layer, and a fully connected layer. The embodiments of the present application are not limited to this.

若前述人眼注视信息估计模型可以输出包含注视点信息和注视区域信息的注视信息。则前述人眼注视信息估计模型通过分类的方法输出注视区域，并通过联合多任务(分类和回归)学习的方式提升注视点估计精度。与业界已有方案相比，本方案算法鲁棒性更高，使用场景更丰富。综上，即使手机100在场景过暗或过亮的环境拍摄用户，手机100也可以获取到用户的眼部几何特征，进而利用深度图像准确确定用户的注视点，使手机100成功协助用户工作，提高了手机100协助用户工作的可靠性。If the aforementioned human eye gaze information estimation model can output gaze information including gaze point information and gaze area information. Then the aforementioned human eye gaze information estimation model outputs the gaze area through a classification method, and improves the accuracy of gaze point estimation through a joint multi-task (classification and regression) learning method. Compared with existing solutions in the industry, the algorithm of this solution is more robust and has richer usage scenarios. In summary, even if the mobile phone 100 shoots the user in an environment where the scene is too dark or too bright, the mobile phone 100 can also obtain the user's eye geometric features, and then use the depth image to accurately determine the user's gaze point, so that the mobile phone 100 successfully assists the user in his work, thereby improving the reliability of the mobile phone 100 in assisting the user in his work.

在一些情况下，用户的头部在一定时间内可能不是一直静止的，还可以转向、抬头、低头等。如此，手机100拍摄的用户的IR图像和深度图像中可能就不存在用户的眼部图像了。若手机100拍摄的用户的IR图像和深度图像中不存在用户的眼部图像，手机100执行该步骤要消耗一定的能量和时间。为了节省手机100执行该步骤所消耗的能量和时间，手机100也可以先判断用户的IR图像中是不是存在用户的眼部图像，若存在，则执行该步骤，若不存在，则重新执行步骤401，即通过摄像头193再一次采集用户的IR图像和深度图像。In some cases, the user's head may not be still for a certain period of time, and may turn, look up, or lower the head, etc. In this way, the user's eye image may not exist in the IR image and depth image of the user taken by the mobile phone 100. If the user's eye image does not exist in the IR image and depth image of the user taken by the mobile phone 100, the mobile phone 100 will consume a certain amount of energy and time to execute this step. In order to save the energy and time consumed by the mobile phone 100 to execute this step, the mobile phone 100 may also first determine whether the user's eye image exists in the user's IR image. If so, execute this step. If not, re-execute step 401, that is, collect the user's IR image and depth image again through the camera 193.

手机100可以利用用户在手机100的显示屏194的注视信息完成手机100与用户协同工作。例如，图7示出了一种手机100的界面示意图。如图7所示，手机100中有电子书应用，用户点击电子书应用图标A后，阅读电子书。在用户阅读电子书的过程中，手机100可以确定用户的注视点，根据用户的注视点判断用户是否有翻页的意向。例如，图8示出了一种电子书应用的界面示意图。如图8所示，手机100可以确定用户的注视点是否在转到前一页按钮B1或者转到后一页按钮B2，若在转到后一页按钮B2上，则主动为用户提供翻到下页操作。这样，可以提高用户对手机100与用户协同工作的智能体验。The mobile phone 100 can use the user's gaze information on the display screen 194 of the mobile phone 100 to complete the collaborative work between the mobile phone 100 and the user. For example, FIG7 shows a schematic diagram of the interface of the mobile phone 100. As shown in FIG7, there is an e-book application in the mobile phone 100. After the user clicks the e-book application icon A, the e-book is read. In the process of the user reading the e-book, the mobile phone 100 can determine the user's gaze point and determine whether the user has the intention to turn the page based on the user's gaze point. For example, FIG8 shows a schematic diagram of the interface of an e-book application. As shown in FIG8, the mobile phone 100 can determine whether the user's gaze point is on the go to previous page button B1 or the go to next page button B2. If it is on the go to next page button B2, it actively provides the user with a turn to the next page operation. In this way, the user's intelligent experience of the collaborative work between the mobile phone 100 and the user can be improved.

本申请实施例提供的注视点估计方法可以应用在智能手机、平板、智慧屏、AR/VR眼镜的人机交互上。The gaze point estimation method provided in the embodiments of the present application can be applied to human-computer interaction on smartphones, tablets, smart screens, and AR/VR glasses.

现有的注视点估计方法只通过RGB图像估计注视点，未考虑不同深度不同位姿拍摄图像对注视点的影响，造成注视点估计准确度降低。而本申请基于IR图像，在暗光场景下，手机100也能进行注视点估计。Existing gaze point estimation methods only estimate gaze points through RGB images, without considering the impact of images taken at different depths and postures on gaze points, resulting in reduced accuracy of gaze point estimation. However, this application is based on IR images, and the mobile phone 100 can also perform gaze point estimation in dark light scenes.

现有的人眼注视信息估计模型是以提取人脸图像和人脸网格作为输入。若用户戴口罩则可能导致人脸提取失败，从而不能够进行注视点估计。而本申请只基于眼部特征，戴口罩不影响特征提取，所以能够应对用户戴口罩的场景。The existing human eye gaze information estimation model uses the extracted face image and face mesh as input. If the user wears a mask, face extraction may fail, and thus the gaze point estimation cannot be performed. However, this application is only based on eye features, and wearing a mask does not affect feature extraction, so it can cope with the scenario where the user wears a mask.

本申请另一实施例提供了一种电子设备，该电子设备包括：存储器和一个或多个处理器。该存储器与处理器耦合。其中，上述存储器中还存储有计算机程序代码，该计算机程序代码包括计算机指令。当计算机指令被处理器执行时，电子设备可执行上述方法实施例中手机100执行的各个功能或者步骤。该电子设备的结构可以参考图2所示的手机100的结构。Another embodiment of the present application provides an electronic device, which includes: a memory and one or more processors. The memory is coupled to the processor. The memory also stores a computer program code, which includes computer instructions. When the computer instructions are executed by the processor, the electronic device can execute the various functions or steps executed by the mobile phone 100 in the above method embodiment. The structure of the electronic device can refer to the structure of the mobile phone 100 shown in Figure 2.

本申请实施例还提供一种计算机可读存储介质，该计算机存储介质包括计算机指令，当所述计算机指令在上述电子设备上运行时，使得该电子设备执行上述方法实施例中手机100执行的各个功能或者步骤。An embodiment of the present application also provides a computer-readable storage medium, which includes computer instructions. When the computer instructions are executed on the above-mentioned electronic device, the electronic device executes each function or step executed by the mobile phone 100 in the above-mentioned method embodiment.

本申请实施例还提供一种计算机程序产品，当所述计算机程序产品在计算机上运行时，使得所述计算机执行上述方法实施例中手机100执行的各个功能或者步骤。该计算机可以是上述电子设备(如手机100)。The present application also provides a computer program product, which, when executed on a computer, enables the computer to execute the functions or steps executed by the mobile phone 100 in the above method embodiment. The computer may be the above electronic device (such as the mobile phone 100).

本申请公开的机制的各实施例可以被实现在硬件、软件、固件或这些实现方法的组合中。本申请的实施例可实现为在可编程系统上执行的计算机程序或程序代码，该可编程系统包括至少一个处理器、存储系统(包括易失性和非易失性存储器和/或存储元件)、至少一个输入设备以及至少一个输出设备。The various embodiments of the mechanism disclosed in the present application can be implemented in hardware, software, firmware or a combination of these implementation methods. The embodiments of the present application can be implemented as a computer program or program code executed on a programmable system, which includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device and at least one output device.

可将程序代码应用于输入指令，以执行本申请描述的各功能并生成输出信息。可以按已知方式将输出信息应用于一个或多个输出设备。为了本申请的目的，处理系统包括具有诸如例如数字信号处理器(Digital Signal Processor，DSP)、微控制器、专用集成电路(Application Specific Integrated Circuit，ASIC)或微处理器之类的处理器的任何系统。Program code can be applied to input instructions to perform the functions described in this application and generate output information. The output information can be applied to one or more output devices in a known manner. For the purposes of this application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.

程序代码可以用高级程序化语言或面向对象的编程语言来实现，以便与处理系统通信。在需要时，也可用汇编语言或机器语言来实现程序代码。事实上，本申请中描述的机制不限于任何特定编程语言的范围。在任一情形下，该语言可以是编译语言或解释语言。Program code can be implemented with high-level programming language or object-oriented programming language to communicate with the processing system. When necessary, program code can also be implemented with assembly language or machine language. In fact, the mechanism described in this application is not limited to the scope of any specific programming language. In either case, the language can be a compiled language or an interpreted language.

在一些情况下，所公开的实施例可以以硬件、固件、软件或其任何组合来实现。所公开的实施例还可以被实现为由一个或多个暂时或非暂时性机器可读(例如，计算机可读)存储介质承载或存储在其上的指令，其可以由一个或多个处理器读取和执行。例如，指令可以通过网络或通过其他计算机可读存储介质分发。因此，机器可读存储介质可以包括用于以机器(例如，计算机)可读的形式存储或传播信息的任何机制，包括但不限于，软盘、光盘、光碟、只读存储器(CD-ROMs)、磁光盘、只读存储器(Read Only Memory，ROM)、随机存取存储器(Random Access Memory，RAM)、可擦除可编程只读存储器(Erasable ProgrammableRead Only Memory，EPROM)、电可擦除可编程只读存储器(Electrically ErasableProgrammable Read-Only Memory，EEPROM)、磁卡或光卡、闪存、或用于基于因特网以电、光、声或其他形式的传播信号来传播信息(例如，载波、红外信号数字信号等)的有形的机器可读存储器。因此，机器可读存储介质包括适合于以机器(例如计算机)可读的形式存储或传播电子指令或信息的任何类型的机器可读存储介质。In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried or stored on one or more temporary or non-temporary machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, instructions may be distributed over a network or through other computer-readable storage media. Therefore, a machine-readable storage medium may include any mechanism for storing or disseminating information in a machine (e.g., computer) readable form, including but not limited to, a floppy disk, an optical disk, an optical disk, a read-only memory (CD-ROMs), a magneto-optical disk, a read-only memory (ROM), a random access memory (RAM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a magnetic card or an optical card, a flash memory, or a tangible machine-readable memory for disseminating information (e.g., a carrier wave, an infrared signal digital signal, etc.) based on the Internet in an electrical, optical, acoustic, or other form of propagation signal. Accordingly, machine-readable storage media include any type of machine-readable storage media suitable for storing or propagating electronic instructions or information in a form readable by a machine (eg, a computer).

在附图中，可以以特定布置和/或顺序示出一些结构或方法特征。然而，应该理解，可能不需要这样的特定布置和/或排序。而是，在一些实施例中，这些特征可以以不同于说明性附图中所示的方式和/或顺序来布置。另外，在特定图中包括结构或方法特征并不意味着暗示在所有实施例中都需要这样的特征，并且在一些实施例中，可以不包括这些特征或者可以与其他特征组合。In the accompanying drawings, some structural or method features may be shown in a specific arrangement and/or order. However, it should be understood that such a specific arrangement and/or order may not be required. Instead, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative drawings. In addition, the inclusion of structural or method features in a particular figure does not mean that such features are required in all embodiments, and in some embodiments, these features may not be included or may be combined with other features.

需要说明的是，本申请各设备实施例中提到的各单元/模块都是逻辑单元/模块，在物理上，一个逻辑单元/模块可以是一个物理单元/模块，也可以是一个物理单元/模块的一部分，还可以以多个物理单元/模块的组合实现，这些逻辑单元/模块本身的物理实现方式并不是最重要的，这些逻辑单元/模块所实现的功能的组合才是解决本申请所提出的技术问题的关键。此外，为了突出本申请的创新部分，本申请上述各设备实施例并没有将与解决本申请所提出的技术问题关系不太密切的单元/模块引入，这并不表明上述设备实施例并不存在其它的单元/模块。It should be noted that the units/modules mentioned in the various device embodiments of the present application are all logical units/modules. Physically, a logical unit/module can be a physical unit/module, or a part of a physical unit/module, or can be implemented as a combination of multiple physical units/modules. The physical implementation method of these logical units/modules themselves is not the most important. The combination of functions implemented by these logical units/modules is the key to solving the technical problems proposed by the present application. In addition, in order to highlight the innovative part of the present application, the above-mentioned device embodiments of the present application do not introduce units/modules that are not closely related to solving the technical problems proposed by the present application, which does not mean that there are no other units/modules in the above-mentioned device embodiments.

需要说明的是，在本专利的示例和说明书中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个”限定的要素，并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。It should be noted that in the examples and description of this patent, relational terms such as first and second, etc. are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or device. In the absence of further restrictions, the elements defined by the sentence "including one" do not exclude the existence of other identical elements in the process, method, article or device including the elements.

虽然通过参照本申请的某些优选实施例，已经对本申请进行了图示和描述，但本领域的普通技术人员应该明白，可以在形式上和细节上对其作各种改变，而不偏离本申请的精神和范围。Although the present application has been illustrated and described with reference to certain preferred embodiments thereof, it will be apparent to those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims

1. The method for estimating the gaze point is characterized in that the method is applied to electronic equipment, the electronic equipment comprises a preset camera and a display screen, and the method comprises the following steps:

acquiring an infrared image and a depth image of a user through the preset camera;

Identifying an infrared image to obtain a first eye region image of a user and first position information of eye key points of the user in the first eye region image;

Obtaining a first fixation feature based on the first eye region image, wherein the first fixation feature is used for indicating two-dimensional features of eyes of a user related to fixation;

Obtaining a second fixation feature from the depth image based on the first position information, wherein the second fixation feature is used for indicating three-dimensional features of eye key points of a user;

acquiring first eye pose parameters according to the first gazing characteristics;

Acquiring a second human eye pose parameter according to the second gazing feature; the first human eye pose parameter and the second human eye pose parameter comprise sight line orientation information of eyes of a user, and the sight line orientation information of the eyes of the user comprises a rotation angle, a pitch angle and a course angle of the eyes of the user under a preset three-dimensional coordinate system; the preset three-dimensional coordinate system takes the optical center of the preset camera as an origin;

and determining the gazing information of the user on the display screen based on the first human eye pose parameter and the second human eye pose parameter.

2. The method of claim 1, wherein there is a preset correspondence between the first and second human eye pose parameters and gaze information of the user on the display screen;

the determining, based on the first human eye pose parameter and the second human eye pose parameter, gaze information of the user on the display screen includes:

and determining the gazing information of the user on the display screen according to the preset corresponding relation by utilizing the first human eye pose parameter and the second human eye pose parameter.

3. The method according to claim 1 or 2, wherein prior to the deriving a first gaze feature based on the first eye region image, the method further comprises:

and determining that the infrared image comprises a human eye image.

4. A method according to any one of claims 1-3, wherein the capturing of the infrared image and the depth image of the user by the preset camera comprises:

receiving a first operation, wherein the first operation is used for triggering the electronic equipment to start a preset application;

And responding to the first operation, and periodically acquiring the infrared image and the depth image of the user through the preset camera after the preset application is started.

5. A method according to any one of claims 1-3, wherein the capturing of the infrared image and the depth image of the user by the preset camera comprises:

And if the electronic equipment is in a preset cooperative working mode, periodically acquiring the infrared image and the depth image of the user through the preset camera.

6. The method of claim 5, wherein prior to the capturing the infrared image and the depth image of the user by the preset camera, the method further comprises:

responding to the starting operation of a user on a preset cooperative switch, and entering a preset cooperative working mode;

the preset cooperative switch is configured in a setting interface of the electronic equipment, or is configured in a control center of the electronic equipment, or is configured in a preset application of the electronic equipment.

7. The method according to any one of claims 1-6, wherein the capturing, by the preset camera, an infrared image and a depth image of a user comprises:

And when the user is detected to watch the display screen, acquiring the infrared image and the depth image through the preset camera.

8. The method according to any one of claims 1-7, wherein the preset camera is a TOF camera or a 3D structured light camera.

9. The method for estimating the gaze point is characterized in that the method is applied to electronic equipment, the electronic equipment comprises a preset camera and a display screen, and the method comprises the following steps:

Taking the infrared image and the depth image as input, and operating a human eye gazing information estimation model to obtain gazing information of the user on the display screen;

wherein, the human eye gazing information estimation model is used for:

10. An electronic device comprising a memory and one or more processors; the memory is used for storing code instructions; the processor is configured to execute the code instructions to cause the electronic device to perform the method of any of claims 1-9.

11. A computer readable storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 1-9.