
CN103809733B - Human-computer interaction system and method - Google Patents


Info

Publication number
CN103809733B
CN103809733B (application CN201210440197.1A)
Authority
CN
China
Prior art keywords
user
hand
interactive operation
image data
man
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210440197.1A
Other languages
Chinese (zh)
Other versions
CN103809733A (en)
Inventor
孙迅
陈茂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to CN201810619648.5A priority Critical patent CN108845668B
Priority to CN201210440197.1A priority patent CN103809733B
Priority to KR1020130050237A priority patent KR102110811B1
Priority to US14/071,180 priority patent US9684372B2
Publication of CN103809733A publication Critical patent CN103809733A
Application granted granted Critical
Publication of CN103809733B publication Critical patent CN103809733B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 Detection arrangements using opto-electronic means
    • G06F3/033 Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/0346 Pointing devices with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • G06F3/041 Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F3/042 Digitisers characterised by opto-electronic transducing means

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A human-computer interaction system and method are provided. The system includes: an image acquisition device that acquires image data; a human-computer interaction processing device that determines the interactive operation a user wants to perform according to multiple types of actions and postures of the user detected from the image data; and a display device that displays a display screen corresponding to the result of the interactive operation. The invention can combine multiple motion detection modes to perform human-computer interaction, thereby reducing the ambiguity of recognizing interactive operations and improving the accuracy of the interaction without requiring an additional input device.

Description

Human-computer interaction system and method

Technical Field

The present invention relates to the fields of computer vision and pattern recognition and, more particularly, to a contactless, natural, long-distance human-computer interaction (HCI) system and method.

Background Art

Human-computer interaction based on computer vision technology acquires user input visually through various image acquisition and processing methods. It has become a hot topic in next-generation human-computer interaction and is already widely used, especially for leisure and entertainment applications. In this interaction mode, the user can interact with the computer through body posture, head posture, line of sight, or body movements, freeing the user from traditional input devices such as the keyboard and mouse and providing an unprecedented interactive experience.

A variety of computer-vision-based human-computer interaction approaches have been proposed. In one existing approach, 3D objects can be created, modified, and manipulated using touch input and three-dimensional (3D) gesture input. In another approach, human posture detection is used to interact with a virtual user interface.

However, existing human-computer interaction devices and methods use relatively few types of motion detection; they generally require a touch-based input device and require the user to memorize a large number of prescribed actions. Because of the limits of gesture, posture, and depth sensing, preprocessing or various manual operations are usually required, for example calibrating the sensors or predefining the interaction space. This is inconvenient for the user. Therefore, there is a need for a human-computer interaction approach that utilizes multiple motion detection methods without relying on additional input devices.

Summary of the Invention

According to an aspect of the present invention, a human-computer interaction system is provided, including: an image acquisition device for acquiring image data; a human-computer interaction processing device that determines the interactive operation a user wants to perform according to multiple types of actions and postures of the user detected from the image data; and a display device that displays a display screen corresponding to the result of the interactive operation.

According to an aspect of the present invention, the human-computer interaction processing device includes: a motion detection module that detects multiple types of actions and postures of the user from the image data; an interaction determination module that determines the interactive operation the user wants to perform according to the detected actions and postures and issues a corresponding display operation instruction to a display control module; and the display control module, which controls the display device to display the corresponding interactive operation on the display screen according to the instruction determined by the interaction determination module.

According to an aspect of the present invention, the motion detection module includes: a gaze capture module for detecting the user's gaze direction from the image data; and a posture tracking module for tracking and recognizing the postures and actions of the parts of the user's body in the image data.

According to an aspect of the present invention, the gaze capture module determines the user's gaze direction by detecting the pitch and yaw of the user's head from the image data.

According to an aspect of the present invention, the posture tracking module tracks and detects the joints of the user's hand in the image data to determine the motion and gestures of the user's hand, and detects the user's skeletal joints to determine the postures of the parts of the user's body.

According to an aspect of the present invention, the interaction determination module determines whether to start an interactive operation according to the user's gaze direction detected by the gaze capture module and the gesture of the user's hand recognized by the posture tracking module.

According to an aspect of the present invention, if it is determined that both the user's gaze direction and the pointing direction of the user's hand point to a display item on the display screen for more than a predetermined time, the interaction determination module determines to start an interactive operation on that display item.

According to an aspect of the present invention, if it is determined that neither the user's gaze direction nor the pointing direction of the user's hand points to the display item, the interaction determination module determines to stop the interactive operation on that display item.

According to an aspect of the present invention, when the user is close to the image acquisition device, the posture tracking module tracks and recognizes the user's finger movements to recognize the user's gestures; when the user is far from the image acquisition device, the posture tracking module tracks and recognizes the movements of the user's arm.

According to an aspect of the present invention, the human-computer interaction processing device further includes: a custom gesture registration module for registering interactive operation commands corresponding to user-defined gestures.

According to another aspect of the present invention, a human-computer interaction method is provided, including: acquiring image data; determining the interactive operation a user wants to perform according to multiple types of actions and postures of the user detected from the image data; and displaying a display screen corresponding to the result of the interactive operation.

According to another aspect of the present invention, the step of determining the interactive operation includes: detecting multiple types of actions and postures of the user from the image data; determining the interactive operation to be performed according to the detected actions and postures and issuing a corresponding display operation instruction; and controlling the display device to display the corresponding interactive operation on the display screen according to the determined instruction.

According to another aspect of the present invention, the step of detecting multiple types of actions and postures of the user includes: detecting the user's gaze direction from the image data; and tracking and recognizing the postures and actions of the parts of the user's body.

According to another aspect of the present invention, the user's gaze direction is determined by detecting the pitch and yaw of the user's head from the image data.

According to another aspect of the present invention, the joints of the user's hand are tracked and detected in the image data to determine the motion and gestures of the user's hand, and the user's skeletal joints are detected from the image data to determine the postures of the parts of the user's body.

According to another aspect of the present invention, whether to start an interactive operation is determined according to the detected gaze direction of the user and the recognized gesture of the user's hand.

According to another aspect of the present invention, if it is determined that both the user's gaze direction and the pointing direction of the user's hand point to a display item on the display screen for more than a predetermined time, it is determined to start an interactive operation on that display item.

According to another aspect of the present invention, if it is determined that neither the user's gaze direction nor the pointing direction of the user's hand points to the display item, it is determined to stop the interactive operation on that display item.

According to another aspect of the present invention, when the user is close to the image acquisition device, the user's finger movements are tracked and recognized to recognize the user's gestures; when the user is far from the image acquisition device, the movements of the user's arm are recognized.

According to another aspect of the present invention, the step of determining the interactive operation further includes: determining the interactive operation corresponding to a registered user-defined gesture.

Brief Description of the Drawings

The above and other objects and features of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings, which exemplarily show embodiments, in which:

FIG. 1 is a schematic diagram illustrating interaction between a human-computer interaction system and a user according to an embodiment of the present invention;

FIG. 2 is a block diagram of the human-computer interaction processing device of a human-computer interaction system according to an embodiment of the present invention;

FIG. 3 is a schematic diagram illustrating a gesture for starting or stopping an interactive operation according to another embodiment of the present invention;

FIG. 4 is a flowchart illustrating a human-computer interaction method according to an embodiment of the present invention;

FIG. 5 is a flowchart illustrating a menu operation performed with the human-computer interaction method according to an embodiment of the present invention;

FIG. 6 is a flowchart illustrating an interactive operation on a 3D display target performed with the human-computer interaction method according to an embodiment of the present invention;

FIG. 7 is a flowchart illustrating a handwriting operation performed with the human-computer interaction method according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like parts throughout. The embodiments are described below with reference to the figures in order to explain the present invention.

FIG. 1 is a schematic diagram illustrating interaction between a human-computer interaction system and a user according to an embodiment of the present invention.

As shown in FIG. 1, the human-computer interaction system according to an embodiment of the present invention includes an image acquisition device 100, a human-computer interaction processing device 200, and a display device 300. The image acquisition device 100 acquires image data, which may have depth features and color features. The image acquisition device 100 may be a device capable of capturing depth images, for example a depth camera.

The human-computer interaction processing device 200 analyzes the image data acquired by the image acquisition device 100 to recognize and interpret the user's postures and actions. The device 200 then controls the display device 300 to perform the corresponding display according to the result of the analysis. The display device 300 may be a device such as a television (TV) or a projector.

Here, as shown in FIG. 1, the human-computer interaction processing device 200 can determine the interactive operation the user wants to perform according to multiple types of detected actions and postures. For example, the user may gaze at one particular object (OBJ2) among several displayed objects (e.g., OBJ1, OBJ2, and OBJ3 in FIG. 1) while pointing a finger at it, thereby starting the interactive operation. That is, the human-computer interaction processing device 200 can detect the user's gaze direction, gestures, and the actions and postures of the parts of the body. The user can also operate a particular displayed object by moving a finger, for example to change the object's display position, and can move a body part (e.g., an arm) or the whole body to provide interactive input. It should be understood that although the image acquisition device 100, the human-computer interaction processing device 200, and the display device 300 are shown as separate devices, they may be combined arbitrarily into one or two devices; for example, the image acquisition device 100 and the human-computer interaction processing device 200 may be implemented in one device.

The structure of the human-computer interaction processing device 200 in the human-computer interaction system according to an embodiment of the present invention will now be described in detail with reference to FIG. 2.

As shown in FIG. 2, the human-computer interaction processing device 200 according to an embodiment of the present invention includes a motion detection module 210, an interaction determination module 220, and a display control module 230.

The motion detection module 210 detects multiple types of actions of the user and determines the user's postures. For example, the motion detection module 210 can detect and determine movement of the user's gaze direction, movement of body parts, hand gestures, and body postures. The interaction determination module 220 can determine the interactive operation to be performed according to the multiple types of actions and postures detected by the motion detection module 210. The operation of the motion detection module 210 is described in detail below.

According to an embodiment of the present invention, the motion detection module 210 may include a gaze capture module 211 and a posture tracking module 213.

The gaze capture module 211 obtains the user's gaze direction from the image data. The gaze direction can be obtained by detecting the user's head pose, which is mainly characterized by head pitch and head yaw. Accordingly, the pitch angle and yaw angle of the head can be estimated from the head region of the depth image, and the corresponding head pose is synthesized from these angles to obtain the user's gaze direction.
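As a concrete illustration, the estimated pitch and yaw angles can be converted into a unit gaze direction vector. The axis conventions below are assumptions made for this sketch; the patent does not fix a coordinate system:

```python
import math

def gaze_direction(pitch_deg, yaw_deg):
    """Convert head pitch/yaw (degrees) into a unit gaze vector.

    Assumed convention: camera looks along +z, pitch rotates about the
    x-axis (positive = looking up), yaw rotates about the y-axis
    (positive = looking right).
    """
    pitch = math.radians(pitch_deg)
    yaw = math.radians(yaw_deg)
    x = math.sin(yaw) * math.cos(pitch)
    y = math.sin(pitch)
    z = math.cos(yaw) * math.cos(pitch)
    return (x, y, z)
```

With zero pitch and yaw the vector points straight at the camera axis; nonzero angles tilt it accordingly.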

The posture tracking module 213 tracks and recognizes the postures and actions of the parts of the user's body. For example, the posture tracking module 213 can track and recognize the user's pointing direction and finger movements from the acquired image data, and can track the trajectory and speed of the hand. It can also track and recognize the movements of individual body parts (e.g., the arms). Preferably, when the user is close to the image acquisition device 100, the posture tracking module 213 tracks the joints of the user's hand in the dense, reliable image data to determine the pointing direction and movements of the user's fingers (i.e., gestures). When the user is far from the image acquisition device 100, the acquired image data is coarse and noisy and the hand region is small, so the posture tracking module 213 instead tracks the skeletal joints of the body to follow the user's forearm (i.e., the bone between the wrist joint and the elbow joint), thereby tracking and recognizing the pointing direction and movements of the user's arm.
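The near/far switch described above can be sketched as a simple mode selector. The pixel-area threshold is an invented placeholder; the patent only states that the switch depends on the size of the hand region in the image:

```python
def choose_tracking_mode(hand_region_pixels, min_hand_area=1500):
    """Pick finger-level or arm-level tracking from the size (pixel
    count) of the detected hand region. `min_hand_area` is a
    placeholder threshold, not a value from the patent."""
    return "finger" if hand_region_pixels >= min_hand_area else "arm"
```

A large, well-resolved hand region selects finger/hand-joint tracking; a small one falls back to skeleton-based arm tracking.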

To this end, according to an embodiment of the present invention, the posture tracking module 213 can recognize and track the movement of the user's hand based on skin color features and/or 3D features. Specifically, the posture tracking module 213 may include a classifier trained on skin color or 3D features. When a skin color classifier is used, a probabilistic model (for example, a Gaussian mixture model (GMM)) can model the color distribution of hand skin to decide whether a candidate pixel belongs to the hand. For depth features, depth comparison features can be generated as described in "Real-Time Human Pose Recognition in Parts from Single Depth Images", Jamie Shotton et al., CVPR 2011, or local depth patches (small rectangular blocks) can be compared against patches of a known hand model to measure similarity. The different color and depth features are then combined, and a general classifier (such as a Random Forest or an AdaBoost decision tree) performs the classification task to locate the hand in the image data. By detecting the hand frame by frame, the posture tracking module 213 can track and compute the hand's trajectory and speed, localizing the hand in the 2D image and in the 3D spatial domain. In particular, the positions of the hand joints can be tracked by matching the depth data against a 3D hand model. However, if the hand is far from the image acquisition device 100 and the hand region in the image data is smaller than a predetermined threshold, then for the sake of data reliability the arm motion is determined by tracking the user's body skeleton instead.
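A minimal sketch of the per-pixel skin-color test follows. It uses one Gaussian per class in normalized chroma space rather than a full trained GMM, and every mean and variance below is an invented placeholder, not a value from the patent:

```python
import math

# Illustrative single-Gaussian-per-class model in normalized (r, g)
# chroma space. All parameters are invented placeholders.
SKIN_MEAN, SKIN_VAR = (0.45, 0.31), (0.002, 0.001)
BG_MEAN, BG_VAR = (0.33, 0.33), (0.02, 0.02)

def gauss2(x, mean, var):
    """Diagonal-covariance 2D Gaussian density."""
    d = 1.0
    for xi, mi, vi in zip(x, mean, var):
        d *= math.exp(-(xi - mi) ** 2 / (2 * vi)) / math.sqrt(2 * math.pi * vi)
    return d

def is_skin_pixel(r, g, b):
    """Likelihood-ratio test: is the pixel more likely hand skin
    than background under the two chroma models?"""
    s = r + g + b
    if s == 0:
        return False
    x = (r / s, g / s)
    return gauss2(x, SKIN_MEAN, SKIN_VAR) > gauss2(x, BG_MEAN, BG_VAR)
```

In a real system the class parameters would be fitted to labeled skin and background pixels, and the per-pixel decisions would feed the combined color/depth classifier described above.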

The interaction determination module 220 can determine the interactive operation to be performed according to the various user actions detected by the motion detection module 210. For example, the interaction determination module 220 can decide whether to enter the interactive operation state according to the user's gaze direction determined by the gaze capture module 211 and the pointing direction determined by the posture tracking module 213, and then determine the operation to be executed from the user's subsequent postures, actions, and gaze direction. That is, the interaction determination module 220 can determine the start or end of an interactive operation from the user's gaze direction and pointing direction. Specifically, when the gaze direction determined by the gaze capture module 211 and the pointing direction determined by the posture tracking module 213 both point at a target displayed on the display device 300 (i.e., the gaze direction and the finger's pointing direction meet at a particular display target) for more than a predetermined time, the interaction determination module 220 can determine that the user wants to start interacting with that display target. While the display target is being operated, the interaction determination module 220 checks whether at least one of the user's gaze and pointing direction still remains on the target. When neither remains on the target, the interaction determination module 220 can determine that the user has stopped interacting with it. In this way, it can be determined more accurately whether the user starts or ends an interactive operation, which improves the accuracy of the interaction.
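The start/stop rule above can be sketched as a small state machine; the dwell time stands in for the patent's unspecified "predetermined time":

```python
class InteractionGate:
    """Start an interaction when gaze and pointing both stay on the
    same display item for longer than `dwell_s`; stop when neither
    remains on it. The dwell value is an illustrative placeholder."""

    def __init__(self, dwell_s=1.0):
        self.dwell_s = dwell_s
        self.active = False
        self._dwell_start = None

    def update(self, t, gaze_on_item, point_on_item):
        """Feed one observation at time t (seconds); returns whether
        the interaction is active afterwards."""
        if not self.active:
            if gaze_on_item and point_on_item:
                if self._dwell_start is None:
                    self._dwell_start = t
                elif t - self._dwell_start >= self.dwell_s:
                    self.active = True
            else:
                self._dwell_start = None  # dwell broken, restart
        else:
            if not gaze_on_item and not point_on_item:
                self.active = False
                self._dwell_start = None
        return self.active
```

Note the asymmetry matching the text: both cues are required to start, but either one alone is enough to keep the interaction alive.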

It should be understood that the above is only one example of determining whether to start or end the interactive operation state from the detected actions and postures of the user; other preset schemes may also be used. For example, an interactive operation may be started based on the user's gaze direction and a predetermined gesture. As shown in FIG. 3, when the motion detection module 210 determines from the image data that the user's fingers are spread and the gaze direction points at a particular item on the display screen of the display device 300, the interaction determination module 220 can determine that the user wants to interact with that item. Next, when the motion detection module 210 determines that the user's fingers close together and the hand starts to move, the interaction determination module 220 can determine that the user wants to drag the item. If the motion detection module 210 determines that the user's hand closes into a fist, the interaction determination module 220 can determine that the user wants to stop the interactive operation.

After entering the interactive operation state, the interaction determination module 220 further determines the interactive operation the user wants to perform according to the user's actions and gestures. According to an embodiment of the present invention, the interaction determination module 220 may determine a pointer-moving operation according to the pointing direction of the user's hand. From the pointing direction determined by the gesture tracking module 213, the interaction determination module 220 can compute the intersection of that direction with the display screen, thereby obtaining the pointer's position on the screen. When the user's hand moves, the interaction determination module 220 can issue a corresponding command instructing the display control module 230 to control the display of the display device 300, so that the pointer moves across the screen along with the hand.
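Geometrically, obtaining the pointer position amounts to intersecting the hand's pointing ray with the screen plane. A minimal sketch, under the assumed convention that the screen is the plane z = 0 in the camera's coordinate frame and the hand sits at positive z:

```python
def pointer_on_screen(hand_pos, pointing_dir):
    """Intersect the pointing ray (origin hand_pos, direction pointing_dir,
    both (x, y, z) tuples) with the screen plane z = 0.
    Returns the on-screen (x, y), or None if the ray misses the plane."""
    x0, y0, z0 = hand_pos
    dx, dy, dz = pointing_dir
    if dz == 0:
        return None                      # ray parallel to the screen: no intersection
    t = -z0 / dz                         # parameter where the ray crosses z = 0
    if t < 0:
        return None                      # hand points away from the screen
    return (x0 + t * dx, y0 + t * dy)    # pointer position on the screen plane
```

As the hand moves frame by frame, re-evaluating this intersection yields the pointer trajectory the display control module would render.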

According to an embodiment of the present invention, the interaction determination module 220 may also determine a button-press operation according to the hand motion determined by the gesture tracking module 213. From the pointing direction of the user's hand determined by the gesture tracking module 213, the interaction determination module 220 can compute the intersection of that direction with the display screen; if a display item such as a button is present at that position, the interaction determination module 220 may determine that the user has pressed the button. Alternatively, if the gesture tracking module 213 determines that the user's finger or fist moves rapidly along its pointing direction, the interaction determination module 220 determines that the button is pressed.
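The alternative press test, "moves rapidly along its pointing direction", can be expressed as a threshold on the velocity component projected onto the pointing direction. A hedged sketch; the 0.5 m/s threshold is an assumed value, not one given in the patent:

```python
import math

def is_press(velocity, pointing_dir, threshold=0.5):
    """Return True if the finger's velocity component along its own
    pointing direction exceeds the speed threshold (a 'poke' forward).
    velocity and pointing_dir are 3-vectors."""
    norm = math.sqrt(sum(c * c for c in pointing_dir))
    if norm == 0:
        return False                     # degenerate direction: no decision
    # project velocity onto the unit pointing direction
    forward_speed = sum(v * d for v, d in zip(velocity, pointing_dir)) / norm
    return forward_speed > threshold
```

Projecting onto the pointing direction, rather than thresholding raw speed, keeps sideways hand motion (e.g. moving the pointer) from being mistaken for a press.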

It should be understood that only a few examples have been given here of how the interaction determination module 220 determines the desired interactive operation from the gaze direction determined by the gaze tracking module 211 and the user's gestures determined by the gesture tracking module 213. Those skilled in the art will understand, however, that the interactive operations of the present invention are not limited to these. More interactive operations may be performed according to the user's gestures and/or gaze direction; for example, a display target may be dragged or rotated by moving the hand, or clicked or double-clicked through finger movements.

In addition, according to an embodiment of the present invention, the user may also define custom interactive operations corresponding to specific gestures. To this end, the human-computer interaction processing device 200 may further include a custom gesture registration module (not shown) for registering interactive operations corresponding to user-defined gestures. The custom gesture registration module may maintain a database mapping recorded gestures and actions to the corresponding interactive operation commands. For example, when a 2D or 3D target is displayed, it may be shrunk or enlarged by tracking the movement directions of the user's two hands. In particular, when registering a new gesture, the custom gesture registration module tests the reproducibility and ambiguity of the user-defined gesture and returns a reliability score indicating whether the user-defined interactive operation command is valid.
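The registration flow above (record samples, score reproducibility, reject ambiguous entries, store gesture-to-command mappings) can be sketched as follows. The scoring rule here is an invented toy proxy; the patent does not specify how the reliability score is actually computed.

```python
class GestureRegistry:
    """Sketch of the custom gesture registration module: maps gesture
    signatures to interaction commands, rejecting ambiguous or poorly
    reproduced gestures."""

    def __init__(self, min_score=0.7):
        self.min_score = min_score
        self.commands = {}   # gesture signature -> interaction command

    def register(self, signature, samples, command):
        """signature: hashable gesture descriptor; samples: repeated
        recordings of the gesture, each a tuple of feature values.
        Returns (accepted, reliability_score)."""
        if signature in self.commands:          # already taken: would be ambiguous
            return False, 0.0
        score = self._reproducibility(samples)
        if score < self.min_score:
            return False, score                 # user cannot repeat it reliably
        self.commands[signature] = command
        return True, score

    @staticmethod
    def _reproducibility(samples):
        # fraction of samples identical to the first one -- a crude
        # stand-in for measuring how repeatably the gesture is performed
        first = samples[0]
        return sum(s == first for s in samples) / len(samples)
```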

After the interaction determination module 220 has determined the interactive operation the user wants to perform, it issues a corresponding instruction to the display control module 230, which controls the display device 300 to present the corresponding interactive operation on the display screen. For example, the display device 300 may be controlled to show a screen in which the pointer is moved, the corresponding display item is moved, a button is pressed, and so on.

The specific procedure of the human-computer interaction method according to an embodiment of the present invention will now be described with reference to FIG. 4.

As shown in FIG. 4, in step S410, image data is first acquired by the image acquisition device 100.

Next, in step S420, the human-computer interaction processing device 200 analyzes the various types of user gestures and actions in the image data acquired by the image acquisition device 100 to determine whether to enter the interactive operation state and which interactive operation the user wants to perform. Here, for example, the human-computer interaction processing device 200 may detect and recognize, from the image data, the user's gaze direction and the actions and postures of the parts of the user's body in order to determine the desired interactive operation. According to this embodiment, the human-computer interaction processing device 200 may determine whether to enter the interactive operation state according to the detected gaze direction and the user's pointing direction. Specifically, when the human-computer interaction processing device 200 determines from the image data that the user's gaze direction and the pointing direction of the hand have pointed at a display item shown on the display screen of the display device 300 for more than a predetermined time, it enters the interactive operation state and determines the interactive operation to be performed on the display target according to the user's subsequent gestures.

Then, in step S430, the display device 300 is controlled to display the corresponding screen, or to update the screen, according to the determined interactive operation. For example, the position to which the user wants to move the displayed pointer, or whether the user wants to drag, click, or double-click a display item, may be determined according to the pointing direction of the user's hand.

In step S420, if the human-computer interaction processing device 200 determines that both the user's pointing direction and gaze direction have left the display target while an interactive operation is being executed, it determines that the user wants to stop operating on the display target, and a screen showing that the operation has stopped is displayed. It should be noted that whether the user wants to stop the interactive operation can also be determined in other ways; for example, the interactive operation may be stopped according to a specific user gesture (such as clenching a fist, as described above).

Schematic flows of various interactive operations performed with the human-computer interaction method according to the present invention will now be described with reference to FIGS. 5-7.

FIG. 5 is a flowchart of a menu operation performed with the human-computer interaction method according to an embodiment of the present invention.

In the embodiment of FIG. 5, it is assumed that a preset menu is displayed on the display screen of the display device 300 and that the menu includes several items for the user to interact with.

In step S510, when the human body posture detected from the captured image data shows that both the pointing direction of the user's hand and the gaze direction point at a specific menu item on the display screen, it is determined that the interactive operation state for the menu is entered.

Next, in step S520, the trajectory and speed of the user's hand may be tracked to determine the user's hand movements and gestures, and the interactive operation the user wants to perform may be determined from them. For example, mouse interaction may be simulated according to the movements of the user's hand. When it is determined that the user's index finger makes a clicking motion, the menu item in the direction pointed at by the finger may be selected. When it is determined that the user's middle finger makes a clicking motion, content corresponding to a right-button click may be displayed, for example additional menu options related to the item. Then, in step S530, the display device is controlled to display or update the menu content corresponding to the determined interactive operation.
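The finger-to-mouse mapping just described is a straightforward lookup; a trivial sketch, with the finger labels assumed as the detector's output vocabulary:

```python
def mouse_event(finger_click):
    """Map a detected finger click to the simulated mouse action:
    index finger acts as a left click, middle finger as a right click."""
    return {"index": "left_click", "middle": "right_click"}.get(finger_click)
```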

FIG. 6 is a flowchart of an operation on a 3D display target performed with the human-computer interaction method according to an embodiment of the present invention. Here, the display device 300 is a display device capable of displaying 3D content.

First, in step S610, when the human body posture detected from the captured image data shows that both the pointing direction of the user's hand and the gaze direction point at a specific 3D display target on the display screen, it is determined that the interactive operation state for that 3D display target is entered. Next, in step S620, the trajectory and speed of the user's hand may be tracked to determine the user's hand movements and gestures, and the interactive operation the user wants to perform may be determined from them. For example, the 3D display target at the intersection of the hand's pointing direction and the gaze direction may be picked up and then moved according to the movement of the hand. The selected 3D display target may also be dragged, enlarged, or shrunk according to the hand's movements. Finally, in step S630, the display device is controlled to re-render the 3D display target as it appears after the determined interactive operation.
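One natural way to realize the two-hand enlarge/shrink mentioned here (and in the custom-gesture example earlier) is to scale the target by the ratio of the current distance between the hands to the distance at the start of the gesture. A sketch under that assumption, with coordinates in arbitrary illustrative units:

```python
import math

def zoom_factor(start_left, start_right, cur_left, cur_right):
    """Scale factor for the selected 3D target: ratio of the current
    hand-to-hand distance to the distance when the gesture began.
    Each argument is an (x, y, z) hand position."""
    d0 = math.dist(start_left, start_right)
    if d0 == 0:
        return 1.0                       # degenerate start pose: leave scale unchanged
    return math.dist(cur_left, cur_right) / d0
```

Spreading the hands apart yields a factor above 1 (enlarge); bringing them together yields a factor below 1 (shrink).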

FIG. 7 is a flowchart of a text input operation performed with the human-computer interaction method according to an embodiment of the present invention. Here, it is assumed that a predetermined area of the display screen of the display device 300 serves as a text input area.

First, in step S710, when the human body posture detected from the captured image data shows that both the pointing direction of the user's hand and the gaze direction point at the handwriting input area on the display screen, it is determined that the interactive operation state for handwriting input is entered. Next, in step S720, the trajectory and speed of the user's hand may be tracked, and the text the user wants to input may be determined from the trajectory. The text may be recognized with a learning-based recognition method and interpreted as the corresponding interactive operation command. Finally, in step S730, the display device is controlled to display a screen showing the result of executing the interactive operation command.
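As a toy stand-in for the learning-based recognition step, a hand trajectory can be classified by nearest neighbour against stored per-character templates. Real handwriting recognition would use a trained model over properly resampled strokes; this sketch assumes trajectories already normalized to the same number of points and only illustrates the flow.

```python
import math

def traj_distance(a, b):
    """Sum of pointwise Euclidean distances between two same-length
    point sequences."""
    return sum(math.dist(p, q) for p, q in zip(a, b))

def recognize(trajectory, templates):
    """Return the template character whose stroke is closest to the
    observed hand trajectory. templates: dict char -> point list."""
    return min(templates, key=lambda ch: traj_distance(trajectory, templates[ch]))
```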

It should be understood that although the above embodiments determine whether to start or end an interactive operation, as well as the user's subsequent interactive operations, according to the gaze direction and the pointing direction of the hand, the present invention is not limited thereto. Whether to start or end an interactive operation, and the subsequent operations, may also be determined from other combinations of motion detection.

According to the present invention, a combination of multiple motion detection methods can be used for human-computer interaction, reducing the ambiguity of interaction recognition and improving its accuracy without requiring an additional input device (for example, a touch screen input device). For instance, interactive operations such as enlarging and shrinking a display target can be realized without a touch screen. In this way, computer-vision-based motion detection is fully exploited, bringing the user a better interactive experience.

While the invention has been shown and described with reference to several exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the claims and their equivalents.

Claims (14)

1. A human-computer interaction system, comprising:
an image acquisition device for acquiring image data;
a gaze capture module for determining the user's gaze direction by detecting the pitch and yaw directions of the user's head from the image data;
a gesture tracking module for tracking and recognizing, in the image data, the pointing direction of the user's hand; and
an interaction determination module for determining the start of an interactive operation based on the user's gaze direction and the pointing direction of the user's hand both being directed at a display item;
wherein the interaction determination module:
maintains the interactive operation in response to determining that at least one of the user's gaze direction and the pointing direction of the user's hand still points at the display item, and
stops the interactive operation in response to determining that neither the user's gaze direction nor the pointing direction of the user's hand points at the display item.
2. The human-computer interaction system of claim 1, wherein the gesture tracking module is further configured to track and recognize, in the image data, the postures and actions of the parts of the user's body.
3. The human-computer interaction system of claim 2, wherein the gesture tracking module tracks and detects the nodes of the user's hand in the image data to determine the movement and gesture of the user's hand, and detects the user's skeletal body nodes to determine the posture and action of each part of the user's body.
4. The human-computer interaction system of claim 3, wherein the interaction determination module further determines the start of an interactive operation according to the postures and actions of the parts of the user's body.
5. The human-computer interaction system of claim 4, wherein the interaction determination module determines whether to start an interactive operation according to the user's gaze direction detected by the gaze capture module and the action of the user's hand recognized by the gesture tracking module.
6. The human-computer interaction system of claim 2, wherein, when the user is close to the image acquisition device, the gesture tracking module tracks and recognizes the user's finger movements to recognize the user's gestures, and, when the user is far from the image acquisition device, the gesture tracking module tracks and recognizes the movements of the user's arm.
7. The human-computer interaction system of claim 1, further comprising:
a display device that displays a screen corresponding to the result of the interactive operation,
wherein the interactive operation is started if it is determined that the user's gaze direction and the pointing direction of the user's hand are directed at a display item on the display screen for more than a predetermined time.
8. The human-computer interaction system of claim 1, further comprising:
a custom gesture registration module for registering interactive operation commands corresponding to user-defined gestures.
9. A human-computer interaction method, comprising:
acquiring image data;
determining the user's gaze direction by detecting the pitch and yaw directions of the user's head from the image data;
tracking and recognizing, in the image data, the pointing direction of the user's hand; and
determining the start of an interactive operation based on the user's gaze direction and the pointing direction of the user's hand both being directed at a display item;
wherein the interactive operation is maintained in response to determining that at least one of the user's gaze direction and the pointing direction of the user's hand still points at the display item, and is stopped in response to determining that neither the user's gaze direction nor the pointing direction of the user's hand points at the display item.
10. The human-computer interaction method of claim 9, wherein the movement and gesture of the user's hand are determined by tracking and detecting the nodes of the user's hand in the image data, and the posture and action of each part of the user's body are determined by detecting the user's skeletal body nodes from the image data.
11. The human-computer interaction method of claim 9, wherein whether to start an interactive operation is determined according to the detected gaze direction of the user and the action of the user's hand recognized by a gesture tracking module.
12. The human-computer interaction method of claim 9, wherein the interactive operation is started if it is determined that the user's gaze direction and the pointing direction of the user's hand are directed at a display item on the display screen for more than a predetermined time.
13. The human-computer interaction method of claim 9, wherein, when the user is close to the image acquisition device, the user's finger movements are tracked and recognized to recognize the user's gestures, and, when the user is far from the image acquisition device, the movements of the user's arm are recognized.
14. The human-computer interaction method of claim 9, wherein the step of determining the interactive operation further comprises: determining an interactive operation corresponding to a registered user-defined gesture.
CN201210440197.1A 2012-11-07 2012-11-07 Human-computer interaction system and method Active CN103809733B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201810619648.5A CN108845668B (en) 2012-11-07 2012-11-07 Man-machine interaction system and method
CN201210440197.1A CN103809733B (en) 2012-11-07 2012-11-07 Human-computer interaction system and method
KR1020130050237A KR102110811B1 (en) 2012-11-07 2013-05-03 System and method for human computer interaction
US14/071,180 US9684372B2 (en) 2012-11-07 2013-11-04 System and method for human computer interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210440197.1A CN103809733B (en) 2012-11-07 2012-11-07 Human-computer interaction system and method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201810619648.5A Division CN108845668B (en) 2012-11-07 2012-11-07 Man-machine interaction system and method

Publications (2)

Publication Number Publication Date
CN103809733A CN103809733A (en) 2014-05-21
CN103809733B true CN103809733B (en) 2018-07-20

Family

ID=50706630

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201210440197.1A Active CN103809733B (en) 2012-11-07 2012-11-07 Human-computer interaction system and method
CN201810619648.5A Active CN108845668B (en) 2012-11-07 2012-11-07 Man-machine interaction system and method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201810619648.5A Active CN108845668B (en) 2012-11-07 2012-11-07 Man-machine interaction system and method

Country Status (2)

Country Link
KR (1) KR102110811B1 (en)
CN (2) CN103809733B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030093065A (en) * 2002-05-31 2003-12-06 주식회사 유니온금속 Heat Exchanger using Fin Plate having plural burring tubes and Method for manufacturing the same
CN104391578B (en) * 2014-12-05 2018-08-17 重庆蓝岸通讯技术有限公司 A kind of real-time gesture control method of 3-dimensional image
CN104740869B (en) * 2015-03-26 2018-04-03 北京小小牛创意科技有限公司 The exchange method and system that a kind of actual situation for merging true environment combines
CN105005779A (en) * 2015-08-25 2015-10-28 湖北文理学院 Face verification anti-counterfeit recognition method and system thereof based on interactive action
CN105740948B (en) * 2016-02-04 2019-05-21 北京光年无限科技有限公司 A kind of exchange method and device towards intelligent robot
CN105759973A (en) * 2016-03-09 2016-07-13 电子科技大学 Far-near distance man-machine interactive system based on 3D sight estimation and far-near distance man-machine interactive method based on 3D sight estimation
CN107087219B (en) * 2017-02-22 2018-04-10 罗普特(厦门)科技集团有限公司 Human body posture recognition device
CN109426498B (en) * 2017-08-24 2023-11-17 北京迪文科技有限公司 Background development method and device for man-machine interaction system
CN107678545A (en) * 2017-09-26 2018-02-09 深圳市维冠视界科技股份有限公司 A kind of information interactive terminal and method
CN107944376A (en) * 2017-11-20 2018-04-20 北京奇虎科技有限公司 The recognition methods of video data real-time attitude and device, computing device
CN107895161B (en) * 2017-12-22 2020-12-11 北京奇虎科技有限公司 Real-time gesture recognition method, device and computing device based on video data
JP7091983B2 (en) * 2018-10-01 2022-06-28 トヨタ自動車株式会社 Equipment control device
CN110442243A (en) * 2019-08-14 2019-11-12 深圳市智微智能软件开发有限公司 A kind of man-machine interaction method and system
CN111527468A (en) * 2019-11-18 2020-08-11 华为技术有限公司 Air-to-air interaction method, device and equipment
KR102375947B1 (en) * 2020-03-19 2022-03-18 주식회사 메이아이 Method, system and non-transitory computer-readable recording medium for estimatiing interaction information between person and product based on image
CN112051746B (en) * 2020-08-05 2023-02-07 华为技术有限公司 Method and device for acquiring service
CN112099623A (en) * 2020-08-20 2020-12-18 昆山火灵网络科技有限公司 Man-machine interaction system and method
KR102524016B1 (en) * 2020-08-21 2023-04-21 김덕규 System for interaction with object of content
EP4330800A1 (en) * 2021-04-29 2024-03-06 InterDigital CE Patent Holdings, SAS Method and apparatus for determining an indication of a pointed position on a display device
US11693482B2 (en) * 2021-05-28 2023-07-04 Huawei Technologies Co., Ltd. Systems and methods for controlling virtual widgets in a gesture-controlled device
CN113849065A (en) * 2021-09-17 2021-12-28 支付宝(杭州)信息技术有限公司 A method and device for triggering client operation instructions by using fitness actions
US12189841B2 (en) * 2021-11-29 2025-01-07 Sony Interactive Entertainment LLC Input prediction for pre-loading of rendering data
CN119053936A (en) * 2023-03-28 2024-11-29 京东方科技集团股份有限公司 Interaction method of three-dimensional display space and display equipment
CN119045668B (en) * 2024-09-04 2025-09-19 石家庄学院 Man-machine interaction method, system, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1694045A (en) * 2005-06-02 2005-11-09 北京中星微电子有限公司 Non-contact type visual control operation system and method
CN102270035A (en) * 2010-06-04 2011-12-07 三星电子株式会社 Apparatus and method for selecting and operating object in non-touch mode
CN102749990A (en) * 2011-04-08 2012-10-24 索尼电脑娱乐公司 Systems and methods for providing feedback by tracking user gaze and gestures

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002259989A (en) * 2001-03-02 2002-09-13 Gifu Prefecture Pointing gesture detection method and device
KR100520050B1 (en) * 2003-05-12 2005-10-11 한국과학기술원 Head mounted computer interfacing device and method using eye-gaze direction
JP5207513B2 (en) * 2007-08-02 2013-06-12 公立大学法人首都大学東京 Control device operation gesture recognition device, control device operation gesture recognition system, and control device operation gesture recognition program
CN101344816B (en) * 2008-08-15 2010-08-11 华南理工大学 Human-computer interaction method and device based on gaze tracking and gesture recognition
JP5483899B2 (en) * 2009-02-19 2014-05-07 株式会社ソニー・コンピュータエンタテインメント Information processing apparatus and information processing method
KR101596890B1 (en) * 2009-07-29 2016-03-07 삼성전자주식회사 Apparatus and method for searching digital objects using user's gaze information
US8418237B2 (en) * 2009-10-20 2013-04-09 Microsoft Corporation Resource access based on multiple credentials
US9244533B2 (en) * 2009-12-17 2016-01-26 Microsoft Technology Licensing, Llc Camera navigation for presentations
US8659658B2 (en) * 2010-02-09 2014-02-25 Microsoft Corporation Physical interaction zone for gesture-based user interfaces
JP2012098771A (en) * 2010-10-29 2012-05-24 Sony Corp Image forming apparatus and image forming method, and program
US20130154913A1 (en) * 2010-12-16 2013-06-20 Siemens Corporation Systems and methods for a gaze and gesture interface
EP3527121B1 (en) * 2011-02-09 2023-08-23 Apple Inc. Gesture detection in a 3d mapping environment
US9746928B2 (en) * 2011-04-19 2017-08-29 Lg Electronics Inc. Display device and control method thereof
CN202142050U (en) * 2011-06-29 2012-02-08 由田新技股份有限公司 Interactive customer reception system
US9201500B2 (en) * 2012-09-28 2015-12-01 Intel Corporation Multi-modal touch screen emulator

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1694045A (en) * 2005-06-02 2005-11-09 北京中星微电子有限公司 Non-contact type visual control operation system and method
CN102270035A (en) * 2010-06-04 2011-12-07 三星电子株式会社 Apparatus and method for selecting and operating object in non-touch mode
CN102749990A (en) * 2011-04-08 2012-10-24 索尼电脑娱乐公司 Systems and methods for providing feedback by tracking user gaze and gestures

Also Published As

Publication number Publication date
KR102110811B1 (en) 2020-05-15
CN103809733A (en) 2014-05-21
CN108845668B (en) 2022-06-03
CN108845668A (en) 2018-11-20
KR20140059109A (en) 2014-05-15

Similar Documents

Publication Publication Date Title
CN103809733B (en) Human-computer interaction system and method
US20230161415A1 (en) Systems and methods of free-space gestural interaction
US11030237B2 (en) Method and apparatus for identifying input features for later recognition
KR101757080B1 (en) Method and system for human-to-computer gesture based simultaneous interactions using singular points of interest on a hand
US20120204133A1 (en) Gesture-Based User Interface
US20120202569A1 (en) Three-Dimensional User Interface for Game Applications
US20140139429A1 (en) System and method for computer vision based hand gesture identification
JP4323180B2 (en) Interface method, apparatus, and program using self-image display
US20130335324A1 (en) Computer vision based two hand control of content
US20130335318A1 (en) Method and apparatus for doing hand and face gesture recognition using 3d sensors and hardware non-linear classifiers
CN102270035A (en) Apparatus and method for selecting and operating object in non-touch mode
TW201120681A (en) Method and system for operating electric apparatus
KR20130001176A (en) System and method for close-range movement tracking
Geer Will gesture recognition technology point the way?
Roy et al. Real time hand gesture based user friendly human computer interaction system
Pame et al. A novel approach to improve user experience of mouse control using CNN based hand gesture recognition
Choondal et al. Design and implementation of a natural user interface using hand gesture recognition method
CN118802992A (en) A robot remote control method and system
Srinivas et al. Virtual Mouse Control Using Hand Gesture Recognition
Yoon et al. Vision-Based bare-hand gesture interface for interactive augmented reality applications
Feng et al. FM: Flexible mapping from one gesture to multiple semantics
Hartmann et al. A virtual touchscreen with depth recognition
Ntelidakis et al. Touch detection for planar interactive displays based on lateral depth views
Dudile et al. AI Driven-Virtual Mouse Using Hand Gestures
Onodera et al. Vision-based user interface for mouse and multi-mouse system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant