
CN103376890B - Vision-based gesture remote control system - Google Patents

Vision-based gesture remote control system

Info

Publication number
CN103376890B
CN103376890B CN201210121832.XA CN201210121832A
Authority
CN
China
Prior art keywords
hand
image
range
candidate
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210121832.XA
Other languages
Chinese (zh)
Other versions
CN103376890A (en)
Inventor
王琪
范伟
谭志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201210121832.XA priority Critical patent/CN103376890B/en
Publication of CN103376890A publication Critical patent/CN103376890A/en
Application granted granted Critical
Publication of CN103376890B publication Critical patent/CN103376890B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a vision-based gesture remote control system. The system includes: an image capture device for capturing a series of images of a subject; a gesture recognition device for recognizing the subject's gestures from the series of images captured by the image capture device and sending the recognition result to an operation command triggering device; and the operation command triggering device for triggering a predetermined operation command according to the recognition result sent from the gesture recognition device. The gesture recognition device includes: a hand detection component for detecting the subject's hand from an image captured by the image capture device; a hand tracking component for tracking the subject's hand in subsequent images once the hand detection component has detected the subject's hand in an image; and a gesture recognition component for determining the motion of the subject's hand from the hand detected by the hand detection component and the hand tracked by the hand tracking component, and recognizing the subject's gesture from the determined hand motion.

Description

Vision-based Gesture Remote Control System

Technical Field

The present invention relates to the fields of image processing, pattern recognition, and object tracking, and more particularly to a vision-based gesture remote control system.

Background Art

As computers and numerous portable smart devices become ever more indispensable in people's daily lives, more natural and efficient interaction between humans and computers is desired. However, traditional human-computer interaction (HCI) peripherals such as the mouse/keyboard, remote controls, and even touch screens are inconvenient for users under certain conditions (for example, in the bathroom or kitchen, while driving, and so on), because touch-free HCI is what is needed in these situations. Therefore, gesture remote control systems, as one of the potential solutions, have received increasing attention in recent years.

Basically, a gesture remote control system tracks the hand and analyzes meaningful hand expressions; if they are recognized as one of the predefined gestures, the corresponding operation command is triggered to perform a predetermined operation. Since gesture recognition is complex in many situations, many different tools have been employed to address the problem in gesture recognition processing, such as Hidden Markov Models (HMM), particle filters, finite state machines (FSM), and neural networks. Most gesture recognition systems require high computational complexity; moreover, some of them have certain limitations, for example, requiring additional equipment (such as wearing gloves) or sophisticated instruments (such as an infrared camera to collect depth information), or only working in well-lit environments with simple backgrounds (for example, being unable to distinguish the hand from objects with colors similar to skin, or only recognizing static gestures, and so on).

Therefore, there is a need for a gesture recognition system for real-time remote control that has low computational complexity and runs well in complex environments.

Summary of the Invention

According to one aspect of the present invention, a vision-based gesture remote control system includes: an image capture device for capturing a series of images of a subject; a gesture recognition device for recognizing the subject's gestures from the series of images captured by the image capture device and sending the recognition result to an operation command triggering device; and the operation command triggering device for triggering a predetermined operation command according to the recognition result sent from the gesture recognition device, wherein the gesture recognition device includes: a hand detection component for detecting the subject's hand from an image captured by the image capture device; a hand tracking component for tracking the subject's hand in subsequent images once the hand detection component has detected the subject's hand in an image; and a gesture recognition component for determining the motion of the subject's hand from the hand detected by the hand detection component and the hand tracked by the hand tracking component, and recognizing the subject's gesture from the determined hand motion.

In one embodiment, the hand detection component transforms the image captured by the image capture device into a grayscale image and uses a cascade classifier based on local binary patterns (LBP) to detect the subject's hand from the grayscale image.

In one embodiment, the hand tracking component tracks the subject's hand as follows: the range of the hand detected or tracked in the previous image and the difference image between the skin color image of the current image and the skin color image of the previous image are used to initially define a search range for tracking the hand in the skin color image of the current image, and a template matching method is performed to determine the range of the hand tracked in the current image, where the template matching method includes:

defining, in the search range, a plurality of first candidate hand ranges having the same size as a target template, and defining, in the difference image, a second candidate hand range having the same size as the target template, where the target template is the range of the hand detected or tracked in the previous image;

performing the following steps in a loop over the plurality of first candidate hand ranges until all of them have gone through the matching judgment below, thereby determining the candidate hand range that best matches the target template as the range of the hand tracked in the skin color image of the current image:

computing the mean of the absolute differences between the pixels of a first candidate hand range and the pixels of the target template as a first error;

if the first error is greater than a first predetermined threshold, the candidate hand range does not match the target template and is excluded;

if the first error is less than the first predetermined threshold, computing a second error, which is the value obtained by subtracting the mean pixel value of the second candidate hand range from the first error and multiplying the result by a predetermined adjustment coefficient;

if the second error is less than a second predetermined threshold, determining a match, that is, this first candidate hand range is determined to be the range of the hand tracked in the skin color image of the current image, and the value of the second error is used as the second threshold for the matching judgment of the next first candidate hand range.

In one embodiment, after the search range in the skin color image of the current image has been determined and before the template matching method is performed, the initially defined search range is corrected with reference to the difference image, and the plurality of candidate hand ranges are defined within the reduced search range and the template matching method is performed to determine the range of the hand in the current image, where the correction includes: gradually shrinking each side of the initially defined search range inward, and stopping the inward shrinking of a side when that side encounters a pixel whose value is greater than a predetermined threshold.

In one embodiment, in the template matching method, whenever a first candidate hand range is determined to match the target template, the match is verified; the verification includes: computing the mean of the absolute differences between the pixels of the first candidate hand range currently confirmed as the best match and the pixels of the hand range detected by the hand detection component as a third error; and judging whether the third error is greater than a third threshold, and if the third error is greater than the third predetermined threshold, judging that the first candidate hand range currently confirmed as the best match does not actually match, thereby excluding it.

In one embodiment, the skin color image used in the present invention can be obtained as follows: the mean of the G and B components is subtracted from the value of the R component of the RGB components of each pixel of the captured image to obtain a difference value; the difference value is then compared with a predetermined threshold, and if the difference value is smaller than the predetermined threshold, the corresponding pixel of the skin color image takes the value 0, and if the difference value is larger than the predetermined threshold, the corresponding pixel of the skin color image takes the difference value.

In one embodiment, the gesture recognition component determines the motion of the subject's hand from the position of the subject's hand detected or tracked by the hand detection component and the hand tracking component in each frame, the hand position orientation between two adjacent frames computed from the hand positions in each frame, and the displacement direction between two adjacent frames computed from two adjacent hand position orientations, and recognizes the subject's gesture by comparing the determined hand motion with a predefined mapping table between gestures and hand motion trajectories.

According to another aspect of the present invention, a vision-based gesture remote control method includes: capturing a series of images of a subject; recognizing the subject's gestures from the captured series of images; and triggering a predetermined operation command according to the recognition result, where recognizing the subject's gestures includes: detecting the subject's hand from a captured image; when the subject's hand is detected in an image, tracking the subject's hand in subsequent images; and determining the motion of the subject's hand from the detected hand and the tracked hand and recognizing the subject's gesture from the determined hand motion.

According to yet another aspect of the present invention, a method for converting an RGB image into a skin color image includes: subtracting the mean of the G and B components from the value of the R component of the RGB components of each pixel of the RGB image to obtain a difference value; and comparing the difference value with a predetermined threshold, where if the difference value is smaller than the predetermined threshold, the corresponding pixel of the skin color image takes the value 0, and if the difference value is larger than the predetermined threshold, the corresponding pixel of the skin color image takes the difference value.

According to yet another aspect of the present invention, a method for tracking a target in a sequence of images includes: using the range of the target detected or tracked in the previous image and the difference image between the skin color image of the current image and the skin color image of the previous image to initially define a search range for tracking the target in the skin color image of the current image; and performing a template matching method to determine the range of the target tracked in the current image, where the template matching method includes:

defining, in the search range, a plurality of first candidate target ranges having the same size as a target template, and defining, in the difference image, a second candidate target range having the same size as the target template, where the target template is the range of the target detected or tracked in the previous image;

performing the following steps in a loop over the plurality of first candidate target ranges until all of them have gone through the matching judgment below, thereby determining the candidate target range that best matches the target template as the range of the target tracked in the skin color image of the current image:

computing the mean of the absolute differences between the pixels of a first candidate target range and the pixels of the target template as a first error;

if the first error is greater than a first predetermined threshold, the candidate target range does not match the target template and is excluded;

if the first error is less than the first predetermined threshold, computing a second error, which is the value obtained by subtracting the mean pixel value of the second candidate target range from the first error and multiplying the result by a predetermined adjustment coefficient;

if the second error is less than a second predetermined threshold, determining a match, that is, this first candidate target range is determined to be the range of the target tracked in the skin color image of the current image, and the value of the second error is used as the second threshold for the matching judgment of the next first candidate target range.

With the vision-based gesture recognition system according to the present invention, the hand can be accurately detected, its motion tracked, and the gesture recognized from a continuously captured sequence of frame images in a simple and efficient manner, even when the hand moves across objects whose color is similar to skin color.

Brief Description of the Drawings

Fig. 1 shows the schematic structure of the gesture control system of the present invention;

Fig. 2 shows a flowchart of the gesture recognition processing performed by the gesture recognition device of the present invention;

Fig. 3 shows an example of a captured RGB image and the converted grayscale image;

Fig. 4 shows an example in which a hand is successfully detected in the converted grayscale image, where the range of the hand is shown by a rectangular box;

Fig. 5 shows an exemplary diagram of a skin color image used in the gesture recognition processing of the present invention;

Fig. 6 shows the difference image between the skin color images of two consecutive images for a hand moving to the right;

Fig. 7 (a), (b) and (c) show exemplary diagrams of the process of defining the search range for tracking the hand in the skin color image of the current image in the gesture recognition processing of the present invention;

Fig. 8 shows an exemplary diagram of the result after the initially defined search range has been corrected;

Fig. 9 shows an exemplary diagram of the process of tracking a hand moving across the face by using the improved template matching algorithm in the gesture recognition processing of the present invention; and

Fig. 10 shows exemplary diagrams for computing the hand position orientation, the change in hand position orientation, and the hand motion directions defined by the hand position orientation in the gesture recognition device of the present invention.

Detailed Description

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that, in this specification and the drawings, structural elements having substantially the same function and structure are denoted by the same reference numerals, and repeated description of these elements is omitted.

The present invention provides an efficient and robust vision-based gesture remote control system that works with a common web camera and is reliable in complex environments while requiring relatively little computation.

Fig. 1 is a diagram showing the schematic structure of the gesture control system of the present invention. As shown in Fig. 1, the vision-based gesture remote control system of the present invention mainly includes three parts: an image capture device 101 for capturing a series of images of a subject; a gesture recognition device 102 for recognizing the subject's gestures from the series of images captured by the image capture device 101 and sending the recognition result to an operation command triggering device 103; and the operation command triggering device 103 for triggering a predetermined operation command according to the recognition result sent from the gesture recognition device 102.

In this specification, a web camera is used as the image capture device, but the present invention is not limited thereto; any type of capture device, known now or in the future, that can capture video images may be used as the image capture device. In the present invention, the gesture recognition device is the core part of the gesture control system: it recognizes meaningful gestures and compares the recognized gesture actions with the predefined gestures, and if a recognized gesture action matches one of the predefined gestures, the recognition result is sent to the operation command triggering device to trigger the predetermined operation command. The gesture recognition processing of the present invention is described in detail below.

Before the gesture recognition processing of the present invention is described, the predefined gestures supported by this processing are first given, as shown in Table 1. As can be seen from Table 1, in addition to static gestures, the gesture recognition method of the present invention also supports dynamic gestures. For example, in a "short stay" the hand remains still for a very short time (for example, less than 1 second), whereas in a "stay" the hand remains still for more than 1 second; here "1 second" is only an example and the value can be set arbitrarily by the user. Further, when the hand moves to the left (right, up, down), the hand motion trajectory is a short stay followed by a movement to the left (right, up, down), and when the hand waves, the hand motion trajectory is a right-left-right-left movement (i.e., →, ←, →, ←). In addition, when the hand rotates clockwise, the hand motion trajectory is a clockwise circle, and when the hand rotates counterclockwise, the hand motion trajectory is a counterclockwise circle; here the trajectory may be a complete circle or most of a circle. Furthermore, if the user does not make any meaningful gesture, an arbitrary hand motion trajectory is recorded. Although the above gesture definitions are given here, the present invention is not limited thereto; the user can define gestures arbitrarily as needed, so the gestures available for recognition can be made more natural.

Table 1: Gesture definitions

Next, the specific procedure of the gesture recognition processing of the present invention will be described. The gesture recognition processing detects the subject's hand from the captured images, tracks the motion of the hand, and then recognizes which meaningful gesture command has been expressed according to the hand motion trajectory and the predefined gestures; the specific procedure is shown in Fig. 2.

Fig. 2 is a flowchart showing the gesture recognition processing performed by the gesture recognition device of the present invention. As shown in Fig. 2, the gesture recognition processing of the present invention consists of three main processing stages: detection, tracking, and recognition. Specifically, the following procedure is performed for each frame in the series of images of the subject captured by the image capture device. First, in S201, the image of the subject captured by the image capture device is normalized, for example to 160 pixels x 120 pixels; of course the present invention is not limited thereto, and the normalized size can be set arbitrarily by the user. In S202, it is determined from the records of hand detection and hand tracking whether a hand was detected and whether the hand was tracked in the previous frame; if no hand was detected in the previous frame, the hand detection processing is performed, and if the hand was already tracked in the previous frame, the hand tracking processing is performed. For example, in one embodiment, a flag isDectec is set to indicate whether a hand has been detected: isDectec is set to 1 when a hand is detected and to 0 when no hand is detected. In addition, in this embodiment, a flag isTrack is set to indicate whether the hand has been tracked: isTrack is set to 1 when the hand is tracked and to 0 when it is not. Therefore, in this embodiment, whether a hand was detected and tracked in the previous frame can be determined by checking the values of the flags isDectec and isTrack.

In the case where no hand was detected or tracked in the previous frame, that is, "No" in S202, the hand detection processing is performed. Specifically, as shown in Fig. 2, in S203 the current image is converted from an RGB image into a grayscale image; of course, if the captured image is already a grayscale image, this step is omitted. Fig. 3 shows an example of a captured RGB image and the converted grayscale image. In S204, a pre-trained cascade classifier based on LBP (local binary patterns) is used to detect the hand over the entire range of the converted grayscale image. Fig. 4 shows an example in which a hand is successfully detected in the converted grayscale image, where the range of the hand is shown by a rectangular box. As described above, if a hand is detected, isDectec is set to 1 in S206, and if no hand is detected, isDectec is set to 0 in S207; these records are saved for use when the next frame is processed.
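
For illustration only, the detection step (S203 to S204) can be prototyped with OpenCV's cascade classifier, which supports LBP features. The sketch below is a minimal example under that assumption; the cascade file name hand_lbp_cascade.xml is a hypothetical placeholder, since the patent does not specify how the classifier is trained or stored.

// Minimal sketch of S203-S204: grayscale conversion followed by LBP cascade detection.
#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

bool detectHand(const cv::Mat& bgrFrame, cv::CascadeClassifier& handCascade, cv::Rect& hand)
{
    cv::Mat gray;
    cv::cvtColor(bgrFrame, gray, cv::COLOR_BGR2GRAY);   // S203: convert the captured frame to grayscale
    std::vector<cv::Rect> hands;
    handCascade.detectMultiScale(gray, hands);           // S204: run the LBP cascade over the whole image
    if (hands.empty())
        return false;                                     // corresponds to isDectec = 0
    hand = hands[0];                                      // corresponds to isDectec = 1; hand range as a rectangle
    return true;
}

// Usage (illustrative): cv::CascadeClassifier c; c.load("hand_lbp_cascade.xml");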

When no hand is detected in the converted grayscale image, the gesture recognition method of the present invention returns to S201 to process the next frame. On the other hand, when a hand is successfully detected in the converted grayscale image, the current RGB image is converted into a skin color image in S208; this makes it easier to separate the skin color regions from the background, thereby reducing the influence of the background when hand tracking is performed on the next frame. More specifically, the method used in the present invention for converting an RGB image into a skin color image is as follows: let the value of a pixel in the skin color image be denoted by s, let the values of the R, G and B components of each pixel in the RGB image be denoted by r, g and b respectively, and let Temp be a temporary intermediate variable; then the value of s is defined by the following formulas:

Temp = r - ((g + b) / 2);

Temp = MAX(0, Temp);

s = Temp > 140 ? 0 : Temp;

That is, first, the mean of the G and B components is subtracted from the value of the R component of the RGB components of each pixel of the RGB image to obtain a difference value; then the difference value is compared with a predetermined threshold, and if the difference value is smaller than the predetermined threshold, the corresponding pixel of the skin color image takes the value 0, and if the difference value is larger than the predetermined threshold, the corresponding pixel of the skin color image takes the difference value. For example, Fig. 5 shows an example of a skin color image obtained by converting a captured RGB image with a predetermined threshold of 140, where the rectangular box indicates that a hand has been detected; as described above, the skin color region is separated from the background. Although the above method of converting an RGB image into a skin color image is described here, the user may also use other skin color segmentation methods to obtain the skin color image as needed. In addition, the skin color image of the current image is saved for the difference image computation when hand tracking is performed on the next image, and the parameters representing the range of the detected hand are also saved for the template matching and error evaluation during hand tracking on the next image.
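
For illustration, the conversion can be written out directly from the formulas above; the sketch below follows the stated rule s = Temp > 140 ? 0 : Temp, with the threshold of 140 taken from the example, and uses an OpenCV matrix only for convenience.

// Minimal sketch of the skin color conversion (S208/S209).
#include <opencv2/core.hpp>
#include <algorithm>

cv::Mat toSkinColorImage(const cv::Mat& bgr, int threshold = 140)
{
    cv::Mat skin(bgr.rows, bgr.cols, CV_8UC1);
    for (int y = 0; y < bgr.rows; ++y) {
        for (int x = 0; x < bgr.cols; ++x) {
            const cv::Vec3b& px = bgr.at<cv::Vec3b>(y, x);        // OpenCV stores channels as B, G, R
            int b = px[0], g = px[1], r = px[2];
            int temp = r - (g + b) / 2;                           // Temp = r - ((g + b) / 2)
            temp = std::max(0, temp);                             // Temp = MAX(0, Temp)
            skin.at<uchar>(y, x) = (temp > threshold) ? 0 : temp; // s = Temp > 140 ? 0 : Temp
        }
    }
    return skin;
}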

In the case where a hand was already detected or tracked in the previous image, that is, "Yes" in S202, the hand tracking processing is performed. The hand tracking processing of the present invention uses an improved template matching method, which is faster to compute and can tolerate certain changes in hand shape while the hand is moving. Specifically, in S209, the current image is converted from an RGB image into a skin color image, using the conversion described above. In S210, the difference image between the skin color image of the current image and the saved skin color image of the previous image is computed. In computing the difference image, only difference values along the direction of hand motion are kept, while values opposite to the direction of hand motion are discarded. Specifically, first, the difference between the value of each pixel of the skin color image of the current image and the value of the corresponding pixel of the skin color image of the previous image is computed. This difference is then compared with 0: if it is greater than 0, the corresponding pixel of the difference image takes the difference, and if it is less than 0, the corresponding pixel of the difference image takes 0. Fig. 6 shows the difference image between the skin color images of two consecutive images for a hand moving to the right.
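
A corresponding sketch of the one-sided difference image of S210, which keeps only positive differences between the current and previous skin color images (the two images are assumed to have the same size):

// Minimal sketch of the difference image used for tracking.
#include <opencv2/core.hpp>

cv::Mat motionDifference(const cv::Mat& currSkin, const cv::Mat& prevSkin)
{
    cv::Mat diff(currSkin.rows, currSkin.cols, CV_8UC1);
    for (int y = 0; y < currSkin.rows; ++y) {
        for (int x = 0; x < currSkin.cols; ++x) {
            int d = int(currSkin.at<uchar>(y, x)) - int(prevSkin.at<uchar>(y, x));
            diff.at<uchar>(y, x) = (d > 0) ? d : 0;   // differences opposite to the motion are discarded
        }
    }
    return diff;
}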

Next, in S210, the hand is tracked in the skin color image of the current image according to the hand detected or tracked in the previous image and the difference image between the skin color image of the current image and that of the previous image. Specifically, first, the range of the hand detected or tracked in the previous image is used to initially define the search range for tracking the hand in the skin color image of the current image. For example, in the case where the range of the hand is represented by a rectangular box, the search range can be defined by the following formulas:

Range.x = MAX(0, Target.x - 15);

Range.y = MAX(0, Target.y - 15);

Range.width = MIN(a - Range.x, Target.width + 30);

Range.height = MIN(b - Range.y, Target.height + 30),

where Target.x and Target.y denote the horizontal and vertical coordinates of the top-left corner of the rectangular box of the hand detected or tracked in the previous image, Target.width and Target.height denote the width and height of that rectangular box, Range.x and Range.y denote the horizontal and vertical coordinates of the top-left corner of the search range in the skin color image of the current image, Range.width and Range.height denote the width and height of the search range in the skin color image of the current image, and a and b denote the numbers of horizontal and vertical pixels of the image, respectively. The values "15" and "30" are set empirically and, depending on the situation, may be other values that make the initially defined search range as close as possible to the tracked hand range.

In order to reduce the amount of computation in the template matching method performed in the next step and to improve accuracy, after the search range has been initially defined as described above, the difference image between the skin color image of the current image and the skin color image of the previous image is used to correct the initially defined search range, that is, the region of interest in the difference image is taken as the corrected search range. Fig. 7 shows (a) the range of the hand detected or tracked in the previous image, (b) the difference image between the skin color image of the current image and the skin color image of the previous image, and (c) the search range defined in the skin color image of the current image, where the number in the upper-left corner of each picture exemplarily indicates the frame number of the image. Fig. 8 shows a diagram of search range reduction as an example of the search range correction, in which the four sides of the initially defined search range are gradually shrunk inward, and a side stops shrinking inward when it encounters a pixel whose value is greater than a predetermined threshold (for example, 5).
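
For illustration, the correction of the search range might be implemented as in the sketch below; the order in which the four sides are shrunk and the threshold of 5 are assumptions based on the example above.

// Minimal sketch of shrinking the initial search range against the difference image.
#include <opencv2/core.hpp>

cv::Rect shrinkSearchRange(const cv::Mat& diff, cv::Rect range, int threshold = 5)
{
    auto rowHasMotion = [&](int y) {
        for (int x = range.x; x < range.x + range.width; ++x)
            if (diff.at<uchar>(y, x) > threshold) return true;
        return false;
    };
    auto colHasMotion = [&](int x) {
        for (int y = range.y; y < range.y + range.height; ++y)
            if (diff.at<uchar>(y, x) > threshold) return true;
        return false;
    };
    while (range.height > 1 && !rowHasMotion(range.y))                    { ++range.y; --range.height; } // top side
    while (range.height > 1 && !rowHasMotion(range.y + range.height - 1)) { --range.height; }            // bottom side
    while (range.width  > 1 && !colHasMotion(range.x))                    { ++range.x; --range.width; }  // left side
    while (range.width  > 1 && !colHasMotion(range.x + range.width - 1))  { --range.width; }             // right side
    return range;
}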

After the search range has been defined in the skin color image of the current image, in S212, the template matching method is performed to determine the range of the hand tracked in the current image. Here, the template matching method means taking the range of the hand detected or tracked in the previous image as the target template and comparing candidate hand ranges with this template; if the error is smaller than a certain value, a match is determined. In order to reduce tracking errors when the hand crosses objects having skin color (for example, the face or another hand), the above template matching method takes into account the motion information from the difference image, specifically:

a plurality of first candidate hand ranges having the same size as the target template are defined in the search range, and a second candidate hand range having the same size as the target template is defined in the difference image;

the following steps are performed in a loop over the plurality of first candidate hand ranges until all of them have gone through the matching judgment below, thereby determining the candidate hand range that best matches the target template as the range of the hand tracked in the skin color image of the current image: the mean of the absolute differences between the pixels of a first candidate hand range and the pixels of the target template is computed as a first error; if the first error is greater than a first predetermined threshold, the candidate hand range does not match the target template and is excluded; if the first error is less than the first predetermined threshold, a second error is computed, which is the value obtained by subtracting the mean pixel value of the second candidate hand range from the first error and multiplying the result by a predetermined adjustment coefficient; if the second error is less than a second predetermined threshold, a match is determined, that is, this first candidate hand range is determined to be the range of the hand tracked in the skin color image of the current image, and the value of the second error is used as the second threshold for the matching judgment of the next first candidate hand range.
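
A compact sketch of this matching loop is given below. The pixel step of the sliding candidate window, the thresholds thr1 and thr2, and the adjustment coefficient coeff are not fixed by the text and are shown as illustrative parameters; taking the second candidate range at the same location in the difference image as the current first candidate is also an interpretation, since the text does not pin down its position.

// Minimal sketch of the improved template matching (S212).
#include <opencv2/core.hpp>

static double meanAbsDiff(const cv::Mat& a, const cv::Mat& b)
{
    double sum = 0.0;
    for (int y = 0; y < a.rows; ++y)
        for (int x = 0; x < a.cols; ++x) {
            int d = int(a.at<uchar>(y, x)) - int(b.at<uchar>(y, x));
            sum += (d < 0) ? -d : d;
        }
    return sum / (a.rows * a.cols);
}

// Returns true and sets `best` to the best-matching hand range found in `skin`.
bool matchTemplateWithMotion(const cv::Mat& skin, const cv::Mat& diff,
                             const cv::Mat& templ, const cv::Rect& search,
                             cv::Rect& best,
                             double thr1 = 60.0, double thr2 = 40.0, double coeff = 0.5)
{
    bool found = false;
    for (int y = search.y; y + templ.rows <= search.y + search.height; ++y) {
        for (int x = search.x; x + templ.cols <= search.x + search.width; ++x) {
            cv::Rect cand(x, y, templ.cols, templ.rows);
            double err1 = meanAbsDiff(skin(cand), templ);            // first error
            if (err1 > thr1) continue;                               // candidate excluded
            double err2 = (err1 - cv::mean(diff(cand))[0]) * coeff;  // second error
            if (err2 < thr2) {                                       // match: this error becomes the new threshold
                thr2 = err2;
                best = cand;
                found = true;
            }
        }
    }
    return found;
}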

As described above, because this template matching method finds the best-matching hand range step by step by referring to the previously accepted candidate hand range, it can tolerate changes in hand shape during continuous motion. However, in continuous tracking this also leads to an accumulation of matching errors. Therefore, the historical matching error can be limited by adding an additional verification process, which includes: computing the mean of the absolute differences between the pixels of the first candidate hand range currently confirmed as the best match and the pixels of the hand range detected by the hand detection component as a third error; and judging whether the third error is greater than a third threshold, and if the third error is greater than the third predetermined threshold, judging that the first candidate hand range currently confirmed as the best match does not actually match, thereby excluding it. Fig. 9 shows the tracking results when the hand moves across the face; to show the effect more clearly, spaced-apart image frames are selectively shown, and the rectangular boxes indicate the tracked hand.
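
The verification of the third error can be sketched in the same style; thr3 is an illustrative threshold, and detectedHand is assumed to be the saved skin color patch of the hand range found by the hand detection component, with the same size as the tracked range.

// Minimal sketch of the verification step: reject the current best match if it
// drifts too far from the hand appearance saved at detection time.
#include <opencv2/core.hpp>

bool verifyMatch(const cv::Mat& skin, const cv::Rect& best,
                 const cv::Mat& detectedHand, double thr3 = 80.0)
{
    cv::Mat cand = skin(best);
    double sum = 0.0;
    for (int y = 0; y < cand.rows; ++y)
        for (int x = 0; x < cand.cols; ++x) {
            int d = int(cand.at<uchar>(y, x)) - int(detectedHand.at<uchar>(y, x));
            sum += (d < 0) ? -d : d;
        }
    double err3 = sum / (cand.rows * cand.cols);   // third error
    return err3 <= thr3;                           // false means the candidate is excluded
}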

Next, in S213, it is judged whether the hand has been successfully tracked in S212. If the hand has been successfully tracked, the flag isTrack is set to 1 in S214 as described above; otherwise the flag isTrack is set to 0 in S215 and the method proceeds to S203 to perform the hand detection processing on the current image.

In the case where the flag isTrack is set to 1 in S214, that is, the hand has been tracked in the current image, in S216 the skin color image of the current image is saved for the difference image computation when the next frame is processed, the parameters representing the range of the hand tracked in the current image are saved for the template matching processing of the next frame, and the horizontal and vertical coordinates of the position of the hand tracked in the current image are defined respectively as the sum of the horizontal coordinate and half of the width of the range of the hand detected or tracked in the previous image, and the sum of the vertical coordinate and half of the height of the range of the hand detected or tracked in the previous image. If HandPos(x, y) denotes the coordinates of the position of the hand tracked in the current image, it is expressed by the following formula:

HandPos(x, y) = (Target.x + Target.width / 2, Target.y + Target.height / 2)

Next, in S217, after hand detection or hand tracking has been performed on all the images in the series of images captured by the image capture device, the hand motion trajectory can be obtained from the recorded sequence of hand positions. The subject's gesture can be recognized by analyzing this motion trajectory and referring to the defined mapping between gestures and motion trajectories. Specifically, first, the hand position orientation Orient between the hand positions detected or tracked in every two adjacent images is computed, that is, the angle between the horizontal direction and the straight line formed by the hand position detected or tracked in the current image and the hand position detected or tracked in the previous image.

Next, the displacement direction DeltaOrient between two adjacent images is computed. Specifically, a difference is obtained by subtracting the preceding hand position orientation LastOrient from the hand position orientation Orient of the current image relative to the previous image. If the absolute value of this difference is greater than 180, then when the difference is greater than 0, 360 is subtracted from the difference and the result is taken as the displacement direction DeltaOrient, and when the difference is less than 0, 360 is added to the difference and the result is taken as the displacement direction DeltaOrient. If the absolute value of the difference is not greater than 180, the difference itself is taken as the displacement direction DeltaOrient.
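
A sketch of the two quantities follows. The use of atan2 and the normalization of Orient to [0, 360) degrees are assumptions; the patent only states that Orient is the angle between the line through the two hand positions and the horizontal direction.

// Minimal sketch of Orient and DeltaOrient (angles in degrees).
#include <cmath>

const double kPi = 3.14159265358979323846;

double computeOrient(double prevX, double prevY, double currX, double currY)
{
    // Image y grows downward, so the y difference is flipped to make "up" positive.
    double deg = std::atan2(prevY - currY, currX - prevX) * 180.0 / kPi;
    if (deg < 0.0) deg += 360.0;                 // normalize to [0, 360)
    return deg;
}

double computeDeltaOrient(double orient, double lastOrient)
{
    double d = orient - lastOrient;
    if (std::fabs(d) > 180.0)
        d += (d > 0.0) ? -360.0 : 360.0;         // wrap into (-180, 180]
    return d;
}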

Then, for the 8 meaningful gestures ("arbitrary" is used to record the real-time motion trajectory, and "short stay" is set as the preparatory action for the up/down/left/right gestures; stay, up/down/left/right and wave form one group, which is based on the value of Orient; CW/CC form another group, which is based on the value of DeltaOrient and its accumulated sum over a period of time; in particular, the wave gesture consists of alternating leftward and rightward movements), gesture recognition is performed as follows:

Stay:

To recognize the stay gesture, the hand must be detected as remaining still for STAY_NUM consecutive frames, that is, for STAY_NUM consecutive frames the hand position tracked in the current image is always the same as the hand position detected or tracked in the previous image, where STAY_NUM is a predefined number of frames.

Up/down/left/right:

To recognize an up/down/left/right gesture, first the preparatory action "short stay" must be detected, and then, over the following DIREC_NUM frames, the value of Orient must remain within a certain range corresponding to one direction, where DIREC_NUM is a predefined number of frames. As an example, assume the "short stay" action is remaining still for 3 frames. Fig. 10(b) shows the ranges of Orient values for the up/down/left/right directions. For example, if 46 <= Orient <= 134, the movement is considered to be in the upward direction.
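
For illustration, the Orient ranges can be mapped to directions as in the sketch below; only the 46 to 134 degree range for the upward direction is given in the text, so the remaining ranges are symmetric assumptions.

// Minimal sketch of per-frame direction classification from Orient (degrees in [0, 360)).
enum class Direction { Up, Down, Left, Right, None };

Direction classifyDirection(double orient)
{
    if (orient >= 46.0 && orient <= 134.0)  return Direction::Up;     // range given in the text
    if (orient >= 226.0 && orient <= 314.0) return Direction::Down;   // assumed symmetric range
    if (orient >= 136.0 && orient <= 224.0) return Direction::Left;   // assumed symmetric range
    if (orient <= 44.0 || orient >= 316.0)  return Direction::Right;  // assumed symmetric range
    return Direction::None;                                           // near-diagonal angles left unclassified
}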

Wave:

To recognize the wave gesture, four consecutive motion segments must be detected: right-left-right-left. To detect each motion segment, the Orient value must remain within the corresponding value range for N consecutive frames (MIN_SEG_NUM <= N <= MAX_SEG_NUM), where MIN_SEG_NUM and MAX_SEG_NUM are predefined frame-count thresholds.

Clockwise rotation:

To recognize the clockwise rotation gesture, over a number of consecutive frames the value of DeltaOrient must remain positive, the absolute value of DeltaOrient must remain within a certain range (for example, greater than 10 and less than or equal to 50), and the sum of the absolute values of DeltaOrient must reach a predetermined threshold CIRCLE_DEGREE.
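
A sketch of the clockwise accumulation test follows; the per-frame bounds are taken from the example above, while the value of CIRCLE_DEGREE and the resetting of the accumulator when the per-frame condition is broken are illustrative assumptions.

// Minimal sketch of clockwise rotation detection: accumulate DeltaOrient while each
// per-frame value stays positive and within (10, 50], and report the gesture once the
// accumulated rotation reaches CIRCLE_DEGREE (300 degrees here, roughly "most of a circle").
struct RotationDetector {
    double accumulated = 0.0;

    // Feed one DeltaOrient value per frame; returns true when a clockwise rotation is recognized.
    bool feedClockwise(double deltaOrient, double circleDegree = 300.0) {
        if (deltaOrient > 10.0 && deltaOrient <= 50.0) {
            accumulated += deltaOrient;
            if (accumulated >= circleDegree) { accumulated = 0.0; return true; }
        } else {
            accumulated = 0.0;   // streak broken: restart the accumulation
        }
        return false;
    }
};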

Counterclockwise rotation:

The judgment for counterclockwise rotation is the same as that for clockwise rotation, except that the value of DeltaOrient must remain negative.

After a gesture is recognized as described above, all counters for the corresponding gesture candidates are reset. In addition, the system is "frozen" for a few frames, that is, no gesture is recognized during those frames. This mechanism is introduced to avoid false recognition of unintentional gestures. For example, if the user wishes to make rightward gestures repeatedly, he or she will naturally move the hand back after completing one rightward gesture before making the next. Without this "freeze" mechanism, this "moving back" would very likely be misrecognized as a leftward gesture.

In this way, the subject's gesture is recognized.

Next, in S219, the gesture obtained as the recognition result is output to the operation command triggering device.

Next, in the operation command triggering device, which serves as the interface between gestures and the corresponding operation commands, the corresponding operation command is triggered according to the gesture recognition result sent from the gesture recognition device by looking up a predefined mapping table of gestures and operation commands. An example of such a mapping table is shown in Table 2 below, which gives the mapping between the 8 gestures and operations on the Windows picture viewer software. Although the mapping given here is between the 8 gestures and operations on the Windows picture viewer software, the present invention is not limited thereto; mappings between more gestures and corresponding operations can be defined, and the operations may involve not only operations on various types of software but also operations on different functions of various electronic devices, as defined by the user as needed. For example, the present invention can be applied to gesture-based mouse control.

Gesture | Operation command
Short stay | Enter (carriage return)
Left | Previous / move left if the picture is zoomed in
Right | Next / move right if the picture is zoomed in
Up | Move up if the picture is zoomed in
Down | Move down if the picture is zoomed in
Wave | Exit
Clockwise rotation | Zoom in
Counterclockwise rotation | Zoom out

Table 2: Mapping between gestures and operation commands
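
For illustration, the triggering device can be a plain lookup from the recognized gesture to a command; the enum values and command strings below are illustrative placeholders rather than part of the patent.

// Minimal sketch of the gesture-to-command lookup performed by the triggering device.
#include <map>
#include <string>

enum class Gesture { ShortStay, Left, Right, Up, Down, Wave, Clockwise, Counterclockwise };

std::string lookupCommand(Gesture g)
{
    static const std::map<Gesture, std::string> table = {
        { Gesture::ShortStay,        "enter" },
        { Gesture::Left,             "previous / move left when zoomed" },
        { Gesture::Right,            "next / move right when zoomed" },
        { Gesture::Up,               "move up when zoomed" },
        { Gesture::Down,             "move down when zoomed" },
        { Gesture::Wave,             "exit" },
        { Gesture::Clockwise,        "zoom in" },
        { Gesture::Counterclockwise, "zoom out" },
    };
    auto it = table.find(g);
    return it != table.end() ? it->second : "";
}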

With the vision-based gesture recognition system according to the present invention, the hand can be accurately detected, its motion tracked, and the gesture recognized from a continuously captured sequence of frame images in a simple and efficient manner, even when the hand moves across objects whose color is similar to skin color.

The components of the vision-based gesture recognition remote control system according to the present invention and the corresponding operations in each component have been described above; however, various changes and modifications can be made without departing from the spirit of the present invention, and such changes and modifications also fall within the scope of the present application.

Claims (9)

1. A vision-based gesture remote control system, comprising: an image capture device for capturing a series of images of a subject; a gesture recognition device for recognizing the subject's gestures from the series of images captured by the image capture device and sending the recognition result to an operation command triggering device; and the operation command triggering device for triggering a predetermined operation command according to the recognition result sent from the gesture recognition device, wherein the gesture recognition device comprises: a hand detection component for detecting the subject's hand from an image captured by the image capture device; a hand tracking component for tracking the subject's hand in subsequent images when the hand detection component has detected the subject's hand in an image; and a gesture recognition component for determining the motion of the subject's hand from the hand detected by the hand detection component and the hand tracked by the hand tracking component and recognizing the subject's gesture from the determined hand motion, wherein the hand tracking component tracks the subject's hand by: using the range of the hand detected or tracked in the previous image and the difference image between the skin color image of the current image and the skin color image of the previous image to initially define a search range for tracking the hand in the skin color image of the current image; and performing a template matching method to determine the range of the hand tracked in the current image, wherein the template matching method comprises: defining, in the search range, a plurality of first candidate hand ranges having the same size as a target template, and defining, in the difference image, a second candidate hand range having the same size as the target template, wherein the target template is the range of the hand detected or tracked in the previous image; and performing the following steps in a loop over the plurality of first candidate hand ranges until all of them have gone through the matching judgment below, thereby determining the candidate hand range that best matches the target template as the range of the hand tracked in the skin color image of the current image: computing the mean of the absolute differences between the pixels of a first candidate hand range and the pixels of the target template as a first error; if the first error is greater than a first predetermined threshold, the candidate hand range does not match the target template and is excluded; if the first error is less than the first predetermined threshold, computing a second error, which is the value obtained by subtracting the mean pixel value of the second candidate hand range from the first error and multiplying the result by a predetermined adjustment coefficient; and if the second error is less than a second predetermined threshold, determining a match, that is, determining this first candidate hand range to be the range of the hand tracked in the skin color image of the current image, and using the value of the second error as the second threshold for the matching judgment of the next first candidate hand range.

2. The vision-based gesture remote control system according to claim 1, wherein the hand detection component transforms the image captured by the image capture device into a grayscale image and uses a cascade classifier based on local binary patterns to detect the subject's hand from the grayscale image.

3. The vision-based gesture remote control system according to claim 1, wherein, after the search range in the skin color image of the current image has been determined and before the template matching method is performed, the initially defined search range is corrected with reference to the difference image, and the plurality of candidate hand ranges are defined within the reduced search range and the template matching method is performed to determine the range of the hand in the current image, wherein the correction comprises: gradually shrinking each side of the initially defined search range inward, and stopping the inward shrinking of a side when that side encounters a pixel whose value is greater than a predetermined threshold.

4. The vision-based gesture remote control system according to claim 1, wherein, in the template matching method, whenever a first candidate hand range is determined to match the target template, the match is verified, the verification comprising: computing the mean of the absolute differences between the pixels of the first candidate hand range currently confirmed as the best match and the pixels of the hand range detected by the hand detection component as a third error; and judging whether the third error is greater than a third threshold, and if the third error is greater than the third predetermined threshold, judging that the first candidate hand range currently confirmed as the best match does not actually match, thereby excluding it.

5. The vision-based gesture remote control system according to any one of claims 1 to 4, wherein the skin color image is obtained by: subtracting the mean of the G and B components from the value of the R component of the RGB components of each pixel of the captured image to obtain a difference value; and comparing the difference value with a predetermined threshold, wherein if the difference value is smaller than the predetermined threshold, the corresponding pixel of the skin color image takes the value 0, and if the difference value is larger than the predetermined threshold, the corresponding pixel of the skin color image takes the difference value.

6. The vision-based gesture remote control system according to claim 1, wherein the gesture recognition component determines the motion of the subject's hand from the position of the subject's hand detected or tracked by the hand detection component and the hand tracking component in each frame, the hand position orientation between two adjacent frames computed from the hand positions in each frame, and the displacement direction between two adjacent frames computed from two adjacent hand position orientations, and recognizes the subject's gesture by comparing the determined hand motion with a predefined mapping table between gestures and hand motion trajectories.

7. A vision-based gesture remote control method, comprising: capturing a series of images of a subject; recognizing the subject's gestures from the captured series of images; and triggering a predetermined operation command according to the recognition result, wherein recognizing the subject's gestures comprises: detecting the subject's hand from a captured image; when the subject's hand is detected in an image, tracking the subject's hand in subsequent images; and determining the motion of the subject's hand from the detected hand and the tracked hand and recognizing the subject's gesture from the determined hand motion, the gesture remote control method further comprising: using the range of the hand detected or tracked in the previous image and the difference image between the skin color image of the current image and the skin color image of the previous image to initially define a search range for tracking the hand in the skin color image of the current image; and performing a template matching method to determine the range of the hand tracked in the current image, wherein the template matching method comprises: defining, in the search range, a plurality of first candidate hand ranges having the same size as a target template, and defining, in the difference image, a second candidate hand range having the same size as the target template, wherein the target template is the range of the hand detected or tracked in the previous image;
The size of the template is the same size, wherein the target template is the range of the hand detected or tracked in the previous image; 针对所述多个第一候选手范围循环执行如下步骤直到这多个第一候选手范围都经过如下匹配判断处理为止,从而确定出与目标模板最匹配的候选手范围作为在当前图像的肤色图像中跟踪到的手的范围:For the multiple first candidate hand ranges, perform the following steps in a loop until the multiple first candidate hand ranges have undergone the following matching judgment processing, so as to determine the candidate hand range that best matches the target template as the skin color image in the current image The range of hands tracked in: 计算一个第一候选手范围与目标模板的各像素的绝对差的平均值作为第一误差;Calculate the average value of the absolute difference of each pixel of the range of a first candidate hand and the target template as the first error; 如果该第一误差大于第一预定阈值,则表示该个候选手范围不与目标模板匹配,从而被排除;If the first error is greater than the first predetermined threshold, it means that the range of the candidate hand does not match the target template, thereby being excluded; 如果该第一误差小于第一预定阈值,则计算第二误差,第二误差是通过将第一误差减去第二候选手范围的各像素的值的平均值所获得的值乘以预定的调节系数而得到的值;If the first error is less than a first predetermined threshold, a second error is calculated, the second error being the value obtained by subtracting the average value of the values of the pixels of the second candidate hand range from the first error by a predetermined adjustment The value obtained by the coefficient; 如果该第二误差小于第二预定阈值,则确定匹配,即该个第一候选手范围被确定为在当前图像的肤色范围中跟踪到的手的范围,并且第二误差的值作为第二阈值以便对下一个第一候选手范围进行匹配判断。If the second error is less than the second predetermined threshold, it is determined to match, that is, the first candidate hand range is determined to be the range of the hand tracked in the skin color range of the current image, and the value of the second error is used as the second threshold In order to make a matching judgment on the range of the next first candidate. 8.根据权利要求7所述的基于视觉的手势遥控方法,其中,肤色图像是通过如下步骤而得到的:8. The vision-based gesture remote control method according to claim 7, wherein the skin color image is obtained through the following steps: 将所捕获的图像的每个像素的RGB分量的R分量的值减去G分量与B分量的均值,以得到一差值;Subtracting the mean value of the G component and the B component from the value of the R component of the RGB component of each pixel of the captured image to obtain a difference; 将所述差值与预定阈值比较,如果所述差值比所述预定阈值小,则肤色图像的相应像素的值取0,并且如果所述差值比所述预定阈值大,则肤色图像的相应像素的值取所述差值。Comparing the difference with a predetermined threshold, if the difference is smaller than the predetermined threshold, the value of the corresponding pixel of the skin color image is 0, and if the difference is larger than the predetermined threshold, the color of the skin color image is The value of the corresponding pixel takes the difference. 9.一种用于在图像序列中跟踪目标的方法,包括:9. 
A method for tracking an object in a sequence of images, comprising: 利用在前一图像中检测或跟踪到的目标的范围以及当前图像的肤色图像与前一图像的肤色图像之间的差分图像,来初始定义当前图像的肤色图像中的用于跟踪目标的搜索范围;Use the range of the target detected or tracked in the previous image and the difference image between the skin color image of the current image and the skin color image of the previous image to initially define the search range for tracking the target in the skin color image of the current image ; 执行模板匹配法以确定作为跟踪到的当前图像的目标的范围,其中,所述模板匹配法包括:Executing a template matching method to determine the range of the target as the tracked current image, wherein the template matching method includes: 在所述搜索范围中定义多个第一候选目标范围,该些第一候选目标范围具有与目标模板的大小相同的大小,并且,在所述差分图像中定义第二候选目标范围,该候选目标范围具有与目标模板的大小相同的大小,其中,所述目标模板为在前一图像中检测或跟踪到的目标的范围;A plurality of first candidate target ranges are defined in the search range, the first candidate target ranges have the same size as the target template, and a second candidate target range is defined in the difference image, the candidate target ranges the range has the same size as the size of the target template, wherein the target template is the range of the target detected or tracked in the previous image; 针对所述多个第一候选目标范围循环执行如下步骤直到这多个第一候选目标范围都经过如下匹配判断处理为止,从而确定出与目标模板最匹配的候选目标范围作为在当前图像的肤色图像中跟踪到的目标的范围:For the multiple first candidate target ranges, the following steps are cyclically executed until the multiple first candidate target ranges have undergone the following matching judgment processing, so as to determine the candidate target range that best matches the target template as the skin color image in the current image Range of targets tracked in : 计算一个第一候选目标范围与目标模板的各像素的绝对差的平均值作为第一误差;Calculating the average value of the absolute difference between the first candidate target range and each pixel of the target template as the first error; 如果该第一误差大于第一预定阈值,则表示该个候选目标范围不与目标模板匹配,从而被排除;If the first error is greater than the first predetermined threshold, it means that the candidate target range does not match the target template, and thus is excluded; 如果该第一误差小于第一预定阈值,则计算第二误差,第二误差是通过将第一误差减去第二候选目标范围的各像素的值的平均值所获得的值乘以预定的调节系数而得到的值;If the first error is smaller than a first predetermined threshold, a second error is calculated, the second error being a value obtained by subtracting the average value of the values of the pixels of the second candidate target range from the first error by a predetermined adjustment The value obtained by the coefficient; 如果该第二误差小于第二预定阈值,则确定匹配,即该个第一候选目标范围被确定为在当前图像的肤色范围中跟踪到的目标的范围,并且第二误差的值作为第二阈值以便对下一个第一候选目标范围进行匹配判断。If the second error is smaller than the second predetermined threshold, match is determined, that is, the first candidate target range is determined to be the range of the target tracked in the skin color range of the current image, and the value of the second error is used as the second threshold In order to perform matching judgment on the next first candidate target range.
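Claim 2 detects the hand with a cascade classifier trained on local binary pattern (LBP) features over a grayscale image. A minimal sketch of that step using OpenCV's generic cascade mechanism follows; the XML file name is a placeholder for a user-trained LBP hand cascade, and the detection parameters are assumptions rather than values taken from the patent.

```python
import cv2

# "lbp_hand_cascade.xml" is hypothetical: an LBP cascade trained on hand samples.
hand_cascade = cv2.CascadeClassifier("lbp_hand_cascade.xml")

def detect_hand(frame_bgr):
    """Return candidate hand boxes (x, y, w, h) in one frame, as in claim 2."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)  # grayscale conversion
    return hand_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5, minSize=(40, 40))
```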
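Claims 5 and 8 build the skin-color image from R − (G + B)/2, zeroed below a threshold. A minimal sketch of that transform, assuming an 8-bit BGR frame as delivered by OpenCV and a hypothetical threshold value:

```python
import numpy as np

def skin_color_image(frame_bgr, threshold=20.0):
    """Per-pixel R - (G + B)/2, kept only above the threshold (claims 5 and 8)."""
    b = frame_bgr[..., 0].astype(np.float32)
    g = frame_bgr[..., 1].astype(np.float32)
    r = frame_bgr[..., 2].astype(np.float32)
    diff = r - (g + b) / 2.0
    return np.where(diff > threshold, diff, 0.0)  # 0 below threshold, difference above it
```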
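Claim 3 narrows the initial search range over the difference image by pulling each edge inward until it meets a pixel above a threshold. One possible reading, sketched with an assumed box layout (x0, y0, x1, y1) and an assumed threshold:

```python
import numpy as np

def shrink_search_range(diff_img, box, threshold=15.0):
    """Shrink the box inward; each edge stops at the first row/column that
    contains a pixel above the threshold (claim 3)."""
    x0, y0, x1, y1 = box
    while y0 < y1 - 1 and not (diff_img[y0, x0:x1] > threshold).any():
        y0 += 1                                   # top edge moves down
    while y1 - 1 > y0 and not (diff_img[y1 - 1, x0:x1] > threshold).any():
        y1 -= 1                                   # bottom edge moves up
    while x0 < x1 - 1 and not (diff_img[y0:y1, x0] > threshold).any():
        x0 += 1                                   # left edge moves right
    while x1 - 1 > x0 and not (diff_img[y0:y1, x1 - 1] > threshold).any():
        x1 -= 1                                   # right edge moves left
    return x0, y0, x1, y1
```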
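The two-stage matching loop recited in claims 1, 7 and 9 can be sketched as follows. This is an illustration of the claimed logic only, not the patentee's implementation: the scan step, the adjustment coefficient and the initial thresholds are assumptions, the second error is read here as (first error − mean of the difference-image patch) × coefficient, and the second candidate range is taken at the same location as the first candidate, which the claims leave open.

```python
import numpy as np

def track_by_template(skin_cur, diff_img, template, search_box,
                      t1=40.0, t2=30.0, coeff=1.0, step=2):
    """Return the best-matching box (x, y, w, h) in the current skin-color image,
    or None if every candidate is rejected (claims 1, 7 and 9)."""
    th, tw = template.shape
    x0, y0, x1, y1 = search_box
    tmpl = template.astype(np.float32)
    best = None
    for y in range(y0, y1 - th + 1, step):
        for x in range(x0, x1 - tw + 1, step):
            cand = skin_cur[y:y + th, x:x + tw].astype(np.float32)
            e1 = np.mean(np.abs(cand - tmpl))                # first error
            if e1 > t1:                                      # too dissimilar: exclude
                continue
            motion = np.mean(diff_img[y:y + th, x:x + tw])   # second candidate range (assumed co-located)
            e2 = (e1 - motion) * coeff                       # second error
            if e2 < t2:                                      # accept and tighten the threshold
                best = (x, y, tw, th)
                t2 = e2
    return best
```

Because t2 is replaced by each accepted candidate's second error, only progressively better candidates survive the scan, so the box returned at the end is the best match in the sense of the claim.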
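Claim 4 re-checks the winning candidate against the most recent detector output. A sketch under the same assumptions as the tracking sketch above, with a hypothetical third threshold:

```python
import numpy as np

def verify_match(best_patch, detected_patch, t3=50.0):
    """Accept the tracked box only if it stays close to the detected hand (claim 4)."""
    e3 = np.mean(np.abs(best_patch.astype(np.float32) -
                        detected_patch.astype(np.float32)))  # third error
    return e3 <= t3
```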
CN201210121832.XA 2012-04-16 2012-04-16 Vision-based gesture remote control system Expired - Fee Related CN103376890B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210121832.XA CN103376890B (en) 2012-04-16 2012-04-16 Vision-based gesture remote control system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210121832.XA CN103376890B (en) 2012-04-16 2012-04-16 Vision-based gesture remote control system

Publications (2)

Publication Number Publication Date
CN103376890A CN103376890A (en) 2013-10-30
CN103376890B true CN103376890B (en) 2016-08-31

Family

ID=49462114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210121832.XA Expired - Fee Related CN103376890B (en) 2012-04-16 2012-04-16 Vision-based gesture remote control system

Country Status (1)

Country Link
CN (1) CN103376890B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101534742B1 (en) * 2013-12-10 2015-07-07 Hyundai Motor Company System and method for gesture recognition of vehicle
CN104050454B (en) * 2014-06-24 2017-12-19 深圳先进技术研究院 A kind of motion gesture track acquisition methods and system
CN104318218A (en) * 2014-10-29 2015-01-28 百度在线网络技术(北京)有限公司 Image recognition method and device
CN105807783A (en) * 2014-12-30 2016-07-27 览意科技(上海)有限公司 Flight camera
CN106325485B (en) * 2015-06-30 2019-09-10 芋头科技(杭州)有限公司 A kind of gestures detection recognition methods and system
CN106491071A (en) * 2015-09-06 2017-03-15 中兴通讯股份有限公司 A kind of method for giving a test of one's eyesight and terminal
CN105223957B (en) * 2015-09-24 2018-10-02 北京零零无限科技有限公司 A kind of method and apparatus of gesture manipulation unmanned plane
CN105242614A (en) * 2015-11-17 2016-01-13 广州新科佳都科技有限公司 Platform screen door safety protection control method and system
CN106934333B (en) * 2015-12-31 2021-07-20 芋头科技(杭州)有限公司 Gesture recognition method and system
CN105657260A (en) * 2015-12-31 2016-06-08 宇龙计算机通信科技(深圳)有限公司 Shooting method and terminal
CN105677039B (en) * 2016-02-16 2020-06-09 北京博研智通科技有限公司 Method and device for detecting driving state based on gesture and wearable device
CN106022211B (en) * 2016-05-04 2019-06-28 北京航空航天大学 Method for controlling multimedia equipment by utilizing gestures
CN106020448B (en) * 2016-05-06 2019-03-29 深圳市国华识别科技开发有限公司 Man-machine interaction method and system based on intelligent terminal
CN106094861B (en) * 2016-06-02 2024-01-12 零度智控(北京)智能科技有限公司 Unmanned aerial vehicle, unmanned aerial vehicle control method and unmanned aerial vehicle control device
CN106041912B (en) * 2016-06-16 2018-06-22 深圳先进技术研究院 Master-slave mode snake-shaped robot system and its position control method
CN106920251A (en) * 2016-06-23 2017-07-04 阿里巴巴集团控股有限公司 Staff detecting and tracking method and device
CN106295531A (en) * 2016-08-01 2017-01-04 乐视控股(北京)有限公司 A kind of gesture identification method and device and virtual reality terminal
CN108062773B (en) * 2016-11-07 2021-05-28 深圳光启合众科技有限公司 Image processing method and device and robot
CN106951871B (en) * 2017-03-24 2020-07-28 北京地平线机器人技术研发有限公司 Motion trajectory identification method and device of operation body and electronic equipment
CN109558000B (en) * 2017-09-26 2021-01-22 京东方科技集团股份有限公司 Man-machine interaction method and electronic equipment
CN110163055A (en) * 2018-08-10 2019-08-23 腾讯科技(深圳)有限公司 Gesture identification method, device and computer equipment
CN110163068B (en) * 2018-12-13 2024-12-13 腾讯科技(深圳)有限公司 Target object tracking method, device, storage medium and computer equipment
CN110276292B (en) * 2019-06-19 2021-09-10 上海商汤智能科技有限公司 Intelligent vehicle motion control method and device, equipment and storage medium
CN110458095B (en) * 2019-08-09 2022-11-18 厦门瑞为信息技术有限公司 Effective gesture recognition method, control method and device and electronic equipment
CN110716648B (en) * 2019-10-22 2021-08-24 上海商汤智能科技有限公司 Gesture control method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102200830A (en) * 2010-03-25 2011-09-28 夏普株式会社 Non-contact control system and control method based on static gesture recognition
CN101853071A (en) * 2010-05-13 2010-10-06 重庆大学 Vision-based gesture recognition method and system
CN102339125A (en) * 2010-07-23 2012-02-01 夏普株式会社 Information equipment and control method and system thereof

Also Published As

Publication number Publication date
CN103376890A (en) 2013-10-30

Similar Documents

Publication Publication Date Title
CN103376890B (en) Vision-based gesture remote control system
Jiang et al. Multi-layered gesture recognition with Kinect.
TWI489317B (en) Method and system for operating electric apparatus
Suau et al. Real-time head and hand tracking based on 2.5 D data
Yang et al. Gesture recognition using depth-based hand tracking for contactless controller application
US8897490B2 (en) Vision-based user interface and related method
US9436872B2 (en) System and method for detecting and tracking multiple parts of an object
Feng et al. Features extraction from hand images based on new detection operators
He et al. Counting and exploring sizes of Markov equivalence classes of directed acyclic graphs
EP3090382B1 (en) Real-time 3d gesture recognition and tracking system for mobile devices
US8243993B2 (en) Method for moving object detection and hand gesture control method based on the method for moving object detection
US10366281B2 (en) Gesture identification with natural images
WO2019057197A1 (en) Visual tracking method and apparatus for moving target, electronic device and storage medium
TWI571772B (en) Virtual mouse driving apparatus and virtual mouse simulation method
Ding et al. Recognition of hand-gestures using improved local binary pattern
Wang et al. A new hand gesture recognition algorithm based on joint color-depth superpixel earth mover's distance
Oh et al. Using binary decision tree and multiclass SVM for human gesture recognition
Bhuyan et al. Trajectory guided recognition of hand gestures having only global motions
KR20150075648A (en) Method and recording medium for contactless input interface with real-time hand pose recognition
Sultana et al. Vision based gesture recognition for alphabetical hand gestures using the SVM classifier
CN106406507B (en) Image processing method and electronic device
Krisandria et al. Hog-based hand gesture recognition using Kinect
Xu et al. A real-time hand detection system during hand over face occlusion
JP6393495B2 (en) Image processing apparatus and object recognition method
Araki et al. Real-time both hands tracking using camshift with motion mask and probability reduction by motion prediction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160831