CN103679130B

CN103679130B - Hand method for tracing, hand tracing equipment and gesture recognition system

Info

Publication number: CN103679130B
Application number: CN201210375618.7A
Authority: CN
Inventors: 王琪
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-09-24
Filing date: 2012-09-24
Publication date: 2018-04-13
Anticipated expiration: 2032-09-24
Also published as: CN103679130A

Abstract

The present invention discloses a kind of hand method for tracing, hand tracing equipment and gesture recognition system.Hand method for tracing includes：Judge whether covered in the frame before present frame；According to masking judging result, proficiency target is selected in the hand target adaptively tracked from frame above as template；And template matching method is carried out to obtain the hand target of present frame for present frame based on the template.

Description

Hand tracking method, hand tracking device and gesture recognition system

技术领域technical field

本发明涉及图像识别领域，并且更具体地涉及手追踪方法、手追踪设备和手势识别系统。The present invention relates to the field of image recognition, and more particularly to a hand tracking method, a hand tracking device and a gesture recognition system.

背景技术Background technique

随着在当今人们的生活中计算机以及众多便携式智能设备变得越来越不可或缺，人们将希望在人与计算机之间的更自然并且更高效的交互。因此，作为自由触摸人机交互界面的手势遥控系统将成为引人关注的选择。As computers and numerous portable smart devices become more and more indispensable in people's lives today, people will desire more natural and efficient interactions between humans and computers. Therefore, a gesture remote control system as a free-touch human-computer interaction interface will become an attractive choice.

基本上，手势遥控系统将追踪手并分析有意义的手的表达，如果它们被识别为是预先定义的手势中的一种，则相应的操作命令将被触发以运作某种应用。可见，作为前提的鲁棒性的手追踪方法对于分析手动作以及识别手势来说是非常重要的。另外，对于低功率设备的实现，还需要考虑追踪算法的复杂度。Basically, the gesture remote control system will track the hands and analyze meaningful hand expressions, if they are recognized as one of the pre-defined gestures, the corresponding operation command will be triggered to run a certain application. It can be seen that a robust hand tracking method as a premise is very important for analyzing hand movements and recognizing gestures. In addition, for the implementation of low-power devices, the complexity of the tracking algorithm also needs to be considered.

当前，某些算法或其变体已被引入用于实时手追踪。例如，光学流(Optical Flow)算法通过附加到图像中的运动像素的速度矢量来反映帧改变，然而，这需要大量计算；CamShift算法可有效地追踪视觉场景中动态改变的概率分布，然而，这依赖于合适的亮度并且当在追踪过程中存在具有与肤色类似的颜色的其他物体时将变得不具有鲁棒性；模板匹配(Template Matching)算法可通过比较候选与模板之间的差异来给出最佳选择从而高效地追踪目标，然而在某些情况中，追踪准确度不令人满意。Currently, certain algorithms or their variants have been introduced for real-time hand tracking. For example, the Optical Flow algorithm reflects frame changes by appending to the velocity vectors of moving pixels in the image, however, this requires a lot of computation; the CamShift algorithm can effectively track the probability distribution of dynamic changes in the visual scene, however, this Rely on proper brightness and will not be robust when there are other objects with similar color to skin color during tracking; Template Matching algorithm can give However, in some cases, the tracking accuracy is not satisfactory.

因此，需要一种能够即使在当手跨越通过具有与肤色类似的颜色的某些物体时也能够具有高追踪准确度的手追踪方法以及应用该手追踪方法的手势遥控系统。Therefore, there is a need for a hand tracking method capable of high tracking accuracy even when the hand crosses over some object having a color similar to a skin color, and a gesture remote control system applying the hand tracking method.

发明内容Contents of the invention

根据本发明的一方面，提供一种手追踪方法，包括：判断当前帧前面的帧中是否发生遮蔽；依据遮蔽判断结果，自适应地从在前面的帧中所追踪到的手目标中选择一手目标作为模板；并且基于所述模板针对当前帧进行模板匹配方法以获得当前帧的手目标。According to one aspect of the present invention, a hand tracking method is provided, including: judging whether occlusion occurs in the frame before the current frame; adaptively selecting a hand from the hand targets tracked in the previous frame according to the occlusion judgment result The target is used as a template; and a template matching method is performed on the current frame based on the template to obtain the hand target of the current frame.

在一个实施例中，自适应地选择作为模板的手目标包括：如果当前帧前面的帧中发生遮蔽，则紧在遮蔽发生前的帧中所获得的手目标被作为所述模板；如果当前帧前面的帧中未发生遮蔽，则当前帧的前一帧中所获得的手目标被作为所述模板。In one embodiment, adaptively selecting the hand target as a template includes: if occlusion occurs in a frame before the current frame, the hand target obtained in the frame immediately before the occlusion occurs is used as the template; if the current frame If no occlusion occurs in the previous frame, the hand object obtained in the previous frame of the current frame is used as the template.

在一个实施例中，所述模板匹配方法包括：在当前帧的肤色图像中定义搜索范围和多个候选手范围；基于所述搜索范围和所述模板来找出所述搜索范围中与所述模板最匹配的候选手范围作为在当前帧中追踪到的手目标。In one embodiment, the template matching method includes: defining a search range and a plurality of candidate hand ranges in the skin color image of the current frame; The candidate hand ranges that best match the template are used as the hand targets tracked in the current frame.

在一个实施例中，定义所述搜索范围包括：在当前帧的肤色图像中通过将在前一帧中所追踪到的手目标的四周扩大预定数目的像素来定义初始搜索范围；并且将所述初始搜索范围收缩为包含在前一帧中所追踪到的手目标以及表示当前帧的肤色图像与前一帧的肤色图像之间的运动差的当前肤色差分图像在内的最小矩形作为所述搜索范围。In one embodiment, defining the search range includes: defining an initial search range by expanding the periphery of the hand target tracked in the previous frame by a predetermined number of pixels in the skin color image of the current frame; and The initial search range is shrunk to the minimum rectangle containing the hand target tracked in the previous frame and the current skin color difference image representing the motion difference between the skin color image of the current frame and the skin color image of the previous frame as the search scope.

其中，模板匹配方法进一步包括：在所述搜索范围中定义多个第一候选手范围，该些第一候选手范围具有与所述模板相同的尺寸，并且，在表示当前帧的肤色图像与前一帧的肤色图像之间的运动差的当前肤色差分图像中定义第二候选手范围，该第二候选手范围具有与所述模板相同的尺寸，针对所述多个第一候选手范围循环执行匹配判断处理直到这多个第一候选手范围都经过匹配判断处理为止，从而确定出与所述模板最匹配的候选手范围作为在当前帧中追踪到的手目标，所述匹配判断处理包括：计算一个第一候选手范围与所述模板的各像素的值的绝对差的平均值作为第一误差；如果所述第一误差大于第一预定阈值，则当前第一候选手范围被判定为不与所述模板匹配并从而被排除；如果所述第一误差小于所述第一预定阈值，则计算第二误差，所述第二误差是通过将所述第一误差减去所述第二候选手范围的各像素的值的平均值乘以修正因子得到的值而得到的差值；如果所述第二误差小于第二预定阈值，则当前第一候选手范围被判定为与所述模板匹配，并且所述第二误差的值以及当前第一候选手范围的各像素的值的平均值分别作为在针对下一个第一候选手范围的匹配判断处理中的第二预定阈值和第二候选手范围的各像素的值的平均值。Wherein, the template matching method further includes: defining a plurality of first candidate hand ranges in the search range, these first candidate hand ranges have the same size as the template, and, between the skin color image representing the current frame and the previous A second candidate hand range is defined in the current skin color difference image of the motion difference between the skin color images of a frame, and the second candidate hand range has the same size as the template, and is executed cyclically for the plurality of first candidate hand ranges Matching and judging processing until the multiple first candidate hand ranges have been subjected to matching and judging processing, thereby determining the candidate hand range that best matches the template as the hand target tracked in the current frame. The matching and judging processing includes: Calculating the average value of the absolute difference between the first candidate hand range and the value of each pixel of the template as the first error; if the first error is greater than the first predetermined threshold, the current first candidate hand range is judged as not matching the template and thereby being excluded; if the first error is less than the first predetermined threshold, calculating a second error by subtracting the first error from the second candidate The difference obtained by multiplying the average value of each pixel value of the hand range by the value obtained by the correction factor; if the second error is smaller than the second predetermined threshold, the current first candidate hand range is determined to match the template , and the value of the second error and the average value of each pixel value of the current first candidate hand range are respectively used as the second predetermined threshold and the second candidate hand value in the matching judgment process for the next first candidate hand range. The average of the values for each pixel of the range.

在一个实施例中，在当前帧前面的帧中未发生遮蔽的情况中，所述修正因子为第一预定值，并且在当前帧前面的帧中发生遮蔽的情况中，所述修正因子为第二预定值与紧在遮蔽发生前的帧与其前一帧的肤色差分图像中获得的手目标内各像素的值的平均值的百分数的和。In one embodiment, in the case that occlusion does not occur in the frame preceding the current frame, the correction factor is a first predetermined value, and in the case that occlusion occurs in the frame preceding the current frame, the correction factor is the first predetermined value. The sum of the two predetermined values and the percentage of the average value of each pixel value in the hand object obtained from the skin color difference image of the frame immediately before the occlusion occurs and the previous frame.

在一个实施例中，该手追踪方法还包括：判断当前帧中是否发生遮蔽，所述判断包括：如果在当前帧中未追踪到手目标，则直接判定当前帧中未发生遮蔽；并且如果在当前帧中追踪到手目标，则执行遮蔽条件判断以确定在当前帧中是否发生遮蔽。In one embodiment, the hand tracking method further includes: judging whether occlusion occurs in the current frame, and the judgment includes: if no hand target is tracked in the current frame, directly determining that occlusion does not occur in the current frame; If the hand target is tracked in the frame, the occlusion condition judgment is performed to determine whether occlusion occurs in the current frame.

在一个实施例中，所述遮蔽条件判断包括判断是否满足以下条件：(((PreOccDiff<＝A)&&(DeltaDiff>B))||((PreOccDiff>A)&&(DeltaDiff>C)))&&(OccNum<D)，其中，PreOccDiff表示紧在遮蔽发生前的帧与其前一帧的肤色差分图像中获得的手目标内各像素的值的平均值，DeltaDiff是PreOccDiff与AvgTarDiff的差值，AvgTarDiff为当前肤色差分图像中与所述模板最匹配的候选手范围内的各像素的值的平均值，OccNum表示当前帧前面发生连续遮蔽的次数，其中，C大于B，In one embodiment, the occlusion condition judgment includes judging whether the following conditions are satisfied: (((PreOccDiff<=A)&&(DeltaDiff>B))||((PreOccDiff>A)&&(DeltaDiff>C)))&& (OccNum<D), wherein, PreOccDiff represents the average value of the value of each pixel in the hand target obtained in the frame before the occlusion occurs and the skin color difference image of the previous frame, DeltaDiff is the difference between PreOccDiff and AvgTarDiff, and AvgTarDiff is The average value of the value of each pixel in the current skin color difference image and the candidate hand range that matches the template most, OccNum represents the number of times that continuous occlusion occurs in front of the current frame, wherein, C is greater than B,

并且其中，如果上述条件满足，则判定在当前帧中发生了遮蔽，如果上述条件不满足，则判定当前帧中未发生遮蔽。And wherein, if the above condition is satisfied, it is determined that occlusion has occurred in the current frame, and if the above condition is not satisfied, it is determined that occlusion has not occurred in the current frame.

根据本发明的第二方面，提供一种手追踪系统，该系统包括：判断装置，判断当前帧前面的帧中是否发生遮蔽；选择装置，依据遮蔽判断结果，自适应地从在前面的帧中所追踪到的手目标中选择一手目标作为模板；以及获得装置，基于所述模板针对当前帧进行模板匹配方法以获得当前帧的手目标。According to the second aspect of the present invention, there is provided a hand tracking system, which includes: judging means for judging whether occlusion occurs in the frame before the current frame; Selecting a hand object from the tracked hand objects as a template; and obtaining means for performing a template matching method on the current frame based on the template to obtain the hand object of the current frame.

根据本发明的第三方面，提供一种手势识别系统，该手势识别系统包括上述手追踪设备。According to a third aspect of the present invention, there is provided a gesture recognition system, the gesture recognition system comprising the above-mentioned hand tracking device.

根据本发明的手追踪方法和手追踪设备，即使在当手跨越通过具有与肤色类似的颜色的某些物体时也能够具有高追踪准确度，从而实现准备地手势识别。According to the hand tracking method and the hand tracking device of the present invention, it is possible to have high tracking accuracy even when the hand crosses over some object having a color similar to the skin color, thereby realizing preparatory gesture recognition.

附图说明Description of drawings

图1示出一般手势识别系统的框架图；Fig. 1 shows the frame diagram of general gesture recognition system;

图2示出传统模板匹配算法中当手在脸前移动时发生追踪错误的例示图；Fig. 2 shows an illustration of a tracking error occurring when a hand moves in front of a face in a traditional template matching algorithm;

图3示出根据本发明的手追踪方法的流程图；Fig. 3 shows the flowchart of the hand tracking method according to the present invention;

图4示出根据本发明的手追踪方法的更详细的流程图；以及Figure 4 shows a more detailed flowchart of the hand tracking method according to the present invention; and

图5示出根据本发明的手追踪方法在当手在脸前移动时的追踪结果的例示图。FIG. 5 shows an illustration of the tracking result of the hand tracking method when the hand moves in front of the face according to the present invention.

具体实施方式Detailed ways

下面将详细描述本发明各个方面的特征和示例性实施例。下面的描述涵盖了许多具体细节，以便提供对本发明的全面理解。但是，对于本领域技术人员来说显而易见的是，本发明可以在不需要这些具体细节中的一些细节的情况下实施。下面对实施例的描述仅仅是为了通过示出本发明的示例来提供对本发明更清楚的理解。本发明绝不限于下面所提出的任何具体配置和算法，而是在不脱离本发明的精神的前提下覆盖了相关元素、部件和算法的任何修改、替换和改进。Features and exemplary embodiments of various aspects of the invention will be described in detail below. The following description covers numerous specific details in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is only to provide a clearer understanding of the present invention by showing examples of the present invention. The present invention is by no means limited to any specific configuration and algorithm presented below, but covers any modification, replacement and improvement of related elements, components and algorithms without departing from the spirit of the present invention.

<1.手势识别系统><1. Gesture recognition system>

图1示出了一般手势识别系统的框架图。如图1所示，手势识别系统包括三个功能部分：手检测部分、手追踪部分和手势识别部分。具体地，人的动作通过普通的web相机被捕获并且所捕获的图像帧被输入到该手势识别系统中。对于每个图像帧，首先判断在前一帧中是否检测到手目标。如果在前一帧中还未检测到手目标，则手检测部分将在当前整个帧中检测手目标。如果在前一帧中已检测到手目标，则手追踪部分将在当前帧以及接下来的帧中追踪手目标。与此同时，手识别部分将分析历史追踪轨迹以确定是否某种预定手势被识别出。如果某种预定手势被识别出，则相应手势命令将被输出以供后级使用，例如，以触发某种应用。Figure 1 shows a block diagram of a general gesture recognition system. As shown in Figure 1, the gesture recognition system includes three functional parts: hand detection part, hand tracking part and gesture recognition part. Specifically, human motions are captured by a common web camera and the captured image frames are input into the gesture recognition system. For each image frame, first judge whether a hand object was detected in the previous frame. If the hand object has not been detected in the previous frame, the hand detection part will detect the hand object in the current whole frame. If a hand object has been detected in the previous frame, the hand tracking part will track the hand object in the current frame and the next frame. At the same time, the hand recognition part will analyze the historical traces to determine whether a predetermined gesture is recognized. If a predetermined gesture is recognized, the corresponding gesture command will be output for subsequent use, for example, to trigger a certain application.

如上所述，如果在前一帧中已检测到手目标，则手追踪部分将在当前帧以及接下来的帧中追踪手目标。可见，追踪手目标所涉及的计算量以及追踪准确度对于手势识别系统而言是很重要的。然而，如上所述，现有的手追踪方法需求的计算量较大并且追踪准确度不能令人满意，特别是在当手移动通过具有与肤色类似的颜色的某些物体的情形中，很可能会由于遮蔽(Occlusion)而发生追踪错误。例如，在传统的模板匹配算法中，总是前一帧中所获得的手目标被用作当前帧中进行模板匹配算法的模板，因此在当手移动通过具有与肤色类似的颜色的某些物体的情形中由于遮蔽发生而手目标模板被错用或不当使用时，追踪错误则会发生。As mentioned above, if a hand object has been detected in the previous frame, the hand tracking part will track the hand object in the current frame and the next frame. It can be seen that the calculation amount involved in tracking the hand object and the tracking accuracy are very important for the gesture recognition system. However, as mentioned above, the existing hand tracking methods require a large amount of calculation and the tracking accuracy is not satisfactory, especially when the hand moves through some objects with a color similar to the skin color, it is likely Tracking errors will occur due to occlusion. For example, in the traditional template matching algorithm, the hand target obtained in the previous frame is always used as the template for the template matching algorithm in the current frame, so when the hand moves through some objects with a color similar to the skin color Tracking errors can occur when hand target templates are misused or improperly used due to occlusions in the case of .

例如，图2示出了在当手移动通过脸时发生追踪错误的示例。如图2中所示，图中给出了5个连续图像帧331-335，在图像帧331-333中，正常的模板匹配算法被执行，即，图像帧331中的手追踪基于前一阵的图像帧中检测或追踪到的手目标作为模板，图像帧322中的手追踪基于图像帧311中追踪到的手目标作为模板，图像帧333中的手追踪基于图像帧332中追踪到的手目标作为模板。For example, Figure 2 shows an example where a tracking error occurs when a hand moves past a face. As shown in FIG. 2, five consecutive image frames 331-335 are shown in the figure. In image frames 331-333, the normal template matching algorithm is performed, that is, the hand tracking in image frame 331 is based on the previous frame The hand target detected or tracked in the image frame is used as a template, the hand tracking in the image frame 322 is based on the hand target tracked in the image frame 311 as a template, and the hand tracking in the image frame 333 is based on the hand target tracked in the image frame 332 as a template.

然而，如图所示，由于在图像帧333中发生手与脸交叠的情况，在该图像帧333中所获得的手目标包含了同为肤色的人脸的一部分，也就是所获得的手目标是不准确或错误的。因而，当针对图像帧334进行手追踪时，以在图像帧333中获得的不准确或错误的手目标作为模板，在该图像帧334中通过模板匹配将获得手继续停留于脸上的错误追踪结果。如此继续，在接下来的图像帧中也将获得错误的追踪结果。可见，一个图像帧中发生追踪错误，这种错误将会导致在接下来的图像帧中的追踪也相应发生错误，因此追踪结果不准确，从而影响手势的正确识别。However, as shown in the figure, since the hand and the face overlap in the image frame 333, the hand object obtained in the image frame 333 contains a part of the human face with the same skin color, that is, the obtained hand The target is inaccurate or wrong. Therefore, when performing hand tracking for the image frame 334, the inaccurate or wrong hand target obtained in the image frame 333 is used as a template, and in this image frame 334, the false tracking of the hand continuing to stay on the face will be obtained through template matching. result. If this continues, wrong tracking results will also be obtained in the next image frame. It can be seen that if a tracking error occurs in one image frame, this error will cause a corresponding tracking error in the next image frame, so the tracking result is inaccurate, thereby affecting the correct recognition of gestures.

证实鉴于上述问题，在本发明中提供了一种利用自适应模板匹配算法的手追踪方法，该方法通过判断当前帧前面的帧中是否发生遮蔽来自适应地选择合适的模板以进行模板匹配算法，从而即使当手移动通过具有与肤色类似的颜色的某些物体的情形中也能够获得高的追踪准确度。In view of the above-mentioned problems, the present invention provides a hand tracking method utilizing an adaptive template matching algorithm, which adaptively selects an appropriate template to perform a template matching algorithm by judging whether occlusion occurs in a frame in front of the current frame, High tracking accuracy can thereby be obtained even in the case where the hand moves past some object having a color similar to the skin color.

<2.手追踪方法><2. Hand tracking method>

下面将参考图3来描述根据本发明的手追踪方法。The hand tracking method according to the present invention will be described below with reference to FIG. 3 .

图3示出根据本发明的手追踪方法的流程图。如图3所示，针对每一图像帧，执行步骤S301-S303。具体地，在步骤S301中，判断当前帧前面的帧中是否发生遮蔽。在步骤S302中，依据遮蔽判断结果，自适应地从在前面的帧中所追踪到的手目标中选择一手目标作为模板。如果当前帧前面的帧中发生遮蔽，则紧在遮蔽发生前的帧中所获得的手目标被作为所述模板；如果当前帧前面的帧中未发生遮蔽，则当前帧的前一帧中所获得的手目标被作为所述模板。接着，在步骤S303中，基于所述模板针对当前帧进行模板匹配方法以获得当前帧的手目标。通过利用本发明的手追踪方法，可以根据是否发生遮蔽来选择合适的模板以执行模板匹配算法，从而避免因遮蔽发生而引起的错误追踪。Fig. 3 shows a flowchart of the hand tracking method according to the present invention. As shown in FIG. 3, for each image frame, steps S301-S303 are performed. Specifically, in step S301, it is determined whether occlusion occurs in a frame preceding the current frame. In step S302, according to the occlusion judgment result, a hand object is adaptively selected from the hand objects tracked in the previous frame as a template. If occlusion occurs in the frame before the current frame, the hand object obtained in the frame immediately before the occlusion is used as the template; if no occlusion occurs in the frame before the current frame, the The obtained hand object is used as the template. Next, in step S303, a template matching method is performed for the current frame based on the template to obtain the hand object of the current frame. By using the hand tracking method of the present invention, an appropriate template can be selected according to whether occlusion occurs to execute a template matching algorithm, thereby avoiding false tracking caused by occlusion.

接下来，将参考图4更详细地描述根据本发明的手追踪方法。Next, the hand tracking method according to the present invention will be described in more detail with reference to FIG. 4 .

图4示出了根据本发明的手追踪方法的更详细的流程图。如图4所述，首先，在步骤S401中，计算当前帧的肤色图像以及表示当前帧的肤色图像与前一帧的肤色图像之间的运动差的当前肤色差分图像。Fig. 4 shows a more detailed flowchart of the hand tracking method according to the present invention. As shown in FIG. 4 , first, in step S401 , the current skin color image and the current skin color difference image representing the motion difference between the current frame skin color image and the previous frame skin color image are calculated.

接着，在步骤S402中，自适应地定义搜索范围。搜索范围的定义包括以下两个步骤：(1)在当前帧的肤色图像中通过将前一帧中所追踪到的手目标的四周扩大预定数目的像素(例如，15个像素)来定义初始搜索范围；以及(2)将所述初始搜索范围收缩为包含前一帧中所追踪到的手目标以及所述当前肤色差分图像的最小矩形。如此就获得了执行模板匹配方法中所使用的搜索范围。Next, in step S402, the search range is adaptively defined. The definition of the search range includes the following two steps: (1) In the skin color image of the current frame, the initial search is defined by expanding the surrounding area of the hand target tracked in the previous frame by a predetermined number of pixels (for example, 15 pixels). range; and (2) narrowing the initial search range to the smallest rectangle containing the hand target tracked in the previous frame and the current skin color difference image. In this way, the search scope used in performing the template matching method is obtained.

接着，在步骤S403中，判断在当前帧前面的帧中是否发生遮蔽。之后，根据遮蔽判断结果来从当前帧前面的帧中所追踪到的手目标中选择一手目标作为当前帧中进行模板匹配方法的模板。Next, in step S403, it is judged whether occlusion occurs in the frame preceding the current frame. Afterwards, according to the occlusion judgment result, a hand target is selected from the hand targets tracked in the frame before the current frame as a template for the template matching method in the current frame.

具体地，如图3中所述，如果在步骤S403中判定在当前帧前面的帧中未发生遮蔽，即步骤S403中的判断结果为“否”，则过程进行到步骤S405。在步骤S405中，如传统的模板匹配算法一样，当前帧的前一帧中所追踪到的手目标作为当前帧中进行模板匹配方法的模板。此时，运动差的修正因子k为固定的预定值，例如k＝0.3，其中该修正因子k用于稍后进行的模板匹配方法。Specifically, as described in FIG. 3, if it is determined in step S403 that no occlusion has occurred in the frame preceding the current frame, ie, the determination result in step S403 is "No", the process proceeds to step S405. In step S405, like the traditional template matching algorithm, the hand object tracked in the previous frame of the current frame is used as a template for the template matching method in the current frame. At this time, the correction factor k of the motion difference is a fixed predetermined value, such as k=0.3, where the correction factor k is used in the template matching method performed later.

另一方面，如果在步骤S403中判定在当前帧前面的帧中发生了遮蔽，即步骤S403中的判断结果为“是”，则过程进行到步骤S404。在步骤S404中，紧在遮蔽发生前的帧中所追踪到手目标被作为当前帧中进行模板匹配方法的模板。此时，运动差的修正因子k被计算为预定值与PreOccDiff的百分数的和，例如，k＝0.4+PreOccDiff/100，其中，PreOccDiff表示紧在遮蔽发生前的帧与其前一帧的肤色差分图像中获得的手目标内各像素的值的平均值。对于当前帧中的遮蔽判断，将会在稍后的步骤中介绍，并且针对每一帧的是否遮蔽的判断结果会被存储以供在后一帧中判断前面的帧中是否发生遮蔽而直接使用。On the other hand, if it is determined in step S403 that occlusion has occurred in the frame preceding the current frame, that is, the result of determination in step S403 is "Yes", the procedure proceeds to step S404. In step S404, the hand object tracked in the frame immediately before the occlusion occurs is used as a template for the template matching method in the current frame. At this time, the correction factor k of the motion difference is calculated as the sum of the predetermined value and the percentage of PreOccDiff, for example, k=0.4+PreOccDiff/100, where PreOccDiff represents the skin color difference image of the frame immediately before the occlusion occurs and its previous frame The average value of each pixel in the hand object obtained in . For the occlusion judgment in the current frame, it will be introduced in a later step, and the occlusion judgment result for each frame will be stored for direct use in the next frame to judge whether occlusion occurred in the previous frame .

在步骤S402中定义了搜索范围并且在步骤S404或405中选择了模板之后，在步骤S406中，执行模板匹配方法。在本发明中，该模板匹配方法是带有运动差修正的模板匹配方法。本发明的关键在于基于遮蔽判断结果来选择模板，并基于如此选择的模板来执行模板匹配算法。因此，基于其他方面的考虑，也可以使用不带运动差修正的模板匹配算法。After the search range is defined in step S402 and a template is selected in step S404 or 405, in step S406, a template matching method is executed. In the present invention, the template matching method is a template matching method with motion difference correction. The key of the present invention is to select a template based on the occlusion judgment result, and to execute a template matching algorithm based on the thus selected template. Therefore, based on other considerations, a template matching algorithm without motion difference correction can also be used.

具体地，在该模板匹配方法中，首先在所定义的搜索范围中定义多个第一候选手范围，每个第一候选手范围具有与所选模板相同的尺寸，并且在所述当前肤色差分图像中定义一个第二候选手范围，此第二候选手范围同样具有与所选模板相同的尺寸。在示例性实施例中，在每帧中追踪到的手目标以及在搜索范围中以及在肤色差分图像中所定义的候选手范围均为矩形，以便简化计算量，并且矩形的位置以矩形的左上角的顶点来表示，以便以统一的规则更新手目标的位置。Specifically, in this template matching method, a plurality of first candidate hand ranges are defined in the defined search range, each first candidate hand range has the same size as the selected template, and the current skin color difference A second candidate hand range is defined in the image, and the second candidate hand range also has the same size as the selected template. In an exemplary embodiment, the hand target tracked in each frame and the candidate hand ranges defined in the search range and the skin color difference image are all rectangles in order to simplify the calculation, and the position of the rectangle is represented by the upper left of the rectangle corner vertices to update the position of the hand target with a uniform rule.

在该模板匹配方法中，针对所述多个第一候选手范围循环执行匹配判断处理直到这多个第一候选手范围都经过配判断处理为止，从而确定出与所择模板最匹配的候选手范围作为在当前帧中追踪到的手目标，所述匹配判断处理包括如下步骤：In the template matching method, the matching judgment process is cyclically executed for the multiple first candidate hand ranges until all the first candidate hand ranges have been matched and judged, so as to determine the candidate hand that best matches the selected template. range as the hand target tracked in the current frame, the matching judgment process includes the following steps:

计算一个第一候选手范围与所选模板的各像素的值的绝对差的平均值作为第一误差；Calculate the mean value of the absolute difference of the value of each pixel of the range of a first candidate hand and the selected template as the first error;

如果第一误差大于第一预定阈值，则当前第一候选手范围被判定为不与所选模板匹配并且从而被排除；If the first error is greater than a first predetermined threshold, the current first candidate hand range is determined not to match the selected template and thus excluded;

如果第一误差小于第一预定阈值，则计算第二误差，第二误差是通过将第一误差减去第二候选手范围的各像素的值的平均值乘以修正因子得到的值而得到的差值；If the first error is smaller than the first predetermined threshold, a second error is calculated, and the second error is obtained by subtracting the first error from the average value of each pixel value of the second candidate hand range multiplied by the correction factor. difference;

如果第二误差小于第二预定阈值，则当前第一候选手范围被判定为与所选模板匹配，并且该第二误差的值以及当前第一候选手范围中的各像素的值的平均值分别作为在针对下一个第一候选手范围的匹配判断处理中的第二预定阈值和第二候选手范围的各像素的值的平均值。If the second error is smaller than the second predetermined threshold, the current first candidate hand range is determined to match the selected template, and the value of the second error and the average value of the values of the pixels in the current first candidate hand range are respectively As the second predetermined threshold in the matching determination process for the next first candidate hand range and the average value of the values of the pixels of the second candidate hand range.

接着，在步骤S407中，基于在步骤S406中执行的模板匹配算法的结果来判断是否有候选手范围与所选模板匹配。之后，基于判定结果以及前一帧的信息来判断在当前帧中是否发生遮蔽并更新相关参数以用于下一帧中的处理。具体地，如果在步骤S407中判定出没有候选手范围与所选模板匹配，则处理进行到步骤S409，并且在步骤S409中直接判定当前帧中未发生遮蔽。如果在步骤S407中判定出有候选手范围与所选模板匹配，则处理进行到步骤S408，并且在步骤S408中执行遮蔽条件判断以确定在当前帧中是否发生遮蔽。Next, in step S407, it is judged based on the result of the template matching algorithm executed in step S406 whether any candidate hand range matches the selected template. Afterwards, based on the determination result and the information of the previous frame, it is determined whether occlusion occurs in the current frame and relevant parameters are updated for processing in the next frame. Specifically, if it is determined in step S407 that no candidate hand range matches the selected template, the process proceeds to step S409, and it is directly determined in step S409 that no occlusion occurs in the current frame. If it is determined in step S407 that a candidate hand range matches the selected template, the process proceeds to step S408, and occlusion condition judgment is performed in step S408 to determine whether occlusion occurs in the current frame.

在步骤S408中执行的遮蔽条件判断包括判断是否满足以下条件：(((PreOccDiff<＝A)&&(DeltaDiff>B))||((PreOccDiff>A)&&(DeltaDiff>C)))&&(OccNum<D)其中，PreOccDiff表示紧在遮蔽发生前的帧与其前一帧的肤色差分图像中追踪到的手目标内各像素的值的平均值，DeltaDiff是PreOccDiff与AvgTarDiff的差值，AvgTarDiff为当前肤色差分图像中与所选模板最匹配的候选手范围内的各像素的值的平均值，OccNum表示当前帧前面发生连续遮蔽的次数，其中，C大于B，并且其中，如果上述条件满足，则在步骤S410中判定在当前帧中发生了遮蔽，如果上述条件不满足，则在步骤S410判定当前帧中未发生遮蔽。The occlusion condition judgment performed in step S408 includes judging whether the following conditions are satisfied: (((PreOccDiff<=A)&&(DeltaDiff>B))||((PreOccDiff>A)&&(DeltaDiff>C)))&&(OccNum <D) Among them, PreOccDiff represents the average value of each pixel in the hand object tracked in the skin color difference image of the frame immediately before the occlusion and the previous frame, DeltaDiff is the difference between PreOccDiff and AvgTarDiff, and AvgTarDiff is the current skin color The average value of the values of each pixel in the range of the candidate hand that best matches the selected template in the difference image, OccNum represents the number of consecutive occlusions in front of the current frame, where C is greater than B, and where, if the above conditions are met, then in In step S410 it is determined that occlusion has occurred in the current frame, and if the above condition is not satisfied, in step S410 it is determined that occlusion has not occurred in the current frame.

在上述公式中，阈值A-D基于经验来设定。在诸如手移动跨越脸的情况中，一般地，以普通摄像机拍摄图像时遮蔽大概会发生在两帧内，因此例如，阈值D可设为2，当然依赖于摄像机所使用帧率以及手移动速度等其他因素，阈值D的也可被设定为其他合适的值。另外，实际上，PreOccDiff反映出手移动的强烈程度，PreOccDiff越大，表示手移动越快或手运动越强烈，反之亦然。因此，在对B、C的取值中，C可以取比B大的值，例如，在阈值A都取15的情况中，B、C分别可取为B＝和C＝7。后面图4给出的示例中正是使用了这里给出的具体示例，即A＝15，B＝4，C＝5并且D＝2。In the above formula, the thresholds A-D are set based on experience. In situations such as a hand moving across a face, typically the occlusion will occur within about two frames of an image taken with a normal camera, so for example the threshold D could be set to 2, depending of course on the frame rate used by the camera and the speed of the hand movement and other factors, the threshold D can also be set to other suitable values. In addition, in fact, PreOccDiff reflects the intensity of hand movement, and the larger the PreOccDiff, the faster the hand movement or the stronger the hand movement, and vice versa. Therefore, among the values of B and C, C can take a larger value than B. For example, in the case where the threshold A both takes 15, B and C can be taken as B= and C=7 respectively. The specific example given here, ie A=15, B=4, C=5 and D=2, is used in the example given later in FIG. 4 .

之后，在步骤S411和步骤S412中，依据步骤S409、步骤S410中的结果来更新相关参数和追踪轨迹。具体地，在当前帧中成功追踪到手目标的情况中，如果判定在当前帧中未发生遮蔽，则当前帧中追踪到手目标被作为下一帧中进行模板匹配方法的模板，如果判定在当前帧中发生遮蔽，则当前帧中进行模板匹配方法所使用的模板仍用于下一帧中进行模板匹配方法的模板。另外，在当前帧中没有成功追踪到手目标的情况中，也仍是遮蔽发生前的帧中追踪到手目标被作为下一帧中进行模板匹配方法的模板。Afterwards, in step S411 and step S412, the relevant parameters and the tracking track are updated according to the results of step S409 and step S410. Specifically, in the case that the hand target is successfully tracked in the current frame, if it is determined that no occlusion occurs in the current frame, the hand target tracked in the current frame is used as a template for the template matching method in the next frame. If masking occurs in the current frame, the template used for the template matching method in the current frame is still used for the template for the template matching method in the next frame. In addition, in the case that the hand object is not successfully tracked in the current frame, the hand object tracked in the frame before the occlusion is still used as the template for the template matching method in the next frame.

在当前帧中成功追踪到手目标的情况中，如果判定在当前帧中未发生遮蔽，则追踪轨迹被更新为：P_N(x,y)＝P(x,y)，其中，P_N(x,y)表示当前帧中追踪到的矩形手目标的左上角的坐标位置，P(x,y)表示与所选模板最匹配的候选手范围的左上角的位置。也就是说，当前帧中追踪到的手目标的左上角的位置为被确定与所选模板最匹配的候选手范围的左上角的位置。如果判定在当前帧中发生了遮蔽，则追踪轨迹被更新为：In the case of successfully tracking the hand target in the current frame, if it is determined that no occlusion occurs in the current frame, the tracking trajectory is updated as: P _N (x, y) = P (x, y), where P _N (x ,y) represents the coordinate position of the upper left corner of the rectangular hand target tracked in the current frame, and P(x,y) represents the position of the upper left corner of the candidate hand range that best matches the selected template. That is to say, the position of the upper left corner of the hand target tracked in the current frame is the position of the upper left corner of the candidate hand range determined to best match the selected template. If it is determined that occlusion occurs in the current frame, the tracking track is updated as:

其中P_N-1(x,y)表示当前帧的前一帧中追踪到的手目标的左上角的位置，并且P_N-2(x,y)当前帧前面第二帧中追踪到的手目标的左上角的位置。如此，当前帧中的追踪处理结束，之后进入下一帧。如此针对每一帧执行上述方法，直到所有帧都被处理为止，从而得到各个帧中追踪到的手目标的位置以便识别手的运动。where P _N-1 (x, y) represents the position of the upper left corner of the hand object tracked in the previous frame of the current frame, and P _N-2 (x, y) the hand tracked in the second frame before the current frame The position of the upper left corner of the target. In this way, the tracking process in the current frame ends, and then the next frame proceeds. The above method is executed for each frame in this way until all the frames are processed, so as to obtain the position of the tracked hand object in each frame so as to recognize the movement of the hand.

图5示出根据本发明的手追踪方法在当手在脸前移动时的追踪结果的例示图。在图5中，上面的帧阵列是每个帧与其前一帧之间的肤色差分图像，并且对于每帧还示出了AvgTargetDiff值以图示遮蔽的效果，此值可被用在步骤S408中判断遮蔽条件是否满足。此外，下面的帧阵列示出更新用于追踪处理的模板的参考顺序。如在图5中所示，在帧80中判定出遮蔽发生，所以在随后的帧81中进行模板匹配方法的模板基于遮蔽前的帧79中的所追踪到手目标而非基于在帧80中追踪到的错误手目标，从而即使当手移动通过具有与肤色类似的颜色的物体(在本示例中为人脸)时也能够提高追踪准确度。FIG. 5 shows an illustration of the tracking result of the hand tracking method when the hand moves in front of the face according to the present invention. In Fig. 5, the upper frame array is the skin color difference image between each frame and its previous frame, and for each frame also shows the AvgTargetDiff value to illustrate the effect of occlusion, this value can be used in step S408 Determine whether the masking condition is satisfied. Also, the frame array below shows the reference order of updating the templates used for the tracking process. As shown in FIG. 5 , occlusion is determined to occur in frame 80 , so the template for the template matching method in subsequent frame 81 is based on the tracked hand object in frame 79 before occlusion rather than based on tracking in frame 80 erroneous hand targets detected, thereby improving tracking accuracy even when the hand moves past an object (in this example, a human face) that has a color similar to the skin tone.

根据本发明的利用自适应模板匹配算法的手追踪法不受亮度改变的影响，并且不受具有与肤色类似的颜色的物体的影响，并且具有较低的计算复杂度以满足实现低功率设备的要求。The hand tracking method using an adaptive template matching algorithm according to the present invention is not affected by brightness changes, and is not affected by objects with a color similar to skin color, and has low computational complexity to meet the requirements of implementing low-power devices Require.

根据本发明的手追踪方法可以以软件、硬件、固件以及其他方式实现。例如，当该手追踪方法以软件的方式实现时，实现该手追踪方法的程序可被嵌入在存储介质中，并通过被加载的计算机并被运行来实施本发明。当根据本发明的手追踪方法以硬件的方式实现时，该手追踪方法可以由相应的部件来完成。例如，可以实现这样一种手追踪设备，包括：判断装置，判断当前帧前面的帧中是否发生遮蔽；选择装置，依据遮蔽判断结果，自适应地从在前面的帧中所追踪到的手目标中选择一手目标作为模板；以及获得装置，基于所述模板针对当前帧进行模板匹配方法以获得当前帧的手目标。The hand tracking method according to the present invention can be implemented in software, hardware, firmware and other ways. For example, when the hand tracking method is implemented in the form of software, the program for realizing the hand tracking method can be embedded in a storage medium, loaded and executed by a computer to implement the present invention. When the hand tracking method according to the present invention is implemented in hardware, the hand tracking method can be completed by corresponding components. For example, such a hand tracking device can be implemented, including: judging means for judging whether occlusion occurs in the frame before the current frame; selecting a hand target as a template; and obtaining means for performing a template matching method on the current frame based on the template to obtain the hand target of the current frame.

另外，根据本发明的手追踪方法可单独使用，也可作为手势识别过程中的一部分使用，同样，根据本发明的手追踪方法设备可单独使用，也可作为一组件而结合到手势识别系统中。In addition, the hand tracking method according to the present invention can be used alone, and can also be used as a part of the gesture recognition process. Similarly, the hand tracking method device according to the present invention can be used alone, and can also be combined into a gesture recognition system as a component .

在本说明书中是按照给定的流程图中的顺序描述的，然而本发明的手追踪方法也可以其他顺序来实现，例如，某些步骤可被并行执行或者某步骤可在另一步骤的前面执行，只要满足本发明的主题思想，即针对每一帧根据对于前面的帧中是否发生遮蔽的判断来自适应地选择模板以进行模板匹配算法，即可。In this specification, the sequence in the given flow chart is described, but the hand tracking method of the present invention can also be implemented in other sequences, for example, some steps can be executed in parallel or a certain step can be performed in front of another step Execution, as long as the main idea of the present invention is satisfied, that is, for each frame, a template is adaptively selected according to the judgment of whether occlusion occurs in the previous frame to perform the template matching algorithm.

根据本发明的手追踪方法和手追踪设备不限于上述提及的用途，而是可以应用于与手追踪有关的任何场合。The hand tracking method and the hand tracking device according to the present invention are not limited to the above-mentioned uses, but can be applied to any occasion related to hand tracking.

虽然，上面以具体实施例描述了本发明，但是本领域技术人员应当理解，依据设计要求和其他因素可出现各种修改、组合、子组合和替换，只要他们落在所附权利要求及其等同物的范围内即可。Although the present invention has been described above with specific embodiments, those skilled in the art should understand that various modifications, combinations, sub-combinations and replacements may occur depending on design requirements and other factors, as long as they fall within the scope of the appended claims and their equivalents within the scope of the object.

Claims

1. A hand tracking method, comprising:

Determine whether occlusion occurs in the frame before the current frame;

Adaptively select a hand target as a template from the hand targets tracked in previous frames according to the occlusion judgment result; and

Carrying out a template matching method for the current frame based on the template to obtain the hand target of the current frame,

Among them, adaptively selecting the hand target as a template includes:

If an occlusion occurs in a frame preceding the current frame, the hand object obtained in the frame immediately before the occlusion occurs is used as the template;

If no occlusion occurs in the previous frame of the current frame, the hand object obtained in the previous frame of the current frame is used as the template.

2. The hand tracking method according to claim 1, wherein the template matching method comprises:

Define the search range and multiple candidate hand ranges in the skin color image of the current frame;

Based on the search range and the template, find a candidate hand range in the search range that best matches the template as the hand target tracked in the current frame.

3. The hand tracking method according to claim 2, wherein defining the search range comprises:

In the skin color image of the current frame, an initial search range is defined by enlarging the periphery of the hand target tracked in the previous frame by a predetermined number of pixels; and

The initial search range is narrowed to the minimum rectangle containing the hand target tracked in the previous frame and the current skin color difference image representing the motion difference between the skin color image of the current frame and the skin color image of the previous frame as The search scope.

4. The hand tracking method according to claim 2 or 3, wherein the template matching method further comprises:

A plurality of first candidate hand ranges are defined in the search range, the first candidate hand ranges have the same size as the template, and the distance between the skin color image representing the current frame and the skin color image of the previous frame is defined. A second candidate hand range is defined in the current skin color difference image of the motion difference, and the second candidate hand range has the same size as the template,

The matching judgment process is cyclically executed for the plurality of first candidate hand ranges until the plurality of first candidate hand ranges have passed the matching judgment process, so as to determine the candidate hand range that best matches the template as the tracking frame in the current frame. Arrived hand target,

The matching judgment process includes:

Calculating the average value of the absolute difference between the first candidate hand range and the value of each pixel of the template as the first error;

If the first error is greater than a first predetermined threshold, the current first candidate hand range is determined not to match the template and thus excluded;

If the first error is less than the first predetermined threshold, a second error is calculated by subtracting the first error from the average value of the values of the pixels in the second candidate hand range The difference obtained by multiplying the value obtained by the correction factor;

If the second error is smaller than the second predetermined threshold, the current first candidate hand range is determined to match the template, and the value of the second error and the average of the values of each pixel in the current first candidate hand range The values are taken as the second predetermined threshold and the average value of the values of the pixels of the second candidate hand range in the matching judgment process for the next first candidate hand range, respectively.

5. The hand tracking method according to claim 4, wherein, in a case where occlusion does not occur in a frame preceding the current frame, the correction factor is a first predetermined value, and if occlusion occurs in a frame preceding the current frame In this case, the correction factor is the sum of the second predetermined value and the percentage of the average value of the value of each pixel in the hand object obtained in the skin color difference image of the frame immediately before the occlusion occurs and the previous frame.

6. The hand tracking method according to claim 1, further comprising: judging whether occlusion occurs in the current frame, said judging comprising:

If the hand target is not tracked in the current frame, it is directly determined that no occlusion occurs in the current frame; and

If a hand object is tracked in the current frame, an occlusion condition judgment is performed to determine whether occlusion occurs in the current frame.

7. The hand tracking method according to claim 6, wherein said occlusion condition judgment includes judging whether the following conditions are met:

(((PreOccDiff<=A)&&(DeltaDiff>B))||((PreOccDiff>A)&&(DeltaDiff>C)))&&(OccNum<D) among them, PreOccDiff means that the frame immediately before the occlusion occurs and the previous frame The average value of the value of each pixel in the hand object obtained in the skin color difference image of one frame, DeltaDiff is the difference between PreOccDiff and AvgTarDiff, and AvgTarDiff is each pixel in the current skin color difference image that best matches the template candidate hand The average value of the value, OccNum indicates the number of consecutive occlusions in front of the current frame, where C is greater than B,

And wherein, if the above condition is satisfied, it is determined that occlusion has occurred in the current frame, and if the above condition is not satisfied, it is determined that occlusion has not occurred in the current frame.

8. A hand tracking device comprising:

judging means, judging whether occlusion occurs in the frame before the current frame;

The selection device adaptively selects a hand target as a template from the hand targets tracked in the previous frame according to the occlusion judgment result; and

The obtaining device performs a template matching method for the current frame based on the template to obtain the hand target of the current frame,

Wherein, adaptively selecting the hand target as a template includes: if occlusion occurs in the frame before the current frame, the hand target obtained in the frame immediately before the occlusion occurs is used as the template; If no occlusion occurs, the hand object obtained in the previous frame of the current frame is used as the template.

9. A gesture recognition system comprising the hand tracking device of claim 8.