CN105204627B - A digital input method based on gesture - Google Patents
A digital input method based on gesture
- Publication number
- CN105204627B (application CN201510551014.7A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- virtual interface
- operator
- interface
- virtual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- User Interface Of Digital Computer (AREA)
Abstract
Description
Technical Field
The invention belongs to the field of computers and in particular relates to a gesture-based digital input method.
Background Art
Entering digits with gestures has long been one of the difficult and key problems in gesture-interaction theory and applied research, and it is also one of the pain points of industries such as smart home appliances.
At present, digits are generally entered with distinct hand gestures: the system first recognizes the gesture image and then interprets each gesture as a different digit. The main problems with this approach are: (1) the recognition rate is affected by many factors (such as distance, feature extraction, and the recognition operator), and gestures with similar shapes in particular are hard for ordinary methods to distinguish; (2) when several people operate at the same time, they easily interfere with one another; (3) the "Midas Touch" problem exists.
Summary of the Invention
The purpose of the present invention is to solve the above problems in the prior art by providing a gesture-based digital input method that makes it convenient to enter digits with gestures.
The present invention is achieved through the following technical solutions:
A gesture-based digital input method constructs a virtual interface, which is the region in which the operator performs gesture operations and which can change as the operator's body position or posture changes. N digits are entered either successively through one virtual interface, or by generating N virtual interfaces with each virtual interface accepting one digit; these N virtual interfaces form a virtual-interface group;
The display screen is defined as the physical interface, and the space between the operator and the physical interface is defined as the physical space;
The physical space comprises virtual-interface and non-virtual-interface regions; the operator's gestures are valid and perceivable only within the virtual interface, while gestures in the non-virtual-interface region are invalid;
Digits are entered through the virtual-interface group.
The virtual interface is constructed as follows:
(1) The computer detects the operator's behavior model: if the palm is detected to first push forward and then remain stationary, go to step (2); for any other behavior, return to step (1);
(2) Calculate the center-of-gravity position Z of the gesture while the palm remains stationary;
(3) Place the virtual-interface template M at the position centered on Z to obtain the virtual interface V(Z, Ω_M), where Ω_M denotes the extent of the virtual interface determined by the template M;
(4) Calculate the three-dimensional position of each point of interest on V;
(5) Return the virtual interface V.
For each application, the functional layout and size of the virtual-interface template M are fixed; a point of interest is an object on the virtual interface V with which the operator interacts;
Step (4) uses spatial depth information and a gesture-tracking method to calculate the three-dimensional position of each point of interest on V.
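As an illustration, steps (2)–(3) and the validity test can be sketched in C. The types, the rectangular template extent, and the depth handling are assumptions made here for illustration only; the patent leaves the template geometry to each application:

```c
#include <math.h>

/* A point in 3-D space. */
typedef struct { double x, y, z; } P3;

/* A virtual interface: the center Z (the gesture's center of gravity
 * while the palm is held still) plus the template's half-extents. */
typedef struct { P3 center; double half_w, half_h; } VInterface;

/* Steps (2)-(3): place the fixed template, of size tmpl_w x tmpl_h,
 * centered on the centroid Z of the stationary palm. */
VInterface make_virtual_interface(P3 z, double tmpl_w, double tmpl_h) {
    VInterface v = { z, tmpl_w / 2.0, tmpl_h / 2.0 };
    return v;
}

/* A gesture is valid only inside the virtual interface; depth is
 * ignored here for simplicity, although the method also uses spatial
 * depth information. */
int inside_interface(const VInterface *v, P3 p) {
    return fabs(p.x - v->center.x) <= v->half_w &&
           fabs(p.y - v->center.y) <= v->half_h;
}
```

Gestures whose centroid falls outside the returned region would then be ignored, which is what renders the non-virtual-interface part of the physical space inert.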
Entering digits through the virtual-interface group is implemented as follows:
Q1: the virtual interface is represented by a clock-dial structure in which the 12 o'clock position is defined as digit 0 and the digits 1 to 9 are then laid out clockwise. The midpoint of the arc between digits 1 and 2 is A, the midpoint of the arc between digits 2 and 3 is B, and the center of the circle is Z; the rays ZA and ZB enclose a sector region Θ;
Q2: for each digit to be entered, the specific steps are:
(A1) Generate the i-th virtual interface Vi;
(A2) The operator moves the gesture on Vi;
(A3) If the gesture is stationary, further detect whether it changes from an open palm with the five fingers extended (the "paper" hand shape) to a fist with the five fingers contracted, i.e., a "grab" gesture; if so, go to (A4); if not, return to step (A3);
(A4) Calculate the position Pg of the gesture's center of gravity;
(A5) If …, go to step (A4);
(A6) Calculate f such that Pg ∈ Θ_f, where Θ_f denotes the set of points in the sector region on the virtual interface Vi occupied by digit f.
Q1 further comprises:
The nine sector regions spanning digits 0 to 9 on the virtual interface are divided into three quadrants: quadrant I is the sector region from 0 to 3, quadrant II is the sector region from 3 to 6, and quadrant III is the sector region from 6 to 9;
The operator's gesture starts from the center point of the virtual interface to select a quadrant; the quadrant the operator intends to select is judged from the direction range of the gesture's motion trajectory;
The ray from the center point Z through digit 0 is the starting vector L, with the angle increasing clockwise. Let T be the direction of the gesture motion and θ = <T, L> the angle between T and L. The quadrant containing T is then: quadrant I if 0° ≤ θ < 90°, quadrant II if 90° ≤ θ < 180°, and quadrant III if 180° ≤ θ ≤ 270°.
When selecting a digit, the operator's palm translates along the direction T and then pushes straight forward; the quadrant determined by T is thereby selected, the corresponding virtual-interface template is loaded, and that quadrant becomes a new virtual interface for the operator's further operation;
In the new virtual interface there are only the four digits of that quadrant.
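The quadrant test on θ = <T, L> can be written directly; the 90°-wide quadrants follow from the clock-dial layout (digit 3 at 90°, digit 6 at 180°, digit 9 at 270°), though the exact boundary handling is an assumption, since the source formula is not reproduced:

```c
/* Quadrant containing the motion direction T, given the clockwise
 * angle theta (degrees) between T and the ray L through digit 0.
 * Quadrant I holds digits 0-3, II holds 3-6, III holds 6-9. */
int quadrant_of(double theta) {
    if (theta >= 0.0   && theta < 90.0)   return 1; /* quadrant I   */
    if (theta >= 90.0  && theta < 180.0)  return 2; /* quadrant II  */
    if (theta >= 180.0 && theta <= 270.0) return 3; /* quadrant III */
    return -1; /* direction points where no digits are laid out */
}
```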
The method uses a particle filter algorithm to obtain the gesture motion trajectory.
Compared with the prior art, the beneficial effects of the present invention are:
(1) It solves the "Midas Touch" problem (the "everything you touch turns to gold" problem: every gesture the user makes is captured by the sensor and executed as a command, throwing the system state into disorder and greatly increasing the user's cognitive and operational loads).
(2) It overcomes the defects of traditional gesture-based digit entry (for example, low recognition rates, gestures too complex to enter multi-digit numbers conveniently, and the need to memorize digit gestures).
Description of the Drawings
Fig. 1: the virtual-interface template used for digital input in the present invention.
Fig. 2: the virtual-interface template in an embodiment of the present invention.
Fig. 3-1: in an embodiment of the present invention, the 10 digits divided into three quadrants.
Fig. 3-2: in an embodiment of the present invention, the first quadrant used as a virtual-interface template to generate a new virtual interface.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings:
(1) Virtual interface
By analyzing the operator's gesture behavior, the computer can form a gesture-interaction sensing region in front of the operator. This region may be two-dimensional (2D) or three-dimensional (3D), and it follows every change of the operator's body position or posture, as if there were an invisible operating screen near the operator's hand that moves with the body. The present invention calls this operator gesture-operation region, with its specific structure and functions, the (intangible) virtual interface (TI). The virtual interface is divided into several sub-blocks, multimodal gestures (2D-interface/3D-interface gestures, communicative/operative gestures) are combined to build basic interactive functions, and a correspondence is established between the sub-blocks and these interactive functions; such a sub-block is called a function block.
The role of the virtual interface is that it reflects not only the behavioral model of the operator's gesture operation but also, from another angle, the operator's mental model. It gives the contactless interaction interface structure, makes it perceivable and computable, and thus effectively unifies contact and contactless interaction.
(2) Physical space
In the present invention, the display screen is called the physical interface (PI), and the space between the operator and the physical interface is called the physical space. The physical space is divided into virtual-interface and non-virtual-interface regions. A possible physical-space structure is shown, in which the virtual interface is divided into a 3D operating region and a 2D operating region. The operator's gestures are valid and perceivable only inside the virtual interface; the physical space outside the virtual interface is the invalid-gesture-command region, and the computer does not respond to any gesture command or operation performed there.
The construction algorithm of the virtual interface is as follows:
1. The computer checks the operator's behavior model: the palm first pushes forward and then remains stationary.
2. Calculate the center-of-gravity position Z of the gesture while the palm remains stationary;
3. Place the pre-designed virtual-interface template M at the position centered on Z to obtain the virtual interface V(Z, Ω_M), where Ω_M denotes the extent of the virtual interface determined by the template M;
4. Use spatial depth information and gesture-tracking technology to calculate the three-dimensional position of each point of interest on V;
5. Return the virtual interface V.
The functional layout and size of the virtual-interface template M are fixed. Different virtual-interface templates M can be designed for different applications. A point of interest on V is an object on the virtual interface V with which the operator interacts.
The digital input algorithm based on the virtual-interface group is as follows:
1 Algorithm description
To enter an N-digit number through virtual interfaces, N virtual interfaces are generated, forming a virtual-interface group. Each virtual interface uses the same template, a clock-dial-like structure familiar to the operator: the 12 o'clock position is defined as 0, and 1 to 9 are laid out clockwise, as shown in Fig. 1. In the figure, moving clockwise, the midpoint of the arc between points 1 and 2 is A, the midpoint of the arc between points 2 and 3 is B, and the rays ZA and ZB enclose a sector region Θ. Clearly, the necessary and sufficient condition for selecting point 2 is that the gesture's center of gravity falls within the region determined by Θ and the operator's "select" behavior is detected. The larger the arc of Θ, the lower the selection error rate.
Digit-input algorithm:
0: For (i = 1; i ≤ N; i++)
{
1. Generate the i-th virtual interface Vi;
2. The operator moves the gesture on Vi;
3. If the gesture is stationary:
4. Has a "grab" gesture been detected? If not, go to step 4;
5. Calculate the position Pg of the gesture's center of gravity;
6. If …, go to step 5;
7. Calculate f such that Pg ∈ Θ_f, where Θ_f denotes the set of points in the sector region on the virtual interface Vi occupied by digit f;
8. Perceive whether f is correct: if the selection result f is wrong, enter error-correction processing; otherwise, obtain f as the selection result for Vi;
}
Using two virtual interfaces as shown in Fig. 2 at the same time enables two-digit input, which can be used for smart-TV channel entry. For example, to enter channel 48, the operator only needs to enter the digit "4" through one virtual interface and the digit "8" through another.
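Assembling the per-interface selections into a channel number is then a matter of positional notation; a trivial sketch:

```c
/* Combine the digits selected on n successive virtual interfaces into
 * one number; e.g. selecting 4 and then 8 yields channel 48. */
int digits_to_number(const int *digits, int n) {
    int value = 0;
    for (int i = 0; i < n; i++)
        value = value * 10 + digits[i];
    return value;
}
```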
2 Further improvement of the algorithm
Because the range of gesture motion is often limited, and the interface must neither increase the operator's operating load nor impair the naturalness of the interaction, the size of the virtual interface is usually restricted. The distance between two adjacent digits is therefore small, the selection accuracy of the above algorithm suffers considerably, and the operator's operating and cognitive loads inevitably grow. To solve this problem, the present invention divides the nine sector regions spanning 0 to 9 on the virtual interface into three quadrants (Fig. 3-1). Quadrant I: the sector region from 0 to 3; quadrant II: from 3 to 6; quadrant III: from 6 to 9. If the gesture is required to start from the center point of the virtual interface when selecting a quadrant, the intended quadrant can be judged from the direction range of the gesture motion. For example, let the ray determined by Z and digit 0 be the starting vector L, with the angle increasing clockwise; let T be the direction of the gesture motion, and let θ = <T, L> be the angle between T and L. The quadrant containing T is then quadrant I if 0° ≤ θ < 90°, quadrant II if 90° ≤ θ < 180°, and quadrant III if 180° ≤ θ ≤ 270°.
When selecting a digit, the quadrant containing the required digit is first extracted into a new virtual interface (the palm translates along the direction T and then pushes straight forward; the quadrant determined by T is selected and loaded into the corresponding virtual-interface template, forming a new virtual interface for the operator's further operation). In the new virtual interface there are only four digits, so the spacing between digits is larger (equivalent to enlarging the arc length of each sector region), and the precision of the selection operation is therefore improved (Fig. 3-2).
In Fig. 3-1 the 10 digits are divided into three quadrants, and the operator's chosen quadrant can be determined from the gesture's direction of motion using implicit interaction techniques. In Fig. 3-2, once quadrant I has been selected, that quadrant is used as a virtual-interface template to generate a new virtual interface, from which digits can be selected with improved accuracy.
The present invention uses a particle filter algorithm to obtain the gesture motion trajectory:
S1: Initialization. Draw the particle set {x_0^(i)}, i = 1, …, N, from the prior distribution p(X_0) of the gesture's center-of-gravity state; set k = 1.
S2: State sampling.
S2.1: For i = 1 To N: draw a sample x̃_k^(i) from the prior (transition) distribution p(x_k | x_{k-1}^(i)).
S2.2: For i = 1 To N: compute the sample weight ω̃_k^(i) = p(z_k | x̃_k^(i)).
S2.3: For i = 1 To N: normalize the weights: ω_k^(i) = ω̃_k^(i) / Σ_j ω̃_k^(j).
S3: State estimation: x̂_k = Σ_i ω_k^(i) x̃_k^(i).
S4: Resampling: resample {x̃_k^(i)} to produce a new sample set {x_k^(i)} such that x̃_k^(i) appears in the new set with probability ω_k^(i).
S5: k = k + 1; go to S2.
Gesture recognition algorithm:
The shape-context-based gesture recognition method of the present invention performs gesture recognition by comparing the shape-context features of the gesture points in gesture images.
First, the gesture database is built.
(1) Select m kinds of gestures and n gesture images for each kind, with m = 5 and n = 10; that is, select 5 gestures and 10 gesture images per gesture.
(2) Find the gesture points in each gesture image: traverse the whole image; a black pixel is treated as background, and any other pixel is treated as a gesture point, whose coordinates are recorded along with the total count of gesture points.
(3) Calculate the gesture's center of gravity and the maximum distance between the center of gravity and the gesture points;
(4) Use that maximum distance as the maximum radius of a circle and divide the radius evenly into k parts (k = 12) to form k concentric circles, with a ring (annulus) between each pair of adjacent circles; count the gesture points falling in each ring, then calculate the center point of each ring and use it as a gesture feature point;
(5) Extract the shape-context features from the gesture feature points and gesture points, and finally write the shape-context features into text files stored in the gesture database; the database contains 50 files in total, 10 per gesture.
The gesture-point counting function void HandsDetection(D2POINT edgepoint[], BYTE* image, int* HandpointsNO): its main job is to count the gesture points in the segmented gesture image, record their coordinates, and return the number of gesture points.
D2POINT is a structure type, defined as: struct D2POINT { int x; int y; };
Input: the pointer image to the image to be processed.
Output: the function returns edgepoint[], which stores the gesture-point coordinates, and HandpointsNO, which records the number of gesture points.
具体步骤:Specific steps:
①按照从上到下、从左到右的顺序遍历图像上每个像素点。① Traverse each pixel on the image in order from top to bottom and from left to right.
② If the pixel is black, skip it and continue the traversal; otherwise store the pixel's x and y coordinates in the array edgepoint[] and increase the gesture-point count HandpointsNO by 1.
③重复执行步骤②,直到图像遍历结束。③ Repeat step ② until the image traversal ends.
④返回手势点个数HandpointsNO。④ Return the number of gesture points HandpointsNO.
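Steps ①–④ can be sketched as follows. The image dimensions are passed explicitly here (the original signature leaves them implicit), and "black = 0" in an 8-bit segmented image is an assumption:

```c
typedef unsigned char BYTE;
typedef struct { int x; int y; } D2POINT;

/* Scan a width*height 8-bit image top-to-bottom, left-to-right; every
 * non-black pixel (value != 0) of the segmented gesture image is
 * recorded as a gesture point, and the total count is returned via
 * HandpointsNO. */
void HandsDetection(D2POINT edgepoint[], const BYTE *image,
                    int width, int height, int *HandpointsNO) {
    *HandpointsNO = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            if (image[y * width + x] != 0) {
                edgepoint[*HandpointsNO].x = x;
                edgepoint[*HandpointsNO].y = y;
                (*HandpointsNO)++;
            }
}
```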
The ring-center counting function void CountRing(D2POINT edgepoint[], D2POINT featuredot[], D2POINT sumpoints[], int HandpointsNO, int circleno) calculates the center point of each ring.
Input: edgepoint[] storing the gesture-point coordinates, HandpointsNO storing the number of gesture points, and circleno storing the number of ring divisions; circleno is 12.
Output: the array featuredot[] storing the ring center-point coordinates, and the array sumpoints[] storing both the ring center points and the gesture points.
具体步骤:Specific steps:
① From the gesture-point coordinates edgepoint[] and the value of HandpointsNO, compute the gesture's center-of-gravity coordinates weight.
② Find the maximum distance maxjuli from the center of gravity to the points in the array edgepoint[].
③ Take maxjuli as the maximum radius and divide it evenly into 12 parts to determine 12 concentric circles.
④ According to the radius ranges, count the coordinates of the gesture points inside each ring and the number of points falling in that ring, storing them in the member variables D2POINT shixinpoint[200] and no of the array ring[12]. The type of the array ring[12] is the structure CircleRing, defined as: struct CircleRing { D2POINT shixinpoint[200]; /* coordinates of the points inside the ring */ int no; /* number of points inside the ring */ D2POINT avg; /* ring center-point coordinates */ };
⑤ On the basis of step ④, calculate the center point of each ring. If the number of points in the ring (the value of no) is not 0, the ring's center point is calculated from the points falling inside it and stored in the member variable D2POINT avg of the array ring[12]. If no is 0, both the x and y coordinates of the ring's center point are set to 0.
⑥ Copy the center point of each ring into the array featuredot[] as the gesture feature points, and copy the ring center points and gesture points into the array sumpoints[] in preparation for the subsequent shape-context feature extraction.
The shape-context feature extraction function void ShapeContext(int FeatureNo[][60], D2POINT featuredots[], D2POINT sumpoints[], int HandpointsNO, int circleno): its main job is to obtain the shape-context features of the gesture feature points.
Input: featuredots[] holds the gesture feature points (the ring center points), sumpoints[] is the combined set of gesture feature points and gesture points, HandpointsNO is the number of gesture points, and circleno is the number of ring divisions.
输出:用来存储每个手势特征点的形状上下文特征的数组FeatureNo[][60]。Output: an array FeatureNo[][60] used to store the shape context features of each gesture feature point.
具体步骤:Specific steps:
对每个手势特征点进行如下的操作。Perform the following operations on each gesture feature point.
① Find the maximum distance maxdistance from the current gesture feature point to the points in the array sumpoints[].
② If the x and y coordinates of the gesture feature point are not both 0, execute the following algorithm: taking the current gesture feature point as the pole and the maximum distance maxdistance as the radius, divide the plane into 60 regions. Specifically, construct a polar coordinate system with the current gesture feature point as the pole, divide the whole plane evenly into 12 directions, and divide the radius evenly into 5 parts, so the plane is naturally divided into 60 regions. On the same ring the regions have equal area. Then count how many points of the array sumpoints[] fall in each region.
The 60 attribute values of the i-th gesture feature point form a sequence (a_{i,1}, a_{i,2}, …, a_{i,60}), so the image shape can be described by an n×60 shape matrix A = (a_{i,j}). The meaning of the matrix is: for each element a_{i,j}, i denotes the i-th feature point and j the j-th of the 60 regions; a_{i,j} is the number of points falling in the j-th region of the polar coordinate system whose pole is the i-th feature point. n is the total number of feature points, here n = 12, because there are 12 ring center points, i.e., gesture feature points. This matrix represents the shape-context features of the image. The values of the matrix are saved in the two-dimensional array FeatureNo[][60]. If the x and y coordinates of a gesture feature point are both 0, all 60 attribute values of that feature point are set to 0.
然后进行手势识别。Then perform gesture recognition.
(1)顺序读取50个手势数据库文件并将其保存在数组中。(1) Read 50 gesture database files sequentially and save them in an array.
(2) Continuously select F frames of gesture images to be recognized from the video stream, with F = 10; starting from the tenth frame, take 10 consecutive frames from the video stream as the gesture images to be recognized.
(3) Calculate the shape-context features of each frame to be recognized in real time, using the same method as above;
(4) Calculate the χ² distances between the shape-context features of each frame of the gesture image to be recognized and the shape-context features of the m*n gesture images in the gesture database; for each database gesture image, add up all of the χ² distances it contributes and store the sums in an array, so each frame to be recognized corresponds to an array of m*n χ² distance sums; use a Sort function to find the minimum A of those m*n sums;
(5) Following the above method, compute the F minima A corresponding to the F frames of gesture images to be recognized, then use the Sort function again to find their minimum B; the gesture stored in the gesture database that corresponds to this minimum B is the recognized gesture.
Template library reading function: void readfile(int templet[50][20][60])
This function reads the prebuilt gesture database files and saves them in the array templet[50][20][60]. The first dimension is the file index: there are 5 gestures, with 10 library files per gesture. The second dimension indexes the gesture feature points; the third dimension holds the 60 feature values of each feature point.
Output: the array templet[50][20][60] holding all template library files.
Specific steps:
① There are 5 gestures with 10 files each, so 50 library files in total. Read each file in order into templet[50][20][60]; the first-dimension index distinguishes the gestures: 0-9 paper (open palm), 10-19 scissors, 20-29 OK, 30-39 fist, 40-49 thumbs-up.
② Read each file sequentially. A value of -1 marks the end of a file; when it is read, flag is set to 1 as the end-of-file marker.
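The two steps above can be sketched as follows. The file names ("gesture00.txt" and so on) and the plain-text file layout are assumptions for illustration; the source only specifies 50 files, each holding up to 20 feature points of 60 attribute values and terminated by a -1 marker.

```cpp
#include <cstdio>

// Minimal sketch of the template-loading step under the assumptions above.
bool readTemplates(int templet[50][20][60]) {
    for (int f = 0; f < 50; ++f) {
        char name[32];
        std::snprintf(name, sizeof(name), "gesture%02d.txt", f);  // assumed naming
        std::FILE* fp = std::fopen(name, "r");
        if (!fp) return false;
        int v, point = 0, attr = 0;
        // Values are read until the -1 end-of-file marker described in the text.
        while (point < 20 && std::fscanf(fp, "%d", &v) == 1 && v != -1) {
            templet[f][point][attr] = v;
            if (++attr == 60) { attr = 0; ++point; }  // move to next feature point
        }
        std::fclose(fp);
    }
    return true;
}
```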
Gesture recognition function: void Idensitify(int featurecon[][60], float chengben[], int n, int templet[50][20][60], int circleno)
This function compares the shape context feature of a gesture frame to be recognized with the shape context feature stored in one of the 10 library files of a given gesture, and computes the matching cost.
Input: featurecon[][60] is the computed shape context feature of the frame to be recognized; templet[50][20][60] stores the shape context features read from the library files. n is the file index, meaning the gesture is compared with the n-th library file; since there are 50 files, n ranges from 0 to 49. circleno is the number of rings.
Output: the array chengben[] storing the matching costs. Since there are 12 gesture feature points, the array has 12 values.
Specific steps:
① Traverse the array featurecon[][60] holding the shape context feature of the gesture to be recognized row by row, while simultaneously traversing the array templet[50][20][60] holding the library features.
② If all 60 attribute values of a gesture feature point are 0, the 60 attribute values of the corresponding feature point in the library file are assigned to it. The 60 attribute values of the feature point are then compared with the shape context feature of every feature point in the library file; since there is a matching cost between this point and each of the library's 12 feature points, 12 matching cost values are obtained, and their minimum is taken as one element of the array chengben[]. The matching cost is defined by the χ² distance:
C_ij = (1/2) Σ_{k=1..60} [h_i(k) - h_j(k)]² / [h_i(k) + h_j(k)]
where h_i(k) is the k-th shape context attribute value of the i-th feature point of the gesture to be recognized, and h_j(k) is that of the j-th feature point in a library file; k ranges from 1 to 60 over the 60 attribute values. This formula yields the matching cost C_ij between feature points i and j.
③ Each gesture feature point is traversed in this way, finally yielding 12 matching cost values, which are stored in the array chengben[].
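The per-bin χ² computation can be sketched as follows. This is a minimal stand-alone version of the cost between two 60-attribute feature points; the guard that skips empty bins to avoid division by zero is an addition the source does not spell out.

```cpp
// Chi-square matching cost between two 60-bin shape-context histograms:
//   C_ij = 1/2 * sum_k (h_i(k) - h_j(k))^2 / (h_i(k) + h_j(k))
double chiSquareCost(const int hi[60], const int hj[60]) {
    double c = 0.0;
    for (int k = 0; k < 60; ++k) {
        double s = hi[k] + hj[k];
        if (s > 0.0) {                    // skip empty bins (division-by-zero guard)
            double d = hi[k] - hj[k];
            c += d * d / s;
        }
    }
    return 0.5 * c;
}
```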
Sorting function: IdensityFlagSort(IdensityFlag gross[], int n)
This function finds the minimum of the computed χ² distance sums.
IdensityFlag is a structure type, defined as follows:
struct IdensityFlag { float sum; int flag; };
Input: in the array gross[], the member sum is the total matching cost, i.e. the sum of the matching costs between the gesture to be recognized and one library file; the member flag is the gesture label: 0 paper, 1 scissors, 2 OK, 3 fist, 4 thumbs-up.
Output: a variable mark of type IdensityFlag; mark holds the smallest of all matching cost sums.
Specific steps:
① Define a variable mark of type IdensityFlag to store the minimum matching cost sum, and initialize it with the first element of gross[].
② Traverse gross[] in turn; if an element's sum is smaller than mark's sum, assign that element to mark.
③ Repeat ② until the traversal ends, then return the value of mark.
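The three steps above amount to a linear minimum search, which can be sketched as follows; the function name minCost is a stand-in for IdensityFlagSort.

```cpp
struct IdensityFlag { float sum; int flag; };

// Despite the original name, this is a single linear scan for the element with
// the smallest total matching cost, not a full sort.
IdensityFlag minCost(const IdensityFlag gross[], int n) {
    IdensityFlag mark = gross[0];                   // step 1: start from element 0
    for (int i = 1; i < n; ++i)                     // step 2: keep the smaller sum
        if (gross[i].sum < mark.sum) mark = gross[i];
    return mark;                                    // step 3: mark is the minimum
}
```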
void CMainFrame::TotalIdensity(BYTE* lpImgData[], int templet[50][20][60])
This function carries out the overall recognition process on the 10 gesture frames obtained in real time.
Input: BYTE* lpImgData[] points to the 10 captured frames; int templet[50][20][60] stores the gesture template values.
Specific steps:
The following operations are performed on each frame:
① Obtain the gesture points with the HandsDetection function. If a frame contains no gesture points it is discarded; otherwise it is a valid frame, and frameNo counts the valid frames. The following computations are then performed.
② Count the center points inside each ring with the CountRing function.
③ Compute the shape context feature of the frame with the ShapeContext function.
④ Compare the frame's shape context feature with the shape context features of all template library files via the Idensitify function, obtaining 50 χ² distance sums.
⑤ Find the minimum of these 50 χ² distance sums with the Sort function.
Steps ①-⑤ are executed in a loop to process the 10 frames. Since each valid frame yields one minimum χ² distance sum, n valid frames yield n such values; the Sort function takes the minimum again over these n values, and the gesture corresponding to that overall minimum is the recognized gesture.
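The frame-level decision can be sketched as follows. The costs matrix standing in for the per-frame χ² distance sums is an assumed representation for illustration; with 10 template files per gesture class, the winning template index divided by 10 gives the gesture label.

```cpp
#include <limits>
#include <vector>

// costs[f][t]: chi-square distance sum between valid frame f and template
// file t (0-49). Returns the gesture label 0-4 (paper, scissors, OK, fist,
// thumbs-up) whose template achieves the smallest cost over all valid frames.
int recognize(const std::vector<std::vector<double>>& costs) {
    double best = std::numeric_limits<double>::infinity();
    int gesture = -1;
    for (const auto& frame : costs)                 // one row per valid frame
        for (int t = 0; t < (int)frame.size(); ++t)
            if (frame[t] < best) { best = frame[t]; gesture = t / 10; }
    return gesture;
}
```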
The advantages of the method of the present invention are: (1) the virtual interface template concept conforms to people's cognitive behavior model and mental model, helping to distill a unified, standardized interactive interface paradigm; (2) people's familiarity with the clock dial interface is deeply rooted, so associating the 10 digits 0 to 9 with a clock dial matches everyday experience; "mastering" this structure and interacting with it rests on a natural cognitive and practical basis, so that even dividing the dial interface further into three quadrants does not increase the operator's cognitive load; (3) the virtual interface concept effectively solves the "Midas Touch" problem and supports multi-user interaction; (4) it overcomes the drawbacks of existing digital gesture methods, in particular avoiding the problem of unreliable gesture recognition rates; (5) multi-digit input can be realized conveniently, with high speed, a low error rate, and simple, natural, efficient operation.
The above technical solution is only one embodiment of the present invention. On the basis of the application methods and principles disclosed herein, those skilled in the art can easily make various improvements or modifications, which are not limited to the methods described in the above specific embodiments; the foregoing description is therefore preferred rather than limiting.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510551014.7A CN105204627B (en) | 2015-09-01 | 2015-09-01 | A digital input method based on gesture |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN105204627A CN105204627A (en) | 2015-12-30 |
| CN105204627B true CN105204627B (en) | 2016-07-13 |
Family
ID=54952361
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201510551014.7A Expired - Fee Related CN105204627B (en) | 2015-09-01 | 2015-09-01 | A digital input method based on gesture |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN105204627B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106933346B (en) * | 2017-01-20 | 2019-07-26 | 深圳奥比中光科技有限公司 | The zoning methods and equipment in car manipulation space |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102508546A (en) * | 2011-10-31 | 2012-06-20 | 冠捷显示科技(厦门)有限公司 | Three-dimensional (3D) virtual projection and virtual touch user interface and achieving method |
| CN102769802A (en) * | 2012-06-11 | 2012-11-07 | 西安交通大学 | Man-machine interactive system and man-machine interactive method of smart television |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101872426B1 (en) * | 2013-03-14 | 2018-06-28 | 인텔 코포레이션 | Depth-based user interface gesture control |
2015-09-01: CN application CN201510551014.7A, patent CN105204627B/en, status not active (Expired - Fee Related)
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102508546A (en) * | 2011-10-31 | 2012-06-20 | 冠捷显示科技(厦门)有限公司 | Three-dimensional (3D) virtual projection and virtual touch user interface and achieving method |
| CN102769802A (en) * | 2012-06-11 | 2012-11-07 | 西安交通大学 | Man-machine interactive system and man-machine interactive method of smart television |
Also Published As
| Publication number | Publication date |
|---|---|
| CN105204627A (en) | 2015-12-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Arvo et al. | Fluid sketches: continuous recognition and morphing of simple hand-drawn shapes | |
| CN106598227B (en) | Gesture identification method based on Leap Motion and Kinect | |
| CN104007819B (en) | Gesture recognition method and device and Leap Motion system | |
| CN102024151B (en) | Training method of gesture motion recognition model and gesture motion recognition method | |
| Harouni et al. | Online Persian/Arabic script classification without contextual information | |
| KR20180064371A (en) | System and method for recognizing multiple object inputs | |
| CN103793683B (en) | Gesture recognition method and electronic device | |
| CN103093196A (en) | Character interactive input and recognition method based on gestures | |
| CN105224089A (en) | Gesture operation method and device, mobile terminal | |
| CN105138949A (en) | Gesture control method based on flexible mapping between multiple gestures and semantics | |
| CN102609734A (en) | Machine vision-based handwriting recognition method and system | |
| Rahim et al. | Gestural flick input-based non-touch interface for character input | |
| Mohammadi et al. | Real-time Kinect-based air-writing system with a novel analytical classifier: S. Mohammadi | |
| CN102929394A (en) | Braille input method based on gesture recognition | |
| CN104992156A (en) | Gesture control method based on flexible mapping between gesture and multiple meanings | |
| CN105204627B (en) | A digital input method based on gesture | |
| CN107608510A (en) | Method for building up, device and the electronic equipment in gesture model storehouse | |
| Fernandez-Pacheco et al. | A new paradigm based on agents applied to free-hand sketch recognition | |
| CN104820584B (en) | A method and system for constructing a 3D gesture interface for natural manipulation of hierarchical information | |
| Ko et al. | Finger mouse and gesture recognition system as a new human computer interface | |
| CN104123026B (en) | A kind of touch-screen touch track direction of motion knows method for distinguishing | |
| Sumpeno et al. | Immersive Hand Gesture for Virtual Museum using Leap Motion Sensor Based on K-Nearest Neighbor | |
| Jian et al. | RD-Hand: a real-time regression-based detector for dynamic hand gesture | |
| CN109032355B (en) | Flexible mapping interaction method for corresponding multiple gestures to same interaction command | |
| CN106295464A (en) | Gesture identification method based on Shape context |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| CB03 | Change of inventor or designer information |
Inventor after: Feng Zhiquan Inventor before: Feng Zhiquan Inventor before: Feng Shichang |
|
| COR | Change of bibliographic data | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20160713 Termination date: 20180901 |