CN104834922B - Gesture identification method based on hybrid neural networks - Google Patents
- Publication number
- CN104834922B CN104834922B CN201510280013.3A CN201510280013A CN104834922B CN 104834922 B CN104834922 B CN 104834922B CN 201510280013 A CN201510280013 A CN 201510280013A CN 104834922 B CN104834922 B CN 104834922B
- Authority
- CN
- China
- Prior art keywords
- gesture
- point
- neural network
- pixel
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
Abstract
The invention discloses a gesture recognition method based on a hybrid neural network. For a gesture image to be recognized and for the gesture-image training samples, a pulse-coupled neural network first detects the noise points, which are then processed by a composite denoising algorithm. A cellular neural network then extracts the edge points of the gesture image; connected regions are derived from these edge points, and curvature-based fingertip detection on each connected region yields candidate fingertip points. After interference from the face is excluded, the gesture region is obtained and segmented according to the shape of the hand. Fourier descriptors that retain phase information are computed from the contour points of the segmented gesture region, and the first several descriptors are selected as gesture features. A BP neural network is trained with the gesture features of the training samples, and the gesture features of the image to be recognized are input to the BP neural network for recognition. By combining several kinds of neural networks, the invention improves the accuracy of gesture recognition.
Description
Technical Field
The invention belongs to the technical field of gesture recognition, and more specifically relates to a gesture recognition method based on a hybrid neural network.
Background Art
With the rapid advance of computer technology, human-computer interaction has become more and more common in everyday life. Human-Computer Interaction (HCI) technology refers to the process by which a user and a computer interact through some mode of operation. Its development has roughly passed through a purely manual stage, a command-language stage and a user-interface stage; with the continuous progress of artificial intelligence and related technologies in recent years, the development of HCI has attracted growing attention.
As computer applications keep expanding, existing HCI methods can no longer satisfy the higher-level demands of daily use, and a more concise and friendly mode of human-computer interaction is urgently needed. The ultimate goal of HCI is natural communication between humans and machines. In daily life most information between people is conveyed through body language or facial expressions, and only a small part through natural language, which indicates that body language has a great advantage in expressing human emotion and intent. Since the hand plays an extremely important role in body language, interaction based on hand gestures, i.e. the gesture recognition system, has attracted wide attention.
In general, a gesture recognition system consists of the following parts: gesture preprocessing, gesture segmentation, gesture modeling, gesture feature extraction and gesture recognition. Gesture preprocessing is mainly denoising of the gesture image. Common denoising algorithms include mean filtering, median filtering, spatial low-pass filtering, frequency-domain low-pass filtering and pulse-coupled neural networks; however, when several kinds of noise are present at the same time, none of the current algorithms achieves a good denoising result, so designing a good denoising algorithm is essential for the later recognition stages. For gesture segmentation, the commonly used methods are based on skin-color information, on motion information or on edge information. Segmentation based on skin color is easily disturbed by background information, and segmentation based on edge information does not achieve good results either, so designing a good and effective segmentation algorithm is also crucial. For gesture feature extraction, the most widely used approach is based on Fourier descriptors, but because of their rotation invariance the features change little after the gesture is rotated, so designing a Fourier descriptor that is not rotation invariant is also crucial. For gesture recognition itself, common methods include template matching, support vector machines, neural networks and hidden Markov models, so choosing a good recognition method is equally important for a gesture recognition system.
The neural network approach uses simple processing units to imitate the neurons of the human brain and connects these units into a network to simulate the brain. Neural network methods typically offer parallel computation, distributed storage, robustness, nonlinear processing, and good adaptability and fault tolerance, so they can be applied in many scenarios, for example gesture recognition, image segmentation and noise processing.
At present, neural network methods are being applied more and more in the field of gesture recognition. However, their application has been limited to the recognition stage itself; they are rarely applied at the other stages of gesture recognition.
Summary of the Invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a gesture recognition method based on a hybrid neural network: a pulse-coupled neural network improves the denoising of the gesture image, a cellular neural network performs gesture segmentation, rotation-variant Fourier descriptors serve as gesture features, and a BP neural network performs gesture recognition, thereby improving the accuracy of gesture recognition.
To achieve the above object, the gesture recognition method based on a hybrid neural network of the present invention comprises the following steps:
S1: Extract the features of the gesture image to be recognized and of the gesture-image training samples. The specific steps are:
S1.1: Build a pulse-coupled neural network model for the gesture grayscale image. Take the gray value of each pixel of the current gesture grayscale image as the input of the corresponding neuron in the pulse-coupled neural network, and use the firing behavior of the network to examine each pixel of the gesture image. If a pixel's output state is the firing state, set the element corresponding to that pixel in the detection-result matrix to 1, otherwise to 0. Traverse every element of the detection-result matrix; if an element equals 1, take it as the center of a denoising window, whose size is set according to the actual situation, and count the values of the other elements in the window. If the number of elements equal to 0 exceeds a preset threshold, the center point is a noise point; otherwise it is not.
Compute the two noise estimates H(i,j) and V(i,j) of each noise point as follows:

H(i,j) = |a(i,j) − b(i,j)|

V(i,j) = (|a(i,j) − m1(i,j)| + |a(i,j) − m2(i,j)|) / 2

where a(i,j) is the gray value at pixel (i,j) of the image, b(i,j) is the median output gray value of that pixel after median filtering, and m1(i,j) and m2(i,j) are the two points in the neighborhood of pixel (i,j) whose gray values are closest to a(i,j).

If H(i,j) ≥ T1 and V(i,j) ≥ T2, process the noise point with the median filter; otherwise process it with the mean filter.
S1.2: Perform histogram equalization on the gesture grayscale image denoised in step S1.1.
S1.3: Build a cellular neural network model for the gesture grayscale image. Take the gray value of each pixel (i,j) of the equalized image as the input u_ij of the corresponding cell, and iterate the state-transition equation until the whole network converges, obtaining the output y_ij(t) of every cell. Then traverse the output value of the cell corresponding to each pixel: when a pixel's output value lies in [0,1], the pixel is not an edge pixel if the sum of the pixel values of the other pixels in its neighborhood is greater than a preset threshold, and is an edge pixel otherwise; when the output value lies in [−1,0), the pixel is not an edge pixel.
S1.4: Obtain connected regions from the edge pixels found in step S1.3, extract the contour of each connected region, and perform fingertip detection on each connected region as follows:
Traverse every contour pixel of the connected region, taking each pixel as a reference point with coordinates p(p_x, p_y, 0). With a preset distance constant L, take the L-th point p1(p_1x, p_1y, 0) ahead of p along the contour and the L-th point p2(p_2x, p_2y, 0) behind p, and compute the cosine cosα of the angle between the vector from p to p1 and the vector from p to p2. If cosα is greater than a preset curvature threshold T, the point is judged to be a candidate fingertip point; otherwise it is not.
Determine the sign of the fingertip-position vector product according to the traversal direction: when the overall contour of the gesture region is traversed clockwise, the sign of the vector product should be negative, otherwise positive. For each candidate fingertip point, compute the vector product of the vector from p to p1 with the vector from p to p2; if the sign of this vector product equals the sign corresponding to a fingertip position, keep the point as a candidate fingertip point, otherwise discard it.
Among all candidate fingertip points detected in a connected region, judge whether the difference between the largest y coordinate and the smallest y coordinate exceeds half of the face height; if it does, the connected region is not a gesture region, otherwise it is treated as a pending gesture region. Then judge whether the number of candidate fingertip points in each pending gesture region exceeds a preset count threshold; if it does, the connected region is a gesture region, otherwise it is not.
Find the main direction of the gesture region and, following this direction, segment the gesture region using a gesture length-to-width ratio of 2, obtaining the segmented gesture region.
S1.5: For the gesture region obtained after the segmentation of step S1.4, express the coordinates of its contour points as complex numbers so that all contour points form a discrete sequence, and let n be the number of contour points. Apply the Fourier transform to this sequence to obtain n Fourier coefficients z(k), k = 0, 1, …, n−1, and compute the Fourier descriptors

S[k′] = (‖z(k′)‖ / ‖z(1)‖)·e^(jφ)

where k′ = 1, 2, …, n−1 and φ denotes the angle between the main direction of the gesture region and the x-axis. Select the first Q Fourier descriptors to form the feature vector.
S2: Input the feature vectors of the training-sample gesture images into a BP neural network as training samples, with the corresponding gesture-image categories as the network outputs, and train the BP neural network.
S3: Input the feature vector of the gesture image to be recognized into the BP neural network trained in step S2, and output the recognized gesture-image category.
In the gesture recognition method based on a hybrid neural network of the present invention, for the gesture image to be recognized and for the gesture-image training samples, a pulse-coupled neural network first distinguishes noise points from edge points, and a composite denoising algorithm then processes the noise points. A cellular neural network extracts the edge points of the gesture image; connected regions are derived from these edge points, and curvature-based fingertip detection on each connected region yields candidate fingertip points. Interference from the face is then excluded to obtain the gesture region, which is segmented according to the shape of the hand to yield the segmented gesture region. Fourier descriptors that retain phase information are computed from the contour points of the gesture region, and the first several descriptors are selected as gesture features. A BP neural network is trained with the gesture features of the training samples, and the gesture features of the image to be recognized are input to the BP neural network for recognition.
The present invention has the following beneficial effects:
(1) The pulse-coupled neural network distinguishes noise points from edge points, and the composite denoising algorithm then denoises the gesture image, which improves the denoising effect;
(2) Gesture segmentation combines the coarse segmentation of the cellular neural network with fine segmentation based on the shape of the hand, which improves segmentation accuracy;
(3) The gesture features use Fourier descriptors that retain phase information, which improves the recognition rate.
Brief Description of the Drawings
Figure 1 is a flowchart of the gesture recognition method based on a hybrid neural network of the present invention;
Figure 2 is a flowchart of gesture-image feature extraction in the present invention;
Figure 3 is a flowchart of fine gesture segmentation combined with gesture shape characteristics;
Figure 4 is a schematic diagram of fingertip detection in the present invention;
Figure 5 is an example of coarse gesture segmentation;
Figure 6 is an example of fine gesture segmentation.
Detailed Description
Specific embodiments of the present invention are described below with reference to the accompanying drawings, so that those skilled in the art can better understand the present invention. Note that in the following description, detailed accounts of known functions and designs are omitted where they would dilute the main content of the present invention.
Embodiment
Figure 1 is a flowchart of the gesture recognition method based on a hybrid neural network of the present invention. As shown in Figure 1, the method comprises the following steps:
S101: Extract the features of the samples to be recognized and of the training samples:
First, features must be extracted from the gesture image to be recognized and from the gesture-image training samples. Figure 2 is a flowchart of gesture-image feature extraction in the present invention. As shown in Figure 2, feature extraction comprises the following steps:
S201: Gesture-image denoising preprocessing:
The present invention denoises the gesture grayscale image with an algorithm that combines a pulse-coupled neural network (PCNN) with a composite denoising algorithm. The PCNN first distinguishes noise points from edge points in the gesture image, and the composite denoising algorithm then denoises each noise point according to its type, so that several kinds of noise are removed while edge information is preserved.
Each neuron of a pulse-coupled neural network consists of three parts: a receiving part, a modulation part and a pulse generator. The PCNN is a common method for image-denoising preprocessing, mainly used to remove salt-and-pepper noise. When applied to image denoising, it can be viewed as a two-dimensional single-layer locally connected network in which the neurons correspond one-to-one to the pixels of the grayscale image to be processed and neighboring neurons are connected to each other. During denoising, the gray value of each pixel serves as the feedback input of its neuron, the output of each neuron serves only as input to its neighboring neurons, and each neuron has only two output states, the firing state and the non-firing state, recorded as 1 and 0 respectively. Since a noise pixel differs strongly from its surrounding pixels, noise points can be identified by combining the firing behavior of the PCNN with the characteristics of the noise itself. The specific procedure is as follows:
Build the PCNN model of the gesture grayscale image and take the gray value of each pixel of the current image as the input of the corresponding neuron; then use the firing behavior of the PCNN to examine every pixel of the image. If a pixel's output state is the firing state, set the corresponding element of the detection-result matrix to 1, otherwise to 0; the detection-result matrix thus has the same size as the image to be processed. Set the size of the denoising window (3×3 in this embodiment) and traverse every element of the detection-result matrix. If an element equals 1, i.e. the firing state, take it as the center of the denoising window and count the values of the other elements in the window (the detection results of the pixels other than the center pixel). If the number of elements equal to 0 (the non-firing state) exceeds a preset threshold, the center point is a noise point; otherwise it is not. This distinguishes noise points from edge points. The count threshold is generally half the number of elements in the denoising window.
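As an illustration of this voting rule, the following is a minimal Python sketch. It assumes a binary firing map has already been produced by a PCNN pass (the PCNN itself is not implemented here); the function name and the default threshold of half the window are illustrative choices, not taken from the patent.

```python
import numpy as np

def detect_noise_points(fire_map, win=3, thresh=None):
    """Window vote on the PCNN firing map: a fired pixel whose window
    contains more than `thresh` non-fired neighbors is marked as noise.
    fire_map: binary matrix (1 = fired, 0 = not fired), image-sized."""
    r = win // 2
    if thresh is None:
        thresh = (win * win) // 2        # roughly half the window, as in the text
    noise = np.zeros(fire_map.shape, dtype=bool)
    H, W = fire_map.shape
    for i in range(r, H - r):            # borders are left unmarked
        for j in range(r, W - r):
            if fire_map[i, j] != 1:
                continue
            window = fire_map[i - r:i + r + 1, j - r:j + r + 1]
            zeros = int((window == 0).sum())   # center is 1, so it never counts
            if zeros > thresh:
                noise[i, j] = True
    return noise
```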
After the noise points have been identified, the composite denoising algorithm performs the corresponding denoising operation. The main procedure is as follows:
Let a(i,j) be the gray value at pixel (i,j) of the image, with 1 ≤ i ≤ M and 1 ≤ j ≤ N, where M is the number of pixels in each row of the gesture grayscale image (i.e. the number of columns) and N is the number of pixels in each column (i.e. the number of rows), and let b(i,j) be the median output gray value of that pixel after median filtering. The difference between the pixel value of the noise point and the median output gray value serves as a first noise estimate, as in formula (1):

H(i,j) = |a(i,j) − b(i,j)| (1)

Because the types of noise differ, this estimate alone cannot distinguish between them. A second noise estimate V(i,j) is therefore introduced: the average of the absolute differences between the pixel value a(i,j) at pixel (i,j) and the two closest points m1(i,j) and m2(i,j), as in formula (2):

V(i,j) = (|a(i,j) − m1(i,j)| + |a(i,j) − m2(i,j)|) / 2 (2)

where m1(i,j) and m2(i,j) are the two points in the neighborhood of pixel (i,j) whose gray values are closest to a(i,j).
Set thresholds T1 and T2. The relationship between the two noise estimates and these thresholds then determines how each kind of noise is handled, as follows:
If H(i,j) ≥ T1 and V(i,j) ≥ T2, the noise point is judged to be salt-and-pepper or impulse noise and is processed with the median filter, i.e. its gray value is replaced by the median-filter output. If H(i,j) < T1, or H(i,j) ≥ T1 and V(i,j) < T2, the noise is judged to be Gaussian and the point is processed with the mean filter, i.e. its gray value is replaced by the mean-filter output.
In this algorithm, the choice of the thresholds T1 and T2 is critical to the quality of the composite denoising result. A commonly used threshold-selection method is the mean absolute deviation (MAD) algorithm, which gives T1 = 3.5·δ_ij, where δ_ij denotes the mean absolute deviation of all pixels within the denoising window of pixel (i,j). The threshold T2 mainly accounts for textures that may appear in the gesture image; based on the MAD algorithm and experimental experience, T2 is usually chosen as an integer between 6 and 10.
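The decision rule and the two noise estimates can be sketched as follows, under stated assumptions: the boolean noise map comes from the PCNN stage above (and is False on image borders), and SciPy's median and uniform filters stand in for the median and mean filters.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def composite_denoise(img, noise, T1, T2, win=3):
    """H/V rule: salt-and-pepper or impulse noise -> median filter,
    Gaussian noise -> mean filter. `noise` is the boolean map from the
    PCNN stage (assumed False on image borders)."""
    med = median_filter(img.astype(float), size=win)
    mean = uniform_filter(img.astype(float), size=win)
    out = img.astype(float).copy()
    r = win // 2
    for i, j in zip(*np.nonzero(noise)):
        a = out[i, j]
        H = abs(a - med[i, j])                       # estimate (1)
        nb = np.delete(out[i - r:i + r + 1, j - r:j + r + 1].ravel(),
                       (win * win) // 2)             # neighbors without the center
        m1, m2 = sorted(nb, key=lambda v: abs(v - a))[:2]
        V = (abs(a - m1) + abs(a - m2)) / 2          # estimate (2)
        out[i, j] = med[i, j] if (H >= T1 and V >= T2) else mean[i, j]
    return out
```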
S202: Histogram equalization:
Histogram equalization adjusts the contrast of an image using its histogram, spreading the gray-level histogram of the original image from a concentrated gray range to a distribution over the whole range. The present invention applies histogram equalization to the gesture grayscale image denoised in step S201 in order to enlarge the difference between the gray values of the foreground and background of the gesture image. Histogram equalization is a standard contrast-enhancement method, so its detailed steps are not repeated here.
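For completeness, a minimal NumPy sketch of plain histogram equalization for an 8-bit grayscale image (any library routine, such as OpenCV's equalizeHist, would serve equally well):

```python
import numpy as np

def equalize_hist(img):
    """Histogram equalization of an 8-bit grayscale image (dtype uint8)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # map each gray level through the normalized cumulative distribution
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```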
S203: Coarse gesture segmentation based on the cellular neural network:
As with the pulse-coupled neural network, the neurons of the cellular neural network correspond one-to-one to the pixels of the gesture grayscale image. Denote the cell at row i, column j by C(i,j) (corresponding to pixel (i,j)). Each cell C(i,j) consists of four parts: an input variable u_ij, a state-transition variable x_ij, an output variable y_ij and a threshold I. The cells of the cellular neural network are locally interconnected: C(i,j) is connected only to the cells in its neighborhood N_r(i,j) and has no direct connection to any other cell. The neighborhood N_r(i,j) of cell C(i,j) can be defined as:

N_r(i,j) = { C(k,l) | max(|k−i|, |l−j|) ≤ r } (3)

where r is a positive integer, 1 ≤ i, k ≤ M and 1 ≤ j, l ≤ N, with M the number of pixels in each row and N the number of pixels in each column of the gesture grayscale image. That is, the neighborhood of cell C(i,j) is the square of side length 2r+1 centered on C(i,j).
The main equations of the cellular neural network are:

State-transition process:

x_ij(t+1) = Σ_{C(k,l)∈N_r(i,j)} A(k,l)·y_kl(t) + Σ_{C(k,l)∈N_r(i,j)} B(k,l)·u_kl + I (4)

Output equation:

y_ij(t) = (1/2)·(|x_ij(t) + 1| − |x_ij(t) − 1|) (5)

where 1 ≤ i, k ≤ M and 1 ≤ j, l ≤ N; t is the iteration index; A(k,l) is the feedback weight of cell C(k,l) in the neighborhood N_r(i,j) of cell C(i,j); and B(k,l) is the control weight of cell C(k,l) in that neighborhood, i.e. the elements of template B other than the center element. The values of (k,l) are determined by the definition of the neighborhood N_r(i,j).
Both the feedback template A and the control template B are (2r+1)×(2r+1) matrices, and I is the threshold template of the cellular neural network. The values of A, B and I jointly determine the relationship between the input u_ij, the output y_ij and the state-transition variable x_ij. Correctly designing the feedback template A, the control template B and the threshold I is therefore essential for the cellular neural network model.
The template design method adopted by the present invention combines algebraic analysis with previous template-design experience. The templates A, B and I are generally designed in the following format:

A = [ 0 0 0 ; 0 a 0 ; 0 0 0 ] (6)

B = [ −c −c −c ; −c b −c ; −c −c −c ] (7)

I = −d (8)

where a, b, c and d are all positive constants.
Build the cellular neural network model of the gesture grayscale image and take the gray value of each pixel (i,j) of the equalized image as the input u_ij of the corresponding cell; iterate the state-transition equation until the whole network converges, so that every cell has an output y_ij(t). By the output equation, the output value y_ij(t) lies between 1 and −1: y_ij(t) = 1 represents pure black, and y_ij(t) = −1 represents pure white.
The basic principle for judging whether a pixel is an edge point is as follows. When a pixel value is pure black, i.e. +1: if the sum of the pixel values in its neighborhood is greater than the set threshold parameter, the pixel is not an edge pixel, and its value tends to pure white; conversely, if the sum of the pixel values in its neighborhood is less than the set threshold parameter, the pixel is an edge pixel, and its value tends to pure black. When the pixel value is pure white, i.e. −1, the pixel value tends to pure white regardless of the values of the pixels in its neighborhood.
Following this principle, the method of the present invention for judging whether a pixel is an edge point is: traverse the output value of the cell corresponding to each pixel of the cellular neural network; when a pixel's output value lies in [0,1], the pixel is not an edge pixel if the sum of the pixel values of the other pixels in its neighborhood is greater than the preset threshold, and is an edge pixel otherwise; when the output value lies in [−1,0), the pixel is not an edge pixel. The threshold on the neighborhood pixel-value sum is set according to the actual situation.
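A sketch of the discretized iteration and the generic templates might look as follows; it assumes the input is scaled to [−1, 1], initializes the state to the input (a common choice the patent does not specify), and uses placeholder values for the positive constants a, b, c, d rather than the patent's tuned constants.

```python
import numpy as np
from scipy.ndimage import convolve

def cnn_edges(u, A, B, I, iters=50):
    """Discretized cellular-neural-network iteration (Chua-Yang form):
        x(t+1) = A * y(t) + B * u + I,   y = 0.5 * (|x + 1| - |x - 1|)
    where * is neighborhood convolution with the 3x3 templates.
    u: input image scaled to [-1, 1]; returns the final output map y."""
    x = u.copy()                                  # state initialized to the input
    Bu = convolve(u, B, mode="nearest") + I       # control term, constant over time
    for _ in range(iters):
        y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))
        x = convolve(y, A, mode="nearest") + Bu
    return 0.5 * (np.abs(x + 1) - np.abs(x - 1))

# Generic 3x3 templates in the format of equations (6)-(8); the numbers are
# placeholders for the positive constants a, b, c, d:
a, b, c, d = 2.0, 8.0, 1.0, 0.5
A = np.array([[0, 0, 0], [0, a, 0], [0, 0, 0]], float)
B = np.array([[-c, -c, -c], [-c, b, -c], [-c, -c, -c]], float)
I = -d
```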
S204: Fine gesture segmentation combined with hand-shape characteristics:
Figure 3 is a flowchart of fine gesture segmentation combined with gesture shape characteristics. As shown in Figure 3, the fine segmentation of the present invention comprises the following steps:
S301: Extract connected regions and contours:
From the edge pixels obtained with the cellular neural network, find the connected regions, thereby removing the interference of other background information and keeping only the hand and face regions. In this embodiment the connected regions are found with the two-pass algorithm. The contours of the connected regions are then extracted; this embodiment uses a search-and-mark method: the image containing the extracted connected regions is scanned systematically, and whenever a point of a connected region is encountered, it is taken as the starting point, its edge is traced, and the pixels on the edge are marked. When the traced contour closes completely, scanning resumes from the previous position until new pixel information is found. Other methods for extracting connected regions and contours may be used as required.
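A minimal sketch of this stage, using SciPy's connected-component labeling in place of a hand-written two-pass implementation. The boundary pixels extracted here form an unordered set, so an ordered border-following pass (as in the search-and-mark method above) would still be needed before the curvature computation of step S302.

```python
import numpy as np
from scipy.ndimage import label

def connected_regions(edge_map):
    """Label the 8-connected regions of the binary edge map and return each
    region's boundary pixels (pixels with at least one 4-neighbor outside)."""
    labels, n = label(edge_map, structure=np.ones((3, 3), int))
    contours = []
    for k in range(1, n + 1):
        mask = labels == k
        inner = np.zeros_like(mask)
        inner[1:-1, 1:-1] = (mask[1:-1, 1:-1]
                             & mask[:-2, 1:-1] & mask[2:, 1:-1]
                             & mask[1:-1, :-2] & mask[1:-1, 2:])
        contours.append(np.argwhere(mask & ~inner))   # (row, col) = (y, x) pairs
    return labels, contours
```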
S302: Perform fingertip detection on each connected region:
Fingertip detection is performed on each of the obtained connected regions to judge whether it is a gesture region. In general the fingers are spread apart during gesture recognition, so fingertips can be detected through a curvature computation. Figure 4 is a schematic diagram of fingertip detection in the present invention. As shown in Figure 4, the fingertip detection method is:
Traverse every contour pixel of the connected region, taking each pixel as a reference point with coordinates p(p_x, p_y, 0), where (p_x, p_y) are the two-dimensional coordinates of the reference point in the gesture image. With a preset distance constant L, take the L-th point p1(p_1x, p_1y, 0) ahead of p along the contour, so that p and p1 define a straight line; then take the L-th point p2(p_2x, p_2y, 0) behind p along the contour, so that p and p2 define a second straight line. The two lines form an angle, denoted α. The cosine of the angle between the vector from p to p1 and the vector from p to p2 is taken as the curvature measure:

cosα = ((p1 − p) · (p2 − p)) / (‖p1 − p‖ · ‖p2 − p‖) (9)

If cosα is greater than the preset curvature threshold T, the point is judged to be a candidate fingertip point. The threshold T is set according to the distance constant L: the larger L is, the larger T should be. L itself should be neither too small nor too large; it is generally set to between one quarter and one half of the average finger length.
Interference from the valleys between fingers is removed via the sign of the vector product of the vector from p to p1 with the vector from p to p2. As Figure 4 shows, the sign of this vector product when p lies at a fingertip differs from its sign when p lies in a valley, so the sign can be used to determine the position of p; this is why the coordinates of p, p1 and p2 are written as three-dimensional Cartesian coordinates. The sign at a fingertip depends on the traversal direction: when the overall contour of the gesture region is traversed clockwise, by the right-hand rule the vector product at a fingertip points into the image, i.e. is negative; when the contour is traversed counterclockwise (the traversal direction shown in Figure 4), the vector product at a fingertip points out of the image, i.e. is positive. The sign of the vector product at each candidate fingertip point is therefore compared with the sign corresponding to a fingertip position: if they are the same, the point is kept as a candidate fingertip point; otherwise it is discarded.
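Both tests can be sketched together on an ordered contour; the `clockwise` flag and the small tolerance added to the norms are illustrative choices:

```python
import numpy as np

def fingertip_candidates(contour, L=80, T=0.5, clockwise=False):
    """Curvature test (9) plus vector-product sign test on an ordered contour.
    contour: (n, 2) array of (x, y) points traversed along the boundary."""
    n = len(contour)
    tips = []
    for i in range(n):
        p = contour[i].astype(float)
        v1 = contour[(i - L) % n] - p      # vector p -> p1 (L points back)
        v2 = contour[(i + L) % n] - p      # vector p -> p2 (L points ahead)
        cos_a = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        if cos_a <= T:
            continue                       # not curved enough for a fingertip
        cross_z = v1[0] * v2[1] - v1[1] * v2[0]   # z component of v1 x v2
        # clockwise traversal -> fingertips give a negative z component
        if (cross_z < 0) == clockwise:
            tips.append(contour[i])        # convex peak, not a finger valley
    return np.asarray(tips)
```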
S303: Determine the gesture region:
After the fingertip points have been detected, they must be further examined to remove interference from parts of the face whose curvature exceeds the threshold because of viewing angle, and thereby determine the gesture region. The present invention uses a two-stage test:
First, judge whether the difference between the largest and the smallest y coordinate of the candidate fingertip points detected in a connected region exceeds half of the face height; if it does, the connected region is not a gesture region, otherwise it is treated as a pending gesture region. Setting this distance to half the face height was determined by experiment; it removes the interference of the face while fully preserving the correct fingertip points.
Then judge whether the number of candidate fingertip points in each pending gesture region exceeds a preset count threshold; if it does, the connected region is a gesture region, otherwise it is not. The number of fingertip points found in an actual gesture region depends on the curvature threshold T, so in practice the count threshold can be obtained by collecting statistics over the experimental results of a number of gesture training samples.
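The two-stage test reduces to a few lines; `face_height` and `min_tips` are assumed to be supplied by a face detector and by the training-sample statistics described above:

```python
import numpy as np

def is_gesture_region(tips, face_height, min_tips):
    """Two-stage test from step S303: reject regions whose candidate tips
    span more than half the face height in y, then require enough tips."""
    if len(tips) == 0:
        return False
    y_span = tips[:, 1].max() - tips[:, 1].min()
    if y_span > face_height / 2:
        return False                 # vertical spread too wide: likely the face
    return len(tips) > min_tips      # enough candidate fingertip points
```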
S304: Gesture-region segmentation:
The above operations remove the interference of the face and other connected regions and yield the gesture region. However, the gesture region may contain not only the palm but also the wrist and other parts. In general, the useful information of a gesture is concentrated in the palm, and the information of the wrist and similar parts can essentially be ignored. To make later feature extraction and tracking efficient and effective, the gesture region must therefore be segmented so that only the fingers and palm are retained.
Based on the shape of the human hand, the present invention segments the gesture using the fact that the ratio of the gesture's length to its width is approximately 2. Before segmentation, the main direction of the gesture region must be known. In this embodiment the main direction is found as follows: compute the centroid of the gesture region, compute the vectors from the centroid to each fingertip point, and average these vectors; the direction of the average vector is the main direction of the gesture region. The gesture is then segmented along this main direction. The segmentation method of this embodiment is: construct the bounding rectangle of the gesture region aligned with the main direction, with the sides parallel to the main direction as the length and the sides perpendicular to it as the width; select the wide side on which the fingertip points lie, and starting from that wide side, cut off along the long sides a rectangle whose length is twice the width. The gesture region contained in this rectangle is the segmented gesture region that retains only the fingers and palm.
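The main-direction computation described above can be sketched as follows (image coordinates, with y increasing downward; the 2:1 rectangle crop that follows is omitted here):

```python
import numpy as np

def main_direction(region_mask, tips):
    """Main direction of the gesture region: angle of the mean vector from
    the region centroid to the fingertip points (step S304)."""
    ys, xs = np.nonzero(region_mask)
    centroid = np.array([xs.mean(), ys.mean()])   # (x, y) centroid
    mean_vec = (np.asarray(tips, float) - centroid).mean(axis=0)
    return np.arctan2(mean_vec[1], mean_vec[0])   # angle w.r.t. the x-axis
```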
S205: Extract gesture features with Fourier descriptors that retain phase information:
For the gesture region segmented in step S204, the present invention designs a Fourier descriptor that retains phase information to extract the gesture features, removing the rotation invariance of the traditional Fourier descriptor so that rotated gestures can be distinguished.
The discrete Fourier coefficients z(k) can be expressed as:

z(k) = Σ_{i=0}^{n−1} p(i)·e^(−j2πik/n) (10)

where p(i) is the i-th element of the discrete sequence, n is the number of elements in the sequence, e is the natural constant, and j is the imaginary unit. In the present invention the transform is applied to the gesture contour, so the discrete sequence p(i) consists of the coordinates of the contour pixels of the gesture region segmented in step S204, written as complex numbers.
The inverse Fourier transform can be expressed as:

p(i) = (1/n)·Σ_{k=0}^{n−1} z(k)·e^(j2πik/n) (11)
According to the basic property of the Fourier transform z(k) = z*(n−k), the high-frequency part of the transform z, from K+1 to n−K−1, is removed, where z* denotes the complex conjugate of z and K ranges over [0, n/2]. Applying the inverse Fourier transform to z with the high-frequency part removed yields a curve that approximates the original contour but is smoother; this curve is called the K-th approximation of the original curve. The subset of Fourier coefficients that is kept, {z(k) : 0 ≤ k ≤ K or n−K ≤ k ≤ n−1}, is the Fourier descriptor used to extract the gesture features.
The Fourier descriptor is related to the scale and orientation of the shape and to the starting position of the curve. Therefore, to make the recognition algorithm invariant to rotation, translation and scale, the Fourier descriptor must be normalized. From the basic properties of the Fourier transform it can be shown that when the contour is represented by Fourier coefficients, the coefficient magnitude ‖z(k)‖ is invariant to rotation, translation and the choice of starting point, where 0 ≤ k ≤ n−1; since z(0) is not translation invariant, the range of k is restricted to [1, n−1]. To obtain scale invariance, the magnitude ‖z(k)‖ of every coefficient other than z(0) is divided by ‖z(1)‖. The normalized Fourier descriptor S[k′] can then be expressed as:

S[k′] = ‖z(k′)‖ / ‖z(1)‖ (12)

where 1 ≤ k′ ≤ n−1 and ‖·‖ denotes the modulus.
A detailed description of the normalized Fourier descriptor can be found in: Song Ruihua. Gesture Recognition Algorithm Based on Fourier Descriptors [D]. Xidian University, 2008.
To remove the rotation invariance of the traditional Fourier descriptor, the present invention retains the phase information after rotation. The normalized form of the improved Fourier descriptor can be expressed as:

S[k′] = (‖z(k′)‖ / ‖z(1)‖)·e^(jφ) (13)

where φ denotes the angle between the main direction of the gesture region and the x-axis, and j is the imaginary unit. This descriptor S[k′] retains the phase information of the gesture rotation, so it is no longer rotation invariant. The present invention therefore uses these coefficients as the features of the gesture region. The features are invariant to translation and scale and independent of the starting position of the contour curve, while remaining variable under rotation, so the feature vector can distinguish rotated gestures. Since the number of contour points differs between gesture regions, in practice only the first Q Fourier descriptors are selected to form the feature vector; the value of Q is determined according to the actual situation.
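A sketch of the descriptor computation, assuming an ordered contour and the main-direction angle φ from step S304; in practice the real and imaginary parts of the Q complex descriptors would be concatenated into the feature vector fed to the BP network:

```python
import numpy as np

def gesture_descriptor(contour, phi, Q):
    """Phase-preserving Fourier descriptor, formula (13).
    contour: ordered (n, 2) boundary points; phi: angle between the gesture's
    main direction and the x-axis; returns the first Q descriptors."""
    z = np.fft.fft(contour[:, 0] + 1j * contour[:, 1])  # complex contour -> z(k)
    mag = np.abs(z[1:]) / np.abs(z[1])   # drop z(0), normalize by |z(1)|
    s = mag * np.exp(1j * phi)           # re-attach the main-direction phase
    return s[:Q]
```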
S102: Train the BP neural network with the training samples:
The feature vectors of the training-sample gesture images are input to the BP neural network as training samples, with the corresponding gesture-image categories as the network outputs, and the BP neural network is trained. The BP neural network is a widely used network, so its structure, parameters and training method are not repeated here.
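As a stand-in for a BP network, the sketch below uses scikit-learn's MLPClassifier, a backpropagation-trained multilayer perceptron; the layer size, learning rate and iteration count are illustrative, not taken from the patent:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_bp(X, y):
    """Train a backpropagation classifier on gesture feature vectors.
    X: (num_samples, num_features) descriptor matrix (complex descriptors
    split into real/imaginary parts beforehand); y: class labels, e.g.
    0..3 for up/down/left/right."""
    net = MLPClassifier(hidden_layer_sizes=(32,), activation="logistic",
                        solver="sgd", learning_rate_init=0.1, max_iter=2000)
    net.fit(X, y)
    return net

# recognition of an unknown sample then reduces to: net.predict(x.reshape(1, -1))
```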
S103: Perform gesture recognition on the samples to be recognized:
The feature vector of the gesture image to be recognized is input into the BP neural network trained in step S102, and the network outputs the recognized gesture image category.
To demonstrate the technical effect of the invention, it was verified experimentally. The gesture training samples are divided into four classes, namely hand pointing up, down, left and right, with 80 training samples per class; test samples are likewise drawn from the four classes, 40 per class. For ease of presentation, only the hand-pointing-up samples are used here to illustrate the procedure. Each image in the sample set is 256×256 pixels with 256 gray levels.
First, the upward-pointing samples are denoised. Since each sample image is 256×256 and, when the pulse-coupled neural network (PCNN) is applied to image noise reduction, its neurons correspond one-to-one to the pixels of the image to be processed, the number of neurons is set to 65536. The parameters of the PCNN model used in this embodiment are: number of neuron iterations τ = 10, neuron connection strength β = 3, dynamic threshold parameter θij = 1, threshold-output amplification factor Vθ = 20, and threshold decay coefficient aθ = 0.2. The firing characteristics of the PCNN are then used for detection, noise points are identified from the detection results, and the composite denoising algorithm is applied according to the type of each noise point. The parameters of the composite denoising algorithm are T1 = 3.5·δij, where δij is a statistic computed over the noise window Sk, the noise window being the same size as the 3×3 PCNN detection window, and T2 = 8.
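As a rough sketch of the composite denoising step only: the PCNN firing-based noise detector is replaced here by a simple local-median test, which is an assumption and not the patent's detection rule; the 3×3 window and the threshold T2 = 8 follow the embodiment:

```python
import numpy as np
from scipy.ndimage import median_filter

def composite_denoise(img, t2=8.0):
    """Flag pixels that deviate strongly from their 3x3 local median and
    replace them; stands in for the PCNN detector plus composite filter."""
    med = median_filter(img.astype(float), size=3)   # 3x3 window as in the embodiment
    noisy = np.abs(img.astype(float) - med) > t2     # crude impulse-noise test
    out = img.astype(float).copy()
    out[noisy] = med[noisy]                          # median replacement for flagged pixels
    return out.astype(img.dtype)
```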
After histogram equalization of the denoised gesture image, a cellular neural network is used to detect the edges of the gesture in the image, giving a coarse segmentation of the gesture image. In this embodiment, the neighborhood of each cell in the cellular neural network is 3×3, and the template used is a 3×3 edge-detection template.
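A sketch of cellular-neural-network edge detection follows. The feedback and control templates below are the standard Chua-Yang edge-extraction templates, used here as an assumption because the patent's actual template matrices are not reproduced in this text:

```python
import numpy as np
from scipy.signal import convolve2d

A = np.zeros((3, 3))                          # feedback template (standard EDGE: all zeros)
B = np.array([[-1., -1., -1.],
              [-1.,  8., -1.],
              [-1., -1., -1.]])               # control template (standard EDGE)
I_BIAS = -1.0                                 # bias term

def cnn_edge(u, steps=50, dt=0.1):
    """Euler integration of the CNN state equation dx/dt = -x + A*y + B*u + I.
    `u` is the input image scaled to [-1, 1] (e.g. -1 background, +1 object)."""
    u = u.astype(float)
    x = u.copy()
    for _ in range(steps):
        y = np.clip(x, -1.0, 1.0)             # piecewise-linear cell output
        x += dt * (-x + convolve2d(y, A, mode="same")
                       + convolve2d(u, B, mode="same") + I_BIAS)
    return (np.clip(x, -1, 1) > 0).astype(np.uint8)   # binary edge map
```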
Fig. 5 shows an example of coarse gesture segmentation.
The gesture image is then finely segmented using gesture shape features, with the constant L set to 80 and the curvature threshold T set to 0.5. Fig. 6 shows an example of fine gesture segmentation. As can be seen, fine segmentation removes the influence of regions such as the face and yields a more accurate gesture region.
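Where the curvature threshold T = 0.5 is applied, a discrete curvature along the contour can be computed as below; the turning-angle formulation is an assumption, since the patent's exact curvature formula is not reproduced here:

```python
import numpy as np

def contour_curvature(contour, step=5):
    """Discrete curvature of a closed contour as the turning angle between
    the chord arriving at and the chord leaving each point."""
    pts = np.asarray(contour, dtype=float)
    fwd = np.roll(pts, -step, axis=0) - pts        # chord to the point `step` ahead
    bwd = pts - np.roll(pts, step, axis=0)         # chord from the point `step` behind
    ang_f = np.arctan2(fwd[:, 1], fwd[:, 0])
    ang_b = np.arctan2(bwd[:, 1], bwd[:, 0])
    return np.abs(np.angle(np.exp(1j * (ang_f - ang_b))))  # wrapped to (-pi, pi]

# Points whose curvature exceeds the threshold T = 0.5 can then be treated as
# candidate high-curvature features for the fine-segmentation step.
```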
The contour-point coordinates of the finely segmented gesture region are then arranged into a discrete sequence and Fourier transformed to obtain the Fourier coefficients, which are normalized according to Eq. (13); the first 200 normalized Fourier descriptors are selected to form the gesture feature vector.
The BP neural network is trained with the gesture feature vectors of the training samples. The size of the input layer is determined by the gesture feature vector and the size of the output layer by the number of gesture sample classes; in the present invention the input layer has 200 nodes, the hidden layer has 10 nodes, and the output layer has 4 nodes. The output is coded in binary as 0001, 0010, 0100 or 1000, denoting a gesture pointing up, down, left or right respectively, and the gesture type is determined from the network output.
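A sketch of decoding the 4-node network output; the mapping from node index to the binary codes 0001 through 1000 is an assumption:

```python
import numpy as np

LABELS = {0: "up", 1: "down", 2: "left", 3: "right"}  # 0001, 0010, 0100, 1000 (assumed order)

def decode_output(y):
    """y: length-4 network output vector; the most active node wins."""
    return LABELS[int(np.argmax(y))]

print(decode_output([0.9, 0.05, 0.02, 0.03]))  # -> "up"
```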
To assess the noise-reduction performance of the new denoising algorithm designed in the present invention, which combines the pulse-coupled neural network with the composite denoising algorithm, it was compared against the plain composite denoising algorithm and median filtering; the main comparison metric is the peak signal-to-noise ratio (PSNR). Table 1 compares the PSNR of the proposed denoising algorithm with the reference algorithms.
Table 1
Table 1 shows that, at the same noise density, the PSNR of the proposed denoising method is clearly higher than that of median filtering and of the plain composite denoising algorithm. The denoising algorithm combining the pulse-coupled neural network with the composite denoising algorithm therefore achieves a good denoising effect.
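For reference, the PSNR metric used in Table 1 can be computed as follows for 8-bit grayscale images:

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means better denoising."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(denoised, float)) ** 2)
    if mse == 0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```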
In addition, the recognition performance of the traditional Fourier descriptor was used for comparison, with the recognition rate on the gesture samples as the metric. Table 2 reports the gesture-sample recognition results of the traditional Fourier descriptor; Table 3 reports the results of the Fourier descriptor of the present invention.
Table 2
Table 3
Comparing the results of Tables 2 and 3 shows that the traditional Fourier descriptor cannot reliably distinguish gestures that differ by a large rotation: its recognition rate is only about 71%, so it performs poorly in scenarios where rotated gestures carry different meanings. The improved Fourier descriptor of the present invention tolerates a certain amount of gesture rotation (when the rotation angle is very large, the two orientations are treated as different images), and experiments show that it still achieves a recognition rate of about 91%, a good gesture-recognition result.
Although illustrative specific embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To a person of ordinary skill in the art, various changes are evident so long as they remain within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept fall within the scope of protection.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510280013.3A CN104834922B (en) | 2015-05-27 | 2015-05-27 | Gesture identification method based on hybrid neural networks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104834922A CN104834922A (en) | 2015-08-12 |
CN104834922B true CN104834922B (en) | 2017-11-21 |
Family
ID=53812800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510280013.3A Expired - Fee Related CN104834922B (en) | 2015-05-27 | 2015-05-27 | Gesture identification method based on hybrid neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104834922B (en) |
Families Citing this family (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105487772A (en) * | 2015-11-26 | 2016-04-13 | 上海斐讯数据通信技术有限公司 | Information capturing method and apparatus |
CN105373785B (en) * | 2015-11-30 | 2019-08-02 | 北京地平线机器人技术研发有限公司 | Gesture identification detection method and device based on deep neural network |
CN106022343B (en) * | 2016-05-19 | 2019-08-16 | 东华大学 | A Garment Style Recognition Method Based on Fourier Descriptor and BP Neural Network |
CN106022297A (en) * | 2016-06-01 | 2016-10-12 | 苏州大学 | Gesture identification method and gesture identification device |
CN108073979A (en) * | 2016-11-14 | 2018-05-25 | 顾泽苍 | A kind of ultra-deep study of importing artificial intelligence knows method for distinguishing for image |
CN108229277B (en) | 2017-03-31 | 2020-05-01 | 北京市商汤科技开发有限公司 | Gesture recognition method, gesture control method, multilayer neural network training method, device and electronic equipment |
CN109101860B (en) * | 2017-06-21 | 2022-05-13 | 富泰华工业(深圳)有限公司 | Electronic device and gesture recognition method thereof |
CN107894834B (en) * | 2017-11-09 | 2021-04-02 | 上海交通大学 | Control gesture recognition method and system in augmented reality environment |
CN108230257A (en) * | 2017-11-15 | 2018-06-29 | 北京市商汤科技开发有限公司 | Image processing method, device, electronic equipment and storage medium |
CN108052884A (en) * | 2017-12-01 | 2018-05-18 | 华南理工大学 | A kind of gesture identification method based on improvement residual error neutral net |
CN108108024B (en) * | 2018-01-02 | 2021-01-22 | 京东方科技集团股份有限公司 | Dynamic gesture acquisition method and device, and display device |
CN108198567A (en) * | 2018-02-22 | 2018-06-22 | 成都启英泰伦科技有限公司 | A kind of novel voice is except system of making an uproar |
CN109344689A (en) * | 2018-08-07 | 2019-02-15 | 西安理工大学 | A Kinect-based Mute Language Gesture Recognition Method |
CN109359538B (en) * | 2018-09-14 | 2020-07-28 | 广州杰赛科技股份有限公司 | Training method of convolutional neural network, gesture recognition method, device and equipment |
CN109344793B (en) | 2018-10-19 | 2021-03-16 | 北京百度网讯科技有限公司 | Method, apparatus, device and computer readable storage medium for recognizing handwriting in the air |
CN109697407A (en) * | 2018-11-13 | 2019-04-30 | 北京物灵智能科技有限公司 | A kind of image processing method and device |
CN109612326B (en) * | 2018-12-19 | 2021-08-24 | 西安建筑科技大学 | An intelligent auxiliary teaching system for light weapons shooting based on the Internet of Things |
CN113033256B (en) * | 2019-12-24 | 2024-06-11 | 武汉Tcl集团工业研究院有限公司 | Training method and device for fingertip detection model |
CN111259902A (en) * | 2020-01-13 | 2020-06-09 | 上海眼控科技股份有限公司 | Arc-shaped vehicle identification number detection method and device, computer equipment and medium |
CN111216133B (en) * | 2020-02-05 | 2022-11-22 | 广州中国科学院先进技术研究所 | A Robot Demonstration Programming Method Based on Fingertip Recognition and Hand Movement Tracking |
CN112487981A (en) * | 2020-11-30 | 2021-03-12 | 哈尔滨工程大学 | MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation |
WO2022126367A1 (en) * | 2020-12-15 | 2022-06-23 | Qualcomm Incorporated | Sequence processing for a dataset with frame dropping |
CN112800954A (en) * | 2021-01-27 | 2021-05-14 | 北京市商汤科技开发有限公司 | Text detection method and device, electronic equipment and storage medium |
CN113191361B (en) * | 2021-04-19 | 2023-08-01 | 苏州大学 | A Shape Recognition Method |
CN113449600B (en) * | 2021-05-28 | 2023-07-04 | 宁波春建电子科技有限公司 | Two-hand gesture segmentation algorithm based on 3D data |
CN113792624B (en) * | 2021-08-30 | 2024-10-15 | 河南林业职业学院 | Bank ATM early warning security monitoring method |
CN114625333B (en) * | 2022-03-08 | 2022-10-18 | 深圳康荣电子有限公司 | Method and system capable of recording gesture instructions to control liquid crystal splicing LCD |
CN115240270B (en) * | 2022-07-05 | 2025-02-07 | 海南软件职业技术学院 | A gesture recognition method based on Fourier descriptor |
CN117558068B (en) * | 2024-01-11 | 2024-03-19 | 深圳市阿龙电子有限公司 | Intelligent device gesture recognition method based on multi-source data fusion |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8620024B2 (en) * | 2010-09-17 | 2013-12-31 | Sony Corporation | System and method for dynamic gesture recognition using geometric classification |
CN103778407A (en) * | 2012-10-23 | 2014-05-07 | 南开大学 | Gesture recognition algorithm based on conditional random fields under transfer learning framework |
CN104298354A (en) * | 2014-10-11 | 2015-01-21 | 河海大学 | Man-machine interaction gesture recognition method |
CN104573621A (en) * | 2014-09-30 | 2015-04-29 | 李文生 | Dynamic Gesture Learning and Recognition Method Based on Chebyshev Neural Network |
Non-Patent Citations (1)
Title |
---|
Research on Gesture Recognition Technology Based on Neural Networks; Jiang Li et al.; Journal of Beijing Jiaotong University; 2006-10-31 (No. 5); pp. 32-36 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104834922B (en) | Gesture identification method based on hybrid neural networks | |
CN105069472B (en) | A kind of vehicle checking method adaptive based on convolutional neural networks | |
CN106897714B (en) | Video motion detection method based on convolutional neural network | |
CN105205475B (en) | A dynamic gesture recognition method | |
CN110837768B (en) | An online detection and identification method for rare animal protection | |
CN106446952B (en) | A kind of musical score image recognition methods and device | |
CN105654453B (en) | A kind of FCM image partition methods of robustness | |
CN107292311A (en) | A kind of recognition methods of the Characters Stuck identifying code based on neutral net | |
CN103020985B (en) | A kind of video image conspicuousness detection method based on field-quantity analysis | |
CN105574534A (en) | Significant object detection method based on sparse subspace clustering and low-order expression | |
CN103870818B (en) | Smog detection method and device | |
CN104331683B (en) | A kind of facial expression recognizing method with noise robustness | |
CN108961675A (en) | Fall detection method based on convolutional neural networks | |
CN107871099A (en) | Face detection method and apparatus | |
CN105590319A (en) | Method for detecting image saliency region for deep learning | |
CN106991686A (en) | A kind of level set contour tracing method based on super-pixel optical flow field | |
CN110728185B (en) | Detection method for judging existence of handheld mobile phone conversation behavior of driver | |
CN115205636A (en) | An image target detection method, system, device and storage medium | |
Chen et al. | Image splicing localization using residual image and residual-based fully convolutional network | |
Diwan et al. | Unveiling copy-move forgeries: Enhancing detection with SuperPoint keypoint architecture | |
CN108846356A (en) | A method of the palm of the hand tracing and positioning based on real-time gesture identification | |
CN111260655A (en) | Image generation method and device based on deep neural network model | |
CN106874917A (en) | A kind of conspicuousness object detection method based on Harris angle points | |
Bilal et al. | A hybrid method using haar-like and skin-color algorithm for hand posture detection, recognition and tracking | |
CN110135435A (en) | A method and device for saliency detection based on extensive learning system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20171121 Termination date: 20200527 |