CN109409231B - Multi-feature fusion sign language recognition method based on adaptive hidden Markov - Google Patents
Multi-feature fusion sign language recognition method based on adaptive hidden Markov
- Publication number
- CN109409231B (application CN201811131806.9A)
- Authority
- CN
- China
- Prior art keywords
- sign language
- feature
- video
- fusion
- score
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
Abstract
The invention discloses a multi-feature fusion sign language recognition method based on an adaptive hidden Markov model, comprising: first, extracting multiple kinds of features from a sign language video database and performing front-end fusion, that is, constructing a feature pool set; then, building an adaptive hidden Markov model for each sign language video under each feature in the feature pool set, and applying a proposed feature selection strategy to obtain suitable back-end score fusion features; finally, after the back-end score fusion features are selected, computing the score vector under each of these features, assigning each vector a different weight, and performing back-end score fusion, thereby obtaining the optimal fusion result. The invention achieves accurate recognition of the sign language category of a sign language video and improves the robustness of recognition.
Description
Technical Field
The invention belongs to the technical field of computer vision and involves pattern recognition, artificial intelligence, and related technologies; specifically, it is a multi-feature fusion sign language recognition method based on an adaptive hidden Markov model.
Technical Background
Deaf people form a large group among people with disabilities, and because they cannot speak, they usually use sign language to communicate. When hearing people who have never learned sign language need to communicate with deaf people, a communication barrier arises, and most hearing people in society have received no sign language education. A sign language translation system, as an aid that helps deaf people integrate into society, is therefore of great significance to them. However, sign language translation remains a difficult problem in computer vision: signers vary widely in build, signing speed, and signing habits, so the recognition setting is highly complex and a satisfactory accuracy is often hard to achieve.
Sign language recognition is a sequence learning problem. Models proposed so far include dynamic time warping (DTW), the support vector machine (SVM), curve matching, and neural networks (NN). DTW is computationally expensive, while the SVM is designed for binary classification and is not directly applicable to multi-class problems. Neural networks presuppose a large amount of training data for model training and optimization; when training data is limited, they cannot reach an optimal model, which degrades sign language recognition accuracy.
As for multi-modal feature fusion, traditional feature fusion comprises front-end fusion, performed at the feature level, and back-end score fusion, performed at the level of classification probability scores. Back-end score fusion usually incurs a large time overhead, and across different models, poorly performing features may dominate the fusion and degrade the fused result.
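The two fusion levels can be illustrated with a small NumPy sketch. The feature vectors, class scores, and fixed weights below are made up for illustration only; the patent learns the late-fusion weights adaptively in step 7.

```python
import numpy as np

# Front-end (early) fusion: concatenate feature vectors before any model sees them.
sp_feat = np.array([0.2, 0.5])            # e.g. a skeleton-point descriptor
hog_feat = np.array([0.1, 0.9, 0.4])      # e.g. a HOG descriptor
early_fused = np.concatenate([sp_feat, hog_feat])
print(early_fused.shape)                  # (5,)

# Back-end (late) score fusion: each feature's model emits one score per
# class, and the per-class scores are combined with weights.
scores_sp = np.array([0.7, 0.2, 0.1])     # class scores from an SP model
scores_hog = np.array([0.5, 0.4, 0.1])    # class scores from a HOG model
w_sp, w_hog = 0.6, 0.4                    # illustrative fixed weights
late_fused = w_sp * scores_sp + w_hog * scores_hog
pred = int(np.argmax(late_fused))         # winning class index
print(pred)                               # 0
```

Early fusion lets one model see all modalities at once; late fusion keeps one model per feature and combines only their outputs, which is where a weak feature can dominate if the weights are chosen poorly.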
Summary of the Invention
To improve sign language recognition accuracy, the present invention provides a multi-feature fusion sign language recognition method based on an adaptive hidden Markov model, with the aim of accurately recognizing the sign language category of a sign language video and improving the robustness of recognition.
The present invention adopts the following technical scheme to solve the technical problem:
The multi-feature fusion sign language recognition method based on an adaptive hidden Markov model of the present invention proceeds by the following steps:
Step 1. Obtain a sign language video database and divide its sign language videos into a training data set and a test data set. The training data set contains videos for N sign language words, each word corresponding to multiple videos. Denote the N sign language words as C = {c1, …, cn, …, cN}, where cn is the n-th sign language word, 1 ≤ n ≤ N.
Take the multiple videos corresponding to each sign language word in the training data set as that word's sign language video set, thereby obtaining the video sets Set1, …, Setn, …, SetN of the N sign language words, where Setn is the video set of the n-th sign language word cn.
Step 2. Construct the feature type set F:
Extract M kinds of features from the sign language videos in the training data set, obtaining the feature type set F = {f1, f2, …, fM}, where fM denotes the M-th feature and M is the total number of feature types.
Step 3. Construct the feature pool set F′:
Step 3.1. Define a variable i and initialize i = 1.
Step 3.2. Define the i-th fusion feature set as Fi and initialize Fi = F.
Step 3.3. Let i = 2.
Step 3.4. Take any i distinct features from the feature type set F and concatenate them in order into one fused feature, thereby obtaining the i-th fusion feature set Fi, composed of all such fused features (there are M-choose-i of them).
Step 3.5. Assign i + 1 to i and check whether i ≤ M holds; if so, go to step 3.4; otherwise, the M fusion feature sets F1, …, Fi, …, FM have been obtained, and step 3.6 is executed.
Step 3.6. Gather all the features in the M fusion feature sets F1, …, Fi, …, FM into the feature pool set F′, denoted F′ = {f′1, …, f′m′, …, f′M′}, where f′m′ is the m′-th feature pool feature and M′ is the total number of feature pool features.
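Steps 3.1 to 3.6 amount to enumerating every non-empty subset of the M base features and treating each subset, concatenated in order, as one pooled feature. A minimal sketch, with placeholder feature names (the embodiment below uses SP and HOG):

```python
from itertools import combinations

def build_feature_pool(base_features):
    """Return the feature pool F': all single features plus every fused
    feature obtained by concatenating i = 2..M distinct base features
    in order (front-end fusion)."""
    pool = list(base_features)                    # step 3.2: F1 = F
    m = len(base_features)
    for i in range(2, m + 1):                     # steps 3.3-3.5
        for combo in combinations(base_features, i):
            pool.append("-".join(combo))          # one fused feature
    return pool

print(build_feature_pool(["SP", "HOG"]))          # ['SP', 'HOG', 'SP-HOG']
print(len(build_feature_pool(["a", "b", "c"])))   # 7
```

For M base features the pool size is M′ = 2**M - 1, consistent with step 3.6.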
Step 4. Using the Gaussian mixture-hidden Markov model (GMM-HMM), construct the set of adaptive hidden Markov models of the N sign language words under the M′ feature pool features:
Step 4.1. Initialize n = 1.
Step 4.2. Initialize m′ = 1.
Step 4.3. Use the affinity propagation (AP) clustering algorithm to cluster the video set Setn of the n-th sign language word cn, obtaining the feature cluster count of Setn under the m′-th feature pool feature f′m′.
Step 4.4. Define the adaptive hidden Markov model of the n-th sign language word cn under the m′-th feature pool feature f′m′, and compute the model's number of hidden states according to formula (1).
In formula (1), G is the number of Gaussian functions in the Gaussian mixture model.
Step 4.5. Given the number of hidden states and the number of Gaussian functions G, learn on the video set Setn of the n-th sign language word cn with the Baum-Welch algorithm, obtaining the adaptive hidden Markov model of cn under the m′-th feature pool feature f′m′.
Step 4.6. Assign m′ + 1 to m′ and check whether m′ ≤ M′ holds; if so, go to step 4.3; otherwise, go to step 4.7.
Step 4.7. Assign n + 1 to n and check whether n ≤ N holds; if so, go to step 4.2; otherwise, the set of adaptive hidden Markov models of the N sign language words under the M′ feature pool features has been obtained, and step 5 is executed.
Step 5. Construct the selected feature set F″ for back-end score fusion:
Step 5.1. Initialize m′ = 1.
Step 5.2. Take any training video A from the training data set and compute, according to formula (2), its score vector under the m′-th feature pool feature f′m′.
In formula (2), each entry is the sign language recognition probability score of the training video A on the adaptive hidden Markov model of the n-th sign language word cn under the m′-th feature pool feature f′m′.
Step 5.3. Repeat step 5.2 until the score vectors of all sign language videos in the training data set under the m′-th feature pool feature f′m′ have been obtained; then compute the sum of the average variances of these score vectors, recorded as the training variance Varm′ of the m′-th feature pool feature f′m′.
Step 5.4. Assign m′ + 1 to m′ and check whether m′ ≤ M′ holds; if so, go to step 5.2; otherwise, the training variances Var1, …, Varm′, …, VarM′ (1 ≤ m′ ≤ M′) of the M′ feature pool features have been obtained, and step 5.5 is executed.
Step 5.5. Sort the training variances Var1, …, Varm′, …, VarM′ in descending order to obtain the sorted training variances.
Set a parameter K with 1 ≤ K < M′, and select the feature pool features corresponding to the first K sorted training variances, forming the selected feature set F″ = {f″1, …, f″k, …, f″K}, where f″k is the k-th selected feature, 1 ≤ k ≤ K.
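The selection criterion of step 5 can be sketched as follows: for each pooled feature, sum the per-video variances of the training score vectors and keep the K features with the largest sums. The score values below are synthetic; in the method proper they come from the per-word HMMs via formula (2).

```python
import numpy as np

def training_variance(score_vectors):
    """Var_m' of step 5.3: sum over all training videos of the variance
    of each video's score vector under one pooled feature."""
    return float(sum(np.var(v) for v in score_vectors))

def select_features(scores_per_feature, k):
    """Step 5.5: rank pooled features by training variance (descending)
    and return the names of the top-k features."""
    ranked = sorted(scores_per_feature,
                    key=lambda name: training_variance(scores_per_feature[name]),
                    reverse=True)
    return ranked[:k]

# Synthetic score vectors for 2 training videos under 3 pooled features:
# peaked vectors discriminate between words, flat ones do not.
scores = {
    "SP":     [np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.1, 0.1])],
    "HOG":    [np.array([0.4, 0.3, 0.3]), np.array([0.35, 0.35, 0.3])],
    "SP-HOG": [np.array([0.95, 0.05, 0.0]), np.array([0.9, 0.1, 0.0])],
}
print(select_features(scores, k=2))   # ['SP-HOG', 'SP']
```

The flat "HOG" score vectors have a small variance sum and are dropped, matching the rationale given in the embodiment below.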
Step 6. Take any test video B from the test data set and compute its score vectors under the selected feature set F″ = {f″1, …, f″k, …, f″K}:
Step 6.1. Initialize k = 1.
Step 6.2. Compute, according to formula (3), the score vector of the test video B under the k-th selected feature f″k.
In formula (3), each entry is the sign language recognition probability score of the test video B on the adaptive hidden Markov model of the n-th sign language word cn under the k-th selected feature f″k.
Step 6.3. Apply Min-Max normalization to the score vector of the test video B under the k-th selected feature f″k to obtain the normalized score vector; arrange its elements in descending order, draw the resulting score curve, and compute the area under the curve, thereby obtaining the weight area corresponding to the normalized score vector.
Step 6.4. Assign k + 1 to k. If k > K holds, the normalized score vectors of the test video B under the K selected features and their corresponding weight areas have been obtained, and step 7 is executed; otherwise, go to step 6.2.
Step 7. Perform the back-end score fusion calculation and output the sign language word corresponding to the test video B:
Step 7.1. Compute, according to formula (4), the weight of the normalized score vector of the test video B under the k-th selected feature f″k, thereby obtaining the respective weights of the normalized score vectors under the K selected features.
Step 7.2. Obtain the back-end score fusion vector of the test video B according to formula (5).
Step 7.3. Obtain, according to formula (6), the sign language word index n* corresponding to the maximum value in the back-end score fusion vector.
The sign language word corresponding to the test video B is thereby the n*-th sign language word.
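Steps 6.3 to 7.3 can be sketched as follows. Formulas (4) through (6) are referenced but not reproduced in this text, so the mapping from weight areas to weights is an assumption here: a sharper score curve (smaller area) is taken to be more discriminative and receives a larger weight via normalized inverse areas.

```python
import numpy as np

def weight_area(score_vec):
    """Step 6.3: Min-Max normalize a score vector, sort it in descending
    order, and take the area under the resulting curve (trapezoidal rule).
    Assumes the vector is not constant (otherwise Min-Max divides by 0)."""
    s = np.asarray(score_vec, dtype=float)
    s = (s - s.min()) / (s.max() - s.min())        # Min-Max normalization
    curve = np.sort(s)[::-1]                       # descending score curve
    return float(np.sum((curve[:-1] + curve[1:]) / 2.0))

def fuse_and_decide(score_vectors):
    """Steps 7.1-7.3 under an assumed weighting rule: weights proportional
    to inverse weight areas, normalized to sum to one (stand-in for
    formula (4)); then weighted sum (formula (5)) and argmax (formula (6))."""
    areas = np.array([weight_area(v) for v in score_vectors])
    w = (1.0 / areas) / np.sum(1.0 / areas)        # assumed form of formula (4)
    norm = []
    for v in score_vectors:
        v = np.asarray(v, dtype=float)
        norm.append((v - v.min()) / (v.max() - v.min()))
    fused = sum(wk * nk for wk, nk in zip(w, norm))  # fusion vector
    return int(np.argmax(fused))                     # word index n*

v1 = [0.9, 0.2, 0.1]    # discriminative feature: sharp curve, small area
v2 = [0.5, 0.45, 0.4]   # nearly flat feature: large area, small weight
print(fuse_and_decide([v1, v2]))   # 0
```

With this rule the flat feature's scores are down-weighted, so they cannot dominate the fused decision, which is the stated goal of the adaptive weighting.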
Compared with the prior art, the beneficial effects of the present invention are:
1. The present invention adopts the Gaussian mixture-hidden Markov model (GMM-HMM), which is commonly used for sequence problems and can still achieve good results when training data is scarce; by combining the adaptive hidden Markov model with front-end fusion, back-end score fusion, and a feature selection strategy, it improves the accuracy and robustness of sign language recognition.
2. The present invention proposes an adaptive hidden Markov model: the affinity propagation (AP) clustering algorithm yields the feature cluster count, under each feature, of each sign language word's video set, from which the best hidden Markov model parameters are obtained adaptively, so that a distinct adaptive hidden Markov model is trained for each sign language word under each feature, significantly improving prediction.
3. The present invention adopts both front-end fusion and back-end score fusion strategies: front-end fusion concatenates different subsets of the extracted video features, producing all possible fused features; further, the proposed back-end score fusion method provides an adaptive weight allocation scheme that reveals the importance of these features and aggregates their recognition probability scores in a weighted manner, preventing poorly performing features from dominating the fusion and corrupting the result.
4. The present invention proposes a feature selection strategy that picks suitable features for back-end score fusion: by comparing the variance performance of all features and selecting those with good variance performance for back-end score fusion, it avoids fusing poorly performing features that would harm the fusion result.
Description of Drawings
Figure 1 is a schematic diagram of the method of the present invention.
Detailed Description of the Embodiments
In this embodiment, as shown in Figure 1, a multi-feature fusion sign language recognition method based on an adaptive hidden Markov model adopts the Gaussian mixture-hidden Markov model (GMM-HMM). First, multiple kinds of features are extracted from the sign language video database and front-end fusion is performed, that is, the feature pool set is constructed. Then, an adaptive hidden Markov model is built for each sign language video under each feature in the feature pool set, and a feature selection strategy is proposed to obtain suitable back-end score fusion features. After these features are selected, the score vector under each of them is computed and assigned a different weight, and back-end score fusion is performed, yielding the optimal fusion result. Specifically, as shown in Figure 1, the method includes the following steps:
Step 1. Obtain a sign language video database and divide its sign language videos into a training data set and a test data set. The training data set contains videos for N sign language words, each word corresponding to multiple videos. Denote the N sign language words as C = {c1, …, cn, …, cN}, where cn is the n-th sign language word, 1 ≤ n ≤ N.
Take the multiple videos corresponding to each sign language word in the training data set as that word's sign language video set, thereby obtaining the video sets Set1, …, Setn, …, SetN of the N sign language words, where Setn is the video set of the n-th sign language word cn.
In this embodiment, the sign language video database contains videos for 370 sign language words, each word corresponding to 25 videos; the videos were performed by 5 signers, each of whom repeated every word 5 times. A sign language word may be a single word or a phrase.
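The 25 videos per word come from 5 signers with 5 repetitions each. The source does not state how the train/test split is made; a signer-independent split (holding out all videos of one signer) is one common protocol in sign language recognition and can be sketched as a hypothetical example:

```python
def signer_split(num_signers=5, reps=5, held_out_signer=4):
    """Index the videos of one word as (signer, repetition) pairs and hold
    out every video of `held_out_signer`. This split protocol is an
    assumption for illustration, not the patent's stated procedure."""
    all_videos = [(s, r) for s in range(num_signers) for r in range(reps)]
    train = [v for v in all_videos if v[0] != held_out_signer]
    held_out = [v for v in all_videos if v[0] == held_out_signer]
    return train, held_out

train, held_out = signer_split()
print(len(train), len(held_out))   # 20 5
```

A signer-independent split tests robustness to unseen signers, which matters given the signer variability described in the background section.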
Step 2. Construct the feature type set F:
Extract M kinds of features from the sign language videos in the training data set, obtaining the feature type set F = {f1, f2, …, fM}, where fM denotes the M-th feature and M is the total number of feature types.
Step 3. Construct the feature pool set F′:
Step 3.1. Define a variable i and initialize i = 1.
Step 3.2. Define the i-th fusion feature set as Fi and initialize Fi = F.
Step 3.3. Let i = 2.
Step 3.4. Take any i distinct features from the feature type set F and concatenate them in order into one fused feature, thereby obtaining the i-th fusion feature set Fi, composed of all such fused features.
Step 3.5. Assign i + 1 to i and check whether i ≤ M holds; if so, go to step 3.4; otherwise, the M fusion feature sets F1, …, Fi, …, FM have been obtained, and step 3.6 is executed.
Step 3.6. Gather all the features in the M fusion feature sets F1, …, Fi, …, FM into the feature pool set F′, denoted F′ = {f′1, …, f′m′, …, f′M′}, where f′m′ is the m′-th feature pool feature and M′ is the total number of feature pool features.
In this embodiment, histogram of oriented gradients (HOG) features are extracted from all sign language videos in the database, and principal component analysis (PCA) is applied to reduce the dimensionality of all these features, yielding the HOG features.
SP features, i.e., skeleton joint coordinates, are extracted from all sign language videos in the database, and random Gaussian perturbation is applied to all SP features to add a moderate amount of noise and avoid overfitting, yielding the SP features.
The SP features and HOG features of all sign language videos in the database are concatenated, yielding the SP-HOG front-end fused feature.
Step 4. Using the Gaussian mixture-hidden Markov model (GMM-HMM), construct the set of adaptive hidden Markov models of the N sign language words under the M′ feature pool features:
Step 4.1. Initialize n = 1.
Step 4.2. Initialize m′ = 1.
Step 4.3. Use the affinity propagation (AP) clustering algorithm to cluster the video set Setn of the n-th sign language word cn, obtaining the feature cluster count of Setn under the m′-th feature pool feature f′m′.
Step 4.4. Define the adaptive hidden Markov model of the n-th sign language word cn under the m′-th feature pool feature f′m′, and compute the model's number of hidden states according to formula (1).
In formula (1), G is the number of Gaussian functions in the Gaussian mixture model; in this embodiment, G is set to 3.
Step 4.5. Given the number of hidden states and the number of Gaussian functions G, learn on the video set Setn of the n-th sign language word cn with the Baum-Welch algorithm, obtaining the adaptive hidden Markov model of cn under the m′-th feature pool feature f′m′. The Baum-Welch algorithm is a classical algorithm for the parameter estimation problem of hidden Markov models.
Step 4.6. Assign m′ + 1 to m′ and check whether m′ ≤ M′ holds; if so, go to step 4.3; otherwise, go to step 4.7.
Step 4.7. Assign n + 1 to n and check whether n ≤ N holds; if so, go to step 4.2; otherwise, the set of adaptive hidden Markov models of the N sign language words under the M′ feature pool features has been obtained, and step 5 is executed.
Step 5. Construct the selected feature set F″ for back-end score fusion:
Step 5.1. Initialize m′ = 1.
Step 5.2. Take any training video A from the training data set and compute, according to formula (2), its score vector under the m′-th feature pool feature f′m′.
In formula (2), each entry is the sign language recognition probability score of the training video A on the adaptive hidden Markov model of the n-th sign language word cn under the m′-th feature pool feature f′m′, computed on the adaptive hidden Markov model by the Viterbi algorithm; the Viterbi algorithm is a dynamic programming algorithm widely used to solve the prediction problem of hidden Markov models.
Step 5.3. Repeat step 5.2 until the score vectors of all sign language videos in the training data set under the m′-th feature pool feature f′m′ have been obtained; then compute the sum of the average variances of these score vectors, recorded as the training variance Varm′ of the m′-th feature pool feature f′m′.
In this embodiment, the sum of the average variances of the score vectors of all training videos under the m′-th feature pool feature f′m′ serves as the feature selection criterion. The average variance of a score vector reflects how far its entries deviate from their mean: a small average variance means that, under the m′-th feature pool feature f′m′, the probabilities relating the training video A to the different sign language words are hard to tell apart, whereas a large average variance indicates that the score vector has good discriminative power.
Step 5.4. Assign m′ + 1 to m′ and check whether m′ ≤ M′ holds; if so, go to step 5.2; otherwise, the training variances Var1, …, Varm′, …, VarM′ (1 ≤ m′ ≤ M′) of the M′ feature pool features have been obtained, and step 5.5 is executed.
Step 5.5. Sort the training variances Var1, …, Varm′, …, VarM′ in descending order to obtain the sorted training variances.
Set a parameter K with 1 ≤ K < M′, and select the feature pool features corresponding to the first K sorted training variances, forming the selected feature set F″ = {f″1, …, f″k, …, f″K}, where f″k is the k-th selected feature, 1 ≤ k ≤ K.
Step 6. Take any test video B from the test data set and compute its score vectors under the selected feature set F″ = {f″1, …, f″k, …, f″K}:
Step 6.1. Initialize k = 1.
Step 6.2. Compute, according to formula (3), the score vector of the test video B under the k-th selected feature f″k.
In formula (3), each entry is the sign language recognition probability score of the test video B on the adaptive hidden Markov model of the n-th sign language word cn under the k-th selected feature f″k.
Step 6.3. Apply Min-Max normalization to the score vector of the test video B under the k-th selected feature f″k to obtain the normalized score vector; arrange its elements in descending order, draw the resulting score curve, and compute the area under the curve, thereby obtaining the weight area corresponding to the normalized score vector.
Step 6.4. Assign k + 1 to k. If k > K holds, the normalized score vectors of the test video B under the K selected features and their corresponding weight areas have been obtained, and step 7 is executed; otherwise, go to step 6.2.
Step 7. Perform the back-end score fusion calculation and output the sign language word corresponding to the test video B:
Step 7.1. Compute, according to formula (4), the weight of the normalized score vector of the test video B under the k-th selected feature f″k, thereby obtaining the respective weights of the normalized score vectors under the K selected features.
Step 7.2. Obtain the back-end score fusion vector of the test video B according to formula (5).
Step 7.3. Obtain, according to formula (6), the sign language word index n* corresponding to the maximum value in the back-end score fusion vector.
The sign language word corresponding to the test video B is thereby the n*-th sign language word.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811131806.9A CN109409231B (en) | 2018-09-27 | 2018-09-27 | Multi-feature fusion sign language recognition method based on adaptive hidden Markov |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811131806.9A CN109409231B (en) | 2018-09-27 | 2018-09-27 | Multi-feature fusion sign language recognition method based on adaptive hidden Markov |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109409231A CN109409231A (en) | 2019-03-01 |
CN109409231B true CN109409231B (en) | 2020-07-10 |
Family
ID=65466362
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811131806.9A Active CN109409231B (en) | 2018-09-27 | 2018-09-27 | Multi-feature fusion sign language recognition method based on adaptive hidden Markov |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109409231B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111259804B (en) * | 2020-01-16 | 2023-03-14 | 合肥工业大学 | Multi-modal fusion sign language recognition system and method based on graph convolution |
CN111259860B (en) * | 2020-02-17 | 2022-03-15 | 合肥工业大学 | Multi-order feature dynamic fusion sign language translation method based on data self-driven |
CN113642422B (en) * | 2021-07-27 | 2024-05-24 | 东北电力大学 | Continuous Chinese sign language recognition method |
CN116471460B (en) * | 2023-05-08 | 2025-06-13 | 东南大学 | A fast recognition method for resolution-adaptive encrypted videos based on video fingerprint |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893942A (en) * | 2016-03-25 | 2016-08-24 | 中国科学技术大学 | eSC and HOG-based adaptive HMM sign language identifying method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9817881B2 (en) * | 2013-10-16 | 2017-11-14 | Cypress Semiconductor Corporation | Hidden markov model processing engine |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893942A (en) * | 2016-03-25 | 2016-08-24 | 中国科学技术大学 | eSC and HOG-based adaptive HMM sign language identifying method |
Non-Patent Citations (3)
Title |
---|
Online Early-Late Fusion Based on Adaptive HMM for Sign Language Recognition; Dan Guo et al.; ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM); Jan. 31, 2018; Vol. 14, No. 1; pp. 8:2-8:15 *
SIGN LANGUAGE RECOGNITION BASED ON ADAPTIVE HMMS WITH DATA AUGMENTATION; Dan Guo et al.; 2016 IEEE International Conference on Image Processing (ICIP); Aug. 19, 2016; pp. 2876-2880 *
Continuous HMM Sign Language Recognition Based on Kinect 3D Joints; Shen Juan et al.; 《计算机与信息工程》 (Computer and Information Engineering); May 2017; Vol. 40, No. 5; pp. 638-642 *
Also Published As
Publication number | Publication date |
---|---|
CN109409231A (en) | 2019-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Taherkhani et al. | AdaBoost-CNN: An adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning | |
CN108182427B (en) | A face recognition method based on deep learning model and transfer learning | |
CN109409231B (en) | Multi-feature fusion sign language recognition method based on adaptive hidden Markov | |
US7362892B2 (en) | Self-optimizing classifier | |
CN103605972B (en) | Non-restricted environment face verification method based on block depth neural network | |
CN110164452A (en) | A kind of method of Application on Voiceprint Recognition, the method for model training and server | |
CN107729999A (en) | Consider the deep neural network compression method of matrix correlation | |
CN117611932B (en) | Image classification method and system based on double pseudo tag refinement and sample re-weighting | |
CN106022273A (en) | Handwritten form identification system of BP neural network based on dynamic sample selection strategy | |
CN101447020A (en) | Pornographic image recognizing method based on intuitionistic fuzzy | |
WO2021190046A1 (en) | Training method for gesture recognition model, gesture recognition method, and apparatus | |
CN116226629B (en) | Multi-model feature selection method and system based on feature contribution | |
CN102103691A (en) | Identification method for analyzing face based on principal component | |
CN109493916A (en) | A kind of Gene-gene interactions recognition methods based on sparsity factorial analysis | |
CN113420833A (en) | Visual question-answering method and device based on question semantic mapping | |
CN111079837A (en) | Method for detecting, identifying and classifying two-dimensional gray level images | |
CN107193993A (en) | The medical data sorting technique and device selected based on local learning characteristic weight | |
CN109101984B (en) | Image identification method and device based on convolutional neural network | |
CN109255339A (en) | Classification method based on adaptive depth forest body gait energy diagram | |
Azam et al. | Speaker verification using adapted bounded Gaussian mixture model | |
CN109948662B (en) | A deep clustering method of face images based on K-means and MMD | |
CN111694954A (en) | Image classification method and device and electronic equipment | |
CN111860601B (en) | Method and device for predicting type of large fungi | |
Halkias et al. | Sparse penalty in deep belief networks: using the mixed norm constraint | |
CN108734116B (en) | Face recognition method based on variable speed learning deep self-coding network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||