
CN109409231B - Multi-feature fusion sign language recognition method based on adaptive hidden Markov - Google Patents


Info

Publication number
CN109409231B
CN109409231B (application CN201811131806.9A)
Authority
CN
China
Prior art keywords
sign language
feature
video
fusion
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811131806.9A
Other languages
Chinese (zh)
Other versions
CN109409231A (en)
Inventor
郭丹
宋培培
赵烨
汪萌
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN201811131806.9A
Publication of CN109409231A
Application granted
Publication of CN109409231B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-feature fusion sign language recognition method based on an adaptive hidden Markov model. The method comprises: 1) extracting several kinds of features from a sign language video database and performing front-end fusion, i.e., constructing a feature pool set; 2) building the adaptive hidden Markov models of each sign language video under the different features in the feature pool set, and proposing a feature selection strategy to obtain suitable back-end score fusion features; 3) after the back-end score fusion features are selected, computing the score vector under each of them, assigning different weights to the vectors, and performing back-end score fusion, thereby obtaining the optimal fusion result. The invention achieves accurate recognition of the sign language category of a sign language video and improves the robustness of recognition.

Description

Multi-feature fusion sign language recognition method based on an adaptive hidden Markov model

Technical Field

The invention belongs to the technical field of computer vision and involves pattern recognition, artificial intelligence and related technologies; specifically, it is a multi-feature fusion sign language recognition method based on an adaptive hidden Markov model.

Technical Background

Deaf people form a large group among people with disabilities. Because they cannot speak, they usually communicate in sign language. When hearing people who have never learned sign language need to communicate with deaf people, a communication barrier arises, and most hearing people have received no sign language education. A sign language translation system is therefore of great significance to deaf people as an aid to their integration into society. However, sign language translation remains a difficult problem in computer vision: signers differ widely in body shape, signing speed and signing habits, so the recognition setting is highly complex and satisfactory accuracy is often hard to achieve.

Sign language recognition is a sequence learning problem. Models proposed so far include dynamic time warping (DTW), the support vector machine (SVM), curve matching and neural networks (NN). DTW is computationally expensive, while the SVM is designed for binary classification and cannot be applied directly to multi-class problems. A prerequisite for neural networks is a large amount of training data for model training and optimization; when training data are limited, a neural network cannot reach an optimal model, which degrades sign language recognition accuracy.

As for multi-modal feature fusion, traditional approaches comprise front-end fusion and back-end score fusion: front-end fusion operates at the feature level, while back-end score fusion operates at the level of classification probability scores. Back-end score fusion is usually time-consuming, and under different models poorly performing features may dominate the fusion and degrade the fused result.

Summary of the Invention

To improve sign language recognition accuracy, the present invention provides a multi-feature fusion sign language recognition method based on an adaptive hidden Markov model, so as to achieve accurate recognition of the sign language category of a sign language video and to improve the robustness of recognition.

To solve the technical problem, the present invention adopts the following technical scheme:

The multi-feature fusion sign language recognition method based on an adaptive hidden Markov model of the present invention is characterized by the following steps:

Step 1. Obtain a sign language video database and divide its sign language videos into a training data set and a test data set. The training data set contains sign language videos of N sign language words, each word corresponding to several videos. Denote the N sign language words as C = {c_1, …, c_n, …, c_N}, where c_n is the n-th sign language word, 1 ≤ n ≤ N.

Take the videos of each sign language word in the training data set as the video set of that word, obtaining the video sets of the N sign language words, denoted Set_1, …, Set_n, …, Set_N, where Set_n is the video set of the n-th sign language word c_n.

Step 2. Construct the feature type set F:

Extract M kinds of features from the sign language videos in the training data set, obtaining the feature type set F = {f_1, f_2, …, f_M}, where f_M is the M-th feature and M is the total number of feature types.

Step 3. Construct the feature pool set F′:

Step 3.1. Define a variable i and initialize i = 1;

Step 3.2. Define the i-th fusion feature set as F_i and initialize F_i = F;

Step 3.3. Let i = 2;

Step 3.4. Take any i distinct features from the feature type set F and splice them, in order, into one fused feature, thereby obtaining the i-th fusion feature set F_i, which consists of C(M, i) (i.e., "M choose i") fused features;

Step 3.5. Assign i + 1 to i and judge whether i ≤ M holds. If it holds, execute step 3.4; otherwise the M fusion feature sets F_1, …, F_i, …, F_M have been obtained, and step 3.6 is executed;

Step 3.6. Gather all features of the M fusion feature sets F_1, …, F_i, …, F_M into the feature pool set F′, denoted F′ = {f′_1, …, f′_{m′}, …, f′_{M′}}, where f′_{m′} is the m′-th feature pool feature and M′ is the total number of feature pool features;
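Steps 3.1 to 3.6 enumerate every ordered splice of feature subsets. A minimal Python sketch of this front-end pooling, with illustrative feature names not taken from the patent:

```python
# Build the feature pool set F' (steps 3.1-3.6): every splice of i = 1..M
# distinct feature types, taken in order, becomes one pooled feature.
from itertools import combinations

def build_feature_pool(feature_types):
    pool = []
    M = len(feature_types)
    for i in range(1, M + 1):                 # F_1 = F itself; F_i has C(M, i) fusions
        for combo in combinations(feature_types, i):
            pool.append("+".join(combo))      # "+" marks ordered concatenation
    return pool

F = ["HOG", "SP", "Traj"]                     # M = 3 illustrative feature types
F_prime = build_feature_pool(F)               # |F'| = 3 + 3 + 1 = 7 pooled features
```

With M feature types the pool holds 2^M - 1 entries, which is why the method later prunes it with the variance-based selection of step 5.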

Step 4. Using the Gaussian mixture hidden Markov model (GMM-HMM), construct the set of adaptive hidden Markov models of the N sign language words under the M′ feature pool features:

Step 4.1. Initialize n = 1;

Step 4.2. Initialize m′ = 1;

Step 4.3. Cluster the video set Set_n of the n-th sign language word c_n with the affinity propagation (AP) clustering algorithm, obtaining the feature cluster number k_n^{m′} of Set_n under the m′-th feature pool feature f′_{m′};

Step 4.4. Define the adaptive hidden Markov model of the n-th sign language word c_n under the m′-th feature pool feature f′_{m′} as λ_n^{m′}, and compute its number of hidden states S_n^{m′} from the feature cluster number k_n^{m′} according to formula (1), in which G is the number of Gaussian functions in the Gaussian mixture model;

Step 4.5. Using the hidden state number S_n^{m′} and the Gaussian function number G, train on the video set Set_n of the n-th sign language word c_n with the Baum-Welch algorithm, obtaining the adaptive hidden Markov model λ_n^{m′} of c_n under the m′-th feature pool feature f′_{m′};

Step 4.6. Assign m′ + 1 to m′ and judge whether m′ ≤ M′ holds. If it holds, execute step 4.3; otherwise execute step 4.7;

Step 4.7. Assign n + 1 to n and judge whether n ≤ N holds. If it holds, execute step 4.2; otherwise the set {λ_n^{m′} | 1 ≤ n ≤ N, 1 ≤ m′ ≤ M′} of adaptive hidden Markov models of the N sign language words under the M′ feature pool features has been obtained, and step 5 is executed;
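Formula (1) of step 4.4 appears only as an image in the source; all the surrounding text states is that the hidden state count S_n^{m′} is derived from the feature cluster number k_n^{m′} and the Gaussian count G. A sketch under the assumption that the clusters found by AP are divided among hidden states carrying G Gaussians each:

```python
# Assumed form of formula (1): spread the k feature clusters found by AP
# clustering over hidden states that each carry a G-component Gaussian mixture.
import math

def hidden_state_count(n_clusters, G):
    return max(1, math.ceil(n_clusters / G))

# e.g. 7 AP clusters with G = 3 Gaussians per state give 3 hidden states
states = hidden_state_count(7, 3)
```

If formula (1) in the actual patent differs, only this helper changes; the Baum-Welch training of step 4.5 consumes whatever state count it returns.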

Step 5. Construct the selected feature set F″ for back-end score fusion:

Step 5.1. Initialize m′ = 1;

Step 5.2. Take any training video A from the training data set and compute its score vector Score_A^{m′} under the m′-th feature pool feature f′_{m′} according to formula (2):

Score_A^{m′} = [P(A | λ_1^{m′}), P(A | λ_2^{m′}), …, P(A | λ_N^{m′})]   (2)

In formula (2), P(A | λ_n^{m′}) is the sign language recognition probability score of training video A on the adaptive hidden Markov model λ_n^{m′} of the n-th sign language word c_n under the m′-th feature pool feature f′_{m′};

Step 5.3. Repeat step 5.2 until the score vectors of all sign language videos in the training data set under the m′-th feature pool feature f′_{m′} are obtained; compute the sum of the average variances of these score vectors and record it as the training variance Var_{m′} of the m′-th feature pool feature f′_{m′};

Step 5.4. Assign m′ + 1 to m′ and judge whether m′ ≤ M′ holds. If it holds, execute step 5.2; otherwise the training variances Var_1, …, Var_{m′}, …, Var_{M′} (1 ≤ m′ ≤ M′) of the M′ feature pool features have been obtained, and step 5.5 is executed;

Step 5.5. Sort the training variances Var_1, …, Var_{m′}, …, Var_{M′} in descending order;

Set a parameter 1 ≤ K < M′, select the feature pool features corresponding to the first K sorted training variances, and form the selected feature set F″ = {f″_1, …, f″_k, …, f″_K}, where f″_k is the k-th selected feature, 1 ≤ k ≤ K;
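The variance-based selection of steps 5.2 to 5.5 can be sketched as follows; the dictionary layout and toy score values are illustrative assumptions:

```python
# Steps 5.2-5.5: rank feature-pool features by the summed variance of their
# training score vectors and keep the top K for back-end score fusion.
from statistics import pvariance

def select_features(score_vectors_per_feature, K):
    # score_vectors_per_feature: {feature name: one N-dim score vector per video}
    train_var = {
        feat: sum(pvariance(vec) for vec in vectors)      # Var_{m'}
        for feat, vectors in score_vectors_per_feature.items()
    }
    ranked = sorted(train_var, key=train_var.get, reverse=True)  # descending
    return ranked[:K]

scores = {
    "f1": [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],   # peaked scores: discriminative
    "f2": [[0.34, 0.33, 0.33], [0.4, 0.3, 0.3]],  # flat scores: weak feature
}
best = select_features(scores, K=1)
```

A peaked score vector has large variance and survives the cut; a flat one, whose word probabilities are hard to tell apart, is discarded.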

Step 6. Take any test video B from the test data set and compute its score vectors under the selected feature set F″ = {f″_1, …, f″_k, …, f″_K}:

Step 6.1. Initialize k = 1;

Step 6.2. Compute the score vector Score_B^k of test video B under the k-th selected feature f″_k according to formula (3):

Score_B^k = [P(B | λ_1^k), P(B | λ_2^k), …, P(B | λ_N^k)]   (3)

In formula (3), λ_n^k denotes the adaptive hidden Markov model of the n-th sign language word c_n under the k-th selected feature f″_k, and P(B | λ_n^k) is the sign language recognition probability score of test video B on λ_n^k;

Step 6.3. Normalize the score vector Score_B^k of test video B under the k-th selected feature f″_k with Min-Max normalization, obtaining the normalized score vector NScore_B^k; arrange the elements of NScore_B^k in descending order, draw the score curve, and compute the area under this curve, obtaining the weight area A_k^B corresponding to NScore_B^k;

Step 6.4. Assign k + 1 to k. If k > K holds, the normalized score vectors NScore_B^1, …, NScore_B^K of test video B under the K selected features and their corresponding weight areas A_1^B, …, A_K^B have been obtained; execute step 7. Otherwise, execute step 6.2;
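Step 6.3 can be sketched as below; the patent does not state how the area under the score curve is computed, so the trapezoidal rule with unit spacing between adjacent word scores is an assumption:

```python
# Step 6.3: Min-Max normalise a score vector, sort it in descending order and
# take the area under the resulting score curve as the weight area A_k^B.

def minmax(v):
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) for x in v]

def weight_area(score_vector):
    curve = sorted(minmax(score_vector), reverse=True)
    # trapezoidal area with unit spacing between adjacent scores (assumed)
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:]))

area_peaked = weight_area([0.9, 0.1, 0.1, 0.1])   # sharply peaked scores
area_flat = weight_area([0.5, 0.45, 0.55, 0.5])   # near-uniform scores
```

A peaked (discriminative) score vector encloses less area than a flat one, which is the property the weighting of step 7 can exploit.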

Step 7. Perform the back-end score fusion computation and output the sign language word corresponding to test video B:

Step 7.1. Compute the weight w_k^B of the normalized score vector NScore_B^k of test video B under the k-th selected feature f″_k from the weight areas A_1^B, …, A_K^B according to formula (4), thereby obtaining the weights w_1^B, …, w_K^B of the normalized score vectors under the K selected features;

Step 7.2. Obtain the back-end score fusion vector Score_B of test video B according to formula (5):

Score_B = Σ_{k=1}^{K} w_k^B · NScore_B^k   (5)

Step 7.3. Obtain the index n* of the maximum element of the back-end score fusion vector Score_B according to formula (6):

n* = argmax_{1 ≤ n ≤ N} Score_B(n)   (6)

The sign language word corresponding to test video B is thus the n*-th sign language word c_{n*}.

Compared with the prior art, the beneficial effects of the present invention are as follows:

1. The invention adopts the Gaussian mixture hidden Markov model (GMM-HMM), which is commonly used for sequence problems and can still achieve good results when training data are scarce. By combining the adaptive hidden Markov model with front-end fusion, back-end score fusion and a feature selection strategy, the invention improves both the accuracy and the robustness of sign language recognition.

2. The invention proposes an adaptive hidden Markov model: the AP affinity propagation clustering algorithm yields the feature cluster number of the video set of each sign language word under each feature, from which the best hidden Markov model parameters are obtained adaptively, so that a different adaptive hidden Markov model is trained for each sign language word under each feature, which markedly improves prediction.

3. The invention adopts front-end fusion and back-end score fusion strategies. Front-end fusion splices different subsets of the extracted video features, yielding all possible fused features. Further, the proposed back-end score fusion method provides an adaptive weight allocation scheme that reveals the importance of the features and aggregates their recognition probability scores in a weighted manner, preventing poorly performing features from dominating the fusion and harming the fused result.

4. The invention proposes a feature selection strategy that picks suitable features for back-end score fusion: by comparing the variance performance of all features, it selects those with good variance performance, thereby avoiding fusion over poorly performing features that would harm the fusion result.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the method of the present invention.

Detailed Description of the Embodiments

In this embodiment, as shown in Figure 1, a multi-feature fusion sign language recognition method based on an adaptive hidden Markov model adopts the Gaussian mixture hidden Markov model (GMM-HMM). First, several kinds of features are extracted from the sign language video database and fused at the front end, i.e., the feature pool set is constructed. Then, the adaptive hidden Markov models of each sign language video under the different features in the feature pool set are built, and a feature selection strategy is proposed to obtain suitable back-end score fusion features. After the back-end score fusion features are selected, the score vector under each of them is computed, different weights are assigned to the vectors, and back-end score fusion is performed, thereby obtaining the optimal fusion result. Specifically, as shown in Figure 1, the method comprises the following steps:

Step 1. Obtain a sign language video database and divide its sign language videos into a training data set and a test data set. The training data set contains sign language videos of N sign language words, each word corresponding to several videos. Denote the N sign language words as C = {c_1, …, c_n, …, c_N}, where c_n is the n-th sign language word, 1 ≤ n ≤ N.

Take the videos of each sign language word in the training data set as the video set of that word, obtaining the video sets of the N sign language words, denoted Set_1, …, Set_n, …, Set_N, where Set_n is the video set of the n-th sign language word c_n.

In this embodiment, the sign language video database contains videos of 370 sign language words, with 25 videos per word, demonstrated by 5 people who each repeat the demonstration 5 times; a sign language "word" may be a single word or a phrase.

Step 2. Construct the feature type set F:

Extract M kinds of features from the sign language videos in the training data set, obtaining the feature type set F = {f_1, f_2, …, f_M}, where f_M is the M-th feature and M is the total number of feature types.

Step 3. Construct the feature pool set F′:

Step 3.1. Define a variable i and initialize i = 1;

Step 3.2. Define the i-th fusion feature set as F_i and initialize F_i = F;

Step 3.3. Let i = 2;

Step 3.4. Take any i distinct features from the feature type set F and splice them, in order, into one fused feature, thereby obtaining the i-th fusion feature set F_i, which consists of C(M, i) (i.e., "M choose i") fused features;

Step 3.5. Assign i + 1 to i and judge whether i ≤ M holds. If it holds, execute step 3.4; otherwise the M fusion feature sets F_1, …, F_i, …, F_M have been obtained, and step 3.6 is executed;

Step 3.6. Gather all features of the M fusion feature sets F_1, …, F_i, …, F_M into the feature pool set F′, denoted F′ = {f′_1, …, f′_{m′}, …, f′_{M′}}, where f′_{m′} is the m′-th feature pool feature and M′ is the total number of feature pool features;

In this embodiment, histogram of oriented gradients (HOG) features are extracted from all sign language videos in the database, and PCA principal component analysis is applied to reduce the dimensionality of all of them, yielding the HOG features.

Skeleton point (SP) features, i.e., the coordinates of the skeleton joints, are extracted from all sign language videos in the database, and random Gaussian perturbation is applied to all SP features to add a modest amount of noise and avoid overfitting, yielding the SP features.

The SP features and HOG features of all sign language videos in the database are spliced, yielding the SP-HOG front-end fusion feature.
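The SP-HOG front-end fusion is a per-frame splice of the two descriptors. A sketch with illustrative dimensions (the patent does not give the feature sizes):

```python
# Front-end fusion of the embodiment: per-frame concatenation of SP and HOG
# descriptors into an SP-HOG feature. All dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
T = 40                        # frames in one sign language video
hog = rng.random((T, 128))    # PCA-reduced HOG descriptor per frame
sp = rng.random((T, 30))      # skeleton-point (SP) coordinates per frame

sp_hog = np.concatenate([sp, hog], axis=1)   # ordered splice, shape (T, 158)
```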

Step 4. Using the Gaussian mixture hidden Markov model (GMM-HMM), construct the set of adaptive hidden Markov models of the N sign language words under the M′ feature pool features:

Step 4.1. Initialize n = 1;

Step 4.2. Initialize m′ = 1;

Step 4.3. Cluster the video set Set_n of the n-th sign language word c_n with the affinity propagation (AP) clustering algorithm, obtaining the feature cluster number k_n^{m′} of Set_n under the m′-th feature pool feature f′_{m′};

Step 4.4. Define the adaptive hidden Markov model of the n-th sign language word c_n under the m′-th feature pool feature f′_{m′} as λ_n^{m′}, and compute its number of hidden states S_n^{m′} from the feature cluster number k_n^{m′} according to formula (1), in which G is the number of Gaussian functions in the Gaussian mixture model; in this embodiment, G is set to 3;

Step 4.5. Using the hidden state number S_n^{m′} and the Gaussian function number G, train on the video set Set_n of the n-th sign language word c_n with the Baum-Welch algorithm, obtaining the adaptive hidden Markov model λ_n^{m′} of c_n under the m′-th feature pool feature f′_{m′}. The Baum-Welch algorithm is a classical algorithm for the parameter estimation problem of hidden Markov models;

Step 4.6. Assign m′ + 1 to m′ and judge whether m′ ≤ M′ holds. If it holds, execute step 4.3; otherwise execute step 4.7;

Step 4.7. Assign n + 1 to n and judge whether n ≤ N holds. If it holds, execute step 4.2; otherwise the set {λ_n^{m′} | 1 ≤ n ≤ N, 1 ≤ m′ ≤ M′} of adaptive hidden Markov models of the N sign language words under the M′ feature pool features has been obtained, and step 5 is executed;

Step 5. Construct the selected feature set F″ for back-end score fusion:

Step 5.1. Initialize m′ = 1;

Step 5.2. Take any training video A from the training data set and compute its score vector Score_A^{m′} under the m′-th feature pool feature f′_{m′} according to formula (2):

Score_A^{m′} = [P(A | λ_1^{m′}), P(A | λ_2^{m′}), …, P(A | λ_N^{m′})]   (2)

In formula (2), P(A | λ_n^{m′}) denotes the sign language recognition probability score of training video A on the adaptive hidden Markov model λ_n^{m′} of the n-th sign language word c_n under the m′-th feature pool feature f′_{m′}; it is computed on λ_n^{m′} by the Viterbi algorithm. The Viterbi algorithm is a dynamic programming algorithm widely used to solve the prediction problem of hidden Markov models;
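As a stand-in for the GMM-HMM scoring above, a log-domain Viterbi pass over a discrete-observation HMM shows the dynamic programming involved; the toy model parameters are illustrative:

```python
# Log-domain Viterbi scoring of an observation sequence against one HMM,
# standing in for the GMM-HMM scoring of step 5.2 (toy parameters only).
import math

def viterbi_log_score(obs, pi, A, B):
    """Return max over state paths of log P(obs, path | lambda)."""
    S = len(pi)
    delta = [math.log(pi[s]) + math.log(B[s][obs[0]]) for s in range(S)]
    for o in obs[1:]:
        delta = [max(delta[r] + math.log(A[r][s]) for r in range(S))
                 + math.log(B[s][o]) for s in range(S)]
    return max(delta)

pi = [0.6, 0.4]                      # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]         # state transition matrix
B = [[0.9, 0.1], [0.2, 0.8]]         # emission matrix over 2 symbols
score = viterbi_log_score([0, 0, 1], pi, A, B)
```

In the method itself, one such score is computed for a video under every word model λ_n, and the N scores form the score vector of formula (2).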

步骤5.3、重复步骤5.2,直至得到训练数据集中所有手语视频在第m′种特征池特征f′m’下的得分向量;并计算训练数据集中所有手语视频在第m′种特征池特征f′m’下的得分向量的平均方差之和,记为第m′种特征池特征f′m’的训练方差Varm′Step 5.3, repeat step 5.2 until the score vector of all sign language videos in the training dataset under the m'th feature pool feature f'm' is obtained; and calculate the m'th feature pool feature f' of all sign language videos in the training dataset The sum of the average variances of the score vectors under m' is denoted as the training variance Var m' of the m'th feature pool feature f' m ';

In this embodiment, the sum of the mean variances of the score vectors of all sign language videos in the training data set under the m′-th feature pool feature f′_{m′} serves as the criterion for feature selection. The mean variance of a score vector S_A^{m′} reflects how far its entries deviate from the vector's mean: a small mean variance means that, under feature f′_{m′}, the probabilities relating training video A to the different sign language words are hard to tell apart; conversely, a larger mean variance indicates that S_A^{m′} has good discriminative power.

Step 5.4. Assign m′+1 to m′ and judge whether m′ ≤ M′ holds. If so, go to step 5.2; otherwise the training variances Var_1, …, Var_{m′}, …, Var_{M′} (1 ≤ m′ ≤ M′) corresponding to the M′ feature pool features have been obtained; go to step 5.5.

Step 5.5. Sort the training variances Var_1, …, Var_{m′}, …, Var_{M′} in descending order to obtain the sorted training variances;

Set a parameter 1 ≤ K < M′, select the feature pool features corresponding to the first K sorted training variances, and form the selected feature set F″ = {f″_1, …, f″_k, …, f″_K}, where f″_k denotes the k-th selected feature, 1 ≤ k ≤ K;
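Steps 5.1 through 5.5 can be sketched as below: sum the per-video score-vector variances for each candidate feature, then keep the K features with the largest totals. The feature names and toy score vectors are illustrative, not taken from the patent.

```python
def mean_variance(scores):
    """Population variance of one score vector (its spread around its mean)."""
    mu = sum(scores) / len(scores)
    return sum((s - mu) ** 2 for s in scores) / len(scores)

def select_features(score_vectors_per_feature, k):
    """score_vectors_per_feature: {feature_name: [score vector per training video]}.
    Keep the k features whose summed per-video variances are largest."""
    train_var = {f: sum(mean_variance(v) for v in vecs)
                 for f, vecs in score_vectors_per_feature.items()}
    ranked = sorted(train_var, key=train_var.get, reverse=True)  # descending variance
    return ranked[:k]

# Toy example: the fused feature separates the words better than the single one
pool = {
    "hog":      [[0.5, 0.5, 0.5], [0.4, 0.5, 0.6]],   # flat scores, low variance
    "hog+traj": [[0.9, 0.1, 0.0], [0.8, 0.1, 0.1]],   # peaked scores, high variance
}
selected = select_features(pool, k=1)  # -> ["hog+traj"]
```

A flat score vector means every word model explains the video about equally well, which is exactly the case the variance criterion filters out.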

Step 6. Take any test video B from the test data set and compute the score vectors of B under the selected feature set F″ = {f″_1, …, f″_k, …, f″_K}:

Step 6.1. Initialize k = 1;

Step 6.2. Compute the score vector S_B^{k} of test video B under the k-th selected feature f″_k according to equation (3):

S_B^{k} = [P(B|λ_{c_1}^{f″_k}), …, P(B|λ_{c_n}^{f″_k}), …, P(B|λ_{c_N}^{f″_k})]    (3)

In equation (3), λ_{c_n}^{f″_k} denotes the adaptive hidden Markov model of the n-th sign language word c_n under the k-th selected feature f″_k, and P(B|λ_{c_n}^{f″_k}) denotes the sign language recognition probability score of test video B on λ_{c_n}^{f″_k};

Step 6.3. Normalize the score vector S_B^{k} of test video B under the k-th selected feature f″_k with Min-Max normalization to obtain the normalized score vector S̃_B^{k}. Arrange the elements of S̃_B^{k} in descending order, draw the resulting score curve, and compute the area under that curve, obtaining the weight area a_B^{k} corresponding to S̃_B^{k};
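The normalization and area computation of step 6.3 can be sketched as below. The area under the descending-sorted curve is approximated here with the trapezoidal rule over unit-spaced ranks; the patent does not specify the exact quadrature, so that choice is an assumption.

```python
def minmax_normalize(scores):
    """Min-Max normalization to [0, 1] (assumes max > min)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def weight_area(scores):
    """Area under the descending-sorted, normalized score curve."""
    curve = sorted(minmax_normalize(scores), reverse=True)
    # Trapezoidal rule with unit spacing between adjacent ranks
    return sum((curve[i] + curve[i + 1]) / 2 for i in range(len(curve) - 1))

sharp = weight_area([9.0, 1.0, 0.5, 0.2])  # one dominant score -> small area
flat  = weight_area([5.0, 4.9, 4.8, 4.7])  # near-uniform scores -> large area
```

A sharply peaked score vector (one word clearly wins) yields a small area, while an indecisive, near-uniform vector yields a large one, which is what makes the area usable as a per-feature confidence measure in step 7.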

Step 6.4. Assign k+1 to k. If k > K holds, the normalized score vectors S̃_B^{1}, …, S̃_B^{K} of test video B under the K selected features and their corresponding weight areas a_B^{1}, …, a_B^{K} have been obtained; go to step 7. Otherwise, go to step 6.2;

Step 7. Perform back-end score fusion and output the sign language word corresponding to test video B:

Step 7.1. Compute, according to equation (4), the weight w_B^{k} of the normalized score vector S̃_B^{k} of test video B under the k-th selected feature f″_k, thereby obtaining the weights w_B^{1}, …, w_B^{K} of the normalized score vectors under all K selected features. (Equation (4) survives only as an image in the source; it maps the weight areas a_B^{1}, …, a_B^{K} to these weights.)

Step 7.2. Obtain the back-end score fusion vector S_B of test video B according to equation (5):

S_B = Σ_{k=1}^{K} w_B^{k} · S̃_B^{k}    (5)

Step 7.3. Obtain, according to equation (6), the sign language word index n* corresponding to the maximum value in the back-end score fusion vector S_B:

n* = argmax_{1 ≤ n ≤ N} S_B(n)    (6)

The sign language word corresponding to test video B is therefore the n*-th sign language word c_{n*}.
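The back-end fusion of step 7 can be sketched end to end as below. Because equation (4) survives only as an image, the area-to-weight mapping here, which gives smaller areas (sharper score curves) proportionally larger weights, is an assumption and not necessarily the patent's exact formula; all numbers are illustrative.

```python
def fuse_and_classify(norm_scores, areas):
    """norm_scores: K normalized score vectors (one per selected feature);
    areas: their K weight areas. Returns the winning word index n*."""
    # Assumed form of equation (4): weight inversely proportional to area,
    # normalized so the weights sum to 1.
    inv = [1.0 / a for a in areas]
    weights = [v / sum(inv) for v in inv]
    n_words = len(norm_scores[0])
    # Equation (5): weighted sum of the normalized score vectors
    fused = [sum(w * s[n] for w, s in zip(weights, norm_scores))
             for n in range(n_words)]
    # Equation (6): argmax over the fused scores
    return max(range(n_words), key=fused.__getitem__)

scores = [[0.9, 0.1, 0.0],   # feature 1 strongly favors word 0
          [0.4, 0.5, 0.1]]   # feature 2 mildly favors word 1
areas = [0.5, 1.4]           # feature 1 has the sharper (more confident) curve
best = fuse_and_classify(scores, areas)
```

The sharper feature receives the larger weight, so its preference for word 0 dominates the fused decision.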

Claims (1)

1. A multi-feature fusion sign language recognition method based on self-adaptive hidden Markov is characterized by comprising the following steps:
step 1, obtaining a sign language video database, and dividing the sign language videos in the database into a training data set and a test data set; the training data set comprises sign language videos corresponding to N sign language words, each sign language word corresponding to a plurality of sign language videos; the N sign language words are denoted C = {c_1, …, c_n, …, c_N}, where c_n denotes the n-th sign language word, 1 ≤ n ≤ N;
taking the plurality of sign language videos corresponding to each sign language word in the training data set as the sign language video set of that word, thereby obtaining the sign language video sets corresponding to the N sign language words, denoted Set_1, …, Set_n, …, Set_N, where Set_n is the sign language video set corresponding to the n-th sign language word c_n;
step 2, constructing a feature type set F:
extracting M kinds of features from the sign language videos in the training data set to obtain the feature type set F = {f_1, f_2, …, f_M}, where f_M denotes the M-th feature and M denotes the total number of feature types;
step 3, constructing a feature pool set F':
step 3.1, defining a variable i, and initializing i to be 1;
step 3.2, defining the i-th fusion feature set as F_i, and initializing F_i = F;
step 3.3, setting i = 2;
step 3.4, selecting i different features from the feature type set F and splicing them in sequence into one fused feature, thereby obtaining the i-th fusion feature set F_i composed of all such fused features;
step 3.5, assigning i+1 to i, and judging whether i ≤ M holds; if so, executing step 3.4; otherwise, M fusion feature sets F_1, …, F_i, …, F_M are obtained, and step 3.6 is executed;
step 3.6, combining all the features in the M fusion feature sets F_1, …, F_i, …, F_M into a feature pool set F′, denoted F′ = {f′_1, …, f′_{m′}, …, f′_{M′}}, where f′_{m′} denotes the m′-th feature pool feature and M′ denotes the total number of feature pool features;
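Steps 3.1 through 3.6 amount to enumerating every concatenation of i distinct base features for i = 1, …, M. A minimal sketch, with illustrative feature names:

```python
from itertools import combinations

def build_feature_pool(base_features):
    """All fused features: every combination of 1..M base features,
    spliced in order, with '+' used here as a readable separator."""
    pool = []
    m = len(base_features)
    for i in range(1, m + 1):
        for combo in combinations(base_features, i):  # preserves input order
            pool.append("+".join(combo))
    return pool

pool = build_feature_pool(["hog", "traj", "depth"])
# M = 3 base features give M' = 2**3 - 1 = 7 pool features
```

The pool therefore contains every non-empty subset of the base features, so M′ = 2^M − 1, which is why the later variance-based selection of step 5 is needed to keep the back-end fusion tractable.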
step 4, constructing, with a Gaussian mixture-hidden Markov (GMM-HMM) model, the adaptive hidden Markov model set of the N sign language words under the M′ feature pool features:
step 4.1, initializing n to 1;
step 4.2, initializing m′ = 1;
step 4.3, clustering the sign language video set Set_n corresponding to the n-th sign language word c_n with the affinity propagation (AP) clustering algorithm, to obtain the number of feature clusters k_n^{m′} of Set_n under the m′-th feature pool feature f′_{m′};
step 4.4, defining the adaptive hidden Markov model of the n-th sign language word c_n under the m′-th feature pool feature f′_{m′} as λ_{c_n}^{f′_{m′}}, and calculating the number of hidden states N_n^{m′} of λ_{c_n}^{f′_{m′}} from the cluster number k_n^{m′} according to equation (1), which survives only as an image in the source;
in equation (1), G is the number of Gaussian functions in the Gaussian mixture model;
step 4.5, according to the number of hidden states N_n^{m′} and the number of Gaussian functions G, learning from the sign language video set Set_n corresponding to the n-th sign language word c_n with the Baum-Welch algorithm, to obtain the adaptive hidden Markov model λ_{c_n}^{f′_{m′}} of c_n under the m′-th feature pool feature f′_{m′};
step 4.6, assigning m′+1 to m′, and judging whether m′ ≤ M′ holds; if so, executing step 4.3; otherwise, executing step 4.7;
step 4.7, assigning n+1 to n, and judging whether n ≤ N holds; if so, executing step 4.2; otherwise, the adaptive hidden Markov model set {λ_{c_n}^{f′_{m′}} | 1 ≤ n ≤ N, 1 ≤ m′ ≤ M′} of the N sign language words under the M′ feature pool features is obtained, and step 5 is executed;
step 5, constructing a selected feature set F″ for back-end score fusion:
step 5.1, initializing m′ = 1;
step 5.2, obtaining any training video A in the training data set, and calculating the score vector S_A^{m′} of training video A under the m′-th feature pool feature f′_{m′} according to equation (2):

S_A^{m′} = [P(A|λ_{c_1}^{f′_{m′}}), …, P(A|λ_{c_n}^{f′_{m′}}), …, P(A|λ_{c_N}^{f′_{m′}})]    (2)
in equation (2), P(A|λ_{c_n}^{f′_{m′}}) denotes the sign language recognition probability score of training video A on the adaptive hidden Markov model λ_{c_n}^{f′_{m′}} of the n-th sign language word c_n under the m′-th feature pool feature f′_{m′};
step 5.3, repeating step 5.2 until the score vectors of all sign language videos in the training data set under the m′-th feature pool feature f′_{m′} are obtained, and calculating the sum of the mean variances of these score vectors, denoted the training variance Var_{m′} of the m′-th feature pool feature f′_{m′};
step 5.4, assigning m′+1 to m′, and judging whether m′ ≤ M′ holds; if so, executing step 5.2; otherwise, the training variances Var_1, …, Var_{m′}, …, Var_{M′} (1 ≤ m′ ≤ M′) corresponding to the M′ feature pool features are obtained, and step 5.5 is executed;
step 5.5, sorting the training variances Var_1, …, Var_{m′}, …, Var_{M′} in descending order to obtain the sorted training variances;
setting a parameter 1 ≤ K < M′, selecting the feature pool features corresponding to the first K sorted training variances, and forming the selected feature set F″ = {f″_1, …, f″_k, …, f″_K}, where f″_k denotes the k-th selected feature, 1 ≤ k ≤ K;
step 6, obtaining any test video B in the test data set, and calculating the score vectors of test video B under the selected feature set F″ = {f″_1, …, f″_k, …, f″_K}:
step 6.1, initializing k to 1;
step 6.2, calculating the score vector S_B^{k} of test video B under the k-th selected feature f″_k according to equation (3):

S_B^{k} = [P(B|λ_{c_1}^{f″_k}), …, P(B|λ_{c_n}^{f″_k}), …, P(B|λ_{c_N}^{f″_k})]    (3)
in equation (3), λ_{c_n}^{f″_k} denotes the adaptive hidden Markov model of the n-th sign language word c_n under the k-th selected feature f″_k, and P(B|λ_{c_n}^{f″_k}) denotes the sign language recognition probability score of test video B on λ_{c_n}^{f″_k};
step 6.3, normalizing the score vector S_B^{k} of test video B under the k-th selected feature f″_k by Min-Max normalization to obtain the normalized score vector S̃_B^{k}, arranging the elements of S̃_B^{k} in descending order, drawing the score curve, and calculating the area under the score curve, thereby obtaining the weight area a_B^{k} corresponding to S̃_B^{k};
step 6.4, assigning k+1 to k; if k > K holds, the normalized score vectors S̃_B^{1}, …, S̃_B^{K} of test video B under the K selected features and the corresponding weight areas a_B^{1}, …, a_B^{K} are obtained, and step 7 is executed; otherwise, step 6.2 is executed;
step 7, performing back-end score fusion, and outputting the sign language word corresponding to test video B:
step 7.1, calculating, according to equation (4), the weight w_B^{k} of the normalized score vector S̃_B^{k} of test video B under the k-th selected feature f″_k, thereby obtaining the weights w_B^{1}, …, w_B^{K} of the normalized score vectors under the K selected features, wherein equation (4), which survives only as an image in the source, maps the weight areas a_B^{1}, …, a_B^{K} to these weights;
step 7.2, obtaining the back-end score fusion vector S_B of test video B according to equation (5):

S_B = Σ_{k=1}^{K} w_B^{k} · S̃_B^{k}    (5)
step 7.3, obtaining, according to equation (6), the sign language word index n* corresponding to the maximum value in the back-end score fusion vector S_B:

n* = argmax_{1 ≤ n ≤ N} S_B(n)    (6)
thereby obtaining the sign language word corresponding to test video B as the n*-th sign language word c_{n*}.
CN201811131806.9A 2018-09-27 2018-09-27 Multi-feature fusion sign language recognition method based on adaptive hidden Markov Active CN109409231B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811131806.9A CN109409231B (en) 2018-09-27 2018-09-27 Multi-feature fusion sign language recognition method based on adaptive hidden Markov

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811131806.9A CN109409231B (en) 2018-09-27 2018-09-27 Multi-feature fusion sign language recognition method based on adaptive hidden Markov

Publications (2)

Publication Number Publication Date
CN109409231A CN109409231A (en) 2019-03-01
CN109409231B true CN109409231B (en) 2020-07-10

Family

ID=65466362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811131806.9A Active CN109409231B (en) 2018-09-27 2018-09-27 Multi-feature fusion sign language recognition method based on adaptive hidden Markov

Country Status (1)

Country Link
CN (1) CN109409231B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259804B (en) * 2020-01-16 2023-03-14 合肥工业大学 Multi-modal fusion sign language recognition system and method based on graph convolution
CN111259860B (en) * 2020-02-17 2022-03-15 合肥工业大学 Multi-order feature dynamic fusion sign language translation method based on data self-driven
CN113642422B (en) * 2021-07-27 2024-05-24 东北电力大学 Continuous Chinese sign language recognition method
CN116471460B (en) * 2023-05-08 2025-06-13 东南大学 A fast recognition method for resolution-adaptive encrypted videos based on video fingerprint

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893942A (en) * 2016-03-25 2016-08-24 中国科学技术大学 eSC and HOG-based adaptive HMM sign language identifying method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9817881B2 (en) * 2013-10-16 2017-11-14 Cypress Semiconductor Corporation Hidden markov model processing engine


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Online Early-Late Fusion Based on Adaptive HMM for Sign Language Recognition; Dan Guo et al.; ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM); Jan. 31, 2018; Vol. 14, No. 1; pp. 8:2-8:15 *
Sign Language Recognition Based on Adaptive HMMs with Data Augmentation; Dan Guo et al.; 2016 IEEE International Conference on Image Processing (ICIP); Aug. 19, 2016; pp. 2876-2880 *
Continuous HMM Sign Language Recognition Based on Kinect 3D Joints; Shen Juan et al.; Computer and Information Engineering; May 2017; Vol. 40, No. 5; pp. 638-642 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant