
CN102543075A - Speaker VQ-SVM Parallel Recognition System Based on Virtual Instrument Technology - Google Patents


Info

Publication number: CN102543075A
Application number: CN201210008213XA
Authority: CN (China)
Prior art keywords: recognition, speaker, svm, virtual instrument, unit
Other languages: Chinese (zh)
Inventors: 刘祥楼, 吴香艳, 张明, 姜继玉, 刘昭廷
Current Assignee: Northeast Petroleum University
Original Assignee: Northeast Petroleum University
Application filed by Northeast Petroleum University
Priority and filing date: 2012-01-12
Publication date: 2012-07-04
Legal status: Pending

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention relates to a speaker VQ-SVM parallel recognition system based on virtual instrument technology. The system comprises a speech preprocessing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, LabVIEW sub-VIs are used to divide a large program into small modules; the portions of the program that call MATLAB script nodes are written as sub-VIs, and the system is constructed by calling these sub-VIs. The invention overcomes the wasted time of serial recognition in existing hybrid VQ-SVM speaker recognition by running the VQ and SVM methods in parallel on the same platform, saving recognition time while improving the recognition performance of the overall system.

Description

Speaker VQ-SVM Parallel Recognition System Based on Virtual Instrument Technology


1. Technical Field:

The invention relates to the fields of signal processing and pattern recognition, and in particular to a speaker VQ-SVM parallel recognition system based on virtual instrument technology.

2. Background Art:

Speaker recognition identifies a speaker by analyzing the characteristics of his or her speech. Speaker recognition methods mainly include vector quantization methods, probabilistic-statistical methods, and discriminative classifier methods. In terms of system structure, speaker recognition comprises two stages, training and recognition, as shown in Figure 1. First the raw speech signal is acquired; preprocessing then yields a clean speech signal; speech feature parameters are extracted; and finally speaker training and recognition are carried out with a specific method. The speaker model typically uses a database to store a large number of speech feature samples processed by a particular algorithm; the speech to be recognized, after preprocessing and feature extraction, is matched against the sample set in the database to reach a decision.

Any single method has both advantages and limitations, so much current research focuses on hybrid recognition methods that combine two or more approaches. VQ is a data compression and coding technique; SVM is a machine learning method based on statistical learning theory. The two methods are complementary. Vector quantization (VQ) classifies large sample sets well, uses few models, trains quickly, and responds quickly at recognition time, but it cannot handle nonlinear problems and has poor noise robustness. The support vector machine (SVM) classifies small sample sets well and has particular strengths on nonlinear and high-dimensional pattern recognition problems, but its training algorithm is complex and slow, and it handles large sample sets poorly. Although the VQ and SVM methods have been combined for speaker recognition before, the computations are usually implemented on the MATLAB platform, so multiple methods can only run serially. Accordingly, existing hybrid VQ-SVM speaker recognition is so-called serial recognition: one method performs an initial recognition and the other performs a second recognition. The greatest weakness of serial recognition is that it both occupies machine resources and wastes recognition time.

3. Contents of the Invention:

The object of the invention is to provide a speaker VQ-SVM parallel recognition system based on virtual instrument technology, which solves the problem that existing hybrid VQ-SVM speaker recognition both occupies machine resources and wastes recognition time.

The technical solution adopted by the invention is as follows: the speaker VQ-SVM parallel recognition system based on virtual instrument technology comprises a speech preprocessing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, LabVIEW sub-VIs are used to divide a large program into small modules; the portions of the program that call MATLAB script nodes are written as sub-VIs, and the system is constructed by calling these sub-VIs.

The VQ algorithm is used to build the VQ model. The initial codebook is generated by the splitting method, taking the centroid of the feature vectors as the initial codebook; the speaker model is built and stored in LabVIEW by calling a MATLAB script node. The formulas of the algorithm are as follows:

Total distortion:

D_m = (1/N) Σ_{n=1}^{N} min_{1≤l≤L} d(X_n, Y_l)

Compute the new codeword:

Y_l = (1/N_l) Σ_{X∈S_l} X

where N_l is the number of vectors in cell S_l, and Y_l is the centroid of all vectors in S_l;

Relative distortion improvement:

δD = (D_{m−1} − D_m) / D_m

The SVM algorithm is used to build the SVM model, and the radial basis function (RBF) kernel is selected to build the speaker model. Its formula is as follows:

K(x, x_i) = exp(−γ‖x − x_i‖²), γ > 0

In the result judgment part of the recognition unit, the recognition result is output on the speaker recognition front panel. When the results of the VQ and SVM recognition methods disagree, if either method recognizes the speaker, its recognition result is output as the correct result; when the two methods agree, the recognition result is output on the speaker recognition front panel, with a green light indicating correct recognition and a red light indicating non-recognition.
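The result-judgment rule above can be sketched in Python (a hedged illustration of the logic only — the actual system implements it on the LabVIEW front panel; the function name is hypothetical, and the tie-break for two conflicting positive answers is an assumption, since the text leaves that case open):

```python
def fuse_decisions(vq_result, svm_result):
    """Combine the parallel VQ and SVM decisions.

    Each argument is a recognized speaker label, or None when that
    recognizer does not recognize the input.  If the two recognizers
    agree, the common answer is returned; if they disagree, the answer
    of whichever recognizer did identify a speaker is output as the
    correct result, per the rule described in the text.
    """
    if vq_result == svm_result:     # both agree (possibly both None)
        return vq_result
    if svm_result is None:          # only VQ recognized someone
        return vq_result
    if vq_result is None:           # only SVM recognized someone
        return svm_result
    # Both recognized but disagree: the patent leaves this case open;
    # preferring the SVM answer here is an assumption of this sketch.
    return svm_result
```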

In the above solution, the feature extraction unit uses Mel-frequency cepstral coefficients (MFCC) and their first-order differences as the recognition feature parameters. Feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, and sampling frequency 44100 Hz. The first two and last two frames are removed because their first-order differences are zero, yielding a 24-dimensional speech feature vector.
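The delta-feature step can be sketched in NumPy (a hedged illustration; the patent computes MFCCs in MATLAB 7.0, and the central-difference delta used here is an assumption — the exact difference formula is not given in the text):

```python
import numpy as np

def add_deltas_and_trim(mfcc):
    """Append first-order differences to a (num_frames, 12) MFCC matrix
    and drop the first two and last two frames, yielding 24-dim vectors.

    The first-order difference is approximated as a central difference
    of neighbouring frames; the edge frames, whose differences would be
    degenerate, are removed, mirroring the patent's trimming step.
    """
    delta = np.zeros_like(mfcc)
    delta[1:-1] = (mfcc[2:] - mfcc[:-2]) / 2.0   # central difference
    feats = np.hstack([mfcc, delta])             # (num_frames, 24)
    return feats[2:-2]                           # drop 2 frames each end

# Toy usage: 100 frames of 12 MFCCs -> 96 frames of 24-dim features.
mfcc = np.random.randn(100, 12)
feats = add_deltas_and_trim(mfcc)
```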

Beneficial effects:

1. The invention overcomes the wasted time of serial recognition in existing hybrid VQ-SVM speaker recognition by running the VQ and SVM methods in parallel on the same platform, saving recognition time while improving the recognition performance of the overall system.

2. The invention combines the two recognition methods on a virtual instrument platform for parallel speaker recognition. With small sample sets the SVM method outperforms the VQ method; as the number of samples grows, SVM recognition performance trends downward while VQ recognition performance trends upward. The system thus fully exploits the complementarity of the two methods with respect to sample size, improving overall performance.

4. Description of the Drawings:

Figure 1 is a schematic diagram of the structure of the speaker recognition system;

Figure 2 is a structural diagram of the invention;

Figure 3 is a schematic diagram of the speaker recognition front panel of the invention;

Figure 4 is a flowchart of the LBG algorithm.

Reference numerals: 1 — speaker recognition front panel; 2 — indicator light.

5. Specific Implementation:

The invention is further described below in conjunction with the drawings:

As shown in Figures 2 and 3, the speaker VQ-SVM parallel recognition system based on virtual instrument technology comprises a speech preprocessing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, LabVIEW sub-VIs divide a large program into small modules; the portions of the program that call MATLAB script nodes are written as sub-VIs, and the system is constructed by calling these sub-VIs. To realize parallel VQ and SVM recognition, and given that LabVIEW supports multitasking and multithreading, the system combines virtual instrument technology with speaker recognition technology and has LabVIEW manage and call MATLAB to carry out the processing. In the result judgment part of the invention, when the results of the two recognition methods disagree, if either method recognizes the speaker, its result is output as the correct result. When the two methods agree, the recognition result is output on speaker recognition front panel 1, with green light 2 indicating correct recognition and red light 2 indicating non-recognition. A performance comparison of the system is shown in Table 4.
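The concurrent execution that LabVIEW's dataflow model provides can be sketched in Python with `concurrent.futures` (a hedged illustration — `vq_recognize` and `svm_recognize` are hypothetical stand-ins for the MATLAB-node calls, and the fixed return label is dummy data):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def vq_recognize(sample):
    time.sleep(0.01)            # stand-in for the VQ codebook search
    return "speaker-3"          # dummy label

def svm_recognize(sample):
    time.sleep(0.01)            # stand-in for the SVM decision
    return "speaker-3"          # dummy label

def parallel_recognize(sample):
    # Launch both recognizers concurrently, mirroring how LabVIEW
    # executes two independent sub-VIs in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        vq_future = pool.submit(vq_recognize, sample)
        svm_future = pool.submit(svm_recognize, sample)
        return vq_future.result(), svm_future.result()

vq_res, svm_res = parallel_recognize("test-utterance.wav")
```

The two results would then be passed to the result-judgment step described in the text.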

The key problems in a speaker recognition system are the extraction of speech feature parameters and the construction of the speaker model. Mel-frequency cepstral coefficients (MFCC) and their first-order differences are selected as the recognition features. MFCC parameters reflect only the static characteristics of speech, while the human ear is more sensitive to its dynamic characteristics; the parameters that reflect dynamic changes are the differential cepstra. Feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, and sampling frequency 44100 Hz. The first two and last two frames are removed because their first-order differences are zero, yielding a 24-dimensional speech feature vector.

One of the algorithms adopted by the invention is VQ. There are many ways to build the VQ model (codebook); the most basic and most common is the LBG algorithm, which generates an optimal codebook from a training vector set by an iterative procedure, as shown in Figure 4. The initial codebook is generated by the splitting method, taking the centroid of the feature vectors as the initial codebook. Through experiment, a codebook size of 16 and a distortion threshold ε = 0.01 were selected, together with a bounded maximum number of iterations, and good recognition results were obtained. This work is implemented by programming in the MATLAB 7.0 environment; the model is then built and stored in LabVIEW by calling a MATLAB script node. The core formulas of the algorithm are as follows:

Total distortion:

D_m = (1/N) Σ_{n=1}^{N} min_{1≤l≤L} d(X_n, Y_l)

Compute the new codeword:

Y_l = (1/N_l) Σ_{X∈S_l} X

(where N_l is the number of vectors in cell S_l, and Y_l is the centroid of all vectors in S_l)

Relative distortion improvement:

δD = (D_{m−1} − D_m) / D_m
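The LBG procedure described above can be sketched in NumPy (a hedged illustration under the patent's settings — codebook size 16 and relative-distortion threshold 0.01; the additive splitting perturbation and the iteration cap are assumptions of this sketch, not values from the patent):

```python
import numpy as np

def lbg_codebook(X, size=16, eps=0.01, max_iter=50):
    """Train a VQ codebook with the LBG algorithm.

    X: (N, d) training vectors.  Starts from the global centroid and
    repeatedly splits every codeword in two, then refines the codebook
    with Lloyd iterations until the relative distortion improvement
    falls below `eps`.
    """
    codebook = X.mean(axis=0, keepdims=True)        # initial centroid
    while len(codebook) < size:
        # Splitting step: perturb each codeword into two (assumed ±0.05).
        codebook = np.vstack([codebook + 0.05, codebook - 0.05])
        prev_dist = np.inf
        for _ in range(max_iter):
            # Nearest-codeword assignment.
            d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            nearest = d2.argmin(axis=1)
            dist = d2[np.arange(len(X)), nearest].mean()  # total distortion
            # New codeword: centroid of each cell.
            for l in range(len(codebook)):
                cell = X[nearest == l]
                if len(cell):
                    codebook[l] = cell.mean(axis=0)
            # Stop when the relative distortion improvement is below eps.
            if np.isfinite(prev_dist) and (prev_dist - dist) / dist < eps:
                break
            prev_dist = dist
    return codebook

# Toy usage on random 24-dim "feature vectors".
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24))
cb = lbg_codebook(X, size=16)
```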

The other algorithm adopted by the invention is SVM, whose key issues are kernel function selection and parameter optimization.

Radial basis function kernel:

K(x, x_i) = exp(−γ‖x − x_i‖²), γ > 0

To select a suitable kernel function, 10 male and 10 female speakers were chosen at random as samples, and comparative experiments were run with the RBF and polynomial (Poly) kernels; see Tables 1 and 2. Under the same conditions, the RBF kernel gives better results. Given this experimental basis, and because the RBF kernel has fewer parameters than the Poly kernel, the RBF kernel is selected to build the speaker model. The kernel parameters are an important factor in the performance of an SVM classifier, so parameter optimization is critical. The common approach is to let C and γ take values over a given range and, for each candidate pair, estimate classification accuracy on the training set by cross-validation; the pair with the highest cross-validated accuracy is taken as the best parameters. This is the idea of the grid search method. The grid.py program in the python subdirectory of the libsvm toolbox implements the search over C and γ; a screenshot of parameter optimization for 5 frames of data from 20 speakers is shown in Figure 4.
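The RBF kernel used for the SVM model can be sketched in NumPy (a hedged illustration; the patent's experiments use the libsvm toolbox, and the γ value and grid ranges below are only examples matching the order of magnitude of the C and γ values reported in Tables 1 and 2):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.25):
    """K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2), gamma > 0."""
    # Squared Euclidean distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negatives

# A grid search over (C, gamma), as done with libsvm's grid.py, would
# evaluate cross-validated accuracy at each point of e.g.:
C_grid = [2.0 ** p for p in range(-2, 3)]         # 0.25 ... 4.0
gamma_grid = [2.0 ** p for p in range(-2, 3)]     # 0.25 ... 4.0
```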


Table 1. Experimental results for 10 female speakers

Frames  (C, γ)        (Degree, Coeff)  Recognition rate % (RBF)  Recognition rate % (Poly)  RBF time (s)  Poly time (s)
1       (0.25, 0.25)  (3, 1)           100                       100                        0.01          0.02
3       (0.25, 0.25)  (3, 1)           96.67                     96.67                      0.02          0.01
5       (0.25, 0.25)  (3, 1)           100                       98.00                      0.02          0.02
7       (0.25, 0.25)  (3, 1)           100                       98.57                      0.02          0.01
10      (0.25, 0.25)  (3, 1)           97.00                     95.00                      0.02          0.02
15      (4.00, 1.00)  (3, 1)           96.00                     96.00                      0.03          0.02
20      (4.00, 1.00)  (3, 1)           96.00                     97.00                      0.03          0.03
30      (4.00, 1.00)  (3, 1)           93.00                     93.33                      0.08          0.08

Table 2. Experimental results for 10 male speakers

Frames  (C, γ)        (Degree, Coeff)  Recognition rate % (RBF)  Recognition rate % (Poly)  RBF time (s)  Poly time (s)
1       (1.32, 0.76)  (3, 1)           90.00                     90.00                      0.01          0.01
3       (1.00, 1.00)  (3, 1)           93.33                     93.33                      0.02          0.02
5       (4.00, 1.00)  (3, 1)           96.00                     94.00                      0.02          0.01
7       (4.00, 1.00)  (3, 1)           95.71                     90.00                      0.01          0.02
10      (4.00, 1.00)  (3, 1)           92.00                     84.00                      0.02          0.02
15      (4.00, 1.00)  (3, 1)           86.67                     84.00                      0.05          0.03
20      (4.00, 1.00)  (3, 1)           87.00                     85.00                      0.05          0.05
30      (4.00, 1.00)  (3, 1)           84.67                     85.33                      0.11          0.09

Following the above analysis and experiments, the invention first builds single-method speaker recognition systems on the virtual instrument platform and tunes them to achieve good recognition results, then fuses the two methods to run in parallel, forming the speaker recognition system based on virtual instrument technology. The system is built in a fully modular way: a large program is divided into small modules, which both simplifies the program and improves its readability. Concretely, this is done with LabVIEW sub-VIs on the virtual instrument platform: the portions of the program that call MATLAB script nodes are written as sub-VIs, and the system is constructed by calling them. The front panel of the system is shown in Figure 3.

Experimental verification and result analysis

The speakers' speech was recorded in an ordinary laboratory environment using the computer's built-in recorder. All recordings are 16-bit mono speech signals sampled at 22050 Hz, stored as WAV files. Thirty speakers were selected, each recording 20 text-independent speech segments (2-4 s each) as the sample library; the first 10 segments were used to build the speaker model and the last 10 for testing. Tests with 5 and 10 utterances drawn at random from 20 and 30 speakers gave the results shown in Table 3. The analysis shows that as the sample size grows the VQ method overtakes the SVM method, confirming that SVM has a classification advantage on small sample sets while VQ has the advantage on large ones; it can be inferred that with sample sizes in the thousands, VQ will outperform SVM. In the result judgment part of the invention, when the results of the two recognition methods disagree, if either method recognizes the speaker, its result is output as the correct result. When the two methods agree, the recognition result is output on speaker recognition front panel 1, with green light 2 indicating correct recognition and red light 2 indicating non-recognition.
A performance comparison of the system is shown in Table 4: running the two methods in parallel improves system performance, and although the recognition time increases, the increase is very small.


Table 3. Comparison of experimental results

(Table 3 is provided as an image in the original publication; its contents are not recoverable from this text.)

Table 4. System performance comparison

Method  Recognition rate (%)  Misrecognition rate (%)  Recognition time (s)
VQ      94.88                 10.09                    0.06
SVM     97.38                 9.71                     0.08
VQ-SVM  98.54                 5.28                     0.15

Conclusion

The invention combines the two recognition methods on a virtual instrument platform for parallel speaker recognition. With small sample sets the SVM method outperforms the VQ method; as the number of samples grows, SVM recognition performance trends downward while VQ recognition performance trends upward. The system thus fully exploits the complementarity of the two methods with respect to sample size, improving overall performance.

Claims (2)

1. A speaker VQ-SVM parallel recognition system based on virtual instrument technology, characterized in that: the speaker VQ-SVM parallel recognition system based on virtual instrument technology comprises a speech preprocessing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform; on the virtual instrument platform, LabVIEW sub-VIs are used to divide a large program into small modules, the portions of the program that call MATLAB script nodes are all written as sub-VIs, and the system is constructed by calling these sub-VIs;
the VQ algorithm is adopted to build the VQ model; the initial codebook is generated by the splitting method, taking the centroid of the feature vectors as the initial codebook; the speaker model is built and stored in LabVIEW by calling a MATLAB script node, with the following formulas:
total distortion: D_m = (1/N) Σ_{n=1}^{N} min_{1≤l≤L} d(X_n, Y_l);
new codeword: Y_l = (1/N_l) Σ_{X∈S_l} X, where N_l is the number of vectors in cell S_l and Y_l is the centroid of all vectors in S_l;
relative distortion improvement: δD = (D_{m−1} − D_m) / D_m;
the SVM algorithm is adopted to build the SVM model, and the radial basis function kernel is selected to build the speaker model, with the formula K(x, x_i) = exp(−γ‖x − x_i‖²), γ > 0;
in the result judgment part of the recognition unit, the recognition result is output on the speaker recognition front panel; when the results of the VQ and SVM recognition methods disagree, if either method recognizes the speaker, its recognition result is output as the correct result; when the two methods agree, the recognition result is output on the speaker recognition front panel, with a green light indicating correct recognition and a red light indicating non-recognition.
2. The speaker VQ-SVM parallel recognition system based on virtual instrument technology according to claim 1, characterized in that: the feature extraction unit uses Mel-frequency cepstral coefficients (MFCC) and their first-order differences as the recognition feature parameters; feature extraction is implemented by programming in the MATLAB 7.0 environment with the following settings: frame length 512, frame shift 256, 12 filters, and sampling frequency 44100 Hz; the first two and last two frames are removed because their first-order differences are zero, yielding a 24-dimensional speech feature vector.
CN201210008213XA 2012-01-12 2012-01-12 Speaker VQ-SVM Parallel Recognition System Based on Virtual Instrument Technology Pending CN102543075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210008213XA CN102543075A (en) 2012-01-12 2012-01-12 Speaker VQ-SVM Parallel Recognition System Based on Virtual Instrument Technology


Publications (1)

Publication Number Publication Date
CN102543075A true CN102543075A (en) 2012-07-04

Family

ID=46349815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210008213XA Pending CN102543075A (en) 2012-01-12 2012-01-12 Speaker VQ-SVM Parallel Recognition System Based on Virtual Instrument Technology

Country Status (1)

Country Link
CN (1) CN102543075A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107945787A (en) * 2017-11-21 2018-04-20 上海电机学院 A kind of acoustic control login management system and method based on virtual instrument technique

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030036905A1 (en) * 2001-07-25 2003-02-20 Yasuhiro Toguri Information detection apparatus and method, and information search apparatus and method
CN1588535A (en) * 2004-09-29 2005-03-02 上海交通大学 Automatic sound identifying treating method for embedded sound identifying system
JP2005345683A (en) * 2004-06-02 2005-12-15 Toshiba Tec Corp Speaker recognition device, program, and speaker recognition method
CN101640043A (en) * 2009-09-01 2010-02-03 清华大学 Speaker recognition method based on multi-coordinate sequence kernel and system thereof


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
余洋: "基于LabVIEW的说话人识别系统开发", 《中国优秀硕士学位论文全文数据库》 *
刘祥楼等: "说话人识别中支持向量机核函数参数优化研究", 《科学技术与工程》 *



Legal Events

Code        Event
C06 / PB01  Publication (application publication date: 2012-07-04)
C10 / SE01  Entry into force of request for substantive examination
C12 / RJ01  Rejection of invention patent application after publication