CN102543075A - Speaker VQ-SVM Parallel Recognition System Based on Virtual Instrument Technology - Google Patents
- Publication number: CN102543075A
- Application number: CN201210008213XA
- Authority: CN (China)
- Prior art keywords: recognition, speaker, svm, virtual instrument, unit
- Classification: Complex Calculations
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The present invention relates to a speaker VQ-SVM parallel recognition system based on virtual instrument technology. The system comprises a speech preprocessing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, a large program is divided into small modules implemented as LabVIEW sub-VIs; every program part that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs. The invention overcomes the wasted time of serial recognition in existing hybrid VQ-SVM speaker recognition by running the VQ and SVM methods in parallel on the same platform, saving recognition time while improving the recognition performance of the overall system.
Description
1. Technical Field:
The invention relates to the fields of signal processing and pattern recognition, and in particular to a speaker VQ-SVM parallel recognition system based on virtual instrument technology.
2. Background Art:
Speaker recognition identifies a speaker by analyzing the characteristics of his or her speech. Speaker recognition methods mainly include vector quantization methods, probabilistic-statistical methods, and discriminative classifier methods. In terms of system structure, speaker recognition comprises two stages, training and recognition, as shown in Figure 1. The original speech signal is first acquired, preprocessing yields a clean speech signal, speech feature parameters are then extracted, and speaker training and recognition are finally performed by a specific method. The speaker model typically uses a database to store a large number of speech feature samples processed by a particular algorithm; the speech to be recognized, after preprocessing and feature extraction, is matched against the sample sets in the database to reach a decision.
Any single method has both strengths and limitations, so most current research concerns hybrid recognition methods that combine two or more methods. VQ is a data compression and coding technique; SVM is a machine learning method based on statistical learning theory. The two methods are complementary. The vector quantization (VQ) method classifies large sample sets well, needs few models, trains quickly, and responds fast at recognition time, but it cannot handle nonlinear problems and has poor noise robustness. The support vector machine (SVM) method classifies small sample sets well and shows particular strength on nonlinear and high-dimensional pattern recognition problems, but its training algorithm is complex and slow, making large sample sets hard to handle. Although VQ and SVM have previously been combined for speaker recognition, the computations are usually implemented on the MATLAB platform, so multiple methods can only be applied serially. Accordingly, existing hybrid VQ-SVM speaker recognition is so-called serial recognition: one method performs an initial recognition and the other performs a second recognition. The chief weakness of this serial approach is evident: it both occupies machine resources and wastes recognition time.
3. Summary of the Invention:
The object of the present invention is to provide a speaker VQ-SVM parallel recognition system based on virtual instrument technology, solving the problem that existing hybrid VQ-SVM speaker recognition both occupies machine resources and wastes recognition time.
The technical solution adopted by the present invention is as follows. The speaker VQ-SVM parallel recognition system based on virtual instrument technology includes a speech preprocessing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, a large program is divided into small modules implemented as LabVIEW sub-VIs; every program part that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs.
The VQ model is built with the VQ algorithm. The initial codebook is obtained by the splitting method, taking the centroid of the feature vectors as the initial codebook; the speaker model is built and stored in LabVIEW by calling a MATLAB script node. The algorithm formulas are as follows:
Total distortion: $D = \sum_{i=1}^{J}\sum_{x \in S_i} d(x, y_i)$, where $d(\cdot,\cdot)$ is the distortion between a training vector and its codeword;

New codeword: $y_i = \frac{1}{N_i}\sum_{x \in S_i} x$, where $N_i$ is the number of vectors in cell $S_i$ and $y_i$ is the centroid of all vectors in $S_i$;

Relative distortion improvement: $\Delta D = \frac{\lvert D^{(m-1)} - D^{(m)} \rvert}{D^{(m)}}$, iteration stopping once $\Delta D$ falls below the distortion threshold.
The SVM model is built with the SVM algorithm, modeling each speaker with a radial basis kernel function. The algorithm formulas are the kernel $K(x, x_i) = \exp(-\gamma\lVert x - x_i\rVert^2)$ and the decision function $f(x) = \operatorname{sgn}\bigl(\sum_i \alpha_i y_i K(x_i, x) + b\bigr)$;
In the recognition unit, the result-judgment part outputs the recognition result on the speaker recognition front panel. When the results of the VQ and SVM methods are inconsistent, if either method recognizes a speaker, its result is output as the correct result; when the results of the two methods agree, the recognition result is output on the speaker recognition front panel, with correct recognition indicated by a green lamp and non-recognition by a red lamp.
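The result-judgment rule above can be sketched in a few lines (an illustrative Python sketch, not the patent's LabVIEW implementation; the function name and the use of None for "not recognized" are assumptions):

```python
def fuse_results(vq_result, svm_result):
    """Combine VQ and SVM outputs per the parallel decision rule:
    if the two results agree, output that result; if they disagree,
    output the result of whichever method recognized a speaker.
    A result of None means the method recognized nobody."""
    if vq_result == svm_result:
        return vq_result                # both agree (possibly both None)
    if svm_result is None:
        return vq_result                # only VQ recognized a speaker
    if vq_result is None:
        return svm_result               # only SVM recognized a speaker
    # Both recognized but disagree: the patent text does not specify a
    # tie-break; falling back to the SVM result here is an assumption.
    return svm_result
```

The last branch covers a case the patent text leaves open (both methods recognize, but name different speakers).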
In the above scheme, the feature extraction unit uses Mel-frequency cepstral coefficients (MFCC) and their first-order differences as the recognition features, extracted by a program written in the MATLAB 7.0 environment. The specific parameters are: frame length 512, frame shift 256, 12 filters, and a sampling frequency of 44100 Hz. The first two and last two frames are removed because their first-order differences are zero, which yields 24-dimensional speech feature vectors.
Beneficial effects:
1. The present invention overcomes the wasted time of serial recognition in existing hybrid VQ-SVM speaker recognition by running the VQ and SVM methods in parallel on the same platform, saving recognition time while improving the recognition performance of the overall system.
2. The present invention combines the two recognition methods on a virtual instrument platform for parallel speaker recognition. With small sample sets the SVM method outperforms the VQ method; as samples increase, the recognition performance of SVM tends to fall while that of VQ tends to rise. The parallel system thus fully exploits the complementarity of the two methods with respect to sample size, improving the overall performance of the system.
4. Brief Description of the Drawings:
Figure 1 is a schematic diagram of a speaker recognition system;
Figure 2 is a structural diagram of the present invention;
Figure 3 is a schematic diagram of the speaker recognition front panel in the present invention;
Figure 4 is a flowchart of the LBG algorithm.
Reference numerals: 1, speaker recognition front panel; 2, indicator lamp.
5. Detailed Description:
The present invention is further described below with reference to the accompanying drawings.
As shown in Figures 2 and 3, the speaker VQ-SVM parallel recognition system based on virtual instrument technology includes a speech preprocessing unit, a feature extraction unit, a speaker model unit, a recognition unit, and a LabVIEW virtual instrument platform. On the virtual instrument platform, a large program is divided into small modules implemented as LabVIEW sub-VIs; every program part that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs. To realize parallel VQ and SVM recognition, and given that LabVIEW supports multitasking and multithreading, the system combines virtual instrument technology with speaker recognition technology and is processed by having LabVIEW manage and call MATLAB. In the result-judgment part of the invention, when the results of the two recognition methods are inconsistent, if either method recognizes a speaker, its result is output as the correct result. When the results of the two methods agree, the front panel 1 outputs the recognition result, with correct recognition indicated by the green lamp 2 and non-recognition by the red lamp 2. A performance comparison of the system is given in Table 4.
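LabVIEW's dataflow model runs the two recognition branches concurrently; outside LabVIEW the same structure can be approximated with standard-library threads (a rough sketch with placeholder recognizer stubs, not the patent's algorithms):

```python
from concurrent.futures import ThreadPoolExecutor

def vq_recognize(features):
    # placeholder stub: VQ codebook matching would run here
    return "speaker_3"

def svm_recognize(features):
    # placeholder stub: SVM classification would run here
    return "speaker_3"

def parallel_recognize(features):
    # submit both recognizers at once so they run concurrently,
    # then collect both results for the decision-fusion step
    with ThreadPoolExecutor(max_workers=2) as pool:
        vq_future = pool.submit(vq_recognize, features)
        svm_future = pool.submit(svm_recognize, features)
        return vq_future.result(), svm_future.result()
```

Because both branches finish before the `with` block exits, the caller receives the two results together, as the LabVIEW result-judgment part does.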
The key issues in a speaker recognition system are extracting the speech feature parameters and building the speaker model. Mel-frequency cepstral coefficients (MFCC) and their first-order differences are chosen as the recognition features. MFCC parameters reflect only the static characteristics of speech, while the human ear is more sensitive to its dynamic characteristics; the differential cepstrum is the parameter that reflects dynamic changes in speech. Feature extraction is implemented by a program written in the MATLAB 7.0 environment with the following parameters: frame length 512, frame shift 256, 12 filters, and a sampling frequency of 44100 Hz. The first two and last two frames are removed because their first-order differences are zero, which yields 24-dimensional speech feature vectors.
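The feature-assembly step described above can be illustrated as follows, assuming the 12 MFCCs per frame have already been computed (the patent computes them in MATLAB 7.0); the central-difference delta used here is an assumption, since the patent does not give its exact difference formula:

```python
import numpy as np

def assemble_features(mfcc, trim=2):
    """Append first-order differences to a (num_frames x 12) MFCC matrix
    and drop the first and last `trim` frames, whose differences are
    unreliable, yielding 24-dimensional feature vectors."""
    delta = np.zeros_like(mfcc)
    delta[1:-1] = (mfcc[2:] - mfcc[:-2]) / 2.0   # central difference
    feats = np.hstack([mfcc, delta])             # 12 static + 12 delta
    return feats[trim:-trim]                     # drop 2 frames each end

mfcc = np.random.randn(100, 12)   # stand-in for real MFCCs
feats = assemble_features(mfcc)   # 96 frames of 24-dimensional features
```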
One of the algorithms adopted by the present invention is VQ. There are many ways to build the VQ model (codebook); the most basic and most common is the LBG algorithm, which generates an optimal codebook from a training vector set by an iterative procedure, as shown in the flowchart of Figure 4. The initial codebook is obtained by the splitting method, taking the centroid of the feature vectors as the initial codebook. Experiments selected a codebook size of 16 and a distortion threshold of 0.01, together with a capped maximum number of iterations, and good recognition results were obtained. This work is implemented by programming in the MATLAB 7.0 environment, and the model is then built and stored in LabVIEW by calling a MATLAB script node. The core algorithm formulas are as follows:
Total distortion: $D = \sum_{i=1}^{J}\sum_{x \in S_i} d(x, y_i)$, where $d(\cdot,\cdot)$ is the distortion between a training vector and its codeword;

New codeword: $y_i = \frac{1}{N_i}\sum_{x \in S_i} x$ (where $N_i$ is the number of vectors in cell $S_i$ and $y_i$ is the centroid of all vectors in $S_i$);

Relative distortion improvement: $\Delta D = \frac{\lvert D^{(m-1)} - D^{(m)} \rvert}{D^{(m)}}$, iteration stopping once $\Delta D$ falls below the distortion threshold.
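The LBG training loop summarized by these formulas can be sketched with NumPy (parameter names are illustrative, not taken from the patent's MATLAB code):

```python
import numpy as np

def lbg(train, codebook_size=16, eps=0.01, perturb=1e-3, max_iter=50):
    """Train a VQ codebook by the LBG algorithm: split every codeword,
    then alternate nearest-codeword assignment and centroid updates
    until the relative distortion improvement drops below eps."""
    codebook = train.mean(axis=0, keepdims=True)      # initial centroid
    while len(codebook) < codebook_size:
        # splitting step: perturb every codeword into a pair
        codebook = np.vstack([codebook * (1 + perturb),
                              codebook * (1 - perturb)])
        prev = np.inf
        for _ in range(max_iter):
            # squared distance from every vector to every codeword
            d2 = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            nearest = d2.argmin(axis=1)
            distortion = d2[np.arange(len(train)), nearest].sum()
            # move each codeword to the centroid of its cell
            for i in range(len(codebook)):
                cell = train[nearest == i]
                if len(cell):
                    codebook[i] = cell.mean(axis=0)
            # relative distortion improvement |D_prev - D| / D
            if abs(prev - distortion) / max(distortion, 1e-12) < eps:
                break
            prev = distortion
    return codebook
```

At recognition time a test utterance would be scored by its average distortion against each stored speaker codebook, the lowest-distortion codebook giving the VQ decision.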
The other algorithm adopted by the present invention is SVM, whose key issues are selecting the kernel function and optimizing its parameters.
Radial basis kernel function: $K(x, x_i) = \exp(-\gamma\lVert x - x_i\rVert^2)$, with kernel parameter $\gamma > 0$.
To select a suitable kernel function, 10 male and 10 female speakers were chosen as samples and comparative experiments were run with the RBF and polynomial (Poly) kernels; see Tables 1 and 2 for details. Under identical conditions the RBF kernel gives better experimental results. Given this experimental evidence, and since the RBF kernel has fewer parameters than the Poly kernel, the radial basis kernel function is chosen to build the speaker model. Kernel parameters are an important factor in the performance of an SVM classifier, so their optimization matters. The common approach is to let C and γ take values over given ranges and, for each candidate pair, compute the cross-validation classification accuracy on the training set; the parameter pair with the highest validation accuracy is taken as optimal. This is the idea of the grid search method. The grid.py program in the python subdirectory of the libsvm toolbox performs the search over C and γ; a screenshot of the parameter search for 5 frames of data from 20 speakers is shown in Figure 4.
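The grid-search idea can be illustrated without libsvm as follows. This toy version searches γ for a simple RBF-kernel nearest-centroid scorer with 2-fold validation; it is a stand-in for grid.py's cross-validated search over C and γ, not libsvm's SVM:

```python
import numpy as np

def rbf(a, b, gamma):
    # RBF kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def accuracy(train_x, train_y, test_x, test_y, gamma):
    # score each test vector against each class by its mean kernel
    # similarity to that class's training vectors, then measure
    # classification accuracy of the best-scoring class
    classes = np.unique(train_y)
    scores = np.stack([rbf(test_x, train_x[train_y == c], gamma).mean(-1)
                       for c in classes], axis=1)
    return (classes[scores.argmax(axis=1)] == test_y).mean()

def grid_search(x, y, gammas):
    # 2-fold cross-validation: train on each half, test on the other,
    # and keep the gamma with the best summed validation accuracy
    half = len(x) // 2
    return max(gammas, key=lambda g:
               accuracy(x[:half], y[:half], x[half:], y[half:], g) +
               accuracy(x[half:], y[half:], x[:half], y[:half], g))
```

With a real SVM, the inner scorer would be replaced by training and validating an SVM at each (C, γ) grid point, which is exactly what grid.py automates.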
Table 1. Experimental results for the 10 female speakers
Table 2. Experimental results for the 10 male speakers
Following the above analysis and experiments, the present invention first builds single-method speaker recognition systems on the virtual instrument platform and tunes them until they recognize well, and then merges the two methods to run in parallel, forming the speaker recognition system based on virtual instrument technology. System construction makes full use of modularization: a large program is divided into small modules, which both simplifies the program and improves its readability. Concretely, this is done with LabVIEW sub-VIs on the virtual instrument platform; every program part that calls a MATLAB script node is written as a sub-VI, and the system is built by calling these sub-VIs. The system front panel is shown in Figure 3.
Experimental Verification and Result Analysis
The speakers' voices were recorded in an ordinary laboratory environment with the computer's built-in recorder. All utterances are 16-bit mono speech signals sampled at 22050 Hz and stored in wav format. Thirty speakers were selected, each recording 20 utterances (2 to 4 s each) as the sample library, all text-independent; the first 10 utterances of each speaker were used to build the speaker model and the last 10 for testing. Tests over 5 and 10 utterances randomly drawn from 20 and 30 speakers give the experimental results shown in Table 3. The analysis shows that as the sample size grows the VQ method surpasses the SVM method: SVM has a classification advantage on small sample sets and VQ on large ones, and it can be inferred that with samples growing into the thousands the VQ method will outperform the SVM method. In the result-judgment part of the invention, when the results of the two recognition methods are inconsistent, if either method recognizes a speaker, its result is output as the correct result. When the results of the two methods agree, the front panel 1 outputs the recognition result, with correct recognition indicated by the green lamp 2 and non-recognition by the red lamp 2. A performance comparison of the system is given in Table 4, which shows that parallel recognition with the two methods improves system performance, while recognition time increases only slightly.
Table 3. Comparison of experimental results
Table 4. System performance comparison
Conclusion
The present invention combines the two recognition methods on a virtual instrument platform for parallel speaker recognition. With small sample sets the SVM method outperforms the VQ method; as samples increase, the recognition performance of SVM tends to fall while that of VQ tends to rise. The parallel system thus fully exploits the complementarity of the two methods with respect to sample size, improving the overall performance of the system.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210008213XA CN102543075A (en) | 2012-01-12 | 2012-01-12 | Speaker VQ-SVM Parallel Recognition System Based on Virtual Instrument Technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102543075A true CN102543075A (en) | 2012-07-04 |
Family
ID=46349815
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210008213XA Pending CN102543075A (en) | 2012-01-12 | 2012-01-12 | Speaker VQ-SVM Parallel Recognition System Based on Virtual Instrument Technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102543075A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107945787A (en) * | 2017-11-21 | 2018-04-20 | 上海电机学院 | A kind of acoustic control login management system and method based on virtual instrument technique |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030036905A1 (en) * | 2001-07-25 | 2003-02-20 | Yasuhiro Toguri | Information detection apparatus and method, and information search apparatus and method |
CN1588535A (en) * | 2004-09-29 | 2005-03-02 | 上海交通大学 | Automatic sound identifying treating method for embedded sound identifying system |
JP2005345683A (en) * | 2004-06-02 | 2005-12-15 | Toshiba Tec Corp | Speaker recognition device, program, and speaker recognition method |
CN101640043A (en) * | 2009-09-01 | 2010-02-03 | 清华大学 | Speaker recognition method based on multi-coordinate sequence kernel and system thereof |
- 2012-01-12: application CN201210008213XA filed in China; published as CN102543075A, status pending
Non-Patent Citations (2)
Title |
---|
Yu Yang (余洋): "Development of a Speaker Recognition System Based on LabVIEW", China Master's Theses Full-text Database * |
Liu Xianglou et al. (刘祥楼等): "Research on Kernel-Function Parameter Optimization of Support Vector Machines in Speaker Recognition", Science Technology and Engineering * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107945787A (en) * | 2017-11-21 | 2018-04-20 | 上海电机学院 | A kind of acoustic control login management system and method based on virtual instrument technique |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20120704 |