CN103811000A - Voice recognition system and voice recognition method - Google Patents
- Publication number
- CN103811000A (application CN201410062780.2A)
- Authority
- CN
- China
- Prior art keywords: grammar, module, database, audio information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Telephonic Communication Services (AREA)
Abstract
The present invention relates to a speech recognition system and method. The speech recognition method comprises the following steps: S1: collecting audio information to be recognized and region information corresponding to the audio information; S2: calling, according to the region information, the speech database and grammar database corresponding to that region, and calling the corresponding grammar file in the grammar database; S3: recognizing the audio information according to the grammar file and the speech database. The invention has the beneficial effect of improving the recognition rate for multiple timbres and large vocabularies.
Description
Technical Field
The present invention relates to the field of speech recognition, and in particular to a speech recognition system and method.
Background Art
Existing speech recognition methods mainly include dynamic time warping (DTW), vector quantization (VQ), hidden Markov models (HMM), and artificial neural networks (ANN).
Dynamic time warping (DTW) is an early pattern-matching and model-training technique. It applies dynamic programming to solve the problem of comparing speech feature-parameter sequences of unequal duration, and achieves good performance in isolated-word speech recognition.
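The unequal-duration alignment DTW performs can be sketched as a small dynamic program. This is an illustrative sketch, not the patent's implementation; the scalar feature sequences are made up for the example:

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic-programming alignment cost between two feature sequences
    of unequal length (the isolated-word matching problem DTW solves)."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # Allow match, insertion, or deletion steps through the grid.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A template utterance, a slower rendition of the same contour, and a
# different contour:
template = [1, 2, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
other = [3, 3, 1, 1, 3, 3, 1]
print(dtw_distance(template, slow) < dtw_distance(template, other))  # True
```

The slower rendition aligns with zero cost despite being twice as long, which is exactly the property that made DTW effective for isolated words.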
Vector quantization (VQ) extracts feature vectors from training speech to obtain a feature-vector set and generates a codebook with the LBG algorithm. At recognition time, feature-vector sequences are extracted from the test speech and matched against each codebook; the average quantization error of each match is computed, and the codebook with the smallest average quantization error is selected as the recognition result.
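The selection rule above can be sketched directly; the toy codebooks stand in for LBG-trained ones and the word labels are invented for the example:

```python
def avg_quantization_error(codebook, vectors):
    """Mean Euclidean distance from each test vector to its nearest codeword."""
    def d(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v)) ** 0.5
    return sum(min(d(v, c) for c in codebook) for v in vectors) / len(vectors)

# Two toy codebooks, as if trained with LBG for the words "yes" and "no":
codebooks = {
    "yes": [(0.0, 0.0), (1.0, 1.0)],
    "no":  [(5.0, 5.0), (6.0, 6.0)],
}
test_vectors = [(0.1, 0.1), (0.9, 1.1)]  # feature vectors from the test utterance
best = min(codebooks, key=lambda w: avg_quantization_error(codebooks[w], test_vectors))
print(best)  # yes
```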
The hidden Markov model (HMM) is a parametric representation of the time-varying characteristics of a speech signal. It describes the statistical properties of the signal with two interrelated stochastic processes: a hidden (unobservable) finite-state Markov chain, and an observable stochastic process of observation vectors associated with each state of that chain. The characteristics of the hidden Markov chain are revealed only through the observable signal features. In this way, the features of a segment of the time-varying speech signal are described by the stochastic process of observation symbols for the corresponding state, while the signal's evolution over time is described by the transition probabilities of the hidden Markov chain. The model parameters comprise the HMM topology, the state transition probabilities, and a set of random functions describing the statistical properties of the observation symbols. According to the type of these random functions, HMMs can be divided into discrete, continuous, and semi-continuous hidden Markov models.
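A minimal forward pass for a discrete HMM illustrates the two coupled stochastic processes described above; the two-state parameters are toy values, not trained models:

```python
def forward(pi, A, B, obs):
    """HMM forward algorithm: total probability of an observation sequence,
    summing over all paths through the hidden Markov chain."""
    n = len(pi)
    # Initialization: initial state probability times emission probability.
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        # Induction: propagate through transitions, then emit symbol o.
        alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][o]
                 for s in range(n)]
    return sum(alpha)

pi = [0.6, 0.4]                 # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]    # state transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]    # per-state observation probabilities (2 symbols)
p = forward(pi, A, B, [0, 1, 0])
```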
The application of artificial neural networks (ANN) to speech recognition is another current research hotspot. An ANN is essentially an adaptive nonlinear dynamical system that simulates the principles of human neuron activity, with capabilities of self-learning, association, comparison, reasoning, and generalization.
The mainstream speech recognition methods above all have shortcomings. The main one is that when the same vocabulary is spoken with accents from different regions, the timbre changes to some extent, which greatly reduces the speech recognition rate.
Summary of the Invention
To address the defect in existing speech recognition technology that a change in timbre reduces the speech recognition rate, a speech recognition system and method are provided.
The technical solution adopted by the present invention to solve this technical problem is to provide a speech recognition method comprising the following steps:
S1: collecting audio information to be recognized and region information corresponding to the audio information;
S2: calling, according to the region information, the speech database and grammar database corresponding to that region information, and calling the corresponding grammar file in the grammar database;
S3: recognizing the audio information according to the grammar file and the speech database.
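Steps S1–S3 can be sketched as a dispatch table keyed by region. All names here (regions, business types, database identifiers) are hypothetical; the patent does not specify an API:

```python
# Per-region speech and grammar databases (hypothetical identifiers):
DATABASES = {
    "guangdong": {"speech": "speech_gd", "grammar": {"recharge": "gram_gd_recharge"}},
    "sichuan":   {"speech": "speech_sc", "grammar": {"recharge": "gram_sc_recharge"}},
}

def recognize(audio, region, business="recharge"):
    db = DATABASES[region]                  # S2: dispatch by region
    grammar_file = db["grammar"][business]  # S2: grammar file for the business
    # S3: a real system would decode `audio` against the regional acoustic
    # models; here we only return which resources were selected.
    return db["speech"], grammar_file

print(recognize(b"...", "sichuan"))  # ('speech_sc', 'gram_sc_recharge')
```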
In the speech recognition method provided by the present invention, step S1 further comprises: collecting business-type information corresponding to the audio information.
In step S2, calling the grammar file in the grammar database further comprises: calling, according to the business-type information, the grammar file in the grammar database corresponding to that business-type information.
In the speech recognition method provided by the present invention, a step S0 is further included before step S1: establishing and saving multiple speech databases and multiple grammar databases according to the different regions, with grammar files generated in each grammar database according to business type.
In the speech recognition method provided by the present invention, in step S1, collecting the region information corresponding to the audio information comprises: querying the number of the server that issued the audio information, and querying and extracting, according to that number, the region information corresponding to the audio information.
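The number-to-region lookup might be sketched as a prefix table; the area-code prefixes and region names below are illustrative assumptions, not values from the patent:

```python
# Hypothetical mapping from server-number prefixes to regions:
NUMBER_TO_REGION = {"020": "guangdong", "028": "sichuan"}

def region_from_number(server_number: str) -> str:
    """Extract the region for the server that issued the audio information."""
    for prefix, region in NUMBER_TO_REGION.items():
        if server_number.startswith(prefix):
            return region
    raise KeyError(f"no region registered for {server_number}")

print(region_from_number("02881234567"))  # sichuan
```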
In the speech recognition method provided by the present invention, step S3 further comprises: setting a keyword or phrase, starting a timer when recognition of the audio information begins, stopping the timer when the keyword or phrase is recognized, and outputting the recognition time.
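The keyword timer can be sketched over a stream of partial recognition results. The stream contents are simulated; a real recognizer would emit hypotheses incrementally:

```python
import time

def timed_keyword_recognition(partial_results, keyword):
    """Start a timer when recognition begins, stop when the keyword appears
    in a partial hypothesis, and return the elapsed recognition time."""
    start = time.monotonic()
    for partial in partial_results:      # partial hypotheses, in order
        if keyword in partial:
            return time.monotonic() - start
    return None                          # keyword never recognized

# Simulated stream of growing partial recognition results:
stream = iter(["查询", "查询话费", "查询话费余额"])
elapsed = timed_keyword_recognition(stream, "余额")
```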
The present invention further provides a speech recognition system, comprising:
a collection module, comprising a first collection unit for collecting audio information to be recognized and a second collection unit for collecting region information corresponding to the audio information;
a scheduling module, for selecting and calling, according to the region information, the speech database and grammar database corresponding to that region, and for calling the corresponding grammar file in the grammar database; and
a recognition module, for recognizing the audio information according to the grammar file and the speech database.
In the speech recognition system provided by the present invention, the collection module further comprises a third collection unit for collecting business-type information corresponding to the audio information, and the scheduling module is further configured to call, according to the business-type information, the grammar file in the grammar database corresponding to that business-type information.
In the speech recognition system provided by the present invention, the system further comprises a storage module for storing the multiple speech databases and multiple grammar databases established according to the different regions, with grammar files generated in each grammar database according to business type.
In the speech recognition system provided by the present invention, the second collection unit comprises a first query subunit for querying the number of the server that issued the audio information, and a first extraction subunit for querying and extracting, according to that number, the region information corresponding to the audio information.
In the speech recognition system provided by the present invention, the system further comprises a timing module and a setting module. The setting module is used to set a keyword or phrase; the timing module starts timing when the recognition module begins recognizing the audio information, stops timing when the recognition module recognizes the keyword or phrase, and outputs the recognition time.
Compared with the prior art, the speech recognition system and method provided by the present invention have the following beneficial effects. When recognizing the audio information to be recognized, the system calls the speech database and grammar database corresponding to the region information of that audio information, thereby avoiding the drop in recognition rate caused by the same vocabulary being pronounced with different regional accents. Furthermore, because business types differ from region to region, calling the grammar database appropriate to each region's business types improves the efficiency of speech recognition. The invention therefore has the beneficial effect of improving the multi-timbre recognition rate.
Brief Description of the Drawings
The present invention will be further described below with reference to the accompanying drawings and embodiments, in which:
Fig. 1 is a functional block diagram of the speech recognition system in the first embodiment of the present invention;
Fig. 2 is a functional block diagram of the second collection unit in the embodiment shown in Fig. 1;
Fig. 3 is a functional block diagram of the speech recognition system in the second embodiment of the present invention;
Fig. 4 is a functional block diagram of the speech recognition system in the third embodiment of the present invention;
Fig. 5 is a functional block diagram of the speech recognition system in the fifth embodiment of the present invention;
Fig. 6 is a flowchart of the speech recognition method in the first embodiment of the present invention;
Fig. 7 is a flowchart of the speech recognition method in the second embodiment of the present invention;
Fig. 8 is a flowchart of the speech recognition method in the third embodiment of the present invention.
Detailed Description of the Embodiments
To overcome the defect in the prior art that a change in timbre greatly reduces the speech recognition rate, the innovation of the present invention is to provide corresponding speech databases and grammar databases for the different timbres of different regions, thereby improving the speech recognition rate.
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
Fig. 1 shows the speech recognition system in the first embodiment of the present invention. It is mainly applied to business monitoring: a monitoring server automatically simulates a real user issuing instructions to an operator's server and recognizes the audio information with which the operator's server replies, in order to judge whether the business service level meets expectations. The speech recognition system comprises a collection module 1, a scheduling module 2, and a recognition module 3. The collection module 1 comprises a first collection unit 11 for collecting the audio information to be recognized and a second collection unit 12 for collecting the region information corresponding to that audio information. The scheduling module 2 selects and calls, according to the region information, the speech database and grammar database corresponding to that region, and calls the corresponding grammar file in the grammar database. The recognition module 3 recognizes the audio information according to the called grammar file and speech database and outputs the recognition result. The collection module 1, scheduling module 2, and recognition module 3 are connected in sequence.
The speech recognition system is implemented by installing a program on the monitoring server; that is, the CPU of the monitoring server realizes the functions of the collection module 1, the scheduling module 2, and the recognition module 3 by executing the software programs that implement them.
The monitoring server simulates a real user and issues instructions to the operator's server, and the operator's server automatically replies with audio information. The first collection unit 11 of the collection module 1 installed on the monitoring server collects this audio information to be recognized over the communication link. The second collection unit 12 of the collection module 1 obtains the region information corresponding to the audio information to be recognized and sends it to the scheduling module 2. According to this region information, the scheduling module 2 calls the corresponding speech database and grammar database, and calls the corresponding grammar file in the grammar database. The recognition module 3 recognizes the audio information according to the called speech database and grammar file, and outputs the recognition result.
As shown in Fig. 2, the second collection unit 12 may comprise a first query subunit for querying the number of the operator's server and a first extraction subunit for querying and extracting, according to that number, the region information of the operator's server. The scheduling module 2 then schedules the speech database and grammar database corresponding to that region information.
It will be understood that the second collection unit 12 may also obtain the region information corresponding to the audio information to be recognized in other ways.
In this embodiment, the recognition module 3 implements speech recognition using the template-matching HMM method. The recognition module 3 comprises a feature extraction unit, an acoustic recognition unit, and a language recognition unit. The feature extraction unit performs feature extraction on the speech waveform corresponding to the collected audio information to obtain acoustic speech features. Conventional speech feature-extraction algorithms may be applied to the waveform, for example extracting MFCC (mel-frequency cepstral coefficients), LPC (linear predictive coding coefficients), or speech energy. The acoustic recognition unit compares the feature quantities of each acoustic model in the speech database called by the scheduling module 2, in turn, against the feature quantities extracted by the feature extraction unit, to obtain the phoneme string corresponding to the audio information.
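The framing stage of such a front end can be sketched as follows. For brevity this computes only per-frame log energy, one of the features named above, rather than full MFCC or LPC analysis; the frame sizes assume 8 kHz audio and are illustrative:

```python
import math

def frame_energies(samples, frame_len=160, hop=80):
    """Split the waveform into overlapping frames (20 ms frames, 10 ms hop
    at 8 kHz) and compute per-frame log energy; a full front end would
    compute MFCC or LPC features in the same per-frame loop."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-10))  # floor avoids log(0) in silence
    return feats

# A 100 ms synthetic 440 Hz tone as a stand-in waveform:
wave = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
feats = frame_energies(wave)
```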
Here, an acoustic model is data obtained by modeling which feature quantities a given speech sound will produce. Because each region pronounces the same word differently, its acoustic models also differ; multiple speech databases must therefore be established, one per region, each containing acoustic models built according to that region's pronunciation habits. This improves both the accuracy and the speed of speech recognition. The language recognition unit recognizes the phoneme string according to the grammar file called by the scheduling module 2, obtains a word string, and outputs it. Because the business conditions of each region differ, establishing a corresponding grammar database for each region's business conditions improves the speech recognition rate.
For the same business in the same region, the spoken vocabulary is essentially fixed. A small-scope grammar file can therefore be written for each business. A grammar file combines a set of vocabulary phrases according to specific rules into a data model that constrains the output of the speech recognition unit. Accordingly, as shown in Fig. 3, in the second embodiment, building on the first embodiment, the collection module 1 may further comprise a third collection unit 13 for collecting the business-type information corresponding to the audio information. The scheduling module 2 first calls, according to the region information, the grammar database corresponding to that region, and then calls, according to the business-type information, the grammar file corresponding to that business type within that grammar database. A grammar file generated for a business type mainly contains the key vocabulary phrases related to that business, so recognition efficiency is improved.
For example, region B has four businesses: b1, b2, b3, and b4. For region B, a grammar database is established according to these four businesses and correspondingly contains four grammar files. Each grammar file contains only the common grammatical vocabulary related to its business and the phoneme strings corresponding to that vocabulary. When the recognition unit recognizes such a phoneme string, it outputs the corresponding word string as the recognition result.
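Region B's four per-business grammar files might be modeled as small lookup tables mapping phoneme strings to word strings. The romanized phoneme strings and business contents here are invented for illustration:

```python
# Hypothetical grammar tables for region B's four businesses b1..b4,
# mapping phoneme strings to word strings:
GRAMMAR_B = {
    "b1": {"chongzhi": "充值", "huafei": "话费"},
    "b2": {"liuliang": "流量"},
    "b3": {"tingji": "停机"},
    "b4": {"caidan": "菜单"},
}

def decode(phonemes, region_grammar, business):
    """Look the recognized phoneme string up in the small per-business
    grammar file and return the corresponding word string (None if the
    phoneme string is outside that business's grammar)."""
    return region_grammar[business].get(phonemes)

print(decode("chongzhi", GRAMMAR_B, "b1"))  # 充值
```

Keeping each table small is what makes the per-business grammar files fast to search.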
In this embodiment, the specific procedure by which the scheduling module 2 schedules the speech database and grammar database is as follows: obtain the audio information and the corresponding province information; obtain the current speech-database information (the region it corresponds to, its name, and its path); obtain the information of the speech database to be used for recognition (the province, database name, and database path to be recognized); and compare the current speech-database information with the information of the database to be used. If they do not match, the backup speech database of the corresponding province is swapped with the current speech database, and the grammar file of the province now corresponding to the current speech database is returned. If they match, the grammar file of the province corresponding to the current speech database is returned directly.
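The compare-and-switch procedure can be sketched in memory; the region names, paths, and single-entry grammar table are assumptions for the example:

```python
# Hypothetical in-memory model of the scheduling procedure:
current_db = {"region": "guangdong", "name": "speech_gd", "path": "/db/current"}
backups = {"sichuan": {"name": "speech_sc", "path": "/db/backup/sichuan"}}
grammars = {"guangdong": "/gram/guangdong", "sichuan": "/gram/sichuan"}

def dispatch(region):
    """Compare the requested region with the loaded database; swap in the
    backup only on a mismatch, then return that region's grammar path."""
    global current_db
    if current_db["region"] != region:
        backup = backups.pop(region)
        # The displaced database becomes the backup for its own region.
        backups[current_db["region"]] = {"name": current_db["name"],
                                         "path": current_db["path"]}
        current_db = {"region": region, **backup}
    return grammars[region]

print(dispatch("sichuan"))  # /gram/sichuan
```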
As shown in Fig. 4, in the third embodiment, building on the second embodiment, the speech recognition system provided by the present invention further comprises a storage module 4. The storage module 4 stores the multiple speech databases and multiple grammar databases generated according to the different regions, and the scheduling module 2 calls the speech database and grammar database from the storage module 4.
In the fourth embodiment, building on the third embodiment, the speech recognition system provided by the present invention further comprises a repair module. When the scheduling module 2 fails to schedule a speech database, the repair module automatically repairs it. In practice, such failures have many causes. Speech-database switching can be performed by renaming: if a rename goes wrong, the automatic repair procedure corrects the wrongly renamed database, and if a database has been manually deleted, the automatic repair program locates it in the backup database path and copies it to the active database path. The database may also simply be occupied, in which case the scheduler must wait until it is released before calling it again. Finally, the storage module 4 may contain no speech database for the region at all, in which case a prompt indicating the missing database must be issued. Each speech database has an identical backup, so when calling a region's speech database fails, the system can switch to calling the corresponding backup database.
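The copy-from-backup repair path can be sketched with real files in a temporary directory. The file names are illustrative assumptions:

```python
import os
import shutil
import tempfile

def repair_speech_db(db_path, backup_path):
    """If the active speech database file is missing (e.g. deleted or
    mis-renamed), restore it from the per-region backup copy."""
    if not os.path.exists(db_path):
        shutil.copy(backup_path, db_path)
        return "restored"
    return "ok"

# Simulate a deleted active database:
tmp = tempfile.mkdtemp()
backup = os.path.join(tmp, "backup_gd.db")
active = os.path.join(tmp, "current.db")
with open(backup, "w") as f:
    f.write("acoustic models")
status = repair_speech_db(active, backup)
```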
It will be understood that, building on the above embodiments, in the fifth embodiment, as shown in Fig. 5, the speech recognition system provided by the present invention further comprises a setting module 5 and a timing module 6. The setting module 5 can set the recognition sensitivity, the language to be recognized, and so on. The setting module 5 can also set a keyword or phrase; the timing module 6 starts timing when the recognition module 3 begins recognition, stops timing when the recognition module 3 recognizes the keyword or phrase preset by the setting module 5, and outputs the time the recognition module 3 took to recognize the keyword or phrase. This time can be used to judge whether the recognition efficiency of the recognition module 3 meets expectations.
As shown in Fig. 6, the present invention further provides a speech recognition method, which in the first embodiment comprises the following steps:
S1: collect the audio information to be recognized and the region information corresponding to the audio information. In this step, the monitoring server simulates a real person entering instructions and sends them to the operator's server, which automatically replies with audio information according to those instructions. The first collection unit 11 of the collection module 1 installed on the monitoring server collects this audio information over the communication link. The second collection unit 12 of the collection module 1 can query the number of the operator's server that issued the audio information, then query and extract, according to that number, the region information corresponding to the operator's server, which is the region information corresponding to the audio information.
S2: select, according to the region information, the speech database and grammar database corresponding to that region, and call the grammar file in the grammar database. In this step, the scheduling module 2, according to the region information collected by the second collection unit 12, calls from the storage module 4 the speech database and grammar database corresponding to that region information, and further schedules the grammar file in that grammar database.
S3: the recognition module 3 recognizes the audio information according to the scheduled grammar file and speech database, and outputs the recognition result.
Building on the first embodiment of the speech recognition method, as shown in Fig. 7, in the second embodiment step S1 may further comprise: the third collection unit 13 of the collection module 1 collecting the business-type information corresponding to the audio information. Correspondingly, in step S2, calling the grammar file in the grammar database specifically means: the scheduling module 2 calls, according to the business-type information, the grammar file in the grammar database corresponding to that business-type information. In this step, the recognition sensitivity of the recognition module can be preset according to the business type to be recognized. A keyword or phrase can also be set: the timing module 6 starts timing when the recognition module 3 begins recognition, stops when the recognition module 3 recognizes the keyword or phrase preset by the setting module 5, and outputs the time the recognition module 3 took to recognize it. This time can be used to judge whether the recognition efficiency of the recognition module 3 meets expectations. As shown in Fig. 8, building on the second embodiment, the speech recognition method in the third embodiment may further comprise step S0: establishing multiple speech databases and multiple grammar databases according to the different regions and saving them in the storage module 4, with multiple grammar files generated in each grammar database according to the different business types.
In the fourth embodiment, the region information is province information, and the speech recognition method includes the following steps:
S1: The acquisition module 1 collects the path and name of the audio information to be recognized, the province information corresponding to that audio information, the business information corresponding to that audio information, and the keywords related to that business, and transmits the collected information to the scheduling module.
S2: The scheduling module 2 reads the province, name, and path of the current speech database, and compares the region information of the current speech database with the province information corresponding to the audio information to be recognized.
If the scheduling module 2 determines that the province information corresponding to the current speech database is the same as the province information corresponding to the audio information to be recognized, no speech database switch is needed, and it directly returns the path and name of the grammar database corresponding to the province information of the audio information to be recognized.
If the scheduling module 2 determines that the province information of the current speech database differs from the incoming province, the scheduling module 2 switches speech databases: it renames the current speech database as the backup speech database of its own province, then renames the speech database corresponding to the audio information to be recognized as the current speech database, and finally returns the grammar database of the province corresponding to the new current speech database. It then invokes the grammar file in that grammar database according to the business type information.
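The rename-based switch can be sketched as below. The file-naming scheme (`current.db`, `<province>_backup.db`, `<province>_grammar`) is an assumption for illustration; the patent only specifies that the swap is performed by renaming databases.

```python
import os

def switch_speech_database(db_dir, current_province, target_province):
    """Sketch of the scheduling module's switch in step S2 (names assumed)."""
    if current_province == target_province:
        # Same province: no switch needed, return the grammar DB directly.
        return os.path.join(db_dir, f"{target_province}_grammar")
    # 1) Rename the current speech DB to the backup DB of its own province.
    os.rename(os.path.join(db_dir, "current.db"),
              os.path.join(db_dir, f"{current_province}_backup.db"))
    # 2) Rename the target province's backup DB to be the current speech DB.
    os.rename(os.path.join(db_dir, f"{target_province}_backup.db"),
              os.path.join(db_dir, "current.db"))
    # 3) Return the grammar database matching the new current province.
    return os.path.join(db_dir, f"{target_province}_grammar")
```

Renaming rather than copying keeps the switch cheap regardless of database size, which is consistent with the speed goal stated in the summary.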
S3: If the scheduling module 2 fails to schedule the speech database and the grammar database, the recognition process exits directly, and automatic retrieval and automatic repair of the speech database are performed. If scheduling succeeds, the recognition module 3 is configured according to the parameters required for recognition, such as sensitivity, accuracy, and recognition language.
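The control flow of step S3 can be sketched as follows. The callables `load_db` and `repair_db` and the parameter names are assumptions standing in for the patent's scheduling and repair mechanisms.

```python
def schedule_or_repair(load_db, repair_db, params):
    """Sketch of step S3: on a scheduling failure, trigger automatic repair
    and exit the recognition flow; on success, configure the recognizer."""
    try:
        db = load_db()
    except OSError:
        repair_db()   # automatic retrieval and repair of the speech database
        return None   # exit the recognition flow directly
    # Scheduling succeeded: configure sensitivity, accuracy, language, etc.
    return {"db": db, **params}
```

Returning `None` models "exit the recognition process"; a real system might instead raise a dedicated exception or post an error message.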
A recognition context is created for the recognition process of the recognition module 3, and is used to record information generated during recognition, including success information, error information, exception information, and the like.
A recognition audio stream is created in the recognition module 3, the Format parameters of the audio stream (such as playback frequency in hertz) are set, and the audio information to be recognized is bound to the recognition audio stream. The recognition module 3 is then activated; it loads the speech database and the grammar file and imports the recognition audio stream to start the recognition process. Once activated, the recognition module 3 waits for the recognition-specific Windows custom message to be triggered. When that message is triggered: if its parameter indicates the end of recognition, the recognition module exits the recognition process; if its parameter is recognized content, the content is extracted and saved into the recognition result; if its parameter indicates a recognition error, the error is handled according to its specific type.
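The three-way dispatch on the custom message's parameter can be sketched as below. The real mechanism is a Windows custom message loop; here it is simulated with a list of `(kind, payload)` tuples, and the message-kind names are assumptions.

```python
# Message kinds standing in for the recognition-specific Windows custom
# message parameters described in the patent (names are assumptions).
MSG_END, MSG_CONTENT, MSG_ERROR = "end", "content", "error"

def handle_recognition_messages(messages):
    """Dispatch each (kind, payload) message as step S3 describes:
    'end' exits the loop, 'content' is saved into the recognition result,
    'error' is routed to per-type handling (here just collected)."""
    result, errors = [], []
    for kind, payload in messages:
        if kind == MSG_END:
            break                   # recognition finished: exit the process
        elif kind == MSG_CONTENT:
            result.append(payload)  # extract and save recognized content
        elif kind == MSG_ERROR:
            errors.append(payload)  # handle according to the error type
    return result, errors

result, errors = handle_recognition_messages([
    ("content", "check"), ("content", "balance"),
    ("error", "low-confidence"), ("end", None),
    ("content", "ignored"),  # arrives after 'end', so never processed
])
```

In a Win32 implementation the same dispatch would live in a window procedure keyed on a `WM_APP`-range message identifier, with the parameter carried in `wParam`/`lParam`.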
In this embodiment, other custom prompt messages may also be created, with care taken that they do not conflict with existing prompt messages; a message-processing function for the recognition-specific messages and the recognition message types to be monitored closely may also be set.
In summary, because the speech recognition system and method provided by the present invention set up speech databases and grammar databases for different regions, invoking the corresponding speech database and grammar database during recognition can improve the speech recognition rate and recognition speed. In addition, a grammar file is set up in the grammar database for each type of business; determining the business type during recognition and invoking the corresponding grammar file can further improve the speech recognition rate and recognition speed.
It should be understood that the embodiments of the present invention have been described above in conjunction with the accompanying drawings, but the present invention is not limited to the above specific implementations, which are merely illustrative rather than restrictive. Under the teaching of the present invention, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present invention or the scope protected by the claims, all of which fall within the protection of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410062780.2A CN103811000A (en) | 2014-02-24 | 2014-02-24 | Voice recognition system and voice recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410062780.2A CN103811000A (en) | 2014-02-24 | 2014-02-24 | Voice recognition system and voice recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103811000A true CN103811000A (en) | 2014-05-21 |
Family
ID=50707679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410062780.2A Pending CN103811000A (en) | 2014-02-24 | 2014-02-24 | Voice recognition system and voice recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103811000A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1412741A (en) * | 2002-12-13 | 2003-04-23 | 郑方 | Chinese speech identification method with dialect background |
CN101026644A (en) * | 2006-02-22 | 2007-08-29 | 华为技术有限公司 | Communication terminal and method for displaying mobile phone calling initiation place information |
CN101329868A (en) * | 2008-07-31 | 2008-12-24 | 林超 | Speech recognition optimizing system aiming at locale language use preference and method thereof |
CN102968987A (en) * | 2012-11-19 | 2013-03-13 | 百度在线网络技术(北京)有限公司 | Speech recognition method and system |
US8504370B2 (en) * | 2006-10-23 | 2013-08-06 | Sungkyunkwan University Foundation For Corporate Collaboration | User-initiative voice service system and method |
CN103578467A (en) * | 2013-10-18 | 2014-02-12 | 威盛电子股份有限公司 | Acoustic model building method, speech recognition method and electronic device thereof |
- 2014-02-24: CN CN201410062780.2A patent/CN103811000A/en active Pending
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105206263A (en) * | 2015-08-11 | 2015-12-30 | 东莞市凡豆信息科技有限公司 | Speech Semantic Recognition Method Based on Dynamic Dictionary |
CN105225665A (en) * | 2015-10-15 | 2016-01-06 | 桂林电子科技大学 | A kind of audio recognition method and speech recognition equipment |
CN106683662A (en) * | 2015-11-10 | 2017-05-17 | 中国电信股份有限公司 | Speech recognition method and device |
CN106128462A (en) * | 2016-06-21 | 2016-11-16 | 东莞酷派软件技术有限公司 | Speech Recognition Method and System |
CN106328146A (en) * | 2016-08-22 | 2017-01-11 | 广东小天才科技有限公司 | Video subtitle generating method and device |
CN107958666A (en) * | 2017-05-11 | 2018-04-24 | 小蚁科技(香港)有限公司 | Method for accent-invariant speech recognition |
CN108648749A (en) * | 2018-05-08 | 2018-10-12 | 上海嘉奥信息科技发展有限公司 | Medical speech recognition construction method and system based on voice activated control and VR |
CN114582321A (en) * | 2020-12-01 | 2022-06-03 | 中国联合网络通信集团有限公司 | Speech recognition accuracy improvement method, system, electronic device and storage medium |
CN114582321B (en) * | 2020-12-01 | 2024-11-26 | 中国联合网络通信集团有限公司 | Method, system, electronic device and storage medium for improving speech recognition accuracy |
CN112530440A (en) * | 2021-02-08 | 2021-03-19 | 浙江浙达能源科技有限公司 | Intelligent voice recognition system for power distribution network scheduling tasks based on end-to-end model |
CN112530440B (en) * | 2021-02-08 | 2021-05-07 | 浙江浙达能源科技有限公司 | Intelligent voice recognition system for power distribution network scheduling tasks based on end-to-end model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20140521 |