
CN104732977B - Online spoken-language pronunciation quality evaluation method and system - Google Patents

Online spoken-language pronunciation quality evaluation method and system

Info

Publication number
CN104732977B
CN104732977B (application CN201510102425.8A)
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN201510102425.8A
Other languages
Chinese (zh)
Other versions
CN104732977A (en)
Inventor
李心广
李苏梅
徐集优
张胜斌
陈君宇
李升恒
朱小凡
王泽铿
许港帆
陈嘉华
林帆
Current Assignee
Guangdong University of Foreign Studies
Original Assignee
Guangdong University of Foreign Studies
Priority date
Filing date
Publication date
Application filed by Guangdong University of Foreign Studies
Priority to CN201510102425.8A
Publication of CN104732977A
Application granted
Publication of CN104732977B


Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses an online spoken-language pronunciation quality evaluation method and system. The method includes: receiving, over a network, test speech collected by a mobile client; preprocessing the received test speech; extracting speech feature parameters from the preprocessed test speech to obtain the feature parameters of the test speech; evaluating the test speech according to the feature parameters of the test speech and the feature parameters of standard speech to obtain an evaluation result; and feeding the evaluation result back to the mobile client over the network, where it is displayed. The invention enables online, convenient, and accurate evaluation of spoken-language pronunciation quality.

Description

Online spoken-language pronunciation quality evaluation method and system

Technical Field

The present invention relates to the technical field of speech recognition and evaluation, and in particular to an online spoken-language pronunciation quality evaluation method and system.

Background Art

The application of signal-processing technology to language learning is an important part of integrating information technology with language study. Its goal is to combine the latest speech technology with current teaching and learning methods to build computer-assisted language learning systems; within such systems, spoken-language pronunciation quality evaluation has long attracted attention as a key aid to language learning.

However, traditional pronunciation quality evaluation systems are mostly confined to language learning machines or personal computers, which are inconvenient to carry and depend on wired network connections, making it hard for learners to practice speaking anytime, anywhere. Existing pronunciation-scoring systems for smartphones run offline, but mainstream smartphones cannot support large-scale data storage or complex speech computation, which limits the complexity of the evaluation algorithms that can run on them; as a result, their scores cannot truly reflect a learner's spoken-English pronunciation quality. Moreover, deploying the evaluation system across language learning machines, personal computers, and mobile devices hinders data updates, storage, and algorithm improvement. Furthermore, existing systems consider too narrow a set of evaluation indicators, mostly a single indicator or only a few, and so cannot provide a scientific, comprehensive, and accurate assessment of a user's pronunciation; they often just output a score, with no qualitative evaluation or feedback.

Summary of the Invention

The purpose of the embodiments of the present invention is to provide an online spoken-language pronunciation quality evaluation method and system, so as to achieve online, convenient, and accurate evaluation of spoken pronunciation quality.

In one aspect, an embodiment of the present invention provides an online spoken-language pronunciation quality evaluation method, including:

receiving, over a network, test speech collected by a mobile client;

preprocessing the received test speech;

extracting speech feature parameters from the preprocessed test speech to obtain the feature parameters of the test speech;

evaluating the test speech according to the feature parameters of the test speech and the feature parameters of standard speech, to obtain an evaluation result; and

feeding the evaluation result back to the mobile client over the network, and displaying the evaluation result on the mobile client.

Preferably, the online spoken-language pronunciation quality evaluation method further includes:

storing the evaluation result in a database, and performing statistical analysis on evaluation results to obtain statistical results; and

sending the statistical results to a web management terminal, where they are displayed.

Preferably, the online spoken-language pronunciation quality evaluation method further includes:

acquiring standard speech;

preprocessing the standard speech; and

extracting speech feature parameters from the preprocessed standard speech to obtain the feature parameters of the standard speech.

Preferably, the preprocessing includes pre-emphasis, framing, windowing, and endpoint detection.

Preferably, extracting speech feature parameters from the preprocessed test speech to obtain the feature parameters of the test speech includes:

performing a discrete Fourier transform on the test speech to obtain its spectral coefficients, filtering the spectral coefficients with a bank of triangular filters, taking the logarithm of the filtered data, and applying a discrete cosine transform to obtain the MFCC feature parameters of the test speech;

extracting the fundamental-frequency, short-time energy, and formant features of the test speech, and combining them into the emotion feature parameters of the test speech;

calculating the utterance duration of the test speech to obtain its duration feature parameter;

dividing the test speech into stress units, extracting the start-frame and end-frame position groups of the stressed segments, and obtaining the stress-position feature parameters of the test speech;

dividing the test speech into speech units, calculating the duration of each unit, and obtaining the speech-unit duration feature parameters of the test speech; and

extracting the pitch of each frame of the test speech with a time-domain autocorrelation function method to obtain the pitch feature parameters of the test speech.

Preferably, evaluating the test speech according to the feature parameters of the test speech and of the standard speech to obtain an evaluation result includes:

performing speech recognition on the test speech based on its MFCC feature parameters, using a probabilistic neural network ensemble recognition model built on segmental clustering, to obtain a speech recognition result; computing the similarity between the MFCC feature parameters of the test speech and those of the standard speech to obtain an MFCC correlation coefficient; and calculating an accuracy score for the test speech from the recognition result and the MFCC correlation coefficient;

performing emotion recognition on the test speech based on its emotion feature parameters, using an SVM emotion model, to obtain an emotion recognition result; computing the similarity between the emotion feature parameters of the test speech and those of the standard speech to obtain an emotion correlation coefficient; and calculating an emotion score for the test speech from the emotion recognition result and the emotion correlation coefficient;

obtaining the speech-rate ratio of the standard speech to the test speech from their duration feature parameters, and calculating a speech-rate score for the test speech from that ratio;

comparing the stress positions of the test speech and the standard speech based on their stress-position feature parameters, and calculating a stress score for the test speech from the stress-position difference;

obtaining the dPVI parameter of the test speech from the speech-unit duration feature parameters of the test speech and of the standard speech using the dPVI algorithm, and calculating a rhythm score for the test speech from the dPVI parameter; and

obtaining the pitch difference between the standard speech and the test speech from their pitch feature parameters using the DTW algorithm, and calculating an intonation score for the test speech from the pitch difference.

Preferably, evaluating the test speech according to the feature parameters of the test speech and of the standard speech to obtain an evaluation result further includes:

computing a weighted sum of the accuracy, emotion, speech-rate, stress, rhythm, and intonation scores to obtain a composite score; mapping each of these scores, together with the composite score, to a grade through the predefined score-to-grade mapping, thereby obtaining accuracy, emotion, speech-rate, stress, rhythm, intonation, and composite grade evaluations for the test speech; and taking these grade evaluations as the evaluation result of the test speech.

Preferably, the online spoken-language pronunciation quality evaluation method further includes:

generating pronunciation guidance for the user's spoken pronunciation according to the evaluation result; and

feeding the pronunciation guidance back to the mobile client over the network, and displaying it on the mobile client.

In another aspect, an embodiment of the present invention provides an online spoken-language pronunciation quality evaluation system, including a mobile client and a server connected through a network.

The mobile client includes:

a speech acquisition unit, configured to collect test speech and send it to the server over the network.

The server includes:

a preprocessing unit, configured to preprocess the received test speech;

a feature extraction unit, configured to extract speech feature parameters from the preprocessed test speech to obtain the feature parameters of the test speech; and

a speech evaluation unit, configured to evaluate the test speech according to the feature parameters of the test speech and the feature parameters of standard speech to obtain an evaluation result, and to feed the evaluation result back to the mobile client over the network.

The mobile client further includes:

a data display unit, configured to display the evaluation result.

Preferably, the system further includes a web management terminal connected to the server through the network, and the server further includes a database and a statistical analysis unit.

The database is configured to store the evaluation results.

The statistical analysis unit is configured to perform statistical analysis on the evaluation results to obtain statistical results, and to send the statistical results to the web management terminal.

The web management terminal is configured to display the received statistical results.

Compared with the prior art, the embodiments of the present invention have the following advantages.

The embodiments build a mobile client and a server on a C/S (Client/Server) architecture: the mobile client collects the user's test speech and sends it to the server; the server evaluates the test speech and returns the evaluation result to the mobile client, which displays it. Users can conveniently reach the server over the mobile Internet to obtain services and data; the corpus and the evaluation methods stay synchronized on the server side, and the server can run speech-analysis algorithms with better performance and accuracy than a mobile device could.

In addition, the embodiments build a web management terminal on a B/S (Browser/Server) architecture, through which statistics on mobile users' pronunciation quality evaluations can be retrieved in real time from the server's database via a web browser. This gives third parties (such as teachers) visibility into learners' spoken pronunciation and helps them devise offline coaching and improvement strategies.

Furthermore, the embodiments evaluate the test speech along multiple dimensions with reasonable, credible per-indicator methods, and can feed pronunciation guidance back to the user, helping to correct pronunciation errors and improve pronunciation quality.

Brief Description of the Drawings

Fig. 1 is a flow chart of the steps of an embodiment of the online spoken-language pronunciation quality evaluation method provided by the present invention;

Fig. 2 is a schematic diagram of the construction of the probabilistic neural network ensemble classifier provided by the present invention;

Fig. 3 is a schematic diagram of the C/S architecture of an embodiment of the online spoken-language pronunciation quality evaluation system provided by the present invention; and

Fig. 4 is a schematic diagram of the B/S architecture of the online spoken-language pronunciation quality evaluation system shown in Fig. 3.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Note that the labels preceding the steps serve only to identify the steps more clearly; they do not impose any fixed order of execution among them.

Referring to Fig. 1, a flow chart of the steps of an embodiment of the online spoken-language pronunciation quality evaluation method provided by the present invention, the method includes the following steps.

S1. Receive, over a network, test speech collected by a mobile client.

In a concrete implementation, the mobile client is installed as an application on the user's phone or other mobile device. It records by invoking the device's recording facility, captures the speech the user produces during the oral test, and generates an audio file in a uniform format; the mobile client compresses and encodes the audio file and sends it to the server over the network. The audio file is preferably in wav format, the network is preferably the mobile Internet, and the mobile client and server transfer data over a socket based on the TCP/IP (Transmission Control Protocol/Internet Protocol) communication protocol.
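The compress-and-send transfer just described can be sketched as below. The text only says the wav file is compressed, encoded, and sent over a TCP/IP socket, so the zlib compression and the 4-byte length prefix here are illustrative framing choices, and the function names are hypothetical.

```python
import socket
import struct
import zlib

def send_test_audio(wav_bytes, host, port):
    """Mobile-client side: compress the recorded wav data and send it over a
    TCP socket, length-prefixed so the server knows how much to read."""
    payload = zlib.compress(wav_bytes)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack("!I", len(payload)) + payload)

def _recv_exact(conn, n):
    """Read exactly n bytes from the connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before full message arrived")
        buf += chunk
    return buf

def recv_test_audio(conn):
    """Server side: read the length prefix, then the payload, and decompress
    back to the original wav bytes."""
    size = struct.unpack("!I", _recv_exact(conn, 4))[0]
    return zlib.decompress(_recv_exact(conn, size))
```

On the server, `recv_test_audio` would run once per accepted connection, before the preprocessing of step S2 begins.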

S2. Preprocess the received test speech.

After receiving the data sent by the mobile client, the server decompresses and decodes it to restore the original test-speech file. Before the test speech is analyzed and processed, it is preprocessed to remove effects introduced by the speaker's vocal organs and by the recording equipment, providing a high-quality data source for the subsequent feature extraction and thereby improving the quality of speech processing. The preprocessing in this embodiment includes, but is not limited to, pre-emphasis, framing, windowing, and endpoint detection, as follows:

2.1) Pre-emphasis: the average power spectrum of the test speech is shaped by glottal excitation and lip/nostril radiation, rolling off at about 6 dB/oct above roughly 800 Hz, so the higher the frequency, the smaller the corresponding component. The high-frequency part of the test speech therefore needs to be boosted before analysis. This embodiment applies a 6 dB/oct high-frequency pre-emphasis digital filter before analysis, which flattens the spectrum of the test speech and keeps it even from the low to the high end of the band. Pre-emphasis is computed as follows:

y(n) = x(n) - 0.9375 * x(n-1)  (Formula 1)

where x(n) is the original test speech.
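Formula 1 amounts to a one-tap high-pass filter; a minimal sketch (passing the first sample through unchanged is a boundary-handling assumption, not something the text specifies):

```python
import numpy as np

def pre_emphasis(x, alpha=0.9375):
    """y(n) = x(n) - 0.9375 * x(n-1) (Formula 1); the first sample is kept as-is."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])
```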

2.2) Framing: speech is time-varying, but over a short interval, generally 10-30 ms, its characteristics stay essentially unchanged, i.e. relatively stable. This property of the speech signal is called "short-time stationarity", so the analysis and processing of the test speech is built on short-time analysis, i.e. frame-by-frame processing. Because consecutive portions of the speech signal are correlated, this embodiment frames the test speech with a half-frame overlap.

2.3) Windowing: to emphasize the waveform around each sampling position of the test speech while attenuating the rest of the waveform, this embodiment applies a Hamming window to the test speech. Windowing after framing reduces the Gibbs phenomenon caused by truncation and smooths the spectrum of the test speech. In one realizable form, windowing is computed as follows:

S_ω(n) = y(n) * ω(n)  (Formula 2)

where y(n) is the pre-emphasized speech signal and ω(n) is the window function.
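Steps 2.2 and 2.3 together can be sketched as below; the 256-sample frame length is an illustrative choice (the text fixes only the 10-30 ms frame range and the half-frame overlap).

```python
import numpy as np

def frame_and_window(signal, frame_len=256):
    """Split the signal into frames with half-frame (50%) overlap and apply a
    Hamming window to each frame (Formula 2)."""
    hop = frame_len // 2                      # half-frame overlap
    n_frames = 1 + (len(signal) - frame_len) // hop
    win = np.hamming(frame_len)
    return np.stack([signal[i * hop:i * hop + frame_len] * win
                     for i in range(n_frames)])
```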

2.4) Endpoint detection: this embodiment uses the dual-threshold comparison method to detect the start and end points of the test speech. The method uses the short-time energy E and the short-time average zero-crossing rate Z as features; combining the strengths of both makes detection more accurate, effectively reduces processing time, improves the real-time behavior of the system, and excludes noise interference from silent segments, thereby improving speech-processing performance.
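A minimal sketch of the dual-threshold idea, assuming the common two-pass scheme: locate a speech run with a high energy threshold, then extend it outward while the energy stays above a low threshold or the zero-crossing rate stays high. The default threshold values below are illustrative, not values from the text.

```python
import numpy as np

def detect_endpoints(frames, e_high=None, e_low=None, z_thresh=None):
    """Dual-threshold endpoint detection sketch on framed speech.
    Returns (start_frame, end_frame) of the detected utterance, or None."""
    energy = np.sum(frames ** 2, axis=1)                      # short-time energy E
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)  # rate Z
    e_high = e_high if e_high is not None else 0.5 * energy.max()
    e_low = e_low if e_low is not None else 0.1 * energy.max()
    z_thresh = z_thresh if z_thresh is not None else zcr.mean()
    voiced = np.where(energy > e_high)[0]
    if len(voiced) == 0:
        return None
    start, end = voiced[0], voiced[-1]
    # Extend outward using the low energy threshold and the zero-crossing rate.
    while start > 0 and (energy[start - 1] > e_low or zcr[start - 1] > z_thresh):
        start -= 1
    while end < len(frames) - 1 and (energy[end + 1] > e_low or zcr[end + 1] > z_thresh):
        end += 1
    return start, end
```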

S3. Extract speech feature parameters from the preprocessed test speech to obtain the feature parameters of the test speech. The feature parameters of the test speech include MFCC (Mel-Frequency Cepstral Coefficients) feature parameters, emotion feature parameters, duration feature parameters, stress-position feature parameters, speech-unit duration feature parameters, and pitch feature parameters. The extraction performed on the server proceeds as follows:

3.1) Perform a discrete Fourier transform (DFT) on the test speech to obtain its spectral coefficients, filter the spectral coefficients with a bank of triangular filters, take the logarithm of the filtered data, and apply a discrete cosine transform to obtain the MFCC feature parameters of the test speech. The specific steps are as follows.

Apply a discrete Fourier transform to the preprocessed test speech to obtain the spectral coefficients X(k).

Filter the spectral coefficients X(k) with the bank of triangular filters to obtain a set of coefficients m_i, computed as follows:

m_i = ln[X(k) * H_i(k)]  (Formula 3)

where

(Formula 4)

f[i] is the center frequency of the i-th triangular filter, satisfying:

Mel(f[i+1]) - Mel(f[i]) = Mel(f[i]) - Mel(f[i-1])  (Formula 5)

Take the logarithm of the outputs of all the filters, then apply a discrete cosine transform to obtain the cepstral coefficients:

(Formula 6)

where P is the number of triangular filters and C_i are the desired MFCC feature parameters. Preferably, the order of the MFCC feature parameters is set to 12.
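Step 3.1 can be sketched for a single frame as below. Since Formula 4 (the triangular filter H_i(k)) and Formula 6 (the DCT) are not reproduced in the text, a standard mel filterbank and DCT-II are assumed; the 8 kHz sampling rate and 24 filters are likewise illustrative, while the 12 retained coefficients follow the order the text prefers.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sr=8000, n_filters=24, n_ceps=12):
    """MFCC sketch: DFT -> triangular mel filterbank -> log -> DCT."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame)) ** 2
    # Filter centre frequencies equally spaced on the mel scale (Formula 5).
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    # Formula 3: log of the filter outputs; Formula 6: DCT to cepstral coeffs.
    m = np.log(fbank @ spec + 1e-10)
    return np.array([np.sum(m * np.cos(np.pi * q * (np.arange(n_filters) + 0.5)
                                       / n_filters))
                     for q in range(1, n_ceps + 1)])
```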

3.2) Extract the fundamental-frequency, short-time energy, and formant features of the test speech, and combine them into the emotion feature parameters of the test speech.

3.2.1) Fundamental-frequency feature: the pitch period is the periodicity caused by vocal-fold vibration during voiced sounds, and the fundamental frequency is its reciprocal. The fundamental frequency is one of the most important parameters of a speech signal, and research shows that it reflects changes in emotion. Detection methods include, but are not limited to, the autocorrelation function (ACF) method, cepstrum analysis, the average magnitude difference function (AMDF) method, and wavelet methods. This embodiment preferably uses cepstrum analysis: apply a Fourier transform to the preprocessed test speech to obtain its magnitude spectrum, and take the logarithm of the magnitude spectrum to obtain a signal that is periodic in the frequency domain; computing the frequency of that periodicity gives the fundamental-frequency value of the test speech. Applying an inverse Fourier transform to this periodic signal produces a peak at the pitch period. After the fundamental-frequency values are obtained, seven statistical parameters of their variation, such as the maximum, minimum, mean, median, and standard deviation, are computed as the fundamental-frequency features of the test speech.
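The cepstrum-based pitch extraction just described can be sketched per frame as below; the 60-400 Hz search range for plausible pitch values is an illustrative assumption.

```python
import numpy as np

def pitch_cepstrum(frame, sr=8000, fmin=60.0, fmax=400.0):
    """Cepstrum pitch sketch: FFT -> log magnitude -> inverse FFT; the peak
    quefrency inside the plausible pitch-period range gives F0 = sr / period."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
    cepstrum = np.fft.irfft(log_mag)
    qmin, qmax = int(sr / fmax), int(sr / fmin)   # period range in samples
    peak_q = qmin + np.argmax(cepstrum[qmin:qmax])
    return sr / peak_q
```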

3.2.2) Short-time energy feature: the energy of the speech signal correlates strongly with the expression of emotion; higher energy means the voice is relatively louder. In everyday life, people speak louder when angry and tend to speak more quietly when depressed or sad. Speech-signal energy is usually measured either as short-time energy or as short-time average magnitude; here the short-time energy of the test speech is preferably chosen as the energy parameter. The short-time energy is the weighted sum of squares of the sample values in one frame, defined as follows:

(Formula 7)

where x_n(m) is the n-th frame signal of the test speech.

After the short-time energy is obtained, seven statistical parameters of its variation, such as the maximum, minimum, mean, median, and standard deviation, are computed as the short-time energy features of the test speech.
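Formula 7 plus these statistics can be sketched as below, assuming a rectangular (unit) weighting for simplicity; the text names only five of the seven statistics, so the two extras chosen here (range and upper quartile) are illustrative guesses.

```python
import numpy as np

def short_time_energy_features(frames):
    """Per-frame short-time energy (sum of squared samples, Formula 7 with
    unit weights) followed by seven summary statistics."""
    e = np.sum(np.asarray(frames, dtype=float) ** 2, axis=1)
    return np.array([e.max(), e.min(), e.mean(), np.median(e), e.std(),
                     e.max() - e.min(), np.percentile(e, 75)])
```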

3.2.3) Formant feature: formants are important parameters reflecting the characteristics of the vocal tract; when the excitation passes through the vocal tract, formant frequencies arise. In different emotional states, a person's nervous tension differs, deforming the vocal tract and shifting the formant frequencies accordingly. This embodiment preferably extracts the formant parameters of each speech frame by linear prediction, a fast, reliable, and effective method: the first and second formants of the speech signal are obtained by linear prediction and then regularized into 32-order parameters by segmental clustering, serving as the formant features of the test speech. The formant, fundamental-frequency, and short-time energy features are combined into 46-order speech emotion feature parameters.

3.3)计算所述测试语音的发音时长,获取所述测试语音的发音时长特征参数。3.3) Calculating the pronunciation duration of the test speech, and obtaining the characteristic parameter of the pronunciation duration of the test speech.

在具体实施当中，可以通过设定短时能量和过零率的高低限值，对测试语音进行端点检测，来获得测试语音的发音时长。In a specific implementation, endpoint detection can be performed on the test speech by setting high and low thresholds on the short-time energy and the zero-crossing rate, so as to obtain the pronunciation duration of the test speech.
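A minimal sketch of such double-threshold endpoint detection follows. The frame size, hop and threshold factors are illustrative assumptions, not values from the patent.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    # Slice the signal into overlapping frames (one per row).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def endpoint_detect(x, frame_len=256, hop=128):
    # A frame is "active" if its energy exceeds the high threshold, or
    # if it exceeds a low energy threshold while its zero-crossing rate
    # exceeds the ZCR threshold; the first and last active frames bound
    # the utterance, giving the pronunciation duration in frames.
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    e_high, e_low = 0.1 * energy.max(), 0.02 * energy.max()
    z_thresh = 0.5 * zcr.mean()
    active = (energy > e_high) | ((energy > e_low) & (zcr > z_thresh))
    idx = np.flatnonzero(active)
    if idx.size == 0:
        return None
    return int(idx[0]), int(idx[-1])
```

The duration in seconds would follow from the frame indices, the hop size and the sampling rate.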

3.4)对所述测试语音进行重音单元划分,提取重音的起始帧位置组与结束帧位置组,获取所述测试语音的重音位置特征参数。3.4) Divide the test speech into stress units, extract the start frame position group and the end frame position group of the stress, and obtain the stress position characteristic parameters of the test speech.

重音单元划分流程如下:The accent unit division process is as follows:

a.提取测试语音的能量值。测试语音中重读音节响亮的特征将反映到时域上的能量强度,即重音音节表现为语音能量强度大。a. Extract the energy value of the test speech. The characteristics of loud stressed syllables in the test speech will be reflected in the energy intensity in the time domain, that is, the stressed syllables are characterized by high speech energy intensity.

b.对测试语音进行规整。由于说话人语速之间的差距,不同的说话人对同一句子的发音时长存在一定差异,但是不同人对同一句子的发音却遵循重音单元时长占整个句子一定比例的规律。因此,在对测试语音进行评分时,可以通过调取标准语音的发音时长特征参数,将所述测试语音的发音时长按比例规整为与所述标准语音的发音时长相同,有利于数据的处理,也使得系统的评价更为客观。b. Regularize the test voice. Due to the difference in speech speed between speakers, there are certain differences in the pronunciation duration of the same sentence by different speakers, but the pronunciation of the same sentence by different people follows the law that the duration of the stress unit accounts for a certain proportion of the entire sentence. Therefore, when the test speech is scored, the pronunciation duration of the test speech can be adjusted in proportion to be the same as the pronunciation duration of the standard speech by calling the pronunciation duration characteristic parameter of the standard speech, which is beneficial to the processing of data. It also makes the evaluation of the system more objective.

c.提取测试语音的重音音节。在具体实施当中，可以采用双门限比较法来进行重音端点检测，根据测试语音的能量值，逐个搜索测试语音中大于重音阀值Tu的最大语音信号值Smax，然后向信号值Smax左右搜索等于非重音阀值Tl的语音信号值Sl与Sr，将Sl与Sr设置为测试语音的重音信号，并将Sl与Sr之间的信号量值置0，避免重复在Sl与Sr之间搜索。由于测试语音中重读音节有着发音偏长的特征，而第一步搜索出来的重读音节单元可能存在能量值大，即听觉表现为发音响亮，却持续时间很短的问题，这些单元可能是短元音，也可能是信号尖峰的干扰，它们不构成重读音节，因此可以根据重读音节发音偏长的特征将重读音节单元进一步筛选，将重读音节单元的最小时长设定为一个大致重读元音时长(Stressed vowel durations)，优选为100ms，并根据设定的最小时长进行对比。c. Extract the stressed syllables of the test speech. In a specific implementation, a double-threshold comparison method can be used for stress endpoint detection: based on the energy values of the test speech, search one by one for the maximum speech signal value Smax exceeding the accent threshold Tu, then search to the left and right of Smax for the speech signal values Sl and Sr that equal the non-accent threshold Tl, take Sl and Sr as the boundaries of a stress signal of the test speech, and set the signal values between Sl and Sr to 0 to avoid searching between Sl and Sr again. Since stressed syllables in the test speech are characterised by relatively long duration, the units found in the first step may have large energy values (i.e. sound loud) yet last only a very short time; such units may be short vowels or interference from signal spikes and do not constitute stressed syllables. The stressed-syllable units can therefore be further screened by their longer duration: the minimum duration of a stressed-syllable unit is set to an approximate stressed-vowel duration, preferably 100 ms, and each unit is compared against this set minimum duration.

通过以上步骤，完成对句子重音单元的划分，即可知道句子的重音的起始帧位置组与结束帧位置组，并将所述起始帧位置组与结束帧位置组作为所述测试语音的重音位置特征参数。Through the above steps, the division of the sentence into stress units is completed, yielding the start-frame position group and end-frame position group of the sentence's stresses, which are taken as the stress position feature parameters of the test speech.
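The three-step procedure above can be sketched as follows on a frame-energy contour. The thresholds Tu and Tl and the minimum length are caller-supplied, and zeroing each processed region mirrors the "set the values between Sl and Sr to 0" step.

```python
import numpy as np

def find_stress_units(energy, t_u, t_l, min_len):
    # Repeatedly take the largest energy value above the accent
    # threshold t_u, expand left/right until values drop to the
    # non-accent threshold t_l, keep the segment if it is at least
    # min_len frames long, and zero it out to avoid re-finding it.
    e = np.asarray(energy, dtype=float).copy()
    starts, ends = [], []
    while e.max() > t_u:
        peak = int(np.argmax(e))
        left = peak
        while left > 0 and e[left - 1] > t_l:
            left -= 1
        right = peak
        while right < len(e) - 1 and e[right + 1] > t_l:
            right += 1
        if right - left + 1 >= min_len:  # discard spikes / short vowels
            starts.append(left)
            ends.append(right)
        e[left : right + 1] = 0.0
    order = np.argsort(starts)
    return [starts[i] for i in order], [ends[i] for i in order]
```

For 100 ms at a typical 10 ms hop, `min_len` would be about 10 frames.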

3.5)对所述测试语音进行语音单元划分,分别计算每个语音单元的时长,获取所述测试语音的语音单元时长特征参数。所述语音单元的时长是指各个语音单元开始到结束的持续时间。3.5) Divide the test speech into speech units, calculate the duration of each speech unit respectively, and obtain the speech unit duration characteristic parameter of the test speech. The duration of the speech unit refers to the duration from the beginning to the end of each speech unit.

3.6)通过时域上的自相关函数法(ACF)提取所述测试语音每一帧数据的音高,获取所述测试语音的音高特征参数。3.6) Extracting the pitch of each frame of data of the test voice through the autocorrelation function method (ACF) in the time domain, and obtaining the pitch characteristic parameters of the test voice.

自相关函数法是使用自相关函数来计算一个音框s(i)和自身的相似度,其中,i=0~n-1,计算公式如下:The autocorrelation function method is to use the autocorrelation function to calculate the similarity between a sound frame s(i) and itself, wherein, i=0~n-1, the calculation formula is as follows:

acf(τ) = Σ_{i=0}^{n-1-τ} s(i)·s(i+τ) （公式8） (Formula 8)

其中，n是指一帧语音数据的长度，τ是时间延迟量，找出能使acf(τ)在某一个合理的特定区间内的τ值，就可以算出此音框的音高。在具体的ACF计算过程中，将语音帧每次向右平移一点，将平移后的音框和原音框的重叠部分做内积，重复n次后得到的n个内积值就是一个语音帧对应的ACF值。Here n is the length of one frame of speech data and τ is the time delay; finding the appropriate value of τ within a reasonable specific interval yields the pitch of this frame. In the concrete ACF computation, the speech frame is shifted to the right one step at a time, the inner product of the overlap between the shifted frame and the original frame is taken, and the n inner-product values obtained after n repetitions form the ACF values of that speech frame.
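A direct implementation of this ACF pitch estimator might look as follows; the 50-500 Hz pitch search range used to bound the lag interval is an assumption.

```python
import numpy as np

def acf(frame):
    # acf(tau): inner product of the frame with itself shifted right
    # by tau samples, over the overlapping part (Formula 8).
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[: n - tau], frame[tau:]) for tau in range(n)])

def pitch_acf(frame, fs, fmin=50.0, fmax=500.0):
    # Pitch = fs / tau*, where tau* maximises the ACF within the lag
    # range corresponding to fmin..fmax Hz (range is an assumption).
    a = acf(frame)
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(a) - 1)
    tau = lo + int(np.argmax(a[lo : hi + 1]))
    return fs / tau
```

Applying this per frame gives the pitch contour used later for the intonation score.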

S4,根据所述测试语音的特征参数和标准语音的特征参数,对所述测试语音进行评价,获得评价结果。S4. Evaluate the test speech according to the characteristic parameters of the test speech and the characteristic parameters of the standard speech, and obtain an evaluation result.

需要说明的是,标准语音的特征参数是通过事先对标准语音进行语音特征参数提取得到,存储于数据库中,待需要使用时调取。提取标准语音的特征参数的具体步骤包括:获取标准语音;对所述标准语音进行预处理;对预处理后的标准语音进行语音特征参数的提取,获取所述标准语音的特征参数。标准语音的特征参数提取的具体步骤与测试语音的特征参数提取过程一致,在此不再赘述。It should be noted that the characteristic parameters of the standard speech are obtained by extracting the speech characteristic parameters of the standard speech in advance, stored in the database, and retrieved when needed. The specific steps of extracting the characteristic parameters of the standard speech include: obtaining the standard speech; preprocessing the standard speech; extracting the speech characteristic parameters of the preprocessed standard speech, and obtaining the characteristic parameters of the standard speech. The specific steps of extracting the feature parameters of the standard speech are the same as the process of extracting the feature parameters of the test speech, and will not be repeated here.

根据所述测试语音的特征参数和标准语音的特征参数,对所述测试语音进行评价的过程具体如下:According to the feature parameters of the test voice and the feature parameters of the standard voice, the process of evaluating the test voice is as follows:

4.1)根据所述测试语音的MFCC特征参数，基于分段聚类的概率神经网络(Probabilistic Neural Network,PNN)集成语音识别模型，对所述测试语音进行语音识别，获得语音识别结果。并对所述测试语音的MFCC特征参数和所述标准语音的MFCC特征参数进行相似度计算，获得MFCC相关系数。根据所述语音识别结果和所述MFCC相关系数，计算出所述测试语音的准确度得分。需要说明的是，所述分段聚类的概率神经网络集成语音识别模型为事先训练得到，存储于数据库中，待需要使用时调取。4.1) According to the MFCC feature parameters of the test speech, perform speech recognition on the test speech using a probabilistic neural network (PNN) ensemble speech recognition model based on segmental clustering, obtaining a speech recognition result. Similarity between the MFCC feature parameters of the test speech and those of the standard speech is also calculated to obtain the MFCC correlation coefficient. The accuracy score of the test speech is then calculated from the speech recognition result and the MFCC correlation coefficient. It should be noted that the segmental-clustering PNN ensemble speech recognition model is trained in advance, stored in the database, and retrieved when needed.

本实施例中,采用Bagging(Bootstrap aggregating,自助聚集)思想来生成集成所需的个体概率神经网络模型,Bagging是一种把多个不同的个体学习器集成为一个学习器的集成学习方法,通过可重复取样得到不同的数据子集,使得在不同数据子集上训练得到的个体学习器具有较高的泛化性能及有较大的差异度。利用现有网络的分布式计算可以进一步提高算法的时间效率,并且Bagging可以改善学习器的性能,有利于提高概率神经网络的分类准确率和泛化能力。In this embodiment, the idea of Bagging (Bootstrap aggregating, self-help aggregation) is used to generate the individual probabilistic neural network model required for integration. Bagging is an integrated learning method that integrates a plurality of different individual learners into one learner. Different data subsets can be obtained by repeated sampling, so that individual learners trained on different data subsets have high generalization performance and large differences. Using the distributed computing of the existing network can further improve the time efficiency of the algorithm, and Bagging can improve the performance of the learner, which is conducive to improving the classification accuracy and generalization ability of the probabilistic neural network.

参照图2，是本发明提供的概率神经网络集成分类器的建立过程示意图。每次从训练样本集A中随机抽取n个样本(如图中Bagging样本A1、Bagging样本A2…Bagging样本An)，用概率神经网络分类算法进行训练，得到一个PNN分类器，利用相同的方法生成多个PNN分类器(即图中PNN分类器C1(x)、PNN分类器C2(x)…PNN分类器Cn(x))，训练之后可得到一个分类函数序列C1(x)、C2(x)…Cn(x)，即PNN集成分类器，也就是本实施例中所述PNN集成语音识别模型，最终的分类函数C(x)对分类问题采用投票方式，得票最多的分类结果即为分类函数C(x)的最终类别。Referring to FIG. 2, it is a schematic diagram of the establishment process of the probabilistic neural network ensemble classifier provided by the present invention. Each time, n samples are randomly drawn from the training sample set A (Bagging sample A1, Bagging sample A2 ... Bagging sample An in the figure) and trained with the probabilistic neural network classification algorithm to obtain one PNN classifier; multiple PNN classifiers are generated by the same method (PNN classifiers C1(x), C2(x), ..., Cn(x) in the figure). After training, a sequence of classification functions C1(x), C2(x), ..., Cn(x) is obtained, i.e. the PNN ensemble classifier, which is the PNN ensemble speech recognition model described in this embodiment. The final classification function C(x) decides the classification problem by voting: the classification result with the most votes is the final category of C(x).

在语音识别的过程中,只需要将所述测试语音的MFCC特征参数输入到所述PNN集成语音识别模型中,以投票方式进行分类,判断内容是否正确。同时对所述测试语音的MFCC特征参数和所述标准语音的MFCC特征参数进行相似度计算,最后依据内容是否正确和MFCC相关系数的大小对测试语音的准确度进行评分。In the process of speech recognition, it is only necessary to input the MFCC feature parameters of the test speech into the PNN integrated speech recognition model, classify by voting, and judge whether the content is correct. Simultaneously, similarity calculation is carried out to the MFCC feature parameters of the test speech and the MFCC feature parameters of the standard speech, and finally the accuracy of the test speech is scored according to whether the content is correct and the size of the MFCC correlation coefficient.
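The Bagging-plus-voting scheme above can be sketched as below. The `PNN` class here is a minimal Parzen-window (Gaussian kernel) classifier standing in for a full probabilistic neural network, and the smoothing parameter sigma is an assumption.

```python
import numpy as np

class PNN:
    # Minimal Parzen-window classifier standing in for a PNN.
    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict_one(self, x):
        # Sum a Gaussian kernel over the training points of each class
        # and return the class with the largest summed activation.
        d2 = np.sum((self.X - np.asarray(x, dtype=float)) ** 2, axis=1)
        k = np.exp(-d2 / (2.0 * self.sigma ** 2))
        scores = [k[self.y == c].sum() for c in self.classes]
        return self.classes[int(np.argmax(scores))]

def bagging_pnn(X, y, n_models=5, seed=0):
    # Train n_models PNNs on bootstrap resamples (Bagging).
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # with replacement
        models.append(PNN().fit(X[idx], y[idx]))
    return models

def vote(models, x):
    # Final classification C(x): the label with the most votes.
    preds = np.array([m.predict_one(x) for m in models])
    vals, counts = np.unique(preds, return_counts=True)
    return vals[int(np.argmax(counts))]
```

In the patent's pipeline the inputs would be MFCC feature vectors rather than the toy 2-D points used for testing here.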

4.2)根据所述测试语音的情感特征参数，基于SVM(Support Vector Machine,支持向量机)情感模型，对所述测试语音进行情感识别，获得情感识别结果。并对所述测试语音的情感特征参数和所述标准语音的情感特征参数进行相似度计算，获得情感相关系数。根据所述情感识别结果和所述情感相关系数，计算出所述测试语音的情感得分。4.2) According to the emotion feature parameters of the test speech, perform emotion recognition on the test speech based on an SVM (Support Vector Machine) emotion model to obtain an emotion recognition result. Similarity between the emotion feature parameters of the test speech and those of the standard speech is also calculated to obtain the emotion correlation coefficient. The emotion score of the test speech is calculated from the emotion recognition result and the emotion correlation coefficient.

测试语音的情感特征参数提取完毕之后,将情感特征参数输入至基于SVM情感模型进行分类,同时计算测试语音的情感特征参数与标准语音的情感特征参数的相关系数。最后,依据情感分类结果是否正确和情感特征参数的相关系数大小得出情感得分。After the emotional feature parameters of the test speech are extracted, the emotional feature parameters are input to the SVM emotion model for classification, and the correlation coefficient between the emotional feature parameters of the test speech and the standard speech is calculated. Finally, the sentiment score is obtained according to whether the sentiment classification result is correct or not and the correlation coefficient of the sentiment feature parameters.
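One way the two signals above (classification correctness and feature correlation) could be combined into a score is sketched below; the 60/40 weighting and the clamping of negative correlations are illustrative assumptions, as the patent does not give the scoring rule.

```python
import numpy as np

def emotion_score(pred_label, true_label, feat_test, feat_std):
    # Pearson correlation of the 46-dim emotion feature vectors,
    # combined with classification correctness. The 60/40 split and
    # the clamping of negative correlation are assumptions.
    r = float(np.corrcoef(feat_test, feat_std)[0, 1])
    r = max(r, 0.0)
    base = 60.0 if pred_label == true_label else 0.0
    return base + 40.0 * r
```

Here `pred_label` would come from the SVM classifier and `feat_test`/`feat_std` are the 46-order emotion feature vectors.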

4.3)根据所述标准语音和所述测试语音的发音时长特征参数，获取所述标准语音与所述测试语音的语速比，并根据所述语速比，计算出所述测试语音的语速得分。4.3) According to the pronunciation duration feature parameters of the standard speech and the test speech, obtain the speech rate ratio between the standard speech and the test speech, and calculate the speech rate score of the test speech according to the speech rate ratio.

提取所述测试语音的发音时长特征参数后,通过以下公式计算语速比:After extracting the pronunciation duration feature parameter of the test voice, calculate the speech rate ratio by the following formula:

语速比 = S发音时长 / T发音时长 (speech rate ratio = S_duration / T_duration) （公式9） (Formula 9)

其中,S发音时长指标准语音的持续时间,T发音时长是指测试语音的发音时长。Wherein, the S pronunciation duration refers to the duration of the standard speech, and the T pronunciation duration refers to the pronunciation duration of the test speech.

语速过快或过慢均不符合语言学表达的要求,因此可根据语速比,按语速过快或者过慢的程度,对测试语音的语速进行评分。Speech speed that is too fast or too slow does not meet the requirements of linguistic expression. Therefore, the speech speed of the test voice can be scored according to the degree of too fast or too slow according to the ratio of speech speed.
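A minimal sketch of such a speech-rate score: the ratio follows Formula 9, while the linear penalty of 100 points per unit of deviation from 1.0 is an illustrative assumption.

```python
def speech_rate_score(std_duration, test_duration):
    # ratio = standard duration / test duration (Formula 9); a ratio
    # of 1.0 scores 100, and the linear penalty is an assumption.
    ratio = std_duration / test_duration
    return max(0.0, 100.0 - 100.0 * abs(ratio - 1.0))
```

Speaking twice as fast or twice as slowly as the standard would both be penalised, matching the requirement that the tempo deviate in neither direction.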

4.4)根据所述测试语音的重音位置特征参数和所述标准语音的重音位置特征参数，比对所述测试语音与所述标准语音的重音位置差异，并根据所述重音位置差异，计算出所述测试语音的重音得分。4.4) According to the stress position feature parameters of the test speech and of the standard speech, compare the stress position difference between the test speech and the standard speech, and calculate the stress score of the test speech according to the stress position difference.

在提取重音位置特征参数时，获得重音的起始帧位置和结束帧位置组，通过以下公式计算测试语音与标准语音的重音分布差异diff：When extracting the stress position feature parameters, the start-frame and end-frame position groups of the stresses are obtained, and the stress distribution difference diff between the test speech and the standard speech is calculated by the following formula:

diff = Σ_i ( |lefttest[i]/Lentest − leftstd[i]/Lenstd| + |righttest[i]/Lentest − rightstd[i]/Lenstd| ) （公式10） (Formula 10)

其中,Lenstd是指标准语音的有效语音帧长度,Lentest是指测试语音的有效语音帧长度。leftstd[i]是标准语音的起始帧位置组,rightstd[i]是标准语音的结束帧位置组。lefttest[i]是测试语音的起始帧位置组,righttest[i]是测试语音的结束帧位置组。Wherein, Len std refers to the effective speech frame length of the standard speech, and Len test refers to the effective speech frame length of the test speech. left std [i] is the start frame position group of standard voice, right std [i] is the end frame position group of standard voice. left test [i] is the start frame position group of the test speech, and right test [i] is the end frame position group of the test speech.

依据测试语音与标准语音的重音位置差异大小,对所述测试语音的重音进行评分。According to the stress position difference between the test speech and the standard speech, the stress of the test speech is scored.
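Assuming the stress-position difference is the mean absolute difference of length-normalised boundary positions (a reconstruction from the variables the patent defines for Formula 10, not a confirmed form), the comparison and scoring could look like this:

```python
import numpy as np

def stress_diff(left_std, right_std, len_std, left_test, right_test, len_test):
    # Mean absolute difference of the stress boundaries after
    # normalising each position by its utterance's frame length.
    # This combination is an assumed reconstruction of Formula 10.
    m = min(len(left_std), len(left_test))
    ls = np.asarray(left_std[:m], dtype=float) / len_std
    rs = np.asarray(right_std[:m], dtype=float) / len_std
    lt = np.asarray(left_test[:m], dtype=float) / len_test
    rt = np.asarray(right_test[:m], dtype=float) / len_test
    return float(np.mean(np.abs(ls - lt) + np.abs(rs - rt)))

def stress_score(diff):
    # Smaller difference -> higher score (linear map is an assumption).
    return max(0.0, 100.0 * (1.0 - diff))
```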

4.5)根据所述测试语音的语音单元时长特征参数和所述标准语音的语音单元时长特征参数,利用dPVI(the Distinct Pairwise Variability Index)算法,获取所述测试语音的dPVI参数,并根据所述dPVI参数,计算出所述测试语音的节奏得分。4.5) According to the speech unit duration characteristic parameter of the test speech and the speech unit duration characteristic parameter of the standard speech, utilize the dPVI (the Distinct Pairwise Variability Index) algorithm to obtain the dPVI parameter of the test speech, and according to the dPVI parameters to calculate the rhythm score of the test speech.

提取测试语音的语音单元时长特征参数后，将测试语音的音节单元时长特征参数与标准语音的语音单元时长特征参数进行对比运算，并转换出用于系统评分依据的dPVI参数，dPVI参数的计算公式如下：After extracting the speech-unit duration feature parameters of the test speech, the syllable-unit duration feature parameters of the test speech are compared against the speech-unit duration feature parameters of the standard speech, and converted into the dPVI parameter that serves as the basis for system scoring. The dPVI parameter is calculated as follows:

dPVI = 100 × Σ_{k=1}^{m} |dk(测试) − dk(标准)| / Len （公式11） (Formula 11)

其中，d为句子划分的语音单元时长(如：dk为第k个语音单元时长)，m=min(Ssnum,Tsnum)，Ssnum为标准语音的语音单元数，Tsnum为测试语音的语音单元数，Len为标准语音的时长。Here d denotes the durations of the speech units the sentence is divided into (e.g. dk is the duration of the k-th speech unit), m=min(Ssnum,Tsnum), where Ssnum is the number of speech units of the standard speech, Tsnum is the number of speech units of the test speech, and Len is the duration of the standard speech.

根据dPVI参数的大小,计算出所述测试语音的节奏得分。Calculate the rhythm score of the test speech according to the size of the dPVI parameter.
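Under the assumption that Formula 11 sums the absolute unit-duration differences over the first m = min(Ssnum, Tsnum) units and normalises by the standard duration Len, a sketch of the dPVI computation and rhythm score is:

```python
import numpy as np

def dpvi(durs_std, durs_test, len_std):
    # Sum of absolute speech-unit duration differences over the first
    # m units, normalised by the standard duration Len (assumed
    # reading of Formula 11), scaled to percent.
    m = min(len(durs_std), len(durs_test))
    d_s = np.asarray(durs_std[:m], dtype=float)
    d_t = np.asarray(durs_test[:m], dtype=float)
    return float(100.0 * np.sum(np.abs(d_s - d_t)) / len_std)

def rhythm_score(v):
    # Smaller dPVI -> higher rhythm score (linear map is an assumption).
    return max(0.0, 100.0 - v)
```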

4.6)根据所述测试语音的音高特征参数和所述标准语音的音高特征参数，利用DTW(Dynamic Time Warping,动态时间归整)算法，获取所述标准语音与所述测试语音的音高差异，并根据所述音高差异，计算出所述测试语音的语调得分。4.6) According to the pitch feature parameters of the test speech and of the standard speech, use the DTW (Dynamic Time Warping) algorithm to obtain the pitch difference between the standard speech and the test speech, and calculate the intonation score of the test speech according to the pitch difference.

提取所述测试语音的音高特征参数后,还可以通过设置中位数滤波器,来对音高进行平滑,排除掉不稳定、音高值异常的语音帧。利用DTW算法将测试语音的音高特征参数与标准语音的音高特征参数进行差异性对比,计算出二者之间的音高差异参数dist,再计算出所述测试语音的语调得分,语调得分的计算公式如下:After extracting the pitch feature parameters of the test speech, a median filter can also be set to smooth the pitch and exclude unstable and abnormal speech frames. Utilize the DTW algorithm to compare the pitch feature parameters of the test voice with the pitch feature parameters of the standard voice, calculate the pitch difference parameter dist between the two, and then calculate the intonation score of the test voice, the intonation score The calculation formula of is as follows:

语调得分 = 100 − a·dist^b （公式12） (Formula 12)

其中,通过仿真实验,对比专家评分数据和系统评分数据,计算得到a=0.0005,b=2。Wherein, a=0.0005 and b=2 are calculated by comparing expert scoring data and system scoring data through simulation experiments.
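A sketch combining a textbook DTW distance with the fitted constants a=0.0005 and b=2 follows; the functional form `100 - a * dist**b` is an assumption, since only the constants are stated in the text.

```python
import numpy as np

def dtw_distance(a, b):
    # Classic dynamic-time-warping distance between two pitch contours.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def intonation_score(dist, a=0.0005, b=2):
    # Fitted constants a=0.0005, b=2 from the text; the form
    # 100 - a * dist**b is an assumption.
    return max(0.0, 100.0 - a * dist ** b)
```

DTW lets the two pitch contours differ in length, which matters because the test and standard utterances rarely have the same number of frames.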

4.7)对所述准确度得分、所述情感得分、所述语速得分、所述重音得分、所述节奏得分和所述语调得分进行加权求和，获得综合得分。并根据所述准确度得分、所述情感得分、所述语速得分、所述重音得分、所述节奏得分、所述语调得分和所述综合得分，结合各得分与等级评价的映射关系，获取所述测试语音的准确度等级评价、情感等级评价、语速等级评价、重音等级评价、节奏等级评价、语调等级评价和综合等级评价。并将所述测试语音的准确度等级评价、情感等级评价、语速等级评价、重音等级评价、节奏等级评价、语调等级评价和综合等级评价作为所述测试语音的评价结果。4.7) Perform a weighted summation of the accuracy score, the emotion score, the speech rate score, the stress score, the rhythm score and the intonation score to obtain a comprehensive score. Then, according to these six scores and the comprehensive score, combined with the mapping relationship between each score and its grade evaluation, obtain the accuracy, emotion, speech rate, stress, rhythm, intonation and comprehensive grade evaluations of the test speech, and take these grade evaluations together as the evaluation result of the test speech.

在对所述准确度得分、所述情感得分、所述语速得分、所述重音得分、所述节奏得分和所述语调得分进行加权求和的过程中，各指标分数所占权重可根据不同的需求采用不同的取值，可根据用户自身特点选择符合用户需求的权重组合。根据各得分与等级评价的映射关系，获取各指标的等级评价以及综合等级评价。例如，若所述准确度得分在90~100的分数范围内，则所述准确度等级评价为A级；若所述准确度得分在70~90的分数范围内，则所述准确度等级评价为B级；若所述准确度得分在60~70的分数范围内，则所述准确度等级评价为C级；若所述准确度得分在0~60的分数范围内，则所述准确度等级评价为D级。其他得分与等级评价的映射关系与上述准确度得分与准确度等级评价的映射关系类似，在此不再赘述。需要说明的是，上述分数与等级的映射关系仅仅为本发明的一个示例，在实际应用当中，可根据实际需要，设置不同的阈值，将不同的分数范围映射到不同的等级上，当然地也可以划分更多的等级。In the weighted summation of the accuracy, emotion, speech rate, stress, rhythm and intonation scores, the weight of each index score can take different values according to different needs, and a weight combination matching the user's needs can be selected according to the user's own characteristics. The grade evaluation of each index and the comprehensive grade evaluation are obtained from the mapping relationship between each score and its grade. For example, if the accuracy score lies in the range 90-100, the accuracy grade is A; in the range 70-90, grade B; in the range 60-70, grade C; and in the range 0-60, grade D. The mapping from the other scores to their grade evaluations is similar to the above mapping from the accuracy score to the accuracy grade and is not repeated here. It should be noted that this mapping between scores and grades is only one example of the present invention; in practical applications, different thresholds can be set according to actual needs to map different score ranges to different grades, and of course more grades can be defined.
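The weighted summation and grade mapping can be sketched as below; the equal default weights are an assumption (the patent leaves the weight combination user-configurable), and the grade bands follow the example in the text.

```python
def overall_score(scores, weights=None):
    # Weighted sum of the six sub-scores; equal weights by default
    # (the weight combination is user-configurable per the text).
    keys = ["accuracy", "emotion", "rate", "stress", "rhythm", "intonation"]
    if weights is None:
        weights = {k: 1.0 / len(keys) for k in keys}
    return sum(scores[k] * weights[k] for k in keys)

def grade(score):
    # Example A/B/C/D bands from the text.
    if score >= 90:
        return "A"
    if score >= 70:
        return "B"
    if score >= 60:
        return "C"
    return "D"
```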

S5,将所述评价结果通过网络反馈给所述移动客户端,并通过所述移动客户端对所述评价结果进行显示。S5. Feed back the evaluation result to the mobile client through the network, and display the evaluation result through the mobile client.

所述服务器端获得评价结果后,即将评价结果通过移动互联网反馈给移动客户端,移动客户端将评价结果信息显示在移动设备的屏幕上,或者通过音频方式将评价结果信息进行提示。After the server side obtains the evaluation result, it will feed back the evaluation result to the mobile client through the mobile Internet, and the mobile client will display the evaluation result information on the screen of the mobile device, or prompt the evaluation result information through audio.

在具体实施当中,所述服务器端获取测试语音的评价结果后,还可以根据所述评价结果,对用户的口语发音进行指导,获取发音指导意见。可根据评价结果,与数据库中的发音指导意见进行匹配。In a specific implementation, after the server side obtains the evaluation result of the test voice, it can also guide the user's oral pronunciation according to the evaluation result, and obtain pronunciation guidance opinions. According to the evaluation results, it can be matched with the pronunciation guidance in the database.

将所述发音指导意见通过网络反馈给所述移动客户端,并通过所述移动客户端对所述发音指导意见进行显示。可通过发音指导意见指出用户口语发音中的错误与不足,并提出改进的意见,如若检测到用户的语速过快节奏混乱,可提示用户可稍微放慢语速,把握句子节奏等。The pronunciation guidance is fed back to the mobile client through the network, and the pronunciation guidance is displayed through the mobile client. Pronunciation guidance can be used to point out the errors and deficiencies in the user's oral pronunciation, and provide suggestions for improvement. If it is detected that the user's speech speed is too fast and the rhythm is chaotic, the user can be prompted to slow down the speech speed to grasp the rhythm of the sentence, etc.

本发明实施例基于C/S(Client/Server,客户端/服务器端)架构，构建移动客户端和服务器端，移动客户端采集用户的测试语音信号并发送给服务器端，服务器端对测试语音进行评价后向移动客户端返回语音评价结果，并通过移动客户端对所述评价结果进行展示。用户可以利用移动互联网方便地接入服务器端，获取服务和数据，语料库和评价方法均可以通过服务器端实现同步，并通过服务器端提供性能更优、效果更佳的语音分析算法处理。This embodiment of the present invention is based on a C/S (Client/Server) architecture with a mobile client and a server. The mobile client collects the user's test speech signal and sends it to the server; after evaluating the test speech, the server returns the evaluation result to the mobile client, which displays it. Users can conveniently access the server over the mobile Internet to obtain services and data; the corpus and evaluation methods can both be synchronised through the server, which also provides speech analysis algorithm processing with better performance and effect.

进一步地,所述在线口语发音质量评价方法还包括:Further, the online oral pronunciation quality evaluation method also includes:

S6,将所述评价结果存储到数据库中,并对评价结果进行统计分析,获得统计结果。S6. Store the evaluation results in a database, and perform statistical analysis on the evaluation results to obtain statistical results.

在具体实施当中，当用户测试完毕，可以将用户的用户信息，测试语音和评价结果存储到数据库中，所述服务器端对数据库中的评价结果(包括各指标得分和综合得分)进行统计分析，获得单个用户的学习情况分析结果，也可以针对特定用户群组的用户或针对全网的所有用户获得群组学习情况分析结果或全网学习情况统计结果。In a specific implementation, when a user finishes a test, the user's information, test speech and evaluation results can be stored in the database. The server performs statistical analysis on the evaluation results in the database (including each index score and the comprehensive score) to obtain learning-situation analysis results for a single user, and can also obtain group learning analysis results for users of a specific user group, or network-wide learning statistics for all users of the entire network.

S7,将统计结果发送给网页管理端,并通过网页管理端对所述统计结果进行展示。网页管理端接收服务器端对移动客户端用户的口语发音评价的统计数据,以可视化的形式呈现给第三方(如教学者)。S7. Send the statistical result to the webpage management terminal, and display the statistical result through the webpage management terminal. The webpage management terminal receives the statistical data of the server-side evaluation of the oral pronunciation of the mobile client users, and presents it to a third party (such as a teacher) in a visualized form.

本发明实施例基于B/S(Browser/Server,网页端/服务器端)架构，构建网页管理端和服务器端，可以通过网页浏览器从服务器端的数据库中实时获取移动客户端用户的口语发音质量评价统计结果，为第三方提供移动客户端用户的口语发音情况，便于第三方制定线下口语指导和改良策略。This embodiment of the present invention is based on a B/S (Browser/Server) architecture with a webpage management terminal and a server. Statistical results of mobile client users' spoken pronunciation quality evaluation can be obtained in real time from the server-side database through a web browser, providing third parties with the spoken pronunciation situation of mobile client users and facilitating their formulation of offline oral guidance and improvement strategies.

参照图3，是本发明提供的在线口语发音质量评价系统的一个实施例的C/S架构图。所述在线口语发音质量评价系统与图1所示实施例中的在线口语发音质量评价方法的基本原理一致，本实施例中未详述之处，可参见图1所示实施例中的相关描述。Referring to FIG. 3, it is a C/S architecture diagram of an embodiment of the online spoken pronunciation quality evaluation system provided by the present invention. The system is consistent in basic principle with the online spoken pronunciation quality evaluation method of the embodiment shown in FIG. 1; for details not described in this embodiment, refer to the relevant description of the embodiment shown in FIG. 1.

所述系统包括通过网络连接的移动客户端100和服务器端200。The system includes a mobile client 100 and a server 200 connected through a network.

所述移动客户端100包括:The mobile client 100 includes:

语音采集单元101,用于采集测试语音,并通过网络将所述测试语音发送给所述服务器端200。The voice collection unit 101 is configured to collect a test voice, and send the test voice to the server 200 through the network.

所述服务器端200包括:Described server end 200 comprises:

预处理单元201,用于对接收到的测试语音进行预处理。The preprocessing unit 201 is configured to preprocess the received test voice.

特征参数提取单元202,用于对预处理后的测试语音进行语音特征参数的提取,获取所述测试语音的特征参数。The characteristic parameter extraction unit 202 is configured to extract the speech characteristic parameters of the preprocessed test speech, and obtain the characteristic parameters of the test speech.

语音评价单元203,用于根据所述测试语音的特征参数和标准语音的特征参数,对所述测试语音进行评价,获得评价结果;并将所述评价结果通过网络反馈给所述移动客户端100。The voice evaluation unit 203 is used to evaluate the test voice according to the feature parameters of the test voice and the feature parameters of the standard voice to obtain an evaluation result; and feed back the evaluation result to the mobile client 100 through the network .

所述移动客户端100还包括:The mobile client 100 also includes:

数据显示单元102,用于对所述评价结果进行显示。The data display unit 102 is configured to display the evaluation result.

参照图4,是如图3所示在线口语发音质量评价系统的B/S架构示意图。Referring to FIG. 4 , it is a schematic diagram of the B/S architecture of the online oral pronunciation quality evaluation system shown in FIG. 3 .

所述系统还包括网页管理端300,所述网页管理端300通过网络与所述服务器端200连接。所述服务器端200还包括数据库204和统计分析单元205。The system also includes a webpage management terminal 300, which is connected to the server terminal 200 through a network. The server 200 also includes a database 204 and a statistical analysis unit 205 .

所述数据库204,用于存储所述评价结果。The database 204 is used to store the evaluation results.

所述统计分析单元205,用于对评价结果进行统计分析,获得统计结果。并将所述统计结果发送给所述网页管理端300。The statistical analysis unit 205 is configured to perform statistical analysis on the evaluation results to obtain statistical results. And send the statistical result to the webpage management terminal 300.

所述网页管理端300,用于对接收到的统计结果进行展示。The webpage management terminal 300 is used to display the received statistical results.

本发明实施例基于C(B)/S，构建移动客户端100、服务器端200和网页管理端300，通过移动客户端100采集用户的测试语音信号并发送给服务器端200，服务器端200对测试语音进行评价后向移动客户端100返回语音评价结果，通过移动客户端100对所述评价结果进行展示。用户可以利用移动互联网方便地接入服务器端200，获取服务和数据，语料库和评价方法均可以通过服务器端200实现同步，并通过服务器端200提供性能更优、效果更佳的语音分析算法处理。还可以通过网页管理端300从服务器端200的数据库中实时获取移动客户端用户的口语发音质量评价统计结果，为第三方(如教学者)提供移动客户端用户的口语发音情况，便于第三方制定线下口语指导和改良策略。This embodiment of the present invention is based on a C(B)/S architecture comprising the mobile client 100, the server 200 and the webpage management terminal 300. The mobile client 100 collects the user's test speech signal and sends it to the server 200; after evaluating the test speech, the server 200 returns the evaluation result to the mobile client 100, which displays it. Users can conveniently access the server 200 over the mobile Internet to obtain services and data; the corpus and evaluation methods can both be synchronised through the server 200, which also provides speech analysis algorithm processing with better performance and effect. Through the webpage management terminal 300, statistical results of mobile client users' spoken pronunciation quality evaluation can also be obtained in real time from the database of the server 200, providing third parties (such as teachers) with the spoken pronunciation situation of mobile client users and facilitating their formulation of offline oral guidance and improvement strategies.

本发明实施例提供的在线口语发音质量评价方法和系统可应用于英语口语学习中,检测英语口语的发音质量。也可以应用于其他语种的发音质量评价,如日语和法语。The online oral pronunciation quality evaluation method and system provided by the embodiments of the present invention can be applied to oral English learning to detect the pronunciation quality of spoken English. It can also be applied to the pronunciation quality evaluation of other languages, such as Japanese and French.

通过以上实施方式的描述，所属领域的技术人员可以清楚地了解到本发明可借助软件加必需的通用硬件的方式来实现，当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该软件产品存储在可读取的存储介质中，如计算机的软盘，U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等。From the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, and of course also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. The technical solution of the present invention, in essence, or the part contributing to the prior art, can be embodied in the form of a software product stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk.

以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Anyone skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present invention. Should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the protection scope of the claims.

Claims (9)

1. An online spoken-language pronunciation quality evaluation method, characterized in that it comprises:

receiving, over a network, test speech collected by a mobile client, wherein the mobile client is installed as an application on the user's mobile phone or other mobile device, collects the test speech by invoking the recording program of the mobile device, and sends the test speech to a server over the Internet;

preprocessing, by the server, the received test speech;

extracting speech feature parameters from the preprocessed test speech to obtain the feature parameters of the test speech;

evaluating the test speech according to the feature parameters of the test speech and the feature parameters of a standard speech to obtain an evaluation result;

feeding the evaluation result back to the mobile client over the network, and displaying the evaluation result on the mobile client;

wherein evaluating the test speech according to the feature parameters of the test speech and the feature parameters of the standard speech to obtain an evaluation result comprises: comparing the stress-position difference between the test speech and the standard speech according to the stress-position feature parameters of the test speech and of the standard speech, and computing a stress score of the test speech from the stress-position difference;

the method further comprising: guiding the user's spoken pronunciation according to the evaluation result to obtain pronunciation guidance; and feeding the pronunciation guidance back to the mobile client over the network and displaying the pronunciation guidance on the mobile client.

2. The online spoken-language pronunciation quality evaluation method of claim 1, characterized in that the method further comprises:

storing the evaluation result in a database and performing statistical analysis on the evaluation results to obtain statistical results;

sending the statistical results to a web management terminal and displaying the statistical results on the web management terminal.

3. The online spoken-language pronunciation quality evaluation method of claim 1, characterized in that the method further comprises:

obtaining the standard speech;

preprocessing the standard speech;

extracting speech feature parameters from the preprocessed standard speech to obtain the feature parameters of the standard speech.

4. The online spoken-language pronunciation quality evaluation method of any one of claims 1 to 3, characterized in that the preprocessing comprises pre-emphasis, framing, windowing, and endpoint detection.

5. The online spoken-language pronunciation quality evaluation method of any one of claims 1 to 3, characterized in that extracting speech feature parameters from the preprocessed test speech to obtain the feature parameters of the test speech comprises:

applying a discrete Fourier transform to the test speech to obtain its spectral coefficients, filtering the spectral coefficients with a bank of triangular filters, taking the logarithm of the filtered data, and applying a discrete cosine transform to obtain the MFCC feature parameters of the test speech;

extracting the fundamental-frequency, short-time-energy, and formant features of the test speech, and combining them into the emotional feature parameters of the test speech;

computing the pronunciation duration of the test speech to obtain its pronunciation-duration feature parameter;

dividing the test speech into stress units and extracting the start-frame and end-frame position groups of the stresses to obtain the stress-position feature parameters of the test speech;

dividing the test speech into speech units and computing the duration of each speech unit to obtain the speech-unit duration feature parameters of the test speech;

extracting the pitch of each frame of the test speech by the time-domain autocorrelation function method to obtain the pitch feature parameters of the test speech.

6. The online spoken-language pronunciation quality evaluation method of claim 5, characterized in that evaluating the test speech according to the feature parameters of the test speech and the feature parameters of the standard speech to obtain an evaluation result comprises:

performing speech recognition on the test speech according to its MFCC feature parameters, using a probabilistic-neural-network ensemble speech recognition model based on segmental clustering, to obtain a speech recognition result; computing the similarity between the MFCC feature parameters of the test speech and of the standard speech to obtain an MFCC correlation coefficient; and computing an accuracy score of the test speech from the speech recognition result and the MFCC correlation coefficient;

performing emotion recognition on the test speech according to its emotional feature parameters, using an SVM emotion model, to obtain an emotion recognition result; computing the similarity between the emotional feature parameters of the test speech and of the standard speech to obtain an emotion correlation coefficient; and computing an emotion score of the test speech from the emotion recognition result and the emotion correlation coefficient;

obtaining the speech-rate ratio of the standard speech to the test speech from the pronunciation-duration feature parameters of the standard speech and the test speech, and computing a speech-rate score of the test speech from the speech-rate ratio;
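Claim 5's MFCC pipeline (DFT → triangular filter bank → logarithm → DCT) can be sketched per frame as below. This is a minimal illustration under assumptions, not the patent's implementation: the claim does not specify the filter spacing (mel scale is assumed here), the filter count, or the number of cepstral coefficients.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Bank of triangular filters spaced on the mel scale (one common
    reading of the claim's 'bank of triangular filters')."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                     # rising edge of triangle
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                     # falling edge of triangle
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(frame, sr, n_filters=26, n_ceps=13):
    """DFT -> triangular filterbank -> log -> DCT, as listed in claim 5."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame)) ** 2        # spectral coefficients (power)
    energies = mel_filterbank(n_filters, n_fft, sr) @ spec
    log_e = np.log(energies + 1e-10)              # logarithm step
    # DCT-II basis, written out explicitly to keep the sketch dependency-free
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return basis @ log_e                          # MFCC feature vector
```

In a full system this would run on every pre-emphasized, windowed frame, producing the per-frame MFCC vectors that claims 5 and 6 refer to.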
obtaining the dPVI parameter of the test speech by the dPVI algorithm, according to the speech-unit duration feature parameters of the test speech and of the standard speech, and computing a rhythm score of the test speech from the dPVI parameter;

obtaining the pitch difference between the standard speech and the test speech by the DTW algorithm, according to the pitch feature parameters of the test speech and of the standard speech, and computing an intonation score of the test speech from the pitch difference.

7. The online spoken-language pronunciation quality evaluation method of claim 6, characterized in that evaluating the test speech according to the feature parameters of the test speech and the feature parameters of the standard speech to obtain an evaluation result further comprises:

computing a weighted sum of the accuracy score, the emotion score, the speech-rate score, the stress score, the rhythm score, and the intonation score to obtain a comprehensive score; obtaining, from these scores and the comprehensive score combined with the mapping between scores and grade evaluations, the accuracy, emotion, speech-rate, stress, rhythm, intonation, and comprehensive grade evaluations of the test speech; and taking these grade evaluations as the evaluation result of the test speech.

8. An online spoken-language pronunciation quality evaluation system, characterized in that it comprises a mobile client and a server connected over a network;

the mobile client comprising:

a speech collection unit for collecting test speech and sending the test speech to the server over the network; the mobile client is installed as an application on the user's mobile phone or other mobile device, collects the test speech by invoking the recording program of the mobile device, and sends the test speech to the server over the Internet;

the server comprising:

a preprocessing unit for preprocessing the received test speech;

a feature parameter extraction unit for extracting speech feature parameters from the preprocessed test speech to obtain the feature parameters of the test speech;

a speech evaluation unit for evaluating the test speech according to the feature parameters of the test speech and the feature parameters of a standard speech to obtain an evaluation result, and feeding the evaluation result back to the mobile client over the network;

the mobile client further comprising:

a data display unit for displaying the evaluation result;
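The rhythm scoring in claim 6 rests on a dPVI parameter over speech-unit durations. The claims do not define dPVI or the score mapping; the sketch below assumes the raw Pairwise Variability Index (mean absolute difference between successive unit durations) and an illustrative linear mapping of the test-versus-standard gap to a 0-100 score.

```python
def raw_pvi(durations):
    """Raw Pairwise Variability Index: mean absolute difference between
    successive speech-unit durations (one common reading of 'dPVI')."""
    diffs = [abs(a - b) for a, b in zip(durations[:-1], durations[1:])]
    return sum(diffs) / len(diffs)

def rhythm_score(test_durs, std_durs):
    """Map the dPVI gap between test and standard speech to a 0-100
    rhythm score. This mapping is an assumption for illustration; the
    patent does not disclose its actual scoring function."""
    pt, ps = raw_pvi(test_durs), raw_pvi(std_durs)
    return max(0.0, 100.0 * (1.0 - abs(pt - ps) / max(ps, 1e-9)))
```

A learner whose unit-duration variability matches the standard recording's would score near 100; monotone or erratic timing widens the PVI gap and lowers the score.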
wherein evaluating the test speech according to the feature parameters of the test speech and the feature parameters of the standard speech to obtain an evaluation result comprises: comparing the stress-position difference between the test speech and the standard speech according to the stress-position feature parameters of the test speech and of the standard speech, and computing a stress score of the test speech from the stress-position difference;

the server being further configured to guide the user's spoken pronunciation according to the evaluation result to obtain pronunciation guidance, to feed the pronunciation guidance back to the mobile client over the network, and to display the pronunciation guidance through the mobile client.

9. The online spoken-language pronunciation quality evaluation system of claim 8, characterized in that the system further comprises a web management terminal connected to the server over a network, and the server further comprises a database and a statistical analysis unit;

the database being configured to store the evaluation results;

the statistical analysis unit being configured to perform statistical analysis on the evaluation results to obtain statistical results, and to send the statistical results to the web management terminal;

the web management terminal being configured to display the received statistical results.
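Claim 6's intonation scoring compares the per-frame pitch contours of the test and standard speech with the DTW algorithm. A textbook dynamic-time-warping distance is sketched below; the mapping from DTW distance to a 0-100 intonation score is an assumption for illustration, as the patent does not disclose its formula.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two per-frame pitch
    contours (sequences of Hz values), with absolute-difference cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of insertion, deletion, and match moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def intonation_score(test_pitch, std_pitch):
    """Map the length-normalized DTW pitch distance to a 0-100 score.
    This mapping is illustrative, not taken from the patent."""
    d = dtw_distance(test_pitch, std_pitch) / max(len(std_pitch), 1)
    return 100.0 / (1.0 + d)
```

DTW lets contours of different lengths align elastically, so a learner who speaks slower than the standard recording is compared on pitch shape rather than penalized for timing, which claim 6 already scores separately.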
CN201510102425.8A 2015-03-09 2015-03-09 A kind of online spoken language pronunciation quality evaluating method and system Expired - Fee Related CN104732977B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510102425.8A CN104732977B (en) 2015-03-09 2015-03-09 A kind of online spoken language pronunciation quality evaluating method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510102425.8A CN104732977B (en) 2015-03-09 2015-03-09 A kind of online spoken language pronunciation quality evaluating method and system

Publications (2)

Publication Number Publication Date
CN104732977A CN104732977A (en) 2015-06-24
CN104732977B true CN104732977B (en) 2018-05-11

Family

ID=53456816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510102425.8A Expired - Fee Related CN104732977B (en) 2015-03-09 2015-03-09 A kind of online spoken language pronunciation quality evaluating method and system

Country Status (1)

Country Link
CN (1) CN104732977B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101630448A (en) * 2008-07-15 2010-01-20 上海启态网络科技有限公司 Language learning client and system
CN102054375A (en) * 2009-11-09 2011-05-11 康俊义 main system of teaching language proficiency
CN102800314A (en) * 2012-07-17 2012-11-28 广东外语外贸大学 English sentence recognizing and evaluating system with feedback guidance and method of system
CN103617799A (en) * 2013-11-28 2014-03-05 广东外语外贸大学 Method for detecting English statement pronunciation quality suitable for mobile device
CN103928023A (en) * 2014-04-29 2014-07-16 广东外语外贸大学 Voice scoring method and system
CN104050965A (en) * 2013-09-02 2014-09-17 广东外语外贸大学 English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
CN204117590U (en) * 2014-09-24 2015-01-21 广东外语外贸大学 Voice collecting denoising device and voice quality assessment system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2148340C (en) * 1995-05-01 2004-12-07 Gianni Di Pietro Method and apparatus for automatically and reproducibly rating the transmission quality of a speech transmission system
CN1592236A (en) * 2003-09-03 2005-03-09 华为技术有限公司 Method and device for testing speech quality


Also Published As

Publication number Publication date
CN104732977A (en) 2015-06-24


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180511