CN106653055A - On-line oral English evaluating system - Google Patents
On-line oral English evaluating system
- Publication number
- CN106653055A (application CN201610912307.8A)
- Authority
- CN
- China
- Prior art keywords
- module
- audio
- time
- section
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention relates to the field of education automation, and in particular to an on-line oral English evaluating system. The system comprises a speech preprocessing module, a convolutional neural network analysis module, and an assessment and feedback module. The speech preprocessing module randomly divides the spoken English audio file to be evaluated into slices of equal length. The convolutional neural network analysis module applies a short-time Fourier transform to each audio slice to generate a corresponding two-dimensional time-frequency map, then performs high-level abstraction on each map to obtain the high-level abstract features of the slice. The assessment and feedback module analyzes the high-level abstract features of each slice with a machine learning model to obtain a score per slice, then averages all scores to obtain the final spoken English assessment score.
Description
Technical field
The present invention relates to the field of education automation, and in particular to an online spoken English assessment system.
Background technology
Products for online spoken-language testing already exist on the market, but all of them currently adopt the following method: the student's spoken audio is first converted into text by speech recognition, the recognized text is then subjected to feature analysis, and finally a machine learning algorithm produces the student's spoken-language assessment result. The greatest problems with this method lie in the speech recognition stage and the subsequent feature analysis stage. First, high-accuracy English speech recognition engines are expensive to develop, and at present only large technology companies such as Google and IBM, or research institutions, possess them. Second, the recognition result determines everything that follows, yet current English speech recognition technology is only sufficiently accurate for standard pronunciation; for learners whose pronunciation is not yet accurate (for example, Chinese learners of English), recognition results remain unsatisfactory. Finally, the feature analysis stage requires experts in spoken English teaching and examination to design the features, which consumes considerable manpower and material resources, and the results are still poor.
Content of the invention
Object of the invention: the present invention improves upon the problems of the prior art described above. The invention discloses an online spoken English assessment system which, without using English speech recognition technology and without depending on experts in spoken English teaching and examination, evaluates and scores a learner's spoken English, achieving accuracy and robustness equal to or better than that of existing methods.
Technical scheme: an online spoken English assessment system, comprising the following modules (a minimal sketch of the overall pipeline follows this list):
a speech preprocessing module, for randomly dividing the spoken English audio file to be evaluated into slices of equal length;
a convolutional neural network analysis module, for applying a short-time Fourier transform to each audio slice to generate a corresponding two-dimensional time-frequency map, then performing high-level abstraction on each map to obtain the high-level abstract features of the slice;
an assessment and feedback module, for analyzing the high-level abstract features of each audio slice with a machine learning model to obtain a score for each slice, then averaging all scores to obtain the final spoken English assessment score.
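The Python sketch below illustrates this pipeline under stated assumptions: a 16 kHz mono WAV input longer than one slice, a hypothetical file name `response.wav`, and a stubbed-out `score_slice` standing in for the convolutional network and machine learning model described later; none of these specifics come from the patent itself.

```python
# Pipeline sketch: random equal-length slicing, per-slice scoring, averaging.
import numpy as np
from scipy.io import wavfile

SLICE_SECONDS = 5  # slice duration stated in the patent

def random_slices(audio, sr, n_slices=10):
    """Cut n_slices windows of equal length from random positions
    (assumes the recording is longer than one slice)."""
    slice_len = SLICE_SECONDS * sr
    starts = np.random.randint(0, len(audio) - slice_len, size=n_slices)
    return [audio[s:s + slice_len] for s in starts]

def score_slice(audio_slice):
    """Placeholder for the CNN feature extractor + machine learning scorer."""
    return 0.0  # hypothetical stub

sr, audio = wavfile.read("response.wav")   # hypothetical input file
scores = [score_slice(s) for s in random_slices(audio, sr)]
final_score = float(np.mean(scores))       # final spoken English score
```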
Further, each randomly divided audio slice has a duration of 5 seconds.
Further, the speech preprocessing module comprises the following modules (a preprocessing sketch follows this list):
a speech analysis module, for randomly dividing the spoken English audio file to be evaluated into slices of equal length, then applying pre-emphasis, framing, windowing and endpoint detection to all audio slices;
a speech signal processing module, for performing, in order, time-domain analysis, frequency-domain analysis and cepstral-domain analysis on all audio slices;
an acoustic parameter analysis module, for analyzing and computing the acoustic parameters of each audio slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
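A minimal sketch of the speech analysis module's front end, assuming 16 kHz audio and common textbook parameters (pre-emphasis coefficient 0.97, 25 ms frames with a 10 ms hop, a Hamming window, and a simple energy threshold for endpoint detection); none of these values are specified in the patent.

```python
import numpy as np

def preprocess(audio, sr=16000, frame_ms=25, hop_ms=10, alpha=0.97):
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high frequencies.
    emphasized = np.append(audio[0], audio[1:] - alpha * audio[:-1])
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len
    frames = np.stack([emphasized[i * hop_len:i * hop_len + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)   # windowing
    # Crude endpoint detection: keep frames above an energy threshold.
    energy = np.sum(frames ** 2, axis=1)
    return frames[energy > 0.1 * energy.mean()]  # threshold is an assumption
```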
Further, the speech signal processing module comprises the following modules (a time-domain feature sketch follows this list):
a time-domain analysis module, for analyzing and extracting the time-domain feature parameters of each audio slice;
a frequency-domain analysis module, for extracting the spectrum, power spectrum, cepstrum and spectral envelope of each audio slice by means of band-pass filter banks, the short-time Fourier transform, frequency-domain pitch detection and time-frequency representations;
a cepstral-domain analysis module, for analyzing and extracting the cepstral-domain feature parameters of each audio slice, and further separating glottal excitation information from vocal tract response information by homomorphic processing: the glottal excitation information is used to distinguish voiced from unvoiced sounds and to estimate the pitch period, while the vocal tract response information is used to estimate formants and serves speech coding, synthesis and recognition.
Further, the time-domain feature parameters include short-time energy, short-time average magnitude, short-time average zero-crossing rate, short-time autocorrelation coefficients and the short-time average magnitude difference function.
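A minimal per-frame sketch of these time-domain parameters in NumPy; the exact normalizations are assumptions, since the patent does not specify them.

```python
import numpy as np

def time_domain_features(frame):
    n = len(frame)
    energy = np.sum(frame ** 2)            # short-time energy
    magnitude = np.mean(np.abs(frame))     # short-time average magnitude
    # Short-time average zero-crossing rate
    zcr = np.sum(np.abs(np.diff(np.sign(frame)))) / (2.0 * n)
    # Short-time autocorrelation coefficients at lags 0..n-1
    autocorr = np.array([np.sum(frame[:n - k] * frame[k:]) for k in range(n)])
    # Short-time average magnitude difference function (AMDF) at lags 1..n-1
    amdf = np.array([np.mean(np.abs(frame[:n - k] - frame[k:]))
                     for k in range(1, n)])
    return energy, magnitude, zcr, autocorr, amdf
```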
Further, the convolutional neural network analysis module comprises the following modules (a network sketch follows this list):
an input module, for converting the audio slices into a number of two-dimensional time-frequency maps of identical scale;
a convolutional layer C1, for convolving the two-dimensional time-frequency maps produced by the input module with trainable filters and additive biases, yielding the local features of the time-frequency maps;
a feature map S2, for sampling the local features extracted by convolutional layer C1 and, through weights and biases, computing the maximum of each image-region feature, thereby aggregating and mapping the image features;
a convolutional layer C3, for convolving the image features produced by feature map S2 a second time, yielding low-dimensional local features of the image;
a feature map S4, for sampling the image features extracted by convolutional layer C3 and, through weights and biases, computing the mean of each image-region feature, completing the final aggregation and mapping of the image features;
an output module, for combining the acoustic parameters of each audio slice with the image features processed by feature map S4 and outputting the result as the overall feature of the slice.
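A minimal PyTorch sketch of the C1-S2-C3-S4 structure: convolution, max pooling (S2 aggregates region maxima), a second convolution, and average pooling (S4 aggregates region means). The channel counts, kernel sizes and 64x64 input resolution are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class TimeFreqCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(1, 8, kernel_size=5)    # C1: trainable filters + additive bias
        self.s2 = nn.MaxPool2d(2)                   # S2: region maximum, aggregation/mapping
        self.c3 = nn.Conv2d(8, 16, kernel_size=5)   # C3: second convolution, low-dim features
        self.s4 = nn.AvgPool2d(2)                   # S4: region mean, final aggregation

    def forward(self, x):            # x: (batch, 1, H, W) time-frequency maps
        x = torch.relu(self.c1(x))
        x = self.s2(x)
        x = torch.relu(self.c3(x))
        x = self.s4(x)
        return x.flatten(1)          # high-level abstract feature vector per slice

features = TimeFreqCNN()(torch.randn(4, 1, 64, 64))  # e.g. four 64x64 maps
```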
Beneficial effects: the online spoken English assessment system disclosed by the invention has the following advantages:
1. Low cost: it requires neither speech recognition technology nor feature design by experts in spoken English teaching and examination.
2. Strong robustness: since it does not rely on speech recognition results, it also scores the spoken English of non-native speakers with high accuracy.
3. Strong extensibility: as data accumulate, the system continues to learn by itself and performs extremely well on large amounts of data.
Description of the drawings
Fig. 1 is a schematic diagram of the convolutional neural network analysis module.
Specific embodiments
Specific embodiments of the present invention are described in detail below.
An online spoken English assessment system comprises the following modules:
a speech preprocessing module, for randomly dividing the spoken English audio file to be evaluated into slices of equal length;
a convolutional neural network analysis module, for applying a short-time Fourier transform to each audio slice to generate a corresponding two-dimensional time-frequency map, then performing high-level abstraction on each map to obtain the high-level abstract features of the slice;
an assessment and feedback module, for analyzing the high-level abstract features of each audio slice with a machine learning model to obtain a score for each slice, then averaging all scores to obtain the final spoken English assessment score.
Further, each randomly divided audio slice has a duration of 5 seconds; the sketch below shows how such a slice can be turned into a time-frequency map.
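A minimal sketch of generating the two-dimensional time-frequency map from one 5-second slice with `scipy.signal.stft`; the window length and overlap are assumptions.

```python
import numpy as np
from scipy.signal import stft

def time_frequency_map(audio_slice, sr=16000):
    # Short-time Fourier transform -> complex spectrogram Z (freq x time)
    f, t, Z = stft(audio_slice, fs=sr, nperseg=512, noverlap=384)
    return np.log1p(np.abs(Z))   # log-magnitude map: the 2-D "image" for the CNN

tf_map = time_frequency_map(np.random.randn(5 * 16000))  # one 5 s slice at 16 kHz
```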
Further, the speech preprocessing module comprises the following modules (an acoustic parameter sketch follows this list):
a speech analysis module, for randomly dividing the spoken English audio file to be evaluated into slices of equal length, then applying pre-emphasis, framing, windowing and endpoint detection to all audio slices;
a speech signal processing module, for performing, in order, time-domain analysis, frequency-domain analysis and cepstral-domain analysis on all audio slices;
an acoustic parameter analysis module, for analyzing and computing the acoustic parameters of each audio slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
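A minimal sketch of extracting the named acoustic parameters with librosa. MFCCs are available directly; LPCC and line spectral pair coefficients would be derived from the LPC coefficients shown here, a further step only hinted at. The slice file name, the choice of 13 coefficients and LPC order 12 are all assumptions.

```python
import librosa

y, sr = librosa.load("slice.wav", sr=16000)         # hypothetical 5 s slice file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # Mel-frequency cepstral coefficients
lpc = librosa.lpc(y, order=12)  # LPC coefficients: the basis from which LPCC and
                                # line spectral pairs would be computed
```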
Further, the speech signal processing module comprises the following modules (a cepstral analysis sketch follows this list):
a time-domain analysis module, for analyzing and extracting the time-domain feature parameters of each audio slice;
a frequency-domain analysis module, for extracting the spectrum, power spectrum, cepstrum and spectral envelope of each audio slice by means of band-pass filter banks, the short-time Fourier transform, frequency-domain pitch detection and time-frequency representations;
a cepstral-domain analysis module, for analyzing and extracting the cepstral-domain feature parameters of each audio slice, and further separating glottal excitation information from vocal tract response information by homomorphic processing: the glottal excitation information is used to distinguish voiced from unvoiced sounds and to estimate the pitch period, while the vocal tract response information is used to estimate formants and serves speech coding, synthesis and recognition.
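A minimal sketch of homomorphic (cepstral) analysis on one voiced frame: the real cepstrum separates slowly varying vocal tract information (low quefrency) from glottal excitation (high quefrency), and a cepstral peak gives the pitch period. The 60-400 Hz pitch search range is an assumption.

```python
import numpy as np

def real_cepstrum(frame):
    spectrum = np.fft.rfft(frame)
    return np.fft.irfft(np.log(np.abs(spectrum) + 1e-10))  # log-magnitude -> cepstrum

def pitch_period(frame, sr=16000, fmin=60, fmax=400):
    c = real_cepstrum(frame)
    lo, hi = sr // fmax, sr // fmin        # lag range for plausible pitch
    lag = lo + int(np.argmax(c[lo:hi]))    # high-quefrency peak -> pitch period
    return lag / sr                        # period in seconds
```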
Further, the time-domain feature parameters include short-time energy, short-time average magnitude, short-time average zero-crossing rate, short-time autocorrelation coefficients and the short-time average magnitude difference function.
Further, as shown in Fig. 1, the convolutional neural network analysis module comprises the following modules (the output module's combination step is sketched after this list):
an input module, for converting the audio slices into a number of two-dimensional time-frequency maps of identical scale;
a convolutional layer C1, for convolving the two-dimensional time-frequency maps produced by the input module with trainable filters and additive biases, yielding the local features of the time-frequency maps;
a feature map S2, for sampling the local features extracted by convolutional layer C1 and, through weights and biases, computing the maximum of each image-region feature, thereby aggregating and mapping the image features;
a convolutional layer C3, for convolving the image features produced by feature map S2 a second time, yielding low-dimensional local features of the image;
a feature map S4, for sampling the image features extracted by convolutional layer C3 and, through weights and biases, computing the mean of each image-region feature, completing the final aggregation and mapping of the image features;
an output module, for combining the acoustic parameters of each audio slice with the image features processed by feature map S4 and outputting the result as the overall feature of the slice.
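A minimal sketch of the output module's combination step: the CNN feature vector of a slice is concatenated with its acoustic parameter vector to form the overall slice feature (both inputs are assumed to be arrays produced as in the earlier sketches).

```python
import numpy as np

def combine_features(cnn_features, acoustic_params):
    # Flatten and concatenate: [CNN abstraction | MFCC/LPCC/LSP parameters]
    return np.concatenate([np.ravel(cnn_features), np.ravel(acoustic_params)])
```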
The embodiments of the present invention have been described in detail above. The present invention is not, however, limited to these embodiments: within the knowledge possessed by those of ordinary skill in the art, various changes may be made without departing from the concept of the invention.
Claims (6)
1. An online spoken English assessment system, characterized in that it comprises the following modules:
a speech preprocessing module, for randomly dividing the spoken English audio file to be evaluated into slices of equal length;
a convolutional neural network analysis module, for applying a short-time Fourier transform to each audio slice to generate a corresponding two-dimensional time-frequency map, then performing high-level abstraction on each map to obtain the high-level abstract features of the slice;
an assessment and feedback module, for analyzing the high-level abstract features of each audio slice with a machine learning model to obtain a score for each slice, then averaging all scores to obtain the final spoken English assessment score.
2. The online spoken English assessment system according to claim 1, characterized in that each randomly divided audio slice has a duration of 5 seconds.
3. The online spoken English assessment system according to claim 1, characterized in that the speech preprocessing module comprises the following modules:
a speech analysis module, for randomly dividing the spoken English audio file to be evaluated into slices of equal length, then applying pre-emphasis, framing, windowing and endpoint detection to all audio slices;
a speech signal processing module, for performing, in order, time-domain analysis, frequency-domain analysis and cepstral-domain analysis on all audio slices;
an acoustic parameter analysis module, for analyzing and computing the acoustic parameters of each audio slice, the acoustic parameters including Mel-frequency cepstral coefficients, linear prediction cepstral coefficients and line spectral pair coefficients.
4. The online spoken English assessment system according to claim 3, characterized in that the speech signal processing module comprises the following modules:
a time-domain analysis module, for analyzing and extracting the time-domain feature parameters of each audio slice;
a frequency-domain analysis module, for extracting the spectrum, power spectrum, cepstrum and spectral envelope of each audio slice by means of band-pass filter banks, the short-time Fourier transform, frequency-domain pitch detection and time-frequency representations;
a cepstral-domain analysis module, for analyzing and extracting the cepstral-domain feature parameters of each audio slice, and further separating glottal excitation information from vocal tract response information by homomorphic processing: the glottal excitation information is used to distinguish voiced from unvoiced sounds and to estimate the pitch period, while the vocal tract response information is used to estimate formants and serves speech coding, synthesis and recognition.
5. The online spoken English assessment system according to claim 4, characterized in that the time-domain feature parameters include short-time energy, short-time average magnitude, short-time average zero-crossing rate, short-time autocorrelation coefficients and the short-time average magnitude difference function.
6. The online spoken English assessment system according to claim 1, characterized in that the convolutional neural network analysis module comprises the following modules:
an input module, for converting the audio slices into a number of two-dimensional time-frequency maps of identical scale;
a convolutional layer C1, for convolving the two-dimensional time-frequency maps produced by the input module with trainable filters and additive biases, yielding the local features of the time-frequency maps;
a feature map S2, for sampling the local features extracted by convolutional layer C1 and, through weights and biases, computing the maximum of each image-region feature, thereby aggregating and mapping the image features;
a convolutional layer C3, for convolving the image features produced by feature map S2 a second time, yielding low-dimensional local features of the image;
a feature map S4, for sampling the image features extracted by convolutional layer C3 and, through weights and biases, computing the mean of each image-region feature, completing the final aggregation and mapping of the image features;
an output module, for combining the acoustic parameters of each audio slice with the image features processed by feature map S4 and outputting the result as the overall feature of the slice.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610912307.8A CN106653055A (en) | 2016-10-20 | 2016-10-20 | On-line oral English evaluating system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610912307.8A CN106653055A (en) | 2016-10-20 | 2016-10-20 | On-line oral English evaluating system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106653055A true CN106653055A (en) | 2017-05-10 |
Family
ID=58856223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610912307.8A Pending CN106653055A (en) | 2016-10-20 | 2016-10-20 | On-line oral English evaluating system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106653055A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107039049A (en) * | 2017-05-27 | 2017-08-11 | 郑州仁峰软件开发有限公司 | A kind of data assessment educational system |
CN107886968A (en) * | 2017-12-28 | 2018-04-06 | 广州讯飞易听说网络科技有限公司 | Speech evaluating method and system |
CN108364661A (en) * | 2017-12-15 | 2018-08-03 | 海尔优家智能科技(北京)有限公司 | Visualize speech performance appraisal procedure, device, computer equipment and storage medium |
CN112133312A (en) * | 2020-09-24 | 2020-12-25 | 上海松鼠课堂人工智能科技有限公司 | Spoken language training method and system based on deep learning |
CN113112990A (en) * | 2021-03-04 | 2021-07-13 | 昆明理工大学 | Language identification method of variable-duration voice based on spectrum envelope diagram |
CN114571472A (en) * | 2020-12-01 | 2022-06-03 | 北京小米移动软件有限公司 | Ground attribute detection method and driving method for foot type robot and device thereof |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101494049A (en) * | 2009-03-11 | 2009-07-29 | 北京邮电大学 | Method for extracting audio characteristic parameter of audio monitoring system |
CN101739868A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Automatic evaluation and diagnosis method of text reading level for oral test |
CN101826263A (en) * | 2009-03-04 | 2010-09-08 | 中国科学院自动化研究所 | Objective standard based automatic oral evaluation system |
CN103065626A (en) * | 2012-12-20 | 2013-04-24 | 中国科学院声学研究所 | Automatic grading method and automatic grading equipment for read questions in test of spoken English |
CN103617799A (en) * | 2013-11-28 | 2014-03-05 | 广东外语外贸大学 | Method for detecting English statement pronunciation quality suitable for mobile device |
CN104732977A (en) * | 2015-03-09 | 2015-06-24 | 广东外语外贸大学 | On-line spoken language pronunciation quality evaluation method and system |
CN104992705A (en) * | 2015-05-20 | 2015-10-21 | 普强信息技术(北京)有限公司 | English oral automatic grading method and system |
CN105825852A (en) * | 2016-05-23 | 2016-08-03 | 渤海大学 | Oral English reading test scoring method |
- 2016-10-20: application CN201610912307.8A filed (CN); published as CN106653055A; status active, Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101739868A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Automatic evaluation and diagnosis method of text reading level for oral test |
CN101826263A (en) * | 2009-03-04 | 2010-09-08 | 中国科学院自动化研究所 | Objective standard based automatic oral evaluation system |
CN101494049A (en) * | 2009-03-11 | 2009-07-29 | 北京邮电大学 | Method for extracting audio characteristic parameter of audio monitoring system |
CN103065626A (en) * | 2012-12-20 | 2013-04-24 | 中国科学院声学研究所 | Automatic grading method and automatic grading equipment for read questions in test of spoken English |
CN103617799A (en) * | 2013-11-28 | 2014-03-05 | 广东外语外贸大学 | Method for detecting English statement pronunciation quality suitable for mobile device |
CN104732977A (en) * | 2015-03-09 | 2015-06-24 | 广东外语外贸大学 | On-line spoken language pronunciation quality evaluation method and system |
CN104992705A (en) * | 2015-05-20 | 2015-10-21 | 普强信息技术(北京)有限公司 | English oral automatic grading method and system |
CN105825852A (en) * | 2016-05-23 | 2016-08-03 | 渤海大学 | Oral English reading test scoring method |
Non-Patent Citations (2)
Title |
---|
Wang Naifeng, "Research on audio feature extraction and scene recognition based on deep neural networks", China Master's Theses Full-text Database (Information Science and Technology) * |
Wang Yulin, "Research and design of a spoken English scoring system", China Master's Theses Full-text Database (Information Science and Technology) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107039049A (en) * | 2017-05-27 | 2017-08-11 | 郑州仁峰软件开发有限公司 | A kind of data assessment educational system |
CN108364661A (en) * | 2017-12-15 | 2018-08-03 | 海尔优家智能科技(北京)有限公司 | Visualize speech performance appraisal procedure, device, computer equipment and storage medium |
CN107886968A (en) * | 2017-12-28 | 2018-04-06 | 广州讯飞易听说网络科技有限公司 | Speech evaluating method and system |
CN112133312A (en) * | 2020-09-24 | 2020-12-25 | 上海松鼠课堂人工智能科技有限公司 | Spoken language training method and system based on deep learning |
CN114571472A (en) * | 2020-12-01 | 2022-06-03 | 北京小米移动软件有限公司 | Ground attribute detection method and driving method for foot type robot and device thereof |
CN114571472B (en) * | 2020-12-01 | 2024-01-23 | 北京小米机器人技术有限公司 | Ground attribute detection method and driving method for foot robot and device thereof |
CN113112990A (en) * | 2021-03-04 | 2021-07-13 | 昆明理工大学 | Language identification method of variable-duration voice based on spectrum envelope diagram |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106653055A (en) | On-line oral English evaluating system | |
CN103559892B (en) | Oral evaluation method and system | |
US9489864B2 (en) | Systems and methods for an automated pronunciation assessment system for similar vowel pairs | |
CN104732977A (en) | On-line spoken language pronunciation quality evaluation method and system | |
CN114373447A (en) | A method and system for scoring Chinese-English translation questions | |
CN105608960A (en) | Spoken language formative teaching method and system based on multi-parameter analysis | |
Liu et al. | AI recognition method of pronunciation errors in oral English speech with the help of big data for personalized learning | |
Ibrahim et al. | Quranic verse recitation feature extraction using Mel-frequency cepstral coefficients (MFCC) | |
Shufang | Design of an automatic English pronunciation error correction system based on radio magnetic pronunciation recording devices | |
Dave et al. | Speech recognition: A review | |
Yousfi et al. | Holy Qur'an speech recognition system Imaalah checking rule for warsh recitation | |
CN111341346A (en) | Language expression capability evaluation method and system for fusion depth language generation model | |
Alkhatib et al. | Building an assistant mobile application for teaching arabic pronunciation using a new approach for arabic speech recognition | |
KR20080018658A (en) | Voice comparison system for user selection section | |
Yousfi et al. | Isolated Iqlab checking rules based on speech recognition system | |
CN116543760A (en) | Oral English teaching evaluation method based on artificial intelligence | |
Adam et al. | Analysis of Momentous Fragmentary Formants in Talaqi-like Neoteric Assessment of Quran Recitation using MFCC Miniature Features of Quranic Syllables | |
Jing et al. | The speech evaluation method of English phoneme mobile learning system | |
Li et al. | English sentence pronunciation evaluation using rhythm and intonation | |
Bhadra et al. | Study on feature extraction of speech emotion recognition | |
CN117423260B (en) | Auxiliary teaching method based on classroom speech recognition and related equipment | |
Bansod et al. | Speaker Recognition using Marathi (Varhadi) Language | |
RU2589851C2 (en) | System and method of converting voice signal into transcript presentation with metadata | |
Hassan et al. | Pattern Classification in Recognising Idgham Maal Ghunnah Pronunciation Using Multilayer Perceptrons | |
Ma et al. | Optimization of Computer Aided English Pronunciation Teaching System Based on Speech Signal Processing Technology. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170510 |