Intelligent interactive language practice device and method thereof
Technical field
The present invention relates to language learning equipment and methods, and more particularly to a language practice device and practice method having an interactive function.
Background art
With the development and progress of the times, communication between people is no longer restricted by geography; besides domestic contacts, international exchanges have become increasingly frequent. The exchanged content has also broadened beyond ordinary import and export business to other types of exchange, such as investment and tourism. The old pattern of relying on translators can no longer keep pace with this development. Foreign language ability has therefore come to be regarded as an indispensable skill, and foreign language study has become commonplace.
When learning a language, the language environment is just as important as hard work. Children learn languages well largely because of their environment. Most language learners, however, currently study within their own country, and the lack of a foreign language environment is one of the most important obstacles to learning a language quickly and well.
To help people learn languages well, all kinds of learning aids have appeared on the market. They can roughly be divided into the following categories:
1. Electronic dictionaries: the user inputs a Chinese or English word, and the device provides the corresponding English or Chinese translation, possibly together with some explanations of the word; higher-end devices can also play the pronunciation of the word. Clearly, such a device remains no more than a dictionary; at best it offers a pronunciation demonstration, which falls far short of the user's needs.
2. Learning machines: such a device comprises an input device, a display device, and an audio output device; the user selects the content to learn, the device outputs it through the display device or audio output device, and the user reads along. Better-equipped devices can also capture the user's pronunciation, recognize and compare it, and then output a score informing the user, in numeric form, how accurately he or she pronounced.
To some extent, this second class of device creates a language learning environment for the learner and improves the efficiency and effect of language learning. However, because its interaction with the learner takes the form of a score only, the learner can judge the overall accuracy of his or her pronunciation but cannot find out where exactly a mispronunciation occurred. The interactivity of such learning devices therefore awaits improvement.
Summary of the invention
Accordingly, an object of the present invention is to provide an intelligent interactive language practice device and method with better interactivity, which create for the user a language environment closer to conversation with a real person.
In accordance with the above object, the present invention provides an interactive language practice method, comprising the following steps:
(a) providing a first voice library comprising at least one piece of first voice data;
(b) providing a voice model and grammar library comprising at least one piece of voice model recognition data;
(c) associating each piece of voice model recognition data with a piece of identification data;
(d) providing a second voice library comprising at least one piece of second voice data;
(e) associating the identification data with the second voice data in the second voice library;
(f) selecting a piece of first voice data from the first voice library and outputting it through an audio output device;
(g) receiving a learner's voice input and converting it into input voice data;
(h) performing speech recognition on the input voice data using the voice model and grammar library, matching it to a piece of voice model recognition data and thereby obtaining a piece of identification data;
(i) obtaining a piece of second voice data from the second voice library according to the identification data; and
(j) outputting the second voice data through the audio output device.
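By way of illustration only, the following minimal sketch shows one possible software realization of steps (a) through (j); the helper names (play_audio, record_input, recognize) and all library contents are hypothetical placeholders rather than part of the disclosed method.

```python
# A minimal sketch of steps (a)-(j); the helpers stand in for real audio
# hardware and a real speech recognizer.

first_voice_library = ["nuclear"]              # (a) guiding voice data
voice_model_library = {                        # (b)+(c) voice model -> identification data
    "nu:kli:ə": "00",                          # model of the correct pronunciation
    "nu:ku:lə": "01",                          # model of a typical mispronunciation
}
second_voice_library = {                       # (d)+(e) identification data -> feedback
    "00": "Correct. Well done.",
    "01": "The second and third sounds were wrong. Please try again.",
}

def play_audio(voice_data):
    print(f"[audio out] {voice_data}")         # stands in for a loudspeaker

def record_input():
    return "nu:ku:lə"                          # stands in for a microphone capture

def recognize(input_voice_data):
    # A real recognizer would score the input against every voice model and
    # return the identification data of the closest match; here an exact
    # match is assumed for simplicity.
    return voice_model_library.get(input_voice_data)

play_audio(first_voice_library[0])             # (f) output a guiding prompt
identification = recognize(record_input())     # (g)+(h) capture and recognize
feedback = second_voice_library.get(identification)
if feedback is not None:
    play_audio(feedback)                       # (i)+(j) output the feedback
```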
In the above method, the first voice library is a guiding voice library, and the first voice data are guiding voice data.
In the above method, the second voice library is a feedback voice library, and the second voice data are feedback voice data.
In the above method, a third voice library is further provided, comprising at least one piece of third voice data; the third voice data in the third voice library are associated with the first voice data in the first voice library; and after step (f), the method further comprises:
(f1) according to the first voice data that has been output, selecting a piece of third voice data from the third voice library through its association with the first voice data, and outputting it through the audio output device.
In the above method, the third voice library is an explanation voice library, and the third voice data are explanation voice data.
In the above method, whether step (f1) is performed is decided according to the user's selection.
In the above method, after step (g), the method further comprises:
(g1) storing the input voice data.
In the above method, after step (j), the method further comprises:
(k) outputting the first voice data through the audio output device once more, or outputting the input voice data stored in step (g1) through the audio output device.
In the above method, the method further comprises:
providing an exercise statement library comprising at least one piece of exercise statement display data, the exercise statement display data being associated with the first voice data in the first voice library; and
according to the first voice data, selecting a piece of exercise statement display data from the exercise statement library and showing it through a display device.
In the above method, the method further comprises:
providing a feedback display data library comprising at least one piece of feedback display data, the feedback display data being associated with the identification data; and
according to the identification data obtained in step (h), selecting a piece of feedback display data from the feedback display data library and showing it through a display device.
In the above method, the method further comprises:
associating the voice model recognition data with a piece of score data;
obtaining a piece of score data in step (h); and
showing the score data through a display device.
In the above method, the voice model recognition data comprise standard voice model recognition data, which are voice model recognition data regarded as correct pronunciation, and error voice model recognition data, which are voice model recognition data regarded as incorrect pronunciation.
In the above method, the first voice data are data in MP3 format or in OGG-Speex format, and the second voice data are data in MP3 format or in OGG-Speex format.
The present invention also provides an interactive language practice device, comprising:
a first voice library comprising at least one piece of first voice data;
a voice model and grammar library comprising at least one piece of voice model recognition data and identification data associated with the voice model recognition data;
a second voice library comprising at least one piece of second voice data, the second voice data being associated with the identification data;
a control device connected to the first voice library for selecting first voice data from the first voice library;
an audio output device connected to the control device and to the first voice library, which, according to the selection of the control device, obtains the first voice data from the first voice library and outputs it;
a voice input device for receiving the user's voice input and converting the voice input into input voice data; and
a recognition device connected to the voice input device for receiving the input voice data, performing speech recognition on the input voice data using the voice model and grammar library, and matching it to a piece of voice model recognition data to obtain a piece of identification data;
wherein the control device is also connected to the recognition device, receives the identification data, and selects second voice data from the second voice library according to the identification data; and
the audio output device is also connected to the second voice library and, according to the selection of the control device, obtains the second voice data from the second voice library and outputs it.
In the above device, the first voice library is a guiding voice library, and the first voice data are guiding voice data.
In the above device, the second voice library is a feedback voice library, and the second voice data are feedback voice data.
In the above device, the device further comprises:
a third voice library comprising at least one piece of third voice data, the third voice data in the third voice library being associated with the first voice data in the first voice library;
wherein the control device is also connected to the third voice library and, according to the first voice data, selects a piece of third voice data from the third voice library through its association with the first voice data; and
the audio output device is also connected to the third voice library and, according to the selection of the control device, obtains the third voice data from the third voice library and outputs it.
In the above device, the third voice library is an explanation voice library, and the third voice data are explanation voice data.
In the above device, the device further comprises:
an input device that receives the user's input for selecting the first voice data.
In the above device, the device further comprises:
an input voice storage device connected to the voice input device for storing the input voice data.
In the above device, the audio output device is connected to the input voice storage device for outputting the stored input voice data.
In the above device, the device further comprises:
an exercise statement library comprising at least one piece of exercise statement display data, the exercise statement display data being associated with the first voice library; and
a display device;
wherein the control device is connected to the display device and to the exercise statement library and, according to the first voice data, selects a piece of exercise statement display data from the exercise statement library and shows it through the display device.
In the above device, the device further comprises:
a feedback display data library comprising at least one piece of feedback display data, the feedback display data being associated with the identification data;
wherein the control device, according to the identification data, selects a piece of feedback display data from the feedback display data library and shows it through the display device.
In the above device, the voice model recognition data are also associated with score data; the recognition device obtains the score data, and the control device receives the score data from the recognition device and provides the score data to the display device for display.
In the above device, the voice model recognition data comprise standard voice model recognition data, corresponding to correct pronunciation, and error voice model recognition data, corresponding to incorrect pronunciation.
In the above device, the first voice data are data in MP3 format or in OGG-Speex format, and the second voice data are data in MP3 format or in OGG-Speex format.
As described above, the practice method and device of the present invention provide the user with an instant voice dialogue and give concrete voice feedback on the mistakes the learner makes. The user thus has, as it were, many foreign language teachers at hand, which effectively improves the learning environment and raises the interest in and efficiency of study.
Description of drawings
Fig. 1 is a structural diagram of the intelligent interactive language practice device of the present invention;
Fig. 2 to Fig. 6 are structural diagrams of variation examples of the intelligent interactive language practice device of the present invention.
Embodiments
Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be understood that the following description gives concrete examples only, intended to help understand and practice the present invention; these examples should not be taken as limiting the present invention, whose scope of protection is defined solely by the appended claims.
First, please refer to Fig. 1, which shows a structural block diagram of the interactive language practice device of the present invention. As shown in Fig. 1, a basic structure of the present invention comprises a first voice library 10, a second voice library 20, a voice model and grammar library 30, a control device 40, a recognition device 50, a voice input device 60, and an audio output device 70.
The first voice library 10 comprises at least one piece of first voice data. In the present invention, the first voice data may be guiding voice data, such as material the user is asked to repeat: "nuclear", or "I was in your shoes just a few years ago"; or it may be a question: "How old are you?"
The first voice data may adopt the commonly used MP3 format or the OGG-Speex audio format, or other audio formats such as WAV or AAC.
The second voice library 20 comprises at least one piece of second voice data. In the present invention, the second voice data are feedback voice data; what is meant by feedback will be described further below.
The voice model and grammar library 30 includes at least one piece of voice model recognition data, which corresponds to the first voice data in the first voice library 10. For example, for the word "nuclear" above, the voice model and grammar library 30 contains voice model recognition data for the correct pronunciation [nu:kli:ə] of the word. In the present invention, besides the voice model recognition data for the correct pronunciation corresponding to the first voice data, the library may also include voice model recognition data for mispronunciations. For the example of "nuclear", in addition to the correct pronunciation [nu:kli:ə], the voice model and grammar library 30 may also include voice model recognition data for some typical mispronunciations, for example [nu:ku:lə] or [nu:kelə].
As another example, for the sentence "I was in your shoes just a few years ago" above, besides the voice model recognition data for the correct reading of the sentence corresponding to its first voice data, the library may also include voice model recognition data for some typical mispronunciations or grammatical errors. Below are examples of voice model recognition data for sentences with grammatical errors:
I was in your shoes just a few years;
I was in your shoes just a years ago.
For the question example above, besides the pronunciation of the correct answer to the question serving as voice model recognition data, the pronunciations of some typical wrong answers may also serve as voice model recognition data.
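For illustration, the voice model and grammar library described above might be laid out as follows; this is a sketch only, using the pronunciations and error variants quoted in this description.

```python
# Illustrative layout of the voice model and grammar library: each prompt
# carries one correct model plus typical error models.
grammar_library = {
    "nuclear": {
        "correct": ["nu:kli:ə"],
        "errors": ["nu:ku:lə", "nu:kelə"],
    },
    "I was in your shoes just a few years ago": {
        "correct": ["I was in your shoes just a few years ago"],
        "errors": [
            "I was in your shoes just a few years",
            "I was in your shoes just a years ago",
        ],
    },
}
```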
Each piece of voice model recognition data is also associated with a piece of identification data. The identification data acts as a tag for the voice model recognition data, so that the feedback data for that voice model recognition data can be obtained through the identification data.
The feedback voice data contained in the second voice library 20 correspond to the identification data in the voice model and grammar library 30; that is, for each piece of identification data associated with a piece of voice model recognition data, a corresponding piece of feedback voice data can be found in the second voice library 20. The identification data may be represented by fixed codes. For example, corresponding to the identification data "00", the second voice library 20 holds a piece of feedback voice data whose pronunciation is "correct", so that the feedback voice data "correct" can be found in the second voice library 20 according to the identification data "00". Corresponding to the identification data "03", the second voice library 20 holds a piece of feedback voice data whose pronunciation is "grammatical error"; it may also include feedback voice data that point out the erroneous part more specifically. For example, for the voice model recognition data "I was in your shoes just a few years", the feedback voice data may be: "I hear that you said 'I was in your shoes just a few years' and missed 'ago'. Please try again." The voice model recognition data and the feedback voice data are associated through the identification data.
The number of pieces of identification data, and the content of the voice feedback, can be determined as needed from the actual course material. Some common voice feedback may include: correct pronunciation (or answer), mispronunciation, grammatical error, intonation error, stress error, and so on.
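As a sketch of the association just described, the identification data can serve as lookup keys into the feedback voice library; the codes "00" and "03" come from the example above, while the fallback message is an added assumption.

```python
# Identification data act as keys into the feedback voice library.
feedback_library = {
    "00": "Correct.",
    "03": "I hear that you said 'I was in your shoes just a few years' "
          "and missed 'ago'. Please try again.",
}

def feedback_for(identification_data):
    # Unknown codes fall back to a generic retry prompt (an assumption,
    # not something specified in this description).
    return feedback_library.get(identification_data, "Please try again.")
```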
The control device 40 is the core unit of the interactive language practice device; the whole device operates in coordination under its unified control. It is connected to the first voice library 10, selects a piece of first voice data from the first voice library 10, and provides it to the audio output device 70 connected thereto.
The audio output device 70 usually adopts an element such as a loudspeaker. For example, if the control device 40 has selected the first voice data for "nuclear", the audio output device 70 plays the correct pronunciation of "nuclear". Of course, the selection by the control device 40 may be preset in the device itself and carried out in sequence, or may be made by the user through another input device (for example a keyboard or mouse, not shown); such a selection structure belongs to the known art and is not explained in detail in the present embodiment.
The voice input device 60 usually adopts a sound-to-electricity conversion element such as a microphone; it receives the user's voice input and converts it into input voice data in electronic form. After the audio output device 70 has played a standard pronunciation, such as "nuclear", for the user to repeat, the user's repetition enters the device through the voice input device 60.
The recognition device 50 is connected to the voice input device 60 and receives the input voice data. It then performs speech recognition on the input voice data using the voice model and grammar library 30, identifies the closest voice model recognition data, obtains the identification data through the association, and provides the identification data to the control device 40. The voice model and grammar library 30 and the recognition device 50 in the present embodiment may use known techniques; for details see, for example, "Spoken Language Processing" (Prentice Hall PTR, 2001) and "Statistical Methods for Speech Recognition" (MIT Press, 1998).
The control device 40 then, according to this identification data, obtains the second voice data from the second voice library 20 by an associative lookup, and provides the retrieved second voice data to the audio output device 70, which sends the feedback to the user in audio form.
Below is an example of a user's learning session:
The control device 40, in a preset order or according to the user's selection, selects the first voice data for "nuclear" from the first voice library 10 and plays the correct pronunciation of "nuclear" to the user through the audio output device 70.
The device then waits for the user to repeat. If the user pronounces [nu:ku:lə], this pronunciation is converted into input voice data by the voice input device 60 and recognized in the recognition device 50, which identifies the recognition voice data for [nu:ku:lə] from the voice model and grammar library 30 and obtains the identification data associated with those recognition voice data.
The recognition device 50 then provides the identification data to the control device 40, which looks up and obtains the corresponding second voice data from the second voice library 20 according to the identification data. This second voice data may be: "I hear that the second and third sounds of your pronunciation were [ku:lə]; the pronunciation is wrong, please try again."
Other feasible variation examples of the present invention are described below.
Variation example one:
Please refer to Fig. 2. Compared with the embodiment of Fig. 1, the variation example of Fig. 2 adds a third voice library 80, which comprises at least one piece of third voice data; the third voice data are associated with the first voice data in the first voice library 10.
After the control device 40 has output, through the audio output device 70, the first voice data selected from the first voice library 10, it finds the associated third voice data from the third voice library 80 through the association, and provides the third voice data to the audio output device 70, which outputs it.
The third voice data may be explanation voice data. For example, the explanation voice data corresponding to the first voice data "nuclear" may be: "The meaning of this word is core, or nuclear; please repeat after me."
Whether the device uses the third voice data, that is, whether the explanation voice data are played, may be decided by the user through an input device such as a keyboard or mouse.
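A minimal sketch of this variation, with hypothetical names, might associate explanation voice data with the first voice data and honor the user's selection:

```python
# Sketch of variation example one: explanation voice data in the third
# voice library 80, keyed by the associated first voice data.
explanation_library = {
    "nuclear": "The meaning of this word is core, or nuclear; please repeat.",
}

def play_explanation(first_voice_data, user_enabled=True):
    # Playback is optional; the user's selection decides whether it occurs.
    if user_enabled and first_voice_data in explanation_library:
        print(f"[audio out] {explanation_library[first_voice_data]}")
```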
Variation example two:
Please refer to Fig. 3. Compared with the embodiment of Fig. 1, the variation example of Fig. 3 adds an input voice storage device 90, which is connected to the voice input device 60 and stores the input voice data converted and output by the voice input device 60. The control device 40, as needed (for example by default or by the user's selection), outputs the input voice data stored in the input voice storage device 90 through the audio output device 70.
For example, after the device has played the feedback voice data, it may, by default or according to the user's selection, output the input voice data stored in the input voice storage device 90 through the audio output device 70 so that the user can hear his or her own pronunciation; or, by default or according to the user's selection, it may output the first voice data (the guiding voice data) once more for the user to repeat again.
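A minimal sketch of this storage-and-replay behavior, assuming the recordings are represented as simple strings:

```python
# Sketch of variation example two: the input voice storage device 90 keeps
# the learner's recordings so that the latest one can be replayed on request.
stored_inputs = []

def store_input(input_voice_data):
    stored_inputs.append(input_voice_data)

def replay_last_input():
    if stored_inputs:
        print(f"[audio out] {stored_inputs[-1]}")   # learner hears own voice
```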
Variation example three:
Please refer to Fig. 4. Compared with the embodiment of Fig. 1, the variation example of Fig. 4 adds an exercise statement library 100 and a display device 110.
The exercise statement library 100 comprises at least one piece of exercise statement display data, which is associated with the first voice data in the first voice library. The control device 40, as needed (for example by default or by the user's selection), shows the exercise statement display data to the user through the display device 110.
For example, after the control device 40 has selected a piece of first voice data from the first voice library 10, before or after the first voice data is output through the audio output device 70, the control device 40, according to the selected first voice data and through the association, selects the associated exercise statement display data from the exercise statement library 100 and sends it to the display device 110 to be shown to the user.
Variation example four:
Please refer to Fig. 5. Compared with the embodiment of Fig. 4, the variation example of Fig. 5 replaces the exercise statement library 100 with a feedback display data library 120.
The feedback display data library 120 comprises at least one piece of feedback display data, which is associated with the identification data. In a concrete example, the content shown by the feedback display data may correspond to the content of the feedback voice data. The control device 40, as needed (for example by default or by the user's selection), obtains the corresponding feedback display data from the feedback display data library 120 according to the obtained identification data and the association, and then shows the feedback display data to the user through the display device 110 as a supplement to the feedback voice data.
Variation example five:
Please refer to Fig. 4. In the embodiment of Fig. 4, besides being associated with a piece of identification data, a piece of voice model recognition data may also be associated with a piece of score data. When performing speech recognition, the recognition device 50, besides obtaining the identification data, also obtains this score data at the same time through the association and provides it to the control device 40. The control device 40 may show this score data to the user through the display device 110; the score represents the user's result for this exercise.
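The score association might be sketched as follows; the numeric values are invented for illustration.

```python
# Sketch of variation example five: each voice model carries both a piece of
# identification data and a piece of score data, and the recognizer returns both.
model_ids = {"nu:kli:ə": "00", "nu:ku:lə": "01"}
model_scores = {"nu:kli:ə": 100, "nu:ku:lə": 60}    # invented example values

def recognize_with_score(matched_model):
    return model_ids.get(matched_model), model_scores.get(matched_model, 0)
```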
Variation example six:
Please refer to Fig. 6. In the embodiment of Fig. 6, a prosodic analysis device 55 may also be added within the recognition device 50. The prosodic analysis device 55 judges whether the input voice data (that is, the learner's pronunciation) has problems in aspects such as stress, sentence intonation, and speaking rate. After performing this analysis, the prosodic analysis device 55 outputs a piece of identification data and provides it to the control device 40, and the control device 40 then, according to this identification data, obtains the second voice data from the second voice library 20 by an associative lookup. In the present embodiment, the prosodic analysis device 55 may adopt known techniques; for details see, for example, "Spoken Language Processing" (Prentice Hall PTR, 2001) and "Statistical Methods for Speech Recognition" (MIT Press, 1998).
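Real prosodic analysis examines stress and intonation contours; the following much-simplified sketch checks speaking rate only and maps the result to a piece of identification data, with hypothetical codes.

```python
# Greatly simplified stand-in for the prosodic analysis device 55: it
# inspects only the speaking rate and returns a piece of identification data.
def prosody_identification(word_count, duration_seconds):
    rate = word_count / duration_seconds     # words per second
    if rate > 4.0:
        return "10"   # hypothetical code: speech too fast
    if rate < 1.0:
        return "11"   # hypothetical code: speech too slow
    return "00"       # no prosodic problem detected
```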
Although some variation examples of embodiments of the present invention have been described separately above, it should be understood that these descriptions do not limit the present invention. The variation examples described above may also be combined with one another to form new variation examples; for example, variation example one may be combined with variation example two to form a new variation example. Since those skilled in the art, having understood the present invention, can readily derive all such combinations, they are not described one by one here, so as not to make the description overly complicated.