JP2004334164A

JP2004334164A - System for learning pronunciation and identification of English phonemes "l" and "r"

Info

Publication number: JP2004334164A
Application number: JP2003353101A
Authority: JP
Inventors: Toshimasa Ishihara; 敏正石原
Original assignee: Individual
Current assignee: Individual
Priority date: 2002-10-24
Filing date: 2003-10-14
Publication date: 2004-11-25

Abstract

PROBLEM TO BE SOLVED: With the prior art, learners find it difficult to achieve satisfactory results when learning to pronounce and discriminate the English phonemes "l" and "r", and even learners who want to study have no choice but to do so at the few places where such instruction is offered.

SOLUTION: The system stores, as model patterns, the time-varying characteristic patterns of the syllables "la" and "ra" uttered over several seconds with the "l" and "r" portions deliberately prolonged. It continuously displays these model patterns side by side on the same screen with the learner's imitation patterns, which are derived in real time from the frequencies of the learner's voice, and it can play back both voices on demand.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present device can be used as a pronunciation and discrimination training tool on computers at language-education sites, such as ordinary English conversation schools and the junior high schools, high schools, and universities where English is taught in its early stages, and also by individual learners in remote locations through an information and communication network linking computers.

In teaching the pronunciation of the English phonemes "l" and "r", conventional methods have relied on mechanical demonstrations using the hands, mouth, objects, diagrams, and the like, and on presenting a native speaker's model pronunciation as sound. Learners have learned by imitating what they saw and heard. However, such mechanical and auditory guidance alone has not given learners a pronunciation ability they find convincing. Moreover, almost no instruction has aimed at improving the ability to discriminate the two phonemes. Finally, because the area was technically unexplored, network systems have not been used for this kind of learning.

With the prior art, it has been difficult to give learners results they find fully satisfactory in training to pronounce and discriminate the phonemes "l" and "r". Three causes can be identified.

First, mastering the pronunciation of the two phonemes is very difficult for learners who know only Japanese. Neither phoneme exists in Japanese, so Japanese speakers have little or no ability to distinguish them by ear. Consequently, even if learners practice while listening to an instructor's model pronunciation, they cannot reach a level at which they are confident of the result. This resembles the difficulty that deaf people face when learning a language.

Second, the prior art placed little emphasis on discriminating the two phonemes, so dedicated discrimination training for them was never conducted at teaching sites. Thus, although accurate discrimination is required, learners had no way to acquire it.

Third, there were too few places where learners could obtain the knowledge, techniques, and information needed to acquire the ability to pronounce and discriminate the two phonemes. For learners in remote areas in particular, this goal was almost impossible to achieve. The following solutions address these problems.

The first cause can be addressed by compensating visually for the deficiency in auditory perception. The present device displays, as a model pattern on part of a computer screen, the phonetic feature patterns of the two phonemes (for example, waveform, frequency, and amplitude), which conventional techniques could not provide. The learner listens to the model voice and utters an imitation into the device. The input voice is analyzed in real time and immediately displayed as an imitation pattern, side by side with the model, on the rest of the same screen. Because the learner can easily compare and evaluate the two patterns visually, accurate feedback is obtained, and based on this feedback adequate pronunciation training is possible even when auditory information is scarce.
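As an illustration only, the side-by-side comparison could also be summarized as a number. The sketch below assumes that both the model and the imitation have already been converted by some analyzer into formant tracks (F1, F2, F3 in Hz) sampled at the same frame rate; the patent does not specify its analysis method, and `pattern_deviation` is a hypothetical helper.

```python
from statistics import mean

# One analysis frame: (F1, F2, F3) in Hz. This frame format is an assumption.
Frame = tuple[float, float, float]


def pattern_deviation(model: list[Frame], imitation: list[Frame]) -> tuple[float, ...]:
    """Mean absolute difference (Hz) for F1, F2 and F3 over the frames both patterns share."""
    n = min(len(model), len(imitation))
    if n == 0:
        raise ValueError("both patterns need at least one frame")
    # Compare frame by frame; trailing frames of the longer pattern are ignored.
    return tuple(
        mean(abs(model[i][k] - imitation[i][k]) for i in range(n))
        for k in range(3)
    )


# Hypothetical data: the learner's F3 is 200 Hz too low throughout.
model = [(700.0, 1100.0, 2700.0)] * 3
imitation = [(700.0, 1100.0, 2500.0)] * 3
print(pattern_deviation(model, imitation))  # (0.0, 0.0, 200.0)
```

A per-formant figure keeps the feedback aligned with what the learner sees on screen: here it points directly at F3 rather than at an overall score.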

The second cause can be resolved in two stages. In the first stage, the learner is made to grasp precisely the characteristic differences between the two phonemes from both a perceptual and an analytical viewpoint. In the second stage, training materials are organized according to the way people whose native language is English (hereinafter "native speakers") discriminate the phonemes. By training with these materials on the basis of the knowledge gained in the first stage, the learner acquires not merely the ability to discriminate the two phonemes but an ability similar to that of native speakers.

In the first stage, the device provides the assistance that a learner with almost no discrimination ability needs in order to learn perceptually, using processed speech, the features of and differences between the phonemes "l" and "r". Normally we produce and distinguish phonemes within 100 ms (milliseconds, i.e. thousandths of a second), sometimes within as little as 20 or 30 ms, and perceive them instantaneously. For beginners, however, acquiring and discriminating a difference between two phonemes they have never experienced is extremely hard. By stretching each phoneme in time to produce processed speech, the learner can readily grasp the perceptual features and differences of the two phonemes. Acquiring this knowledge is not enough in itself; it must be fully usable in real situations. The device therefore stores various processed sounds, basic sounds, and words for discrimination training; pairs of processed sounds, basic sounds, and words contrasting the two phonemes, to make comparison easy; and, in addition to the model patterns, many pronunciation examples by a large number of ordinary native speakers, for more practical use. The device also has a function for repeatedly playing these sounds.

In the second stage, the device trains learners to bring their discrimination of "l" and "r" closer to that of native speakers. To this end, it stores in advance a set of synthetic "la" and "ra" sounds, artificially synthesized according to native speakers' discrimination criteria, together with various discrimination-training formats that use them. For efficient training it also provides the following functions: presenting each synthetic sound in various ways; playing back each synthetic sound; letting the learner indicate, with a mouse or keyboard, whether the presented sound was "la" or "ra"; tallying the responses and scoring them as correct or incorrect; and reporting the results.
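The tallying and scoring functions might look like the following minimal sketch. The function name `score_responses` and the fields of the result are hypothetical; the patent does not describe its data format.

```python
from collections import Counter


def score_responses(presented: list[str], answered: list[str]) -> dict:
    """Tally 'la'/'ra' identification answers against the sounds actually presented."""
    if len(presented) != len(answered):
        raise ValueError("one answer is required per presented sound")
    confusion = Counter(zip(presented, answered))  # (presented, answered) -> count
    correct = sum(n for (p, a), n in confusion.items() if p == a)
    return {
        "total": len(presented),
        "correct": correct,
        "percent": 100.0 * correct / len(presented) if presented else 0.0,
        "confusion": dict(confusion),
    }


# Hypothetical session: one "ra" was heard as "la".
report = score_responses(["la", "ra", "ra", "la"], ["la", "la", "ra", "la"])
print(report["correct"], report["percent"])  # 3 75.0
```

Keeping the confusion counts, not just the percentage, lets a report show whether errors run mostly in one direction (for example "ra" heard as "la").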

To address the third cause, the device can provide the same assistance equally to learners in remote locations over a network system, that is, over communication means linking computers, such as the Internet or a LAN.

The device can present, visually and audibly, in real time and simultaneously, side by side on a single screen, the model patterns of the English phonemes "l" and "r", which do not exist in Japanese, and the learner's imitation patterns of those phonemes. Learners can therefore compare the two easily and evaluate themselves. As a result, they can compensate for their auditory limitations, train on their own, and readily learn how to pronounce both phonemes.

The device uses processed speech in discrimination training for "l" and "r", which makes the characteristic differences between the two phonemes easy to learn. To make learning and memorization efficient, it also exploits the properties of short-term memory by setting the playback timing of basic-sound pairs and word pairs. Further, a set of synthetic "la" and "ra" sounds, organized according to native speakers' discrimination criteria, is used as training material. As a result, learners can bring their own discrimination criteria closer to those of native speakers.

Moreover, the device does more than help learners acquire the characteristic differences between the two phonemes as mere knowledge. To make this knowledge usable in real situations, various training methods were devised and a range of discrimination-training tests were created. By taking these tests, learners can smoothly acquire a practical discrimination ability.

Finally, by using a network system, that is, communication between computers, the device lets a wide range of learners train in discrimination from home, whenever they wish and for as long as they like, rather than only at a limited place such as a school in the broad sense.

In acquiring the ability to pronounce and discriminate "l" and "r", the present invention supplements with visual information what auditory information alone cannot provide. Analytical information obtained mechanically with a voice analyzer (the detailed characteristic differences between the two sounds) is presented visually as patterns on a computer screen. As long as their eyesight is adequate, learners can therefore identify and learn the features of, and differences between, the two phonemes even if their auditory discrimination is insufficient. Further, through effective pronunciation and discrimination training using various functions and audio materials, learners can turn the differences they have learned from mere knowledge into a practical skill. The network system also allows learners to receive this training from home.

First, the figures are briefly described. FIGS. 1, 2, and 3 are schematic diagrams of the English syllables "la" and "ra", drawn on the basis of acoustic analysis. The horizontal axis is time t (in units of 1/1000 s, hereinafter "ms") and the vertical axis is frequency (Hz). To make the difference between the two sounds as clear as possible, the parameters were chosen to focus only on their characteristic parts. The values used are therefore approximate, and real speech is far more complex and variable.

Both sounds consist of three parts along the time axis: from 0 ms to 100 ms, only the phoneme "l" or "r" is pronounced; from 100 ms to 150 ms, the vowel "a" is added and "la" or "ra" is pronounced; and during the remaining 200 ms, only the vowel "a" is sustained for a relatively long time.
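The three-part time structure can be written down as a small lookup table, which is convenient when slicing an analyzed recording into its parts. This is a hypothetical sketch that simply encodes the schematic boundaries stated above (0 to 100 ms, 100 to 150 ms, then 200 ms of vowel, ending at 350 ms); real utterances will not follow them exactly.

```python
# Schematic time structure of "la" / "ra" (start ms, end ms, part).
SEGMENTS = [
    (0.0, 100.0, "consonant"),     # "l" or "r" alone
    (100.0, 150.0, "transition"),  # "la" or "ra": the vowel is added
    (150.0, 350.0, "vowel"),       # "a" alone for the remaining 200 ms
]


def segment_at(t_ms: float) -> str:
    """Return which schematic part of the syllable the time t_ms falls in."""
    for start, end, part in SEGMENTS:
        if start <= t_ms < end:
            return part
    raise ValueError(f"{t_ms} ms is outside the 0-350 ms schematic")


print([segment_at(t) for t in (20, 120, 300)])  # ['consonant', 'transition', 'vowel']
```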

F1, F2, and F3 are called formants; they are among the basic components of human speech. A voice usually has four or five formants, measured in hertz (Hz), labeled F1, F2, F3, and so on in order of increasing frequency. Only these three formants are used here to describe "la" and "ra"; the others are omitted as unimportant. FIG. 3 shows only F3; F1 and F2 are omitted for simplicity because they show similar temporal changes for both phonemes.

FIGS. 4 and 5 show how the English syllables "la" and "ra" actually appear as model patterns on a computer screen. In each, the "l" or "r" portion is deliberately prolonged beyond its natural length. FIGS. 6 and 7 are reference patterns for the phonemes in word-medial and word-final position: FIG. 6 shows "alive-arrive" and FIG. 7 shows "hail-four". FIG. 8 diagrams how each phoneme is heard to change over time.

In learning pronunciation, it is important to know how native speakers pronounce. As FIGS. 1 and 2 show, English, unlike Japanese, habitually pronounces a consonant on its own, as a single sound, for a relatively long time (roughly 80 ms to 150 ms). Learners of English should therefore keep this in mind and practice sustaining both phonemes as single sounds. Embodiments for acquiring the pronunciation of the two phonemes are described below, first for "l" and then for "r".

The method of this invention displays the results of acoustic analysis visually, so learners can grasp the results by sight even without auditory discrimination ability. As shown in FIG. 4, a native speaker pronounces the phoneme "l" continuously for about 2 to 3 seconds; the recording is analyzed and displayed as a model pattern on part of the computer screen. The learner, drawing on an instructor, a textbook, or pronunciation knowledge already acquired, imitates the model voice, which is played at the same time, and speaks into the computer through a microphone. The input is analyzed instantly and displayed as an imitation pattern on the rest of the screen (the lower half in FIG. 4), alongside the model pattern already shown. By matching the imitation pattern against the model, the learner can easily compare the two visually (and aurally if needed). Even a learner who cannot tell the two phonemes apart by ear can thus verify objectively, by sight alone, whether his or her pronunciation is good.

An important point here, as FIG. 1 shows, is that during the first 100 ms, while "l" is being pronounced, the difference between the F2 and F3 frequencies (hereinafter the "gap") stays nearly constant. Although it varies between individuals, the gap for native speakers averages about 1300 Hz to 1900 Hz. When a native speaker pronounces a word containing "l" with emphasis, this gap is sustained for about 80 ms to 150 ms. Learners should therefore practice until they can sustain a similar gap for some time.
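The sustained-gap criterion could be checked automatically on a learner's formant track. The following is an illustrative sketch, not the device's algorithm: it assumes per-frame F2 and F3 estimates (from any formant tracker) at a 5 ms frame step, and `sustained_gap_ms` is a hypothetical helper that returns the longest stretch over which F3 minus F2 stays within 1300 Hz to 1900 Hz.

```python
def sustained_gap_ms(f2: list[float], f3: list[float], frame_ms: float = 5.0,
                     lo: float = 1300.0, hi: float = 1900.0) -> float:
    """Longest stretch, in ms, over which the F2-F3 gap stays within [lo, hi] Hz.

    f2 and f3 are per-frame formant estimates in Hz; frame_ms is the
    (assumed) spacing between analysis frames.
    """
    best = run = 0
    for a, b in zip(f2, f3):
        if lo <= b - a <= hi:
            run += 1             # gap still in the "l" range: extend the current run
            best = max(best, run)
        else:
            run = 0              # gap left the range: start counting again
    return best * frame_ms


# Hypothetical track: 20 frames with a 1600 Hz gap, then the gap collapses.
f2 = [1000.0] * 30
f3 = [2600.0] * 20 + [2200.0] * 10
d = sustained_gap_ms(f2, f3)
print(d, 80.0 <= d <= 150.0)  # 100.0 True
```

A result between about 80 ms and 150 ms would correspond to the emphasized native-speaker "l" described above.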

Further, the present invention analyzes the learner's voice in real time and displays it as an imitation pattern. The learner can thus compare the two patterns instantly, find problems, and continue training smoothly at his or her own pace by trial and error, with help from an instructor as needed, until the desired gap and duration are achieved. Because learners can ultimately confirm their training results with their own eyes, they obtain results they are convinced of.

Once "l" can be pronounced, the learner moves on to practicing "la". The tongue and mouth move from the "l" position to the vowel position while the vowel is voiced; for example, voicing "a" yields the English syllable "la". A time analysis of "la", as FIG. 4 shows, reveals a transition first through "l", then "la", and finally "a". The learner practices repeatedly while watching this model pattern until a similar pattern is produced. To obtain the desired gap consistently, it is important to keep training while comparing the imitation with the model. The F2-F3 gap varies somewhat with the following vowel, but it is preferable to keep it wide, roughly as in the case of "a".

Next, the phoneme "r". FIG. 2 shows that only "r" is pronounced between 0 ms and 100 ms. The distinguishing feature of "r" is that its F2-F3 gap is much narrower than that of "l". The F2 frequency, however, does not differ notably between the two phonemes; for both it averages about 900 Hz to 1200 Hz. Therefore, when pronouncing "r", it is important to make the F3 frequency itself (note: not the gap) much lower than for "l", on average about 1500 Hz to 1800 Hz. Some native speakers, particularly women, exceed 2000 Hz, but for men F3 generally lies between 1500 Hz and 1600 Hz, and values above 1800 Hz are rare.
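A rough check based on the average values above might be sketched as follows. The thresholds are only the approximate ranges given in this section and would need adjusting for individual speakers (for example, the women whose "r" exceeds 2000 Hz); `rough_consonant_check` is a hypothetical helper, not a classifier disclosed in this patent.

```python
from statistics import mean


def rough_consonant_check(f2: list[float], f3: list[float],
                          r_f3_max: float = 1800.0,
                          l_gap: tuple[float, float] = (1300.0, 1900.0)) -> str:
    """Label a consonant portion 'r-like', 'l-like' or 'unclear' from mean F2/F3 (Hz)."""
    f2_avg, f3_avg = mean(f2), mean(f3)
    if f3_avg <= r_f3_max:                        # "r": F3 itself is low
        return "r-like"
    if l_gap[0] <= f3_avg - f2_avg <= l_gap[1]:   # "l": wide F2-F3 gap
        return "l-like"
    return "unclear"


print(rough_consonant_check([1000.0, 1050.0], [1600.0, 1650.0]))  # r-like
print(rough_consonant_check([1000.0, 1000.0], [2600.0, 2600.0]))  # l-like
```

The "r" test is applied to F3 itself and the "l" test to the gap, mirroring the distinction the text draws between the two phonemes.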

FIG. 5 shows, on part of the computer screen, the model pattern of "ra" in which, as with "l", the "r" portion is deliberately sustained for 2 to 3 seconds. The learner, drawing on an instructor, a textbook, or pronunciation knowledge already acquired, pronounces the sound while imitating the model voice played at the same time, and trains by the same process as for "l", checking the imitation pattern shown on the rest of the screen against the model. Once "r" can be pronounced, more practical training with vowels begins: moving the tongue and mouth from the "r" position to a vowel position, for example that of "a", while voicing "a" yields the English syllable "ra".

The present invention enables learners to evaluate their own pronunciation independently, by visual discrimination alone, even without the ability to distinguish the two phonemes by ear. With the prior art, learners had to rely on the evaluation of others (for example, an instructor). Even if their pronunciation was judged perfect, the evaluation came from someone else, so doubts such as "Is this really right?" remained and it was hard to be fully confident. The reason is that learners raised in a Japanese-speaking environment lose the ability to distinguish the two phonemes by ear soon after birth. The present invention, however, lets learners easily compare their own pronunciation with the model on screen at any time, even without that ability, so they can evaluate themselves objectively. By taking the necessary measures based on these evaluations and applying them in later practice, they can make steady progress.

Training can be performed in the same way as for basic sounds whether the phoneme is at the beginning, middle, or end of a word. The discussion so far has concerned basic sounds consisting of the two phonemes, but the phonemes form patterns similar to those shown above when they begin a word (for example, "lice" and "rice") and also when they occur in the middle or at the end. The word-initial case is identical to that of basic sounds and is not described further. FIG. 6 shows the word-medial case, "alive" and "arrive", and FIG. 7 the word-final case, "hail" and "four". For easy comparison, each figure shows "l" in the upper half and "r" in the lower half, and the portion between the two vertical lines is the phoneme "l" or "r". In word-medial position, the phonemes show exactly the same features as in word-initial position. In word-final position, F2 and F3 are both somewhat higher for "r", but the gap between them is almost the same as in initial and medial positions. Learners should keep this in mind when training. The device also stores many words in which the phonemes occur in initial, medial, or final position, as well as many word pairs that make the two phonemes easy to compare.

Next, the discrimination training device. The difference between "l" and "r" can be captured from two viewpoints: perceptually, through the senses, and analytically, through acoustic measurement. The present invention is therefore organized in three stages: learning the basis for discriminating the phonemes from each of these two viewpoints, and then carrying out training. In the first stage, the learner learns where the perceptual difference between "l" and "r" lies. In the second stage, the learner learns how native speakers discriminate the phonemes, using synthetic sounds created on the basis of the analytical differences. Learning the differences alone is of little use in real situations, so in the third stage the learner takes many training tests built from various audio materials to acquire a practical skill. Embodiments are described below for these three stages in order.

The first stage uses two methods. One uses processed speech made by stretching basic sounds in time (here, "basic sound" means a consonant + vowel syllable such as "le", "ro", "ble", or "blo"). The other uses pairs of basic sounds contrasting the two phonemes (for example, lu-ru and ple-pre) and word pairs (for example, play-pray). The two sounds in each pair are arranged to be played back at a specific time interval.

First, processed speech makes the characteristic difference between the two phonemes clear. With some exceptions, we normally distinguish, produce, and perceive a wide variety of sounds within very short intervals such as 20 ms or 30 ms. Accurately grasping so momentary a difference between "l" and "r" is almost impossible for learners without prior experience, such as Japanese speakers. One of the best solutions is to use processed speech in which basic sounds such as "la" and "ra" are stretched in time. This slows down how each sound changes along the time axis, presenting it in a form in which the difference is easy to perceive: momentary differences between the phonemes that would normally be missed or hard to hear become clearly audible. Learners can therefore easily learn the characteristic difference between the two phonemes.
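Time-stretched processed speech can be produced in many ways, and the patent does not say which was used. The sketch below uses plain windowed overlap-add (OLA), the simplest time-scale modification method: it changes duration without resampling, so the pitch is not shifted. Plain OLA smears periodic sounds somewhat; WSOLA or a phase vocoder would give cleaner results. The function name and default frame sizes are assumptions.

```python
import math


def ola_stretch(x: list[float], factor: float, frame: int = 512,
                hop_out: int = 128) -> list[float]:
    """Stretch signal x in time by `factor` (> 1 = slower) with windowed overlap-add.

    The input is treated as zero-padded past its end, so the last few
    milliseconds of the output fade out.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    hop_in = hop_out / factor          # read position advances 1/factor as fast as write position
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]  # Hann
    out_len = int(len(x) * factor)
    y = [0.0] * (out_len + frame)
    norm = [0.0] * (out_len + frame)   # sum of window weights at each output sample
    m = 0
    while m * hop_out < out_len:
        src = int(round(m * hop_in))   # where this frame is read from the input
        dst = m * hop_out              # where it is written in the output
        for i in range(frame):
            if src + i < len(x):
                y[dst + i] += window[i] * x[src + i]
            norm[dst + i] += window[i]
        m += 1
    # Divide out the window overlap so a steady input keeps its level.
    return [y[n] / norm[n] if norm[n] > 1e-8 else 0.0 for n in range(out_len)]


# Example: slow a one-second 220 Hz test tone sampled at 16 kHz to twice its length.
tone = [math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)]
print(len(ola_stretch(tone, 2.0)))  # 32000
```

Applied to a recorded "la" or "ra", a factor of two or more would slow the consonant-to-vowel transition enough to make it audible.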

When the two sounds are actually heard, the phoneme "l" changes rapidly, whereas the phoneme "r" changes gradually. FIG. 8 illustrates this difference schematically. The horizontal axis represents time (ms), and the vertical axis represents how each phoneme changes. The vertical axis has two levels: the phoneme "l" or "r" at one, and the vowel "a" at the other. The span between them shows how each phoneme starts from its own articulation and passes through the sound "la" or "ra" into the vowel "a". The solid line shows the transition for "l", and the dotted line shows the transition for "r". As the figure shows, the characteristic difference between the two phonemes lies in this transition from each phoneme to the vowel "a". The phoneme "l" first approaches the vowel rapidly, then changes asymptotically and finally reaches the vowel. The phoneme "r", by contrast, first changes only gradually, then changes rapidly in the latter half before finally reaching the vowel. Comparing these contrasting behaviors precisely by analysis is difficult because of the limited precision of measuring instruments and differences between speakers. As a rough figure, however, "l" comes close to the vowel in about one-half to one-third of the time that "r" takes. In FIG. 8, "l" approximates the vowel "a" over 15 ms (from 100 ms to 115 ms), whereas "r" takes 40 ms (from 100 ms to 140 ms).
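
The toy model below makes the contrast in FIG. 8 concrete. It draws each transition as a normalized progress curve running from 0 (the consonant) to 1 (the vowel "a"). The curve shapes are assumptions chosen only to match the qualitative description, and they are not measured formant data:

- "l" follows an exponential approach, tuned to reach 95% of the vowel at 115 ms.
- "r" follows a slow-then-fast quadratic that reaches the vowel at 140 ms.

```python
import math

ONSET_MS = 100.0                 # both transitions start at 100 ms, as in FIG. 8
TAU_L = 15.0 / math.log(20.0)    # assumed: /l/ curve reaches 95% at 115 ms
SPAN_R = 40.0                    # assumed: /r/ curve reaches the vowel at 140 ms


def progress_l(t_ms):
    """Toy /l/ curve: rapid start, then asymptotic approach to the vowel."""
    if t_ms <= ONSET_MS:
        return 0.0
    return 1.0 - math.exp(-(t_ms - ONSET_MS) / TAU_L)


def progress_r(t_ms):
    """Toy /r/ curve: slow start, rapid change in the latter half."""
    if t_ms <= ONSET_MS:
        return 0.0
    return min(1.0, ((t_ms - ONSET_MS) / SPAN_R) ** 2)


def time_to_reach(curve, level=0.95, step_ms=0.1, t_max_ms=400.0):
    """First time (ms) at which curve(t) >= level, scanning in small steps."""
    n_steps = int(round((t_max_ms - ONSET_MS) / step_ms))
    for i in range(n_steps + 1):
        t = ONSET_MS + i * step_ms
        if curve(t) >= level:
            return t
    return None


t_l = time_to_reach(progress_l)
t_r = time_to_reach(progress_r)
ratio = (t_l - ONSET_MS) / (t_r - ONSET_MS)
```

With a 95% criterion, the "l" curve gets there after about 15 ms and the "r" curve after about 39 ms. That is a ratio of roughly 0.38, inside the one-half to one-third range given above.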

Second, learning is made efficient by using training materials matched to the capacity of human memory. Two words that differ only in these phonemes (for example, flame and frame) are made into a pair of audio materials in which the two words are played with an interval of 1/2 s to 1 s between them. The timing is further set so that the pair can be played twice within 7 to 8 seconds. Pairs of basic sounds are created in the same way as word pairs. Their playback timing, however, is set fast enough that the pair can be played two or three times within 7 to 8 seconds. Word pairs and basic-sound pairs prepared in this way are presented continuously and repeatedly as training material. This method consolidates the difference between the two phonemes learned from the processed speech, and the characteristics of short-term memory allow it to be retained effectively.
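
A small scheduling sketch can check these timing constraints. Only the 1/2 s to 1 s interval within a pair and the 7 to 8 s window come from the text. The stimulus durations, the gap between repetitions and the function names are hypothetical values for illustration. The window defaults to the conservative 7 s.

```python
def pair_schedule(stim_ms, isi_ms, pair_gap_ms, repeats):
    """Onsets for a pair (A, B) played `repeats` times.

    stim_ms: durations of the two items [A, B] in ms (assumed values)
    isi_ms: silence between A and B within a pair (0.5-1 s in the text)
    pair_gap_ms: silence between successive repetitions (assumed value)
    Returns ([(onset_ms, item_index), ...], total_duration_ms).
    """
    events, t = [], 0
    for rep in range(repeats):
        for idx in (0, 1):
            events.append((t, idx))
            t += stim_ms[idx]
            if idx == 0:
                t += isi_ms
        if rep < repeats - 1:
            t += pair_gap_ms
    return events, t


def fits_window(total_ms, window_ms=7000):
    """True if the whole presentation fits the short-term-memory window."""
    return total_ms <= window_ms


# Word pair (e.g. flame / frame) played twice, with assumed durations.
word_events, word_total = pair_schedule([600, 650], isi_ms=750, pair_gap_ms=1500, repeats=2)
# Basic-sound pair played three times, with assumed durations.
syl_events, syl_total = pair_schedule([300, 300], isi_ms=500, pair_gap_ms=800, repeats=3)
```

Under these assumed durations, the word pair played twice takes 5.5 s and the syllable pair played three times takes 4.9 s. Both fit inside the window.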

The second stage rests on the fact that native speakers discriminate the two phonemes in a characteristic way with respect to frequency. Bringing the learner's mode of discrimination closer to theirs is therefore important for acquiring practical ability. As the description so far has shown, the key difference between the two phonemes appears most clearly in how F3 changes over time (between 0 ms and 150 ms in FIGS. 1 and 2). Suppose F1 and F2 are held constant in an experiment and the frequency at which F3 starts (hereinafter the "initial value") is varied. Native speakers then divide the stimuli at a certain frequency, on average somewhere between 2200 Hz and 2400 Hz. They identify initial values below it as "r" and values above it as "l". A learner who wants to acquire a native-like mode of discrimination should therefore acquire this native identification criterion.
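
A common way to locate such a category boundary is to find where the proportion of "l" responses crosses 50% along the F3-onset continuum. The sketch below finds that crossover by linear interpolation. The 50% criterion is a standard psychophysical convention assumed here, not a rule stated in the text. The response proportions in the example are invented for illustration.

```python
def category_boundary(onsets_hz, prop_l):
    """Estimate the r/l boundary as the F3 onset where P('l') crosses 0.5.

    onsets_hz must be increasing, and prop_l[i] is the proportion of 'l'
    responses at onsets_hz[i]. The function interpolates linearly between
    the two continuum steps that bracket 50%.
    """
    points = list(zip(onsets_hz, prop_l))
    for (f0, p0), (f1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return f0 + (0.5 - p0) * (f1 - f0) / (p1 - p0)
    raise ValueError("responses never cross 50% 'l'")


def classify(f3_onset_hz, boundary_hz):
    """Native-like rule: onsets below the boundary are 'r', others 'l'."""
    return "r" if f3_onset_hz < boundary_hz else "l"


onsets = [1600, 1850, 2100, 2350, 2600, 2850]
invented_prop_l = [0.0, 0.05, 0.2, 0.8, 0.95, 1.0]   # illustrative only
boundary = category_boundary(onsets, invented_prop_l)
```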

With this in mind, the present invention sets the F3 parameters shown in FIG. 3 using artificially synthesized speech, in which the initial value is easy to change. The range of initial values from a minimum of 1600 Hz to a maximum of 2850 Hz was divided into five equal steps of 250 Hz. This gives six initial values in total: 1600 Hz, 1850 Hz, 2100 Hz, 2350 Hz, 2600 Hz and 2850 Hz. A time course is then applied as in FIGS. 1 and 2:

- From 0 ms to 100 ms, each initial value is held.
- From 100 ms to 150 ms, F3 changes linearly from that initial value to 2400 Hz.
- From 150 ms to 350 ms, F3 stays at 2400 Hz.

FIG. 3 collects the six F3 parameter tracks organized in this way in a single figure. F1 and F2 parameters common to both phonemes were added to each of these six F3 tracks, and a total of six artificial voices were synthesized. To make the synthesized sounds closer to natural speech, the synthesis may also take into account the choice of formant parameter values, the number of synthesized sounds, and so on.
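
The F3 schedule just described can be written down directly. The sketch below generates the six F3 tracks of FIG. 3 at a hypothetical 5 ms frame rate, the kind of parameter table a formant synthesizer would consume. The text gives no numeric values for F1 and F2, so they are left out. The function names are illustrative.

```python
ONSETS_HZ = [1600 + 250 * k for k in range(6)]  # 1600, 1850, ..., 2850 Hz
TARGET_HZ = 2400.0                              # common F3 value of the vowel


def f3_track(onset_hz, t_ms):
    """F3 (Hz) at time t_ms for one member of the FIG. 3 continuum."""
    if t_ms < 0 or t_ms > 350:
        raise ValueError("stimulus spans 0-350 ms")
    if t_ms <= 100:
        return float(onset_hz)                  # steady initial value, 0-100 ms
    if t_ms <= 150:
        frac = (t_ms - 100) / 50.0              # linear glide, 100-150 ms
        return onset_hz + frac * (TARGET_HZ - onset_hz)
    return TARGET_HZ                            # steady vowel, 150-350 ms


def f3_contour(onset_hz, frame_ms=5):
    """F3 sampled every frame_ms (hypothetical synthesizer frame rate)."""
    return [f3_track(onset_hz, t) for t in range(0, 351, frame_ms)]


contours = {f: f3_contour(f) for f in ONSETS_HZ}
```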

Before these synthesized voices can serve as discrimination training material, native speakers must classify each of them again as "la" or "ra". For this classification, the initial values should preferably be set so that the two categories contain equal numbers of voices. The synthesized "la" and "ra" voices classified in this way then serve as the material for actual discrimination training. For convenience, this description assumes that native speakers identified 1600 Hz, 1850 Hz and 2100 Hz as "ra" and the higher values, 2350 Hz, 2600 Hz and 2850 Hz, as "la". Each synthesized voice is classified accordingly, as shown in FIG. 3, and the resulting voices are used as discrimination training material.
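
The re-classification step can be sketched as a majority vote over native listeners' judgments for each F3 onset, followed by the balance check recommended above. The vote data structure, the tie rule (collect more judgments) and the function names are assumptions. The example votes are invented so that they reproduce the split assumed in the text.

```python
from collections import Counter


def label_by_majority(responses):
    """Map {onset_hz: ['l', 'r', ...]} from native listeners to {onset_hz: 'la'/'ra'}."""
    labels = {}
    for onset_hz, votes in sorted(responses.items()):
        counts = Counter(votes)
        if counts["l"] == counts["r"]:
            raise ValueError(f"tie at {onset_hz} Hz: collect more judgments")
        labels[onset_hz] = "la" if counts["l"] > counts["r"] else "ra"
    return labels


def is_balanced(labels):
    """True if the continuum yields as many 'la' as 'ra' tokens."""
    counts = Counter(labels.values())
    return counts["la"] == counts["ra"]


invented_votes = {            # illustrative only, five listeners per token
    1600: ["r"] * 5,
    1850: ["r"] * 4 + ["l"],
    2100: ["r"] * 3 + ["l"] * 2,
    2350: ["l"] * 4 + ["r"],
    2600: ["l"] * 5,
    2850: ["l"] * 5,
}
labels = label_by_majority(invented_votes)
```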

The third stage requires training that turns the knowledge gained in the first stage into an ability that works in real situations. In everyday life, we understand instantly and reliably what is being said even when many people are talking, each with their own voice quality. The training must be designed and carried out with this in mind. The present invention therefore registers in the apparatus training materials built from the pronunciations of many native speakers in both the first and second stages. The apparatus holds pronunciation examples from just under 50 native speakers. These materials include processed speech, basic sounds, words and artificially synthesized speech, each registered as audio material for training tests either as single sounds or as pairs. The words are selected and registered so that each can always be paired with a counterpart differing only in "l" versus "r", for example "late-rate" or "glass-grass". Simple and complex sentences containing one or more words with "l" or "r" are also registered as material.

The format of each training test depends on the type and form of the sound presented. The present invention therefore provides a separate test procedure for single sounds, for pairs and for sentences.

For single sounds, one sound is played in each 1400 ms to 1500 ms slot. The learner answers whether the sound contained the phoneme "l" or "r" by clicking the corresponding answer option on the screen with the mouse. The reaction time taken to answer is recorded in milliseconds at the moment of the answer. Feedback on whether the answer was correct is then given for 1000 ms, either as a sound signal or as an illustration. To hear the sound again before answering, the learner can click the "Replay" button as many times as needed. The test does not advance to the next trial until an answer is given, and the next trial begins as soon as the feedback ends.
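
The single-sound trial loop described above can be sketched as follows. Playback, input and feedback are passed in as callables so that the logic can be tested without audio hardware. These callables and the `TrialResult` record are assumptions, not details from the text. So is the choice to time from the first playback, which means replays count toward the reaction time.

```python
import time
from dataclasses import dataclass


@dataclass
class TrialResult:
    stimulus: str
    correct_answer: str   # 'l' or 'r'
    answer: str           # learner's answer, 'l' or 'r'
    rt_ms: float          # reaction time in milliseconds
    replays: int          # how many times "Replay" was pressed

    @property
    def correct(self) -> bool:
        return self.answer == self.correct_answer


def run_single_trial(stimulus, correct_answer, play, get_response, feedback,
                     clock=time.perf_counter):
    """Run one single-sound trial.

    play(stimulus) starts playback, get_response() returns 'l', 'r' or 'replay',
    and feedback(is_correct) shows the correct/incorrect cue (1000 ms in the text).
    The trial does not end until 'l' or 'r' is given.
    """
    start = clock()                 # timing starts at the first playback
    play(stimulus)
    replays = 0
    while True:
        response = get_response()
        if response == "replay":
            replays += 1
            play(stimulus)
        elif response in ("l", "r"):
            break                   # any other input is ignored
    rt_ms = (clock() - start) * 1000.0
    result = TrialResult(stimulus, correct_answer, response, rt_ms, replays)
    feedback(result.correct)
    return result
```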

For pairs, one pair is played in each 2800 ms to 3000 ms slot. The answer format depends on how the two sounds in the pair are related:

- If the two items are pronounced identically except for the phoneme "l" or "r" (for example "lock-rock" or "lice-rice"), the learner judges whether they contain the "same" phoneme or "different" ones, and clicks the corresponding field on the screen with the mouse.
- For other kinds of pairs (for example "lack-rick" or "rack-lick"), the learner judges whether the order of the phonemes in the two sounds is "l-r", "r-l", "r-r" or "l-l", and clicks the corresponding option.
- Alternatively, the learner attends to the phoneme "l" and judges whether the sound containing it was played "before" or "after" the other, answering in the same way.

Reaction time, replay, feedback and the transition between trials work as for single sounds.
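
All three answer formats for pair trials come down to computing an answer key from the phonemes of the two items. The mode names below are hypothetical labels for the three formats described above.

```python
def pair_answer_key(phonemes, mode):
    """Correct answer for a pair trial.

    phonemes: ('l' or 'r', 'l' or 'r') for the first and second item.
    mode:
      'same_different' - minimal pairs such as lock-rock: 'same' or 'different'
      'order'          - e.g. lack-rick: 'l-r', 'r-l', 'r-r' or 'l-l'
      'l_position'     - was the item containing /l/ played 'before' or 'after'?
    """
    first, second = phonemes
    if mode == "same_different":
        return "same" if first == second else "different"
    if mode == "order":
        return f"{first}-{second}"
    if mode == "l_position":
        if (first == "l") == (second == "l"):
            raise ValueError("l_position needs exactly one item with /l/")
        return "before" if first == "l" else "after"
    raise ValueError(f"unknown mode: {mode}")
```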

When the presented sound is a sentence containing only one of these phonemes, the learner judges whether it is "l" or "r" and clicks the phoneme name on the screen, as above. When the sentence contains several, the learner gives the order of the phonemes from the beginning, such as "l, l, ...", "r, l, ..." or "r, l, l, ...". The answer is entered either by selecting and clicking on the screen or by pressing the key for each phoneme on the keyboard. Reaction time, replay, feedback and the transition between trials work as for single sounds.
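
Scoring a sentence trial amounts to comparing the learner's ordered l/r sequence with the expected one. This sketch assumes the expected sequence comes from a stored annotation of each sentence rather than from its spelling. Spelling is an unreliable guide: the "l" in "walk", for example, is silent. The per-position count is an extra assumed here for partial credit.

```python
def score_sentence(expected, answer):
    """Score a sentence trial.

    expected, answer: ordered l/r sequences, e.g. 'rll' or ['r', 'l', 'l'];
    `expected` is assumed to come from the stored annotation of the sentence.
    Returns (all_correct, number_of_positions_matching).
    """
    expected, answer = list(expected), list(answer)
    hits = sum(e == a for e, a in zip(expected, answer))
    return expected == answer, hits
```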

A single-sound test uses about 20 to 30 sound types, each played four to six times during the test, so one test consists of roughly 100 to 130 playback trials. A pair test plays about 15 to 30 audio items two to five times each. Sentence tests vary with sentence length, but each is organized so that it can be completed in about 10 minutes. In every case, the playback order is random and is set so that the audio materials are never played in the same order twice, even if the same test is repeated.
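
Shuffling a list of repeated items gives a random order in which every item appears the same number of times. The sketch below also reshuffles if the result matches the previous session's order. Checking only the previous order is a simplification: guaranteeing that no order ever recurs would require storing the full history. The function name is hypothetical.

```python
import random


def build_trial_order(items, repeats, previous_order=None, rng=None):
    """Each item `repeats` times in random order, never equal to previous_order."""
    if rng is None:
        rng = random.Random()
    trials = [item for item in items for _ in range(repeats)]
    if previous_order is not None and len(set(trials)) < 2:
        raise ValueError("only one possible order; cannot avoid repeating it")
    while True:
        rng.shuffle(trials)
        if trials != previous_order:
            return list(trials)


rng = random.Random(1)
words = [f"w{i:02d}" for i in range(20)]           # 20 sound types
first = build_trial_order(words, 5, rng=rng)       # 100 trials
second = build_trial_order(words, 5, previous_order=first, rng=rng)
```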

Varying the sound types and the number of trials makes it possible to create a practically unlimited number of tests, so learners can choose the tests they prefer for training. Consider, for example, only the case in which words are presented as single sounds. Each test contains equal numbers of "l" sounds and "r" sounds, and the sound types are chosen so that they form l/r pairs. English as a whole has more than 120 word pairs in which "l" and "r" contrast in this way. Even the number of ways of drawing just 20 of them at random to build a 20-item test is therefore enormous. Adding basic sounds, pairs, synthesized speech, sentences and so on makes it possible to build a wide variety of training test sets. Learners can select the tests they like from this huge pool (in practice, about 200 tests are registered in the apparatus). They can then practice for as long as they wish without getting bored, which substantially raises training efficiency.
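
The number of possible item sets is a binomial coefficient. The passage can be read two ways: as drawing 20 pairs, or as building 20 items from 10 l/r pairs. The sketch below computes both readings. Either way, the count is astronomically large.

```python
from math import comb


def possible_tests(available_pairs, pairs_per_test):
    """Number of distinct item sets when drawing pairs_per_test of available_pairs."""
    return comb(available_pairs, pairs_per_test)


ten_pairs = possible_tests(120, 10)     # 20 words built from 10 l/r pairs
twenty_pairs = possible_tests(120, 20)  # 20 pairs drawn, as the text literally says
print(f"{ten_pairs:.2e} and {twenty_pairs:.2e} possible item sets")
```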

In the discrimination tests, raising the percentage of correct answers is not enough: the time needed to answer must also be shortened. Learners must be able to perceive the difference between the two phonemes accurately, but a judgment that takes several seconds is of no practical use. Answer time, or reaction time, therefore carries considerable weight in each discrimination test. Besides raising the correct-answer rate to about 100%, learners must bring their reaction time down to between 1000 ms and 1300 ms, the guideline used in this apparatus. Discrimination training continues until these guidelines are met consistently on average. The apparatus provides the functions needed for this training, including:

- answer-screen setup
- audio playback and presentation
- answer input
- reaction-time measurement
- replay
- feedback
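
The pass criterion can be sketched as a session summary. The 1300 ms ceiling is the upper end of the guideline given above. Two other choices are assumptions: the 95% accuracy threshold, which stands in for "about 100%", and computing the mean reaction time over correct trials only.

```python
from statistics import mean


def session_summary(results, min_accuracy=0.95, max_mean_rt_ms=1300.0):
    """Summarize one test session.

    results: list of (correct: bool, rt_ms: float), one per trial.
    Returns accuracy, mean RT over correct trials, and whether both targets are met.
    """
    if not results:
        raise ValueError("no trials")
    accuracy = sum(ok for ok, _ in results) / len(results)
    correct_rts = [rt for ok, rt in results if ok]
    mean_rt = mean(correct_rts) if correct_rts else float("inf")
    return {
        "accuracy": accuracy,
        "mean_rt_ms": mean_rt,
        "target_met": accuracy >= min_accuracy and mean_rt <= max_mean_rt_ms,
    }
```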

The apparatus can also deliver the same training to learners in geographically distant locations through a computer network. The files needed to start and run the apparatus are installed in advance on a server or host computer. They include:

- the "audio files" that make up the material
- the "test execution files" that record the procedures of the various training tests
- the application file that runs the tests

Each remote learner connects a computer to the server via the Internet or a LAN. The learner either installs the application on that computer beforehand or downloads and launches the application file. By following the instructions on the application screen, the learner automatically downloads the desired test execution file from the server and runs the test. Each test result is saved automatically as a file on the server, and learners can view their own results from the application whenever needed. The application file is configured to read the relevant files on the server automatically when it is opened. In this way, learners can receive the same training from home as they would without the network. Some delay is expected depending on communication speed, but future technical advances should be able to eliminate this drawback.
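
On the client side, this flow comes down to three steps: fetch a test execution file, run its trials, and send a result file back. The sketch below assumes a JSON test file containing a `test_id` and a list of `trials`, each with an `audio` file name and an answer `key`. It also assumes a hypothetical URL layout on the server. The text specifies none of these. Only standard-library calls (`urllib.request`, `json`) are used.

```python
import json
import urllib.request


def fetch_test_spec(base_url, test_id):
    """Download a test execution file (hypothetical URL layout and JSON schema)."""
    with urllib.request.urlopen(f"{base_url}/tests/{test_id}.json") as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_test(spec, run_trial):
    """Run every trial in spec; run_trial(audio) returns (answer, rt_ms)."""
    records = []
    for trial in spec["trials"]:
        answer, rt_ms = run_trial(trial["audio"])
        records.append({"audio": trial["audio"], "answer": answer,
                        "correct": answer == trial["key"], "rt_ms": rt_ms})
    return {"test_id": spec["test_id"], "learner": spec.get("learner"),
            "trials": records}


def upload_result(base_url, result):
    """POST the result record to the server (hypothetical endpoint)."""
    req = urllib.request.Request(
        f"{base_url}/results",
        data=json.dumps(result).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Because `run_test` works on an already-parsed specification, it can be exercised offline, separately from the network calls.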

As speech analysis accuracy, synthesis accuracy and display technology improve, the pronunciation and discrimination training methods will improve with them and can be carried out even more efficiently. The present invention can readily incorporate the results of such anticipated progress into its training material without changing the basic concept of the apparatus.

The basic concept of the present invention is not limited to the phonemes "l" and "r". It applies generally whenever learners of a foreign language must acquire new speech sounds that do not exist in their native language. As with "l" and "r", learners can readily grasp the analytical and perceptual features of a new sound once those features are converted, as needed, into a perceptible form of presentation. Building on that basic knowledge, pronunciation and discrimination training for the new phoneme can then be carried out in the same way as the training method adopted in this apparatus.

FIG. 1: Schematic diagram of the English sound "la"
FIG. 2: Schematic diagram of the English sound "ra"
FIG. 3: Schematic diagram of the F3 parameters of the artificially synthesized sounds
FIG. 4: Model pattern of the syllable "la"
FIG. 5: Model pattern of the syllable "ra"
FIG. 6: Model pattern when the phoneme "l" or "r" occurs in the middle of a word
FIG. 7: Model pattern when the phoneme "l" or "r" occurs at the end of a word
FIG. 8: Transition of the phoneme "l" or "r" into the vowel "a"

Explanation of reference numerals

Hz: frequency
t: time (unit: 1/1000 second)
F1: formant 1
F2: formant 2
F3: formant 3

Claims

This apparatus allows a learner studying the pronunciation of the English phonemes "l" and "r" to practice pronunciation while objectively comparing a native English speaker's model pattern with the learner's own imitation pattern, including the sound itself. For this purpose, the apparatus holds pre-registered model patterns: the characteristic patterns of the sounds "la" and "ra", uttered over a few seconds with the "l" and "r" portions drawn out in time for emphasis. The apparatus also has two functions. The first displays the model pattern and the learner's imitation pattern side by side on the same screen, continuously and in real time, following the frequency of the learner's voice. The second plays back either sound as needed.

This apparatus is a discrimination aid that emphasizes the characteristic differences between the English phonemes "l" and "r", presents them more clearly, and lets the learner quickly learn the features of both phonemes. For this purpose, the apparatus pre-registers processed versions of basic sounds (for example "li" or "re") that are spoken by native English speakers and contain the phoneme "l" or "r". The learner must also be able to compare the two phonemes, grasp their difference and retain it easily. To that end, many audio materials are registered in which processed sounds, basic sounds and words can be played singly or as pairs at specific time intervals; a pair may be, for example, li-ri or a pair of words differing only in "l" versus "r". To build more practical discrimination ability, sentences containing either or both phonemes are also registered as presentation material.

This apparatus is also a training device for bringing an English learner's ability to discriminate the phonemes "l" and "r" closer to that of native speakers. For this purpose, the apparatus generates in advance a number of artificially synthesized voices based on acoustic analysis of the English sounds "la" and "ra". It then registers these audio materials after native speakers have re-classified each of them as "la" or "ra" according to their own identification criteria.

This apparatus is also a training device for making efficient use of the registered materials described in claims 2 and 3. To that end, each of the following is designed to maximize efficiency for the purpose at hand: the organization of the materials, their presentation, the learner's answer method, the type and presentation of feedback, and the reporting of results.

Through a computer network, this apparatus offers learners in remote locations the same opportunities and support for improving their ability to discriminate English sounds as on-site training. The network may, for example, link computers over a wide area via the Internet or over a limited area via a LAN.

The speech training method adopted in this apparatus can also be applied when learners of a foreign language other than English need to acquire new speech sounds that do not exist in their own language. Following the basic concept of the present invention, the features of the new sound are analyzed perceptually and acoustically and presented to the learner in a form that can be understood by ear and by eye. Building on this basic knowledge, the learner then trains pronunciation and discrimination of the sound in the same way as in the method adopted in the present invention. The apparatus can thus readily serve the goal of acquiring a new sound without any change to its basic concept. As speech analysis technology advances, the apparatus can likewise incorporate new analysis results without changing the basic concept of the present invention.