JP4517838B2

JP4517838B2 - Audio processing device

Info

Publication number: JP4517838B2
Application number: JP2004352671A
Authority: JP
Inventors: 資之渋谷
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2004-12-06
Filing date: 2004-12-06
Publication date: 2010-08-04
Anticipated expiration: 2024-12-06
Also published as: JP2006162849A

Description

The present invention relates to a technique for recognizing human speech.

When practicing pronunciation in language learning, a common method is to play back a model voice recorded on a recording medium such as a CD (Compact Disk) and to pronounce by imitating that model voice. To make this kind of learning more effective, the learner needs to evaluate objectively the difference between the model voice and his or her own voice. However, merely listening to the model voice recorded on the CD and imitating it makes it difficult to grasp concretely whether one's own pronunciation is correct.

One technique for concretely recognizing whether one's own pronunciation is correct is disclosed in Patent Document 1. Patent Document 1 discloses a pronunciation practice device that detects nasal sounds, unvoiced sounds, voiced sounds, fricatives, and the like produced by a person, using a microphone provided for each sound, and displays the level of each sound by the number of lit LEDs (Light Emitting Diodes). With this device, the level of each sound, which is normally invisible, is made visible, so the user can grasp concretely whether the pronunciation is correct.

As a technique for recognizing sound, there is, for example, the technique disclosed in Patent Document 2. The device disclosed in Patent Document 2 has a microphone array on the focal plane of a parabolic reflector and detects the sound pressure distribution in front of the parabolic reflector based on the output signals from the microphones constituting the array. The measured sound pressure distribution is then compared with a correct sound pressure distribution stored in advance, and an abnormality of the sound is determined from the difference.
Patent Document 1: JP-A-11-352876
Patent Document 2: JP-A-6-113387

In the pronunciation practice device disclosed in Patent Document 1, in order to detect the user's voice correctly, the housing containing the microphones must be pressed against the face so as to cover the nose and mouth, and the microphone provided for each sound must be positioned in the immediate vicinity of the nose or lips. However, some users find having the nose and mouth covered in this way restrictive or uncomfortable, and because the device touches the face, some users also feel discomfort on hygiene grounds when the device is used by an unspecified number of people. If this discomfort leads the user to position the device incorrectly, the voice can no longer be recognized correctly. The device disclosed in Patent Document 2, for its part, uses a parabolic antenna and is therefore large, so it too is inconvenient to use.

The present invention has been made against this background, and its object is to provide a technique that enables recognition of a user's voice without causing the user inconvenience or discomfort.

In order to solve the above-described problems, the present invention provides a sound processing apparatus having: storage means for storing a sound pressure level distribution of a standard language sound and a frequency spectrum of the standard language sound; a microphone unit in which a plurality of microphones are arranged in an array; sound pressure level detection means for detecting, for each signal output from each microphone of the microphone unit, the sound pressure level of the sound represented by that signal; spectrum detection means for detecting the frequency spectrum of a signal output from a predetermined microphone of the microphone unit; sound pressure distribution detection means for obtaining, from the plurality of sound pressure levels obtained by the sound pressure level detection means, the sound pressure level distribution on the surface of the microphone unit on which the microphones are arranged; voice discrimination means for comparing the sound pressure level distribution obtained by the sound pressure distribution detection means and the frequency spectrum obtained by the spectrum detection means with the sound pressure level distribution and the frequency spectrum of the standard language sound stored in the storage means, and discriminating the language sound input to the microphone unit; and output means for outputting the discrimination result of the voice discrimination means.
In the present invention, the user speaks toward a microphone unit in which microphones are arranged in an array. The voice is input to the plurality of microphones, and the sound pressure level distribution and the frequency spectrum at the time of pronunciation are detected from the signals output by the microphones arranged in the array. Because the voice uttered by the user is discriminated from both the sound pressure level distribution and the frequency spectrum, the voice can be discriminated accurately, and the user does not need to hold the microphone unit against the face.

In a preferred aspect of the present invention, the sound processing apparatus has a light emitter associated with each of the plurality of microphones, and has control means for lighting the light emitter associated with each microphone based on the sound pressure level corresponding to that microphone detected by the sound pressure level detection means. In another preferred aspect, the control means may vary the illuminance of the light emitter according to the sound pressure level.
In another preferred aspect, the sound processing apparatus has display means for generating a sound pressure distribution image indicating the sound pressure level distribution obtained by the sound pressure distribution detection means and displaying that image. In yet another preferred aspect, the sound processing apparatus has voice request means for requesting the user, by at least one of an image and a sound, to utter a language sound, and determination means for determining whether the language sound discriminated by the voice discrimination means matches the language sound requested by the voice request means, and the output means outputs the determination result of the determination means.

According to the present invention, the user can easily have his or her pronunciation recognized without feeling discomfort.

[Configuration]
FIG. 1 is a diagram showing the configuration of a sound processing apparatus 1 according to an embodiment of the present invention. As shown in FIG. 1, the sound processing apparatus 1 is broadly divided into a microphone unit 100 and a control device 200.

As shown in FIG. 2, the microphone unit 100 includes a stand 140 and a substrate 130 attached to the stand 140. On the substrate 130, rectangular silicon microphones 110A-1, 110A-2, ..., 110A-16 through 110P-1, 110P-2, ..., 110P-16, which are connected to the audio input unit 220, are arranged in a grid of 16 by 16 as shown in FIG. 2. Between the silicon microphones in the vertical direction, LEDs 120A-1, 120A-2, ..., 120A-16 through 120P-1, 120P-2, ..., 120P-16, which are connected to the control unit 210, are arranged. Each silicon microphone on the substrate 130 converts the input sound into an electric signal and outputs it, and each LED is turned on or off under the control of the control unit 210. Since all of the silicon microphones have the same configuration, an individual silicon microphone is hereinafter referred to as a silicon microphone 110 when there is no particular need to distinguish it. For the same reason, an individual LED is referred to as an LED 120 when there is no particular need to distinguish it. The numbers of silicon microphones 110 and LEDs 120 are not limited to those described above, and other numbers may of course be used.

The audio input unit 220 functions as an interface that receives the electric signals output from the silicon microphones 110. It outputs all of the input electric signals to the sound pressure level detection unit 230 and outputs the electric signal from a predetermined silicon microphone to the voice detection unit 240. The sound pressure level detection unit 230 calculates, from each electric signal output from the audio input unit 220 (that is, for each electric signal output from each silicon microphone), the sound pressure level of the sound input to that silicon microphone, and outputs sound pressure data indicating the calculated sound pressure level to the CPU 211 of the control unit 210. The voice detection unit 240 samples the input electric signal and stores it as digital data, performs frequency analysis on this data by fast Fourier transform, and obtains the spectrum of the voice represented by the electric signal. The voice detection unit 240 then obtains the formant frequencies from this spectrum, compares them with formant frequencies stored in advance for each kind of sound, and determines which sound was pronounced by a technique such as pattern matching. Finally, the voice detection unit 240 generates pronunciation data indicating the discriminated sound and outputs this pronunciation data to the control unit 210.
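The two analyses described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the reference value, and the use of simple spectral peak picking as a stand-in for formant estimation are all assumptions, and a naive DFT is used in place of a fast Fourier transform only to keep the sketch free of external libraries.

```python
import cmath
import math


def sound_pressure_level_db(samples, reference=1.0):
    """Return the RMS level of `samples` in dB relative to `reference` (assumed unit)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Guard against log10(0) for a completely silent signal.
    return 20.0 * math.log10(max(rms, 1e-12) / reference)


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT gives the same values faster)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def peak_frequencies(spectrum, sample_rate, n_samples, count=2):
    """Return the `count` strongest local maxima in Hz, as rough formant candidates."""
    peaks = [
        (spectrum[k], k * sample_rate / n_samples)
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
    ]
    peaks.sort(reverse=True)
    return sorted(freq for _, freq in peaks[:count])
```

In practice, formant frequencies are usually estimated from a smoothed spectral envelope rather than from raw spectral peaks; the peak picking here only shows where such a step would sit in the processing chain.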

The control unit 210 includes a CPU 211, a ROM 212, a RAM 213, and an HDD (Hard Disk Drive) 214. The CPU 211 reads out and executes a program stored in the ROM 212 or a program stored in the HDD 214, and controls each part of the sound processing apparatus 1. The HDD 214 is a storage device that stores various application programs and data. It stores a pronunciation practice program that implements an application for practicing pronunciation, together with the data used by this application: voice data representing sounds such as vowels and consonants, sound pressure distribution data indicating the sound pressure distribution when sounds such as vowels and consonants are pronounced, threshold data for determining whether to light the LEDs 120, and so on.

The display unit 215 includes a display device such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display), and displays characters and images under the control of the CPU 211. The operation unit 216 includes a keyboard and a mouse (neither is shown). By operating the operation unit 216, the user can input various instructions to the control unit 210.

[Operation]
Next, the operation of the embodiment will be described. When the user operates the operation unit 216 to instruct execution of the pronunciation practice program, the CPU 211 reads the pronunciation practice program from the HDD 214 and executes it. When the pronunciation practice program is executed, a screen prompting the user to select the pronunciation to practice is displayed on the display unit 215, for example as illustrated in FIG. 4 (FIG. 3: step SA1). If an operation for displaying the next menu screen is performed on the operation unit 216 (step SA2: YES), the CPU 211 displays on the display unit 215, for example, a menu screen for practicing short vowels or a menu screen for practicing the pronunciation of words (step SA3). If an operation for ending the pronunciation practice program is performed (step SA4: NO, step SA13: YES), the CPU 211 ends execution of the program. If, on the other hand, the user performs an operation for selecting a sound to practice (step SA4: YES), the CPU 211 displays the correct mouth shape for pronouncing that sound together with its phonetic symbol, for example as illustrated in FIG. 5(a) (step SA5). The CPU 211 then waits for data to be output from the sound pressure level detection unit 230 and the voice detection unit 240 (step SA6).

When the user imitates the displayed mouth shape and pronounces the sound toward the microphone unit 100, each silicon microphone 110 in the microphone unit 100 converts the user's voice into an electric signal and outputs it to the audio input unit 220. When the electric signals output from the microphone unit 100 are input, the audio input unit 220 outputs all of them to the sound pressure level detection unit 230 and outputs the electric signal from a predetermined silicon microphone (for example, the silicon microphone 110H-8) to the voice detection unit 240.

The sound pressure level detection unit 230 generates sound pressure data starting with the electric signal corresponding to the silicon microphone 110A-1 and continuing in the order 110A-2, ..., 110A-16, then in the order of the silicon microphones 110B-1 to 110B-16, 110C-1 to 110C-16, ..., 110P-1 to 110P-16, and outputs the sound pressure data to the CPU 211 in the order in which it was generated. Meanwhile, the voice detection unit 240 samples the input electric signal and stores it as digital data, performs frequency analysis on this data by fast Fourier transform, and obtains the spectrum of the voice represented by the electric signal. It then discriminates the sound (language sound) pronounced by the user and outputs pronunciation data indicating the discriminated sound to the CPU 211.
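Because the sound pressure data arrives in a fixed order (A-1 through A-16, then B-1 through B-16, and so on up to P-16), a position in the stream identifies the microphone. The following sketch shows one way to express that mapping. The helper names are hypothetical, and treating each letter as one group of 16 consecutive values is an assumption drawn only from the output order described above; the text does not state whether letters correspond to rows or to columns of the grid.

```python
LETTERS = "ABCDEFGHIJKLMNOP"  # 16 letter groups, as in labels 110A-1 ... 110P-16
PER_GROUP = 16                # microphones per letter group


def label_to_index(label):
    """Map a label such as 'H-8' to its 0-based position in the output stream."""
    letter, number = label.split("-")
    return LETTERS.index(letter) * PER_GROUP + int(number) - 1


def index_to_label(index):
    """Inverse of label_to_index."""
    group, offset = divmod(index, PER_GROUP)
    return f"{LETTERS[group]}-{offset + 1}"


def to_grid(stream):
    """Split the sequential values into one list per letter group."""
    return [stream[i:i + PER_GROUP] for i in range(0, len(stream), PER_GROUP)]
```

For example, the microphone 110H-8 used for spectrum analysis in the embodiment would be the 120th value in the stream under this ordering.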

When the sound pressure data and the pronunciation data are input, the CPU 211 stores the sound pressure data in the RAM 213 in the order of input and also stores the pronunciation data in the RAM 213 (step SA7). Next, the CPU 211 first compares the sound pressure level represented by the sound pressure data for the silicon microphone 110A-1 with the value represented by the threshold data. If the sound pressure level is equal to or greater than the threshold value, the CPU 211 lights the LED 120A-1 located below the silicon microphone 110A-1; if the sound pressure level is less than the threshold value, it turns off the LED 120A-1. The CPU 211 then compares the sound pressure data for each silicon microphone 110 with the threshold data in the order stored in the RAM 213 and turns the LED 120 below each silicon microphone 110 on or off accordingly (step SA8).
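The per-microphone comparison in step SA8 amounts to a simple threshold test. A minimal sketch, assuming the sound pressure levels are held as a list in microphone order and using a hypothetical function name:

```python
def update_leds(sound_pressure_levels, threshold):
    """Return True (lit) or False (off) for each LED, in microphone order.

    An LED is lit when the level of its microphone is equal to or greater
    than the threshold, and turned off otherwise.
    """
    return [level >= threshold for level in sound_pressure_levels]
```

How the resulting on/off states are written to the LED hardware is outside the scope of this sketch.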

Next, the CPU 211 reads from the HDD 214 the sound pressure distribution data indicating the distribution of sound pressure levels when the sound selected in step SA4 is pronounced correctly. From the sound pressure data stored in the RAM 213 (the data indicating the sound pressure level of the sound input to each silicon microphone), the CPU 211 obtains the sound pressure level distribution on the surface of the microphone unit 100 and checks, by a method such as pattern matching, whether the obtained distribution matches the distribution indicated by the sound pressure distribution data (step SA9). If the sound pressure level distributions match (step SA9: YES), the CPU 211 next reads from the HDD 214 the voice data representing the sound selected in step SA4 when pronounced correctly and checks whether it matches the voice data stored in the RAM 213 (step SA10). If the voices match (step SA10: YES), the CPU 211 determines that the user pronounced the sound correctly and displays on the display unit 215 a screen showing the user's pronunciation and indicating that it was correct, as illustrated in FIG. 5(b) (step SA11).
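The two-stage check in steps SA9 and SA10 can be sketched as follows. The text only says "a method such as pattern matching", so the normalized correlation used here, the 0.9 acceptance value, and all function and parameter names are assumptions chosen for illustration.

```python
import math


def normalized_correlation(a, b):
    """Pearson-style correlation between two equal-length level distributions."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat distribution carries no shape to compare
    return sum(x * y for x, y in zip(da, db)) / denom


def judge_pronunciation(measured, reference, detected_sound, selected_sound,
                        min_correlation=0.9):
    """Return True only if both the distribution and the sound match."""
    # Step SA9: compare the measured distribution with the stored one.
    if normalized_correlation(measured, reference) < min_correlation:
        return False
    # Step SA10: compare the discriminated sound with the selected sound.
    return detected_sound == selected_sound
```

A correlation measure ignores overall loudness, which may or may not be desirable; a distance on absolute levels would be an equally plausible choice.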

On the other hand, if the determination is NO in step SA9 or NO in step SA10, that is, if the user's pronunciation is determined to be incorrect, the CPU 211 controls the display unit 215 to display a screen showing the user's pronunciation and indicating that it was not correct, for example as illustrated in FIG. 5(c) (step SA12).

As described above, according to the present embodiment, the microphone unit 100 can detect the sound pressure distribution that corresponds to a pronunciation, and combining this with spectrum analysis allows the voice to be discriminated accurately. In addition, because the microphone unit 100 that detects the user's voice can be used at a distance from the user, the user can use it without discomfort and without hygiene concerns.

[Modifications]
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various other forms. For example, the embodiment may be modified as follows.

The drive voltage of the LEDs may be controlled according to the sound pressure level represented by the sound pressure data, so that the illuminance of the LED 120 below each silicon microphone 110 varies according to the sound pressure level detected by that silicon microphone 110. Alternatively, a microphone unit 100 in which only the silicon microphones 110 are arranged may be placed in front of the display unit 215, with the sound pressure distribution displayed on the display unit 215. Compact silicon microphones have been developed in recent years, and arranging them in an array does not result in a bulky device like a parabolic antenna, so the microphone unit 100 can be placed in front of the display unit 215 in this way and the sound pressure distribution can be checked on the display unit 215.
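One way to vary the LED illuminance with the sound pressure level is a clamped linear mapping from level to drive voltage. The sketch below is only an illustration: the level range of 40 to 90 dB, the 0 to 3.3 V output range, and the function name are hypothetical values that do not come from the text.

```python
def led_drive_voltage(level_db, min_db=40.0, max_db=90.0, v_min=0.0, v_max=3.3):
    """Map a sound pressure level to an LED drive voltage (all ranges assumed)."""
    fraction = (level_db - min_db) / (max_db - min_db)
    # Clamp so that very quiet or very loud input stays within the drive range.
    fraction = min(1.0, max(0.0, fraction))
    return v_min + fraction * (v_max - v_min)
```

A real LED driver would more likely adjust current or a PWM duty cycle, and perceived brightness is not linear in either, so the mapping curve is a design choice.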

The sound pressure level distribution for a correct pronunciation may be displayed before the user pronounces the sound. Alternatively, LEDs capable of emitting multiple colors may be used, and the sound pressure level distribution for a correct pronunciation and the distribution obtained from the user's pronunciation may be lit in different colors, so that the user can see the difference between the two.
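A two-color display of this kind could be driven by classifying each microphone position by which of the two distributions exceeds the lighting threshold. The state names and the handling of overlapping positions below are assumptions; the text does not say what is shown where both distributions are above the threshold.

```python
def two_color_states(user_levels, reference_levels, threshold):
    """Return one display state per microphone position.

    'reference' and 'user' would be shown in two different colors; 'both'
    marks positions where the two distributions overlap (handling assumed).
    """
    states = []
    for user, reference in zip(user_levels, reference_levels):
        user_on = user >= threshold
        reference_on = reference >= threshold
        if user_on and reference_on:
            states.append("both")
        elif reference_on:
            states.append("reference")
        elif user_on:
            states.append("user")
        else:
            states.append("off")
    return states
```

Positions in the 'reference' or 'user' state are exactly where the user's distribution departs from the correct one, which is the difference the display is meant to make visible.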

In the embodiment described above, the user is shown whether the input voice matches a predetermined voice. Instead of determining a match with a predetermined voice, the apparatus may recognize the pronounced voice and display what it recognized. For example, the sound processing apparatus displays "thing" on the display unit 215 when it detects that "thing" was pronounced, and displays "sing" when it detects that "sing" was pronounced. If "sing" is displayed when the user intended to pronounce "thing", the user can tell that the pronunciation was not recognized as intended and was therefore incorrect.

The frequency spectrum of each language sound when pronounced correctly may be stored in the HDD 214 and compared with the frequency spectrum obtained by the voice detection unit 240 in order to discriminate the language sound pronounced by the user.
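Discrimination by comparison with stored spectra can be sketched as a nearest-template search. The similarity measure (cosine similarity), the function names, and the dictionary layout of the stored spectra are assumptions; the text does not specify how the spectra are compared.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length magnitude spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_spectrum(spectrum, templates):
    """Return the name of the stored spectrum most similar to `spectrum`.

    `templates` maps a sound name to its stored reference spectrum
    (a hypothetical layout for the data held on the HDD 214).
    """
    return max(templates, key=lambda name: cosine_similarity(spectrum, templates[name]))
```

A practical system would also reject inputs whose best similarity is too low instead of always returning the closest stored sound.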

FIG. 1 is a diagram showing the configuration of the sound processing apparatus according to an embodiment of the present invention.
FIG. 2 is an external view of the microphone unit 100.
FIG. 3 is a flowchart showing the flow of processing performed by the CPU 211.
FIG. 4 is a diagram showing an example of a screen displayed on the display unit 215.
FIG. 5 is a diagram showing examples of screens displayed on the display unit 215.

Explanation of symbols

100: microphone unit; 110A-1 to 110P-16: silicon microphones; 120A-1 to 120P-16: LEDs; 130: substrate; 140: stand; 200: control device; 210: control unit; 211: CPU; 212: ROM; 213: RAM; 214: HDD; 215: display unit; 216: operation unit; 220: audio input unit; 230: sound pressure level detection unit; 240: voice detection unit.

Claims

1. A sound processing apparatus comprising:
storage means for storing a sound pressure level distribution of a standard language sound and a frequency spectrum of the standard language sound;
a microphone unit in which a plurality of microphones are arranged in an array;
sound pressure level detection means for detecting, for each signal output from each microphone of the microphone unit, the sound pressure level of the sound represented by the signal;
spectrum detection means for detecting a frequency spectrum of a signal output from a predetermined microphone of the microphone unit;
sound pressure distribution detection means for obtaining, from the plurality of sound pressure levels obtained by the sound pressure level detection means, a sound pressure level distribution on the surface of the microphone unit on which the microphones are arranged;
voice discrimination means for comparing the sound pressure level distribution obtained by the sound pressure distribution detection means and the frequency spectrum obtained by the spectrum detection means with the sound pressure level distribution of the standard language sound and the frequency spectrum of the standard language sound stored in the storage means, and discriminating the language sound input to the microphone unit; and
output means for outputting a discrimination result of the voice discrimination means.

2. The sound processing apparatus according to claim 1, wherein the microphone unit has a light emitter associated with each of the plurality of microphones, the apparatus further comprising control means for lighting the light emitter associated with each microphone based on the sound pressure level corresponding to that microphone detected by the sound pressure level detection means.

3. The sound processing apparatus according to claim 2, wherein the control means varies the illuminance of the light emitter according to the sound pressure level.

4. The sound processing apparatus according to claim 1, further comprising display means for generating a sound pressure distribution image indicating the sound pressure level distribution obtained by the sound pressure distribution detection means and displaying the sound pressure distribution image.

5. The sound processing apparatus according to claim 1, further comprising:
voice request means for requesting the user, by at least one of an image and a sound, to utter a language sound; and
determination means for determining whether the language sound discriminated by the voice discrimination means matches the language sound requested by the voice request means,
wherein the output means outputs a determination result of the determination means.