JPH04115299A

JPH04115299A - Method and device for voiced/voiceless sound decision making

Info

Publication number: JPH04115299A
Application number: JP2236752A
Authority: JP
Inventors: Hiroko Kezuka (毛塚 博子); Koji Yoshida (吉田 幸司)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-09-05
Filing date: 1990-09-05
Publication date: 1992-04-16

Abstract

PURPOSE: To decide whether a voice signal is voiced or voiceless by predicting the state of the voice signal and adapting both the threshold for deciding whether a pitch is present and the threshold for deciding whether the spectrum is stationary or nonstationary to the state of the voice to be input next. CONSTITUTION: A pitch presence/absence decision means 11 decides, using a threshold, whether the input voice signal has a pitch. A spectrum stationary/nonstationary decision means 12 then decides, using a threshold, whether the spectrum of the input signal has varied from the spectrum of the previous frame, that is, whether it is stationary or nonstationary. A voiced/voiceless decision means 13 decides that an input signal with no pitch and a stationary spectrum is a voiceless signal and that all other signals are voiced signals. A voice section predicting means 14 then determines the state of the current voice signal from these decision results and predicts the state of the next frame. Finally, a threshold setting means 15 sets, according to the prediction result, a threshold suited to the pitch presence/absence decision and a threshold suited to the stationary/nonstationary spectrum decision.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a voice presence/absence determination method and device, that is, a method and device for deciding whether each frame of a signal contains speech or is silent, for use in digital communications, voice mail, packet communications, and the like.

[Prior Art]

FIG. 3 shows an example of a conventional voice presence/absence determination method. In FIG. 3, a speech signal is first input at fixed time intervals (hereinafter referred to as frames) (step 1), and it is determined whether the input speech signal has pitch (step 2). If there is no pitch, it is further determined whether the spectrum of the input speech signal has been stationary or non-stationary over several preceding frames (step 3). If there is no pitch and the spectrum is stationary, the input speech signal is determined to be silence (step 5); in every other case it is determined to be speech (step 4).

In this way, even the conventional method described above can classify a noise signal that has no pitch and a stationary spectrum as silence, and can classify as speech both consonants whose spectrum changes sharply and vowels that have pitch even though their spectrum is stationary.
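The conventional flow of FIG. 3 (steps 2 to 5) can be sketched roughly as follows. This is only an illustrative sketch: the source does not say how pitch or spectral change is measured, so `pitch_measure`, `spectral_changes`, and both fixed threshold values are hypothetical placeholders.

```python
def conventional_decision(pitch_measure, spectral_changes,
                          pitch_threshold=0.5, spectrum_threshold=1.0):
    """Fixed-threshold speech/silence decision in the spirit of FIG. 3.

    pitch_measure    -- hypothetical scalar, larger when pitch is more evident
    spectral_changes -- hypothetical per-frame spectral change values for the
                        several preceding frames
    Both thresholds are fixed and their values are assumptions.
    """
    # Step 2: does the frame have pitch?
    if pitch_measure > pitch_threshold:
        return "speech"                      # step 4
    # Step 3: is the spectrum stationary over the preceding frames?
    stationary = all(c <= spectrum_threshold for c in spectral_changes)
    # Step 5 (silence) only when there is no pitch AND the spectrum is stationary.
    return "silence" if stationary else "speech"
```

Because both thresholds never change, a weak consonant whose measures fall just under them is treated the same way as background noise, which is the weakness the invention addresses.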

[Problems to Be Solved by the Invention]

However, because the conventional voice presence/absence determination method described above determines silence whenever there is no pitch and the spectrum is stationary, it has the problem that unvoiced consonants, which have no pitch even though their spectrum is non-stationary, and voiced consonants, whose pitch is difficult to detect, are mistakenly determined to be silence.

The present invention solves this conventional problem. Its object is to provide a highly accurate voice presence/absence determination method and device that can correctly determine unvoiced and voiced consonants such as those described above to be speech.

[Means for Solving the Problems]

To achieve the above object, the present invention predicts the state of the speech signal to be input next, such as whether it is a silent section, a consonant section, or a vowel section, from three results obtained for the input speech signal: the determination of whether pitch is present, the determination of whether the spectrum is stationary or non-stationary, and the determination of whether speech is present. Based on this prediction, the threshold for determining whether pitch is present and the threshold for determining whether the spectrum is stationary or non-stationary are each adapted to the state of the speech to be input next, and the voice presence/absence determination is then carried out.

[Operation]

With the above configuration, the present invention operates as follows. The state of the speech signal in the next frame is predicted from the pitch presence/absence determination result, the spectrum stationary/non-stationary determination result, and the voice presence/absence determination result for the current frame. By setting the threshold for the pitch presence/absence determination and the threshold for the spectrum stationary/non-stationary determination appropriately according to that prediction, the voice presence/absence determination can follow the state of the speech, which improves its accuracy.

[Embodiment]

FIG. 1 shows the configuration of one embodiment of the present invention. In FIG. 1, reference numeral 11 denotes a pitch presence/absence determination means, which determines whether the input speech signal has pitch. Reference numeral 12 denotes a spectrum stationary/non-stationary determination means, which determines whether the spectrum of the input speech signal is stationary or non-stationary. Reference numeral 13 denotes a voice presence/absence determination means, which determines whether speech is present in the input speech signal based on the results from the pitch presence/absence determination means 11 and the spectrum stationary/non-stationary determination means 12. Reference numeral 14 denotes a speech section prediction means, which determines the current state of the speech from the respective results of the pitch presence/absence determination means 11, the spectrum stationary/non-stationary determination means 12, and the voice presence/absence determination means 13, and predicts the state of the speech in the next frame. Reference numeral 15 denotes a threshold setting means, which, based on the result obtained by the speech section prediction means 14, sets thresholds adapted to the determinations made in the pitch presence/absence determination means 11 and the spectrum stationary/non-stationary determination means 12.

Next, the operation of the above embodiment is described with reference to FIG. 2. In FIG. 2, an A/D-converted speech signal is first input frame by frame (step 21), and the pitch presence/absence determination means 11 determines whether the input speech signal has pitch (step 22). Various methods are conceivable for this determination; here, pitch is determined to be absent if the parameter used to judge the presence of pitch is at or below a threshold, and present if it is at or above the threshold. Next, the spectrum stationary/non-stationary determination means 12 uses a threshold to determine whether the spectrum of the input signal has varied from the spectrum of the previous frame, that is, whether it is stationary or non-stationary (step 23). Here, the spectrum is determined to be stationary if the variation is at or below the threshold and non-stationary if it is at or above the threshold. The voice presence/absence determination means 13 then determines an input signal with no pitch and a stationary spectrum to be silence (step 25), and determines the signal to be speech when pitch is present or when the spectrum is non-stationary (step 24).
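Steps 22 to 25 can be sketched as a small per-frame function whose thresholds are passed in rather than fixed, so that they can be changed from frame to frame. This is a minimal sketch under assumptions: the names `Thresholds`, `decide_frame`, `pitch_param`, and `spectral_variation` are hypothetical, and since the text assigns the boundary value to both outcomes, the sketch arbitrarily treats a value exactly equal to the threshold as "no pitch" and "stationary".

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    pitch: float      # threshold for the pitch presence/absence decision
    spectrum: float   # threshold for the stationary/non-stationary decision


def decide_frame(pitch_param, spectral_variation, th):
    """Return (has_pitch, non_stationary, is_speech) for one frame.

    pitch_param        -- hypothetical parameter, larger when pitch is evident
    spectral_variation -- hypothetical measure of change from the previous frame
    Equality with a threshold is treated as "absent"/"stationary" (assumption).
    """
    has_pitch = pitch_param > th.pitch                    # step 22
    non_stationary = spectral_variation > th.spectrum     # step 23
    # Step 25: silence only if there is no pitch and the spectrum is stationary;
    # step 24: speech in every other case.
    is_speech = has_pitch or non_stationary
    return has_pitch, non_stationary, is_speech
```

For example, `decide_frame(0.2, 0.3, Thresholds(pitch=0.5, spectrum=1.0))` reports silence, while the same frame with `spectrum=0.25` is reported as speech, which illustrates how lowering a threshold raises sensitivity.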

Next, the speech section prediction means 14 determines the state of the current speech signal from these determination results and predicts the state of the signal in the next frame (step 26). Specifically, if the frame is determined to be silence, the state is determined to be a silent section. If a sharp spectral change occurs after a run of silent frames, the state is determined to be the rising (onset) section of speech, that is, a consonant section. If a frame with pitch follows the rising section, the state is determined to be a vowel section. If, after a run of vowel frames, the spectrum changes and frames with pitch follow immediately, the state is determined to be a transition section of speech; if there is no pitch, the state is determined to be a silent section.
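The section rules of step 26 resemble a small state machine. The sketch below is one possible single-frame reading of them, not the patent's own implementation: the names `Section` and `classify_section` are hypothetical, the multi-frame conditions ("a run of silent frames", "frames with pitch follow immediately") are collapsed to the previous section plus the current frame, and the cases the text does not cover are filled with labeled assumptions.

```python
from enum import Enum


class Section(Enum):
    SILENCE = "silence"
    CONSONANT = "consonant"     # rising (onset) section of speech
    VOWEL = "vowel"
    TRANSITION = "transition"   # transition section between sounds


def classify_section(prev, has_pitch, non_stationary):
    """Classify the current frame's section from the previous section and
    the current pitch / spectrum decisions (simplified, single-frame)."""
    # A frame decided to be silence (no pitch, stationary) is a silent section.
    if not has_pitch and not non_stationary:
        return Section.SILENCE
    if prev is Section.SILENCE:
        # Stated rule: spectral change after silence means an onset (consonant).
        # Assumption: pitch with a stationary spectrum is taken as a vowel.
        return Section.CONSONANT if non_stationary else Section.VOWEL
    if prev is Section.CONSONANT:
        # Stated rule: a pitched frame after the onset starts a vowel section.
        # Assumption: otherwise the consonant section continues.
        return Section.VOWEL if has_pitch else Section.CONSONANT
    if prev is Section.VOWEL:
        if non_stationary:
            # Stated rule: spectral change with pitch is a transition section;
            # without pitch the section is treated as silence.
            return Section.TRANSITION if has_pitch else Section.SILENCE
        return Section.VOWEL
    # prev is TRANSITION: not specified in the text.
    # Assumption: pitch returns to a vowel, otherwise the transition continues.
    return Section.VOWEL if has_pitch else Section.TRANSITION
```

Starting from `Section.SILENCE`, a frame with spectral change but no pitch gives `CONSONANT`, and a following pitched frame gives `VOWEL`.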

Then, based on the prediction result of step 26, the threshold setting means 15 sets a threshold adapted to the pitch presence/absence determination and a threshold adapted to the spectrum stationary/non-stationary determination (step 27). Specifically, when the speech section prediction means 14 determines the input signal to be a silent section, the next input signal is predicted to be either silence or the onset of speech, so the threshold for the spectrum stationary/non-stationary determination is lowered to raise sensitivity to the spectrum and make the rising section of speech easier to detect. When the input signal is determined to be a consonant section, the next input signal is predicted to be a vowel section, so the pitch threshold is lowered to raise sensitivity to pitch and the spectrum threshold is raised to lower sensitivity to spectral change, making vowels easier to detect. When the input signal is determined to be a vowel section, the next input signal is predicted to be a transition section or a silent section, so the spectrum threshold is lowered to raise sensitivity to spectral change and make transitions and silence easier to distinguish; in addition, because a vowel follows once the spectral change has become large, sensitivity to pitch is raised so that the vowel is easier to detect.
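The threshold rules of step 27 can be illustrated as a lookup from the current section to the pair of thresholds used for the next frame. The sketch is hedged throughout: the text only says "lower" or "raise", so the base values and the scaling factors here are invented for illustration, `set_thresholds` and `spectral_change_seen` are hypothetical names, and the transition section, for which the text gives no rule, simply keeps the base values.

```python
BASE_PITCH = 0.5       # assumed baseline pitch threshold
BASE_SPECTRUM = 1.0    # assumed baseline spectrum threshold
LOWER = 0.5            # assumed factor for "lower the threshold"
RAISE = 2.0            # assumed factor for "raise the threshold"


def set_thresholds(section, spectral_change_seen=False):
    """Return (pitch_threshold, spectrum_threshold) for the next frame.

    section              -- "silence", "consonant", "vowel" or "transition"
    spectral_change_seen -- True once a large spectral change has been
                            observed during a vowel section (assumed flag)
    """
    pitch, spectrum = BASE_PITCH, BASE_SPECTRUM
    if section == "silence":
        # Next frame: silence or onset. Raise sensitivity to spectral change.
        spectrum *= LOWER
    elif section == "consonant":
        # Next frame: vowel. More sensitive to pitch, less to spectral change.
        pitch *= LOWER
        spectrum *= RAISE
    elif section == "vowel":
        # Next frame: transition or silence. More sensitive to spectral change.
        spectrum *= LOWER
        if spectral_change_seen:
            # A vowel follows a large spectral change: more sensitive to pitch.
            pitch *= LOWER
    # "transition": no rule is given in the text, so base values are kept.
    return pitch, spectrum
```

A lower threshold means the corresponding decision fires more easily, which is what the text calls higher sensitivity.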

Thus, according to the above embodiment, whether the next signal is a silent section, a consonant section at the onset of speech, a vowel section, or a transition section is predicted from the pitch presence/absence determination result, the spectrum stationary/non-stationary determination result, and the voice presence/absence determination result for the input speech signal. Based on this prediction, the threshold for determining whether pitch is present and the threshold for determining whether the spectrum is stationary or non-stationary are each changed as appropriate for the state of the speech to be input next. As a result, voice presence/absence can be determined correctly, and the accuracy of the determination is improved.

[Effects of the Invention]

As is clear from the above embodiment, the present invention predicts the state of the speech signal, such as whether the next signal is a silent section, a consonant section, or a vowel section, from the pitch presence/absence determination result, the spectrum stationary/non-stationary determination result, and the voice presence/absence determination result for the input speech signal. Based on this prediction, the threshold for determining whether pitch is present and the threshold for determining whether the spectrum is stationary or non-stationary are each changed to suit the state of the speech to be input next before the voice presence/absence determination is made. Voice presence/absence can therefore be determined correctly, and a highly accurate voice presence/absence determination method and device can be realized.

[Brief Description of the Drawings]

FIG. 1 is a schematic block diagram of a voice presence/absence determination device according to one embodiment of the present invention, FIG. 2 is a flowchart explaining the operation of the device, and FIG. 3 is a flowchart explaining a conventional voice presence/absence determination method.

11: pitch presence/absence determination means; 12: spectrum stationary/non-stationary determination means; 13: voice presence/absence determination means; 14: speech section prediction means; 15: threshold setting means.

Agent: Patent attorney Akira Kokaji (小鍜治 明) and two others

Claims

[Claims]

(1) A voice presence/absence determination method comprising: determining whether an input speech signal has pitch; determining whether its spectrum is stationary or non-stationary; determining from these results whether speech is present; predicting the state of the speech signal to be input next from the respective determination results; and, based on this prediction, setting a threshold for determining the presence or absence of pitch and a threshold for determining whether the spectrum is stationary or non-stationary, each adapted to the state of the speech to be input next.

(2) A voice presence/absence determination device comprising: pitch presence/absence determination means for determining whether an input speech signal has pitch; spectrum stationary/non-stationary determination means for determining whether the spectrum is stationary or non-stationary; voice presence/absence determination means for determining whether speech is present; speech section prediction means for predicting the state of the speech signal to be input next from the respective determination results of the above means; and threshold setting means for setting, based on the prediction, a threshold adapted to determining the presence or absence of pitch and a threshold adapted to determining whether the spectrum is stationary or non-stationary.