JPH04115299A - Method and device for voiced/voiceless sound decision making - Google Patents
- Publication number
- JPH04115299A
- Authority
- JP
- Japan
- Prior art keywords
- spectrum
- stationary
- pitch
- threshold value
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Description
DETAILED DESCRIPTION OF THE INVENTION
Field of Industrial Application
The present invention relates to a method and apparatus for determining the presence or absence of voice, for use in digital communications, voice mail, packet communications, and the like.
Prior Art
FIG. 3 shows an example of a conventional voice presence/absence determination method. In FIG. 3, a speech signal is first input at fixed time intervals (hereinafter, frames) (step 1), and it is determined whether the input signal has pitch (step 2). If there is no pitch, it is further determined whether the spectrum of the input signal has been stationary or non-stationary over several preceding frames (step 3). If there is no pitch and the spectrum is stationary, the input signal is judged to be silence (step 5); anything else is judged to be voice (step 4).
In this way, the conventional method can also judge a noise signal with no pitch and a stationary spectrum to be silence, and can judge consonants whose spectrum changes rapidly, as well as vowels whose spectrum is stationary but which have pitch, to be voice.
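The conventional rule of FIG. 3 can be sketched as a two-input decision. This is a minimal illustration only: the function name and the string labels are hypothetical, and the two boolean inputs stand in for whatever pitch detector and spectral stationarity test an implementation actually uses, which the source does not specify.

```python
def conventional_decision(has_pitch: bool, spectrum_stationary: bool) -> str:
    """Illustrative version of the prior-art rule (steps 2 to 5 of FIG. 3)."""
    # Silence only when there is no pitch AND the spectrum is stationary.
    if not has_pitch and spectrum_stationary:
        return "silence"  # step 5
    # Every other combination is treated as voice.
    return "voice"  # step 4
```

Because both inputs come from fixed tests, a frame whose pitch or spectral change is only weakly detectable falls into the "silence" branch.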
Problems to Be Solved by the Invention
However, because the conventional method judges a signal to be silence whenever there is no pitch and the spectrum is stationary, it has the problem that unvoiced consonants, which have no pitch even though their spectrum is non-stationary, and voiced consonants, whose pitch is difficult to detect, are mistakenly judged to be silence.
The present invention solves this problem. Its object is to provide a highly accurate voice presence/absence determination method, and an apparatus for it, that can correctly judge such unvoiced and voiced consonants to be voice.
Means for Solving the Problems
To achieve the above object, the present invention predicts the state of the next speech signal to be input (for example, whether it is a silent section, a consonant section, or a vowel section) from three results for the input speech signal: whether pitch is present, whether the spectrum is stationary or non-stationary, and whether voice is present. Based on this prediction, the threshold for determining the presence or absence of pitch and the threshold for determining whether the spectrum is stationary or non-stationary are each adapted to the state of the speech to be input next, and the presence or absence of voice is determined with these thresholds.
Operation
With the above configuration, the present invention operates as follows. The state of the speech signal in the next frame is predicted from the pitch presence/absence result, the spectrum stationary/non-stationary result, and the voice presence/absence result for the current frame. By setting the threshold for the pitch determination and the threshold for the spectrum stationarity determination according to that prediction, voice presence/absence can be determined in a way that matches the state of the speech, which improves the accuracy of the determination.
Embodiment
FIG. 1 shows the configuration of an embodiment of the present invention. In FIG. 1, reference numeral 11 denotes pitch presence/absence determination means, which determines whether the input speech signal has pitch. Reference numeral 12 denotes spectrum stationary/non-stationary determination means, which determines whether the spectrum of the input speech signal is stationary or non-stationary. Reference numeral 13 denotes voice presence/absence determination means, which determines whether the input speech signal contains voice based on the results from the pitch presence/absence determination means 11 and the spectrum stationary/non-stationary determination means 12. Reference numeral 14 denotes speech interval prediction means, which determines the current state of the speech from the respective results of the pitch presence/absence determination means 11, the spectrum stationary/non-stationary determination means 12, and the voice presence/absence determination means 13, and predicts the state of the speech in the next frame. Reference numeral 15 denotes threshold setting means, which, based on the result obtained by the speech interval prediction means 14, sets thresholds adapted to the determinations made by the pitch presence/absence determination means 11 and the spectrum stationary/non-stationary determination means 12.
Next, the operation of the above embodiment is described with reference to FIG. 2. In FIG. 2, an A/D-converted speech signal is first input frame by frame (step 21), and the pitch presence/absence determination means 11 determines whether the input signal has pitch (step 22). Various methods are conceivable for this determination; here, pitch is judged absent if the parameter used for the determination is at or below a threshold, and present if it is at or above the threshold. Next, the spectrum stationary/non-stationary determination means 12 uses a threshold to determine whether the spectrum of the input signal has varied from the spectrum of the previous frame, that is, whether it is stationary or non-stationary (step 23). Here, the spectrum is judged stationary if the variation is at or below the threshold, and non-stationary if it is at or above the threshold. The voice presence/absence determination means 13 then judges an input signal with no pitch and a stationary spectrum to be silence (step 25), and judges a signal that has pitch or a non-stationary spectrum to be voice (step 24).
Next, the speech interval prediction means 14 determines the current state of the speech signal from these results and predicts the state of the signal in the next frame (step 26). Specifically, if the signal is judged to be silence, the state is a silent section. If a sudden change in the spectrum follows a run of silent frames, the state is the rising section of the speech, that is, a consonant section. If a frame with pitch follows the rising section, the state is a vowel section. If, after a run of vowel frames, the spectrum changes and frames with pitch immediately follow, the state is a transition section of the speech; if there is no pitch, the state is a silent section.
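The state determination of step 26 can be read as a small state machine. The sketch below is one possible reading, not the patent's definition: the `State` names and `classify_state` are hypothetical, and every branch marked "assumption" covers a case the source does not spell out.

```python
from enum import Enum


class State(Enum):
    SILENCE = "silence"
    CONSONANT = "consonant"    # rising section of the speech
    VOWEL = "vowel"
    TRANSITION = "transition"  # transition section between sounds


def classify_state(prev: State, has_pitch: bool, spectrum_stationary: bool) -> State:
    """Illustrative reading of step 26: derive the current section from the frame results."""
    # A frame judged silent (no pitch, stationary spectrum) is a silent section.
    if not has_pitch and spectrum_stationary:
        return State.SILENCE
    if prev is State.SILENCE:
        # A spectral change after silence marks the rising (consonant) section.
        if not spectrum_stationary:
            return State.CONSONANT
        return State.VOWEL  # assumption: a pitched, stationary frame after silence
    if prev is State.CONSONANT:
        # A pitched frame after the rising section is a vowel section.
        return State.VOWEL if has_pitch else State.CONSONANT  # staying put is an assumption
    # prev is VOWEL or TRANSITION (treating both alike is an assumption).
    if not has_pitch:
        return State.SILENCE  # spectral change with no pitch: silent section
    return State.VOWEL if spectrum_stationary else State.TRANSITION
```

For example, starting from `State.SILENCE`, a frame with no pitch and a non-stationary spectrum moves the state to `State.CONSONANT`, and a following pitched frame moves it to `State.VOWEL`.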
Then, based on the prediction from step 26, the threshold setting means 15 sets a threshold adapted to the pitch presence/absence determination and a threshold adapted to the spectrum stationary/non-stationary determination (step 27). Specifically, when the speech interval prediction means 14 judges the input signal to be a silent section, the next input signal is predicted to be either silence or the rising edge of speech, so the threshold for the spectrum stationarity determination is lowered to raise sensitivity to the spectrum and make the rising section easier to detect. When the input signal is judged to be a consonant section, the next input signal is predicted to be a vowel section, so the pitch threshold is lowered to raise sensitivity to pitch and the spectrum threshold is raised to lower sensitivity to spectral change, which makes vowels easier to detect. When the input signal is judged to be a vowel section, the next input signal is predicted to be a transition section or a silent section, so the spectrum threshold is lowered to raise sensitivity to spectral change and make transitions and silence easier to distinguish; and because a vowel follows once the spectral change has become large, sensitivity to pitch is also raised to make that vowel easier to detect.
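The threshold rules of step 27 can be sketched as a lookup on the current section. This is illustrative only: the `Thresholds` tuple, the string state labels, and the single additive `step` are assumptions, since the source gives only the direction of each adjustment and no amounts. The source also states no rule for the transition section, so the sketch falls back to the base thresholds there.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    pitch: float
    spectrum: float


def adapt_thresholds(state: str, base: Thresholds, step: float = 0.1) -> Thresholds:
    """Illustrative step 27: pick thresholds for the next frame from the current section."""
    if state == "silence":
        # Next frame: silence or speech onset. Raise sensitivity to spectral change.
        return Thresholds(base.pitch, base.spectrum - step)
    if state == "consonant":
        # Next frame: vowel. Raise pitch sensitivity, lower spectral sensitivity.
        return Thresholds(base.pitch - step, base.spectrum + step)
    if state == "vowel":
        # Next frame: transition or silence. Raise both sensitivities.
        return Thresholds(base.pitch - step, base.spectrum - step)
    # Transition section: no rule in the source, so keep the base values (assumption).
    return base
```

Lowering a threshold makes the corresponding test fire more easily, which is what the text means by raising sensitivity.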
Thus, according to the above embodiment, whether the next signal is a silent section, a consonant section at the rising edge of speech, a vowel section, a transition section, and so on is predicted from the pitch presence/absence result, the spectrum stationary/non-stationary result, and the voice presence/absence result for the input speech signal. Based on this prediction, the threshold for the pitch determination and the threshold for the spectrum stationarity determination are each changed to suit the state of the speech to be input next. Voice presence/absence can therefore be determined accurately, and the accuracy of the determination is improved.
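The whole loop of FIG. 2 (steps 21 to 27) can be put together in one short sketch. It is a simplified illustration under several assumptions: each frame is already reduced to a hypothetical `(pitch_param, spectral_change)` pair, the state rules fill in cases the source leaves open, and the base thresholds and `step` are made-up values.

```python
def run_vad(frames, base_pitch=0.5, base_spec=0.5, step=0.25):
    """Illustrative adaptive loop; frames is an iterable of (pitch_param, spectral_change)."""
    pitch_thr, spec_thr = base_pitch, base_spec
    state = "silence"
    decisions = []
    for pitch_param, spectral_change in frames:
        # Steps 22-23: threshold tests with the thresholds set for this frame.
        has_pitch = pitch_param > pitch_thr
        stationary = spectral_change <= spec_thr
        # Steps 24-25: silence only if there is no pitch and the spectrum is stationary.
        is_voice = has_pitch or not stationary
        decisions.append(is_voice)
        # Step 26: determine the current section (simplified; partly assumed).
        if not is_voice:
            state = "silence"
        elif state == "silence":
            state = "vowel" if stationary else "consonant"
        elif state == "consonant":
            state = "vowel" if has_pitch else "consonant"
        elif not has_pitch:
            state = "silence"
        else:
            state = "vowel" if stationary else "transition"
        # Step 27: set the thresholds for the next frame.
        if state == "silence":
            pitch_thr, spec_thr = base_pitch, base_spec - step
        elif state == "consonant":
            pitch_thr, spec_thr = base_pitch - step, base_spec + step
        elif state == "vowel":
            pitch_thr, spec_thr = base_pitch - step, base_spec - step
        else:
            pitch_thr, spec_thr = base_pitch, base_spec
    return decisions


frames = [(0.1, 0.1), (0.1, 0.4), (0.4, 0.1), (0.1, 0.1)]
print(run_vad(frames))            # adaptive thresholds
print(run_vad(frames, step=0.0))  # fixed thresholds, as in the conventional method
```

With these made-up numbers, the adaptive run yields `[False, True, True, False]`, while the fixed-threshold run yields `[False, False, False, False]`: the weak spectral change in the second frame and the weak pitch in the third are detected only because the preceding frame lowered the relevant threshold.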
Effects of the Invention
As is clear from the above embodiment, the present invention predicts the state of the speech signal (for example, whether the next signal is a silent section, a consonant section, or a vowel section) from the pitch presence/absence result, the spectrum stationary/non-stationary result, and the voice presence/absence result for the input speech signal. Based on this prediction, the threshold for the pitch determination and the threshold for the spectrum stationarity determination are each changed to adapt to the state of the speech to be input next before voice presence/absence is determined. Voice presence/absence can therefore be determined accurately, and a highly accurate voice presence/absence determination method and apparatus can be realized.
Brief Description of the Drawings
FIG. 1 is a schematic block diagram of a voice presence/absence determination apparatus according to an embodiment of the present invention, FIG. 2 is a flowchart explaining the operation of the apparatus, and FIG. 3 is a flowchart explaining a conventional voice presence/absence determination method.
11: pitch presence/absence determination means; 12: spectrum stationary/non-stationary determination means; 13: voice presence/absence determination means; 14: speech interval prediction means; 15: threshold setting means.
Agent: Patent Attorney Akira Kokaji (小鍜治 明) and two others
Claims (2)
(1) A voice presence/absence determination method in which it is determined whether an input speech signal has pitch and whether its spectrum is stationary or non-stationary; whether voice is present is determined from these results; the state of the next speech signal to be input is predicted from the respective determination results; and, based on this prediction, the threshold for determining the presence or absence of pitch and the threshold for determining whether the spectrum is stationary or non-stationary are each set so as to adapt to the state of the speech to be input next.
(2) A voice presence/absence determination apparatus comprising: pitch presence/absence determination means for determining whether an input speech signal has pitch; spectrum stationary/non-stationary determination means for determining whether the spectrum is stationary or non-stationary; voice presence/absence determination means for determining from these results whether voice is present; speech interval prediction means for predicting, from the respective determination results of the foregoing means, the state of the next speech signal to be input; and threshold setting means for setting, based on that prediction, a threshold adapted to the pitch presence/absence determination and a threshold adapted to the spectrum stationary/non-stationary determination.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2236752A JPH04115299A (en) | 1990-09-05 | 1990-09-05 | Method and device for voiced/voiceless sound decision making |
Publications (1)
Publication Number | Publication Date |
---|---|
JPH04115299A (en) | 1992-04-16 |
Family
ID=17005270
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2236752A Pending JPH04115299A (en) | 1990-09-05 | 1990-09-05 | Method and device for voiced/voiceless sound decision making |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH04115299A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0713584A (en) * | 1992-10-05 | 1995-01-17 | Matsushita Electric Ind Co Ltd | Speech detecting device |
JP2002091470A (en) * | 2000-09-20 | 2002-03-27 | Fujitsu Ten Ltd | Voice section detection device |
JP2009122710A (en) * | 1998-08-21 | 2009-06-04 | Panasonic Corp | Parameter extraction apparatus and parameter extraction method |
JP2010230814A (en) * | 2009-03-26 | 2010-10-14 | Fujitsu Ltd | Audio signal evaluation program, audio signal evaluation apparatus, and audio signal evaluation method |
WO2020039598A1 (en) * | 2018-08-24 | 2020-02-27 | 日本電気株式会社 | Signal processing device, signal processing method, and signal processing program |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0713584A (en) * | 1992-10-05 | 1995-01-17 | Matsushita Electric Ind Co Ltd | Speech detecting device |
JP2009122710A (en) * | 1998-08-21 | 2009-06-04 | Panasonic Corp | Parameter extraction apparatus and parameter extraction method |
JP4527175B2 (en) * | 1998-08-21 | 2010-08-18 | パナソニック株式会社 | Spectral parameter smoothing apparatus and spectral parameter smoothing method |
JP2010186190A (en) * | 1998-08-21 | 2010-08-26 | Panasonic Corp | Quantized lsp parameter dynamic feature extractor and quantized lsp parameter dynamic feature extracting method |
JP2002091470A (en) * | 2000-09-20 | 2002-03-27 | Fujitsu Ten Ltd | Voice section detection device |
JP2010230814A (en) * | 2009-03-26 | 2010-10-14 | Fujitsu Ltd | Audio signal evaluation program, audio signal evaluation apparatus, and audio signal evaluation method |
US8532986B2 (en) | 2009-03-26 | 2013-09-10 | Fujitsu Limited | Speech signal evaluation apparatus, storage medium storing speech signal evaluation program, and speech signal evaluation method |
WO2020039598A1 (en) * | 2018-08-24 | 2020-02-27 | 日本電気株式会社 | Signal processing device, signal processing method, and signal processing program |
JPWO2020039598A1 (en) * | 2018-08-24 | 2021-08-12 | 日本電気株式会社 | Signal processing equipment, signal processing methods and signal processing programs |
US11769517B2 (en) | 2018-08-24 | 2023-09-26 | Nec Corporation | Signal processing apparatus, signal processing method, and signal processing program |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5331784B2 (en) | Speech end pointer | |
KR19980080615A (en) | Voice activity detection method and apparatus | |
JPH08263097A (en) | Method for recognition of word of speech and system for discrimination of word of speech | |
JPH0431898A (en) | Voice/noise separating device | |
JPH04115299A (en) | Method and device for voiced/voiceless sound decision making | |
Zhang et al. | Improved modeling for F0 generation and V/U decision in HMM-based TTS | |
JPH05257490A (en) | Speech rate conversion method and device | |
Ijitona et al. | Improved silence-unvoiced-voiced (SUV) segmentation for dysarthric speech signals using linear prediction error variance | |
JPH0950288A (en) | Device and method for recognizing voice | |
JPWO2009025142A1 (en) | Speaker speed conversion system and method, and speed conversion apparatus | |
JPS60129796A (en) | Sillable boundary detection system | |
JPH03114100A (en) | Voice section detecting device | |
JP4313724B2 (en) | Audio reproduction speed adjustment method, audio reproduction speed adjustment program, and recording medium storing the same | |
JP2001083978A (en) | Speech recognition device | |
KR100345402B1 (en) | An apparatus and method for real - time speech detection using pitch information | |
JP6790851B2 (en) | Speech processing program, speech processing method, and speech processor | |
KR100322704B1 (en) | Method for varying voice signal duration time | |
JPH0772899A (en) | Device for voice recognition | |
JPS5925240B2 (en) | Word beginning detection method for speech sections | |
JPS59149400A (en) | Syllable boundary selection system | |
JPH0456999B2 (en) | ||
CN116705025A (en) | Vehicle-mounted terminal communication method | |
KR100322203B1 (en) | Device and method for recognizing sound in car | |
JPH01165000A (en) | Vocal sound section information forming apparatus | |
JPH07225592A (en) | Device for detecting sound section |