JPS59216198A - Sound/soundless discrimination system for voice - Google Patents
- Publication number
- JPS59216198A
- Authority
- JP
- Japan
- Prior art keywords
- signal
- voiced
- unvoiced
- waveform signal
- voice
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
(57) [Abstract] Because this publication contains application data from before electronic filing, no abstract data is recorded.
Description
Detailed Description of the Invention
(A) Field of Industrial Application
The present invention relates to a voiced/unvoiced determination method for speech, used in a speech analysis device.
(B) Prior Art
At present, the PARCOR method is the mainstream approach for speech synthesizers; its outline is shown in Fig. 1. In the figure, (1) is a voiced source generator circuit that generates a periodic pulse source signal by simulating the periodicity of human vocal-cord vibration, and (2) is an unvoiced source generator circuit that generates a noise source signal by simulating the aperiodic nature of turbulent airflow in the human trachea. (3) is a selection switch that selects either the periodic pulse source signal from the voiced source generator circuit (1) or the noise source signal from the unvoiced source generator circuit (2). (4) is a multiplier that gives the source signal from the switch (3) its original amplitude component, and (5) is a digital filter that simulates the acoustic characteristics of the human vocal tract; by filtering the voiced or unvoiced source signal from the multiplier (4), it outputs the speech waveform signal of voiced or unvoiced speech. (6) is a D/A converter that converts the digital speech waveform signal from the digital filter (5) into an analog signal, and (7) is a speaker that utters synthesized speech based on the speech waveform signal from the D/A converter (6). (8) is a parameter memory that supplies, every frame period, a pitch parameter $P$, which sets the period of the periodic pulse source signal in the voiced source generator circuit (1) and directs the selection of the source signal by the selection switch (3); an amplitude parameter $A$, which sets the amplitude component applied by the multiplier (4); and the PARCOR coefficients $K = (K_1, K_2, \ldots, K_{10})$, which determine the filter characteristic of the digital filter (5).

Accordingly, when unvoiced speech is synthesized, the pitch parameter $P$ is "0"; the selection switch (3) detects this value "0" and routes the noise source signal from the unvoiced source generator circuit (2) to the multiplier (4), where it is multiplied by the amplitude parameter $A$ to obtain the unvoiced source signal. When voiced speech is synthesized, on the other hand, the pitch parameter $P$ indicates the pitch period as a non-zero value. The voiced source generator circuit (1) sets the period of its periodic pulse source signal according to $P$, and the selection switch (3), detecting that $P$ is not "0", routes the periodic pulse source signal from the voiced source generator circuit (1) to the multiplier (4), where it is multiplied by $A$ to obtain the voiced source signal.
The voiced or unvoiced source signal obtained in this way is filtered by the digital filter (5), whose filter characteristic is controlled by the PARCOR coefficients $K_1, K_2, \ldots, K_{10}$, then D/A converted (6), and voiced or unvoiced synthesized speech is uttered by the speaker (7).
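The synthesis path of Fig. 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the circuit described above: the helper names are hypothetical, a unit pulse train and Gaussian noise stand in for the source generator circuits (1) and (2), and the digital filter (5) is modeled as an all-pole lattice using the sign convention in which $K_1 = V_1/V_0$ (conventions differ between texts). The coefficients must satisfy $|K_m| < 1$ for the filter to be stable.

```python
import numpy as np


def make_excitation(pitch_p, amp_a, n_samples, rng):
    """Hypothetical source model for one frame: selection switch (3) plus multiplier (4)."""
    if pitch_p == 0:
        source = rng.standard_normal(n_samples)  # P == 0: noise source, as circuit (2)
    else:
        source = np.zeros(n_samples)
        source[::pitch_p] = 1.0  # P != 0: one pulse every P samples, as circuit (1)
    return amp_a * source  # impart the amplitude parameter A


def lattice_synthesis(excitation, k):
    """All-pole lattice filter driven by PARCOR (reflection) coefficients k[0..p-1].

    Assumed convention: analysis f_m = f_{m-1} - k_m * b_{m-1}[n-1],
    so a single stage gives y[n] = x[n] + k_1 * y[n-1].
    """
    p = len(k)
    b_prev = np.zeros(p + 1)  # backward errors b_0..b_p from the previous sample
    y = np.empty(len(excitation))
    for n, x in enumerate(excitation):
        f = float(x)  # forward error f_p[n] is the excitation itself
        b_cur = np.zeros(p + 1)
        for m in range(p, 0, -1):
            f = f + k[m - 1] * b_prev[m - 1]  # f_{m-1}[n]
            b_cur[m] = b_prev[m - 1] - k[m - 1] * f  # b_m[n]
        b_cur[0] = f
        y[n] = f  # filter output is f_0[n]
        b_prev = b_cur
    return y
```

For example, `lattice_synthesis(make_excitation(80, 1.0, 800, np.random.default_rng(0)), [0.9, -0.5])` renders one voiced frame with an assumed pitch period of 80 samples; the coefficient values are arbitrary. A lattice form is used here because the PARCOR coefficients drive it directly, without first converting them to direct-form predictor coefficients.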
In such a speech synthesizer, the parameters $P$, $A$, and $K_1$ to $K_{10}$ to be stored in the parameter memory (8) must be analyzed and extracted in advance from the original speech waveform signal, every frame period of about 10 ms. As shown in Fig. 2, a conventional device for this analysis uses: a correlator (11) that computes the autocorrelation function values $V_0$ to $V_{140}$ of orders 0 to 140 of the speech waveform signal $S$; a PARCOR coefficient extractor (12) that derives the first- to tenth-order PARCOR coefficients $K_1$ to $K_{10}$ from the autocorrelation values $V_0$ to $V_{10}$ of orders 0 to 10 obtained from the correlator (11); a modified correlator (13) that, separately, performs modified correlation processing based on the autocorrelation values $V_{11}$ to $V_{140}$ from the correlator (11) and the ten PARCOR coefficients $K_1$ to $K_{10}$ from the PARCOR coefficient extractor (12), deriving a modified correlation function $W(\tau)$ of the speech waveform signal; a maximum value detector (14) that finds the maximum value $\rho_m$ of the correlation function $W(\tau)$ from the modified correlator (13) and outputs the corresponding lag $\tau$ as the pitch parameter $P$; and an amplitude parameter extractor (15) that computes, from the residual power $E$ per frame period obtained along with the extraction of $K_1$ to $K_{10}$ in the PARCOR coefficient extractor (12), the amplitude parameter $A$ distributed to each pitch period according to the pitch parameter $P$ from the maximum value detector (14). With these, the parameters $P$, $A$, and $K_1$ to $K_{10}$ were obtained.

Furthermore, the voiced/unvoiced determination, that is, whether or not to set the pitch parameter $P$ to "0", is made by a voiced/unvoiced determination section. As described in detail in Japanese Examined Patent Publication No. 55-34956, the determination adds the maximum value $\rho_m$ of the modified correlation function $W(\tau)$ obtained from the maximum value detector (14) to 0.5 times the first-order PARCOR coefficient $K_1$ obtained from the PARCOR coefficient extractor (12), giving $\rho_m + 0.5K_1$, and compares this value with a fixed threshold $t$:

$$\rho_m + 0.5K_1 \ge t \;\Rightarrow\; \text{voiced}, \qquad \rho_m + 0.5K_1 < t \;\Rightarrow\; \text{unvoiced},$$

and outputs a determination signal $U$ based on the result.

However, in such a conventional PARCOR speech analysis method, obtaining the PARCOR coefficients $K_1$ to $K_{10}$, which are the principal parameters and carry the most information, only requires the correlator (11) to compute the autocorrelation values $V_0$ to $V_{10}$ of orders 0 to 10. In contrast, obtaining the pitch parameter $P$ and the voiced/unvoiced determination signal $U$, which carry comparatively little information, requires the correlator (11) to compute the further autocorrelation values $V_{11}$ to $V_{140}$ of orders 11 to 140. This enormous computation, together with the even larger computation in the modified correlator (13) that uses $V_{11}$ to $V_{140}$, accounted for most of the computation required by the analysis process.
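The part of this analysis that needs only $V_0$ to $V_{10}$ (correlator plus PARCOR coefficient extractor) can be sketched with the Levinson-Durbin recursion, a standard way to obtain PARCOR coefficients and the residual power $E$ from autocorrelation values. The source does not say which algorithm the extractor (12) uses, so this is an assumed substitute; the function names are hypothetical, and the sign convention makes $K_1 = V_1/V_0$, so $K_1$ approaches +1 for smooth, strongly correlated waveforms.

```python
import numpy as np


def autocorrelation(s, max_order):
    """V_i = sum_n s[n] * s[n + i] for i = 0..max_order over one (unwindowed) frame."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    return np.array([np.dot(s[: n - i], s[i:]) for i in range(max_order + 1)])


def parcor_from_autocorrelation(v):
    """Levinson-Durbin: V_0..V_p -> PARCOR coefficients K_1..K_p and residual power E.

    Assumes V_0 > 0. Predictor convention: x_hat[n] = sum_i a_i * x[n - i].
    """
    p = len(v) - 1
    a = np.zeros(p + 1)  # a[1..p] are predictor coefficients; a[0] is unused
    k = np.zeros(p)
    e = float(v[0])  # zeroth-order residual power is the frame power V_0
    for m in range(1, p + 1):
        acc = v[m] - np.dot(a[1:m], v[m - 1:0:-1])  # V_m - sum_{i<m} a_i V_{m-i}
        k_m = acc / e
        k[m - 1] = k_m
        a_new = a.copy()
        a_new[m] = k_m
        a_new[1:m] = a[1:m] - k_m * a[m - 1:0:-1]  # a_i <- a_i - K_m a_{m-i}
        a = a_new
        e *= 1.0 - k_m * k_m  # residual power shrinks at every order
    return k, e
```

Usage: `k, e = parcor_from_autocorrelation(autocorrelation(frame, 10))` gives $K_1$ to $K_{10}$ and $E$ for one frame, using only the eleven autocorrelation values the text says the PARCOR coefficients require.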
Therefore, this conventional method places limits on how far a speech analysis device can be miniaturized and how much its analysis time can be shortened, and real-time speech analysis was impossible with a microcomputer-class computing system.
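The size of the extra correlator work can be made concrete with a simple count, assuming each autocorrelation lag $i$ over an $N$-sample frame costs $N - i$ multiply-accumulates. The frame length $N$ is not given in the source, and the modified correlator (13) is not counted:

$$C(p) = \sum_{i=0}^{p} (N - i) = (p+1)N - \frac{p(p+1)}{2}, \qquad \frac{C(140)}{C(10)} = \frac{141N - 9870}{11N - 55}$$

For an assumed $N = 256$ this ratio is $26226/2761 \approx 9.5$, and it approaches $141/11 \approx 12.8$ as $N$ grows, so the orders needed only for $P$ and $U$ dominate the correlator's cost even before the modified correlation is added.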
On the other hand, methods have been proposed for extracting the pitch parameter $P$ and the voiced/unvoiced determination signal $U$ directly, in real time, from the speech waveform signal. Adopting such a method would eliminate the modified correlator (13) of the conventional speech analysis method and greatly reduce the computation in the correlator (11). However, the conventional voiced/unvoiced determination methods in particular have low accuracy, so the error in distinguishing the voiced regions of speech from the unvoiced regions is large, and the quality of the synthesized speech produced by a speech synthesizer such as that of Fig. 1 deteriorates.
(C) Object of the Invention
The present invention has been made in view of the above points, and provides a voiced/unvoiced determination method for speech that is highly accurate and allows the determination to be processed in real time.
(D) Structure of the Invention
In the voiced/unvoiced determination method for speech of the present invention, let $L_{\max}$ be the maximum value of the low-frequency component signal of the speech waveform signal, $V_0$ the zeroth-order autocorrelation function value of the speech waveform signal, $K_1$ the first-order PARCOR coefficient, and $a$, $b$, $c$, $d$ mutually different constants. The speech is judged unvoiced when any one of the following expressions is satisfied, and voiced otherwise:

(i) $L_{\max} \le a$
(ii) $L_{\max} > a$ and $K_1 < b$
(iii) $L_{\max} > a$, $b < K_1 < c$, and $V_0 < d$
(E) Embodiment
Fig. 3 shows a speech analysis device that adopts the voiced/unvoiced determination method of the present invention. In the figure, (21) is a correlator that computes the autocorrelation function values $V_0$ to $V_{10}$ of orders 0 to 10 of the PCM speech waveform signal $S$, and (22) is a PARCOR coefficient extractor that derives the first- to tenth-order PARCOR coefficients $K_1$ to $K_{10}$ from the autocorrelation values $V_0$ to $V_{10}$ obtained from the correlator (21). (23) is a low-pass filter with a digital filter configuration that blocks the high-frequency components of the PCM speech waveform signal $S$: it passes the low-frequency component signal $L$ below 100 Hz, where the pitch period corresponding to the fundamental frequency of speech lies, and removes the influence of the formants above 100 Hz. (24) is a pitch parameter extractor that converts the difference values of the low-frequency component signal $L$ from the low-pass filter (23) into a ternary signal of positive, negative, and zero values ("+1", "-1", "0"), computes the autocorrelation function $W(\tau)$ of this ternary signal, finds the lag $\tau$ at which $W(\tau)$ reaches its maximum (the value closest to 1), and outputs this $\tau$ as the pitch parameter $P$. (25) is an amplitude parameter extractor that computes, from the residual power $E$ per frame period obtained along with the extraction of the PARCOR coefficients $K_1$ to $K_{10}$, the amplitude parameter $A$ distributed to each pitch period according to the pitch parameter $P$ from the pitch parameter extractor (24).

(26) is the voiced/unvoiced determination section according to the present invention. It makes the voiced/unvoiced determination from three quantities: the power $V_0$ of the speech waveform signal, which is the zeroth-order value among the autocorrelation values $V_0$ to $V_{10}$ of the speech waveform signal $S$ computed by the correlator (21); the correlation $K_1$ between the speech waveform and itself delayed by one sample, which is the first-order value among the PARCOR coefficients $K_1$ to $K_{10}$ derived by the PARCOR coefficient extractor (22); and the low-frequency component signal $L$ of the speech waveform signal $S$ obtained from the low-pass filter (23). In other words, no special function or parameter is introduced: the determination uses only the power $V_0$ and the correlation $K_1$, which PARCOR speech analysis requires anyway, and the low-frequency component signal $L$, which is needed to extract the pitch parameter $P$.

The voiced/unvoiced determination section (26) comprises a maximum value detector (27) that detects the maximum value $L_{\max}$ of the low-frequency component signal $L$ within one frame; a first comparator (28) that compares $L_{\max}$ from the maximum value detector (27) with a specific value $a$; a second comparator (29) that compares the correlation $K_1$ from the PARCOR coefficient extractor (22) with a specific value $b$; a third comparator (30) that compares the same correlation $K_1$ with a specific value $c$ ($b < c$); and a fourth comparator (31) that compares the power $V_0$ from the correlator (21) with a specific value $d$. The speech waveform signal $S$ of a frame is judged to be unvoiced when the first comparator (28) detects $L_{\max} \le a$, showing that the low-frequency component of $S$ is small; when the second comparator (29) detects $K_1 < b$, showing that the randomness of $S$ is high; or when the third comparator (30) detects $K_1 < c$ and the fourth comparator (31) detects $V_0 < d$, showing that although the waveform is somewhat less random, its power is small. The experimentally determined optimal ranges of the constants are, for example:

(i) $L_{\max} \le a$
(ii) $L_{\max} > a$ and $K_1 < 0$
(iii) $L_{\max} > a$, $0 < K_1 < c$, and $V_0 < d$

where the numerical values of $a$ and $d$ are illegible in the source scan and $c$ is only partly legible (it appears to be 0.875). When any one of these three expressions is satisfied the frame is judged unvoiced, and otherwise voiced; the determination is made with an accuracy close to 100%, and the determination signal $U$ is output.
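The determination section (26) reduces to a few comparisons, which the Python sketch below illustrates. It is a minimal reading of the three conditions above, not the original circuit: the function name is hypothetical, $L_{\max}$ is taken literally as the largest sample value of the low-pass frame, and the thresholds $a$ to $d$ are left as arguments because most of the source's experimental values are illegible.

```python
import numpy as np


def is_unvoiced(lowpass_frame, k1, v0, a, b, c, d):
    """Return True for unvoiced, False for voiced (the determination signal U).

    lowpass_frame: one frame of the low-frequency component signal L
    k1: first-order PARCOR coefficient K1 (one-sample-delay correlation)
    v0: zeroth-order autocorrelation V0 (frame power)
    a, b, c, d: experimentally chosen constants, with b < c
    """
    l_max = float(np.max(lowpass_frame))  # maximum value detector (27)
    if l_max <= a:  # comparator (28): little low-frequency content
        return True
    # From here on L_max > a holds, which conditions (ii) and (iii) both require.
    if k1 < b:  # comparator (29): waveform highly random
        return True
    if b < k1 < c and v0 < d:  # comparators (30), (31): less random but low power
        return True
    return False
```

Because conditions (ii) and (iii) are only reached after the $L_{\max} \le a$ test has failed, the early returns encode their "$L_{\max} > a$ and ..." clauses without repeating the comparison.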
For reference, Fig. 4 shows the specific configurations of the low-pass filter (23) and the pitch parameter extractor (24). The low-pass filter (23) in the figure is a digital filter consisting of two adders, one multiplier (41), and two delay elements (42). Its transfer function $H(z)$, and the formula for its constant $e$ in terms of the cutoff frequency $f_c$ and the sampling frequency $f_s$, are not legible in the source. Setting $f_c$ to about 100 Hz, for example, considerably reduces the formant components of the PCM speech waveform signal $S$ and so reduces errors in extracting the pitch parameter $P$.
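Since the transfer function and the formula for $e$ are not legible, the sketch below uses a substitute that merely matches the stated parts count (two adders, one multiplier, two delay elements): the first-order recursion $y(n) = e\,y(n-1) + x(n) + x(n-1)$, i.e. $H(z) = (1 + z^{-1})/(1 - e z^{-1})$, with $e$ chosen by the bilinear transform as $e = (1 - \tan(\pi f_c/f_s))/(1 + \tan(\pi f_c/f_s))$. This is an assumption, not the patent's formula, and the function names are hypothetical. The DC gain of this form is $2/(1-e)$, not 1.

```python
import math

import numpy as np


def lowpass_coefficient(fc, fs):
    """Pole e of an ASSUMED first-order bilinear-transform low-pass (cutoff fc, rate fs)."""
    t = math.tan(math.pi * fc / fs)
    return (1.0 - t) / (1.0 + t)


def first_order_lowpass(x, e):
    """y[n] = e*y[n-1] + x[n] + x[n-1]: one multiplier, two adders, two delays."""
    y = np.empty(len(x))
    x_delay = 0.0  # delay element holding x[n-1]
    y_delay = 0.0  # delay element holding y[n-1]
    for n, xn in enumerate(x):
        y_delay = e * y_delay + xn + x_delay  # multiplier e, then the two additions
        x_delay = xn
        y[n] = y_delay
    return y
```

With $f_c = 100$ Hz and an assumed $f_s = 8$ kHz, $e \approx 0.924$, and a 1 kHz tone comes out roughly a tenth as large as DC, which is the kind of formant suppression the text relies on.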
The pitch parameter extractor (24) has a comparison circuit (43) that compares the low-frequency component signal $S_L(z)$ of the speech waveform signal $S(z)$, obtained from the low-pass filter (23), with the signal $S_L(z)z^{-1}$ delayed by one sample, and outputs a ternary signal according to the result of the comparison. This signal is autocorrelated by a ternary signal correlator (44), and the lag $\tau$ at which the correlation function value is maximal is output as the pitch parameter $P$. The computation for autocorrelating the ternary signal is far smaller than that for directly autocorrelating a speech waveform signal of, for example, 12 bits per sample, yet the pitch parameter $P$ is almost never extracted incorrectly.
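The comparison circuit (43) and ternary correlator (44) can be sketched the same way. This is an illustration under assumptions: the names are hypothetical, the ternary signal is the sign of the first difference of $L$ (+1, -1, or 0, as described for the extractor (24)), the correlation is normalized by its zero-lag value so that a perfectly periodic ternary signal scores close to 1 at its period, and the lag search range, which the source does not give, is a caller-supplied argument.

```python
import numpy as np


def ternary_signal(lowpass_frame):
    """Comparison circuit (43): +1, -1 or 0 as L(n) rises, falls or stays equal."""
    diff = np.diff(np.asarray(lowpass_frame, dtype=float))
    return np.sign(diff).astype(int)


def estimate_pitch(lowpass_frame, min_lag, max_lag):
    """Ternary correlator (44) plus peak picking; returns the pitch lag P in samples."""
    t = ternary_signal(lowpass_frame)
    n = len(t)
    w0 = int(np.dot(t, t))  # zero-lag value, used for normalization
    if w0 == 0:
        return 0  # flat frame: no periodicity found (assumed fallback)
    best_lag, best_w = 0, -np.inf
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        # every product is -1, 0 or +1, so no full-precision multiplies are needed
        w = np.dot(t[: n - lag], t[lag:]) / w0
        if w > best_w:
            best_lag, best_w = lag, w
    return best_lag
```

For an 8 kHz frame, an assumed lag range of 20 to 160 samples covers fundamentals from 400 Hz down to 50 Hz; because the sum runs over fewer terms at longer lags, the first period wins over its multiples.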
(F) Effects of the Invention
As is clear from the above description, the voiced/unvoiced determination method for speech of the present invention lets $L_{\max}$ be the maximum value of the low-frequency component signal of the speech waveform signal, $V_0$ the zeroth-order autocorrelation function value of the speech waveform signal, $K_1$ the first-order PARCOR coefficient, and $a$, $b$, $c$, $d$ mutually different constants, and judges the speech unvoiced when any one of the expressions

(i) $L_{\max} \le a$
(ii) $L_{\max} > a$ and $K_1 < b$
(iii) $L_{\max} > a$, $b < K_1 < c$, and $V_0 < d$

is satisfied, and voiced otherwise. Unlike the conventional method, it therefore needs no high-order autocorrelation processing of the speech waveform signal, which demands an enormous amount of computation, and the determination can be executed in real time with little computation.
Moreover, because several effective quantities, $L_{\max}$, $K_1$, and $V_0$, are used as determination conditions, highly accurate determination is possible.
In addition, because the voiced/unvoiced determination reuses data that speech analysis already handles in deriving the speech feature parameters, the computation required for the determination is extremely small, and such speech analysis can be executed in real time on a small microcomputer-class computing system.
Fig. 1 is a block diagram showing the configuration of a general speech synthesizer, Fig. 2 is a block diagram showing the configuration of a conventional speech analysis device, Fig. 3 is a block diagram showing the configuration of a speech analysis device adopting the voiced/unvoiced determination method of the present invention, and Fig. 4 is a block diagram of the principal part of the speech analysis device according to the method of the present invention.
(21): correlator; (22): PARCOR coefficient extractor; (23): low-pass filter; (24): pitch parameter extractor; (25): amplitude parameter extractor; (26): voiced/unvoiced determination section; (27): maximum value detection circuit; (28), (29), (30), (31): comparison circuits.
Claims (1)
(1) A voiced/unvoiced determination method for speech, characterized in that, using the maximum value $L_{\max}$ of the low-frequency component signal obtained from a low-pass filter that completely blocks the high-frequency components of a speech waveform signal; the zeroth-order autocorrelation function value $V_0$ obtained from a correlator that computes the autocorrelation function of the speech waveform signal; the first-order PARCOR coefficient $K_1$ obtained from a PARCOR coefficient extractor that derives the first- to $n$-th-order PARCOR coefficients $K_1$ to $K_n$ from the zeroth- to $n$-th-order autocorrelation function values $V_0$ to $V_n$ obtained by the correlator; and constants $a$, $b$, $c$, and $d$, the speech waveform signal is judged to be an unvoiced speech signal when any one of the three expressions

(i) $L_{\max} \le a$
(ii) $L_{\max} > a$ and $K_1 < b$
(iii) $L_{\max} > a$, $b < K_1 < c$, and $V_0 < d$

is satisfied, and is judged to be a voiced speech signal when none of them is satisfied.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP9213583A JPS59216198A (en) | 1983-05-24 | 1983-05-24 | Sound/soundless discrimination system for voice |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP9213583A JPS59216198A (en) | 1983-05-24 | 1983-05-24 | Sound/soundless discrimination system for voice |
Publications (2)
Publication Number | Publication Date |
---|---|
JPS59216198A true JPS59216198A (en) | 1984-12-06 |
JPH0420198B2 JPH0420198B2 (en) | 1992-03-31 |
Family
ID=14045983
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP9213583A Granted JPS59216198A (en) | 1983-05-24 | 1983-05-24 | Sound/soundless discrimination system for voice |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS59216198A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS56104399A (en) * | 1980-01-23 | 1981-08-20 | Hitachi Ltd | Voice interval detection system |
- 1983-05-24: Application JP9213583A filed in Japan (JPS59216198A), active, granted
Also Published As
Publication number | Publication date |
---|---|
JPH0420198B2 (en) | 1992-03-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9111526B2 (en) | Systems, method, apparatus, and computer-readable media for decomposition of a multichannel music signal | |
JP2948739B2 (en) | Karaoke system user's song scorer | |
US8440901B2 (en) | Musical score position estimating apparatus, musical score position estimating method, and musical score position estimating program | |
US8805697B2 (en) | Decomposition of music signals using basis functions with time-evolution information | |
Kawahara et al. | Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis | |
JP4516157B2 (en) | Speech analysis device, speech analysis / synthesis device, correction rule information generation device, speech analysis system, speech analysis method, correction rule information generation method, and program | |
JP3033061B2 (en) | Voice noise separation device | |
US5452398A (en) | Speech analysis method and device for suppyling data to synthesize speech with diminished spectral distortion at the time of pitch change | |
JP6321334B2 (en) | Signal processing apparatus and program | |
Samad et al. | Pitch detection of speech signals using the cross-correlation technique | |
JPS59216198A (en) | Sound/soundless discrimination system for voice | |
JP2841797B2 (en) | Voice analysis and synthesis equipment | |
Reddy et al. | Predominant melody extraction from vocal polyphonic music signal by combined spectro-temporal method | |
Sharma et al. | Singing characterization using temporal and spectral features in Indian musical notes | |
JP6930089B2 (en) | Sound processing method and sound processing equipment | |
JP2000003200A (en) | Voice signal processor and voice signal processing method | |
JPH11143460A (en) | Method for separating, separating and extracting melody included in music performance | |
KR0171004B1 (en) | Method for Measuring Ratio of Fundamental Frequency and First Formant Using SAMDF | |
KR100322704B1 (en) | Method for varying voice signal duration time | |
Thakuria et al. | Musical instrument tuner | |
Wang et al. | Single channel music source separation based on harmonic structure estimation | |
Bartkowiak et al. | Hybrid sinusoidal modeling of music with near transparent audio quality | |
JP2023530262A (en) | audio transposition | |
RU78470U1 (en) | SYSTEM FOR DETERMINING THE PARAMETERS OF LINEAR SPECTRA OF VOCALIZED SOUNDS | |
Raso et al. | Differences between LP orders for tonal and noise parts of audio signal |